Craft TODO

TODO: list the open questions and decide where to put this content

How shall we map the buckets?
- 1 bucket for the built, deployed objects (JS files)
- 1 bucket per client to store the artefact content
- Shall we serve each front-end deployment from inside the client bucket or from a dedicated one?
- Which bucket configuration shall we propose (archiving, access rights, ...)?
Clarify the URI rules for the proxy to the buckets
- client_x.safenai.io/ : maps to the client web site
- client_x.safenai.io/artefacts/ : maps to the artefact tree associated with the client
- common.safeanai.io/bundle/ : maps to the built version of the web apps
- The front end shall request data on demand, with a caching strategy
  - Blocks neither too big nor too small
  - Check how the cache strategy is managed inside the browser
- Package data that is requested together, to prevent fragmentation
  - Each "version" shall have
    - a package of all essential data in one file
    - a package of all the traceability / decisions associated with it
      - how do we identify data that should not be stored inside the version package?
      - integrate a small definition of each artefact in this file
  - Package datasets with the samples that are referenced inside the version
    - TODO: how to reuse the data platform => urgent action
  - All data are stored in S3; some of them are loaded into a database to speed up read access
    - With a tree of versions inside the bucket
      - which shall follow this path
        project_name
        version_id
        Artefact_type
        Artefact_id
    - With access to data outside the project / shared by projects: reuse the data platform?
      - data samples and datasets
        data samples:
        - raw data + acquisition metadata (immutable)
        - metadata added to the raw data, including annotations (changes the version of the data sample)
        - metrics computed for a sample (associated with a model / tool / component version => to be managed?)
        datasets:
        - list of the data samples used (version of the dataset)
        - metrics computed on the dataset (versioned with the associated tools + metrics)
        foundation models
    Questions:
    - How do we manage access rights when the bucket is accessed over HTTPS?
    - Shall we use the S3 API instead of HTTPS from the front end?
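
The URI rules and the bucket tree above can be sketched as a small routing function. This is only a sketch under assumptions, not the actual proxy: the bucket names (`build-bundles`, one bucket named after each client), the `site/` prefix and the function names are hypothetical. It also assumes the client web site lives inside the client bucket, which is still an open question above. Only the host / path patterns and the order project_name / version_id / Artefact_type / Artefact_id come from these notes.

```python
from typing import NamedTuple

COMMON_HOST = "common.safeanai.io"  # host exactly as written in the notes
CLIENT_SUFFIX = ".safenai.io"       # client hosts look like client_x.safenai.io


class Target(NamedTuple):
    bucket: str  # hypothetical bucket name
    key: str     # object key inside that bucket


def artefact_key(project_name: str, version_id: str,
                 artefact_type: str, artefact_id: str) -> str:
    """Build an object key following the path order given in the notes."""
    return "/".join(["artefacts", project_name, version_id,
                     artefact_type, artefact_id])


def route(host: str, path: str) -> Target:
    """Map a request (host, path) to a bucket and an object key."""
    path = path.lstrip("/")
    if host == COMMON_HOST:
        if path.startswith("bundle/"):
            # Shared build bucket for the web app bundles (assumed name).
            return Target("build-bundles", path[len("bundle/"):])
        raise ValueError(f"no rule for {host}/{path}")
    if host.endswith(CLIENT_SUFFIX):
        client = host[: -len(CLIENT_SUFFIX)]
        # S3 bucket names cannot contain underscores, hence the replacement.
        bucket = client.replace("_", "-")
        if path.startswith("artefacts/"):
            return Target(bucket, path)
        # Assumption: the client web site is stored under site/ in the same bucket.
        return Target(bucket, "site/" + path)
    raise ValueError(f"unknown host {host}")
```

With this layout, a request to client_x.safenai.io/artefacts/... resolves to the same key that `artefact_key` builds, so the front end and the packaging code can share one naming rule.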

TODO for blueprint

Model training
- Training diffusion / GAN models
  ==> Learn how to generate samples in the distribution of each subsample
  ==> Add context to the generation, from a context computed beforehand
  ==> Train the model to identify the context (whether intrinsic or computed)
  ==> Split the latent space between
  - Contextual information discovery
  - Information robust to perturbation
  - Perturbation evaluation (attack intensity)
- Training anomaly detection models
  ==> Add a dedicated head to identify the difficulty of reconstructing the sample (unknown unknown)
  ==> Dedicated objectives: probability of anomaly + type of anomaly
  ==> Location / position, features that lead to an anomaly
  ==>
- Training an OOD (out-of-distribution) detection model
  ==> Using specifically generated and/or identified OOD-domain samples
  ==> Classifier
How data are managed
- Data shall be imported into KS
- A data processing pipeline shall be used to transform / complete a data example
  ==> Configuration of the pipeline
  - possible triggers: on new data, on data transform change
  - cache storage / traceability policy (keep a reference to the original data or a copy)
  - can rely on a human (annotation): HOW?
  - DAG creation (Airflow? / MLflow?)
  - generate a graph of data dependencies
- Use generic datasets to build auxiliary models
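
The DAG and data-dependency items above can be illustrated with the standard library alone, before choosing between Airflow and MLflow. This is a sketch under assumptions: the step names are hypothetical, and the "trigger on data transform change" is modelled simply as "re-run everything downstream of the changed step".

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
dependencies = {
    "import_raw": set(),
    "add_metadata": {"import_raw"},
    "human_annotation": {"add_metadata"},
    "compute_metrics": {"add_metadata"},
    "build_dataset": {"human_annotation", "compute_metrics"},
}


def execution_order(deps):
    """Return one valid execution order (every step after its dependencies)."""
    return list(TopologicalSorter(deps).static_order())


def downstream(deps, changed):
    """Return the steps that must be re-run when `changed` is modified."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for step, parents in deps.items():
            if current in parents and step not in affected:
                affected.add(step)
                frontier.append(step)
    return affected
```

The same dictionary doubles as the graph of data dependencies: `downstream(dependencies, "add_metadata")` lists what a change to that transform invalidates, which is also the information a cache or traceability policy would need.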

How to get the information needed to build the data process
- ODD perturbations to the acquisition process
- Do you already know factors that influence anomalies?
- Do you have a list of anomalies?
- Is there a description of these types of anomalies?
- Can you provide a description of several anomalies?
- Can you provide these data?
  - as much data without anomalies as possible
  - at least several anomalies already detected

Project steps
- gather preliminary information from craftsmanship, with several samples
- create a preliminary data pipeline
- gather a first bunch of samples (already-identified valid / invalid samples are optional)
- gather several samples of anomalies, with a description of the anomalies

- use the klarity split to try to identify anomalies in the valid data (with or without anomalies)
	- train several models on different subsets and use the other subsets to identify anomalies
- generate a bunch of primary metrics
	- density in the distance evaluation space (t-SNE or equivalent)
	- several generated valid samples, to check problem understanding / use them as anomalies / reject them ==> classifier of generated samples
	- using the different classifications during training and/or between models, select samples of interest
	- anomaly metrics identification
	==> Confusion matrix
	==> Conformal prediction probability
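
One possible reading of the conformal prediction item is a conformal p-value computed from anomaly scores. The sketch below assumes that each sample gets a score where higher means more anomalous (for example a reconstruction error), and that a calibration set of scores from valid samples is held out. The function names and the default threshold are hypothetical.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration scores at least as high as the test score.

    The +1 in numerator and denominator counts the test sample itself,
    which is the usual correction for conformal p-values.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def is_anomaly(calibration_scores, test_score, alpha=0.1):
    """Flag the sample when its p-value is at most alpha (assumed threshold)."""
    return conformal_p_value(calibration_scores, test_score) <= alpha
```

If the calibration and test samples are exchangeable, a valid sample is flagged with probability at most `alpha`, which gives the expert a false-alarm rate that can be discussed during scoping.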
	
	
- proposal of a multi-dimensional data generator for annotation / comment
	- based on the split + t-SNE density of the auto-encoder generator
	- ask the user to try to name / describe these dimensions
	- identify zones of weak density in this distribution and ask the user the following
		- Is it inside the ODD, or OOD?
		- Do you have more samples of this type?
		- Are these generated samples representative / close?

- Use a first annotation job (or use the deployed shadow version to gather annotations)
- Using this batch, generate the first realistic metrics associated with the AI component
	- Metrics for expert evaluation of FP / TN
	- Metrics for dataset quality evaluation
	- Metrics for ODD coverage
	- Projection of the expected error rate if put in production
	- Comparison with the operator error rate
	- Generate a benefit / cost metric based on these parameters provided during scoping
		- Ratio of post-production quality checks
		- Cost of a FP recall
		- Cost of an undetected FP
		- Cost of the rework generated by a TN
		- Cost of operator validation
		- Cost of the quality inspection of a sample
	- Identify several possible actions to improve the AI component
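
The benefit / cost metric above could be computed as an expected cost per sample, with and without the AI component. This is a sketch under assumptions: the formula (sum of rate times unit cost, then relative saving) is one possible model among the several cost / benefit models to agree on with the client, and all names are hypothetical. The real rates and unit costs are the parameters gathered during scoping.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    rate: float       # expected occurrences per produced sample
    unit_cost: float  # cost of one occurrence, in the client's currency


def expected_cost_per_sample(items):
    """Sum of rate x unit cost over all cost items."""
    return sum(item.rate * item.unit_cost for item in items)


def benefit_cost_ratio(baseline_items, ai_items):
    """Saving per sample divided by the remaining cost with the AI component."""
    baseline = expected_cost_per_sample(baseline_items)
    with_ai = expected_cost_per_sample(ai_items)
    if with_ai == 0:
        raise ValueError("cost with the AI component must be non-zero")
    return (baseline - with_ai) / with_ai
```

Keeping each cost as a separate item makes it easy to see which parameter dominates the result, and therefore which improvement action is worth doing first.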

Value proposal:
- We configure / deploy our solution in interaction with your production environment
- You interact with our product, providing samples, several annotations and expert knowledge
- Our product, with our team's support, generates an AI component
- We provide the component, to be deployed in your environment, or we provide a per-sample inference cost
- We can key pricing to the ROI generated (several cost / benefit models), for example:
  - the inference cost is linked to the performance and ROI computed with the client, giving a shared interest in actions that improve performance

TODO: ==> Add specific metrics that provide a template rationale to gather user information, for example:

The metric is a list of parameters to validate during scoping
The metric is displayed as a form, and an action allows the custom creation of a rationale
The rationale is reviewed and validated

This metric shall be used to gather
- Cost-of-operation metrics
- Process yes/no verifications (AI Act)
- Risk evaluation (High / Low)
- ...
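
The form-like metric described above could be modelled as a small data structure. This is a sketch under assumptions: the class, field and status names are hypothetical, and the rule that a metric needs a rationale before review and all parameters filled in before validation is one possible reading of "reviewed and validated".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    VALIDATED = "validated"


@dataclass
class ScopingMetric:
    name: str
    # Form fields; a value of None means "not answered yet".
    parameters: Dict[str, Optional[object]] = field(default_factory=dict)
    rationale: str = ""
    status: Status = Status.DRAFT

    def missing_parameters(self) -> List[str]:
        """Names of the form fields that still have no value."""
        return sorted(k for k, v in self.parameters.items() if v is None)

    def review(self) -> None:
        """Mark the rationale as reviewed; it must not be empty."""
        if not self.rationale:
            raise ValueError("a rationale is required before review")
        self.status = Status.REVIEWED

    def validate(self) -> None:
        """Validate a reviewed metric once every parameter is filled in."""
        if self.status is not Status.REVIEWED:
            raise ValueError("the rationale must be reviewed first")
        if self.missing_parameters():
            raise ValueError("some parameters are still missing")
        self.status = Status.VALIDATED
```

The same structure covers the three uses listed above: numeric cost parameters, yes/no process checks and a High / Low risk level are all just entries in `parameters`.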

QUESTIONS :

Shall we rename "metric" to "artefact"?

