Klarity Craft Concepts

Dataspace

A concept used to group data that share several properties in a hierarchy.

- Data can be in only one dataspace.
- A project can be granted access to a dataspace, with one of the following rights:
  - Read Only
  - Modify
- Each project has a private dataspace.
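The access rules above can be sketched as a small data structure. This is only an illustration: the names `Access`, `Dataspace`, `grants` and `private_dataspace` are hypothetical and are not part of any Klarity API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    """Access rights a project can be granted on a dataspace."""
    READ_ONLY = "read_only"
    MODIFY = "modify"


@dataclass
class Dataspace:
    name: str
    # Hypothetical mapping: project name -> access right on this dataspace.
    grants: dict[str, Access] = field(default_factory=dict)

    def can_read(self, project: str) -> bool:
        # Any grant (Read Only or Modify) allows reading.
        return project in self.grants

    def can_modify(self, project: str) -> bool:
        return self.grants.get(project) is Access.MODIFY


def private_dataspace(project: str) -> Dataspace:
    """Build the private dataspace of a project (assumed to carry Modify rights)."""
    return Dataspace(name=f"{project}/private", grants={project: Access.MODIFY})


shared = Dataspace("sensors", grants={"project_a": Access.READ_ONLY})
own = private_dataspace("project_a")
```

Here `project_a` can read `shared` but not modify it, while it has full rights on its own private dataspace.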

DataColumn

All data associated with a column share the same data type.
They share a set of properties and tags.
List of column tags (❓ name ?) in Klarity:
- index : a value used to associate data items from several columns; it can be a unique identifier, or something else (TODO)
- raw : data as acquired, not manipulated in Klarity
- input : data that can be used as an input of the models
- context : data that cannot be used by the models, but can be used by the component
- computed : data that are computed (we might be able to reproduce these columns)
- outputs : computed columns that are generated by the AI component or model (TODO : shall we differentiate?)
- metrics : computed columns not directly needed by the AI function, but provided for observability
- scoping / development / operation : data available in the associated scope
- human : data provided by human action on other raw data (example: annotation)
Format constraints:
- a parquet column logical type (see parquet)
- an image / video (F, C, W, H) : Frame, Channel, Width, Height, + precision (number of bits) + associated metadata
- a time series (timestamp, values) : each value shall be a scalar with a specific precision; the supported list is defined using logical types
- text (UTF-8)
Note : images / time series shall also sometimes be stored inside parquet, but we can use raw values.
For data not stored as logical data inside the parquet file, a relative path in the dataspace will be stored in the column.
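As an illustration of how tags could drive column selection, here is a minimal sketch. The `ColumnTag` and `DataColumn` names, the `logical_type` strings and the sample columns are assumptions made for this example only; the scope tags (scoping / development / operation) are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class ColumnTag(Enum):
    """Subset of the column tags listed above."""
    INDEX = "index"
    RAW = "raw"
    INPUT = "input"
    CONTEXT = "context"
    COMPUTED = "computed"
    OUTPUTS = "outputs"
    METRICS = "metrics"
    HUMAN = "human"


@dataclass(frozen=True)
class DataColumn:
    name: str
    # Hypothetical type label: a parquet logical type name, or "path" when the
    # column stores a relative path in the dataspace instead of the value.
    logical_type: str
    tags: frozenset[ColumnTag]

    def usable_as_model_input(self) -> bool:
        return ColumnTag.INPUT in self.tags


columns = [
    DataColumn("sample_id", "string", frozenset({ColumnTag.INDEX})),
    DataColumn("image", "path", frozenset({ColumnTag.RAW, ColumnTag.INPUT})),
    DataColumn("operator", "string", frozenset({ColumnTag.RAW, ColumnTag.CONTEXT})),
]

# Only columns tagged "input" are offered to the models.
model_inputs = [c.name for c in columns if c.usable_as_model_input()]
```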

DataTable

A set of columns grouped together.

- One column shall be used as an index, with unique values inside.
- A Data Table shall be split into several parquet files if needed, all sharing the same URI request.
- It shall have the following properties:
  - version : the current version of the Data Table
  - previous_version
  - List of source tables (if columns come from other Data Tables), or associated compute context (see Computed Data Item)
  - List of Craft Item dependencies, with dependency type:
    - usage
    - copy
    - TODO : other ?
  - List of Craft Items using this Craft Item (NOT IMMUTABLE; computed or stored in the Klarity DB to identify Data Tables or Models that depend on this Data Table, with dependency type)
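The properties above could be carried by a small metadata record. The sketch below is an assumption about one possible shape: `DataTableMeta`, `Dependency`, `new_version` and `used_by` are hypothetical names. The reverse "used by" list is computed on demand, reflecting the note that it is not part of the immutable content.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Dependency:
    item_id: str
    kind: str  # dependency type: "usage" or "copy"


@dataclass
class DataTableMeta:
    name: str
    version: str
    previous_version: Optional[str] = None
    source_tables: list[str] = field(default_factory=list)
    dependencies: list[Dependency] = field(default_factory=list)

    def new_version(self, version: str) -> "DataTableMeta":
        """Return metadata for the next version, chained to the current one."""
        return DataTableMeta(
            name=self.name,
            version=version,
            previous_version=self.version,
            source_tables=list(self.source_tables),
            dependencies=list(self.dependencies),
        )


def used_by(item_id: str, tables: list[DataTableMeta]) -> list[str]:
    """Reverse lookup: names of the tables that depend on item_id."""
    return [t.name for t in tables
            if any(d.item_id == item_id for d in t.dependencies)]


raw = DataTableMeta("RawInput", version="1.0")
train = DataTableMeta("TrainSet", version="1.0", source_tables=["RawInput"],
                      dependencies=[Dependency("RawInput", "usage")])
raw_v2 = raw.new_version("1.1")
```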

Concrete DataTable

Concrete DataTables are immutable and group raw DataItems acquired / generated at the same time. Concrete DataTables are stored in parquet files.

Virtual DataTable

A Virtual DataTable provides the join request that generates the DataTable from other DataTables.

It can be stored for caching purposes.

Usage example : a training dataset can be the join of the following tables:

- TrainIDSelection : a DataTable containing the DataSampleIndex values selected for a specific training
- RawInput : a DataTable containing acquired data for the use case
- GroundTruth : a DataTable containing the expected results for the training
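The join in the usage example can be sketched with plain dictionaries standing in for DataTables (index value mapped to a row). The `inner_join` helper and the sample values are hypothetical; a real implementation would presumably issue the join request against parquet files.

```python
def inner_join(*tables):
    """Join tables (dicts of index -> row dict) on the indexes they all share."""
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    joined = {}
    for idx in sorted(shared):
        row = {}
        for table in tables:
            row.update(table[idx])  # merge the columns of each table
        joined[idx] = row
    return joined


# TrainIDSelection only carries the selected indexes (no extra columns here).
train_id_selection = {"s1": {}, "s3": {}}
raw_input = {
    "s1": {"signal": [0.1, 0.2]},
    "s2": {"signal": [0.3, 0.4]},
    "s3": {"signal": [0.5, 0.6]},
}
ground_truth = {"s1": {"label": "ok"}, "s2": {"label": "ko"}, "s3": {"label": "ok"}}

training_set = inner_join(train_id_selection, raw_input, ground_truth)
```

Because the join is an inner join on the index, `s2` is dropped: it is present in RawInput and GroundTruth but was not selected for this training.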

DataTable Connector

Provides the same interface content as a Data Table, but from an external source.
Several elements are provided from configuration or manual operation.

DataSample

A group of concrete values associated with DataColumns that share the same index; it can be accessed as a row of a DataTable.

- Has a unique index_key in the DataTable.
- Can be loaded (by group, or alone) from the dataspace using the index, or by enumeration.
- Data rows are not stored by default; they are provided to the Klarity application with ks.
- For performance optimization, a set of DataSamples can be serialized in a dedicated Data Table.
- A DataSample can aggregate columns stored in different parquet files that share the same index values.
- A DataSample (or a list of DataSamples) is the result of a request with the following elements:
  - request id : (index if alone, batch_id if batch, and associated function if code needed) TODO : clarify
  - The list of columns needed
    - parquet file(s) source of the column
    - optional status
  - How the parquet files to use are selected
  - How the indexes are selected
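To make the request idea concrete, here is a minimal sketch that assembles DataSamples from columns held in different stores sharing the same index values. `SampleRequest`, `load_samples` and the in-memory `column_stores` (standing in for parquet files) are assumptions for illustration, not the actual request format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRequest:
    columns: tuple[str, ...]   # list of columns needed
    indexes: tuple[str, ...]   # selected index values (one, or a batch)


def load_samples(request, column_stores):
    """Build one DataSample (a row) per requested index.

    column_stores maps a column name to {index -> value}; each store stands in
    for a parquet file that shares the same index values.
    """
    samples = []
    for idx in request.indexes:
        sample = {"index_key": idx}
        for col in request.columns:
            sample[col] = column_stores[col][idx]
        samples.append(sample)
    return samples


column_stores = {
    "image": {"s1": "images/s1.png", "s2": "images/s2.png"},
    "label": {"s1": "ok", "s2": "ko"},
}
batch = load_samples(SampleRequest(("image", "label"), ("s2", "s1")), column_stores)
```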

DataItem

DataItems are column values created / acquired / computed at the same time and grouped in the same concrete DataTable. We have different kinds of DataItem.

A Raw Data Item (unitary data acquisition)
- An atomic group of values that are acquired at the same time
- associated with a unique identifier in Klarity => TODO : how we build it, compromise size / quality / usability
- associated context as dedicated columns
  - who integrated the Data Item, and when
  - who owns the Data Item
  - a set of tags, properties
Computed Data Item
- Data Items that are computed from other Data Items in an Action
- Computed Data Items can be stored in one or several Data Columns
- Computed Data Items shall integrate a column to reference the index (or list of indexes) used during computation for each row
- they share a set of properties and tags
- a column stores the index of the associated Action instance
- Properties associated with the Action are stored in a separate DataTable:
  - algo / version used to compute
  - associated algo parameters
  - model / version associated (if used)
  - compute time per Data Item (optional)
- list of column tags in Klarity specific to computed rows:
  - TODO : not yet identified
- List of tags (aggregated tags from columns; several tags can be put at table level)
- frozen : when a version is put on the Table (we can have rc versions before freezing the table)
- obsolete : when a configuration update makes this Table obsolete
- can be duplicated for cache optimization
- is immutable once frozen (any change shall generate a new version)
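The traceability columns of a Computed Data Item can be sketched as follows. The column names (`source_index`, `action_id`), the `actions` table and the `compute_mean` function are hypothetical; they only illustrate keeping, for each computed row, the indexes used and a reference to the Action instance whose properties live in a separate table.

```python
# Separate table of Action properties: action instance index -> properties.
actions = {
    "act-001": {"algo": "mean", "algo_version": "1.0", "params": {}},
}

# Raw Data Items, keyed by index.
raw = {"s1": [1.0, 3.0], "s2": [2.0, 6.0]}


def compute_mean(raw_table, action_id):
    """Compute one value per row and record how it was obtained."""
    computed = {}
    for idx, values in raw_table.items():
        computed[idx] = {
            "mean": sum(values) / len(values),
            "source_index": [idx],   # index (or list of indexes) used
            "action_id": action_id,  # reference into the actions table
        }
    return computed


computed = compute_mean(raw, "act-001")

# The algorithm used for any computed row can be recovered through action_id.
algo_for_s1 = actions[computed["s1"]["action_id"]]["algo"]
```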

Example of DataItem

Raw data from sensors

- N-dimensional tensor of values
  - one of the dimensions can be a sequence (see latency in context data)
- associated metadata:
  - sensor parameters
  - sensor_id / system_id / ...
  - operator_id
- latency (single value or sequence to register the delta T between data)
- context of acquisition
  - time (if not in a sequence and computed from order in the sequence)
  - position / dx|dy/dz

DataItem Sequence

Used when a sequence of data items has a meaning (time series use cases, video) and the relative position of a row in time shall be kept in mind.

Data sequences shall not be split during the import process, in order to preserve the semantic information of the sequence:
- Sequence splitting shall be performed when generating datasets and performing evaluation.
- Timing depends on the blueprint configuration, which defines time with timestamps or sequence order.
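One way to respect this constraint when generating datasets is to split by whole sequence rather than by row. The sketch below assumes each row carries a hypothetical `(sequence_id, order, value)` triple and that the test sequences are chosen explicitly; neither is prescribed by the text above.

```python
from collections import defaultdict


def split_by_sequence(rows, test_sequences):
    """Split rows into train/test without ever cutting a sequence.

    rows: iterable of (sequence_id, order, value) tuples, in any order.
    Returns two dicts mapping sequence_id -> values sorted by sequence order.
    """
    sequences = defaultdict(list)
    for seq_id, order, value in rows:
        sequences[seq_id].append((order, value))

    train, test = {}, {}
    for seq_id, items in sequences.items():
        ordered = [value for _, value in sorted(items)]
        target = test if seq_id in test_sequences else train
        target[seq_id] = ordered
    return train, test


rows = [("a", 1, 0.2), ("b", 0, 1.0), ("a", 0, 0.1), ("b", 1, 1.1), ("c", 0, 5.0)]
train, test = split_by_sequence(rows, test_sequences={"b"})
```

Each sequence ends up entirely in one side of the split, with its internal order restored.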

KCA Klarity Craft Action

Concept to define an operation inside Klarity Craft that, based on existing KCI, generates new KCI.

These Actions can be triggered from different sources using the underlying blueprint configuration:

- with CI, based on modifications in the dependency graph of KCI
- from a user command (with Python API / web API) to perform an action in Klarity Craft

Examples of actions:
- Apply project configuration change (specific, cf. Project Configuration)
- Import new Data Sample
- Add data annotation (with dedicated GUI or connection to external tools)
- Compute metrics on Data
- Create Training data selection (e.g. using Debiai)
- Create Test selection (with Python API)

Each of these actions is available from the project configuration:

- Existing Action imported from the blueprint
- Overridden Action from the blueprint
- New Action created for the project

An Action has at least the following parameters:

- a username : used for logs and execution monitoring status
- associated capability => for access control post V1
- inputs of the action
  - from user
  - from Klarity Storage (fixed or variable)
- outputs of the action
  - to user
  - inside Klarity Storage (fixed or variable)
- an action has a constraint on simultaneous execution (1, n, on different targets, ...)
- an action can be run automatically on data change in Klarity

example of configuration :

name : 'create_test_selection'
- inputs :
  - test_selection_name
  - Data Table source
  - selections of ids
  - optional transformation to perform (ex : blurring, ...)
- output :
  - Dataset
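The configuration above could map to a small declarative record plus a runner that checks the declared inputs before executing. Everything in this sketch is hypothetical (`ActionConfig`, `run_action`, `select_rows`, the input names and the returned report); it is not the Klarity Python API, and the optional transformation input is omitted.

```python
from dataclasses import dataclass


@dataclass
class ActionConfig:
    name: str
    inputs: list[str]             # required input names
    outputs: list[str]            # declared output names
    max_parallel: int = 1         # constraint on simultaneous execution
    run_on_data_change: bool = False


def run_action(config, func, username, **inputs):
    """Validate inputs against the config, run func, and return a small report."""
    missing = [name for name in config.inputs if name not in inputs]
    if missing:
        raise ValueError(f"{config.name}: missing inputs {missing}")
    result = func(**inputs)
    return {"action": config.name, "username": username, "outputs": result}


create_test_selection = ActionConfig(
    name="create_test_selection",
    inputs=["test_selection_name", "source_table", "ids"],
    outputs=["dataset"],
)


def select_rows(test_selection_name, source_table, ids):
    # Keep only the selected ids from the source table.
    return {"dataset": {i: source_table[i] for i in ids}}


report = run_action(
    create_test_selection, select_rows, username="alice",
    test_selection_name="smoke",
    source_table={"s1": 1, "s2": 2, "s3": 3},
    ids=["s1", "s3"],
)
```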

KCI : Klarity Craft Item

Represents an object that can be referenced as a dependency from other Klarity Craft Items.
- The following objects are Klarity Craft Items:
  - Data Table
  - Model
  - Component
  - Dataset (a kind / extract / group of Data Table)
  - Action
  - Artefact
- It can be a meta object that can be a Dataset / Model / Component / Data Table
- It can have the following tags:
  - immutable
- Rules that apply to Items:
  - An immutable item cannot depend on an item without the immutable tag
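The immutability rule lends itself to a simple check over the dependency graph. In this sketch the `items` and `depends_on` structures and the `check_immutable_rule` function are assumptions chosen for the example.

```python
def check_immutable_rule(items, depends_on):
    """Return (item, dependency) pairs that break the immutability rule.

    items: item id -> set of tags.
    depends_on: item id -> list of item ids it depends on.
    """
    violations = []
    for item_id, deps in depends_on.items():
        if "immutable" not in items[item_id]:
            continue  # the rule only constrains immutable items
        for dep in deps:
            if "immutable" not in items[dep]:
                violations.append((item_id, dep))
    return violations


items = {
    "RawInput": {"immutable"},
    "Annotations": set(),            # still being edited
    "TrainSet": {"immutable"},
    "Model": set(),
}
depends_on = {
    "TrainSet": ["RawInput", "Annotations"],
    "Model": ["TrainSet"],
}

violations = check_immutable_rule(items, depends_on)
```

Here `TrainSet` is flagged because it is immutable but depends on the mutable `Annotations` item, while the mutable `Model` is free to depend on anything.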
