Klarity Craft Concepts
Dataspace
Concept used to group data that share several properties in a hierarchy.
- Data can be in only one dataspace
- A project can be granted access to a dataspace, with one of two levels:
  - Read Only
  - Modify
- Each project has a private dataspace
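The access rules above could be modelled as follows. This is a minimal sketch, not the Klarity implementation: `AccessLevel`, `Dataspace`, `grant`, `can_read` and `can_modify` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessLevel(Enum):
    READ_ONLY = "read_only"
    MODIFY = "modify"


@dataclass
class Dataspace:
    name: str
    # project name -> access level granted to that project (assumed structure)
    grants: dict[str, AccessLevel] = field(default_factory=dict)

    def grant(self, project: str, level: AccessLevel) -> None:
        self.grants[project] = level

    def can_read(self, project: str) -> bool:
        # Any granted level (Read Only or Modify) allows reading.
        return project in self.grants

    def can_modify(self, project: str) -> bool:
        return self.grants.get(project) is AccessLevel.MODIFY


shared = Dataspace("sensors")
shared.grant("project_a", AccessLevel.MODIFY)
shared.grant("project_b", AccessLevel.READ_ONLY)
```

A project without any grant (for example `project_c` here) can neither read nor modify the dataspace.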
DataColumn
- All data associated with a column share the same data type
- They share a set of properties and tags
- List of column tags (❓ name ?) in Klarity:
  - index: a value used to associate data items from several columns; it can be a unique identifier or something else (TODO)
  - raw: data as acquired, not manipulated in Klarity
  - input: data that can be used as an input of the models
  - context: data that cannot be used by the models, but can be used by the component
  - computed: data that are computed (we might be able to reproduce these columns)
  - outputs: computed columns that are generated by the AI component or model (TODO: shall we differentiate?)
  - metrics: computed columns not directly needed by the AI function, but provided for observability
  - scoping / development / operation: data available in the associated scope
  - human: data provided by human action on other raw data (example: annotation)
- Format constraints:
  - a Parquet column logical type (see Parquet)
  - an image / video (F, C, W, H): Frame, Channel, Width, Height, + precision (number of bits) + associated metadata
  - a time series (timestamp, values): each value shall be a scalar with a specific precision; the supported list is defined using logical types
  - text (UTF-8)
  - Nota: images / time series shall also sometimes be stored inside Parquet, but we can use raw values
- For data not stored as logical data inside the Parquet file, a relative path in the dataspace is stored in the column
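To make the column description above concrete, here is a minimal sketch of a column descriptor. The names `ColumnTag`, `DataColumn`, `stored_as_path` and `model_inputs` are hypothetical, the logical type is kept as free text, and the scope tags (scoping / development / operation) are left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum


class ColumnTag(Enum):
    INDEX = "index"
    RAW = "raw"
    INPUT = "input"
    CONTEXT = "context"
    COMPUTED = "computed"
    OUTPUTS = "outputs"
    METRICS = "metrics"
    HUMAN = "human"


@dataclass
class DataColumn:
    name: str
    logical_type: str  # e.g. a Parquet logical type name, kept as free text here
    tags: set[ColumnTag] = field(default_factory=set)
    # True when the column holds a relative path in the dataspace
    # instead of the value itself (assumed flag name).
    stored_as_path: bool = False


def model_inputs(columns):
    """Names of the columns tagged as usable model inputs."""
    return [c.name for c in columns if ColumnTag.INPUT in c.tags]


columns = [
    DataColumn("sample_id", "STRING", {ColumnTag.INDEX}),
    DataColumn("image", "STRING", {ColumnTag.RAW, ColumnTag.INPUT}, stored_as_path=True),
    DataColumn("operator_id", "STRING", {ColumnTag.CONTEXT}),
]
```

With these three columns, only `image` is offered to a model; `operator_id` stays available to the component as context.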
DataTable
A set of columns grouped together:
- one column shall be used as an index, with unique values
- a DataTable shall be split into several Parquet files if needed, all sharing the same URI request
- it shall have the following properties:
  - version: the current version of the DataTable
  - previous_version
  - list of source tables (if columns come from other DataTables), or the associated compute context (see Computed Data Item)
  - list of Craft Item dependencies, each with a dependency type:
    - usage
    - copy
    - TODO: other?
  - list of Craft Items using this Craft Item (NOT IMMUTABLE: computed or stored in the Klarity DB to identify the DataTables or Models that depend on this DataTable, with the dependency type)
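The properties listed above could be carried by a small metadata record. The sketch below is only an assumption about how such a record might look: `DependencyType`, `DataTableMeta` and `history` are hypothetical names, and versions are represented as plain strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DependencyType(Enum):
    USAGE = "usage"
    COPY = "copy"


@dataclass(frozen=True)
class DataTableMeta:
    name: str
    version: str
    previous_version: Optional[str] = None
    source_tables: tuple[str, ...] = ()
    # (Craft Item name, dependency type) pairs
    dependencies: tuple[tuple[str, DependencyType], ...] = ()


def history(meta_by_version, version):
    """Follow previous_version links from the given version back to the first one."""
    chain = []
    while version is not None:
        chain.append(version)
        version = meta_by_version[version].previous_version
    return chain


metas = {
    "1.0": DataTableMeta("RawInput", "1.0"),
    "2.0": DataTableMeta(
        "RawInput", "2.0", previous_version="1.0",
        dependencies=(("SensorImport", DependencyType.USAGE),),
    ),
}
```

The list of Craft Items that use the table is deliberately not part of this frozen record, since the notes mark it as not immutable.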
Concrete DataTable
Concrete DataTables are immutable and group, in rows, the DataItems acquired / generated at the same time.
Concrete DataTables are stored in Parquet files.
Virtual DataTable
A Virtual DataTable provides the join request that generates the DataTable from other DataTables.
- can be stored for caching purposes
Usage example: a training dataset can be the join of the following tables:
- TrainIDSelection: a DataTable containing the DataSampleIndex values selected for a specific training
- RawInput: a DataTable containing the acquired data for the use case
- GroundTruth: a DataTable containing the expected results for the training
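The training dataset example above can be sketched as a join on the shared index. This is an illustration under stated assumptions: tables are plain in-memory dictionaries standing in for Parquet-backed DataTables, and `join_on_index` is a hypothetical helper, not a Klarity API.

```python
def join_on_index(selection, *tables):
    """Inner join: keep each selected index that is present in every table."""
    joined = {}
    for idx in selection:
        if all(idx in table for table in tables):
            row = {}
            for table in tables:
                row.update(table[idx])
            joined[idx] = row
    return joined


# TrainIDSelection: the indexes selected for one training
train_id_selection = ["s1", "s3"]
# RawInput: index -> acquired data
raw_input = {
    "s1": {"signal": [0.1, 0.2]},
    "s2": {"signal": [0.3, 0.4]},
    "s3": {"signal": [0.5, 0.6]},
}
# GroundTruth: index -> expected result
ground_truth = {
    "s1": {"label": "ok"},
    "s2": {"label": "ko"},
    "s3": {"label": "ko"},
}

training_dataset = join_on_index(train_id_selection, raw_input, ground_truth)
```

Only the join definition needs to be kept to rebuild `training_dataset`; storing the joined result would be the caching case mentioned above.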
DataTable Connector
- Provides the same interface as a DataTable, but from an external source.
- Several elements are provided by configuration or by manual operation.
DataSample
Group of concrete values associated with DataColumns that share the same index; it can be accessed as a row of a DataTable.
It has a unique index_key in the DataTable
It can be loaded (by group, or alone) from the dataspace using the index, or by enumeration
Data rows are not stored by default; they are provided to the Klarity application with ks
For performance optimization, a set of DataSamples can be serialized in a dedicated DataTable
A DataSample can aggregate columns stored in different Parquet files that share the same index values
A DataSample (or a list of DataSamples) is the result of a request with the following elements:
- request id (the index if alone, the batch_id if batch, and the associated function if code is needed) TODO: clarify
- the list of columns needed
- the Parquet file(s) that are the source of the columns
- optional status
- how the Parquet files to use are selected
- how the indexes are selected
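As an illustration of such a request, the sketch below resolves a list of indexes and columns against several sources and aggregates the values per index. All names (`SampleRequest`, `load_samples`, the file paths) are hypothetical, and the Parquet files are replaced by in-memory dictionaries so that the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleRequest:
    request_id: str
    columns: list[str]   # the list of columns needed
    sources: list[str]   # Parquet file(s) that are the source of the columns
    indexes: list[str]   # the selected indexes
    status: Optional[str] = None


def load_samples(request, files):
    """Build one DataSample per index by aggregating columns across sources.

    `files` maps a path to {index: {column: value}} and stands in for Parquet files.
    """
    samples = {}
    for idx in request.indexes:
        sample = {}
        for path in request.sources:
            row = files[path].get(idx, {})
            for col in request.columns:
                if col in row:
                    sample[col] = row[col]
        samples[idx] = sample
    return samples


files = {
    "raw/part-0": {"s1": {"signal": 0.1, "sensor_id": "A"}, "s2": {"signal": 0.3, "sensor_id": "B"}},
    "labels/part-0": {"s1": {"label": "ok"}, "s2": {"label": "ko"}},
}
request = SampleRequest("batch-1", ["signal", "label"], ["raw/part-0", "labels/part-0"], ["s1", "s2"])
samples = load_samples(request, files)
```

Columns that are not requested (`sensor_id` here) are never loaded into the resulting samples.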
DataItem
DataItems are column values created / acquired / computed at the same time and grouped in the same concrete DataTable.
There are different kinds of DataItem.
Raw Data Item (unitary data acquisition)
- An atomic group of values that are acquired at the same time
- associated with a unique identifier in Klarity => TODO: how we build it; compromise between size / quality / usability
- associated context as dedicated columns
- who integrated the Data Item, and when
- who owns the Data Item
- a set of tags and properties
Computed Data Item
Data Items that are computed from other Data Items in an Action
Computed Data Items can be stored in one or several DataColumns
Computed Data Items shall include a column referencing the index (or list of indexes) used during computation for each row
They share a set of properties and tags
A column stores the index of the associated Action instance
Properties associated with the Action are stored in a separate DataTable:
- algorithm / version used to compute
- associated algorithm parameters
- associated model / version (if used)
- compute time per Data Item (optional)
List of column tags in Klarity specific to computed rows:
- TODO: not yet identified
List of tags (aggregates the tags from the columns; several tags can be put at table level):
- frozen: when a version is put on the table (we can have rc versions before freezing the table)
- obsolete: when a configuration update makes this table obsolete
The table can be duplicated for cache optimization
The table is immutable once frozen (any change shall generate a new version)
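The traceability described above (source indexes on each computed row, Action properties in a separate table) could look like the following. This is a sketch under assumptions: `compute_mean`, the column names `source_indexes` and `action_id`, and the Action identifier `act-1` are all hypothetical.

```python
# Separate table holding the properties of each Action instance.
actions = {
    "act-1": {"algo": "mean", "version": "1.0", "parameters": {}},
}

# Raw rows, keyed by index.
raw_rows = {
    "r1": {"value": 2.0},
    "r2": {"value": 4.0},
}


def compute_mean(rows, indexes, action_id):
    """Return one computed row that records which indexes and which Action produced it."""
    values = [rows[i]["value"] for i in indexes]
    return {
        "mean_value": sum(values) / len(values),
        "source_indexes": list(indexes),  # index (or list of indexes) used for this row
        "action_id": action_id,           # index of the associated Action instance
    }


item = compute_mean(raw_rows, ["r1", "r2"], "act-1")
```

Because each row carries `source_indexes` and `action_id`, the computation can in principle be replayed from the raw rows and the Action properties.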
Example of DataItem
Raw data from sensors
- N-dimensional tensor of values
- one of the dimensions can be a sequence (see latency in context data)
- metadata associated to the
- sensor parameters
- sensor_id / system_id / ...
- operator_id
- latency (single value or sequence, to register the delta T between data)
- context of acquisition
- time (if not in a sequence and computed from the order in the sequence)
- position / dx / dy / dz
DataItem Sequence
Used when a sequence of data items has a meaning (time series use cases, video) and the relative position of a row in time shall be kept in mind.
Data sequences shall not be split during the import process, in order to preserve the semantic information of the sequence.
==> Sequence splitting shall be performed when generating datasets and performing evaluation.
==> Timing, depending on the blueprint configuration, is defined with timestamps or with sequence order.
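One possible reading of this rule is sketched below: the imported sequence is kept whole, and ordered windows are cut from it only at dataset generation time. The helper `sliding_windows` and the window parameters are assumptions made for illustration, not part of the blueprint configuration.

```python
def sliding_windows(sequence, size, step):
    """Cut ordered windows out of a sequence without modifying the sequence itself."""
    return [
        sequence[start:start + size]
        for start in range(0, len(sequence) - size + 1, step)
    ]


# Imported sequence, stored unsplit: (sequence order, value) pairs.
sequence = [(0, 1.0), (1, 1.5), (2, 2.0), (3, 2.5), (4, 3.0)]

# Splitting happens only when the dataset is generated.
windows = sliding_windows(sequence, size=3, step=2)
```

Each window keeps the original ordering, so timing can still be derived from either the timestamps or the position in the sequence.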
KCA : Klarity Craft Action
Concept used to define an operation inside Klarity Craft that, based on existing KCIs, generates new KCIs.
These Actions can be triggered from different sources, using the underlying blueprint configuration:
- by CI, based on modifications in the dependency graph of KCIs
- from a user command (with the Python API / web API) to run an action in Klarity Craft
Examples of actions:
Apply a project configuration change (specific, cf. Project Configuration)
Import new Data Samples
Add data annotations (with a dedicated GUI or a connection to external tools)
Compute metrics on data
Create a training data selection (e.g. using Debiai)
Create a test selection (with the Python API)
Each of these actions is available from the project configuration:
- existing Action imported from the blueprint
- Action overridden from the blueprint
- new Action created for the project
An Action has at least the following parameters:
- a username: used for logs and execution status monitoring
- associated capability => for access control, post V1
- inputs of the action
  - from the user
  - from Klarity Storage (fixed or variable)
- outputs of the action
  - to the user
  - inside Klarity Storage (fixed or variable)
- a constraint on simultaneous execution (1, n, on different targets, ...)
- whether the action can be run automatically on data change in Klarity
Example of configuration:
- name: 'create_test_selection'
- inputs:
  - test_selection_name
  - DataTable source
  - selection of ids
  - optional transformation to perform (e.g. blurring, ...)
- output:
  - Dataset
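The `create_test_selection` configuration could be expressed as a plain data structure and checked before the action is launched. The key names (`data_table_source`, `selected_ids`, `transformation`, `required`) and the helper `missing_inputs` are hypothetical; the actual configuration format is not defined in these notes.

```python
CREATE_TEST_SELECTION = {
    "name": "create_test_selection",
    "inputs": {
        "test_selection_name": {"required": True},
        "data_table_source": {"required": True},
        "selected_ids": {"required": True},
        "transformation": {"required": False},  # e.g. blurring
    },
    "outputs": ["dataset"],
}


def missing_inputs(config, provided):
    """List the required inputs that the caller did not provide."""
    return [
        name
        for name, spec in config["inputs"].items()
        if spec["required"] and name not in provided
    ]


provided = {"test_selection_name": "night_scenes", "data_table_source": "RawInput"}
missing = missing_inputs(CREATE_TEST_SELECTION, provided)
```

Here the call would be rejected because `selected_ids` is missing, while the optional transformation can be left out.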
KCI : Klarity Craft Item
Represents an object that can be referenced as a dependency by other Klarity Craft Items
- The following objects are Klarity Craft Items:
  - Data Table
  - Model
  - Component
  - Dataset (a kind / extract / group of Data Table)
  - Action
  - Artefact
It can be a meta object that can be a Dataset / Model / Component / Data Table
- can have the following tags:
  - immutable
- Rules that apply to Items:
  - An immutable item cannot depend on an item without the immutable tag
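The immutability rule can be checked mechanically on the dependency graph. The sketch below assumes a hypothetical `CraftItem` record with an `immutable` flag and a list of direct dependencies; `immutability_violations` is an illustrative helper, not an existing Klarity function.

```python
from dataclasses import dataclass, field


@dataclass
class CraftItem:
    name: str
    immutable: bool = False
    depends_on: list["CraftItem"] = field(default_factory=list)


def immutability_violations(item):
    """Names of direct dependencies that break the rule for this item."""
    if not item.immutable:
        return []  # the rule only constrains immutable items
    return [dep.name for dep in item.depends_on if not dep.immutable]


raw_table = CraftItem("RawInput", immutable=True)
draft_selection = CraftItem("TrainIDSelection", immutable=False)
model = CraftItem("Model-v1", immutable=True, depends_on=[raw_table, draft_selection])

violations = immutability_violations(model)
```

In this example `Model-v1` cannot keep its immutable tag until `TrainIDSelection` is made immutable as well.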