Klarity Craft Concepts

Dataspace

Concept used to group data that share several properties in a hierarchy.

  • Data can be in only one dataspace
  • A project can be granted access to a dataspace:
    • Read Only
    • Modify
  • Each project has a private dataspace
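
The access rules above can be illustrated with a minimal sketch. The class names, the `grants` mapping, and the `<project>/private` naming scheme are assumptions made for illustration; they are not part of Klarity.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ_ONLY = "read_only"
    MODIFY = "modify"


@dataclass
class Dataspace:
    name: str
    # Hypothetical mapping: project name -> access level granted on this dataspace
    grants: dict[str, Access] = field(default_factory=dict)

    def can_read(self, project: str) -> bool:
        # Any grant (Read Only or Modify) allows reading
        return project in self.grants

    def can_modify(self, project: str) -> bool:
        return self.grants.get(project) is Access.MODIFY


def private_dataspace(project: str) -> Dataspace:
    """Each project has a private dataspace; here the project may modify it."""
    return Dataspace(name=f"{project}/private", grants={project: Access.MODIFY})


shared = Dataspace("sensors", grants={"project_a": Access.READ_ONLY})
private = private_dataspace("project_a")
```

Here `project_a` can read `shared` but cannot modify it, while it can modify its own private dataspace.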

DataColumn

  • All data associated with a column share the same data type
  • They share a set of properties and tags
  • List of column tags (❓ name ?) in Klarity:
    • index : a value used to associate data items from several columns; it can be a unique identifier, or something else (TODO)
    • raw : data as acquired, not manipulated in Klarity
    • input : data that can be used as an input of the models
    • context : data that cannot be used by the models, but can be used by the component
    • computed : data that are computed (we might be able to reproduce these columns)
    • outputs : computed columns that are generated by the AI component or model (TODO : shall we differentiate?)
    • metrics : computed columns not directly needed by the AI function, but provided for observability
    • scoping / development / operation : data available in the associated scope
    • human : data provided by human action on other raw data (example: annotation)
  • Format constraints:
    • a Parquet column logical type (see Parquet)
    • an image / video (F, C, W, H): Frame, Channel, Width, Height, plus precision (number of bits) and associated metadata
    • a time series (timestamp, values): each value shall be a scalar with a specific precision; the supported list is defined using logical types
    • text (UTF-8)
    • Note: images / time series shall also sometimes be stored inside Parquet, but we can use raw values
  • For data not stored as logical data inside the Parquet file, a relative path in the dataspace will be stored in the column
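
As a rough illustration of the column description above, the sketch below models a column with its tags and a flag for path-based storage. All names (`ColumnTag`, `DataColumn`, `stored_as_path`, the logical type strings) are hypothetical and chosen for the example only.

```python
from dataclasses import dataclass
from enum import Enum


class ColumnTag(Enum):
    INDEX = "index"
    RAW = "raw"
    INPUT = "input"
    CONTEXT = "context"
    COMPUTED = "computed"
    OUTPUTS = "outputs"
    METRICS = "metrics"
    HUMAN = "human"


@dataclass(frozen=True)
class DataColumn:
    name: str
    logical_type: str                       # illustrative label, e.g. "STRING"
    tags: frozenset[ColumnTag] = frozenset()
    stored_as_path: bool = False            # True when the column holds a relative path in the dataspace

    def usable_by_model(self) -> bool:
        # Only columns tagged "input" may feed a model; "context" columns may not
        return ColumnTag.INPUT in self.tags


image = DataColumn("frame", "STRING", frozenset({ColumnTag.RAW, ColumnTag.INPUT}), stored_as_path=True)
operator = DataColumn("operator_id", "STRING", frozenset({ColumnTag.CONTEXT}))
```

The `image` column stores a relative path rather than the pixels themselves, and only `image` is eligible as a model input.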

DataTable

A set of columns that are grouped together.

  • one column shall be used as an index with unique values
  • a data table shall be split into several Parquet files if needed, sharing the same URI request
  • it shall have the following properties:
    • version : the current version of the Data Table
    • previous_version
    • List of source tables (if columns come from other Data Tables), or associated compute context (see Computed Data Item)
    • List of Craft Item dependencies with dependency type
      • usage
      • copy
      • TODO : other ?
  • List of Craft Items using this Craft Item (NOT IMMUTABLE; computed or stored in the Klarity DB to identify the Data Tables or Models that depend on this Data Table, with dependency type)
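
The properties listed above could be carried by a small metadata record such as the one sketched here. The field names and the version strings are assumptions for illustration, not the Klarity schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Dependency:
    item_id: str   # identifier of the Craft Item depended on (hypothetical naming)
    kind: str      # dependency type: "usage" or "copy"


@dataclass
class DataTableInfo:
    name: str
    index_column: str
    version: str
    previous_version: Optional[str] = None
    source_tables: list[str] = field(default_factory=list)
    dependencies: list[Dependency] = field(default_factory=list)


def check_unique_index(index_values: list) -> bool:
    """The index column shall hold unique values."""
    return len(index_values) == len(set(index_values))


table = DataTableInfo(
    name="raw_input",
    index_column="sample_id",
    version="1.1.0",
    previous_version="1.0.0",
    dependencies=[Dependency("ground_truth", "usage")],
)
```

The reverse list (Craft Items using this table) is deliberately left out of the record, since it is not immutable and would be computed or stored separately.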

Concrete DataTable

Concrete DataTables are immutable and group together the DataItems acquired / generated at the same time. Concrete DataTables are stored in Parquet files.

Virtual DataTable

A Virtual DataTable provides the join request that generates the DataTable from other DataTables.

  • can be stored for caching purposes

Usage example: a training dataset can be the join of the following tables:

  • TrainIDSelection : a DataTable containing DataSampleIndex selected for a specific training
  • RawInput : a DataTable containing acquired data for the use case
  • GroundTruth : a DataTable containing expected result from the training.
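
A minimal sketch of such a join, using plain dictionaries keyed by index in place of Parquet-backed tables. The function name `join_on_index` and the sample values are illustrative assumptions.

```python
def join_on_index(selection, *tables):
    """Inner-join tables (dicts keyed by index) restricted to the selected indexes."""
    rows = []
    for idx in selection:
        # Keep only indexes present in every joined table
        if all(idx in table for table in tables):
            row = {"index": idx}
            for table in tables:
                row.update(table[idx])
            rows.append(row)
    return rows


train_id_selection = ["s1", "s3"]
raw_input = {"s1": {"signal": 0.2}, "s2": {"signal": 0.5}, "s3": {"signal": 0.9}}
ground_truth = {"s1": {"label": "ok"}, "s3": {"label": "defect"}}

training_dataset = join_on_index(train_id_selection, raw_input, ground_truth)
```

The result contains one row per selected index, combining the acquired values and the expected result; sample `s2` is excluded because it is not in the selection.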

DataTable Connector

  • Provides the same interface as a Data Table, but from an external source.
  • Several elements are provided by configuration or manual operation.

DataSample

A group of concrete values associated with DataColumns that share the same index; it can be accessed as a row of a DataTable.

  • Has a unique index_key in the DataTable

  • Can be loaded (by group, or alone) from the dataspace using the index, or by enumeration

  • Data rows are not stored by default; they are provided to the Klarity application with ks

  • For performance optimization, a set of DataSamples can be serialized in a dedicated Data Table

  • A DataSample can aggregate columns stored in different Parquet files that share the same index values

  • A DataSample (or a list of DataSamples) is the result of a request with the following elements:

    • request id : (index if alone, batch_id if batch, and associated function if code is needed) TODO : clarify

    • The list of columns needed
      • Parquet file(s) source of the column
      • optional status
    • How the Parquet files to use are selected
    • How the indexes are selected
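
The request elements above might be represented as follows. This is a sketch under assumptions: the class names, field names, file paths, and the `files_to_read` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnSource:
    column: str
    parquet_files: tuple[str, ...]   # file(s) that hold the column
    status: Optional[str] = None     # optional status


@dataclass(frozen=True)
class SampleRequest:
    request_id: str                  # index if alone, batch_id if batch
    columns: tuple[ColumnSource, ...]
    indexes: tuple[str, ...]


def files_to_read(request: SampleRequest) -> list[str]:
    """Distinct Parquet files needed to answer the request, in first-seen order."""
    seen: dict[str, None] = {}
    for col in request.columns:
        for path in col.parquet_files:
            seen.setdefault(path, None)
    return list(seen)


request = SampleRequest(
    request_id="batch_0001",
    columns=(
        ColumnSource("signal", ("raw/part-0.parquet",)),
        ColumnSource("label", ("annotations/part-0.parquet",), status="validated"),
        ColumnSource("sensor_id", ("raw/part-0.parquet",)),
    ),
    indexes=("s1", "s3"),
)
```

Two of the three requested columns live in the same file, so only two files need to be opened to build the DataSamples.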

DataItem

DataItems are column values created / acquired / computed at the same time and grouped in the same concrete DataTable. We have different kinds of DataItem:

  • A Raw Data Item (unitary data acquisition)

    • An atomic group of values that are acquired at the same time
    • associated with a unique identifier in Klarity => TODO : how we build it, compromise between size / quality / usability
    • associated context as dedicated columns
      • who integrated the data item, and when
      • who owns the data item
      • a set of tags, properties
  • Computed Data Item

    • Data items that are computed from other Data Items in an Action

    • A Computed Data Item can be stored in one or several Data Columns

    • A Computed Data Item shall include a column that references the index (or list of indexes) used during computation for each row

    • they share a set of properties and tags

    • a column stores the index of the associated Action instance

    • Properties associated with the Action are stored in a separate DataTable:

      • algorithm / version used to compute
      • associated algorithm parameters
      • associated model / version (if used)
      • compute time per Data Item (optional)
    • list of column tags in Klarity specific to computed rows:

      • TODO : not yet identified
    • List of tags (aggregated from columns; several tags can be put at table level)

    • frozen : when a version is put on the Table (we can have rc versions before freezing the table)

    • obsolete : when a configuration update makes this Table obsolete

    • can be duplicated for cache optimization

    • is immutable once frozen (any change shall generate a new version)
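
The frozen / new-version rule can be sketched as below. The class `VersionedTable`, the exception name, and the version strings are assumptions for illustration only.

```python
class FrozenTableError(Exception):
    """Raised on an attempt to modify a frozen table."""


class VersionedTable:
    def __init__(self, name, version, rows=None, previous_version=None):
        self.name = name
        self.version = version
        self.previous_version = previous_version
        self.rows = list(rows or [])   # copy, so versions never share a row list
        self.frozen = False
        self.obsolete = False

    def append(self, row):
        if self.frozen:
            raise FrozenTableError(f"{self.name} {self.version} is frozen")
        self.rows.append(row)

    def freeze(self):
        self.frozen = True

    def new_version(self, version):
        """Any change to a frozen table goes through a new, unfrozen version."""
        return VersionedTable(self.name, version, self.rows, previous_version=self.version)


v1 = VersionedTable("metrics", "1.0.0-rc1")
v1.append({"index": "s1", "score": 0.8})
v1.freeze()
v2 = v1.new_version("1.1.0-rc1")
v2.append({"index": "s2", "score": 0.6})
```

After freezing, `v1` keeps its single row, and the added row only appears in `v2`, which records `v1` as its previous version.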

Example of DataItem

Raw data from sensors

  • N-dimensional tensor of values
  • One of the dimensions can be a sequence (see latency in context data)
  • metadata associated with the
    • sensor parameters
    • sensor_id / system_id / ...
    • operator_id
  • latency (single value or sequence registering the delta T between data)
    • context of acquisition
      • time (if not in a sequence and computed from order in the sequence)
      • position / dx|dy/dz

DataItem Sequence

Used when a sequence of data items has a meaning (time series use cases, video) and the relative position of a row in time shall be kept in mind.

Data sequences shall not be split during the import process, in order to preserve the semantic information of the sequence.

==> Sequence splitting shall be performed when generating datasets and performing evaluation.

==> Timing depends on the blueprint configuration: time is defined with a timestamp or by sequence order.
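
One possible way to split at dataset-generation time while keeping every sequence whole is sketched here. The column names `sequence_id` and `order` and the function name are assumptions; time is defined by sequence order in this example.

```python
def split_by_sequence(rows, test_sequences):
    """Split rows into train/test without cutting a sequence in two.

    Rows are returned ordered by sequence, then by position inside the sequence.
    """
    train, test = [], []
    for row in sorted(rows, key=lambda r: (r["sequence_id"], r["order"])):
        target = test if row["sequence_id"] in test_sequences else train
        target.append(row)
    return train, test


rows = [
    {"sequence_id": "video_b", "order": 1, "value": 7},
    {"sequence_id": "video_a", "order": 1, "value": 3},
    {"sequence_id": "video_a", "order": 0, "value": 2},
    {"sequence_id": "video_b", "order": 0, "value": 5},
]
train, test = split_by_sequence(rows, test_sequences={"video_b"})
```

All rows of `video_a` land in `train` and all rows of `video_b` in `test`, each in sequence order.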

KCA : Klarity Craft Action

Concept defining an operation within Klarity Craft that, based on existing KCIs, generates new KCIs.

These Actions can be triggered from different sources using the underlying blueprint configuration:

  • by CI, based on modifications in the dependency graph of KCIs

  • from a user command (with the Python API / web API) to run an action in Klarity Craft.

Examples of actions:

  • Apply a project configuration change (specific, cf. Project Configuration)

  • Import new Data Samples

  • Add data annotations (with a dedicated GUI or a connection to external tools)

  • Compute metrics on Data

  • Create Training data selection (e.g. using Debiai)

  • Create Test selection (with python API)

Each of these actions is available from the project configuration:

  • Existing Action imported from the blueprint
  • Action overridden from the blueprint
  • New action created for the project

An Action has at least the following parameters:

  • a username : used for logs and execution monitoring status
  • associated capability => for access control post V1
  • inputs of the action
    • from user
    • from Klarity Storage (fixed or variable)
  • outputs of the action
    • to user
    • inside Klarity Storage (fixed or variable)
  • an action has a constraint on simultaneous execution (1, n, on different targets, ...)
  • an action can be run automatically on data change in Klarity

Example of configuration:

  • name : 'create_test_selection'
    • inputs :
      • test_selection_name
      • Data Table source
      • selections of ids
      • optional transformation to perform (ex : blurring, ...)
    • output :
      • Dataset
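
The configuration above could be expressed in Python roughly as follows. The `ActionConfig` class, the snake_case input names, and the `missing_inputs` helper are hypothetical; only the action name and the list of inputs and outputs come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class ActionConfig:
    name: str
    inputs: list[str]
    outputs: list[str]
    optional_inputs: list[str] = field(default_factory=list)

    def missing_inputs(self, provided: dict) -> list[str]:
        """Required inputs absent from the values provided by the caller."""
        return [name for name in self.inputs if name not in provided]


create_test_selection = ActionConfig(
    name="create_test_selection",
    inputs=["test_selection_name", "data_table_source", "selection_ids"],
    optional_inputs=["transformation"],   # e.g. blurring
    outputs=["dataset"],
)

missing = create_test_selection.missing_inputs(
    {"test_selection_name": "smoke_test", "data_table_source": "raw_input"}
)
```

Checking inputs before running lets a caller report that `selection_ids` is missing, while the optional transformation is never required.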

KCI : Klarity Craft Item

  • Represents an object that can be referenced as a dependency by other Klarity Craft Items

    • The following objects are Klarity Craft Items:
      • Data Table
      • Model
      • Component
      • Dataset (a kind / extract / group of Data Tables)
      • Action
      • Artefact
  • It can be a meta-object that can be a Dataset / Model / Component / Data Table

    • can have the following tags:
      • immutable
    • Rules that apply to Items:
      • An immutable item cannot depend on an item without the immutable tag
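
The immutability rule can be checked over a dependency graph with a few lines of code. This is a sketch: the dictionary layout, the item names, and the function name are assumptions made for the example.

```python
def immutability_violations(items, dependencies):
    """Return (item, dependency) pairs where an immutable item depends on a mutable one.

    items: dict mapping item id -> set of tags
    dependencies: dict mapping item id -> list of item ids it depends on
    """
    violations = []
    for item, deps in dependencies.items():
        if "immutable" not in items[item]:
            continue  # the rule only constrains immutable items
        for dep in deps:
            if "immutable" not in items[dep]:
                violations.append((item, dep))
    return violations


items = {
    "dataset_v1": {"immutable"},
    "raw_table_v1": {"immutable"},
    "draft_annotations": set(),
    "model_rc": set(),
}
dependencies = {
    "dataset_v1": ["raw_table_v1", "draft_annotations"],
    "model_rc": ["dataset_v1", "draft_annotations"],
}
violations = immutability_violations(items, dependencies)
```

Only `dataset_v1` breaks the rule, because it is immutable and depends on `draft_annotations`; `model_rc` is not immutable, so its dependencies are unconstrained.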