
Klarity Craft storage

Tools and methods to access, store and manage data

Klarity Storage (KS) designates the environment where data is stored and made accessible to the three other products and environments. Its purpose is to avoid data duplication while facilitating and centralizing access-rights management. The KS product contains a client part and a server part, and is integrated into the Klarity Craft documentation.

Storage organisation

Storage organisation associated with the data model presented previously:

  • <space_key>/artefacts/<artefact_name>/<version_hash>/
  • <space_key>/models/<model_name>/<version_hash>/
  • <space_key>/tables/<table_name>/<parquet_files>
  • <space_key>/datasets/<dataset_name>/<version_hash>/
  • <space_key>/raw_binary/<table_name>/<row_index>/<binary_files> : binary files associated with a table row that we don't want to store in Parquet format
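The layout above can be made concrete with a small key-building helper. This is a minimal sketch, assuming that object keys are plain "/"-separated strings matching the listed patterns; `object_prefix` and `KINDS` are hypothetical names, not part of the KS client API. Versioned kinds require a version hash (or a row index for raw_binary), while tables hold their Parquet files directly under the table name.

```python
from pathlib import PurePosixPath
from typing import Optional, Union

# Kinds taken from the storage layout above. The boolean says whether the
# kind needs a sub-key (version hash, or row index for raw_binary).
KINDS = {
    "artefacts": True,    # <space_key>/artefacts/<artefact_name>/<version_hash>/
    "models": True,       # <space_key>/models/<model_name>/<version_hash>/
    "datasets": True,     # <space_key>/datasets/<dataset_name>/<version_hash>/
    "raw_binary": True,   # <space_key>/raw_binary/<table_name>/<row_index>/
    "tables": False,      # <space_key>/tables/<table_name>/ (Parquet files inside)
}


def object_prefix(space_key: str, kind: str, name: str,
                  sub: Optional[Union[str, int]] = None) -> str:
    """Build the key prefix of an item (hypothetical helper, not the KS API)."""
    if kind not in KINDS:
        raise ValueError(f"unknown storage kind: {kind!r}")
    if KINDS[kind] and sub is None:
        raise ValueError(f"{kind} requires a version hash or row index")
    if not KINDS[kind] and sub is not None:
        raise ValueError(f"{kind} takes no sub-key in this layout")
    parts = [space_key, kind, name] + ([str(sub)] if sub is not None else [])
    # Reject empty, relative or slash-containing segments so a name
    # cannot escape its own prefix.
    if any(not p or p in (".", "..") or "/" in p for p in parts):
        raise ValueError(f"invalid path segment in {parts!r}")
    return str(PurePosixPath(*parts)) + "/"
```

For example, `object_prefix("demo", "models", "resnet", "a1b2c3")` yields `demo/models/resnet/a1b2c3/`, and `object_prefix("demo", "tables", "sales")` yields `demo/tables/sales/`.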

Where this structure is stored:

  • locally, without accountability or traceability, to allow work during development (see the caching concept below)
  • in the Workbench during the version build process, with traceability metadata

KS server

The KS server is an object store that manages several buckets, depending on the associated management strategy. The KS client shall be able to access objects in the buckets in two ways:

  • with a dedicated S3 library (minio or boto3)
  • over HTTPS through the reverse proxy, possibly with a dedicated reverse-proxy cache (read-only requests)
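The HTTPS route only serves read-only requests, so a client needs to build the object URL and refuse any write method on that route. The sketch below assumes path-style addressing (`<base_url>/<bucket>/<key>`) on the reverse proxy; `proxy_url`, `check_proxy_method` and the example host are hypothetical.

```python
from urllib.parse import quote

# Requests the reverse proxy is assumed to accept (read-only access).
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


def proxy_url(base_url: str, bucket: str, key: str) -> str:
    """Build a path-style URL <base_url>/<bucket>/<key> (assumed proxy layout)."""
    # quote() keeps "/" by default, so the key's prefix structure is preserved
    # while spaces and other unsafe characters are percent-encoded.
    return f"{base_url.rstrip('/')}/{quote(bucket, safe='')}/{quote(key)}"


def check_proxy_method(method: str) -> None:
    """Refuse anything but read-only requests on the HTTPS route."""
    if method.upper() not in READ_ONLY_METHODS:
        raise PermissionError(
            f"{method} not allowed through the reverse proxy; use the S3 route")
```

On the S3 route the same bucket/key pair would instead be passed to the S3 library, for instance to boto3's `get_object(Bucket=..., Key=...)` on a client created with `endpoint_url` pointing at the KS server. Keeping writes on that authenticated route lets the proxy and its cache stay read-only.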

The different kinds of storage we need:

  • front web objects in different versions (may be shared between different deployments)
    • static content
    • transpiled and packaged JS files
    • CSS files
  • front web root files (for each deployment)
    • index.html pointing to the right version of the JS objects
  • content used to display artefact content:
    • images
    • overlay
    • parquet files
    • ...
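The split between shared, versioned front objects and a per-deployment root file means that only index.html decides which version a deployment serves. A minimal sketch of that idea follows, assuming versioned assets live under `front/<version>/` with files named `app.js` and `app.css`; the layout, file names and cache-header policy are all assumptions for illustration.

```python
from string import Template

# Hypothetical per-deployment root file: it only points at a versioned bundle.
INDEX_TEMPLATE = Template(
    "<!doctype html>\n"
    "<html><head>"
    '<link rel="stylesheet" href="$assets/app.css">'
    "</head><body>"
    '<div id="app"></div>'
    '<script type="module" src="$assets/app.js"></script>'
    "</body></html>\n"
)


def render_index(asset_base_url: str, front_version: str) -> str:
    """Render index.html for one deployment, pinned to one front version."""
    assets = f"{asset_base_url.rstrip('/')}/front/{front_version}"
    return INDEX_TEMPLATE.substitute(assets=assets)


def cache_control(key: str) -> str:
    """Versioned front objects never change once written, so they can be
    cached indefinitely; root files must be revalidated on every request."""
    if key.startswith("front/") and not key.endswith("index.html"):
        return "public, max-age=31536000, immutable"
    return "no-cache"
```

Rolling a deployment forward or back then only means rewriting its index.html; the versioned JS and CSS objects can stay shared between deployments.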

Storage Client API

The client part consists of libraries, in the different languages supported by Klarity, for accessing data elements. The following languages are currently supported:

  • typescript
  • python

The following functionality is integrated into the library:

  • configuration that allows customizing the library / API behaviour
    • base_url : base URL of the KS server
    • dataspace : list of accessible dataspaces
    • datacache : list of dataspace tables cached locally / redirected for testing
    • user_credential : user credentials, obtained from an environment variable, from .safenai, or from the project API
  • check / validate the credentials propagated for KS server access
  • manage local caching to minimize network traffic
  • upload new objects to the storage
  • download objects from the storage
  • getDataTables(dataspace_key, ):
    • input :
      • dataspace_key
    • returns the list of existing DataTables
  • getDataTable(dataspace_key | all, *filter options)
  • cacheTable(table_name, mode)
    • mode:
      • ro : local read-only cache, for performance purposes
      • sandbox : local copy that allows writing without updating the reference storage, for local computation tests prior to delivery
    • serializes the cache configuration in .safenai
  • getDataFrame(data_datatable, + filter options)
    • uses datafusion-python to provide a method for retrieving data
    • direct access to DataFusion in MVP V2; a wrapper is to be added for V1
  • pushDataFrame(data_datatable, csv | pandas | numpy | ...)
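The configuration options and the credential lookup can be sketched as a small dataclass plus a resolver. This is a minimal sketch under stated assumptions: the environment variable name `KS_USER_CREDENTIAL`, the JSON format of `.safenai`, the precedence order (environment variable, then `.safenai`, then project API) and the names `KSConfig` / `resolve_credential` are all hypothetical.

```python
import json
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Optional

ENV_VAR = "KS_USER_CREDENTIAL"  # hypothetical variable name


@dataclass
class KSConfig:
    """Client configuration mirroring the options listed above."""
    base_url: str
    dataspace: List[str] = field(default_factory=list)
    datacache: Dict[str, str] = field(default_factory=dict)  # table -> cache mode
    user_credential: Optional[str] = None


def resolve_credential(safenai_path: Path,
                       project_api: Optional[Callable[[], Optional[str]]] = None
                       ) -> Optional[str]:
    """Look up the user credential; the precedence order is an assumption."""
    # 1. Environment variable.
    token = os.environ.get(ENV_VAR)
    if token:
        return token
    # 2. .safenai file, assumed here to be JSON with a "user_credential" key.
    if safenai_path.is_file():
        token = json.loads(safenai_path.read_text()).get("user_credential")
        if token:
            return token
    # 3. Project API, passed in as a callable because its interface is not specified.
    if project_api is not None:
        return project_api()
    return None
```

A client would then be configured with something like `KSConfig(base_url="https://ks.example.org", dataspace=["demo"], user_credential=resolve_credential(Path(".safenai")))`.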
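The two cache modes differ only in whether local writes are accepted, and in both cases a table is fetched from the server once. The in-memory sketch below illustrates that contract; `LocalTableCache`, its methods and the JSON shape written to `.safenai` are hypothetical, and a real client would persist files on disk rather than bytes in a dict.

```python
import json
from enum import Enum
from typing import Callable, Dict


class CacheMode(str, Enum):
    RO = "ro"            # local read-only copy, for performance
    SANDBOX = "sandbox"  # local writable copy, never pushed to reference storage


class LocalTableCache:
    """In-memory stand-in for the local cache (hypothetical, for illustration)."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch                  # downloads a table from the KS server
        self.modes: Dict[str, CacheMode] = {}
        self._data: Dict[str, bytes] = {}

    def cache_table(self, table_name: str, mode: str) -> None:
        self.modes[table_name] = CacheMode(mode)  # raises ValueError on unknown mode
        if table_name not in self._data:          # fetch once: avoids repeated network calls
            self._data[table_name] = self._fetch(table_name)

    def read(self, table_name: str) -> bytes:
        if table_name in self._data:
            return self._data[table_name]
        return self._fetch(table_name)  # uncached tables go to the server

    def write(self, table_name: str, payload: bytes) -> None:
        if self.modes.get(table_name) is not CacheMode.SANDBOX:
            raise PermissionError(f"{table_name} is not cached in sandbox mode")
        self._data[table_name] = payload  # local only; reference storage untouched

    def serialize(self) -> str:
        """Cache configuration as it might be written to .safenai (assumed JSON)."""
        return json.dumps({"datacache": {t: m.value for t, m in self.modes.items()}},
                          sort_keys=True)
```

Keeping the sandbox copy strictly local means a computation can be tested against modified data without any risk of updating the reference storage before delivery.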
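One way the "filter options" of getDataFrame could reach DataFusion is by translating them into a SQL query. The sketch below assumes simple equality filters combined with AND, an optional column projection and a limit; `build_query` is a hypothetical helper and the filter format is an assumption, not the documented getDataFrame signature.

```python
from typing import Any, Dict, List, Optional


def _ident(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def _literal(value: Any) -> str:
    """Render a Python value as an SQL literal (bool, int, float or string)."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_query(table: str, columns: Optional[List[str]] = None,
                filters: Optional[Dict[str, Any]] = None,
                limit: Optional[int] = None) -> str:
    """Translate hypothetical filter options into a SELECT statement."""
    cols = ", ".join(_ident(c) for c in columns) if columns else "*"
    sql = f"SELECT {cols} FROM {_ident(table)}"
    if filters:
        sql += " WHERE " + " AND ".join(
            f"{_ident(k)} = {_literal(v)}" for k, v in filters.items())
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql
```

With datafusion-python, the resulting string would be passed to `SessionContext.sql()` after registering the table's Parquet files, for example with `ctx.register_parquet("sales", path)`; the returned DataFusion DataFrame can then be collected or converted with `to_pandas()`.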

The KS client can be called from different environments:

  • from kd-front: the user web client, with a dedicated TS library (read-only access through this medium)
  • from kd-back : to create / read some artefact content
  • from kc-craft: to create / read / access some artefact content from the user environment (external to the safenai infrastructure)
  • from kc-craft: to create / read / access some artefact content from the safenai computation cluster
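Since KS centralizes access-rights management, the per-environment rights in the list above can be expressed as one deny-by-default table. This is a minimal sketch: the environment keys (`web_client`, `backend`, `craft_user_env`, `craft_cluster`) and the two-operation model are hypothetical simplifications of the list, not an existing KS permission scheme.

```python
from enum import Flag, auto


class Op(Flag):
    READ = auto()
    CREATE = auto()


# Hypothetical environment keys; rights follow the list of calling environments.
ALLOWED = {
    "web_client": Op.READ,                  # read-only through this medium
    "backend": Op.READ | Op.CREATE,
    "craft_user_env": Op.READ | Op.CREATE,  # outside the safenai infrastructure
    "craft_cluster": Op.READ | Op.CREATE,   # safenai computation cluster
}


def is_allowed(environment: str, op: Op) -> bool:
    """Deny by default: unknown environments get no rights."""
    return op in ALLOWED.get(environment, Op(0))
```

Because rights live in a single table, the two kc-craft contexts could later be given different rights (for example, restricting the external user environment) without changing any calling code.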

Klarity Storage access schema