Features
- Ingestion API: Easily ingest data into namespaces with built-in batching and retry mechanisms.
- Inference Workflow Management: Define custom chains of inference and validation steps.
- Entity Resolution: Match entities using configurable exact or geo-based matching methods.
- Namespace Management: Create and manage namespaces with privacy options.
- Data Querying: Query data with options to include/exclude metadata.
- Custom Loaders: Load custom data formats like JSON into defined workflows.
- CLI Tool: Standardized command-line interface for managing ingestion pipelines quickly.
Installation
To install the SDK, simply use pip:
- Clone the repository from your version-control system.
- Create a virtual environment with your tool/distro of choice.
- Run the following:
Quick Start
Here’s how to get started with the DataLinks SDK:
- Configuration: Ensure the required environment variables for the DataLinks API are set:
  - HOST
  - DL_API_KEY
  - NAMESPACE
  - OBJECT_NAME
  - (optional) .env file in the root of your project for configuration.
- Basic Example: Import the SDK and initialize the configuration:
- CLI Usage: The SDK also provides a built-in CLI that can be extended:
Components
1. DLConfig
DLConfig reads configuration values (e.g., API keys) from environment variables or .env files, so the same code can run in different deployment environments without changes.
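To make the environment-or-.env idea concrete, here is a minimal standard-library sketch. It is not DLConfig's implementation: the helper names parse_dotenv and resolve_settings are hypothetical, and the rule that real environment variables take precedence over .env values is an assumption. Only the four variable names come from the Configuration list.

```python
import os
from typing import Mapping

# Variable names taken from the Configuration list in Quick Start.
VARIABLE_NAMES = ("HOST", "DL_API_KEY", "NAMESPACE", "OBJECT_NAME")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of simple quoting.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(
    dotenv_text: str = "", environ: Mapping[str, str] = os.environ
) -> tuple[dict[str, str], list[str]]:
    """Return (settings, names not set).

    Assumption: values from the real environment override .env values.
    """
    merged = {**parse_dotenv(dotenv_text), **environ}
    settings = {name: merged[name] for name in VARIABLE_NAMES if name in merged}
    missing = [name for name in VARIABLE_NAMES if name not in merged]
    return settings, missing


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example_env = "HOST=https://example.invalid\nDL_API_KEY=placeholder-key\n"
    settings, missing = resolve_settings(example_env, environ={})
    print(settings)
    print("not set:", missing)
```

Reporting unset names instead of raising keeps the sketch neutral about which variables are mandatory.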
2. DataLinksAPI
DataLinksAPI handles interactions with the API. You can:
- Ingest data.
- Query or retrieve data with complex parameters.
- Manage namespaces.
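The feature list mentions built-in batching and retry for ingestion, but the SDK's own mechanism is not shown in this README. As a rough illustration of the general technique only, the sketch below splits rows into batches and retries a failed batch with exponential backoff. The send_batch callable, the function names, and all default values are hypothetical and are not part of DataLinksAPI.

```python
import time
from typing import Callable, Iterator, Sequence


def batched(rows: Sequence[dict], batch_size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def ingest_with_retry(
    rows: Sequence[dict],
    send_batch: Callable[[Sequence[dict]], None],  # hypothetical transport function
    batch_size: int = 100,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> int:
    """Send rows batch by batch, retrying a failed batch with exponential backoff.

    Returns the number of rows sent. If a batch fails max_attempts times,
    the last error is re-raised.
    """
    sent = 0
    for batch in batched(rows, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                send_batch(batch)
                break
            except Exception:  # a real client would catch narrower error types
                if attempt == max_attempts:
                    raise
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                time.sleep(base_delay * 2 ** (attempt - 1))
        sent += len(batch)
    return sent
```

Retrying per batch rather than per upload means a transient failure only resends the rows in the batch that failed.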
3. Inference Workflow
Use a chain of inference and validation steps, defined through classes like ProcessUnstructured, Normalize, and Validate, to automate data preparation workflows.
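The constructors and arguments of ProcessUnstructured, Normalize, and Validate are not documented in this section, so the sketch below does not use them. It only illustrates the chaining pattern itself, with plain functions standing in for steps; normalize_keys, require_fields, and run_chain are hypothetical names.

```python
from typing import Callable, Iterable

Record = dict
Step = Callable[[Record], Record]


def normalize_keys(record: Record) -> Record:
    """Hypothetical normalization step: trim and lower-case field names."""
    return {key.strip().lower(): value for key, value in record.items()}


def require_fields(*names: str) -> Step:
    """Hypothetical validation step factory: fail if a named field is absent."""
    def check(record: Record) -> Record:
        missing = [name for name in names if name not in record]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return record
    return check


def run_chain(record: Record, steps: Iterable[Step]) -> Record:
    """Apply each step in order, feeding each output into the next step."""
    for step in steps:
        record = step(record)
    return record


if __name__ == "__main__":
    chain = [normalize_keys, require_fields("name", "city")]
    print(run_chain({" Name ": "Ada", "City": "London"}, chain))
```

Placing the validation step last means it checks the record in its normalized form, which is the usual reason for ordering a chain this way.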
4. Entity Resolution
Supports multiple resolution strategies, such as the exact and geo-based matching named in the feature list, configurable via MatchTypeConfig.
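The fields of MatchTypeConfig are not listed here, so the following is only a sketch of what the two kinds of matching could look like in plain Python. It assumes exact matching means strict string equality and geo-based matching means two coordinates lying within a distance threshold. The function names and the 1 km default are hypothetical.

```python
import math


def exact_match(left: str, right: str) -> bool:
    """Exact matching: the two values must be identical."""
    return left == right


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def geo_match(
    point_a: tuple[float, float],
    point_b: tuple[float, float],
    max_km: float = 1.0,
) -> bool:
    """Geo-based matching: (lat, lon) points match when at most max_km apart."""
    return haversine_km(*point_a, *point_b) <= max_km


if __name__ == "__main__":
    print(exact_match("ACME Ltd", "ACME Ltd"))
    print(geo_match((51.5007, -0.1246), (51.5014, -0.1419), max_km=2.0))
```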
5. Loaders
Abstract base loaders (e.g., JSONLoader) allow seamless data ingestion from custom file formats like .json.
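The interface of the SDK's loaders is not shown in this section. As an illustration of the abstract-base-loader pattern in general, the sketch below defines a hypothetical BaseLoader with a single load method and a hypothetical JSONArrayLoader built on the standard json module. Neither class is the SDK's JSONLoader, and the method name and return type are assumptions.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class BaseLoader(ABC):
    """Hypothetical base class: subclasses turn a file into row dictionaries."""

    @abstractmethod
    def load(self, path: Path) -> Iterator[dict]:
        """Yield one dictionary per record found in the file."""


class JSONArrayLoader(BaseLoader):
    """Hypothetical loader for a .json file holding one object or a list of objects."""

    def load(self, path: Path) -> Iterator[dict]:
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Treat a single top-level object as a one-row file.
        rows = data if isinstance(data, list) else [data]
        for row in rows:
            if not isinstance(row, dict):
                raise ValueError(f"expected a JSON object, got {type(row).__name__}")
            yield row


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "sample.json"
        sample.write_text('[{"name": "Ada"}, {"name": "Grace"}]', encoding="utf-8")
        for row in JSONArrayLoader().load(sample):
            print(row)
```

Supporting another format under this pattern would mean adding one more subclass that implements load, leaving the code that consumes rows unchanged.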
6. Parametrize LLMs
You can choose the model and provider used in inference steps (e.g., ProcessUnstructured, Normalize, Validate).