
1. DLConfig

DLConfig reads configuration values (e.g., API keys) from environment variables or .env files, so the same code adapts across deployment environments without changes.

2. DataLinksAPI

DataLinksAPI handles all interactions with the DataLinks API. With it you can:
  • Ingest data directly or via multipart upload for large files.
  • Track and wait for async ingestion completion.
  • Query or retrieve data with complex parameters.
  • Manage namespaces.
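The method names DataLinksAPI exposes for these operations aren't listed here, but the track-and-wait step for async ingestion typically reduces to a poll loop; `wait_for_completion` and `get_status` below are stand-ins for the real client calls:

```python
import time

def wait_for_completion(get_status, timeout: float = 30.0, interval: float = 0.01) -> str:
    """Poll a status callable until the ingestion job leaves the 'pending'
    state or the timeout expires (illustrative sketch; get_status stands
    in for a real DataLinksAPI status call)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status != "pending":
            return status
        time.sleep(interval)
    raise TimeoutError("ingestion did not complete in time")

# Simulated job that completes after three polls:
_polls = iter(["pending", "pending", "completed"])
result = wait_for_completion(lambda: next(_polls))
```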

3. Inference Workflow

Use a chain of inference and validation steps defined through classes like ProcessUnstructured, Normalize, and Validate to automate data preparation workflows.
from datalinks.pipeline import Pipeline, ProcessUnstructured, Normalize, Validate, ValidateModes

# Define an inference pipeline
inference_steps = Pipeline(
    ProcessUnstructured(derive_from="source_field", helper_prompt="This extracts tables."),
    Normalize(target_cols={"email": "email_address"}, mode="all-in-one"),
    Validate(mode=ValidateModes.FIELDS, columns=["email", "phone"]),
)

4. Entity Resolution

Supports multiple resolution strategies, configurable via MatchTypeConfig:
from datalinks.links import MatchTypeConfig, ExactMatch

entity_resolution = MatchTypeConfig(
    # parameters are optional
    exact_match=ExactMatch(minVariation=0.2, minDistinct=0.3)
)
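The semantics of minVariation and minDistinct aren't documented in this snippet; the exact-match strategy itself amounts to grouping records whose normalized key values are identical, which can be sketched as (`exact_match_groups` is an illustrative helper, not the datalinks implementation):

```python
from collections import defaultdict

def exact_match_groups(records, key_field: str):
    """Group records by a normalized key so identical entities land in
    the same bucket (sketch of the exact-match strategy)."""
    groups = defaultdict(list)
    for record in records:
        key = record[key_field].strip().lower()
        groups[key].append(record)
    return dict(groups)

records = [
    {"name": "Acme Corp", "id": 1},
    {"name": "acme corp ", "id": 2},
    {"name": "Globex", "id": 3},
]
groups = exact_match_groups(records, "name")
```

Records 1 and 2 resolve to the same entity because their keys normalize identically.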

5. Loaders

Loaders built on an abstract base class (e.g., JSONLoader for .json files) allow data ingestion from custom file formats: subclass the base and implement its load method for your format.
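The datalinks base class isn't shown here, but the pattern can be sketched with the standard library's abc module; `BaseLoader` and the `load` signature below are assumptions for illustration:

```python
import json
import tempfile
from abc import ABC, abstractmethod

class BaseLoader(ABC):
    """Abstract base for loaders (sketch; the real datalinks base class
    may differ)."""
    @abstractmethod
    def load(self, path: str) -> list[dict]:
        ...

class JSONLoader(BaseLoader):
    """Loads records from a .json file."""
    def load(self, path: str) -> list[dict]:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Normalize a single object into a one-record list
        return data if isinstance(data, list) else [data]

# Usage: write a small JSON file and load it back
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"email": "a@example.com"}], f)
    path = f.name
records = JSONLoader().load(path)
```

A loader for another format (CSV, XML, ...) would subclass the same base and override load.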

6. Parametrize LLMs

You can choose the model and provider used in inference steps (e.g., ProcessUnstructured, Normalize, Validate).

from datalinks.pipeline import Pipeline, ProcessUnstructured

steps = Pipeline(
    ProcessUnstructured(
        derive_from="text",
        helper_prompt="If you find a numeric field use only the value and omit the rest.",
        model="gpt-4.1-nano-2025-04-14",
        provider="openai",
    )
)