AutoRAG is DataLinks’ retrieval-augmented generation agent. It answers natural language questions by searching across all datasets in a namespace, retrieving relevant data, and synthesizing a response using an LLM. You can use AutoRAG in two ways: through the DataLinks web platform, or programmatically via the API.

Prerequisites

  • A DataLinks account with an API token
  • At least one dataset with data ingested into a namespace
  • Your username, namespace name, and token ready

Using AutoRAG in the Web Platform

The simplest way to use AutoRAG is through the DataLinks dashboard. To open the AutoRAG chatbot, follow the steps below:
  1. Log in to your DataLinks account.
  2. Click Connections Overview in the left sidebar.
  3. Select the namespace you want to query from the Namespace in Use dropdown in the top right corner.
  4. (Optional) Click the three dots at the top of the chatbot, then click Update helper prompt, enter your prompt, then click Update prompt.
Type a natural language question into the “Ask a question about your data…” input at the bottom and click send. AutoRAG will search across all datasets in the selected namespace and return an answer directly in the chat panel. This is ideal for ad-hoc exploration, verifying that your data is queryable, and testing questions before building them into an automated pipeline.

Using AutoRAG via the API

Quick Start (Python)

import requests

API_BASE = "https://api.datalinks.com/api/v1"
TOKEN = "your-api-token"

response = requests.post(
    f"{API_BASE}/query/autorag",
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    json={
        "username": "your-username",
        "namespace": "your-namespace",
        "query": "Which supplier serves both European and Asian operations?"
    },
    timeout=120
)

result = response.json()
print(result["response"])
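If you also want to inspect how AutoRAG arrived at its answer, the response includes a steps array (documented under API Reference below). A small helper, assuming the response shape shown there:

```python
def summarize_steps(result):
    """Return one human-readable line per retrieval step.

    Assumes the response shape documented under API Reference:
    each step has "instruction", "query", and "data" keys.
    """
    lines = []
    for i, step in enumerate(result.get("steps", []), start=1):
        rows = len(step.get("data", []))
        lines.append(f"{i}. {step['instruction']} -> {rows} row(s)")
    return lines

# Example using the sample response from the API Reference:
sample = {
    "response": "Acme Corp serves both European and Asian operations.",
    "steps": [
        {"instruction": "Search for suppliers operating in Europe",
         "query": 'Ontology("suppliers").filter(region == "Europe")',
         "data": [{"name": "Acme Corp", "region": "Europe"}]},
    ],
}
for line in summarize_steps(sample):
    print(line)
```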

API Reference

Endpoint: POST /query/autorag
Parameter     Type    Required  Description
username      string  Yes       Your DataLinks username
namespace     string  Yes       Namespace to search across
query         string  Yes       Natural language question
helperPrompt  string  No        System-level instruction to guide the LLM’s behavior
model         string  No        LLM model to use (e.g., gemini-3-flash-preview)
provider      string  No        Model provider (e.g., gcloud, openai)
Response:
{
  "response": "Acme Corp serves both European and Asian operations.",
  "steps": [
    {
      "instruction": "Search for suppliers operating in Europe",
      "query": "Ontology(\"suppliers\").filter(region == \"Europe\")",
      "data": [{"name": "Acme Corp", "region": "Europe"}]
    },
    {
      "instruction": "Check which of these also operate in Asia",
      "query": "Ontology(\"suppliers\").filter(name == \"Acme Corp\", region == \"Asia\")",
      "data": [{"name": "Acme Corp", "region": "Asia"}]
    }
  ]
}
The steps array shows AutoRAG’s reasoning chain: what it searched for, the query it generated, and the data it retrieved at each step.

Streaming: POST /query/ask returns Server-Sent Events for real-time progress. Event types are plan (the query plan), step (each retrieval step as it executes), answer (the final response), and error (if something goes wrong). Use streaming for user-facing applications where you want to show progress rather than waiting for the full response.
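Each Server-Sent Event arrives as an event line followed by a data line. A minimal parser sketch — the exact wire format (field layout, JSON payloads) is an assumption based on the event types listed above:

```python
import json

def parse_sse_event(raw):
    """Parse one SSE event block into (event_type, payload).

    Assumes each event is formatted as:
        event: <type>
        data: <JSON>
    matching the plan/step/answer/error types described above.
    """
    event_type, payload = None, None
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
    return event_type, payload

# Hypothetical event as it might arrive from POST /query/ask:
raw = 'event: step\ndata: {"instruction": "Search for suppliers operating in Europe"}'
etype, data = parse_sse_event(raw)
```

In a real client you would iterate over `requests.post(..., stream=True).iter_lines()`, buffer lines until a blank line ends the event, and dispatch on the event type.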

How AutoRAG Works

When you submit a query, AutoRAG first interprets your question to identify the relevant entities and relationships it needs to find. It then retrieves data by searching across all datasets in your namespace using semantic search, following any links you have set up between datasets to discover related information. Finally, it synthesizes a response grounded in the actual data it retrieved, ensuring the answer is anchored to real records rather than hallucinated.
AutoRAG searches all datasets in a namespace, not just one. You can split data across multiple datasets and it will search them all.

Preparing Data for AutoRAG

AutoRAG works best with simple, well-described datasets. Keep your schemas narrow wherever possible. Use descriptive column names like article_title rather than col1 so AutoRAG understands what each field contains. If possible, include at least one text-rich column with substantial content like paragraphs or descriptions, because semantic search needs meaningful text to match against.
When creating a dataset, provide clear data descriptions and field definitions wherever possible so AutoRAG has context about what your data represents.
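As an illustration, a dataset definition that follows these guidelines pairs a narrow schema with per-field descriptions. The payload shape here is hypothetical; consult the DataLinks ingestion docs for the actual schema format:

```python
def make_dataset_definition(name, description, fields):
    """Build an illustrative dataset definition.

    `fields` maps descriptive column names to plain-language
    descriptions. The payload shape is hypothetical, not the
    actual DataLinks ingestion schema.
    """
    return {
        "name": name,
        "description": description,
        "fields": [{"name": n, "description": d} for n, d in fields.items()],
    }

articles = make_dataset_definition(
    "articles",
    "Full news articles with title and body text.",
    {
        "article_title": "Headline of the article",
        "body": "Full article text (text-rich column for semantic search)",
    },
)
```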

Multi-Dataset Strategy

Create multiple datasets in the same namespace for different retrieval surfaces:
  • paragraphs — granular text chunks for detailed retrieval
  • articles — full concatenated text for broad context
  • entities — extracted people, places, dates for entity lookup
Links tell AutoRAG how datasets relate; create links in both directions for best results, and always save them as a separate step after ingestion is complete. To enable auto-discovery, pass "link": {"ExactMatch": null} during ingestion. DataLinks will analyze the ingested data and identify columns across datasets that share matching values. You can then review the discovered links in the Connections Overview and choose which ones to save.
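The ingestion-time flag can be sketched as follows. Only the "link": {"ExactMatch": null} flag is taken from the description above; the surrounding payload fields are placeholders for whatever your ingestion request already sends:

```python
import json

# Illustrative ingestion payload enabling exact-match link discovery.
# The "link" flag is as documented; the other fields are placeholders.
payload = {
    "username": "your-username",
    "namespace": "your-namespace",
    "dataset": "paragraphs",
    "link": {"ExactMatch": None},  # serializes to {"ExactMatch": null}
}

body = json.dumps(payload)
```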

Using the helperPrompt

The helperPrompt is a system instruction that guides how the LLM reasons and formats answers. It does not affect retrieval, only the model’s response.

Example: Concise factual answers
Answer with ONLY the exact name, number, or term asked for.
Maximum 1-3 words. Do not add titles, qualifiers, or extra context.
Give the shortest commonly-used form of names.
Just the bare factual answer, nothing else.
Example: Multi-hop reasoning
You answer multi-hop factual questions. Each question requires
connecting information across multiple passages.

Before answering, trace the full reasoning chain: identify every
entity, find what the context says about each, and follow the
chain to its conclusion.

Answer rules:
- 1-3 words only
- Exactly one answer, never a list
- Use the most commonly known form of names
- No titles, qualifiers, categories, or explanations
Example: Detailed analytical answers
Provide a thorough answer based on the retrieved data.
Include specific data points, dates, and names where available.
If the data contains conflicting information, note the discrepancy.
Cite which records your answer draws from.
helperPrompt vs query suffix: You can also append instructions directly to the query string (e.g., “Who founded Acme Corp? Answer in 1-3 words.”). However, helperPrompt tends to produce better results because the query sent to semantic search remains a clean question, without format instructions mixed in that could confuse retrieval.
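Keeping the query clean while steering the format through helperPrompt might look like this sketch (names and prompt text are placeholders):

```python
def build_autorag_body(username, namespace, query, helper_prompt=None):
    """Build the JSON body for POST /query/autorag.

    Format instructions go into helperPrompt so the query string
    sent to semantic search stays a clean question.
    """
    body = {"username": username, "namespace": namespace, "query": query}
    if helper_prompt:
        body["helperPrompt"] = helper_prompt
    return body

body = build_autorag_body(
    "your-username",
    "your-namespace",
    "Who founded Acme Corp?",  # clean question, no format suffix
    helper_prompt="Answer with ONLY the exact name. Maximum 1-3 words.",
)
```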

Choosing a Model

AutoRAG supports multiple LLM backends. Available models may change over time; the following are currently supported:
Model                   Provider  Characteristics
gemini-3-flash-preview  gcloud    Best quality for QA tasks, slower
gpt-5.2-2025-12-11      openai    Faster, lower quality on factual QA
If omitted, DataLinks uses its default model. Specify with:
{
    "model": "gemini-3-flash-preview",
    "provider": "gcloud"
}
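Combining model selection with a query body might look like the following sketch (placeholder values as in the quick start):

```python
def with_model(body, model=None, provider=None):
    """Return a copy of the request body with optional model overrides.

    Omitting both keys leaves DataLinks to pick its default model.
    """
    out = dict(body)
    if model:
        out["model"] = model
    if provider:
        out["provider"] = provider
    return out

base = {"username": "u", "namespace": "n", "query": "Which supplier serves Europe?"}
req = with_model(base, model="gemini-3-flash-preview", provider="gcloud")
```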

Troubleshooting

“I am sorry” / Refusal Responses
AutoRAG could not find relevant data to answer the question. This usually means the information is not in any dataset in the namespace, or the question uses different terminology than the data. Try rephrasing the question or adding a helperPrompt that discourages refusals.

Timeouts (120s)
Complex queries across large or linked datasets can exceed the default timeout. Reduce the number of datasets or rows in the namespace, simplify dataset schemas, or reduce the number of active links.

Wrong Answers
AutoRAG retrieved data but the LLM reasoned incorrectly. Try adding a helperPrompt with reasoning guidance, checking whether the answer actually exists in your data, or verifying that links between datasets are correctly configured.

500 Errors
Usually caused by overly wide schemas (10+ columns). Keep datasets to 5 columns or fewer for reliable performance.
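One way to soften occasional timeouts is a small retry wrapper around your own request function; a sketch, not part of the DataLinks API:

```python
import time

def call_with_retries(fn, retries=2, backoff=5.0, timeout_exc=(TimeoutError,)):
    """Call fn(), retrying on timeout with a fixed backoff.

    `fn` is your own function performing the AutoRAG request (e.g.,
    wrapping requests.post with timeout=120). With requests, pass
    timeout_exc=(requests.exceptions.Timeout,).
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except timeout_exc:
            if attempt == retries:
                raise
            time.sleep(backoff)

# Example with a stand-in function that times out once, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated timeout")
    return {"response": "ok"}

result = call_with_retries(flaky, retries=2, backoff=0.0)
```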

Best Practices

  1. Keep schemas narrow: AutoRAG retrieves answers most reliably when each dataset has only a few columns.
  2. Use multiple datasets: AutoRAG searches all datasets in a namespace, so several narrow datasets work better than one wide one.
  3. Create bidirectional links: connect datasets that share common fields, in both directions.
  4. Keep queries clean: use helperPrompt for format control rather than appending instructions to your query.
  5. Include text-rich columns: semantic search needs substantial text to match against.
  6. Provide dataset descriptions: help AutoRAG understand the context of your data.
  7. Test incrementally: start with a small dataset, verify retrieval works, then scale up.