AutoRAG is DataLinks’ retrieval-augmented generation agent. It answers natural language questions by searching across all datasets in a namespace, retrieving relevant data, and synthesizing a response using an LLM. You can use AutoRAG in two ways: through the DataLinks web platform, or programmatically via the API.
Documentation Index
Fetch the complete documentation index at: https://docs.datalinks.com/llms.txt
Use this file to discover all available pages before exploring further.
Prerequisites
- A DataLinks account with an API token
- At least one dataset with data ingested into a namespace
- Your username, namespace name, and token ready
Using AutoRAG in the Web Platform
The simplest way to use AutoRAG is through the DataLinks dashboard. To open the AutoRAG chatbot, follow the steps below:
- Log in to your DataLinks account.
- Click Connections Overview in the left sidebar.
- Select the namespace you want to query from the Namespace in Use dropdown in the top right corner.
- (Optional) Click the three dots at the top of the chatbot, then click Update helper prompt, enter your prompt, then click Update prompt.
Using AutoRAG via the API
Quick Start (Python)
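The sketch below calls the POST /query/autorag endpoint described in the API reference that follows. The endpoint path and parameter names come from this page; the base URL, the bearer-token auth scheme, and the response field names shown in the usage comments are assumptions to verify against your deployment.

```python
import json
import urllib.request

BASE_URL = "https://api.datalinks.com"  # assumed base URL; check your deployment


def build_autorag_request(username, namespace, query, token,
                          helper_prompt=None, model=None, provider=None):
    """Assemble the POST /query/autorag call as (url, headers, body)."""
    payload = {"username": username, "namespace": namespace, "query": query}
    if helper_prompt is not None:
        payload["helperPrompt"] = helper_prompt
    if model is not None:
        payload["model"] = model
    if provider is not None:
        payload["provider"] = provider
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return f"{BASE_URL}/query/autorag", headers, json.dumps(payload).encode()


def ask_autorag(username, namespace, query, token, **kwargs):
    """Send the query and return the parsed JSON response."""
    url, headers, body = build_autorag_request(
        username, namespace, query, token, **kwargs)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires a live server and a valid token):
# result = ask_autorag("alice", "news", "Who wrote the most articles?", "TOKEN")
# print(result["answer"])   # synthesized answer (field name assumed)
# print(result["steps"])    # AutoRAG's reasoning chain
```

Keeping request assembly separate from the network call makes the payload easy to inspect and test before you send it.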
API Reference
Endpoint: POST /query/autorag
| Parameter | Type | Required | Description |
|---|---|---|---|
| username | string | Yes | Your DataLinks username |
| namespace | string | Yes | Namespace to search across |
| query | string | Yes | Natural language question |
| helperPrompt | string | No | System-level instruction to guide the LLM’s behavior |
| model | string | No | LLM model to use (e.g., gemini-3-flash-preview) |
| provider | string | No | Model provider (e.g., gcloud, openai) |
The steps array in the response shows AutoRAG’s reasoning chain: what it searched for, the query it generated, and the data it retrieved at each step.
Streaming: POST /query/ask returns Server-Sent Events for real-time progress. Event types are plan (the query plan), step (each retrieval step as it executes), answer (the final response), and error (if something goes wrong). Use streaming for user-facing applications where you want to show progress rather than waiting for the full response.
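The event types above can be consumed with a small parser. The event names (plan, step, answer, error) come from this page; the wire framing below assumes standard Server-Sent Events ("event:" and "data:" lines, with a blank line ending each event).

```python
def parse_sse(stream_text):
    """Yield (event_type, data) pairs from raw SSE text."""
    event, data_lines = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:  # blank line closes one event
            yield event, "\n".join(data_lines)
            event, data_lines = "message", []


def handle_events(stream_text):
    """Show progress as events arrive instead of waiting for the full answer."""
    for event, data in parse_sse(stream_text):
        if event == "plan":
            print("plan:", data)
        elif event == "step":
            print("retrieval step:", data)
        elif event == "answer":
            return data  # final synthesized response
        elif event == "error":
            raise RuntimeError(data)
```

In a real client you would feed this from the streaming HTTP response body chunk by chunk; the parser here takes the full text only to keep the sketch short.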
How AutoRAG Works
When you submit a query, AutoRAG first interprets your question to identify the relevant entities and relationships it needs to find. It then retrieves data by searching across all datasets in your namespace using semantic search, following any links you have set up between datasets to discover related information. Finally, it synthesizes a response grounded in the actual data it retrieved, ensuring the answer is anchored to real records rather than hallucinated. AutoRAG searches all datasets in a namespace, not just one. You can split data across multiple datasets and it will search them all.
Preparing Data for AutoRAG
AutoRAG works best with simple, well-described datasets. Keep your schemas narrow wherever possible. Use descriptive column names like article_title rather than col1 so AutoRAG understands what each field contains. If possible, include at least one text-rich column with substantial content like paragraphs or descriptions, because semantic search needs meaningful text to match against.
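To make the naming guidance concrete, here are the same columns described two ways. The dict shape is purely illustrative, not a DataLinks API format.

```python
# Opaque names give semantic search nothing to work with.
opaque_schema = {"col1": "string", "col2": "string", "col3": "string"}

# Descriptive names plus one text-rich column work much better.
descriptive_schema = {
    "article_title": "string",
    "publish_date": "string",
    "body_text": "string",  # text-rich column: full paragraphs to match against
}
```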
Multi-Dataset Strategy
Create multiple datasets in the same namespace for different retrieval surfaces:
- paragraphs — granular text chunks for detailed retrieval
- articles — full concatenated text for broad context
- entities — extracted people, places, dates for entity lookup
Connecting Datasets with Links
Links tell AutoRAG how datasets relate. Create links in both directions for best results. Always create links as a separate step after ingestion is complete. During ingestion, pass "link": {"ExactMatch": null} to enable auto-discovery. DataLinks will analyze the ingested data and identify columns across datasets that share matching values. You can then review the discovered links in the Connections Overview and choose which ones to save.
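An ingestion payload with auto-discovery enabled might look like the sketch below. Only the "link": {"ExactMatch": null} field is documented above; the surrounding field names (username, namespace, dataset, rows) are assumptions for illustration.

```python
import json

ingest_payload = {
    "username": "alice",      # assumed field names, illustration only
    "namespace": "news",
    "dataset": "articles",
    "rows": [
        {"article_title": "Example headline", "body_text": "Full article text."},
    ],
    "link": {"ExactMatch": None},  # Python None serializes to JSON null
}

print(json.dumps(ingest_payload["link"]))  # → {"ExactMatch": null}
```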
Using the helperPrompt
The helperPrompt is a system instruction that guides how the LLM reasons and formats answers. It does not affect retrieval, only the model’s response.
Example: Concise factual answers
Putting format instructions in the helperPrompt rather than in the query tends to produce better results, because the query sent to semantic search remains a clean question, without format instructions mixed in that could confuse retrieval.
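A request body for the concise-factual-answers case could look like this. The field names follow the API reference above; the prompt wording and the example question are illustrative.

```python
concise_request = {
    "username": "alice",
    "namespace": "news",
    # Keep the query a clean natural-language question...
    "query": "Who founded the company covered in the 2024 articles?",
    # ...and put format instructions in the helperPrompt instead.
    "helperPrompt": (
        "Answer in one short sentence. State only facts found in the "
        "retrieved data; if the data does not contain the answer, say so."
    ),
}
```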
Choosing a Model
AutoRAG supports multiple LLM backends. Available models may change over time; the following are currently supported:
| Model | Provider | Characteristics |
|---|---|---|
| gemini-3-flash-preview | gcloud | Best quality for QA tasks, slower |
| gpt-5.2-2025-12-11 | openai | Faster, lower quality on factual QA |
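Selecting a backend explicitly is just a matter of setting the optional model and provider parameters from the API reference. The model and provider values below come from the table above; the other payload fields follow the API reference, and the example question is illustrative.

```python
# Prefer answer quality: the slower gcloud model.
quality_request = {
    "username": "alice",
    "namespace": "news",
    "query": "Summarize the coverage of the merger.",
    "model": "gemini-3-flash-preview",
    "provider": "gcloud",
}

# Prefer latency: swap in the faster openai model, same payload otherwise.
speed_request = {
    **quality_request,
    "model": "gpt-5.2-2025-12-11",
    "provider": "openai",
}
```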
Troubleshooting
“I am sorry” / Refusal Responses
AutoRAG could not find relevant data to answer the question. This usually means the information is not in any dataset in the namespace, or the question uses different terminology than the data. Try rephrasing the question or adding a helperPrompt that discourages refusals.
Timeouts (120s)
Complex queries across large or linked datasets can exceed the default timeout. Reduce the number of datasets or rows in the namespace, simplify dataset schemas, or reduce the number of active links.
Wrong Answers
AutoRAG retrieved data but the LLM reasoned incorrectly. Try adding a helperPrompt with reasoning guidance, checking whether the answer actually exists in your data, or verifying that links between datasets are correctly configured.
500 Errors
Usually caused by overly wide schemas (10+ columns). Keep datasets to 5 columns or fewer for reliable performance.
Best Practices
- Keep schemas narrow: fewer columns per dataset works best for AutoRAG answer retrieval.
- Use multiple datasets: rather than one wide one, AutoRAG searches all datasets in a namespace.
- Create bidirectional links: between datasets that share common fields.
- Keep queries clean: use helperPrompt for format control rather than appending instructions to your query.
- Include text-rich columns: semantic search needs substantial text to match against.
- Provide dataset descriptions: help AutoRAG understand the context of your data.
- Test incrementally: start with a small dataset, verify retrieval works, then scale up.