When uploading data, you can derive new columns from existing ones using our AI for automatic derivations. To do this, you’ll need to specify the desired columns and the steps you want to execute. We support several steps including Table, Rows, Normalize, Validate, ReverseGeo, and column inference.
Copy
Ask AI
curl -H "Authorization: Bearer token1234abcdefg" -X POST https://api.datalinks.com/api/ingest/namespace/datasetName --data '{ "data": [{"col1": 123, "col2": "foo", "colWithATable": "direction,size;up,large"}, {"col1": 444, "col2": "bar"}], "infer": { "steps": [{ "type": "table", "deriveFrom": "colWithATable", "helperPrompt": "This text contains a table comma separated, but instead of line breaks we are using a ';'." }] } }'
The inference can have multiple steps which are combined by our AI. The example above extracts a simple table, but you can have other steps such as table, rows, normalize, validate, and reverseGeo.
In this specific example we specify we want a table extracted from a specific column. This will extract all columns available from the text provided. You can use the helperPrompt to guide the AI system to have higher accuracy during extraction.
The Table Step is designed to extract a table from any text input, whether it’s free-form text such as a restaurant review, a flight log report, or a financial instrument notice. When used independently, this step may produce columns that are somewhat arbitrary and often require normalization afterward. It functions by extracting data to create new columns and appending them to the rows of the existing table.
Pro tip: Calling an extraction with just the table step is a great way to see what kind of structured data can be generated from your unstructured data.
Copy
Ask AI
{ "type": "table", "deriveFrom": "columnName from where the column should be derived from", "helperPrompt": "This text contains a table comma separated, but instead of line breaks we are using a ';'."}
If a table is stored in JSON format within a column, you can directly transform it into a structured table. The JSON should consist of an array of objects, with each key in the objects mapped to a new column.
Copy
Ask AI
{ "type": "rows", "deriveFrom": "columnName from where the json content will be read"}
Use the Normalize Names step to consolidate the schema or column space. In this step, specify the columns you want to include in the final table that will be indexed into DataLinks.
Copy
Ask AI
{ "type": "normalize", "mode": "all-in-one", "targetCols": { "NameOfTheColumn": "Description for what this column is, which will enable the AI to automatically transform other names into the desired schema, you can have as many columns as you want" }, "helperPrompt": "This prompt will help the llm understand what the domain you are operating under is. So that can correctly interpret the existing column names and the target columns."}
The normalize step supports three different modes:
all-in-one: Uses a single prompt to normalize all columns at once
field-by-field: Normalizes each field individually, which can be more accurate for complex schemas
embeddings: Uses embeddings to match column names, which can be faster and more efficient for large datasets