AI inference
Introduction
When uploading data, you can derive new columns from existing ones using our AI for automatic derivations. To do this, you’ll need to specify the desired columns and the steps you want to execute. We support several steps including Table, Rows, Normalize, Validate, ReverseGeo, and column inference.
An inference can consist of multiple steps, which our AI combines. The example above extracts a simple table, but an inference can use any of the table, rows, normalize, validate, and reverseGeo steps.
In this example, we request a table extracted from a specific column. The step extracts every column it can find in the provided text. Use the helperPrompt to guide the AI toward a more accurate extraction.
Available steps
Table extraction
The Table step extracts a table from any free-form text input, such as a restaurant review, a flight log report, or a financial instrument notice. When used on its own, this step may produce columns that are somewhat arbitrary and often require normalization afterward. It works by extracting data into new columns and appending them to the rows of the existing table.
Pro tip: Calling an extraction with just the table step is a great way to see what kind of structured data can be generated from your unstructured data.
Rows
If a table is stored in JSON format within a column, you can directly transform it into a structured table. The JSON should consist of an array of objects, with each key in the objects mapped to a new column.
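To make the expected input shape concrete, here is a minimal local sketch of the transformation in plain Python. It is not the DataLinks implementation: the function name `expand_json_rows`, the sample column names, and the choice to copy the parent row's other columns onto each new row are all assumptions made for illustration.

```python
import json


def expand_json_rows(rows, json_column):
    """Expand a JSON array of objects stored in `json_column`.

    Hypothetical helper: emits one output row per object and maps each
    object key to a new column. Keeping the parent row's other columns
    is an assumption, not documented behavior.
    """
    expanded = []
    for row in rows:
        # Columns of the parent row other than the JSON column itself.
        base = {k: v for k, v in row.items() if k != json_column}
        for obj in json.loads(row[json_column]):
            expanded.append({**base, **obj})
    return expanded


rows = [{"id": 1, "items": '[{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]'}]
table = expand_json_rows(rows, "items")
```

Each object in the array becomes one row, and the keys `sku` and `qty` become columns.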
Normalize names
Use the Normalize Names step to consolidate the schema or column space. In this step, specify the columns you want to include in the final table that will be indexed into DataLinks.
The normalize step supports three different modes:
- all-in-one: Uses a single prompt to normalize all columns at once
- field-by-field: Normalizes each field individually, which can be more accurate for complex schemas
- embeddings: Uses embeddings to match column names, which can be faster and more efficient for large datasets
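The idea behind the embeddings mode can be illustrated with a small sketch that maps raw column names onto a target schema by vector similarity. This is only an approximation of the concept: character-trigram counts stand in for real embeddings here, and `match_columns` and the sample column names are hypothetical, not part of the product.

```python
import math
from collections import Counter


def trigram_vector(name):
    """Toy stand-in for an embedding: counts of character trigrams."""
    s = "".join(ch.lower() for ch in name if ch.isalnum())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_columns(raw_columns, target_columns):
    """Map each raw column name to its most similar target column."""
    targets = {t: trigram_vector(t) for t in target_columns}
    mapping = {}
    for raw in raw_columns:
        vec = trigram_vector(raw)
        mapping[raw] = max(targets, key=lambda t: cosine(vec, targets[t]))
    return mapping


mapping = match_columns(["Restaurant Name", "Rating"], ["restaurant_name", "rating"])
```

A real embedding model would also match names that share meaning but not spelling, which this toy vector cannot do.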
Validate
The Validate step ensures data quality by validating the content of specified columns. It supports three validation modes:
- regex: Validates columns using regular expressions
- rows: Validates entire rows based on the specified columns
- fields: Validates individual fields in the specified columns
Validated rows will include a __valid field indicating whether the validation passed.
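As a rough illustration of the regex mode, the sketch below checks columns against patterns and attaches a `__valid` flag to each row. The function name `validate_regex`, the pattern, the sample data, and the rule that a row is valid only when every listed column matches in full are assumptions for the example, not the documented behavior.

```python
import re


def validate_regex(rows, patterns):
    """Attach a __valid flag to each row.

    `patterns` maps a column name to a regular expression. Requiring a
    full match on every listed column is an assumption of this sketch.
    """
    validated = []
    for row in rows:
        ok = all(
            re.fullmatch(pattern, str(row.get(column, ""))) is not None
            for column, pattern in patterns.items()
        )
        validated.append({**row, "__valid": ok})
    return validated


rows = [{"email": "ana@example.com"}, {"email": "not-an-email"}]
checked = validate_regex(rows, {"email": r"[^@\s]+@[^@\s]+\.[^@\s]+"})
```

Rows are kept either way; the flag lets you filter or review failures later.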
ReverseGeo
The ReverseGeo step adds geographical coordinates (latitude and longitude) based on location names in a specified column.
This step will add a new column named {locationColumnName}_latlong containing the coordinates.
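The naming convention can be sketched as follows. The lookup table `GAZETTEER`, the helper `add_latlong`, the tuple format of the coordinates, and the use of `None` for unknown locations are all hypothetical; the actual step resolves locations itself and may represent the coordinates differently.

```python
# Hypothetical lookup table standing in for a real geocoding service.
GAZETTEER = {
    "Lisbon": (38.7223, -9.1393),
    "Berlin": (52.52, 13.405),
}


def add_latlong(rows, location_column, lookup):
    """Add a `{location_column}_latlong` column to every row.

    Unknown locations get None (an assumption of this sketch).
    """
    out = []
    for row in rows:
        coords = lookup.get(row.get(location_column))
        out.append({**row, f"{location_column}_latlong": coords})
    return out


rows = [{"city": "Lisbon"}, {"city": "Atlantis"}]
located = add_latlong(rows, "city", GAZETTEER)
```

With a location column named `city`, the new column is `city_latlong`.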