Query Language
Introduction
The DataLinks Query Language is a powerful domain-specific language designed to query and manipulate datasets within the DataLinks ecosystem. It provides a flexible and intuitive way to access data, apply filters, establish relationships between datasets, and sort results.
This language is particularly useful for:
- Retrieving data from specific datasets
- Filtering data based on various conditions
- Discovering relationships between different datasets
- Traversing complex data structures
- Sorting and limiting result sets
Basic Syntax
The query language follows a fluent interface pattern, where methods are chained together to build complex queries. A query starts from an entry point and continues with chained method calls.
Both Ontology and OntologyObject are valid entry points for creating a query.
Querying Datasets
Basic Dataset Query
To query a dataset, pass its name to the entry point, for example Ontology("movies").
Dataset names are case-insensitive, so Ontology("movies") and Ontology("Movies") refer to the same dataset.
Namespaced Datasets
Datasets can be organized in namespaces. To query a namespaced dataset, use the format:
For example:
If a dataset name is ambiguous (exists in multiple namespaces), you must specify the namespace to avoid errors.
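The name resolution rules above (case-insensitive names, optional namespaces, an error on ambiguity) can be modeled with a short Python sketch. This is not the DataLinks implementation: the catalog, the function resolve_dataset, and the exception AmbiguousDatasetError are hypothetical names introduced for illustration, and the exact-match comparison of namespaces is an assumption.

```python
class AmbiguousDatasetError(LookupError):
    """Raised when a bare name matches datasets in more than one namespace."""


def resolve_dataset(name, catalog, namespace=None):
    """Return the (namespace, dataset) pair that `name` refers to.

    `catalog` is a hypothetical list of (namespace, dataset) pairs.
    Dataset names are compared case-insensitively; namespaces are
    compared exactly (an assumption, not confirmed by the docs).
    """
    wanted = name.casefold()
    matches = [
        (ns, ds)
        for ns, ds in catalog
        if ds.casefold() == wanted and (namespace is None or ns == namespace)
    ]
    if not matches:
        raise LookupError(f"no dataset named {name!r}")
    if len(matches) > 1:
        namespaces = sorted(ns for ns, _ in matches)
        raise AmbiguousDatasetError(
            f"{name!r} exists in namespaces {namespaces}; specify one"
        )
    return matches[0]


# Hypothetical catalog: "Movies" exists in two namespaces, "Actors" in one.
catalog = [("film", "Movies"), ("film", "Actors"), ("archive", "Movies")]

unique = resolve_dataset("actors", catalog)
qualified = resolve_dataset("MOVIES", catalog, namespace="archive")
```

In this model, calling resolve_dataset("movies", catalog) without a namespace raises AmbiguousDatasetError, which mirrors the requirement to name the namespace when a dataset name is not unique.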
Filtering Data
Basic Filtering
To filter data, use the filter() method with a comparison expression.
Comparison Operators
The query language supports the following comparison operators:
- == (equal to)
- != (not equal to)
- < (less than)
- <= (less than or equal to)
- > (greater than)
- >= (greater than or equal to)
Examples:
Logical Operators
Combine multiple conditions using logical operators:
- && (AND)
- || (OR)
Examples:
Nested Conditions
Use parentheses to create complex nested conditions:
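To make the semantics of comparison operators, logical operators, and nesting concrete, here is a minimal Python model of how such a condition might be evaluated against a row. It is a sketch only and does not show DataLinks syntax: the tuple encoding of conditions, the function matches, and the sample rows are assumptions made for illustration.

```python
import operator

# The six comparison operators, mapped to their Python equivalents.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(condition, row):
    """Evaluate a nested condition tuple against one row (a dict).

    Leaf:   (field, op, value), e.g. ("year", "<", 2000)
    Branch: ("&&", left, right) or ("||", left, right)
    """
    head = condition[0]
    if head == "&&":
        return matches(condition[1], row) and matches(condition[2], row)
    if head == "||":
        return matches(condition[1], row) or matches(condition[2], row)
    field, op, value = condition
    return COMPARATORS[op](row[field], value)


# Hypothetical sample rows.
movies = [
    {"title": "Braveheart", "year": 1995, "genre": "Drama"},
    {"title": "The Matrix", "year": 1999, "genre": "Sci-Fi"},
    {"title": "Inception", "year": 2010, "genre": "Sci-Fi"},
]

# Models: (genre == "Sci-Fi" && year < 2000) || title == "Braveheart"
condition = (
    "||",
    ("&&", ("genre", "==", "Sci-Fi"), ("year", "<", 2000)),
    ("title", "==", "Braveheart"),
)

selected = [row["title"] for row in movies if matches(condition, row)]
print(selected)
```

The nesting of the tuples plays the role of the parentheses: the inner && group is evaluated as a unit before it is combined with the || branch.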
Filter Expression Syntax
The query language provides flexible syntax for filter expressions:
- Quoting Field Names: Field names can be unquoted, single-quoted, or double-quoted:
- Quoting Values: String values can be unquoted (for simple strings) or quoted:
- Multi-word Values: Multi-word string values must be quoted. Unquoted multi-word values (for example, title == The Matrix) are not supported.
Linking Datasets
Searching Related Datasets
To find related datasets, use the searchAround() and find() methods:
This searches for actors related to movies based on defined links between the datasets.
Specifying Search Depth
You can specify the depth of the search (how many hops to traverse):
A depth of 2 means it will look for relationships that are two links away.
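The idea of search depth can be illustrated with a breadth-first traversal over a graph of dataset links. The Python sketch below is a model, not the DataLinks implementation: the links map, the dataset names other than Movies and Actors, and the function search_around are hypothetical. It also assumes that a depth of N includes everything up to N hops away, and it reports the hop count so that datasets exactly N links away are easy to pick out.

```python
from collections import deque


def search_around(links, start, depth):
    """Return {dataset: hops} for datasets reachable from `start`.

    `links` is a hypothetical adjacency map of dataset links.
    Only datasets at most `depth` hops away are included.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == depth:
            continue  # do not traverse past the requested depth
        for neighbor in links.get(current, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    del hops[start]  # the starting dataset is not a "related" dataset
    return hops


# Hypothetical link graph between datasets.
links = {
    "Movies": ["Actors", "Studios"],
    "Actors": ["Movies", "Awards"],
    "Studios": ["Movies"],
    "Awards": ["Actors"],
}

one_hop = search_around(links, "Movies", 1)
two_hops = search_around(links, "Movies", 2)
```

With this graph, Awards only appears at depth 2, because it is linked to Actors rather than directly to Movies. This is the kind of result that goes missing when the depth is set too low.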
Following Specific Links
To follow specific columns when linking datasets:
This specifies that the link between Movies and Actor should be through the “director” column.
You can specify multiple columns to follow:
Sorting and Limiting Results
Sorting Results
Sort results in ascending or descending order:
Alternative syntax:
Multiple Sort Criteria
Sort by multiple fields:
This sorts actors first by nationality in descending order, then by age in ascending order.
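The ordering described here (nationality descending, then age ascending) can be reproduced in plain Python to check what a multi-criteria sort should return. The actor records are made-up sample data, and the two-pass approach is a Python idiom, not a description of how DataLinks sorts internally.

```python
# Hypothetical sample rows.
actors = [
    {"name": "Ana", "nationality": "Spanish", "age": 41},
    {"name": "Ben", "nationality": "British", "age": 58},
    {"name": "Cara", "nationality": "Spanish", "age": 33},
    {"name": "Dev", "nationality": "British", "age": 29},
]

# Python's sort is stable, so sort by the secondary key (age, ascending)
# first, then by the primary key (nationality, descending).
by_age = sorted(actors, key=lambda a: a["age"])
ordered = sorted(by_age, key=lambda a: a["nationality"], reverse=True)

names = [a["name"] for a in ordered]
print(names)
```

Within each nationality the actors remain in ascending age order, because the second pass never reorders rows that share the same nationality.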
Limiting Results
Limit the number of results returned:
You can also limit results at different stages of a complex query:
This limits the movies to 10 and the related actors to 5.
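Limiting at more than one stage can be pictured as a pipeline in which each stage is truncated before the next one runs. The Python sketch below assumes that each limit applies to the overall result of its stage, not per movie, which the text above does not confirm. The limit helper, the sample data, and the cast field are hypothetical.

```python
from itertools import islice


def limit(rows, n):
    """Return at most the first n rows of any iterable."""
    return list(islice(rows, n))


# Hypothetical data: 25 movies, each linked to two actors.
movies = [
    {"title": f"Movie {i}", "cast": [f"Actor {i}a", f"Actor {i}b"]}
    for i in range(25)
]

# Stage 1: keep the first 10 movies.
first_movies = limit(movies, 10)

# Stage 2: collect actors related to those movies, then keep the first 5.
related_actors = (actor for movie in first_movies for actor in movie["cast"])
first_actors = limit(related_actors, 5)
```

Because the first limit runs before the traversal, only actors linked to the 10 retained movies can appear in the final 5.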
Natural Language Query Generation
The system can generate queries from natural language questions. For example:
Natural language: “What is the age of the director of Braveheart?”
Generated query:
Examples
Basic Query Examples
- Find all movies:
- Find a specific movie:
- Find movies released before 2000:
Relationship Query Examples
- Find actors who directed movies:
- Find movies directed by actors under 60:
- Find actors who directed movies, then find other actors who worked with them:
Sorting and Limiting Examples
- Find the 3 oldest movies:
- Find the 5 highest budget movies:
Error Handling
If a query returns unexpected results:
- Check field names and table relationships for correctness.
- Ensure the depth in searchAround is sufficient for your query.
- Debug by breaking the query into smaller parts and testing them individually.