Skip to content
Snippets Groups Projects

DOC: enhance query documentation

Merged Henrik tom Wörden requested to merge f-doc-query into dev
+ 150
66
# Example queries
## Simple FIND Query
The following query will return any entity which has the name _ename_ and all its children.
`FIND ename`
The following queries are equivalent and will return any entity which has the name _ename_ and all its children, but only if they are genuin records. Of course, the returned set of entities (henceforth referred to as _resultset_) can also be restricted to recordtypes, properties and files.
`FIND RECORD ename`
`FIND RECORDS ename`
Wildcards use `*` for any characters or none at all. Wildcards for single characters (like the '_' wildcard from mysql) are not implemented yet.
`FIND RECORD en*` returns any entity which has a name beginning with _en_.
Regular expressions must be surrounded by _<<_ and '>>':
`FIND RECORD <<e[aemn]{2,5}>>`
`FIND RECORD <<[cC]am_[0-9]*>>`
*TODO* (Timm):
Describe escape sequences like `\\`, `\*`, `\<<` and `\>>`.
Currently, wildcards and regular expressions are only available for the _simple-find-part_ of the query, i. e. no wildcards/regexps for filters.
## Simple COUNT Query
This query counts entities which have certain properties.
`COUNT ename`
will return the number of entities which have the name _ename_ and all their children.
The syntax of the COUNT queries is equivalent to the FIND queries in any respect (this also applies to wildcards and regular expressions) but one: The prefix is to be `COUNT` instead of `FIND`.
Unlike the FIND queries, the COUNT queries do not return any entities. The result of the query is the number of entities which _would be_ returned if the query was a FIND query.
## Filters
# Searching Data
In this chapter, the CaosDB Query Language (CQL) is presented as a means of
formulating search commands, commonly referred to as queries. It is highly
recommended that you experiment with the examples provided, such as those found
on https://demo.indiscale.com. An interactive tour is also available on this
public instance, which includes a comprehensive overview of the query language.
Therefore, it is suggested that you begin there and subsequently proceed with
this more detailed explanation.
## Introduction
Queries typically start with the keyword `FIND`, followed by a description of
what you want to find. For example, you can search for all musical instruments
with `FIND MusicalInstrument`.
*Note*, the CQL is case**in**sensitive. We will write keywords of CQL in all
caps to illustrate what parts are part of the language.
The most common way is to provide a RecordType name after `FIND` (as in the
example above). However, you can also use the name of some other entity:
`FIND 'My first guitar'`.
*Note*, that we put the name here in quotes. Spaces are used in CQL as separator
of words. Thus, if something contains quotes, like the name here, it needs to be
quoted.
While queries like the last one are great to get an impression of the data,
often we need to be more specific. Therefore, queries can include various
conditions to restrict the result set.
Example: `FIND MusicalAnalysis WITH quality_factor>0.5 AND date IN
2019`. The keyword `WITH` signifies that for each Record of the type
`MusicalAnalysis`, an assessment is made to determine whether it possesses a
Property labelled `quality_factor` that exceeds 0.5, as well as another
Property labelled `date` that may correspond to any day within the year 2019.
In order to make CQL easier to learn and to remember we designed it to be close
to natural spoken English language. For example, you can write
`FIND Guitar WHICH HAS A PROPERTY price`. Here, "HAS A PROPERTY" is what we call
syntactic sugar. It lets the query role off the tongue more easily than
`FIND Guitar WHICH price` but it is actually not needed and does not change
the meaning of the query. In fact, you could also write `FIND Guitar WITH
price`.
If you are only interested in the number of Entities that match your query, you
can replace `FIND` with `COUNT` and the query will only return the number of
Entities in the result set.
Sometimes the list of Records that you get using a `FIND` query is not what you
need; especially if you want to export a subset of the data for the analysis
with some external tool.
`SELECT` queries offer to represent the query result in a tabular form.
If you replace the `FIND` keyword of a query with `SELECT x, y, z FROM`, then
CaosDB will return the result as tabular data.
For example, instead of `FIND Guitar`, try out
`SELECT name, electric FROM Guitar`
As you can see, those queries are design to allow very specific data requests.
If you do not want/need to be as specific you can omit the first keyword (`FIND`
or `SELECT`) which creates a search for anything that has a text Property with
something like your expression. For example, the query "John" will search for
any Records that has a text property that contains this string.
With this, we conclude our introduction of CQL. You now know about the basic
elements. The following will cover the various aspects in more detail and you
will for example learn how you can use references among Records, or meta data
like the creation time of a Record to restrict the query result set.
## What am I searching for?
We already learned, that we can provide the name of a RecordType after the `FIND`
keyword. Let's call this part of the query "entity expression". In general, we
need to identify with the entity expression one or more entities via their name, CaosDB ID
or a pattern.
- `FIND Guitar`
- `FIND Guit*` ('*' represents none, one or more characters)
- `FIND <<[gG]ui.*>>` (a regular expression surrounded by _<<_ and '>>'. see below)
- `FIND 110`
The result set will contain Entities that are either identified by the entity expression
directly (i.e. they have the name or the given ID) or the have such an Entity as
parent.
As you know, CaosDB distincts among different Entity roles:
- Entity
- Record
- RecordType
- Property
- File
You can provide the role directly after the `FIND` keyword and before the
entity expression: `FIND RECORD Guitar`. The result set will then restricted
to Entities with that role.
## Conditions / Filters
### POV - Property-Operator-Value
@@ -59,7 +118,7 @@ _currently known operators:_ `=, !=, <=, <, >=, >` (and cf. next paragraphes!)
#### Special Operator: LIKE
The _LIKE_ can be used with wildcards. The `*` is a wildcard for any (possibly empty) sequence of characters. Examples:
The _LIKE_ can be used with wildcards. The `*` is a wildcard for any (possibly empty) sequence of characters. Examples:
`FIND RECORD ename WHICH HAS A pname1 LIKE va*`
@@ -71,7 +130,7 @@ _Note:_ The _LIKE_ operator is will only produce expectable results with text pr
#### Special Case: References
In general a reference can be addressed just like a POV filter. So
In general a reference can be addressed just like a POV filter. So
`FIND ename1.pname1=ename2`
@@ -90,8 +149,8 @@ The query looks like this:
_DateTime operators:_ `=, !=, <, >, IN, NOT IN`
##### `d1=d2`: Equivalence relation.
* ''True'' iff d1 and d2 are equal in every respect (same DateTime flavor, same fields are defined/undefined and all defined fields are equal respectively).
##### `d1=d2`: Equivalence relation.
* ''True'' iff d1 and d2 are equal in every respect (same DateTime flavor, same fields are defined/undefined and all defined fields are equal respectively).
* ''False'' iff they have the same DateTime flavor but have different fields defined or fields with differing values.
* ''Undefined'' otherwise.
@@ -101,11 +160,11 @@ Examples:
* `2015-04-03T00:00:00.0=2015-04-03T00:00:00.0` is true.
* `2015-04-03T00:00:00=2015-04-03T00:00:00` is true.
* `2015-04=2015-05` is false.
* `2015-04=2015-04` is true.
* `2015-04=2015-04` is true.
##### `d1!=d2`: Intransitive, symmetric relation.
* ''True'' iff `d1=d2` is false.
* ''False'' iff `d1=d2` is true.
##### `d1!=d2`: Intransitive, symmetric relation.
* ''True'' iff `d1=d2` is false.
* ''False'' iff `d1=d2` is true.
* ''Undefined'' otherwise.
Examples:
@@ -114,11 +173,11 @@ Examples:
* `2015-04-03T00:00:00.0!=2015-04-03T00:00:00.0` is false.
* `2015-04-03T00:00:00!=2015-04-03T00:00:00` is false.
* `2015-04!=2015-05` is true.
* `2015-04!=2015-04` is false.
* `2015-04!=2015-04` is false.
##### `d1>d2`: Transitive, non-symmetric relation.
Semantics depend on the flavors of d1 and d2. If both are...
###### [UTCDateTime](Datatype#datetime)
Semantics depend on the flavors of d1 and d2. If both are...
###### [UTCDateTime](Datatype#datetime)
* ''True'' iff the time of d1 is after the the time of d2 according to [https://en.wikipedia.org/wiki/Coordinated_Universal_Time](UTC)
* ''False'' otherwise.
@@ -139,7 +198,7 @@ Examples:
* `2014-01-01>2015-01-01T20:15:30` is false.
##### `d1<d2`: Transitive, non-symmetric relation.
Semantics depend on the flavors of d1 and d2. If both are...
Semantics depend on the flavors of d1 and d2. If both are...
###### [UTCDateTime](Datatype#datetime)
* ''True'' iff the time of d1 is before the the time of d2 according to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time)
* ''False'' otherwise.
@@ -161,7 +220,7 @@ Examples:
* `2015-01-01T20:15.00<2015-01-01T20:14` is false.
##### `d1 IN d2`: Transitive, non-symmetric relation.
Semantics depend on the flavors of d1 and d2. If both are...
Semantics depend on the flavors of d1 and d2. If both are...
###### [SemiCompleteDateTime](Datatype#datetime)
* ''True'' iff (`d1.ILB>d2.ILB` is true or `d1.ILB=d2.ILB` is true) and (`d1.EUB<d2.EUB` is true or `d1.EUB=d2.EUB` is true).
* ''False'' otherwise.
@@ -174,7 +233,7 @@ Examples:
* `2015-01-01 IN 2015-01-01T20:15:30` is false.
##### `d1 NOT IN d2`: Transitive, non-symmetric relation.
Semantics depend on the flavors of d1 and d2. If both are...
Semantics depend on the flavors of d1 and d2. If both are...
###### [SemiCompleteDateTime](Datatype#datetime)
* ''True'' iff (`d1.ILB IN d2.ILB` is false.
* ''False'' otherwise.
@@ -270,7 +329,7 @@ _STORED AT_ can be used with wildcards similar to unix wildcards.
* `**` matches any character or none at all.
* A leading `*` is short cut for `/**`
* A star directly between two other stars is ignored: `***` is the same as `**`.
* Escape character: `\` (E.g. `\\` is a literal backslash. `\*` is a literal star. But `\\*` is a literal backslash followed by a wildcard.)
* Escape character: `\` (E.g. `\\` is a literal backslash. `\*` is a literal star. But `\\*` is a literal backslash followed by a wildcard.)
Examples:
@@ -294,7 +353,7 @@ Find any file in a directory which begins with `2016-02`:
### Back References
The back reference filters for entities that are referenced by another entity. The following query returns entities of the type _ename1_ which are referenced by _ename2_ entities via the reference property _pname1_.
The back reference filters for entities that are referenced by another entity. The following query returns entities of the type _ename1_ which are referenced by _ename2_ entities via the reference property _pname1_.
* `FIND ename1 WHICH IS REFERENCED BY ename2 AS A pname1`
* `FIND ename1 WITH @ ename2 / pname1`
@@ -312,25 +371,46 @@ One may omit the property specification:
Any result set can be filtered by logically combining POV filters or back reference filters:
#### Conjunction (AND)
As we saw above, we can combine conditions:
`FIND MusicalAnalysis WHICH HAS quality_factor>0.5 AND date IN 2019`
In general, the conjunction takes the form
`FIND <eexpr> WHICH <filter1> AND <filter2>`. You can also chain more conditions
with `AND`. If you mix conjunctions with disjunctions, you need to add brackets
to define the priority. For example:
`FIND <eexpr> WHICH (<filter1> AND <filter2>) OR <filter3>`.
`FIND Guitar WHICH REFERENCES Manufacturer AND price`. Is a combination of
a reference filter and a POV filter. For readability, you can also write
`FIND Guitar WHICH REFERENCES Manufacturer AND WHICH HAS A price`. However,
the additional "WHICH HAS A" is purely cosmetic (syntactic sugar).
* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 AND A PROPERTY pname2=val2 AND A PROPERTY...`
* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 AND A pname2=val2 AND ...`
* `FIND ename1 . pname1=val1 & pname2=val2 & ...`
#### Disjunction (OR)
The rules for disjunctions are the same as for conjunctions. See above.
* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A PROPERTY pname2=val2 Or A PROPERTY...`
* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A pname2=val2 OR ...`
* `FIND ename1 . pname1=val1 | pname2=val2 | ...`
#### Negation (NOT)
* `FIND ename1 WHICH DOES NOT HAVE A PROPERTY pname1=val1`
* `FIND ename1 WHICH DOESN'T HAVE A pname1=val1`
* `FIND ename1 . NOT pname2=val2`
* `FIND ename1 . !pname2=val2`
#### ... and combinations with parentheses
You can negate any filter by prefixing the filter with 'NOT' or '!':
`FIND <eexpr> WHICH NOT <filter1>`.
There are many syntactic sugar alternatives which are treated the same as "NOT":
- `DOES NOT HAVE`
- `ISN'T`
- and many more
#### Parentheses
Basically, you can put parantheses around filter expressions and con- or
disjunctions.
- `FIND Guitar WHICH (REFERENCES Manufacturer AND WHICH HAS A price)`.
- `FIND Guitar WHICH (REFERENCES Manufacturer) AND (WHICH HAS A price)`.
For better readability, the above query can be written as:
- `FIND Guitar WHICH (REFERENCES Manufacturer AND HAS A price)`.
Note, that this query without syntactic sugar looks like:
- `FIND Guitar WHICH (REFERENCES Manufacturer AND price)`.
* `FIND ename1 WHICH HAS A pname1=val1 AND DOESN'T HAVE A pname2<val2 AND ((WHICH HAS A pname3=val3 AND A pname4=val4) OR DOES NOT HAVE A (pname5=val5 AND pname6=val6))`
* `FIND ename1 . pname1=val1 & !pname2<val2 & ((pname3=val3 & pname4=val4) | !(pname5=val5 & pname6=val6))`
@@ -352,4 +432,8 @@ Any result set can be filtered by logically combining POV filters or back refere
* *Sub Queries* (or *Sub Properties*): `FIND ename WHICH HAS A pname WHICH HAS A subpname=val`. This is like: `FIND AN experiment WHICH HAS A camera WHICH HAS A 'serial number'= 1234567890`
* *More Logic*, especially `ANY`, `ALL`, `NONE`, and `SUCH THAT` key words (and equivalents) for logical quantisation: `FIND ename1 SUCH THAT ALL ename2 WHICH HAVE A REFERENCE TO ename1 HAVE A pname=val`. This is like `FIND experiment SUCH THAT ALL person WHICH ARE REFERENCED BY THIS experiment AS conductor HAVE AN 'academic title'=professor.`
## Text matching
TODO: Describe escape sequences like `\\`, `\*`, `\<<` and `\>>`.
Loading