diff --git a/doc/Query.md b/doc/Query.md index 1a875bf3008be0ef2b078ccae2b713d533acfaf4..ca1499672d9f66ec59731443b48f46dff54cce15 100644 --- a/doc/Query.md +++ b/doc/Query.md @@ -1,43 +1,102 @@ -# Example queries - -## Simple FIND Query -The following query will return any entity which has the name _ename_ and all its children. -`FIND ename` - -The following queries are equivalent and will return any entity which has the name _ename_ and all its children, but only if they are genuin records. Of course, the returned set of entities (henceforth referred to as _resultset_) can also be restricted to recordtypes, properties and files. - -`FIND RECORD ename` - -`FIND RECORDS ename` - -Wildcards use `*` for any characters or none at all. Wildcards for single characters (like the '_' wildcard from mysql) are not implemented yet. - -`FIND RECORD en*` returns any entity which has a name beginning with _en_. - -Regular expressions must be surrounded by _<<_ and '>>': - -`FIND RECORD <<e[aemn]{2,5}>>` - -`FIND RECORD <<[cC]am_[0-9]*>>` - -*TODO* (Timm): -Describe escape sequences like `\\`, `\*`, `\<<` and `\>>`. - -Currently, wildcards and regular expressions are only available for the _simple-find-part_ of the query, i. e. no wildcards/regexps for filters. - -## Simple COUNT Query - -This query counts entities which have certain properties. - -`COUNT ename` -will return the number of entities which have the name _ename_ and all their children. - -The syntax of the COUNT queries is equivalent to the FIND queries in any respect (this also applies to wildcards and regular expressions) but one: The prefix is to be `COUNT` instead of `FIND`. - -Unlike the FIND queries, the COUNT queries do not return any entities. The result of the query is the number of entities which _would be_ returned if the query was a FIND query. - -## Filters +# Searching Data + +In this chapter, the CaosDB Query Language (CQL) is presented as a means of +formulating search commands, commonly referred to as queries. It is highly +recommended that you experiment with the examples provided, such as those found +on https://demo.indiscale.com. An interactive tour is also available on this +public instance, which includes a comprehensive overview of the query language. +Therefore, it is suggested that you begin there and subsequently proceed with +this more detailed explanation. + +## Introduction + +Queries typically start with the keyword `FIND`, followed by a description of +what you want to find. For example, you can search for all musical instruments +with `FIND MusicalInstrument`. + +*Note*, the CQL is case**in**sensitive. We will write keywords of CQL in all +caps to illustrate what parts are part of the language. + +The most common way is to provide a RecordType name after `FIND` (as in the +example above). However, you can also use the name of some other entity: +`FIND 'My first guitar'`. + +*Note*, that we put the name here in quotes. Spaces are used in CQL as separator +of words. Thus, if something contains quotes, like the name here, it needs to be +quoted. + +While queries like the last one are great to get an impression of the data, +often we need to be more specific. Therefore, queries can include various +conditions to restrict the result set. + +Example: `FIND MusicalAnalysis WITH quality_factor>0.5 AND date IN +2019`. The keyword `WITH` signifies that for each Record of the type +`MusicalAnalysis`, an assessment is made to determine whether it possesses a +Property labelled `quality_factor` that exceeds 0.5, as well as another +Property labelled `date` that may correspond to any day within the year 2019. + +In order to make CQL easier to learn and to remember we designed it to be close +to natural spoken English language. For example, you can write +`FIND Guitar WHICH HAS A PROPERTY price`. Here, "HAS A PROPERTY" is what we call +syntactic sugar. It lets the query role off the tongue more easily than +`FIND Guitar WHICH price` but it is actually not needed and does not change +the meaning of the query. In fact, you could also write `FIND Guitar WITH +price`. + +If you are only interested in the number of Entities that match your query, you +can replace `FIND` with `COUNT` and the query will only return the number of +Entities in the result set. + +Sometimes the list of Records that you get using a `FIND` query is not what you +need; especially if you want to export a subset of the data for the analysis +with some external tool. +`SELECT` queries offer to represent the query result in a tabular form. + +If you replace the `FIND` keyword of a query with `SELECT x, y, z FROM`, then +CaosDB will return the result as tabular data. + +For example, instead of `FIND Guitar`, try out +`SELECT name, electric FROM Guitar` + +As you can see, those queries are design to allow very specific data requests. +If you do not want/need to be as specific you can omit the first keyword (`FIND` +or `SELECT`) which creates a search for anything that has a text Property with +something like your expression. For example, the query "John" will search for +any Records that has a text property that contains this string. + +With this, we conclude our introduction of CQL. You now know about the basic +elements. The following will cover the various aspects in more detail and you +will for example learn how you can use references among Records, or meta data +like the creation time of a Record to restrict the query result set. + +## What am I searching for? + +We already learned, that we can provide the name of a RecordType after the `FIND` +keyword. Let's call this part of the query "entity expression". In general, we +need to identify with the entity expression one or more entities via their name, CaosDB ID +or a pattern. +- `FIND Guitar` +- `FIND Guit*` ('*' represents none, one or more characters) +- `FIND <<[gG]ui.*>>` (a regular expression surrounded by _<<_ and '>>'. see below) +- `FIND 110` + +The result set will contain Entities that are either identified by the entity expression +directly (i.e. they have the name or the given ID) or the have such an Entity as +parent. + +As you know, CaosDB distincts among different Entity roles: +- Entity +- Record +- RecordType +- Property +- File + +You can provide the role directly after the `FIND` keyword and before the +entity expression: `FIND RECORD Guitar`. The result set will then restricted +to Entities with that role. + +## Conditions / Filters ### POV - Property-Operator-Value @@ -59,7 +118,7 @@ _currently known operators:_ `=, !=, <=, <, >=, >` (and cf. next paragraphes!) #### Special Operator: LIKE -The _LIKE_ can be used with wildcards. The `*` is a wildcard for any (possibly empty) sequence of characters. Examples: +The _LIKE_ can be used with wildcards. The `*` is a wildcard for any (possibly empty) sequence of characters. Examples: `FIND RECORD ename WHICH HAS A pname1 LIKE va*` @@ -71,7 +130,7 @@ _Note:_ The _LIKE_ operator is will only produce expectable results with text pr #### Special Case: References -In general a reference can be addressed just like a POV filter. So +In general a reference can be addressed just like a POV filter. So `FIND ename1.pname1=ename2` @@ -90,8 +149,8 @@ The query looks like this: _DateTime operators:_ `=, !=, <, >, IN, NOT IN` -##### `d1=d2`: Equivalence relation. -* ''True'' iff d1 and d2 are equal in every respect (same DateTime flavor, same fields are defined/undefined and all defined fields are equal respectively). +##### `d1=d2`: Equivalence relation. +* ''True'' iff d1 and d2 are equal in every respect (same DateTime flavor, same fields are defined/undefined and all defined fields are equal respectively). * ''False'' iff they have the same DateTime flavor but have different fields defined or fields with differing values. * ''Undefined'' otherwise. @@ -101,11 +160,11 @@ Examples: * `2015-04-03T00:00:00.0=2015-04-03T00:00:00.0` is true. * `2015-04-03T00:00:00=2015-04-03T00:00:00` is true. * `2015-04=2015-05` is false. -* `2015-04=2015-04` is true. +* `2015-04=2015-04` is true. -##### `d1!=d2`: Intransitive, symmetric relation. -* ''True'' iff `d1=d2` is false. -* ''False'' iff `d1=d2` is true. +##### `d1!=d2`: Intransitive, symmetric relation. +* ''True'' iff `d1=d2` is false. +* ''False'' iff `d1=d2` is true. * ''Undefined'' otherwise. Examples: @@ -114,11 +173,11 @@ Examples: * `2015-04-03T00:00:00.0!=2015-04-03T00:00:00.0` is false. * `2015-04-03T00:00:00!=2015-04-03T00:00:00` is false. * `2015-04!=2015-05` is true. -* `2015-04!=2015-04` is false. +* `2015-04!=2015-04` is false. ##### `d1>d2`: Transitive, non-symmetric relation. -Semantics depend on the flavors of d1 and d2. If both are... -###### [UTCDateTime](Datatype#datetime) +Semantics depend on the flavors of d1 and d2. If both are... +###### [UTCDateTime](Datatype#datetime) * ''True'' iff the time of d1 is after the the time of d2 according to [https://en.wikipedia.org/wiki/Coordinated_Universal_Time](UTC) * ''False'' otherwise. @@ -139,7 +198,7 @@ Examples: * `2014-01-01>2015-01-01T20:15:30` is false. ##### `d1<d2`: Transitive, non-symmetric relation. -Semantics depend on the flavors of d1 and d2. If both are... +Semantics depend on the flavors of d1 and d2. If both are... ###### [UTCDateTime](Datatype#datetime) * ''True'' iff the time of d1 is before the the time of d2 according to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) * ''False'' otherwise. @@ -161,7 +220,7 @@ Examples: * `2015-01-01T20:15.00<2015-01-01T20:14` is false. ##### `d1 IN d2`: Transitive, non-symmetric relation. -Semantics depend on the flavors of d1 and d2. If both are... +Semantics depend on the flavors of d1 and d2. If both are... ###### [SemiCompleteDateTime](Datatype#datetime) * ''True'' iff (`d1.ILB>d2.ILB` is true or `d1.ILB=d2.ILB` is true) and (`d1.EUB<d2.EUB` is true or `d1.EUB=d2.EUB` is true). * ''False'' otherwise. @@ -174,7 +233,7 @@ Examples: * `2015-01-01 IN 2015-01-01T20:15:30` is false. ##### `d1 NOT IN d2`: Transitive, non-symmetric relation. -Semantics depend on the flavors of d1 and d2. If both are... +Semantics depend on the flavors of d1 and d2. If both are... ###### [SemiCompleteDateTime](Datatype#datetime) * ''True'' iff (`d1.ILB IN d2.ILB` is false. * ''False'' otherwise. @@ -270,7 +329,7 @@ _STORED AT_ can be used with wildcards similar to unix wildcards. * `**` matches any character or none at all. * A leading `*` is short cut for `/**` * A star directly between two other stars is ignored: `***` is the same as `**`. - * Escape character: `\` (E.g. `\\` is a literal backslash. `\*` is a literal star. But `\\*` is a literal backslash followed by a wildcard.) + * Escape character: `\` (E.g. `\\` is a literal backslash. `\*` is a literal star. But `\\*` is a literal backslash followed by a wildcard.) Examples: @@ -294,7 +353,7 @@ Find any file in a directory which begins with `2016-02`: ### Back References -The back reference filters for entities that are referenced by another entity. The following query returns entities of the type _ename1_ which are referenced by _ename2_ entities via the reference property _pname1_. +The back reference filters for entities that are referenced by another entity. The following query returns entities of the type _ename1_ which are referenced by _ename2_ entities via the reference property _pname1_. * `FIND ename1 WHICH IS REFERENCED BY ename2 AS A pname1` * `FIND ename1 WITH @ ename2 / pname1` @@ -312,25 +371,46 @@ One may omit the property specification: Any result set can be filtered by logically combining POV filters or back reference filters: #### Conjunction (AND) +As we saw above, we can combine conditions: +`FIND MusicalAnalysis WHICH HAS quality_factor>0.5 AND date IN 2019` + +In general, the conjunction takes the form +`FIND <eexpr> WHICH <filter1> AND <filter2>`. You can also chain more conditions +with `AND`. If you mix conjunctions with disjunctions, you need to add brackets +to define the priority. For example: +`FIND <eexpr> WHICH (<filter1> AND <filter2>) OR <filter3>`. + +`FIND Guitar WHICH REFERENCES Manufacturer AND price`. Is a combination of +a reference filter and a POV filter. For readability, you can also write +`FIND Guitar WHICH REFERENCES Manufacturer AND WHICH HAS A price`. However, +the additional "WHICH HAS A" is purely cosmetic (syntactic sugar). -* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 AND A PROPERTY pname2=val2 AND A PROPERTY...` -* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 AND A pname2=val2 AND ...` -* `FIND ename1 . pname1=val1 & pname2=val2 & ...` #### Disjunction (OR) +The rules for disjunctions are the same as for conjunctions. See above. * `FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A PROPERTY pname2=val2 Or A PROPERTY...` * `FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A pname2=val2 OR ...` * `FIND ename1 . pname1=val1 | pname2=val2 | ...` #### Negation (NOT) - -* `FIND ename1 WHICH DOES NOT HAVE A PROPERTY pname1=val1` -* `FIND ename1 WHICH DOESN'T HAVE A pname1=val1` -* `FIND ename1 . NOT pname2=val2` -* `FIND ename1 . !pname2=val2` - -#### ... and combinations with parentheses +You can negate any filter by prefixing the filter with 'NOT' or '!': +`FIND <eexpr> WHICH NOT <filter1>`. +There are many syntactic sugar alternatives which are treated the same as "NOT": +- `DOES NOT HAVE` +- `ISN'T` +- and many more + +#### Parentheses +Basically, you can put parantheses around filter expressions and con- or +disjunctions. +- `FIND Guitar WHICH (REFERENCES Manufacturer AND WHICH HAS A price)`. +- `FIND Guitar WHICH (REFERENCES Manufacturer) AND (WHICH HAS A price)`. + +For better readability, the above query can be written as: +- `FIND Guitar WHICH (REFERENCES Manufacturer AND HAS A price)`. +Note, that this query without syntactic sugar looks like: +- `FIND Guitar WHICH (REFERENCES Manufacturer AND price)`. * `FIND ename1 WHICH HAS A pname1=val1 AND DOESN'T HAVE A pname2<val2 AND ((WHICH HAS A pname3=val3 AND A pname4=val4) OR DOES NOT HAVE A (pname5=val5 AND pname6=val6))` * `FIND ename1 . pname1=val1 & !pname2<val2 & ((pname3=val3 & pname4=val4) | !(pname5=val5 & pname6=val6))` @@ -352,4 +432,8 @@ Any result set can be filtered by logically combining POV filters or back refere * *Sub Queries* (or *Sub Properties*): `FIND ename WHICH HAS A pname WHICH HAS A subpname=val`. This is like: `FIND AN experiment WHICH HAS A camera WHICH HAS A 'serial number'= 1234567890` * *More Logic*, especially `ANY`, `ALL`, `NONE`, and `SUCH THAT` key words (and equivalents) for logical quantisation: `FIND ename1 SUCH THAT ALL ename2 WHICH HAVE A REFERENCE TO ename1 HAVE A pname=val`. This is like `FIND experiment SUCH THAT ALL person WHICH ARE REFERENCED BY THIS experiment AS conductor HAVE AN 'academic title'=professor.` - + + +## Text matching + +TODO: Describe escape sequences like `\\`, `\*`, `\<<` and `\>>`.