diff --git a/src/doc/CaosDB-Query-Language.md b/src/doc/CaosDB-Query-Language.md new file mode 100644 index 0000000000000000000000000000000000000000..a700736fcacc1228d27cbdf095a29fa70e42d704 --- /dev/null +++ b/src/doc/CaosDB-Query-Language.md @@ -0,0 +1,377 @@ +**Table of Contents** +[[_TOC_]] + +# Example queries + +## Simple FIND Query +The following query will return any entity which has the name _ename_ and all its children. +`FIND ename` + +The following queries are equivalent and will return any entity which has the name _ename_ and all its children, but only if they are genuin records. Of course, the returned set of entities (henceforth referred to as _resultset_) can also be restricted to recordtypes, properties and files. + +`FIND RECORD ename` + +`FIND RECORDS ename` + +Wildcards use `*` for any characters or none at all. Wildcards for single characters (like the '_' wildcard from mysql) are not implemented yet. + +`FIND RECORD en*` returns any entity which has a name beginning with _en_. + +Regular expressions must be surrounded by _<<_ and '>>': + +`FIND RECORD <<e[aemn]{2,5}>>` + +`FIND RECORD <<[cC]am_[0-9]*>>` + +*TODO* (Timm): +Describe escape sequences like `\\`, `\*`, `\<<` and `\>>`. + +Currently, wildcards and regular expressions are only available for the _simple-find-part_ of the query, i. e. no wildcards/regexps for filters. + +## Simple COUNT Query + +This query counts entities which have certain properties. + +`COUNT ename` +will return the number of entities which have the name _ename_ and all their children. + +The syntax of the COUNT queries is equivalent to the FIND queries in any respect (this also applies to wildcards and regular expressions) but one: The prefix is to be `COUNT` instead of `FIND`. + +Unlike the FIND queries, the COUNT queries do not return any entities. The result of the query is the number of entities which _would be_ returned if the query was a FIND query. + +## Filters + +### POV - Property-Operator-Value + +The following queries are equivalent and will restrict the result set to entities which have a property named _pname1_ that has a value _val1_. + +`FIND ename.pname1=val1` + +`FIND ename WITH pname1=val1` + +`FIND ename WHICH HAS A PROPERTY pname1=val1` + +`FIND ename WHICH HAS A pname1=val1` + +Again, the resultset can be restricted to records: + +`FIND RECORD ename WHICH HAS A pname1=val1` + +_currently known operators:_ `=, !=, <=, <, >=, >` (and cf. next paragraphes!) + +#### Special Operator: LIKE + +The _LIKE_ can be used with wildcards. The `*` is a wildcard for any (possibly empty) sequence of characters. Examples: + +`FIND RECORD ename WHICH HAS A pname1 LIKE va*` + +`FIND RECORD ename WHICH HAS A pname1 LIKE va*1` + +`FIND RECORD ename WHICH HAS A pname1 LIKE *al1` + +_Note:_ The _LIKE_ operator is will only produce expectable results with text properties. + +#### Special Case: References + +In general a reference can be addressed just like a POV filter. So + +`FIND ename1.pname1=ename2` + +will also return any entity named _ename1_ which references the entity with name or id _ename2_ via a reference property named _pname1_. However, it will also return any entity with a text property of that name with the string value _ename2_. In order to restrict the result set to reference properties one may make use of special reference operators: + +_reference operators:_ `->, REFERENCES, REFERENCE TO` + + +The query looks like this: + +`FIND ename1 WHICH HAS A pname1 REFERENCE TO ename2` + +`FIND ename1 WHICH HAS A pname1->ename2` + +#### Time Special Case: DateTime + +_DateTime operators:_ `=, !=, <, >, IN, NOT IN` + +##### `d1=d2`: Equivalence relation. +* ''True'' iff d1 and d2 are equal in every respect (same DateTime flavor, same fields are defined/undefined and all defined fields are equal respectively). +* ''False'' iff they have the same DateTime flavor but have different fields defined or fields with differing values. +* ''Undefined'' otherwise. + +Examples: +* `2015-04-03=2015-04-03T00:00:00` is undefined. +* `2015-04-03T00:00:00=2015-04-03T00:00:00.0` is undefined (second precision vs. nanosecond precision). +* `2015-04-03T00:00:00.0=2015-04-03T00:00:00.0` is true. +* `2015-04-03T00:00:00=2015-04-03T00:00:00` is true. +* `2015-04=2015-05` is false. +* `2015-04=2015-04` is true. + +##### `d1!=d2`: Intransitive, symmetric relation. +* ''True'' iff `d1=d2` is false. +* ''False'' iff `d1=d2` is true. +* ''Undefined'' otherwise. + +Examples: +* `2015-04-03!=2015-04-03T00:00:00` is undefined. +* `2015-04-03T00:00:00!=2015-04-03T00:00:00.0` is undefined. +* `2015-04-03T00:00:00.0!=2015-04-03T00:00:00.0` is false. +* `2015-04-03T00:00:00!=2015-04-03T00:00:00` is false. +* `2015-04!=2015-05` is true. +* `2015-04!=2015-04` is false. + +##### `d1>d2`: Transitive, non-symmetric relation. +Semantics depend on the flavors of d1 and d2. If both are... +###### [UTCDateTime](Datatype#datetime) +* ''True'' iff the time of d1 is after the the time of d2 according to [https://en.wikipedia.org/wiki/Coordinated_Universal_Time](UTC) +* ''False'' otherwise. + +###### [SemiCompleteDateTime](Datatype#datetime) +* ''True'' iff `d1.ILB>d2.EUB` is true or `d1.ILB=d2.EUB` is true. +* ''False'' iff `d1.EUB<d2.ILB}} is true or {{{d1.EUB=d2.ILB` is true. +* ''Undefined'' otherwise. + +Examples: +* `2015>2014` is true. +* `2015-04>2014` is true. +* `2015-01-01T20:15.00>2015-01-01T20:14` is true. +* `2015-04>2015` is undefined. +* `2015>2015-04` is undefined. +* `2015-01-01T20:15>2015-01-01T20:15:15` is undefined. +* `2014>2015` is false. +* `2014-04>2015` is false. +* `2014-01-01>2015-01-01T20:15:30` is false. + +##### `d1<d2`: Transitive, non-symmetric relation. +Semantics depend on the flavors of d1 and d2. If both are... +###### [UTCDateTime](Datatype#datetime) +* ''True'' iff the time of d1 is before the the time of d2 according to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) +* ''False'' otherwise. + +###### [SemiCompleteDateTime](Datatype#datetime) +* ''True'' iff `d1.EUB<d2.ILB` is true or `d1.EUB=d2.ILB` is true. +* ''False'' iff `d1.ILB>d2.EUB}} is true or {{{d1.ILB=d2.EUB` is true. +* ''Undefined'' otherwise. + +Examples: +* `2014<2015` is true. +* `2014-04<2015` is true. +* `2014-01-01<2015-01-01T20:15:30` is true. +* `2015-04<2015` is undefined. +* `2015<2015-04` is undefined. +* `2015-01-01T20:15<2015-01-01T20:15:15` is undefined. +* `2015<2014` is false. +* `2015-04<2014` is false. +* `2015-01-01T20:15.00<2015-01-01T20:14` is false. + +##### `d1 IN d2`: Transitive, non-symmetric relation. +Semantics depend on the flavors of d1 and d2. If both are... +###### [SemiCompleteDateTime](Datatype#datetime) +* ''True'' iff (`d1.ILB>d2.ILB` is true or `d1.ILB=d2.ILB` is true) and (`d1.EUB<d2.EUB` is true or `d1.EUB=d2.EUB` is true). +* ''False'' otherwise. + +Examples: +* `2015-01-01 IN 2015` is true. +* `2015-01-01T20:15:30 IN 2015-01-01` is true. +* `2015-01-01T20:15:30 IN 2015-01-01T20:15:30` is true. +* `2015 IN 2015-01-01` is false. +* `2015-01-01 IN 2015-01-01T20:15:30` is false. + +##### `d1 NOT IN d2`: Non-symmetric relation. +Semantics depend on the flavors of d1 and d2. If both are... +###### [SemiCompleteDateTime](Datatype#datetime) +* ''True'' iff `d1.ILB IN d2.ILB` is false. +* ''False'' otherwise. + +Examples: +* `2015 NOT IN 2015-01-01` is true. +* `2015-01-01 NOT IN 2015-01-01T20:15:30` is true. +* `2015-01-01 NOT IN 2015` is false. +* `2015-01-01T20:15:30 NOT IN 2015-01-01` is false. +* `2015-01-01T20:15:30 NOT IN 2015-01-01T20:15:30` is false. + +##### Note +These semantics follow a three-valued logic with ''true'', ''false'' and ''undefined'' as truth values. Only ''true'' is truth preserving. I.e. only those expressions which evaluate to ''true'' pass the POV filter. `FIND ... WHICH HAS A somedate=2015-01` only returns entities for which `somedate=2015-01` is true. On the other hand, `FIND ... WHICH DOESN'T HAVE A somedate=2015-01` returns entities for which `somedate=2015-01` is false or undefined. Shortly put, `NOT d1=d2` is not equivalent to `d1!=d2`. The latter assertion is stronger. + +#### Omitting the Property or the Value + +One doesn't have to specify the property or the value at all. The following query filters the result set for entities which have any property with a value greater than _val1_. + +`FIND ename WHICH HAS A PROPERTY > val1` + +`FIND ename . > val1` + +`FIND ename.>val1` + + +And for references... + +`FIND ename1 WHICH HAS A REFERENCE TO ename2` + +`FIND ename1 WHICH REFERENCES ename2` + +`FIND ename1 . -> ename2` + +`FIND ename1.->ename2` + + +The following query returns entities which have a _pname1_ property with any value. + +`FIND ename WHICH HAS A PROPERTY pname1` + +`FIND ename WHICH HAS A pname1` + +`FIND ename WITH pname1` + +`FIND ename . pname1` + +`FIND ename.pname1` + +### TransactionFilter + +*Definition* + + sugar:: `HAS BEEN` | `HAVE BEEN` | `HAD BEEN` | `WAS` | `IS` | + + negated_sugar:: `HAS NOT BEEN` | `HASN'T BEEN` | `WAS NOT` | `WASN'T` | `IS NOT` | `ISN'T` | `HAVN'T BEEN` | +`HAVE NOT BEEN` | `HADN'T BEEN` | `HAD NOT BEEN` + + by_clause:: `BY (ME | username | SOMEONE ELSE (BUT ME)? | SOMEONE ELSE BUT username)` + + date:: A date string of the form `YYYY-MM-DD` + + datetime:: A datetime string of the form `YYYY-MM-DD hh:mm:ss` + + time_clause:: `ON ($date|$datetime) ` Here is plenty of room for more syntactic sugar, e.g. a `TODAY` keyword, and more funcionality, e.g. ranges. + +`FIND ename WHICH ($sugar|$negated_sugar)? (NOT)? (CREATED|INSERTED|UPDATED|DELETED) (by_clause time_clause?| time_clause by_clause?)` + +*Examples* + +`FIND ename WHICH HAS BEEN CREATED BY ME ON 2014-12-24` + +`FIND ename WHICH HAS BEEN CREATED BY SOMEONE ELSE ON 2014-12-24` + +`FIND ename WHICH HAS BEEN CREATED BY erwin ON 2014-12-24` + +`FIND ename WHICH HAS BEEN CREATED BY SOMEONE ELSE BUT erwin ON 2014-12-24` + +`FIND ename WHICH HAS BEEN CREATED BY erwin` + +`FIND ename . CREATED BY erwin ON ` + + + +### File Location + +Search for file objects by their location: + +`FIND FILE WHICH IS STORED AT a/certain/path/` + +#### Wildcards + +_STORED AT_ can be used with wildcards similar to unix wildcards. + * `*` matches any characters or none at all, but not the directory separator `/` + * `**` matches any character or none at all. + * A leading `*` is a shortcut for `/**` + * Asterisks directly between two other asterisks are ignored: `***` is the same as `**`. + * Escape character: `\` (E.g. `\\` is a literal backslash. `\*` is a literal star. But `\\*` is a literal backslash followed by a wildcard.) + +Examples: + +Find any files ending with `.acq`: +`FIND FILE WHICH IS STORED AT *.acq` or +`FIND FILE WHICH IS STORED AT **.acq` or +`FIND FILE WHICH IS STORED AT /**.acq` + +Find files stored one directory below `/data/`, ending with `.acq`: +`FIND FILE WHICH IS STORED AT /data/*/*.acq` + +Find files stored in `/data/`, ending with `.acq`: +`FIND FILE WHICH IS STORED AT /data/*.acq` + +Find files stored in a directory at any depth in the tree below `/data/`, ending with `.acq`: +`FIND FILE WHICH IS STORED AT /data/**.acq` + +Find any file in a directory which begins with `2016-02`: +`FIND FILE WHICH IS STORED AT */2016-02*/*` + + +### Back References + +The back reference filters for entities that are referenced by another entity. The following query returns entities of the type _ename1_ which are referenced by _ename2_ entities via the reference property _pname1_. + +* `FIND ename1 WHICH IS REFERENCED BY ename2 AS A pname1` +* `FIND ename1 WITH @ ename2 / pname1` +* `FIND ename1 . @ ename2 / pname1` + +One may omit the property specification: + +* `FIND ename1 WHICH IS REFERENCED BY ename2` +* `FIND ename1 WHICH HAS A PROPERTY @ ename2` +* `FIND ename1 WITH @ ename2` +* `FIND ename1 . @ ename2` + +### Combining Filters with Propositional Logic + +Any result set can be filtered by logically combining POV filters or back reference filters: + +#### Conjunction (AND) + +* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 AND A PROPERTY pname2=val2 AND A PROPERTY...` +* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 AND A pname2=val2 AND ...` +* `FIND ename1 . pname1=val1 & pname2=val2 & ...` + +#### Disjunction (OR) + +* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A PROPERTY pname2=val2 Or A PROPERTY...` +* `FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A pname2=val2 OR ...` +* `FIND ename1 . pname1=val1 | pname2=val2 | ...` + +#### Negation (NOT) + +* `FIND ename1 WHICH DOES NOT HAVE A PROPERTY pname1=val1` +* `FIND ename1 WHICH DOESN'T HAVE A pname1=val1` +* `FIND ename1 . NOT pname2=val2` +* `FIND ename1 . !pname2=val2` + +#### ... and combinations with parentheses + +* `FIND ename1 WHICH HAS A pname1=val1 AND DOESN'T HAVE A pname2<val2 AND ((WHICH HAS A pname3=val3 AND A pname4=val4) OR DOES NOT HAVE A (pname5=val5 AND pname6=val6))` +* `FIND ename1 . pname1=val1 & !pname2<val2 & ((pname3=val3 & pname4=val4) | !(pname5=val5 & pname6=val6))` +* `FIND ename1.pname1=val1&!pname2<val2&((pname3=val3&pname4=val4)|!(pname5=val5&pname6=val6))` + +### A Few Important Expressions + +* A:: The indistinct article. This is only syntactic suger. Equivalent expressions: `A, AN` +* AND:: The logical _and_. Equivalent expressions: `AND, &` +* FIND:: The beginning of the query. +* NOT:: The logical negation. Equivalent expressions: `NOT, DOESN'T HAVE A PROPERTY, DOES NOT HAVE A PROPERTY, DOESN'T HAVE A, DOES NOT HAVE A, DOES NOT, DOESN'T, IS NOT, ISN'T, !` +* OR:: The logical _or_. Equivalent expressions: `OR, |` +* RECORD,RECORDTYPE,FILE,PROPERTY:: Role expression for restricting the result set to a specific role. +* WHICH:: The marker for the beginning of the filters. Equivalent expressions: `WHICH, WHICH HAS A, WHICH HAS A PROPERTY, WHERE, WITH, .` +* REFERENCE:: This one is tricky: `REFERENCE TO` expresses a the state of _having_ a reference property. `REFERENCED BY` expresses the state of _being_ referenced by another entity. +* COUNT:: `COUNT` works like `FIND` but doesn't return the entities. + +## Versioning + +Since Caosdb 0.2 entities are optionally version controlled. The query language will be extended to include versioning in the future. A current minimal implementation introduces the `ANY VERSION OF` modifier which can be used to return all matching versions in the results of `COUNT`, `FIND`, and `SELECT` queries. + +### Example + +* `FIND ANY VERSION OF RECORD WITH pname=value` returns the all past and present versions of records with `pname = value`. + +### Scope and current limitations + +* The `ANY VERSION OF` modifier currently the only expression for taking the versioning into account when using the query language. +* Subproperties are not supported yet, e.g. `FIND ANY VERSION OF ENTITY WHICH IS REFERENCED BY ename WITH ...`. This applies to all cases where you specify properties of *referenced* entities or *referencing* entities. + +### Future + +* Add `(LATEST|LAST|OLDEST|NEWEST|FIRST) VERSION OF` modifiers. +* Add `(ANY|LATEST|LAST|OLDEST|NEWEST|FIRST) VERSION (BEFORE|AFTER) (<timestamp>|<transaction id>|<entity@version>) OF` modifier. +* Add support for subproperties, e.g. `FIND ANY VERSION OF ENTITY WHICH IS REFERENCED BY ename WITH ...`. + +# Future + + * *Sub Queries* (or *Sub Properties*): `FIND ename WHICH HAS A pname WHICH HAS A subpname=val`. This is like: `FIND AN experiment WHICH HAS A camera WHICH HAS A 'serial number'= 1234567890` + * *More Logic*, especially `ANY`, `ALL`, `NONE`, and `SUCH THAT` key words (and equivalents) for logical quantisation: `FIND ename1 SUCH THAT ALL ename2 WHICH HAVE A REFERENCE TO ename1 HAVE A pname=val`. This is like `FIND experiment SUCH THAT ALL person WHICH ARE REFERENCED BY THIS experiment AS conductor HAVE AN 'academic title'=professor.` + diff --git a/src/doc/index.rst b/src/doc/index.rst index 4fb19aca7e45a39bbefcd759bb1e7715cb0f2d77..ae9541b7eb05cd17cdb3b71111a74679cf765eba 100644 --- a/src/doc/index.rst +++ b/src/doc/index.rst @@ -11,6 +11,7 @@ Welcome to caosdb-server's documentation! Getting started <README_SETUP> Concepts <concepts> tutorials + Query Language <CaosDB-Query-Language> administration/* development/* API documentation<_apidoc/packages>