Merge branch 'dev' into f-convert-xlsx-to-json-next

c8ea9f90 · Daniel Hornung · 49c1e0d4 · b49aa866
Verified Commit c8ea9f90 authored 1 year ago by Daniel Hornung
--- a/README_SETUP.md
+++ b/README_SETUP.md
@@ -64,6 +64,7 @@ Build documentation in `build/` with `make doc`.
 - `sphinx`
 - `sphinx-autoapi`
+- `sphinx-rtd-theme`
 - `recommonmark >= 0.6.0`
 ### How to contribute ###

--- a/src/caosadvancedtools/table_json_conversion/convert.py
+++ b/src/caosadvancedtools/table_json_conversion/convert.py
@@ -103,8 +103,6 @@ Look at ``xlsx_utils.get_path_position`` for the specification of the "proper na
        data_column_paths = {col.index: col.path for col in data_columns.values()}
        # Parent path, insert in correct order.
        parent, proper_name = xlsx_utils.get_path_position(sheet)
-        # print(parent, proper_name, sheet.title)
-        # breakpoint()
        if parent:
            parent_sheetname = xlsx_utils.get_worksheet_for_path(parent, self._defining_path_index)
            if parent_sheetname not in self._handled_sheets:
@@ -148,7 +146,6 @@ Look at ``xlsx_utils.get_path_position`` for the specification of the "proper na
                        value = self._validate_and_convert(value, path)
                        _set_in_nested(mydict=data, path=path, value=value, prefix=parent, skip=1)
                    continue
-                continue
            # Find current position in tree
            parent_dict = self._get_parent_dict(parent_path=parent, foreign=foreign)
@@ -157,11 +154,7 @@ Look at ``xlsx_utils.get_path_position`` for the specification of the "proper na
            if proper_name not in parent_dict:
                parent_dict[proper_name] = []
            parent_dict[proper_name].append(data)
-        # breakpoint()
-        # if sheet.title == "Training.Organisation":
-        #     breakpoint()
        self._handled_sheets.add(sheet.title)
-        # print(f"Added sheet: {sheet.title}")
    def _is_multiple_choice(self, path: list[str]) -> bool:
        """Test if the path belongs to a multiple choice section."""
@@ -309,7 +302,7 @@ mydict: dict
 path: list
  A list of keys, denoting the location of the value.
 value
-  The value inside the dict.
+  The value which shall be set inside the dict.
 prefix: list
  A list of keys which shall be removed from ``path``.  A KeyError is raised if ``path`` does not
  start with the elements of ``prefix``.
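The ``prefix`` handling described in this docstring can be shown with a minimal sketch. `set_in_nested_sketch` is a hypothetical stand-in, not the library's `_set_in_nested`. It assumes plain nested dicts, ignores the `skip` argument used in the call above, and makes no claim about how the real function treats values that already exist.

```python
from typing import Any, Optional


def set_in_nested_sketch(mydict: dict, path: list, value: Any,
                         prefix: Optional[list] = None) -> dict:
    """Hypothetical sketch: set ``value`` at ``path`` inside ``mydict``.

    The elements of ``prefix`` are removed from the start of ``path`` first.
    A KeyError is raised if ``path`` does not start with them.
    """
    prefix = prefix or []
    if path[:len(prefix)] != prefix:
        raise KeyError(f"Path {path} does not start with prefix {prefix}.")
    keys = path[len(prefix):]
    if not keys:
        raise ValueError("Nothing left of the path after removing the prefix.")
    current = mydict
    # Walk down the tree, creating intermediate dicts where necessary.
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value
    return mydict


data: dict = {}
set_in_nested_sketch(data, ["Training", "supervisor", "email"],
                     "steve@example.com", prefix=["Training"])
print(data)  # {'supervisor': {'email': 'steve@example.com'}}
```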

--- a/src/doc/table-json-conversion/specs.md
+++ b/src/doc/table-json-conversion/specs.md
-# Conversion between LinkAhead data models, JSON schema, and XLSX (and vice versa) #
-This file describes the conversion between JSON schema files and XLSX templates, and between JSON
-data files following a given schema and XLSX files with data.  This conversion is handled by the
-Python modules in the `table_json_conversion` library.
-Requirements: When converting from a json schema, the top level of the json schema must be a
-dict. The keys of the dict are RecordType names.
-## Data models in JSON Schema and JSON data ##
-The data model in LinkAhead defines the types of records present in a LinkAhead instance and their
-structure. This data model can also be represented in a JSON Schema, which defines the structure of
-JSON files containing records pertaining to the data model.
-For example, the following JSON can describe a single "Person" Record:
-```JSON
-{
-    "Person": [
-        {
-            "family_name": "Steve",
-            "given_name": "Stevie"
-        }
-    ]
-}
-```
-A *JSON Schema* specifies a concrete structure, and the associated JSON files can be used to
-represent data for specific record structures. For instance, one could create a JSON Schema allowing
-the storage of "Training" Records containing information about conducted trainings. This is
-particularly valuable for data import and export. One could generate web forms from the JSON Schema
-or use it to export objects stored in LinkAhead as JSON.
-### Note: Data models and data arrays ###
-The schema as created by ``json_schema_exporter.recordtype_to_json_schema(...)`` is, from a broad
-view, a dict with all the top level recordtypes (the recordtype names are the keys).  While this is
-appropriate for the generation of user input forms, data often consists of multiple entries of the
-same type.  XLSX files are no exception: users expect that they may enter multiple rows of data.
-Since the data model schema does not accept multiple data sets, there is a utility function which
-creates a *data array* schema out of the *data model* schema: It basically replaces the top-level
-entries of the data model with lists which may contain data.
-A **short example** illustrates this well.  Consider a *data model* schema which fits this data
-content:
-```JSON
-{
-  "Person": {
-    "name": "Charly"
-  }
-}
-```
-Now the automatically generated *data array* schema would accept the following data:
-```JSON
-{
-  "Person": [
-    {
-      "name": "Charly"
-    },
-    {
-      "name": "Sam"
-    }
-  ]
-}
-```
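A minimal sketch of such a conversion, assuming that the *data model* schema is an object schema whose `properties` are keyed by RecordType names. The function name `to_data_array_schema` is hypothetical; the utility function mentioned above may differ in name and detail.

```python
import copy


def to_data_array_schema(model_schema: dict) -> dict:
    """Hypothetical sketch: turn a *data model* schema into a *data array* schema.

    Each top-level RecordType entry is wrapped in an array, so that the data
    may contain any number of records of that type.
    """
    array_schema = copy.deepcopy(model_schema)  # leave the input untouched
    array_schema["properties"] = {
        name: {"type": "array", "items": record_schema}
        for name, record_schema in array_schema.get("properties", {}).items()
    }
    return array_schema


model_schema = {
    "type": "object",
    "properties": {
        "Person": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        }
    },
}
array_schema = to_data_array_schema(model_schema)
print(array_schema["properties"]["Person"]["type"])  # array
```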
-## From JSON to XLSX: Data Representation ##
-The following describes how JSON files representing LinkAhead records are converted into XLSX files,
-or how JSON files with records are created from XLSX files.
-The attribute name (e.g., "Person" above) determines the RecordType, and the value of this attribute
-can either be an object or a list. If it is an object (as in the example above), a single record is
-represented. In the case of a list, multiple records sharing the same RecordType as the parent are
-represented.
-The *Properties* of the record (e.g., `family_name` and `given_name` above) become *columns* in the
-XLSX file. These properties have an attribute name and a value. The value can be:
-a. A primitive (text, number, boolean, ...)
-b. A record
-c. A list of primitive types
-d. A list of unique enums (multiple choice)
-e. A list of records
-In cases *a.* and *c.*, a cell is created in the column corresponding to the property in the XLSX
-file.  In case *b.*, columns are created for the Properties of the record, where for each of the
-Properties the cases *a.* - *e.* are considered recursively.  Case *d.* leads to a number of
-columns, one for each of the possible choices.
-For case *e.*, however, the two-dimensional structure of an XLSX sheet is not sufficient. Therefore,
-for such cases, *new* XLSX sheets/tables are created.
-In these sheets/tables, the referenced records are treated as described above (new columns for the
-Properties).  However, there are now additional columns that indicate from which "external" record
-these records are referenced.
-Let's now consider these five cases in detail and with examples:
-### a. Properties with primitive data types ###
-```JSON
-{
-    "Training": [
-      {
-        "date": "2023-01-01",
-        "url": "www.indiscale.com",
-        "duration": 1.0,
-        "participants": 1,
-        "remote": false
-      },
-      {
-        "date": "2023-06-15",
-        "url": "www.indiscale.com/next",
-        "duration": 2.5,
-        "participants": null,
-        "remote": true
-      }
-    ]
-}
-```
-This entry will be represented in an XLSX sheet with the following content:
-| date       | url                    | duration | participants | remote |
-|------------|------------------------|----------|--------------|--------|
-| 2023-01-01 | www.indiscale.com      | 1.0      | 1            | false  |
-| 2023-06-15 | www.indiscale.com/next | 2.5      |              | true   |
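The row layout of case *a.* can be sketched in a few lines. `records_to_rows` is a hypothetical helper that derives one column per property and one row per record, writing empty cells for missing or `null` values. It only produces plain Python lists and does not write an actual XLSX file.

```python
from typing import Any


def records_to_rows(records: list[dict[str, Any]]) -> tuple[list[str], list[list[Any]]]:
    """Sketch for case a.: one column per property, one row per record.

    Missing properties and ``null``/``None`` values become empty cells.
    """
    header: list[str] = []
    for record in records:
        for key in record:
            if key not in header:  # keep the order of first appearance
                header.append(key)
    rows = [["" if record.get(key) is None else record[key] for key in header]
            for record in records]
    return header, rows


trainings = [
    {"date": "2023-01-01", "url": "www.indiscale.com", "duration": 1.0,
     "participants": 1, "remote": False},
    {"date": "2023-06-15", "url": "www.indiscale.com/next", "duration": 2.5,
     "participants": None, "remote": True},
]
header, rows = records_to_rows(trainings)
print(header)  # ['date', 'url', 'duration', 'participants', 'remote']
```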
-### b. Property referencing a record ###
-```JSON
-{
-    "Training": [
-      {
-        "date": "2023-01-01",
-        "supervisor": {
-            "family_name": "Stevenson",
-            "given_name": "Stevie"
-        }
-      }
-    ]
-}
-```
-This entry will be represented in an XLSX sheet with the following content:
-| date       | `supervisor.family_name` | `supervisor.given_name` |
-|------------|--------------------------|-------------------------|
-| 2023-01-01 | Stevenson                | Stevie                  |
-Note that column names may be renamed. The mapping of columns to properties of records is ensured
-through the content of hidden rows.  (See below for the definition of hidden rows.)
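Case *b.* amounts to flattening nested records into dotted column names. The following sketch (hypothetical `flatten_record`) produces the visible column headers of the table above. As noted, the converter itself relies on the hidden rows rather than on these names.

```python
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Sketch: map a record with nested sub-records to dotted column names."""
    columns: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Case b.: the referenced record contributes one column per property.
            columns.update(flatten_record(value, prefix=f"{name}."))
        else:
            columns[name] = value
    return columns


training = {
    "date": "2023-01-01",
    "supervisor": {"family_name": "Stevenson", "given_name": "Stevie"},
}
print(flatten_record(training))
# {'date': '2023-01-01', 'supervisor.family_name': 'Stevenson', 'supervisor.given_name': 'Stevie'}
```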
-### c. Properties containing lists of primitive data types ###
-```JSON
-{
-    "Training": [
-      {
-        "url": "www.indiscale.com",
-        "subjects": ["Math", "Physics"]
-      }
-    ]
-}
-```
-This entry would be represented in an XLSX sheet with the following content:
-| url               | subjects     |
-|-------------------|--------------|
-| www.indiscale.com | Math;Physics |
-The list elements are written into the cell separated by `;` (semicolon). If the elements contain
-the separator `;`, it is escaped with `\\`.
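A minimal sketch of writing and reading such list cells, assuming that escaping means putting a single backslash in front of each `;` inside an element. `join_list_cell` and `split_list_cell` are hypothetical names. Backslashes that are not followed by `;` are left untouched, so under this assumption an element ending in a backslash cannot be represented unambiguously.

```python
def join_list_cell(items: list[str], sep: str = ";") -> str:
    """Sketch: write list elements into one cell, escaping the separator."""
    return sep.join(item.replace(sep, "\\" + sep) for item in items)


def split_list_cell(cell: str, sep: str = ";") -> list[str]:
    """Sketch: split a cell at unescaped separators (``sep`` is one character)."""
    if cell == "":
        return []
    items: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(cell):
        if cell[i] == "\\" and cell[i + 1:i + 2] == sep:
            current.append(sep)  # escaped separator: part of the element
            i += 2
        elif cell[i] == sep:
            items.append("".join(current))  # unescaped separator: element ends
            current = []
            i += 1
        else:
            current.append(cell[i])
            i += 1
    items.append("".join(current))
    return items


print(join_list_cell(["Math", "Physics"]))  # Math;Physics
print(split_list_cell(join_list_cell(["A;B", "C"])))  # ['A;B', 'C']
```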
-### d. Multiple choice properties ###
-```JSON
-{
-    "Training": [
-      {
-        "date": "2024-04-17",
-        "skills": [
-              "Planning",
-              "Evaluation"
-        ]
-      }
-    ]
-}
-```
-If the `skills` list is denoted as an `enum` array with `"uniqueItems": true` in the json schema,
-this entry would be represented like this in an XLSX:
-| date       | skills.Planning | skills.Communication | skills.Evaluation |
-|------------|-----------------|----------------------|-------------------|
-| 2024-04-17 | x               |                      | x                 |
-Note that this example assumes that the list of possible choices, as given in the json schema, was
-"Planning, Communication, Evaluation".
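The mapping from the `skills` list to checkbox columns can be sketched as follows. The helper `multiple_choice_columns` is hypothetical; it assumes that the enum choices are known from the schema and marks selected items with "x".

```python
def multiple_choice_columns(prop: str, choices: list[str],
                            selected: list[str]) -> dict[str, str]:
    """Sketch: one column per possible choice; "x" marks selected items."""
    unknown = set(selected) - set(choices)
    if unknown:
        raise ValueError(f"Values not among the enum choices: {sorted(unknown)}")
    return {f"{prop}.{choice}": ("x" if choice in selected else "")
            for choice in choices}


columns = multiple_choice_columns(
    "skills", ["Planning", "Communication", "Evaluation"], ["Planning", "Evaluation"])
print(columns)
# {'skills.Planning': 'x', 'skills.Communication': '', 'skills.Evaluation': 'x'}
```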
-### e. Properties containing lists with references ###
-```JSON
-{
-    "Training": [
-      {
-        "date": "2023-01-01",
-        "coach": [
-            {
-              "family_name": "Sky",
-              "given_name": "Max"
-            },
-            {
-              "family_name": "Sky",
-              "given_name": "Min"
-            }
-        ]
-      }
-    ]
-}
-```
-Since the two coaches cannot be represented properly in a single cell, another worksheet is needed
-to contain the properties of the coaches.
-The sheet for the Trainings in this example only contains the "date" column:
-| date       |
-|------------|
-| 2023-01-01 |
-Additionally, there is *another* sheet where the coaches are stored. Here, it is crucial to define
-how the correct element is chosen from potentially multiple "Trainings". In this case, it means that
-the "date" must be unique.
-Note: This uniqueness requirement is not strictly checked right now; it is your responsibility as a
-user to ensure that such "foreign properties" are truly unique.
-The second sheet looks like this:
-| date       | `coach.family_name` | `coach.given_name` |
-|------------|---------------------|--------------------|
-| 2023-01-01 | Sky                 | Max                |
-| 2023-01-01 | Sky                 | Min                |
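The split into two sheets can be sketched as follows, assuming that the caller names the properties (here only `date`) that identify the parent record. `split_reference_list` is a hypothetical helper that returns the rows of both sheets as dicts keyed by column name.

```python
from typing import Any


def split_reference_list(
        records: list[dict[str, Any]], list_prop: str, foreign_keys: list[str]
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Hypothetical sketch: move a list of referenced records to a second sheet.

    Each row of the second sheet repeats the ``foreign_keys`` values of its
    parent record, so that the rows can later be matched to that record.
    """
    main_rows: list[dict[str, Any]] = []
    child_rows: list[dict[str, Any]] = []
    for record in records:
        # The parent sheet keeps everything except the list of references.
        main_rows.append({k: v for k, v in record.items() if k != list_prop})
        for child in record.get(list_prop, []):
            row = {key: record[key] for key in foreign_keys}
            row.update({f"{list_prop}.{k}": v for k, v in child.items()})
            child_rows.append(row)
    return main_rows, child_rows


trainings = [{
    "date": "2023-01-01",
    "coach": [
        {"family_name": "Sky", "given_name": "Max"},
        {"family_name": "Sky", "given_name": "Min"},
    ],
}]
training_sheet, coach_sheet = split_reference_list(trainings, "coach", ["date"])
print(training_sheet)  # [{'date': '2023-01-01'}]
```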
-## Data in XLSX: Hidden automation logic ##
-### First column: Marker for row types ###
-The first column in each sheet will be hidden and it will contain an entry in each row that needs
-special treatment.  The following values are used:
- ``IGNORE``: This row is ignored.  It can be used for explanatory texts or layout.
- ``COL_TYPE``: Typically the first row that is not `IGNORE`.  It indicates the row that defines the
-  type of columns (`FOREIGN`, `SCALAR`, `LIST`, `MULTIPLE_CHOICE`, `IGNORE`).  This row must occur
-  exactly once per sheet.
- ``PATH``: Indicates that the row is used to define the path within the JSON.  These rows are
-  typically hidden for users.
-An example table could look like this:
-| `IGNORE`   |                                     | Welcome        | to this       | file!        |                    |
-| `IGNORE`   |                                     | Please         | enter your    | data here:   |                    |
-| `COL_TYPE` | `IGNORE`                            | `SCALAR`       | `SCALAR`      | `LIST`       | `SCALAR`           |
-| `PATH`     |                                     | `Training`     | `Training`    | `Training`   | `Training`         |
-| `PATH`     |                                     | `url`          | `date`        | `subjects`   | `supervisor`       |
-| `PATH`     |                                     |                |               |              | `email`            |
-| `IGNORE`   | Please enter one training per line. | Training URL   | Training date | Subjects     | Supervisor's email |
-|------------|-------------------------------------|----------------|---------------|--------------|--------------------|
-|            |                                     | example.com/mp | 2024-02-27    | Math;Physics | steve@example.com  |
-|            |                                     | example.com/m  | 2024-02-27    | Math         | stella@example.com |
-### Parsing XLSX data ###
-To extract the value of a given cell, we traverse all path elements (in ``PATH`` rows) from top to
-bottom. The final element of the path is the name of the Property to which the value belongs.  In
-the example above, `steve@example.com` is the value of the `email` Property in the path
-`["Training", "supervisor", "email"]`.
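The marker logic and the top-to-bottom path traversal can be sketched together. `parse_sheet_rows` is a hypothetical function that takes the sheet as a list of rows with the marker column first (reading the actual XLSX file is not shown). It returns the column types, one JSON path per column, and the remaining data rows, all without the marker column.

```python
from typing import Any


def parse_sheet_rows(
        rows: list[list[Any]]
) -> tuple[list[str], list[list[str]], list[list[Any]]]:
    """Hypothetical sketch: evaluate the hidden marker column of a sheet."""
    n_cols = max(len(row) for row in rows) - 1  # without the marker column
    col_types: list[str] = []
    paths: list[list[str]] = [[] for _ in range(n_cols)]
    data_rows: list[list[Any]] = []
    for row in rows:
        padded = list(row) + [None] * (n_cols + 1 - len(row))
        marker, cells = padded[0], padded[1:]
        if marker == "IGNORE":
            continue  # explanatory text or layout
        if marker == "COL_TYPE":
            if col_types:
                raise ValueError("COL_TYPE must occur exactly once per sheet.")
            col_types = [cell or "" for cell in cells]
        elif marker == "PATH":
            # PATH rows are read top to bottom; empty cells add nothing.
            for path, cell in zip(paths, cells):
                if cell:
                    path.append(cell)
        elif marker in (None, ""):
            data_rows.append(cells)
        else:
            raise ValueError(f"Unknown row marker: {marker!r}")
    if not col_types:
        raise ValueError("No COL_TYPE row found.")
    return col_types, paths, data_rows


rows = [
    ["IGNORE", None, "Welcome", "to this", "file!", None],
    ["COL_TYPE", "IGNORE", "SCALAR", "SCALAR", "LIST", "SCALAR"],
    ["PATH", None, "Training", "Training", "Training", "Training"],
    ["PATH", None, "url", "date", "subjects", "supervisor"],
    ["PATH", None, None, None, None, "email"],
    [None, None, "example.com/mp", "2024-02-27", "Math;Physics", "steve@example.com"],
]
col_types, paths, data_rows = parse_sheet_rows(rows)
print(paths[4], data_rows[0][4])  # ['Training', 'supervisor', 'email'] steve@example.com
```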
-The path elements are sufficient to identify the object within a JSON, at least if the corresponding
-JSON element is a single object. If the JSON element is an array, the appropriate object within the
-array needs to be selected.
-For this selection additional ``FOREIGN`` columns are used. The paths in these columns must all have
-the same *base* and one additional *unique key* component.  For example, two `FOREIGN` columns could
-be `["Training", "date"]` and `["Training", "url"]`, where `["Training"]` is the *base path* and
-`"date"` and `"url"` are the *unique keys*.
-The base path defines the table (or recordtype) to which the entries belong, and the values of the
-unique keys define the actual rows to which data belongs.
-For example, this table defines three coaches for the two trainings from the last table:
-| `COL_TYPE` | `FOREIGN`             | `FOREIGN`             | `SCALAR`               |
-| `PATH`     | `Training`            | `Training`            | `Training`             |
-| `PATH`     | `date`                | `url`                 | `coach`                |
-| `PATH`     |                       |                       | `given_name`           |
-| `IGNORE`   | Date of training      | URL of training       | The coach's given name |
-| `IGNORE`   | from sheet 'Training' | from sheet 'Training' |                        |
-|------------|-----------------------|-----------------------|------------------------|
-|            | 2024-02-27            | example.com/mp        | Ada                    |
-|            | 2024-02-27            | example.com/mp        | Berta                  |
-|            | 2024-02-27            | example.com/m         | Chris                  |
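Looking up the parent record from the ``FOREIGN`` values can be sketched like this, assuming that the data collected so far is a dict of record lists as in the *data array* schema. `find_parent` is hypothetical, and it raises an error unless exactly one record matches, whereas the note on uniqueness above says the current implementation does not strictly check this. Both trainings in the example share the same date, so only the combination of `date` and `url` identifies a training.

```python
from typing import Any


def find_parent(data: dict[str, Any], base_path: list[str],
                foreign: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: select the record that the FOREIGN values point to.

    ``base_path`` is assumed to lead to a list of records; ``foreign`` maps the
    unique keys to the values found in the FOREIGN columns.
    """
    candidates: Any = data
    for key in base_path:
        candidates = candidates[key]
    matches = [record for record in candidates
               if all(record.get(key) == value for key, value in foreign.items())]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one record matching {foreign}, "
                         f"found {len(matches)}.")
    return matches[0]


data = {"Training": [
    {"date": "2024-02-27", "url": "example.com/mp"},
    {"date": "2024-02-27", "url": "example.com/m"},
]}
coach_rows = [
    ("2024-02-27", "example.com/mp", "Ada"),
    ("2024-02-27", "example.com/mp", "Berta"),
    ("2024-02-27", "example.com/m", "Chris"),
]
for date, url, given_name in coach_rows:
    parent = find_parent(data, ["Training"], {"date": date, "url": url})
    parent.setdefault("coach", []).append({"given_name": given_name})
```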
-#### Special case: multiple choice "checkboxes" ####
-As a special case, enum arrays with `"uniqueItems": true` can be represented as multiple columns,
-with one column per choice.  The choices are denoted as the last `PATH` component, the column type
-must be `MULTIPLE_CHOICE`.
-Stored data is denoted as an "x" character in the respective cell, empty cells denote that the item
-was not selected.  Additionally, the implementation also allows `TRUE` or `1` for selected items,
-and `FALSE`, `0` or cells with only whitespace characters for deselected items:
-| `COL_TYPE` | `MULTIPLE_CHOICE` | `MULTIPLE_CHOICE`    | `MULTIPLE_CHOICE` |
-| `PATH`     | `skills`          | `skills`             | `skills`          |
-| `PATH`     | `Planning`        | `Communication`      | `Evaluation`      |
-| `IGNORE`   | skills.Planning   | skills.Communication | skills.Evaluation |
-|------------|-------------------|----------------------|-------------------|
-|            | x                 |                      | X                 |
-|            | `"  "`            | `TRUE`               | `FALSE`           |
-|            | 0                 | x                    | 1                 |
-These rows correspond to:
-1. Planning, Evaluation
-2. Communication
-3. Communication, Evaluation
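The cell interpretation described above can be sketched as a small predicate. `is_selected` is hypothetical. It assumes that text values are compared case-insensitively (the table uses both `x` and `X`) and that numeric cells may arrive as Python numbers, and it raises an error for anything else.

```python
from typing import Any


def is_selected(cell: Any) -> bool:
    """Sketch: interpret one MULTIPLE_CHOICE cell as selected or not."""
    if cell is None:
        return False  # empty cell
    if isinstance(cell, bool):  # check bool before int: bool is an int subclass
        return cell
    if isinstance(cell, (int, float)) and cell in (0, 1):
        return bool(cell)
    text = str(cell).strip().upper()
    if text in ("X", "TRUE", "1"):
        return True
    if text in ("", "FALSE", "0"):
        return False
    raise ValueError(f"Cannot interpret {cell!r} as a multiple choice value.")


choices = ["Planning", "Communication", "Evaluation"]
rows = [["x", None, "X"], ["  ", "TRUE", "FALSE"], [0, "x", 1]]
selected = [[choice for choice, cell in zip(choices, row) if is_selected(cell)]
            for row in rows]
print(selected)
# [['Planning', 'Evaluation'], ['Communication'], ['Communication', 'Evaluation']]
```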
-## Current limitations ##
-The current implementation still lacks the following:
- File handling is not implemented yet.
--- a/src/doc/table-json-conversion/specs.rst
+++ b/src/doc/table-json-conversion/specs.rst