Verified commit ae2905d9 authored by Daniel Hornung

MAINT, DOC: Renamed to "data array schema", added docs.

parent bfdfccd9
2 merge requests: !107 Release v0.11.0, !101 ENH: table_json_conversion/xlsx_utils.py: data array schema generation
@@ -33,6 +33,7 @@ from openpyxl import load_workbook, Workbook
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE
from .xlsx_utils import (
+array_schema_from_model_schema,
get_foreign_key_columns,
get_row_type_column_index,
is_exploded_sheet,
@@ -332,7 +333,7 @@ validation_schema: dict, optional
# Validation
if validation_schema is not None:
-validation_schema = read_or_dict(validation_schema)
+validation_schema = array_schema_from_model_schema(read_or_dict(validation_schema))
try:
validate(data, validation_schema, format_checker=FormatChecker())
except ValidationError as verr:
@@ -61,13 +61,13 @@ class RowType(Enum):
IGNORE = 3
-def data_schema_from_model_schema(model_schema: dict) -> dict:
-"""Convert a *model* schema to a *data* schema.
+def array_schema_from_model_schema(model_schema: dict) -> dict:
+"""Convert a *data model* schema to a *data array* schema.
Practically, this means that the top level properties are converted into lists. In a simplified
notation, this can be expressed as:
-``data_schema = { elem: [elem typed data...] for elem in model_schema }``
+``array_schema = { elem: [elem typed data...] for elem in model_schema }``
Parameters
----------
@@ -77,7 +77,7 @@ model_schema: dict
Returns
-------
-data_schema: dict
+array_schema: dict
A corresponding json schema, where the properties are arrays with the types of the input's
top-level properties.
"""
@@ -13,14 +13,16 @@ The data model in LinkAhead defines the types of records present in a LinkAhead
structure. This data model can also be represented in a JSON Schema, which defines the structure of
JSON files containing records pertaining to the data model.
-For example, the following JSON can describe a "Person" Record:
+For example, the following JSON can describe a single "Person" Record:
```JSON
{
-"Person": {
-"family_name": "Steve",
-"given_name": "Stevie"
-}
+"Person": [
+{
+"family_name": "Steve",
+"given_name": "Stevie"
+}
+]
}
```
@@ -30,6 +32,43 @@ the storage of "Training" Records containing information about conducted trainin
particularly valuable for data import and export. One could generate web forms from the JSON Schema
or use it to export objects stored in LinkAhead as JSON.
+### Note: Data models and data arrays ###
+The schema as created by ``json_schema_exporter.recordtype_to_json_schema(...)`` is, broadly
+speaking, a dict with all the top-level recordtypes (the recordtype names are the keys). While this
+is appropriate for generating user input forms, data often consists of multiple entries of the
+same type. XLSX files are no exception: users expect that they may enter multiple rows of data.
+Since the data model schema does not match multiple data sets, there is a utility function,
+``xlsx_utils.array_schema_from_model_schema(...)``, which creates a *data array* schema out of the
+*data model* schema: it replaces each top-level entry of the data model with a list which may
+contain any number of entries of that type.
+A **short example** illustrates this. Consider a *data model* schema which matches this data:
+```JSON
+{
+"Person": {
+"name": "Charly"
+}
+}
+```
+The automatically generated *data array* schema would then accept the following data:
+```JSON
+{
+"Person": [
+{
+"name": "Charly"
+},
+{
+"name": "Sam"
+}
+]
+}
+```
## From JSON to XLSX: Data Representation ##
The following describes how JSON files representing LinkAhead records are converted into XLSX files,
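The following minimal sketch uses only the standard library to make the difference between the two data shapes in the note concrete. The helper `wrap_in_arrays` is hypothetical and not part of the library. It turns data shaped like the *data model* example into data shaped like the *data array* example by wrapping each top-level record in a list, which mirrors the docstring notation `{ elem: [elem typed data...] for elem in model_schema }`.

```python
import json


def wrap_in_arrays(data: dict) -> dict:
    """Hypothetical helper: wrap each top-level record in a one-element list."""
    return {record_type: [record] for record_type, record in data.items()}


# Data matching the *data model* schema: one record per record type.
model_data = json.loads('{"Person": {"name": "Charly"}}')

# Data matching the *data array* schema: a list of records per record type.
array_data = wrap_in_arrays(model_data)
array_data["Person"].append({"name": "Sam"})

print(json.dumps(array_data))
# {"Person": [{"name": "Charly"}, {"name": "Sam"}]}
```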
@@ -67,33 +106,45 @@ Let's now consider these four cases in detail and with examples:
```JSON
{
-"Training": {
+"Training": [
+{
"date": "2023-01-01",
"url": "www.indiscale.com",
"duration": 1.0,
"participants": 1,
"remote": false
-}
+},
+{
+"date": "2023-06-15",
+"url": "www.indiscale.com/next",
+"duration": 2.5,
+"participants": null,
+"remote": true
+}
+]
}
```
These entries will be represented in an XLSX sheet with the following content:
-| date | url | duration | participants | remote |
-|------------|-------------------|----------|--------------|--------|
-| 2023-01-01 | www.indiscale.com | 1.0 | 1 | false |
+| date | url | duration | participants | remote |
+|------------|------------------------|----------|--------------|--------|
+| 2023-01-01 | www.indiscale.com | 1.0 | 1 | false |
+| 2023-06-15 | www.indiscale.com/next | 2.5 | | true |
### b. Property referencing a record ###
```JSON
{
-"Training": {
+"Training": [
+{
"date": "2023-01-01",
"supervisor": {
"family_name": "Stevenson",
"given_name": "Stevie"
}
}
+]
}
```
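The two cases above can be sketched roughly as follows. Each entry of the top-level list becomes one row, and the properties of a referenced record such as `supervisor` become additional columns. This is a simplified, hypothetical illustration and not the converter's code. According to the next hunk header, the documented format identifies columns through the content of hidden rows, which this sketch ignores. Here a column is simply named by its property path joined with `.`, and `null`/`None` becomes an empty cell, as in the table above. The helper names are made up.

```python
def flatten_record(record: dict, prefix: tuple = ()) -> dict:
    """Hypothetical helper: map property paths to values, descending into sub-records."""
    flat = {}
    for key, value in record.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # A referenced record contributes one column per property.
            flat.update(flatten_record(value, path))
        else:
            flat[path] = value
    return flat


def records_to_rows(records: list) -> list:
    """Hypothetical helper: return a header row followed by one row per record."""
    flat_records = [flatten_record(record) for record in records]
    # Collect columns in order of first appearance.
    columns = []
    for flat in flat_records:
        for path in flat:
            if path not in columns:
                columns.append(path)
    header = [".".join(path) for path in columns]
    # Missing values and None both become empty cells.
    rows = [
        ["" if flat.get(path) is None else flat[path] for path in columns]
        for flat in flat_records
    ]
    return [header] + rows


trainings = [
    {"date": "2023-01-01",
     "supervisor": {"family_name": "Stevenson", "given_name": "Stevie"}},
    {"date": "2023-06-15", "participants": None},
]
for row in records_to_rows(trainings):
    print(row)
```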
@@ -110,10 +161,12 @@ through the content of hidden rows. (See below for the definition of hidden row
```JSON
{
-"Training": {
+"Training": [
+{
"url": "www.indiscale.com",
"subjects": ["Math", "Physics"]
-}
+}
+]
}
```
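The context in the following hunk header indicates that list elements such as `subjects` are joined with the separator `;`, and that a `;` inside an element is escaped with `\\`. The sketch below reads this as a single backslash, which is an assumption. It is a hypothetical round trip and not the library's code. The helper names are made up. Escaping the backslash itself is an additional assumption made here so that splitting stays unambiguous.

```python
SEPARATOR = ";"
ESCAPE = "\\"  # a single backslash character


def join_list_cell(values: list) -> str:
    """Hypothetical helper: store a list of simple values in one cell."""
    escaped = []
    for value in values:
        text = str(value)
        # Escape the escape character first (an assumption), then the separator.
        text = text.replace(ESCAPE, ESCAPE + ESCAPE)
        text = text.replace(SEPARATOR, ESCAPE + SEPARATOR)
        escaped.append(text)
    return SEPARATOR.join(escaped)


def split_list_cell(cell: str) -> list:
    """Hypothetical inverse of join_list_cell: split on unescaped separators."""
    values, current = [], []
    chars = iter(cell)
    for char in chars:
        if char == ESCAPE:
            # Keep the next character literally; a trailing lone backslash is kept as-is.
            current.append(next(chars, ESCAPE))
        elif char == SEPARATOR:
            values.append("".join(current))
            current = []
        else:
            current.append(char)
    values.append("".join(current))
    return values


print(join_list_cell(["Math", "Physics"]))  # Math;Physics
print(split_list_cell("Math;Physics"))  # ['Math', 'Physics']
```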
@@ -130,13 +183,15 @@ the separator `;`, it is escaped with `\\`.
```JSON
{
-"Training": {
+"Training": [
+{
"date": "2024-04-17",
"skills": [
"Planning",
"Evaluation"
]
-}
+}
+]
}
```
@@ -154,7 +209,8 @@ Note that this example assumes that the list of possible choices, as given in th
```JSON
{
-"Training": {
+"Training": [
+{
"date": "2023-01-01",
"coach": [
{
@@ -166,7 +222,8 @@ Note that this example assumes that the list of possible choices, as given in th
"given_name": "Min"
}
]
-}
+}
+]
}
```
@@ -281,6 +338,4 @@ These rows correspond to:
The current implementation still lacks the following:
- Lists of enum references are not yet implemented as columns where matching cells can simply be
ticked/crossed.
- File handling is not implemented yet.
#!/usr/bin/env python3
# encoding: utf-8
#
# This file is a part of the LinkAhead Project.
#
# Copyright (C) 2024 Indiscale GmbH <info@indiscale.com>
# Copyright (C) 2024 Henrik tom Wörden <h.tomwoerden@indiscale.com>
# Copyright (C) 2024 Daniel Hornung <d.hornung@indiscale.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -165,6 +165,6 @@ def test_errors():
def test_data_schema_generation():
model_schema = xlsx_utils.read_or_dict(rfp("data/simple_schema.json"))
-data_schema = xlsx_utils.data_schema_from_model_schema(model_schema)
+array_schema = xlsx_utils.array_schema_from_model_schema(model_schema)
expected = xlsx_utils.read_or_dict(rfp("data/simple_data_schema.json"))
-assert data_schema == expected
+assert array_schema == expected