Skip to content
Snippets Groups Projects

Draft: Automatic XLSX export

Open I. Nüske requested to merge f-enh-143-automatic-xlsx-exporting into dev
12 unresolved threads

Summary

Added a function to automatically export records from an Iterable to xlsx.
Addresses #143 / https://gitlab.indiscale.com/caosdb/customers/dimr/management/-/issues/269

Focus

Does this lack any features that are necessary and might have been missed during planning? Can this be used in the intended workflow?

Test Environment

Manual tests, ideally against test data I might not have available.
Additionally, if this needs to work on Windows and there is a working VM somewhere, this should probably also be tested there, as afaik NamedTemporaryFiles are somewhat error-prone on Win.

Check List for the Author

  • All automated tests pass
  • Reference related issues
  • Up-to-date CHANGELOG.md (or not necessary)
  • Up-to-date JSON schema (or not necessary)
  • Appropriate developer documentation
  • Appropriate user documentation
  • Tests (or not necessary)
  • Annotations in code (Gitlab comments)

Check List for the Reviewer

  • I understand the intent of this MR
  • All automated tests pass
  • Up-to-date CHANGELOG.md (or not necessary)
  • Appropriate user and developer documentation (or not necessary)
  • The test environment setup works and the intended behavior is reproducible in the test environment
  • In-code documentation and comments are up-to-date.
  • Check: Are there specifications? Are they satisfied?
Edited by I. Nüske

Merge request reports

Pipeline #61893 passed

Pipeline passed for 09ba1827 on f-enh-143-automatic-xlsx-exporting

Approval is optional
Ready to merge by members who can write to the target branch.
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
153 153 self.definitions = definitions
154 154
155 155
156 def _validate_jsonschema(instance, schema):
  • I. Nüske added 1 commit

    added 1 commit

    Compare with previous version

  • I. Nüske changed the description

    changed the description

  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 8 # it under the terms of the GNU Affero General Public License as
    9 # published by the Free Software Foundation, either version 3 of the
    10 # License, or (at your option) any later version.
    11 #
    12 # This program is distributed in the hope that it will be useful,
    13 # but WITHOUT ANY WARRANTY; without even the implied warranty of
    14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    15 # GNU Affero General Public License for more details.
    16 #
    17 # You should have received a copy of the GNU Affero General Public License
    18 # along with this program. If not, see <https://www.gnu.org/licenses/>.
    19
    20 """
    21 Utilities for validation of conversion / import / export results.
    22 For internal use.
    23 """
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 61 61 """
    62 if schema is not None:
    63 with open(schema, encoding="utf8", mode="r") as sch_f:
    64 model_schema = json.load(sch_f)
    65 data_schema = xlsx_utils.array_schema_from_model_schema(model_schema)
    66 else:
    67 data_schema = schema
    68
    62 69 with tempfile.TemporaryDirectory() as tmpdir:
    63 70 outfile = os.path.join(tmpdir, 'test.xlsx')
    64 71 assert not os.path.exists(outfile)
    65 72 if custom_output is not None:
    66 73 outfile = custom_output
    67 74 fill_template(data=json_file, template=template_file, result=outfile,
    68 validation_schema=schema)
    75 validation_schema=data_schema)
    • Comment on lines +62 to +75
      Author Maintainer

      Convert the model schema to the needed data schema for validation. Daniel had already fixed this for test_read_xlsx, this is analogous.

    • Please register or sign in to reply
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 354 355
    355 356 # Validation
    356 357 if validation_schema is not None:
    358 # convert to array_schema is given schema is a model_schema
    359 if 'properties' in validation_schema and validation_schema['properties'].values():
    360 if list(validation_schema['properties'].values())[0]["type"] != "array":
    357 361 validation_schema = array_schema_from_model_schema(read_or_dict(validation_schema))
    • Comment on lines +358 to 361
      Author Maintainer

      Check whether the given schema is a data schema, and if not, assume its the corresponding model schema and convert. This way, both can be given as schema.

    • Please register or sign in to reply
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 354 355
    355 356 # Validation
    356 357 if validation_schema is not None:
    358 # convert to array_schema is given schema is a model_schema
    359 if 'properties' in validation_schema and validation_schema['properties'].values():
    360 if list(validation_schema['properties'].values())[0]["type"] != "array":
    357 361 validation_schema = array_schema_from_model_schema(read_or_dict(validation_schema))
    358 362 try:
    359 # FIXME redefine checker for datetime
    360 validate(data, validation_schema, format_checker=FormatChecker())
    363 _validate_jsonschema(data, validation_schema)
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 355 356 # Validation
    356 357 if validation_schema is not None:
    358 # convert to array_schema is given schema is a model_schema
    359 if 'properties' in validation_schema and validation_schema['properties'].values():
    360 if list(validation_schema['properties'].values())[0]["type"] != "array":
    357 361 validation_schema = array_schema_from_model_schema(read_or_dict(validation_schema))
    358 362 try:
    359 # FIXME redefine checker for datetime
    360 validate(data, validation_schema, format_checker=FormatChecker())
    363 _validate_jsonschema(data, validation_schema)
    361 364 except ValidationError as verr:
    362 365 print(verr.message)
    363 366 raise verr
    364 367 else:
    365 print("No validation schema given, continue at your own risk.")
    368 warnings.warn("No validation schema given, continue at your own risk.")
    • Comment on lines -365 to +368
      Author Maintainer

      Changed to warnings.warn for filtering, and because the warnings in these files generally are done with warnings.warn. Should these also be replaced with logs?

    • Please register or sign in to reply
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 92 93 name_property_for_new_records : bool, optional
    93 94 Whether objects shall generally have a `name` property in the generated schema.
    94 95 Optional, default is False.
    96 use_id_for_identification: bool, optional
    97 If set to true, an 'id' property is added to all records, and
    98 foreign key references are assumed to be ids.
    • Comment on lines +96 to +98
      Author Maintainer

      Added use_id_for_identification to the JsonSchemaExporter, which sets identification to id instead of name / foreign keys / identifiables etc. To be able to do this, this also adds an id column to the schema.

    • Please register or sign in to reply
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 257 262 if inner_ui_schema:
    258 263 ui_schema["items"] = inner_ui_schema
    259 264 elif prop.is_reference():
    260 if prop.datatype == db.REFERENCE:
    265 if self._use_id_for_identification:
    266 json_prop["type"] = "object"
    267 json_prop["required"] = []
    268 json_prop["additionalProperties"] = False
    269 json_prop["title"] = prop.name
    270 if prop.datatype == db.FILE:
    271 json_prop["description"] = "Path to file"
    272 json_prop["properties"] = {"path": {"type": "string"}}
    273 else:
    274 json_prop["properties"] = {
    275 "id": {"oneOf": [{"type": "integer"}, {"type": "string"}]}}
    • Comment on lines +270 to +275
      Author Maintainer

      Ids are set to int/str, because this solves some of the problems with version-pinned references. Only exception for _use_id_for_identification is for files, where the path is used instead where possible.

      Edited by I. Nüske
    • Please register or sign in to reply
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 728 753 If given, also merge the react-jsonschema-forms from this argument and return as the second return
    729 754 value. If ``schemas`` is a dict, this parameter must also be a dict, if ``schemas`` is only an
    730 755 iterable, this paramater must support numerical indexing.
    756 return_data_schema : bool, default False
    757 If set to True, a second schema with all top-level entries wrapped in an
    758 array will be returned. This is necessary if the schema describes the
    759 data layout of an XLSX file.
    760 Cannot be used together with rjsf_uischemas.
    • Comment on lines 719 to +760
      Author Maintainer

      merge_schemas now has a parameter to indicate that the data_schema should also be returned. The new code does not need both, but as rjsf_uischemas already set the method to return a tuple, I figured that having the same behaviour would probably be desirable. Should it stay like this, or changed to only return the data_schema?

    • Please register or sign in to reply
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 91 json_data : dict
    92 The given records data in json form.
    93 """
    94 json_data = {}
    95 for record in records:
    96 # Convert records to high level api objects
    97 record_obj = convert_to_python_object(record)
    98 try:
    99 record_obj.resolve_references(False, None)
    100 except linkahead.LinkAheadException:
    101 warnings.warn(f"Data for record with id {record_obj.id} might be "
    102 f"incomplete, unsuccessful retrieve.")
    103 # Get json representation & adjust layout for compatibility
    104 raw_data = record_obj.serialize()
    105 raw_data.update(raw_data.get('properties', {}))
    106 raw_data.pop('properties')
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 182 if isinstance(element, (int, str)):
    183 elem_id = element
    184 elif isinstance(element, linkahead.Entity):
    185 elem_id = element.id
    186 else:
    187 warnings.warn(f"Cannot handle referenced "
    188 f"entity '{prop.value}'")
    189 continue
    190 entity_ids.add(elem_id)
    191 except linkahead.LinkAheadException as e:
    192 warnings.warn(f"Cannot handle referenced entity "
    193 f"'{prop.value}' because of error '{e}'")
    194 # Retrieve data
    195 new_records = []
    196 for entity_id in entity_ids:
    197 entity_id = str(entity_id).split('@')[0] # Queries cannot handle version
    • Author Maintainer

      FIND fails on version-pinned references (e.g. "FIND ENTITY WITH (ID = 114@f2h6dEr56)"), so we need to strip the version. This is not ideal, as this does remove some information that might be expected to be there, but I did not find a time-efficient workaround.

    • Please register or sign in to reply
  • I. Nüske
    I. Nüske @i.nueske started a thread on the diff
  • 209 for rt_id in recordtype_ids]
    210 recordtype_names = {recordtype.name for recordtype in recordtypes}
    211 # Generate schema and data from the records
    212 json_schema = _generate_jsonschema_from_recordtypes(recordtypes,
    213 jsonschema_filepath)
    214 json_data = _generate_jsondata_from_records(records, jsondata_filepath)
    215 # Generate xlsx template
    216 # _generate_xlsx_template_file needs a file name, so use NamedTemporaryFile
    217 # ToDo: This might not work on windows, if not, fix _generate file handling
    218 if xlsx_template_filepath is None:
    219 xlsx_template_file = tempfile.NamedTemporaryFile(suffix='.xlsx')
    220 xlsx_template_filepath = xlsx_template_file.name
    221 else:
    222 xlsx_template_file = None
    223 _generate_xlsx_template_file(json_schema, recordtype_names,
    224 xlsx_template_filepath)
    • Comment on lines +215 to +224
      Author Maintainer

      I do not have a Windows VM available to test this. If it does not work, the easiest workaround would probably be to change _generate_xlsx_template_file to also work with file-like objects and never write to disk.

    • Please register or sign in to reply
  • I. Nüske marked the checklist item Annotations in code (Gitlab comments) as completed

    marked the checklist item Annotations in code (Gitlab comments) as completed

  • added 2 commits

    • 93fdae41 - TST: Add more XLSX tests, unignore validation parameter in convert_and_compare, fix typo
    • 64e51fa1 - Merge branch 'f-144-more-xlsx-tests' into 'f-enh-143-automatic-xlsx-exporting'

    Compare with previous version

  • added 4 commits

    • b1d0ec96 - MNT: Suppress high_level_api import warning
    • aec14b4a - STY: Ignore style issue
    • 0e9a9433 - Merge branch 'f-enh-143-automatic-xlsx-exporting' into f-146-make-highlevelapi-warning-filterable
    • 09ba1827 - Merge branch 'f-146-make-highlevelapi-warning-filterable' into 'f-enh-143-automatic-xlsx-exporting'

    Compare with previous version

  • Please register or sign in to reply
    Loading