Draft: Automatic XLSX export
Summary
Added a function to automatically export records from an Iterable to xlsx.
Addresses #143 / https://gitlab.indiscale.com/caosdb/customers/dimr/management/-/issues/269
Focus
Does this lack any features that are necessary and might have been missed during planning? Can this be used in the intended workflow?
Test Environment
Manual tests, ideally against test data I might not have available.
Additionally, if this needs to work on Windows and there is a working VM somewhere, this should probably also be tested there, as afaik NamedTemporaryFiles are somewhat error-prone on Win.
Check List for the Author
-
All automated tests pass -
Reference related issues -
Up-to-date CHANGELOG.md (or not necessary) -
Up-to-date JSON schema (or not necessary) -
Appropriate developer documentation -
Appropriate user documentation -
Tests (or not necessary) -
Annotations in code (Gitlab comments)
Check List for the Reviewer
-
I understand the intent of this MR -
All automated tests pass -
Up-to-date CHANGELOG.md (or not necessary) -
Appropriate user and developer documentation (or not necessary) -
The test environment setup works and the intended behavior is reproducible in the test environment -
In-code documentation and comments are up-to-date. -
Check: Are there specifications? Are they satisfied?
Merge request reports
Activity
assigned to @i.nueske
8 # it under the terms of the GNU Affero General Public License as 9 # published by the Free Software Foundation, either version 3 of the 10 # License, or (at your option) any later version. 11 # 12 # This program is distributed in the hope that it will be useful, 13 # but WITHOUT ANY WARRANTY; without even the implied warranty of 14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 # GNU Affero General Public License for more details. 16 # 17 # You should have received a copy of the GNU Affero General Public License 18 # along with this program. If not, see <https://www.gnu.org/licenses/>. 19 20 """ 21 Utilities for validation of conversion / import / export results. 22 For internal use. 23 """ 61 61 """ 62 if schema is not None: 63 with open(schema, encoding="utf8", mode="r") as sch_f: 64 model_schema = json.load(sch_f) 65 data_schema = xlsx_utils.array_schema_from_model_schema(model_schema) 66 else: 67 data_schema = schema 68 62 69 with tempfile.TemporaryDirectory() as tmpdir: 63 70 outfile = os.path.join(tmpdir, 'test.xlsx') 64 71 assert not os.path.exists(outfile) 65 72 if custom_output is not None: 66 73 outfile = custom_output 67 74 fill_template(data=json_file, template=template_file, result=outfile, 68 validation_schema=schema) 75 validation_schema=data_schema) 354 355 355 356 # Validation 356 357 if validation_schema is not None: 358 # convert to array_schema is given schema is a model_schema 359 if 'properties' in validation_schema and validation_schema['properties'].values(): 360 if list(validation_schema['properties'].values())[0]["type"] != "array": 357 361 validation_schema = array_schema_from_model_schema(read_or_dict(validation_schema)) 354 355 355 356 # Validation 356 357 if validation_schema is not None: 358 # convert to array_schema is given schema is a model_schema 359 if 'properties' in validation_schema and validation_schema['properties'].values(): 360 if list(validation_schema['properties'].values())[0]["type"] != "array": 357 361 validation_schema = array_schema_from_model_schema(read_or_dict(validation_schema)) 358 362 try: 359 # FIXME redefine checker for datetime 360 validate(data, validation_schema, format_checker=FormatChecker()) 363 _validate_jsonschema(data, validation_schema) 355 356 # Validation 356 357 if validation_schema is not None: 358 # convert to array_schema is given schema is a model_schema 359 if 'properties' in validation_schema and validation_schema['properties'].values(): 360 if list(validation_schema['properties'].values())[0]["type"] != "array": 357 361 validation_schema = array_schema_from_model_schema(read_or_dict(validation_schema)) 358 362 try: 359 # FIXME redefine checker for datetime 360 validate(data, validation_schema, format_checker=FormatChecker()) 363 _validate_jsonschema(data, validation_schema) 361 364 except ValidationError as verr: 362 365 print(verr.message) 363 366 raise verr 364 367 else: 365 print("No validation schema given, continue at your own risk.") 368 warnings.warn("No validation schema given, continue at your own risk.") 92 93 name_property_for_new_records : bool, optional 93 94 Whether objects shall generally have a `name` property in the generated schema. 94 95 Optional, default is False. 96 use_id_for_identification: bool, optional 97 If set to true, an 'id' property is added to all records, and 98 foreign key references are assumed to be ids. 257 262 if inner_ui_schema: 258 263 ui_schema["items"] = inner_ui_schema 259 264 elif prop.is_reference(): 260 if prop.datatype == db.REFERENCE: 265 if self._use_id_for_identification: 266 json_prop["type"] = "object" 267 json_prop["required"] = [] 268 json_prop["additionalProperties"] = False 269 json_prop["title"] = prop.name 270 if prop.datatype == db.FILE: 271 json_prop["description"] = "Path to file" 272 json_prop["properties"] = {"path": {"type": "string"}} 273 else: 274 json_prop["properties"] = { 275 "id": {"oneOf": [{"type": "integer"}, {"type": "string"}]}} - Comment on lines +270 to +275
Ids are set to int/str, because this solves some of the problems with version-pinned references. Only exception for _use_id_for_identification is for files, where the path is used instead where possible.
Edited by I. Nüske
728 753 If given, also merge the react-jsonschema-forms from this argument and return as the second return 729 754 value. If ``schemas`` is a dict, this parameter must also be a dict, if ``schemas`` is only an 730 755 iterable, this paramater must support numerical indexing. 756 return_data_schema : bool, default False 757 If set to True, a second schema with all top-level entries wrapped in an 758 array will be returned. This is necessary if the schema describes the 759 data layout of an XLSX file. 760 Cannot be used together with rjsf_uischemas. - Comment on lines 719 to +760
merge_schemas now has a parameter to indicate that the data_schema should also be returned. The new code does not need both, but as rjsf_uischemas already set the method to return a tuple, I figured that having the same behaviour would probably be desirable. Should it stay like this, or changed to only return the data_schema?
91 json_data : dict 92 The given records data in json form. 93 """ 94 json_data = {} 95 for record in records: 96 # Convert records to high level api objects 97 record_obj = convert_to_python_object(record) 98 try: 99 record_obj.resolve_references(False, None) 100 except linkahead.LinkAheadException: 101 warnings.warn(f"Data for record with id {record_obj.id} might be " 102 f"incomplete, unsuccessful retrieve.") 103 # Get json representation & adjust layout for compatibility 104 raw_data = record_obj.serialize() 105 raw_data.update(raw_data.get('properties', {})) 106 raw_data.pop('properties') 182 if isinstance(element, (int, str)): 183 elem_id = element 184 elif isinstance(element, linkahead.Entity): 185 elem_id = element.id 186 else: 187 warnings.warn(f"Cannot handle referenced " 188 f"entity '{prop.value}'") 189 continue 190 entity_ids.add(elem_id) 191 except linkahead.LinkAheadException as e: 192 warnings.warn(f"Cannot handle referenced entity " 193 f"'{prop.value}' because of error '{e}'") 194 # Retrieve data 195 new_records = [] 196 for entity_id in entity_ids: 197 entity_id = str(entity_id).split('@')[0] # Queries cannot handle version 209 for rt_id in recordtype_ids] 210 recordtype_names = {recordtype.name for recordtype in recordtypes} 211 # Generate schema and data from the records 212 json_schema = _generate_jsonschema_from_recordtypes(recordtypes, 213 jsonschema_filepath) 214 json_data = _generate_jsondata_from_records(records, jsondata_filepath) 215 # Generate xlsx template 216 # _generate_xlsx_template_file needs a file name, so use NamedTemporaryFile 217 # ToDo: This might not work on windows, if not, fix _generate file handling 218 if xlsx_template_filepath is None: 219 xlsx_template_file = tempfile.NamedTemporaryFile(suffix='.xlsx') 220 xlsx_template_filepath = xlsx_template_file.name 221 else: 222 xlsx_template_file = None 223 _generate_xlsx_template_file(json_schema, recordtype_names, 224 xlsx_template_filepath) added 4 commits
- b1d0ec96 - MNT: Suppress high_level_api import warning
- aec14b4a - STY: Ignore style issue
- 0e9a9433 - Merge branch 'f-enh-143-automatic-xlsx-exporting' into f-146-make-highlevelapi-warning-filterable
- 09ba1827 - Merge branch 'f-146-make-highlevelapi-warning-filterable' into 'f-enh-143-automatic-xlsx-exporting'
Toggle commit list