diff --git a/CHANGELOG.md b/CHANGELOG.md
index 34ce18520e245cbc82c558967f549c72071f62ac..1a0dea9a118f25b21fb7e8216839746f0dcb256a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -52,6 +52,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Security ###
 
 ### Documentation ###
+- Added documentation for ROCrateConverter, ELNFileConverter, and ROCrateEntityConverter
 
 ## [0.10.1] - 2024-11-13 ##
 
diff --git a/src/doc/converters/further_converters.rst b/src/doc/converters/further_converters.rst
index a334c8778f440e108fd141b0fc53ec06765deb8c..0fffc2e7de1bd23327194c6379cca94bd7c72a29 100644
--- a/src/doc/converters/further_converters.rst
+++ b/src/doc/converters/further_converters.rst
@@ -98,3 +98,90 @@ given ``recordname``, this record can be used within the cfood. Most
 importantly, this record stores the internal path of this array within the HDF5
 file in a text property, the name of which can be configured with the
 ``internal_path_property_name`` option which defaults to ``internal_hdf5_path``.
+
+
+
+ROCrateConverter
+================
+
+The ROCrateConverter unpacks RO-Crate files and creates one ``ROCrateEntity``
+structure element for each object contained in them. Currently, only zipped
+RO-Crate files are supported. Each created ``ROCrateEntity`` wraps a
+``rocrate.model.entity.Entity`` together with the path to the folder in which
+the RO-Crate data is stored. These entities are appended as children, so they
+can be accessed in the ``subtree`` and treated with the
+:ref:`ROCrateEntityConverter`.
+
+To use the ROCrateConverter, you need to install the LinkAhead crawler with its
+optional ``rocrate`` dependency.
+
+ELNFileConverter
+----------------
+
+Since ``.eln`` files are zipped RO-Crate files, the ELNFileConverter works
+analogously to the ROCrateConverter and likewise creates a ``ROCrateEntity`` for
+each contained object.
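+
+A cfood for plain zipped RO-Crate files can presumably be set up much like one
+for ``.eln`` files. The fragment below is only a sketch under two assumptions:
+that the ROCrateConverter lives in the same ``caoscrawler.converters.rocrate``
+module as the other two converters, and that, like other file converters, its
+``match`` is applied to the file name. The names ``CrateDirectory``,
+``cratename``, and ``CrateExampleRecord`` are hypothetical.
+
+.. code-block:: yaml
+
+   Converters:
+     ROCrate:
+       # assumption: same module as ELNFileConverter and ROCrateEntityConverter
+       converter: ROCrateConverter
+       package: caoscrawler.converters.rocrate
+     ROCrateEntity:
+       converter: ROCrateEntityConverter
+       package: caoscrawler.converters.rocrate
+
+   CrateDirectory:  # hypothetical converter name
+     type: Directory
+     match: (.*)
+     subtree:
+       ROCrate:
+         type: ROCrate
+         # assumption: the regex is matched against the zip file's name
+         match: (?P<cratename>.*)\.zip
+         records:
+           CrateExampleRecord:  # hypothetical record type
+             filename: $cratename
+         subtree:
+           ROCrateEntity:
+             type: ROCrateEntity
+             match_properties:
+               "@id": ro-crate-metadata.json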
+
+ROCrateEntityConverter
+----------------------
+
+The ROCrateEntityConverter unpacks the ``rocrate.model.entity.Entity`` wrapped
+in a ``ROCrateEntity`` and appends all of its properties, contained files, and
+parts as children. Each property is converted to a basic structure element
+matching the type of its value (``BooleanElement``, ``IntegerElement``, etc.)
+and can be matched using ``match_properties``. Each ``rocrate.model.file.File``
+is converted to a crawler ``File`` structure element, which can be matched with
+the ``SimpleFile`` converter. Each part of the ``ROCrateEntity`` is in turn
+converted to a ``ROCrateEntity`` of its own, which can again be treated with
+this converter.
+
+The ``match_entity_type`` keyword can be used to match a ``ROCrateEntity`` by
+its entity type. With the ``match_properties`` keyword, properties of a
+``ROCrateEntity`` can be either matched or extracted, as in the example cfood
+below:
+
+* With ``"@id": ro-crate-metadata.json`` under ``match_properties``, the
+  ROCrateEntities are filtered so that only the metadata JSON files match.
+* With ``dateCreated: (?P<dateCreated>.*)`` under ``match_properties``, the
+  ``dateCreated`` entry of that metadata JSON file is extracted and made
+  available through the ``dateCreated`` variable.
+* The example could be extended to filter the results on any other entry
+  present in the metadata JSON, or to insert the extracted information into
+  the generated records.
+
+Example cfood
+-------------
+
+A short cfood that generates one record for each ``.eln`` file in a directory
+and another record for its metadata file could look like this:
+
+.. code-block:: yaml
+
+   ---
+   metadata:
+     crawler-version: 0.9.0
+   ---
+   Converters:
+     ELNFile:
+       converter: ELNFileConverter
+       package: caoscrawler.converters.rocrate
+     ROCrateEntity:
+       converter: ROCrateEntityConverter
+       package: caoscrawler.converters.rocrate
+
+   ParentDirectory:
+     type: Directory
+     match: (.*)
+     subtree:
+       ELNFile:
+         type: ELNFile
+         match: (?P<filename>.*)\.eln
+         records:
+           ELNExampleRecord:
+             filename: $filename
+         subtree:
+           ROCrateEntity:
+             type: ROCrateEntity
+             match_properties:
+               "@id": ro-crate-metadata.json
+               dateCreated: (?P<dateCreated>.*)
+             records:
+               MDExampleRecord:
+                 parent: $ELNFile
+                 filename: ro-crate-metadata.json
+                 time: $dateCreated
+
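+
+The example cfood only uses ``match_properties``. As a further sketch, the
+fragment below shows how ``match_entity_type`` might be combined with it to
+extract the name of every ``Dataset`` entity in a crate, for instance as an
+additional entry in the ``subtree`` of ``ELNFile``. It assumes that
+``match_entity_type`` takes a regular expression like the other ``match``
+keywords and that the entity type corresponds to the RO-Crate ``@type``. The
+key ``DatasetEntity``, the variable ``dataset_name``, and the record type
+``DatasetExampleRecord`` are hypothetical.
+
+.. code-block:: yaml
+
+   DatasetEntity:  # hypothetical key; any name unique within the subtree works
+     type: ROCrateEntity
+     # assumption: matched against the entity's "@type"
+     match_entity_type: Dataset
+     match_properties:
+       name: (?P<dataset_name>.*)
+     records:
+       DatasetExampleRecord:  # hypothetical record type
+         title: $dataset_name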