Commit 0186c0ee authored by Florian Spreckelsen

Merge branch 'dev' into f-fix-rocrate

parents dd3f75bf 3c836b10
2 merge requests: !217 (TST: Make NamedTemporaryFiles Windows-compatible), !215 (Fix issues in rocrate support)
Pipeline #60539 passed
......@@ -55,6 +55,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Security ###
### Documentation ###
- Added documentation for ROCrateConverter, ELNFileConverter, and ROCrateEntityConverter
## [0.10.1] - 2024-11-13 ##
......
......@@ -98,3 +98,90 @@ given ``recordname``, this record can be used within the cfood. Most
importantly, this record stores the internal path of this array within the HDF5
file in a text property, the name of which can be configured with the
``internal_path_property_name`` option which defaults to ``internal_hdf5_path``.

ROCrateConverter
================

The ROCrateConverter unpacks ro-crate files and creates one instance of the
``ROCrateEntity`` structure element for each contained object. Currently, only
zipped ro-crate files are supported. Each created ROCrateEntity wraps a
``rocrate.model.entity.Entity`` together with the path to the folder in which
the ROCrate data is saved. The ROCrateEntities are appended as children, so
they can be accessed via the ``subtree`` and treated using the
:ref:`ROCrateEntityConverter`.

To use the ROCrateConverter, you need to install the LinkAhead crawler with its
optional ``rocrate`` dependency.
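
To illustrate what "one entity per contained object" means, the following is a minimal sketch that uses only the Python standard library. It is not the converter's implementation (which relies on the ``rocrate`` package); it only assumes that a zipped ro-crate contains a ``ro-crate-metadata.json`` file whose ``@graph`` list holds one entry per object. The function names and the example crate content are hypothetical.

```python
import io
import json
import zipfile


def list_crate_entities(zip_bytes: bytes) -> list:
    """Return the entity dicts from the ``@graph`` of a zipped ro-crate."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # The metadata file may sit at the archive root or inside a
        # top-level folder, so it is looked up by its base name.
        metadata_name = next(
            name for name in archive.namelist()
            if name.rsplit("/", 1)[-1] == "ro-crate-metadata.json"
        )
        metadata = json.loads(archive.read(metadata_name))
    return metadata["@graph"]


def build_example_crate() -> bytes:
    """Build a tiny in-memory crate (hypothetical content) for demonstration."""
    graph = [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "dateCreated": "2024-11-13T00:00:00"},
        {"@id": "./", "@type": "Dataset", "name": "example"},
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("example/ro-crate-metadata.json",
                         json.dumps({"@graph": graph}))
    return buffer.getvalue()


# Each entry of the graph corresponds to one object of the crate.
entities = list_crate_entities(build_example_crate())
entity_ids = [entity["@id"] for entity in entities]
```

In the crawler, each such object would be exposed as a ``ROCrateEntity`` child rather than as a plain dictionary.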

ELNFileConverter
----------------

As .eln files are zipped ro-crate files, the ELNFileConverter works analogously
to the ROCrateConverter and also creates ROCrateEntities for contained objects.

ROCrateEntityConverter
----------------------

The ROCrateEntityConverter unpacks the ``rocrate.model.entity.Entity`` wrapped
within a ROCrateEntity and appends all properties, contained files, and parts
as children. Properties are converted to a basic element matching their value
(``BooleanElement``, ``IntegerElement``, etc.) and can be matched using
``match_properties``. Each ``rocrate.model.file.File`` is converted to a crawler
File object, which can be matched with ``SimpleFile``. Each subpart of the
ROCrateEntity is in turn converted to a ROCrateEntity, which can then be
treated again using this converter.

The ``match_entity_type`` keyword can be used to match a ROCrateEntity using its
``entity_type``. With the ``match_properties`` keyword, properties of a
ROCrateEntity can be either matched or extracted, as seen in the cfood example
below:

* With ``match_properties: "@id": ro-crate-metadata.json``, the ROCrateEntities
  are filtered so that only the metadata json files match.
* With ``match_properties: dateCreated: (?P<dateCreated>.*)``, the
  ``dateCreated`` entry of that metadata json file is extracted and made
  accessible through the ``dateCreated`` variable.
* The example could then be extended to use any other entry present in the
  metadata json to filter the results, or to insert the extracted information
  into generated records.
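
The filter-and-extract behaviour of ``match_properties`` described in the list above can be pictured with a small, self-contained Python sketch. This is an illustration only, not the crawler's code: the function ``match_entity_properties`` is hypothetical, and the use of ``re.fullmatch`` (rather than another matching mode) is an assumption.

```python
import re
from typing import Optional


def match_entity_properties(properties: dict, patterns: dict) -> Optional[dict]:
    """Return the captured variables if every pattern matches, else None."""
    variables = {}
    for key, pattern in patterns.items():
        if key not in properties:
            # A missing property means the entity does not match at all.
            return None
        match = re.fullmatch(pattern, str(properties[key]))
        if match is None:
            return None
        # Named groups such as (?P<dateCreated>.*) become variables.
        variables.update(match.groupdict())
    return variables


patterns = {
    "@id": "ro-crate-metadata.json",
    "dateCreated": "(?P<dateCreated>.*)",
}
# Hypothetical entities, reduced to plain property dictionaries.
metadata_entity = {"@id": "ro-crate-metadata.json",
                   "dateCreated": "2024-11-13T00:00:00"}
other_entity = {"@id": "./", "name": "example"}

matched = match_entity_properties(metadata_entity, patterns)
skipped = match_entity_properties(other_entity, patterns)
```

Here ``matched`` holds the extracted ``dateCreated`` value, while ``skipped`` is ``None`` because the ``@id`` of the second entity does not fit the pattern.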

Example cfood
-------------

A short cfood that generates a record for each .eln file in a directory and
for its metadata file could look like this:

.. code-block:: yaml

    ---
    metadata:
      crawler-version: 0.9.0
    ---
    Converters:
      ELNFile:
        converter: ELNFileConverter
        package: caoscrawler.converters.rocrate
      ROCrateEntity:
        converter: ROCrateEntityConverter
        package: caoscrawler.converters.rocrate

    ParentDirectory:
      type: Directory
      match: (.*)
      subtree:
        ELNFile:
          type: ELNFile
          match: (?P<filename>.*)\.eln
          records:
            ELNExampleRecord:
              filename: $filename
          subtree:
            ROCrateEntity:
              type: ROCrateEntity
              match_properties:
                "@id": ro-crate-metadata.json
                dateCreated: (?P<dateCreated>.*)
              records:
                MDExampleRecord:
                  parent: $ELNFile
                  filename: ro-crate-metadata.json
                  time: $dateCreated