<!--This decision was made prior to the beginning of the project in the proposal.
The quality check (for meta data) will not be implemented in the existing gitlab-runner (Demonstrator 4.2), but directly in the new ruqad-component which is based on the LinkAhead crawler.
Date: 2024-10-01-->
<!-- TODO: Describe: Don't implement fairness check in 4.2 demonstrator, but in linkahead crawler. -->
<!-- TODO: Alex -->
<!-- TODO: Check date of protocol. -->
Date: 2024-11-08
## Status
Accepted
## Context
<!--Pulling data from Kadi4Mat and publishing it to the EDC-based BatCAT Data Space.-->
The metadata contained in the ELN file exported from Kadi4Mat needs to be checked to make sure that
it conforms to the data model of LinkAhead.
## Decision
<!--We will use the LinkAhead Crawler as the base of the RuQaD Demonstrator.
In the BatCAT Data Space LinkAhead is being used as Data and Knowledge
Management System. Instead of publishing assets directly to the catalog which
would leave us with the task to host and serve the raw data, we can reuse this
infrastructure.
The established way to insert data into LinkAhead following an ETL approach is
the LinkAhead Crawler.
Additionally, the LinkAhead Crawler is a Python framework and Kadi4Mat has a
Python API as well which is a good match.-->
The metadata check will be implemented in the crawler module of the ruqad component. It will work as follows:
- The scanner of the crawler is run in order to collect all information from the file system.
- A JSON schema is generated from the LinkAhead data model, which is stored in a YAML file.
- The metadata found by the scanner is validated against the JSON schema. If validation fails,
detailed error messages are shown.
- If validation succeeds, the crawler is run in full and inserts and updates records in LinkAhead based on the scan results.
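The steps above can be illustrated with a minimal, standard-library-only sketch. It is not the ruqad implementation: the `Dataset` record type, its properties, and all function names are hypothetical, the data model is written as the dict one might get after loading the YAML file, and the hand-written check covers only `required` and `type` as a stand-in for a full JSON Schema validator.

```python
# Hypothetical, simplified data model, as it might look after loading the YAML file.
DATA_MODEL = {
    "Dataset": {
        "obligatory_properties": {
            "title": {"datatype": "TEXT"},
            "license": {"datatype": "TEXT"},
        },
        "recommended_properties": {
            "rating": {"datatype": "INTEGER"},
        },
    },
}

# Assumed mapping from data model datatypes to JSON Schema types.
JSON_TYPES = {"TEXT": "string", "INTEGER": "integer",
              "DOUBLE": "number", "BOOLEAN": "boolean"}
PYTHON_TYPES = {"string": str, "integer": int,
                "number": (int, float), "boolean": bool}


def model_to_json_schema(record_type):
    """Derive a small JSON-schema-like dict from one record type."""
    obligatory = record_type.get("obligatory_properties", {})
    recommended = record_type.get("recommended_properties", {})
    properties = {
        name: {"type": JSON_TYPES[spec["datatype"]]}
        for name, spec in {**obligatory, **recommended}.items()
    }
    return {"type": "object", "properties": properties,
            "required": sorted(obligatory)}


def validate(record, schema):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for name in schema["required"]:
        if name not in record:
            errors.append(f"missing required property '{name}'")
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # unknown properties are ignored in this sketch
        expected = PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and spec["type"] != "boolean"
        if is_stray_bool or not isinstance(value, expected):
            errors.append(f"property '{name}' should be of type {spec['type']}, "
                          f"got {type(value).__name__}")
    return errors


def check_then_insert(scanned_records, schema, insert):
    """Validate all scanned records first; call insert() only if none failed."""
    errors = []
    for index, record in enumerate(scanned_records):
        errors.extend(f"record {index}: {msg}" for msg in validate(record, schema))
    if not errors:
        insert(scanned_records)
    return errors
```

Here `insert` is a placeholder callable for the full crawler run. Collecting every error before deciding means a single pass can report all problems in the export rather than stopping at the first one.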
## Consequences
<!--We don't have to take care of the hosting and serving of the data in the data
space. The RuQaD Demonstrator will be a purely functional, i.e. stateless
component.
We have a solid base for defining the mapping from the input data to the
desired output and for defining and checking criteria, e.g. FAIRness criteria.
On the downside, the RuQaD demonstrator will only be usable for system
landscapes and Data Spaces using LinkAhead. However, it is always possible to
just add a LinkAhead instance to any EDC-based data space.-->
We don't need to modify the code base of the existing gitlab-runner (Demonstrator 4.2).
Furthermore, the scanner result, which is generated anyway, will be reused for validating the metadata.