diff --git a/.docker/Dockerfile b/.docker/Dockerfile index 8e18096490632372d589749867c41e2244a67c11..dd4f3d258443dc1f8b2bacb8d535780e8e37e5e8 100644 --- a/.docker/Dockerfile +++ b/.docker/Dockerfile @@ -7,8 +7,10 @@ RUN apt-get update && \ python3-autopep8 \ python3-pip \ python3-pytest \ + python3-sphinx \ tox \ -y +RUN pip3 install recommonmark sphinx-rtd-theme COPY .docker/wait-for-it.sh /wait-for-it.sh ARG PYLIB ADD https://gitlab.indiscale.com/api/v4/projects/97/repository/commits/${PYLIB} \ diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index 67415c1b3a7da52e3179bec8463cd69ac3c667aa..8840e613f1e1eb86f30779b8b3535e2ff97ad0cc 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -296,8 +296,9 @@ style: pages_prepare: &pages_prepare tags: [ cached-dind ] stage: deploy - needs: [] - image: $CI_REGISTRY/caosdb/src/caosdb-pylib/testenv:latest + needs: + - job: build-testenv + image: $CI_REGISTRY_IMAGE only: refs: - /^release-.*$/i diff --git a/.gitlab/merge_request_templates/Default.md b/.gitlab/merge_request_templates/Default.md index 7859b7be21fb1c3eda91ee35173a8e3412a62066..b3eec01c595a461beec1b0a50fb598bdf8108c77 100644 --- a/.gitlab/merge_request_templates/Default.md +++ b/.gitlab/merge_request_templates/Default.md @@ -27,6 +27,7 @@ guidelines](https://gitlab.com/caosdb/caosdb/-/blob/dev/REVIEW_GUIDELINES.md) - [ ] Reference related issues - [ ] Up-to-date CHANGELOG.md (or not necessary) - [ ] Appropriate user and developer documentation (or not necessary) + - Update / write published documentation (`make doc`). - How do I use the software? Assume "stupid" users. - How do I develop or debug the software? Assume novice developers. - [ ] Annotations in code (Gitlab comments) @@ -40,7 +41,8 @@ guidelines](https://gitlab.com/caosdb/caosdb/-/blob/dev/REVIEW_GUIDELINES.md) - [ ] I understand the intent of this MR - [ ] All automated tests pass - [ ] Up-to-date CHANGELOG.md (or not necessary) -- [ ] Appropriate user and developer documentation (or not necessary) +- [ ] Appropriate user and developer documentation (or not necessary), also in published + documentation. - [ ] The test environment setup works and the intended behavior is reproducible in the test environment - [ ] In-code documentation and comments are up-to-date. diff --git a/CHANGELOG.md b/CHANGELOG.md index c7e51f3018e78e22fbcaa48400baab5e868281ad..29013891d7f53dd4c4d8164e79eecfa169fcb289 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,12 +10,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added ### ### Changed ### +- If the `parents` key is used in a cfood at a lower level for a Record that + already has a Parent (either because it was given explicitly or because it is + the default Parent), the old Parent(s) are now overwritten with the value + belonging to the `parents` key. +- If a registered identifiable states that a reference by a Record with parent + RT1 is needed, references from Records that have a child of RT1 as parent are + now also accepted. ### Deprecated ### ### Removed ### ### Fixed ### +- Empty Records can now be created (https://gitlab.com/caosdb/caosdb-crawler/-/issues/27) + +- [#58](https://gitlab.com/caosdb/caosdb-crawler/-/issues/58) Documentation builds API docs in pipeline now. 
### Security ### diff --git a/README.md b/README.md index 6c94473c066439b1645712c0046cd890b6b38715..39f8d36769a520f35e717d180537a4cce704180c 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,11 @@ -# CaosDB-Crawler ## Welcome -This is the repository of the CaosDB-Crawler, a tool for automatic data -insertion into [CaosDB](https://gitlab.com/caosdb/caosdb-meta). +This is the repository of the LinkAhead Crawler, a tool for automatic data +insertion into [LinkAhead](https://gitlab.com/linkahead/linkahead). This is a new implementation resolving problems of the original implementation -in [caosdb-advancedtools](https://gitlab.com/caosdb/caosdb-advanced-user-tools) +in [LinkAhead Python Advanced User Tools](https://gitlab.com/caosdb/caosdb-advanced-user-tools) ## Setup @@ -16,20 +15,23 @@ setup this code. ## Further Reading -Please refer to the [official documentation](https://docs.indiscale.com/caosdb-crawler/) of the CaosDB-Crawler for more information. +Please refer to the [official documentation](https://docs.indiscale.com/caosdb-crawler/) of the LinkAhead Crawler for more information. ## Contributing -Thank you very much to all contributers—[past, present](https://gitlab.com/caosdb/caosdb/-/blob/dev/HUMANS.md), and prospective ones. +Thank you very much to all contributors—[past, +present](https://gitlab.com/linkahead/linkahead/-/blob/main/HUMANS.md), and prospective +ones. ### Code of Conduct -By participating, you are expected to uphold our [Code of Conduct](https://gitlab.com/caosdb/caosdb/-/blob/dev/CODE_OF_CONDUCT.md). +By participating, you are expected to uphold our [Code of +Conduct](https://gitlab.com/linkahead/linkahead/-/blob/main/CODE_OF_CONDUCT.md). ### How to Contribute * You found a bug, have a question, or want to request a feature? Please -[create an issue](https://gitlab.com/caosdb/caosdb-crawler). +[create an issue](https://gitlab.com/linkahead/linkahead-crawler/-/issues). * You want to contribute code? * **Forking:** Please fork the repository and create a merge request in GitLab and choose this repository as target. Make sure to select "Allow commits from members who can merge the target branch" under @@ -38,9 +40,8 @@ By participating, you are expected to uphold our [Code of Conduct](https://gitla * **Code style:** This project adhers to the PEP8 recommendations, you can test your code style using the `autopep8` tool (`autopep8 -i -r ./`). Please write your doc strings following the [NumpyDoc](https://numpydoc.readthedocs.io/en/latest/format.html) conventions. -* You can also contact us at **info (AT) caosdb.de** and join the - CaosDB community on - [#caosdb:matrix.org](https://matrix.to/#/!unwwlTfOznjEnMMXxf:matrix.org). +* You can also join the LinkAhead community on - [#linkahead:matrix.org](https://matrix.to/#/!unwwlTfOznjEnMMXxf:matrix.org). There is the file `unittests/records.xml` that servers as a dummy for a server state with files. 
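The CHANGELOG entry above on `is_referenced_by` deserves a concrete illustration. The following is a minimal sketch, not part of this diff: it assumes a running LinkAhead instance and made-up RecordType names `RT1` and `RT1Child`, and shows the query semantics that the new `get_children_of_rt()` helper (introduced in `identifiable_adapters.py` further down) relies on.

```python
import linkahead as db

# Hypothetical data model: RT1Child inherits from RT1.
db.RecordType(name="RT1").insert()
db.RecordType(name="RT1Child").add_parent(name="RT1").insert()

# "FIND RECORDTYPE RT1" returns RT1 itself as well as every RecordType that
# has RT1 as a parent; a registered identifiable requiring
# `is_referenced_by: [RT1]` therefore now also accepts references from
# RT1Child Records.
names = [rt.name for rt in db.execute_query("FIND RECORDTYPE RT1")]
assert set(names) == {"RT1", "RT1Child"}
```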
diff --git a/integrationtests/test_issues.py b/integrationtests/test_issues.py index 86ce9307a74606bea03aa83b273de259041abf58..3bdac745f392cf747d4c4a46378047b76b04e2b4 100644 --- a/integrationtests/test_issues.py +++ b/integrationtests/test_issues.py @@ -114,7 +114,7 @@ def test_issue_23(clear_database): assert rec_crawled.get_property("identifying_prop").value == "identifier" assert rec_crawled.get_property("prop_b") is not None assert rec_crawled.get_property("prop_b").value == "something_else" - # no interaction with the database yet, so the rrecord shouldn't have a prop_a yet + # no interaction with the database yet, so the record shouldn't have a prop_a yet assert rec_crawled.get_property("prop_a") is None # synchronize with database and update the record @@ -133,3 +133,78 @@ def test_issue_23(clear_database): "identifying_prop").value == rec_crawled.get_property("identifying_prop").value assert rec_retrieved.get_property( "prop_b").value == rec_crawled.get_property("prop_b").value + + +def test_issue_83(clear_database): + """https://gitlab.com/linkahead/linkahead-crawler/-/issues/83. Test that + names don't need to be unique for referenced entities if they are not part + of the identifiable. + + """ + + # Very simple data model + identifying_prop = db.Property(name="IdentifyingProp", datatype=db.INTEGER).insert() + referenced_type = db.RecordType(name="ReferencedType").add_property( + name=identifying_prop.name, importance=db.OBLIGATORY).insert() + referencing_type = db.RecordType(name="ReferencingType").add_property( + name=referenced_type.name, datatype=db.LIST(referenced_type.name)).insert() + + # Define identifiables. ReferencingType by name, ReferencedType by + # IdentifyingProp and not by name. + ident = CaosDBIdentifiableAdapter() + ident.register_identifiable(referenced_type.name, db.RecordType().add_parent( + name=referenced_type.name).add_property(name=identifying_prop.name)) + ident.register_identifiable(referencing_type.name, db.RecordType().add_parent( + name=referencing_type.name).add_property(name="name")) + + crawler = Crawler(identifiableAdapter=ident) + + ref_target1 = db.Record(name="RefTarget").add_parent( + name=referenced_type.name).add_property(name=identifying_prop.name, value=1) + ref_target2 = db.Record(name="RefTarget").add_parent( + name=referenced_type.name).add_property(name=identifying_prop.name, value=2) + + referencing1 = db.Record(name="Referencing1").add_parent( + name=referencing_type.name).add_property(name=referenced_type.name, value=[ref_target1]) + referencing2 = db.Record(name="Referencing2").add_parent( + name=referencing_type.name).add_property(name=referenced_type.name, value=[ref_target2]) + referencing3 = db.Record(name="Referencing3").add_parent(name=referencing_type.name).add_property( + name=referenced_type.name, value=[ref_target1, ref_target2]) + + records = db.Container().extend( + [ref_target1, ref_target2, referencing1, referencing2, referencing3]) + + ins, ups = crawler.synchronize(crawled_data=records, unique_names=False) + assert len(ins) == len(records) + assert len(ups) == 0 + + retrieved_target1 = db.execute_query( + f"FIND {referenced_type.name} WITH {identifying_prop.name}=1", unique=True) + retrieved_target2 = db.execute_query( + f"FIND {referenced_type.name} WITH {identifying_prop.name}=2", unique=True) + assert retrieved_target2.name == retrieved_target1.name + assert retrieved_target1.name == ref_target1.name + assert retrieved_target1.id != retrieved_target2.id + + retrieved_referencing1 = db.execute_query( + f"FIND 
{referencing_type.name} WITH name={referencing1.name}", unique=True) + assert retrieved_referencing1.get_property(referenced_type.name) is not None + assert retrieved_referencing1.get_property(referenced_type.name).value == [ + retrieved_target1.id] + assert retrieved_referencing1.get_property(referenced_type.name).value != [ + retrieved_target2.id] + + retrieved_referencing2 = db.execute_query( + f"FIND {referencing_type.name} WITH name={referencing2.name}", unique=True) + assert retrieved_referencing2.get_property(referenced_type.name) is not None + assert retrieved_referencing2.get_property(referenced_type.name).value == [ + retrieved_target2.id] + assert retrieved_referencing2.get_property(referenced_type.name).value != [ + retrieved_target1.id] + + retrieved_referencing3 = db.execute_query( + f"FIND {referencing_type.name} WITH name={referencing3.name}", unique=True) + assert retrieved_referencing3.get_property(referenced_type.name) is not None + assert len(retrieved_referencing3.get_property(referenced_type.name).value) == 2 + assert retrieved_target1.id in retrieved_referencing3.get_property(referenced_type.name).value + assert retrieved_target2.id in retrieved_referencing3.get_property(referenced_type.name).value diff --git a/setup.cfg b/setup.cfg index fbbdc1031b4e023b351bd8c331a2aa579e0cf5b9..32edcde630172cb991ea28898ae0c5e9f5770f90 100644 --- a/setup.cfg +++ b/setup.cfg @@ -20,8 +20,8 @@ packages = find: python_requires = >=3.7 install_requires = importlib-resources - caosdb > 0.11.2 caosadvancedtools >= 0.7.0 + linkahead >= 0.13.1 yaml-header-tools >= 0.2.1 pyyaml odfpy #make optional diff --git a/src/caoscrawler/config.py b/src/caoscrawler/config.py index 18993b539a09aa58fa280759333b3e7fd315c5e0..8a5a2c48e714f721855d05d6ae6df2412c27836e 100644 --- a/src/caoscrawler/config.py +++ b/src/caoscrawler/config.py @@ -17,7 +17,7 @@ # along with this program. If not, see <https://www.gnu.org/licenses/>. 
# -import caosdb as db +import linkahead as db DEFAULTS = { "send_crawler_notifications": False, diff --git a/src/caoscrawler/converters.py b/src/caoscrawler/converters.py index 94efb6de13d3ec0867df8585f702ae8d8c79ab8f..5a3f4a090785f762cad49054570d15a62649bfbe 100644 --- a/src/caoscrawler/converters.py +++ b/src/caoscrawler/converters.py @@ -24,29 +24,29 @@ # from __future__ import annotations -from jsonschema import validate, ValidationError -import os -import re import datetime -import caosdb as db import json +import logging +import os +import re import warnings -from .utils import has_parent -from .stores import GeneralStore, RecordStore -from .structure_elements import (StructureElement, Directory, File, DictElement, JSONFile, - IntegerElement, BooleanElement, FloatElement, NoneElement, - TextElement, TextElement, ListElement) -from typing import List, Optional, Tuple, Union from abc import ABCMeta, abstractmethod from string import Template -import yaml_header_tools +from typing import List, Optional, Tuple, Union +import caosdb as db import pandas as pd -import logging - - import yaml +import yaml_header_tools +from jsonschema import ValidationError, validate + +from .stores import GeneralStore, RecordStore +from .structure_elements import (BooleanElement, DictElement, Directory, File, + FloatElement, IntegerElement, JSONFile, + ListElement, NoneElement, StructureElement, + TextElement) +from .utils import has_parent # These are special properties which are (currently) treated differently # by the converters: @@ -235,6 +235,12 @@ def create_records(values: GeneralStore, records: RecordStore, def_records: dict keys_modified = [] for name, record in def_records.items(): + # If only a name was given (Like this: + # Experiment: + # ) set record to an empty dict / empty configuration + if record is None: + record = {} + role = "Record" # This allows us to create e.g. 
Files if "role" in record: @@ -300,6 +306,7 @@ def create_records(values: GeneralStore, records: RecordStore, def_records: dict # no matter whether the record existed in the record store or not, # parents will be added when they aren't present in the record yet: if "parents" in record: + c_record.parents.clear() for parent in record["parents"]: # Do the variables replacement: var_replaced_parent = replace_variables(parent, values) diff --git a/src/caoscrawler/crawl.py b/src/caoscrawler/crawl.py index dd8edd3a7e62c892ab142bc489619c64bd6dc77f..2aeb220cb3279c5bca367305f374218c4ce5c304 100644 --- a/src/caoscrawler/crawl.py +++ b/src/caoscrawler/crawl.py @@ -36,6 +36,7 @@ import importlib import logging import os import sys +import traceback import uuid import warnings from argparse import RawTextHelpFormatter @@ -407,12 +408,12 @@ class Crawler(object): if p.value.path != cached.path: raise RuntimeError( "The cached and the refernced entity are not identical.\n" - f"Cached:\n{cached}\nRefernced:\n{el}" + f"Cached:\n{cached}\nReferenced:\n{el}" ) else: raise RuntimeError( "The cached and the refernced entity are not identical.\n" - f"Cached:\n{cached}\nRefernced:\n{el}" + f"Cached:\n{cached}\nReferenced:\n{el}" ) lst.append(cached) else: @@ -428,12 +429,12 @@ class Crawler(object): if p.value.path != cached.path: raise RuntimeError( "The cached and the refernced entity are not identical.\n" - f"Cached:\n{cached}\nRefernced:\n{p.value}" + f"Cached:\n{cached}\nReferenced:\n{p.value}" ) else: raise RuntimeError( "The cached and the refernced entity are not identical.\n" - f"Cached:\n{cached}\nRefernced:\n{p.value}" + f"Cached:\n{cached}\nReferenced:\n{p.value}" ) p.value = cached @@ -783,6 +784,8 @@ class Crawler(object): for i in reversed(range(len(crawled_data))): if not check_identical(crawled_data[i], identified_records[i]): + logger.debug("Sheduled update because of the folllowing diff:\n" + + str(compare_entities(crawled_data[i], identified_records[i]))) actual_updates.append(crawled_data[i]) return actual_updates @@ -1165,11 +1168,29 @@ def _treat_deprecated_prefix(prefix, remove_prefix): return remove_prefix -def _fix_file_paths(crawled_data, add_prefix, remove_prefix): - """adjust the path according to add_/remove_prefix +def _fix_file_paths(crawled_data: list[db.Entity], + add_prefix: Optional[str], + remove_prefix: Optional[str]): + """ + Adjust the path according to add_/remove_prefix Also remove the `file` attribute from File entities (because inserts need currently be done by loadfiles. + + Arguments: + ------------ + + crawled_data: list[db.Entity] + A list of entities. This list will be searched for instances of db.File. + + add_prefix: Optional[str] + If add_prefix is not None, the given prefix will be added in front of elem.path. + + remove_prefix: Optional[str] + If remove_prefix is not None the given prefix will be removed from the front of + elem.path. In this case a RuntimeError will be raised if any path of a file does + not begin with "remove_prefix". + """ for elem in crawled_data: if isinstance(elem, db.File): @@ -1265,11 +1286,14 @@ def crawler_main(crawled_directory_path: str, whether or not to update or insert entities inspite of name conflicts restricted_path: optional, list of strings Traverse the data tree only along the given path. When the end of the given path - is reached, traverse the full tree as normal. + is reached, traverse the full tree as normal. See docstring of 'scanner' in + module 'scanner' for more details. 
remove_prefix : Optional[str] - remove the given prefix from file paths + Remove the given prefix from file paths. + See docstring of '_fix_file_paths' for more details. add_prefix : Optional[str] - add the given prefix to file paths + Add the given prefix to file paths. + See docstring of '_fix_file_paths' for more details. Returns ------- @@ -1314,14 +1338,17 @@ def crawler_main(crawled_directory_path: str, _update_status_record(crawler.run_id, len(inserts), len(updates), status="OK") return 0 except ForbiddenTransaction as err: + logger.debug(traceback.format_exc()) logger.error(err) _update_status_record(crawler.run_id, 0, 0, status="FAILED") return 1 except ConverterValidationError as err: + logger.debug(traceback.format_exc()) logger.error(err) _update_status_record(crawler.run_id, 0, 0, status="FAILED") return 1 except Exception as err: + logger.debug(traceback.format_exc()) logger.debug(err) if "SHARED_DIR" in os.environ: @@ -1382,12 +1409,18 @@ def parse_args(): def split_restricted_path(path): - elements = [] - while path != "/": - path, el = os.path.split(path) - if el != "": - elements.insert(0, el) - return elements + """ + Split a path string into components separated by os.path.sep (e.g. slashes). + Empty elements will be removed. + """ + # The following old implementation leads to infinite loops + # for "ill-posed" paths (see test_utilities.py): + # elements = [] + # while path != "/": + # path, el = os.path.split(path) + # if el != "": + # elements.insert(0, el) + return [i for i in path.split(os.path.sep) if i != ""] def main(): diff --git a/src/caoscrawler/identifiable_adapters.py b/src/caoscrawler/identifiable_adapters.py index 776baeaeac7caf961d6dba97641804c9e1608114..d9c9c00b22443121b4989bf988a40308b143dbf1 100644 --- a/src/caoscrawler/identifiable_adapters.py +++ b/src/caoscrawler/identifiable_adapters.py @@ -40,6 +40,12 @@ from .utils import has_parent logger = logging.getLogger(__name__) +def get_children_of_rt(rtname): + """Return a list containing the name of the given RecordType and the names of all + its children RTs.""" + return [p.name for p in db.execute_query(f"FIND RECORDTYPE {rtname}")] + + def convert_value(value: Any): """ Returns a string representation of the value that is suitable to be used in the query @@ -212,11 +218,16 @@ identifiabel, identifiable and identified record) for a Record. # TODO: similar to the Identifiable class, Registred Identifiable should be a # separate class too if prop.name.lower() == "is_referenced_by": - for rtname in prop.value: - if (id(record) in referencing_entities - and rtname in referencing_entities[id(record)]): - identifiable_backrefs.extend(referencing_entities[id(record)][rtname]) - else: + for givenrt in prop.value: + rt_and_children = get_children_of_rt(givenrt) + found = False + for rtname in rt_and_children: + if (id(record) in referencing_entities + and rtname in referencing_entities[id(record)]): + identifiable_backrefs.extend( + referencing_entities[id(record)][rtname]) + found = True + if not found: # TODO: is this the appropriate error? raise NotImplementedError( f"The following record is missing an identifying property:" diff --git a/src/caoscrawler/scanner.py b/src/caoscrawler/scanner.py index 54102109d53776c7db21026097cb2f696776650d..6169de213e4a2d33e0329d4c5ed9299392410d2d 100644 --- a/src/caoscrawler/scanner.py +++ b/src/caoscrawler/scanner.py @@ -274,7 +274,7 @@ def scanner(items: list[StructureElement], restricted_path: optional, list of strings, traverse the data tree only along the given path. 
For example, when a directory contains files a, b and c and b is - given in restricted_path, a and c will be ignroed by the crawler. + given as restricted_path, a and c will be ignored by the crawler. When the end of the given path is reached, traverse the full tree as normal. The first element of the list provided by restricted_path should be the name of the StructureElement at this level, i.e. denoting the @@ -318,6 +318,8 @@ def scanner(items: list[StructureElement], converters_path = [] for element in items: + element_path = os.path.join(*(structure_elements_path + [element.get_name()])) + logger.debug(f"Dealing with {element_path}") for converter in converters: # type is something like "matches files", replace isinstance with "type_matches" @@ -330,8 +332,7 @@ def scanner(items: list[StructureElement], record_store_copy = record_store.create_scoped_copy() # Create an entry for this matched structure element that contains the path: - general_store_copy[converter.name] = ( - os.path.join(*(structure_elements_path + [element.get_name()]))) + general_store_copy[converter.name] = element_path # extracts values from structure element and stores them in the # variable store. @@ -385,17 +386,6 @@ def scanner(items: list[StructureElement], for record in scoped_records: crawled_data.append(record) - # TODO: the scoped variables should be cleaned up as soon if the variables - # are no longer in the current scope. This can be implemented as follows, - # but this breaks the test "test_record_structure_generation", because - # some debug info is also deleted. This implementation can be used as soon - # as the remaining problems with the debug_tree are fixed. - # Delete the variables that are no longer needed: - # scoped_names = record_store.get_names_current_scope() - # for name in scoped_names: - # del record_store[name] - # del general_store[name] - return crawled_data @@ -414,9 +404,18 @@ def scan_directory(dirname: str, crawler_definition_path: str, Convenience function that starts the crawler (calls start_crawling) with a single directory as the StructureElement. + Parameters + ---------- + restricted_path: optional, list of strings Traverse the data tree only along the given path. When the end of the given path - is reached, traverse the full tree as normal. + is reached, traverse the full tree as normal. See docstring of 'scanner' for + more details. + + Returns + ------- + crawled_data : list + the final list with the target state of Records. """ crawler_definition = load_definition(crawler_definition_path) @@ -472,7 +471,8 @@ def scan_structure_elements(items: Union[list[StructureElement], StructureElemen file. restricted_path: optional, list of strings Traverse the data tree only along the given path. When the end of the given path - is reached, traverse the full tree as normal. + is reached, traverse the full tree as normal. See docstring of 'scanner' for + more details. Returns ------- diff --git a/src/doc/how-to-upgrade.md b/src/doc/how-to-upgrade.md index 2e26531d27a1cb038afa1b487007157b532a6fb7..30d23f8f3a4ad88f6b3f4fca18013e26fbcb1dc1 100644 --- a/src/doc/how-to-upgrade.md +++ b/src/doc/how-to-upgrade.md @@ -1,5 +1,10 @@ # How to upgrade +## 0.6.x to 0.7.0 +If you added Parents to Records at multiple places in the CFood, you must now +do this at a single location because the `parents` key now overwrites previously +set parents. + ## 0.5.x to 0.6.0 [#41](https://gitlab.com/caosdb/caosdb-crawler/-/issues/41) was fixed. 
This means that you previously used the name of Entities as an identifying diff --git a/unittests/cfood_variable_deletion.yml b/unittests/cfood_variable_deletion.yml new file mode 100644 index 0000000000000000000000000000000000000000..9edfc1b06cdd6f57a52cc71a96306984ee9f2dbe --- /dev/null +++ b/unittests/cfood_variable_deletion.yml @@ -0,0 +1,29 @@ + +Data: + type: Directory + match: (.*) + subtree: + Data_1: + type: Directory + match: ^Data_1$ + subtree: + Subdir: + type: Directory + match: ^(?P<test_1>.*)$ + records: + DummyRecord: + name: "Record from Data_1" + var1: $test_1 + var2: $test_2 + Data_2: + type: Directory + match: ^Data_2$ + subtree: + Subdir: + type: Directory + match: ^(?P<test_2>.*)$ + records: + DummyRecord: + name: "Record from Data_2" + var1: $test_1 + var2: $test_2 diff --git a/unittests/cfood_variable_deletion2.yml b/unittests/cfood_variable_deletion2.yml new file mode 100644 index 0000000000000000000000000000000000000000..729fe519e00323c046d77e93904421c3ba6a666e --- /dev/null +++ b/unittests/cfood_variable_deletion2.yml @@ -0,0 +1,29 @@ + +Data: + type: Directory + match: (?P<test_1>.*) + subtree: + Data_1: + type: Directory + match: ^Data_1$ + subtree: + Subdir: + type: Directory + match: ^(?P<test_1>.*)$ + records: + DummyRecord: + name: "Record from Data_1" + var1: $test_1 + var2: $test_2 + Data_2: + type: Directory + match: ^Data_2$ + subtree: + Subdir: + type: Directory + match: ^(?P<test_2>.*)$ + records: + DummyRecord: + name: "Record from Data_2" + var1: $test_1 + var2: $test_2 diff --git a/unittests/test_crawler.py b/unittests/test_crawler.py index dc53cb099eb5e4b225b13461176504a003f2d2ba..91e0e86a6d6cf2967ab3567a2ef93b7ccde56e64 100644 --- a/unittests/test_crawler.py +++ b/unittests/test_crawler.py @@ -607,7 +607,7 @@ def test_create_flat_list(): assert c in flat -@ pytest.fixture +@pytest.fixture def crawler_mocked_for_backref_test(): crawler = Crawler() # mock retrieval of registered identifiabls: return Record with just a parent @@ -651,6 +651,8 @@ def test_validation_error_print(caplog): caplog.clear() +@patch("caoscrawler.identifiable_adapters.get_children_of_rt", + new=Mock(side_effect=lambda x: [x])) def test_split_into_inserts_and_updates_backref(crawler_mocked_for_backref_test): crawler = crawler_mocked_for_backref_test identlist = [Identifiable(name="A", record_type="BR"), @@ -685,6 +687,8 @@ def test_split_into_inserts_and_updates_backref(crawler_mocked_for_backref_test) assert insert[0].name == "B" +@patch("caoscrawler.identifiable_adapters.get_children_of_rt", + new=Mock(side_effect=lambda x: [x])) def test_split_into_inserts_and_updates_mult_backref(crawler_mocked_for_backref_test): # test whether multiple references of the same record type are correctly used crawler = crawler_mocked_for_backref_test @@ -705,6 +709,8 @@ def test_split_into_inserts_and_updates_mult_backref(crawler_mocked_for_backref_ assert len(insert) == 2 +@patch("caoscrawler.identifiable_adapters.get_children_of_rt", + new=Mock(side_effect=lambda x: [x])) def test_split_into_inserts_and_updates_diff_backref(crawler_mocked_for_backref_test): # test whether multiple references of the different record types are correctly used crawler = crawler_mocked_for_backref_test diff --git a/unittests/test_directories/example_variable_deletion/Data_1/bla/README.md b/unittests/test_directories/example_variable_deletion/Data_1/bla/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git 
a/unittests/test_directories/example_variable_deletion/Data_2/test/README.md b/unittests/test_directories/example_variable_deletion/Data_2/test/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/unittests/test_file_identifiables.py b/unittests/test_file_identifiables.py index 2852b40ffde98180d5dd7b11b9109cc5875502da..4ec02aa3fc497f8dc35adc709533ef5b35066f3a 100644 --- a/unittests/test_file_identifiables.py +++ b/unittests/test_file_identifiables.py @@ -20,6 +20,8 @@ def clear_cache(): cache_clear() +@patch("caoscrawler.identifiable_adapters.get_children_of_rt", + new=Mock(side_effect=lambda x: [x])) @patch("caoscrawler.identifiable_adapters.cached_get_entity_by", new=Mock(side_effect=mock_get_entity_by)) def test_file_identifiable(): diff --git a/unittests/test_parent_cfood.yml b/unittests/test_parent_cfood.yml new file mode 100644 index 0000000000000000000000000000000000000000..b8d0eaf597641d311cb70017dc2bc75c7c3434f3 --- /dev/null +++ b/unittests/test_parent_cfood.yml @@ -0,0 +1,39 @@ +--- +metadata: + crawler-version: 0.6.1 +--- +Definitions: + type: Definitions + +data: + type: Dict + match_name: '.*' + records: + Experiment: + Projekt: + parents: ["project"] + name: "p" + Campaign: + name: "c" + Stuff: + name: "s" + subtree: + Experiment: + type: DictElement + match: '.*' + records: + Experiment: + parents: ["Exp"] + name: "e" + Projekt: + parents: ["Projekt"] + Campaign: + parents: ["Cap"] + Stuff: + name: "s" + Experiment2: + type: DictElement + match: '.*' + records: + Campaign: + parents: ["Cap2"] diff --git a/unittests/test_scanner.py b/unittests/test_scanner.py index 9c271efd29a6b539b3d675f45ab125506292e72a..c0ce736fc4bed18f371f1626b6bc451ee103db49 100644 --- a/unittests/test_scanner.py +++ b/unittests/test_scanner.py @@ -1,11 +1,11 @@ #!/usr/bin/env python3 # encoding: utf-8 # -# This file is a part of the CaosDB Project. +# This file is a part of the LinkAhead Project. # -# Copyright (C) 2023 Indiscale GmbH <info@indiscale.com> -# Copyright (C) 2023 Henrik tom Wörden <h.tomwoerden@indiscale.com> -# 2023 Research Group Biomedical Physics, +# Copyright (C) 2023,2024 Indiscale GmbH <info@indiscale.com> +# Copyright (C) 2023,2024 Henrik tom Wörden <h.tomwoerden@indiscale.com> + 2021-2023 Research Group Biomedical Physics, # Max-Planck-Institute for Dynamics and Self-Organization Göttingen # Alexander Schlemmer <alexander.schlemmer@ds.mpg.de> # @@ -22,7 +22,6 @@ # You should have received a copy of the GNU Affero General Public License # along with this program. If not, see <https://www.gnu.org/licenses/>. # - """ Unit test functions for the scanner. 
""" @@ -256,3 +255,64 @@ def test_record_generation(): persons_found = check_properties(persons, check_props) for f in persons_found: assert f > 0 + + +def test_variable_deletion_problems(): + records = scan_directory(UNITTESTDIR / "test_directories" / "example_variable_deletion", + UNITTESTDIR / "cfood_variable_deletion.yml") + + for record in records: + if record.name == "Record from Data_1": + assert record.get_property("var1").value == "bla" + assert record.get_property("var2").value == "$test_2" + elif record.name == "Record from Data_2": + assert record.get_property("var1").value == "$test_1" + assert record.get_property("var2").value == "test" + else: + raise RuntimeError("Wrong name") + + records = scan_directory(UNITTESTDIR / "test_directories" / "example_variable_deletion", + UNITTESTDIR / "cfood_variable_deletion2.yml") + + # For the following test the order of records is actually important: + assert records[0].name == "Record from Data_1" + assert records[1].name == "Record from Data_2" + for record in records: + if record.name == "Record from Data_1": + assert record.get_property("var1").value == "bla" + assert record.get_property("var2").value == "$test_2" + elif record.name == "Record from Data_2": + assert record.get_property("var1").value == "example_variable_deletion" + assert record.get_property("var2").value == "test" + else: + raise RuntimeError("Wrong name") + + +def test_record_parents(): + """ Test the correct list of returned records by the scanner """ + + data = { + 'Experiments': {} + } + + crawler_definition = load_definition(UNITTESTDIR / "test_parent_cfood.yml") + converter_registry = create_converter_registry(crawler_definition) + + records = scan_structure_elements(DictElement(name="", value=data), crawler_definition, + converter_registry) + assert len(records) == 4 + for rec in records: + if rec.name == 'e': + assert rec.parents[0].name == 'Exp' # default parent was overwritten + assert len(rec.parents) == 1 + elif rec.name == 'c': + assert rec.parents[0].name == 'Cap2' # default parent was overwritten by second + # converter + assert len(rec.parents) == 1 + elif rec.name == 'p': + assert rec.parents[0].name == 'Projekt' # top level set parent was overwritten + assert len(rec.parents) == 1 + elif rec.name == 's': + assert rec.parents[0].name == 'Stuff' # default parent stays if no parent is given on + # lower levels + assert len(rec.parents) == 1 diff --git a/unittests/test_utilities.py b/unittests/test_utilities.py new file mode 100644 index 0000000000000000000000000000000000000000..5a80ab9b230db4540d741bf8fa4f9d11b5158aab --- /dev/null +++ b/unittests/test_utilities.py @@ -0,0 +1,35 @@ +#!/usr/bin/env python3 +# encoding: utf-8 +# +# This file is a part of the CaosDB Project. +# +# Copyright (C) 2023 Alexander Schlemmer <alexander.schlemmer@ds.mpg.de> +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as +# published by the Free Software Foundation, either version 3 of the +# License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see <https://www.gnu.org/licenses/>. 
+# + +from caoscrawler.crawl import split_restricted_path + + +def test_split_restricted_path(): + assert split_restricted_path("") == [] + assert split_restricted_path("/") == [] + assert split_restricted_path("test/") == ["test"] + assert split_restricted_path("/test/") == ["test"] + assert split_restricted_path("test/bla") == ["test", "bla"] + assert split_restricted_path("/test/bla") == ["test", "bla"] + assert split_restricted_path("/test1/test2/bla") == ["test1", "test2", "bla"] + assert split_restricted_path("/test//bla") == ["test", "bla"] + assert split_restricted_path("//test/bla") == ["test", "bla"] + assert split_restricted_path("///test//bla////") == ["test", "bla"]
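As a complement to the unit tests above, here is a hypothetical usage sketch showing how `split_restricted_path` feeds into the `restricted_path` parameter of `scan_directory`. The directory layout and cfood file name are invented for illustration, and a POSIX path separator is assumed.

```python
from caoscrawler.crawl import split_restricted_path
from caoscrawler.scanner import scan_directory

# Empty components from doubled or trailing separators are dropped:
restricted = split_restricted_path("/ExperimentalData//2023/")
assert restricted == ["ExperimentalData", "2023"]

# The crawler then descends only along ExperimentalData/2023; once the end of
# the restricted path is reached, the remaining subtree is crawled as normal.
records = scan_directory("/data", "cfood.yml", restricted_path=restricted)
```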