Commit a8f20d42 authored by Henrik tom Wörden

Merge branch 'dev' into f-fix-merge

parents e3468abc 2aa524bf
5 merge requests: !160 STY: styling, !140 New f fix merge, !136 Revert "Merge branch 'f-overview' into 'dev'", !134 Link "Back to overview", !120 MAINT: Deal with Merge Conflicts of Records in the "split" function
Pipeline #44558 passed
...@@ -7,8 +7,10 @@ RUN apt-get update && \ ...@@ -7,8 +7,10 @@ RUN apt-get update && \
python3-autopep8 \ python3-autopep8 \
python3-pip \ python3-pip \
python3-pytest \ python3-pytest \
python3-sphinx \
tox \ tox \
-y -y
RUN pip3 install recommonmark sphinx-rtd-theme
COPY .docker/wait-for-it.sh /wait-for-it.sh COPY .docker/wait-for-it.sh /wait-for-it.sh
ARG PYLIB ARG PYLIB
ADD https://gitlab.indiscale.com/api/v4/projects/97/repository/commits/${PYLIB} \ ADD https://gitlab.indiscale.com/api/v4/projects/97/repository/commits/${PYLIB} \
......
...@@ -296,8 +296,9 @@ style: ...@@ -296,8 +296,9 @@ style:
pages_prepare: &pages_prepare pages_prepare: &pages_prepare
tags: [ cached-dind ] tags: [ cached-dind ]
stage: deploy stage: deploy
needs: [] needs:
image: $CI_REGISTRY/caosdb/src/caosdb-pylib/testenv:latest - job: build-testenv
image: $CI_REGISTRY_IMAGE
only: only:
refs: refs:
- /^release-.*$/i - /^release-.*$/i
......
...@@ -14,12 +14,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ...@@ -14,12 +14,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
already has a Parent (because it was explicitly given or the default Parent), already has a Parent (because it was explicitly given or the default Parent),
the old Parent(s) are now overwritten with the value belonging to the the old Parent(s) are now overwritten with the value belonging to the
`parents` key. `parents` key.
- If a registered identifiable states that a reference by a Record with parent
RT1 is needed, then references from Records that have a child of RT1 as
parent are now also accepted.
### Deprecated ### ### Deprecated ###
### Removed ### ### Removed ###
### Fixed ### ### Fixed ###
- Empty Records can now be created (https://gitlab.com/caosdb/caosdb-crawler/-/issues/27)
* [#58](https://gitlab.com/caosdb/caosdb-crawler/-/issues/58) Documentation builds API docs in pipeline now.
### Security ### ### Security ###
......
# CaosDB-Crawler
## Welcome ## Welcome
This is the repository of the CaosDB-Crawler, a tool for automatic data This is the repository of the LinkAhead Crawler, a tool for automatic data
insertion into [CaosDB](https://gitlab.com/caosdb/caosdb-meta). insertion into [LinkAhead](https://gitlab.com/linkahead/linkahead).
This is a new implementation resolving problems of the original implementation This is a new implementation resolving problems of the original implementation
in [caosdb-advancedtools](https://gitlab.com/caosdb/caosdb-advanced-user-tools) in [LinkAhead Python Advanced User Tools](https://gitlab.com/caosdb/caosdb-advanced-user-tools)
## Setup ## Setup
...@@ -16,20 +15,23 @@ setup this code. ...@@ -16,20 +15,23 @@ setup this code.
## Further Reading ## Further Reading
Please refer to the [official documentation](https://docs.indiscale.com/caosdb-crawler/) of the CaosDB-Crawler for more information. Please refer to the [official documentation](https://docs.indiscale.com/caosdb-crawler/) of the LinkAhead Crawler for more information.
## Contributing ## Contributing
Thank you very much to all contributors—[past, present](https://gitlab.com/caosdb/caosdb/-/blob/dev/HUMANS.md), and prospective ones. Thank you very much to all contributors—[past,
present](https://gitlab.com/linkahead/linkahead/-/blob/main/HUMANS.md), and prospective
ones.
### Code of Conduct ### Code of Conduct
By participating, you are expected to uphold our [Code of Conduct](https://gitlab.com/caosdb/caosdb/-/blob/dev/CODE_OF_CONDUCT.md). By participating, you are expected to uphold our [Code of
Conduct](https://gitlab.com/linkahead/linkahead/-/blob/main/CODE_OF_CONDUCT.md).
### How to Contribute ### How to Contribute
* You found a bug, have a question, or want to request a feature? Please * You found a bug, have a question, or want to request a feature? Please
[create an issue](https://gitlab.com/caosdb/caosdb-crawler). [create an issue](https://gitlab.com/linkahead/linkahead-crawler/-/issues).
* You want to contribute code? * You want to contribute code?
* **Forking:** Please fork the repository and create a merge request in GitLab and choose this repository as * **Forking:** Please fork the repository and create a merge request in GitLab and choose this repository as
target. Make sure to select "Allow commits from members who can merge the target branch" under target. Make sure to select "Allow commits from members who can merge the target branch" under
...@@ -38,9 +40,8 @@ By participating, you are expected to uphold our [Code of Conduct](https://gitla ...@@ -38,9 +40,8 @@ By participating, you are expected to uphold our [Code of Conduct](https://gitla
* **Code style:** This project adheres to the PEP8 recommendations, you can test your code style * **Code style:** This project adheres to the PEP8 recommendations, you can test your code style
using the `autopep8` tool (`autopep8 -i -r ./`). Please write your doc strings following the using the `autopep8` tool (`autopep8 -i -r ./`). Please write your doc strings following the
[NumpyDoc](https://numpydoc.readthedocs.io/en/latest/format.html) conventions. [NumpyDoc](https://numpydoc.readthedocs.io/en/latest/format.html) conventions.
* You can also contact us at **info (AT) caosdb.de** and join the * You can also join the LinkAhead community on
CaosDB community on [#linkahead:matrix.org](https://matrix.to/#/!unwwlTfOznjEnMMXxf:matrix.org).
[#caosdb:matrix.org](https://matrix.to/#/!unwwlTfOznjEnMMXxf:matrix.org).
There is the file `unittests/records.xml` that serves as a dummy for a server state with files. There is the file `unittests/records.xml` that serves as a dummy for a server state with files.
......
...@@ -114,7 +114,7 @@ def test_issue_23(clear_database): ...@@ -114,7 +114,7 @@ def test_issue_23(clear_database):
assert rec_crawled.get_property("identifying_prop").value == "identifier" assert rec_crawled.get_property("identifying_prop").value == "identifier"
assert rec_crawled.get_property("prop_b") is not None assert rec_crawled.get_property("prop_b") is not None
assert rec_crawled.get_property("prop_b").value == "something_else" assert rec_crawled.get_property("prop_b").value == "something_else"
# no interaction with the database yet, so the rrecord shouldn't have a prop_a yet # no interaction with the database yet, so the record shouldn't have a prop_a yet
assert rec_crawled.get_property("prop_a") is None assert rec_crawled.get_property("prop_a") is None
# synchronize with database and update the record # synchronize with database and update the record
...@@ -133,3 +133,78 @@ def test_issue_23(clear_database): ...@@ -133,3 +133,78 @@ def test_issue_23(clear_database):
"identifying_prop").value == rec_crawled.get_property("identifying_prop").value "identifying_prop").value == rec_crawled.get_property("identifying_prop").value
assert rec_retrieved.get_property( assert rec_retrieved.get_property(
"prop_b").value == rec_crawled.get_property("prop_b").value "prop_b").value == rec_crawled.get_property("prop_b").value
def test_issue_83(clear_database):
"""https://gitlab.com/linkahead/linkahead-crawler/-/issues/83. Test that
names don't need to be unique for referenced entities if they are not part
of the identifiable.
"""
# Very simple data model
identifying_prop = db.Property(name="IdentifyingProp", datatype=db.INTEGER).insert()
referenced_type = db.RecordType(name="ReferencedType").add_property(
name=identifying_prop.name, importance=db.OBLIGATORY).insert()
referencing_type = db.RecordType(name="ReferencingType").add_property(
name=referenced_type.name, datatype=db.LIST(referenced_type.name)).insert()
# Define identifiables. ReferencingType by name, ReferencedType by
# IdentifyingProp and not by name.
ident = CaosDBIdentifiableAdapter()
ident.register_identifiable(referenced_type.name, db.RecordType().add_parent(
name=referenced_type.name).add_property(name=identifying_prop.name))
ident.register_identifiable(referencing_type.name, db.RecordType().add_parent(
name=referencing_type.name).add_property(name="name"))
crawler = Crawler(identifiableAdapter=ident)
ref_target1 = db.Record(name="RefTarget").add_parent(
name=referenced_type.name).add_property(name=identifying_prop.name, value=1)
ref_target2 = db.Record(name="RefTarget").add_parent(
name=referenced_type.name).add_property(name=identifying_prop.name, value=2)
referencing1 = db.Record(name="Referencing1").add_parent(
name=referencing_type.name).add_property(name=referenced_type.name, value=[ref_target1])
referencing2 = db.Record(name="Referencing2").add_parent(
name=referencing_type.name).add_property(name=referenced_type.name, value=[ref_target2])
referencing3 = db.Record(name="Referencing3").add_parent(name=referencing_type.name).add_property(
name=referenced_type.name, value=[ref_target1, ref_target2])
records = db.Container().extend(
[ref_target1, ref_target2, referencing1, referencing2, referencing3])
ins, ups = crawler.synchronize(crawled_data=records, unique_names=False)
assert len(ins) == len(records)
assert len(ups) == 0
retrieved_target1 = db.execute_query(
f"FIND {referenced_type.name} WITH {identifying_prop.name}=1", unique=True)
retrieved_target2 = db.execute_query(
f"FIND {referenced_type.name} WITH {identifying_prop.name}=2", unique=True)
assert retrieved_target2.name == retrieved_target1.name
assert retrieved_target1.name == ref_target1.name
assert retrieved_target1.id != retrieved_target2.id
retrieved_referencing1 = db.execute_query(
f"FIND {referencing_type.name} WITH name={referencing1.name}", unique=True)
assert retrieved_referencing1.get_property(referenced_type.name) is not None
assert retrieved_referencing1.get_property(referenced_type.name).value == [
retrieved_target1.id]
assert retrieved_referencing1.get_property(referenced_type.name).value != [
retrieved_target2.id]
retrieved_referencing2 = db.execute_query(
f"FIND {referencing_type.name} WITH name={referencing2.name}", unique=True)
assert retrieved_referencing2.get_property(referenced_type.name) is not None
assert retrieved_referencing2.get_property(referenced_type.name).value == [
retrieved_target2.id]
assert retrieved_referencing2.get_property(referenced_type.name).value != [
retrieved_target1.id]
retrieved_referencing3 = db.execute_query(
f"FIND {referencing_type.name} WITH name={referencing3.name}", unique=True)
assert retrieved_referencing3.get_property(referenced_type.name) is not None
assert len(retrieved_referencing3.get_property(referenced_type.name).value) == 2
assert retrieved_target1.id in retrieved_referencing3.get_property(referenced_type.name).value
assert retrieved_target2.id in retrieved_referencing3.get_property(referenced_type.name).value
...@@ -20,8 +20,8 @@ packages = find: ...@@ -20,8 +20,8 @@ packages = find:
python_requires = >=3.7 python_requires = >=3.7
install_requires = install_requires =
importlib-resources importlib-resources
caosdb > 0.11.2
caosadvancedtools >= 0.7.0 caosadvancedtools >= 0.7.0
linkahead >= 0.13.1
yaml-header-tools >= 0.2.1 yaml-header-tools >= 0.2.1
pyyaml pyyaml
odfpy #make optional odfpy #make optional
......
...@@ -17,7 +17,7 @@ ...@@ -17,7 +17,7 @@
# along with this program. If not, see <https://www.gnu.org/licenses/>. # along with this program. If not, see <https://www.gnu.org/licenses/>.
# #
import caosdb as db import linkahead as db
DEFAULTS = { DEFAULTS = {
"send_crawler_notifications": False, "send_crawler_notifications": False,
......
...@@ -235,6 +235,12 @@ def create_records(values: GeneralStore, records: RecordStore, def_records: dict ...@@ -235,6 +235,12 @@ def create_records(values: GeneralStore, records: RecordStore, def_records: dict
keys_modified = [] keys_modified = []
for name, record in def_records.items(): for name, record in def_records.items():
# If only a name was given (Like this:
# Experiment:
# ) set record to an empty dict / empty configuration
if record is None:
record = {}
role = "Record" role = "Record"
# This allows us to create e.g. Files # This allows us to create e.g. Files
if "role" in record: if "role" in record:
......
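The hunk above handles record definitions that consist of a name only (`Experiment:` with no body), which a YAML loader typically yields as `None`. A minimal standalone sketch of that normalization follows; the helper name `normalize_record_definitions` is hypothetical and not part of the crawler.

```python
from typing import Dict, Optional


def normalize_record_definitions(
    def_records: Dict[str, Optional[dict]],
) -> Dict[str, dict]:
    """Return a copy in which name-only entries (value None) become empty dicts.

    Hypothetical helper for illustration; the commit performs the same
    replacement inline inside its loop over def_records.
    """
    return {
        name: ({} if record is None else record)
        for name, record in def_records.items()
    }


# Assumed shape of a parsed definition: "Experiment:" without a body is None.
parsed = {"Experiment": None, "Projekt": {"parents": ["project"]}}
normalized = normalize_record_definitions(parsed)

# Later membership checks such as `"role" in record` are now safe.
has_role = "role" in normalized["Experiment"]
```

Without the replacement, a check like `"role" in record` would raise a `TypeError` for the name-only entry.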
...@@ -834,6 +834,8 @@ class Crawler(object): ...@@ -834,6 +834,8 @@ class Crawler(object):
for i in reversed(range(len(crawled_data))): for i in reversed(range(len(crawled_data))):
if not check_identical(crawled_data[i], identified_records[i]): if not check_identical(crawled_data[i], identified_records[i]):
logger.debug("Scheduled update because of the following diff:\n"
+ str(compare_entities(crawled_data[i], identified_records[i])))
actual_updates.append(crawled_data[i]) actual_updates.append(crawled_data[i])
return actual_updates return actual_updates
......
...@@ -40,6 +40,12 @@ from .utils import has_parent ...@@ -40,6 +40,12 @@ from .utils import has_parent
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
def get_children_of_rt(rtname):
"""Supply the name of a recordtype. This name and the name of all children RTs are returned in
a list"""
return [p.name for p in db.execute_query(f"FIND RECORDTYPE {rtname}")]
def convert_value(value: Any): def convert_value(value: Any):
""" Returns a string representation of the value that is suitable """ Returns a string representation of the value that is suitable
to be used in the query to be used in the query
...@@ -212,11 +218,16 @@ identifiabel, identifiable and identified record) for a Record. ...@@ -212,11 +218,16 @@ identifiabel, identifiable and identified record) for a Record.
# TODO: similar to the Identifiable class, Registered Identifiable should be a # TODO: similar to the Identifiable class, Registered Identifiable should be a
# separate class too # separate class too
if prop.name.lower() == "is_referenced_by": if prop.name.lower() == "is_referenced_by":
for rtname in prop.value: for givenrt in prop.value:
rt_and_children = get_children_of_rt(givenrt)
found = False
for rtname in rt_and_children:
if (id(record) in referencing_entities if (id(record) in referencing_entities
and rtname in referencing_entities[id(record)]): and rtname in referencing_entities[id(record)]):
identifiable_backrefs.extend(referencing_entities[id(record)][rtname]) identifiable_backrefs.extend(
else: referencing_entities[id(record)][rtname])
found = True
if not found:
# TODO: is this the appropriate error? # TODO: is this the appropriate error?
raise NotImplementedError( raise NotImplementedError(
f"The following record is missing an identifying property:" f"The following record is missing an identifying property:"
......
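The control flow of the new `is_referenced_by` handling can be sketched in isolation as follows. This is a simplified model, not the crawler's code: `collect_backrefs`, its parameters, and the use of `ValueError` are assumptions made for the example (the diff raises `NotImplementedError` and marks the choice with a TODO).

```python
from typing import Callable, Dict, List


def collect_backrefs(
    required_rts: List[str],
    referencing_by_rt: Dict[str, List[str]],
    get_children: Callable[[str], List[str]],
) -> List[str]:
    """Collect referencing records for each required RecordType.

    A requirement is satisfied if the given type itself or any of its
    child types has at least one referencing record.
    """
    backrefs: List[str] = []
    for given_rt in required_rts:
        found = False
        for rt_name in get_children(given_rt):
            if rt_name in referencing_by_rt:
                backrefs.extend(referencing_by_rt[rt_name])
                found = True
        if not found:
            # Assumption: any exception type would do for this sketch.
            raise ValueError(f"No reference from {given_rt} or its children")
    return backrefs


# Toy hierarchy: RT1Child is a child of RT1 (hypothetical names).
hierarchy = {"RT1": ["RT1", "RT1Child"]}
refs = collect_backrefs(
    ["RT1"],
    {"RT1Child": ["rec_a"]},
    lambda rt: hierarchy.get(rt, [rt]),
)
```

Here the requirement on `RT1` is met by a record whose parent is `RT1Child`, which matches the behaviour described in the changelog entry of this commit.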
...@@ -264,6 +264,8 @@ def scanner(items: list[StructureElement], ...@@ -264,6 +264,8 @@ def scanner(items: list[StructureElement],
converters_path = [] converters_path = []
for element in items: for element in items:
element_path = os.path.join(*(structure_elements_path + [element.get_name()]))
logger.debug(f"Dealing with {element_path}")
for converter in converters: for converter in converters:
# type is something like "matches files", replace isinstance with "type_matches" # type is something like "matches files", replace isinstance with "type_matches"
...@@ -276,8 +278,7 @@ def scanner(items: list[StructureElement], ...@@ -276,8 +278,7 @@ def scanner(items: list[StructureElement],
record_store_copy = record_store.create_scoped_copy() record_store_copy = record_store.create_scoped_copy()
# Create an entry for this matched structure element that contains the path: # Create an entry for this matched structure element that contains the path:
general_store_copy[converter.name] = ( general_store_copy[converter.name] = element_path
os.path.join(*(structure_elements_path + [element.get_name()])))
# extracts values from structure element and stores them in the # extracts values from structure element and stores them in the
# variable store # variable store
......
...@@ -651,6 +651,8 @@ def test_validation_error_print(caplog): ...@@ -651,6 +651,8 @@ def test_validation_error_print(caplog):
caplog.clear() caplog.clear()
@patch("caoscrawler.identifiable_adapters.get_children_of_rt",
new=Mock(side_effect=lambda x: [x]))
def test_split_into_inserts_and_updates_backref(crawler_mocked_for_backref_test): def test_split_into_inserts_and_updates_backref(crawler_mocked_for_backref_test):
crawler = crawler_mocked_for_backref_test crawler = crawler_mocked_for_backref_test
identlist = [Identifiable(name="A", record_type="BR"), identlist = [Identifiable(name="A", record_type="BR"),
...@@ -685,6 +687,8 @@ def test_split_into_inserts_and_updates_backref(crawler_mocked_for_backref_test) ...@@ -685,6 +687,8 @@ def test_split_into_inserts_and_updates_backref(crawler_mocked_for_backref_test)
assert insert[0].name == "B" assert insert[0].name == "B"
@patch("caoscrawler.identifiable_adapters.get_children_of_rt",
new=Mock(side_effect=lambda x: [x]))
def test_split_into_inserts_and_updates_mult_backref(crawler_mocked_for_backref_test): def test_split_into_inserts_and_updates_mult_backref(crawler_mocked_for_backref_test):
# test whether multiple references of the same record type are correctly used # test whether multiple references of the same record type are correctly used
crawler = crawler_mocked_for_backref_test crawler = crawler_mocked_for_backref_test
...@@ -705,6 +709,8 @@ def test_split_into_inserts_and_updates_mult_backref(crawler_mocked_for_backref_ ...@@ -705,6 +709,8 @@ def test_split_into_inserts_and_updates_mult_backref(crawler_mocked_for_backref_
assert len(insert) == 2 assert len(insert) == 2
@patch("caoscrawler.identifiable_adapters.get_children_of_rt",
new=Mock(side_effect=lambda x: [x]))
def test_split_into_inserts_and_updates_diff_backref(crawler_mocked_for_backref_test): def test_split_into_inserts_and_updates_diff_backref(crawler_mocked_for_backref_test):
# test whether multiple references of the different record types are correctly used # test whether multiple references of the different record types are correctly used
crawler = crawler_mocked_for_backref_test crawler = crawler_mocked_for_backref_test
......
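The `@patch` decorators added above replace `get_children_of_rt` with a mock so that the unit tests do not query a server. A small sketch of what `Mock(side_effect=lambda x: [x])` does when called; the variable names are chosen for illustration.

```python
from unittest.mock import Mock

# With a callable side_effect, the mock forwards its arguments to the
# callable and returns the callable's result.
fake_get_children = Mock(side_effect=lambda rtname: [rtname])

# Each RecordType is reported as having no children besides itself.
result = fake_get_children("BR")
```

The mock also records its calls, so a test could additionally check which RecordType names were looked up.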
...@@ -20,6 +20,8 @@ def clear_cache(): ...@@ -20,6 +20,8 @@ def clear_cache():
cache_clear() cache_clear()
@patch("caoscrawler.identifiable_adapters.get_children_of_rt",
new=Mock(side_effect=id))
@patch("caoscrawler.identifiable_adapters.cached_get_entity_by", @patch("caoscrawler.identifiable_adapters.cached_get_entity_by",
new=Mock(side_effect=mock_get_entity_by)) new=Mock(side_effect=mock_get_entity_by))
def test_file_identifiable(): def test_file_identifiable():
......
---
metadata:
crawler-version: 0.6.1
---
Definitions:
type: Definitions
data:
type: Dict
match_name: '.*'
records:
Experiment:
Projekt:
parents: ["project"]
name: "p"
Campaign:
name: "c"
Stuff:
name: "s"
subtree:
Experiment:
type: DictElement
match: '.*'
records:
Experiment:
parents: ["Exp"]
name: "e"
Projekt:
parents: ["Projekt"]
Campaign:
parents: ["Cap"]
Stuff:
name: "s"
Experiment2:
type: DictElement
match: '.*'
records:
Campaign:
parents: ["Cap2"]
# encoding: utf-8
#
# This file is a part of the CaosDB Project.
#
# Copyright (C) 2021 Henrik tom Wörden <h.tomwoerden@indiscale.com>
# 2021-2023 Research Group Biomedical Physics,
# Max-Planck-Institute for Dynamics and Self-Organization Göttingen
# Alexander Schlemmer <alexander.schlemmer@ds.mpg.de>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
import json import json
import logging import logging
...@@ -276,3 +298,33 @@ def test_variable_deletion_problems(): ...@@ -276,3 +298,33 @@ def test_variable_deletion_problems():
assert record.get_property("var2").value == "test" assert record.get_property("var2").value == "test"
else: else:
raise RuntimeError("Wrong name") raise RuntimeError("Wrong name")
def test_record_parents():
""" Test the correct list of returned records by the scanner """
data = {
'Experiments': {}
}
crawler_definition = load_definition(UNITTESTDIR / "test_parent_cfood.yml")
converter_registry = create_converter_registry(crawler_definition)
records = scan_structure_elements(DictElement(name="", value=data), crawler_definition,
converter_registry)
assert len(records) == 4
for rec in records:
if rec.name == 'e':
assert rec.parents[0].name == 'Exp' # default parent was overwritten
assert len(rec.parents) == 1
elif rec.name == 'c':
assert rec.parents[0].name == 'Cap2' # default parent was overwritten by second
# converter
assert len(rec.parents) == 1
elif rec.name == 'p':
assert rec.parents[0].name == 'Projekt' # top level set parent was overwritten
assert len(rec.parents) == 1
elif rec.name == 's':
assert rec.parents[0].name == 'Stuff' # default parent stays if no parent is given on
# lower levels
assert len(rec.parents) == 1
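The parent rules that `test_record_parents` checks can be modelled with a short sketch. This is an assumed simplification of the behaviour described in the changelog (a `parents` key overwrites the existing parents, otherwise the default parent stays); `resolve_parents` is a hypothetical helper, not a crawler function.

```python
from typing import Dict, List, Optional


def resolve_parents(
    record_key: str, definitions: List[Optional[Dict]]
) -> List[str]:
    """Apply record definitions in order and return the final parent list."""
    parents = [record_key]  # default parent: the record's own key
    for definition in definitions:
        # A `parents` key replaces whatever was set before.
        if definition and "parents" in definition:
            parents = list(definition["parents"])
    return parents


# Definitions in the order they appear in the test cfood.
campaign = resolve_parents(
    "Campaign", [{"name": "c"}, {"parents": ["Cap"]}, {"parents": ["Cap2"]}]
)
stuff = resolve_parents("Stuff", [{"name": "s"}, {"name": "s"}])
```

In this model `Campaign` ends up with the parent from the last converter and `Stuff` keeps its default parent, which mirrors the assertions for the records named `c` and `s`.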