Skip to content
Snippets Groups Projects
Commit db13ac4e authored by Henrik tom Wörden's avatar Henrik tom Wörden
Browse files

Merge branch 'dev' into resulttable

parents 9e2c6b31 d5b3e46b
No related branches found
No related tags found
No related merge requests found
Pipeline #14710 failed
This commit is part of merge request !6. Comments created here will be created in the context of that merge request.
Showing
with 258 additions and 36 deletions
...@@ -2,6 +2,8 @@ FROM debian:10 ...@@ -2,6 +2,8 @@ FROM debian:10
RUN apt-get update && \ RUN apt-get update && \
apt-get install \ apt-get install \
curl \ curl \
libhdf5-dev \
pkgconf \
python3 \ python3 \
python3-pip \ python3-pip \
python3-requests \ python3-requests \
...@@ -27,6 +29,6 @@ RUN pip3 install recommonmark sphinx-rtd-theme ...@@ -27,6 +29,6 @@ RUN pip3 install recommonmark sphinx-rtd-theme
COPY . /git COPY . /git
RUN rm -r /git/.git \ RUN rm -r /git/.git \
&& mv /git/.docker/pycaosdb.ini /git/integrationtests && mv /git/.docker/pycaosdb.ini /git/integrationtests
RUN cd /git && pip3 install . RUN cd /git && pip3 install .[h5-crawler]
WORKDIR /git/integrationtests WORKDIR /git/integrationtests
CMD /wait-for-it.sh caosdb-server:10443 -t 500 -- ./test.sh CMD /wait-for-it.sh caosdb-server:10443 -t 500 -- ./test.sh --force
...@@ -58,8 +58,8 @@ test: ...@@ -58,8 +58,8 @@ test:
- cd .docker - cd .docker
- /bin/sh ./run.sh - /bin/sh ./run.sh
- cd .. - cd ..
- docker logs docker_caosdb-server_1 &> ../caosdb_log.txt - docker logs docker_caosdb-server_1 &> caosdb_log.txt
- docker logs docker_sqldb_1 &> ../mariadb_log.txt - docker logs docker_sqldb_1 &> mariadb_log.txt
- docker-compose -f .docker/docker-compose.yml down - docker-compose -f .docker/docker-compose.yml down
- rc=`cat .docker/result` - rc=`cat .docker/result`
- exit $rc - exit $rc
......
...@@ -28,9 +28,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ...@@ -28,9 +28,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Automated documentation builds: `make doc` - Automated documentation builds: `make doc`
- Crawler documentation - Crawler documentation
- Proof-of-concept integration with Bloxberg. - Proof-of-concept integration with Bloxberg.
- Introduce a cfood that can create a Record structure based on the contents of a hdf5 file
h5py is now an optional dependency
### Changed ### ### Changed ###
- identifiables of single CFoods are now treated one after the other. This
allows them to have dependencies among each other if they are ordered
correctly
- identifiables must have at least one property or a name - identifiables must have at least one property or a name
* `caosadvancedtools.serverside.helper.init_data_model` also checks the role * `caosadvancedtools.serverside.helper.init_data_model` also checks the role
and data type of entities. and data type of entities.
...@@ -61,6 +66,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ...@@ -61,6 +66,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
cause an `sqlite3.IntegrityError` if more than one change was cached cause an `sqlite3.IntegrityError` if more than one change was cached
for the same entity. for the same entity.
* #40 Insertion of identifiables with missing obligatory properties * #40 Insertion of identifiables with missing obligatory properties
- Before, a Property with the datatype "LIST(TEXT)" would lead to the creation
of a RecordType. This is fixed now.
* #52 `XLSimporter.read_xls` throwed a wrong error when reading from a file with a wrong ending.
Now, a `DataInconsistencyError` is raised instead of a ValueError.
### Security ### ### Security ###
......
...@@ -21,7 +21,7 @@ ...@@ -21,7 +21,7 @@
# This Makefile is a wrapper for several other scripts. # This Makefile is a wrapper for several other scripts.
.PHONY: help doc install .PHONY: help doc install unittest
help: help:
@echo 'Type `make doc` for documentation, or `make install` for (local) installation.' @echo 'Type `make doc` for documentation, or `make install` for (local) installation.'
...@@ -30,4 +30,7 @@ doc: ...@@ -30,4 +30,7 @@ doc:
$(MAKE) -C src/doc html $(MAKE) -C src/doc html
install: install:
@echo "Not implemented yet, use pip for installation." pip3 install .
unittest:
pytest-3 unittests
...@@ -12,6 +12,11 @@ Dependencies will be installed automatically if you use the below described proc ...@@ -12,6 +12,11 @@ Dependencies will be installed automatically if you use the below described proc
- `caosdb>=0.4.0` - `caosdb>=0.4.0`
- `openpyxl>=3.0.0` - `openpyxl>=3.0.0`
- `xlrd>=1.2.0` - `xlrd>=1.2.0`
- `pandas>=1.2.0`
- `numpy>=1.17.3`
If you want to use the optional h5-crawler the following dependencies will be installed additionally:
- `h5py>=3.3.0`
For testing: For testing:
- `tox` - `tox`
...@@ -21,6 +26,9 @@ For testing: ...@@ -21,6 +26,9 @@ For testing:
- `pip install . --user` - `pip install . --user`
- `pip install tox --user` - `pip install tox --user`
Optional h5-crawler:
- `pip install .[h5-crawler] --user`
## Run Unit Tests ## Run Unit Tests
`tox` `tox`
...@@ -31,9 +39,11 @@ For testing: ...@@ -31,9 +39,11 @@ For testing:
extroot. E.g. `sudo mount -o bind extroot extroot. E.g. `sudo mount -o bind extroot
../../caosdb-deploy/profiles/empty/paths/extroot` (or whatever path ../../caosdb-deploy/profiles/empty/paths/extroot` (or whatever path
the extroot of the empty profile to be used is located at). the extroot of the empty profile to be used is located at).
3. Start an empty (!) CaosDB instance (with the mounted extroot). The 3. Start (or restart) an empty (!) CaosDB instance (with the mounted extroot).
database will be cleared during testing, so it's important to use The database will be cleared during testing, so it's important to use
an empty instance. an empty instance.
Make sure your configuration for the python caosdb module is correct and
allows to connect to the server.
4. Run `test.sh`. Note that this may modify content of the `integrationtest/extroot/` directory. 4. Run `test.sh`. Note that this may modify content of the `integrationtest/extroot/` directory.
## Code Formatting ## Code Formatting
......
...@@ -36,6 +36,8 @@ from caosadvancedtools.scifolder import (AnalysisCFood, ExperimentCFood, ...@@ -36,6 +36,8 @@ from caosadvancedtools.scifolder import (AnalysisCFood, ExperimentCFood,
PublicationCFood, SimulationCFood, PublicationCFood, SimulationCFood,
SoftwareCFood, ResultTableCFood) SoftwareCFood, ResultTableCFood)
from example_hdf5cfood import ExampleH5CFood
try: try:
from sss_helper import get_argument_parser, print_success from sss_helper import get_argument_parser, print_success
except ModuleNotFoundError: except ModuleNotFoundError:
...@@ -89,7 +91,7 @@ if __name__ == "__main__": ...@@ -89,7 +91,7 @@ if __name__ == "__main__":
interactive=False, hideKnown=False, interactive=False, hideKnown=False,
cfood_types=[ExperimentCFood, AnalysisCFood, SoftwareCFood, cfood_types=[ExperimentCFood, AnalysisCFood, SoftwareCFood,
PublicationCFood, SimulationCFood, PublicationCFood, SimulationCFood,
ResultTableCFood]) ExampleH5CFood, ResultTableCFood])
if args.authorize_run: if args.authorize_run:
for run_id in args.authorize_run: for run_id in args.authorize_run:
......
#!/usr/bin/env python3
# encoding: utf-8
#
# ** header v3.0
# This file is a part of the CaosDB Project.
#
# Copyright (C) 2021 IndiScale GmbH <www.indiscale.com>
# Copyright (C) 2021 Henrik tom Wörden <h.tomwoerden@indiscale.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
# ** end header
#
"""
An exemplary definition of a HDF5 CFood for integration testing
"""
import caosdb as db
from caosadvancedtools.cfoods.h5 import H5CFood
from caosadvancedtools.scifolder import ExperimentCFood
from caosadvancedtools.scifolder.generic_pattern import readme_pattern
class ExampleH5CFood(H5CFood):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.root_name = "ExampleH5"
@staticmethod
def get_re():
return ExperimentCFood.get_re()[:-len(readme_pattern)] + r".*\.hdf5"
def create_identifiables(self):
super().create_identifiables()
self.identifiable_root = db.Record()
self.identifiable_root.add_property("hdf5File", self.crawled_file)
self.identifiable_root.add_parent("ExampleH5")
self.identifiables.append(self.identifiable_root)
def special_treatment(self, key, value, dtype):
if key == "attr_data_root":
return "single_attribute", value, dtype
return key, value, dtype
File added
---
responsible:
- Tom Wood
description: Something.
...
...@@ -7,4 +7,5 @@ python3 -m caosadvancedtools.loadFiles /opt/caosdb/mnt/extroot/SimulationData ...@@ -7,4 +7,5 @@ python3 -m caosadvancedtools.loadFiles /opt/caosdb/mnt/extroot/SimulationData
python3 -m caosadvancedtools.loadFiles /opt/caosdb/mnt/extroot/Publications python3 -m caosadvancedtools.loadFiles /opt/caosdb/mnt/extroot/Publications
python3 -m caosadvancedtools.loadFiles /opt/caosdb/mnt/extroot/Software python3 -m caosadvancedtools.loadFiles /opt/caosdb/mnt/extroot/Software
python3 insert_model.py python3 insert_model.py
python3 insert_some.py
python3 crawl.py / python3 crawl.py /
#!/usr/bin/env python3 #!/usr/bin/env python3
import caosdb as db import caosdb as db
import h5py
from caosadvancedtools.cfoods.h5 import H5CFood
from caosadvancedtools.models.data_model import DataModel
from caosadvancedtools.models.parser import parse_model_from_yaml from caosadvancedtools.models.parser import parse_model_from_yaml
model = parse_model_from_yaml("model.yml") model = parse_model_from_yaml("model.yml")
...@@ -9,3 +12,11 @@ if len(db.execute_query("FIND Property alias")) == 0: ...@@ -9,3 +12,11 @@ if len(db.execute_query("FIND Property alias")) == 0:
al = db.Property(name="alias") al = db.Property(name="alias")
al.add_parent(name="name") al.add_parent(name="name")
al.insert() al.insert()
h5model = db.Container()
h5file = h5py.File('extroot/ExperimentalData/2010_TestProject/2019-02-03/hdf5_dummy_file.hdf5', 'r')
H5CFood.create_structure(h5file, create_recordTypes=True, collection=h5model,
root_name="ExampleH5")
print(h5model)
h5model = DataModel(h5model)
h5model.sync_data_model(noquestion=True)
#!/usr/bin/env python3
import caosdb as db
from caosadvancedtools.scifolder.experiment_cfood import dm
# This inserts two identifiables. When no dependencies are possible among
# identifiables, it should not be possible to find both: the experiment
# identifiable would for example not reference the correct project Record
project = db.Record(name='2010_TestProject')
project.add_parent(name=dm.Project)
project.insert()
pers = db.Record()
pers.add_parent("Person")
pers.add_property("lastname", "Wood")
pers.add_property("firstname", "Tom")
pers.insert()
experiment = db.Record()
experiment.add_parent(name=dm.Experiment)
experiment.description = "Something."
experiment.add_property(
name=dm.date, value='2019-02-04')
experiment.add_property(name=dm.Project, value=project)
experiment.add_property(
name="identifier", value="empty_identifier")
experiment.add_property(
name="responsible", value=pers)
experiment.insert(flags={"force-missing-obligatory": "ignore"})
...@@ -9,6 +9,7 @@ Experiment: ...@@ -9,6 +9,7 @@ Experiment:
# TODO empty recommended_properties is a problem # TODO empty recommended_properties is a problem
#recommended_properties: #recommended_properties:
responsible: responsible:
datatype: LIST<Person>
Project: Project:
SoftwareVersion: SoftwareVersion:
recommended_properties: recommended_properties:
...@@ -38,16 +39,16 @@ Person: ...@@ -38,16 +39,16 @@ Person:
email: email:
datatype: TEXT datatype: TEXT
description: 'Email of a Person.' description: 'Email of a Person.'
responsible:
datatype: REFERENCE
revisionOf: revisionOf:
datatype: REFERENCE datatype: REFERENCE
results: results:
datatype: REFERENCE datatype: LIST<REFERENCE>
sources: sources:
datatype: REFERENCE datatype: LIST<REFERENCE>
scripts: scripts:
datatype: REFERENCE datatype: LIST<REFERENCE>
single_attribute:
datatype: LIST<INTEGER>
Simulation: Simulation:
obligatory_properties: obligatory_properties:
date: date:
...@@ -74,3 +75,5 @@ Presentation: ...@@ -74,3 +75,5 @@ Presentation:
Report: Report:
inherit_from_suggested: inherit_from_suggested:
- Publication - Publication
hdf5File:
datatype: REFERENCE
#!/bin/bash #!/bin/bash
if [ "$1" != "--force" ]
then
echo "Warning: For these tests, the whole database will be deleted. Do you want to proceed? (yes/Exit)"
read safety
if [ -z $safety ]
then
echo "Exiting..."
exit 0
elif [ $safety != "yes" ]
then
echo "Exiting..."
exit 0
fi
fi
OUT=/tmp/crawler.output OUT=/tmp/crawler.output
ls ls
cat pycaosdb.ini cat pycaosdb.ini
rm -rf cache.db rm -rf cache.db
set -e
echo "Clearing database" echo "Clearing database"
python3 clear_database.py python3 clear_database.py
echo "Testing crawler without cfoods" echo "Testing crawler without cfoods"
...@@ -19,17 +34,16 @@ echo "Filling the database" ...@@ -19,17 +34,16 @@ echo "Filling the database"
echo "Testing the crawler database" echo "Testing the crawler database"
python3 -m pytest test_crawler_with_cfoods.py python3 -m pytest test_crawler_with_cfoods.py
echo "make a change" echo "make a change"
pushd extroot cd extroot
egrep -liRZ 'A description of another example' . | xargs -0 -l sed -i -e 's/A description of another example/A description of this example/g' egrep -liRZ 'A description of another example' . | xargs -0 -l sed -i -e 's/A description of another example/A description of this example/g'
# remove a file to check that this does not lead to a crawler crash # remove a file to check that this does not lead to a crawler crash
mv DataAnalysis/2010_TestProject/2019-02-03_something/README.xlsx DataAnalysis/2010_TestProject/2019-02-03_something/README.xlsx_back mv DataAnalysis/2010_TestProject/2019-02-03_something/README.xlsx DataAnalysis/2010_TestProject/2019-02-03_something/README.xlsx_back
popd cd ..
echo "run crawler" echo "run crawler"
./crawl.py / | tee $OUT ./crawl.py / | tee $OUT
# rename the moved file # rename the moved file
mv extroot/DataAnalysis/2010_TestProject/2019-02-03_something/README.xlsx_back extroot/DataAnalysis/2010_TestProject/2019-02-03_something/README.xlsx mv extroot/DataAnalysis/2010_TestProject/2019-02-03_something/README.xlsx_back extroot/DataAnalysis/2010_TestProject/2019-02-03_something/README.xlsx
# check whether there was something UNAUTHORIZED # check whether there was something UNAUTHORIZED
set -e
grep "There where unauthorized changes" $OUT grep "There where unauthorized changes" $OUT
# get the id of the run which is the last field of the output string # get the id of the run which is the last field of the output string
RUN_ID=$(grep "run id:" $OUT | awk '{ print $NF }') RUN_ID=$(grep "run id:" $OUT | awk '{ print $NF }')
...@@ -44,9 +58,9 @@ then ...@@ -44,9 +58,9 @@ then
fi fi
set -e set -e
echo "undo changes" echo "undo changes"
pushd extroot cd extroot
egrep -liRZ 'A description of this example' . | xargs -0 -l sed -i -e 's/A description of this example/A description of another example/g' egrep -liRZ 'A description of this example' . | xargs -0 -l sed -i -e 's/A description of this example/A description of another example/g'
popd cd ..
python3 test_table.py python3 test_table.py
# TODO the following test deletes lots of the data inserted by the crawler # TODO the following test deletes lots of the data inserted by the crawler
echo "Testing im and export" echo "Testing im and export"
......
...@@ -26,6 +26,7 @@ import os ...@@ -26,6 +26,7 @@ import os
import unittest import unittest
import caosdb as db import caosdb as db
from caosdb.apiutils import retrieve_entity_with_id
def get_entity_with_id(eid): def get_entity_with_id(eid):
...@@ -34,6 +35,14 @@ def get_entity_with_id(eid): ...@@ -34,6 +35,14 @@ def get_entity_with_id(eid):
class CrawlerTest(unittest.TestCase): class CrawlerTest(unittest.TestCase):
def test_experiment(self): def test_experiment(self):
########################
# # dummy for dependency test experiment # #
########################
exp = db.execute_query(
"FIND Experiment with date=2019-02-04 and identifier=empty_identifier",
unique=True)
######################## ########################
# # first experiment # # # # first experiment # #
######################## ########################
...@@ -489,3 +498,17 @@ class CrawlerTest(unittest.TestCase): ...@@ -489,3 +498,17 @@ class CrawlerTest(unittest.TestCase):
# Should have a description # Should have a description
self.assertIsNotNone(ana.description) self.assertIsNotNone(ana.description)
def test_exampleh5(self):
examp = db.execute_query("FIND Record ExampleH5", unique=True)
for prop in examp.properties:
if prop.name == 'group_level1_a':
self.assertTrue(retrieve_entity_with_id(prop.value).get_property("group_level2_aa") is not None)
self.assertTrue(retrieve_entity_with_id(prop.value).get_property("group_level1_a") is None)
elif prop.name == 'group_level1_b':
self.assertTrue(retrieve_entity_with_id(prop.value).get_property("level1_b_floats") is not None)
elif prop.name == 'group_level1_c':
self.assertTrue(retrieve_entity_with_id(prop.value).get_property("level1_c_floats") is not None)
elif prop.name == 'root_integers':
self.assertTrue(retrieve_entity_with_id(prop.value).get_property("single_attribute") is not None)
...@@ -33,13 +33,6 @@ class DataModelTest(unittest.TestCase): ...@@ -33,13 +33,6 @@ class DataModelTest(unittest.TestCase):
rt = db.execute_query("FIND RECORDTYPE TestRecord", unique=True) rt = db.execute_query("FIND RECORDTYPE TestRecord", unique=True)
assert rt.get_property("test") is not None assert rt.get_property("test") is not None
def tearDown(self):
try:
tests = db.execute_query("FIND test*")
tests.delete()
except Exception:
pass
def test_missing(self): def test_missing(self):
# Test sync with missing prop # Test sync with missing prop
# insert propt # insert propt
...@@ -52,3 +45,19 @@ class DataModelTest(unittest.TestCase): ...@@ -52,3 +45,19 @@ class DataModelTest(unittest.TestCase):
dm.sync_data_model(noquestion=True) dm.sync_data_model(noquestion=True)
rt = db.execute_query("FIND RECORDTYPE TestRecord", unique=True) rt = db.execute_query("FIND RECORDTYPE TestRecord", unique=True)
assert rt.get_property("testproperty") is not None assert rt.get_property("testproperty") is not None
def test_get_existing_entities(self):
db.RecordType(name="TestRecord").insert()
c = db.Container().extend([
db.Property(name="test"),
db.RecordType(name="TestRecord")])
exist = DataModel.get_existing_entities(c)
assert len(exist) == 1
assert exist[0].name == "TestRecord"
def tearDown(self):
try:
tests = db.execute_query("FIND test*")
tests.delete()
except Exception:
pass
...@@ -3,15 +3,14 @@ import os ...@@ -3,15 +3,14 @@ import os
from tempfile import TemporaryDirectory from tempfile import TemporaryDirectory
import caosdb as db import caosdb as db
from caosadvancedtools.export_related import export_related_to
from caosadvancedtools.export_related import export
from caosadvancedtools.import_from_xml import import_xml from caosadvancedtools.import_from_xml import import_xml
if __name__ == "__main__": if __name__ == "__main__":
print("Conducting im- and export tests") print("Conducting im- and export tests")
rec = db.execute_query("FIND 2019-02-03_really_cool_finding", unique=True) rec = db.execute_query("FIND 2019-02-03_really_cool_finding", unique=True)
directory = TemporaryDirectory() directory = TemporaryDirectory()
export(rec.id, directory=directory.name) export_related_to(rec.id, directory=directory.name)
# delete everything # delete everything
recs = db.execute_query("FIND entity with id>99") recs = db.execute_query("FIND entity with id>99")
recs.delete() recs.delete()
......
...@@ -157,12 +157,15 @@ def setup_package(): ...@@ -157,12 +157,15 @@ def setup_package():
install_requires=["caosdb>=0.4.0", install_requires=["caosdb>=0.4.0",
"openpyxl>=3.0.0", "openpyxl>=3.0.0",
"pandas>=1.2.0", "pandas>=1.2.0",
"numpy>=1.17.3",
"xlrd>=2.0", "xlrd>=2.0",
], ],
extras_require={"h5-crawler": ["h5py>=3.3.0", ],
},
packages=find_packages('src'), packages=find_packages('src'),
package_dir={'': 'src'}, package_dir={'': 'src'},
setup_requires=["pytest-runner>=2.0,<3dev"], setup_requires=["pytest-runner>=2.0,<3dev"],
tests_require=["pytest", "pytest-cov", "coverage>=4.4.2"], tests_require=["pytest", "pytest-pythonpath", "pytest-cov", "coverage>=4.4.2"],
) )
try: try:
setup(**metadata) setup(**metadata)
......
...@@ -32,6 +32,8 @@ from hashlib import sha256 ...@@ -32,6 +32,8 @@ from hashlib import sha256
import caosdb as db import caosdb as db
from lxml import etree from lxml import etree
import tempfile
def put_in_container(stuff): def put_in_container(stuff):
if isinstance(stuff, list): if isinstance(stuff, list):
...@@ -154,7 +156,9 @@ class UpdateCache(Cache): ...@@ -154,7 +156,9 @@ class UpdateCache(Cache):
def __init__(self, db_file=None): def __init__(self, db_file=None):
if db_file is None: if db_file is None:
db_file = "/tmp/crawler_update_cache.db" tmppath = tempfile.gettempdir()
tmpf = os.path.join(tmppath, "crawler_update_cache.db")
db_file = tmpf
super().__init__(db_file=db_file) super().__init__(db_file=db_file)
@staticmethod @staticmethod
......
...@@ -152,9 +152,19 @@ fileguide = FileGuide() ...@@ -152,9 +152,19 @@ fileguide = FileGuide()
class AbstractCFood(object, metaclass=ABCMeta): class AbstractCFood(object, metaclass=ABCMeta):
""" Abstract base class for Crawler food (CFood)."""
def __init__(self, item): def __init__(self, item):
""" Abstract base class for Crawler food (CFood).""" """A CFood has two main methods which must be customized:
1. `create_identifiables`
This method defines (and inserts if necessary) the identifiables which may be updated at a
later stage. After calling this method, the `identifiables` Container contains those
Records which will be updated at a later time.
2. `update_identifiables`
This method updates the stored identifiables as necessary.
"""
self.to_be_updated = db.Container() self.to_be_updated = db.Container()
self.identifiables = db.Container() self.identifiables = db.Container()
self.item = item self.item = item
...@@ -298,7 +308,7 @@ class AbstractFileCFood(AbstractCFood): ...@@ -298,7 +308,7 @@ class AbstractFileCFood(AbstractCFood):
super().__init__(*args, item=crawled_path, **kwargs) super().__init__(*args, item=crawled_path, **kwargs)
self._crawled_file = None self._crawled_file = None
self.crawled_path = crawled_path self.crawled_path = crawled_path
self.match = re.match(type(self).get_re(), crawled_path) self.match = re.match(self.get_re(), crawled_path)
self.attached_filenames = [] self.attached_filenames = []
@property @property
...@@ -309,7 +319,31 @@ class AbstractFileCFood(AbstractCFood): ...@@ -309,7 +319,31 @@ class AbstractFileCFood(AbstractCFood):
return self._crawled_file return self._crawled_file
@staticmethod @staticmethod
def get_re(): def re_from_extensions(extensions):
"""Return a regular expression which matches the given file extensions.
Useful for inheriting classes.
Parameters
----------
extensions : iterable<str>
An iterable with the allowed extensions.
Returns
-------
out : str
The regular expression, starting with ``.*\\.`` and ending with the EOL dollar
character. The actual extension will be accessible in the
:py:attribute:`pattern group name<python:re.Pattern.groupindexe>` ``ext``.
"""
if not extensions:
return None
return r".*\.(?P<ext>" + "|".join(extensions) + ")$"
@classmethod
def get_re(cls):
""" Returns the regular expression used to identify files that shall be """ Returns the regular expression used to identify files that shall be
processed processed
...@@ -377,6 +411,7 @@ def assure_object_is_in_list(obj, containing_object, property_name, ...@@ -377,6 +411,7 @@ def assure_object_is_in_list(obj, containing_object, property_name,
if containing_object.get_property(property_name) is None: if containing_object.get_property(property_name) is None:
containing_object.add_property(property_name, value=[], containing_object.add_property(property_name, value=[],
datatype=datatype) datatype=datatype)
# TODO: case where multiple times the same property exists is not treated
if not isinstance(containing_object.get_property(property_name).value, list): if not isinstance(containing_object.get_property(property_name).value, list):
containing_object.get_property(property_name).value = [ containing_object.get_property(property_name).value = [
...@@ -628,7 +663,12 @@ def assure_has_property(entity, name, value, to_be_updated=None, ...@@ -628,7 +663,12 @@ def assure_has_property(entity, name, value, to_be_updated=None,
value = value.id value = value.id
for el in possible_properties: for el in possible_properties:
if el.value == value: tmp_value = el.value
if isinstance(tmp_value, db.Entity):
tmp_value = el.value.id
if tmp_value == value:
contained = True contained = True
break break
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment