Create a new scanner module and move functions from the crawl module there
Summary
Refactoring of the crawler and scanner:
- I basically split the crawler into two parts (see the sketch below):
  - A scanner module that walks through the file system and other structure elements and collects all the information into record types.
  - The crawler module (crawl.py), which does everything else that needs a CaosDB interaction.
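To make the intended division of responsibilities easier to review, here is a minimal sketch of how the two modules could interact. All class and function names below (ScanResult, scan_directory, synchronize) are illustrative assumptions, not necessarily the actual API introduced in this MR.

```python
# Sketch only: names are assumptions to illustrate the split.
from dataclasses import dataclass, field
from typing import List

import caosdb as db


@dataclass
class ScanResult:
    """Plain data collected by the scanner; no server interaction involved."""
    records: List[db.Record] = field(default_factory=list)
    crawled_directory: str = ""


def scan_directory(dirname: str, crawler_definition_path: str) -> ScanResult:
    """Scanner module: walk the file system / structure elements and
    collect everything into record types."""
    result = ScanResult(crawled_directory=dirname)
    # ... walk structure elements and append db.Record objects to result.records ...
    return result


class Crawler:
    """crawl.py: everything that needs a CaosDB interaction."""

    def synchronize(self, scan_result: ScanResult):
        # Compare scan_result.records with the server state and
        # insert or update records accordingly.
        ...
```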
The main steps in the refactoring were:
- Extraction of the functions related only to the scanning process.
- Slight renaming and consistency checks.
- Adapting all the tests to the new structure (see the example below).
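Since the slight API change is what makes the test fixes repetitive, here is a hedged example of how a test call site might change. The module path and function names are assumptions for illustration, based on the scan_directory sketched above.

```python
# Before the split (roughly): tests went through the crawler object, e.g.
#
#     crawler = Crawler()
#     records = crawler.crawl_directory(test_dir, cfood_path)
#
# After the split, the scanning part can be tested without any CaosDB
# connection (names remain assumptions):
from scanner import scan_directory  # hypothetical import path


def test_scan_collects_records(tmp_path):
    result = scan_directory(str(tmp_path), "cfood.yml")
    # an empty temporary directory should not produce any records
    assert result.records == []
```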
Left TODO:
- Fixing the integration tests
Focus
The best procedure for the review is probably to go through the individual commits; I tried to keep them as fine-grained as possible. Fixing the tests was actually very repetitive, because the API changed slightly.
There is one thing whose solution might not be completely ideal:
- The setting of the runid and crawled_directory attributes (a possible direction is sketched below). These are member variables that were previously set by functions which are now contained in the independent scanner module. This might need some cross-checking with @henrik, who probably introduced these variables for logging.
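Purely as a sketch of one possible resolution, using the attribute names mentioned above and otherwise assumed names: keep the scanner stateless and let the crawler set these member variables itself around its call into the scanner module.

```python
# Sketch under the assumption that runid / crawled_directory are only needed
# for logging on the crawler side; import path and function names are illustrative.
import uuid

from scanner import scan_directory  # hypothetical import path


class Crawler:
    def __init__(self):
        self.runid = None
        self.crawled_directory = None

    def crawl_directory(self, dirname: str, crawler_definition_path: str):
        # The crawler records the metadata that used to be set inside the
        # (now extracted) scanning functions, so logging keeps working.
        self.runid = uuid.uuid1()
        self.crawled_directory = dirname
        return scan_directory(dirname, crawler_definition_path)
```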
Test Environment
Unit tests
I did not run the integration tests manually, but this is probably done by the pipeline.
Check List for the Author
Please prepare your MR for a review. Be sure to write a summary and a focus and create GitLab comments for the reviewer. These should guide the reviewer through the changes, explain your changes, and also point out open questions. For further good practices have a look at our review guidelines.
- All automated tests pass
- Reference related issues
- Up-to-date CHANGELOG.md (or not necessary)
- Up-to-date JSON schema (or not necessary)
- Appropriate user and developer documentation (or not necessary)
  - How do I use the software? Assume "stupid" users.
  - How do I develop or debug the software? Assume novice developers.
- Annotations in code (GitLab comments)
  - Intent of new code
  - Problems with old code
  - Why this implementation?
Check List for the Reviewer
- I understand the intent of this MR
- All automated tests pass
- Up-to-date CHANGELOG.md (or not necessary)
- Appropriate user and developer documentation (or not necessary)
- The test environment setup works and the intended behavior is reproducible in the test environment
- In-code documentation and comments are up-to-date
- Check: Are there specifications? Are they satisfied?
For further good practices have a look at our review guidelines.