Create a new scanner module and move functions from crawl module there
Summary
Refactoring of the crawler and scanner:
- I split the crawler into two parts:
  - The scanner module (scanner.py), which walks through the file system and other structure elements and collects all the information into record types.
  - The crawler module (crawl.py), which does everything else that requires interaction with CaosDB.
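To illustrate the division of responsibilities, here is a minimal sketch. It is not the code in this MR: `Record`, `scan_directory`, `Crawler.synchronize` and the `client` object are hypothetical stand-ins. The scanner is a top-level function with no server access, and the crawler only consumes its output.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical stand-in for a CaosDB record: a name plus properties.
    name: str
    properties: dict = field(default_factory=dict)


def scan_directory(dirname):
    """Scanner side: walk the file system and collect records.

    No CaosDB interaction happens here, so this can be unit-tested
    against a temporary directory alone.
    """
    records = []
    for root, _dirs, files in os.walk(dirname):
        # Sort within each directory so the output order is deterministic.
        for filename in sorted(files):
            path = os.path.join(root, filename)
            records.append(Record(name=filename,
                                  properties={"path": path,
                                              "size": os.path.getsize(path)}))
    return records


class Crawler:
    """Crawler side: takes scanned records and talks to the server."""

    def __init__(self, client):
        # `client` is any object with an `insert(record)` method (hypothetical);
        # in the real crawler this role is played by the CaosDB connection.
        self.client = client

    def synchronize(self, records):
        # Send every scanned record to the server and report how many were sent.
        for record in records:
            self.client.insert(record)
        return len(records)
```

Because the scanner never needs a client, its tests only need a directory, while crawler tests can pass a fake client.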
The main steps in refactoring were:
- Extraction of the functions related only to the scanning process.
- Slight renaming and consistency checks.
- Adapting all the tests to the new structure.
Left TODO:
- Fixing the integration tests
Focus
The best way to review this is probably to go through the individual commits. I tried to keep them as fine-grained as possible. Fixing the tests was very repetitive, because the API changed slightly.
There is one thing that might not be solved ideally:
- Setting the runid and crawled_directory attributes. These are member variables that were previously set by functions that now live in the independent scanner module. This might need some cross-checking with @henrik, who probably introduced these variables for logging.
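Since the scanner functions are now top-level and have no `self`, one option is for the crawler to set this run metadata explicitly. A minimal sketch, assuming hypothetical names (`run_id`, `generate_run_id`, `crawl_directory`, and a `scan` callable standing in for a scanner function) and using a random UUID as the run id; the actual attribute names and id scheme in crawl.py may differ.

```python
import uuid


class Crawler:
    """Hypothetical crawler that owns its run metadata itself."""

    def __init__(self):
        self.run_id = None
        self.crawled_directory = None

    def generate_run_id(self):
        # Previously a side effect of scanning; now an explicit crawler step
        # that can also be called on its own (e.g. from tests).
        self.run_id = uuid.uuid4()
        return self.run_id

    def crawl_directory(self, dirname, scan):
        # `scan` only receives the directory; it never touches crawler state.
        self.crawled_directory = dirname
        self.generate_run_id()
        return scan(dirname)
```

Keeping id generation in a separate method means code paths that bypass `crawl_directory` can still obtain a run id by calling it directly.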
Test Environment
Unit tests
I did not run the integration tests manually, but they are probably run by the pipeline.
Check List for the Author
Please prepare your MR for review. Be sure to write a summary and a focus, and create GitLab comments for the reviewer. They should guide the reviewer through the changes, explain your changes, and point out open questions. For further good practices, have a look at our review guidelines.
- All automated tests pass
- Reference related issues
- Up-to-date CHANGELOG.md (or not necessary)
- Up-to-date JSON schema (or not necessary)
- Appropriate user and developer documentation (or not necessary)
  - How do I use the software? Assume "stupid" users.
  - How do I develop or debug the software? Assume novice developers.
- Annotations in code (GitLab comments)
  - Intent of new code
  - Problems with old code
  - Why this implementation?
Check List for the Reviewer
- I understand the intent of this MR
- All automated tests pass
- Up-to-date CHANGELOG.md (or not necessary)
- Appropriate user and developer documentation (or not necessary)
- The test environment setup works and the intended behavior is reproducible in the test environment
- In-code documentation and comments are up-to-date.
- Check: Are there specifications? Are they satisfied?
For further good practices, have a look at our review guidelines.
Activity
requested review from @florian
assigned to @salexan
added 7 commits
- 8cc9c99a - MAIN: renamed load_converters function and removed references to self
- f547fa39 - MAINT: made utility and converter registry functions top level functions without references to self
- 31a6b372 - MAINT: moved main scanner function to scanner module
- 90620c94 - MAIN: refactored scan_structure_elements and scan_directory functions
- 40f3cc5f - MAIN: changed name and name of a parameter of main scanner function
- 9fd76b44 - MAINT: refactored some names in main scanner function
- 47ea54d8 - MAINT: reintroduced the converters path needed for the debug tree
added 1 commit
- 50a18727 - MAINT: moved debug tree from crawl.py to scanner.py and created a new class in module debug tree
added 1 commit
- ef8d6f66 - MAINT: finished refactoring of crawler module
added 2 commits
added 1 commit
- dbeea36b - TST: more small fixes for the integration tests
added 1 commit
- e3bc51fc - FIX: fixed test_usses by introducing a function in the crawl module to generate a run id manually