
Create a new scanner module and move functions from crawl module there

Alexander Schlemmer requested to merge f-refactor-scanner-crawler into dev

Summary

Refactoring of the crawler and scanner:

  • I split the crawler into two parts:
    • A scanner module that walks through the file system and other structure elements and collects all the information into record types.
    • The crawler module (crawl.py), which does everything else that needs CaosDB interaction.
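
The split described above can be sketched roughly as follows. This is only an illustration of the separation of concerns, not the code in this MR: `Record`, `scan_directory`, the `Crawler` constructor argument `sync`, and the collected properties are all hypothetical names chosen for the example, and the server interaction is replaced by a plain callable.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a scanned record (not the CaosDB class)."""
    name: str
    properties: dict = field(default_factory=dict)


def scan_directory(root):
    """Scanner part: walk the file system and collect records.

    This function never talks to a server, so it can be tested in isolation.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            records.append(
                Record(name=filename,
                       properties={"path": path,
                                   "size": os.path.getsize(path)}))
    return records


class Crawler:
    """Crawler part: everything that needs interaction with the server."""

    def __init__(self, sync):
        # `sync` is a hypothetical callable standing in for the server
        # interaction (insert/update of a single record).
        self.sync = sync

    def crawl(self, root):
        # Delegate the scanning, then hand every record to the server side.
        records = scan_directory(root)
        for record in records:
            self.sync(record)
        return len(records)
```

With this layout, tests for the scanning logic only need a directory on disk, while tests for the crawler can pass in a fake `sync` callable.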

The main steps in refactoring were:

  • Extraction of the functions related only to the scanning process.
  • Slight renaming and consistency checks.
  • Adapting all the tests to the new structure.

Left TODO:

  • Fixing the integration tests

Focus

The best way to review this is probably to go through the individual commits, which I tried to keep as fine-grained as possible. Fixing the tests was very repetitive because the API changed slightly.

One thing might not be solved ideally:

  • Setting the runid and crawled_directory attributes. These are member variables that were previously set by functions that now live in the independent scanner module. This might need some cross-checking with @henrik, who probably introduced these variables for logging.
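
One possible way to handle this, shown only as a sketch for discussion: the crawler sets both attributes itself before delegating to a stateless scanner function. The names `scan_directory` and `crawl_directory`, and the use of a UUID for `runid`, are assumptions made for this example and may not match the actual code.

```python
import os
import uuid


def scan_directory(directory):
    """Hypothetical stateless scanner function: returns the entries found."""
    return sorted(os.listdir(directory))


class Crawler:
    def __init__(self):
        # Attributes used for logging; unset until a crawl starts.
        self.runid = None
        self.crawled_directory = None

    def crawl_directory(self, directory):
        # The crawler records its own state first (assumption: a fresh
        # UUID per run), then calls the scanner, which stays stateless.
        self.runid = uuid.uuid4()
        self.crawled_directory = directory
        return scan_directory(directory)
```

This keeps the scanner module free of crawler state, at the cost of having to call the scanner through the crawler whenever the logging attributes are needed.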

Test Environment

Unit tests

I did not run the integration tests manually; the pipeline probably runs them.

Check List for the Author

Please prepare your MR for a review. Be sure to write a summary and a focus and create GitLab comments for the reviewer. They should guide the reviewer through the changes, explain your changes and also point out open questions. For further good practices have a look at our review guidelines.

  • All automated tests pass
  • Reference related issues
  • Up-to-date CHANGELOG.md (or not necessary)
  • Up-to-date JSON schema (or not necessary)
  • Appropriate user and developer documentation (or not necessary)
    • How do I use the software? Assume "stupid" users.
    • How do I develop or debug the software? Assume novice developers.
  • Annotations in code (GitLab comments)
    • Intent of new code
    • Problems with old code
    • Why this implementation?

Check List for the Reviewer

  • I understand the intent of this MR
  • All automated tests pass
  • Up-to-date CHANGELOG.md (or not necessary)
  • Appropriate user and developer documentation (or not necessary)
  • The test environment setup works and the intended behavior is reproducible in the test environment
  • In-code documentation and comments are up-to-date.
  • Check: Are there specifications? Are they satisfied?

For further good practices have a look at our review guidelines.

Edited by Florian Spreckelsen
