
ENH: Allow crawler_main to operate on a list of paths

Merged Florian Spreckelsen requested to merge f-unify-notifications into dev

Summary

For https://gitlab.indiscale.com/caosdb/customers/umg/management/-/issues/235. Allow a list of directories for crawler_main and scan_directory so that we can loop over them without creating individual status records and mail notifications.

Focus

The change essentially exploits the fact that the underlying scan_structure_element function already supports a list of StructureElements.
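The idea can be sketched in isolation. This is a hedged illustration of the new call pattern, not the MR's actual code: the function name `scan_paths` and its return value are made up for the example; only the normalization idea is taken from the MR.

```python
from pathlib import Path

def scan_paths(paths):
    """Illustrative stand-in for the changed scan_directory signature.

    The names here are hypothetical. The point of the MR is that the
    underlying scanner already accepts a list of structure elements, so a
    single path only needs to be wrapped in a list before scanning. One
    combined scan over all directories then produces one status record and
    one mail notification instead of one per directory.
    """
    if not isinstance(paths, list):
        paths = [paths]
    return [Path(p).name for p in paths]

print(scan_paths("data/dir1"))                  # ['dir1']
print(scan_paths(["data/dir1", "data/dir2"]))   # ['dir1', 'dir2']
```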

Test Environment

New integration/system test should be sufficient.

Check List for the Author

Please prepare your MR for review. Be sure to write a summary and a focus, and create GitLab comments for the reviewer. These should guide the reviewer through the changes, explain them, and point out open questions. For further good practices, have a look at our review guidelines.

  • All automated tests pass
  • Reference related issues
  • Up-to-date CHANGELOG.md (or not necessary)
  • Up-to-date JSON schema (or not necessary)
  • Appropriate user and developer documentation (or not necessary)
    • How do I use the software? Assume "stupid" users.
    • How do I develop or debug the software? Assume novice developers.
  • Annotations in code (Gitlab comments)
    • Intent of new code
    • Problems with old code
    • Why this implementation?

Check List for the Reviewer

  • I understand the intent of this MR
  • All automated tests pass
  • Up-to-date CHANGELOG.md (or not necessary)
  • Appropriate user and developer documentation (or not necessary)
  • The test environment setup works and the intended behavior is reproducible in the test environment
  • In-code documentation and comments are up-to-date.
  • Check: Are there specifications? Are they satisfied?

For further good practices have a look at our review guidelines.

Edited by Henrik tom Wörden

Merge request reports


Activity

  • Diff excerpt from the new integration test (the two overlapping review excerpts merged, GitLab line numbers stripped):

        def test_list_of_paths(clear_database, monkeypatch):

            # Mock the status record
            dummy_status = {
                "n_calls": 0
            }

            def _mock_update_status_record(run_id, n_inserts, n_updates, status):
                print("Update mocked status")
                dummy_status["run_id"] = run_id
                dummy_status["n_inserts"] = n_inserts
                dummy_status["n_updates"] = n_updates
                dummy_status["status"] = status
                dummy_status["n_calls"] += 1
            monkeypatch.setattr(crawl, "_update_status_record", _mock_update_status_record)

            # mock SSS environment
            monkeypatch.setenv("SHARED_DIR", "/tmp")
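The recording pattern in the mocked status function can be shown in isolation. The sketch below is self-contained (no pytest, no crawler imports); a mutable dict acts as a spy that the replacement function writes into, and the call arguments are made-up example values.

```python
# Self-contained sketch of the spy pattern used in the test above: a mutable
# dict records every call made to the mocked _update_status_record. In the
# real test, monkeypatch.setattr() swaps this function in for the original,
# so the crawler's status-record writes land in the dict instead.
dummy_status = {"n_calls": 0}

def _mock_update_status_record(run_id, n_inserts, n_updates, status):
    dummy_status["run_id"] = run_id
    dummy_status["n_inserts"] = n_inserts
    dummy_status["n_updates"] = n_updates
    dummy_status["status"] = status
    dummy_status["n_calls"] += 1

# Calling it directly shows what the test can later assert on:
_mock_update_status_record("run-42", 5, 2, "OK")
print(dummy_status["n_calls"])  # 1
print(dummy_status["status"])   # OK
```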
  • Diff excerpt from the new test for unsupported security modes (GitLab line numbers stripped):

        def test_not_implemented_list_with_authorization(caplog, clear_database):

            rt = db.RecordType(name="TestType").insert()
            basepath = INTTESTDIR / "test_data" / "crawler_main_with_list_of_dirs"
            dirlist = [basepath / "dir1", basepath / "dir2"]

            # This is not implemented yet, so check log for correct error.
            ret = crawler_main(
                dirlist,
                cfood_file_name=basepath / "cfood.yml",
                identifiables_definition_file=basepath / "identifiable.yml",
                securityMode=SecurityMode.RETRIEVE
            )
            # crawler_main hides the error, but has a non-zero return code and
            # errors in the log:
            assert ret != 0
  • 1115 1121 crawler.run_id)
    1116 1122 _update_status_record(crawler.run_id, len(inserts), len(updates), status="OK")
    1117 1123 return 0
    1118 except ForbiddenTransaction as err:
    1119 logger.debug(traceback.format_exc())
    1120 logger.error(err)
    1121 _update_status_record(crawler.run_id, 0, 0, status="FAILED")
    1122 return 1
    1123 except ConverterValidationError as err:
    1124 except Exception as err:
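The control flow of this error handling can be sketched generically. Everything below is illustrative: the exception class and the helper names merely mirror the excerpt, they are not imported from the crawler, and `run_with_status` is a made-up wrapper for the example.

```python
import logging
import traceback

logger = logging.getLogger("crawler_sketch")

class ForbiddenTransaction(Exception):
    """Stand-in for the crawler's exception type (illustrative only)."""

def run_with_status(run, update_status):
    # Mirrors the pattern in the excerpt: on success the status record is
    # set to OK and 0 is returned; on a forbidden transaction the error is
    # logged (not re-raised), the record is set to FAILED, and a non-zero
    # return code signals failure to the caller.
    try:
        n_inserts, n_updates = run()
        update_status(n_inserts, n_updates, status="OK")
        return 0
    except ForbiddenTransaction as err:
        logger.debug(traceback.format_exc())
        logger.error(err)
        update_status(0, 0, status="FAILED")
        return 1

recorded = {}

def update_status(n_inserts, n_updates, status):
    recorded.update(n_inserts=n_inserts, n_updates=n_updates, status=status)

def failing_run():
    raise ForbiddenTransaction("no write permission in RETRIEVE mode")

print(run_with_status(failing_run, update_status))  # 1
print(recorded["status"])                           # FAILED
```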
  • 455 457 if not dirname:
    456 458 raise ValueError(
    457 459 "You have to provide a non-empty path for crawling.")
    458 dir_structure_name = os.path.basename(dirname)
    460 if not isinstance(dirname, list):
  • Florian Spreckelsen marked the checklist item Annotations in code (Gitlab comments) as completed


  • added 5 commits


  • added 1 commit

    • 3ebadf37 - MAINT: Use platform-independent tmp and paths


  • requested review from @henrik

  • Henrik tom Wörden marked the checklist item I understand the intent of this MR as completed

