ENH: Allow crawler_main to operate on a list of paths
Summary
For https://gitlab.indiscale.com/caosdb/customers/umg/management/-/issues/235. Allow crawler_main and scan_directory to accept a list of directories, so that we can loop over them without creating individual status records and mail notifications for each one.
Focus
This change essentially relies on the fact that the underlying scan_structure_element function already supports a list of StructureElements.
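Conceptually, the change only widens the entry point. A toy sketch of the idea (simplified stand-in, not the real caoscrawler API):

```python
def scan_structure_elements(elements):
    # Toy stand-in for the scanner loop: it already iterates over a
    # list of structure elements, so passing several directories just
    # means the list has more than one entry.
    records = []
    for element in elements:
        # the real scanner would apply the cfood converters here
        records.append(f"scanned: {element}")
    return records

# a single directory and a list of directories take the same code path
print(scan_structure_elements(["dir1"]))
print(scan_structure_elements(["dir1", "dir2"]))
```

Because both cases go through the same loop, only one status record and one mail notification are produced per crawler run, regardless of how many directories are scanned.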
Test Environment
A new integration/system test should be sufficient.
Check List for the Author
Please prepare your MR for a review. Be sure to write a summary and a focus, and create Gitlab comments for the reviewer. They should guide the reviewer through the changes, explain your changes, and also point out open questions. For further good practices, have a look at our review guidelines.
- [ ] All automated tests pass
- [ ] Reference related issues
- [ ] Up-to-date CHANGELOG.md (or not necessary)
- [ ] Up-to-date JSON schema (or not necessary)
- [ ] Appropriate user and developer documentation (or not necessary)
  - How do I use the software? Assume "stupid" users.
  - How do I develop or debug the software? Assume novice developers.
- [ ] Annotations in code (Gitlab comments)
  - Intent of new code
  - Problems with old code
  - Why this implementation?
Check List for the Reviewer
- [ ] I understand the intent of this MR
- [ ] All automated tests pass
- [ ] Up-to-date CHANGELOG.md (or not necessary)
- [ ] Appropriate user and developer documentation (or not necessary)
- [ ] The test environment setup works and the intended behavior is reproducible in the test environment
- [ ] In-code documentation and comments are up-to-date.
- [ ] Check: Are there specifications? Are they satisfied?
For further good practices have a look at our review guidelines.
Activity
assigned to @florian
- integrationtests/test_crawler_main.py (new file)

```python
def test_list_of_paths(clear_database, monkeypatch):

    # Mock the status record
    dummy_status = {
        "n_calls": 0
    }

    def _mock_update_status_record(run_id, n_inserts, n_updates, status):
        print("Update mocked status")
        dummy_status["run_id"] = run_id
        dummy_status["n_inserts"] = n_inserts
        dummy_status["n_updates"] = n_updates
        dummy_status["status"] = status
        dummy_status["n_calls"] += 1
    monkeypatch.setattr(crawl, "_update_status_record", _mock_update_status_record)

    # mock SSS environment
    monkeypatch.setenv("SHARED_DIR", "/tmp")
```
- Resolved by Henrik tom Wörden
- integrationtests/test_crawler_main.py (new file)

```python
def test_not_implemented_list_with_authorization(caplog, clear_database):

    rt = db.RecordType(name="TestType").insert()
    basepath = INTTESTDIR / "test_data" / "crawler_main_with_list_of_dirs"
    dirlist = [basepath / "dir1", basepath / "dir2"]

    # This is not implemented yet, so check log for correct error.
    ret = crawler_main(
        dirlist,
        cfood_file_name=basepath / "cfood.yml",
        identifiables_definition_file=basepath / "identifiable.yml",
        securityMode=SecurityMode.RETRIEVE
    )
    # crawler_main hides the error, but has a non-zero return code and
    # errors in the log:
    assert ret != 0
```

Diff hunk (error handling around _update_status_record):

```diff
                      crawler.run_id)
          _update_status_record(crawler.run_id, len(inserts), len(updates), status="OK")
          return 0
-     except ForbiddenTransaction as err:
-         logger.debug(traceback.format_exc())
-         logger.error(err)
-         _update_status_record(crawler.run_id, 0, 0, status="FAILED")
-         return 1
-     except ConverterValidationError as err:
+     except Exception as err:
```

Diff hunk (path validation before scanning):

```diff
      if not dirname:
          raise ValueError(
              "You have to provide a non-empty path for crawling.")
-     dir_structure_name = os.path.basename(dirname)
+     if not isinstance(dirname, list):
```

added 5 commits
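The path-handling change above amounts to normalizing the argument before scanning. A minimal standalone sketch of that pattern (`normalize_crawl_paths` is a hypothetical helper name for illustration, not taken from the MR):

```python
def normalize_crawl_paths(dirname):
    # Hypothetical helper illustrating the pattern in the diff:
    # wrap a single path into a list so downstream code can always
    # iterate over a list of directories.
    if not dirname:
        raise ValueError(
            "You have to provide a non-empty path for crawling.")
    if not isinstance(dirname, list):
        dirname = [dirname]
    return dirname

# both call styles yield a list for the scanner to iterate over
print(normalize_crawl_paths("/data/dir1"))
print(normalize_crawl_paths(["/data/dir1", "/data/dir2"]))
```

This keeps the existing single-path call signature backwards compatible while enabling the new list-based usage.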
- 4de47f4a...7155ee56 - 4 commits from branch dev
- c4d76f94 - Merge branch 'dev' into f-unify-notifications
added 1 commit
- 3ebadf37 - MAINT: Use platform-independent tmp and paths
requested review from @henrik