Deletion of data sources and the CaosDB Crawler
The CaosDB Crawler was designed to scan a (file) structure and insert Records into CaosDB as needed. Because it reacts to what it finds (e.g. there is a README.md in a folder -> create a DataSet Record with appropriate Properties), this design implies an important caveat: the crawler does not react to what it does not find. In other words, it does not react to anything that has been removed.
Example: A folder with a README.md file is translated into a data set. If that folder is removed, the crawler no longer finds it, but the data set remains in the CaosDB server.
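The example can be reproduced with a small toy model. This is only a sketch of the add-only behavior described above, not the CaosDB Crawler's actual code or API: the crawl function and the records dictionary (standing in for the server's contents) are hypothetical names chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def crawl(root: Path, records: dict) -> None:
    """Toy add-only crawl: one data set per folder that contains a README.md."""
    for readme in root.rglob("README.md"):
        folder = readme.parent.relative_to(root).as_posix()
        # Insert only; nothing in this loop can ever remove a record.
        records.setdefault(folder, {"type": "DataSet", "source": folder})


records = {}  # hypothetical stand-in for the server's contents
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "exp1").mkdir()
    (root / "exp1" / "README.md").write_text("first data set")

    crawl(root, records)          # finds exp1 and creates its data set
    shutil.rmtree(root / "exp1")  # the data source disappears
    crawl(root, records)          # finds nothing, so it changes nothing

print(sorted(records))
```

After the second crawl, the entry for exp1 is still present, because a crawl that only reacts to what it finds never gets a chance to act on the missing folder.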
Users might not expect this behavior because they assume some form of synchronization: if an addition is handled automatically, why wouldn't a deletion be?
This also impacts moving or renaming files.
In many cases a reverse check would not even be possible, since Records can stem from multiple sources. That is, you cannot simply remove data sets that a crawler run would not create, because a user could have created them manually, or they could have been created by another crawler.
One possibility would be to provide user assistance based on caching: track which files (or structure elements in general) a particular crawler used to create Records. If such a file is removed, this could at least be indicated to the user. This could be expanded with advice on what to do, or even with an option to fix the problem automatically (adjusting Records after a file is renamed, or deleting Records after a file is removed).
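A minimal sketch of such a cache, assuming a simple JSON file that maps each source path to the Records it produced. Nothing here is an existing CaosDB Crawler feature: remember_source, missing_sources, the cache file name and the record ids are all hypothetical and only illustrate the idea.

```python
import json
import tempfile
from pathlib import Path


def remember_source(cache: dict, path: str, record_id: str) -> None:
    """Note that the file at `path` (relative to the crawl root) produced `record_id`."""
    ids = cache.setdefault(path, [])
    if record_id not in ids:
        ids.append(record_id)


def missing_sources(cache: dict, root: Path) -> dict:
    """Return cached paths that no longer exist, with the Records they produced."""
    return {path: ids for path, ids in cache.items() if not (root / path).exists()}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("exp1", "exp2"):
        (root / name).mkdir()
        (root / name / "README.md").write_text(f"data set {name}")

    # First crawler run: remember which file led to which (hypothetical) Record id.
    cache = {}
    remember_source(cache, "exp1/README.md", "dataset-1")
    remember_source(cache, "exp2/README.md", "dataset-2")
    cache_file = root / "crawler_cache.json"  # hypothetical cache location
    cache_file.write_text(json.dumps(cache))

    # Later, one source file is removed.
    (root / "exp1" / "README.md").unlink()

    # Next run: compare the cache against what is still on disk.
    cache = json.loads(cache_file.read_text())
    orphaned = missing_sources(cache, root)

for path, ids in orphaned.items():
    print(f"{path} is gone; Records to review: {', '.join(ids)}")
```

The sketch only reports the affected Records instead of deleting them. Because the cache belongs to one crawler, it cannot tell whether a user or another crawler also relies on those Records, so the decision is left to the user.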