Skip to content

External IDs

Timm Fitschen requested to merge f-external-ids into dev

Summary

Major refactoring of the server which replaces integer-based entity ids with string-based entity ids. The changes only happen under the hood. The server still behaves as if it were using integer ids as before. The integration tests should pass in the dev branch without any changes (except for removed xfails).

Back-end counter part in caosdb-mysqlbackend!15 (merged)

Related issue: https://gitlab.indiscale.com/caosdb/customers/umg_med_inf/management/-/issues/50

Focus

The main changes here:

  • Entity ids are Strings, even if they look like interger.
  • The ids are being generated in the server now and not in the backend anymore.
  • (Simple right?)

Unrelated refactoring (Not necessary for string ids, but I had to touch the code anyways):

  • Jobs CheckReferenceDependencyExistent and CheckChildDependencyExistent merged into CheckDependenciesBeforeDeletion
  • Caching and how the information whether the cache should be used in the current transaction is being propagated through all the classes that would wanna know about it.
  • Some of the SQL statements where transformed into a procedure (i.e. the code moved to the mysqlbackend repo)
  • The Domain role is being hidden from the api entirely now and domains also do not have external id either. It is purely used in the backend and even the server does only see the domains internal id only in the .../database/backend/implementation/MySQL/ package.
  • We were able to delete a lot of code which deals with string ids (which have not been used in the server yet) from the grpc api implemenation.
  • There have been A LOT of implementations for the same task: Resolve an id or a name used as datatype, parent, property, reference value.... Look into the current transactions's container first then try the backend. There is a new resolve function now which does it correctly once and for all (and fixes a known issue along the way). It replaces a lot of (similar) code in a lot of jobs and also uses caching which might even give us a speed up in huge transactions of similar entities. This is the main reason why we have a negative net change of LoC.
  • Refactoring of Schedule and ScheduledJob prevents loading the same job several times. Might give us a speed up as well but I mainly changed it because debugging was painful before the refactoring.

Notable issues with the new string id:

  • In the past, when in doubt whether something is either a name or an id, we used the rule if id_or_name instanceof integer then is_id(id_or_name) else is_name(id_or_name). This heuristic had to be replaced by something more sophisticated. The EntityIdRegistry class has the matchIdPattern function for this purpose now. We should also add a job which checks names on insert/update and issues a warning when the name matches an the id pattern.
  • The special entities with the old ids < 100 (which are used for bootstrapping the core data model) have been translated into entities with identical but stringyfied ids ("0", "1",...). This is ok for now, but in the future, when we actually make the switch to string ids, they should receive more string-like ids as well which also match the id pattern - so, if we go for some kind of uuid based ids, or something like the handle-system (prefix/suffix) we have to keep an eye on that.
  • The old pattern for temporary ids (used during insertions, mainly) was: if id < 0 then temporary(id). We use a similar and compatible pattern now: if id.starts_with("-") then temporary(id).

Test Environment

No manual testing, updated inttests:

Check List for the Author

Please, prepare your MR for a review. Be sure to write a summary and a focus and create gitlab comments for the reviewer. They should guide the reviewer through the changes, explain your changes and also point out open questions. For further good practices have a look at our review guidelines

  • All automated tests pass
  • Reference related issues
  • Up-to-date CHANGELOG.md (or not necessary)
  • Annotations in code (Gitlab comments)
    • Intent of new code
    • Problems with old code
    • Why this implementation?

Check List for the Reviewer

  • I understand the intent of this MR
  • All automated tests pass
  • Up-to-date CHANGELOG.md (or not necessary)
  • The test environment setup works and the intended behavior is reproducible in the test environment
  • In-code documentation and comments are up-to-date.
  • Check: Are there specifications? Are they satisfied?

For further good practices have a look at our review guidelines.

Edited by Daniel Hornung

Merge request reports