fix strict values in table columns

Merged Florian Spreckelsen requested to merge f-fix-strict-values-in-table-columns into dev

Summary

For https://gitlab.indiscale.com/caosdb/customers/geomar/management/-/issues/152. The TableImporter.check_datatype function is now less strict (by default) with numeric values in columns that are expected to have string values.

Focus

A strict option already existed, but it only applied to float columns containing integer values. It now also applies to string columns containing numeric values.
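
The difference between the default and the strict mode can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the actual TableImporter.check_datatype code: the function check_datatype_sketch, its datatypes argument, the helper _matches, and the local DataInconsistencyError class are hypothetical stand-ins. Only the strict flag and the two lenient cases (integers in a float column, numbers in a string column) are taken from this MR.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype


class DataInconsistencyError(ValueError):
    """Hypothetical stand-in for the importer's error class."""


def _matches(value, datatype):
    """Return True if a single cell value has the expected Python type."""
    if datatype is float:
        return isinstance(value, (float, np.floating))
    if datatype is int:
        return isinstance(value, (int, np.integer)) and not isinstance(value, bool)
    return isinstance(value, datatype)


def check_datatype_sketch(df, datatypes, strict=False):
    """Convert the lenient cases in place, then raise on any remaining mismatch."""
    for key, datatype in datatypes.items():
        if key not in df.columns:
            continue  # expected columns may be missing from the table
        if not strict:
            if datatype is float and is_integer_dtype(df[key].dtype):
                df[key] = df[key].astype(float)  # integers in a float column
            elif datatype is str:
                df[key] = df[key].astype(str)  # numbers in a string column
        for value in df[key].dropna():
            if not _matches(value, datatype):
                raise DataInconsistencyError(
                    f"Column {key!r}: {value!r} is not of type {datatype.__name__}")
    return df


# Usage: the same table passes by default and is rejected in strict mode.
df = pd.DataFrame({"a": [1234, 5678], "float": [3, 3]})
lenient = check_datatype_sketch(df.copy(), {"a": str, "float": float})
try:
    check_datatype_sketch(df.copy(), {"a": str, "float": float}, strict=True)
except DataInconsistencyError as err:
    print("strict mode rejected the table:", err)
```

The sketch works on copies because the lenient conversion changes column dtypes in place, which is also why the unit test in this MR resets its DataFrame between checks.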

Test Environment

No manual testing is needed; the unit tests should be sufficient.

If you still want to test it manually, run the Geomar server-profile with this branch and use, e.g., an integer in the "Storage ID" column of the sample upload.

Check List for the Author

Please prepare your MR for review. Be sure to write a summary and a focus, and create GitLab comments for the reviewer. They should guide the reviewer through the changes, explain them, and point out open questions. For further good practices, have a look at our review guidelines.

  • All automated tests pass
  • Reference related issues
  • Up-to-date CHANGELOG.md (or not necessary)
  • Up-to-date JSON schema (or not necessary)
  • Appropriate user and developer documentation (or not necessary)
    • How do I use the software? Assume "stupid" users.
    • How do I develop or debug the software? Assume novice developers.
  • Annotations in code (Gitlab comments)
    • Intent of new code
    • Problems with old code
    • Why this implementation?

Check List for the Reviewer

  • I understand the intent of this MR
  • All automated tests pass
  • Up-to-date CHANGELOG.md (or not necessary)
  • Appropriate user and developer documentation (or not necessary)
  • The test environment setup works and the intended behavior is reproducible in the test environment
  • In-code documentation and comments are up-to-date.
  • Check: Are there specifications? Are they satisfied?

For further good practices have a look at our review guidelines.

Edited by Daniel Hornung

Merge request reports

Pipeline #40253 passed

Pipeline passed for 8fd2460a on f-fix-strict-values-in-table-columns

Merge details

  • 8 commits and 1 merge commit will be added to dev.
  • Source branch will be deleted.
  • Auto-merge enabled

Activity

     # These special cases should be fine.
     if issub(col_dtype, np.integer) and issub(datatype, np.floating):
         df[key] = df[key].astype(datatype)
    +elif datatype == str:
    +    df[key] = df[key].astype(datatype)
  • Florian Spreckelsen
  • Florian Spreckelsen marked the checklist item All automated tests pass as completed

  • Florian Spreckelsen marked the checklist item Annotations in code (Gitlab comments) as completed

  • Florian Spreckelsen requested review from @daniel

  • Daniel Hornung added 2 commits

    • c7b108bd - MAINT: Put all dtype checks into one condition.
    • 4cbff6df - TEST: Additional column type conversion checks.

  • Daniel Hornung added 1 commit

     # For testing the table importer
     IMPORTER_KWARGS = dict(
         converters={'c': float, 'd': yes_no_converter, 'x': float},  # x does not exist
    -    datatypes={'a': str, 'b': int, 'x': int},  # x does not exist
    +    datatypes={'a': str, 'b': int, 'float': float, 'x': int},  # x does not exist

         def test_wrong_datatype(self):
             importer = TableImporter(**self.importer_kwargs)
    -        df = pd.DataFrame([[None, np.nan, 2.0, 'yes'],
    +        df = pd.DataFrame([[1234, 0, 2.0, 3, 'yes'],
    +                           [5678, 1, 2.0, 3, 'yes']],
    +                          columns=['a', 'b', 'c', 'float', 'd'])

    +        assert df["a"].dtype == pd.StringDtype
    +        assert df["float"].dtype == float
    +
    +        # Resetting `df` since check_datatype may change datatypes
    +        df = pd.DataFrame([[None, 0, 2.0, 'yes'],
                                [5, 1, 2.0, 'yes']],
                               columns=['a', 'b', 'c', 'd'])
    -        self.assertRaises(DataInconsistencyError, importer.check_datatype, df)
    +        # strict=True, so number in str column raises an error
    +        self.assertRaises(DataInconsistencyError, importer.check_datatype, df, None, True)
    +
    +        df = pd.DataFrame([[0],
    +                           [1]],
    +                          columns=['float'])
    +        # strict=True, so int in float column raises an error
    +        self.assertRaises(DataInconsistencyError, importer.check_datatype, df, None, True)
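
The assertRaises calls in the test pass `None, True` positionally, so the reader needs the accompanying comment to see that `True` means strict. A possible alternative, sketched below, is the context-manager form with a keyword argument. The check_datatype function here is a hypothetical stand-in that only mimics the call shape; in particular, the name `filename` for the second parameter is an assumption, and DataInconsistencyError is redefined locally for the example.

```python
import unittest


class DataInconsistencyError(ValueError):
    """Hypothetical stand-in for the importer's error class."""


def check_datatype(df, filename=None, strict=False):
    """Hypothetical stand-in: raises only when strict checking is requested."""
    if strict:
        raise DataInconsistencyError("strict check failed")


class StrictArgumentTest(unittest.TestCase):
    def test_positional(self):
        # Same call shape as in the diff: extra arguments are forwarded positionally.
        self.assertRaises(DataInconsistencyError, check_datatype, object(), None, True)

    def test_keyword(self):
        # Equivalent check, but the meaning of `True` is visible at the call site.
        with self.assertRaises(DataInconsistencyError):
            check_datatype(object(), strict=True)

    def test_default_is_lenient(self):
        # The default (strict=False) must not raise in this stand-in.
        check_datatype(object())
```

Both forms test the same thing; the keyword form simply does not depend on the position of the strict parameter in the signature.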
  • Daniel Hornung marked the checklist item I understand the intent of this MR as completed

  • Daniel Hornung marked the checklist item All automated tests pass as completed

  • Daniel Hornung marked the checklist item Up-to-date CHANGELOG.md (or not necessary) as completed

  • Daniel Hornung marked the checklist item Appropriate user and developer documentation (or not necessary) as completed
