F fix strict values in table columns
Summary
Addresses https://gitlab.indiscale.com/caosdb/customers/geomar/management/-/issues/152. By default, TableImporter.check_datatype
is now less strict with numeric values in columns that are expected to contain strings: such values are converted to strings instead of raising an error.
Focus
A strict option already existed, but it only applied to float columns containing integer values. It now also covers string columns containing numeric values.
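The lenient and strict behavior described above can be illustrated with a minimal sketch. This is not the TableImporter implementation: it works on plain Python lists instead of pandas columns, and the names check_column and _is_number as well as the local DataInconsistencyError class are hypothetical stand-ins introduced only for this example.

```python
class DataInconsistencyError(ValueError):
    """Local stand-in for the importer's error type (assumption for this sketch)."""


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_column(values, expected, strict=False):
    """Return the column as a list of `expected` values or raise.

    values   -- list of cell values from one table column
    expected -- required Python type (str, int or float)
    strict   -- if True, no implicit conversion is attempted
    """
    if all(isinstance(v, expected) for v in values):
        return list(values)

    if not strict:
        # Special case 1: integers in a float column are converted.
        if expected is float and all(_is_number(v) for v in values):
            return [float(v) for v in values]
        # Special case 2: numbers in a string column are converted.
        if expected is str and all(
            isinstance(v, str) or _is_number(v) for v in values
        ):
            return [str(v) for v in values]

    raise DataInconsistencyError(
        f"column does not match expected type {expected.__name__}"
    )


if __name__ == "__main__":
    # Lenient (default): numeric values in a string column are accepted.
    print(check_column([1234, 5678], str))  # ['1234', '5678']

    # Strict: the same column is rejected.
    try:
        check_column([1234, 5678], str, strict=True)
    except DataInconsistencyError as err:
        print("rejected:", err)
```

The sketch keeps both special cases behind the same strict flag, which mirrors the idea of this MR: one switch decides whether any implicit conversion is tolerated.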
Test Environment
No manual testing is needed; the unit tests should be sufficient.
If you still want to test it manually, run the Geomar server-profile with this branch and use, e.g., an integer in the "Storage ID" column of the sample upload.
Check List for the Author
Please, prepare your MR for a review. Be sure to write a summary and a focus and create gitlab comments for the reviewer. They should guide the reviewer through the changes, explain your changes and also point out open questions. For further good practices have a look at our review guidelines
- All automated tests pass
- Reference related issues
- Up-to-date CHANGELOG.md (or not necessary)
- Up-to-date JSON schema (or not necessary)
- Appropriate user and developer documentation (or not necessary)
  - How do I use the software? Assume "stupid" users.
  - How do I develop or debug the software? Assume novice developers.
- Annotations in code (Gitlab comments)
  - Intent of new code
  - Problems with old code
  - Why this implementation?
Check List for the Reviewer
- I understand the intent of this MR
- All automated tests pass
- Up-to-date CHANGELOG.md (or not necessary)
- Appropriate user and developer documentation (or not necessary)
- The test environment setup works and the intended behavior is reproducible in the test environment
- In-code documentation and comments are up-to-date.
- Check: Are there specifications? Are they satisfied?
For further good practices have a look at our review guidelines.
Activity
assigned to @florian
         # These special cases should be fine.
         if issub(col_dtype, np.integer) and issub(datatype, np.floating):
             df[key] = df[key].astype(datatype)
    +    elif datatype == str:
    +        df[key] = df[key].astype(datatype)

Resolved by Daniel Hornung
requested review from @daniel
     # For testing the table importer
     IMPORTER_KWARGS = dict(
         converters={'c': float, 'd': yes_no_converter, 'x': float},  # x does not exist
    -    datatypes={'a': str, 'b': int, 'x': int},  # x does not exist
    +    datatypes={'a': str, 'b': int, 'float': float, 'x': int},  # x does not exist

     
         def test_wrong_datatype(self):
             importer = TableImporter(**self.importer_kwargs)
    -        df = pd.DataFrame([[None, np.nan, 2.0, 'yes'],
    +        df = pd.DataFrame([[1234, 0, 2.0, 3, 'yes'],
    +                           [5678, 1, 2.0, 3, 'yes']],
    +                          columns=['a', 'b', 'c', 'float', 'd'])
     [...]
    +        assert df["a"].dtype == pd.StringDtype
    +        assert df["float"].dtype == float
    +
    +        # Resetting `df` since check_datatype may change datatypes
    +        df = pd.DataFrame([[None, 0, 2.0, 'yes'],
                               [5, 1, 2.0, 'yes']],
                              columns=['a', 'b', 'c', 'd'])
    -        self.assertRaises(DataInconsistencyError, importer.check_datatype, df)
    +        # strict=True, so number in str column raises an error
    +        self.assertRaises(DataInconsistencyError, importer.check_datatype, df, None, True)
    +
    +        df = pd.DataFrame([[0],
    +                           [1]],
    +                          columns=['float'])
    +        # strict=True, so int in float column raises an error
    +        self.assertRaises(DataInconsistencyError, importer.check_datatype, df, None, True)