caosdb / Software / caosdb-advanced-user-tools
Commit ab0788ab, authored 10 months ago by Florian Spreckelsen
ENH: Improve CSV TypeErrors in TableImporter
Parent: aae82b60
2 merge requests: !112 (Release 0.12.0), !111 (F better csv value error)
Pipeline #53472 passed 10 months ago (stages: setup, cert, style, unittest, integrationtest)
Showing 1 changed file: src/caosadvancedtools/table_importer.py with 39 additions and 1 deletion
@@ -31,7 +31,7 @@ import logging
 import pathlib
 from datetime import datetime

-import caosdb as db
+import linkahead as db
 import numpy as np
 import pandas as pd
 from xlrd import XLRDError
@@ -537,6 +537,44 @@ class CSVImporter(TableImporter):
                 extra={'identifier': str(filename),
                        'category': "inconsistency"})
             raise DataInconsistencyError(*ve.args)
+        except TypeError as te:
+            # Iterate through the columns and rows to identify
+            # problematic cells with wrong types.
+            df = pd.read_csv(filename, sep=sep,
+                             converters=applicable_converters, dtype=None,
+                             **kwargs)
+            error_dict = {}
+            columns_with_errors = []
+            for key, dtype in self.datatypes.items():
+                try:
+                    df[key].astype(dtype)
+                except (TypeError, ValueError):
+                    columns_with_errors.append(key)
+            if not columns_with_errors:
+                # We may have run into any other TypeError not caused
+                # by wrong datatypes within the table.
+                raise te
+            for ii, row in df.iterrows():
+                for name in columns_with_errors:
+                    try:
+                        # we need to check with astype to provoke the
+                        # same errors, but that only works on
+                        # Dataframes, so cast value to list to
+                        # DataFrame.
+                        pd.DataFrame([row[name]]).astype(self.datatypes[name])
+                    except (TypeError, ValueError):
+                        if ii not in error_dict:
+                            error_dict[ii] = []
+                        error_dict[ii].append(
+                            (name, str(self.datatypes[name]).strip("<>"), str(type(row[name])).strip("<>"))
+                        )
+            msg = "Elements with wrong datatypes encountered:\n"
+            for ii, error_list in error_dict.items():
+                msg += f"* row {ii}:\n"
+                for err in error_list:
+                    msg += f"\t* column \"{err[0]}\": Expected \"{err[1]}\" but found \"{err[2]}\".\n"
+                msg += '\n'
+            raise DataInconsistencyError(msg)

         df = self.check_dataframe(df, filename)
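The two-pass check added in this hunk can be tried outside of `CSVImporter`. What follows is a minimal standalone sketch, not the project's code: the helper names `find_type_errors` and `format_type_errors` and the sample data are hypothetical, and it assumes the expected datatypes are plain Python types such as `int` and `float`. The first pass finds the columns that fail `astype` as a whole. The second pass retries only those columns cell by cell and records the row index, column, expected type, and found type.

```python
import pandas as pd


def find_type_errors(df, datatypes):
    """Return {row_index: [(column, expected, found), ...]} for uncastable cells.

    Hypothetical helper mirroring the logic of the diff above.
    """
    # Pass 1: find the columns that cannot be cast as a whole.
    columns_with_errors = []
    for key, dtype in datatypes.items():
        try:
            df[key].astype(dtype)
        except (TypeError, ValueError):
            columns_with_errors.append(key)

    # Pass 2: retry the failing columns cell by cell to locate the bad rows.
    error_dict = {}
    for ii, row in df.iterrows():
        for name in columns_with_errors:
            try:
                # Wrap the single value in a DataFrame so that astype
                # raises the same kind of error as in pass 1.
                pd.DataFrame([row[name]]).astype(datatypes[name])
            except (TypeError, ValueError):
                error_dict.setdefault(ii, []).append(
                    (name,
                     str(datatypes[name]).strip("<>"),
                     str(type(row[name])).strip("<>")))
    return error_dict


def format_type_errors(error_dict):
    """Build the same message layout as the diff above (hypothetical helper)."""
    msg = "Elements with wrong datatypes encountered:\n"
    for ii, error_list in error_dict.items():
        msg += f"* row {ii}:\n"
        for err in error_list:
            msg += (f"\t* column \"{err[0]}\": Expected \"{err[1]}\" "
                    f"but found \"{err[2]}\".\n")
        msg += "\n"
    return msg


# Illustrative data: column "a" holds one value that is not an integer.
df = pd.DataFrame({"a": ["1", "x", "3"], "b": ["1.5", "2.5", "3.5"]})
report = find_type_errors(df, {"a": int, "b": float})
message = format_type_errors(report)
```

Restricting the per-cell pass to columns that already failed keeps the slow `iterrows` loop away from columns that cast cleanly.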