Commit 5f94c93b authored by Florian Spreckelsen

DOC: Add some documentation for the TableImporter

parent 2f9d7ed2
2 merge requests: !107 Release v0.11.0, !106 F gaps in int columns
......@@ -18,6 +18,7 @@ This documentation helps you to :doc:`get started<README_SETUP>`, explains the m
Specifying a datamodel with JSON schema <json_schema_interface>
Convert a data model into a json schema <json_schema_exporter>
Conversion between XLSX, JSON and LinkAhead Entities <table-json-conversion/specs>
Other utilities <utilities>
_apidoc/modules
Related Projects <related_projects/index>
Back to overview <https://docs.indiscale.com/>
......
Other utilities in LinkAhead Advanced User Tools
================================================
The table file importer
%%%%%%%%%%%%%%%%%%%%%%%
The LinkAhead Advanced User Tools provide a generic
:py:class:`~caosadvancedtools.table_importer.TableImporter` class which reads
different table file formats (at the time of writing, .xls(x), .csv, and .tsv)
and converts them into :py:class:`pandas.DataFrame` objects. It provides helper
functions for converting column values (e.g., converting the string values
"yes" or "no" to ``True`` or ``False``), for checking that obligatory columns
are present and contain no missing values, and for checking datatypes.

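
The kind of value conversion and obligatory-column check described above can be
sketched in plain Python. This is a minimal illustration that uses only the
standard library; the names ``to_bool`` and ``find_missing_values`` are
hypothetical and are not the functions shipped with the ``TableImporter``.

```python
import csv
import io


def to_bool(value):
    """Map the strings "yes"/"no" (case-insensitive) to True/False.

    Hypothetical helper for illustration only.
    """
    normalized = str(value).strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    raise ValueError(f"expected 'yes' or 'no', got {value!r}")


def find_missing_values(rows, obligatory_columns):
    """Return (row_index, column) pairs where an obligatory value is empty.

    Hypothetical helper for illustration only.
    """
    problems = []
    for index, row in enumerate(rows):
        for column in obligatory_columns:
            if row.get(column) in (None, ""):
                problems.append((index, column))
    return problems


# A small in-memory table; the second data row has no "name" value.
TABLE = "name,approved\nsample-1,yes\n,no\n"
rows = list(csv.DictReader(io.StringIO(TABLE)))

converted = [to_bool(row["approved"]) for row in rows]
missing = find_missing_values(rows, ["name", "approved"])
```

Here ``converted`` holds the boolean values of the ``approved`` column, and
``missing`` lists the one cell where an obligatory value is absent.
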
The base class :py:class:`~caosadvancedtools.table_importer.TableImporter`
provides the general verification methods, while each subclass, such as
:py:class:`~caosadvancedtools.table_importer.XLSXImporter` or
:py:class:`~caosadvancedtools.table_importer.CSVImporter`, implements its own
``read_file`` method that converts a given table file into a
:py:class:`pandas.DataFrame`.

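
The split between shared verification in a base class and format-specific
``read_file`` methods in subclasses could look roughly like the sketch below.
All class names here (``SketchImporter`` and its subclasses) are hypothetical,
and the sketch returns plain lists of dictionaries instead of
:py:class:`pandas.DataFrame` objects to stay self-contained; it is not the
library's implementation.

```python
import csv
import io


class SketchImporter:
    """Hypothetical base class: shared checks, no file-format knowledge."""

    def __init__(self, obligatory_columns):
        self.obligatory_columns = list(obligatory_columns)

    def read_file(self, stream):
        raise NotImplementedError("format-specific subclasses implement this")

    def check_columns(self, rows):
        """Raise ValueError if an obligatory column is absent."""
        present = set(rows[0].keys()) if rows else set()
        missing = [c for c in self.obligatory_columns if c not in present]
        if missing:
            raise ValueError(f"missing obligatory columns: {missing}")
        return rows


class SketchCSVImporter(SketchImporter):
    """Reads comma-separated text, then runs the shared checks."""

    delimiter = ","

    def read_file(self, stream):
        rows = list(csv.DictReader(stream, delimiter=self.delimiter))
        return self.check_columns(rows)


class SketchTSVImporter(SketchCSVImporter):
    """Same reading logic, only the delimiter differs."""

    delimiter = "\t"


csv_rows = SketchCSVImporter(["id"]).read_file(io.StringIO("id,name\n1,a\n"))
tsv_rows = SketchTSVImporter(["id"]).read_file(io.StringIO("id\tname\n2\tb\n"))
```

Keeping the checks in the base class means that a new file format only needs
its own ``read_file`` method.
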
Empty fields in integer columns
--------------------------------
Reading table files that have integer-valued columns with missing data can
result in datatype contradictions (see the Pandas documentation on `nullable
integers <https://pandas.pydata.org/docs/user_guide/integer_na.html>`_) because
the default value for missing fields, ``numpy.nan``, is a float. For this
reason, from version 0.11 onward, the ``TableImporter`` uses
:py:class:`pandas.Int64Dtype` as the default datatype for all integer columns,
which allows for empty fields while keeping all actual data integer-valued. This
behavior can be changed by initializing the ``TableImporter`` with
``convert_int_to_nullable_int=False``, in which case a
:py:class:`~caosadvancedtools.datainconsistency.DataInconsistencyError` is
raised when an empty field is encountered in a column with a non-nullable
integer datatype.
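
The underlying pandas behavior can be reproduced without the ``TableImporter``.
The following sketch uses only pandas itself and shows how a missing value
changes the datatype of an integer column, and how the nullable ``Int64``
datatype avoids this; the variable names are chosen for illustration.

```python
import pandas as pd

# Without an explicit dtype, the missing entry becomes NaN and the
# whole column is upcast to float64.
as_float = pd.Series([1, None, 3])

# With the nullable integer dtype, the gap is stored as pd.NA and the
# remaining values stay integers.
as_nullable = pd.Series([1, None, 3], dtype="Int64")

print(as_float.dtype)     # float64
print(as_nullable.dtype)  # Int64
```
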