DOC: Add some documentation for the TableImporter

Commit 5f94c93b authored 1 year ago by Florian Spreckelsen
--- a/src/doc/index.rst
+++ b/src/doc/index.rst
@@ -18,6 +18,7 @@ This documentation helps you to :doc:`get started<README_SETUP>`, explains the m
   Specifying a datamodel with JSON schema <json_schema_interface>
   Convert a data model into a json schema <json_schema_exporter>
   Conversion between XLSX, JSON and LinkAhead Entities <table-json-conversion/specs>
+   Other utilities <utilities>
   _apidoc/modules
   Related Projects <related_projects/index>
   Back to overview <https://docs.indiscale.com/>

--- a/src/doc/utilities.rst
+++ b/src/doc/utilities.rst
+Other utilities in LinkAhead Advanced User Tools
+================================================
+
+The table file importer
+%%%%%%%%%%%%%%%%%%%%%%%
+
+The LinkAhead Advanced User Tools provide a generic
+:py:class:`~caosadvancedtools.table_importer.TableImporter` class which reads
+different table file formats (at the time of writing: .xls(x), .csv, and .tsv)
+and converts them into :py:class:`pandas.DataFrame` objects. It provides helper
+functions for converting column values (e.g., converting the string values
+"yes" or "no" to ``True`` or ``False``), for checking that obligatory columns
+are present in a table and contain no missing values, and for checking
+datatypes.
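The kind of processing described above can be illustrated with a minimal sketch in plain pandas. This is not the ``TableImporter`` API: ``yes_no_to_bool`` and ``check_obligatory_columns`` are hypothetical stand-ins that only show the idea of converting column values and verifying obligatory columns.

```python
import io

import pandas as pd


def yes_no_to_bool(value: str) -> bool:
    """Hypothetical converter: map "yes"/"no" (case-insensitive) to True/False."""
    normalized = value.strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    raise ValueError(f"Expected 'yes' or 'no', got {value!r}")


def check_obligatory_columns(df: pd.DataFrame, obligatory: list) -> None:
    """Hypothetical check: obligatory columns must exist and have no missing values."""
    missing_columns = [col for col in obligatory if col not in df.columns]
    if missing_columns:
        raise ValueError(f"Missing obligatory columns: {missing_columns}")
    for col in obligatory:
        if df[col].isna().any():
            raise ValueError(f"Column {col!r} has missing values")


csv_text = "name,published\nExperiment A,yes\nExperiment B,no\n"
# pandas applies the converter to every raw string value of the "published" column.
df = pd.read_csv(io.StringIO(csv_text), converters={"published": yes_no_to_bool})
check_obligatory_columns(df, ["name", "published"])
print(df["published"].tolist())  # [True, False]
```

In the actual library, the converters and obligatory columns are passed to the importer instead; see the API documentation of :py:class:`~caosadvancedtools.table_importer.TableImporter` for the exact parameters.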
+
+The base class :py:class:`~caosadvancedtools.table_importer.TableImporter`
+provides the general verification methods, while each subclass, such as
+:py:class:`~caosadvancedtools.table_importer.XLSXImporter` or
+:py:class:`~caosadvancedtools.table_importer.CSVImporter`, implements its own
+``read_file`` method that converts a given table file into a
+:py:class:`pandas.DataFrame`.
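The division of labor between the base class and its subclasses can be sketched as follows. The classes ``SketchTableImporter``, ``SketchCSVImporter`` and ``SketchTSVImporter`` and the method ``import_table`` are hypothetical and only mirror the described design (shared checks in the base class, format-specific ``read_file`` in each subclass); they are not the library's implementation.

```python
import io
from abc import ABC, abstractmethod

import pandas as pd


class SketchTableImporter(ABC):
    """Hypothetical stand-in for the base class: shared checks, no file reading."""

    def __init__(self, obligatory_columns):
        self.obligatory_columns = obligatory_columns

    def check_columns(self, df: pd.DataFrame) -> None:
        missing = [c for c in self.obligatory_columns if c not in df.columns]
        if missing:
            raise ValueError(f"Missing obligatory columns: {missing}")

    @abstractmethod
    def read_file(self, source) -> pd.DataFrame:
        """Each subclass converts its own file format into a DataFrame."""

    def import_table(self, source) -> pd.DataFrame:
        # Format-specific reading first, then the generic verification.
        df = self.read_file(source)
        self.check_columns(df)
        return df


class SketchCSVImporter(SketchTableImporter):
    def read_file(self, source) -> pd.DataFrame:
        return pd.read_csv(source, sep=",")


class SketchTSVImporter(SketchTableImporter):
    def read_file(self, source) -> pd.DataFrame:
        return pd.read_csv(source, sep="\t")


tsv_df = SketchTSVImporter(["id", "name"]).import_table(io.StringIO("id\tname\n1\tfoo\n"))
csv_df = SketchCSVImporter(["id", "name"]).import_table(io.StringIO("id,name\n2,bar\n"))
```

With this layout, supporting a new file format only requires a new subclass with its own ``read_file``, while all verification logic stays in one place.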
+
+Empty fields in integer columns
+--------------------------------
+
+Reading table files that have integer-valued columns with missing data can
+result in datatype conflicts (see the Pandas documentation on `nullable
+integers <https://pandas.pydata.org/docs/user_guide/integer_na.html>`_) since
+the default value for missing fields, ``numpy.nan``, is a float. For this
+reason, starting with version 0.11, the ``TableImporter`` uses
+:py:class:`pandas.Int64Dtype` as the default datatype for all integer columns,
+which allows empty fields while keeping all actual values integer-valued. This
+behavior can be changed by initializing the ``TableImporter`` with
+``convert_int_to_nullable_int=False``. In that case, a
+:py:class:`~caosadvancedtools.datainconsistency.DataInconsistencyError` is
+raised when an empty field is encountered in a column with a non-nullable
+integer datatype.
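The underlying pandas behavior can be reproduced without the ``TableImporter``. This sketch uses only ``pandas.read_csv`` on a small in-memory CSV (the column names are made up for illustration): by default, the empty field turns the whole integer column into ``float64``, whereas requesting the nullable ``"Int64"`` dtype keeps the values integer-valued and marks the gap as missing.

```python
import io

import pandas as pd

csv_text = "sample,count\nA,3\nB,\nC,5\n"

# Default parsing: the empty field becomes NaN, which forces the column to float64.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["count"].dtype)  # float64

# Requesting pandas' nullable integer dtype keeps integers and marks the gap as <NA>.
nullable_df = pd.read_csv(io.StringIO(csv_text), dtype={"count": "Int64"})
print(nullable_df["count"].dtype)  # Int64
print(nullable_df["count"].isna().tolist())  # [False, True, False]
```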