From 5f94c93b4758407ad6d3e93c51c1f4f062c3b7cc Mon Sep 17 00:00:00 2001
From: Florian Spreckelsen <f.spreckelsen@indiscale.com>
Date: Thu, 27 Jun 2024 15:17:52 +0200
Subject: [PATCH] DOC: Add some documentation for the TableImporter

---
 src/doc/index.rst     |  1 +
 src/doc/utilities.rst | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)
 create mode 100644 src/doc/utilities.rst

diff --git a/src/doc/index.rst b/src/doc/index.rst
index 7fa017ec..7032e2c2 100644
--- a/src/doc/index.rst
+++ b/src/doc/index.rst
@@ -18,6 +18,7 @@ This documentation helps you to :doc:`get started<README_SETUP>`, explains the m
    Specifying a datamodel with JSON schema <json_schema_interface>
    Convert a data model into a json schema <json_schema_exporter>
    Conversion between XLSX, JSON and LinkAhead Entities <table-json-conversion/specs>
+   Other utilities <utilities>
    _apidoc/modules
    Related Projects <related_projects/index>
    Back to overview <https://docs.indiscale.com/>
diff --git a/src/doc/utilities.rst b/src/doc/utilities.rst
new file mode 100644
index 00000000..4d520ae2
--- /dev/null
+++ b/src/doc/utilities.rst
@@ -0,0 +1,37 @@
+Other utilities in LinkAhead Advanced User Tools
+================================================
+
+The table file importer
+%%%%%%%%%%%%%%%%%%%%%%%
+
+The LinkAhead Advanced User Tools provide a generic
+:py:class:`~caosadvancedtools.table_importer.TableImporter` class which reads
+different table file formats (at the time of writing of this documentation,
+.xls(x), .csv, and .tsv) and converts them into :py:class:`pandas.DataFrame`
+objects. It provides helper functions for converting column values (e.g.,
+converting the string values "yes" or "no" to ``True`` or ``False``), checking
+that obligatory columns are present and have no missing values, and
+verifying column datatypes.
+
+The base class :py:class:`~caosadvancedtools.table_importer.TableImporter`
+provides the general verification methods, while each subclass like
+:py:class:`~caosadvancedtools.table_importer.XLSXImporter` or
+:py:class:`~caosadvancedtools.table_importer.CSVImporter` implements its own
+``read_file`` function that is used to convert a given table file into a
+:py:class:`pandas.DataFrame`.
+
+Empty fields in integer columns
+--------------------------------
+
+Reading in table files that have integer-valued columns with missing data can
+result in datatype contradictions (see the Pandas documentation on `nullable
+integers <https://pandas.pydata.org/docs/user_guide/integer_na.html>`_) since
+the default value for missing fields, ``numpy.nan``, is a float. This is why,
+from version 0.11 onward, the ``TableImporter`` uses
+:py:class:`pandas.Int64Dtype` as the default datatype for all integer columns,
+which allows for empty fields while keeping all actual data integer-valued. This
+behavior can be changed by initializing the ``TableImporter`` with
+``convert_int_to_nullable_int=False``, in which case a
+:py:class:`~caosadvancedtools.datainconsistency.DataInconsistencyError` is
+raised when an empty field is encountered in a column with a non-nullable
+integer datatype.
-- 
GitLab