From 5f94c93b4758407ad6d3e93c51c1f4f062c3b7cc Mon Sep 17 00:00:00 2001
From: Florian Spreckelsen <f.spreckelsen@indiscale.com>
Date: Thu, 27 Jun 2024 15:17:52 +0200
Subject: [PATCH] DOC: Add some documentation for the TableImporter

---
 src/doc/index.rst     |  1 +
 src/doc/utilities.rst | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)
 create mode 100644 src/doc/utilities.rst

diff --git a/src/doc/index.rst b/src/doc/index.rst
index 7fa017ec..7032e2c2 100644
--- a/src/doc/index.rst
+++ b/src/doc/index.rst
@@ -18,6 +18,7 @@ This documentation helps you to :doc:`get started<README_SETUP>`, explains the m
    Specifying a datamodel with JSON schema <json_schema_interface>
    Convert a data model into a json schema <json_schema_exporter>
    Conversion between XLSX, JSON and LinkAhead Entities <table-json-conversion/specs>
+   Other utilities <utilities>
    _apidoc/modules
    Related Projects <related_projects/index>
    Back to overview <https://docs.indiscale.com/>
diff --git a/src/doc/utilities.rst b/src/doc/utilities.rst
new file mode 100644
index 00000000..4d520ae2
--- /dev/null
+++ b/src/doc/utilities.rst
@@ -0,0 +1,37 @@
+Other utilities in LinkAhead Advanced User Tools
+================================================
+
+The table file importer
+%%%%%%%%%%%%%%%%%%%%%%%
+
+The LinkAhead Advanced User Tools provide a generic
+:py:class:`~caosadvancedtools.table_importer.TableImporter` class which reads
+different table file formats (at the time of writing,
+.xls(x), .csv, and .tsv) and converts them into :py:class:`pandas.DataFrame`
+objects. It provides helper functions for converting column values (e.g.,
+converting the string values "yes" or "no" to ``True`` or ``False``), checking
+the presence of obligatory columns in a table and whether those have missing
+values, and checking datatypes.
+
+The base class :py:class:`~caosadvancedtools.table_importer.TableImporter`
+provides the general verification methods, while each subclass, such as
+:py:class:`~caosadvancedtools.table_importer.XLSXImporter` or
+:py:class:`~caosadvancedtools.table_importer.CSVImporter`, implements its own
+``read_file`` method, which converts a given table file into a
+:py:class:`pandas.DataFrame`.
+
+Empty fields in integer columns
+--------------------------------
+
+Reading in table files that have integer-valued columns with missing data can
+result in datatype contradictions (see the Pandas documentation on `nullable
+integers <https://pandas.pydata.org/docs/user_guide/integer_na.html>`_) since
+the default value for missing fields, ``numpy.nan``, is a float. This is why,
+from version 0.11 onwards, the ``TableImporter`` uses
+:py:class:`pandas.Int64Dtype` as the default datatype for all integer columns,
+which allows for empty fields while keeping all actual data integer-valued. This
+behavior can be changed by initializing the ``TableImporter`` with
+``convert_int_to_nullable_int=False``, in which case a
+:py:class:`~caosadvancedtools.datainconsistency.DataInconsistencyError` is
+raised when an empty field is encountered in a column with a non-nullable
+integer datatype.
-- 
GitLab
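
Note on the "Empty fields in integer columns" section of this patch: the datatype contradiction it describes can be reproduced with plain pandas. The following is a minimal sketch only. It does not call the ``TableImporter`` API, and the CSV content and the column names ``sample`` and ``count`` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical table with one empty field in an integer-valued column.
csv_text = "sample,count\na,1\nb,\nc,3\n"

# Default inference: the empty field becomes NaN (a float), so the whole
# column is read as float64 rather than as an integer column.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["count"].dtype)  # float64

# Requesting the nullable integer dtype keeps the present values as integers
# and represents the empty field as pandas.NA.
df_nullable = pd.read_csv(
    io.StringIO(csv_text), dtype={"count": pd.Int64Dtype()}
)
print(df_nullable["count"].dtype)  # Int64
print(df_nullable["count"].isna().tolist())  # [False, True, False]
```

The second call mirrors, in spirit, what the patch documents as the default behavior since version 0.11. How the ``TableImporter`` applies the dtype internally is not shown in the patch and is not assumed here.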