diff --git a/src/doc/index.rst b/src/doc/index.rst
index 7fa017ec4202f25fe9f94a154ed8762c4581eebc..7032e2c24ea32b0f1efad2bd2e5b7930259daf61 100644
--- a/src/doc/index.rst
+++ b/src/doc/index.rst
@@ -18,6 +18,7 @@ This documentation helps you to :doc:`get started<README_SETUP>`, explains the m
    Specifying a datamodel with JSON schema <json_schema_interface>
    Convert a data model into a json schema <json_schema_exporter>
    Conversion between XLSX, JSON and LinkAhead Entities <table-json-conversion/specs>
+   Other utilities <utilities>
    _apidoc/modules
    Related Projects <related_projects/index>
    Back to overview <https://docs.indiscale.com/>
diff --git a/src/doc/utilities.rst b/src/doc/utilities.rst
new file mode 100644
index 0000000000000000000000000000000000000000..4d520ae2d4b7a9bbd81171ba002c4f736223713a
--- /dev/null
+++ b/src/doc/utilities.rst
@@ -0,0 +1,37 @@
+Other utilities in LinkAhead Advanced User Tools
+================================================
+
+The table file importer
+%%%%%%%%%%%%%%%%%%%%%%%
+
+The LinkAhead Advanced user tools provide a generic
+:py:class:`~caosadvancedtools.table_importer.TableImporter` class which reads
+different table file formats (at the time of writing of this documentation,
+.xls(x), .csv, and .tsv) and converts them into :py:class:`pandas.DataFrame`
+objects. It provides helper functions for converting column values (e.g.,
+converting the string values "yes" or "no" to ``True`` or ``False``), checking
+the presence of obligatory columns in a table and whether those have missing
+values, and datatype checks.
+
+The base class :py:class:`~caosadvancedtools.table_importer.TableImporter`
+provides the general verification methods, while each subclass like
+:py:class:`~caosadvancedtools.table_importer.XLSXImporter` or
+:py:class:`~caosadvancedtools.table_importer.CSVImporter` implements its own
+``read_file`` function that is used to convert a given table file into a
+:py:class:`pandas.DataFrame`.
+
+Empty fields in integer columns
+--------------------------------
+
+Reading in table files that have integer-valued columns with missing data can
+result in datatype contradictions (see the Pandas documentation on `nullable
+integers <https://pandas.pydata.org/docs/user_guide/integer_na.html>`_) since
+the default value for missing fields, ``numpy.nan``, is a float. This is why
+from version 0.11 onwards, the ``TableImporter`` uses
+:py:class:`pandas.Int64Dtype` as the default datatype for all integer columns
+which allows for empty fields while keeping all actual data integer-valued. This
+behavior can be changed by initializing the ``TableImporter`` with
+``convert_int_to_nullable_int=False`` in which case a
+:py:class:`~caosadvancedtools.datainconsistency.DataInconsistencyError` is
+raised when an empty field is encountered in a column with a non-nullable
+integer datatype.
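
Note on the new "Empty fields in integer columns" section: the datatype contradiction it describes can be reproduced with pandas alone. The following is a minimal sketch and not the ``TableImporter`` implementation; the column names and the sample data are made up for illustration, and only ``pandas.read_csv`` and the ``Int64`` nullable integer dtype are used.

```python
import io

import pandas as pd

# Hypothetical sample table: the "count" column is integer-valued,
# but the field in its second data row is empty.
csv_text = "name,count\na,1\nb,\nc,3\n"

# Default behaviour: the empty field becomes numpy.nan, which is a float,
# so pandas stores the whole column as float64.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["count"].dtype)  # float64

# Requesting the nullable integer dtype keeps the actual data
# integer-valued and represents the empty field as pandas.NA.
df_nullable = pd.read_csv(io.StringIO(csv_text), dtype={"count": "Int64"})
print(df_nullable["count"].dtype)  # Int64
print(df_nullable["count"].isna().tolist())  # [False, True, False]
```

A short example along these lines might be worth adding to the page itself, since it shows why a plain integer dtype cannot hold the empty field.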