caosdb / Software / CaosDB Crawler / Commits

Commit 3b8e99cc (verified), authored 1 year ago by Daniel Hornung

    DOC WIP: Tutorial: Single structured file

    For issue #80.

Parent: 59f79e19
Merge requests: !162 (DOC WIP: Tutorial: Single structured file), !129 (Documentation: many small changes)
Pipeline #40561 passed (stages: info, setup, cert, style, test)
Showing 2 changed files, with 130 additions and 1 deletion:

- src/doc/tutorials/index.rst (+1, −1)
- src/doc/tutorials/single_file.rst (+129, −0)

src/doc/tutorials/index.rst (+1, −1)
@@ -9,4 +9,4 @@ This chapter contains a collection of tutorials.
Parameter File<parameterfile>
Scientific Data Folder<scifolder>
WIP: Single Structured File <single_file>
src/doc/tutorials/single_file.rst (new file, mode 100644; +129, −0)
WIP Tutorial: Single structured file
====================================
.. warning::

   This tutorial is still work in progress.  It may be better than nothing, but it is still
   incomplete and probably contains serious errors.

   Use at your own risk.
In this tutorial, we will create a crawler that reads a single structured file, such as an XLSX
file.
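It helps to picture the input first.  Each row of such an XLSX file can be thought of as a
mapping from column headers to cell values, which is the shape in which the table converter
hands a row on to the row converter.  The column names in the following sketch are the ones
used in this tutorial's cfood; the cell values are invented for illustration:

.. code-block:: python

   # A hypothetical spreadsheet row, as a column-header-to-cell-value mapping.
   # Column names match the cfood's match_name entries; the values are made up.
   row = {
       "Art der Maßnahme": 2,             # type of the measure, an integer code
       "Titel der Maßnahme": "Schulung",  # title of the measure
   }

   print(row["Titel der Maßnahme"])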
Declarations
------------
``identifiables.yml``:

.. code-block:: yaml

   Präventionsmaßnahme:
     - Organisation
     - titel
     - Laufzeit
``cfood.yml``:

.. code-block:: yaml

   ---
   metadata:
     crawler-version: 0.6.1
   ---
   Präventionsmaßnahme der Organisation:  # An Excel file with prevention measures
     type: XLSXTableConverter
     match: ".*xlsx$"  # Any xlsx file.
     subtree:
       Maßnahme:  # A row in the file
         type: DictElement
         match_name: .*
         match_value: .*
         records:
           Präventionsmaßnahme:  # Records edited for each row
             name: ""
         subtree:
           MaßnahmenArt:  # Column with the type of the measure
             type: IntegerElement
             match_name: Art der Maßnahme  # Name of the column in the table file
             match_value: (?P<column_value>.*)
           MaßnahmenTitel:
             type: TextElement
             match_name: Titel der Maßnahme  # Name of the column in the table file
             match_value: (?P<column_value>.*)
             records:  # Records edited for each cell
               Präventionsmaßnahme:
                 titel: $column_value
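The ``match_value`` patterns use a Python named group, ``column_value``; the matched text is
then available as ``$column_value`` in the ``records`` section.  A minimal stdlib sketch of how
such a named group captures a cell's content (the cell value here is invented):

.. code-block:: python

   import re

   # The same pattern as in the match_value entries of the cfood.
   pattern = re.compile(r"(?P<column_value>.*)")

   match = pattern.match("Brandschutzübung")  # a hypothetical cell value
   print(match.group("column_value"))  # → Brandschutzübung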
Python code
-----------
.. code-block:: python

   #!/usr/bin/env python3
   # Crawler for Präventionsmaßnahme records
   #
   # Copyright (C) 2023 IndiScale GmbH <info@indiscale.com>
   #
   # This program is free software: you can redistribute it and/or modify
   # it under the terms of the GNU Affero General Public License as
   # published by the Free Software Foundation, either version 3 of the
   # License, or (at your option) any later version.
   #
   # This program is distributed in the hope that it will be useful,
   # but WITHOUT ANY WARRANTY; without even the implied warranty of
   # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   # GNU Affero General Public License for more details.
   #
   # You should have received a copy of the GNU Affero General Public License
   # along with this program.  If not, see <https://www.gnu.org/licenses/>.

   """Crawler for Präventionsmaßnahmen (prevention measures)."""

   import argparse

   from caoscrawler.scanner import (load_definition, create_converter_registry,
                                    scan_structure_elements)
   from caoscrawler.structure_elements import File


   def crawl_file(filename: str, dry_run: bool = False):
       """Read an XLSX file into a LinkAhead container.

       Parameters
       ----------
       filename : str
           The name of the XLSX file.

       dry_run : bool
           If True, do not modify the database.
       """
       definition = load_definition("cfood.yml")
       converter_registry = create_converter_registry(definition)
       records = scan_structure_elements(items=File(name="somename.xlsx", path=filename),
                                         crawler_definition=definition,
                                         converter_registry=converter_registry)
       # WIP: drop into an interactive shell to inspect the scanned records.
       from IPython import embed
       embed()


   def _parse_arguments():
       """Parse the arguments."""
       parser = argparse.ArgumentParser(description='Crawler for Präventionsmaßnahme')
       parser.add_argument('-n', '--dry-run', help="Do not modify the database.",
                           action="store_true")
       parser.add_argument('xlsx_file', metavar="XLSX_FILE",
                           help="The xlsx file to be crawled.")
       return parser.parse_args()


   def main():
       """Main function."""
       args = _parse_arguments()
       crawl_file(args.xlsx_file, dry_run=args.dry_run)


   if __name__ == '__main__':
       main()
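The script's command-line interface can be tried without ``caoscrawler`` installed; the
following sketch repeats the ``argparse`` setup from ``_parse_arguments`` and parses a
hypothetical invocation (the file name is invented):

.. code-block:: python

   import argparse

   # The same options as in _parse_arguments() above.
   parser = argparse.ArgumentParser(description='Crawler for Präventionsmaßnahme')
   parser.add_argument('-n', '--dry-run', help="Do not modify the database.",
                       action="store_true")
   parser.add_argument('xlsx_file', help="The xlsx file to be crawled.")

   # Equivalent to: ./crawl.py --dry-run measures.xlsx
   args = parser.parse_args(["--dry-run", "measures.xlsx"])
   print(args.dry_run, args.xlsx_file)  # → True measures.xlsx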