Commit 66a6af29, authored 9 months ago by Alexander Schlemmer
Merge branch 'dev' into f-rocrate-converter
Parents: ea42c5f8, 9c85e25d
2 merge requests: !198 REL: Release 0.10.0, !193 ROCrate-Converter (also for .eln-files)
Pipeline #57338 failed 9 months ago (stages: info, setup, cert, style, test)
Showing 3 changed files with 44 additions and 5 deletions:
CHANGELOG.md: 11 additions, 0 deletions
src/caoscrawler/crawl.py: 12 additions, 1 deletion
src/caoscrawler/logging.py: 21 additions, 4 deletions
CHANGELOG.md (+11 -0)
@@ -20,6 +20,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
```
```
 - Support for Python 3.13
 - ROCrateConverter, ELNFileConverter and ROCrateEntityConverter for crawling ROCrate and .eln files
+- `max_log_level` parameter to `logging.configure_server_side_logging`
+  to control the server-side debuglog's verbosity, and an optional
+  `sss_max_log_level` parameter to `crawler_main` to control the SSS
+  loglevel separately from the global `debug` option.

 ### Changed ###
@@ -30,6 +34,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - if `value` starts with '+', collection mode is "list".
 - if `value` starts with '*', collection mode is "multiproperty".
 - in all other cases, collection mode is "single".
+- The default server-side scripting debug level is now controlled by
+  the global `debug` option by default and set to log level `INFO` in
+  case of `debug=False`. The previous behavior can be restored by
+  calling `crawler_main` with `sss_max_log_level=logging.DEBUG`.

 ### Deprecated ###
@@ -39,6 +47,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added better error message for some cases of broken converter and
   record definitions.
+- [#108](https://gitlab.com/linkahead/linkahead-crawler/-/issues/108)
+  Too verbose server-side scripting logs that could lead to high disk
+  usage.

 ### Security ###
src/caoscrawler/crawl.py (+12 -1)
@@ -1020,6 +1020,7 @@ def crawler_main(crawled_directory_path: str,
                  restricted_path: Optional[list[str]] = None,
                  remove_prefix: Optional[str] = None,
                  add_prefix: Optional[str] = None,
+                 sss_max_log_level: Optional[int] = None,
                  ):
     """
@@ -1053,6 +1054,12 @@ def crawler_main(crawled_directory_path: str,
     add_prefix : Optional[str]
         Add the given prefix to file paths.
         See docstring of '_fix_file_paths' for more details.
+    sss_max_log_level : Optional[int]
+        If given, set the maximum log level of the server-side
+        scripting log separately from the general ``debug`` option. If
+        None is given, the maximum sss log level will be determined
+        from the value of ``debug``: ``logging.INFO`` if ``debug`` is
+        False, ``logging.DEBUG`` if ``debug`` is True.

     Returns
     -------
@@ -1063,7 +1070,11 @@ def crawler_main(crawled_directory_path: str,
     crawler = Crawler(securityMode=securityMode)
     if "SHARED_DIR" in os.environ:  # setup logging and reporting if serverside execution
-        userlog_public, htmluserlog_public, debuglog_public = configure_server_side_logging()
+        if sss_max_log_level is None:
+            sss_max_log_level = logging.DEBUG if debug else logging.INFO
+        userlog_public, htmluserlog_public, debuglog_public = configure_server_side_logging(
+            max_log_level=sss_max_log_level
+        )
         # TODO make this optional
         _create_status_record(
             get_shared_resource_link(get_config_setting("public_host_url"), htmluserlog_public),
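The fallback added to `crawler_main` can be read in isolation. The following is a minimal sketch rather than the crawler's own code: `resolve_sss_max_log_level` is a hypothetical helper name that only mirrors the two added lines of the hunk above, so that the resulting level can be checked for each combination of `debug` and `sss_max_log_level`.

```python
import logging
from typing import Optional


def resolve_sss_max_log_level(debug: bool,
                              sss_max_log_level: Optional[int] = None) -> int:
    """Return the level that would be passed on as ``max_log_level``.

    Hypothetical helper mirroring the fallback in the diff: an explicit
    value wins, otherwise the level follows the global ``debug`` option.
    """
    if sss_max_log_level is None:
        return logging.DEBUG if debug else logging.INFO
    return sss_max_log_level


# No explicit level: the result follows ``debug``.
print(logging.getLevelName(resolve_sss_max_log_level(debug=False)))  # INFO
print(logging.getLevelName(resolve_sss_max_log_level(debug=True)))   # DEBUG

# Explicit level: restores the previous, more verbose behavior
# even though ``debug`` is False.
print(logging.getLevelName(
    resolve_sss_max_log_level(debug=False, sss_max_log_level=logging.DEBUG)
))  # DEBUG
```

Note that an explicitly passed level is used as given, independent of `debug`.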
src/caoscrawler/logging.py (+21 -4)
@@ -26,23 +26,40 @@ from caosadvancedtools.serverside.helper import get_shared_filename
 import sys


-def configure_server_side_logging():
+def configure_server_side_logging(max_log_level: int = logging.INFO):
     """
     Set logging up to save one plain debugging log file, one plain info log
     file (for users) and a stdout stream with messages wrapped in html elements

     returns the path to the file with debugging output
+
+    Parameters
+    ----------
+    max_log_level : int, optional
+        The maximum log level to use for SSS-logs. Default is
+        ``logging.INFO``.
+
+    Returns
+    -------
+    userlog_public, htmluserlog_public, debuglog_public: str
+        Public paths of the respective log files.
     """
     adv_logger = logging.getLogger("caosadvancedtools")
-    adv_logger.setLevel(level=logging.DEBUG)
+    # The max_<level> variables will be used to set the logger levels
+    # to the respective maximum of intended level and max_log_level,
+    # effectively cutting off logging above the specified
+    # max_log_level.
+    max_info = max(logging.INFO, max_log_level)
+    max_debug = max(logging.DEBUG, max_log_level)
+    adv_logger.setLevel(level=max_debug)

     cr_logger = logging.getLogger("caoscrawler")
-    cr_logger.setLevel(level=logging.DEBUG)
+    cr_logger.setLevel(level=max_debug)

     userlog_public, userlog_internal = get_shared_filename("userlog.txt")

     root_logger = logging.getLogger()
-    root_logger.setLevel(level=logging.INFO)
+    root_logger.setLevel(level=max_info)

     # this is a log file with INFO level for the user
     user_file_handler = logging.FileHandler(filename=userlog_internal)
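To see what the `max(...)` expressions in this hunk do in practice, here is a small self-contained sketch using only the standard `logging` module. It is an illustration under assumptions, not part of the commit: the logger name `demo.sss` and the helpers `ListHandler` and `emitted_messages` are made up for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Collect messages in memory so the effect is easy to inspect."""

    def __init__(self):
        super().__init__(level=logging.NOTSET)
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


def emitted_messages(max_log_level: int) -> list:
    # "demo.sss" is a made-up logger name, not one used by the crawler.
    logger = logging.getLogger(f"demo.sss.{max_log_level}")
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)
    # Same expression as in the diff: the effective threshold is the
    # numerically higher of the intended level (DEBUG) and max_log_level.
    logger.setLevel(max(logging.DEBUG, max_log_level))
    logger.debug("details")
    logger.info("progress")
    logger.warning("problem")
    logger.removeHandler(handler)
    return handler.messages


print(emitted_messages(logging.DEBUG))    # ['DEBUG:details', 'INFO:progress', 'WARNING:problem']
print(emitted_messages(logging.INFO))     # ['INFO:progress', 'WARNING:problem']
print(emitted_messages(logging.WARNING))  # ['WARNING:problem']
```

Because numerically larger levels are less verbose in Python's `logging` module, taking the maximum raises the logger's threshold. "Maximum log level" in the diff therefore refers to the most verbose output that is still written.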