Further reduced code snippet for reproducing the bug:
file1 = db.File(file="/dev/null", path="/1Ä Ö Ü ÖR ¢")
file1.insert()
Cursory testing suggests that any insertion of a file with at least five non-ASCII characters anywhere (names, parents, properties, path) will trigger this.
The characters do not have to be part of the file entity; this code:
I did not find a way to reproduce the error via the web interface or with a record, which makes it seem likely that the bug is caused somewhere in the encoding of a multipart request in the pylib.
However, using the two calls
db.File(file="/dev/null", path="/1Ä Ö Ü ÖR ¢").insert()
and
db.File(file="/dev/null", path="/1Ä Ö Ü OR ¢").insert()
whose only difference is O vs. Ö in the path, the requests seem equivalent down to urllib3/connection.py, which then calls the Python standard library's http module. I think any bug there would have been noticed before now, and a quick search found no corresponding reports. So either I missed something (which is quite possible, as I am not an expert on HTTP requests) or the problem might not be in the request sent by the pylib after all.
In my opinion the approach most likely to yield results would be to check whether the requests are equivalent when they reach the server and then go from there, but this is currently outside the allotted timeframe.
with the only differences between both calls being different hex codes (naturally) and different paths: path="/1\xc3\x84 \xc3\x96 \xc3\x9c \xc3\x96R \xc2\xa2" vs path="/1\xc3\x84 \xc3\x96 \xc3\x9c OR \xc2\xa2", as would be expected.
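The byte sequences above are the UTF-8 encodings of the two paths. A minimal sketch to confirm this and to count characters, bytes, and non-ASCII characters for each path (the helper name `describe` is made up for illustration and is not part of the pylib):

```python
# The two paths from the calls above; they differ only in "Ö" vs. "O".
path_with_umlaut = "/1Ä Ö Ü ÖR ¢"
path_with_o = "/1Ä Ö Ü OR ¢"


def describe(path):
    """Hypothetical helper: summarize how a path looks once UTF-8 encoded."""
    encoded = path.encode("utf-8")
    return {
        "chars": len(path),                                # number of code points
        "bytes": len(encoded),                             # number of UTF-8 bytes
        "non_ascii": sum(1 for ch in path if ord(ch) > 127),
        "encoded": encoded,
    }


for p in (path_with_umlaut, path_with_o):
    print(describe(p))
```

Both paths have 12 characters, but the first encodes to 17 bytes with five non-ASCII characters and the second to 16 bytes with four. Any length that is computed from the string rather than from the encoded bytes would therefore be off by a different amount for each call, which may be worth checking when comparing the requests.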
However, while one of them terminates successfully when called from the pylib and the other causes an error, both are uploaded successfully with the curl command
curl -b cookie.txt -F "FileRepresentation=<file.xml" -F "testfile.bla=@testfile.bla" "<SERVER>/Entity" --insecure
with file.xml containing either
which are the closest XML versions to the XML from the chunk and to_xml() respectively that do not trigger any other errors. Additionally, while there are other server errors that may have masked this one, any XML I tested that led to a successful upload via curl still uploads successfully after adding any number of non-ASCII characters.
This makes it rather likely to be a problem with the pylib calls again. My next approach would be to check whether there is any reasonably quick way of intercepting the final calls from python and curl themselves, probably via proxy, and comparing them. However, as this will take up more time than is currently available, further investigation of the bug has been deferred.
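One possible way to do the comparison without a full proxy is a small local capture server that stores the headers and raw body of every POST it receives. The following is only a sketch using the Python standard library. It assumes that both the pylib and curl can be pointed at a plain-HTTP URL such as http://127.0.0.1:<port>/Entity, which I have not verified for the pylib; the names `CaptureHandler`, `start_capture_server` and `captured` are made up for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # one (headers dict, raw body bytes) tuple per received POST


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the client declared.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        captured.append((dict(self.headers), body))
        # Answer with an empty 200 so the client does not hang.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_capture_server(port=0):
    """Start the capture server in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After sending the same upload once from Python and once from curl, the two entries in `captured` can be compared header by header and byte by byte. Since the handler reads only the declared `Content-Length`, a body that arrives shorter than expected would itself hint at a length mismatch. The capture server does not return a valid server response, so the client will likely report an error after the request has been recorded.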