We need a solution for transferring larger files (> 64 MB). Ideally, it should support resumable file transfers in case of unstable connections, and possibly continuous streams (e.g., sensor data). It is unclear whether gRPC is the right protocol for this, or whether a protocol like tus might be better suited, at least for files.
I'd make two unary gRPC methods: Upload and Finalize. Upload returns an XSRF-token-based HTTPS upload link that encodes the offset (defaulting to zero if not specified, i.e. a new upload), the file size, the expiration time, the path, and so on. Finalize takes the upload URL, locates the file, checks it for completeness, and moves it from the temp directory to its final location.
Now you can implement an HTTP endpoint that verifies the token, opens the file, seeks to the offset, and copies the request body into the file with io.CopyN. If an error occurs, it can return the number of bytes written so that the client can resume from there.
This approach avoids buffering too much data in memory, prevents unbounded uploads, is resumable, and is stateless (or, more precisely, keeps little state outside the filesystem, at least on the serving path).
gRPC has its own flow control, framing, error handling, etc., but fundamentally you are processing the data in chunks, and you have to handle that logic yourself. With HTTP, you can let the runtime and standard library choose efficient ways of streaming to disk, potentially even using the splice syscall to avoid copying the data through user space under certain circumstances.
tus is a relatively simple protocol for resumable file uploads via HTTP. A Java server implementation of the tus protocol that could be integrated into the LinkAhead server can be found here: https://github.com/tomdesair/tus-java-server
However, the recommended way to use tus is via tusd, the Go reference implementation, which can be integrated into your own application using its hook system:
When integrating tusd into an application, it is important to establish a communication channel between tusd and your main application. For this purpose, tusd provides a hook system which triggers user-defined actions when certain events happen, for example when an upload is created or finished. This simple-but-powerful system enables many uses, such as logging, validation, authorization, and post-processing of the uploaded files.
Yes, this is made possible by the hook system inside the tusd binary. It enables custom routines to be executed when certain events occur, such as the creation of a new upload, which can be handled by the pre-create hook. Inside the corresponding hook logic, you can run your own validations against the provided upload metadata to determine whether the action is allowed or should be rejected by tusd. Please have a look at the corresponding example for a more detailed explanation.
User authentication can be achieved in two ways. Either user tokens can be included in the upload metadata, as described in the example mentioned above, or traditional header fields, such as Authorization or Cookie, can be used to carry user-identifying information. These header values are also included in the hook requests and are accessible to the pre-create hook, where the authorization tokens or cookies can be validated to authenticate the user.
If authentication succeeds, the hook can return an empty hook response to tell tusd that the upload should continue as normal. If authentication fails, the hook can instruct tusd to reject the upload and return a custom error response to the client. For example, this is a possible hook response:
Note that this handles authentication only during the initial POST request that creates an upload. When tusd responds, it sends a randomly generated upload URL to the client. The client uses this URL to transmit the remaining data via PATCH requests and to resume the upload via HEAD requests. Currently, there is no mechanism to ensure that the upload is resumed by the same user who created it; we plan to address this in the future. However, since the upload URL is randomly generated and only short-lived, it is hard for uninvolved parties to guess.
After discussing with @timm, we agreed that having tusd as a separate microservice handling uploads and communicating with the LinkAhead server via hooks is a promising solution.