Reduce implications of Location header field. #30

Closed
sheeep opened this issue Sep 29, 2013 · 8 comments

sheeep commented Sep 29, 2013

I'm glad to see such projects in the wild. Uploading files was a complete mess until now. :)

I considered implementing this protocol in the Symfony2 bundle OneupUploaderBundle and stumbled over the following part of the specification.

> Servers MUST acknowledge a successful file creation request with a 201 Created response code and include an absolute url for the created resource in the Location header.

See section 6.1.3.1 for details.

This has multiple severe implications for application backends that support tus.

  • Requiring a Location header with an absolute URL for the created resource assumes that an uploaded file always has a public URL. Even if the use cases without one are edge cases, I don't think a protocol should enforce such behavior.
  • It implies that the backend stores files that are still uploading in the same directory as previously (and completely) uploaded files. The protocol itself does not mention a way to send a final destination to the frontend once a file upload is complete (see section 5.1). Using a temporary directory would make cleanup pretty easy.
  • Obviously there is no way of accessing MIME data while an image is still uploading, as there is no way of proving that the file headers have been uploaded completely. This makes it impossible to name the file according to its MIME type, which is, as far as I can tell, pretty common.
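
To illustrate the last point, here is a minimal Python sketch of server-side type sniffing. It is not part of tus or of any bundle mentioned here; the function name `sniff_mime` and the small signature table are assumptions made for the example. It shows that a type can only be guessed once enough leading bytes have arrived, which is never the case at creation time because the creation POST carries no body.

```python
from typing import Optional

# Well-known leading signatures ("magic numbers") for a few image formats.
# The table is illustrative only and far from exhaustive.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_mime(received: bytes) -> Optional[str]:
    """Guess a MIME type from the bytes received so far.

    Returns None when the prefix is too short or matches no known signature,
    which is always the situation right after an empty creation request.
    """
    for magic, mime in SIGNATURES.items():
        if received.startswith(magic):
            return mime
    return None
```

With zero or only a few bytes available the function returns `None`, so a server that must commit to a final file name in the first response has nothing to base an extension on.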

Given the fact that there must be a valid identifier which can be sent along in subsequent requests, I can think of the following possibility:

Do not enforce an accessible url for the created resource in the following requests:

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org

PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Type: application/offset+octet-stream
Content-Length: 30
Offset: 70

[remaining 30 bytes]

Think of 24e533e02ec3bc40c387f1a0e460e216 as an identifier rather than a file name.

And extend the last response from the backend with a Location field that sends an accessible URL to the client. (This could be optional, i.e. RECOMMENDED or OPTIONAL, given that there might be none.)

This way it is possible to name the file once the last bytes have been uploaded and move it from a temporary directory to its final place.
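
A rough Python sketch of this proposal, seen from the server side, might look as follows. Everything in it is hypothetical: the `Upload` class, the `handle_patch` function, and the `final_url_for` callback are names invented for this example, and the real protocol prescribes none of them. The point is only that the Location header is attached to the response of the PATCH that completes the upload, and to no earlier response.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Upload:
    entity_length: int  # total size announced at creation (Entity-Length)
    data: bytearray = field(default_factory=bytearray)


def handle_patch(upload: Upload, offset: int, chunk: bytes,
                 final_url_for: Callable[[bytes], str]) -> Tuple[int, Dict[str, str]]:
    """Apply one PATCH and return (status_code, response_headers).

    final_url_for is a hypothetical callback that names and places the
    finished file and returns its accessible URL.
    """
    if offset != len(upload.data):
        raise ValueError("offset does not match the bytes received so far")
    if offset + len(chunk) > upload.entity_length:
        raise ValueError("chunk exceeds the announced Entity-Length")

    upload.data.extend(chunk)

    headers: Dict[str, str] = {}
    if len(upload.data) == upload.entity_length:
        # Only now is the whole file known, so only now can it be named.
        headers["Location"] = final_url_for(bytes(upload.data))
    return 200, headers
```

Intermediate PATCH responses carry no Location at all, so the identifier in `/files/<id>` never has to double as a public file name.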

What do you think? Would this be a reasonable way to go, or am I missing something (maybe I misunderstand the specification at this point)?

cc 1up-lab/OneupUploaderBundle#52

Member

vayam commented Oct 1, 2013

Is your typical use case a TUS upload server for receiving uploads and finally archiving in S3 or something equivalent? Can you not implement a simple redirect from /files/id to permanent storage?

Correct me if I am wrong. You want it to be more like an upload_id, similar to the Google Cloud Storage resumable API or the S3 multipart API?

Author

sheeep commented Oct 1, 2013

> Is your typical use case a TUS upload server for receiving uploads [...]

By TUS upload server, do you mean the reference implementation / php-tus? If so: no, I thought about implementing it directly, as the PHP variant depends on predis.

> [...] and finally archiving in S3 or something equivalent?

The mentioned bundle supports Gaufrette, a filesystem abstraction layer. This is why I can't predict which storage backend is used, so archiving on S3 is within the realm of possibility.

> Can you not implement a simple redirect from /files/id to permanent storage?

An upload server should be storage agnostic IMHO, so you can easily change the type of storage when you need to. I could force the user to map ids to publicly accessible files once the upload is complete. But then again, you would have to define how long such redirects should remain in place.

> Correct me if I am wrong. You want it to be more like an upload_id, similar to the Google Cloud Storage resumable API or the S3 multipart API?

This is correct. I'm not saying it could not be an accessible route or a URL if you like. All I asked myself was: why force the upload server to have a URL/route in the first place, before the file is completely uploaded? Ah, I have probably found the source of my confusion: according to #28, the file retrieval strategy is still under discussion.

However, the problem of naming the file still persists. I think there should be a way of telling the client the location of a fully uploaded file once the upload process is completed. This way, you could easily determine MIME types server side and name the file accordingly. And as a positive side effect, you would not have to implement a redirect from the temporary directory to the chosen storage layer.

Member

vayam commented Oct 3, 2013

> This is correct. I'm not saying it could not be an accessible route or a URL if you like. All I asked myself was: why force the upload server to have a URL/route in the first place, before the file is completely uploaded?

The Location header has to be an absolute URI.

> However, the problem of naming the file still persists. I think there should be a way of telling the client the location of a fully uploaded file once the upload process is completed. This way, you could easily determine MIME types server side and name the file accordingly. And as a positive side effect, you would not have to implement a redirect from the temporary directory to the chosen storage layer.

There are two possibilities here:

  • The client is intelligent and may want to put the final uploaded file in a specific permanent location.
  • Once the upload is complete, the upload server archives it in a permanent location and provides that to the user.

I am not sure if this should be part of the TUS spec. The Dropbox API has something similar.
Thinking aloud here! How about using an optional Entity-Location header?

Client specifies final file location

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Entity-Length: 100
Entity-Location: https://mycdn.com/vayam/gravatar.webp

HTTP/1.1 201 Created
Location: http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216

Upload server generates final file location

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Entity-Length: 100

HTTP/1.1 201 Created
Location: http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216
Entity-Location: https://mycdn.com/24e533e02ec3bc40c387f1a0e460e216

It is not perfect. The other option is an additional PATCH request: once the upload is complete, the client makes a final PATCH request. If it carries an Entity-Location header, the upload server uses it; otherwise the server generates one and returns it. This is the final request.

Client specifies final location

PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Type: application/offset+octet-stream
Entity-Location: https://mycdn.com/vayam/gravatar.webp
Content-Length: 0

HTTP/1.1 200 OK

Upload server specifies final location

PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Type: application/offset+octet-stream
Content-Length: 0

HTTP/1.1 200 OK
Entity-Location: https://mycdn.com/24e533e02ec3bc40c387f1a0e460e216

Also, it is tricky for large files. Consider a situation where a file is still being transferred to its final location. We should probably return another header while the transfer is in progress

Entity-Archived: False

and, once the transfer is complete,

Entity-Archived: True

It doesn't look elegant. You might have thought this through already. I am open to suggestions.

Author

sheeep commented Oct 7, 2013

> The client is intelligent and may want to put the final uploaded file in a specific permanent location.

I'd rather not assume any client to be intelligent in any other way than supporting the basic protocol.

> Once the upload is complete, the upload server archives it in a permanent location and provides that to the user.

That's in my opinion the way to go. I'd imagine a possible conversation like this:

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Entity-Length: 100

I don't see the need for a client to specify an Entity-Location, as filesystem/path handling should lie within the scope of the server.

HTTP/1.1 201 Created
Location: http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216

The Location marks either a temporary file location or the final one. A client should not care until further notice. Note that no Entity-Location is provided as long as the file is not completely uploaded.

The last PATCH.

PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Type: application/offset+octet-stream
Content-Length: 30
Offset: 70

[remaining 30 bytes]

And of course the server's response.

HTTP/1.1 200 OK
Location: http://tus.example.org/web/uploads/my-file-12.png

The only thing that changes to the current version of the protocol is the last response from the server, where either an additional (but optional) Location or Entity-Location would be feasible. If it is not provided, a client should assume that the file is requestable through the path it was given in the first server response.
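
On the client side, the fallback rule described above could be sketched like this in Python. The function name `resolve_file_url` is made up for the example, and preferring Location over Entity-Location when both are present is an assumption, since the thread leaves the choice between the two headers open.

```python
from typing import Mapping


def resolve_file_url(creation_location: str,
                     final_headers: Mapping[str, str]) -> str:
    """Pick the URL under which the finished file should be requested.

    creation_location is the Location from the 201 Created response;
    final_headers are the headers of the response to the last PATCH.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in final_headers.items()}
    return (lowered.get("location")
            or lowered.get("entity-location")
            or creation_location)
```

A client that knows nothing about the optional header keeps using the URL from the creation response, which is why the change would stay backward compatible.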

Furthermore, this behaviour would not break backward compatibility. What do you think, could this be a possibility?

Member

vayam commented Oct 8, 2013

> I'd rather not assume any client to be intelligent in any other way than supporting the basic protocol.

It is a valid scenario. API clients may decide to save it in a specific bucket/filepath, similar to S3.

> The only thing that changes to the current version of the protocol is the last response from the server, where either an additional (but optional) Location or Entity-Location would be feasible. If it is not provided, a client should assume that the file is requestable through the path it was given in the first server response.

Adding a Location to the final PATCH response: the only problem I see is that if you are archiving your file on S3 or some other long-term storage, it can fail and you might have to retry. For large files it might take a while and the client might time out. In the event of a failure, would you retry the entire last PATCH request? How do you suggest we handle errors related to copying the file to its final destination? IMO archiving to permanent storage should be asynchronous.
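
One way to picture asynchronous archiving is the Python sketch below. It is only an illustration under several assumptions: the `ArchiveJob` class, its retry count, and its backoff delays are invented here, and the `Entity-Archived` and `Entity-Location` headers are the tentative names floated earlier in this thread, not part of any specification. The archive step runs in a background thread and is retried on failure, so the last PATCH can return immediately and the client never has to resend data.

```python
import threading
import time
from typing import Callable, Dict, Optional


class ArchiveJob:
    """Runs an archive step in the background and retries it on failure."""

    def __init__(self, archive: Callable[[], str],
                 max_attempts: int = 3, base_delay: float = 0.5) -> None:
        self._archive = archive            # hypothetical: copies the file, returns its URL
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self.location: Optional[str] = None
        self.last_error: Optional[Exception] = None
        self._done = threading.Event()     # set only after a successful archive
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _run(self) -> None:
        for attempt in range(self._max_attempts):
            try:
                self.location = self._archive()
            except Exception as exc:
                self.last_error = exc
                if attempt + 1 < self._max_attempts:
                    # Exponential backoff between attempts.
                    time.sleep(self._base_delay * 2 ** attempt)
            else:
                self._done.set()
                return

    def status_headers(self) -> Dict[str, str]:
        """Headers a server could attach to a status response for this upload."""
        if self._done.is_set():
            return {"Entity-Archived": "True", "Entity-Location": self.location}
        return {"Entity-Archived": "False"}
```

If every attempt fails, the job simply stays at `Entity-Archived: False` with the exception kept in `last_error`; what a server should report to the client in that case is left open here, as it is in the discussion.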

Author

sheeep commented Oct 8, 2013

> IMO archiving to permanent storage should be asynchronous.

Yes. That's necessary for remote filesystems like S3. But if your permanent storage is on the same (local) filesystem as the temporary one (just in a different folder), you don't have to move the file the classical way. A simple rename does the same in no time.
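
For the local case, a minimal Python sketch of such a rename could look like this. The function name `promote_upload` and the paths in the usage note are placeholders. `os.replace` renames in place when source and destination are on the same filesystem, and may raise `OSError` when they are not, which is exactly the situation where a real copy (and therefore asynchronous handling) is needed.

```python
import os


def promote_upload(tmp_path: str, final_path: str) -> None:
    """Move a finished upload from the temporary folder to its final place.

    Assumes both paths live on the same filesystem, so no data is copied.
    """
    os.makedirs(os.path.dirname(final_path) or ".", exist_ok=True)
    os.replace(tmp_path, final_path)
```

A call such as `promote_upload("/var/tmp/uploads/24e5...", "/srv/web/uploads/my-file-12.png")` would then complete almost instantly regardless of file size, provided both directories share a filesystem.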

> It is a valid scenario. API clients may decide to save it in a specific bucket/filepath, similar to S3.

You're right, sorry. I was just thinking of frontend uploaders. Taking APIs into consideration, the protocol design makes perfect sense. Maybe I'll just implement it the way you mentioned previously: link the id to the actual file and implement some kind of expiration policy for the id, like the one mentioned here.
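
Linking ids to files with an expiration policy could be sketched in Python as below. The `UploadIndex` class, its in-memory dictionary, and the fixed time-to-live are all assumptions for illustration; a real bundle would more likely persist the mapping and take the lifetime from configuration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class UploadIndex:
    """Maps upload ids to final file locations for a limited time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def link(self, upload_id: str, location: str) -> None:
        """Remember where the file behind upload_id ended up."""
        self._entries[upload_id] = (location, self._clock() + self._ttl)

    def resolve(self, upload_id: str) -> Optional[str]:
        """Return the redirect target for /files/<id>, or None if unknown or expired."""
        entry = self._entries.get(upload_id)
        if entry is None:
            return None
        location, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[upload_id]
            return None
        return location
```

A request handler for `/files/<id>` could call `resolve` and answer with a redirect when it returns a location, or with a not-found response once the id has expired.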

Do you mind if I leave this issue open while I'm implementing the protocol? Maybe I'll spot some more possibilities during the process.

Member

vayam commented Oct 9, 2013

> Do you mind if I leave this issue open while I'm implementing the protocol? Maybe I'll spot some more possibilities during the process.

Sounds good to me.

Member

Acconut commented Dec 3, 2014

> I don't see the need for a client to specify an Entity-Location, as filesystem/path handling should lie within the scope of the server.

I agree.

> Do you mind if I leave this issue open while I'm implementing the protocol? Maybe I'll spot some more possibilities during the process.

Any update on this?

We have discussed handling filenames already (see #38), although this may be exactly the same problem as the one you have.

Acconut closed this as completed Oct 17, 2015