Large File Upload Support #3336

Open
t83714 opened this issue Mar 21, 2022 · 0 comments
Comments

t83714 (Contributor) commented Mar 21, 2022

Large File Upload Support

Magda allows you to store data files in either internal storage (a k8s PV) or cloud storage via our storage API (backed by MinIO).

However, the current web UI might not work for very large files (e.g. 10GB), because:

  • Our UI reads the whole file content in one go
  • The storage API processes the whole file in a single request

We should:

  • As part of Reshape Storage API #3335, upload files via presigned URLs only
  • Process the file in chunks on the frontend (using the File slice API) and upload it to storage via the AWS multipart upload protocol (supported by MinIO); see the sketch after this list
  • Our implementation should also support resuming a previously interrupted upload
    • The current create dataset UI already has the "recover previous change" function. We should extend this function to cover the file upload process as well
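Below is a minimal sketch (not a final design) of how the frontend chunking could work, assuming hypothetical storage-API endpoints (`/initiate-multipart`, `/presign-part`, `/complete-multipart`) that would wrap MinIO's S3-compatible multipart API; the real endpoint names and payloads should come out of Reshape Storage API #3335.

```typescript
// Sketch only: endpoint names and payloads below are illustrative assumptions.
const PART_SIZE = 6 * 1024 * 1024; // proposed default 6MB part size

async function uploadLargeFile(file: File): Promise<void> {
    // Ask the storage API to start a multipart upload and return an upload id.
    const { uploadId } = await postJson("/initiate-multipart", {
        fileName: file.name,
        size: file.size
    });

    const parts: { partNumber: number; etag: string }[] = [];
    const partCount = Math.ceil(file.size / PART_SIZE);

    for (let partNumber = 1; partNumber <= partCount; partNumber++) {
        // File.slice lets us read one chunk at a time instead of the whole file.
        const start = (partNumber - 1) * PART_SIZE;
        const chunk = file.slice(start, Math.min(start + PART_SIZE, file.size));

        // Get a presigned URL for this part, then PUT the chunk directly to storage.
        const { url } = await postJson("/presign-part", { uploadId, partNumber });
        const res = await fetch(url, { method: "PUT", body: chunk });
        parts.push({ partNumber, etag: res.headers.get("ETag") ?? "" });
    }

    // Tell the storage API to assemble the uploaded parts into the final object.
    await postJson("/complete-multipart", { uploadId, parts });
}

// Tiny helper; in Magda this would go through the existing API client instead.
async function postJson(path: string, body: unknown): Promise<any> {
    const res = await fetch(path, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body)
    });
    return res.json();
}
```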

Acceptance Criteria

  • The feature can be turned on / off via the web-server Helm chart config multipartUpload.enabled
  • The part size can be configured via multipartUpload.partSize, with a default value of 6MB. If a file is smaller than the part size, we will not upload it in multipart mode.
  • If a multipart upload is interrupted (e.g. due to a network error, browser crash etc.), we should allow the user to resume the upload without re-uploading the parts that have already completed.
  • As "List Multipart Uploads always returns an empty list" when using GCS & Azure Blob, we should find a way to accommodate the cloud vendors' implementation differences and make sure the feature works consistently across the AWS S3, GCS & Azure Blob storage options; one possible approach is sketched below.
@t83714 t83714 added this to the Next milestone Mar 21, 2022
@t83714 t83714 added the Epic label Apr 4, 2022
@t83714 t83714 modified the milestones: Next (v2.0.0), v3.0.0 Jul 28, 2022
@t83714 t83714 modified the milestones: v2.1.0, v3.0.0 Aug 15, 2022