q.site.upload is slow #982

Closed
bminixhofer opened this issue Sep 6, 2021 · 9 comments · Fixed by #1765
Labels: perf (Related to performance), py (Related to Python Driver)

Comments

@bminixhofer

Wave SDK Version, OS

0.17.0, Linux

Actual behavior

We are dealing with some large files (predictions, model weights, etc.) of a couple hundred MB each, which we want to make available for download.

I'm running Wave locally, and with the following code:

from h2o_wave import Q, main, app, ui
import time

@app("/demo")
async def serve(q: Q):
    start = time.time()

    (url,) = await q.site.upload(["data.dump"])

    upload_time = time.time() - start
    print(f"upload time: {upload_time}")

    q.page['meta'] = ui.meta_card(box='')
    q.page["meta"].redirect = f"http://localhost:10101/{url}"

    await q.page.save()

I get

$ head -c 100MB < /dev/urandom > data.dump
$ wave run app
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [28044] using statreload
INFO:     Started server process [28046]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     127.0.0.1:37576 - "POST / HTTP/1.1" 200 OK
upload time: 33.335158348083496

i.e. it takes ~33 seconds before a 100 MB file starts downloading.
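
For scale, that measurement corresponds to roughly 3 MB/s. A quick back-of-the-envelope check, assuming `head -c 100MB` produced exactly 100,000,000 bytes (GNU `head` treats `MB` as 1000 * 1000); the helper name is purely illustrative:

```python
# Effective throughput of the upload measured above.
# Assumption: data.dump is exactly 100,000,000 bytes (GNU head: MB = 1000 * 1000).
FILE_SIZE_BYTES = 100 * 1000 * 1000
UPLOAD_SECONDS = 33.335158348083496  # "upload time" printed by the app


def throughput_mb_per_s(size_bytes: int, seconds: float) -> float:
    """Return throughput in megabytes (10**6 bytes) per second."""
    return size_bytes / seconds / 1e6


rate = throughput_mb_per_s(FILE_SIZE_BYTES, UPLOAD_SECONDS)
print(f"{rate:.2f} MB/s")  # prints "3.00 MB/s"
```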

Expected behavior

Since the file only has to be copied to a location on the same machine, I'd expect the download to start almost immediately.

Steps To Reproduce

  1. Start waved locally.

  2. head -c 100MB < /dev/urandom > data.dump

  3. Write app.py:

from h2o_wave import Q, main, app, ui
import time

@app("/demo")
async def serve(q: Q):
    start = time.time()

    (url,) = await q.site.upload(["data.dump"])

    upload_time = time.time() - start
    print(f"upload time: {upload_time}")

    q.page['meta'] = ui.meta_card(box='')
    q.page["meta"].redirect = f"http://localhost:10101/{url}"

    await q.page.save()

  4. wave run app
@bminixhofer
Author

@srini-x I can't seem to assign you, please just assign yourself.

@lo5 lo5 self-assigned this Sep 10, 2021
@lo5 lo5 added the py Related to Python Driver label Sep 10, 2021
@lo5 lo5 added this to the 2021 milestone Sep 10, 2021
@lo5
Member

lo5 commented Sep 10, 2021

Looks like the httpx async client is slower than the regular client:

package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func upload(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(32 << 20); err != nil { // 32 MB
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	form := r.MultipartForm
	files, ok := form.File["files"]
	if !ok {
		http.Error(w, "no files", http.StatusBadRequest)
		return
	}
	for _, file := range files {
		src, err := file.Open()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer src.Close()
		dst, err := os.OpenFile(file.Filename, os.O_WRONLY|os.O_CREATE, 0666)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer dst.Close()
		io.Copy(dst, src)
	}
}

func main() {
	http.HandleFunc("/", upload)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

import httpx
import time
import os

files = ['data.dump']
print('uploading...')
start = time.time()
res = httpx.post('http://localhost:8080/', files=[('files', (os.path.basename(f), open(f, 'rb'))) for f in files])
print(f'upload time: {time.time() - start}, status: {res.status_code} {res.text}')

(venv) elp@studio py % python upload.py
uploading...
upload time: 8.068389892578125, status: 200

import asyncio
import httpx
import time
import os

files = ['data.dump']


async def main():
    async with httpx.AsyncClient() as client:
        print('uploading...')
        start = time.time()
        res = await client.post('http://localhost:8080/',
                                files=[('files', (os.path.basename(f), open(f, 'rb'))) for f in files])
        print(f'upload time: {time.time() - start}, status: {res.status_code} {res.text}')


asyncio.run(main())

(venv) elp@studio py % python upload.py
uploading...
upload time: 39.59935522079468, status: 200 
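
Given the gap between the two runs, one possible workaround (a sketch, not something proposed or tested in this thread) is to keep the faster blocking client and run it in a worker thread so the event loop stays free. `blocking_upload` below is a hypothetical stand-in for the synchronous `httpx.post(...)` call shown earlier; it only sleeps, so the snippet runs without a server:

```python
import asyncio
import time


def blocking_upload(path: str) -> str:
    # Hypothetical stand-in for the synchronous httpx.post(...) call above.
    # It sleeps instead of doing network I/O so the sketch is self-contained.
    time.sleep(0.05)
    return f"uploaded:{path}"


async def upload_in_thread(path: str) -> str:
    # Run the blocking call in the default thread pool executor so that
    # other coroutines keep running while the upload is in progress.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_upload, path)


async def demo() -> list:
    # Two uploads proceed concurrently, each in its own worker thread.
    return await asyncio.gather(
        upload_in_thread("a.dump"),
        upload_in_thread("b.dump"),
    )


results = asyncio.run(demo())
print(results)
```

Whether this recovers the sync client's speed inside a Wave app would still need to be measured.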

@lo5
Member

lo5 commented Sep 10, 2021

Maybe related: encode/httpx#838

@psinger

psinger commented Sep 10, 2021

> Maybe related: encode/httpx#838

Looks like they are not eager to improve speed soon:
encode/httpx#838 (comment)

@lo5
Member

lo5 commented Sep 10, 2021

FYI: a workaround / alternate feature that might help circumvent this issue: https://github.com/h2oai/wave/blob/master/website/docs/files.md#serving-files-directly-from-the-wave-server

Will be out in the next release.

@psinger

psinger commented Sep 10, 2021

@lo5 looks good, will this also work in h2o-cloud?

@lo5
Member

lo5 commented Sep 10, 2021

@psinger Filed https://github.com/h2oai/h2o-ai-cloud/issues/1920

@lo5
Member

lo5 commented Feb 27, 2022

Closed:
encode/httpx#1948
encode/httpx#838

@mturoci
Collaborator

mturoci commented Aug 12, 2022

> Looks like they are not eager to improve speed soon:

We could move the files within the FS if the Wave app and the Wave server are located on the same machine (which is very common anyway) and avoid pushing them through HTTP. Wdyt @lo5?
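
A minimal sketch of that idea, assuming the app process can write to the directory the server serves files from. The `publish_local` helper, the directory layout, and the `/files` URL prefix are all hypothetical placeholders for illustration, not the Wave implementation:

```python
import os
import shutil
import tempfile
import uuid


def publish_local(src: str, data_dir: str, url_prefix: str = "/files") -> str:
    """Copy src into data_dir/<random id>/<basename> and return a URL path.

    Hypothetical helper: data_dir and url_prefix stand in for wherever the
    server actually keeps and serves uploaded files.
    """
    file_id = uuid.uuid4().hex
    dst_dir = os.path.join(data_dir, file_id)
    os.makedirs(dst_dir)
    # A plain filesystem copy; no HTTP round trip is involved.
    shutil.copy2(src, os.path.join(dst_dir, os.path.basename(src)))
    return f"{url_prefix}/{file_id}/{os.path.basename(src)}"


# Usage with throwaway directories so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "data.dump")
    with open(src_path, "wb") as f:
        f.write(os.urandom(1024))
    server_dir = os.path.join(tmp, "server_data")
    os.makedirs(server_dir)
    url = publish_local(src_path, server_dir)
    # Map the returned URL path back to the copied file and check its size.
    copied = os.path.join(server_dir, *url.split("/")[2:])
    copied_ok = os.path.getsize(copied) == 1024
print(url.startswith("/files/"), copied_ok)
```

A real version would still need a fallback to the HTTP upload when the app and server do not share a filesystem.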

@mturoci mturoci added perf Related to performance and removed bug Bug in code labels Oct 5, 2022
mturoci added commits that referenced this issue on Dec 15, 2022 (1), Jan 10, 2023 (2), Jan 12, 2023 (1), Jan 13, 2023 (2), and Jan 20, 2023 (7)
4 participants