
Slow line list download speeds #1107

Closed
sratcliffe118 opened this issue Sep 17, 2020 · 17 comments
Labels
P1: Launch blocker (Needs fixing before we launch, schedule some time to investigate & fix)

Comments

@sratcliffe118

  • Do we have a sense of the speeds users see when they click the download button?
  • I just tested on a 70–75 Mbps connection in the UK with 137k entries, and it was still running after more than 2 minutes.
  • This feels too slow for users.

@Mougk Are there any download speed benchmarks / datasets we can look to?

@sratcliffe118 sratcliffe118 added the Data and Eng ready labels Sep 17, 2020
@sratcliffe118 sratcliffe118 added this to the Public launch milestone Sep 17, 2020
@attwad
Contributor

attwad commented Sep 17, 2020

It took me 1.4 min to start the content download of 130K cases.


That's quite a long time.

After that, the network seems to be the bottleneck; the servers aren't using much CPU or RAM at all:

kubectl top pods
NAME                            CPU(cores)   MEMORY(bytes)
curator-prod-7bd48b496f-tbrtb   3m           53Mi
data-prod-c469b5ccf-rsqtj       18m          44Mi

The download seems to have stopped altogether after 2.2 min, though, with no sign of anything going on other than the spinning widget in the download button.


I'm not sure if we can set some special HTTP headers to start the download immediately in the browser instead of waiting for what seems like the complete file to be sent. Pinging @allysonjp715 in case she has any thoughts.
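For reference, a minimal sketch of what that could look like, assuming an Express handler on the Node backend (the route path and filename here are made up, not our actual code):

```typescript
import express from 'express';

const app = express();

app.get('/api/cases/download', (req, res) => {
  // Tell the browser this is a file download, so it can start writing to
  // disk as bytes arrive instead of buffering the whole response.
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename="cases.csv"');
  // Writing chunks without a Content-Length makes Node fall back to chunked
  // transfer encoding, so the download can begin before the file is complete.
  res.write('id,name,date\n');
  // ... write the remaining rows as they are produced ...
  res.end();
});
```

Note this only helps if the browser navigates to the URL directly; if the frontend fetches the data over XHR first, the headers won't trigger a native download.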

@attwad
Contributor

attwad commented Sep 21, 2020

This might be relevant: https://nodejs.org/pt-br/docs/guides/backpressuring-in-streams/

Also, we should probably gzip the CSV before sending it; it would reduce the size of the download and might make it faster.
The Node stream pipeline described on that page looks nice; perhaps we could use it in the curator service to do this as well.
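A rough sketch of combining the two ideas, using Node's built-in stream.pipeline and zlib (the source stream and response objects here are placeholders, not our actual code):

```typescript
import { pipeline, Readable } from 'stream';
import * as zlib from 'zlib';
import { Response } from 'express';

function sendGzippedCsv(csvSource: Readable, res: Response): void {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Encoding', 'gzip');
  // pipeline() wires the streams together, handles backpressure, and tears
  // everything down on error, unlike a bare chain of .pipe() calls.
  pipeline(csvSource, zlib.createGzip(), res, (err) => {
    if (err) console.error('CSV download failed:', err);
  });
}
```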

@allysonjp715
Contributor

Just found this, could be part of the problem: axios/axios#479

@attwad
Contributor

attwad commented Sep 21, 2020

Also, I've looked at gzip, and nginx already does gzip compression of responses for us (visible in the dev tools Network tab), although apparently not for the download; we should be able to configure it so that it does:


https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#gzip-types
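For the record, the change would be along these lines in the controller's ConfigMap. The key names come from the doc above, but the ConfigMap name and namespace below are assumptions about our cluster, and the exact gzip-types list would need checking:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-gzip: "true"
  # The default gzip-types list doesn't cover CSV, so add text/csv explicitly.
  gzip-types: "text/csv text/plain application/json"
```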

@attwad
Contributor

attwad commented Sep 21, 2020

> Just found this, could be part of the problem: axios/axios#479

I don't think it applies to us, though? We're using Node on the backend, not in the browser.

@allysonjp715
Contributor

That bug is on axios. So IIUC, when the response goes through the browser, the stream can't be handled while it's coming in, so we can't start the download until all the data has arrived. The slow UI after clicking download makes sense then: axios holds all the data in memory until the full response comes in, and only then does the download happen.

Agreed we should try zipping the data.
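If that's the case, one workaround is to skip axios/XHR entirely and let the browser drive the download itself. A hypothetical frontend sketch (the endpoint path is made up):

```typescript
// Point an anchor at the download endpoint so the browser streams the file
// to disk itself instead of buffering it through axios/XHR.
function downloadCases(): void {
  const a = document.createElement('a');
  a.href = '/api/cases/download?format=csv';
  a.download = 'cases.csv'; // a hint; Content-Disposition on the server also works
  document.body.appendChild(a);
  a.click();
  a.remove();
}
```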

@attwad
Contributor

attwad commented Sep 21, 2020

I've enabled gzipping of CSV data in nginx; will send a PR for that.

@attwad
Contributor

attwad commented Sep 21, 2020

> That bug is on axios. So IIUC, when the response goes through the browser, the stream can't be handled while it's coming in, so we can't start the download until all the data has arrived. The slow UI after clicking download makes sense then: axios holds all the data in memory until the full response comes in, and only then does the download happen.
>
> Agreed we should try zipping the data.

I see, in that case that's why the download doesn't work at all, since we certainly can't hold all the data in memory.

@attwad
Contributor

attwad commented Sep 22, 2020

Reloaded dev; it took 7 s for the download to start, then the browser took over and it worked nicely. Thanks @allysonjp715 for the fix! (I sent you a PR to push to prod as well today.)

@attwad attwad closed this as completed Sep 22, 2020

@allysonjp715
Contributor

Unfortunately this isn't working in prod. For 100,000 cases it takes ~30 s for the download to start. The browser shows the pending-URL bar in the lower left-hand corner, but we should show a loading spinner in the UI for that too.

I've also tried a couple of times now, and the download keeps hanging at ~46 MB. The browser's download object spins continually at that number and doesn't complete or progress after that.

@allysonjp715
Contributor

The download is fully working after that last PR 👍 The browser's download object shows up immediately and begins the download, and the full download succeeds with ~100,000 cases. The full response is streamed, so it should succeed no matter how many cases there are.
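For posterity, the general shape of the end-to-end streaming approach, assuming the Node MongoDB driver and Express (collection and field names here are illustrative, not our schema, and real code needs proper CSV escaping):

```typescript
import { pipeline, Transform } from 'stream';
import { Response } from 'express';
import { Collection } from 'mongodb';

export function streamCasesCsv(cases: Collection, res: Response): void {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename="cases.csv"');
  // Convert each document into one CSV row as it flows through.
  const toCsv = new Transform({
    writableObjectMode: true,
    transform(doc, _enc, callback) {
      callback(null, `${doc._id},${doc.confirmationDate}\n`);
    },
  });
  // Every stage applies backpressure, so memory use stays flat no matter
  // how many cases the cursor yields.
  pipeline(cases.find().stream(), toCsv, res, (err) => {
    if (err) console.error('case download failed:', err);
  });
}
```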

@SandraAdele

The line list download is either slow or doesn't work at all; I get the error "502 Bad Gateway". @calremmel

@SandraAdele SandraAdele reopened this Dec 17, 2020
@Mougk
Contributor

Mougk commented Dec 17, 2020

In addition, the download speed was very slow (>10 minutes for me).

@Mougk Mougk added the P1: Launch blocker label Dec 17, 2020
@Mougk Mougk removed this from the Friends and Family launch milestone Dec 17, 2020
@Mougk Mougk added this to the Marketing Comms launch milestone Dec 17, 2020
@Mougk Mougk removed the Data and Eng ready labels Dec 17, 2020
@calremmel
Contributor

Switching to a cached download for the entire dataset should help address this. Most filtered subsets should be fine, but we should be mindful of large filtered subsets, such as Brazil, which can exceed 100K cases. If many people try to download those, we should think about how to make that more user-friendly.

@calremmel
Contributor

calremmel commented Jan 20, 2021

Next steps:

  • Write a script to export cases as CSV in 100K chunks (rough sketch below)
  • Write a script to process nested arrays and other formatting per chunk
  • Write a script to combine the chunks into a single file and compress it
  • Schedule the pipeline to run nightly at 00:00 UTC
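A loose sketch of the export step, assuming mongoexport is available on the host; the URI, collection, and field names below are placeholders:

```typescript
import { execFileSync } from 'child_process';

const CHUNK = 100_000;
const total = 500_000; // the real script would read the count from the DB

// Export the collection in 100K-row CSV chunks.
for (let skip = 0, i = 0; skip < total; skip += CHUNK, i++) {
  execFileSync('mongoexport', [
    '--uri', process.env.MONGO_URI ?? '',
    '--collection', 'cases',
    '--type', 'csv',
    '--fields', '_id,confirmationDate', // per-chunk formatting happens in step 2
    '--skip', String(skip),
    '--limit', String(CHUNK),
    '--out', `chunk_${i}.csv`,
  ]);
}
// Combining and compressing could then be `cat chunk_*.csv | gzip > cases.csv.gz`
// (with header rows de-duplicated), scheduled via a nightly Kubernetes CronJob.
```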

@calremmel
Contributor

Tracked in #1436
