Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster gateway directory listings #8178

Closed
Stebalien opened this issue Jun 7, 2021 · 11 comments · Fixed by #8853
Closed

Faster gateway directory listings #8178

Stebalien opened this issue Jun 7, 2021 · 11 comments · Fixed by #8853
Assignees
Labels
effort/days Estimated to take multiple days, but less than a week exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature P3 Low: Not priority right now topic/gateway Topic gateway

Comments

@Stebalien
Copy link
Member

Stebalien commented Jun 7, 2021

Update 11/4(@schomatis): We need to refactor the gateway code to use UnixfsAPI.Ls() instead of Get.

Instead of resolving file types up-front in the gateway, we can take advantage of streaming HTML rendering (most browsers support this) by:

  1. Returning an HTML body with the directory listing as fast as possible.
  2. Streaming back some trailing style elements to "update" the directory listing with file types.

That way the directory listing appears immediately, and icons for file-types will be filled in as new <style> elements get sent back to the browser.

Alternatively, we could do this with some javascript. But I'd like to avoid that if possible.

@Stebalien Stebalien added the kind/enhancement A net-new feature or improvement to an existing feature label Jun 7, 2021
@Stebalien Stebalien added effort/days Estimated to take multiple days, but less than a week exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue P3 Low: Not priority right now labels Jun 7, 2021
@lidel
Copy link
Member

lidel commented Mar 30, 2022

I believe this will be unblocked and ready to work on after go-ipfs 0.13/14 ships with alternative response formats. Loose idea is to return listing without sizes fast, and then fetch them on-demand (if needed):

@lidel lidel modified the milestones: go-ipfs 0.14, Best Effort Track Mar 30, 2022
@lidel lidel added the topic/gateway Topic gateway label Mar 30, 2022
@lidel
Copy link
Member

lidel commented Apr 5, 2022

quick thought on a band-aid for this:

  • we should detect when dir is over N items and only render that + add footer saying Directories bigger than N items can't be rendered fully. Try using the CLI: ipfs ls -s --size=false --resolve-type=false $BIG_DIR_CID
  • this would be a band-aid, but probably easy to implement in a few lines of go around gateway_handler_unixfs_dir.go#L112 and then add footer when .Listing is equal or bigger than N

@lidel
Copy link
Member

lidel commented Apr 5, 2022

@schomatis would you be interested in doing the band-aid described above, so we could ship it in go-ipfs 0.13? It would solve some perf problems on gateways. Let's go with N=5000 for now.

@schomatis
Copy link
Contributor

Assigning. If this is time sensitive I'm probably not the best person though, but will give you an update on this this week.

@schomatis schomatis self-assigned this Apr 5, 2022
@schomatis
Copy link
Contributor

Sorry, just now reading the band-aid alternative: yes, seems like a quick fix I can implement to help alleviate this.

@schomatis
Copy link
Contributor

@lidel Need some clarifications:

A. On the objective of this issue, I'm not clear which is the performance downgrade we're trying to fix. From the OP:

Instead of resolving file types up-front in the gateway

From what I see in the code we don't fetch the type information but guess it from file name extension (which is something we already have available in the directory node itself).

B. Guessing about any other potential performance issues, the only potential other fetch is to get the hash of the file in the path resolver. We have that available in the directory but it's not exposed in the Node interface. (This may not even be a real network fetch. I'm not familiar with our path resolver; it might just fetch the directory DAG node locally. I'll need to take a closer look if necessary.)

C. On the band-aid, assuming there is some performance hit somewhere, we don't know upfront how many entries a directory has (at least for the HAMT case, which is probably the type of big directory this issue has in mind). We can't limit upfront by its size since this is being continually streamed (is maybe this the performance issue we're trying to address in the first place?). In any case, what we could do is add a timer and if the directory hasn't been fully iterated stop there and add the warning proposed.

@schomatis
Copy link
Contributor

Reading the original issue that spawned this one, #7495, I now see we're conflating two different listing mechanisms:

  • ipfs ls command (original issue), using UnixfsAPI.Ls(): without setting -size=false --resolve-type=false it indeed fetches independently every entry in the directory (with its performance hit).
  • Gateway listing (this issue here), using UnixfsAPI.Get(): we do not fetch directory entries (except in the potential path resolution of point B above to get the hash, but that's a big maybe, and even if that were true we could just get it from the directory itself).

We need to clearly distinguish between the two, possibly having independent issues (since each mechanism has its independent code; the gateway is not just calling ipfs ls/UnixfsAPI.Ls() behind the scenes).

@schomatis
Copy link
Contributor

(Marking as blocked until we clarify all this.)

@lidel
Copy link
Member

lidel commented Apr 6, 2022

This should unblock you:

  • This issue is about gateway behavior in gateway_handler_unixfs_dir.go when no index.html is present (when HTML listing is generated), don't worry about ipfs ls, it already has flags that solve the perf problem:
    ipfs ls -s --size=false --resolve-type=false bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm | wc -l
    
  • Indeed, iiuc UnixfsAPI.Get() does not fetch child blocks, however inside for dirit.Next() loop we do fetch them via ResolvePath call (Node().Size() is reading Size based on root block of each child)
  • My understanding is that band-aid would count how many times we called dirit.Next() and break out of the loop when we hit some number. That way, we only fetch 5k+1 blocks.
    • If this helps, you can use bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm for testing.
      This dir was originally reported by @alanshaw to load slow on gateway (it has 10100 items 🤯)

@alanshaw
Copy link
Member

alanshaw commented Apr 6, 2022

🙏 This would be super useful - folks are uploading NFT drops with directories of 10k+ items to NFT.Storage and are getting confused when trying to view them on the gateway - it times out and they assume their upload did not complete successfully.

@lidel
Copy link
Member

lidel commented Apr 14, 2022

@alanshaw check #8853 (review) – I've tested it with bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm and loads as soon the root node arrives.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/days Estimated to take multiple days, but less than a week exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature P3 Low: Not priority right now topic/gateway Topic gateway
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants