Webui Hangs after adding heavy workload to ipfs cluster peers #2143

Open
trendsetter37 opened this issue Jul 26, 2023 · 12 comments
Labels
  • kind/enhancement: A net-new feature or improvement to an existing feature
  • need/analysis: Needs further analysis before proceeding
  • need/author-input: Needs input from the original author
  • topic/perf: Performance

Comments

@trendsetter37

Description
So I’ve come across a weird situation. While my private cluster was empty, the only operations we ran were adding small test files to watch how they were distributed, and during that time the webui for each private node continued to work as usual.

However, after adding a heavier load (~10 GB of data) to the private cluster using ipfs-cluster-ctl, the webui stopped working for each node. The good news is that the gateways still work, but this seems like a weird bug. Since the data was added to the private cluster, the webui hangs whenever we attempt to navigate to its URL in the browser.

To Reproduce
Steps to reproduce the behavior:

  1. Install and start private ipfs nodes using swarm keys.
  2. Check webui (still works at this point).
  3. Install ipfs-cluster and connect the nodes to the cluster service.
  4. Check webui (still works).
  5. Load at least 10 GB of data through the cluster (a hedged command sketch follows this list).
  6. Check webui (starts to hang when attempting to access it).
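
For step 5, a minimal sketch of the kind of commands involved, assuming ipfs-cluster-ctl is on the path and that its add command accepts the --layout flag discussed later in the thread; the directory paths are placeholders, not the original dataset:

```sh
# Add a directory of mixed PDFs through the cluster (paths are hypothetical).
ipfs-cluster-ctl add -r /data/pdfs

# Add the larger audio set with a trickle DAG layout, as described later in the thread.
ipfs-cluster-ctl add -r --layout trickle /data/audio

# Then try to open the webui of any node, e.g. http://127.0.0.1:5001/webui
```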

Expected behavior
WebUI loads and is working normally.

Desktop (please complete the following information):

  • OS: Linux servers (Arch, Raspbian)
  • Browsers: Brave, Firefox, Palemoon
  • Version: Most recent

Additional context
It is interesting that the problem only occurs after loading more than a trivial amount of data into the cluster. I'm unsure what the lower threshold would be.

@trendsetter37 trendsetter37 added the need/triage label Jul 26, 2023
@welcome

welcome bot commented Jul 26, 2023

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@whizzzkid
Contributor

Thanks for submitting this issue @trendsetter37. I am not surprised that this happened; the performance degradation could have multiple causes, but before we jump to conclusions about what might have happened, I would like to learn more. Can you please:

  • Describe the shape of the data that you were benchmarking against?
    • Number of files
    • Levels, Nodes/Level, Edges
    • Type of Data
  • Confirm whether a single 10 GB file can cause this, or what the minimum number of files is at which you've experienced this issue.
  • Share logs and configs that might help us understand what happens at the node when this content is added.
  • Share a dataset that makes it easy to reproduce.

We have huge datasets in the explore section, which seem to work fine, but most likely it's the file browser that doesn't like the number of nodes it needs to work with. A reproducible example can help us identify the root cause and triage this better.

Thanks!

@whizzzkid whizzzkid added the need/author-input, kind/enhancement, need/analysis, and topic/perf labels and removed the need/triage label Aug 3, 2023
@trendsetter37
Author

Let's see: probably more than 50 files, ranging from mostly PDFs at this point to a few video and audio files. I believe the audio files (~11 GB) may have been when I first noticed this. I did choose the trickle layout for those, since I wanted to take advantage of the more efficient Merkle DAG layout for linearly read files.

Note that I continued to add files even after the webui stopped working, so I'll go back and spin up a fresh cluster to see if I can reproduce this with fewer variables.

Thanks for reaching out @whizzzkid !

@trendsetter37
Author

trendsetter37 commented Aug 3, 2023

@whizzzkid Can you say more regarding "I am not surprised that this happened"? I can poke around the relevant code as well to look for clues. It sounds as if you already expected something like this to happen based on your knowledge of the code surrounding the webui/ipfs-cluster interaction.

@whizzzkid
Contributor

@trendsetter37 the reason I'm not surprised is that UIs can slow down significantly as the number of DOM nodes grows; that's why I'm really interested in knowing the shape of the data that the UI is trying to render.

The easiest way to get the structure is tree -ahs; you can dump it to a file with tree -ahs -o /tmp/report.log. I attached the output for the node_modules folder from ipfs-webui (report.log), but if you don't want to share your filenames, that would be understandable.
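
A minimal sketch of generating such a report, assuming the standard tree utility and a placeholder path for the directory that was added to the cluster:

```sh
# Dump hidden entries (-a), human-readable sizes (-h, -s) and the full
# structure of the added directory to a report file.
tree -ahs -o /tmp/report.log /path/to/added/dir

# The last line of the report summarises how many directories and files
# the webui's file browser would have to render.
tail -n 1 /tmp/report.log
```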

I tested a few things:

  • Tried with a large 35+ GB folder containing 15 files, and webui did not have any issues.
  • Adding node_modules seems to make webui unresponsive (a rough reproduction sketch follows this list), which sort of confirms my hunch that it's the inode count + nesting, rather than the size itself, that makes it harder.
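
A rough way to reproduce that second bullet locally, assuming a local Kubo node and any project with a reasonably deep node_modules tree; none of this is prescribed by the thread:

```sh
# Add a deeply nested directory with many small files to a local node,
# then open the Files page of the webui and navigate into it.
ipfs add -r node_modules
```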

@SgtPooki
Member

SgtPooki commented Aug 4, 2023

@trendsetter37 thanks for reporting this.

Can you confirm the following?

  1. You added content to your ipfs node via CLI (i.e. outside of webui)
  2. You attempted to load the main webui page and it hangs (127.0.0.1:5001/webui on a local Kubo node)

For #1, I want to make sure we're looking at the correct issue. For #2, if webui hangs on the initial load (regardless of page), this seems like it could be an issue exacerbated by the content on the node, or an ipfsd-ctl issue when attempting to connect to that node, and not a fully contained webui issue (i.e. some bug in our processing of large data). Either way, we still have some work to do.

If it's hanging on accessing a particular page, then that means some operations more specific to that page might be at fault.

I'm just trying to narrow down where our focus should be.

Thanks for your continued input!

@trendsetter37
Author

trendsetter37 commented Aug 4, 2023

@SgtPooki yes, all content was added via CLI and the main page is what never loads. I haven't attempted to load any secondary pages.

Also, it seems like what @whizzzkid found, seeing the same behaviour after adding node_modules/, is pretty close to what may be happening here.

@SgtPooki
Member

SgtPooki commented Aug 4, 2023

Ok thanks 🙏. I figured but wanted to confirm

@trendsetter37
Author

Update here: I'm also getting a 302 status when attempting to go directly to http://localhost:5001/webui. Is that expected?

@whizzzkid
Contributor

@trendsetter37 that should not happen; is this after you added content?

@trendsetter37
Author

trendsetter37 commented Aug 26, 2023

@whizzzkid Well, I didn't check the status code before adding content. I can spin up a fresh install on Docker to see if I get the same result. However, at this time the variables are that these nodes are part of a private cluster and have substantial data they are serving: 4 IPFS nodes collectively holding about 2.5 TB of data.

Current freespace:

```
12D3KooWDgX6pqihEMvq7a1TaK1VHDacsKsJWoG1JQtm6HTkwtq7 | freespace: 3.6 TB | Expires in: 23 seconds from now
12D3KooWFXXoYCfVMhpiRgWfuCw8DtuAkc1eWfRaNij2whsiboz9 | freespace: 3.0 TB | Expires in: 27 seconds from now
12D3KooWFfWM7eGQhibVzC4jqRrzBXWG8txCH5YXirG8ttqx32rF | freespace: 3.6 TB | Expires in: 18 seconds from now
12D3KooWReWQoDSB77UyKwxz2cH9NqMxMb9mFx8LUhjKU7AkNiq8 | freespace: 4.6 TB | Expires in: 20 seconds from now
```

And I'm actually adding another node today.

@SgtPooki
Member

If you're running a Kubo node, it does a temporary redirect (302) from the /webui path to the appropriate CID for the webui.
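
For reference, a quick way to confirm that redirect from the command line, assuming Kubo's default RPC address of 127.0.0.1:5001:

```sh
# Print only the HTTP status code for the /webui path; a 302 pointing at the
# pinned webui CID is the expected answer from a running Kubo node.
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:5001/webui
```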
