High Disk Usage and Sync Issues with gaiad v14.1.0 #2879

Closed
5 tasks
ronigk8io opened this issue Jan 3, 2024 · 4 comments
Labels
general questions; other: decayed (Stale issues that need follow-up from commentators. Were closed for inactivity.)

Comments

@ronigk8io

Summary of Bug

After upgrading to gaiad version 14.1.0, I've encountered two significant issues:

  1. Excessive Disk Write: The gaiad process is writing approximately 20GB of data to the data directory daily, which seems unusually high compared to previous versions.
  2. Intermittent Sync Delays: Every few hours, the synchronization process slows down dramatically, nearly halting, before eventually continuing. This erratic behavior was not observed in earlier versions.

These issues have only arisen following the recent upgrade to version 14.
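
To pin down when the sync slowdowns happen, one option is to poll the node's RPC `/status` endpoint at a fixed interval and flag the windows where the block height barely advances. The following is a minimal sketch, not a tested monitoring tool. It assumes the response follows the usual CometBFT/Tendermint RPC layout (`result.sync_info.latest_block_height`, encoded as a string); the `Sample` type, the helper names, and the 1 block/min threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    taken_at: datetime  # wall-clock time when the node was polled
    height: int         # block height the node reported at that time


def height_from_status(payload: dict) -> int:
    """Extract the block height from a decoded /status JSON response."""
    # Assumption: the RPC encodes the height as a string, so convert it.
    return int(payload["result"]["sync_info"]["latest_block_height"])


def find_slow_windows(samples, min_blocks_per_min=1.0):
    """Return (start, end, rate) for each pair of consecutive samples
    whose block rate falls below the given threshold."""
    slow = []
    for prev, cur in zip(samples, samples[1:]):
        minutes = (cur.taken_at - prev.taken_at).total_seconds() / 60
        if minutes <= 0:
            continue  # skip duplicate or out-of-order samples
        rate = (cur.height - prev.height) / minutes
        if rate < min_blocks_per_min:
            slow.append((prev.taken_at, cur.taken_at, rate))
    return slow


if __name__ == "__main__":
    # Illustrative data only: four polls, five minutes apart.
    start = datetime(2024, 1, 3, 12, 0)
    demo = [
        Sample(start, 100),
        Sample(start + timedelta(minutes=5), 150),
        Sample(start + timedelta(minutes=10), 152),  # near stall
        Sample(start + timedelta(minutes=15), 200),
    ]
    for begin, end, rate in find_slow_windows(demo):
        print(f"{begin:%H:%M} to {end:%H:%M}: {rate:.1f} blocks/min")
```

The timestamps of the flagged windows can then be compared against governance proposal activity or against reports from other node operators.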

Version

I am currently running gaiad version v14.1.0.

Steps to Reproduce

  1. Followed the official Cosmos Quickstart Guide to set up gaiad.
  2. Ran an upgrade from a previous version to v14.1.0.
  3. Noticed the issues shortly after the upgrade was completed and the node began normal operations.

Environment:

  • Instance Type: AWS EC2
  • Specifications: 4 vCPUs, 32GB RAM
  • Resource Utilization: ~50% CPU and 15GB RAM usage
  • Additional Context: This is an upgrade from an older version of gaiad, and I have not installed gaiacli as per the new recommendations.

Expected vs. Actual Behavior:

  • Expected: Normal disk write operations and consistent syncing performance as experienced in previous versions.
  • Actual: Unusually high disk write activity (~20GB/day) and periodic, significant sync slowdowns.
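
For anyone who wants to check a figure like ~20GB/day on their own node, here is a rough sketch that measures the size of the data directory at two points in time and extrapolates to a daily rate. It is an illustration under assumptions, not the method used in this report: the function names are hypothetical, and net directory growth is not the same thing as bytes written, since database compaction can rewrite data without growing the directory.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files can vanish mid-walk while the database compacts.
                continue
    return total


def growth_gb_per_day(size_start, size_end, elapsed_seconds):
    """Extrapolate the growth between two measurements to GB per day."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    grown_gb = (size_end - size_start) / 1e9  # decimal gigabytes
    return grown_gb * 86400 / elapsed_seconds
```

For example, calling `dir_size_bytes` on the node's data directory twice, a few hours apart, and passing both results to `growth_gb_per_day` gives a number that can be compared with what other operators report.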

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
  • Is a spike necessary to map out how the issue should be approached?
@ronigk8io added the labels "type: bug" (Issues that need priority attention: something isn't working) and "status: waiting-triage" (This issue/PR has not yet been triaged by the team) on Jan 3, 2024
@ronigk8io changed the title from "High Disk Usage Issue: gaiad Writing ~20GB/Day to Data Directory" to "High Disk Usage and Sync Issues with gaiad v14.1.0" on Jan 3, 2024
@mmulji-ic
Contributor

mmulji-ic commented Jan 4, 2024

Thanks @ronigk8io for reporting. We'll check with our validator team to see if they're experiencing the same issues.

EDIT: The validator team came back and said that they have not seen any missing blocks for at least the past couple of weeks. We did have a syncing problem related to proposal vote counting. I'm not sure whether that issue is fixed, but it lies with the SDK version that we're using and predates v14 by a long way. Regarding storage, the team asked about the pruning level that you've set; they get ~6GB/day with the pruning interval at 100. Can you provide more details about your pruning settings and, if possible, when the sync slowdowns occurred? I can then check whether they were proposal related or whether other validators had the same issue at the same time.

@fmira21

fmira21 commented Jan 17, 2024

@mmulji-ic
I can confirm, sync speed is down on my nodes as well.
I use the Dockerised version of the node, all AWS, r5.xlarge.
Also, the node dropped the instance twice due to high disk usage, and it takes insanely long to sync it back.

@mmulji-ic
Contributor

Thanks @fmira21, I will follow up with the comet team on this one.

@MSalopek added the labels "other: decayed" (Stale issues that need follow up from commentators. Were closed for inactivity) and "general questions", and removed the labels "type: bug" and "status: waiting-triage", on May 17, 2024
@MSalopek
Contributor

v14.x is no longer active on mainnet.

The network migrated to v15.x, which uses comet v0.37.x and cosmos-sdk v0.47.x.
Earlier this week, the network upgraded to v16.x.

This issue may no longer be relevant and the discussion seems to have decayed.

cosmos-sdk v0.47.x and comet v0.37.x brought many improvements, and there have been no recent reports of high disk usage.

Please feel free to reopen if this happens again.

Projects
Status: 👍 F4: Assessment

4 participants