geth cli - block number incrementing keeps stalling #2570

Closed
ghost opened this issue May 14, 2016 · 11 comments

@ghost

ghost commented May 14, 2016

System information

Geth version: Geth/v1.4.3-stable/linux/go1.6.1
OS & Version: Ubuntu 16.04
Commit hash : n/a

Expected behaviour

Block number increments normally

Actual behaviour

Block number stalls/freezes continuously

Steps to reproduce the behaviour

Ubuntu 16.04 EC2 instance

sudo apt-get install go-ethereum
geth --maxpeers "300" console 2>> ~/geth.log

Backtrace

I have an Ubuntu 16.04 instance on Amazon EC2

I can do:

geth console 2>> ~/geth.log
instance: Geth/v1.4.3-stable/linux/go1.6.1
>

Everything looks fine:

> eth.blockNumber
546120

> net.peerCount
300

OK. Great. The blockNumber ticks up as expected.

Now, I'd like to be able to log out, go away and get on with things.

I am using screen

I do Ctrl+A followed by D. I can tail ~/geth.log and I can see the blockNumber incrementing.

I log out. However, when I tail the geth.log file several hours later, I find that syncing keeps getting stuck:

I0513 05:53:21.227568 eth/downloader/downloader.go:274] Synchronisation failed: no peers to keep download active

The Amazon firewall is wide open. I have ufw (Ubuntu's firewall) switched off.

I have no idea what's going on. I have spent hours at this.

I wonder whether Amazon is traffic shaping or blocking UDP on this port. I have the Amazon security group wide open: all IP addresses, all TCP and all UDP connections allowed...
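For anyone trying to work out how often this happens, here is a minimal sketch that picks the "Synchronisation failed" entries out of a log like ~/geth.log. It assumes every line has the same prefix layout as the one quoted above (level letter, MMDD, time, source file and line number, then the message). The names `parse_line` and `sync_failures` are made up for this example and are not part of geth.

```python
import re

# Prefix layout assumed from the single log line quoted in this issue:
# <level><MMDD> <HH:MM:SS.micros> <source file>:<line>] <message>
LINE_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<source>\S+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


def parse_line(text):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(text.strip())
    return match.groupdict() if match else None


def sync_failures(lines):
    """Return the parsed entries whose message starts with 'Synchronisation failed'."""
    failures = []
    for raw in lines:
        entry = parse_line(raw)
        if entry and entry["message"].startswith("Synchronisation failed"):
            failures.append(entry)
    return failures


if __name__ == "__main__":
    sample = [
        "I0513 05:53:21.227568 eth/downloader/downloader.go:274] "
        "Synchronisation failed: no peers to keep download active",
    ]
    for entry in sync_failures(sample):
        print(entry["month"], entry["day"], entry["time"], entry["message"])
```

To run it against a real log, pass an open file object instead of the sample list, since `sync_failures` accepts any iterable of lines.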

@ghost
Author

ghost commented May 17, 2016

I find the stalling is particularly bad on my 3G/4G network.

It doesn't even increment up slowly - if the bandwidth drops below a certain level, it just stalls completely!
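To put a number on "stalls completely", something like the sketch below could be run over block-number samples. It assumes you already have a list of (unix time in seconds, block number) pairs, for example noted down from eth.blockNumber at regular intervals. How the samples are collected is left out, and `find_stalls` is a hypothetical helper, not a geth feature.

```python
def find_stalls(samples, min_seconds):
    """Find periods during which the block number did not increase.

    samples: list of (unix_seconds, block_number) pairs, sorted by time.
    Returns (start, end) pairs, where start is the sample time at which the
    block number last advanced and end is the last sample time at which it
    was still unchanged. Only periods of at least min_seconds are reported.
    """
    stalls = []
    if not samples:
        return stalls

    last_advance_time, last_block = samples[0]
    last_seen = last_advance_time

    for timestamp, block in samples[1:]:
        if block > last_block:
            # Progress resumed: record the quiet period if it was long enough.
            if last_seen - last_advance_time >= min_seconds:
                stalls.append((last_advance_time, last_seen))
            last_advance_time, last_block = timestamp, block
        last_seen = timestamp

    # The log may end while still stalled.
    if last_seen - last_advance_time >= min_seconds:
        stalls.append((last_advance_time, last_seen))
    return stalls


if __name__ == "__main__":
    # Made-up samples taken once a minute.
    samples = [(0, 100), (60, 101), (120, 101), (180, 101), (240, 102)]
    print(find_stalls(samples, min_seconds=120))
```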

@taoeffect

Not sure but maybe related to #2569 ?

@ghost
Author

ghost commented May 17, 2016

Yes, I think so. Thanks for the reference.

Hopefully things will improve in future releases. It's doing OK with --fast option.

@taoeffect

I'm seeing this issue also with 1.4.4 on OS X. It's downloading, just very slowly, periodically showing Synchronisation failed: no peers to keep download active. Not sure what the deal is and I don't think I can use --fast since I'm not starting from scratch.

@taoeffect

Don't know if you saw this or if it helps, but check it out: https://blog.ethereum.org/2016/05/17/security-alert-geth-suffers-from-a-very-low-probable-dos-attack-vector-update-immediately/

I updated and I'm still occasionally seeing Synchronisation failed: no peers to keep download active, but not too often. In general there's mostly forward progress, albeit slow and occasionally CPU intensive, on this hot summer's day… I sit, synchronizing with the chain.

@taoeffect

For me, restarting the client seems to fix the problem fairly reliably (until the next time it starts to stall).

@ghost
Author

ghost commented May 20, 2016

@taoeffect It was also stalling again for me about 30 mins after a restart...

So...

I deleted the entire chaindata folder (it goes without saying: be very careful not to delete your keystore folder!)

I then restarted geth with the --fast option

I now have a sync'd blockchain

I think the newer releases are sorting this out. apt-get is currently on 1.4.3 and homebrew is similar. The updates are working their way through. I'm on stable branch on both Ubuntu and OSX/homebrew.

@obscuren
Contributor

I suggest you lower your peer count and leave it at the default. More peers does not mean better connectivity.

You can imagine your node like this: put it in a room with 300 other nodes, each of which is telling the same story, just at different intervals. Your node has to interpret every one of those 300 stories, and as you can imagine that takes a considerable amount of time and effort. At some point you'll be so busy following up on all the stories that you start to lag behind with the interpretation (because you're too slow to catch up on all 300), and so you must ask them to wait. Nodes don't like waiting, because being stalled by someone means they can't communicate further, and the longer you do this, the more likely you are to get dropped.

This might be a reason why you're seeing connectivity drops. Please remove the --maxpeers and let us know if that solved anything.

Additionally, 300 peers on a 3G network is a bad, bad idea. Your 3G network might also be an issue. Try running geth --vmodule=p2p=6,downloader=6 and upload the logs once it starts to stall again.
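The room analogy can be put into rough numbers with a toy queue model. This is only an illustration of the argument, not a description of geth's actual networking code: the function name, the one-message-per-peer rate and the processing capacity of 100 messages per tick are all made-up assumptions.

```python
def backlog_over_time(peers, msgs_per_peer, capacity, ticks):
    """Return the number of unprocessed messages after each tick.

    peers:         number of connected peers (hypothetical)
    msgs_per_peer: messages each peer sends per tick (hypothetical)
    capacity:      messages the node can process per tick (hypothetical)
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        incoming = peers * msgs_per_peer
        # Whatever cannot be processed this tick is carried over to the next.
        backlog = max(0, backlog + incoming - capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # With few peers the node keeps up; with 300 the backlog grows every tick.
    print("25 peers: ", backlog_over_time(25, 1, 100, 5))
    print("300 peers:", backlog_over_time(300, 1, 100, 5))
```

In this model, once incoming messages exceed capacity the backlog never shrinks, which corresponds to the point in the analogy where the node has to ask its peers to wait.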

@taoeffect

taoeffect commented May 23, 2016

FWIW I was not using the --maxpeers setting or 3G.

@macht1

macht1 commented Jun 3, 2016

Getting same problems here on Win 10. Works for about 5 minutes after entering "geth --rpc" then stalls.

@stale

stale bot commented Mar 5, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status:inactive label Mar 5, 2018
@stale stale bot closed this as completed Apr 16, 2018