inconsistent JS tracer fails #23559

Closed · tanmaster opened this issue Sep 9, 2021 · 15 comments · Fixed by #23632
@tanmaster commented Sep 9, 2021

System information

Geth version: 1.10.6-stable, 1.10.9-unstable
OS & Version: Linux
Commit hash: 576681f29b895dd39e559b7ba17fcd89b42e4833, 90987db7334c1d10eb866ca550efedb66dea8a20

Expected behaviour

I am trying to trace the chain from block 0 up to the current head using a custom JS tracer. We want to extract data that would otherwise be difficult to obtain.

Actual behaviour

However, we are experiencing seemingly random aborts of the traces. At first I thought it was due to my JS code, but after further experimenting I found that even the noop_tracer.js can produce these random aborts. Here are some of the error logs from when the abort happens:

- WARN [09-09|15:54:50.297] Chain tracing failed                     start=0          end=12,300,000 transactions=1,127,999 elapsed=10m16.166119292s err="missing trie node fe7d23a257f7528f27cb8d821f9212cdb4ce1835eca729e16464be6423ad1ae2 (path 03)"
- WARN [09-09|16:01:39.373] Tracing failed                           hash=9fe770..825038 block=531,280 err="tracing failed: insufficient funds for gas * price + value: address 0xa6BDc31c431B8F97848c9D60cbbe9660dc450eB7 have 5159916608128893981 want 5165818380000000000"
- WARN [09-09|16:01:39.984] Chain tracing failed                     start=0          end=12,300,000 transactions=566,584   elapsed=4m59.667592415s  err="processing block 531280 failed: could not apply tx 6 [0x9fe77018a2ecc3a011c74dc95b040c86050f67aa84e4f3f82bfc5c9837825038]: insufficient funds for gas * price + value: address 0xa6BDc31c431B8F97848c9D60cbbe9660dc450eB7 have 5159916608128893981 want 5165818380000000000"
- WARN [09-09|16:06:47.806] Chain tracing failed                     start=0          end=12,300,000 transactions=305,093 elapsed=2m31.62350779s  err="can't create storage trie: missing trie node ba59fa34a843ee0ac546128f5f1e9f4c889d38bf060948ecca48a0412b7107c0 (path )"
- WARN [09-09|16:11:57.855] Chain tracing failed                     start=0          end=12,300,000 transactions=207,006 elapsed=1m51.909702218s err="missing trie node 7f0767d73909ad822e4e811db203e6e6781cffdd46d082644ab4262bd6fb7c72 (path 0d0008)"
- WARN [09-09|19:29:08.723] Chain tracing failed                     start=0          end=13,000,000 transactions=542,976 elapsed=4m18.446828689s err="processing block 519064 failed: could not apply tx 1 [0x0c7706562bab01109373b69db37530eb3acd1ae711b709a5395ca23c1f31bef3]: gas limit reached"

What's even more frustrating is that these happen inconsistently, and I can't reproduce the failures at the same transaction between different runs. For what it's worth, I could confirm this behaviour on two different machines running two different versions of geth. The logs also make me think the failures have different causes, which would make this even more annoying.

The furthest I was able to trace was up to block ~2M.

Steps to reproduce the behaviour (it eventually happens...)

Geth is started with:

geth \
--datadir='/mnt/7F050DA422DCAB55/.ethereum' \
--txlookuplimit=0 \
--verbosity=3 \
--vmdebug \
--maxpeers=0 

IPC socket tracer command (using nc -U geth.ipc):

{"id": 1, "method": "debug_subscribe", "params": ["traceChain", "0x0", "0xc65d40", {"tracer": "{step: function(log, db) { }, fault: function(log, db) { }, result: function(ctx, db) { return {}; }}", "timeout": "10000s"}]}
@s1na (Contributor) commented Sep 21, 2021

I was able to reproduce this. Funnily enough my error contains the same address:

WARN [09-21|15:13:41.060] Tracing failed                           hash=746e11..0532bc block=532,825 err="tracing failed: insufficient funds for gas * price + value: address 0xa6BDc31c431B8F97848c9D60cbbe9660dc450eB7 have 5011163958128893981 want 5014107070000000000"
WARN [09-21|15:13:41.352] Chain tracing failed                     start=0 end=13,000,000 transactions=568,482 elapsed=33m56.98872039s  err="processing block 532825 failed: could not apply tx 3 [0x746e11c653adc14484ef9ab3cd7ef47c926a0e13d88474d9e798ac6d660532bc]: insufficient funds for gas * price + value: address 0xa6BDc31c431B8F97848c9D60cbbe9660dc450eB7 have 5011163958128893981 want 5014107070000000000"

Edit: on a second run:

WARN [09-21|16:44:17.073] Chain tracing failed                     start=0 end=13,000,000 transactions=551,324 elapsed=32m48.451866988s err="commit aborted due to earlier error: getDeleteStateObject (22d7957d81497cc48c1020de201b07fc526a0c59) error: missing trie node 33591eee04145bdd97b9d824b81d4852772e2adc3e8790bdfa5dc71a276a14d0 (path 0d00)"

@holiman (Contributor) commented Sep 22, 2021

Hm, 5,014,107,078,128,893,981 is the true value of that account right then. The cost is indeed 5,014,107,070,000,000,000.

What's odd about this, IMO, is that if we had an underlying trie error, I'd expect the value-lookup to return 0 balance. Not a "nearly but not quite correct" value.

@evertonfraga (Member) commented Sep 22, 2021

The address 0xa6BDc31c431B8F97848c9D60cbbe9660dc450eB7 had mined several blocks prior to the transaction mentioned above. The closest block was mined only 9 blocks before it.

Is it possible the account balance wasn't fetched/updated correctly in the context of tracing?

@holiman (Contributor) commented Sep 22, 2021

 Is it possible the account balance wasn't fetched/updated correctly in the context of tracing?

Yes, I had the same idea, but it's not as straightforward as that. On the upside, I am now able to somewhat reliably reproduce it, so I'm throwing various debug code at it to narrow it down further.

What makes it a bit hard is that "reliably repro" still means it takes ~4 minutes of tracing for it to hit, and it doesn't happen in the same place each time -- even rerunning the tracing without restarting the node doesn't make it hit in the same place.

@evertonfraga (Member):
FWIW, I found that the OP's report also involves the same account producing a block just 10 blocks earlier.

  • address: 0xa6BDc31c431B8F97848c9D60cbbe9660dc450eB7
  • tx: 0x9fe77018a2ecc3a011c74dc95b040c86050f67aa84e4f3f82bfc5c9837825038 (block 531280)
  • mined block: https://etherscan.io/block/531270

@holiman (Contributor) commented Sep 24, 2021

I'm fairly certain that the root cause has been identified, and it's fixed by #23632. Please give it a test and report back if it seems to work.

@tanmaster (Author):
I'm fairly certain that the root cause has been identified, and it's fixed by #23632. Please give it a test and report back if it seems to work.

It sure looks like the fix is working. I let the script run overnight and had no fail so far, whereas before it would almost certainly fail after a couple of hours.

Thanks a lot for your work, everyone, it is very much appreciated 🙏

@holiman (Contributor) commented Sep 25, 2021

I let the script run overnight and had no fail so far, whereas before it would almost certainly fail after a couple of hours.

Great, thanks for reporting back!
Just FYI: with the way it currently works, if you trace long sequences it's guaranteed to go OOM eventually. The chain tracer never flushes to disk (it does in-memory garbage collection, though). Also, it never "resets" the base disk image, which is whatever state you started tracing from (so e.g. genesis).

So instead of tracing 0-13M blocks, it's better to trace e.g. 0-1M, 1M-2M, etc. We might fix something so it automatically clears out the in-memory data and starts from a new disk trie at certain intervals, but that's how it is right now.
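
As a minimal sketch of that chunked approach using the same noop tracer as above (block numbers hex-encoded: 0xf4240 = 1,000,000, 0x1e8480 = 2,000,000; the tracer string is elided here), each range gets its own subscription, and the next one is only started once the previous one has finished:

{"id": 1, "method": "debug_subscribe", "params": ["traceChain", "0x0", "0xf4240", {"tracer": "<noop tracer as above>", "timeout": "10000s"}]}
{"id": 2, "method": "debug_subscribe", "params": ["traceChain", "0xf4240", "0x1e8480", {"tracer": "<noop tracer as above>", "timeout": "10000s"}]}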

@tanmaster (Author) commented Sep 25, 2021

So instead of tracing 0-13M blocks, it's better to trace e.g. 0-1M, 1M-2M, etc. We might fix something so it automatically clears out the in-memory data and starts from a new disk trie at certain intervals, but that's how it is right now.

Thanks for the heads-up. That would actually benefit us even more, because once we're done with 0-13M we'd need to trace 13-14M at some point. However, we also want to do this without having to run a full archive node. Is that possible with your suggestion?

Currently when I try to continue tracing from a point where I previously had stopped, I get an error like this:

WARN [09-25|11:15:26.160] Chain tracing failed start=50000 end=100,000 transactions=0 elapsed=133.902753ms err="required historical state unavailable (reexec=128)"
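
(Side note: the reexec=128 in that error is the default number of blocks geth is willing to re-execute to regenerate a missing historical state. Assuming the geth version in use honours that field for traceChain, a larger limit can be passed in the trace config, at the cost of longer re-execution before tracing starts; a hypothetical example for blocks 50,000-100,000 (0xc350-0x186a0):)

{"id": 1, "method": "debug_subscribe", "params": ["traceChain", "0xc350", "0x186a0", {"tracer": "<noop tracer as above>", "timeout": "10000s", "reexec": 1000}]}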

@holiman (Contributor) commented Sep 25, 2021 via email

@holiman (Contributor) commented Sep 25, 2021 via email

@s1na (Contributor) commented Sep 28, 2021

Whenever it finds a disk-root, it will throw away the mem-trie and start fresh.

Just wanted to add to what @holiman said: you can use #23646 to find out for which blocks you have the state on disk. It's possible to start the chain tracer from those blocks. But if you did a fast/snap sync, then up until the first synced state there will be no state on disk. That means it's currently not possible to start the chain tracer from blocks in [1, firstSyncedBlock).

@tanmaster (Author) commented Sep 28, 2021

Just wanted to add to what @holiman said: you can use #23646 to find out for which blocks you have the state on disk. It's possible to start the chain tracer from those blocks. But if you did a fast/snap sync, then up until the first synced state there will be no state on disk. That means it's currently not possible to start the chain tracer from blocks in [1, firstSyncedBlock).

Thanks for the addition, that's very valuable information for us 🙏
So if the chain data was synchronized using the syncmode=full flag, there are some states that will permanently stay on my machine which I can use as anchors to start tracing? Is it defined somewhere how regularly the states are flushed? By what are these flushes triggered?

@holiman (Contributor) commented Sep 28, 2021

there are some states that will permanently stay on my machine which I can use as anchors to start tracing?

Yes

By what are these flushes triggered?

Either memory constraints, ~1h of EVM processing (which usually means every ~10-15K blocks), or a node restart.

@tanmaster (Author):
Either memory constraints, ~1h of EVM processing (which usually means every ~10-15K blocks), or a node restart.

That sounds exactly like what I need, thanks a lot everyone 😌🤩
