debug write() with dump() #579
I need some help. I'm working with 16 DBs at a time and am still trying to find out what my problem is.
Can anyone shed some light on the const MinKeysPerPage in internal/common/page.go? Who knows how MinKeysPerPage and PageSize affect the write behavior? I have too many numbers to play with now. If you want to dive into it, here is a test app with lots of options. That said, I'm impressed with the write performance: when it writes normally it hits 100K+ tx/s with a 4-core limit and very little write bandwidth used to update the databases. I'm using the latest GitHub commits now, not a release; it happens with v1.3.7 too. Thanks for any reply in advance!
Lots of things to unpack; it sounds quite similar to #401 to a degree. Can you create a CPU profile with pprof for us please? Maybe that sheds a bit more light on your load spikes.
Creating one now, with these values: inserting 50M objects, midway through now, load ?? ...

2023/10/18 17:31:41 BoltSpeed: 83784.84 tx/s ( did=837846 in 10.0 sec ) totalTX=2818893 otx=16

Now the process is getting weird: CPU usage drops while load rises. Load 29 ...

2023/10/18 18:14:31 WARN boltBucketPutBatch [c|f2] Q=157/Qcap=256 wCBBS=256 forcing flush!

ZFS won't stop writing. I'll wait for it to complete (100M tx, searches/inserts 50/50) and post the CPU profile.
// edit: maybe it's just my fault, hitting bbolt too much xD
Here is what GPT says when asked to peek at the log from the profile: "Certainly, here's a summary of the functions that consumed the most CPU time, along with their line endings: [...] These functions are the top consumers of CPU time in your Go application, and optimizing them should lead to significant performance improvements."

The hot spots it pointed at:

go-while/nntp-history/L1CACHE.go:210 // getexpired: for hash, item := range ptr.cache {
go-while/nntp-history/L2CACHE.go:152 // getexpired: for offset, item := range ptr.cache {
go-while/nntp-history/L3CACHE.go:143 // getexpired: for key, item := range ptr.cache {

This wasn't very helpful for me 👯
OK, cool. All the bad map iterations are gone, and I found queue congestion which should be fixed, but the problem with the writes is still happening. I will pull it into the repo after some testing. I tested a lot of stuff these days and, as far as I can tell, the page size alone does not solve the problem. I did not touch the code in my repo; the code lines still match the CPU profile. Good night.
// edit: a little slowdown towards the end, but the test is now finished and has never been that fast 🥇 50M x 4 jobs = 50M searches, 50M inserts and 200M hits to cache. Not bad 👯 tree/L1prioQue/test
Wow, I can't sleep now. Rescanning the DB with the same values, dupe checking: 90k-100k tx/sec pure reads 👯
2023/10/19 07:12:50 1697692370252293163 split() pagesize='4096' nodes=2 n.inodes=6 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370252496266 splitTwo pagesize='4096' (n.inodes=6 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=6
2023/10/19 07:12:50 1697692370252502507 split() pagesize='4096' nodes=2 n.inodes=6 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370252702502 splitTwo pagesize='4096' (n.inodes=8 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=8
2023/10/19 07:12:50 1697692370252707171 split() pagesize='4096' nodes=2 n.inodes=8 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370252759513 splitTwo pagesize='4096' (n.inodes=6 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=6
2023/10/19 07:12:50 1697692370252764333 split() pagesize='4096' nodes=2 n.inodes=6 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370253169850 splitTwo pagesize='4096' (n.inodes=7 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=7
2023/10/19 07:12:50 1697692370253176809 split() pagesize='4096' nodes=2 n.inodes=7 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370253192470 splitTwo pagesize='4096' (n.inodes=5 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=5
2023/10/19 07:12:50 1697692370253196873 split() pagesize='4096' nodes=2 n.inodes=5 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370254264268 splitTwo pagesize='4096' (n.inodes=5 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=5
2023/10/19 07:12:50 1697692370254272712 split() pagesize='4096' nodes=2 n.inodes=5 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370256154835 splitTwo pagesize='4096' (n.inodes=6 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=6
2023/10/19 07:12:50 1697692370256163311 split() pagesize='4096' nodes=2 n.inodes=6 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370256188272 splitTwo pagesize='4096' (n.inodes=6 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=6
2023/10/19 07:12:50 1697692370256192864 split() pagesize='4096' nodes=2 n.inodes=6 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370256242848 splitTwo pagesize='4096' (n.inodes=7 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=7
2023/10/19 07:12:50 1697692370256249612 split() pagesize='4096' nodes=2 n.inodes=7 common.MinKeysPerPage=2
2023/10/19 07:12:50 splitIndex threshold=3686
2023/10/19 07:12:50 1697692370256670093 splitTwo pagesize='4096' (n.inodes=7 common.MinKeysPerPage=2*2 || n.sizeLessThan=true) threshold=3686 splitIndex=7

With the default 4K page size it constantly splits inodes, and I guess the heavy writes come from this? There is this if clause in node.go splitTwo(). Can anyone explain why these values were chosen?

if len(n.inodes) <= (common.MinKeysPerPage*2) || n.sizeLessThan(pageSize) {
	//log.Printf("%d !splitTwo pagesize='%d' (n.inodes=%d <= common.MinKeysPerPage=%d*2 || n.sizeLessThan=%t)", time.Now().UnixNano(), pageSize, len(n.inodes), common.MinKeysPerPage, n.sizeLessThan(pageSize))
	return n, nil
}

I placed this in node.go splitTwo() before the final return:

log.Printf("%d splitTwoB pagesize='%d' (n.inodes=%d common.MinKeysPerPage=%d*2 || n.sizeLessThan=%t) threshold=%d splitIndex=%d", time.Now().UnixNano(), pageSize, len(n.inodes), common.MinKeysPerPage, n.sizeLessThan(pageSize), threshold, splitIndex)
return n, next
}

And modded it to this for now:

// splitTwo breaks up a node into two smaller nodes, if appropriate.
// This should only be called from the split() function.
func (n *node) splitTwo(pageSize uintptr) (*node, *node) {
	// Ignore the split if the page doesn't have at least enough nodes for
	// two pages or if the nodes can fit in a single page.
	if len(n.inodes) <= (common.MinKeysPerPage*2) || n.sizeLessThan(pageSize) {
		//log.Printf("%d !splitTwo pagesize='%d' (n.inodes=%d <= common.MinKeysPerPage=%d*2 || n.sizeLessThan=%t)", time.Now().UnixNano(), pageSize, len(n.inodes), common.MinKeysPerPage, n.sizeLessThan(pageSize))
		return n, nil
	}
	log.Printf("%d splitTwoA pagesize='%d' (n.inodes=%d common.MinKeysPerPage=%d*2 || n.sizeLessThan=%t)", time.Now().UnixNano(), pageSize, len(n.inodes), common.MinKeysPerPage, n.sizeLessThan(pageSize))
	...
	...
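For what it's worth, the threshold values in these logs line up with simple fill-percent math: the split threshold is the page size scaled by the bucket's fill percent, and the split index is the first element that would push the accumulated page size past it. A small sketch of that arithmetic (my own illustration matching the numbers above, not bbolt's actual code; the real splitIndex also enforces MinKeysPerPage on each side):

```go
package main

import "fmt"

// splitThreshold mirrors the arithmetic visible in the logs above:
// the split threshold is the page size scaled by the fill percent.
func splitThreshold(pageSize int, fillPercent float64) int {
	return int(float64(pageSize) * fillPercent)
}

// splitIndexSketch walks element sizes (header + key + value per element,
// the "elsize" column above) starting from the page header size and returns
// the index where the accumulated size would first exceed the threshold.
func splitIndexSketch(pageHeaderSize int, elsizes []int, threshold int) (index, sz int) {
	sz = pageHeaderSize
	for i, el := range elsizes {
		if i >= 1 && sz+el > threshold {
			return i, sz
		}
		sz += el
	}
	return len(elsizes), sz
}

func main() {
	fmt.Println(splitThreshold(4096, 0.9))  // 3686, as in the log run above
	fmt.Println(splitThreshold(4096, 0.5))  // 2048, the default FillPercent
	fmt.Println(splitThreshold(32768, 0.5)) // 16384, as in the 32 KB run
}
```

So a threshold of 3686 on a 4096-byte page corresponds to a fill percent of 0.9, and 2048 to the 0.5 default.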
Thanks for all the information.
It certainly seems the splitting causes a lot of GC activity. The calls to ...
I can't test on ext4 at the moment; no more hardware. I could set up a VM, but it would still be ZFS on the host.
I dumped my debug changes into this tree. With the default 4 KB page size and a fresh DB I can see this:

2023/10/19 16:11:05 1697724665147285003 splitTwo determine pagesize='4096' (n.inodes=50 common.MinKeysPerPage=2*2 || n.sizeLessThan=false)
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=0 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=89 elsize=107 sz=16
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=1 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=123
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=2 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=202
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=3 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=281
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=4 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=360
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=5 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=439
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=6 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=518
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=7 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=597
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=8 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=676
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=9 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=755
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=10 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=834
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=11 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=913
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=12 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=60 elsize=78 sz=992
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=13 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=60 elsize=78 sz=1070
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=14 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1148
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=15 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1227
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=16 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1306
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=17 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=89 elsize=107 sz=1385
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=18 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1492
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=19 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=89 elsize=107 sz=1571
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=20 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1678
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=21 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1757
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=22 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1836
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=23 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1915
2023/10/19 16:11:05 1697724665147285003 splitIndex forloop index=24 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1994
2023/10/19 16:11:05 1697724665147285003 splitIndex break @ index=24 n.inodes=50 n.pageElementSize=16 keyLen=2 valLen=61 elsize=79 sz=1994 threshold=2048
...
...
...
...
2023/10/19 16:16:35 1697724995627077478 splitTwo determine pagesize='4096' (n.inodes=16 common.MinKeysPerPage=2*2 || n.sizeLessThan=false)
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=0 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=124 elsize=142 sz=16
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=1 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=215 elsize=233 sz=158
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=2 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=154 elsize=172 sz=391
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=3 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=401 elsize=419 sz=563
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=4 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=185 elsize=203 sz=982
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=5 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=276 elsize=294 sz=1185
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=6 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=217 elsize=235 sz=1479
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=7 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=156 elsize=174 sz=1714
2023/10/19 16:16:35 1697724995627077478 splitIndex forloop index=8 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=246 elsize=264 sz=1888
2023/10/19 16:16:35 1697724995627077478 splitIndex break @ index=8 n.inodes=16 n.pageElementSize=16 keyLen=2 valLen=246 elsize=264 sz=1888 threshold=2048
...
...
...
2023/10/19 16:24:00 1697725440019337071 splitTwo determine pagesize='4096' (n.inodes=6 common.MinKeysPerPage=2*2 || n.sizeLessThan=false)
2023/10/19 16:24:00 1697725440019337071 splitIndex forloop index=0 n.inodes=6 n.pageElementSize=16 keyLen=2 valLen=554 elsize=572 sz=16
2023/10/19 16:24:00 1697725440019337071 splitIndex forloop index=1 n.inodes=6 n.pageElementSize=16 keyLen=2 valLen=649 elsize=667 sz=588
2023/10/19 16:24:00 1697725440019337071 splitIndex forloop index=2 n.inodes=6 n.pageElementSize=16 keyLen=2 valLen=618 elsize=636 sz=1255
2023/10/19 16:24:00 1697725440019337071 splitIndex forloop index=3 n.inodes=6 n.pageElementSize=16 keyLen=2 valLen=894 elsize=912 sz=1891
2023/10/19 16:24:00 1697725440019337071 splitIndex break @ index=3 n.inodes=6 n.pageElementSize=16 keyLen=2 valLen=894 elsize=912 sz=1891 threshold=2048

Over time valLen= grows and grows towards the threshold, and the splitIndex gets smaller and smaller, reaching single digits and falling as low as MinKeysPerPage. To be honest, I have no idea what I'm doing. Maybe this is just how B-trees normally work and grow, but I don't get it xD I will fill one with 50M now, try to compact it afterwards, and then fill another 50M into it to see what happens.
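The shrinking split index in these runs follows directly from the element sizes: the index is roughly how many elements fit under the threshold, so as valLen grows, fewer elements fit per page and splits happen after fewer inserts. A back-of-the-envelope check (my own arithmetic, not bbolt code), using the threshold=2048 and pageElementSize=16 values from the logs:

```go
package main

import "fmt"

// elemsPerPage estimates how many elements fit under the split threshold:
// each leaf element costs pageElementSize + keyLen + valLen bytes.
func elemsPerPage(threshold, pageElementSize, keyLen, valLen int) int {
	return threshold / (pageElementSize + keyLen + valLen)
}

func main() {
	// threshold=2048 (4 KiB page at 0.5 fill), pageElementSize=16, keyLen=2
	fmt.Println(elemsPerPage(2048, 16, 2, 61))  // → 25 (matches the break @ index=24 run)
	fmt.Println(elemsPerPage(2048, 16, 2, 246)) // → 7  (close to the break @ index=8 run)
	fmt.Println(elemsPerPage(2048, 16, 2, 894)) // → 2  (close to the break @ index=3 run)
}
```

So with ~900-byte values only two or three elements fit per 4 KiB page, which is why the split index bottoms out near MinKeysPerPage.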
With a 32 KB page size:

2023/10/19 16:27:28 1697725648796881482 splitTwo determine pagesize='32768' (n.inodes=117 common.MinKeysPerPage=2*2 || n.sizeLessThan=false)
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=0 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=217 elsize=235 sz=16
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=1 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=186 elsize=204 sz=251
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=2 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=217 elsize=235 sz=455
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=3 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=308 elsize=326 sz=690
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=4 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=279 elsize=297 sz=1016
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=5 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=218 elsize=236 sz=1313
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=6 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=428 elsize=446 sz=1549
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=7 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=309 elsize=327 sz=1995
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=8 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=186 elsize=204 sz=2322
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=9 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=335 elsize=353 sz=2526
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=10 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=368 elsize=386 sz=2879
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=11 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=155 elsize=173 sz=3265
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=12 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=186 elsize=204 sz=3438
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=13 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=306 elsize=324 sz=3642
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=14 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=245 elsize=263 sz=3966
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=15 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=249 elsize=267 sz=4229
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=16 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=308 elsize=326 sz=4496
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=17 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=183 elsize=201 sz=4822
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=18 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=366 elsize=384 sz=5023
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=19 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=245 elsize=263 sz=5407
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=20 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=155 elsize=173 sz=5670
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=21 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=308 elsize=326 sz=5843
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=22 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=217 elsize=235 sz=6169
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=23 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=248 elsize=266 sz=6404
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=24 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=247 elsize=265 sz=6670
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=25 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=305 elsize=323 sz=6935
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=26 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=430 elsize=448 sz=7258
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=27 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=247 elsize=265 sz=7706
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=28 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=215 elsize=233 sz=7971
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=29 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=125 elsize=143 sz=8204
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=30 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=216 elsize=234 sz=8347
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=31 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=278 elsize=296 sz=8581
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=32 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=154 elsize=172 sz=8877
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=33 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=339 elsize=357 sz=9049
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=34 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=277 elsize=295 sz=9406
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=35 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=306 elsize=324 sz=9701
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=36 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=370 elsize=388 sz=10025
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=37 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=124 elsize=142 sz=10413
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=38 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=245 elsize=263 sz=10555
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=39 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=464 elsize=482 sz=10818
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=40 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=248 elsize=266 sz=11300
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=41 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=246 elsize=264 sz=11566
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=42 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=308 elsize=326 sz=11830
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=43 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=92 elsize=110 sz=12156
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=44 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=186 elsize=204 sz=12266
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=45 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=246 elsize=264 sz=12470
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=46 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=215 elsize=233 sz=12734
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=47 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=428 elsize=446 sz=12967
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=48 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=276 elsize=294 sz=13413
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=49 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=123 elsize=141 sz=13707
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=50 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=430 elsize=448 sz=13848
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=51 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=249 elsize=267 sz=14296
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=52 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=154 elsize=172 sz=14563
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=53 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=308 elsize=326 sz=14735
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=54 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=277 elsize=295 sz=15061
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=55 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=306 elsize=324 sz=15356
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=56 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=217 elsize=235 sz=15680
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=57 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=341 elsize=359 sz=15915
2023/10/19 16:27:28 1697725648796881482 splitIndex forloop index=58 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=215 elsize=233 sz=16274
2023/10/19 16:27:28 1697725648796881482 splitIndex break @ index=58 n.inodes=117 n.pageElementSize=16 keyLen=2 valLen=215 elsize=233 sz=16274 threshold=16384

I see far fewer splits, almost none up to 15M inserts (30M tx including searches).
// edit: at 40M total tx the splitting war stopped and write bandwidth is low again, almost nonexistent. Normal ZFS flush frequency.
Never give up! Now with everything in bbolt at defaults and a 4 KB page size. I believe that since pages need to be split when they reach the threshold, and we feed at a constant pace, pages get split in two, which means at some point in the future we have to split double the number of pages once they fill up again. And repeat. Is that somehow exponential?

key_add=49999923 key_app=77 fseeks=75
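I don't think the growth is exponential: the page count does double per "generation", but each generation also takes twice as many inserts to refill, so the number of splits stays roughly linear in the number of keys inserted, about one split per half-page-worth of keys. A quick sanity check of that reasoning (my own simplified model: uniform keys, leaves only, pages ending up about half full after a split):

```go
package main

import "fmt"

// estimateSplits models leaf splits under uniform inserts: every split
// creates one new page, and after a split pages are about half full, so
// the number of splits is ~ inserts / (keysPerPage/2), linear in inserts.
func estimateSplits(inserts, keysPerPage int) int {
	return inserts / (keysPerPage / 2)
}

func main() {
	// ~62 keys per 4 KiB page at a 2048-byte threshold with the ~33-byte
	// elements seen in the later logs (an assumption for illustration).
	keysPerPage := 62
	fmt.Println(estimateSplits(25_000_000, keysPerPage)) // → 806451
	fmt.Println(estimateSplits(50_000_000, keysPerPage)) // → 1612903
	// Doubling the inserts doubles the splits: linear, not exponential.
}
```

What can feel exponential is the burstiness: sibling pages created in the same split tend to fill up and split again around the same time, so splits arrive in waves even though their total count grows linearly.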
I've never looked at iotop before, but that's burning 🤣

Total DISK WRITE: 580.49 M/s
When you don't know what your own vars can do 👯 While writing this there is a split happening (I'm testing a 128K page size) at 35M tx in, but ZFS stayed cool; I see it flush for a few hundred milliseconds only 😮. Then it splits again and you see a load of ZFS write bandwidth, but everything stays in the normal range.

2023/10/19 19:50:18 BoltSpeed: 95592.68 tx/s ( did=955918 in 10.0 sec ) totalTX=18930332 otx=16
2023/10/19 19:50:28 WatchBolt: (inserted 45441/s) (+454419 inserted in 10s)
2023/10/19 19:50:28 WatchBolt: (batchins 391/s) (+3913 batchins in 10s) medBS=~109
2023/10/19 19:50:28 WatchBolt: (searches 46400/s) (+464000 searches in 10s)
2023/10/19 19:50:28 L1: [fex=0/set:9930119] [get=0/mux=9930119] [del:9439766/bat:9447857] [g/s:0/0] cached:490353 (~30647/char)
2023/10/19 19:50:28 L2: [fex=2/set:9930119] [get=0/mis=0] [del:9439990/bat:9702131] [g/s:0/0] cached:490133 (~30633/char)
2023/10/19 19:50:28 L3: [fex=0/set:9930119] [get=0/mis=0] [del:9439765/bat:9447856] [g/s:0/0] cached:490353 (~30647/char)
2023/10/19 19:50:28 BoltSpeed: 92928.93 tx/s ( did=930270 in 10.0 sec ) totalTX=19860602 otx=16

2023/10/19 19:52:50 1697737970250138442 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970256381548 splitIndex break @ index=1995 n.inodes=4061 n.pageElementSize=16 keyLen=8 valLen=8 elsize=32 sz=65513 threshold=65536
2023/10/19 19:52:50 1697737970256381548 splitTwo newParent pagesize='131072' (n.inodes=4061 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970256381548 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970275315118 splitIndex break @ index=1995 n.inodes=3998 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65534 threshold=65536
2023/10/19 19:52:50 1697737970275315118 splitTwo newParent pagesize='131072' (n.inodes=3998 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970275315118 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970276073188 splitIndex break @ index=1994 n.inodes=4020 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65513 threshold=65536
2023/10/19 19:52:50 1697737970276073188 splitTwo newParent pagesize='131072' (n.inodes=4020 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1994
2023/10/19 19:52:50 1697737970276073188 splitTwo returned@ pagesize='131072' n.inodes=1994/2*2 threshold=65536 splitIndex=1994
2023/10/19 19:52:50 1697737970277572230 splitIndex break @ index=1995 n.inodes=4035 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65526 threshold=65536
2023/10/19 19:52:50 1697737970277572230 splitTwo newParent pagesize='131072' (n.inodes=4035 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970277572230 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970321967503 splitIndex break @ index=1995 n.inodes=4017 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65524 threshold=65536
2023/10/19 19:52:50 1697737970321967503 splitTwo newParent pagesize='131072' (n.inodes=4017 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970321967503 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970322768950 splitIndex break @ index=1995 n.inodes=4063 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65521 threshold=65536
2023/10/19 19:52:50 1697737970322768950 splitTwo newParent pagesize='131072' (n.inodes=4063 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970322768950 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970336452985 splitIndex break @ index=1995 n.inodes=4042 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65520 threshold=65536
2023/10/19 19:52:50 1697737970336452985 splitTwo newParent pagesize='131072' (n.inodes=4042 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970336452985 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970362805778 splitIndex break @ index=1996 n.inodes=3998 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65521 threshold=65536
2023/10/19 19:52:50 1697737970362805778 splitTwo newParent pagesize='131072' (n.inodes=3998 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1996
2023/10/19 19:52:50 1697737970362805778 splitTwo returned@ pagesize='131072' n.inodes=1996/2*2 threshold=65536 splitIndex=1996
2023/10/19 19:52:50 1697737970363002594 splitIndex break @ index=1996 n.inodes=4024 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65519 threshold=65536
2023/10/19 19:52:50 1697737970363002594 splitTwo newParent pagesize='131072' (n.inodes=4024 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1996
2023/10/19 19:52:50 1697737970363002594 splitTwo returned@ pagesize='131072' n.inodes=1996/2*2 threshold=65536 splitIndex=1996
2023/10/19 19:52:50 1697737970366462978 splitIndex break @ index=1995 n.inodes=4045 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65536 threshold=65536
2023/10/19 19:52:50 1697737970366462978 splitTwo newParent pagesize='131072' (n.inodes=4045 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970366462978 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970375120406 splitIndex break @ index=1995 n.inodes=3995 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65517 threshold=65536
2023/10/19 19:52:50 1697737970375120406 splitTwo newParent pagesize='131072' (n.inodes=3995 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970375120406 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970377918820 splitIndex break @ index=1995 n.inodes=4036 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65509 threshold=65536
2023/10/19 19:52:50 1697737970377918820 splitTwo newParent pagesize='131072' (n.inodes=4036 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970377918820 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970379005367 splitIndex break @ index=1995 n.inodes=4039 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65523 threshold=65536
2023/10/19 19:52:50 1697737970379005367 splitTwo newParent pagesize='131072' (n.inodes=4039 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970379005367 splitTwo returned@ pagesize='131072' n.inodes=1995/2*2 threshold=65536 splitIndex=1995
2023/10/19 19:52:50 1697737970380971296 splitIndex break @ index=1995 n.inodes=4023 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=65505 threshold=65536
2023/10/19 19:52:50 1697737970380971296 splitTwo newParent pagesize='131072' (n.inodes=4023 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=65536 splitIndex=1995
...
...
Load average: 8.69 8.68 9.91

Oops. Houston, we have too many variables. The rescan runs at 90k read tx/sec, not bad, near 💯k ... and memory stays stable at 5.1G.

2023/10/19 20:23:42 done=200000000 (took 585 seconds) (closewait 3 seconds)
Alloc: 2006 MiB, TotalAlloc: 86054 MiB, Sys: 2906 MiB, NumGC: 53
du -hs hashdb/50M/
1,2G hashdb/50M/
zfs recordsize is at 16K
nntp-history/hashdb compressratio 2.06x
zfs list
NAME USED
tank0/nntp-history/hashdb 1.16G
hashdb/50M$ du -hs *
74M history.dat.hash.0
72M history.dat.hash.1
74M history.dat.hash.2
75M history.dat.hash.3
76M history.dat.hash.4
74M history.dat.hash.5
76M history.dat.hash.6
77M history.dat.hash.7
77M history.dat.hash.8
76M history.dat.hash.9
73M history.dat.hash.a
76M history.dat.hash.b
73M history.dat.hash.c
73M history.dat.hash.d
74M history.dat.hash.e
73M history.dat.hash.f
hashdb/50M$ du -b *
201457664 history.dat.hash.0
197656576 history.dat.hash.1
202375168 history.dat.hash.2
207224832 history.dat.hash.3
208011264 history.dat.hash.4
203816960 history.dat.hash.5
208273408 history.dat.hash.6
213254144 history.dat.hash.7
210632704 history.dat.hash.8
207355904 history.dat.hash.9
200146944 history.dat.hash.a
210763776 history.dat.hash.b
200409088 history.dat.hash.c
200015872 history.dat.hash.d
203423744 history.dat.hash.e
204210176 history.dat.hash.f

2023/10/19 21:06:25 1697742385316289224 splitTwo newParent pagesize='262144' (n.inodes=7993 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=131072 splitIndex=3981
2023/10/19 21:06:25 1697742385316289224 splitTwo returned@ pagesize='262144' n.inodes=3981/2*2 threshold=131072 splitIndex=3981
2023/10/19 21:06:25 1697742385383028640 splitIndex break @ index=3981 n.inodes=7980 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=131069 threshold=131072
2023/10/19 21:06:25 1697742385383028640 splitTwo newParent pagesize='262144' (n.inodes=7980 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=131072 splitIndex=3981
2023/10/19 21:06:25 1697742385383028640 splitTwo returned@ pagesize='262144' n.inodes=3981/2*2 threshold=131072 splitIndex=3981
2023/10/19 21:06:25 RUN test p=2 nntp-history added=7977214 dupes=0 cLock=12133921 addretry=0 retry=0 adddupes=0 cdupes=0 cretry1=11888865 cretry2=0 32000000/50000000
2023/10/19 21:06:25 RUN test p=1 nntp-history added=8014512 dupes=0 cLock=12140183 addretry=0 retry=0 adddupes=0 cdupes=0 cretry1=11845305 cretry2=0 32000000/50000000
2023/10/19 21:06:25 RUN test p=4 nntp-history added=7993694 dupes=0 cLock=12131730 addretry=0 retry=0 adddupes=0 cdupes=0 cretry1=11874576 cretry2=0 32000000/50000000
2023/10/19 21:06:25 RUN test p=3 nntp-history added=8014580 dupes=0 cLock=12139764 addretry=0 retry=0 adddupes=0 cdupes=0 cretry1=11845656 cretry2=0 32000000/50000000
2023/10/19 21:06:26 1697742386057204104 splitIndex break @ index=3981 n.inodes=7981 n.pageElementSize=16 keyLen=8 valLen=9 elsize=33 sz=131040 threshold=131072
2023/10/19 21:06:26 1697742386057204104 splitTwo newParent pagesize='262144' (n.inodes=7981 common.MinKeysPerPage=2*2 || n.sizeLessThan=false) threshold=131072 splitIndex=3981
2023/10/19 21:06:26 1697742386057204104 splitTwo returned@ pagesize='262144' n.inodes=3981/2*2 threshold=131072 splitIndex=3981
...

when it splits pages you can see more writes happening; otherwise it's now completely silent. |
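The threshold values in these logs (threshold = pagesize * 0.5) can be reproduced with a small standalone model of bbolt's split math. The function names below are my own, and the uniform element size ignores the page header, so the exact sz/splitIndex values in the logs above won't reproduce bit-for-bit:

```go
package main

import "fmt"

// Simplified model of bbolt's page-split math as seen in the logs above:
// the split threshold is pageSize * FillPercent (0.5 by default), and the
// node is cut at the first index whose running payload would cross it.
// splitThreshold/splitIndex are my own names; real nodes also count a
// page header, so the exact sz values differ slightly from the logs.
func splitThreshold(pageSize int, fillPercent float64) int {
	return int(float64(pageSize) * fillPercent)
}

func splitIndex(nInodes, elSize, threshold int) (index, sz int) {
	const minKeysPerPage = 2
	for i := 0; i < nInodes-minKeysPerPage; i++ {
		if i >= minKeysPerPage && sz+elSize > threshold {
			return i, sz
		}
		sz += elSize
	}
	return nInodes - minKeysPerPage, sz
}

func main() {
	// elsize=33 as in the logs: pageElementSize=16 + keyLen=8 + valLen=9
	th := splitThreshold(131072, 0.5)
	idx, sz := splitIndex(4036, 33, th)
	fmt.Printf("threshold=%d splitIndex=%d sz=%d\n", th, idx, sz)
}
```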
this is very interesting now. i tuned some numbers in the app and bbolt runs smoothly 🤣

for {
	delay := j * int(BatchFlushEvery) / len(ROOTBUCKETS)
	j++
	go bootworker(delay)
}
/*
	j is the worker id; it should start with 1, not 0, or the math fails.
	BatchFlushEvery is 15000 ms here.
	len(ROOTBUCKETS) is the total number of workers.
	i named it ROOTBUCKETS because we create N root buckets
	and keep the chars that make up the root buckets [0-9a-f][0-9a-f][0-9a-f]... in this slice,
	so we can iterate over one slice instead of three for-loops to regenerate the three depth levels where needed.
	you'd need a struct{} channel sized to the number of workers
	and wait until all booted workers have pushed their struct{}{}
	into this channel before starting the rest of your app.
	if anybody can come up with a better idea, i'd be glad to read it :)
*/

256 batch-queue workers per db, all booted at the same time, all hit the bboltdbs at almost the same time. boltdb now runs with MaxBatchDelay=100ms and MaxBatchSize=16384. zfs write bandwidth is near 0. conclusion: stable inserts peaking at 120k tx/sec (50/50 search/insert) now! the nvmes don't help much for read performance as there are no reads from zfs, it's all RAM... how to get it faster? 🍡 |
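The boot pattern described in that comment can be sketched roughly as follows. bootDelayMs, the small worker count, and the compressed sleep are my own stand-ins for the real bootworker code:

```go
package main

import (
	"fmt"
	"time"
)

// Sketch of the staggered worker boot described above: each worker's
// first flush is delayed by j*BatchFlushEvery/workers, and a struct{}
// channel sized to the worker count gates the rest of the app until
// every worker has reported in. Names here are assumptions taken from
// the comment, not bbolt API.
const BatchFlushEvery = 15000 // ms, as in the comment above

func bootDelayMs(j, workers int) int {
	return j * BatchFlushEvery / workers
}

func main() {
	workers := 4 // 256 in the real setup; small here for the demo
	ready := make(chan struct{}, workers)
	for j := 1; j <= workers; j++ { // j starts at 1, or the math fails
		go func(delay int) {
			// compressed to microseconds so the demo finishes quickly
			time.Sleep(time.Duration(delay) * time.Microsecond)
			ready <- struct{}{}
		}(bootDelayMs(j, workers))
	}
	for i := 0; i < workers; i++ {
		<-ready // block until every worker has booted
	}
	fmt.Println("all workers booted")
}
```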
nested buckets are back in the game and working too, no issues. let's go to the moon; new target: 1B keys. it handles 100M laughingly easily. 256 workers, one for every root bucket, each with its own queue. |
the most interesting point to look at while debugging is the splitTwo() function in bbolt's node.go:

log.Printf("%d splitTwo returned@ pagesize='%d' n.inodes=%d/%d*2 threshold=%d splitIndex=%d fillPercent=%f", now, pageSize, len(n.inodes), common.MinKeysPerPage, threshold, splitIndex, fillPercent)
return

speed is really good 👯

2023/10/21 22:15:39 WatchBolt: (inserted 136566/s) (+1365661 inserted in 10s)
2023/10/21 22:15:39 WatchBolt: (batchins 402/s) (+4027 batchins in 10s) medBS=~350
2023/10/21 22:15:39 WatchBolt: (searches 76800/s) (+768000 searches in 10s) |
Glad you've got it to work fast 💪 |
hi! i'm debugging the hell out of bbolt and maybe i've found something in db.go's Batch(), in this line:

if (db.batch == nil) || (db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
	// There is no existing batch, or the existing batch is full; start a new one.
	db.batch = &batch{
		db: db,
	}
	db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger)
}

with a heavy insert load like 256+ goroutines doing db.Batch, and with db.MaxBatchSize set to something low like 3 (so that it will most likely trigger before the delayed timer) and MaxBatchDelay higher than the default, like 1000 ms or worse 30 seconds, we always create a new db.batch and arm its trigger with AfterFunc. there can be an undefined number of open/waiting timers? GPT said about the timer status values:
* 0x0 (0): The timer is idle or inactive.
* 0x1 (1): The timer is active and waiting to fire.
* 0x2 (2): The timer has been stopped.
* 0x3 (3): The timer has fired. now i think when the hardware hits write limits of the disks, things go nuts. when MaxBatchSize triggered in L930, we hit this condition many times in line if b.db.batch == b {
//log.Printf("INFO db.go: (121a) run() b.db.batch==b\n\n b.db.batch='%#v'\n\n b='%#v'\n\n", b.db.batch, b)
b.db.batch = nil
} else {
// TODO DEBUG THIS
log.Printf("WARN db.go: (121b) run() b.db.batch != b ???\n\n b.db.batch='%#v'\n\n b='%#v'\n\n", b.db.batch, b)
//os.Exit(121)
} if only change the condition in the first quote to this: if db.batch == nil { to only create a new batch if there is no open one, i can see db.batch.calls number rising higher than batchSize but the debug 'else' in "if b.db.batch == b {" never hits. still not sure, maybe this is intended but i believe, (can be wrong), these infinite writes under some circumstances could be related to this? replace the Batch with this very verbose debug version and see how it exits with these loglines 2023/10/24 14:18:55 1698149935390088386 db.batch=nil creates new
2023/10/24 14:18:55 1698149935390099527 overwrites? check batch.calls=3
2023/10/24 14:18:55 info db.go: trigger() b.date=1698149935390088386 age=(0 mils) b.calls=3
2023/10/24 14:18:55 1698149935390099527 overwrites calls? (db.batch.calls=0) this is zero now but was (3) before?
2023/10/24 14:18:55 1698149935390099527 ERROR? overwrites=true after append to: db.batch.calls
len_initial=3 len_overwrites=0 before=0 after=1

flip 'const BUGFIX bool = true' and see that it does not exit ❔

type batch struct {
db *DB
timer *time.Timer
start sync.Once
calls []call
date int64 // debugs age in trigger
}
const BUGFIX bool = false
func (db *DB) Batch(fn func(*Tx) error) error {
errCh := make(chan error, 1)
trace := time.Now().UnixNano()
overwrites := false
// ORIGINAL WITH DEBUGS ADDED
len_initial := -1
len_overwrites := -1
len_before_append := -1
len_after_append := -1
db.batchMu.Lock()
if (db.batch == nil) || (!BUGFIX && db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
// There is no existing batch, or the existing batch is full; start a new one.
if db.batch == nil {
log.Printf("%d db.batch=nil creates new", trace)
} else if len(db.batch.calls) > 0 {
len_initial = len(db.batch.calls)
log.Printf("%d overwrites? check batch.calls=%d", trace, len_initial)
overwrites = true
}
db.batch = &batch{
db: db,
date: trace, // debugs age in trigger
}
db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger)
//log.Printf("Launch timer db.batch.timer='%#v'", db.batch.timer)
/*
* Launch timer db.MaxBatchDelay=10000000000
* db.batch.timer='&time.Timer{
* C:(<-chan time.Time)(nil),
* r:time.runtimeTimer{
* pp:0xc00002a000, when:1448181161246213, period:0,
* f:(func(interface {}, uintptr))(0x4abca0),
* arg:(func())(0x4fd420), seq:0x0, nextwhen:0,
* status:0x1
* }}'
*/
if overwrites {
len_overwrites = len(db.batch.calls)
log.Printf("%d overwrites calls? (db.batch.calls=%d) this is zero now but was (%d) before?", trace, len_overwrites, len_initial)
//time.Sleep(time.Second*5)
}
}
len_before_append = len(db.batch.calls)
if overwrites {
//log.Printf("%d overwrites before append to: db.batch.calls=%d", trace, len_before)
}
db.batch.calls = append(db.batch.calls, call{fn: fn, err: errCh})
len_after_append = len(db.batch.calls)
if overwrites && len_initial != len_before_append {
log.Printf("%d ERROR? overwrites=%t after append to: db.batch.calls len_initial=%d len_overwrites=%d before=%d after=%d", trace, overwrites, len_initial, len_overwrites, len_before_append, len_after_append)
os.Exit(66)
}
if len(db.batch.calls) >= db.MaxBatchSize {
// wake up batch, it's ready to run
// DEBUG
//if len(db.batch.calls) > db.MaxBatchSize {
//log.Printf("db.go: wakeup batch.calls=%d / db.MaxBatchSize=%d => trigger", len(db.batch.calls), db.MaxBatchSize)
//}
go db.batch.trigger()
} else {
//log.Printf("db.go: nowakeup batch.calls=%d / db.MaxBatchSize=%d db.MaxBatchDelay=%d", len(db.batch.calls), db.MaxBatchSize, db.MaxBatchDelay)
}
db.batchMu.Unlock()
err := <-errCh
if err == trySolo {
err = db.Update(fn)
}
return err
}
// trigger runs the batch if it hasn't already been run.
func (b *batch) trigger() {
kill := false
if b == nil {
log.Printf("WARN trigger() b=nil")
kill = true
} else
if b.db == nil {
log.Printf("WARN trigger() b.db=nil")
kill = true
} else
if b.db.batch == nil {
log.Printf("WARN trigger() b.db.batch=nil")
//kill = true
} else
if b.db.batch.calls == nil {
log.Printf("WARN trigger() b.db.batch.calls=nil")
//kill = true
} else
if len(b.db.batch.calls) == 0 {
log.Printf("WARN trigger() b.db.batch.calls empty")
//kill = true
}
	if kill {
		os.Exit(99) // bail before the info log below dereferences a nil b
	}
	log.Printf("info db.go: trigger() b.date=%d age=(%d mils) b.calls=%d", b.date, (time.Now().UnixNano()-b.date)/1e6, len(b.calls))
b.start.Do(b.run)
}

maybe adding a debugging counter would give an idea of how many timers are running/waiting. gogogo |
maybe i got something that can help ` 2023/10/24 18:03:23 1698163402996429837 !!! b='&bbolt.batch{db:(*bbolt.DB)(0xc0097d2fc0), timer:(*time.Timer)(0xc013319450), start:sync.Once{done:0x0, m:sync.Mutex{state:1, sema:0x0}}, calls:[]bbolt.call{bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc01332dec0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc01332df80)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc01333cba0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc01333cc60)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259380)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259620)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0122596e0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0122597a0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259860)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259920)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0122599e0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259b00)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259b60)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259c20)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259ce0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259da0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259e60)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012259f20)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274000)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0122740c0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- 
error)(0xc0122743c0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274480)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274780)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274840)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274900)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274a80)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274b40)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274c00)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274cc0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274d80)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274e40)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274f00)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012274fc0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275080)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275140)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275200)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0122752c0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275380)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275440)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275500)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0122755c0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275680)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275740)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275800)}, bbolt.call{fn:(func(*bbolt.Tx) 
error)(0x662b80), err:(chan<- error)(0xc0122758c0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275980)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275a40)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275b00)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275bc0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275c80)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275d40)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275e00)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275ec0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc012275f80)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc013682060)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc013682120)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0136821e0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0136822a0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc013682360)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc013682420)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0136824e0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc0136825a0)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc013682660)}, bbolt.call{fn:(func(*bbolt.Tx) error)(0x662b80), err:(chan<- error)(0xc01360f200)}}, date:1698163402846272732, id:0x1}' 2023/10/24 18:03:23 1698163402996429837 !3! b.db.batch.id=4 != b.id=1 ??? ###DEBUG END 1698163402996429837### i guess my idea was somehow right: batches can build up patching debugs into this branch testing with this now
just fixed some data races introduced by the log.Printf calls. 💃 if anyone can point me to an internal test, i'd run that too. running it in "race" or "go run" mode is not fast enough to trigger a build-up, but i can see some with the compiled binary.

2023/10/24 18:29:10 1698164950185247089 !3! b.db.batch.id=40 != b.id=38 ???
2023/10/24 18:44:44 1698165884676286942 !9! b.db.batch.id=26 != b.id=25 diff=1 ???

note that my test app runs with 16 dbs at a time. // edit: the getNextID i placed into the debug tree sits inside the batchMu mutex, so it should not hurt anyway. performance with getNextID is normal, 100k+ tx/sec |
i don't know how it works on ext4 or any other filesystem.

1560679 be/4 user 0.00 B/s 38.66 M/s ?unavailable? ./nntp-history-test -numcpu=4 -BoltDB_PageSize=16 -BoltDB_MaxBatchSize=3 -BoltDB_MaxBatchDelay=1 -BatchFlushEvery=100 -RootBUCKETSperDB=256 -todo 100000000 -p 4 -keyindex=2 -pprofcpu=true -pprofmem=true -pprof localhost:1234 -NoReplayHisDat=false
1560680 be/4 user 0.00 B/s 46.45 M/s ?unavailable? ./nntp-history-test -numcpu=4 -BoltDB_PageSize=16 -BoltDB_MaxBatchSize=3 -BoltDB_MaxBatchDelay=1 -BatchFlushEvery=100 -RootBUCKETSperDB=256 -todo 100000000 -p 4 -keyindex=2 -pprofcpu=true -pprofmem=true -pprof localhost:1234 -NoReplayHisDat=false
1560681 be/4 user 0.00 B/s 200.80 M/s ?unavailable? ./nntp-history-test -numcpu=4 -BoltDB_PageSize=16 -BoltDB_MaxBatchSize=3 -BoltDB_MaxBatchDelay=1 -BatchFlushEvery=100 -RootBUCKETSperDB=256 -todo 100000000 -p 4 -keyindex=2 -pprofcpu=true -pprofmem=true -pprof localhost:1234 -NoReplayHisDat=false
1560683 be/4 user 0.00 B/s 206.21 M/s ?unavailable? ./nntp-history-test -numcpu=4 -BoltDB_PageSize=16 -BoltDB_MaxBatchSize=3 -BoltDB_MaxBatchDelay=1 -BatchFlushEvery=100 -RootBUCKETSperDB=256 -todo 100000000 -p 4 -keyindex=2 -pprofcpu=true -pprofmem=true -pprof localhost:1234 -NoReplayHisDat=false
1564523 be/4 user 0.00 B/s 165.91 M/s ?unavailable? ./nntp-history-test -numcpu=4 -BoltDB_PageSize=16 -BoltDB_MaxBatchSize=3 -BoltDB_MaxBatchDelay=1 -BatchFlushEvery=100 -RootBUCKETSperDB=256 -todo 100000000 -p 4 -keyindex=2 -pprofcpu=true -pprofmem=true -pprof localhost:1234 -NoReplayHisDat=false
1665661 be/4 user 0.00 B/s 172.42 M/s ?unavailable? ./nntp-history-test -numcpu=4 -BoltDB_PageSize=16 -BoltDB_MaxBatchSize=3 -BoltDB_MaxBatchDelay=1 -BatchFlushEvery=100 -RootBUCKETSperDB=256 -todo 100000000 -p 4 -keyindex=2 -pprofcpu=true -pprofmem=true -pprof localhost:1234 -NoReplayHisDat=false

my problem now is that i'm trying hard to reproduce the problem i had, but since i rewrote a lot of my flushing code i can't get it into infinite writes anymore... at the moment. still trying to reproduce. what i can get is longer zfs flushes, but no continuous ones anymore xD. but these accumulations are happening and i'm almost 100% sure it has something to do with this.

2023/10/24 20:08:06 1698170886015972085 !9! b.db.batch.id=25873 != b.id=25823 diff=50 ???
2023/10/24 20:08:06 1698170886016537722 !9! b.db.batch.id=25919 != b.id=25869 diff=50 ???
2023/10/24 20:08:06 1698170886018185658 !9! b.db.batch.id=25911 != b.id=25861 diff=50 ???
2023/10/24 20:08:06 1698170886019904352 !9! b.db.batch.id=25919 != b.id=25868 diff=51 ???
2023/10/24 20:08:11 1698170891185435838 !9! b.db.batch.id=27566 != b.id=27513 diff=53 ???
2023/10/24 20:08:11 1698170891418007268 !9! b.db.batch.id=27770 != b.id=27720 diff=50 ??? |
this debugging branch is a bit messed up, but maybe this works; i don't know if anyone can follow. 👯 here, line 927's Batch() is the clean approach i'd suggest: with a batchProcessor() below, no time.AfterFunc but a static ticker, and overshooting does not happen. it's kind of hard to debug in this link, as all the counters and debugs i use are missing 🍡 go batchProcessor() needs its place right before the return in db.go's Open(). |
every day with bbolt is a good day, but there is still some logical issue and i'm not happy with this idea either. (initially) we call(ed) trigger in a goroutine to "wake up", but this can't run before we release the lock, because it runs into a lock too. i understand the intention was to not break (or delay) running batches, as the comment in run() says, but under some rare circumstances this can lead to goroutines building up. i can see some, not that many, but i know a bug is in there; just because i can't trigger it at the moment does not mean it won't hit sometime in the future. another thing we should talk about is the run() function: if the storage can't write fast enough, the Batch() func always creates new batches and fires them out to time.AfterFunc, further, the comment ... ☕ please, thanks! |
The current implementation of MaxBatchDelay and MaxBatchSize does not seem to function as expected. I've been working on a potential solution; here's how the updated process works: when Batch() creates a new batch because the existing batch exceeds MaxBatchSize, the old (full) batch (a pointer to its calls) is passed to the db.batCH channel in BatchProcessor. This approach ensures that the delay between flushes works as intended without interrupting batches, as long as MaxBatchSize does not trigger.
full commit in the batchProcessor-t2 branch; testing now. if you got an email with a different text... don't ask... the cat jumped on the table while i tried to hit edit... she landed on enter! unreal... |
Hello World!
I'm debugging writes to bbolt, but the dump() function in node.go is commented out in write(), and if enabled it won't help much and has bugs: it does not work ...
Maybe someone finds this snippet helpful.
Should i create a pull request?
An updated version exists here: https://github.com/go-while/nntp-history/blob/main/_DEBUGS_bbolt_node.go
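For reference, the intent of such a dump() can be shown with a standalone sketch. The inode/node types below are simplified stand-ins for bbolt's internal structures, not the real ones:

```go
package main

import "fmt"

// Standalone sketch of what a working dump() in node.go is meant to do:
// walk a node's inodes and print key and value sizes, so page-split
// behaviour (see the splitTwo logs in this thread) becomes visible.
// These types are simplified stand-ins, not bbolt's internal node/inode.
type inode struct {
	key, value []byte
}

type node struct {
	isLeaf bool
	inodes []inode
}

// dump returns a human-readable summary of the node, one line per inode,
// plus the total payload size.
func (n *node) dump() string {
	kind := "branch"
	if n.isLeaf {
		kind = "leaf"
	}
	out := fmt.Sprintf("%s node, %d inodes\n", kind, len(n.inodes))
	total := 0
	for i, in := range n.inodes {
		total += len(in.key) + len(in.value)
		out += fmt.Sprintf("  [%d] key=%q keyLen=%d valLen=%d\n", i, in.key, len(in.key), len(in.value))
	}
	out += fmt.Sprintf("  payload=%d bytes\n", total)
	return out
}

func main() {
	n := &node{isLeaf: true, inodes: []inode{
		{key: []byte("k1"), value: []byte("value-one")},
		{key: []byte("k2"), value: []byte("value-two")},
	}}
	fmt.Print(n.dump())
}
```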
// edit days later
i dumped my debugs into this tree:
https://github.com/go-while/bbolt/tree/debugs