Data usage crawler refactor #9075
Conversation
Some comments
Nice work, just minor comments, LGTM.
Can all your intermediate commits be squashed, or at least squashed down to a relevant number of commits?
@harshavardhana Don't we squash when merging?
There are way too many commits; the merger won't know what to keep in terms of a commit subject. So it is better for you to squash and provide a healthy commit subject.
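For reference, the squash being asked for can be done mechanically. A minimal sketch, run in a throwaway repository (the branch and commit names here are illustrative, not taken from this PR): move the branch tip back to the merge base while keeping all changes staged, then commit once with a healthy subject.

```shell
set -e
# Throwaway demo repository; branch/commit names are illustrative only.
dir=$(mktemp -d); cd "$dir"
git init -q
git checkout -q -b master
git config user.email dev@example.com
git config user.name dev
echo base > f.txt; git add f.txt; git commit -qm "initial"

# A feature branch with several intermediate "wip" commits.
git checkout -q -b crawler-refactor
for i in 1 2 3; do echo "$i" >> f.txt; git commit -qam "wip $i"; done

# Squash: reset the branch tip to the merge base (--soft keeps the
# combined changes staged), then commit once.
git reset -q --soft "$(git merge-base master HEAD)"
git commit -qm "Data usage crawler refactor"
git rev-list --count master..HEAD   # -> 1
```

An interactive `git rebase -i master` with the intermediate commits marked `squash` achieves the same result; the `reset --soft` route is simply non-interactive.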
## Description

Clarify "disk" (硬盘) and "node" (节点). Remove the "limit" (限制) paragraph, since there is no longer a maximum of 16 disks.

## Motivation and Context

## How to test this PR?

## Types of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Checklist:

- [ ] Fixes a regression (If yes, please add `commit-id` or `PR #` here)
- [ ] Documentation needed
- [ ] Unit tests needed
- [ ] Functional tests needed (If yes, add [mint](https://github.com/minio/mint) PR # here: )
looks like
It is closed by the caller, here: Line 104 in 9c1fa3a
Squashed version of / replaces minio#9075.

## Description

Implementation overview: https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b

Includes a staticcheck upgrade and fixes for it.

Some quick performance tests of crawls. This is with a cycle size of 16, 174082 XL objects, ~4K folders at prefix level 2, on NVMe.

```
BEFORE:

Crawl time:      1m17.3025259s
Disk access:     350MB/s
Kernel Time    = 59.468 = 70%
User Time      = 21.984 = 25%
Process Time   = 81.453 = 95%
Virtual Memory = 303 MB

AFTER:

SET MINIO_DISK_USAGE_CRAWL_DELAY=10 (default)
Cycle scan time: 45.354097s
Disk Access:     15MB/s
Kernel Time    = 0.562 = 1%
User Time      = 0.640 = 1%
Process Time   = 1.203 = 2%
Virtual Memory = 304 MB

SET MINIO_DISK_USAGE_CRAWL_DELAY=1
Cycle scan time: 3.3367481s
Disk Access:     160MB/s
Kernel Time    = 0.468 = 3%
User Time      = 1.421 = 12%
Process Time   = 1.890 = 16%
Virtual Memory = 303 MB

SET MINIO_DISK_USAGE_CRAWL_DELAY=0
Cycle scan time: 2.1245395s
Disk Access:     175MB/s
Kernel Time    = 1.500 = 14%
User Time      = 0.765 = 7%
Process Time   = 2.265 = 22%
Virtual Memory = 304 MB
```

## How to test this PR?

For now the server will display extra information on crawling.

## Types of changes

- [x] New feature (non-breaking change which adds functionality)
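The delay knob exercised in the benchmarks above is set through the environment before starting the server. A hedged sketch of its use (the variable name is taken from the benchmark output above; per those numbers, the default of 10 keeps background disk IO around 15MB/s, while 0 crawls at full speed):

```shell
# Lower value = faster crawl, more disk IO (10 is the default shown above).
export MINIO_DISK_USAGE_CRAWL_DELAY=1
echo "crawl delay factor: $MINIO_DISK_USAGE_CRAWL_DELAY"
# minio server /data    # then start the server as usual
```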
Force-pushed from 7a51742 to 003cb18.
OK, did it, reluctantly. It is very annoying for anyone working on it to no longer be able to pull and get straightforward fixes. I know you like force-pushing stuff, but that is extremely annoying for anyone else working on the branch (and I will have to spend time re-merging the changes into the bloom filter branch). To get a reasonable commit message, may I recommend going to the top comment, finding the Edit option, and copy-pasting the message as the commit message. That takes 5 seconds and far outweighs having everything force-pushed.
That is definitely easy and well understood, but when we merge, the squashed commits become bullet items, which are harder to sift through when intermediate commits have titles like the one below:
"Should we save it? Is it intended to be like this?" It is the developer's onus to write concise titles and messages, even for intermediate commits that need to be saved. A merger cannot make up commit titles, and this also saves the merger ample time, because this is not the only PR they are merging in their day-to-day activity. So as a practice, for a large PR we generally ask the author to squash at the end, so that we know exactly what to save as part of the commit message and keep the history clean per the original author's intention.
@klauspost it was decided that we should move the usage file to
So we can remove
Yes, but there are many more files than this. I've changed it to:
Caches are not strict msgpack, so I don't want to give that impression.
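To illustrate the point with a hypothetical layout (this is not MinIO's actual cache file format): a cache file that carries extra framing, such as a leading version byte before the serialized payload, is no longer a plain msgpack document, so a `.msgpack` file name would over-promise what a generic decoder can read.

```shell
set -e
# Hypothetical cache file: one version byte, then the serialized body.
f=$(mktemp)
printf '\002' > "$f"                 # format-version header (not msgpack)
printf 'serialized-body' >> "$f"     # serialized payload follows the header
ver=$(od -An -tu1 -N1 "$f" | tr -d ' ')
echo "cache format version: $ver"    # -> cache format version: 2
rm -f "$f"
```

A msgpack parser pointed at such a file would misinterpret the header byte, which is why a neutral extension is the honest choice.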
Mint Automation
9075-8b7a814/mint-gateway-azure.sh.log:
Deleting image on docker hub