{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":65763282,"defaultBranch":"master","name":"zfs","ownerLogin":"ahrens","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2016-08-15T20:36:15.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/799124?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1702183165.0","currentOid":""},"activityList":{"items":[{"before":"954eb594634966abab1f32ce0687f46e8405d47f","after":"7fc38a82d40c049be6ab7bd272a9f7ee9b3b1478","ref":"refs/heads/spin","pushedAt":"2023-12-10T04:42:14.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ahrens","name":"Matthew Ahrens","path":"/ahrens","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/799124?s=80&v=4"},"commit":{"message":"zvol_disk_open() may spin on CPU\n\n`zvol_disk_open()` waits for up to `zfs_vdev_open_timeout_ms` (1 second by\ndefault) (e.g. if the block device does not exist). While in this loop,\nit calls `schedule_timeout()`.\n\nThe problem is that `schedule_timeout()` may not actually cause the\nthread to go off-CPU. Per the \"documentation\" (comment in the source\ncode):\n```\n * The function behavior depends on the current task state:\n * %TASK_RUNNING - the scheduler is called, but the task does not sleep\n * at all. That happens because sched_submit_work() does nothing for\n * tasks in %TASK_RUNNING state.\n```\n\nIn my experience, `schedule_timeout()` never sleeps from this code path.\nThis is especially noticeable if `zfs_vdev_open_timeout_ms` has been\nincreased from its default.\n\nThis commit uses `msleep()` to actually sleep. Note that this is how it\nwas before https://github.com/openzfs/zfs/pull/7629.\n\nSigned-off-by: Matthew Ahrens ","shortMessageHtmlLink":"zvol_disk_open() may spin on CPU"}},{"before":null,"after":"954eb594634966abab1f32ce0687f46e8405d47f","ref":"refs/heads/spin","pushedAt":"2023-12-10T04:39:25.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ahrens","name":"Matthew Ahrens","path":"/ahrens","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/799124?s=80&v=4"},"commit":{"message":"zvol_disk_open() may spin on CPU\n\n`zvol_disk_open()` waits for up to `zfs_vdev_open_timeout_ms` (1 second by\ndefault) (e.g. if the block device does not exist). While in this loop,\nit calls `schedule_timeout()`.\n\nThe problem is that `schedule_timeout()` may not actually cause the\nthread to go off-CPU. Per the \"documentation\" (comment in the source\ncode):\n```\n * The function behavior depends on the current task state:\n * %TASK_RUNNING - the scheduler is called, but the task does not sleep\n * at all. That happens because sched_submit_work() does nothing for\n * tasks in %TASK_RUNNING state.\n```\n\nIn my experience, `schedule_timeout()` never sleeps from this code path.\nThis is especially noticeable if `zfs_vdev_open_timeout_ms` has been\nincreased from its default.\n\nThis commit uses `msleep()` to actually sleep. 
[2023-12-10 04:39 UTC] created branch spin at 954eb59

Same commit, "zvol_disk_open() may spin on CPU", with the commit message quoted above.

[2023-11-20 16:44 UTC] created branch visitbp at 074487b

unnecessary alloc/free in dsl_scan_visitbp()

Clean up code in dsl_scan_visitbp() by removing an unnecessary alloc/free and `goto`. This has the side benefit of reducing CPU usage, which is only really noticeable if we are not doing i/o for the leaf blocks, like when `zfs_no_scrub_io` is set.

Signed-off-by: Matthew Ahrens
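As a generic illustration of that kind of cleanup (the type and functions below are invented for the example and are not the actual dsl_scan_visitbp() code): a short-lived copy that was heap-allocated and released through a `goto`-based cleanup path can instead live on the stack, removing the allocation, the free, and the label.

```c
/* Invented before/after example of the cleanup pattern; not OpenZFS code. */
#include <stdlib.h>
#include <string.h>

struct bp_copy { unsigned char raw[128]; };	/* stand-in for a block pointer */

/* before: per-call heap allocation with goto-based cleanup */
static void
visit_before(const struct bp_copy *bp)
{
	struct bp_copy *copy = malloc(sizeof (*copy));

	if (copy == NULL)
		goto out;
	memcpy(copy, bp, sizeof (*copy));
	/* ... visit children using *copy ... */
out:
	free(copy);
}

/* after: stack copy, straight-line control flow, no allocation */
static void
visit_after(const struct bp_copy *bp)
{
	struct bp_copy copy = *bp;

	/* ... visit children using copy ... */
	(void) copy;
}
```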
[2023-05-04 15:03 UTC] force-pushed verify_bp (f145665 → 6f244cd)

verify block pointers before writing them out

If a block pointer is corrupted (but the block containing it checksums correctly, e.g. due to a bug that overwrites random memory), we can often detect it before the block is read, with the `zfs_blkptr_verify()` function, which is used in `arc_read()`, `zio_free()`, etc.

However, such corruption is not typically recoverable. To recover from it we would need to detect the memory error before the block pointer is written to disk.

This PR verifies BPs that are contained in indirect blocks and dnodes before they are written to disk, in `dbuf_write_ready()`. This way, we'll get a panic before the on-disk data is corrupted. This will help us diagnose what's causing the corruption, as well as being much easier to recover from.

To minimize performance impact, only checks that can be done without holding the spa_config_lock are performed.

Additionally, when corruption is detected, the raw words of the block pointer are logged. (Note that `dprintf_bp()` is a no-op by default, but if enabled it is not safe to use with invalid block pointers.)

Signed-off-by: Matthew Ahrens

[2023-05-02 20:05 UTC] force-pushed verify_bp (cf3ad87 → f145665)
[2023-05-02 20:02 UTC] created branch verify_bp at cf3ad87

Both carry the same commit, "verify block pointers before writing them out", with the commit message quoted above.
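The shape of such a pre-write check might look like the sketch below. The struct, field names, and limits are invented stand-ins, not the real `blkptr_t` layout or the `zfs_blkptr_verify()` logic; it only illustrates "walk the embedded block pointers and panic on cheap, lock-free sanity failures before the buffer reaches disk."

```c
/* Illustrative stand-ins only -- not the real blkptr_t or OpenZFS checks. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	MAX_CHECKSUM_ALG	32		/* invented limits */
#define	MAX_COMPRESS_ALG	32
#define	MAX_BLOCKSIZE		(16u << 20)

struct bp_like {
	uint8_t		checksum;	/* checksum algorithm id */
	uint8_t		compress;	/* compression algorithm id */
	uint32_t	lsize;		/* logical size in bytes */
	uint64_t	birth;		/* birth txg */
};

/* cheap field checks only; anything needing the spa_config_lock is skipped */
static bool
bp_looks_sane(const struct bp_like *bp, uint64_t open_txg)
{
	return (bp->checksum < MAX_CHECKSUM_ALG &&
	    bp->compress < MAX_COMPRESS_ALG &&
	    bp->lsize <= MAX_BLOCKSIZE &&
	    bp->birth <= open_txg);
}

/* called on an indirect block's buffer just before it is written out */
static void
verify_embedded_bps(const struct bp_like *bps, size_t count, uint64_t open_txg)
{
	for (size_t i = 0; i < count; i++)
		assert(bp_looks_sane(&bps[i], open_txg));	/* panic pre-write */
}
```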
[2023-03-13 16:54 UTC] force-pushed dnode_dirty (c4ae54e → 60ef5d5)

new file is holey

[2023-03-11 05:47 UTC] force-pushed prefetch_indirect (e3cfa1f → 8e2b28c)

fix prefetching - no zero

[2023-03-10 02:45 UTC] created branch prefetch_indirect at e3cfa1f

fix prefetching of indirect blocks while destroying

When traversing a tree of block pointers (e.g. for `zfs destroy` or `zfs send`), we prefetch the indirect blocks that will be needed, in `traverse_prefetch_metadata()`. In the case of `zfs destroy`, we do a little traversing each txg, and resume the traversal the next txg. So the indirect blocks that will be needed, and thus are candidates for prefetching, do not include blocks that are before the resume point.

The problem is that the logic for determining if the indirect blocks are before the resume point is incorrect, causing the (up to 1024) L1 indirect blocks that are inside the first L2 to not be prefetched. In practice, if we are able to read many more than 1024 blocks per txg, then this will be inconsequential. But if i/o latency is more than a few milliseconds, almost no L1s will be prefetched, so they will be read serially, and thus the destroying will be very slow. This can be observed as `zpool get freeing` decreasing very slowly.

Specifically: When we first examine the L2 that contains the block we'll be resuming from, we have not yet resumed, so `td_resume` is nonzero. At this point, all calls to `traverse_prefetch_metadata()` will fail, even if the L1 in question is after the resume point. It isn't until the callback is issued for the resume point that we zero out `td_resume`, but by this point we've already attempted and failed to prefetch everything under this L2 indirect block.

This commit addresses the issue by reusing the existing `resume_skip_check()` to determine if the L1's bookmark is before or after the resume point. To do so, this function is made non-mutating (the caller now zeros `td_resume`).

Note, this bug likely predates (was not introduced by) #11803.

Signed-off-by: Matthew Ahrens
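A stripped-down sketch of that design follows; the names and the deliberately simplified same-level bookmark comparison are hypothetical, not the actual traverse code. The skip check is a pure predicate, and only the visit path clears the resume bookmark, so the prefetch path can call the same check even before the resume point has been reached.

```c
/* Hypothetical sketch of a non-mutating resume check; not OpenZFS code. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct zb {			/* stand-in bookmark; same-level comparison only */
	uint64_t object;
	uint64_t blkid;
};

struct traverse_state {
	struct zb resume;	/* all-zero means "not resuming" */
};

static bool
zb_is_zero(const struct zb *zb)
{
	return (zb->object == 0 && zb->blkid == 0);
}

/* pure check: does this block precede the resume point (already visited)? */
static bool
resume_skip_check(const struct traverse_state *td, const struct zb *zb)
{
	if (zb_is_zero(&td->resume))
		return (false);
	if (zb->object != td->resume.object)
		return (zb->object < td->resume.object);
	return (zb->blkid < td->resume.blkid);
}

/* visit path: the caller, not the check, clears the resume bookmark */
static void
visit_block(struct traverse_state *td, const struct zb *zb)
{
	if (resume_skip_check(td, zb))
		return;
	if (zb->object == td->resume.object && zb->blkid == td->resume.blkid)
		memset(&td->resume, 0, sizeof (td->resume));
	/*
	 * ... issue the callback / recurse; the prefetch path calls
	 * resume_skip_check() too, with no side effects ...
	 */
}
```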
[2023-03-09 22:55 UTC] force-pushed dnode_dirty (dea1000 → c4ae54e)

ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()

`lseek(SEEK_DATA)` and `lseek(SEEK_HOLE)` are only accurate when the on-disk blocks reflect all writes, i.e. when there are no dirty data blocks. To ensure this, if the target dnode is dirty, they wait for the open txg to be synced, so we can call them "stabilizing operations". If they trigger txg_wait_synced() often, it can be detrimental to performance.

Typically, a group of files are all modified, and then SEEK_DATA/HOLE are performed on them. In this case, the first SEEK does a txg_wait_synced(), and subsequent SEEKs don't need to wait, so performance is good.

However, if a workload involves an interleaved metadata modification, the subsequent SEEK may do a txg_wait_synced() unnecessarily. For example, this happens if we do a `read()` syscall on each file before we do its SEEK. This applies even with `relatime=on`, when the `read()` is the first read after the last write. The txg_wait_synced() is unnecessary because the SEEK operations only care that the structure of the tree of indirect and data blocks is up to date on disk. They don't care about metadata like the contents of the bonus or spill blocks. (They also don't care if an existing data block is modified, but this would be more involved to filter out.)

This commit changes the behavior of SEEK_DATA/HOLE operations such that they do not call txg_wait_synced() if there is only a pending change to the bonus or spill block.

Signed-off-by: Matthew Ahrens
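As a rough sketch of that dirtiness test (hypothetical structures, not the actual dnode code): SEEK_DATA/SEEK_HOLE only needs a `txg_wait_synced()` if something other than the bonus or spill block is dirty, because only the tree of data and indirect blocks affects where data and holes are.

```c
/* Hypothetical sketch; not the actual OpenZFS dnode dirty-record layout. */
#include <stdbool.h>
#include <stddef.h>

enum dirty_kind { DIRTY_DATA, DIRTY_INDIRECT, DIRTY_BONUS, DIRTY_SPILL };

struct dirty_record {
	enum dirty_kind kind;
	struct dirty_record *next;
};

struct dnode_like {
	struct dirty_record *dirty_records;	/* per-dnode dirty list */
};

/* true if SEEK_DATA/SEEK_HOLE must wait for the open txg to sync */
static bool
seek_needs_sync(const struct dnode_like *dn)
{
	for (const struct dirty_record *dr = dn->dirty_records;
	    dr != NULL; dr = dr->next) {
		/* bonus/spill-only changes don't move data/hole boundaries */
		if (dr->kind != DIRTY_BONUS && dr->kind != DIRTY_SPILL)
			return (true);
	}
	return (false);
}
```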