Skip to content

Commit

Permalink
[FLINK-32082][docs] Documentation of checkpoint file-merging (#24766)
Browse files Browse the repository at this point in the history
  • Loading branch information
fredia committed May 16, 2024
1 parent b87ead7 commit a8cf2ba
Show file tree
Hide file tree
Showing 2 changed files with 39 additions and 0 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -250,5 +250,22 @@ StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironm
需要注意的是,这一行为可能会延长任务运行的时间,如果 checkpoint 周期比较大,这一延迟会非常明显。
极端情况下,如果 checkpoint 的周期被设置为 `Long.MAX_VALUE`,那么任务永远不会结束,因为下一次 checkpoint 不会进行。

## 统一的 checkpoint 文件合并机制 (实验性功能)

Flink 1.20 引入了 MVP 版本的统一 checkpoint 文件合并机制,该机制允许把分散的 checkpoint 小文件合并到大文件中,减少 checkpoint 文件创建删除的次数,
有助于减轻文件过多问题带来的文件系统元数据管理的压力。可以通过将 `state.checkpoints.file-merging.enabled` 设置为 `true` 来开启该机制。
**注意**,考虑 trade-off,开启该机制会导致空间放大,即文件系统上的实际占用会比 state size 更大,可以通过设置 `state.checkpoints.file-merging.max-space-amplification`
来控制文件放大的上限。

该机制适用于 Flink 中的 keyed state、operator state 和 channel state。对 shared scope state
提供 subtask 级别的合并;对 private scope state 提供 TaskManager 级别的合并,可以通过
`state.checkpoints.file-merging.max-subtasks-per-file` 选项配置单个文件允许写入的最大 subtask 数目。

统一文件合并机制也支持跨 checkpoint 的文件合并,通过设置 `state.checkpoints.file-merging.across-checkpoint-boundary``true` 开启。

该机制引入了文件池用于处理并发写的场景,文件池有两种模式,Non-blocking 模式的文件池会对每个文件请求即时返回一个物理文件,在频繁请求的情况下会创建出许多物理文件;
而 Blocking 模式的文件池会一直阻塞文件请求,直到文件池中有返回的文件可用,可以通过设置 `state.checkpoints.file-merging.pool-blocking``true`
选择 Blocking 模式,设置为 `false` 选择 Non-blocking 模式。

{{< top >}}

22 changes: 22 additions & 0 deletions docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
Original file line number Diff line number Diff line change
Expand Up @@ -292,4 +292,26 @@ The final checkpoint would be triggered immediately after all operators have rea
without waiting for periodic triggering, but the job will need to wait for this final checkpoint
to be completed.

## Unify file merging mechanism for checkpoints (Experimental)

The unified file merging mechanism for checkpointing is introduced to Flink 1.20 as an MVP ("minimum viable product") feature,
which allows scattered small checkpoint files to be written into larger files, reducing the number of file creations
and file deletions, which alleviates the pressure of file system metadata management raised by the file flooding problem during checkpoints.
The mechanism can be enabled by setting `state.checkpoints.file-merging.enabled` to `true`.
**Note** that as a trade-off, enabling this mechanism may lead to space amplification, that is, the actual occupation on the file system
will be larger than actual state size. `state.checkpoints.file-merging.max-space-amplification`
can be used to limit the upper bound of space amplification.

This mechanism is applicable to keyed state, operator state and channel state in Flink. Merging at subtask level is
provided for shared scope state; Merging at TaskManager level is provided for private scope state. The maximum number of subtasks
allowed to be written to a single file can be configured through the `state.checkpoints.file-merging.max-subtasks-per-file` option.

This feature also supports merging files across checkpoints. To enable this, set
`state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.

This mechanism introduces a file pool to handle concurrent writing scenarios. There are two modes, the non-blocking mode will
always provide usable physical file without blocking when receive a file request, it may create many physical files if poll
file frequently; while the blocking mode will be blocked until there are returned files available in the file pool. This can be configured via
setting `state.checkpoints.file-merging.pool-blocking` as `true` for blocking or `false` for non-blocking.

{{< top >}}

0 comments on commit a8cf2ba

Please sign in to comment.