Commit

[FLINK-25716][docs-zh] Translate "Streaming Concepts" page of "Application Development > Table API & SQL" to Chinese
snailHumming authored and MartijnVisser committed Apr 7, 2022
1 parent ead6db7 commit 12620e8
Showing 1 changed file with 40 additions and 67 deletions.
107 changes: 40 additions & 67 deletions docs/content.zh/docs/dev/table/concepts/overview.md
@@ -34,106 +34,79 @@ Flink's [Table API]({{< ref "docs/dev/table/tableApi" >}}) and [SQL]({{< ref "

The following pages cover concepts, practical limitations, and stream-specific configuration parameters.

State Management
----------------

Table programs that run in streaming mode leverage all capabilities of Flink as a stateful stream
processor.
In particular, a table program can be configured with a [state backend]({{< ref "docs/ops/state/state_backends" >}})
and various [checkpointing options]({{< ref "docs/dev/datastream/fault-tolerance/checkpointing" >}})
to handle different requirements regarding state size and fault tolerance. It is possible to take
a savepoint of a running Table API & SQL pipeline and to restore the application's state from it at
a later point in time.
### State Usage

Due to the declarative nature of Table API & SQL programs, it is not always obvious where and how much
state is used within a pipeline. The planner decides whether state is necessary to compute a correct
result. A pipeline is optimized to claim as little state as possible given the current set of optimizer
rules.

{{< hint info >}}
Conceptually, source tables are never kept entirely in state. An implementer deals with logical tables
(i.e. [dynamic tables]({{< ref "docs/dev/table/concepts/dynamic_tables" >}})). Their state requirements
depend on the operations that are applied to them.
{{< /hint >}}

Queries such as `SELECT ... FROM ... WHERE` that consist only of field projections or filters are usually
stateless pipelines. However, operations such as joins, aggregations, or deduplications require keeping
intermediate results in fault-tolerant storage, for which Flink's state abstractions are used.

{{< hint info >}}
Please refer to the individual operator documentation for more details about how much state is required
and how to limit a potentially ever-growing state size.
{{< /hint >}}

For example, a regular SQL join of two tables requires the operator to keep both input tables in state
entirely. For correct SQL semantics, the runtime needs to assume that a match could arrive at any
point in time from either side. Flink provides [optimized window and interval joins]({{< ref "docs/dev/table/sql/queries/joins" >}})
that aim to keep the state size small by exploiting the concept of [watermarks]({{< ref "docs/dev/table/concepts/time_attributes" >}}).
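
The difference in state size can be illustrated with a toy model. The sketch below is not Flink code: it assumes rows are `(timestamp, key)` tuples, a join condition that only matches rows whose timestamps differ by at most `bound`, and a single watermark covering both inputs. The function names and sample rows are hypothetical.

```python
def regular_join_state(left_rows, right_rows):
    """A regular join must keep every row: a match may still arrive at any time."""
    return len(left_rows) + len(right_rows)


def interval_join_state(left_rows, right_rows, watermark, bound):
    """An interval join may drop rows that can no longer find a partner.

    A row with timestamp t only matches rows in [t - bound, t + bound].
    Once the watermark has passed t + bound, no such row can arrive anymore.
    """
    def still_needed(rows):
        return [row for row in rows if row[0] + bound >= watermark]

    return len(still_needed(left_rows)) + len(still_needed(right_rows))


# Hypothetical input: (timestamp, key) pairs.
left = [(1, "a"), (5, "b"), (9, "c")]
right = [(2, "a"), (8, "c")]

regular_size = regular_join_state(left, right)
interval_size = interval_join_state(left, right, watermark=10, bound=3)
```

In this model the regular join holds all five rows, while the interval join only needs the two rows whose time range has not yet been overtaken by the watermark.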

Another example is the following query, which computes the number of clicks per session:

```sql
SELECT sessionId, COUNT(*) FROM clicks GROUP BY sessionId;
```

The `sessionId` attribute is used as a grouping key, and the continuous query maintains a count
for each `sessionId` it observes. The `sessionId` attribute evolves over time, and each `sessionId`
value is only active until its session ends, i.e., for a limited period of time. However, the
continuous query cannot know about this property of `sessionId` and must expect that any `sessionId`
value can occur at any point in time. Consequently, the total state size of the query grows
continuously as more and more `sessionId` values are observed.
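
The growth can be made concrete with a small simulation. This is only a sketch of the behavior described above, not Flink's implementation: it assumes the state is a plain dictionary with one counter per key, and the function name and click sequence are made up for illustration.

```python
from collections import defaultdict


def run_count_query(clicks):
    """Toy model of: SELECT sessionId, COUNT(*) FROM clicks GROUP BY sessionId."""
    state = defaultdict(int)  # one counter per observed sessionId, never removed
    state_sizes = []          # number of keys held in state after each record
    for session_id in clicks:
        state[session_id] += 1
        state_sizes.append(len(state))
    return dict(state), state_sizes


# Hypothetical stream of sessionId values.
counts, sizes = run_count_query(["s1", "s1", "s2", "s3", "s2", "s4"])
```

The list `sizes` never decreases: a counter stays in state even after its session has ended, because nothing tells the query that the key will not appear again.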

#### Idle State Retention Time

The *Idle State Retention Time* parameter [`table.exec.state.ttl`]({{< ref "docs/dev/table/config" >}}#table-exec-state-ttl)
defines how long the state of a key is retained without being updated before it is removed.
For the previous example query, the count of a `sessionId` would be removed as soon as it has not
been updated for the configured period of time.

By removing the state of a key, the continuous query completely forgets that it has seen this key
before. If a record arrives with a key whose state has already been removed, the record is
treated as if it were the first record with that key. For the example above, this means
that the count of a `sessionId` would start again at `0`.
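
The following sketch models this forgetting behavior. It is an illustration under stated assumptions, not how Flink cleans up state: time is a plain integer passed in by the caller, expired keys are removed eagerly on every call, and the class `TtlCountState` is hypothetical.

```python
class TtlCountState:
    """Per-key counter that forgets keys not updated for `ttl` time units."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (count, time of last update)

    def process(self, key, now):
        """Count one record for `key` at logical time `now` and return the new count."""
        self._expire(now)
        count, _ = self.entries.get(key, (0, now))  # a forgotten key restarts at 0
        self.entries[key] = (count + 1, now)
        return count + 1

    def _expire(self, now):
        stale = [k for k, (_, last) in self.entries.items() if now - last >= self.ttl]
        for k in stale:
            del self.entries[k]


state = TtlCountState(ttl=10)
results = [
    state.process("s1", now=0),   # first record for s1
    state.process("s1", now=5),   # updated within the retention time
    state.process("s1", now=20),  # idle for 15 time units, state was removed
]
```

The third record is counted as if `s1` had never been seen: the counter restarts at `0` and the emitted count is `1` rather than `3`.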

### Stateful Upgrades and Evolution

Table programs that are executed in streaming mode are intended as *standing queries*, which means they
are defined once and are continuously evaluated as static end-to-end pipelines.

In the case of stateful pipelines, any change to either the query or Flink's planner might lead to a
completely different execution plan. This makes stateful upgrades and the evolution of table programs
challenging at the moment. The community is working on improving those shortcomings.

For example, after a filter predicate is added, the optimizer might decide to reorder joins or change the
schema of an intermediate operator. This prevents restoring from a savepoint, due to either a changed
topology or a different column layout within the state of an operator.

The query implementer must ensure that the optimized plans before and after the change are compatible.
Use the `EXPLAIN` command in SQL or `table.explain()` in Table API to [get insights]({{< ref "docs/dev/table/common" >}}#explaining-a-table).

Since new optimizer rules are continuously added, and operators become more efficient and specialized,
an upgrade to a newer Flink version could also lead to incompatible plans.

{{< hint warning >}}
Currently, the framework cannot guarantee that state can be mapped from a savepoint to a new table
operator topology.

In other words: Savepoints are only supported if both the query and the Flink version remain constant.
{{< /hint >}}

Since the community rejects contributions that modify the optimized plan and the operator topology
in a patch version (e.g. from `1.13.1` to `1.13.2`), it should be safe to upgrade a Table API & SQL
pipeline to a newer bug fix release. However, major-minor version upgrades (e.g. from `1.12` to `1.13`)
are not supported.
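
As a rough illustration of this rule, the check below compares only the major and minor components of two version strings. It is a sketch with a hypothetical helper name, it assumes dot-separated numeric versions such as `1.13.1` or `1.13`, and it says nothing about changes to the query itself, which must also stay the same.

```python
def is_patch_level_upgrade(old_version, new_version):
    """Return True if both versions share the same major and minor number.

    Only such bug fix upgrades are described as safe for stateful
    Table API & SQL pipelines; the patch component is ignored.
    """
    def major_minor(version):
        parts = version.split(".")
        return int(parts[0]), int(parts[1])

    return major_minor(old_version) == major_minor(new_version)


patch_ok = is_patch_level_upgrade("1.13.1", "1.13.2")
minor_ok = is_patch_level_upgrade("1.12", "1.13")
```

Here `patch_ok` is `True` and `minor_ok` is `False`, matching the two examples in the paragraph above.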

For both shortcomings (i.e. modified query and modified Flink version), we recommend investigating
whether the state of an updated table program can be "warmed up" (i.e. initialized) with historical
data again before switching to real-time data. The Flink community is working on a [hybrid source]({{< ref "docs/connectors/datastream/hybridsource" >}})
to make this switching as convenient as possible.

Where to go next?
-----------------