Add an faq about Structured Streaming. (#1298)
So far we don't support Structured Streaming.
We should explicitly add a doc and throw a better exception for now. (related to #1297)
rising-star92 committed Feb 21, 2020
1 parent 8b688a9 commit 7dc2b63
Showing 3 changed files with 24 additions and 2 deletions.
1 change: 1 addition & 0 deletions databricks/koalas/internal.py
@@ -453,6 +453,7 @@ def __init__(self, sdf: spark.DataFrame,
        Column<b'(a, y)'>
        """
        assert isinstance(sdf, spark.DataFrame)
        assert not sdf.isStreaming, "Koalas does not support Structured Streaming."

        if index_map is None:
            assert not any(SPARK_INDEX_NAME_PATTERN.match(name) for name in sdf.columns), \
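The assertion added in the hunk above depends only on the `isStreaming` attribute of the Spark DataFrame. Because `assert` statements are skipped when Python runs with `-O`, a caller who wants a check that always fires could add an explicit guard before handing a DataFrame to Koalas. The following is a minimal sketch of that idea, not code from this commit: `ensure_batch`, `StreamingNotSupportedError`, and `FakeFrame` are hypothetical names introduced here for illustration, and `FakeFrame` merely stands in for a Spark DataFrame so the snippet runs without a Spark session.

```python
class StreamingNotSupportedError(TypeError):
    """Hypothetical error type for this sketch; not part of Koalas."""


def ensure_batch(sdf):
    """Return ``sdf`` unchanged if it is a batch DataFrame, otherwise raise.

    Only the ``isStreaming`` attribute is inspected, mirroring the assertion
    in the hunk above. Objects without that attribute are treated as batch.
    """
    if getattr(sdf, "isStreaming", False):
        raise StreamingNotSupportedError(
            "Koalas does not support Structured Streaming; "
            "convert each micro-batch inside foreachBatch instead."
        )
    return sdf


class FakeFrame:
    """Stand-in for a Spark DataFrame (assumption: only isStreaming matters)."""

    def __init__(self, is_streaming):
        self.isStreaming = is_streaming


batch = FakeFrame(is_streaming=False)
stream = FakeFrame(is_streaming=True)

# A batch frame passes through untouched.
assert ensure_batch(batch) is batch

# A streaming frame is rejected with a descriptive message.
try:
    ensure_batch(stream)
except StreamingNotSupportedError as exc:
    print(exc)
```

With a real session, the guard would be called as `ensure_batch(spark_df)` before `ks.DataFrame(spark_df)`; inside a `foreachBatch` callback the per-batch DataFrame is an ordinary batch DataFrame, so it should pass the check.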
2 changes: 1 addition & 1 deletion docs/source/development/design.rst
@@ -75,7 +75,7 @@ Be a lean API layer and move fast

Koalas is designed as an API overlay layer on top of Spark. The project should be lightweight, and most functions should be implemented as wrappers around Spark or pandas. Koalas does not accept heavyweight implementations, e.g. execution engine changes.

-This approach enables us to move fast. For the considerable future, we aim to be making weekly releases. If we find a critical bug, we will be making a new release as soon as the bug fix is available.
+This approach enables us to move fast. For the considerable future, we aim to be making bi-weekly releases. If we find a critical bug, we will be making a new release as soon as the bug fix is available.

High test coverage
------------------
23 changes: 22 additions & 1 deletion docs/source/user_guide/faq.rst
@@ -6,7 +6,7 @@ What's the project's status?
----------------------------

This project is currently in beta and is rapidly evolving.
-We plan to do weekly releases at this stage.
+We plan to do bi-weekly releases at this stage.
You should expect the following differences:

- some functions may be missing. Please create a GitHub issue if your favorite function is not yet supported. We also document all the functions that are not yet supported in the `missing directory <https://github.com/databricks/koalas/tree/master/databricks/koalas/missing>`_.
@@ -30,6 +30,27 @@ Should I use PySpark's DataFrame API or Koalas?
If you are already familiar with pandas and want to leverage Spark for big data, we recommend
using Koalas. If you are learning Spark from ground up, we recommend you start with PySpark's API.

Does Koalas support Structured Streaming?
-----------------------------------------

No, Koalas does not support Structured Streaming officially.

As a workaround, you can use Koalas APIs with `foreachBatch` in Structured Streaming which allows batch APIs:

.. code-block:: python

   >>> def func(batch_df, batch_id):
   ...     koalas_df = ks.DataFrame(batch_df)
   ...     koalas_df['a'] = 1
   ...     print(koalas_df)

   >>> spark.readStream.format("rate").load().writeStream.foreachBatch(func).start()
                   timestamp  value  a
   0 2020-02-21 09:49:37.574      4  1
                   timestamp  value  a
   0 2020-02-21 09:49:38.574      5  1
   ...

How can I request support for a method?
---------------------------------------
