diff --git a/docs/source/loading.mdx b/docs/source/loading.mdx
index 58b30cc59a7..e11ab34083f 100644
--- a/docs/source/loading.mdx
+++ b/docs/source/loading.mdx
@@ -220,6 +220,22 @@ Load a list of Python dictionaries with [`~Dataset.from_list`]:
 >>> dataset = Dataset.from_list(my_list)
 ```
 
+### Python generator
+
+Create a dataset from a Python generator with [`~Dataset.from_generator`]:
+
+```py
+>>> from datasets import Dataset
+>>> def my_gen():
+...     yield {"a": 1}
+...     yield {"a": 2}
+...     yield {"a": 3}
+...
+>>> dataset = Dataset.from_generator(my_gen)
+```
+
+This approach supports loading data larger than available memory.
+
 ### Pandas DataFrame
 
 Load Pandas DataFrames with [`~Dataset.from_pandas`]:
diff --git a/docs/source/package_reference/main_classes.mdx b/docs/source/package_reference/main_classes.mdx
index e9371166580..b4dba764abb 100644
--- a/docs/source/package_reference/main_classes.mdx
+++ b/docs/source/package_reference/main_classes.mdx
@@ -16,6 +16,7 @@ The base class [`Dataset`] implements a Dataset backed by an Apache Arrow table.
     - from_buffer
     - from_pandas
     - from_dict
+    - from_generator
     - data
     - cache_files
     - num_columns