[[autodoc]] datasets.DatasetInfo
The base class [`Dataset`] implements a Dataset backed by an Apache Arrow table.
[[autodoc]] datasets.Dataset
    - add_column
    - add_item
    - from_file
    - from_buffer
    - from_pandas
    - from_dict
    - from_generator
    - data
    - cache_files
    - num_columns
    - num_rows
    - column_names
    - shape
    - unique
    - flatten
    - cast
    - cast_column
    - remove_columns
    - rename_column
    - rename_columns
    - class_encode_column
    - __len__
    - __iter__
    - formatted_as
    - set_format
    - set_transform
    - reset_format
    - with_format
    - with_transform
    - __getitem__
    - cleanup_cache_files
    - map
    - filter
    - select
    - sort
    - shuffle
    - train_test_split
    - shard
    - to_tf_dataset
    - push_to_hub
    - save_to_disk
    - load_from_disk
    - flatten_indices
    - to_csv
    - to_pandas
    - to_dict
    - to_json
    - to_parquet
    - add_faiss_index
    - add_faiss_index_from_external_arrays
    - save_faiss_index
    - load_faiss_index
    - add_elasticsearch_index
    - load_elasticsearch_index
    - list_indexes
    - get_index
    - drop_index
    - search
    - search_batch
    - get_nearest_examples
    - get_nearest_examples_batch
    - info
    - split
    - builder_name
    - citation
    - config_name
    - dataset_size
    - description
    - download_checksums
    - download_size
    - features
    - homepage
    - license
    - size_in_bytes
    - supervised_keys
    - version
    - from_csv
    - from_json
    - from_parquet
    - from_text
    - prepare_for_task
    - align_labels_with_mapping
[[autodoc]] datasets.concatenate_datasets
[[autodoc]] datasets.interleave_datasets
[[autodoc]] datasets.enable_caching
[[autodoc]] datasets.disable_caching
[[autodoc]] datasets.is_caching_enabled
A dictionary with split names as keys (for example, 'train' and 'test') and `Dataset` objects as values.
It also provides dataset transform methods such as `map` and `filter`, which process all the splits at once.
[[autodoc]] datasets.DatasetDict
    - data
    - cache_files
    - num_columns
    - num_rows
    - column_names
    - shape
    - unique
    - cleanup_cache_files
    - map
    - filter
    - sort
    - shuffle
    - set_format
    - reset_format
    - formatted_as
    - with_format
    - with_transform
    - flatten
    - cast
    - cast_column
    - remove_columns
    - rename_column
    - rename_columns
    - class_encode_column
    - push_to_hub
    - save_to_disk
    - load_from_disk
    - from_csv
    - from_json
    - from_parquet
    - from_text
    - prepare_for_task
The base class [`IterableDataset`] implements an iterable Dataset backed by Python generators.
[[autodoc]] datasets.IterableDataset
    - remove_columns
    - cast_column
    - cast
    - __iter__
    - map
    - rename_column
    - filter
    - shuffle
    - skip
    - take
    - info
    - split
    - builder_name
    - citation
    - config_name
    - dataset_size
    - description
    - download_checksums
    - download_size
    - features
    - homepage
    - license
    - size_in_bytes
    - supervised_keys
    - version
A dictionary with split names as keys (for example, 'train' and 'test') and `IterableDataset` objects as values.
[[autodoc]] datasets.IterableDatasetDict
    - map
    - filter
    - shuffle
    - with_format
    - cast
    - cast_column
    - remove_columns
    - rename_column
    - rename_columns
[[autodoc]] datasets.Features
[[autodoc]] datasets.Sequence
[[autodoc]] datasets.ClassLabel
[[autodoc]] datasets.Value
[[autodoc]] datasets.Translation
[[autodoc]] datasets.TranslationVariableLanguages
[[autodoc]] datasets.Array2D
[[autodoc]] datasets.Array3D
[[autodoc]] datasets.Array4D
[[autodoc]] datasets.Array5D
[[autodoc]] datasets.Audio
[[autodoc]] datasets.Image
[[autodoc]] datasets.MetricInfo
The base class `Metric` implements a Metric backed by one or several [`Dataset`] objects.
[[autodoc]] datasets.Metric
[[autodoc]] datasets.filesystems.S3FileSystem
[[autodoc]] datasets.filesystems.extract_path_from_uri
[[autodoc]] datasets.filesystems.is_remote_filesystem
[[autodoc]] datasets.fingerprint.Hasher