
Main classes

DatasetInfo

[[autodoc]] datasets.DatasetInfo

Dataset

The base class [Dataset] implements a Dataset backed by an Apache Arrow table.

[[autodoc]] datasets.Dataset
    - add_column
    - add_item
    - from_file
    - from_buffer
    - from_pandas
    - from_dict
    - from_generator
    - data
    - cache_files
    - num_columns
    - num_rows
    - column_names
    - shape
    - unique
    - flatten
    - cast
    - cast_column
    - remove_columns
    - rename_column
    - rename_columns
    - class_encode_column
    - __len__
    - __iter__
    - formatted_as
    - set_format
    - set_transform
    - reset_format
    - with_format
    - with_transform
    - __getitem__
    - cleanup_cache_files
    - map
    - filter
    - select
    - sort
    - shuffle
    - train_test_split
    - shard
    - to_tf_dataset
    - push_to_hub
    - save_to_disk
    - load_from_disk
    - flatten_indices
    - to_csv
    - to_pandas
    - to_dict
    - to_json
    - to_parquet
    - add_faiss_index
    - add_faiss_index_from_external_arrays
    - save_faiss_index
    - load_faiss_index
    - add_elasticsearch_index
    - load_elasticsearch_index
    - list_indexes
    - get_index
    - drop_index
    - search
    - search_batch
    - get_nearest_examples
    - get_nearest_examples_batch
    - info
    - split
    - builder_name
    - citation
    - config_name
    - dataset_size
    - description
    - download_checksums
    - download_size
    - features
    - homepage
    - license
    - size_in_bytes
    - supervised_keys
    - version
    - from_csv
    - from_json
    - from_parquet
    - from_text
    - prepare_for_task
    - align_labels_with_mapping

[[autodoc]] datasets.concatenate_datasets

[[autodoc]] datasets.interleave_datasets

[[autodoc]] datasets.enable_caching

[[autodoc]] datasets.disable_caching

[[autodoc]] datasets.is_caching_enabled

DatasetDict

A dictionary with split names as keys ('train' and 'test', for example) and Dataset objects as values. It also provides dataset transform methods, such as map and filter, that process all the splits at once.

[[autodoc]] datasets.DatasetDict
    - data
    - cache_files
    - num_columns
    - num_rows
    - column_names
    - shape
    - unique
    - cleanup_cache_files
    - map
    - filter
    - sort
    - shuffle
    - set_format
    - reset_format
    - formatted_as
    - with_format
    - with_transform
    - flatten
    - cast
    - cast_column
    - remove_columns
    - rename_column
    - rename_columns
    - class_encode_column
    - push_to_hub
    - save_to_disk
    - load_from_disk
    - from_csv
    - from_json
    - from_parquet
    - from_text
    - prepare_for_task

IterableDataset

The base class [IterableDataset] implements an iterable Dataset backed by Python generators.

[[autodoc]] datasets.IterableDataset
    - remove_columns
    - cast_column
    - cast
    - __iter__
    - map
    - rename_column
    - filter
    - shuffle
    - skip
    - take
    - info
    - split
    - builder_name
    - citation
    - config_name
    - dataset_size
    - description
    - download_checksums
    - download_size
    - features
    - homepage
    - license
    - size_in_bytes
    - supervised_keys
    - version

IterableDatasetDict

A dictionary with split names as keys ('train' and 'test', for example) and IterableDataset objects as values.

[[autodoc]] datasets.IterableDatasetDict
    - map
    - filter
    - shuffle
    - with_format
    - cast
    - cast_column
    - remove_columns
    - rename_column
    - rename_columns

Features

[[autodoc]] datasets.Features

[[autodoc]] datasets.Sequence

[[autodoc]] datasets.ClassLabel

[[autodoc]] datasets.Value

[[autodoc]] datasets.Translation

[[autodoc]] datasets.TranslationVariableLanguages

[[autodoc]] datasets.Array2D

[[autodoc]] datasets.Array3D

[[autodoc]] datasets.Array4D

[[autodoc]] datasets.Array5D

[[autodoc]] datasets.Audio

[[autodoc]] datasets.Image

MetricInfo

[[autodoc]] datasets.MetricInfo

Metric

The base class Metric implements a Metric backed by one or several [Dataset].

[[autodoc]] datasets.Metric

Filesystems

[[autodoc]] datasets.filesystems.S3FileSystem

[[autodoc]] datasets.filesystems.extract_path_from_uri

[[autodoc]] datasets.filesystems.is_remote_filesystem

Fingerprint

[[autodoc]] datasets.fingerprint.Hasher