Deprecate metrics (#4739)
* Deprecate public metric functions

* Test metric deprecation warnings

* Deprecate metrics in docs

* Remove mentions to metrics in docs and README

* Deprecate internal metric functions/classes

* Warn metric deprecation only once

* Deprecate Metric class

* Support deprecating __init__ method for subclassed classes

* Move deprecated decorator to __init__ class method

* Update deprecation message in docs

* Remove mentions to metrics in docstring/README

* Remove new_metric_script template

* Skip metric tests

* Remove metrics from code quality check

* Remove metric test requirements

* Add rouge_score test requirement needed by bigbench

* Remove metrics additional tests requirements

* Remove test requirements only used by metrics

* Address requested changes

* Update deprecation version after latest release

* Remove repeated comment

* Give hint to switch to evaluate

* Fix minor details

* Revert removal of metrics CI tests

* Revert removal of metrics CI tests

* Fix style

* Mock emitted_deprecation_warnings to test warnings
albertvillanova committed Jul 28, 2022
1 parent f9713d2 commit 0c1d099
Showing 15 changed files with 207 additions and 192 deletions.
14 changes: 3 additions & 11 deletions README.md
@@ -40,7 +40,7 @@
<a href="https://hf.co/course"><img src="https://raw.githubusercontent.com/huggingface/datasets/main/docs/source/imgs/course_banner.png"></a>
</h3>

馃 Datasets also provides access to +40 evaluation metrics and is designed to let the community easily add and share new datasets and evaluation metrics.
馃 Datasets is designed to let the community easily add and share new datasets.

馃 Datasets has many additional interesting features:

@@ -85,15 +85,13 @@ For more details on using the library with NumPy, pandas, PyTorch or TensorFlow,

- `datasets.list_datasets()` to list the available datasets
- `datasets.load_dataset(dataset_name, **kwargs)` to instantiate a dataset
- `datasets.list_metrics()` to list the available metrics
- `datasets.load_metric(metric_name, **kwargs)` to instantiate a metric

This library can be used for text/image/audio/etc. datasets. Here is an example to load a text dataset:

Here is a quick example:

```python
from datasets import list_datasets, load_dataset, list_metrics, load_metric
from datasets import list_datasets, load_dataset

# Print all the available datasets
print(list_datasets())
@@ -102,12 +100,6 @@ print(list_datasets())
squad_dataset = load_dataset('squad')
print(squad_dataset['train'][0])

# List all the available metrics
print(list_metrics())

# Load a metric
squad_metric = load_metric('squad')

# Process the dataset - add a column with the length of the context texts
dataset_with_length = squad_dataset.map(lambda x: {"length": len(x["context"])})

@@ -150,7 +142,7 @@ If you are familiar with the great TensorFlow Datasets, here are the main differ

# Disclaimers

Similar to TensorFlow Datasets, 🤗 Datasets is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use them. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.
Similar to TensorFlow Datasets, 🤗 Datasets is a utility library that downloads and prepares public datasets. We do not host or distribute most of these datasets, vouch for their quality or fairness, or claim that you have license to use them. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a [GitHub issue](https://github.com/huggingface/datasets/issues/new). Thanks for your contribution to the ML community!

6 changes: 6 additions & 0 deletions docs/source/about_metrics.mdx
@@ -1,5 +1,11 @@
# All about metrics

<Tip warning={true}>

Metrics is deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at the library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, you can find more tools for evaluating models and datasets.

</Tip>

馃 Datasets provides access to a wide range of NLP metrics. You can load metrics associated with benchmark datasets like GLUE or SQuAD, and complex metrics like BLEURT or BERTScore, with a single command: [`load_metric`]. Once you've loaded a metric, easily compute and evaluate a model's performance.

## ELI5: `load_metric`
2 changes: 1 addition & 1 deletion docs/source/how_to_metrics.mdx
@@ -2,7 +2,7 @@

<Tip warning={true}>

Metrics will soon be deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at our newest library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, we've also added more tools for evaluating models and datasets.
Metrics is deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at the library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, you can find more tools for evaluating models and datasets.

</Tip>

4 changes: 2 additions & 2 deletions docs/source/index.mdx
@@ -2,9 +2,9 @@

<img class="float-left !m-0 !border-0 !dark:border-0 !shadow-none !max-w-lg w-[150px]" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/datasets_logo.png"/>

馃 Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks.
馃 Datasets is a library for easily accessing and sharing datasets for Natural Language Processing (NLP), computer vision, and audio tasks.

Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the [Hugging Face Hub](https://huggingface.co/datasets), allowing you to easily load and share a dataset with the wider NLP community. There are currently over 2658 datasets, and more than 34 metrics available.
Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the [Hugging Face Hub](https://huggingface.co/datasets), allowing you to easily load and share a dataset with the wider NLP community.

Find your dataset today on the [Hugging Face Hub](https://huggingface.co/datasets), and take an in-depth look inside of it with the live viewer.

2 changes: 1 addition & 1 deletion docs/source/loading.mdx
@@ -340,7 +340,7 @@ Now when you look at your dataset features, you can see it uses the custom label

<Tip warning={true}>

Metrics will soon be deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at our newest library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, we've also added more tools for evaluating models and datasets.
Metrics is deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at the library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, you can find more tools for evaluating models and datasets.

</Tip>

2 changes: 1 addition & 1 deletion docs/source/metrics.mdx
@@ -2,7 +2,7 @@

<Tip warning={true}>

Metrics will soon be deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at our newest library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, we've also added more tools for evaluating models and datasets.
Metrics is deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at the library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, you can find more tools for evaluating models and datasets.

</Tip>

6 changes: 6 additions & 0 deletions docs/source/package_reference/loading_methods.mdx
@@ -22,6 +22,12 @@ Methods for listing and loading datasets and metrics:

## Metrics

<Tip warning={true}>

Metrics is deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at the library 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index)! In addition to metrics, you can find more tools for evaluating models and datasets.

</Tip>

[[autodoc]] datasets.list_metrics

[[autodoc]] datasets.load_metric
11 changes: 6 additions & 5 deletions setup.py
@@ -120,6 +120,7 @@
"botocore>=1.22.8", # to be compatible with aiobotocore and boto3
"faiss-cpu>=1.6.4",
"fsspec[s3]",
"lz4",
"moto[s3,server]==2.0.4",
"rarfile>=4.0",
"s3fs>=2021.11.1", # aligned with fsspec[http]>=2021.11.1
@@ -132,29 +133,29 @@
"bs4",
"conllu",
"h5py",
"langdetect",
"lxml",
"lz4",
"mwparserfromhell",
"nltk",
"openpyxl",
"py7zr",
"tldextract",
"zstandard",
"bigbench @ https://storage.googleapis.com/public_research_data/bigbench/bigbench-0.0.1.tar.gz",
"sentencepiece", # bigbench requires t5 which requires seqio which requires sentencepiece
"rouge_score<0.0.7", # required by bigbench: bigbench.api.util.bb_utils > t5.evaluation.metrics > rouge_score
"sacremoses",
# metrics dependencies
"bert_score>=0.3.6",
"jiwer",
"langdetect",
"mauve-text",
"rouge_score<0.0.7",
"nltk",
# "rouge_score<0.0.7", # also required by bigbench
"sacrebleu",
"sacremoses",
"scikit-learn",
"scipy",
"sentencepiece", # for bleurt
"seqeval",
"tldextract",
# to speed up pip backtracking
"toml>=0.10.1",
"requests_file>=1.5.1",
21 changes: 20 additions & 1 deletion src/datasets/inspect.py
@@ -13,7 +13,7 @@
# limitations under the License.

# Lint as: python3
""" List and inspect datasets and metrics."""
""" List and inspect datasets."""

import inspect
import os
@@ -28,6 +28,7 @@
from .download.streaming_download_manager import StreamingDownloadManager
from .info import DatasetInfo
from .load import dataset_module_factory, import_main_class, load_dataset_builder, metric_module_factory
from .utils.deprecation_utils import deprecated
from .utils.file_utils import relative_to_absolute_path
from .utils.logging import get_logger
from .utils.version import Version
@@ -70,9 +71,18 @@ def list_datasets(with_community_datasets=True, with_details=False):
return datasets


@deprecated(
"Use 'evaluate.list_evaluation_modules' instead, from the new library 馃 Evaluate: https://huggingface.co/docs/evaluate"
)
def list_metrics(with_community_metrics=True, with_details=False):
"""List all the metrics script available on the Hugging Face Hub.
<Deprecated version="2.5.0">
Use `evaluate.list_evaluation_modules` instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
</Deprecated>
Args:
with_community_metrics (:obj:`bool`, optional, default ``True``): Include the community provided metrics.
with_details (:obj:`bool`, optional, default ``False``): Return the full details on the metrics instead of only the short name.
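
As the deprecation message above suggests, the drop-in replacement for `list_metrics` lives in 🤗 Evaluate. A small usage sketch (the argument names follow the `evaluate` docs; verify against your installed version):

```python
import evaluate

# Replaces datasets.list_metrics(): list the metric modules available on the Hub.
metric_names = evaluate.list_evaluation_modules(module_type="metric")
print(metric_names[:5])
```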
@@ -138,10 +148,19 @@ def inspect_dataset(path: str, local_path: str, download_config: Optional[Downlo
)


@deprecated(
"Use 'evaluate.inspect_evaluation_module' instead, from the new library 馃 Evaluate: https://huggingface.co/docs/evaluate"
)
def inspect_metric(path: str, local_path: str, download_config: Optional[DownloadConfig] = None, **download_kwargs):
r"""
Allow inspection/modification of a metric script by copying it to the local drive at local_path.
<Deprecated version="2.5.0">
Use `evaluate.inspect_evaluation_module` instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
</Deprecated>
Args:
path (``str``): path to the metric processing script with the metric builder. Can be either:
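
The `deprecated` decorator applied above is imported from `datasets.utils.deprecation_utils`, whose body is not part of this diff. Below is a minimal warn-once sketch consistent with the commit notes ("Warn metric deprecation only once", "Mock emitted_deprecation_warnings to test warnings"); the names and message format are illustrative, not the library's exact code:

```python
import functools
import warnings

# Illustrative registry of already-emitted warnings, mirroring the
# `emitted_deprecation_warnings` set mentioned in the commit messages.
_emitted_deprecation_warnings = set()


def deprecated(help_message=None):
    """Mark a function as deprecated, emitting a FutureWarning at most once."""

    def decorator(func):
        name = func.__qualname__
        message = f"{name} is deprecated." + (f" {help_message}" if help_message else "")

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Warn only on the first call to this particular function.
            if name not in _emitted_deprecation_warnings:
                warnings.warn(message, category=FutureWarning, stacklevel=2)
                _emitted_deprecation_warnings.add(name)
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Because the decorator wraps a plain function, it can also be applied to a class's `__init__` (as the commit messages describe for the `Metric` class), so subclasses that call `super().__init__()` trigger the same one-time warning.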

1 comment on commit 0c1d099

@github-actions

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.008017 / 0.011353 (-0.003336) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003940 / 0.011008 (-0.007068) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.029637 / 0.038508 (-0.008871) |
| read_batch_unformated after write_array2d | 0.034622 / 0.023109 (0.011512) |
| read_batch_unformated after write_flattened_sequence | 0.298209 / 0.275898 (0.022311) |
| read_batch_unformated after write_nested_sequence | 0.351120 / 0.323480 (0.027640) |
| read_col_formatted_as_numpy after write_array2d | 0.005894 / 0.007986 (-0.002092) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004862 / 0.004328 (0.000533) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006934 / 0.004250 (0.002683) |
| read_col_unformated after write_array2d | 0.049286 / 0.037052 (0.012234) |
| read_col_unformated after write_flattened_sequence | 0.308095 / 0.258489 (0.049606) |
| read_col_unformated after write_nested_sequence | 0.348909 / 0.293841 (0.055068) |
| read_formatted_as_numpy after write_array2d | 0.031861 / 0.128546 (-0.096685) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009634 / 0.075646 (-0.066012) |
| read_formatted_as_numpy after write_nested_sequence | 0.258003 / 0.419271 (-0.161269) |
| read_unformated after write_array2d | 0.056545 / 0.043533 (0.013012) |
| read_unformated after write_flattened_sequence | 0.294436 / 0.255139 (0.039297) |
| read_unformated after write_nested_sequence | 0.312797 / 0.283200 (0.029597) |
| write_array2d | 0.108764 / 0.141683 (-0.032918) |
| write_flattened_sequence | 1.456828 / 1.452155 (0.004673) |
| write_nested_sequence | 1.509651 / 1.492716 (0.016934) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.205769 / 0.018006 (0.187763) |
| get_batch_of_1024_rows | 0.520963 / 0.000490 (0.520473) |
| get_first_row | 0.006467 / 0.000200 (0.006267) |
| get_last_row | 0.000084 / 0.000054 (0.000030) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.025275 / 0.037411 (-0.012136) |
| shard | 0.101963 / 0.014526 (0.087437) |
| shuffle | 0.115152 / 0.176557 (-0.061405) |
| sort | 0.163957 / 0.737135 (-0.573178) |
| train_test_split | 0.119420 / 0.296338 (-0.176918) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.399301 / 0.215209 (0.184092) |
| read 50000 | 3.973926 / 2.077655 (1.896272) |
| read_batch 50000 10 | 1.884189 / 1.504120 (0.380069) |
| read_batch 50000 100 | 1.703463 / 1.541195 (0.162269) |
| read_batch 50000 1000 | 1.740809 / 1.468490 (0.272319) |
| read_formatted numpy 5000 | 0.419875 / 4.584777 (-4.164902) |
| read_formatted pandas 5000 | 3.726192 / 3.745712 (-0.019521) |
| read_formatted tensorflow 5000 | 1.972765 / 5.269862 (-3.297096) |
| read_formatted torch 5000 | 1.198000 / 4.565676 (-3.367677) |
| read_formatted_batch numpy 5000 10 | 0.050802 / 0.424275 (-0.373473) |
| read_formatted_batch numpy 5000 1000 | 0.011125 / 0.007607 (0.003518) |
| shuffled read 5000 | 0.498757 / 0.226044 (0.272712) |
| shuffled read 50000 | 4.966685 / 2.268929 (2.697757) |
| shuffled read_batch 50000 10 | 2.323632 / 55.444624 (-53.120992) |
| shuffled read_batch 50000 100 | 2.006996 / 6.876477 (-4.869481) |
| shuffled read_batch 50000 1000 | 2.115972 / 2.142072 (-0.026100) |
| shuffled read_formatted numpy 5000 | 0.536275 / 4.805227 (-4.268953) |
| shuffled read_formatted_batch numpy 5000 10 | 0.119538 / 6.500664 (-6.381126) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.061143 / 0.075469 (-0.014326) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.466666 / 1.841788 (-0.375121) |
| map fast-tokenizer batched | 13.579256 / 8.074308 (5.504948) |
| map identity | 25.286239 / 10.191392 (15.094847) |
| map identity batched | 0.860281 / 0.680424 (0.179857) |
| map no-op batched | 0.550674 / 0.534201 (0.016473) |
| map no-op batched numpy | 0.384589 / 0.579283 (-0.194694) |
| map no-op batched pandas | 0.437542 / 0.434364 (0.003178) |
| map no-op batched pytorch | 0.276053 / 0.540337 (-0.264284) |
| map no-op batched tensorflow | 0.281365 / 1.386936 (-1.105571) |
PyArrow==latest
Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.005994 / 0.011353 (-0.005359) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003896 / 0.011008 (-0.007112) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.027700 / 0.038508 (-0.010809) |
| read_batch_unformated after write_array2d | 0.033394 / 0.023109 (0.010284) |
| read_batch_unformated after write_flattened_sequence | 0.298860 / 0.275898 (0.022961) |
| read_batch_unformated after write_nested_sequence | 0.369841 / 0.323480 (0.046361) |
| read_col_formatted_as_numpy after write_array2d | 0.003911 / 0.007986 (-0.004075) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004648 / 0.004328 (0.000319) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.004890 / 0.004250 (0.000640) |
| read_col_unformated after write_array2d | 0.044280 / 0.037052 (0.007228) |
| read_col_unformated after write_flattened_sequence | 0.309628 / 0.258489 (0.051139) |
| read_col_unformated after write_nested_sequence | 0.355540 / 0.293841 (0.061699) |
| read_formatted_as_numpy after write_array2d | 0.029378 / 0.128546 (-0.099169) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009591 / 0.075646 (-0.066055) |
| read_formatted_as_numpy after write_nested_sequence | 0.256256 / 0.419271 (-0.163016) |
| read_unformated after write_array2d | 0.053568 / 0.043533 (0.010035) |
| read_unformated after write_flattened_sequence | 0.302546 / 0.255139 (0.047407) |
| read_unformated after write_nested_sequence | 0.336340 / 0.283200 (0.053141) |
| write_array2d | 0.102698 / 0.141683 (-0.038985) |
| write_flattened_sequence | 1.520068 / 1.452155 (0.067913) |
| write_nested_sequence | 1.597666 / 1.492716 (0.104950) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.283601 / 0.018006 (0.265595) |
| get_batch_of_1024_rows | 0.447649 / 0.000490 (0.447159) |
| get_first_row | 0.022287 / 0.000200 (0.022088) |
| get_last_row | 0.000292 / 0.000054 (0.000237) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.025248 / 0.037411 (-0.012163) |
| shard | 0.103149 / 0.014526 (0.088623) |
| shuffle | 0.116103 / 0.176557 (-0.060453) |
| sort | 0.160182 / 0.737135 (-0.576954) |
| train_test_split | 0.118720 / 0.296338 (-0.177618) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.410761 / 0.215209 (0.195552) |
| read 50000 | 4.109969 / 2.077655 (2.032314) |
| read_batch 50000 10 | 1.960843 / 1.504120 (0.456723) |
| read_batch 50000 100 | 1.794215 / 1.541195 (0.253020) |
| read_batch 50000 1000 | 1.845203 / 1.468490 (0.376713) |
| read_formatted numpy 5000 | 0.426392 / 4.584777 (-4.158385) |
| read_formatted pandas 5000 | 3.788020 / 3.745712 (0.042308) |
| read_formatted tensorflow 5000 | 1.996828 / 5.269862 (-3.273034) |
| read_formatted torch 5000 | 1.216524 / 4.565676 (-3.349153) |
| read_formatted_batch numpy 5000 10 | 0.050867 / 0.424275 (-0.373408) |
| read_formatted_batch numpy 5000 1000 | 0.010911 / 0.007607 (0.003304) |
| shuffled read 5000 | 0.512840 / 0.226044 (0.286795) |
| shuffled read 50000 | 5.118318 / 2.268929 (2.849390) |
| shuffled read_batch 50000 10 | 2.446011 / 55.444624 (-52.998613) |
| shuffled read_batch 50000 100 | 2.174925 / 6.876477 (-4.701551) |
| shuffled read_batch 50000 1000 | 2.275525 / 2.142072 (0.133453) |
| shuffled read_formatted numpy 5000 | 0.530582 / 4.805227 (-4.274645) |
| shuffled read_formatted_batch numpy 5000 10 | 0.121369 / 6.500664 (-6.379296) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.063386 / 0.075469 (-0.012083) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.505261 / 1.841788 (-0.336527) |
| map fast-tokenizer batched | 13.823840 / 8.074308 (5.749532) |
| map identity | 24.789139 / 10.191392 (14.597747) |
| map identity batched | 0.907925 / 0.680424 (0.227501) |
| map no-op batched | 0.555813 / 0.534201 (0.021612) |
| map no-op batched numpy | 0.384579 / 0.579283 (-0.194704) |
| map no-op batched pandas | 0.424632 / 0.434364 (-0.009732) |
| map no-op batched pytorch | 0.267095 / 0.540337 (-0.273243) |
| map no-op batched tensorflow | 0.274966 / 1.386936 (-1.111970) |
