update dev version
lhoestq committed Oct 5, 2022
1 parent a946a84 commit 41f7fb4
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion setup.py
@@ -194,7 +194,7 @@

setup(
name="datasets",
-version="2.5.2.dev0", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
+version="2.5.3.dev0", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
description="HuggingFace community-driven open-source library of datasets",
long_description=open("README.md", encoding="utf-8").read(),
long_description_content_type="text/markdown",
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -17,7 +17,7 @@
# pylint: enable=line-too-long
# pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position

-__version__ = "2.5.2.dev0"
+__version__ = "2.5.3.dev0"

import platform
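
The comment in setup.py describes the expected version format (x.y.z.dev0, x.y.z.rc1, or x.y.z, with dots rather than dashes). A minimal sketch of how that rule and the patch-level bump in this commit could be checked is given below. The names `VERSION_RE`, `is_valid_version`, and `bump_patch_dev` are hypothetical and not part of the repository, and accepting any `rcN` suffix (not only `rc1`) is an assumption.

```python
import re

# Assumed pattern for the format named in the setup.py comment:
# x.y.z, x.y.z.dev0, or x.y.z.rcN (dots only, no dashes).
VERSION_RE = re.compile(r"\d+\.\d+\.\d+(\.dev0|\.rc\d+)?")


def is_valid_version(version: str) -> bool:
    """Return True if the whole string matches the assumed version format."""
    return VERSION_RE.fullmatch(version) is not None


def bump_patch_dev(version: str) -> str:
    """Return the next patch-level dev version for a valid version string."""
    if not is_valid_version(version):
        raise ValueError(f"unexpected version format: {version!r}")
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return f"{major}.{minor}.{patch + 1}.dev0"


if __name__ == "__main__":
    old = "2.5.2.dev0"
    new = bump_patch_dev(old)
    print(old, "->", new, "valid:", is_valid_version(new))
```

The same check could be applied to both `setup.py` and `src/datasets/__init__.py` to confirm that the two version strings stay in sync.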


1 comment on commit 41f7fb4

@github-actions

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.007035 / 0.011353 (-0.004318) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003539 / 0.011008 (-0.007469) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.026401 / 0.038508 (-0.012108) |
| read_batch_unformated after write_array2d | 0.031296 / 0.023109 (0.008186) |
| read_batch_unformated after write_flattened_sequence | 0.258358 / 0.275898 (-0.017540) |
| read_batch_unformated after write_nested_sequence | 0.315229 / 0.323480 (-0.008251) |
| read_col_formatted_as_numpy after write_array2d | 0.005460 / 0.007986 (-0.002526) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003179 / 0.004328 (-0.001150) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006214 / 0.004250 (0.001963) |
| read_col_unformated after write_array2d | 0.041621 / 0.037052 (0.004569) |
| read_col_unformated after write_flattened_sequence | 0.272010 / 0.258489 (0.013521) |
| read_col_unformated after write_nested_sequence | 0.303260 / 0.293841 (0.009420) |
| read_formatted_as_numpy after write_array2d | 0.027784 / 0.128546 (-0.100762) |
| read_formatted_as_numpy after write_flattened_sequence | 0.008520 / 0.075646 (-0.067127) |
| read_formatted_as_numpy after write_nested_sequence | 0.227972 / 0.419271 (-0.191299) |
| read_unformated after write_array2d | 0.046480 / 0.043533 (0.002947) |
| read_unformated after write_flattened_sequence | 0.258467 / 0.255139 (0.003328) |
| read_unformated after write_nested_sequence | 0.280646 / 0.283200 (-0.002554) |
| write_array2d | 0.098044 / 0.141683 (-0.043639) |
| write_flattened_sequence | 1.311952 / 1.452155 (-0.140203) |
| write_nested_sequence | 1.334765 / 1.492716 (-0.157951) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.200313 / 0.018006 (0.182307) |
| get_batch_of_1024_rows | 0.439682 / 0.000490 (0.439192) |
| get_first_row | 0.003999 / 0.000200 (0.003799) |
| get_last_row | 0.000091 / 0.000054 (0.000036) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.023488 / 0.037411 (-0.013923) |
| shard | 0.101195 / 0.014526 (0.086669) |
| shuffle | 0.114176 / 0.176557 (-0.062381) |
| sort | 0.166381 / 0.737135 (-0.570755) |
| train_test_split | 0.119829 / 0.296338 (-0.176509) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.396302 / 0.215209 (0.181093) |
| read 50000 | 3.955087 / 2.077655 (1.877433) |
| read_batch 50000 10 | 1.787655 / 1.504120 (0.283535) |
| read_batch 50000 100 | 1.596379 / 1.541195 (0.055184) |
| read_batch 50000 1000 | 1.642919 / 1.468490 (0.174428) |
| read_formatted numpy 5000 | 0.416136 / 4.584777 (-4.168641) |
| read_formatted pandas 5000 | 3.719583 / 3.745712 (-0.026129) |
| read_formatted tensorflow 5000 | 3.249323 / 5.269862 (-2.020539) |
| read_formatted torch 5000 | 1.679104 / 4.565676 (-2.886572) |
| read_formatted_batch numpy 5000 10 | 0.051225 / 0.424275 (-0.373051) |
| read_formatted_batch numpy 5000 1000 | 0.011128 / 0.007607 (0.003521) |
| shuffled read 5000 | 0.497013 / 0.226044 (0.270969) |
| shuffled read 50000 | 4.969108 / 2.268929 (2.700179) |
| shuffled read_batch 50000 10 | 2.223065 / 55.444624 (-53.221559) |
| shuffled read_batch 50000 100 | 1.889258 / 6.876477 (-4.987218) |
| shuffled read_batch 50000 1000 | 2.031313 / 2.142072 (-0.110759) |
| shuffled read_formatted numpy 5000 | 0.533200 / 4.805227 (-4.272028) |
| shuffled read_formatted_batch numpy 5000 10 | 0.118836 / 6.500664 (-6.381828) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.060205 / 0.075469 (-0.015264) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.305164 / 1.841788 (-0.536624) |
| map fast-tokenizer batched | 12.396078 / 8.074308 (4.321770) |
| map identity | 21.717675 / 10.191392 (11.526283) |
| map identity batched | 0.796635 / 0.680424 (0.116211) |
| map no-op batched | 0.498523 / 0.534201 (-0.035678) |
| map no-op batched numpy | 0.343551 / 0.579283 (-0.235732) |
| map no-op batched pandas | 0.430264 / 0.434364 (-0.004100) |
| map no-op batched pytorch | 0.259817 / 0.540337 (-0.280520) |
| map no-op batched tensorflow | 0.239763 / 1.386936 (-1.147173) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.005898 / 0.011353 (-0.005455) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003802 / 0.011008 (-0.007206) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.027676 / 0.038508 (-0.010832) |
| read_batch_unformated after write_array2d | 0.032847 / 0.023109 (0.009738) |
| read_batch_unformated after write_flattened_sequence | 0.380379 / 0.275898 (0.104481) |
| read_batch_unformated after write_nested_sequence | 0.408650 / 0.323480 (0.085170) |
| read_col_formatted_as_numpy after write_array2d | 0.003730 / 0.007986 (-0.004255) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003294 / 0.004328 (-0.001034) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.004806 / 0.004250 (0.000556) |
| read_col_unformated after write_array2d | 0.039843 / 0.037052 (0.002791) |
| read_col_unformated after write_flattened_sequence | 0.392209 / 0.258489 (0.133720) |
| read_col_unformated after write_nested_sequence | 0.426866 / 0.293841 (0.133025) |
| read_formatted_as_numpy after write_array2d | 0.025471 / 0.128546 (-0.103075) |
| read_formatted_as_numpy after write_flattened_sequence | 0.006853 / 0.075646 (-0.068794) |
| read_formatted_as_numpy after write_nested_sequence | 0.253057 / 0.419271 (-0.166214) |
| read_unformated after write_array2d | 0.049243 / 0.043533 (0.005711) |
| read_unformated after write_flattened_sequence | 0.389880 / 0.255139 (0.134741) |
| read_unformated after write_nested_sequence | 0.399930 / 0.283200 (0.116730) |
| write_array2d | 0.097397 / 0.141683 (-0.044286) |
| write_flattened_sequence | 1.501098 / 1.452155 (0.048944) |
| write_nested_sequence | 1.578698 / 1.492716 (0.085982) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.226384 / 0.018006 (0.208378) |
| get_batch_of_1024_rows | 0.438983 / 0.000490 (0.438493) |
| get_first_row | 0.006124 / 0.000200 (0.005924) |
| get_last_row | 0.000100 / 0.000054 (0.000045) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.023344 / 0.037411 (-0.014068) |
| shard | 0.101072 / 0.014526 (0.086546) |
| shuffle | 0.115036 / 0.176557 (-0.061521) |
| sort | 0.151908 / 0.737135 (-0.585227) |
| train_test_split | 0.117232 / 0.296338 (-0.179106) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.423941 / 0.215209 (0.208732) |
| read 50000 | 4.220953 / 2.077655 (2.143298) |
| read_batch 50000 10 | 2.024655 / 1.504120 (0.520535) |
| read_batch 50000 100 | 1.828073 / 1.541195 (0.286878) |
| read_batch 50000 1000 | 1.864144 / 1.468490 (0.395654) |
| read_formatted numpy 5000 | 0.423010 / 4.584777 (-4.161767) |
| read_formatted pandas 5000 | 3.390384 / 3.745712 (-0.355328) |
| read_formatted tensorflow 5000 | 1.757653 / 5.269862 (-3.512208) |
| read_formatted torch 5000 | 1.066262 / 4.565676 (-3.499415) |
| read_formatted_batch numpy 5000 10 | 0.045043 / 0.424275 (-0.379232) |
| read_formatted_batch numpy 5000 1000 | 0.009721 / 0.007607 (0.002114) |
| shuffled read 5000 | 0.465055 / 0.226044 (0.239010) |
| shuffled read 50000 | 4.586173 / 2.268929 (2.317244) |
| shuffled read_batch 50000 10 | 2.221707 / 55.444624 (-53.222917) |
| shuffled read_batch 50000 100 | 1.889289 / 6.876477 (-4.987188) |
| shuffled read_batch 50000 1000 | 2.018535 / 2.142072 (-0.123537) |
| shuffled read_formatted numpy 5000 | 0.467463 / 4.805227 (-4.337764) |
| shuffled read_formatted_batch numpy 5000 10 | 0.104970 / 6.500664 (-6.395695) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.053395 / 0.075469 (-0.022074) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.347346 / 1.841788 (-0.494442) |
| map fast-tokenizer batched | 12.902633 / 8.074308 (4.828325) |
| map identity | 11.335700 / 10.191392 (1.144308) |
| map identity batched | 0.823342 / 0.680424 (0.142918) |
| map no-op batched | 0.538788 / 0.534201 (0.004587) |
| map no-op batched numpy | 0.328957 / 0.579283 (-0.250326) |
| map no-op batched pandas | 0.413229 / 0.434364 (-0.021135) |
| map no-op batched pytorch | 0.247758 / 0.540337 (-0.292580) |
| map no-op batched tensorflow | 0.253394 / 1.386936 (-1.133542) |
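
Each benchmark cell above has the form `new / old (diff)`, where the value in parentheses appears to be `new - old`. The sketch below shows one way such cells could be parsed and sanity-checked. The helper names `parse_cells` and `is_consistent` are hypothetical, and the tolerance of 2e-6 is an assumption that allows for the reported values being rounded to six decimals.

```python
import re

# Matches one "new / old (diff)" cell, e.g. "0.200313 / 0.018006 (0.182307)".
CELL_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(text: str) -> list[tuple[float, float, float]]:
    """Extract every (new, old, diff) triple found in a line of benchmark output."""
    return [
        (float(new), float(old), float(diff))
        for new, old, diff in CELL_RE.findall(text)
    ]


def is_consistent(new: float, old: float, diff: float, tol: float = 2e-6) -> bool:
    """Check that diff equals new - old, up to the assumed rounding tolerance."""
    return abs((new - old) - diff) <= tol


if __name__ == "__main__":
    row = "0.200313 / 0.018006 (0.182307) 0.439682 / 0.000490 (0.439192)"
    for new, old, diff in parse_cells(row):
        print(new, old, diff, is_consistent(new, old, diff))
```

A positive diff means the new run took longer than the stored reference value; a negative diff means it was faster.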
