Fix tests
mariosasko committed Sep 30, 2022
1 parent b4fcc11 commit fd38197
Showing 2 changed files with 3 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/source/package_reference/task_templates.mdx
@@ -4,6 +4,8 @@ The tasks supported by [`Dataset.prepare_for_task`] and [`DatasetDict.prepare_fo
 
 [[autodoc]] datasets.tasks.AutomaticSpeechRecognition
 
+[[autodoc]] datasets.tasks.AudioClassification
+
 [[autodoc]] datasets.tasks.ImageClassification
     - align_with_features
 
1 change: 1 addition & 0 deletions tests/packaged_modules/test_folder_based_builder.py
@@ -18,6 +18,7 @@ class DummyFolderBasedBuilder(FolderBasedBuilder):
     BASE_COLUMN_NAME = "base"
     BUILDER_CONFIG_CLASS = FolderBasedBuilderConfig
     EXTENSIONS = [".txt"]
+    CLASSIFICATION_TASK = None
 
 
 @pytest.fixture

1 comment on commit fd38197

@github-actions
Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.009741 / 0.011353 (-0.001612) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005008 / 0.011008 (-0.006000) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.035345 / 0.038508 (-0.003163) |
| read_batch_unformated after write_array2d | 0.043481 / 0.023109 (0.020372) |
| read_batch_unformated after write_flattened_sequence | 0.375388 / 0.275898 (0.099490) |
| read_batch_unformated after write_nested_sequence | 0.463817 / 0.323480 (0.140337) |
| read_col_formatted_as_numpy after write_array2d | 0.006997 / 0.007986 (-0.000989) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005455 / 0.004328 (0.001126) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008725 / 0.004250 (0.004475) |
| read_col_unformated after write_array2d | 0.046088 / 0.037052 (0.009036) |
| read_col_unformated after write_flattened_sequence | 0.400392 / 0.258489 (0.141903) |
| read_col_unformated after write_nested_sequence | 0.429322 / 0.293841 (0.135481) |
| read_formatted_as_numpy after write_array2d | 0.047802 / 0.128546 (-0.080744) |
| read_formatted_as_numpy after write_flattened_sequence | 0.023716 / 0.075646 (-0.051930) |
| read_formatted_as_numpy after write_nested_sequence | 0.325783 / 0.419271 (-0.093488) |
| read_unformated after write_array2d | 0.068095 / 0.043533 (0.024562) |
| read_unformated after write_flattened_sequence | 0.390060 / 0.255139 (0.134921) |
| read_unformated after write_nested_sequence | 0.396720 / 0.283200 (0.113520) |
| write_array2d | 0.120882 / 0.141683 (-0.020801) |
| write_flattened_sequence | 1.743753 / 1.452155 (0.291599) |
| write_nested_sequence | 1.844224 / 1.492716 (0.351507) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.275245 / 0.018006 (0.257239) |
| get_batch_of_1024_rows | 0.617806 / 0.000490 (0.617316) |
| get_first_row | 0.003261 / 0.000200 (0.003061) |
| get_last_row | 0.000110 / 0.000054 (0.000055) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.024537 / 0.037411 (-0.012874) |
| shard | 0.110218 / 0.014526 (0.095692) |
| shuffle | 0.129090 / 0.176557 (-0.047466) |
| sort | 0.177710 / 0.737135 (-0.559425) |
| train_test_split | 0.128949 / 0.296338 (-0.167389) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.598947 / 0.215209 (0.383738) |
| read 50000 | 6.028672 / 2.077655 (3.951017) |
| read_batch 50000 10 | 2.442168 / 1.504120 (0.938048) |
| read_batch 50000 100 | 2.172734 / 1.541195 (0.631539) |
| read_batch 50000 1000 | 2.103708 / 1.468490 (0.635217) |
| read_formatted numpy 5000 | 0.717567 / 4.584777 (-3.867210) |
| read_formatted pandas 5000 | 5.534376 / 3.745712 (1.788663) |
| read_formatted tensorflow 5000 | 2.844379 / 5.269862 (-2.425483) |
| read_formatted torch 5000 | 1.897387 / 4.565676 (-2.668289) |
| read_formatted_batch numpy 5000 10 | 0.081432 / 0.424275 (-0.342843) |
| read_formatted_batch numpy 5000 1000 | 0.013316 / 0.007607 (0.005709) |
| shuffled read 5000 | 0.729069 / 0.226044 (0.503025) |
| shuffled read 50000 | 7.490761 / 2.268929 (5.221833) |
| shuffled read_batch 50000 10 | 3.177736 / 55.444624 (-52.266889) |
| shuffled read_batch 50000 100 | 2.511463 / 6.876477 (-4.365014) |
| shuffled read_batch 50000 1000 | 2.612129 / 2.142072 (0.470057) |
| shuffled read_formatted numpy 5000 | 0.886044 / 4.805227 (-3.919183) |
| shuffled read_formatted_batch numpy 5000 10 | 0.188195 / 6.500664 (-6.312469) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.082815 / 0.075469 (0.007345) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.818302 / 1.841788 (-0.023486) |
| map fast-tokenizer batched | 16.225205 / 8.074308 (8.150897) |
| map identity | 39.547128 / 10.191392 (29.355736) |
| map identity batched | 1.206194 / 0.680424 (0.525771) |
| map no-op batched | 0.684066 / 0.534201 (0.149865) |
| map no-op batched numpy | 0.514598 / 0.579283 (-0.064686) |
| map no-op batched pandas | 0.639681 / 0.434364 (0.205317) |
| map no-op batched pytorch | 0.366445 / 0.540337 (-0.173892) |
| map no-op batched tensorflow | 0.358624 / 1.386936 (-1.028312) |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.008020 / 0.011353 (-0.003333) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004913 / 0.011008 (-0.006095) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.034814 / 0.038508 (-0.003694) |
| read_batch_unformated after write_array2d | 0.036713 / 0.023109 (0.013604) |
| read_batch_unformated after write_flattened_sequence | 0.421896 / 0.275898 (0.145998) |
| read_batch_unformated after write_nested_sequence | 0.532355 / 0.323480 (0.208875) |
| read_col_formatted_as_numpy after write_array2d | 0.005254 / 0.007986 (-0.002731) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005671 / 0.004328 (0.001342) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006177 / 0.004250 (0.001926) |
| read_col_unformated after write_array2d | 0.041908 / 0.037052 (0.004856) |
| read_col_unformated after write_flattened_sequence | 0.434092 / 0.258489 (0.175603) |
| read_col_unformated after write_nested_sequence | 0.501482 / 0.293841 (0.207642) |
| read_formatted_as_numpy after write_array2d | 0.046537 / 0.128546 (-0.082009) |
| read_formatted_as_numpy after write_flattened_sequence | 0.015139 / 0.075646 (-0.060508) |
| read_formatted_as_numpy after write_nested_sequence | 0.339246 / 0.419271 (-0.080026) |
| read_unformated after write_array2d | 0.066892 / 0.043533 (0.023359) |
| read_unformated after write_flattened_sequence | 0.418952 / 0.255139 (0.163813) |
| read_unformated after write_nested_sequence | 0.465041 / 0.283200 (0.181842) |
| write_array2d | 0.119380 / 0.141683 (-0.022303) |
| write_flattened_sequence | 1.739895 / 1.452155 (0.287741) |
| write_nested_sequence | 1.793212 / 1.492716 (0.300495) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.326975 / 0.018006 (0.308968) |
| get_batch_of_1024_rows | 0.556283 / 0.000490 (0.555793) |
| get_first_row | 0.042769 / 0.000200 (0.042569) |
| get_last_row | 0.000478 / 0.000054 (0.000424) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.025371 / 0.037411 (-0.012040) |
| shard | 0.111579 / 0.014526 (0.097053) |
| shuffle | 0.126399 / 0.176557 (-0.050158) |
| sort | 0.172466 / 0.737135 (-0.564670) |
| train_test_split | 0.133059 / 0.296338 (-0.163279) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.650886 / 0.215209 (0.435677) |
| read 50000 | 6.449908 / 2.077655 (4.372254) |
| read_batch 50000 10 | 2.792735 / 1.504120 (1.288615) |
| read_batch 50000 100 | 2.395483 / 1.541195 (0.854288) |
| read_batch 50000 1000 | 2.550657 / 1.468490 (1.082167) |
| read_formatted numpy 5000 | 0.741656 / 4.584777 (-3.843121) |
| read_formatted pandas 5000 | 5.524500 / 3.745712 (1.778788) |
| read_formatted tensorflow 5000 | 2.869196 / 5.269862 (-2.400666) |
| read_formatted torch 5000 | 1.862293 / 4.565676 (-2.703384) |
| read_formatted_batch numpy 5000 10 | 0.082395 / 0.424275 (-0.341880) |
| read_formatted_batch numpy 5000 1000 | 0.013256 / 0.007607 (0.005648) |
| shuffled read 5000 | 0.779311 / 0.226044 (0.553267) |
| shuffled read 50000 | 7.944132 / 2.268929 (5.675204) |
| shuffled read_batch 50000 10 | 3.477485 / 55.444624 (-51.967139) |
| shuffled read_batch 50000 100 | 2.812675 / 6.876477 (-4.063802) |
| shuffled read_batch 50000 1000 | 2.851297 / 2.142072 (0.709225) |
| shuffled read_formatted numpy 5000 | 0.912354 / 4.805227 (-3.892873) |
| shuffled read_formatted_batch numpy 5000 10 | 0.189556 / 6.500664 (-6.311108) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.074117 / 0.075469 (-0.001352) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.988850 / 1.841788 (0.147062) |
| map fast-tokenizer batched | 16.185138 / 8.074308 (8.110830) |
| map identity | 40.195112 / 10.191392 (30.003720) |
| map identity batched | 1.244627 / 0.680424 (0.564203) |
| map no-op batched | 0.804354 / 0.534201 (0.270154) |
| map no-op batched numpy | 0.490522 / 0.579283 (-0.088761) |
| map no-op batched pandas | 0.636858 / 0.434364 (0.202494) |
| map no-op batched pytorch | 0.352435 / 0.540337 (-0.187902) |
| map no-op batched tensorflow | 0.379627 / 1.386936 (-1.007309) |
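
Reading the numbers above: each cell has the form `new / old (diff)`, and the values are consistent with `diff = new - old`, so a negative diff means the new measurement is lower than the old one. The sketch below is not part of the benchmark tooling. It is a minimal illustration, with hypothetical helper names (`parse_cells`, `is_consistent`), of how the cells could be parsed and that relationship checked, using the `benchmark_getitem_100B.json` values from the PyArrow==6.0.0 run. The tolerance is an assumption: the printed diff seems to be computed before rounding, so it can differ from the rounded `new - old` in the last digit.

```python
import re

# Matches one benchmark cell such as "0.009741 / 0.011353 (-0.001612)".
CELL_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row: str):
    """Return (new, old, diff) float triples for every cell found in a row."""
    return [tuple(float(x) for x in m.groups()) for m in CELL_RE.finditer(row)]


def is_consistent(new: float, old: float, diff: float, tol: float = 2e-6) -> bool:
    """Check that diff equals new - old, up to rounding of the printed values."""
    return abs((new - old) - diff) <= tol


# Values copied from the benchmark_getitem_100B.json row (PyArrow==6.0.0).
row = (
    "new / old (diff) 0.275245 / 0.018006 (0.257239) "
    "0.617806 / 0.000490 (0.617316) 0.003261 / 0.000200 (0.003061) "
    "0.000110 / 0.000054 (0.000055)"
)
cells = parse_cells(row)

for new, old, diff in cells:
    ok = is_consistent(new, old, diff)
    print(f"new={new:.6f} old={old:.6f} diff={diff:+.6f} consistent={ok}")
```

The last cell shows why a tolerance is needed: `0.000110 - 0.000054` is `0.000056`, while the comment prints `0.000055`.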
