Add suggested command to benchmark action
mariosasko committed Jun 13, 2022
1 parent f9b24c4 commit 8a82d72
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/benchmarks.yaml
@@ -10,6 +10,9 @@ jobs:
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # See https://github.com/actions/checkout/issues/760
          git config --global --add safe.directory /__w/datasets/datasets
          # Your ML workflow goes here
          pip install --upgrade pip

1 comment on commit 8a82d72

@github-actions

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.009043 / 0.011353 (-0.002310) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004598 / 0.011008 (-0.006411) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.034826 / 0.038508 (-0.003682) |
| read_batch_unformated after write_array2d | 0.038290 / 0.023109 (0.015181) |
| read_batch_unformated after write_flattened_sequence | 0.372208 / 0.275898 (0.096310) |
| read_batch_unformated after write_nested_sequence | 0.408463 / 0.323480 (0.084983) |
| read_col_formatted_as_numpy after write_array2d | 0.006551 / 0.007986 (-0.001435) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003869 / 0.004328 (-0.000459) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008202 / 0.004250 (0.003952) |
| read_col_unformated after write_array2d | 0.049328 / 0.037052 (0.012275) |
| read_col_unformated after write_flattened_sequence | 0.373025 / 0.258489 (0.114536) |
| read_col_unformated after write_nested_sequence | 0.409938 / 0.293841 (0.116097) |
| read_formatted_as_numpy after write_array2d | 0.035505 / 0.128546 (-0.093042) |
| read_formatted_as_numpy after write_flattened_sequence | 0.011043 / 0.075646 (-0.064604) |
| read_formatted_as_numpy after write_nested_sequence | 0.303495 / 0.419271 (-0.115776) |
| read_unformated after write_array2d | 0.056750 / 0.043533 (0.013217) |
| read_unformated after write_flattened_sequence | 0.370589 / 0.255139 (0.115450) |
| read_unformated after write_nested_sequence | 0.411304 / 0.283200 (0.128104) |
| write_array2d | 0.113412 / 0.141683 (-0.028271) |
| write_flattened_sequence | 1.676455 / 1.452155 (0.224300) |
| write_nested_sequence | 1.712847 / 1.492716 (0.220131) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.240170 / 0.018006 (0.222164) |
| get_batch_of_1024_rows | 0.496482 / 0.000490 (0.495993) |
| get_first_row | 0.009496 / 0.000200 (0.009296) |
| get_last_row | 0.000451 / 0.000054 (0.000397) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.027277 / 0.037411 (-0.010135) |
| shard | 0.118686 / 0.014526 (0.104160) |
| shuffle | 0.133196 / 0.176557 (-0.043361) |
| sort | 0.188992 / 0.737135 (-0.548143) |
| train_test_split | 0.138735 / 0.296338 (-0.157603) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.469678 / 0.215209 (0.254469) |
| read 50000 | 4.620882 / 2.077655 (2.543227) |
| read_batch 50000 10 | 2.105627 / 1.504120 (0.601507) |
| read_batch 50000 100 | 1.913330 / 1.541195 (0.372135) |
| read_batch 50000 1000 | 1.989933 / 1.468490 (0.521443) |
| read_formatted numpy 5000 | 0.486650 / 4.584777 (-4.098127) |
| read_formatted pandas 5000 | 4.487808 / 3.745712 (0.742096) |
| read_formatted tensorflow 5000 | 1.066411 / 5.269862 (-4.203451) |
| read_formatted torch 5000 | 1.070045 / 4.565676 (-3.495631) |
| read_formatted_batch numpy 5000 10 | 0.061480 / 0.424275 (-0.362795) |
| read_formatted_batch numpy 5000 1000 | 0.012688 / 0.007607 (0.005081) |
| shuffled read 5000 | 0.596698 / 0.226044 (0.370653) |
| shuffled read 50000 | 5.923950 / 2.268929 (3.655022) |
| shuffled read_batch 50000 10 | 2.656304 / 55.444624 (-52.788320) |
| shuffled read_batch 50000 100 | 2.290783 / 6.876477 (-4.585694) |
| shuffled read_batch 50000 1000 | 2.390951 / 2.142072 (0.248879) |
| shuffled read_formatted numpy 5000 | 0.612338 / 4.805227 (-4.192889) |
| shuffled read_formatted_batch numpy 5000 10 | 0.135604 / 6.500664 (-6.365060) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.070851 / 0.075469 (-0.004618) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.716224 / 1.841788 (-0.125564) |
| map fast-tokenizer batched | 16.138292 / 8.074308 (8.063984) |
| map identity | 29.601884 / 10.191392 (19.410492) |
| map identity batched | 1.011742 / 0.680424 (0.331318) |
| map no-op batched | 0.649605 / 0.534201 (0.115404) |
| map no-op batched numpy | 0.451842 / 0.579283 (-0.127441) |
| map no-op batched pandas | 0.521070 / 0.434364 (0.086706) |
| map no-op batched pytorch | 0.321496 / 0.540337 (-0.218842) |
| map no-op batched tensorflow | 0.345593 / 1.386936 (-1.041343) |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.007253 / 0.011353 (-0.004100) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004580 / 0.011008 (-0.006428) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.032980 / 0.038508 (-0.005528) |
| read_batch_unformated after write_array2d | 0.039148 / 0.023109 (0.016039) |
| read_batch_unformated after write_flattened_sequence | 0.388127 / 0.275898 (0.112229) |
| read_batch_unformated after write_nested_sequence | 0.387589 / 0.323480 (0.064109) |
| read_col_formatted_as_numpy after write_array2d | 0.004539 / 0.007986 (-0.003447) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003914 / 0.004328 (-0.000414) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.005656 / 0.004250 (0.001406) |
| read_col_unformated after write_array2d | 0.046664 / 0.037052 (0.009611) |
| read_col_unformated after write_flattened_sequence | 0.353449 / 0.258489 (0.094960) |
| read_col_unformated after write_nested_sequence | 0.397707 / 0.293841 (0.103866) |
| read_formatted_as_numpy after write_array2d | 0.035047 / 0.128546 (-0.093499) |
| read_formatted_as_numpy after write_flattened_sequence | 0.011361 / 0.075646 (-0.064286) |
| read_formatted_as_numpy after write_nested_sequence | 0.301868 / 0.419271 (-0.117404) |
| read_unformated after write_array2d | 0.063398 / 0.043533 (0.019865) |
| read_unformated after write_flattened_sequence | 0.355012 / 0.255139 (0.099873) |
| read_unformated after write_nested_sequence | 0.378466 / 0.283200 (0.095266) |
| write_array2d | 0.116707 / 0.141683 (-0.024976) |
| write_flattened_sequence | 1.735253 / 1.452155 (0.283099) |
| write_nested_sequence | 1.803050 / 1.492716 (0.310333) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.290745 / 0.018006 (0.272739) |
| get_batch_of_1024_rows | 0.507573 / 0.000490 (0.507083) |
| get_first_row | 0.060437 / 0.000200 (0.060237) |
| get_last_row | 0.000384 / 0.000054 (0.000330) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.028408 / 0.037411 (-0.009004) |
| shard | 0.117961 / 0.014526 (0.103435) |
| shuffle | 0.133373 / 0.176557 (-0.043183) |
| sort | 0.179033 / 0.737135 (-0.558102) |
| train_test_split | 0.136856 / 0.296338 (-0.159483) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.469886 / 0.215209 (0.254677) |
| read 50000 | 4.778086 / 2.077655 (2.700432) |
| read_batch 50000 10 | 2.312407 / 1.504120 (0.808287) |
| read_batch 50000 100 | 2.121395 / 1.541195 (0.580200) |
| read_batch 50000 1000 | 2.129276 / 1.468490 (0.660786) |
| read_formatted numpy 5000 | 0.509561 / 4.584777 (-4.075216) |
| read_formatted pandas 5000 | 4.505175 / 3.745712 (0.759462) |
| read_formatted tensorflow 5000 | 1.108123 / 5.269862 (-4.161738) |
| read_formatted torch 5000 | 1.088305 / 4.565676 (-3.477372) |
| read_formatted_batch numpy 5000 10 | 0.061614 / 0.424275 (-0.362661) |
| read_formatted_batch numpy 5000 1000 | 0.012658 / 0.007607 (0.005051) |
| shuffled read 5000 | 0.581836 / 0.226044 (0.355792) |
| shuffled read 50000 | 5.818010 / 2.268929 (3.549082) |
| shuffled read_batch 50000 10 | 2.675072 / 55.444624 (-52.769552) |
| shuffled read_batch 50000 100 | 2.306189 / 6.876477 (-4.570287) |
| shuffled read_batch 50000 1000 | 2.480382 / 2.142072 (0.338310) |
| shuffled read_formatted numpy 5000 | 0.617323 / 4.805227 (-4.187905) |
| shuffled read_formatted_batch numpy 5000 10 | 0.142797 / 6.500664 (-6.357867) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.074574 / 0.075469 (-0.000895) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.730901 / 1.841788 (-0.110887) |
| map fast-tokenizer batched | 15.960685 / 8.074308 (7.886377) |
| map identity | 29.670087 / 10.191392 (19.478695) |
| map identity batched | 1.025512 / 0.680424 (0.345089) |
| map no-op batched | 0.647073 / 0.534201 (0.112872) |
| map no-op batched numpy | 0.454320 / 0.579283 (-0.124963) |
| map no-op batched pandas | 0.520929 / 0.434364 (0.086565) |
| map no-op batched pytorch | 0.311323 / 0.540337 (-0.229014) |
| map no-op batched tensorflow | 0.319601 / 1.386936 (-1.067335) |
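
Each result above has the form `new / old (diff)`, where the value in parentheses is `new - old`, as the rows themselves show. For anyone post-processing these numbers, here is a minimal sketch of how such cells could be parsed and turned into a relative change. The names `CELL`, `parse_cells`, and `relative_change` are hypothetical helpers for illustration and are not part of the benchmark tooling.

```python
import re

# One "new / old (diff)" cell, e.g. "0.027277 / 0.037411 (-0.010135)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row: str):
    """Return the (new, old, diff) float triples found in one results row."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(row)]


def relative_change(new: float, old: float) -> float:
    """Change of `new` relative to `old`, as a fraction of `old`."""
    return (new - old) / old


# Two cells copied from the benchmark_indices_mapping.json results (select, shard).
row = "0.027277 / 0.037411 (-0.010135) 0.118686 / 0.014526 (0.104160)"

for new, old, diff in parse_cells(row):
    # The reported diff should match new - old up to rounding of the printed digits.
    assert abs((new - old) - diff) < 1e-5
    print(f"{new:.6f} vs {old:.6f}: {relative_change(new, old):+.1%}")
```

A relative change can be easier to compare across metrics than the absolute diff, since the baselines here span several orders of magnitude.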
