Commit

run ci
lhoestq committed Oct 5, 2022
1 parent c1a66f0 commit 4a81477
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -7,6 +7,7 @@ on:
   push:
     branches:
       - main
+      - patch-release-2.5.2

 env:
   HF_SCRIPTS_VERSION: main

1 comment on commit 4a81477

@github-actions

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.007305 / 0.011353 (-0.004048) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004073 / 0.011008 (-0.006935) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.026600 / 0.038508 (-0.011908) |
| read_batch_unformated after write_array2d | 0.032239 / 0.023109 (0.009130) |
| read_batch_unformated after write_flattened_sequence | 0.279887 / 0.275898 (0.003989) |
| read_batch_unformated after write_nested_sequence | 0.336017 / 0.323480 (0.012537) |
| read_col_formatted_as_numpy after write_array2d | 0.005808 / 0.007986 (-0.002177) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003560 / 0.004328 (-0.000769) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006329 / 0.004250 (0.002079) |
| read_col_unformated after write_array2d | 0.048487 / 0.037052 (0.011435) |
| read_col_unformated after write_flattened_sequence | 0.279465 / 0.258489 (0.020976) |
| read_col_unformated after write_nested_sequence | 0.315334 / 0.293841 (0.021493) |
| read_formatted_as_numpy after write_array2d | 0.028346 / 0.128546 (-0.100201) |
| read_formatted_as_numpy after write_flattened_sequence | 0.008981 / 0.075646 (-0.066665) |
| read_formatted_as_numpy after write_nested_sequence | 0.228449 / 0.419271 (-0.190823) |
| read_unformated after write_array2d | 0.048226 / 0.043533 (0.004693) |
| read_unformated after write_flattened_sequence | 0.268219 / 0.255139 (0.013081) |
| read_unformated after write_nested_sequence | 0.284064 / 0.283200 (0.000864) |
| write_array2d | 0.106336 / 0.141683 (-0.035347) |
| write_flattened_sequence | 1.299714 / 1.452155 (-0.152441) |
| write_nested_sequence | 1.327623 / 1.492716 (-0.165093) |
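
Each benchmark entry has the form `new / old (diff)`, where the diff appears to be `new - old`, so a negative value means the new run measured lower than the reference. The sketch below is not part of the benchmark tooling; it is a minimal illustration, with hypothetical helper names (`parse_cell`, `is_consistent`, `relative_change`), of how one entry could be parsed and checked. The tolerance is an assumption that allows for each value having been rounded to six decimals.

```python
import re

# Matches one entry such as "0.007305 / 0.011353 (-0.004048)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(cell):
    """Return (new, old, diff) as floats from one 'new / old (diff)' entry."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized benchmark entry: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def is_consistent(new, old, diff, tol=2e-6):
    """Check that diff equals new - old, up to six-decimal rounding (assumed)."""
    return abs((new - old) - diff) <= tol


def relative_change(new, old):
    """Fractional change of new relative to old (negative means lower)."""
    return (new - old) / old


new, old, diff = parse_cell("0.007305 / 0.011353 (-0.004048)")
assert is_consistent(new, old, diff)
```

A relative change is often easier to compare across metrics than the absolute diff, since the reference values in these tables span several orders of magnitude.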

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.286762 / 0.018006 (0.268755) |
| get_batch_of_1024_rows | 0.571925 / 0.000490 (0.571435) |
| get_first_row | 0.003195 / 0.000200 (0.002995) |
| get_last_row | 0.000133 / 0.000054 (0.000078) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.024258 / 0.037411 (-0.013153) |
| shard | 0.105112 / 0.014526 (0.090586) |
| shuffle | 0.116048 / 0.176557 (-0.060509) |
| sort | 0.165885 / 0.737135 (-0.571251) |
| train_test_split | 0.124776 / 0.296338 (-0.171562) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.399163 / 0.215209 (0.183954) |
| read 50000 | 3.963586 / 2.077655 (1.885931) |
| read_batch 50000 10 | 1.792536 / 1.504120 (0.288416) |
| read_batch 50000 100 | 1.607917 / 1.541195 (0.066722) |
| read_batch 50000 1000 | 1.690353 / 1.468490 (0.221862) |
| read_formatted numpy 5000 | 0.424261 / 4.584777 (-4.160516) |
| read_formatted pandas 5000 | 3.731642 / 3.745712 (-0.014070) |
| read_formatted tensorflow 5000 | 2.093867 / 5.269862 (-3.175995) |
| read_formatted torch 5000 | 1.455194 / 4.565676 (-3.110483) |
| read_formatted_batch numpy 5000 10 | 0.051514 / 0.424275 (-0.372762) |
| read_formatted_batch numpy 5000 1000 | 0.011031 / 0.007607 (0.003424) |
| shuffled read 5000 | 0.504658 / 0.226044 (0.278614) |
| shuffled read 50000 | 4.974999 / 2.268929 (2.706070) |
| shuffled read_batch 50000 10 | 2.243218 / 55.444624 (-53.201406) |
| shuffled read_batch 50000 100 | 1.886898 / 6.876477 (-4.989579) |
| shuffled read_batch 50000 1000 | 2.072674 / 2.142072 (-0.069398) |
| shuffled read_formatted numpy 5000 | 0.536842 / 4.805227 (-4.268385) |
| shuffled read_formatted_batch numpy 5000 10 | 0.118016 / 6.500664 (-6.382648) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.060489 / 0.075469 (-0.014980) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.308538 / 1.841788 (-0.533250) |
| map fast-tokenizer batched | 13.240538 / 8.074308 (5.166230) |
| map identity | 22.097267 / 10.191392 (11.905875) |
| map identity batched | 0.783744 / 0.680424 (0.103320) |
| map no-op batched | 0.488262 / 0.534201 (-0.045939) |
| map no-op batched numpy | 0.345756 / 0.579283 (-0.233527) |
| map no-op batched pandas | 0.412869 / 0.434364 (-0.021495) |
| map no-op batched pytorch | 0.275006 / 0.540337 (-0.265331) |
| map no-op batched tensorflow | 0.256202 / 1.386936 (-1.130734) |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.006407 / 0.011353 (-0.004946) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004254 / 0.011008 (-0.006754) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.027993 / 0.038508 (-0.010515) |
| read_batch_unformated after write_array2d | 0.036599 / 0.023109 (0.013489) |
| read_batch_unformated after write_flattened_sequence | 0.384315 / 0.275898 (0.108417) |
| read_batch_unformated after write_nested_sequence | 0.475339 / 0.323480 (0.151859) |
| read_col_formatted_as_numpy after write_array2d | 0.004322 / 0.007986 (-0.003663) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003688 / 0.004328 (-0.000640) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.005096 / 0.004250 (0.000846) |
| read_col_unformated after write_array2d | 0.045406 / 0.037052 (0.008354) |
| read_col_unformated after write_flattened_sequence | 0.387560 / 0.258489 (0.129071) |
| read_col_unformated after write_nested_sequence | 0.440532 / 0.293841 (0.146691) |
| read_formatted_as_numpy after write_array2d | 0.030429 / 0.128546 (-0.098118) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009806 / 0.075646 (-0.065841) |
| read_formatted_as_numpy after write_nested_sequence | 0.257119 / 0.419271 (-0.162152) |
| read_unformated after write_array2d | 0.054295 / 0.043533 (0.010762) |
| read_unformated after write_flattened_sequence | 0.380287 / 0.255139 (0.125148) |
| read_unformated after write_nested_sequence | 0.398700 / 0.283200 (0.115501) |
| write_array2d | 0.108943 / 0.141683 (-0.032740) |
| write_flattened_sequence | 1.477986 / 1.452155 (0.025832) |
| write_nested_sequence | 1.533836 / 1.492716 (0.041120) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.369053 / 0.018006 (0.351047) |
| get_batch_of_1024_rows | 0.525061 / 0.000490 (0.524571) |
| get_first_row | 0.022382 / 0.000200 (0.022182) |
| get_last_row | 0.000183 / 0.000054 (0.000129) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.024672 / 0.037411 (-0.012739) |
| shard | 0.104267 / 0.014526 (0.089742) |
| shuffle | 0.114398 / 0.176557 (-0.062159) |
| sort | 0.161471 / 0.737135 (-0.575665) |
| train_test_split | 0.120839 / 0.296338 (-0.175500) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.418505 / 0.215209 (0.203295) |
| read 50000 | 4.162400 / 2.077655 (2.084745) |
| read_batch 50000 10 | 2.010442 / 1.504120 (0.506322) |
| read_batch 50000 100 | 1.830617 / 1.541195 (0.289423) |
| read_batch 50000 1000 | 1.921378 / 1.468490 (0.452887) |
| read_formatted numpy 5000 | 0.424523 / 4.584777 (-4.160254) |
| read_formatted pandas 5000 | 3.822478 / 3.745712 (0.076766) |
| read_formatted tensorflow 5000 | 2.050734 / 5.269862 (-3.219127) |
| read_formatted torch 5000 | 1.249684 / 4.565676 (-3.315993) |
| read_formatted_batch numpy 5000 10 | 0.050708 / 0.424275 (-0.373567) |
| read_formatted_batch numpy 5000 1000 | 0.010983 / 0.007607 (0.003375) |
| shuffled read 5000 | 0.519382 / 0.226044 (0.293337) |
| shuffled read 50000 | 5.178505 / 2.268929 (2.909577) |
| shuffled read_batch 50000 10 | 2.456389 / 55.444624 (-52.988236) |
| shuffled read_batch 50000 100 | 2.115778 / 6.876477 (-4.760698) |
| shuffled read_batch 50000 1000 | 2.295090 / 2.142072 (0.153018) |
| shuffled read_formatted numpy 5000 | 0.533644 / 4.805227 (-4.271583) |
| shuffled read_formatted_batch numpy 5000 10 | 0.118740 / 6.500664 (-6.381924) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.060793 / 0.075469 (-0.014676) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.360886 / 1.841788 (-0.480901) |
| map fast-tokenizer batched | 13.543711 / 8.074308 (5.469403) |
| map identity | 25.385723 / 10.191392 (15.194331) |
| map identity batched | 0.846131 / 0.680424 (0.165707) |
| map no-op batched | 0.548484 / 0.534201 (0.014283) |
| map no-op batched numpy | 0.345389 / 0.579283 (-0.233894) |
| map no-op batched pandas | 0.397956 / 0.434364 (-0.036408) |
| map no-op batched pytorch | 0.237599 / 0.540337 (-0.302738) |
| map no-op batched tensorflow | 0.244466 / 1.386936 (-1.142470) |
