
Commit

Increase max retries for GitHub metrics
albertvillanova committed Mar 30, 2022
1 parent a4c9be6 commit 85f5a12
Showing 1 changed file (src/datasets/load.py) with 1 addition and 0 deletions.
@@ -581,6 +581,7 @@ def __init__(
         self.name = name
         self.revision = revision
         self.download_config = download_config or DownloadConfig()
+        self.download_config.max_retries = 3
         self.download_mode = download_mode
         self.dynamic_modules_path = dynamic_modules_path
         assert self.name.count("/") == 0
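The commit sets `max_retries = 3` on the factory's `DownloadConfig`, so a transient network failure while fetching a metric from GitHub is retried rather than failing on the first attempt. The sketch below shows what a bounded retry with exponential backoff typically looks like. It is an illustration under assumptions, not the `datasets` library's internal code: `call_with_retry`, `flaky_fetch`, the backoff parameters, and the set of retried exceptions are all hypothetical. Under this sketch, `max_retries = 3` allows up to four attempts in total, because the first call is not counted as a retry.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_wait_time: float = 0.5,
    max_wait_time: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn(), retrying on transient errors (hypothetical helper).

    The first call is not a retry, so fn() runs at most max_retries + 1 times.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except retry_on:
            if attempt > max_retries:
                raise  # retries exhausted: re-raise the last error
            # Exponential backoff: base, 2*base, 4*base, ..., capped at max_wait_time.
            time.sleep(min(max_wait_time, base_wait_time * 2 ** (attempt - 1)))


# Usage: a fake download (hypothetical) that fails twice before succeeding.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient network failure")
    return "metric script contents"


result = call_with_retry(flaky_fetch, max_retries=3, base_wait_time=0.0)
print(result, "after", calls["n"], "attempts")
```

Capping the wait keeps the worst-case delay bounded even if `max_retries` is raised further; with `max_retries=1` the same fake download would exhaust its retries and raise the `ConnectionError`.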

1 comment on commit 85f5a12

@github-actions



PyArrow==5.0.0


Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.009669 / 0.011353 (-0.001684)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.004196 / 0.011008 (-0.006813)
read_batch_formatted_as_numpy after write_nested_sequence | 0.028317 / 0.038508 (-0.010191)
read_batch_unformated after write_array2d | 0.034872 / 0.023109 (0.011763)
read_batch_unformated after write_flattened_sequence | 0.266419 / 0.275898 (-0.009479)
read_batch_unformated after write_nested_sequence | 0.289313 / 0.323480 (-0.034167)
read_col_formatted_as_numpy after write_array2d | 0.007922 / 0.007986 (-0.000063)
read_col_formatted_as_numpy after write_flattened_sequence | 0.003747 / 0.004328 (-0.000581)
read_col_formatted_as_numpy after write_nested_sequence | 0.008578 / 0.004250 (0.004328)
read_col_unformated after write_array2d | 0.043551 / 0.037052 (0.006499)
read_col_unformated after write_flattened_sequence | 0.250642 / 0.258489 (-0.007847)
read_col_unformated after write_nested_sequence | 0.293538 / 0.293841 (-0.000303)
read_formatted_as_numpy after write_array2d | 0.029319 / 0.128546 (-0.099228)
read_formatted_as_numpy after write_flattened_sequence | 0.009085 / 0.075646 (-0.066562)
read_formatted_as_numpy after write_nested_sequence | 0.227205 / 0.419271 (-0.192066)
read_unformated after write_array2d | 0.049007 / 0.043533 (0.005475)
read_unformated after write_flattened_sequence | 0.255266 / 0.255139 (0.000127)
read_unformated after write_nested_sequence | 0.276164 / 0.283200 (-0.007035)
write_array2d | 0.110644 / 0.141683 (-0.031039)
write_flattened_sequence | 1.540537 / 1.452155 (0.088382)
write_nested_sequence | 1.587509 / 1.492716 (0.094793)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.367923 / 0.018006 (0.349917)
get_batch_of_1024_rows | 0.555384 / 0.000490 (0.554894)
get_first_row | 0.021173 / 0.000200 (0.020973)
get_last_row | 0.000468 / 0.000054 (0.000413)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.029178 / 0.037411 (-0.008233)
shard | 0.107855 / 0.014526 (0.093329)
shuffle | 0.115222 / 0.176557 (-0.061335)
sort | 0.168228 / 0.737135 (-0.568908)
train_test_split | 0.116004 / 0.296338 (-0.180335)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.362754 / 0.215209 (0.147545)
read 50000 | 3.627778 / 2.077655 (1.550124)
read_batch 50000 10 | 1.571820 / 1.504120 (0.067700)
read_batch 50000 100 | 1.406632 / 1.541195 (-0.134563)
read_batch 50000 1000 | 1.533765 / 1.468490 (0.065275)
read_formatted numpy 5000 | 0.388641 / 4.584777 (-4.196136)
read_formatted pandas 5000 | 4.230196 / 3.745712 (0.484483)
read_formatted tensorflow 5000 | 3.539808 / 5.269862 (-1.730054)
read_formatted torch 5000 | 0.856923 / 4.565676 (-3.708753)
read_formatted_batch numpy 5000 10 | 0.053313 / 0.424275 (-0.370962)
read_formatted_batch numpy 5000 1000 | 0.012713 / 0.007607 (0.005106)
shuffled read 5000 | 0.518661 / 0.226044 (0.292616)
shuffled read 50000 | 5.202287 / 2.268929 (2.933359)
shuffled read_batch 50000 10 | 2.227970 / 55.444624 (-53.216655)
shuffled read_batch 50000 100 | 1.880226 / 6.876477 (-4.996251)
shuffled read_batch 50000 1000 | 2.095213 / 2.142072 (-0.046860)
shuffled read_formatted numpy 5000 | 0.560588 / 4.805227 (-4.244639)
shuffled read_formatted_batch numpy 5000 10 | 0.124595 / 6.500664 (-6.376069)
shuffled read_formatted_batch numpy 5000 1000 | 0.064636 / 0.075469 (-0.010833)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.427277 / 1.841788 (-0.414510)
map fast-tokenizer batched | 14.458356 / 8.074308 (6.384048)
map identity | 26.618873 / 10.191392 (16.427481)
map identity batched | 0.885585 / 0.680424 (0.205161)
map no-op batched | 0.519814 / 0.534201 (-0.014387)
map no-op batched numpy | 0.512707 / 0.579283 (-0.066576)
map no-op batched pandas | 0.510986 / 0.434364 (0.076622)
map no-op batched pytorch | 0.332291 / 0.540337 (-0.208047)
map no-op batched tensorflow | 0.341928 / 1.386936 (-1.045008)
PyArrow==latest

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.008119 / 0.011353 (-0.003233)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.004093 / 0.011008 (-0.006915)
read_batch_formatted_as_numpy after write_nested_sequence | 0.026803 / 0.038508 (-0.011705)
read_batch_unformated after write_array2d | 0.032584 / 0.023109 (0.009475)
read_batch_unformated after write_flattened_sequence | 0.300138 / 0.275898 (0.024240)
read_batch_unformated after write_nested_sequence | 0.318926 / 0.323480 (-0.004554)
read_col_formatted_as_numpy after write_array2d | 0.006257 / 0.007986 (-0.001729)
read_col_formatted_as_numpy after write_flattened_sequence | 0.004864 / 0.004328 (0.000536)
read_col_formatted_as_numpy after write_nested_sequence | 0.006933 / 0.004250 (0.002683)
read_col_unformated after write_array2d | 0.040350 / 0.037052 (0.003298)
read_col_unformated after write_flattened_sequence | 0.283985 / 0.258489 (0.025496)
read_col_unformated after write_nested_sequence | 0.313099 / 0.293841 (0.019258)
read_formatted_as_numpy after write_array2d | 0.029293 / 0.128546 (-0.099253)
read_formatted_as_numpy after write_flattened_sequence | 0.009053 / 0.075646 (-0.066593)
read_formatted_as_numpy after write_nested_sequence | 0.225584 / 0.419271 (-0.193687)
read_unformated after write_array2d | 0.046845 / 0.043533 (0.003312)
read_unformated after write_flattened_sequence | 0.287341 / 0.255139 (0.032202)
read_unformated after write_nested_sequence | 0.306944 / 0.283200 (0.023744)
write_array2d | 0.089238 / 0.141683 (-0.052445)
write_flattened_sequence | 1.613743 / 1.452155 (0.161588)
write_nested_sequence | 1.683491 / 1.492716 (0.190775)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.403551 / 0.018006 (0.385545)
get_batch_of_1024_rows | 0.540698 / 0.000490 (0.540208)
get_first_row | 0.018173 / 0.000200 (0.017973)
get_last_row | 0.000456 / 0.000054 (0.000401)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.025273 / 0.037411 (-0.012139)
shard | 0.100388 / 0.014526 (0.085862)
shuffle | 0.104430 / 0.176557 (-0.072127)
sort | 0.152140 / 0.737135 (-0.584995)
train_test_split | 0.108984 / 0.296338 (-0.187354)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.366050 / 0.215209 (0.150841)
read 50000 | 3.651353 / 2.077655 (1.573699)
read_batch 50000 10 | 1.559730 / 1.504120 (0.055610)
read_batch 50000 100 | 1.382258 / 1.541195 (-0.158937)
read_batch 50000 1000 | 1.515696 / 1.468490 (0.047206)
read_formatted numpy 5000 | 0.387450 / 4.584777 (-4.197327)
read_formatted pandas 5000 | 4.278619 / 3.745712 (0.532907)
read_formatted tensorflow 5000 | 3.604239 / 5.269862 (-1.665622)
read_formatted torch 5000 | 0.890006 / 4.565676 (-3.675670)
read_formatted_batch numpy 5000 10 | 0.053308 / 0.424275 (-0.370968)
read_formatted_batch numpy 5000 1000 | 0.012296 / 0.007607 (0.004689)
shuffled read 5000 | 0.512378 / 0.226044 (0.286334)
shuffled read 50000 | 5.137855 / 2.268929 (2.868926)
shuffled read_batch 50000 10 | 2.250759 / 55.444624 (-53.193865)
shuffled read_batch 50000 100 | 1.872435 / 6.876477 (-5.004041)
shuffled read_batch 50000 1000 | 2.057591 / 2.142072 (-0.084481)
shuffled read_formatted numpy 5000 | 0.553721 / 4.805227 (-4.251506)
shuffled read_formatted_batch numpy 5000 10 | 0.123909 / 6.500664 (-6.376755)
shuffled read_formatted_batch numpy 5000 1000 | 0.062547 / 0.075469 (-0.012922)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.424929 / 1.841788 (-0.416859)
map fast-tokenizer batched | 13.651535 / 8.074308 (5.577227)
map identity | 24.009168 / 10.191392 (13.817776)
map identity batched | 0.798004 / 0.680424 (0.117580)
map no-op batched | 0.473633 / 0.534201 (-0.060568)
map no-op batched numpy | 0.437261 / 0.579283 (-0.142022)
map no-op batched pandas | 0.459464 / 0.434364 (0.025101)
map no-op batched pytorch | 0.282225 / 0.540337 (-0.258112)
map no-op batched tensorflow | 0.289703 / 1.386936 (-1.097233)
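Each benchmark cell has the form `new / old (diff)`, where diff is new minus old. The diff appears to be computed from unrounded values, since it sometimes differs from the rounded figures in the last digit (for example, 0.007922 - 0.007986 is shown as -0.000063). Because the metrics span several orders of magnitude, the ratio new / old is often a more useful regression signal than the absolute diff. The sketch below extracts such cells with a regular expression; `Cell`, `parse_cells`, and `CELL_RE` are hypothetical helpers, and reading a ratio above 1.0 as "slower" assumes the values are timings where lower is better.

```python
import re
from typing import List, NamedTuple

# One benchmark cell: "new / old (diff)", e.g. "0.009669 / 0.011353 (-0.001684)".
CELL_RE = re.compile(r"(-?\d+\.\d+)\s*/\s*(-?\d+\.\d+)\s*\((-?\d+\.\d+)\)")


class Cell(NamedTuple):
    new: float
    old: float
    diff: float

    @property
    def ratio(self) -> float:
        # Assumes the values are timings: a ratio above 1.0 means the new run was slower.
        return self.new / self.old


def parse_cells(row: str) -> List[Cell]:
    """Extract every "new / old (diff)" cell from a row of text, in order."""
    return [Cell(*map(float, m.groups())) for m in CELL_RE.finditer(row)]


# Usage on two cells from the benchmark_getitem_100B.json results (PyArrow==5.0.0).
row = "0.367923 / 0.018006 (0.349917) 0.555384 / 0.000490 (0.554894)"
for cell in parse_cells(row):
    print(f"new={cell.new:.6f} old={cell.old:.6f} ratio={cell.ratio:.1f}x")
```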

