Increase max retries for GitHub metrics (#4063)
* Increase max retries for GitHub metrics

* Address requested changes
albertvillanova committed Mar 31, 2022
1 parent 1b5db0a commit 1c83d53
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion src/datasets/load.py
@@ -580,7 +580,9 @@ def __init__(
     ):
         self.name = name
         self.revision = revision
-        self.download_config = download_config or DownloadConfig()
+        self.download_config = download_config.copy() if download_config else DownloadConfig()
+        if self.download_config.max_retries < 3:
+            self.download_config.max_retries = 3
         self.download_mode = download_mode
         self.dynamic_modules_path = dynamic_modules_path
         assert self.name.count("/") == 0
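The change above copies the caller's config and then raises `max_retries` to at least 3. A minimal sketch of that copy-then-clamp pattern follows. It assumes a hypothetical stand-in class, `SketchDownloadConfig`, in place of the real `DownloadConfig`; the helper name `with_min_retries` and the default of 1 retry are illustrative assumptions, not code from `src/datasets/load.py`.

```python
import copy
from dataclasses import dataclass
from typing import Optional

MIN_RETRIES = 3  # the floor used in the diff above


@dataclass
class SketchDownloadConfig:
    """Hypothetical stand-in for DownloadConfig; only the field used here."""

    max_retries: int = 1  # assumed default, for illustration only

    def copy(self) -> "SketchDownloadConfig":
        return copy.deepcopy(self)


def with_min_retries(config: Optional[SketchDownloadConfig]) -> SketchDownloadConfig:
    """Return a private copy of `config` with max_retries raised to MIN_RETRIES."""
    # Copy first so the caller's object is never mutated; use defaults when None.
    result = config.copy() if config else SketchDownloadConfig()
    if result.max_retries < MIN_RETRIES:
        result.max_retries = MIN_RETRIES
    return result


caller_config = SketchDownloadConfig(max_retries=1)
effective = with_min_retries(caller_config)
print(effective.max_retries, caller_config.max_retries)
```

Copying before clamping keeps the higher retry count local to this object, so a config shared with other loaders keeps whatever value the caller chose. Values already above the floor pass through unchanged.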

1 comment on commit 1c83d53

@github-actions
Show benchmarks

PyArrow==5.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010057 / 0.011353 (-0.001296) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004051 / 0.011008 (-0.006958) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.031151 / 0.038508 (-0.007357) |
| read_batch_unformated after write_array2d | 0.035781 / 0.023109 (0.012672) |
| read_batch_unformated after write_flattened_sequence | 0.293675 / 0.275898 (0.017777) |
| read_batch_unformated after write_nested_sequence | 0.317775 / 0.323480 (-0.005705) |
| read_col_formatted_as_numpy after write_array2d | 0.008119 / 0.007986 (0.000133) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005725 / 0.004328 (0.001396) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009200 / 0.004250 (0.004950) |
| read_col_unformated after write_array2d | 0.044073 / 0.037052 (0.007020) |
| read_col_unformated after write_flattened_sequence | 0.296481 / 0.258489 (0.037992) |
| read_col_unformated after write_nested_sequence | 0.326044 / 0.293841 (0.032203) |
| read_formatted_as_numpy after write_array2d | 0.032503 / 0.128546 (-0.096044) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009868 / 0.075646 (-0.065778) |
| read_formatted_as_numpy after write_nested_sequence | 0.254130 / 0.419271 (-0.165141) |
| read_unformated after write_array2d | 0.051485 / 0.043533 (0.007952) |
| read_unformated after write_flattened_sequence | 0.291542 / 0.255139 (0.036403) |
| read_unformated after write_nested_sequence | 0.314107 / 0.283200 (0.030907) |
| write_array2d | 0.110658 / 0.141683 (-0.031025) |
| write_flattened_sequence | 1.789752 / 1.452155 (0.337597) |
| write_nested_sequence | 1.875796 / 1.492716 (0.383080) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.348329 / 0.018006 (0.330323) |
| get_batch_of_1024_rows | 0.557975 / 0.000490 (0.557486) |
| get_first_row | 0.018654 / 0.000200 (0.018454) |
| get_last_row | 0.000266 / 0.000054 (0.000211) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.026384 / 0.037411 (-0.011027) |
| shard | 0.103194 / 0.014526 (0.088668) |
| shuffle | 0.113749 / 0.176557 (-0.062807) |
| sort | 0.154846 / 0.737135 (-0.582289) |
| train_test_split | 0.115176 / 0.296338 (-0.181163) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.410391 / 0.215209 (0.195182) |
| read 50000 | 4.094843 / 2.077655 (2.017188) |
| read_batch 50000 10 | 1.741282 / 1.504120 (0.237162) |
| read_batch 50000 100 | 1.530703 / 1.541195 (-0.010492) |
| read_batch 50000 1000 | 1.609429 / 1.468490 (0.140938) |
| read_formatted numpy 5000 | 0.437453 / 4.584777 (-4.147324) |
| read_formatted pandas 5000 | 4.668201 / 3.745712 (0.922489) |
| read_formatted tensorflow 5000 | 3.287047 / 5.269862 (-1.982814) |
| read_formatted torch 5000 | 0.939732 / 4.565676 (-3.625945) |
| read_formatted_batch numpy 5000 10 | 0.052849 / 0.424275 (-0.371426) |
| read_formatted_batch numpy 5000 1000 | 0.012426 / 0.007607 (0.004819) |
| shuffled read 5000 | 0.514697 / 0.226044 (0.288653) |
| shuffled read 50000 | 5.184383 / 2.268929 (2.915455) |
| shuffled read_batch 50000 10 | 2.186631 / 55.444624 (-53.257994) |
| shuffled read_batch 50000 100 | 1.810726 / 6.876477 (-5.065751) |
| shuffled read_batch 50000 1000 | 1.956149 / 2.142072 (-0.185923) |
| shuffled read_formatted numpy 5000 | 0.554990 / 4.805227 (-4.250237) |
| shuffled read_formatted_batch numpy 5000 10 | 0.122877 / 6.500664 (-6.377787) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.061318 / 0.075469 (-0.014151) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.619420 / 1.841788 (-0.222368) |
| map fast-tokenizer batched | 14.109523 / 8.074308 (6.035215) |
| map identity | 26.500540 / 10.191392 (16.309148) |
| map identity batched | 0.860438 / 0.680424 (0.180014) |
| map no-op batched | 0.538076 / 0.534201 (0.003875) |
| map no-op batched numpy | 0.489040 / 0.579283 (-0.090243) |
| map no-op batched pandas | 0.502381 / 0.434364 (0.068017) |
| map no-op batched pytorch | 0.318987 / 0.540337 (-0.221350) |
| map no-op batched tensorflow | 0.333405 / 1.386936 (-1.053531) |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.007888 / 0.011353 (-0.003465) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003971 / 0.011008 (-0.007037) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.029327 / 0.038508 (-0.009181) |
| read_batch_unformated after write_array2d | 0.033915 / 0.023109 (0.010806) |
| read_batch_unformated after write_flattened_sequence | 0.328841 / 0.275898 (0.052943) |
| read_batch_unformated after write_nested_sequence | 0.357993 / 0.323480 (0.034513) |
| read_col_formatted_as_numpy after write_array2d | 0.006108 / 0.007986 (-0.001878) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005008 / 0.004328 (0.000679) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007362 / 0.004250 (0.003112) |
| read_col_unformated after write_array2d | 0.038651 / 0.037052 (0.001598) |
| read_col_unformated after write_flattened_sequence | 0.324381 / 0.258489 (0.065892) |
| read_col_unformated after write_nested_sequence | 0.347236 / 0.293841 (0.053395) |
| read_formatted_as_numpy after write_array2d | 0.031130 / 0.128546 (-0.097416) |
| read_formatted_as_numpy after write_flattened_sequence | 0.009944 / 0.075646 (-0.065702) |
| read_formatted_as_numpy after write_nested_sequence | 0.252135 / 0.419271 (-0.167136) |
| read_unformated after write_array2d | 0.051235 / 0.043533 (0.007702) |
| read_unformated after write_flattened_sequence | 0.309243 / 0.255139 (0.054105) |
| read_unformated after write_nested_sequence | 0.338301 / 0.283200 (0.055101) |
| write_array2d | 0.093166 / 0.141683 (-0.048516) |
| write_flattened_sequence | 1.786131 / 1.452155 (0.333976) |
| write_nested_sequence | 1.845540 / 1.492716 (0.352823) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.350352 / 0.018006 (0.332346) |
| get_batch_of_1024_rows | 0.527314 / 0.000490 (0.526825) |
| get_first_row | 0.010947 / 0.000200 (0.010747) |
| get_last_row | 0.000254 / 0.000054 (0.000199) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.024850 / 0.037411 (-0.012561) |
| shard | 0.101977 / 0.014526 (0.087451) |
| shuffle | 0.112808 / 0.176557 (-0.063749) |
| sort | 0.160037 / 0.737135 (-0.577099) |
| train_test_split | 0.112395 / 0.296338 (-0.183944) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.420491 / 0.215209 (0.205282) |
| read 50000 | 4.205801 / 2.077655 (2.128147) |
| read_batch 50000 10 | 1.854658 / 1.504120 (0.350538) |
| read_batch 50000 100 | 1.687250 / 1.541195 (0.146055) |
| read_batch 50000 1000 | 1.767844 / 1.468490 (0.299354) |
| read_formatted numpy 5000 | 0.441332 / 4.584777 (-4.143445) |
| read_formatted pandas 5000 | 4.590075 / 3.745712 (0.844363) |
| read_formatted tensorflow 5000 | 2.135830 / 5.269862 (-3.134032) |
| read_formatted torch 5000 | 0.920934 / 4.565676 (-3.644743) |
| read_formatted_batch numpy 5000 10 | 0.053059 / 0.424275 (-0.371217) |
| read_formatted_batch numpy 5000 1000 | 0.011935 / 0.007607 (0.004328) |
| shuffled read 5000 | 0.529626 / 0.226044 (0.303582) |
| shuffled read 50000 | 5.291838 / 2.268929 (3.022909) |
| shuffled read_batch 50000 10 | 2.359930 / 55.444624 (-53.084694) |
| shuffled read_batch 50000 100 | 1.998791 / 6.876477 (-4.877686) |
| shuffled read_batch 50000 1000 | 2.115553 / 2.142072 (-0.026520) |
| shuffled read_formatted numpy 5000 | 0.557891 / 4.805227 (-4.247336) |
| shuffled read_formatted_batch numpy 5000 10 | 0.122289 / 6.500664 (-6.378375) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.060890 / 0.075469 (-0.014580) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.621852 / 1.841788 (-0.219936) |
| map fast-tokenizer batched | 13.814562 / 8.074308 (5.740253) |
| map identity | 26.389433 / 10.191392 (16.198041) |
| map identity batched | 0.864389 / 0.680424 (0.183965) |
| map no-op batched | 0.518357 / 0.534201 (-0.015844) |
| map no-op batched numpy | 0.492834 / 0.579283 (-0.086450) |
| map no-op batched pandas | 0.511540 / 0.434364 (0.077176) |
| map no-op batched pytorch | 0.323748 / 0.540337 (-0.216590) |
| map no-op batched tensorflow | 0.338331 / 1.386936 (-1.048605) |
