remove tmp lines
lhoestq committed Oct 14, 2022
1 parent 30059ed commit 0600ef9
Showing 1 changed file with 0 additions and 2 deletions.
2 changes: 0 additions & 2 deletions tests/test_table.py
@@ -1137,8 +1137,6 @@ def test_table_iter(pa_table, batch_size, drop_last_batch):
     subtables = list(table_iter(pa_table, batch_size=batch_size, drop_last_batch=drop_last_batch))
     assert len(subtables) == num_batches
     if drop_last_batch:
-        if not all(len(subtable) == batch_size for subtable in subtables):
-            raise ArithmeticError([len(subtable) for subtable in subtables])
         assert all(len(subtable) == batch_size for subtable in subtables)
     else:
         assert all(len(subtable) == batch_size for subtable in subtables[:-1])
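
For readers unfamiliar with the behavior this test checks, here is a minimal sketch of batching with a `drop_last_batch` flag. It works on a plain Python list rather than a PyArrow table, and `iter_batches` and `expected_num_batches` are hypothetical names for illustration only; they are not the library's `table_iter`, and the way the real test computes `num_batches` is not shown in this diff, so the formula below is an assumption.

```python
import math


def iter_batches(rows, batch_size, drop_last_batch=False):
    """Yield consecutive slices of `rows` holding at most `batch_size` items.

    Hypothetical stand-in for illustration; not the library's `table_iter`.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        if drop_last_batch and len(batch) < batch_size:
            return  # the incomplete trailing batch is dropped
        yield batch


def expected_num_batches(num_rows, batch_size, drop_last_batch):
    # Assumed formula: floor division when the partial batch is dropped,
    # ceiling division otherwise.
    if drop_last_batch:
        return num_rows // batch_size
    return math.ceil(num_rows / batch_size)


rows = list(range(10))
for drop_last_batch in (False, True):
    batches = list(iter_batches(rows, batch_size=4, drop_last_batch=drop_last_batch))
    assert len(batches) == expected_num_batches(len(rows), 4, drop_last_batch)
    if drop_last_batch:
        # Every batch must be full once the partial one is dropped.
        assert all(len(batch) == 4 for batch in batches)
    else:
        # Only the last batch may be shorter.
        assert all(len(batch) == 4 for batch in batches[:-1])
```

The final loop mirrors the assertions kept by the commit: with `drop_last_batch` set, a single `assert all(...)` already covers what the removed `ArithmeticError` check was reporting.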

1 comment on commit 0600ef9

@github-actions

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.011105 | 0.011353 | -0.000248 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.006029 | 0.011008 | -0.004979 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.116818 | 0.038508 | 0.078310 |
| read_batch_unformated after write_array2d | 0.042870 | 0.023109 | 0.019761 |
| read_batch_unformated after write_flattened_sequence | 0.351826 | 0.275898 | 0.075928 |
| read_batch_unformated after write_nested_sequence | 0.426103 | 0.323480 | 0.102624 |
| read_col_formatted_as_numpy after write_array2d | 0.009409 | 0.007986 | 0.001424 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004664 | 0.004328 | 0.000336 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.087663 | 0.004250 | 0.083412 |
| read_col_unformated after write_array2d | 0.050847 | 0.037052 | 0.013794 |
| read_col_unformated after write_flattened_sequence | 0.360074 | 0.258489 | 0.101585 |
| read_col_unformated after write_nested_sequence | 0.390900 | 0.293841 | 0.097059 |
| read_formatted_as_numpy after write_array2d | 0.050161 | 0.128546 | -0.078386 |
| read_formatted_as_numpy after write_flattened_sequence | 0.017944 | 0.075646 | -0.057702 |
| read_formatted_as_numpy after write_nested_sequence | 0.405933 | 0.419271 | -0.013338 |
| read_unformated after write_array2d | 0.059608 | 0.043533 | 0.016075 |
| read_unformated after write_flattened_sequence | 0.351279 | 0.255139 | 0.096140 |
| read_unformated after write_nested_sequence | 0.383309 | 0.283200 | 0.100109 |
| write_array2d | 0.121810 | 0.141683 | -0.019873 |
| write_flattened_sequence | 1.755772 | 1.452155 | 0.303617 |
| write_nested_sequence | 1.784157 | 1.492716 | 0.291441 |
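
Each benchmark entry pairs a new measurement with a stored old one and reports their difference as `new / old (diff)`. The sketch below reproduces that arithmetic for one entry. `format_cell` and `relative_change` are hypothetical helper names, not part of the benchmark tooling, and reading a negative diff as "faster" assumes the values are timings, which the report does not state.

```python
def format_cell(new, old):
    """Format one entry as 'new / old (diff)', where diff = new - old."""
    return f"{new:.6f} / {old:.6f} ({new - old:.6f})"


def relative_change(new, old):
    """Return the change relative to the old value (assumes old != 0)."""
    return (new - old) / old


# First entry of benchmark_array_xd.json under PyArrow==6.0.0.
cell = format_cell(0.011105, 0.011353)
assert cell == "0.011105 / 0.011353 (-0.000248)"

# If the values are timings (an assumption), a negative change means
# the new run was faster than the stored baseline.
assert relative_change(0.011105, 0.011353) < 0
```

Recomputing a diff from the rounded values can differ from the reported one in the last digit (for example 0.009409 and 0.007986 give 0.001423, while the report shows 0.001424), presumably because the report subtracts before rounding.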

Benchmark: benchmark_getitem_100B.json

| metric | new | old | diff |
|---|---|---|---|
| get_batch_of_1024_random_rows | 0.014678 | 0.018006 | -0.003328 |
| get_batch_of_1024_rows | 0.520131 | 0.000490 | 0.519641 |
| get_first_row | 0.002878 | 0.000200 | 0.002678 |
| get_last_row | 0.000088 | 0.000054 | 0.000034 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.026046 | 0.037411 | -0.011366 |
| shard | 0.112058 | 0.014526 | 0.097532 |
| shuffle | 0.124778 | 0.176557 | -0.051778 |
| sort | 0.173532 | 0.737135 | -0.563603 |
| train_test_split | 0.133165 | 0.296338 | -0.163174 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.468009 | 0.215209 | 0.252800 |
| read 50000 | 4.681071 | 2.077655 | 2.603416 |
| read_batch 50000 10 | 2.125818 | 1.504120 | 0.621698 |
| read_batch 50000 100 | 1.920853 | 1.541195 | 0.379658 |
| read_batch 50000 1000 | 1.959476 | 1.468490 | 0.490986 |
| read_formatted numpy 5000 | 0.822643 | 4.584777 | -3.762134 |
| read_formatted pandas 5000 | 4.535246 | 3.745712 | 0.789533 |
| read_formatted tensorflow 5000 | 2.429239 | 5.269862 | -2.840623 |
| read_formatted torch 5000 | 1.618314 | 4.565676 | -2.947363 |
| read_formatted_batch numpy 5000 10 | 0.099496 | 0.424275 | -0.324779 |
| read_formatted_batch numpy 5000 1000 | 0.013946 | 0.007607 | 0.006339 |
| shuffled read 5000 | 0.593523 | 0.226044 | 0.367479 |
| shuffled read 50000 | 5.918952 | 2.268929 | 3.650023 |
| shuffled read_batch 50000 10 | 2.644030 | 55.444624 | -52.800594 |
| shuffled read_batch 50000 100 | 2.264166 | 6.876477 | -4.612310 |
| shuffled read_batch 50000 1000 | 2.423173 | 2.142072 | 0.281100 |
| shuffled read_formatted numpy 5000 | 1.027460 | 4.805227 | -3.777768 |
| shuffled read_formatted_batch numpy 5000 10 | 0.197666 | 6.500664 | -6.302998 |
| shuffled read_formatted_batch numpy 5000 1000 | 0.073508 | 0.075469 | -0.001961 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 1.795311 | 1.841788 | -0.046477 |
| map fast-tokenizer batched | 15.826208 | 8.074308 | 7.751900 |
| map identity | 28.257505 | 10.191392 | 18.066113 |
| map identity batched | 1.071618 | 0.680424 | 0.391195 |
| map no-op batched | 0.691302 | 0.534201 | 0.157101 |
| map no-op batched numpy | 0.512495 | 0.579283 | -0.066788 |
| map no-op batched pandas | 0.499386 | 0.434364 | 0.065022 |
| map no-op batched pytorch | 0.317789 | 0.540337 | -0.222549 |
| map no-op batched tensorflow | 0.329785 | 1.386936 | -1.057151 |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.008495 | 0.011353 | -0.002858 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005557 | 0.011008 | -0.005451 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.110483 | 0.038508 | 0.071975 |
| read_batch_unformated after write_array2d | 0.039028 | 0.023109 | 0.015919 |
| read_batch_unformated after write_flattened_sequence | 0.430067 | 0.275898 | 0.154169 |
| read_batch_unformated after write_nested_sequence | 0.483976 | 0.323480 | 0.160496 |
| read_col_formatted_as_numpy after write_array2d | 0.006484 | 0.007986 | -0.001501 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005604 | 0.004328 | 0.001276 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.082791 | 0.004250 | 0.078540 |
| read_col_unformated after write_array2d | 0.046221 | 0.037052 | 0.009169 |
| read_col_unformated after write_flattened_sequence | 0.421777 | 0.258489 | 0.163288 |
| read_col_unformated after write_nested_sequence | 0.488596 | 0.293841 | 0.194756 |
| read_formatted_as_numpy after write_array2d | 0.042661 | 0.128546 | -0.085885 |
| read_formatted_as_numpy after write_flattened_sequence | 0.014216 | 0.075646 | -0.061430 |
| read_formatted_as_numpy after write_nested_sequence | 0.390227 | 0.419271 | -0.029044 |
| read_unformated after write_array2d | 0.057141 | 0.043533 | 0.013608 |
| read_unformated after write_flattened_sequence | 0.441058 | 0.255139 | 0.185919 |
| read_unformated after write_nested_sequence | 0.466946 | 0.283200 | 0.183746 |
| write_array2d | 0.112466 | 0.141683 | -0.029217 |
| write_flattened_sequence | 1.715952 | 1.452155 | 0.263797 |
| write_nested_sequence | 1.809031 | 1.492716 | 0.316314 |

Benchmark: benchmark_getitem_100B.json

| metric | new | old | diff |
|---|---|---|---|
| get_batch_of_1024_random_rows | 0.242313 | 0.018006 | 0.224307 |
| get_batch_of_1024_rows | 0.490172 | 0.000490 | 0.489683 |
| get_first_row | 0.001266 | 0.000200 | 0.001066 |
| get_last_row | 0.000098 | 0.000054 | 0.000043 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.027107 | 0.037411 | -0.010304 |
| shard | 0.116589 | 0.014526 | 0.102063 |
| shuffle | 0.121985 | 0.176557 | -0.054572 |
| sort | 0.177577 | 0.737135 | -0.559558 |
| train_test_split | 0.128697 | 0.296338 | -0.167642 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.524268 | 0.215209 | 0.309059 |
| read 50000 | 5.168490 | 2.077655 | 3.090835 |
| read_batch 50000 10 | 2.591592 | 1.504120 | 1.087472 |
| read_batch 50000 100 | 2.344249 | 1.541195 | 0.803054 |
| read_batch 50000 1000 | 2.459630 | 1.468490 | 0.991140 |
| read_formatted numpy 5000 | 0.827247 | 4.584777 | -3.757530 |
| read_formatted pandas 5000 | 4.396194 | 3.745712 | 0.650482 |
| read_formatted tensorflow 5000 | 3.934257 | 5.269862 | -1.335604 |
| read_formatted torch 5000 | 2.064447 | 4.565676 | -2.501230 |
| read_formatted_batch numpy 5000 10 | 0.098117 | 0.424275 | -0.326158 |
| read_formatted_batch numpy 5000 1000 | 0.014166 | 0.007607 | 0.006559 |
| shuffled read 5000 | 0.628293 | 0.226044 | 0.402248 |
| shuffled read 50000 | 6.236363 | 2.268929 | 3.967435 |
| shuffled read_batch 50000 10 | 3.146625 | 55.444624 | -52.297999 |
| shuffled read_batch 50000 100 | 2.782781 | 6.876477 | -4.093696 |
| shuffled read_batch 50000 1000 | 2.957148 | 2.142072 | 0.815076 |
| shuffled read_formatted numpy 5000 | 0.993627 | 4.805227 | -3.811600 |
| shuffled read_formatted_batch numpy 5000 10 | 0.195482 | 6.500664 | -6.305182 |
| shuffled read_formatted_batch numpy 5000 1000 | 0.073061 | 0.075469 | -0.002408 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 1.814215 | 1.841788 | -0.027572 |
| map fast-tokenizer batched | 15.964057 | 8.074308 | 7.889749 |
| map identity | 13.565457 | 10.191392 | 3.374065 |
| map identity batched | 1.043885 | 0.680424 | 0.363461 |
| map no-op batched | 0.693415 | 0.534201 | 0.159214 |
| map no-op batched numpy | 0.490421 | 0.579283 | -0.088862 |
| map no-op batched pandas | 0.481954 | 0.434364 | 0.047590 |
| map no-op batched pytorch | 0.292210 | 0.540337 | -0.248127 |
| map no-op batched tensorflow | 0.297582 | 1.386936 | -1.089354 |
