Commit

Add Returns to docstring
mariosasko committed Sep 8, 2022
1 parent add20c2 commit 0af7b2c
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions src/datasets/arrow_dataset.py
@@ -954,6 +954,9 @@ def from_generator(
features (:class:`Features`, optional): Dataset features.
gen_kwargs(:obj:`dict`, optional): Keyword arguments to be passed to the `generator` callable.
Returns:
:class:`Dataset`
Example:
```py

1 comment on commit 0af7b2c

@github-actions

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010158 / 0.011353 (-0.001195) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004680 / 0.011008 (-0.006328) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.034636 / 0.038508 (-0.003872) |
| read_batch_unformated after write_array2d | 0.039050 / 0.023109 (0.015941) |
| read_batch_unformated after write_flattened_sequence | 0.362976 / 0.275898 (0.087078) |
| read_batch_unformated after write_nested_sequence | 0.428579 / 0.323480 (0.105099) |
| read_col_formatted_as_numpy after write_array2d | 0.006879 / 0.007986 (-0.001107) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005296 / 0.004328 (0.000968) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007600 / 0.004250 (0.003350) |
| read_col_unformated after write_array2d | 0.049361 / 0.037052 (0.012308) |
| read_col_unformated after write_flattened_sequence | 0.379406 / 0.258489 (0.120917) |
| read_col_unformated after write_nested_sequence | 0.421302 / 0.293841 (0.127461) |
| read_formatted_as_numpy after write_array2d | 0.046039 / 0.128546 (-0.082507) |
| read_formatted_as_numpy after write_flattened_sequence | 0.013602 / 0.075646 (-0.062044) |
| read_formatted_as_numpy after write_nested_sequence | 0.309126 / 0.419271 (-0.110146) |
| read_unformated after write_array2d | 0.062652 / 0.043533 (0.019119) |
| read_unformated after write_flattened_sequence | 0.356206 / 0.255139 (0.101067) |
| read_unformated after write_nested_sequence | 0.402725 / 0.283200 (0.119525) |
| write_array2d | 0.108689 / 0.141683 (-0.032994) |
| write_flattened_sequence | 1.718005 / 1.452155 (0.265851) |
| write_nested_sequence | 1.782088 / 1.492716 (0.289372) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.253772 / 0.018006 (0.235766) |
| get_batch_of_1024_rows | 0.543209 / 0.000490 (0.542719) |
| get_first_row | 0.000990 / 0.000200 (0.000790) |
| get_last_row | 0.000125 / 0.000054 (0.000070) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.023884 / 0.037411 (-0.013527) |
| shard | 0.113472 / 0.014526 (0.098946) |
| shuffle | 0.127187 / 0.176557 (-0.049370) |
| sort | 0.173903 / 0.737135 (-0.563232) |
| train_test_split | 0.137874 / 0.296338 (-0.158465) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.564929 / 0.215209 (0.349720) |
| read 50000 | 5.563495 / 2.077655 (3.485840) |
| read_batch 50000 10 | 2.317450 / 1.504120 (0.813330) |
| read_batch 50000 100 | 1.986362 / 1.541195 (0.445167) |
| read_batch 50000 1000 | 2.033033 / 1.468490 (0.564543) |
| read_formatted numpy 5000 | 0.705249 / 4.584777 (-3.879528) |
| read_formatted pandas 5000 | 5.274099 / 3.745712 (1.528387) |
| read_formatted tensorflow 5000 | 2.718179 / 5.269862 (-2.551683) |
| read_formatted torch 5000 | 1.754918 / 4.565676 (-2.810759) |
| read_formatted_batch numpy 5000 10 | 0.079348 / 0.424275 (-0.344927) |
| read_formatted_batch numpy 5000 1000 | 0.012740 / 0.007607 (0.005133) |
| shuffled read 5000 | 0.709568 / 0.226044 (0.483523) |
| shuffled read 50000 | 7.014714 / 2.268929 (4.745785) |
| shuffled read_batch 50000 10 | 2.873503 / 55.444624 (-52.571121) |
| shuffled read_batch 50000 100 | 2.236166 / 6.876477 (-4.640310) |
| shuffled read_batch 50000 1000 | 2.276243 / 2.142072 (0.134171) |
| shuffled read_formatted numpy 5000 | 0.862797 / 4.805227 (-3.942430) |
| shuffled read_formatted_batch numpy 5000 10 | 0.176612 / 6.500664 (-6.324052) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.072429 / 0.075469 (-0.003040) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.784223 / 1.841788 (-0.057565) |
| map fast-tokenizer batched | 15.400803 / 8.074308 (7.326494) |
| map identity | 40.236512 / 10.191392 (30.045119) |
| map identity batched | 1.086415 / 0.680424 (0.405991) |
| map no-op batched | 0.666425 / 0.534201 (0.132225) |
| map no-op batched numpy | 0.467646 / 0.579283 (-0.111637) |
| map no-op batched pandas | 0.596657 / 0.434364 (0.162293) |
| map no-op batched pytorch | 0.352749 / 0.540337 (-0.187589) |
| map no-op batched tensorflow | 0.342851 / 1.386936 (-1.044085) |
PyArrow==latest
Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.007189 / 0.011353 (-0.004163) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004542 / 0.011008 (-0.006466) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.033188 / 0.038508 (-0.005320) |
| read_batch_unformated after write_array2d | 0.035613 / 0.023109 (0.012504) |
| read_batch_unformated after write_flattened_sequence | 0.391509 / 0.275898 (0.115611) |
| read_batch_unformated after write_nested_sequence | 0.465214 / 0.323480 (0.141734) |
| read_col_formatted_as_numpy after write_array2d | 0.005128 / 0.007986 (-0.002858) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004147 / 0.004328 (-0.000182) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.005841 / 0.004250 (0.001591) |
| read_col_unformated after write_array2d | 0.039401 / 0.037052 (0.002348) |
| read_col_unformated after write_flattened_sequence | 0.401389 / 0.258489 (0.142900) |
| read_col_unformated after write_nested_sequence | 0.450032 / 0.293841 (0.156191) |
| read_formatted_as_numpy after write_array2d | 0.045515 / 0.128546 (-0.083032) |
| read_formatted_as_numpy after write_flattened_sequence | 0.015889 / 0.075646 (-0.059757) |
| read_formatted_as_numpy after write_nested_sequence | 0.326221 / 0.419271 (-0.093051) |
| read_unformated after write_array2d | 0.065217 / 0.043533 (0.021684) |
| read_unformated after write_flattened_sequence | 0.428764 / 0.255139 (0.173625) |
| read_unformated after write_nested_sequence | 0.423847 / 0.283200 (0.140648) |
| write_array2d | 0.109218 / 0.141683 (-0.032465) |
| write_flattened_sequence | 1.611480 / 1.452155 (0.159325) |
| write_nested_sequence | 1.699154 / 1.492716 (0.206438) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.248130 / 0.018006 (0.230124) |
| get_batch_of_1024_rows | 0.526897 / 0.000490 (0.526407) |
| get_first_row | 0.009197 / 0.000200 (0.008998) |
| get_last_row | 0.000099 / 0.000054 (0.000045) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.022674 / 0.037411 (-0.014737) |
| shard | 0.101164 / 0.014526 (0.086639) |
| shuffle | 0.132638 / 0.176557 (-0.043919) |
| sort | 0.164286 / 0.737135 (-0.572850) |
| train_test_split | 0.130747 / 0.296338 (-0.165591) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.576544 / 0.215209 (0.361335) |
| read 50000 | 5.821690 / 2.077655 (3.744035) |
| read_batch 50000 10 | 2.545063 / 1.504120 (1.040943) |
| read_batch 50000 100 | 2.096013 / 1.541195 (0.554818) |
| read_batch 50000 1000 | 2.150575 / 1.468490 (0.682085) |
| read_formatted numpy 5000 | 0.706312 / 4.584777 (-3.878464) |
| read_formatted pandas 5000 | 5.319314 / 3.745712 (1.573602) |
| read_formatted tensorflow 5000 | 4.440959 / 5.269862 (-0.828902) |
| read_formatted torch 5000 | 2.617891 / 4.565676 (-1.947786) |
| read_formatted_batch numpy 5000 10 | 0.076898 / 0.424275 (-0.347377) |
| read_formatted_batch numpy 5000 1000 | 0.012043 / 0.007607 (0.004436) |
| shuffled read 5000 | 0.732763 / 0.226044 (0.506719) |
| shuffled read 50000 | 7.291433 / 2.268929 (5.022505) |
| shuffled read_batch 50000 10 | 3.271374 / 55.444624 (-52.173250) |
| shuffled read_batch 50000 100 | 2.562230 / 6.876477 (-4.314247) |
| shuffled read_batch 50000 1000 | 2.692172 / 2.142072 (0.550099) |
| shuffled read_formatted numpy 5000 | 0.855033 / 4.805227 (-3.950195) |
| shuffled read_formatted_batch numpy 5000 10 | 0.166966 / 6.500664 (-6.333698) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.071979 / 0.075469 (-0.003490) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.756062 / 1.841788 (-0.085726) |
| map fast-tokenizer batched | 15.266098 / 8.074308 (7.191790) |
| map identity | 38.919009 / 10.191392 (28.727617) |
| map identity batched | 1.168453 / 0.680424 (0.488029) |
| map no-op batched | 0.751568 / 0.534201 (0.217367) |
| map no-op batched numpy | 0.491083 / 0.579283 (-0.088200) |
| map no-op batched pandas | 0.581605 / 0.434364 (0.147241) |
| map no-op batched pytorch | 0.353618 / 0.540337 (-0.186720) |
| map no-op batched tensorflow | 0.378665 / 1.386936 (-1.008271) |
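
Each benchmark value above is reported as `new / old (diff)`. The following is a minimal sketch, assuming the diff is simply `new - old` (which agrees with the printed values up to rounding in the sixth decimal), of how a single cell could be parsed and checked. The helper name `parse_cell` and the `Cell` type are hypothetical and are not part of the benchmark tooling.

```python
import re
from typing import NamedTuple


class Cell(NamedTuple):
    """One benchmark cell: the new value, the old value, and the reported diff."""
    new: float
    old: float
    diff: float


# Matches text such as "1.756062 / 1.841788 (-0.085726)".
_CELL_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(text: str) -> Cell:
    """Parse a 'new / old (diff)' string into its three numbers (hypothetical helper)."""
    match = _CELL_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {text!r}")
    new, old, diff = (float(group) for group in match.groups())
    return Cell(new, old, diff)


# The "filter" cell from benchmark_map_filter.json (PyArrow==latest).
cell = parse_cell("1.756062 / 1.841788 (-0.085726)")

# Assumption: diff == new - old, within the rounding of the printed digits.
consistent = abs((cell.new - cell.old) - cell.diff) < 2e-6
print(cell, consistent)
```

Under that assumption, a negative diff means the new value is smaller than the old one. The tolerance of `2e-6` allows for each printed number having been rounded to six decimals independently.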
