I wanted to preface this with: If I missed a contributor guideline or anything, please let me know. I did check other issues and did not see one relevant to this.
I am somewhat new to using rich-cli (but am familiar with rich) and recently attempted to parse a fairly large CSV file (~119 MB, 483k lines).
I did not expect the whole CSV to load quickly, but I was somewhat surprised that running --head and --tail took as long as they did. Obviously they won't behave like GNU tail / head, but I took a stab at a minimal / naive change and was able to get it much faster. The change is in the CSV row-loading code shown below.
If you want, I am happy to open a PR. I'll also just put a code block of what I did. I took the somewhat naive approach to file parsing (rather than reading the buffer stream line by line from the end, which would be more efficient for tail) to avoid making a huge change.
head just uses the existing generator to pull out at most N rows, filtering out the None values that appear once the file runs out before N rows are read. Since the row list gets iterated roughly twice, the second iteration (the one that adds indexes) is also much faster because the list is now short.
tail uses the collections.deque recipe with a maximum length, which still goes through the whole file but never holds more than the last N rows in memory.
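    from collections import deque

    rows = iter(reader)
    if has_header:
        header = next(rows)
        for column in header:
            table.add_column(column)
    if head is not None:
        # Pull at most `head` rows; next(rows, None) yields None once the
        # reader is exhausted, and filter(None, ...) drops those placeholders.
        table_rows = list(
            filter(
                None,
                (next(rows, None) for _ in range(head)),
            )
        )
    elif tail is not None:
        # A bounded deque keeps only the last `tail` rows in memory.
        table_rows = deque(rows, tail)
    else:
        table_rows = list(rows)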
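For what it's worth, the same selection could probably be written a bit more compactly with itertools.islice. This is only a sketch, not what rich-cli does today: the helper name select_rows is made up for illustration, and it assumes rows is any iterator of parsed CSV rows. One difference from the filter(None, ...) version is that filter(None, ...) also drops empty rows (csv.reader yields an empty list for a blank line), whereas islice keeps them.

```python
import csv
import io
from collections import deque
from itertools import islice


def select_rows(rows, head=None, tail=None):
    """Return the first `head` rows, the last `tail` rows, or all rows.

    Hypothetical helper for illustration; `rows` is any iterator of rows.
    """
    if head is not None:
        # islice stops after `head` items or when the iterator is exhausted,
        # so no None placeholders are produced and empty rows are preserved.
        return list(islice(rows, head))
    if tail is not None:
        # A bounded deque discards older rows as newer ones arrive, so only
        # the last `tail` rows are ever held in memory.
        return list(deque(rows, maxlen=tail))
    return list(rows)


# Small usage example with an in-memory CSV.
sample = "a,b\n1,2\n3,4\n5,6\n"
reader = csv.reader(io.StringIO(sample))
header = next(reader)
print(select_rows(reader, head=2))  # [['1', '2'], ['3', '4']]
```

Both branches still read rows lazily from the reader, so the head case stops after N rows and the tail case makes one pass over the file.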
These are naive benchmarks, but here is a comparison of the two (where the rich command is the installed CLI, and python3 ./src/rich_cli has my changes):
Head
└> time python3 ./src/rich_cli --head 500 large_csv.csv &> /dev/null [👾 3.10.5]➜
python3 ./src/rich_cli --head 500 large_csv.csv &> /dev/null 0.83s user 0.47s system 94% cpu 1.369 total
└> time rich --head 500 large_csv.csv &> /dev/null [👾 3.10.5]➜
rich --head 500 large_csv.csv &> /dev/null 2.81s user 0.60s system 99% cpu 3.443 total
Tail
└> time rich --tail 500 large_csv.csv &> /dev/null [👾 3.10.5]➜
rich --tail 500 large_csv.csv &> /dev/null 2.95s user 0.63s system 99% cpu 3.604 total
└> time python3 ./src/rich_cli --tail 500 large_csv.csv &> /dev/null [👾 3.10.5]➜
python3 ./src/rich_cli --tail 500 large_csv.csv &> /dev/null 1.93s user 0.53s system 96% cpu 2.545 total
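The tail timing still includes a full pass over the file. A rough sketch of the seek-from-the-end approach mentioned earlier might look like the following. This is not part of my change: the function names tail_lines and tail_csv_rows and the block_size parameter are made up for illustration, and it assumes that no quoted CSV field contains an embedded newline (otherwise splitting on line boundaries from the end is not safe). The caller would still need to read the header separately from the top of the file.

```python
import csv
import os
import tempfile


def tail_lines(path, n, block_size=64 * 1024):
    """Return the last `n` lines of a file as bytes, reading backwards in blocks.

    Assumes no record spans multiple physical lines.
    """
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        data = b""
        # Keep reading earlier blocks until more than `n` newlines are seen
        # (so the first of the last `n` lines is complete) or the start of
        # the file is reached.
        while position > 0 and data.count(b"\n") <= n:
            step = min(block_size, position)
            position -= step
            f.seek(position)
            data = f.read(step) + data
    return data.splitlines()[-n:]


def tail_csv_rows(path, n, encoding="utf-8"):
    """Parse the last `n` lines of a CSV file into rows."""
    lines = [line.decode(encoding) for line in tail_lines(path, n)]
    return list(csv.reader(lines))


# Usage example on a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("id,name\n" + "".join(f"{i},row{i}\n" for i in range(1000)))
print(tail_csv_rows(tmp.name, 3))  # [['997', 'row997'], ['998', 'row998'], ['999', 'row999']]
os.remove(tmp.name)
```

This reads only the last few blocks instead of every row, at the cost of the embedded-newline assumption above, which is why I did not go this route for the minimal change.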
Anyway, let me know if you want me to do anything here!