Add upload statistics to report #3615

Merged: 5 commits, Apr 9, 2024

@dbutenhof dbutenhof commented Apr 5, 2024

Present upload counts in various buckets, including "this year", "this month", "this week", "today" along with per year/month/day-of-month/day-of-week/hour-of-day just for fun.

Upload report:
  30,086 uploads this year (2024)
  62 uploads this month (April 2024)
  62 uploads this week (March 30 to April 06)
  9 uploads today (06 April 2024)
 Total uploads by year:
    2023:  100,497    2024:   30,086
 Total uploads by month of year:
    Jan:   17,084    Feb:   12,586    Mar:      354    Apr:       62
    Dec:  100,497
 Total uploads by day of month:
    01:       14    02:       68    03:       63    04:       11
    05:    3,848    06:    3,476    07:      758    08:   10,851
    09:    5,110    10:   21,787    11:    5,450    12:      192
    13:    8,351    14:   20,960    15:    3,714    16:    7,715
    17:    3,551    18:   10,381    19:       66    20:       47
    21:       96    22:    1,620    23:   11,200    24:   10,888
    25:       34    26:       58    27:       83    28:       50
    29:       51    30:       57    31:       33
 Total uploads by day of week:
    Mon:   19,395    Tue:      512    Wed:    8,590    Thu:   21,737
    Fri:   11,912    Sat:   32,570    Sun:   35,867
 Total uploads by hour of day:
    00:    7,747    01:    8,691    02:    6,794    03:    5,600
    04:    3,248    05:    4,591    06:    3,156    07:    3,051
    08:    2,998    09:    2,565    10:    1,635    11:    1,793
    12:    1,586    13:    3,337    14:    9,135    15:    5,347
    16:    5,740    17:    6,647    18:    9,018    19:    5,731
    20:    7,637    21:   10,244    22:    6,972    23:    7,320
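
A minimal sketch of how buckets like the ones above can be accumulated. The helper name `bucket_counts` and the sample timestamps are assumptions for illustration, not the code in this PR. Note that the month, day-of-month, day-of-week, and hour buckets aggregate across the whole history (e.g., "hour 14 of any day"), not over a recent window.

```python
from collections import Counter
from datetime import datetime


def bucket_counts(stamps):
    """Count timestamps per year, month, day of month, weekday, and hour.

    Hypothetical helper: every bucket except "year" folds all years together.
    """
    buckets = {
        "year": Counter(),
        "month": Counter(),
        "day_of_month": Counter(),
        "day_of_week": Counter(),
        "hour": Counter(),
    }
    for ts in stamps:
        buckets["year"][ts.year] += 1
        buckets["month"][ts.month] += 1            # 1 = Jan ... 12 = Dec
        buckets["day_of_month"][ts.day] += 1       # 1 ... 31
        buckets["day_of_week"][ts.weekday()] += 1  # 0 = Mon ... 6 = Sun
        buckets["hour"][ts.hour] += 1              # 0 ... 23
    return buckets


# Made-up sample data, not taken from the report above.
stamps = [
    datetime(2023, 12, 10, 14, 5),
    datetime(2024, 1, 10, 21, 30),
    datetime(2024, 4, 6, 14, 0),
]
counts = bucket_counts(stamps)
```

Here two uploads from different years land in the same day-of-month bucket (10) and two land in the same hour bucket (14), which is the "any day, any year" behavior the bucket headers describe.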

@webbnh webbnh self-requested a review April 5, 2024 21:44
webbnh previously approved these changes Apr 5, 2024

@webbnh webbnh left a comment

Looks great (although, the headers for the buckets could be more descriptive -- I fell into the trap of thinking that the "by day" buckets were for each of the last 31 days and that the "by hour" were for the last 24 hours, as opposed to, e.g., "the second hour of any day"...unfortunately, I don't have any good suggestions off the top of my head).

I do have a pointed question and a small 1+ for your consideration.

Comment on lines 368 to 370
click.echo(f" {this_month:,d} uploads this month ({month:%B, %Y})")
click.echo(f" {this_week:,d} uploads this week ({week:%B %d} to {day:%B %d})")
click.echo(f" {this_day:,d} uploads today ({day:%Y-%m-%d})")
Member

Are you sure you want to use %Y-%m-%d for "today" after using %B in the previous two lines? (And, do you really want the comma in %B, %Y?) I would opt for %d %B %Y.

Member Author

I did bounce back and forth on the exact format for several. Basically I was doing this on half the burners in between bouts of wrestling with Horreum and Kiota.
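
For comparison, a small sketch of how the three format strings from this thread render for one date. The variable names are assumptions for illustration; `%B` is locale-dependent, so the month name assumes Python's default C locale.

```python
from datetime import datetime

day = datetime(2024, 4, 6)

with_comma = f"{day:%B, %Y}"   # month name, comma, year (as in the diff)
iso_style = f"{day:%Y-%m-%d}"  # numeric year-month-day (as in the diff)
suggested = f"{day:%d %B %Y}"  # day, month name, year (the reviewer's suggestion)

for rendered in (with_comma, iso_style, suggested):
    print(rendered)
```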

Comment on lines 311 to 317
def columnize(items: dict[str, Any], width: int = 80):
    line = ""
    for item, count in sorted(items.items()):
        if len(line) >= width:
            click.echo(line)
            line = ""
        line += f"    {item:4d}: {count:>8,d}"
Member

Since the code adds to the line in chunks of 18 characters (I think...), the test at line 314 only fires once the line has already reached the width, so it could produce a line which is "too long" (i.e., 90 characters with the default width of 80). [I don't know why this doesn't happen in the sample output in the description...did you specify a smaller width for that?]

I would recommend something like this:

    line = ""
    for item, count in sorted(items.items()):
        addition = f"    {item:4d}: {count:>8,d}"
        if len(line) + len(addition) >= width:
            click.echo(line)
            line = ""
        line += addition

or

    line = ""
    for item, count in sorted(items.items()):
        addition = f"    {item:4d}: {count:>8,d}"
        if len(line) + len(addition) >= width:
            click.echo(line)
            line = addition
        else:
            line += addition

Member Author

Yeah, I threw in the columnization at the last minute because the lists were rather long: you're right, it's not entirely accurate as is, but I didn't spend too many cycles worrying about it. However, I guess it's worth cleaning up.

(And, yeah, I think I generated the sample output with --width 70 to make sure GitHub didn't wrap it. 😆 )
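
A self-contained sketch of the wrapping fix discussed in this thread, under a few assumptions: the name `columnize_lines` is hypothetical, it yields lines instead of calling `click.echo` so it runs without click, it formats labels with `!s` so string labels would also work, and it uses `>` rather than `>=` so a line exactly `width` characters long is allowed. It is not the code that was merged.

```python
def columnize_lines(items, width=80):
    """Yield lines of "label: count" cells, wrapping before a cell would overflow.

    Hypothetical variant of the columnize() under review. A line only exceeds
    `width` if a single cell is itself wider than `width`.
    """
    line = ""
    for label, count in sorted(items.items()):
        cell = f"    {label!s:>4}: {count:>8,d}"  # 18 characters for short labels
        # Check the length *including* the new cell, and never emit an empty line.
        if line and len(line) + len(cell) > width:
            yield line
            line = ""
        line += cell
    if line:
        yield line  # flush the final partial line


# Made-up counts keyed by hour, for illustration only.
sample = {hour: hour * 100 for hour in range(6)}
lines = list(columnize_lines(sample, width=80))
for text in lines:
    print(text)
```

With 18-character cells and the default width of 80, this wraps after four cells (72 characters) rather than after five (90 characters).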

Member Author

@dbutenhof dbutenhof left a comment

I realized after pushing the PR that I'd meant to also touch up the feedback (and add a "show" indexing state) on reindex after our ops review experience, so this is due another pass anyway this weekend... and I'd've maybe felt slightly bad about that if you'd approved without comments.

@dbutenhof
Member Author

On a whim, I split the dataset history statistics to be able to show by dataset.metalog.pbench.date rather than just by the primary internal dataset.uploaded:

Dataset statistics by creation date:
  1,637 this year (2024)
  71 this month (April 2024)
  71 this week (March 31 to April 07)
  9 today (07 April 2024)
 Total by year:
    2019:    8,055    2020:   42,817    2021:   21,328    2022:   32,523
    2023:   23,146    2024:    1,637
 Total by month of year:
    Jan:   12,659    Feb:    7,472    Mar:    7,208    Apr:    8,624
    May:    7,511    Jun:    6,927    Jul:    6,254    Aug:    8,777
    Sep:   11,910    Oct:   14,346    Nov:   11,988    Dec:   25,830
 Total by day of month:
    01:    4,623    02:    4,214    03:    4,634    04:    4,493
    05:    4,265    06:    3,295    07:    2,906    08:    3,357
    09:    3,891    10:    2,646    11:    2,960    12:    3,617
    13:    4,608    14:    3,638    15:    2,713    16:    4,505
    17:    3,556    18:    4,349    19:    3,014    20:    4,235
    21:    4,944    22:    4,912    23:    5,792    24:    5,629
    25:    5,175    26:    5,271    27:    4,660    28:    4,781
    29:    4,468    30:    5,208    31:    3,147
 Total by day of week:
    Mon:   19,896    Tue:   19,504    Wed:   24,110    Thu:   18,946
    Fri:   23,197    Sat:   13,790    Sun:   10,063
 Total by hour of day:
    00:    5,355    01:    4,537    02:    5,607    03:    5,975
    04:    6,499    05:    5,306    06:    3,844    07:    4,076
    08:    4,269    09:    5,357    10:    5,481    11:    6,141
    12:    6,411    13:    5,991    14:    5,391    15:    6,015
    16:    5,611    17:    5,531    18:    5,290    19:    5,664
    20:    5,793    21:    5,694    22:    4,665    23:    5,003
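
A rough sketch of computing the "this year / this month / this week / today" counts against a selectable timestamp. Everything here is an assumption for illustration: the `Dataset` dataclass is a stand-in (its `created` field plays the role of `dataset.metalog.pbench.date`), `window_counts` is a hypothetical helper, timestamps are naive datetimes, and "this week" is taken to mean the trailing 7 days, which is what the date ranges in the sample output suggest.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset record with two timestamps."""

    uploaded: datetime
    created: datetime


def window_counts(datasets, field, now):
    """Count datasets whose `field` timestamp falls in each window ending at `now`."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    starts = {
        "this year": today.replace(month=1, day=1),
        "this month": today.replace(day=1),
        "this week": today - timedelta(days=7),  # assumption: trailing 7 days
        "today": today,
    }
    counts = dict.fromkeys(starts, 0)
    for ds in datasets:
        stamp = getattr(ds, field)  # pick which timestamp drives the report
        for name, start in starts.items():
            if start <= stamp <= now:
                counts[name] += 1
    return counts


# Made-up records, for illustration only.
now = datetime(2024, 4, 7, 12, 0)
datasets = [
    Dataset(uploaded=datetime(2024, 4, 7, 9, 0), created=datetime(2019, 6, 1, 8, 0)),
    Dataset(uploaded=datetime(2024, 4, 2, 9, 0), created=datetime(2024, 4, 2, 8, 0)),
    Dataset(uploaded=datetime(2024, 1, 15, 9, 0), created=datetime(2023, 12, 31, 23, 0)),
]
by_upload = window_counts(datasets, "uploaded", now)
by_creation = window_counts(datasets, "created", now)
```

The same records give different counts depending on the field, which is the distinction between the upload report and the creation-date report shown above.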

Member

@webbnh webbnh left a comment

Looks excellent.

@dbutenhof dbutenhof merged commit 9c898cb into distributed-system-analysis:main Apr 9, 2024
4 checks passed
@dbutenhof dbutenhof deleted the uploads branch April 9, 2024 18:22