Improve performance of transpose_list() #396

Open (3 commits into base: master)

Commits on Jul 14, 2022

  1. Improve performance of transpose_list()

    This significantly improves the runtime when loading data with many
    columns. Reordering the loop nesting and using a much more efficient
    binary search do the trick.
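
    As a rough illustration of the idea, here is a sketch in plain R. It
    is not the jsonlite implementation, which this page does not show.
    It assumes one possible arrangement: the requested columns drive the
    outer loop, and each lookup is a binary search over a record's sorted
    names. The helpers `find_name` and `transpose_sketch` are hypothetical.

    ```r
    # Hypothetical helper: binary search for `key` in a sorted character
    # vector. Returns the index of `key`, or NA_integer_ if it is absent.
    find_name <- function(sorted_names, key) {
      lo <- 1L
      hi <- length(sorted_names)
      while (lo <= hi) {
        mid <- (lo + hi) %/% 2L
        if (sorted_names[mid] == key) {
          return(mid)
        } else if (sorted_names[mid] < key) {
          lo <- mid + 1L
        } else {
          hi <- mid - 1L
        }
      }
      NA_integer_
    }

    # Hypothetical sketch: turn a list of records into a list of columns.
    # Missing values are represented as NULL.
    transpose_sketch <- function(records, columns) {
      # Sort each record by name once, so every lookup can bisect.
      sorted <- lapply(records, function(rec) rec[order(names(rec))])
      # Outer loop over columns, inner loop over records.
      out <- lapply(columns, function(col) {
        lapply(sorted, function(rec) {
          i <- find_name(names(rec), col)
          if (is.na(i)) NULL else rec[[i]]
        })
      })
      names(out) <- columns
      out
    }

    records <- list(list(a = 1, b = 2), list(b = 3, c = 4))
    res <- transpose_sketch(records, c("a", "b", "c"))
    ```

    Sorting with `order()` and comparing with `<` both use the session's
    collation, so the search stays consistent with the sort order.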
    
    In a real-world example, fetching ~300k rows with ~50 columns from
    MongoDB, this brings the query + load time from 70 seconds to ~40.
    
    Microbenchmark with synthetic data on an AMD 5950X, 128GB RAM, Fedora
    Linux 36, R 4.1.3, jsonlite 1.8.0.9000 (commit 8085435):
    
    ```
    > set.seed(1)
    > rows <- 10000
    > columns <- 100
    > p_missing <- 0.2
    >
    > recordlist <- lapply(1:rows, function(rownum) {
    +   row <- as.list(1:columns)
    +   names(row) <- paste0("col_", row)
    +   row[runif(columns) > p_missing]
    + })
    > columns <- unique(unlist(lapply(recordlist, names), recursive = FALSE,
    +                          use.names = FALSE))
    ```
    
    Before this change
    
    ```
    > microbenchmark::microbenchmark(
    +     jsonlite:::transpose_list(recordlist, columns),
    +     times = 10
    + )
    Unit: milliseconds
                                               expr      min       lq     mean   median       uq      max neval
     jsonlite:::transpose_list(recordlist, columns) 577.8338 589.4064 593.0518 591.6895 599.4221 607.3057    10
    ```
    
    With this change
    
    ```
    > microbenchmark::microbenchmark(
    +     jsonlite:::transpose_list(recordlist, columns),
    +     times = 10
    + )
    Unit: milliseconds
                                               expr      min       lq     mean   median       uq      max neval
     jsonlite:::transpose_list(recordlist, columns) 41.37537 43.22655 43.88987 43.76705 45.43552 46.81052    10
    ```
    halhen committed Jul 14, 2022
    02e3409

Commits on Jul 17, 2022

  1. Protect from edge case infinite loop

    If the data contains a name that sorts before the smallest requested
    name, the previous code would end up in an infinite loop.
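
    The following plain R sketch shows one way to make this kind of scan
    terminate. It is not the code from this commit, which this page does
    not show. It assumes both name vectors are sorted and free of
    duplicates; `match_sorted_sketch` is a hypothetical name.

    ```r
    # Hypothetical sketch: walk two sorted name vectors in step.
    # Every iteration advances at least one index, so the loop ends even
    # when a data name sorts before the smallest requested name.
    match_sorted_sketch <- function(data_names, requested) {
      hits <- rep(NA_integer_, length(requested))
      i <- 1L  # position in data_names
      j <- 1L  # position in requested
      while (i <= length(data_names) && j <= length(requested)) {
        if (data_names[i] == requested[j]) {
          hits[j] <- i
          i <- i + 1L
          j <- j + 1L
        } else if (data_names[i] < requested[j]) {
          i <- i + 1L  # data name was not requested: skip it
        } else {
          j <- j + 1L  # requested name is absent from the data
        }
      }
      hits
    }

    # "a" sorts before the smallest requested name, "b".
    hits <- match_sorted_sketch(c("a", "b", "d"), c("b", "c", "d"))
    ```

    Unrequested names are skipped rather than retried, which is what
    keeps the loop from stalling on them.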
    halhen committed Jul 17, 2022
    8b467f6
  2. Fix typo

    halhen committed Jul 17, 2022
    5a9331e