Releases: pola-rs/r-polars

v0.16.4

08 May 15:41

New features

  • pl$read_ipc() can now read a raw vector in Apache Arrow IPC file format (#1072).
  • New method <DataFrame>$to_raw_ipc() to serialize a DataFrame to a raw vector in Apache Arrow IPC file format (#1072).
  • New method <LazyFrame>$serialize() to serialize a LazyFrame to a character vector containing its JSON representation (#1073).
  • New function pl$deserialize_lf() to deserialize a LazyFrame from a character vector containing its JSON representation (#1073).
  • New methods $str$head() and $str$tail() (#1074).
  • New S3 methods nanoarrow::as_nanoarrow_array_stream() and nanoarrow::infer_nanoarrow_schema() for RPolarsSeries (#1076).
  • New method $dt$is_leap_year() (#1077).
  • as_polars_df() and as_polars_series() now support arrow::RecordBatchReader (#1078).
  • New argument experimental for as_polars_df(<ArrowTabular>), as_polars_df(<RecordBatchReader>), as_polars_series(<nanoarrow_array_stream>), and as_polars_df(<nanoarrow_array_stream>) (#1078).
    If experimental = TRUE, these functions use the Arrow C stream interface internally.
    For now, this path performs worse than the existing one in the expected use cases, so the default is experimental = FALSE.
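
The IPC and JSON round trips described above can be sketched as follows. This is a minimal illustration, assuming the polars R package at 0.16.4 is installed; the data and the column names a and b are made up for the example.

```r
library(polars)

# Hypothetical example data
df = pl$DataFrame(a = 1:3, b = c("x", "y", "z"))

# Serialize the DataFrame to a raw vector in Arrow IPC file format
raw_ipc = df$to_raw_ipc()

# pl$read_ipc() accepts the raw vector directly
df2 = pl$read_ipc(raw_ipc)

# Serialize a LazyFrame (the query plan) to JSON and restore it
lf = df$lazy()$filter(pl$col("a") > 1)
json = lf$serialize()
lf2 = pl$deserialize_lf(json)
res = lf2$collect()
```

Here df2 should hold the same data as df, and res should contain the rows of df where a is greater than 1.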

Full Changelog: v0.16.3...v0.16.4

lib-v0.39.3

08 May 14:31
b42ee0a
Pre-release
feat: import_stream internal method for Series to support Arrow C stream interface (#1078)

v0.16.3

03 May 05:31

New features

  • New method <SQLContext>$register_globals() (#1064).
  • New experimental method $sql() for DataFrame and LazyFrame (#1065).
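
As a rough sketch of the experimental $sql() method: the example below assumes the polars R package at 0.16.3 or later, built with SQL support, and assumes that the calling frame is available in the query under the name self. The data is made up for illustration.

```r
library(polars)

# Hypothetical example data
df = pl$DataFrame(a = 1:3, b = c("x", "y", "z"))

# Query the calling DataFrame; "self" is assumed to refer to df
res = df$sql("SELECT a, b FROM self WHERE a > 1")
```

Because the method is experimental, its arguments and behavior may change in later releases.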

Miscellaneous

  • The API documentation website has moved to a new location (#1067, #1068).
    The old website now redirects to the top page of the new website.
    • Old URL: https://rpolars.github.io/
    • New URL: https://pola-rs.github.io/r-polars/

Full Changelog: v0.16.2...v0.16.3

v0.16.2

27 Apr 05:57

New features

  • $cut() and $qcut() to bin continuous values into discrete categories (#1057).
  • pl$scan_parquet() and pl$read_parquet() can read data from the internet when a URL is passed as the first argument (#1056, @andyquinterom).
  • pl$scan_parquet() and pl$read_parquet() gain an argument storage_options to scan/read data via cloud storage providers (GCP, AWS, Azure). Note that this support is experimental (#1056, @andyquinterom).
  • Add support for the Enum datatype via pl$Enum() (#1061).
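
A minimal sketch of binning with $cut(), assuming the polars R package at 0.16.2 or later and assuming that breaks and labels are the argument names; the data and the labels are made up for illustration.

```r
library(polars)

# Hypothetical example data
df = pl$DataFrame(x = c(1, 3, 5))

# Two breaks define three bins, so three labels are supplied
binned = df$with_columns(
  bin = pl$col("x")$cut(breaks = c(2, 4), labels = c("low", "mid", "high"))
)
```

$qcut() works along the same lines but takes quantiles rather than fixed break points.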

Bug fixes

  • In some read/scan functions, downloading files could fail if the URL was too long. This is now fixed (#1049, @DyfanJones).

Full Changelog: v0.16.1...v0.16.2

lib-v0.39.2

27 Apr 04:27
4a81740
Pre-release
ci: exclude R devel on windows from binary library check step (#1062)

v0.16.1

16 Apr 13:21

This is a small hotfix release that updates the Rust polars dependency to 0.39.1 (#1042).

It also includes the following changes.

Bug fixes

  • $len() now correctly includes null values in the count (#1044).

Other improvements

  • $arr$max() and $arr$min() work without the nightly feature (#1042).

Full Changelog: v0.16.0...v0.16.1

lib-v0.39.1

16 Apr 11:47
bb234ed
Pre-release
fix: `$len()` should also count `null` values (#1044)

v0.16.0

15 Apr 14:08

Breaking changes

  • Rust polars is updated to 0.39.0 (#937, #1034).

  • R objects inside an R list are now converted to Polars data types via
    as_polars_series() (#1021, #1022, #1023). For example, up to polars 0.15.1,
    a list containing a data.frame with a column of {clock} naive-time class
    was converted to a nested List type of Float64:

    data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
    pl$select(
      nested_data = pl$lit(list(data))
    )
    #> shape: (1, 1)
    #> ┌──────────────────────────┐
    #> │ nested_data              │
    #> │ ---                      │
    #> │ list[list[list[f64]]]    │
    #> ╞══════════════════════════╡
    #> │ [[[2.1475e9], [7305.0]]] │
    #> └──────────────────────────┘

    From 0.16.0, nested types are converted correctly, so the result is
    a List type of Struct type containing a Datetime type.

    data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
    pl$select(
      nested_data = pl$lit(list(data))
    )
    #> shape: (1, 1)
    #> ┌─────────────────────────┐
    #> │ nested_data             │
    #> │ ---                     │
    #> │ list[struct[1]]         │
    #> ╞═════════════════════════╡
    #> │ [{1990-01-01 00:00:00}] │
    #> └─────────────────────────┘
  • Several functions have been rewritten to match the behavior of Python Polars.
    There are four types of changes: i) change in argument names, ii) change in
    the way arguments are passed (named or by position), iii) arguments are removed,
    and iv) change in the default and accepted values. Those are addressed separately
    below.

    1. Change in argument names:

      • In $reshape(), the dims argument is renamed to dimensions (#1019).
      • In pl$read_* and pl$scan_* functions, the first argument is now
        source (#935).
      • In pl$Series(), the argument x is renamed to values (#933).
      • In <DataFrame>$write_* functions, the first argument is now file (#935).
      • In <LazyFrame>$sink_* functions, the first argument is now path (#935).
      • In <LazyFrame>$sink_ipc(), the argument memmap is renamed to memory_map (#1032).
      • In <DataFrame>$rolling(), <LazyFrame>$rolling(), <DataFrame>$group_by_dynamic()
        and <LazyFrame>$group_by_dynamic(), the by argument is renamed to
        group_by (#983).
      • In $dt$convert_time_zone() and $dt$replace_time_zone(), the tz
        argument is renamed to time_zone (#944).
      • In $str$strptime(), the argument datatype is renamed to dtype (#939).
      • In $str$to_integer() (renamed from $str$parse_int()), argument radix is
        renamed to base (#1038).
    2. Change in the way arguments are passed:

      • In all input/output functions, all arguments except the first argument
        must be named arguments (#935).

      • In <DataFrame>$rolling() and <DataFrame>$group_by_dynamic(), all
        arguments except index_column must be named arguments (#983).

      • In $unique() for DataFrame and LazyFrame, arguments keep and
        maintain_order must be named (#953).

      • In $bin$decode(), the strict argument must be a named argument (#980).

      • In $dt$replace_time_zone(), all arguments except time_zone must be named
        arguments (#944).

      • In $str$contains(), the arguments literal and strict must be named
        (#982).

      • In $str$contains_any(), the ascii_case_insensitive argument must be
        named (#986).

      • In $str$count_matches(), $str$replace() and $str$replace_all(),
        the literal argument must be named (#987).

      • In $str$strptime(), $str$to_date(), $str$to_datetime(), and
        $str$to_time(), all arguments (except the first one) must be named (#939).

      • In $str$to_integer() (renamed from $str$parse_int()), all arguments
        must be named (#1038).

      • In pl$date_range(), the arguments closed, time_unit, and time_zone
        must be named (#950).

      • In $set_sorted() and $sort_by(), argument descending must be named
        (#1034).

      • In pl$Series(), using positional arguments throws a warning, since the
        argument positions will be changed in the future (#966).

        # polars 0.15.1 or earlier
        # The first argument is `x`, the second argument is `name`.
        pl$Series(1:3, "foo")
        
        # The code above will warn in 0.16.0
        # Use named arguments to silence the warning.
        pl$Series(values = 1:3, name = "foo")
        pl$Series(name = "foo", values = 1:3)
        
        # polars 0.17.0 or later (future version)
        # The first argument is `name`, the second argument is `values`.
        pl$Series("foo", 1:3)

        This warning can also be silenced by replacing pl$Series(<values>, <name>)
        by as_polars_series(<values>, <name>).

    3. Arguments removed:

      • The argument columns in $drop() is removed. $drop() now accepts
        several character scalars, such as $drop("a", "b", "c") (#912).
      • In pl$col(), the name argument is removed, and the ... argument no
        longer accepts a list of characters and RPolarsSeries class objects (#923).
      • In pl$date_range(), the unused argument explode (not working in
        recent versions) is removed (#950).
    4. Change in arguments default and accepted values:

      • In pl$Series(), the argument values has a new default value NULL
        (#966).
      • In $unique() for DataFrame and LazyFrame, argument keep has a new
        default value "any" (#953).
      • In rolling aggregation functions (such as $rolling_mean()), the default
        value of argument closed is now NULL. Using closed with a fixed
        window_size now throws an error (#937).
      • In pl$date_range(), the argument end must be specified and the default
        value of interval is changed to "1d". The arguments start and end
        no longer accept numeric values (#950).
      • In pl$scan_parquet(), the default value of the argument rechunk is
        changed from TRUE to FALSE (#1033).
      • In pl$scan_parquet() and pl$read_parquet(), the argument parallel
        only accepts "auto", "columns", "row_groups", and "none".
        Previously, it also accepted upper-case notation of "auto", "columns",
        "none", and "RowGroups" instead of "row_groups" (#1033).
      • In $str$to_integer() (renamed from $str$parse_int()), the default
        value of base is changed from 2 to 10 (#1038).
  • The usage of pl$date_range() to create a range of Datetime data type is
    deprecated. pl$date_range() will always create a range of Date data type
    in the future. Use pl$datetime_range() if you want to create a range of
    Datetime instead (#950).

  • <DataFrame>$get_columns() now returns an unnamed list instead of a named
    list (#991).

  • Removed $argsort() which was an old alias for $arg_sort() (#930).

  • Removed pl$expr_to_r() which was an alias for $to_r() (#938).

  • <Series>$to_r_list() is renamed to <Series>$to_list() (#938).

  • Removed <Series>$to_r_vector() which was an old alias for
    <Series>$to_vector() (#938).

  • Removed <Expr>$rep_extend(), which was an experimental method created at the
    early stage of this package and does not exist in other language APIs (#1028).

  • The following deprecated functions are now removed: pl$threadpool_size(),
    <DataFrame>$with_row_count(), <LazyFrame>$with_row_count() (#965).

  • In $group_by_dynamic(), the first datapoint is always preserved (#1034).

  • $str$parse_int() is renamed to $str$to_integer() (#1038).
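
To illustrate the $str$parse_int() to $str$to_integer() migration listed above, here is a minimal sketch assuming the polars R package at 0.16.0 or later; the column name bin and its values are made up. Because the default base changed from 2 to 10 and all arguments must now be named, code that parsed binary strings has to pass base = 2 explicitly.

```r
library(polars)

# Hypothetical example data: integers written as binary strings
df = pl$DataFrame(bin = c("110", "101"))

# Before 0.16.0: pl$col("bin")$str$parse_int() parsed with radix 2 by default
# From 0.16.0: the method is renamed and the base must be given by name
out = df$with_columns(
  int = pl$col("bin")$str$to_integer(base = 2)
)
```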

New features

  • New functions:

    • pl$arg_sort_by() (#929).
    • pl$arg_where() to get the indices that match a condition (#922).
    • pl$datetime(), pl$date(), and pl$time() to easily create Expr of class
      datetime, date, and time via columns and literals (#918).
    • pl$datetime_range(), pl$date_ranges() and pl$datetime_ranges() (#950, #962).
    • pl$int_range() and pl$int_ranges() (#968).
    • pl$mean_horizontal() (#959).
    • pl$read_ipc() (#1033).
    • is_polars_dtype() (#927).
  • New methods:

    • <LazyFrame>$to_dot() to print the query plan of a LazyFrame with graphviz
      dot syntax (#928).
    • $clear() for DataFrame, LazyFrame, and Series (#1004).
    • $item() for DataFrame and Series (#992).
    • $select_seq() and $with_columns_seq() for DataFrame and LazyFrame
      (#1003).
    • $arr$to_list() (#1018).
    • $str$extract_groups() (#979).
    • $str$find() (#985).
    • <DataFrame>$write_ipc() (#1032).
    • RPolarsDataType gains several methods to check the datatype, such as
      $is_integer(), $is_null() or $is_list() (#1036).
  • New arguments or argument values:

    • ambiguous can now take the value "null" to convert ambiguous datetimes to
      null values (#937).
    • n in $str$replace() (#987).
    • non_existent in $dt$replace_time_zone() to specify what should happen
      when a datetime doesn't exist.
    • mapping_strategy in $over() (#984, #988).
    • raise_if_undetermined in $meta$output_name() (#961).
    • null_on_oob in $arr$get() and $list$get() to determine what happens
      when the index is out of bounds (#1034).
    • nulls_last, multithreaded, and maintain_order in $sort_by() (#1034).
  • Other:

    • pl$Series() now calls as_polars_series() internally, so it can convert
      more classes to Series properly (#1015).
    • Export the Duration datatype (#955).
    • New active binding <Series>$struct$fields (#1002).
    • All $write_*() and $sink_*() functions now invisibly return the input
      data (#1039).

Bug fixes

  • The join_nulls and ...

lib-v0.39.0

15 Apr 12:21
7cbffaf
Pre-release
refactor!: `$str$parse_int()` -> `$str$to_integer()` (#1038)

Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>

v0.15.1

11 Mar 15:16

New features

  • rust-polars is updated to 0.38.2 (#907).
    • Minimum supported Rust version (MSRV) is now 1.76.0.
  • as_polars_df(<nanoarrow_array>) is added (#893).
  • It is now possible to create an empty DataFrame with a specific schema with pl$DataFrame(schema = my_schema) (#901).
  • New arguments dtype and nan_to_null for pl$Series() (#902).
  • New method <DataFrame>$partition_by() (#898).
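
Creating an empty DataFrame from a schema might look like the sketch below. It assumes the polars R package at 0.15.1 or later in this release series, and that the schema is given as a named list of data types; the column names id and value are made up for illustration.

```r
library(polars)

# Hypothetical schema: column names mapped to Polars data types
my_schema = list(id = pl$Int32, value = pl$Float64)

# No data is supplied, so the result has zero rows and two typed columns
empty_df = pl$DataFrame(schema = my_schema)
```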

Bug fixes

  • The default value of the format of $str$strptime() is now correctly set (#892).

Other improvements

  • Performance of as_polars_df(<nanoarrow_array_stream>) is improved (#896).

Full Changelog: v0.15.0...v0.15.1