Releases: pola-rs/r-polars
v0.16.4
New features
- `pl$read_ipc()` can read a raw vector of Apache Arrow IPC file (#1072).
- New method `<DataFrame>$to_raw_ipc()` to serialize a DataFrame to a raw vector of Apache Arrow IPC file format (#1072).
- New method `<LazyFrame>$serialize()` to serialize a LazyFrame to a character vector of JSON representation (#1073).
- New function `pl$deserialize_lf()` to deserialize a LazyFrame from a character vector of JSON representation (#1073).
- New methods `$str$head()` and `$str$tail()` (#1074).
- New S3 methods `nanoarrow::as_nanoarrow_array_stream()` and `nanoarrow::infer_nanoarrow_schema()` for `RPolarsSeries` (#1076).
- New method `$dt$is_leap_year()` (#1077).
- `as_polars_df()` and `as_polars_series()` support `arrow::RecordBatchReader` (#1078).
- The new `experimental` argument for `as_polars_df(<ArrowTabular>)`, `as_polars_df(<RecordBatchReader>)`, `as_polars_series(<nanoarrow_array_stream>)`, and `as_polars_df(<nanoarrow_array_stream>)` (#1078). If `experimental = TRUE`, these functions switch to use the Arrow C stream interface internally. At this point, the performance is degraded under the expected use cases, so the default is set to `experimental = FALSE`.
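As a rough illustration of the serialization features listed above, the following sketch round-trips a DataFrame through a raw vector and a LazyFrame through JSON. It assumes polars 0.16.4 is installed; the column names and the filter are made up for the example.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(x = 1:3, y = c("a", "b", "c"))

# DataFrame -> raw vector (Arrow IPC file format) -> DataFrame
raw_ipc <- df$to_raw_ipc()
df_back <- pl$read_ipc(raw_ipc)

# LazyFrame -> JSON string -> LazyFrame
lf <- df$lazy()$filter(pl$col("x") > 1)
json <- lf$serialize()
lf_back <- pl$deserialize_lf(json)

# The restored query can be collected like the original one
lf_back$collect()
```

The raw vector and the JSON string are plain R objects, so they can be stored or passed to another R process before being restored.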
Full Changelog: v0.16.3...v0.16.4
lib-v0.39.3
feat: import_stream internal method for Series to support Arrow C stream interface (#1078)
v0.16.3
New features
- New method `<SQLContext>$register_globals()` (#1064).
- New experimental method `$sql()` for DataFrame and LazyFrame (#1065).
Miscellaneous
- The API documentation website has moved to a new location (#1067, #1068). The old website now redirects to the top page of the new website.
  - Old URL: https://rpolars.github.io/
  - New URL: https://pola-rs.github.io/r-polars/
Full Changelog: v0.16.2...v0.16.3
v0.16.2
New features
- `$cut()` and `$qcut()` to bin continuous values into discrete categories (#1057).
- `pl$scan_parquet()` and `pl$read_parquet()` can read data from the internet by specifying a URL to the first argument (#1056, @andyquinterom).
- `pl$scan_parquet()` and `pl$read_parquet()` gain an argument `storage_options` to scan/read data via cloud storage providers (GCP, AWS, Azure). Note that this support is experimental (#1056, @andyquinterom).
- Add support for the `Enum` datatype via `pl$Enum()` (#1061).
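A minimal sketch of the binning methods mentioned above, assuming polars 0.16.2 or later. The data, break points, and quantile are arbitrary choices for illustration, and the sketch assumes the argument names `breaks` and `quantiles` follow Python Polars.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(x = c(1, 5, 10))

binned <- df$with_columns(
  # Bin by explicit break points
  bin = pl$col("x")$cut(breaks = c(3, 7)),
  # Bin by quantiles (here: split at the median)
  qbin = pl$col("x")$qcut(quantiles = c(0.5))
)

binned
```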
Bug fixes
- In some read/scan functions, downloading files could fail if the URL was too long. This is now fixed (#1049, @DyfanJones).
New Contributors
- @DyfanJones made their first contribution in #1049
- @andyquinterom made their first contribution in #1056
Full Changelog: v0.16.1...v0.16.2
lib-v0.39.2
ci: exclude R devel on windows from binary library check step (#1062)
v0.16.1
This is a small hot-fix release that updates the Rust polars dependency to 0.39.1 (#1042).
It also includes the changes below.
Bug fixes
- `$len()` now correctly includes `null` values in the count (#1044).
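The fix above can be checked with a small sketch like the following. It assumes polars 0.16.1 or later, and that `$count()` keeps its Polars meaning of counting only non-null values; the column name is made up.

```r
library(polars)

# NA in an R vector becomes a null value in Polars
df <- pl$DataFrame(x = c(1, NA, 3))

counts <- df$select(
  len = pl$col("x")$len(),     # number of rows, nulls included
  count = pl$col("x")$count()  # number of non-null values
)

counts
```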
Other improvements
- `$arr$max()` and `$arr$min()` work without the `nightly` feature (#1042).
Full Changelog: v0.16.0...v0.16.1
lib-v0.39.1
fix: `$len()` should also count `null` values (#1044)
v0.16.0
Breaking changes
- R objects inside an R list are now converted to Polars data types via `as_polars_series()` (#1021, #1022, #1023). For example, up to polars 0.15.1, a list containing a data.frame with a column of {clock} naive-time class was converted to a nested List type of Float64:

  ```r
  data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
  pl$select(
    nested_data = pl$lit(list(data))
  )
  #> shape: (1, 1)
  #> ┌──────────────────────────┐
  #> │ nested_data              │
  #> │ ---                      │
  #> │ list[list[list[f64]]]    │
  #> ╞══════════════════════════╡
  #> │ [[[2.1475e9], [7305.0]]] │
  #> └──────────────────────────┘
  ```

  From 0.16.0, nested types are correctly converted, so that will be a List type of Struct type containing a Datetime type.

  ```r
  data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
  pl$select(
    nested_data = pl$lit(list(data))
  )
  #> shape: (1, 1)
  #> ┌─────────────────────────┐
  #> │ nested_data             │
  #> │ ---                     │
  #> │ list[struct[1]]         │
  #> ╞═════════════════════════╡
  #> │ [{1990-01-01 00:00:00}] │
  #> └─────────────────────────┘
  ```
- Several functions have been rewritten to match the behavior of Python Polars. There are four types of changes: i) change in argument names, ii) change in the way arguments are passed (named or by position), iii) arguments are removed, and iv) change in the default and accepted values. Those are addressed separately below.
  - Change in argument names:
    - In `$reshape()`, the `dims` argument is renamed to `dimensions` (#1019).
    - In `pl$read_*` and `pl$scan_*` functions, the first argument is now `source` (#935).
    - In `pl$Series()`, the argument `x` is renamed `values` (#933).
    - In `<DataFrame>$write_*` functions, the first argument is now `file` (#935).
    - In `<LazyFrame>$sink_*` functions, the first argument is now `path` (#935).
    - In `<LazyFrame>$sink_ipc()`, the argument `memmap` is renamed to `memory_map` (#1032).
    - In `<DataFrame>$rolling()`, `<LazyFrame>$rolling()`, `<DataFrame>$group_by_dynamic()` and `<LazyFrame>$group_by_dynamic()`, the `by` argument is renamed to `group_by` (#983).
    - In `$dt$convert_time_zone()` and `$dt$replace_time_zone()`, the `tz` argument is renamed to `time_zone` (#944).
    - In `$str$strptime()`, the argument `datatype` is renamed to `dtype` (#939).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), argument `radix` is renamed to `base` (#1038).
  - Change in the way arguments are passed:
    - In all input/output functions, all arguments except the first argument must be named arguments (#935).
    - In `<DataFrame>$rolling()` and `<DataFrame>$group_by_dynamic()`, all arguments except `index_column` must be named arguments (#983).
    - In `$unique()` for `DataFrame` and `LazyFrame`, arguments `keep` and `maintain_order` must be named (#953).
    - In `$bin$decode()`, the `strict` argument must be a named argument (#980).
    - In `$dt$replace_time_zone()`, all arguments except `time_zone` must be named arguments (#944).
    - In `$str$contains()`, the arguments `literal` and `strict` must be named (#982).
    - In `$str$contains_any()`, the `ascii_case_insensitive` argument must be named (#986).
    - In `$str$count_matches()`, `$str$replace()` and `$str$replace_all()`, the `literal` argument must be named (#987).
    - In `$str$strptime()`, `$str$to_date()`, `$str$to_datetime()`, and `$str$to_time()`, all arguments (except the first one) must be named (#939).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), all arguments must be named (#1038).
    - In `pl$date_range()`, the arguments `closed`, `time_unit`, and `time_zone` must be named (#950).
    - In `$set_sorted()` and `$sort_by()`, argument `descending` must be named (#1034).
    - In `pl$Series()`, using positional arguments throws a warning, since the argument positions will be changed in the future (#966).

      ```r
      # polars 0.15.1 or earlier
      # The first argument is `x`, the second argument is `name`.
      pl$Series(1:3, "foo")

      # The code above will warn in 0.16.0
      # Use named arguments to silence the warning.
      pl$Series(values = 1:3, name = "foo")
      pl$Series(name = "foo", values = 1:3)

      # polars 0.17.0 or later (future version)
      # The first argument is `name`, the second argument is `values`.
      pl$Series("foo", 1:3)
      ```

      This warning can also be silenced by replacing `pl$Series(<values>, <name>)` by `as_polars_series(<values>, <name>)`.
  - Arguments removed:
    - The argument `columns` in `$drop()` is removed. `$drop()` now accepts several character scalars, such as `$drop("a", "b", "c")` (#912).
    - In `pl$col()`, the `name` argument is removed, and the `...` argument no longer accepts a list of characters and `RPolarsSeries` class objects (#923).
    - In `pl$date_range()`, the unused argument (not working in recent versions) `explode` is removed (#950).
  - Change in arguments default and accepted values:
    - In `pl$Series()`, the argument `values` has a new default value `NULL` (#966).
    - In `$unique()` for `DataFrame` and `LazyFrame`, argument `keep` has a new default value `"any"` (#953).
    - In rolling aggregation functions (such as `$rolling_mean()`), the default value of argument `closed` now is `NULL`. Using `closed` with a fixed `window_size` now throws an error (#937).
    - In `pl$date_range()`, the argument `end` must be specified and the default value of `interval` is changed to `"1d"`. The arguments `start` and `end` no longer accept numeric values (#950).
    - In `pl$scan_parquet()`, the default value of the argument `rechunk` is changed from `TRUE` to `FALSE` (#1033).
    - In `pl$scan_parquet()` and `pl$read_parquet()`, the argument `parallel` only accepts `"auto"`, `"columns"`, `"row_groups"`, and `"none"`. Previously, it also accepted upper-case notation of `"auto"`, `"columns"`, `"none"`, and `"RowGroups"` instead of `"row_groups"` (#1033).
    - In `$str$to_integer()` (renamed from `$str$parse_int()`), the default value of `base` is changed from `2` to `10` (#1038).
- The usage of `pl$date_range()` to create a range of `Datetime` data type is deprecated. `pl$date_range()` will always create a range of `Date` data type in the future. Use `pl$datetime_range()` if you want to create a range of `Datetime` instead (#950).
- `<DataFrame>$get_columns()` now returns an unnamed list instead of a named list (#991).
- Removed `$argsort()` which was an old alias for `$arg_sort()` (#930).
- Removed `pl$expr_to_r()` which was an alias for `$to_r()` (#938).
- `<Series>$to_r_list()` is renamed `<Series>$to_list()` (#938).
- Removed `<Series>$to_r_vector()` which was an old alias for `<Series>$to_vector()` (#938).
- Removed `<Expr>$rep_extend()`, which was an experimental method created at the early stage of this package and does not exist in other language APIs (#1028).
- The following deprecated functions are now removed: `pl$threadpool_size()`, `<DataFrame>$with_row_count()`, `<LazyFrame>$with_row_count()` (#965).
- In `$group_by_dynamic()`, the first datapoint is always preserved (#1034).
- `$str$parse_int()` is renamed to `$str$to_integer()` (#1038).
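To make a few of these breaking changes concrete, here is a small migration sketch. It assumes polars 0.16.0 or later; the data and column names are hypothetical, and only calls described in the list above are used.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(
  id = c(1, 1, 2),
  bits = c("110", "101", "110"),
  extra = c("a.b", "axb", "a.b")
)

out <- df$
  # `keep` and `maintain_order` must now be named
  unique("id", keep = "first", maintain_order = TRUE)$
  with_columns(
    # `base` replaces `radix`, must be named, and now defaults to 10
    value = pl$col("bits")$str$to_integer(base = 2L),
    # `literal` must now be named
    has_dot = pl$col("extra")$str$contains(".", literal = TRUE)
  )$
  # `$drop()` takes several character scalars instead of `columns =`
  drop("bits", "extra")

out
```

Code written for 0.15.1 that passed these arguments by position, or relied on the old default `base` of 2, would need the corresponding changes.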
New features
- New functions:
  - `pl$arg_sort_by()` (#929).
  - `pl$arg_where()` to get the indices that match a condition (#922).
  - `pl$datetime()`, `pl$date()`, and `pl$time()` to easily create Expr of class datetime, date, and time via columns and literals (#918).
  - `pl$datetime_range()`, `pl$date_ranges()` and `pl$datetime_ranges()` (#950, #962).
  - `pl$int_range()` and `pl$int_ranges()` (#968).
  - `pl$mean_horizontal()` (#959).
  - `pl$read_ipc()` (#1033).
  - `is_polars_dtype()` (#927).
- New methods:
  - `<LazyFrame>$to_dot()` to print the query plan of a LazyFrame with graphviz dot syntax (#928).
  - `$clear()` for `DataFrame`, `LazyFrame`, and `Series` (#1004).
  - `$item()` for `DataFrame` and `Series` (#992).
  - `$select_seq()` and `$with_columns_seq()` for `DataFrame` and `LazyFrame` (#1003).
  - `$arr$to_list()` (#1018).
  - `$str$extract_groups()` (#979).
  - `$str$find()` (#985).
  - `<DataFrame>$write_ipc()` (#1032).
  - `RPolarsDataType` gains several methods to check the datatype, such as `$is_integer()`, `$is_null()` or `$is_list()` (#1036).
- New arguments or argument values:
  - `ambiguous` can now take the value `"null"` to convert ambiguous datetimes to null values (#937).
  - `n` in `$str$replace()` (#987).
  - `non_existent` in `$dt$replace_time_zone()` to specify what should happen when a datetime doesn't exist.
  - `mapping_strategy` in `$over()` (#984, #988).
  - `raise_if_undetermined` in `$meta$output_name()` (#961).
  - `null_on_oob` in `$arr$get()` and `$list$get()` to determine what happens when the index is out of bounds (#1034).
  - `nulls_last`, `multithreaded`, and `maintain_order` in `$sort_by()` (#1034).
-
Other:
Bug fixes
- The
join_nulls
and ...
lib-v0.39.0
refactor!: `$str$parse_int()` -> `$str$to_integer()` (#1038) Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>
v0.15.1
New features
- rust-polars is updated to 0.38.2 (#907).
- Minimum supported Rust version (MSRV) is now 1.76.0.
- `as_polars_df(<nanoarrow_array>)` is added (#893).
- It is now possible to create an empty `DataFrame` with a specific schema with `pl$DataFrame(schema = my_schema)` (#901).
- New arguments `dtype` and `nan_to_null` for `pl$Series()` (#902).
- New method `<DataFrame>$partition_by()` (#898).
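A minimal sketch of creating an empty `DataFrame` from a schema, assuming polars 0.15.1 or later. The schema below is hypothetical, and the sketch assumes a schema is given as a named list of Polars data types.

```r
library(polars)

# Hypothetical schema: column names mapped to Polars data types
my_schema <- list(id = pl$Int32, score = pl$Float64)

# No data is supplied, so the result has zero rows
empty_df <- pl$DataFrame(schema = my_schema)

empty_df$schema
```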
Bug fixes
- The default value of the `format` of `$str$strptime()` is now correctly set (#892).
Other improvements
- Performance of `as_polars_df(<nanoarrow_array_stream>)` is improved (#896).
Full Changelog: v0.15.0...v0.15.1