Fix concat #673

AndreiKingsley · 2024-04-23T14:20:39Z

concat removes the key column entirely (both its name and its values).


AndreiKingsley · 2024-04-23T14:23:55Z

Maybe it would be useful to add GroupBy.origin: DataFrame?, which returns the original DataFrame if the GroupBy was created via DataFrame.groupBy().

Jolanrensen · 2024-04-23T18:15:06Z

I think this is the intended behavior. The key of the group is something temporary and usually consists of columns already in the DF.
We are working on a way to access the group keys from aggregate though (#662), maybe that can be a nice alternative.

The original DataFrame can be retrieved using concat (albeit with a different order perhaps).
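A minimal sketch of the behaviour under discussion, assuming the Kotlin DataFrame library is on the classpath. The sample data and the isAdult key name are made up for illustration. When the key is an existing column, concat() gives back the original columns. When the key is computed with expr, it is only reachable through keys, and on versions affected by this issue it is not part of the concat() result.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Illustrative sample data; the column names are assumptions for this example.
fun sampleFrame(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
    "Charlie", 20,
)

fun main() {
    val df = sampleFrame()

    // Key is an existing column, so every group still contains it.
    val byName = df.groupBy("name")
    println(byName.concat().columnNames())

    // Key is computed by expr, so it exists only in the GroupBy keys.
    val byAdult = df.groupBy { expr("isAdult") { (it["age"] as Int) >= 18 } }
    println(byAdult.keys.columnNames())

    // Compare with the line above: on affected versions "isAdult" is missing here.
    println(byAdult.concat().columnNames())
}
```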

AndreiKingsley · 2024-04-23T18:33:55Z

OK, but a new concat is still needed for the purpose I described.

AndreiKingsley · 2024-04-23T18:35:06Z

This is my current solution:
https://github.com/Kotlin/kandy/blob/main/kandy-api/src/main/kotlin/org/jetbrains/kotlinx/kandy/dsl/internal/concatFixed.kt

Jolanrensen · 2024-04-24T13:33:19Z

Maybe a concatWithKeys() would be a nice addition?

koperagen · 2024-04-24T14:00:47Z

I think it won't hurt to do it by default. One might say that df.groupBy { expr { } } is a shortcut for df.add() { }.groupBy {}

Jolanrensen · 2024-04-24T14:25:14Z

If we do it by default, we would get duplicate columns, because the key columns are often in the groups as well.

koperagen · 2024-04-24T14:32:17Z

Andrey's implementation only adds "new" columns (or so I understood).

Jolanrensen · 2024-04-25T10:39:50Z

But then, what qualifies as "new"?

groupBy { expr { myCol } }, yes
but groupBy { myCol + 1 }?
or groupBy { myCol named "other" }

I think we should be careful here
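To make the concern concrete, here is a small model in plain Kotlin collections. It does not use the DataFrame API: a row is modelled as a name-to-value map, and attachNewKeysByName is a hypothetical policy that re-attaches a key only when its name is not already present in the group. It shows how a purely name-based notion of "new" can both drop a key that should be kept and duplicate data that is already there.

```kotlin
// Plain-Kotlin model: a row is a map from column name to value.
typealias Row = Map<String, Any?>

// Hypothetical policy: prepend only those key entries whose names
// do not already occur in the row. Not a DataFrame library function.
fun attachNewKeysByName(key: Row, group: List<Row>): List<Row> =
    group.map { row -> (key - row.keys) + row }

fun main() {
    val group: List<Row> = listOf(mapOf("name" to "Bob", "age" to 45))

    // A computed key that reuses the name "age" but holds a different value
    // is silently dropped, because the name already exists in the group.
    println(attachNewKeysByName(mapOf("age" to 46), group))
    // prints [{name=Bob, age=45}]

    // A renamed copy of an existing column counts as "new",
    // so the same values end up in the result twice.
    println(attachNewKeysByName(mapOf("other" to 45), group))
    // prints [{other=45, name=Bob, age=45}]
}
```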

Jolanrensen · 2024-05-06T19:40:04Z

There's also the case where a user creates a new expr column with a duplicate name that should still be kept, so my suggestion is the following:
Create a concatWithKeys() that will add all key columns to the front of the groups regardless of whether they were in the DF already. Avoid naming clashes by using the ColumnNameGenerator, for instance with DynamicDataFrameBuilder.

Something like:

internal fun GroupBy<*, *>.concatWithKeys(): DataFrame<*> =
    mapToFrames {
        DynamicDataFrameBuilder()
            .apply {
                for (column in group.columns()) {
                    add(column)
                }
                val rowsCount = group.rowsCount()
                for ((name, value) in key.toMap()) {
                    add(List(rowsCount) { value }.toColumn(name))
                }
            }
            .toDataFrame()
            .moveToLeft { takeLast(key.count()) }
    }.concat()

Jolanrensen · 2024-05-07T10:13:32Z

Alternatively, and arguably a lot simpler, we could just explode the groups column, like this:

internal fun GroupBy<*, *>.concatWithKeys(): DataFrame<*> =
    toDataFrame().explode { groups }

This will generate extra key values where necessary and keep the grouped columns in a column group, avoiding any potential name clashes :).
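A possible usage sketch for the explode-based variant, assuming the Kotlin DataFrame library is on the classpath. The extension is copied from the comment above with only the visibility modifier dropped. The sample data and the isAdult key name are made up for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Same body as the proposal above; not part of the library at the time of this thread.
fun GroupBy<*, *>.concatWithKeys(): DataFrame<*> =
    toDataFrame().explode { groups }

// Illustrative input: three rows grouped by a computed boolean key.
fun explodedExample(): DataFrame<*> =
    dataFrameOf("name", "age")(
        "Alice", 15,
        "Bob", 45,
        "Charlie", 20,
    )
        .groupBy { expr("isAdult") { (it["age"] as Int) >= 18 } }
        .concatWithKeys()

fun main() {
    val restored = explodedExample()
    // One output row per original row; the key value is repeated for each row of its group.
    println(restored.rowsCount())
    // Inspect the layout: the key column next to the column holding the grouped data.
    println(restored.columnNames())
}
```

Because the grouped columns stay nested, a key named like an existing column cannot collide with it. Callers that need a flat frame would have to ungroup the nested column themselves and resolve any name clashes at that point.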

AndreiKingsley added the bug label Apr 23, 2024

zaleslaw added this to the 0.14.0 milestone Apr 23, 2024

koperagen self-assigned this Apr 23, 2024
