Changed documentation regarding extension properties api #651

Open
wants to merge 4 commits into
base: master
Changes from 1 commit
@@ -122,7 +122,7 @@ internal interface AccessApi {
*
* For example:
* ```kotlin
- * val df = DataFrame.read("titanic.csv")
+ * val df /* : AnyFrame */ = DataFrame.read("titanic.csv")
* ```
*/
interface ExtensionPropertiesApi
@@ -140,7 +140,7 @@ class ApiLevels {
@TransformDataFrameExpressions
fun extensionProperties1() {
// SampleStart
-val df = DataFrame.read("titanic.csv")
+val df /* : AnyFrame */ = DataFrame.read("titanic.csv")
// SampleEnd
}
}
@@ -140,7 +140,7 @@ class ApiLevels {
@TransformDataFrameExpressions
fun extensionProperties1() {
// SampleStart
-val df = DataFrame.read("titanic.csv")
+val df /* : AnyFrame */ = DataFrame.read("titanic.csv")
// SampleEnd
}
}
28 changes: 18 additions & 10 deletions docs/StardustDocs/topics/extensionPropertiesApi.md
@@ -2,19 +2,30 @@

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->

-When [`DataFrame`](DataFrame.md) is used within Jupyter Notebooks or Datalore with Kotlin Kernel,
-after every cell execution all new global variables of type DataFrame are analyzed and replaced
-with typed [`DataFrame`](DataFrame.md) wrapper with auto-generated extension properties for data access:
+When [`DataFrame`](DataFrame.md) is used within Jupyter/Kotlin Notebook or Datalore with the Kotlin Kernel,
+something special happens:
+After every cell execution, all new global variables of type DataFrame are analyzed and replaced
+with a typed [`DataFrame`](DataFrame.md) wrapper along with auto-generated extension properties for data access.
+For instance, say we run:

<!---FUN extensionProperties1-->

```kotlin
-val df = DataFrame.read("titanic.csv")
+val df /* : AnyFrame */ = DataFrame.read("titanic.csv")
```

<!---END-->

-Now data can be accessed by `.` member accessor
+In normal Kotlin code, we would now have a variable of type [`AnyFrame` (=`DataFrame<*>`)](DataFrame.md) that doesn't have any
+extension properties to access its columns. We would either have to define them manually or use the
+[`@DataSchema`](schemas.md) annotation to [generate them](schemasGradle.md#configuration).
+
+By contrast, after this cell is run in a notebook, the columns of the dataframe are used as a basis
+to generate a hidden `@DataSchema interface TypeX`,
+along with extension properties like `val DataFrame<TypeX>.age` etc.
+Next, the `df` variable is shadowed by a new version cast to `DataFrame<TypeX>`.
+
+As a result, columns can now be accessed directly on `df`!

<!---FUN extensionProperties2-->

@@ -28,12 +39,9 @@ df.add("lastName") { name.split(",").last() }

The `titanic.csv` file can be found [here](https://github.com/Kotlin/dataframe/blob/master/data/titanic.csv).

-In notebooks, extension properties are generated for [`DataSchema`](schemas.md) that is extracted from [`DataFrame`](DataFrame.md)
-instance after REPL line execution.
-After that [`DataFrame`](DataFrame.md) variable is typed with its own [`DataSchema`](schemas.md), so only valid extension properties corresponding to actual columns in DataFrame will be allowed by the compiler and suggested by completion.

Extension properties can be generated in IntelliJ IDEA using the [Kotlin Dataframe Gradle plugin](schemasGradle.md#configuration).

<warning>
-In notebooks generated properties won't appear and be updated until the cell has been executed. It often means that you have to introduce new variable frequently to sync extension properties with actual schema
+In notebooks, generated properties won't appear or be updated until the cell has been executed.
+This often means that you have to introduce new variables frequently to keep extension properties in sync with the actual schema.
</warning>
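
To make the notebook behaviour described in this change concrete, here is a minimal hand-written sketch of roughly what the generated code could look like. The schema name `TypeX` and the `age` property come from the text above; the `name` column, the in-memory sample data, and the exact shape of the accessors are assumptions for illustration only, and the code a notebook actually generates may differ.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.filter

// Hand-written stand-in for the hidden schema a notebook would generate.
// The columns `name` and `age` are assumptions for this example.
@DataSchema
interface TypeX {
    val name: String
    val age: Int
}

// Column accessor: lets `df.age` resolve on a frame typed as DataFrame<TypeX>.
@Suppress("UNCHECKED_CAST")
val DataFrame<TypeX>.age: DataColumn<Int>
    get() = this["age"] as DataColumn<Int>

// Row accessor: lets `age` resolve inside row lambdas such as `filter { ... }`.
val DataRow<TypeX>.age: Int
    get() = this["age"] as Int

fun main() {
    // Untyped frame (AnyFrame); stands in for DataFrame.read("titanic.csv").
    val untyped = dataFrameOf("name", "age")("Alice", 30, "Bob", 25)

    // A notebook shadows the variable with a typed version; here we cast explicitly.
    val df = untyped.cast<TypeX>()

    println(df.age.toList())                     // column access via the extension property
    println(df.filter { age > 26 }.rowsCount())  // row access inside a lambda
}
```

Outside a notebook, these declarations would normally be produced by the Gradle plugin rather than written by hand; the sketch only shows why `df.age` compiles once `df` has the type `DataFrame<TypeX>`.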