Description
I'd argue that KType nullability should always be inferred by checking the actual column values.
https://github.com/Kotlin/dataframe/blob/master/dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt#L597
That is what `infer = Infer.Nulls` does.
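For illustration, a minimal sketch of the difference `Infer.Nulls` makes when creating a column directly (`DataColumn.createValueColumn` and `Infer` are existing DataFrame entry points; the exact behavior is as I understand it):

```kotlin
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.api.Infer
import kotlin.reflect.typeOf

// The declared type is String?, but this particular snapshot has no nulls.
val values: List<String?> = listOf("a", "b", "c")

// Infer.Nulls keeps the declared type but re-checks its nullability
// against the actual values: no nulls present -> non-nullable String.
val col = DataColumn.createValueColumn("s", values, typeOf<String?>(), Infer.Nulls)
println(col.type()) // kotlin.String

// If values contained a null, the same call would produce kotlin.String?
```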
My reasoning is mostly related to notebooks.
Pros: you won't have to handle nullable values if the given snapshot doesn't have any! Very convenient if you just want to work with a specific fragment of the data.
Cons: imagine you want to rerun the same notebook, but this time the data has nulls. Now you'll have to modify your code to handle them, or you'll get a compilation error.
So the desirable behavior varies with your use case: exploring data once vs. reusing a notebook.
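To make the trade-off concrete, here is a hypothetical notebook fragment (assumes `%use dataframe`; the `users` table and `age` column are made up, and `age` is the extension property the notebook generates from the inferred schema):

```kotlin
val dbConfig = DatabaseConfiguration(url = "jdbc:h2:mem:test", user = "sa", password = "")
val df = DataFrame.readSqlTable(dbConfig, "users")

// Run 1: this snapshot has no NULLs in "age", so it is inferred as Int
// and the cell compiles as-is, which is convenient for one-off exploration.
df.filter { age > 18 }

// Run 2: the data now contains NULLs, "age" becomes Int?,
// and the same cell no longer compiles until null handling is added, e.g.:
df.filter { (age ?: 0) > 18 }
```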
My suggestion here: to support reusability of notebooks, the JDBC integration should have a method to import a data schema from the DB schema, the same way the OpenAPI support does.
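In a notebook, that could look like the cell below. The first call mirrors the existing OpenAPI `importDataSchema` usage (syntax approximated from memory); the JDBC overload is hypothetical and does not exist today. It would take nullability from the database metadata (NULL/NOT NULL) rather than from the values in the current snapshot:

```kotlin
// Existing: generate typed data schemas from an OpenAPI spec.
val PetStore = importDataSchema("https://petstore3.swagger.io/api/v3/openapi.json")

// Hypothetical JDBC counterpart, deriving the schema from the DB schema:
val Users = importDataSchema(dbConfig, table = "users")
```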
Things to consider here: it's already possible to write (or generate and edit) a data schema to rerun notebooks without problems. There are also other operations that work like this: `add`, `convert`, and other functions create a nullable KType only if there are nulls, as do other data sources (see the discussion in the context of Arrow: #428, which includes an additional argument about KType nullability). For reference, the `add` signature, with sketches of both points after it:
```kotlin
public inline fun <reified R, T> DataFrame<T>.add(
    name: String,
    noinline expression: AddExpression<T, R>
): DataFrame<T> = add(name, Infer.Nulls, expression)
```
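Because of `Infer.Nulls`, the column type produced by `add` depends on what the expression actually returns for this particular DataFrame. A sketch with made-up columns `id` and `name`:

```kotlin
// The expression returns null for odd ids -> the new column is String?
val withTag = df.add("tag") { if (id % 2 == 0) "even" else null }

// The expression can never return null -> the new column is Int
val withLen = df.add("len") { name.length }
```

And the workaround mentioned above, a hand-written data schema (table and columns are made up): declaring columns nullable up front keeps the notebook compiling across reruns, whether or not the current snapshot contains nulls.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.filter
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

@DataSchema
interface User {
    val id: Int
    val email: String? // nullable up front, even if today's snapshot has no nulls
}

// cast ties the runtime frame to the hand-written schema,
// so the generated accessors stay stable across reruns:
val users = DataFrame.readSqlTable(dbConfig, "users").cast<User>()
val withEmail = users.filter { email != null }
```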