Expressions - another attempt #346

I'd just like to make another case for the expressions syntax, and to see whether a simpler version of the proposal I previously put together might be acceptable.

## The current syntax isn't sufficient for new libraries

Great Tables (a new project by Posit) has recently introduced native Polars support, and the way they did it really caught my attention. They didn't just repeat their pandas support with Polars functions; instead, they made expressions the main focus: https://posit-dev.github.io/great-tables/blog/polars-styling/. The whole post is worth reading, but I'd particularly like to draw your attention to this passage:

> As it turns out, polars expressions make styling tables very straightforward. The same polars code that you would use to select or filter combines with Great Tables to highlight, circle, or bolden text.
>
> In this post, I’ll show how Great Tables uses polars expressions to make delightful tables, like the one below.

I was expecting this to happen, and I expect it'll happen a whole load more. If new libraries lean into the expressions syntax, then a Standard without expressions will be dead on arrival.

If we want to promote the Standard, we need to keep up with the times. This requires extra work, but so does anything worthwhile.

## The current rules break method chaining

Take the following task:
- left-join `lineitem` and `supplier` on column `'a'`
- keep only the rows where column `'a'` plus column `'b'` is greater than 0
- double the value of column `'a'` and keep only that column and column `'d'`

You might expect to be able to do this with:
```python
(
    lineitem.join(supplier, on="a", how="left")
    .filter((lineitem.col("a") + lineitem.col("b")) > 0)
    .assign(lineitem.col("a") * 2)
    .select("a", "d")
)
```
However, this raises, because `lineitem.col('a')` was derived from a different dataframe than `lineitem.join(supplier, on='a', how='left')`, and that's not allowed. (Yes, I'm aware that you can work around this with temporary variables, but my point is this: method chaining is very popular among dataframe users and devs. Are we sure we don't want to support it?)
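To make the rule concrete, here's a toy model in plain Python. `Column`, `DataFrame`, `col` and `filter` below are hypothetical stand-ins, not the Standard's actual classes; the only thing they're meant to share with it is the rule that a column may only be used with the dataframe it was derived from. In the toy, `df.col('b')` has three values while the filtered frame has two, so the mask couldn't line up with it anyway.

```python
class Column:
    """Toy column that remembers the dataframe it was derived from (hypothetical)."""

    def __init__(self, values, parent):
        self.values = values
        self.parent = parent

    def __gt__(self, scalar):
        # The result is still tied to the same parent dataframe.
        return Column([v > scalar for v in self.values], self.parent)


class DataFrame:
    """Toy dataframe: a dict mapping column names to lists (hypothetical)."""

    def __init__(self, data):
        self.data = data

    def col(self, name):
        return Column(self.data[name], parent=self)

    def filter(self, mask):
        # The rule under discussion: a column may only be used with the
        # dataframe it was derived from.
        if mask.parent is not self:
            raise ValueError("column was derived from a different dataframe")
        return DataFrame(
            {k: [v for v, m in zip(vs, mask.values) if m] for k, vs in self.data.items()}
        )


df = DataFrame({"a": [1, -2, 3], "b": [4, 5, -6]})

# Works: each mask comes from the frame it filters, via a temporary variable.
step1 = df.filter(df.col("a") > 0)
step2 = step1.filter(step1.col("b") > 0)

# Chaining fails: df.col("b") belongs to df, not to the filtered frame.
try:
    df.filter(df.col("a") > 0).filter(df.col("b") > 0)
    chained_ok = True
except ValueError:
    chained_ok = False
```

The temporary-variable version works; the chained one raises, which is exactly the friction described above.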

With expressions, though, there's no issue:
```python
(
    lineitem.join(supplier, on="a", how="left")
    .filter((pdx.col("a") + pdx.col("b")) > 0)
    .select(pdx.col("a") * 2, pdx.col("d"))
)
```
You also don't need the extra `assign` call, since `select` accepts expressions directly.

## It's not a zero-cost abstraction

The current syntax is also not a zero-cost abstraction over Polars: trying to use only two objects (`Column`, `DataFrame`) to represent four (`Series`, `Expr`, `DataFrame`, `LazyFrame`) means that the resulting code isn't going to be as efficient as it could be. For example,
```python
import polars as pl
df: pl.DataFrame
df.filter((df['a'] + df['b']) > 0)  # df['a'] and df['b'] are eager Series
```
is less efficient than
```python
import polars as pl
df: pl.DataFrame
df.filter((pl.col('a') + pl.col('b')) > 0)  # pl.col builds Exprs that Polars evaluates itself
```
and the current API, in the `persist`ed case, resolves to the first one. Unfortunately, I don't see a way around this.

Telling people "you can use the standard if you want, but it'll be more efficient to use the Polars API directly" is a recipe for people just using Polars and forgetting about the Standard. I'm calling it.

## The way forwards

We don't necessarily need to separate `DataFrame` from `LazyFrame`. But I'm once again making the case for `Expr` being separate from `Column`.
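As a rough sketch of what "`Expr` separate from `Column`" means, here's a toy model in plain Python. `Expr`, `col` and `DataFrame` are hypothetical names for illustration, not a proposed spec: an expression is just a recipe that takes a dataframe and produces values, so it can be applied to whichever frame a chain happens to produce.

```python
class Expr:
    """Toy expression (hypothetical): a recipe that isn't tied to any dataframe."""

    def __init__(self, resolve):
        # resolve: a function taking a DataFrame and returning a list of values
        self.resolve = resolve

    def __add__(self, other):
        return Expr(lambda df: [x + y for x, y in zip(self.resolve(df), other.resolve(df))])

    def __gt__(self, scalar):
        return Expr(lambda df: [x > scalar for x in self.resolve(df)])


def col(name):
    # Names a column; which dataframe it comes from is decided later.
    return Expr(lambda df: df.data[name])


class DataFrame:
    """Toy dataframe: a dict mapping column names to lists (hypothetical)."""

    def __init__(self, data):
        self.data = data

    def filter(self, predicate):
        # The expression is resolved against *this* frame, whichever one it is,
        # so there's no "derived from a different dataframe" check to fail.
        mask = predicate.resolve(self)
        return DataFrame(
            {k: [v for v, m in zip(vs, mask) if m] for k, vs in self.data.items()}
        )


df = DataFrame({"a": [1, -2, 3], "b": [4, 5, -6]})
out = df.filter(col("a") > 0).filter((col("a") + col("b")) > 0)
```

The chained call works because neither `col("a") > 0` nor `(col("a") + col("b")) > 0` carries a reference to a parent frame.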

@shwina @kkraus14 if I made a simpler version of this summer's proposal, would you be open to reconsidering this? I'm tagging you two specifically because, as far as I remember, everyone else was positive about it.

## Alternatives

We need to do something here: I don't want my name on a standard that's just "pandas minus".
