ENH: Add depends_on kwarg to pytask.task() #502

@NickCrews

Is your feature request related to a problem?

Hi! I'm finally upgrading to pytask 0.4.0. The new functional interface is exactly what I needed to remove the glue/translation code I currently have so I can define/execute my own DAG based on CLI args.

One problem though: the pytask.task() function only accepts the produces argument, but I also want to define the dependencies of my function.

Describe the solution you'd like

Add a depends_on kwarg to pytask.task().

API breaking implications

None. It overlaps somewhat with the existing kwargs argument, but I think the two names are different enough not to be confusing.

Describe alternatives you've considered

A name other than depends_on, such as dependencies?

I think in general this is the logical extension and way to solve this problem.

Additional context

My desired code:

import pandas as pd
import pytask

class DFNode:
    ...

def process_df(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()

def cli():
    tasks = [
        # depends_on is the proposed kwarg; pytask.task() does not accept it yet.
        pytask.task(depends_on=DFNode("input.parquet"), produces=DFNode("output.parquet"))(process_df),
        # others
    ]
    pytask.build(tasks=tasks)
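
To make the requested call shape concrete, here is a minimal, stdlib-only sketch of a decorator factory that accepts both depends_on and produces. It is not pytask's implementation: task, TaskSpec, the toy DFNode, and process are all hypothetical stand-ins, and the sketch only records the metadata rather than building or executing a DAG. Note that the parenthesis closes after produces=..., so task(...) returns a decorator that is then applied to the function.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class DFNode:
    """Hypothetical stand-in for the issue's DFNode; it only stores a path."""

    path: str


@dataclass
class TaskSpec:
    """Hypothetical record of what a wrapped function reads and writes."""

    function: Callable[..., Any]
    depends_on: Any = None
    produces: Any = None


def task(
    *, depends_on: Any = None, produces: Any = None
) -> Callable[[Callable[..., Any]], TaskSpec]:
    """Toy decorator factory mirroring the call shape proposed in the issue."""

    def decorator(function: Callable[..., Any]) -> TaskSpec:
        # Attach the declared inputs and outputs to the function.
        return TaskSpec(function=function, depends_on=depends_on, produces=produces)

    return decorator


def process(rows: list) -> list:
    # Placeholder for process_df: drop missing values from a plain list.
    return [row for row in rows if row is not None]


spec = task(
    depends_on=DFNode("input.parquet"),
    produces=DFNode("output.parquet"),
)(process)

print(spec.depends_on.path, "->", spec.produces.path)
```

Keeping depends_on as its own keyword, separate from kwargs, lets a caller declare an input without knowing the parameter name the function uses for it; how such a dependency would be bound to a parameter is left open here.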
