ENH: Add depends_on kwarg to pytask.task() #502

Closed
NickCrews opened this issue Nov 26, 2023 · 2 comments · Fixed by #509
Labels
enhancement New feature or request

Comments

@NickCrews
Contributor

Is your feature request related to a problem?

Hi! I'm finally upgrading to pytask 0.4.0. The new functional interface is exactly what I needed to remove the glue/translation code I currently have so I can define/execute my own DAG based on CLI args.

One problem though: the pytask.task() function only accepts the produces argument, but I also want to define the dependencies of my function.

Describe the solution you'd like

Add a depends_on kwarg to pytask.task().

API breaking implications

None. It treads a little on the toes of the existing kwargs argument, but I think the two are named differently enough not to be confusing.

Describe alternatives you've considered

Different names other than depends_on? dependencies?

I think in general this is the logical extension and way to solve this problem.

Additional context

My desired code:

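import pandas as pd
import pytask

class DFNode:
    ...  # node implementation elided

def process_df(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()

def cli():
    tasks = [
        # depends_on is the kwarg proposed in this issue; pytask 0.4.0 does not accept it
        pytask.task(depends_on=DFNode("input.parquet"), produces=DFNode("output.parquet"))(process_df),
        # others
    ]
    pytask.build(tasks=tasks)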

NickCrews added the enhancement label on Nov 26, 2023

@tobiasraabe
Member

Hi @NickCrews, great you are upgrading! Let me know if you have more feedback.

Regarding the issue, what are you trying to accomplish with depends_on that is not possible with kwargs?

Maybe it is not well documented, but I believe kwargs is what you are looking for. It lets you pass arguments to the function and, based on the function signature, each argument is treated as either a dependency or a product (products are the arguments that carry the Product annotation). Since function arguments can be dependencies or products, kwargs is not called depends_on.
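
To make the signature-based split concrete, here is a minimal, standard-library-only sketch of the idea. It is not pytask's implementation: `ProductMarker`, `Product`, and `split_kwargs` are hypothetical stand-ins invented for illustration, and the real marker lives in pytask itself. The sketch only shows how a value passed via kwargs could be sorted into "dependency" or "product" depending on whether the matching parameter is annotated as a product.

```python
from pathlib import Path
from typing import Annotated, Any, Callable, get_type_hints


class ProductMarker:
    """Hypothetical stand-in for a product annotation (not pytask's class)."""


# A single marker instance, used as metadata inside Annotated[...].
Product = ProductMarker()


def split_kwargs(
    func: Callable[..., Any], kwargs: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Sort kwargs into (dependencies, products) based on func's signature."""
    hints = get_type_hints(func, include_extras=True)
    dependencies: dict[str, Any] = {}
    products: dict[str, Any] = {}
    for name, value in kwargs.items():
        # Annotated[...] types expose their extra metadata as __metadata__;
        # plain types (and unannotated parameters) have none.
        metadata = getattr(hints.get(name), "__metadata__", ())
        if any(item is Product for item in metadata):
            products[name] = value
        else:
            dependencies[name] = value
    return dependencies, products


def process(source: Path, target: Annotated[Path, Product]) -> None:
    """Example task body: read the dependency, write the product."""
    target.write_text(source.read_text().upper())


deps, products = split_kwargs(
    process, {"source": Path("input.txt"), "target": Path("output.txt")}
)
```

Here `source` ends up among the dependencies and `target` among the products, which is why a single kwargs argument can cover both roles.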

produces is only for situations where the function's return value is the product and we cannot alter the function signature to annotate the return. It handles the use case you raised, where pytask is used to wrap third-party functions.
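
As a rough illustration of that use case, the sketch below wraps a function whose signature we do not control (`textwrap.fill` from the standard library stands in for a third-party function) and stores its return value at a given location. `run_with_produces` is a hypothetical helper written for this example; it only mimics the idea behind a produces argument and is not how pytask implements it.

```python
import tempfile
import textwrap
from pathlib import Path
from typing import Any, Callable


def run_with_produces(
    func: Callable[..., str], kwargs: dict[str, Any], produces: Path
) -> Path:
    """Hypothetical helper: call func and store its return value at produces."""
    result = func(**kwargs)
    produces.write_text(result)
    return produces


TEXT = "pytask wraps functions it cannot annotate"

with tempfile.TemporaryDirectory() as tmp:
    # textwrap.fill returns a string; we cannot annotate its return type,
    # so the product location is supplied from the outside instead.
    out = run_with_produces(
        textwrap.fill,
        {"text": TEXT, "width": 20},
        Path(tmp) / "wrapped.txt",
    )
    content = out.read_text()
```

The function itself stays untouched; only the caller knows where the returned value should be stored.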

@NickCrews
Contributor Author

Ah, I think you are totally right, I just misunderstood. Can we extend the examples so they use this functionality? If they had covered it, I think I would have gotten this right.
