ENH: Using named arguments in an iteration of a parametrization. #206
Comments
Discussion at pytest
There is a lot of discussion about this over at pytest.
The main thing to keep in mind is that they already have two mechanisms to describe parametrizations.
The main arguments against dictionaries in the first approach:
The main arguments against
|
Workarounds
Named arguments in parametrization
from pathlib import Path
from typing import NamedTuple

import matplotlib.pyplot as plt
import pandas as pd
import pytask


class Task(NamedTuple):
    depends_on: Path
    produces: Path


# Each iteration is a namedtuple, so its values are passed by name.
@pytask.mark.parametrize("depends_on, produces", [
    Task(depends_on="first_dataset.pkl", produces="first_plot.png"),
    Task(depends_on="second_dataset.pkl", produces="second_plot.png"),
])
def task_plot_data(depends_on, produces):
    df = pd.read_pickle(depends_on)
    ax = df.plot()
    plt.savefig(produces)
Ids closer to iterations
This is a workaround that keeps the ids of tasks close to their values by using a dictionary. In combination with
tasks = {
    "task_1": ("first_dataset.pkl", "first_plot.png"),
    "task_2": ("second_dataset.pkl", "second_plot.png"),
}
@pytask.mark.parametrize("depends_on, produces", tasks.values(), ids=tasks.keys())
def task_plot_data(depends_on, produces):
    df = pd.read_pickle(depends_on)
    ax = df.plot()
    plt.savefig(produces)
|
@hmgaudecker @janosg @roecla @timmens What do you think about the issue and the two workarounds? What I like about the namedtuple approach is that I don't have to offer a new API, but I confess that for each complicated task you have to write the namedtuple instead of using dictionaries directly. I am in favor of implementing
We could also buy ourselves some time by just documenting the namedtuple workaround and waiting for and collecting more feedback. |
Thanks for your research and ideas!
Yes, sounds great.
I simply was not aware of that and I think the use cases of pytest / pytask are very different here.
So
Behaviour would be:
my_tasks = {
    "task_1": {
        "depends_on": {
            "first_dataset.pkl",
            "some_specification.yaml",
        },
        "produces": "first_plot.png"
    },
    "task_2": {
        "depends_on": "second_dataset.pkl",
        "produces": "second_plot.png",
        "marks": pytask.mark.skip
    },
}
@pytask.mark.parametrize(*pytask.convert_dict_to_parametrization(my_tasks))
def task_plot_data(depends_on, produces):
    df = pd.read_pickle(depends_on)
    ax = df.plot()
    plt.savefig(produces)
How does that sound? |
(and clearly advertise that as a convenience function, not a core pytask component) |
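For illustration only, here is a minimal sketch of what such a convenience function might do. Nothing in the thread specifies its behaviour, so the return shape (argument names, argument values, ids) and the decision to ignore the "marks" key are assumptions, and this is not pytask's API. It returns the ids as a third element, which would presumably have to be passed by keyword rather than unpacked positionally.

```python
from typing import Any


def convert_dict_to_parametrization(
    tasks: dict[str, dict[str, Any]],
) -> tuple[str, list[tuple[Any, ...]], list[str]]:
    """Turn {id: {arg_name: value}} into (arg_names, arg_values, ids).

    Hypothetical helper: the "marks" key is skipped because the thread
    does not say how marks should be applied.
    """
    reserved = {"marks"}

    # Collect argument names in order of first appearance.
    names: list[str] = []
    for kwargs in tasks.values():
        for name in kwargs:
            if name not in reserved and name not in names:
                names.append(name)

    # Build one tuple per task, ordered like `names`, so that the order of
    # keys inside each task's dictionary does not matter.
    arg_values = []
    for task_id, kwargs in tasks.items():
        missing = [name for name in names if name not in kwargs]
        if missing:
            raise ValueError(f"Task {task_id!r} is missing arguments: {missing}")
        arg_values.append(tuple(kwargs[name] for name in names))

    return ", ".join(names), arg_values, list(tasks)


my_tasks = {
    "task_1": {"depends_on": "first_dataset.pkl", "produces": "first_plot.png"},
    "task_2": {"produces": "second_plot.png", "depends_on": "second_dataset.pkl"},
}
arg_names, arg_values, ids = convert_dict_to_parametrization(my_tasks)
```

With a helper shaped like this, the decorator call would look like `parametrize(arg_names, arg_values, ids=ids)`.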
I like the focus on IDs in the dictionary approach and Hans-Martin's proposed solution. One thing I don't like about NamedTuples as
I had another look at ward and really like their way of parametrising tests. Pytest/pytask-style parametrizations are very confusing for beginners (for me personally and for most students in courses). The ward style seems much simpler because it hides the lazy evaluation of test functions. I also think that a lot of the complexity discussed here comes exactly from the separation of function calls from the definition of the inputs with which they are called.
Of course, this would be a paradigm shift for pytask, as it would move away from pytest, and experienced pytest users would have to learn something new to use pytask. It is also potentially hard to implement. I guess it requires that, in the collection phase, all task functions are mocked with some tracer object that records with which arguments they were called. Without the decorators that are required in ward, this is probably even harder to achieve. |
Maybe I am missing something, but I am not sure how ward-style parametrizations would look easier in the pytask case, where we are usually not talking about 1, 2, 3 but each element being a rather long list of deps and targets? What I like about the above dict is that everything is named and it keeps deps and targets closer together, without having to think about the order. But I am probably missing something fundamental. Maybe you could translate the above dict to what it would look like ward-style? Old man needs examples. 🤷 |
Love the idea about the check for whether namedtuples are consistent with the signature! What do you mean by ward hiding the lazy evaluation of test functions? Do you mean the for-loop around the task function? I am not sure I like ward's way apart from the for-loop.
I think ward's way of writing tests is somewhat inspired by how you would do it in JavaScript, meaning the focus on ids as decorators and anonymous test functions. What's nice about this is that you will never have name clashes. You can assign different internal ids to a test with the same id in the same module. |
I just mean the for loop instead of a parametrize decorator. I would leave everything else as is. That's also why I mentioned that implementation could be tricky if there is no decorator to mark tasks. By hiding lazy evaluation I mean exactly the for loop. In pytask you specify functions and arguments with which they will be called at some point. In ward you just call functions in a loop. Whether you construct arguments inside the loop or beforehand becomes your choice. Edit: In ward you don't call the functions in a loop, but you define them setting default values for all arguments, so they can be called very easily later. |
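To make the loop idea concrete, here is a small self-contained sketch. The `task` decorator and the `COLLECTED` registry are hypothetical stand-ins, not pytask's or ward's actual API. The sketch only shows the mechanism described in the edit above: the function is redefined on every iteration, and default values bind that iteration's arguments at definition time.

```python
from typing import Callable

# Hypothetical registry filled by the hypothetical decorator below.
COLLECTED: list[tuple[str, Callable[[], str]]] = []


def task(task_id: str):
    """Record the decorated function under the given id (illustration only)."""

    def decorator(func):
        COLLECTED.append((task_id, func))
        return func

    return decorator


for name in ("first", "second"):

    @task(task_id=f"plot_{name}")
    def task_plot_data(
        depends_on=f"{name}_dataset.pkl", produces=f"{name}_plot.png"
    ):
        # Defaults were evaluated when this definition ran, so each collected
        # function keeps the values of its own loop iteration.
        return f"{depends_on} -> {produces}"


# Later, a runner can call every collected function without any arguments.
results = {task_id: func() for task_id, func in COLLECTED}
```

Because the arguments are plain keyword defaults, nothing depends on argument order, and whether the values are built inside the loop or beforehand is up to the user.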
We have a task decorator which could be extended. And offer two interfaces. Actually, I really like the last point about it because, instead of catching users' mistakes and informing them about them, it doesn't let them make those mistakes in the first place. |
@hmgaudecker, Janos and I had a very short talk about ward's approach to parametrizations by looping over the task function and we think this is the easiest interface one could imagine. Your dictionary approach just needs the extra lines for the looping over the function and unpacking the dictionary with Additionally, I think this approach renders
obsolete which is a clear win. Additionally, I would open a ticket for documenting namedtuples and adding a check that attribute names match the signature of the parametrization. What are your thoughts? |
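A check like the one proposed for namedtuples could be sketched as follows. The function name `check_fields_match_arg_names` is made up for illustration, and comparing in order is an assumption (namedtuples are still unpacked positionally, so a different order would silently swap values).

```python
from typing import NamedTuple


class Task(NamedTuple):
    depends_on: str
    produces: str


def check_fields_match_arg_names(arg_names: str, iteration: tuple) -> None:
    """Raise if a namedtuple's fields differ from the parametrized names.

    Hypothetical helper; plain tuples have no field names and are skipped.
    """
    fields = getattr(iteration, "_fields", None)
    if fields is None:
        return
    expected = tuple(name.strip() for name in arg_names.split(","))
    if fields != expected:
        raise ValueError(
            f"Fields {fields} do not match the argument names {expected}."
        )


# Passes silently: fields and argument names agree, including their order.
check_fields_match_arg_names("depends_on, produces", Task("data.pkl", "plot.png"))
```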
I don't see through all that, but I trust you guys. All I want is to use keywords everywhere (no reliance on order) without having to define a NamedTuple myself. 😄 |
Is your feature request related to a problem?
Currently, each iteration of a parametrization is usually a tuple, but we would like to have named arguments instead. They are easier to read, especially when there are many arguments, and the order of the arguments can be ignored.
Describe the solution you'd like
Similar to pytest, we could introduce pytask.param, which receives either args or kwargs and represents one iteration. It also has an id parameter to change the id of the iteration. We could also allow for dictionary inputs to each iteration, checking that the keys match the function arguments.
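As a rough sketch of the proposal, and not an existing pytask API, the following shows one way a `param` helper and the key check could fit together. The names `Param`, `param` and `resolve` are assumptions made for this example.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Param:
    """One iteration of a parametrization (hypothetical container)."""

    args: tuple = ()
    kwargs: dict = field(default_factory=dict)
    id: Optional[str] = None


def param(*args: Any, id: Optional[str] = None, **kwargs: Any) -> Param:
    """Accept either positional or keyword values for one iteration."""
    if args and kwargs:
        raise ValueError("Pass either positional or keyword arguments, not both.")
    return Param(args=args, kwargs=kwargs, id=id)


def resolve(func, iteration) -> dict:
    """Map one iteration onto the function's parameters, checking the names."""
    names = list(inspect.signature(func).parameters)
    if isinstance(iteration, dict):
        # Dictionary inputs are treated like keyword-style iterations.
        iteration = param(**iteration)
    if iteration.kwargs:
        if set(iteration.kwargs) != set(names):
            raise ValueError(
                f"Keys {sorted(iteration.kwargs)} do not match arguments {names}."
            )
        return dict(iteration.kwargs)
    if len(iteration.args) != len(names):
        raise ValueError(f"Expected {len(names)} values, got {len(iteration.args)}.")
    return dict(zip(names, iteration.args))


def task_plot_data(depends_on, produces):
    ...


by_name = resolve(
    task_plot_data,
    param(produces="first_plot.png", depends_on="first_dataset.pkl", id="first"),
)
```

The old tuple style stays available through the positional branch, which matches the statement that the change has no API-breaking implications.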
API breaking implications
None, the old style will be preserved.
Describe alternatives you've considered
None