Skip to content

replace arrays_only argument by skip_errors in Session._unaryop and add it to Session._unaryop #862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
alixdamman opened this issue Apr 22, 2020 · 4 comments

Comments

@alixdamman
Copy link
Collaborator

Currently available unary ops on Session are __neg__, __pos__, __abs__ and __invert__.

These ops should only apply on arrays (assuming integer and float elements of Session objects are only used to store years, maximum/minimum age, ... and should not be modified by unary ops like __neg__, __pos__ and __abs__).

@gdementen
Copy link
Contributor

We cannot assume what those scalars will be used for. It could be anything.

The arrays_only=True argument should probably not have been implemented the way it was to binary ops in the first place.

I don't remember the exact problem this fixed, but I now fear it was not the right way to fix this.

Instead of having:

    def _binop(opname, arrays_only=True):
        opfullname = '__%s__' % opname
        def opmethod(self, other):
            [...]
    __add__ = _binop('add')

I think we should have something like this instead:

    def _binop(opname):
        opfullname = '__%s__' % opname
        def opmethod(self, other, arrays_only=False, we_could_imagine_other_filter_options=...):
            [...]
    add = __add__ = _binop('add')

i.e. add the possibility to choose what you want to do at runtime instead of hard-coding it.

Now, the important question is what default value to use for arrays_only. Do you remember the exact case which triggered the introduction of that behavior?

Without the exact problem case in mind, I think it would be better to revert arrays_only to False for all ops, and recommend using session1.add(session2, arrays_only=True). What do you think?

As for unary ops, they should of course follow what we do with binary ops.

@alixdamman
Copy link
Collaborator Author

We cannot assume what those scalars will be used for. It could be anything.

Yes, it could be anything but I still think that they are mostly used to store fixed years, min or max values. In other words, I still think the default value of arrays_only should be True.

Instead of having:

    def _binop(opname, arrays_only=True):
        opfullname = '__%s__' % opname
        def opmethod(self, other):
            [...]
    __add__ = _binop('add')

I think we should have something like this instead:

    def _binop(opname):
        opfullname = '__%s__' % opname
        def opmethod(self, other, arrays_only=False, we_could_imagine_other_filter_options=...):
            [...]
    add = __add__ = _binop('add')

i.e. add the possibility to choose what you want to do at runtime instead of hard-coding it.

Why not. I have nothing against it.

Now, the important question is what default value to use for arrays_only. Do you remember the exact case which triggered the introduction of that behavior?

Without the exact problem case in mind, I think it would be better to revert arrays_only to False for all ops, and recommend using session1.add(session2, arrays_only=True). What do you think?

I guess this was introduced after we implemented the possibility to store other kinds of object than arrays like Axis and Group objects but before scalars objects. Adding/divinding/... axes or groups makes no sense.

Users may want to add (substract, divide, ...) two sessions without having all axes, groups and strings replaced by nan values.

As I explained above, I still think that the default value of arrays_only should be True.

Not a good reason but choosing arrays_only=False will make that adding (divding/...) two ConstrainedSession with axes and groups (and strings) will always return a normal Session instead of a ConstrainedSession (see discussion from issue #832).

Last point: we could add a new 'option' (see /util/options.py) so that users will be able to specify in the beginning of their program or within a with block what behavior they want. What do you think?

@gdementen
Copy link
Contributor

Now, the important question is what default value to use for arrays_only. Do you remember the exact case which triggered the introduction of that behavior?

Without the exact problem case in mind, I think it would be better to revert arrays_only to False for all ops, and recommend using session1.add(session2, arrays_only=True). What do you think?

I guess this was introduced after we implemented the possibility to store other kinds of object than arrays like Axis and Group objects but before scalars objects. Adding/divinding/... axes or groups makes no sense.

Thanks a lot for the little refresher! It is now a lot clearer for me and indeed for axes or groups, most binary ops make no sense. But for scalars it does make sense and I can imagine several cases where users would want (some) scalars to be included. So in my opinion the dichotomy of arrays vs non-array is no longer the criterion we should use. It is whether those operations work or not. So the "arrays_only" argument should be changed to something like "skip_errors" (or skip_undefined_ops, or whatever_I_am_not_good_about_naming_things) which should indeed default to True.

Users may want to add (substract, divide, ...) two sessions without having all axes, groups and strings replaced by nan values.

For axes, groups and strings, yes but not for numeric scalar values. Having junk for some variables (this will happen for some arrays too!) is better than not having the variable you wanted in the first place. Remember a scalar could be the result of a model and can be seen as just an array with 0 axes, and if you want to compare two variants, you have to compare scalars too.

As I explained above, I still think that the default value of arrays_only should be True.

Not for numeric scalars, as explained above.

Last point: we could add a new 'option' (see /util/options.py) so that users will be able to specify in the beginning of their program or within a with block what behavior they want. What do you think?

We could, when and if users use so many "session.add(session2, filter=some_filter_criterion)" lines that it becomes more convenient to specify the filter once in a with block, but IMO, we are very far from that point.

@alixdamman
Copy link
Collaborator Author

Remember a scalar could be the result of a model and can be seen as just an array with 0 axes, and if you want to compare two variants, you have to compare scalars too.

Right. My brain skipped that possibility.

So the "arrays_only" argument should be changed to something like "skip_errors" (or skip_undefined_ops, or whatever_I_am_not_good_about_naming_things) which should indeed default to True.

This makes sense to me. All right then, let's do this.

@alixdamman alixdamman changed the title add arrays_only=True argument to Session._unaryop? replace arrays_only argument by skip_errors in Session._unaryop and add it to Session._unaryop Apr 23, 2020
@gdementen gdementen modified the milestones: 0.33, 0.34 Aug 17, 2021
@gdementen gdementen assigned gdementen and unassigned alixdamman Sep 20, 2022
@gdementen gdementen modified the milestones: 0.34, 0.35 Sep 26, 2022
@gdementen gdementen modified the milestones: 0.35, 0.36 Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants