@@ -41,8 +41,8 @@ retain(.data, ..., .by = NULL)
4141exclude(.data, ..., .by = NULL)
4242
4343# Vector functions
44- when_any(..., missing = NULL)
45- when_all(..., missing = NULL)
44+ when_any(..., na_rm = FALSE)
45+ when_all(..., na_rm = FALSE)
4646```
4747
4848For ` retain() ` and ` exclude() ` :
@@ -63,11 +63,9 @@ For `when_any()` and `when_all()`:
6363
6464- ` when_all() ` combines conditions with ` & ` .
6565
66- - ` missing = NULL ` propagates ` NA ` through according to the typical ` & ` and ` | ` rules.
66+ - ` na_rm = FALSE ` propagates ` NA ` through according to the typical ` & ` and ` | ` rules.
6767 Propagating missing values by default combines well with ` retain() ` and ` exclude() ` .
68-
69- - ` missing = FALSE / TRUE ` replace ` NA ` with the specified ` missing ` value before combining.
70- This acts as a more flexible ` na_rm ` style argument.
68+ ` na_rm = TRUE ` removes ` NA `s "rowwise" from the computation, exactly like ` na.rm = TRUE ` in ` pmin() ` and ` pmax() ` .
7169
7270- These functions can be used anywhere, not just in ` retain() ` and ` exclude() ` .
7371
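
The `na_rm` behavior described above can be approximated in base R. The following is only a sketch: `when_all_sketch()` and `when_any_sketch()` are hypothetical stand-ins written for illustration, not the proposed implementation, and they assume every input is a logical vector of the same length.

```r
# Hypothetical stand-ins for when_all() / when_any(), for illustration only.
# Assumes every input is a logical vector of the same length.
when_all_sketch <- function(..., na_rm = FALSE) {
  conditions <- cbind(...)
  # all() follows the `&` rules for NA; na.rm = TRUE drops NAs within each row
  apply(conditions, 1, all, na.rm = na_rm)
}

when_any_sketch <- function(..., na_rm = FALSE) {
  conditions <- cbind(...)
  # any() follows the `|` rules for NA; na.rm = TRUE drops NAs within each row
  apply(conditions, 1, any, na.rm = na_rm)
}

a <- c(TRUE, TRUE, FALSE, NA)
b <- c(TRUE, NA, NA, NA)

when_all_sketch(a, b)                # TRUE    NA FALSE    NA  (same as a & b)
when_all_sketch(a, b, na_rm = TRUE)  # TRUE  TRUE FALSE  TRUE
when_any_sketch(a, b)                # TRUE  TRUE    NA    NA  (same as a | b)
when_any_sketch(a, b, na_rm = TRUE)  # TRUE  TRUE FALSE FALSE
```

In the last row every input is `NA`, so removing missings leaves nothing to combine: the "all" result is `TRUE` and the "any" result is `FALSE`, matching `all()` and `any()` on empty input.
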
@@ -636,8 +634,9 @@ Every issue / question below is actually a request for `exclude()` in disguise:
636634- [ ` exclude(col == "str") ` ] ( https://stackoverflow.com/questions/46378437/how-to-filter-data-without-losing-na-rows-using-dplyr )
637635- [ ` exclude(var1 == 1) ` ] ( https://stackoverflow.com/questions/32908589/why-does-dplyrs-filter-drop-na-values-from-a-factor-variable )
638636
639- In the *extremely* rare cases where you might need ` missing = ` , you can use ` when_any() ` and ` when_all() ` inside of ` retain() ` and ` exclude() ` .
640- These propagate missings by default but have a ` missing ` argument for you to control this behavior.
637+ In the *extremely* rare cases where you might need ` missing = TRUE ` , you can nest ` when_all(na_rm = TRUE) ` inside of ` retain() ` and ` exclude() ` .
638+ ` when_all() ` propagates missings by default, but ` na_rm = TRUE ` removes them from the computation.
639+ For an "all" style operation, that is equivalent to treating missings like ` TRUE ` (i.e. ` all() ` and ` all(NA, na.rm = TRUE) ` both return ` TRUE ` ).
641640
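
To make the "in disguise" claim concrete, here is a minimal base R sketch. `retain_sketch()` and `exclude_sketch()` are hypothetical stand-ins written for illustration (they are not the proposed verbs), and both assume a single logical condition where `NA` is treated as `FALSE`. The example data frame is made up.

```r
df <- data.frame(id = 1:4, x = c(1, 5, NA, 7))

# Hypothetical base R stand-ins for the proposed verbs, for illustration only.
# Both treat an NA condition as FALSE.
retain_sketch <- function(data, cond) {
  data[!is.na(cond) & cond, , drop = FALSE]
}
exclude_sketch <- function(data, cond) {
  data[!(!is.na(cond) & cond), , drop = FALSE]
}

retain_sketch(df, df$x > 3)$id    # 2 4
exclude_sketch(df, df$x > 3)$id   # 1 3  (the NA row survives)

# Treating NA as TRUE inside a retain keeps rows where the condition is
# TRUE or NA ...
cond <- df$x > 3
retain_sketch(df, is.na(cond) | cond)$id   # 2 3 4

# ... which is what excluding the opposite condition already does.
exclude_sketch(df, df$x <= 3)$id           # 2 3 4
```
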
642641## Appendix
643642
@@ -658,23 +657,23 @@ Related issues and examples:
658657
659658Tables like these help us ensure there aren't any holes in our designs.
660659
661- Intent vs Combine
660+ #### Intent vs Combine
662661
663- +------------+------------+-------------------------------------------------------+
664- | Intent     | Combine    | Solution                                              |
665- +============+============+=======================================================+
666- | Retain     | And        | `retain(a, b, c)`                                     |
667- +------------+------------+-------------------------------------------------------+
668- | Retain     | Or         | `retain(when_any(a, b, c))`                           |
669- +------------+------------+-------------------------------------------------------+
670- | Exclude    | And        | `exclude(a, b, c)`                                    |
671- +------------+------------+-------------------------------------------------------+
672- | Exclude    | Or         | `exclude(when_any(a, b, c))`                          |
673- |            |            |                                                       |
674- |            |            | In practice: `exclude(a) |> exclude(b) |> exclude(c)` |
675- +------------+------------+-------------------------------------------------------+
662+ +---------+---------+----------------------+-----------+-------------------------------------------------------+
663+ | Intent  | Combine | Hypothetical usage % | Currently | Solution                                              |
664+ +=========+=========+======================+===========+=======================================================+
665+ | Retain  | And     | 50%                  | ✅        | `retain(a, b, c)`                                     |
666+ +---------+---------+----------------------+-----------+-------------------------------------------------------+
667+ | Retain  | Or      | 5%                   | ❌        | `retain(when_any(a, b, c))`                           |
668+ +---------+---------+----------------------+-----------+-------------------------------------------------------+
669+ | Exclude | And     | 35%                  | ❌        | `exclude(a, b, c)`                                    |
670+ +---------+---------+----------------------+-----------+-------------------------------------------------------+
671+ | Exclude | Or      | 10%                  | ❌        | `exclude(when_any(a, b, c))`                          |
672+ |         |         |                      |           |                                                       |
673+ |         |         |                      |           | In practice: `exclude(a) |> exclude(b) |> exclude(c)` |
674+ +---------+---------+----------------------+-----------+-------------------------------------------------------+
676675
677- Intent vs Missings
676+ #### Intent vs Missings
678677
679678+-----------+------------------+---------------------------------------------------------+----------------------------------------------------------+
680679| Intent | Missings | Outcome | Usefulness |
@@ -683,7 +682,40 @@ Intent vs Missings
683682+-----------+------------------+---------------------------------------------------------+----------------------------------------------------------+
684683| Exclude | Treat as ` FALSE ` | Exclude rows where you *know* the conditions are ` TRUE ` | Very. Simplifies "treat ` filter() ` as an exclude" cases. |
685684+-----------+------------------+---------------------------------------------------------+----------------------------------------------------------+
686- | Retain | Treat as ` TRUE ` | Retain rows where conditions are ` TRUE ` or ` NA ` | Unconvinced. Often this is an ` exclude() ` in disguise. |
685+ | Retain | Treat as ` TRUE ` | Retain rows where conditions are ` TRUE ` or ` NA ` | Not. This is an ` exclude() ` in disguise. |
687686+-----------+------------------+---------------------------------------------------------+----------------------------------------------------------+
688- | Exclude | Treat as ` TRUE ` | Exclude rows where conditions are ` TRUE ` or ` NA ` | Unconvinced. |
687+ | Exclude | Treat as ` TRUE ` | Exclude rows where conditions are ` TRUE ` or ` NA ` | Not. Never seen an example of this. |
689688+-----------+------------------+---------------------------------------------------------+----------------------------------------------------------+
689+
690+ #### Connection to vctrs
691+
692+ We purposefully don't expose ` missing ` directly on the dplyr side.
693+ The 3-valued argument is quite complicated to think about.
694+ Instead it bubbles up through ` retain() ` / ` exclude() ` using ` missing = FALSE ` and ` when_all() ` / ` when_any() ` 's ` na_rm ` argument.
695+
696+ Particularly confusing for the average consumer is that ` when_all(na_rm = TRUE) ` maps to ` list_pall(missing = TRUE) ` but ` when_any(na_rm = TRUE) ` maps to ` list_pany(missing = FALSE) ` .
697+ Exposing only ` na_rm = TRUE ` saves users from having to do these very hard mental gymnastics.
698+
699+ +------------------------------+--------------------------+---------------------------+
700+ | vctrs | Data frame | Vector |
701+ +==============================+==========================+===========================+
702+ | ` list_pall(missing = NULL) ` | | ` when_all(na_rm = FALSE) ` |
703+ +------------------------------+--------------------------+---------------------------+
704+ | ` list_pall(missing = FALSE) ` | ` retain() ` / ` exclude() ` | |
705+ +------------------------------+--------------------------+---------------------------+
706+ | ` list_pall(missing = TRUE) ` | | ` when_all(na_rm = TRUE) ` |
707+ +------------------------------+--------------------------+---------------------------+
708+ | ` list_pany(missing = NULL) ` | | ` when_any(na_rm = FALSE) ` |
709+ +------------------------------+--------------------------+---------------------------+
710+ | ` list_pany(missing = FALSE) ` | | ` when_any(na_rm = TRUE) ` |
711+ +------------------------------+--------------------------+---------------------------+
712+ | ` list_pany(missing = TRUE) ` | | |
713+ +------------------------------+--------------------------+---------------------------+
714+
715+ - ` list_pall(missing = FALSE) ` :
716+
717+ - Interestingly, this is useful as the default behavior of ` retain() ` / ` exclude() ` , but exposing it in ` when_all() ` as ` missing ` rather than the simpler ` na_rm ` would be too confusing. Keeping the most flexible variant of the vector function in vctrs feels right, since the ` missing = FALSE ` case is less useful in a vector context. It doesn't prevent you from writing ` retain(when_all()) ` , because the default propagates ` NA ` and ` retain() ` itself then does the ` missing = FALSE ` part.
718+
719+ - ` list_pany(missing = TRUE) ` :
720+
721+ - Like ` list_pall(missing = FALSE) ` , this is not the useful variant to expose at the vector level. It also has no exposed data frame variant, so dplyr doesn't expose it at all, which feels fine. Not a single example needed it.