Skip to content

comply to Bioconductor review #8

@stemangiola

Description

@stemangiola

DESCRIPTION

  • I think the package should be re-named for maximum clarity as TidySingleCellExperiment -- the appreviation SCE could mean anything. The capitalization follows Bioconductor convention.

We are well in-line as I don't like acronyms either. As note, sce is the common name for SingleCellExperiment object e.g. in this official source, is really commonly accepted by the community.

In order to respect both Bioconductor tradition and tidyverse standards (afterall this is a equal-rights marriage) the name tidySingleCellExperiment can be a good solution. (I understand the S4 point of view, but this package is dedicated to end-users, so the elegance in design and vocabulary has them as our priority)

Having said that, at the BioC-Asia2020 we will gather the opinion of the community with a pool at our workshop.

  • Title: field opening quote not closed; is it necessary

  • Description: Not sure what an 'invisible layer' is as a programming concept?

  • Imports: not all packages listed here are actually used directly? Remove packages that provide functionality that is not directly referenced.

TESTS

  • consider organising test files so that they have the same structure as the R/ files -- each tests/testthat/test-* is testing code defined in the corresponding R/* file. This organization makes it easier to navigate the code.

  • hmm, I find the coding style here very difficult to read, and wonder why it does not follow tidyverse guidelines? https://style.tidyverse.org/pipes.html#whitespace .

  • Likewise with tests, it would seem that nesting pipes within expect_() is antithetic to readability -- instead, create a temporary variable from a piped expresssion, and use the variable in expect_() statement?

I opted for full tydyverse style without the need to create temporary variables

  • test-dplyr.R:16 It is never appropriate to use direct slot access; use accessors instead.

R (comments are on specfic lines of code, but apply throughout)

  • data.R I was surprised that these data sets were not documented here, where they are defined? The documentation in man/pbmc_small.Rd, for instance, is inadequate -- provide a description of what the object represents, how it was derived, how it is used in the package.

  • dplyr_methods.R:58 Redefining the generic is not appropriate. Instead, define S3 methods that implement functionality for the specific classes you wish to provide methods for, and export the methods. Since you expect the user to use the dplyr generics, place dplyr in the Depends: field of the DESCRIPTION file.

I worked on it, and I agree. Respecting this standard is on out to-do list for tidySCE, tidySE and tidybulk. However I was not able to make things work under this standard this branch (either the export from roxigen disappears, or the method.class does not get exported). I ask to give us a release cycle to fix this. This although is a better approach does not affect negatively when using any package.

  • dplyr_methods.R:157 I find the mix of 'base' and tidy functionality confusing; why use cbind() rather than the tidy equivalent? In the next line,

I used cbind to cbind two SingleCellExperiment objects

  • dplyr_methods.R:189 Never use direct slot access @, instead use accessors colData(). It's strange to see as.data.frame() in this line, rather than as_tibble(), and likewise creating the second argument as a DataFrame()???

tts[[1]] is a SingleCellExperiment; colData(tts[[1]]) is a DataFrame that cannot be converted to tibble directly; DataFrame is used to update colData from SCE

  • dplyr_methods.R:224 This formatting

message("tidySCE says: A data frame is returned for
independent data analysis.")
results in quasi-arbitrary formatting for output, where the white space introduced for programmatic convenience is echoed literally to the user

tidySCE says: A data frame is returned for
independent data analysis.
Rely on message()'s internal use of paste0() or, if precise formatting is important, use strwrap()

message(
"tidySCE says: A data frame is returned for ",
"independent data analysis."
)
or (presummable in a helper function for re-use across your code)

txt <- paste(
"tidySCE says: A data frame is returned for ",
"independent data analysis."
)
message(paste(strwrap(txt, exdent = 4), collapse = "\n"))

  • dplyr_methods.R:622 long lines of piped code and nested conditional tests are unreadable. Reformat, e.g.,

tst <- intersect(
cols %>%
names(),
get_special_columns(.data) %>%
c(get_needed_columns())) %>%
length() %>%
gt(0)
)
if (tst) {

  • (why do you write c(get_needed_columns())) instead of get_needed_columns()?)

this is a concatenation "a" %>% c("b")

  • In the stop() message immediately after, reconsider use of stop(sprintf()) and instead rely on stop()'s internal use of paste0()

columns <-
get_special_columns(.data) %>%
c(get_needed_columns()) %>%
paste(collapse=", ")
stop(
"tidySCE says: you are trying to rename a column that is view only ",
columns, " ",
"(it is not present in the colData). If you want to mutate a view-only",
"column, make a copy and mutate that one.",
)

  • For such long messages adopting a strwrap() discipline as outlined above would seem to be highly desireable.

For the moment we have adopted the stop paste0 function suggested above. In the future this will be an area of improvement.

  • dplyr_methods.R:775 this code is repeated several times. Write a function instead to remove code duplication.

the code is similar but not identical

  • dplyr_methods.R:1146 it should not be necessary to preface functions with their package names; I don't think this helps readability of the code.

I understan the concern, however since I am overwriting some tidyverse function this helps some inconsistencies. We plan to improve the code clarity as the pakage matures

  • methods.R:3 it's strange to see a class definition at the top of a file called 'methods.R'; a more common organization would give the file the name of the class -- tidySCE.R. The Bioconductor / S4 convention (this is an S4 class, after all...) would name this class TidySCE, and perhaps further recognizing the imporance of being clear to the user and that the user will seldom have to type out the word in it's entirety (e.g., because of autocompletion in RStudio) it would be better to name the class explicitly TidySingleCellExperiment.

We agree, we wait to create this file once the package name has been settled (first point in the review list).

  • methods.R:106 highly nested calls like this are just too complicated for the user to parse. Adopt a more procedural programming style with less nesting.

MAN

  • not sure what the figures/ directory is doing here?

It's used for the figures in the README, as here: https://r-pkgs.org/whole-game.html

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions