Description
It is a widespread practice to rely on set.seed() to make pseudo-random analyses reproducible.
Some functions within the cachem package modify the global random seed, but it is not obvious to users of the package that they do so. This can (and did in my case) lead to unexpected and hard-to-debug behavior when cachem is introduced into such an analysis, either directly or through intermediate packages that rely on cachem for caching.
The issue can be demonstrated with a simple reprex:
library(cachem)
library(testthat)
cm <- cache_mem()
cd <- cache_disk(tempdir())
set.seed(42)
expect_equal(sample(99999, 1), 61413)
set.seed(42)
cm$set("x",letters)
expect_equal(sample(99999, 1), 61413)
set.seed(42)
cd$set("x",letters)
expect_equal(sample(99999, 1), 61413)
#> Error: sample(99999, 1) not equal to 61413.
#> 1/1 mismatches
#> [1] 73236 - 61413 == 11823
The above also demonstrates that seed updates depend on the type of the cache, which can further complicate things (something that works with a memory cache can stop working when switching to a layered or disk cache).
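To narrow down which calls touch the global RNG, a small check along the following lines may help. This is only a sketch: `changes_rng_state()` is a hypothetical helper written for this issue, not something cachem or base R provides, and it simply compares `.Random.seed` before and after evaluating an expression.

```r
# Hypothetical helper: TRUE if evaluating `expr` changes the global RNG state.
changes_rng_state <- function(expr) {
  get_seed <- function() {
    # .Random.seed only exists once the RNG has been seeded or used.
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      get(".Random.seed", envir = globalenv(), inherits = FALSE)
    } else {
      NULL
    }
  }
  before <- get_seed()
  force(expr)  # evaluate the expression passed by the caller
  !identical(before, get_seed())
}

set.seed(42)
no_rng   <- changes_rng_state(letters)   # expression that draws no random numbers
uses_rng <- changes_rng_state(runif(1))  # expression that draws one random number
```

The same check could be wrapped around the calls from the reprex, for example `changes_rng_state(cd$set("x", letters))`, to see which cache types are affected.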
My suggested fix would be to isolate the usage of the random seed wherever cachem needs access to randomness, so that the global seed is not affected by calls to cachem functions.
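As a rough illustration of what isolating the seed could look like, here is a minimal save-and-restore sketch. `with_preserved_seed()` is a hypothetical name, not an existing cachem function, and this is a simpler variant that is not claimed to match the shiny implementation mentioned below; `runif(10)` merely stands in for whatever internal code needs randomness.

```r
# Hypothetical helper (not part of cachem): evaluate `expr`, then put the
# global RNG state back exactly as it was before the call.
with_preserved_seed <- function(expr) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  on.exit({
    if (has_seed) {
      # Restore the caller's RNG state.
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      # There was no seed before the call, so remove the one created by `expr`.
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  force(expr)
}

set.seed(42)
expected <- sample(99999, 1)

set.seed(42)
with_preserved_seed(runif(10))  # stand-in for RNG use inside a cache method
actual <- sample(99999, 1)

identical(actual, expected)
```

With a wrapper like this around the internal code that draws random numbers, the draw after the cache call would match the draw made without it, whichever cache type is in use.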
A discussion of this issue can be found here:
And an implementation of this within the shiny package (which is the approach I would recommend) can be found here:
I'd be happy to discuss the best approach to this fix/enhancement, and provide an eventual pull request if it seems like it would be well-received.