diff --git a/DESCRIPTION b/DESCRIPTION index 586ef0f308..cf54691231 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -75,7 +75,7 @@ Authors@R: c( person("Michael","Czekanski", role="ctb")) Depends: R (>= 3.1.0) Imports: methods -Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), curl, R.utils, xts, nanotime, zoo (>= 1.8-1), yaml, knitr, rmarkdown, markdown +Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), curl, R.utils, xts, nanotime, zoo (>= 1.8-1), yaml, knitr, rmarkdown, markdown, parallel SystemRequirements: zlib Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development. License: MPL-2.0 | file LICENSE diff --git a/NAMESPACE b/NAMESPACE index c22782440a..9ea5613559 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -51,7 +51,11 @@ S3method(cube, data.table) S3method(rollup, data.table) export(frollmean) export(frollsum) +export(frollmax) +export(frollmin) +export(frollprod) export(frollapply) +export(frolladapt) export(nafill) export(setnafill) export(.Last.updated) diff --git a/NEWS.md b/NEWS.md index 4f4a2f417a..e69a518d02 100644 --- a/NEWS.md +++ b/NEWS.md @@ -4,8 +4,49 @@ # data.table [v1.14.3](https://github.com/Rdatatable/data.table/milestone/20) (in development) +## POTENTIALLY BREAKING CHANGES + +1. Rolling functions `frollmean` and `frollsum` used to treat `Inf` and `-Inf` as `NA` when using default `algo="fast"`. It has been changed now and infinity values are not treated as `NA` anymore. If your input into those function has `Inf` or `-Inf` then you will be affected by this change. [#5441](https://github.com/Rdatatable/data.table/pull/5441). + ## NEW FEATURES +0. (needs to be moved after rebase anyway) New `frollmin` and `frollprod` has been implemented, towards[#2778](https://github.com/Rdatatable/data.table/issues/2778). + +0. (needs to be moved after rebase anyway) New `frolladapt` helper function has been added to aid in preparing adaptive length rolling window width when dealing with _irregularly spaced ordered data_. This lets the user to apply a rolling function over a period without having to deal with gaps in a data where some periods might be missing. + +```r +idx = as.Date("2022-10-23") + c(0,1,4,5,6,7,9,10,14) +dt = data.table(index=idx, value=seq_along(idx)) +dt +# index value +# +#1: 2022-10-23 1 +#2: 2022-10-24 2 +#3: 2022-10-27 3 +#4: 2022-10-28 4 +#5: 2022-10-29 5 +#6: 2022-10-30 6 +#7: 2022-11-01 7 +#8: 2022-11-02 8 +#9: 2022-11-06 9 +dt[, c("rollmean3","rollmean3days") := list( + frollmean(value, 3), + frollmean(value, frolladapt(index, 3), adaptive=TRUE) + )] +dt +# index value rollmean3 rollmean3days +# +#1: 2022-10-23 1 NA NA +#2: 2022-10-24 2 NA NA +#3: 2022-10-27 3 2 3.0 +#4: 2022-10-28 4 3 3.5 +#5: 2022-10-29 5 4 4.0 +#6: 2022-10-30 6 5 5.0 +#7: 2022-11-01 7 6 6.5 +#8: 2022-11-02 8 7 7.5 +#9: 2022-11-06 9 8 9.0 +``` + 1. `nafill()` now applies `fill=` to the front/back of the vector when `type="locf|nocb"`, [#3594](https://github.com/Rdatatable/data.table/issues/3594). Thanks to @ben519 for the feature request. It also now returns a named object based on the input names. Note that if you are considering joining and then using `nafill(...,type='locf|nocb')` afterwards, please review `roll=`/`rollends=` which should achieve the same result in one step more efficiently. `nafill()` is for when filling-while-joining (i.e. `roll=`/`rollends=`/`nomatch=`) cannot be applied. 2. `mean(na.rm=TRUE)` by group is now GForce optimized, [#4849](https://github.com/Rdatatable/data.table/issues/4849). Thanks to the [h2oai/db-benchmark](https://github.com/h2oai/db-benchmark) project for spotting this issue. The 1 billion row example in the issue shows 48s reduced to 14s. The optimization also applies to type `integer64` resulting in a difference to the `bit64::mean.integer64` method: `data.table` returns a `double` result whereas `bit64` rounds the mean to the nearest integer. @@ -296,6 +337,44 @@ 41. New function `%notin%` provides a convenient alternative to `!(x %in% y)`, [#4152](https://github.com/Rdatatable/data.table/issues/4152). Thanks to Jan Gorecki for suggesting and Michael Czekanski for the PR. `%notin%` uses half the memory because it computes the result directly as opposed to `!` which allocates a new vector to hold the negated result. If `x` is long enough to occupy more than half the remaining free memory, this can make the difference between the operation working, or failing with an out-of-memory error. +42. Multiple improvements has been added to rolling functions. Request came from @gpierard who needed left aligned, adaptive, rolling max, [#5438](https://github.com/Rdatatable/data.table/issues/5438). There was no `frollmax` function yet. Adaptive rolling functions did not have support for `align="left"`. `frollapply` did not support `adaptive=TRUE`. Available alternatives were base R `mapply` or self-join using `max` and grouping `by=.EACHI`. As a follow up of his request, following features has been added: +- new function `frollmax`, applies `max` over a rolling window. +- support for `align="left"` for adaptive rolling function. +- support for `adaptive=TRUE` in `frollapply`. +- `partial` argument to trim window width to available observations rather than returning `NA` whenever window is not complete. +- `give.names` argument that can be used to automatically give the names based on the names of `x` and `n`. +- `frollmean` and `frollsum` no longer treat `Inf` and `-Inf` as `NA`s as it used to be for `algo="fast"` (breaking change). +- `hasNA` argument has been renamed to `has.nf` to convey that it is not only related to `NA/NaN` but other non-finite values (`Inf/-Inf`) as well. + +For a comprehensive description about all available features see `?froll` manual. + +Adaptive `frollmax` has observed to be up to 50 times faster than second fastest solution (data.table self-join using `max` and grouping `by=.EACHI`). Note that important factor in performance is width of the rolling window. Code for the benchmark below has been taken from [this SO answer](https://stackoverflow.com/a/73408459/2490497). +```r +set.seed(108) +setDTthreads(8) +x = data.table( + value = cumsum(rnorm(1e6, 0.1)), + end_window = 1:1e6 + sample(50:500, 1e6, TRUE), + row = 1:1e6 +)[, "end_window" := pmin(end_window, .N) + ][, "len_window" := end_window-row+1L] + +baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)] +sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1 +frmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", has.nf=FALSE)] +frapply = function(x) x[, frollapply(value, len_window, max, adaptive=TRUE, align="left")] +microbenchmark::microbenchmark( + baser(x), sj(x), frmax(x), frapply(x), + times=10, check="identical" +) +#Unit: milliseconds +# expr min lq mean median uq max neval +# baser(x) 5181.36076 5417.57505 5537.2929 5494.73652 5706.2721 5818.6627 10 +# sj(x) 4608.28940 4627.57186 4792.4031 4785.35306 4856.4475 5054.3301 10 +# frmax(x) 70.41253 75.28659 91.3774 91.40227 102.0248 116.8622 10 +# frapply(x) 713.23108 742.34657 865.2524 848.31641 965.3599 1114.0531 10 +``` + ## BUG FIXES 1. `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries. diff --git a/R/froll.R b/R/froll.R index df901f0b84..038e0797a0 100644 --- a/R/froll.R +++ b/R/froll.R @@ -1,21 +1,188 @@ -froll = function(fun, x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) { +## those two helpers does not quote argument names in errors because frollapply has them in uppercase +partial2adaptive = function(x, n, align, adaptive) { + if (align=="center") + stopf("'partial' cannot be used together with align='center'") + if (is.list(x)) { + if (!is.data.frame(x) && length(unique(vapply(x, length, 0L)))!=1L) ## froll + stopf("'partial' does not support variable length of columns in x") + else if (all(vapply(x, is.data.frame, FALSE)) && length(unique(vapply(x, nrow, 0L)))!=1L) ## frollapply by.column=F, single DT already wrapped into list + stopf("'partial' does not support variable nrow of data.tables in x") + } + if (!adaptive) { + if (is.list(n)) + stopf("n must be integer, list is accepted for adaptive TRUE") + else if (!is.numeric(n)) + stopf("n must be integer vector") + } else if (!(is.numeric(n) || (is.list(n) && all(vapply(n, is.numeric, FALSE))))) { + stopf("n must be integer vector or list of integer vectors") + } + len = if (is.list(x)) { + if (is.data.frame(x[[1L]])) ## frollapply by.column + nrow(x[[1L]]) + else + length(x[[1L]]) ## froll, this will work for both x list and x dt on input + } else length(x) + verbose = getOption("datatable.verbose") + if (!adaptive) { + n = as.list(n) ## test 6006.032 + if (verbose) + cat("partial2adaptive: froll partial=TRUE trimming n and redirecting to adaptive=TRUE\n") + trimn = function(n, len, align) { + n = min(n, len) + if (align=="right") + c(seq_len(n), rep(n, len-n)) + else + c(rep(n, len-n), rev(seq_len(n))) + } + sapply(n, len, align, FUN=trimn, simplify=FALSE) + } else { + if (!is.list(n)) n = list(n) + if (length(unique(vapply(n, length, 0L)))!=1L) + stopf("adaptive windows provided in n must not to have different lengths") + if (length(n[[1L]]) != len) + stopf("length of vectors in x must match to length of adaptive window in n") + if (verbose) + cat("partial2adaptive: froll adaptive=TRUE and partial=TRUE trimming n\n") + triman = function(n, align) { + if (align=="right") + pmin(n, seq_along(n)) + else + pmin(n, rev(seq_along(n))) + } + sapply(n, align, FUN=triman, simplify=FALSE) + } +} +make.roll.names = function(x.len, n.len, n, x.nm, n.nm, fun, adaptive) { + if (is.null(n.nm)) { + if (!adaptive) { + if (!is.numeric(n)) + stopf("internal error: misuse of make.names, n must be numeric for !adaptive") ## nocov + n.nm = paste0("roll", fun, as.character(as.integer(n))) + } else { + n.nm = paste0("aroll", fun, seq_len(n.len)) + } + } else if (!length(n.nm) && !adaptive) + stopf("internal error: misuse of make.names, non-null length 0 n is not possible for !adaptive") ## nocov + if (is.null(x.nm)) { + x.nm = paste0("V", seq_len(x.len)) + } + ans = if (length(x.nm)) { ## is.list(x) && !is.data.frame(x) + if (length(n.nm)) { ## !adaptive || is.list(n) + paste(rep(x.nm, each=length(n.nm)), n.nm, sep="_") + } else { ## adaptive && is.numeric(n) + x.nm + } + } else { ## (by.column && is.atomic(x)) || (!by.column && is.data.frame(x)) + if (length(n.nm)) { ## !adaptive || is.list(n) + n.nm + } else { ## adaptive && is.numeric(n) + NULL + } + } + if (!is.null(ans) && length(ans) != x.len*n.len) + stopf("internal error: make.names generated names of wrong length") ## nocov + ans +} + +## irregularly spaced time series, helper for creating adaptive window size +frolladapt = function(x, n, align="right", partial=FALSE, give.names=FALSE) { + x = unclass(x) + if (!is.numeric(x)) + stopf("Index vector 'x' must of numeric type") + if (!is.integer(x)) + x = as.integer(x) + if (!is.numeric(n)) { + stopf("Window size 'n' must be integer") + } else { + nms = names(n) ## only for give.names + if (!is.integer(n)) { + if (!isRealReallyInt(n)) + stopf("Window size 'n' must be integer") + n = as.integer(n) + } + } + if (length(n) < 1L || anyNA(n)) + stopf("Argument 'n' must be non-zero length and must not have NAs") + if (!identical(align, "right")) + stopf("Argument 'align' other than 'right' has not yet been implemented") + if (!isTRUEorFALSE(partial)) + stopf("Argument 'partial' must be TRUE or FALSE") + if (!isTRUEorFALSE(give.names)) + stopf("Argument 'give.names' must be TRUE or FALSE") + + if (length(n) == 1L) { + ans = .Call(Cfrolladapt, x, n, partial) + } else { + ans = lapply(n, function(.n) .Call(Cfrolladapt, x, .n, partial)) + if (give.names) { + if (is.null(nms)) + nms = paste0("n", as.character(n)) + setattr(ans, "names", nms) + } + } + ans +} + +froll = function(fun, x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA=NA) { stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun)) + if (!missing(hasNA)) { + if (!is.na(has.nf)) + stopf("hasNA is deprecated, use has.nf instead") + warning("hasNA is deprecated, use has.nf instead") + has.nf = hasNA + } # remove check on next major release algo = match.arg(algo) align = match.arg(align) - ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, hasNA, adaptive) + if (isTRUE(give.names)) { + orig = list(n=n, adaptive=adaptive) + xnam = if (is.list(x)) names(x) else character() + nnam = if (isTRUE(adaptive)) { + if (is.list(n)) names(n) else character() + } else names(n) + nx = if (is.list(x)) length(x) else 1L + nn = if (isTRUE(adaptive)) { + if (is.list(n)) length(n) else 1L + } else length(n) + } + if (isTRUE(partial)) { + n = partial2adaptive(x, n, align, adaptive) + adaptive = TRUE + } ## support for partial added in #5441 + leftadaptive = isTRUE(adaptive) && align=="left" + if (leftadaptive) { + verbose = getOption("datatable.verbose") + rev2 = function(x) if (is.list(x)) sapply(x, rev, simplify=FALSE) else rev(x) + if (verbose) + cat("froll: adaptive=TRUE && align='left' pre-processing for align='right'\n") + x = rev2(x) + n = rev2(n) + align = "right" + } ## support for left adaptive added in #5441 + ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, has.nf, adaptive) + if (leftadaptive) { + if (verbose) + cat("froll: adaptive=TRUE && align='left' post-processing from align='right'\n") + ans = rev2(ans) + } + if (isTRUE(give.names) && is.list(ans)) { + nms = make.roll.names(x.len=nx, n.len=nn, n=orig$n, x.nm=xnam, n.nm=nnam, fun=fun, adaptive=orig$adaptive) + setattr(ans, "names", nms) + } ans } -frollmean = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) { - froll(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive) +frollmean = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) { + froll(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names) } -frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) { - froll(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive) +frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) { + froll(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names) } -frollapply = function(x, n, FUN, ..., fill=NA, align=c("right", "left", "center")) { - FUN = match.fun(FUN) - align = match.arg(align) - rho = new.env() - ans = .Call(CfrollapplyR, FUN, x, n, fill, align, rho) - ans +frollmax = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) { + froll(fun="max", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names) +} +frollmin = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) { + froll(fun="min", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names) +} +frollprod = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) { + froll(fun="prod", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, has.nf=has.nf, adaptive=adaptive, partial=partial, hasNA=hasNA, give.names=give.names) } diff --git a/R/frollapply.R b/R/frollapply.R new file mode 100644 index 0000000000..d983bd844b --- /dev/null +++ b/R/frollapply.R @@ -0,0 +1,368 @@ +## ansmask is to handle leading values from fill to match type of the ans +simplifylist = function(x, fill, ansmask) { + l = lengths(x) + ul = unique(l) + if (length(ul)!=1L) ## different lenghts + return(x) + t = vapply(x, typeof, "", USE.NAMES=FALSE) + ut = unique(t) + if (length(ut)==2L) { + all.ut = ut + t = vapply(x[ansmask], typeof, "", USE.NAMES=FALSE) + ut = unique(t) + if (length(ut)!=1L) + return(x) ## different typeof even excluding fill, a FUN was not type stable + if (!(ut=="integer"||ut=="logical"||ut=="double"||ut=="complex"||ut=="character"||ut=="raw")) + return(x) ## ans is not atomic + if (identical(fill, NA)) { ## different typeof, try to handle fill=NA logical type + filli = which(!ansmask) + ans1 = x[[which.first(ansmask)]] + x[filli] = rep_len(list(ans1[NA]), length(filli)) ## this will recycle to length of ans1 + } else if (all(c("integer","double") %in% all.ut)) { ## typeof numeric and int, fill is coerced to the type FUN + filli = which(!ansmask) + cast = if (ut=="double") as.numeric else as.integer + x[filli] = rep_len(list(cast(fill)), length(filli)) + } else { ## length == 2L but no easy way to match type + return(x) + } + } else if (length(ut)>2L) { ## unique typeof length > 2L + return(x) + } + if (ut=="integer"||ut=="logical"||ut=="double"||ut=="complex"||ut=="character"||ut=="raw") { + if (ul==1L) ## length 1 + return(unlist(x, recursive=FALSE, use.names=FALSE)) + else ## length 2+ + return(rbindlist(lapply(x, as.list))) + } else if (ut=="list") { + if (all(vapply(x, is.data.frame, FALSE, USE.NAMES=FALSE))) ## list(data.table(...), data.table(...)) + return(rbindlist(x)) + ll = lapply(x, lengths) ## length of each column of each x + ull = unique(ll) + if (length(ull)==1L) ## list(list(1:2, 1:2), list(2:3, 2:3)) + return(rbindlist(x)) + lu = function(x) length(unique(x)) + if (all(vapply(ull, lu, 0L, USE.NAMES=FALSE)==1L)) ## within each x column lengths the same, each could be DF: list(list(1, 2), list(1:2, 2:3)) + return(rbindlist(x)) + } ## else NULL, closure, special, builtin, environment, S4, ... + x +} + +## parallel's fork serializes results so we need setalloccol +fixselfref = function(x) { + if (length(x) && is.data.table(x[[1L]])) { ## calling is.data.table many times always may be too much, so first we check only first element + dtmask = vapply(x, is.data.table, FALSE, USE.NAMES=FALSE) ## most likely all, but we cannot be sure that function is type stable + x[dtmask] = lapply(x[dtmask], setalloccol) + } + x +} + +all.atomic = function(x) all(vapply(x, is.atomic, FALSE, USE.NAMES=FALSE)) +all.data.frame = function(x) all(vapply(x, is.data.frame, FALSE, USE.NAMES=FALSE)) +all.list = function(x) all(vapply(x, is.list, FALSE, USE.NAMES=FALSE)) +equal.lengths = function(x) length(unique(lengths(x)))<=1L + +frollapply = function(X, N, FUN, ..., by.column=TRUE, fill=NA, align=c("right","left","center"), adaptive=FALSE, partial=FALSE, give.names=FALSE, simplify=TRUE, x, n) { + if (!missing(x)) { + warningf("'x' is deprecated in frollapply, use 'X' instead") + X = x + } + if (!missing(n)) { + warningf("'n' is deprecated in frollapply, use 'N' instead") + N = n + } + if (!isTRUEorFALSE(by.column)) + stopf("'by.column' must be TRUE or FALSE") + if (!isTRUEorFALSE(adaptive)) + stopf("'adaptive' must be TRUE or FALSE") + if (!isTRUEorFALSE(partial)) + stopf("'partial' must be TRUE or FALSE") + if (!isTRUEorFALSE(give.names)) + stopf("'give.names' must be TRUE or FALSE") + if (!isTRUEorFALSE(simplify) && !is.function(simplify)) + stopf("'simplify' must be TRUE or FALSE or a function") + + align = match.arg(align) + FUN = match.fun(FUN) + verbose = getOption("datatable.verbose") + if (give.names) + orig = list(N=N, adaptive=adaptive) + + ## by.column, x validation, x preprocess + if (by.column) { + if (is.atomic(X)) { + xvec = FALSE ## marker about form of input, used to unpack answer to vector + len = length(X) ## count of observations for deepest loop + nx = as.integer(as.logical(len)) ## top level loop for vectorized x + X = list(X) + xnam = character() ## used for give.names + } else if (is.list(X) && all.atomic(X)) { + xvec = TRUE + nx = length(X) + len = lengths(X) + xnam = names(X) + } else + stopf("frollapply by.column=TRUE requires 'X' argument to be atomic or a list of those") + } else { + list.df = FALSE + if (is.data.frame(X)) { + xvec = FALSE + len = nrow(X) + nx = as.integer(as.logical(len)) + X = list(X) + xnam = character() + } else if (is.list(X)) { + if (all.atomic(X)) { ## handles frollapply(.(col1, col2), ...) + if (!equal.lengths(X)) + stopf("frollapply by.column=FALSE, when provided a list in 'X' then all vectors must have equal lengths, like data.frame") + list.df = TRUE + xvec = FALSE + len = if (length(X)) length(X[[1L]]) else 0L + nx = as.integer(as.logical(len)) + X = list(X) + xnam = character() + } else if (all.data.frame(X)) { + if (!all(vapply(X, all.atomic, FALSE, USE.NAMES=FALSE))) + stopf("frollapply by.column=FALSE got vectorized input in 'X', list of data.frames/data.tables, but not all columns of data.frames/data.tables are atomic") + xvec = TRUE + len = vapply(X, nrow, 0L, USE.NAMES=FALSE) + nx = length(X) + xnam = names(X) + } else if (all.list(X)) { ## vectorized input does not support lists as that would be ambiguous + stopf("frollapply by.column=FALSE supports vectorized input in 'X' as a list of data.frames/data.tables, not a list of lists. Turn nested lists into data.frames/data.table and retry.") + } else { ## mixed types + stopf("frollapply by.column=FALSE got list in 'X' but it is not valid one. If intent is to pass a list as non-vectorized input, but a single object to apply function to, then the list must have all its vectors atomic. For a vectorized input, passing multiple objects to apply function to, it must be a list of data.frames/data.tables.") + } + } else + stopf("frollapply by.column=FALSE requires 'X' argument to be a data.table/data.frame or a list of equal length vectors. For vectorized input can be a list of data.frames/data.tables, but not a list of lists. All columns/vectors must be atomic.") + } + ## adaptive, n validation, n preprocess + if (!length(N)) + stopf("'N' must be non 0 length") + if (!adaptive) { + if (is.list(N)) + stopf("'N' must be integer, list is accepted for adaptive TRUE") + else if (!is.numeric(N)) + stopf("'N' must be integer vector") + nnam = names(N) ## used for give.names + if (!is.integer(N)) + N = as.integer(N) + nn = length(N) ## top level loop for vectorized n + } else { + if (length(unique(len)) > 1L) ## vectorized x requires same nrow for adaptive + stopf("adaptive rolling function can only process 'X' having equal length of elements; If you want to call rolling function on list having variable length of elements call it for each field separately") + if (is.numeric(N)) { + if (length(N) != len[1L]) + stopf("length of integer vector(s) provided as list to 'N' argument must be equal to number of observations provided in 'X'") + if (!is.integer(N)) + N = as.integer(N) + nn = 1L + N = list(N) + nnam = character() + } else if (is.list(N)) { + if (length(N[[1L]])!=len[1L]) + stopf("length of integer vector(s) provided as list to 'N' argument must be equal to number of observations provided in 'X'") + if (!equal.lengths(N)) + stopf("adaptive windows provided in 'N' must not to have different lengths") + if (!all(vapply(N, is.numeric, FALSE, USE.NAMES=FALSE))) + stopf("n must be integer vector or list of integer vectors") + if (!all(vapply(N, is.integer, FALSE, USE.NAMES=FALSE))) + N = lapply(N, as.integer) + nn = length(N) + nnam = names(N) + } else + stopf("'N' must be integer vector or list of integer vectors") + } + ## partial + if (partial) { + N = partial2adaptive(X, N, align, adaptive) + adaptive = TRUE + } + ## left adaptive preprocess x and n + if (adaptive) { + if (align=="center") + stopf("using adaptive TRUE and align 'center' is not implemented") + leftadaptive = align=="left" + } else leftadaptive = FALSE + if (leftadaptive) { + if (verbose) + cat("frollapply: adaptive=TRUE && align='left' pre-processing for align='right'\n") + if (by.column) { + X = lapply(X, rev) + } else { + rev.d = function(d) { + l = lapply(d, rev) + if (is.data.table(d)) setDT(l) else if (is.data.frame(d)) setDF(l) else l + } + X = lapply(X, rev.d) + } + N = lapply(N, rev) + align = "right" + } + + ## prepare functions so we don't need to branch inside the loops, makes code in loops cleaner as well + ## only tight has to be optimized + if (!adaptive) { + cpy = copy + ansMask = function(len, n) { + mask = rep(TRUE, len) + mask[seq_len(n-1L)] = FALSE + mask + } + if (by.column) { + allocWindow = function(x, n) x[seq_len(n)] + tight = function(i, dest, src, n) FUN(.Call(CmemcpyVector, dest, src, i, n), ...) + } else { + if (!list.df) { + allocWindow = function(x, n) x[seq_len(n), , drop=FALSE] + } else { + allocWindow = function(x, n) lapply(x, `[`, seq_len(n)) + } + tight = function(i, dest, src, n) FUN(.Call(CmemcpyDT, dest, src, i, n), ...) + } + } else { + has.growable = base::getRversion() >= "3.4.0" + cpy = if (has.growable) function(x) .Call(Csetgrowable, copy(x)) else copy + ansMask = function(len, n) { + seq_len(len) >= n + } + if (by.column) { + allocWindow = function(x, n) x[seq_len(max(n))] + if (has.growable) { + tight = function(i, dest, src, n) FUN(.Call(CmemcpyVectoradaptive, dest, src, i, n), ...) + } else { + tight = function(i, dest, src, n) FUN(src[(i-n[i]+1L):i], ...) + } + } else { + if (!list.df) { + allocWindow = function(x, n) x[seq_len(max(n)), , drop=FALSE] + } else { + allocWindow = function(x, n) lapply(x, `[`, seq_len(max(n))) + } + if (has.growable) { + tight = function(i, dest, src, n) FUN(.Call(CmemcpyDTadaptive, dest, src, i, n), ...) + } else { + if (!list.df) { + tight = function(i, dest, src, n) FUN(src[(i-n[i]+1L):i, , drop=FALSE], ...) + } else { + tight = function(i, dest, src, n) FUN(lapply(src, `[`, (i-n[i]+1L):i), ...) + } + } + } + } + ## prepare templates for errors and warnings + msg.collect = "frollapply calling parallel::mccollect to collect results from forked processes raised %s.\n%s" + msg.simplify = if (is.function(simplify)) + "frollapply completed successfully but raised %s when attempting to simplify results using user specified function in 'simplify' argument. Be sure to provide 'fill' argument matching the type and shape of results returned by the your function. Use simplify=FALSE to obtain a list instead.\n%s" + else if (isTRUE(simplify)) + "frollapply completed successfully but raised %s when attempting to simplify results using our internal 'simplifylist' function. Be sure to provide 'fill' argument matching the type and shape of results returned by the your function. Use simplify=FALSE to obtain a list instead. If you believe your results could be automatically simplified please submit your use case as new issue in our issue tracker.\n%s" + + DTths = getDTthreads(FALSE) + use.fork = .Platform$OS.type!="windows" && DTths > 1L + if (verbose) { + if (use.fork) cat("frollapply running on multiple CPU threads using parallel::mcparallel\n") + else cat("frollapply running on single CPU thread\n") + } + ans = vector("list", nx*nn) + ## vectorized x + for (i in seq_len(nx)) { + thisx = X[[i]] + thislen = len[i] + if (!thislen) + next + ## vectorized n + for (j in seq_len(nn)) { + thisn = N[[j]] + w = allocWindow(thisx, thisn) ## prepare window, handles adaptive + ansmask = ansMask(thislen, thisn) + ansi = which(ansmask) + if (use.fork) { ## !windows && getDTthreads()>1L + ths = min(DTths, length(ansi)) + ii = split(ansi, sort(rep_len(seq_len(ths), length(ansi)))) ## assign row indexes to threads + jobs = vector("integer", ths) + for (th in seq_len(ths)) { + jobs[th] = parallel::mcparallel({ + setDTthreads(1L) ## disable nested parallelism + lapply(ii[[th]], ## loops over indexes for that thread + FUN = tight, ## handles adaptive and by.column + dest = cpy(w), ## allocate own window for each thread + src = thisx, ## full input + n = thisn) ## scalar or in adaptive case a vector + })[["pid"]] + } + } else { ## windows || getDTthreads()==1L + h = list(err=NULL, warn=NULL) ## pretty printing errors/warnings + oldDTthreads = setDTthreads(1L) ## for consistency, anyway window size is unlikely to be big enough to benefit any parallelism + withCallingHandlers( + tryCatch( + thisans <- lapply(ansi, FUN = tight, dest = cpy(w), src = thisx, n = thisn), + error = function(e) h$err <<- conditionMessage(e) + ), warning = function(w) {h$warn <<- c(h$warn, conditionMessage(w)); invokeRestart("muffleWarning")} + ) + setDTthreads(oldDTthreads) + if (!is.null(h$warn)) + warningf("frollapply received a warning(s) when evaluating FUN:\n%s", paste(unique(h$warn), collapse="\n")) + if (!is.null(h$err)) + stopf("frollapply received an error(s) when evaluating FUN:\n%s", h$err) + } + ## align + if (leftadaptive) { + ansmask = rev(ansmask) + ansi = which(ansmask) + } else if (align!="right") { ## must be non-adaptive bc adaptive don't support align=center + ansmask = shift(ansmask, if (align=="left") 1L-thisn else -floor(thisn/2L), fill=FALSE, type="shift") + ansi = which(ansmask) + } + ## fill + thisansi = (i-1L)*nn+j + ans[[thisansi]] = vector("list", thislen) + filli = which(!ansmask) + ans[[thisansi]][filli] = rep_len(list(fill), length(filli)) + ## collect results + if (length(ansi)) { + if (use.fork) { + fork.res = tryCatch( + parallel::mccollect(jobs), + error = function(e) stopf(msg.collect, "an error", e[["message"]]), + warning = function(w) warningf(msg.collect, "a warning", w[["message"]]) + ) + ## check for any errors in FUN, warnings are silently ignored + fork.err = vapply(fork.res, inherits, FALSE, "try-error", USE.NAMES=FALSE) + if (any(fork.err)) + stopf("frollapply received an error(s) when evaluating FUN:\n%s", + paste(unique(vapply(fork.res[fork.err], function(err) attr(err,"condition",TRUE)[["message"]], "", USE.NAMES=FALSE)), collapse="\n")) + thisans = unlist(fork.res, recursive=FALSE, use.names=FALSE) + ## fix selfref after serializing data.table from forked process + thisans = fixselfref(thisans) + } ## thisans is already created from !use.fork, don't need error check, unlist or fixselfref + if (leftadaptive) + thisans = rev(thisans) + ans[[thisansi]][ansi] = thisans + } + ## simplify + if (is.function(simplify)) { + ans[[thisansi]] = tryCatch( + simplify(ans[[thisansi]]), + error = function(e) stopf(msg.simplify, "an error", e[["message"]]), + warning = function(w) warningf(msg.simplify, "a warning", w[["message"]]) + ) + } else if (isTRUE(simplify)) { + ans[[thisansi]] = tryCatch( + simplifylist(ans[[thisansi]], fill, ansmask), + error = function(e) stopf(msg.simplify, "an error", e[["message"]]), + warning = function(w) warningf(msg.simplify, "a warning", w[["message"]]) + ) + } + } + } + + ## preparing output format + if (length(ans)) { + if (!xvec && length(ans)==1L) { + ans = ans[[1L]] ## unpack atomic input + } else if (give.names) { + nms = make.roll.names(x.len=nx, n.len=nn, n=orig$N, x.nm=xnam, n.nm=nnam, fun="apply", adaptive=orig$adaptive) + setattr(ans, "names", nms) + } + } + ans +} diff --git a/R/utils.R b/R/utils.R index 575913d345..fe3e75d4af 100644 --- a/R/utils.R +++ b/R/utils.R @@ -22,6 +22,7 @@ nan_is_na = function(x) { } if (base::getRversion() < "3.2.0") { # Apr 2015 + lengths = function(x) vapply(x, length, 0L, USE.NAMES=FALSE) isNamespaceLoaded = function(x) x %chin% loadedNamespaces() } diff --git a/README.md b/README.md index 47fcf46fc2..50ea4383e0 100644 --- a/README.md +++ b/README.md @@ -33,9 +33,12 @@ * fast and feature rich joins: **ordered joins** (e.g. rolling forwards, backwards, nearest and limited staleness), **[overlapping range joins](https://github.com/Rdatatable/data.table/wiki/talks/EARL2014_OverlapRangeJoin_Arun.pdf)** (similar to `IRanges::findOverlaps`), **[non-equi joins](https://github.com/Rdatatable/data.table/wiki/talks/ArunSrinivasanUseR2016.pdf)** (i.e. joins using operators `>, >=, <, <=`), **aggregate on join** (`by=.EACHI`), **update on join** * fast add/update/delete columns **by reference** by group using no copies at all * fast and feature rich **reshaping** data: **[`?dcast`](https://rdatatable.gitlab.io/data.table/reference/dcast.data.table.html)** (_pivot/wider/spread_) and **[`?melt`](https://rdatatable.gitlab.io/data.table/reference/melt.data.table.html)** (_unpivot/longer/gather_) +* fast and feature rich various kinds of operations: [**rolling statistics**](https://rdatatable.gitlab.io/data.table/reference/froll.html) (_rolling mean, rolling max, rolling any R function_), [**grouping sets**](https://rdatatable.gitlab.io/data.table/reference/groupingsets.html) (_cube, rolllup_), [**set operations**](https://rdatatable.gitlab.io/data.table/reference/setops.html) (_union, intersect, setdiff_) +* faster implementations of many functions: `fsort`, `fifelse`, `fcase`, `fcoalesce`, `frank`, `between`, `nafill`, `rleid`, and more * **any R function from any R package** can be used in queries not just the subset of functions made available by a database backend, also columns of type `list` are supported +* [**meta-programming interface**](https://rdatatable.gitlab.io/data.table/articles/datatable-programming.html) for convenience of R developers importing data.table in their packages * has **[no dependencies](https://en.wikipedia.org/wiki/Dependency_hell)** at all other than base R itself, for simpler production/maintenance -* the R dependency is **as old as possible for as long as possible**, dated April 2014, and we continuously test against that version; e.g. v1.11.0 released on 5 May 2018 bumped the dependency up from 5 year old R 3.0.0 to 4 year old R 3.1.0 +* the R dependency is **as old as possible for as long as possible**, dated April 2014, and we continuously test against that version ## Installation diff --git a/inst/tests/froll.Rraw b/inst/tests/froll.Rraw index f6a4f96a80..95846acbbf 100644 --- a/inst/tests/froll.Rraw +++ b/inst/tests/froll.Rraw @@ -130,7 +130,7 @@ anserr = list( froll_exact = ans4-ans1 ) errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE) -if (!(.Platform$OS.type=="windows" && getDTthreads()>1L)) { # windows 2+ threads rounding issue: #3346 +if (!(.Platform$OS.type=="windows" && getDTthreads(FALSE)>1L)) { # windows 2+ threads rounding issue: #3346 if (.Machine$sizeof.longdouble == 16L) test(6000.024, errs[["froll_exact"]]==0) # only where long double available, otherwise we get noLD CRAN note if (Sys.info()["machine"] == "x86_64") test(6000.025, errs[["froll_fast"]]>errs[["froll_exact"]]) # floating point arithmetic issue on various machines #3491 } @@ -308,36 +308,36 @@ test(6000.0671, frollmean(c(1:2,NA,4:10), 4), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 4, hasna 0, narm 0", - "frollmeanFast: NA.*are present in input, skip non-NA attempt and run with extra care for NAs", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 4, hasnf 0, narm 0", + "frollmeanFast: non-finite values are present in input, skip non-finite inaware attempt and run with extra care for NFs straighaway", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*" )) -test(6000.0672, frollmean(c(1:2,NA,4:10), 4, hasNA=FALSE), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), output=c( +test(6000.0672, frollmean(c(1:2,NA,4:10), 4, has.nf=FALSE), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 4, hasna -1, narm 0", - "frollmeanFast: NA.*are present in input, skip non-NA attempt and run with extra care for NAs", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 4, hasnf -1, narm 0", + "frollmeanFast: non-finite values are present in input, skip non-finite inaware attempt and run with extra care for NFs straighaway", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*" -), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.0673, frollmean(c(1:2,NA,4:10), 2, hasNA=FALSE), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), output=c( +), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.0673, frollmean(c(1:2,NA,4:10), 2, has.nf=FALSE), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 2, hasna -1, narm 0", - "frollmeanFast: NA.*are present in input, re-running with extra care for NAs", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 2, hasnf -1, narm 0", + "frollmeanFast: non-finite values are present in input, re-running with extra care for NFs", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*" -), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") +), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") test(6000.0674, frollmean(c(1:2,NA,4:10), 4, align="center"), c(rep(NA_real_, 4), 5.5, 6.5, 7.5, 8.5, NA, NA), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", - "frollmeanFast: running for input length 10, window 4, hasna 0, narm 0", - "frollmeanFast: NA.*are present in input, skip non-NA attempt and run with extra care for NAs", - "frollmean: align 0, shift answer by -2", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 4, hasnf 0, narm 0", + "frollmeanFast: non-finite values are present in input, skip non-finite inaware attempt and run with extra care for NFs straighaway", + "frollfun: align 0, shift answer by -2", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*" )) options(datatable.verbose=FALSE) @@ -383,7 +383,7 @@ test(6000.093, frollmean(list(1:3, 4:6), 4), list(c(NA_real_, NA_real_, NA_real_ test(6000.0931, frollmean(list(1:3, 4:6), 4, align="center"), list(c(NA_real_, NA_real_, NA_real_), c(NA_real_, NA_real_, NA_real_))) test(6000.0932, frollmean(list(1:3, 4:6), 4, align="left"), list(c(NA_real_, NA_real_, NA_real_), c(NA_real_, NA_real_, NA_real_))) options(datatable.verbose=TRUE) -test(6000.0933, frollmean(list(1:3, 4:6), 4), list(c(NA_real_, NA_real_, NA_real_), c(NA_real_, NA_real_, NA_real_)), output="frollmean: window width longer than input vector, returning all NA vector") +test(6000.0933, frollmean(list(1:3, 4:6), 4), list(c(NA_real_, NA_real_, NA_real_), c(NA_real_, NA_real_, NA_real_)), output="frollfun: window width longer than input vector, returning all NA vector") options(datatable.verbose=FALSE) #### n==length(x) test(6000.094, frollmean(list(1:3, 4:6), 3), list(c(NA_real_, NA_real_, 2), c(NA_real_, NA_real_, 5))) @@ -426,19 +426,19 @@ test(6000.119, frollmean(1:5, list(1:5)), error="n must be integer, list is acce test(6000.1192, frollmean(1:5, 2, adaptive=NA), error="adaptive must be TRUE or FALSE") #### na.rm=NA test(6000.1193, frollmean(1:5, 2, na.rm=NA), error="na.rm must be TRUE or FALSE") -#### hasNA=1 -test(6000.1194, frollmean(1:5, 2, hasNA=1), error="hasNA must be TRUE, FALSE or NA") -#### hasNA=FALSE na.rm=TRUE -test(6000.1195, frollmean(1:5, 2, na.rm=TRUE, hasNA=FALSE), error="using hasNA FALSE and na.rm TRUE does not make sense, if you know there are NA values use hasNA TRUE, otherwise leave it as default NA") +#### has.nf=1 +test(6000.1194, frollmean(1:5, 2, has.nf=1), error="has.nf must be TRUE, FALSE or NA") +#### has.nf=FALSE na.rm=TRUE +test(6000.1195, frollmean(1:5, 2, na.rm=TRUE, has.nf=FALSE), error="using has.nf FALSE and na.rm TRUE does not make sense, if you know there are non-finite values then use has.nf TRUE, otherwise leave it as default NA") #### exact na.rm=TRUE adaptive=TRUE verbose=TRUE options(datatable.verbose=TRUE) test(6000.1196, frollmean(c(1:5,NA), 1:6, algo="exact", na.rm=TRUE, adaptive=TRUE), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*, not entering parallel execution here because algo='exact' will compute results in parallel", "frollfunR: 1:", - "fadaptiverollmeanExact: running in parallel for input length 6, hasna 0, narm 1", - "fadaptiverollmeanExact: NA.*are present in input, re-running with extra care for NAs", - "fadaptiverollmean: processing algo 1 took.*", + "frolladaptivemeanExact: running in parallel for input length 6, hasnf 0, narm 1", + "frolladaptivemeanExact: non-finite values are present in input, re-running with extra care for NFs", + "frolladaptivefun: processing fun 0 algo 1 took.*", "frollfunR: processing.*took.*" )) #### exact na.rm=TRUE verbose=TRUE @@ -446,9 +446,9 @@ test(6000.1197, frollmean(c(1:5,NA), 2, algo="exact", na.rm=TRUE), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*, not entering parallel execution here because algo='exact' will compute results in parallel", "frollfunR: 1:", - "frollmeanExact: running in parallel for input length 6, window 2, hasna 0, narm 1", - "frollmeanExact: NA.*are present in input, re-running with extra care for NAs", - "frollmean: processing algo 1 took.*", + "frollmeanExact: running in parallel for input length 6, window 2, hasnf 0, narm 1", + "frollmeanExact: non-finite values are present in input, re-running with extra care for NFs", + "frollfun: processing fun 0 algo 1 took.*", "frollfunR: processing.*took.*" )) options(datatable.verbose=FALSE) @@ -468,25 +468,31 @@ ma = function(x, n, na.rm=FALSE, nf.rm=FALSE) { n = 4 x = 1:16 x[5] = NaN -test(6000.120, frollmean(x, n), ma(x, n, nf.rm=TRUE)) +test(6000.120, frollmean(x, n), ma(x, n)) test(6000.121, frollmean(x, n, algo="exact"), ma(x, n)) x[6] = NA -test(6000.122, frollmean(x, n), ma(x, n, nf.rm=TRUE)) +test(6000.122, frollmean(x, n), ma(x, n)) test(6000.123, frollmean(x, n, algo="exact"), ma(x, n)) # use do not use identical as NaN-NA behaviour is platform/compiler specific #3353 #### test inconsistency of NaN-NA order is consistent to https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17441 x[5] = NA x[6] = NaN -test(6000.124, frollmean(x, n), ma(x, n, nf.rm=TRUE)) +test(6000.124, frollmean(x, n), ma(x, n)) test(6000.125, frollmean(x, n, algo="exact"), ma(x, n)) x[5] = Inf -test(6000.126, frollmean(x, n), ma(x, n, nf.rm=TRUE)) +test(6000.126, frollmean(x, n), ma(x, n)) test(6000.127, frollmean(x, n, algo="exact"), ma(x, n)) x[6] = -Inf -test(6000.128, frollmean(x, n), ma(x, n, nf.rm=TRUE)) +test(6000.128, frollmean(x, n), ma(x, n)) test(6000.129, frollmean(x, n, algo="exact"), ma(x, n)) x[5:7] = c(NA, Inf, -Inf) -test(6000.130, frollmean(x, n), ma(x, n, nf.rm=TRUE)) +test(6000.130, frollmean(x, n), ma(x, n)) test(6000.131, frollmean(x, n, algo="exact"), ma(x, n)) +x = c(Inf,-Inf,-Inf,Inf,Inf) +n = 2 +test(6000.1311, frollmean(x, n), ma(x, n)) +test(6000.1312, frollmean(x, n, algo="exact"), ma(x, n)) +test(6000.1313, frollsum(x, n), c(NA,NaN,-Inf,NA,Inf)) +test(6000.1314, frollsum(x, n, algo="exact"), c(NA,NaN,-Inf,NA,Inf)) #### adaptive window ama = function(x, n, na.rm=FALSE, fill=NA, nf.rm=FALSE) { @@ -563,9 +569,16 @@ if (FALSE) { #### adaptive limitations test(6000.145, frollmean(1:2, 1:2, adaptive=TRUE, align="right"), c(1, 1.5)) -test(6000.146, frollmean(1:2, 1:2, adaptive=TRUE, align="center"), error="using adaptive TRUE and align argument different than 'right' is not implemented") -test(6000.147, frollmean(1:2, 1:2, adaptive=TRUE, align="left"), error="using adaptive TRUE and align argument different than 'right' is not implemented") -test(6000.148, frollmean(list(1:2, 1:3), list(1:2), adaptive=TRUE), error="adaptive rolling function can only process 'x' having equal length of elements, like data.table or data.frame. If you want to call rolling function on list having variable length of elements call it for each field separately") +test(6000.146, frollmean(1:2, 1:2, adaptive=TRUE, align="center"), error="using adaptive TRUE and align 'center' is not implemented") +test(6000.147, frollmean(list(1:2, 1:3), list(1:2), adaptive=TRUE), error="adaptive rolling function can only process 'x' having equal length of elements, like data.table or data.frame. If you want to call rolling function on list having variable length of elements call it for each field separately") + +#### adaptive align - added in #5441 +options(datatable.verbose=TRUE) +test(6000.148, frollsum(c(1,3,4,2,0), c(3,2,2,3,2), adaptive=TRUE, align="left"), c(8,7,6,NA,NA), output=c("processing from align='right'")) +options(datatable.verbose=FALSE) +test(6000.1481, frollsum(c(1,3,4,2,0), list(c(3,2,2,3,2), c(3,3,3,3,3)), adaptive=TRUE, align="left"), list(c(8,7,6,NA,NA), c(8,9,6,NA,NA))) +test(6000.1482, frollsum(list(c(1,3,4,2,0), c(3,1,4,2,0)), c(3,2,2,3,2), adaptive=TRUE, align="left"), list(c(8,7,6,NA,NA), c(8,5,6,NA,NA))) +test(6000.1483, frollsum(list(c(1,3,4,2,0), c(3,1,4,2,0)), list(c(3,2,2,3,2), c(3,3,3,3,3)), adaptive=TRUE, align="left"), list(c(8,7,6,NA,NA),c(8,9,6,NA,NA),c(8,5,6,NA,NA),c(8,7,6,NA,NA))) #### adaptive exact fastama = function(x, n, na.rm, fill=NA) { @@ -635,24 +648,24 @@ test(6000.158, frollmean(1:10, list(1:5), adaptive=TRUE), error="length of integ n = c(4,1,4,5,5,4,6,5,4,4,2,3,4,3,2,4) x = 1:16 x[5] = NaN -test(6000.159, frollmean(x, n, adaptive=TRUE), ama(x, n, nf.rm=TRUE)) +test(6000.159, frollmean(x, n, adaptive=TRUE), ama(x, n)) test(6000.160, frollmean(x, n, algo="exact", adaptive=TRUE), ama(x, n)) x[6] = NA -test(6000.161, frollmean(x, n, adaptive=TRUE), ama(x, n, nf.rm=TRUE)) +test(6000.161, frollmean(x, n, adaptive=TRUE), ama(x, n)) test(6000.162, frollmean(x, n, algo="exact", adaptive=TRUE), ama(x, n)) # use do not use identical as NaN-NA behaviour is platform/compiler specific #3353 #### test inconsistency of NaN-NA order is consistent to https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17441 x[5] = NA x[6] = NaN -test(6000.163, frollmean(x, n, adaptive=TRUE), ama(x, n, nf.rm=TRUE)) +test(6000.163, frollmean(x, n, adaptive=TRUE), ama(x, n)) test(6000.164, frollmean(x, n, algo="exact", adaptive=TRUE), ama(x, n)) x[5] = Inf -test(6000.165, frollmean(x, n, adaptive=TRUE), ama(x, n, nf.rm=TRUE)) +test(6000.165, frollmean(x, n, adaptive=TRUE), ama(x, n)) test(6000.166, frollmean(x, n, algo="exact", adaptive=TRUE), ama(x, n)) x[6] = -Inf -test(6000.167, frollmean(x, n, adaptive=TRUE), ama(x, n, nf.rm=TRUE)) +test(6000.167, frollmean(x, n, adaptive=TRUE), ama(x, n)) test(6000.168, frollmean(x, n, algo="exact", adaptive=TRUE), ama(x, n)) x[5:7] = c(NA, Inf, -Inf) -test(6000.169, frollmean(x, n, adaptive=TRUE), ama(x, n, nf.rm=TRUE)) +test(6000.169, frollmean(x, n, adaptive=TRUE), ama(x, n)) test(6000.170, frollmean(x, n, algo="exact", adaptive=TRUE), ama(x, n)) ## test verbose messages @@ -663,82 +676,82 @@ test(6000.171, frollmean(x, n), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) test(6000.172, frollmean(list(x, x+1), n), output=c( "frollfunR: allocating memory for results 2x1", "frollfunR: 2 column.*1 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: 2:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) test(6000.173, frollmean(x, c(n, n+1)), output=c( "frollfunR: allocating memory for results 1x2", "frollfunR: 1 column.*2 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: 2:", - "frollmeanFast: running for input length 10, window 4, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 4, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) test(6000.174, frollmean(list(x, x+1), c(n, n+1)), output=c( "frollfunR: allocating memory for results 2x2", "frollfunR: 2 column.*2 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: 2:", - "frollmeanFast: running for input length 10, window 4, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 4, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: 3:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: 4:", - "frollmeanFast: running for input length 10, window 4, hasna 0, narm 0", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 4, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) test(6000.175, frollmean(x, n, algo="exact"), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanExact: running in parallel for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 1 took.*", + "frollmeanExact: running in parallel for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 1 took.*", "frollfunR: processing.*took.*")) test(6000.176, frollmean(x, n, align="center"), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmean: align 0, shift answer by -1", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollfun: align 0, shift answer by -1", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) test(6000.177, frollmean(x, n, align="left"), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmean: align -1, shift answer by -2", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollfun: align -1, shift answer by -2", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) nn = c(1:4,2:3,1:4) test(6000.178, frollmean(x, nn, adaptive=TRUE), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "fadaptiverollmeanFast: running for input length 10, hasna 0, narm 0", - "fadaptiverollmean: processing algo 0 took.*", + "frolladaptivemeanFast: running for input length 10, hasnf 0, narm 0", + "frolladaptivefun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) test(6000.179, frollmean(x, nn, algo="exact", adaptive=TRUE), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "fadaptiverollmeanExact: running in parallel for input length 10, hasna 0, narm 0", - "fadaptiverollmean: processing algo 1 took.*", + "frolladaptivemeanExact: running in parallel for input length 10, hasnf 0, narm 0", + "frolladaptivefun: processing fun 0 algo 1 took.*", "frollfunR: processing.*took.*")) x[8] = NA @@ -746,33 +759,33 @@ test(6000.180, frollmean(x, n), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanFast: running for input length 10, window 3, hasna 0, narm 0", - "frollmeanFast: NA.*are present in input, re-running with extra care for NAs", - "frollmean: processing algo 0 took.*", + "frollmeanFast: running for input length 10, window 3, hasnf 0, narm 0", + "frollmeanFast: non-finite values are present in input, re-running with extra care for NFs", + "frollfun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) test(6000.181, frollmean(x, n, algo="exact"), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanExact: running in parallel for input length 10, window 3, hasna 0, narm 0", - "frollmeanExact: NA.*are present in input, na.rm was FALSE so in 'exact' implementation NAs were handled already, no need to re-run", - "frollmean: processing algo 1 took.*", + "frollmeanExact: running in parallel for input length 10, window 3, hasnf 0, narm 0", + "frollmeanExact: non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run", + "frollfun: processing fun 0 algo 1 took.*", "frollfunR: processing.*took.*")) test(6000.182, frollmean(x, nn, adaptive=TRUE), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "fadaptiverollmeanFast: running for input length 10, hasna 0, narm 0", - "fadaptiverollmeanFast: NA.*are present in input, re-running with extra care for NAs", - "fadaptiverollmean: processing algo 0 took.*", + "frolladaptivemeanFast: running for input length 10, hasnf 0, narm 0", + "frolladaptivemeanFast: non-finite values are present in input, re-running with extra care for NFs", + "frolladaptivefun: processing fun 0 algo 0 took.*", "frollfunR: processing.*took.*")) test(6000.183, frollmean(x, nn, algo="exact", adaptive=TRUE), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "fadaptiverollmeanExact: running in parallel for input length 10, hasna 0, narm 0", - "fadaptiverollmeanExact: NA.*are present in input, na.rm was FALSE so in 'exact' implementation NAs were handled already, no need to re-run", - "fadaptiverollmean: processing algo 1 took.*", + "frolladaptivemeanExact: running in parallel for input length 10, hasnf 0, narm 0", + "frolladaptivemeanExact: non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run", + "frolladaptivefun: processing fun 0 algo 1 took.*", "frollfunR: processing.*took.*")) d = as.data.table(list(1:10/2, 10:1/4)) @@ -780,38 +793,40 @@ test(6000.184, frollmean(d[,1], 3, algo="exact"), output=c( "frollfunR: allocating memory for results 1x1", "frollfunR: 1 column.*1 window.*", "frollfunR: 1:", - "frollmeanExact: running in parallel for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 1 took.*", + "frollmeanExact: running in parallel for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 1 took.*", "frollfunR: processing.*took.*" )) test(6000.185, frollmean(d, 3:4, algo="exact"), output=c( "frollfunR: allocating memory for results 2x2", "frollfunR: 2 column.*2 window.*", "frollfunR: 1:", - "frollmeanExact: running in parallel for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 1 took.*", + "frollmeanExact: running in parallel for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 1 took.*", "frollfunR: 2:", - "frollmeanExact: running in parallel for input length 10, window 4, hasna 0, narm 0", - "frollmean: processing algo 1 took.*", + "frollmeanExact: running in parallel for input length 10, window 4, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 1 took.*", "frollfunR: 3:", - "frollmeanExact: running in parallel for input length 10, window 3, hasna 0, narm 0", - "frollmean: processing algo 1 took.*", + "frollmeanExact: running in parallel for input length 10, window 3, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 1 took.*", "frollfunR: 4:", - "frollmeanExact: running in parallel for input length 10, window 4, hasna 0, narm 0", - "frollmean: processing algo 1 took.*", + "frollmeanExact: running in parallel for input length 10, window 4, hasnf 0, narm 0", + "frollfun: processing fun 0 algo 1 took.*", "frollfunR: processing.*took.*" )) options(datatable.verbose=FALSE) ## test warnings -test(6000.186, frollmean(c(1:2,NA,4:10), 4, hasNA=FALSE), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.187, frollmean(c(1:2,NA,4:10), 2, hasNA=FALSE), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.188, frollmean(c(1:2,NA,4:10), 4, hasNA=FALSE, algo="exact"), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.189, frollmean(c(1:2,NA,4:10), 2, hasNA=FALSE, algo="exact"), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.190, frollmean(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, hasNA=FALSE), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), warning="*hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.191, frollmean(c(1:2,NA,4:10), rep(2L,10), adaptive=TRUE, hasNA=FALSE), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), warning="*hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.192, frollmean(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, hasNA=FALSE, algo="exact"), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), warning="*hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.193, frollmean(c(1:2,NA,4:10), rep(2L,10), adaptive=TRUE, hasNA=FALSE, algo="exact"), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), warning="*hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") +test(6000.186, frollmean(c(1:2,NA,4:10), 4, has.nf=FALSE), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.187, frollmean(c(1:2,NA,4:10), 2, has.nf=FALSE), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.188, frollmean(c(1:2,NA,4:10), 4, has.nf=FALSE, algo="exact"), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.189, frollmean(c(1:2,NA,4:10), 2, has.nf=FALSE, algo="exact"), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.190, frollmean(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, has.nf=FALSE), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.191, frollmean(c(1:2,NA,4:10), rep(2L,10), adaptive=TRUE, has.nf=FALSE), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.192, frollmean(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, has.nf=FALSE, algo="exact"), c(rep(NA_real_, 6), 5.5, 6.5, 7.5, 8.5), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.193, frollmean(c(1:2,NA,4:10), rep(2L,10), adaptive=TRUE, has.nf=FALSE, algo="exact"), c(NA, 1.5, NA, NA, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.199, frollmean(1:2, 1, hasNA=TRUE), c(1,2), warning="hasNA is deprecated, use has.nf instead") +test(6000.1991, frollmean(1:2, 1, has.nf=FALSE, hasNA=TRUE), error="hasNA is deprecated, use has.nf instead") ## frollsum x = 1:6/2 @@ -826,27 +841,955 @@ test(6000.202, ans1, expected) options(datatable.verbose=TRUE) test(6000.211, frollsum(1:5, 6), rep(NA_real_, 5L), output="window width longer than input vector") options(datatable.verbose=FALSE) -test(6000.212, frollsum(c(1:2,NA,4:10), 4, hasNA=FALSE), c(rep(NA_real_, 6), 22, 26, 30, 34), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.213, frollsum(c(1:2,NA,4:10), 2, hasNA=FALSE), c(NA, 3, NA, NA, 9, 11, 13, 15, 17, 19), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.214, frollsum(c(1:2,NA,4:10), 4, hasNA=FALSE, algo="exact"), c(rep(NA_real_, 6), 22, 26, 30, 34), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") +test(6000.212, frollsum(c(1:2,NA,4:10), 4, has.nf=FALSE), c(rep(NA_real_, 6), 22, 26, 30, 34), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.213, frollsum(c(1:2,NA,4:10), 2, has.nf=FALSE), c(NA, 3, NA, NA, 9, 11, 13, 15, 17, 19), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.214, frollsum(c(1:2,NA,4:10), 4, has.nf=FALSE, algo="exact"), c(rep(NA_real_, 6), 22, 26, 30, 34), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") options(datatable.verbose=TRUE) -test(6000.215, frollsum(c(1:2,NA,4:10), 4, algo="exact", na.rm=TRUE), c(rep(NA_real_, 3L), 7, 11, 15, 22, 26, 30, 34), output="re-running with extra care for NAs") -test(6000.216, frollsum(c(1:2,NA,4:10), 4, algo="exact"), c(rep(NA_real_, 6), 22, 26, 30, 34), output="NAs were handled already, no need to re-run") +test(6000.215, frollsum(c(1:2,NA,4:10), 4, algo="exact", na.rm=TRUE), c(rep(NA_real_, 3L), 7, 11, 15, 22, 26, 30, 34), output="non-finite values are present in input, re-running with extra care for NFs") +test(6000.216, frollsum(c(1:2,NA,4:10), 4, algo="exact"), c(rep(NA_real_, 6), 22, 26, 30, 34), output="non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run") options(datatable.verbose=FALSE) -test(6000.217, frollsum(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, hasNA=FALSE), c(rep(NA_real_, 6), 22, 26, 30, 34), warning="*hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") -test(6000.218, frollsum(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, hasNA=FALSE, algo="exact"), c(rep(NA_real_, 6), 22, 26, 30, 34), warning="hasNA=FALSE used but NA.*are present in input, use default hasNA=NA to avoid this warning") +test(6000.217, frollsum(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, has.nf=FALSE), c(rep(NA_real_, 6), 22, 26, 30, 34), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.218, frollsum(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, has.nf=FALSE, algo="exact"), c(rep(NA_real_, 6), 22, 26, 30, 34), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") options(datatable.verbose=TRUE) -test(6000.219, frollsum(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, algo="exact", na.rm=TRUE), c(rep(NA_real_, 3L), 7, 11, 15, 22, 26, 30, 34), output="re-running with extra care for NAs") -test(6000.220, frollsum(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, algo="exact"), c(rep(NA_real_, 6), 22, 26, 30, 34), output="NAs were handled already, no need to re-run") +test(6000.219, frollsum(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, algo="exact", na.rm=TRUE), c(rep(NA_real_, 3L), 7, 11, 15, 22, 26, 30, 34), output="non-finite values are present in input, re-running with extra care for NFs") +test(6000.220, frollsum(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, algo="exact"), c(rep(NA_real_, 6), 22, 26, 30, 34), output="non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run") test(6000.221, frollsum(1:3, 2), c(NA, 3, 5), output="frollsumFast: running for input length") -test(6000.222, frollsum(1:3, 2, align="left"), c(3, 5, NA), output="frollsum: align") -test(6000.223, frollsum(c(1,2,NA), 2), c(NA, 3, NA), output="re-running with extra care for NAs") -test(6000.224, frollsum(c(NA,2,3), 2), c(NA, NA, 5), output="skip non-NA attempt and run with extra care for NAs") -test(6000.225, frollsum(1:3, c(2,2,2), adaptive=TRUE), c(NA, 3, 5), output="fadaptiverollsumFast: running for input length") -test(6000.226, frollsum(c(NA,2,3), c(2,2,2), adaptive=TRUE), c(NA, NA, 5), output="re-running with extra care for NAs") +test(6000.222, frollsum(1:3, 2, align="left"), c(3, 5, NA), output="frollfun: align") +test(6000.223, frollsum(c(1,2,NA), 2), c(NA, 3, NA), output="non-finite values are present in input, re-running with extra care for NFs") +test(6000.224, frollsum(c(NA,2,3), 2), c(NA, NA, 5), output="non-finite values are present in input, skip non-finite inaware attempt and run with extra care for NFs straighaway") +test(6000.225, frollsum(1:3, c(2,2,2), adaptive=TRUE), c(NA, 3, 5), output="frolladaptivesumFast: running for input length") +test(6000.226, frollsum(c(NA,2,3), c(2,2,2), adaptive=TRUE), c(NA, NA, 5), output="non-finite values are present in input, re-running with extra care for NFs") +options(datatable.verbose=FALSE) + +## frollmax adaptive +options(datatable.verbose=TRUE) ## adaptive frollmax no fast algo +test(6000.3, frollmax(1:4, c(2,2,2,2), adaptive=TRUE), output="frolladaptivefun: algo 0 not implemented, fall back to 1") +test(6000.3001, frollmax(1:4, c(2,2,2,2), algo="fast", adaptive=TRUE), output="frolladaptivefun: algo 0 not implemented, fall back to 1") +test(6000.3002, frollmax(1:4, c(2,2,2,2), algo="exact", adaptive=TRUE), notOutput="frolladaptivefun: algo 0 not implemented, fall back to 1") +options(datatable.verbose=FALSE) +n = c(3,2,2,4,2,1,4,8) +x = c(7,2,3,6,3,2,6,6) # no NA +test(6000.3111, frollmax(x, n, adaptive=TRUE), c(NA,7,3,7,6,2,6,7)) # has.nf=NA # narm=F +test(6000.3112, frollmax(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,7,3,7,6,2,6,7)) # narm=T +test(6000.3121, frollmax(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,7,3,7,6,2,6,7)) # has.nf=F +test(6000.3122, frollmax(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.3131, frollmax(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,7,3,7,6,2,6,7)) # has.nf=T +test(6000.3132, frollmax(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,7,3,7,6,2,6,7)) +x = c(7,2,NA,6,3,NA,6,6) # NA +test(6000.3211, frollmax(x, n, adaptive=TRUE), c(NA,7,NA,NA,6,NA,NA,NA)) +test(6000.3212, frollmax(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,7,2,7,6,-Inf,6,7)) +test(6000.3221, frollmax(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,7,2,7,6,-Inf,6,7)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.3222, frollmax(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.3231, frollmax(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,7,NA,NA,6,NA,NA,NA)) +test(6000.3232, frollmax(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,7,2,7,6,-Inf,6,7)) +x = rep(NA_real_, 8) # all NA +test(6000.3241, frollmax(x, n, adaptive=TRUE), rep(NA_real_, 8)) +test(6000.3242, frollmax(x, n, na.rm=TRUE, adaptive=TRUE), c(NA, rep(-Inf, 7))) +test(6000.3251, frollmax(x, n, has.nf=FALSE, adaptive=TRUE), c(NA, rep(-Inf, 7))) +test(6000.3252, frollmax(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.3261, frollmax(x, n, has.nf=TRUE, adaptive=TRUE), rep(NA_real_, 8)) +test(6000.3262, frollmax(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA, rep(-Inf, 7))) +x = c(NA,NaN,NA,NaN,NaN,NaN,NA,NA) # all NaN/NA +test(6000.3271, frollmax(x, n, adaptive=TRUE), c(NA,NA,NA,NA,NaN,NaN,NA,NA)) +test(6000.3272, frollmax(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) +test(6000.3281, frollmax(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.3282, frollmax(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.3291, frollmax(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,NA,NA,NA,NaN,NaN,NA,NA)) +test(6000.3292, frollmax(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) +x = c(7,2,NA,6,3,Inf,6,6) # Inf +test(6000.3311, frollmax(x, n, adaptive=TRUE), c(NA,7,NA,NA,6,Inf,Inf,NA)) +test(6000.3312, frollmax(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,7,2,7,6,Inf,Inf,Inf)) +test(6000.3321, frollmax(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,7,2,7,6,Inf,Inf,Inf)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.3322, frollmax(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.3331, frollmax(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,7,NA,NA,6,Inf,Inf,NA)) +test(6000.3332, frollmax(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,7,2,7,6,Inf,Inf,Inf)) +x = c(7,2,-Inf,6,3,NA,6,6) # -Inf +test(6000.3341, frollmax(x, n, adaptive=TRUE), c(NA,7,2,7,6,NA,NA,NA)) +test(6000.3342, frollmax(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,7,2,7,6,-Inf,6,7)) +test(6000.3351, frollmax(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,7,2,7,6,-Inf,6,7)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.3352, frollmax(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.3361, frollmax(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,7,2,7,6,NA,NA,NA)) +test(6000.3362, frollmax(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,7,2,7,6,-Inf,6,7)) + +## frollmax non-adaptive +options(datatable.verbose=TRUE) +test(6000.4001, frollmax(1:3, 2), c(NA, 2, 3), output="frollmaxFast: running for input length") +test(6000.4002, frollmax(1:10, 5), c(NA,NA,NA,NA,5,6,7,8,9,10), output="frollmaxFast: nested window max calculation called 0 times") +test(6000.4003, frollmax(10:1, 5), c(NA,NA,NA,NA,10,9,8,7,6,5), output="frollmaxFast: nested window max calculation called 5 times") +test(6000.4004, frollmax(1:3, 2, algo="exact"), c(NA, 2, 3), output="frollmaxExact: running in parallel for input length") +test(6000.4005, frollmax(c(1,2,3,NA,5), 2), c(NA, 2, 3, NA, NA), output="continue with extra care for NFs") +options(datatable.verbose=FALSE) +n = 3 +x = c(7,2,3,6,3,2,4,5) # no NA +ans = c(NA,NA,7,6,6,6,4,5) +test(6000.4111, frollmax(x, n), ans) # has.nf=NA # narm=F +test(6000.4112, frollmax(x, n, na.rm=TRUE), ans) # narm=T +test(6000.4113, frollmax(x, n, algo="exact"), ans) # has.nf=NA # narm=F +test(6000.4114, frollmax(x, n, algo="exact", na.rm=TRUE), ans) # narm=T +test(6000.4121, frollmax(x, n, has.nf=FALSE), ans) # has.nf=F +test(6000.4122, frollmax(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4123, frollmax(x, n, algo="exact", has.nf=FALSE), ans) # has.nf=F +test(6000.4124, frollmax(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4131, frollmax(x, n, has.nf=TRUE), ans) # has.nf=T +test(6000.4132, frollmax(x, n, has.nf=TRUE, na.rm=TRUE), ans) +test(6000.4133, frollmax(x, n, algo="exact", has.nf=TRUE), ans) # has.nf=T +test(6000.4134, frollmax(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), ans) +x = c(7,2,3,NA,3,2,4,NA) # NA +test(6000.4211, frollmax(x, n), c(NA,NA,7,NA,NA,NA,4,NA)) +test(6000.4212, frollmax(x, n, na.rm=TRUE), c(NA,NA,7,3,3,3,4,4)) +test(6000.4213, frollmax(x, n, algo="exact"), c(NA,NA,7,NA,NA,NA,4,NA)) +test(6000.4214, frollmax(x, n, algo="exact", na.rm=TRUE), c(NA,NA,7,3,3,3,4,4)) +test(6000.4221, frollmax(x, n, has.nf=FALSE), c(NA,NA,7,3,3,3,4,4)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.4222, frollmax(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4223, frollmax(x, n, algo="exact", has.nf=FALSE), c(NA,NA,7,3,3,3,4,4)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.4224, frollmax(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4231, frollmax(x, n, has.nf=TRUE), c(NA,NA,7,NA,NA,NA,4,NA)) +test(6000.4232, frollmax(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA,7,3,3,3,4,4)) +test(6000.4233, frollmax(x, n, algo="exact", has.nf=TRUE), c(NA,NA,7,NA,NA,NA,4,NA)) +test(6000.4234, frollmax(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA,7,3,3,3,4,4)) +x = rep(NA_real_, 8) # all NA +test(6000.4241, frollmax(x, n), rep(NA_real_, 8)) +test(6000.4242, frollmax(x, n, na.rm=TRUE), c(NA,NA, rep(-Inf, 6))) +test(6000.4243, frollmax(x, n, algo="exact"), rep(NA_real_, 8)) +test(6000.4244, frollmax(x, n, algo="exact", na.rm=TRUE), c(NA,NA, rep(-Inf, 6))) +test(6000.4251, frollmax(x, n, has.nf=FALSE), c(NA,NA, rep(-Inf, 6))) +test(6000.4252, frollmax(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4253, frollmax(x, n, algo="exact", has.nf=FALSE), c(NA,NA, rep(-Inf, 6))) +test(6000.4254, frollmax(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4261, frollmax(x, n, has.nf=TRUE), rep(NA_real_, 8)) +test(6000.4262, frollmax(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA, rep(-Inf, 6))) +test(6000.4263, frollmax(x, n, algo="exact", has.nf=TRUE), rep(NA_real_, 8)) +test(6000.4264, frollmax(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA, rep(-Inf, 6))) +x = c(NA,NaN,NA,NaN,NaN,NaN,NA,NA) # all NaN/NA +test(6000.4271, frollmax(x, n), c(NA,NA,NA,NA,NA,NaN,NA,NA)) +test(6000.4272, frollmax(x, n, na.rm=TRUE), c(NA,NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) +test(6000.4273, frollmax(x, n, algo="exact"), c(NA,NA,NA,NA,NA,NaN,NA,NA)) +test(6000.4274, frollmax(x, n, algo="exact", na.rm=TRUE), c(NA,NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) +test(6000.4281, frollmax(x, n, has.nf=FALSE), c(NA,NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.4282, frollmax(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4283, frollmax(x, n, algo="exact", has.nf=FALSE), c(NA,NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.4284, frollmax(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4291, frollmax(x, n, has.nf=TRUE), c(NA,NA,NA,NA,NA,NaN,NA,NA)) +test(6000.4292, frollmax(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) +test(6000.4293, frollmax(x, n, algo="exact", has.nf=TRUE), c(NA,NA,NA,NA,NA,NaN,NA,NA)) +test(6000.4294, frollmax(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf)) +x = c(NA,2,6,3,Inf,2,4,5) # Inf +test(6000.4311, frollmax(x, n), c(NA,NA,NA,6,Inf,Inf,Inf,5)) +test(6000.4312, frollmax(x, n, na.rm=TRUE), c(NA,NA,6,6,Inf,Inf,Inf,5)) +test(6000.4313, frollmax(x, n, algo="exact"), c(NA,NA,NA,6,Inf,Inf,Inf,5)) +test(6000.4314, frollmax(x, n, algo="exact", na.rm=TRUE), c(NA,NA,6,6,Inf,Inf,Inf,5)) +test(6000.4321, frollmax(x, n, has.nf=FALSE), c(NA,NA,6,6,Inf,Inf,Inf,5)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.4322, frollmax(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4323, frollmax(x, n, algo="exact", has.nf=FALSE), c(NA,NA,6,6,Inf,Inf,Inf,5)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.4324, frollmax(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4331, frollmax(x, n, has.nf=TRUE), c(NA,NA,NA,6,Inf,Inf,Inf,5)) +test(6000.4332, frollmax(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA,6,6,Inf,Inf,Inf,5)) +test(6000.4333, frollmax(x, n, algo="exact", has.nf=TRUE), c(NA,NA,NA,6,Inf,Inf,Inf,5)) +test(6000.4334, frollmax(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA,6,6,Inf,Inf,Inf,5)) +x = c(NA,2,-Inf,3,Inf,2,4,5) # -Inf +test(6000.4341, frollmax(x, n), c(NA,NA,NA,3,Inf,Inf,Inf,5)) +test(6000.4342, frollmax(x, n, na.rm=TRUE), c(NA,NA,2,3,Inf,Inf,Inf,5)) +test(6000.4343, frollmax(x, n, algo="exact"), c(NA,NA,NA,3,Inf,Inf,Inf,5)) +test(6000.4344, frollmax(x, n, algo="exact", na.rm=TRUE), c(NA,NA,2,3,Inf,Inf,Inf,5)) +test(6000.4351, frollmax(x, n, has.nf=FALSE), c(NA,NA,2,3,Inf,Inf,Inf,5)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.4352, frollmax(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4353, frollmax(x, n, algo="exact", has.nf=FALSE), c(NA,NA,2,3,Inf,Inf,Inf,5)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.4354, frollmax(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.4361, frollmax(x, n, has.nf=TRUE), c(NA,NA,NA,3,Inf,Inf,Inf,5)) +test(6000.4362, frollmax(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA,2,3,Inf,Inf,Inf,5)) +test(6000.4363, frollmax(x, n, algo="exact", has.nf=TRUE), c(NA,NA,NA,3,Inf,Inf,Inf,5)) +test(6000.4364, frollmax(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA,2,3,Inf,Inf,Inf,5)) +# edge cases +test(6000.501, frollmax(c(5,NA,1), 1L), c(5,NA,1)) ## na.rm=FALSE window recalc and NA happens to be the first element in a nested loop ## didn't help for codecov, adding internal error to wmax till we have a data that can reach there +test(6000.502, frollmax(c(5,NaN,1), 1L), c(5,NaN,1)) +test(6000.503, frollmax(c(5,1,1,NaN,1,1,1), 2L), c(NA,5,1,NaN,NaN,1,1)) +test(6000.504, frollmax(c(5,1,NA,NaN,1,1,1), 2L), c(NA,5,NA,NA,NaN,1,1)) + +## frollmin adaptive +options(datatable.verbose=TRUE) ## adaptive frollmin no fast algo +test(6000.6, frollmin(1:4, c(2,2,2,2), adaptive=TRUE), output="frolladaptivefun: algo 0 not implemented, fall back to 1") +test(6000.6001, frollmin(1:4, c(2,2,2,2), algo="fast", adaptive=TRUE), output="frolladaptivefun: algo 0 not implemented, fall back to 1") +test(6000.6002, frollmin(1:4, c(2,2,2,2), algo="exact", adaptive=TRUE), notOutput="frolladaptivefun: algo 0 not implemented, fall back to 1") +options(datatable.verbose=FALSE) +n = c(3,2,2,4,2,1,4,8) +x = c(7,2,3,6,3,2,6,6) # no NA +test(6000.6111, frollmin(x, n, adaptive=TRUE), c(NA,2,2,2,3,2,2,2)) # has.nf=NA # narm=F +test(6000.6112, frollmin(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,2,2,2,3,2,2,2)) # narm=T +test(6000.6121, frollmin(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,2,2,2,3,2,2,2)) # has.nf=F +test(6000.6122, frollmin(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.6131, frollmin(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,2,2,2,3,2,2,2)) # has.nf=T +test(6000.6132, frollmin(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,2,2,2,3,2,2,2)) +x = c(7,2,NA,6,3,NA,6,6) # NA +test(6000.6211, frollmin(x, n, adaptive=TRUE), c(NA,2,NA,NA,3,NA,NA,NA)) +test(6000.6212, frollmin(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,2,2,2,3,Inf,3,2)) +test(6000.6221, frollmin(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,2,2,2,3,Inf,3,2)) +test(6000.6222, frollmin(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.6231, frollmin(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,2,NA,NA,3,NA,NA,NA)) +test(6000.6232, frollmin(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,2,2,2,3,Inf,3,2)) +x = rep(NA_real_, 8) # all NA +test(6000.6241, frollmin(x, n, adaptive=TRUE), rep(NA_real_, 8)) +test(6000.6242, frollmin(x, n, na.rm=TRUE, adaptive=TRUE), c(NA, rep(Inf, 7))) +test(6000.6251, frollmin(x, n, has.nf=FALSE, adaptive=TRUE), c(NA, rep(Inf, 7))) +test(6000.6252, frollmin(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.6261, frollmin(x, n, has.nf=TRUE, adaptive=TRUE), rep(NA_real_, 8)) +test(6000.6262, frollmin(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA, rep(Inf, 7))) +x = c(NA,NaN,NA,NaN,NaN,NaN,NA,NA) # all NaN/NA +test(6000.6271, frollmin(x, n, adaptive=TRUE), c(NA,NA,NA,NA,NaN,NaN,NA,NA)) +test(6000.6272, frollmin(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,Inf,Inf,Inf,Inf,Inf,Inf,Inf)) +test(6000.6281, frollmin(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,Inf,Inf,Inf,Inf,Inf,Inf,Inf)) +test(6000.6282, frollmin(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.6291, frollmin(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,NA,NA,NA,NaN,NaN,NA,NA)) +test(6000.6292, frollmin(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,Inf,Inf,Inf,Inf,Inf,Inf,Inf)) +x = c(7,2,NA,6,3,Inf,6,6) # Inf +test(6000.6311, frollmin(x, n, adaptive=TRUE), c(NA,2,NA,NA,3,Inf,3,NA)) +test(6000.6312, frollmin(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,2,2,2,3,Inf,3,2)) +test(6000.6321, frollmin(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,2,2,2,3,Inf,3,2)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.6322, frollmin(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.6331, frollmin(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,2,NA,NA,3,Inf,3,NA)) +test(6000.6332, frollmin(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,2,2,2,3,Inf,3,2)) +x = c(7,2,-Inf,6,3,NA,6,6) # -Inf +test(6000.6341, frollmin(x, n, adaptive=TRUE), c(NA,2,-Inf,-Inf,3,NA,NA,NA)) +test(6000.6342, frollmin(x, n, na.rm=TRUE, adaptive=TRUE), c(NA,2,-Inf,-Inf,3,Inf,3,-Inf)) +test(6000.6351, frollmin(x, n, has.nf=FALSE, adaptive=TRUE), c(NA,2,-Inf,-Inf,3,Inf,3,-Inf)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.6352, frollmin(x, n, has.nf=FALSE, na.rm=TRUE, adaptive=TRUE), error="does not make sense") +test(6000.6361, frollmin(x, n, has.nf=TRUE, adaptive=TRUE), c(NA,2,-Inf,-Inf,3,NA,NA,NA)) +test(6000.6362, frollmin(x, n, has.nf=TRUE, na.rm=TRUE, adaptive=TRUE), c(NA,2,-Inf,-Inf,3,Inf,3,-Inf)) + +## frollmin non-adaptive +options(datatable.verbose=TRUE) +test(6000.7001, frollmin(1:3, 2), c(NA, 1, 2), output="frollminFast: running for input length") +test(6000.7002, frollmin(1:10, 5), c(NA,NA,NA,NA,1,2,3,4,5,6), output="frollminFast: nested window min calculation called 5 times") ## max: 0 +test(6000.7003, frollmin(10:1, 5), c(NA,NA,NA,NA,6,5,4,3,2,1), output="frollminFast: nested window min calculation called 0 times") ## max: 5 +test(6000.7004, frollmin(1:3, 2, algo="exact"), c(NA, 1, 2), output="frollminExact: running in parallel for input length") +test(6000.7005, frollmin(c(1,2,3,NA,5), 2), c(NA, 1, 2, NA, NA), output="continue with extra care for NFs") +options(datatable.verbose=FALSE) +n = 3 +x = c(7,2,3,6,3,2,4,5) # no NA +ans = c(NA,NA,2,2,3,2,2,2) +test(6000.7111, frollmin(x, n), ans) # has.nf=NA # narm=F +test(6000.7112, frollmin(x, n, na.rm=TRUE), ans) # narm=T +test(6000.7113, frollmin(x, n, algo="exact"), ans) # has.nf=NA # narm=F +test(6000.7114, frollmin(x, n, algo="exact", na.rm=TRUE), ans) # narm=T +test(6000.7121, frollmin(x, n, has.nf=FALSE), ans) # has.nf=F +test(6000.7122, frollmin(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7123, frollmin(x, n, algo="exact", has.nf=FALSE), ans) # has.nf=F +test(6000.7124, frollmin(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7131, frollmin(x, n, has.nf=TRUE), ans) # has.nf=T +test(6000.7132, frollmin(x, n, has.nf=TRUE, na.rm=TRUE), ans) +test(6000.7133, frollmin(x, n, algo="exact", has.nf=TRUE), ans) # has.nf=T +test(6000.7134, frollmin(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), ans) +x = c(7,2,3,NA,3,2,4,NA) # NA +test(6000.7211, frollmin(x, n), c(NA,NA,2,NA,NA,NA,2,NA)) +test(6000.7212, frollmin(x, n, na.rm=TRUE), c(NA,NA,2,2,3,2,2,2)) +test(6000.7213, frollmin(x, n, algo="exact"), c(NA,NA,2,NA,NA,NA,2,NA)) +test(6000.7214, frollmin(x, n, algo="exact", na.rm=TRUE), c(NA,NA,2,2,3,2,2,2)) +test(6000.7221, frollmin(x, n, has.nf=FALSE), c(NA,NA,2,2,3,2,2,2)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.7222, frollmin(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7223, frollmin(x, n, algo="exact", has.nf=FALSE), c(NA,NA,2,2,3,2,2,2)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.7224, frollmin(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7231, frollmin(x, n, has.nf=TRUE), c(NA,NA,2,NA,NA,NA,2,NA)) +test(6000.7232, frollmin(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA,2,2,3,2,2,2)) +test(6000.7233, frollmin(x, n, algo="exact", has.nf=TRUE), c(NA,NA,2,NA,NA,NA,2,NA)) +test(6000.7234, frollmin(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA,2,2,3,2,2,2)) +x = rep(NA_real_, 8) # all NA +test(6000.7241, frollmin(x, n), rep(NA_real_, 8)) +test(6000.7242, frollmin(x, n, na.rm=TRUE), c(NA,NA, rep(Inf, 6))) +test(6000.7243, frollmin(x, n, algo="exact"), rep(NA_real_, 8)) +test(6000.7244, frollmin(x, n, algo="exact", na.rm=TRUE), c(NA,NA, rep(Inf, 6))) +test(6000.7251, frollmin(x, n, has.nf=FALSE), c(NA,NA, rep(Inf, 6))) +test(6000.7252, frollmin(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7253, frollmin(x, n, algo="exact", has.nf=FALSE), c(NA,NA, rep(Inf, 6))) +test(6000.7254, frollmin(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7261, frollmin(x, n, has.nf=TRUE), rep(NA_real_, 8)) +test(6000.7262, frollmin(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA, rep(Inf, 6))) +test(6000.7263, frollmin(x, n, algo="exact", has.nf=TRUE), rep(NA_real_, 8)) +test(6000.7264, frollmin(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA, rep(Inf, 6))) +x = c(NA,NaN,NA,NaN,NaN,NaN,NA,NA) # all NaN/NA +test(6000.7271, frollmin(x, n), c(NA,NA,NA,NA,NA,NaN,NA,NA)) +test(6000.7272, frollmin(x, n, na.rm=TRUE), c(NA,NA,Inf,Inf,Inf,Inf,Inf,Inf)) +test(6000.7273, frollmin(x, n, algo="exact"), c(NA,NA,NA,NA,NA,NaN,NA,NA)) +test(6000.7274, frollmin(x, n, algo="exact", na.rm=TRUE), c(NA,NA,Inf,Inf,Inf,Inf,Inf,Inf)) +test(6000.7281, frollmin(x, n, has.nf=FALSE), c(NA,NA,Inf,Inf,Inf,Inf,Inf,Inf)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.7282, frollmin(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7283, frollmin(x, n, algo="exact", has.nf=FALSE), c(NA,NA,Inf,Inf,Inf,Inf,Inf,Inf)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.7284, frollmin(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7291, frollmin(x, n, has.nf=TRUE), c(NA,NA,NA,NA,NA,NaN,NA,NA)) +test(6000.7292, frollmin(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA,Inf,Inf,Inf,Inf,Inf,Inf)) +test(6000.7293, frollmin(x, n, algo="exact", has.nf=TRUE), c(NA,NA,NA,NA,NA,NaN,NA,NA)) +test(6000.7294, frollmin(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA,Inf,Inf,Inf,Inf,Inf,Inf)) +x = c(NA,2,6,3,Inf,2,4,5) # Inf +test(6000.7311, frollmin(x, n), c(NA,NA,NA,2,3,2,2,2)) +test(6000.7312, frollmin(x, n, na.rm=TRUE), c(NA,NA,2,2,3,2,2,2)) +test(6000.7313, frollmin(x, n, algo="exact"), c(NA,NA,NA,2,3,2,2,2)) +test(6000.7314, frollmin(x, n, algo="exact", na.rm=TRUE), c(NA,NA,2,2,3,2,2,2)) +test(6000.7321, frollmin(x, n, has.nf=FALSE), c(NA,NA,2,2,3,2,2,2)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.7322, frollmin(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7323, frollmin(x, n, algo="exact", has.nf=FALSE), c(NA,NA,2,2,3,2,2,2)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.7324, frollmin(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7331, frollmin(x, n, has.nf=TRUE), c(NA,NA,NA,2,3,2,2,2)) +test(6000.7332, frollmin(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA,2,2,3,2,2,2)) +test(6000.7333, frollmin(x, n, algo="exact", has.nf=TRUE), c(NA,NA,NA,2,3,2,2,2)) +test(6000.7334, frollmin(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA,2,2,3,2,2,2)) +x = c(NA,2,-Inf,3,Inf,2,4,5) # -Inf +test(6000.7341, frollmin(x, n), c(NA,NA,NA,-Inf,-Inf,2,2,2)) +test(6000.7342, frollmin(x, n, na.rm=TRUE), c(NA,NA,-Inf,-Inf,-Inf,2,2,2)) +test(6000.7343, frollmin(x, n, algo="exact"), c(NA,NA,NA,-Inf,-Inf,2,2,2)) +test(6000.7344, frollmin(x, n, algo="exact", na.rm=TRUE), c(NA,NA,-Inf,-Inf,-Inf,2,2,2)) +test(6000.7351, frollmin(x, n, has.nf=FALSE), c(NA,NA,-Inf,-Inf,-Inf,2,2,2)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.7352, frollmin(x, n, has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7353, frollmin(x, n, algo="exact", has.nf=FALSE), c(NA,NA,-Inf,-Inf,-Inf,2,2,2)) ## expected incorrect results, see manual has.nf section for details, added in #5441 +test(6000.7354, frollmin(x, n, algo="exact", has.nf=FALSE, na.rm=TRUE), error="does not make sense") +test(6000.7361, frollmin(x, n, has.nf=TRUE), c(NA,NA,NA,-Inf,-Inf,2,2,2)) +test(6000.7362, frollmin(x, n, has.nf=TRUE, na.rm=TRUE), c(NA,NA,-Inf,-Inf,-Inf,2,2,2)) +test(6000.7363, frollmin(x, n, algo="exact", has.nf=TRUE), c(NA,NA,NA,-Inf,-Inf,2,2,2)) +test(6000.7364, frollmin(x, n, algo="exact", has.nf=TRUE, na.rm=TRUE), c(NA,NA,-Inf,-Inf,-Inf,2,2,2)) +# edge cases +test(6000.801, frollmin(c(5,NA,1), 1L), c(5,NA,1)) ## na.rm=FALSE window recalc and NA happens to be the first element in a nested loop ## didn't help for codecov, adding internal error to wmin till we have a data that can reach there +test(6000.802, frollmin(c(5,NaN,1), 1L), c(5,NaN,1)) +test(6000.803, frollmin(c(1,5,5,NaN,5,5,5), 2L), c(NA,1,5,NaN,NaN,5,5)) +test(6000.804, frollmin(c(1,5,NA,NaN,5,5,5), 2L), c(NA,1,NA,NA,NaN,5,5)) + +# frollprod +test(6000.901, frollprod(c(1,1,1), 1), c(1,1,1)) +test(6000.902, frollprod(c(1,1,1), 2), c(NA,1,1)) +test(6000.903, frollprod(c(1,1,1), 2, partial=TRUE), c(1,1,1)) +test(6000.904, frollprod(c(1,1,1), 2, align="left"), c(1,1,NA)) +test(6000.905, frollprod(c(1,1,1), 2, align="left", partial=TRUE), c(1,1,1)) +test(6000.906, frollprod(c(1,1,1), 2, align="center"), c(1,1,NA)) +test(6000.907, frollprod(c(1,1,1,1), 2, align="center"), c(1,1,1,NA)) +test(6000.908, frollprod(1:5, 2, partial=TRUE), c(1,2,6,12,20)) +test(6000.909, frollprod(5:1, 2, partial=TRUE), c(5,20,12,6,2)) +test(6000.910, frollprod(c(Inf,Inf,-Inf), 3), c(NA,NA,-Inf)) +test(6000.911, frollprod(c(Inf,-Inf,-Inf), 3), c(NA,NA,Inf)) +test(6000.912, frollprod(c(-Inf,-Inf), 2), c(NA,Inf)) +test(6000.913, frollprod(c(-Inf,-Inf, -1), 3), c(NA,NA,-Inf)) +test(6000.914, frollprod(1:5, rep(2,5), adaptive=TRUE), c(NA,2,6,12,20)) +test(6000.915, frollprod(1:6/2, 3), c(rep(NA_real_,2), c(0.75, 3, 7.5, 15))) +test(6000.916, frollprod(1:6/2, c(2L, 2L, 3L, 4L, 2L, 3L), adaptive=TRUE), c(NA, 0.5, 0.75, 1.5, 5, 15)) +options(datatable.verbose=TRUE) +test(6000.921, frollprod(1:5, 6), rep(NA_real_, 5L), output="window width longer than input vector") +options(datatable.verbose=FALSE) +test(6000.922, frollprod(c(1:2,NA,4:10), 4, has.nf=FALSE), c(rep(NA_real_, 6), 840, 1680, 3024, 5040), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.923, frollprod(c(1:2,NA,4:10), 2, has.nf=FALSE), c(NA, 2, NA, NA, 20, 30, 42, 56, 72, 90), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.924, frollprod(c(1:2,NA,4:10), 4, has.nf=FALSE, algo="exact"), c(rep(NA_real_, 6), 840, 1680, 3024, 5040), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +options(datatable.verbose=TRUE) +test(6000.925, frollprod(c(1:2,NA,4:10), 4, algo="exact", na.rm=TRUE), c(NA, NA, NA, 8, 40, 120, 840, 1680, 3024, 5040), output="non-finite values are present in input, re-running with extra care for NFs") +test(6000.926, frollprod(c(1:2,NA,4:10), 4, algo="exact"), c(NA, NA, NA, NA, NA, NA, 840, 1680, 3024, 5040), output="non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run") +options(datatable.verbose=FALSE) +test(6000.927, frollprod(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, has.nf=FALSE), c(NA, NA, NA, NA, NA, NA, 840, 1680, 3024, 5040), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +test(6000.928, frollprod(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, has.nf=FALSE, algo="exact"), c(NA, NA, NA, NA, NA, NA, 840, 1680, 3024, 5040), warning="has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning") +options(datatable.verbose=TRUE) +test(6000.929, frollprod(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, algo="exact", na.rm=TRUE), c(NA, NA, NA, 8, 40, 120, 840, 1680, 3024, 5040), output="non-finite values are present in input, re-running with extra care for NFs") +test(6000.930, frollprod(c(1:2,NA,4:10), rep(4L,10), adaptive=TRUE, algo="exact"), c(NA, NA, NA, NA, NA, NA, 840, 1680, 3024, 5040), output="non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run") +test(6000.931, frollprod(1:3, 2), c(NA, 2, 6), output="frollprodFast: running for input length") +test(6000.932, frollprod(1:3, 2, align="left"), c(2, 6, NA), output="frollfun: align") +test(6000.933, frollprod(c(1,2,NA), 2), c(NA, 2, NA), output="non-finite values are present in input, re-running with extra care for NFs") +test(6000.934, frollprod(c(NA,2,3), 2), c(NA, NA, 6), output="non-finite values are present in input, skip non-finite inaware attempt and run with extra care for NFs straighaway") +test(6000.935, frollprod(1:3, c(2,2,2), adaptive=TRUE), c(NA, 2, 6), output="frolladaptiveprodFast: running for input length") +test(6000.936, frollprod(c(NA,2,3), c(2,2,2), adaptive=TRUE), c(NA, NA, 6), output="non-finite values are present in input, re-running with extra care for NFs") options(datatable.verbose=FALSE) +# floating point overflow +test(6000.941, frollprod(c(1e100, 1e100, 1e100, 1e100, 1e100), 5), c(NA,NA,NA,NA,Inf)) +test(6000.942, frollprod(c(1e100, 1e100, 1e100, 1e100, 1e100), 4), c(NA,NA,NA,Inf,Inf)) +test(6000.943, frollprod(c(1e100, 1e100, 1e100, 1e100, -1e100), 5), c(NA,NA,NA,NA,-Inf)) +test(6000.944, frollprod(c(1e100, 1e100, 1e100, 1e100, -1e100), 4), c(NA,NA,NA,Inf,-Inf)) +test(6000.945, frollprod(c(1e100, 1e100, 1e100, 1e100, 1e100), 5, algo="exact"), c(NA,NA,NA,NA,Inf)) +test(6000.946, frollprod(c(1e100, 1e100, 1e100, 1e100, 1e100), 4, algo="exact"), c(NA,NA,NA,Inf,Inf)) +test(6000.947, frollprod(c(1e100, 1e100, 1e100, 1e100, -1e100), 5, algo="exact"), c(NA,NA,NA,NA,-Inf)) +test(6000.948, frollprod(c(1e100, 1e100, 1e100, 1e100, -1e100), 4, algo="exact"), c(NA,NA,NA,Inf,-Inf)) +test(6000.949, frollprod(c(1e100, 1e100, 1e100, 1e100, 1e100), rep(5, 5), adaptive=TRUE), c(NA,NA,NA,NA,Inf)) +test(6000.950, frollprod(c(1e100, 1e100, 1e100, 1e100, 1e100), rep(4, 5), adaptive=TRUE), c(NA,NA,NA,Inf,Inf)) +test(6000.951, frollprod(c(1e100, 1e100, 1e100, 1e100, -1e100), rep(5, 5), adaptive=TRUE), c(NA,NA,NA,NA,-Inf)) +test(6000.952, frollprod(c(1e100, 1e100, 1e100, 1e100, -1e100), rep(4, 5), adaptive=TRUE), c(NA,NA,NA,Inf,-Inf)) +test(6000.953, frollprod(c(1e100, 1e100, 1e100, 1e100, 1e100), rep(5, 5), algo="exact", adaptive=TRUE), c(NA,NA,NA,NA,Inf)) +test(6000.954, frollprod(c(1e100, 1e100, 1e100, 1e100, 1e100), rep(4, 5), algo="exact", adaptive=TRUE), c(NA,NA,NA,Inf,Inf)) +test(6000.955, frollprod(c(1e100, 1e100, 1e100, 1e100, -1e100), rep(5, 5), algo="exact", adaptive=TRUE), c(NA,NA,NA,NA,-Inf)) +test(6000.956, frollprod(c(1e100, 1e100, 1e100, 1e100, -1e100), rep(4, 5), algo="exact", adaptive=TRUE), c(NA,NA,NA,Inf,-Inf)) -## validation +## partial +x = 1:6/2 +n = 3 +an = function(n, len) c(seq.int(n), rep(n, len-n)) +test(6006.011, frollmean(x, an(n, length(x)), adaptive=TRUE), c(0.5,0.75,1,1.5,2,2.5)) +test(6006.012, frollmean(x, n, partial=TRUE), c(0.5,0.75,1,1.5,2,2.5)) +ans = frollmean(x, n) +ans[seq.int(n-1L)] = frollmean(x[seq.int(n-1L)], n, partial=TRUE) +test(6006.013, ans, c(0.5,0.75,1,1.5,2,2.5)) +test(6006.021, frollmean(x, rev(an(rev(n), length(x))), adaptive=TRUE, align="left"), c(1,1.5,2,2.5,2.75,3)) +test(6006.022, frollmean(x, n, partial=TRUE, align="left"), c(1,1.5,2,2.5,2.75,3)) +ans = frollmean(x, n, align="left") +ans[(length(x)-n-1L):length(x)] = frollmean(x[(length(x)-n-1L):length(x)], n, partial=TRUE, align="left") +test(6006.023, ans, c(1,1.5,2,2.5,2.75,3)) +ans = list(c(0.50,0.75,1.00,1.50,2.00,2.50), c(0.50,0.75,1.00,1.25,1.75,2.25)) +test(6006.031, frollmean(1:6/2, list(3L,4L), partial=TRUE), error="n must be integer, list is accepted for adaptive TRUE") +test(6006.032, frollmean(1:6/2, 3:4, partial=TRUE), ans) +options(datatable.verbose=TRUE) +test(6006.901, frollmean(x, n, partial=TRUE), c(0.5,0.75,1,1.5,2,2.5), output="froll partial=TRUE trimming n and redirecting to adaptive=TRUE") +test(6006.902, frollmean(x, rep(n, length(x)), adaptive=TRUE, partial=TRUE), c(0.5,0.75,1,1.5,2,2.5), output="trimming", notOutput="redirecting") +options(datatable.verbose=FALSE) +test(6006.903, frollmean(1:4, 2L, align="center", partial=TRUE), error="'partial' cannot be used together with align='center'") +test(6006.904, frollmean(list(1:4, 2:4), n, partial=TRUE), error="'partial' does not support variable length of columns in x") +test(6006.905, frollmean(list(data.table(v1=1:4), data.table(v1=1:3)), n, partial=TRUE), error="'partial' does not support variable nrow of data.tables in x") +test(6006.906, frollmean(x, TRUE, partial=TRUE), error="n must be integer") +test(6006.907, frollmean(x, list(TRUE), partial=TRUE), error="n must be integer, list is accepted for adaptive TRUE") +## partial adaptive +test(6006.930, frollmean(1:4, rep(2L,4L), adaptive=TRUE, partial=TRUE), c(1,1.5,2.5,3.5)) +test(6006.9301, frollmean(1:4, list(1:4, 1:3), adaptive=TRUE, partial=TRUE), error="adaptive windows provided in n must not to have different lengths") +test(6006.9302, frollmean(1:4, list(1:3), adaptive=TRUE, partial=TRUE), error="length of vectors in x must match to length of adaptive window in n") +test(6006.9303, frollmean(1:4, list(rep(2L,4L)), adaptive=TRUE, partial=TRUE), c(1,1.5,2.5,3.5)) +test(6006.9311, frollsum(1:4, 1:4, adaptive=TRUE, partial=TRUE), c(1,3,6,10)) ## all same as index +test(6006.9312, frollsum(1:4, 1:4, align="left", adaptive=TRUE, partial=TRUE), c(1,5,7,4)) +test(6006.9321, frollsum(1:4, c(2,3,1,1), adaptive=TRUE, partial=TRUE), c(1,3,3,4)) ## leading two bigger than index +test(6006.9322, frollsum(1:4, c(2,3,1,1), align="left", adaptive=TRUE, partial=TRUE), c(3,9,3,4)) +test(6006.9323, frollsum(1:4, c(6,5,4,2), adaptive=TRUE, partial=TRUE), c(1,3,6,7)) ## leading two bigger than rev index +test(6006.9324, frollsum(1:4, c(6,5,4,2), align="left", adaptive=TRUE, partial=TRUE), c(10,9,7,4)) +test(6006.9331, frollsum(1:4, c(2,4,5,6), adaptive=TRUE, partial=TRUE), c(1,3,6,10)) ## trailing two bigger than index +test(6006.9332, frollsum(1:4, c(2,4,5,6), align="left", adaptive=TRUE, partial=TRUE), c(3,9,7,4)) +test(6006.9333, frollsum(1:4, c(1,1,3,2), adaptive=TRUE, partial=TRUE), c(1,2,6,7)) ## trailing two bigger than rev index +test(6006.9334, frollsum(1:4, c(1,1,3,2), align="left", adaptive=TRUE, partial=TRUE), c(1,2,7,4)) + +## give.names +test(6006.951, frollsum(c(1,2,3), 2, give.names=TRUE), c(NA,3,5)) +test(6006.952, frollsum(list(c(1,2,3)), 2, give.names=TRUE), list(V1_rollsum2=c(NA,3,5))) +test(6006.953, frollsum(list(x1=c(1,2,3)), 2, give.names=TRUE), list(x1_rollsum2=c(NA,3,5))) +test(6006.954, frollsum(list(c(1,2,3)), c(n1=2), give.names=TRUE), list(V1_n1=c(NA,3,5))) +test(6006.955, frollsum(list(x1=c(1,2,3)), c(n1=2), give.names=TRUE), list(x1_n1=c(NA,3,5))) +test(6006.956, frollsum(c(1,2,3), 2:3, give.names=TRUE), list(rollsum2=c(NA,3,5), rollsum3=c(NA,NA,6))) +test(6006.957, frollsum(list(c(1,2,3)), 2:3, give.names=TRUE), list(V1_rollsum2=c(NA,3,5), V1_rollsum3=c(NA,NA,6))) +test(6006.958, frollsum(list(c(1,2,3), c(2,3,4)), 2, give.names=TRUE), list(V1_rollsum2=c(NA,3,5), V2_rollsum2=c(NA,5,7))) +test(6006.959, frollsum(list(c(1,2,3), c(2,3,4)), 2:3, give.names=TRUE), list(V1_rollsum2=c(NA,3,5), V1_rollsum3=c(NA,NA,6), V2_rollsum2=c(NA,5,7), V2_rollsum3=c(NA,NA,9))) +test(6006.960, frollsum(c(1,2,3), c(n1=2, n2=3), give.names=TRUE), list(n1=c(NA,3,5), n2=c(NA,NA,6))) +test(6006.961, frollsum(list(c(1,2,3)), c(n1=2, n2=3), give.names=TRUE), list(V1_n1=c(NA,3,5), V1_n2=c(NA,NA,6))) +test(6006.962, frollsum(list(x1=c(1,2,3)), 2:3, give.names=TRUE), list(x1_rollsum2=c(NA,3,5), x1_rollsum3=c(NA,NA,6))) +test(6006.963, frollsum(list(x1=c(1,2,3)), c(n1=2, n2=3), give.names=TRUE), list(x1_n1=c(NA,3,5), x1_n2=c(NA,NA,6))) +test(6006.964, frollsum(list(c(1,2,3), c(2,3,4)), c(n1=2), give.names=TRUE), list(V1_n1=c(NA,3,5), V2_n1=c(NA,5,7))) +test(6006.965, frollsum(list(c(1,2,3), c(2,3,4)), c(n1=2, n2=3), give.names=TRUE), list(V1_n1=c(NA,3,5), V1_n2=c(NA,NA,6), V2_n1=c(NA,5,7), V2_n2=c(NA,NA,9))) +test(6006.966, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), 2, give.names=TRUE), list(x1_rollsum2=c(NA,3,5), x2_rollsum2=c(NA,5,7))) +test(6006.967, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), 2:3, give.names=TRUE), list(x1_rollsum2=c(NA,3,5), x1_rollsum3=c(NA,NA,6), x2_rollsum2=c(NA,5,7), x2_rollsum3=c(NA,NA,9))) +test(6006.968, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), c(n1=2), give.names=TRUE), list(x1_n1=c(NA,3,5), x2_n1=c(NA,5,7))) +test(6006.969, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), c(n1=2, n2=3), give.names=TRUE), list(x1_n1=c(NA,3,5), x1_n2=c(NA,NA,6), x2_n1=c(NA,5,7), x2_n2=c(NA,NA,9))) +test(6006.971, frollsum(c(1,2,3), c(2,2,2), adaptive=TRUE, give.names=TRUE), c(NA,3,5)) ## adaptive +test(6006.972, frollsum(c(1,2,3), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), c(NA,3,5)) +test(6006.973, frollsum(list(c(1,2,3)), c(2,2,2), adaptive=TRUE, give.names=TRUE), list(V1=c(NA,3,5))) +test(6006.974, frollsum(list(c(1,2,3)), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(V1_arollsum1=c(NA,3,5))) +test(6006.975, frollsum(list(x1=c(1,2,3)), c(2,2,2), adaptive=TRUE, give.names=TRUE), list(x1=c(NA,3,5))) +test(6006.976, frollsum(list(x1=c(1,2,3)), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(x1_arollsum1=c(NA,3,5))) +test(6006.977, frollsum(list(c(1,2,3)), list(n1=c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(V1_n1=c(NA,3,5))) +test(6006.978, frollsum(list(x1=c(1,2,3)), list(n1=c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(x1_n1=c(NA,3,5))) +test(6006.979, frollsum(c(1,2,3), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(arollsum1=c(NA,3,5), arollsum2=c(NA,NA,6))) +test(6006.980, frollsum(list(c(1,2,3)), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(V1_arollsum1=c(NA,3,5), V1_arollsum2=c(NA,NA,6))) +test(6006.981, frollsum(list(c(1,2,3), c(2,3,4)), c(2,2,2), adaptive=TRUE, give.names=TRUE), list(V1=c(NA,3,5), V2=c(NA,5,7))) +test(6006.982, frollsum(list(c(1,2,3), c(2,3,4)), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(V1_arollsum1=c(NA,3,5), V2_arollsum1=c(NA,5,7))) +test(6006.983, frollsum(list(c(1,2,3), c(2,3,4)), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(V1_arollsum1=c(NA,3,5), V1_arollsum2=c(NA,NA,6), V2_arollsum1=c(NA,5,7), V2_arollsum2=c(NA,NA,9))) +test(6006.984, frollsum(c(1,2,3), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(n1=c(NA,3,5), n2=c(NA,NA,6))) +test(6006.985, frollsum(list(c(1,2,3)), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(V1_n1=c(NA,3,5), V1_n2=c(NA,NA,6))) +test(6006.986, frollsum(list(x1=c(1,2,3)), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(x1_arollsum1=c(NA,3,5), x1_arollsum2=c(NA,NA,6))) +test(6006.987, frollsum(list(x1=c(1,2,3)), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(x1_n1=c(NA,3,5), x1_n2=c(NA,NA,6))) +test(6006.988, frollsum(list(c(1,2,3), c(2,3,4)), list(n1=c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(V1_n1=c(NA,3,5), V2_n1=c(NA,5,7))) +test(6006.989, frollsum(list(c(1,2,3), c(2,3,4)), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(V1_n1=c(NA,3,5), V1_n2=c(NA,NA,6), V2_n1=c(NA,5,7), V2_n2=c(NA,NA,9))) +test(6006.990, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), c(2,2,2), adaptive=TRUE, give.names=TRUE), list(x1=c(NA,3,5), x2=c(NA,5,7))) +test(6006.991, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(x1_arollsum1=c(NA,3,5), x2_arollsum1=c(NA,5,7))) +test(6006.992, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(x1_arollsum1=c(NA,3,5), x1_arollsum2=c(NA,NA,6), x2_arollsum1=c(NA,5,7), x2_arollsum2=c(NA,NA,9))) +test(6006.993, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), list(n1=c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(x1_n1=c(NA,3,5), x2_n1=c(NA,5,7))) +test(6006.994, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(x1_n1=c(NA,3,5), x1_n2=c(NA,NA,6), x2_n1=c(NA,5,7), x2_n2=c(NA,NA,9))) +test(6006.9950, frollsum(c(1,2,3), 2, partial=TRUE, give.names=TRUE), c(1,3,5)) ## partial +test(6006.9951, frollsum(c(1,2,3), c(n1=2), partial=TRUE, give.names=TRUE), c(1,3,5)) +test(6006.9952, frollsum(list(c(1,2,3)), 2, partial=TRUE, give.names=TRUE), list(V1_rollsum2=c(1,3,5))) +test(6006.9953, frollsum(list(x1=c(1,2,3)), 2, partial=TRUE, give.names=TRUE), list(x1_rollsum2=c(1,3,5))) +test(6006.9954, frollsum(list(c(1,2,3)), c(n1=2), partial=TRUE, give.names=TRUE), list(V1_n1=c(1,3,5))) +test(6006.9955, frollsum(list(x1=c(1,2,3)), c(n1=2), partial=TRUE, give.names=TRUE), list(x1_n1=c(1,3,5))) +test(6006.9956, frollsum(list(c(1,2,3), c(2,3,4)), c(2, 3), partial=TRUE, give.names=TRUE), list(V1_rollsum2=c(1,3,5), V1_rollsum3=c(1,3,6), V2_rollsum2=c(2,5,7), V2_rollsum3=c(2,5,9))) +test(6006.9957, frollsum(list(c(1,2,3), c(2,3,4)), c(n1=2, n2=3), partial=TRUE, give.names=TRUE), list(V1_n1=c(1,3,5), V1_n2=c(1,3,6), V2_n1=c(2,5,7), V2_n2=c(2,5,9))) +test(6006.9958, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), c(2, 3), partial=TRUE, give.names=TRUE), list(x1_rollsum2=c(1,3,5), x1_rollsum3=c(1,3,6), x2_rollsum2=c(2,5,7), x2_rollsum3=c(2,5,9))) +test(6006.9959, frollsum(list(x1=c(1,2,3), x2=c(2,3,4)), c(n1=2, n2=3), partial=TRUE, give.names=TRUE), list(x1_n1=c(1,3,5), x1_n2=c(1,3,6), x2_n1=c(2,5,7), x2_n2=c(2,5,9))) +test(6006.9960, frollsum(c(1,2,3), c(2,2,2), adaptive=TRUE, partial=TRUE, give.names=TRUE), c(1,3,5)) ## adaptive partial +test(6006.9961, frollsum(c(1,2,3), list(c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), c(1,3,5)) +test(6006.9962, frollsum(list(c(1,2,3)), c(2,2,2), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(V1=c(1,3,5))) +test(6006.9963, frollsum(list(c(1,2,3)), list(c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(V1_arollsum1=c(1,3,5))) +test(6006.9964, frollsum(list(x1=c(1,2,3)), c(2,2,2), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(x1=c(1,3,5))) +test(6006.9965, frollsum(list(x1=c(1,2,3)), list(c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(x1_arollsum1=c(1,3,5))) +test(6006.9966, frollsum(c(1,2,3), list(c(n1=c(2,2,2))), adaptive=TRUE, partial=TRUE, give.names=TRUE), c(1,3,5)) +test(6006.9967, frollsum(list(c(1,2,3)), c(2,2,2), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(V1=c(1,3,5))) +test(6006.9968, frollsum(list(c(1,2,3)), list(n1=c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(V1_n1=c(1,3,5))) +test(6006.9969, frollsum(list(x1=c(1,2,3)), list(n1=c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(x1_n1=c(1,3,5))) + +## frollapply +x = as.double(1:10) +test(6010.001, frollsum(x, 3L), frollapply(x, 3L, sum)) +test(6010.002, frollsum(x, 6), frollapply(x, 6, sum)) +test(6010.003, frollmean(x, 3), frollapply(x, 3, mean)) +d = as.data.table(list(1:6/2, 3:8/4)) +test(6010.004, frollsum(d, 3:4), frollapply(d, 3:4, sum)) +test(6010.005, frollmean(d, 3:4), frollapply(d, 3:4, mean)) +d = rbind(d, list(NA,NA)) +ans = list(c(NA,NA,1.5,2,1.5,2,2.5), c(NA,NA,NA,2,1,1.5,2), c(NA,NA,1.25,1.5,1.75,1.5,2), c(NA,NA,NA,1.5,1,1.25,1.5)) +test(6010.006, frollapply(d, 3:4, function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...), na.rm=TRUE), ans) +# segfault and protect limits #3993 - disabled by default due to high memory usage +if (FALSE) { + test(6010.007, frollapply(1, rep(1L, 1e5), identity), as.list(rep(1, 1e5))) + test(6010.008, frollapply(1, rep(1L, 1e6), identity), as.list(rep(1, 1e6))) + test(6010.009, frollapply(as.list(rep(1, 1e6)), 1, identity), as.list(rep(1, 1e6))) +} +## check documented side effect of noalloc optimization +rollapply = function(x, n, FUN, fill=NA) { + ans = vector("list", length(x)) + if (n>1L) ans[1L:(n-1L)] = as.list(rep(fill, n-1L)) + for (i in n:length(x)) ans[[i]] = FUN(x[(i-n+1L):i]) + ans +} +old = setDTthreads(1L) +test(6010.011, frollapply(c(1, 9), 1L, FUN=identity, simplify=FALSE), list(9,9)) +test(6010.012, frollapply(c(1, 9), 1L, FUN=list, simplify=FALSE), list(list(9),list(9))) +test(6010.013, frollapply(c(1, 9), 1L, FUN=function(x) copy(identity(x)), simplify=FALSE), list(1,9)) +test(6010.014, frollapply(c(1, 9), 1L, FUN=function(x) copy(list(x)), simplify=FALSE), list(list(1),list(9))) +test(6010.015, frollapply(c(1, 9), 1L, FUN=function(x) copy(identity(x)), simplify=FALSE), rollapply(c(1, 9), n=1L, identity)) +test(6010.016, frollapply(c(1, 9), 1L, FUN=function(x) copy(list(x)), simplify=FALSE), rollapply(c(1, 9), n=1L, list)) +setDTthreads(old) + +#### test disabling parallelism +use.fork = .Platform$OS.type!="windows" && getDTthreads()>1L +if (use.fork) { + options(datatable.verbose=TRUE) + test(6010.021, frollapply(1:2, 1, identity), 1:2, output="running on multiple CPU threads using parallel::mcparallel") + options(datatable.verbose=FALSE) + test(6010.022, frollapply(1:2, 1, function(x) {warning("warn"); x}), 1:2) ## warning ignored + test(6010.023, frollapply(1:2, 1, function(x) {stop("err:", tail(x,1)); x}), error="err:1\nerr:2") + test(6010.024, frollapply(1:2, 1, function(x) stop("err")), error="err") ## unique error +} +old = setDTthreads(1L) +options(datatable.verbose=TRUE) +test(6010.025, frollapply(1:2, 1, identity), c(2L,2L), output="running on single CPU thread") +options(datatable.verbose=FALSE) +test(6010.026, frollapply(1:2, 1, function(x) {warning("warn"); x}), c(2L,2L), warning="warn") +test(6010.027, frollapply(1:2, 1, function(x) {warning("warn:", tail(x,1)); x}), c(2L,2L), warning="warn:1\nwarn:2") +test(6010.028, frollapply(1:2, 1, function(x) {stop("err:", tail(x,1)); x}), error="err:1") ## only first +setDTthreads(old) + +#### corner cases from examples - handled properly after frollapply rewrite to R +test(6010.101, frollapply(1:5, 3, function(x) head(x, 2)), list(NA, NA, 1:2, 2:3, 3:4)) +f = function(x) { + n = length(x) + # length 1 will be returned only for first iteration where we check length + if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored +} +test(6010.102, frollapply(1:5, 3, f), list(NA,NA,1L,c(2L,4L),c(3L,5L))) +test(6010.103, frollapply(c(1,2,1,1,1,2,3,2), 3, uniqueN), c(NA,NA,2L,2L,1L,2L,3L,2L)) +test(6010.104, frollapply(c(1,2,1,1,NA,2,NA,2), 3, anyNA), c(NA,NA,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)) +f = function(x) { + n = length(x) + # double type will be returned only for first iteration where we check type + if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double +} +test(6010.105, head(frollapply(1:5, 3, f), 3L), list(NA, NA, 1)) + +## partial +x = 1:6/2 +n = 3 +an = function(n, len) c(seq.int(n), rep(n, len-n)) +test(6010.111, frollapply(FUN=mean, x, an(n, length(x)), adaptive=TRUE), c(0.5,0.75,1,1.5,2,2.5)) +test(6010.112, frollapply(FUN=mean, x, n, partial=TRUE), c(0.5,0.75,1,1.5,2,2.5)) +ans = frollapply(FUN=mean, x, n) +ans[seq.int(n-1L)] = frollapply(FUN=mean, x[seq.int(n-1L)], n, partial=TRUE) +test(6010.113, ans, c(0.5,0.75,1,1.5,2,2.5)) +test(6010.121, frollapply(FUN=mean, x, rev(an(rev(n), length(x))), adaptive=TRUE, align="left"), c(1,1.5,2,2.5,2.75,3)) +test(6010.122, frollapply(FUN=mean, x, n, partial=TRUE, align="left"), c(1,1.5,2,2.5,2.75,3)) +ans = frollapply(FUN=mean, x, n, align="left") +ans[(length(x)-n-1L):length(x)] = frollapply(FUN=mean, x[(length(x)-n-1L):length(x)], n, partial=TRUE, align="left") +test(6010.123, ans, c(1,1.5,2,2.5,2.75,3)) +ans = list(c(0.50,0.75,1.00,1.50,2.00,2.50), c(0.50,0.75,1.00,1.25,1.75,2.25)) +test(6010.131, frollapply(FUN=mean, 1:6/2, list(3L,4L), partial=TRUE), error="'N' must be integer, list is accepted for adaptive TRUE") +test(6010.132, frollapply(FUN=mean, 1:6/2, 3:4, partial=TRUE), ans) +test(6010.143, frollapply(FUN=mean, 1:4, 2L, align="center", partial=TRUE), error="'partial' cannot be used together with align='center'") +test(6010.144, frollapply(FUN=mean, list(1:4, 2:4), n, partial=TRUE), error="'partial' does not support variable length of columns in x") +test(6010.145, frollapply(FUN=mean, x, TRUE, partial=TRUE), error="'N' must be integer vector") +test(6010.146, frollapply(FUN=mean, x, list(TRUE), partial=TRUE), error="'N' must be integer, list is accepted for adaptive TRUE") +## growable failed if length was set after copy: attempt to set index 1/1 in SET_STRING_ELT +old = setDTthreads(1L) +test(6010.150, frollapply(c("B","B","C"), 3, unique, simplify=FALSE, partial=TRUE), list("B", "B", c("B","C"))) +setDTthreads(old) + +# frollapply adaptive +test(6010.2011, frollapply(1:3, c(3,3,3), sum, adaptive=TRUE), c(NA,NA,6L)) +test(6010.2012, frollapply(1:3, c(4,4,4), sum, adaptive=TRUE), rep(NA,3)) # none of the windows in k was small enough to cover length of x +test(6010.2013, frollapply(1:5, rep(2, 5), mean, adaptive=NA), error="'adaptive' must be TRUE or FALSE") +test(6010.2014, frollapply(1:5, rep(3, 5), toString, adaptive=TRUE), c(NA,NA,"1, 2, 3","2, 3, 4","3, 4, 5")) +test(6010.2015, frollapply(1:2, 1:2, mean, adaptive=TRUE, align="right"), c(1, 1.5)) +test(6010.2016, frollapply(1:2, 1:2, mean, adaptive=TRUE, align="center"), error="using adaptive TRUE and align 'center' is not implemented") +test(6010.2017, frollapply(list(1:2, 1:3), list(1:2), mean, adaptive=TRUE), error="adaptive rolling function can only process 'X' having equal length of elements; If you want to call rolling function on list having variable length of elements call it for each field separately") +test(6010.2018, frollapply(1:5, rep(3, 5), function(x) head(x, 2), adaptive=TRUE), list(NA, NA, 1:2, 2:3, 3:4)) +test(6010.2019, frollapply(1:10, list(1:5), mean, adaptive=TRUE), error="length of integer vector(s) provided as list to 'N' argument must be equal to number of observations provided in 'X'") +test(6010.202, frollapply(1:10, 1:5, mean, adaptive=TRUE), error="length of integer vector(s) provided as list to 'N' argument must be equal to number of observations provided in 'X'") +options(datatable.verbose=TRUE) +test(6010.2021, frollapply(c(1,3,4,2,0), c(3,2,2,3,2), sum, adaptive=TRUE, align="left"), c(8,7,6,NA,NA), output="processing for align='right'") +options(datatable.verbose=FALSE) +test(6010.203, frollapply(c(1,2,1,1,1,2,3,2), rep(3, 8), uniqueN, adaptive=TRUE), c(NA,NA,2L,2L,1L,2L,3L,2L)) +test(6010.204, frollapply(c(1,2,1,1,NA,2,NA,2), rep(3, 8), anyNA, adaptive=TRUE), c(NA,NA,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)) +test(6010.205, frollapply(c(2,2,2,3,4), c(1,3,3,2,3), uniqueN, adaptive=TRUE), c(1L,NA,1L,2L,3L)) ## window width bigger than location + +#### test coverage +test(6010.501, frollapply(1:3, "b", sum), error="'N' must be integer") +test(6010.503, frollapply(1:3, integer(), sum), error="'N' must be non 0 length") +test(6010.504, frollapply(1:3, 2L, sum, fill=1:2), list(1:2, 3L, 5L)) +test(6010.505, frollapply(1:3, 2L, sum, fill=NA_integer_), c(NA,3L,5L)) +test(6010.506, frollapply(1:3, 2L, sum, fill=-1L), c(-1L,3L,5L)) +test(6010.5071, frollapply(1:3, 2L, sum, fill=-2), c(-2L,3L,5L)) +test(6010.5072, frollapply(1:3, 2L, sum, fill=-2L), c(-2L,3L,5L)) +test(6010.508, frollapply(1:3, 2L, sum, fill="z"), list("z",3L,5L)) +test(6010.509, frollapply(1:3, 4L, sum), c(NA,NA,NA)) +test(6010.510, frollapply(1:5, 3L, sum), c(NA,NA,6L,9L,12L)) +test(6010.511, frollapply(1:5, 3L, sum, align="center"), c(NA,6L,9L,12L,NA)) +test(6010.512, frollapply(1:5, 3L, sum, align="left"), c(6L,9L,12L,NA,NA)) +test(6010.513, frollapply(1:5, 4L, sum), c(NA,NA,NA,10L,14L)) +test(6010.514, frollapply(1:5, 4L, sum, align="center"), c(NA,10L,14L,NA,NA)) +test(6010.515, frollapply(1:5, 4L, sum, align="left"), c(10L,14L,NA,NA,NA)) +test(6010.516, frollapply(1:6, 3L, sum), c(NA,NA,6L,9L,12L,15L)) +test(6010.517, frollapply(1:6, 3L, sum, align="center"), c(NA,6L,9L,12L,15L,NA)) +test(6010.518, frollapply(1:6, 3L, sum, align="left"), c(6L,9L,12L,15L,NA,NA)) +test(6010.519, frollapply(1:6, 4L, sum), c(NA,NA,NA,10L,14L,18L)) +test(6010.520, frollapply(1:6, 4L, sum, align="center"), c(NA,10L,14L,18L,NA,NA)) +test(6010.521, frollapply(1:6, 4L, sum, align="left"), c(10L,14L,18L,NA,NA,NA)) +test(6010.522, frollapply(c(1:3,NA,5:6), 4L, sum), rep(NA_integer_,6)) +test(6010.523, frollapply(c(1:3,NA,5:6), 4L, sum, na.rm=TRUE), c(NA,NA,NA,6L,10L,14L)) +test(6010.524, frollapply(c(1,2,3,NA,NA,NA,NA), 3L, mean), c(NA,NA,2,NA,NA,NA,NA)) +test(6010.525, frollapply(c(1,2,3,NA,NA,NA,NA), 3L, mean, na.rm=TRUE), c(NA,NA,2,2.5,3,NaN,NaN)) +test(6010.526, frollapply(numeric(), 3L, sum), list()) +test(6010.527, frollapply(1:5, 3L, toString), c(NA, NA, "1, 2, 3", "2, 3, 4", "3, 4, 5")) +ma = function(x, n, na.rm=FALSE) { + ans = rep(NA_real_, nx<-length(x)) + for (i in n:nx) ans[i]=mean(x[(i-n+1):i], na.rm=na.rm) + ans +} +n = 4L +x = as.double(1:16) +x[5] = NaN +test(6010.531, frollapply(x, n, mean), ma(x, n)) +x[6] = NA +test(6010.532, frollapply(x, n, mean), ma(x, n)) +x[5] = NA +x[6] = NaN +test(6010.533, frollapply(x, n, mean), ma(x, n)) +x[5] = Inf +test(6010.534, frollapply(x, n, mean), ma(x, n)) +x[6] = -Inf +test(6010.535, frollapply(x, n, mean), ma(x, n)) +x[5:7] = c(NA, Inf, -Inf) +test(6010.536, frollapply(x, n, mean), ma(x, n)) +#### error from invalid args +test(6010.541, frollapply(1:2, 2, sum, by.column=NA), error="must be TRUE or FALSE") +test(6010.542, frollapply(1:2, 2, sum, adaptive=NA), error="must be TRUE or FALSE") +test(6010.543, frollapply(1:2, 2, sum, partial=NA), error="must be TRUE or FALSE") +test(6010.544, frollapply(1:2, 2, sum, give.names=NA), error="must be TRUE or FALSE") +test(6010.545, frollapply(1:2, 2, sum, simplify=NA), error="must be TRUE or FALSE or a function") +test(6010.561, frollapply(x=1:2, N=2, FUN=sum), c(NA,3L), warning="'x' is deprecated in frollapply, use 'X' instead") +test(6010.562, frollapply(X=1:2, n=2, FUN=sum), c(NA,3L), warning="'n' is deprecated in frollapply, use 'N' instead") +test(6010.563, frollapply(x=1:2, n=2, FUN=sum), c(NA,3L), warning=c("'x' is deprecated in frollapply, use 'X' instead","'n' is deprecated in frollapply, use 'N' instead")) + +## by.column +x = data.table(v1=1:5, v2=2:6/2) +test(6010.601, frollapply(x, 3, dim, by.column=FALSE, fill=c(rows=NA_integer_, cols=NA_integer_)), data.table(rows=c(NA,NA,3L,3L,3L), cols=c(NA,NA,2L,2L,2L))) +test(6010.602, frollapply(x, 3, FUN=tail, 1L, by.column=FALSE, fill=data.table(v1=NA_integer_, v2=NA_real_)), copy(x)[1:2, names(x) := NA]) +test(6010.603, frollapply(x, 3, FUN=tail, 1L, by.column=FALSE, partial=TRUE), x) +test(6010.604, frollapply(x, 3, dim, by.column=FALSE, partial=TRUE), data.table(V1=c(1:3,3L,3L), V2=c(2L,2L,2L,2L,2L))) ## fill is not used in partial +test(6010.605, frollapply(x, 3, function(x) setNames(dim(x), c("rows","cols")), by.column=FALSE, partial=TRUE), data.table(rows=c(1:3,3L,3L), cols=c(2L,2L,2L,2L,2L))) +test(6010.606, frollapply(x, rep(3,5), dim, by.column=FALSE, fill=c(rows=NA_integer_, cols=NA_integer_), adaptive=TRUE, align="left"), data.table(rows=c(3L,3L,3L,NA,NA), cols=c(2L,2L,2L,NA,NA))) +#### empty input +test(6010.607, frollapply(list(), 3, identity, by.column=FALSE), list()) +test(6010.608, frollapply(list(numeric(), numeric()), 3, identity, by.column=FALSE), list()) +test(6010.609, frollapply(list(numeric(), 1:3), 3, identity, by.column=FALSE), error="all vectors must have equal lengths") +test(6010.610, frollapply(numeric(), 3, identity), list()) +test(6010.611, frollapply(list(numeric(), numeric()), 3, identity), list(NULL,NULL)) +test(6010.612, frollapply(list(numeric(), 1:3), 3, identity), list(NULL, list(NA,NA,1:3))) + +#### list input in frollapply +DT = as.data.table(iris) +test(6010.620, ## list()/.() same as data.frame() + DT[, frollapply(.(Sepal.Length, Sepal.Width), 3, function(l) list(l[[1L]][1L], l[[2L]][1L]), fill=list(NA,NA), by.column=FALSE)], + DT[, frollapply(data.frame(Sepal.Length, Sepal.Width), 3, function(l) list(l[[1L]][1L], l[[2L]][1L]), fill=list(NA,NA), by.column=FALSE)]) +rm(DT) +flow = function(x) { + v1 = x[[1L]] + v2 = x[[2L]] + (v1[2L] - v1[1L] * (1+v2[2L])) / v1[1L] +} +idx = c(1:2, 51:52, 101:102) +ans = c(NA, -3.03921568627451, NA, -3.28571428571429, NA, -2.77936507936508) +test(6010.621, as.data.table(iris)[, "flow" := frollapply(.(Sepal.Length, Sepal.Width), 2L, flow, by.column=FALSE), by = Species]$flow[idx], ans) +test(6010.622, as.data.table(iris)[, "flow" := frollapply(data.frame(Sepal.Length, Sepal.Width), 2L, flow, by.column=FALSE), by = Species]$flow[idx], ans) +test(6010.623, as.data.table(iris)[, "flow" := unlist(lapply(split(data.frame(Sepal.Length, Sepal.Width), Species), frollapply, 2L, flow, by.column=FALSE))]$flow[idx], ans) +f = function(l) as.list(range(l[[1L]])-range(l[[2L]])) +test(6010.624, frollapply(list(1:5, 5:1), c(2,2,3,3,4), f, adaptive=TRUE, by.column=FALSE, fill=list(NA,NA)), data.table(V1=c(NA,-3L,-2L,0L,1L), V2=c(NA,-3L,-2L,0L,1L))) +test(6010.625, frollapply(list(1:5, 5:1), c(2,2,3,3,4), f, align="left", adaptive=TRUE, by.column=FALSE, fill=list(NA,NA)), data.table(V1=c(-3L,-1L,2L,NA,NA), V2=c(-3L,-1L,2L,NA,NA))) +#### list of df/lists +x = list(data.table(x=1:2, y=2:3), data.table(z=3:5)) +test(6010.631, frollapply(x, 2, tail, 1, by.column=FALSE, fill=data.table(), simplify=function(x) rbindlist(x, fill=TRUE)), list(data.table(x=2L, y=3L), data.table(z=4:5))) +test(6010.632, frollapply(x, 2:3, tail, 1, by.column=FALSE, fill=data.table(), simplify=function(x) rbindlist(x, fill=TRUE)), list(data.table(x=2L, y=3L), data.table(NULL), data.table(z=4:5), data.table(z=5L))) +x = lapply(x, as.list) +test(6010.633, frollapply(x, 2, tail, 1, by.column=FALSE, fill=list()), error="supports vectorized input") + +#### lm +f = function(x) coef(lm(v2 ~ v1, data=x)) +coef.fill = c("(Intercept)"=NA_real_, "v1"=NA_real_) +test(6010.651, frollapply(data.table(v1=1:5, v2=2:6/2), 3, f, by.column=FALSE, fill=coef.fill), data.table("(Intercept)"=c(NA,NA,0.5,0.5,0.5), "v1"=c(NA,NA,0.5,0.5,0.5))) +test(6010.652, frollapply(data.table(v1=1:5, v2=2:6), 3, f, by.column=FALSE, fill=coef.fill), data.table("(Intercept)"=c(NA,NA,1,1,1), "v1"=c(NA,NA,1,1,1))) +## vectorized input for by.column=FALSE +X = list(data1 = data.table(v1=1:5, v2=2:6/2), data2 = data.table(v1=1:5, v2=2:6)) +n = c(small = 3, big = 4) +ans = list(data1_small = data.table("(Intercept)"=c(NA,NA,0.5,0.5,0.5), "v1"=c(NA,NA,0.5,0.5,0.5)), + data1_big = data.table("(Intercept)"=c(NA,NA,NA,0.5,0.5), "v1"=c(NA,NA,NA,0.5,0.5)), + data2_small = data.table("(Intercept)"=c(NA,NA,1,1,1), "v1"=c(NA,NA,1,1,1)), + data2_big = data.table("(Intercept)"=c(NA,NA,NA,1,1), "v1"=c(NA,NA,NA,1,1))) +test(6010.653, y = ans, x = lapply( + FUN = function(x) x[, names(x) := lapply(.SD, round, 8L)], ## otherwise we get 0.500...0001 and that fails test() when input is a list + frollapply(X, n, f, by.column=FALSE, fill=coef.fill, give.names=TRUE) +)) +rm(X, ans, n) + +## simplify +test(6010.701, frollapply(1:5, 2, sum), c(NA,3L,5L,7L,9L)) +test(6010.702, frollapply(1:5, 2, sum, simplify=unlist), c(NA,3L,5L,7L,9L)) +test(6010.703, frollapply(1:5, 2, sum, simplify=FALSE), list(NA,3L,5L,7L,9L)) +test(6010.704, frollapply(1:5, 2, range), list(NA,1:2,2:3,3:4,4:5)) ## fill=NA could possibly be recycled to length of FUN results +test(6010.705, frollapply(1:5, 2, range, simplify=FALSE), list(NA,1:2,2:3,3:4,4:5)) +test(6010.706, frollapply(1:5, 2, range, fill=c(NA_integer_,NA_integer_)), data.table(V1=c(NA,1:4), V2=c(NA,2:5))) +test(6010.707, frollapply(1:5, 2, range, fill=c(min=NA_integer_, max=NA_integer_)), data.table(min=c(NA,1:4), max=c(NA,2:5))) +test(6010.708, frollapply(1:5, 2, range, fill=c(min=NA_integer_, max=NA_integer_), simplify=function(x) rbindlist(lapply(x, as.list))), data.table(min=c(NA,1:4), max=c(NA,2:5))) +test(6010.709, frollapply(1:5, 2, function(x) as.list(range(x)), fill=list(min=NA_integer_, max=NA_integer_)), data.table(min=c(NA,1:4), max=c(NA,2:5))) +test(6010.710, frollapply(1:5, 2, function(x) as.list(range(x)), fill=list(min=NA_integer_, max=NA_integer_), simplify=rbindlist), data.table(min=c(NA,1:4), max=c(NA,2:5))) +test(6010.711, frollapply(1:5, 2, function(x) as.list(range(x)), fill=list(NA_integer_, NA_integer_), simplify=FALSE), list(list(NA_integer_, NA_integer_), as.list(1:2), as.list(2:3), as.list(3:4), as.list(4:5))) +#### fixing .internal.selfref +use.fork = .Platform$OS.type!="windows" && getDTthreads()>1L +if (use.fork) { + is.ok = function(x) {stopifnot(is.data.table(x)); capture.output(print(attr(x, ".internal.selfref", TRUE)))!=""} + ans = frollapply(1:2, 2, data.table) ## default: fill=NA + test(6010.770, is.ok(ans[[2L]])) ## mismatch of 'fill' type so simplify=TRUE did not run rbindlist but frollapply detected DT and fixed + ans = frollapply(1:2, 2, data.table, fill=data.table(NA)) ## fill type match + test(6010.771, is.ok(ans)) ## simplify=TRUE did run rbindlist, but frollapply fixed anyway + ans = frollapply(1:2, 2, data.table, fill=data.table(NA), simplify=FALSE) + test(6010.772, is.ok(ans[[2L]])) + ans = frollapply(1:2, 2, function(x) list(data.table(x)), fill=list(data.table(NA)), simplify=FALSE) + test(6010.773, !is.ok(ans[[2L]][[1L]])) + test(6010.7731, set(ans[[2L]][[1L]],, "newcol", 1L), error="data.table has either been loaded from disk") + ans = lapply(ans, lapply, setDT) + test(6010.774, is.ok(ans[[2L]][[1L]])) ## fix after + ans = frollapply(1:2, 2, function(x) list(data.table(x)), fill=list(data.table(NA)), simplify=function(x) lapply(x, lapply, setDT)) + test(6010.775, is.ok(ans[[2L]][[1L]])) ## fix inside frollapply via simplify + f = function(x) (if (x[1L]==1L) data.frame else data.table)(x) ## automatic fix may not work for a non-type stable function + ans = frollapply(1:3, 2, f, fill=data.table(NA), simplify=FALSE) + test(6010.776, !is.ok(ans[[3L]])) + ans = frollapply(1:3, 2, f, fill=data.table(NA), simplify=function(x) lapply(x, function(y) if (is.data.table(y)) setDT(y) else y)) + test(6010.777, is.ok(ans[[3L]])) ## fix inside frollapply via simplify +} + +## partial adaptive +test(6010.801, frollapply(1:4, rep(2L,4L), mean, adaptive=TRUE, partial=TRUE), c(1,1.5,2.5,3.5)) +test(6010.802, frollapply(FUN=mean, 1:4, rep(2L,4L), adaptive=TRUE, partial=TRUE), c(1,1.5,2.5,3.5)) +test(6010.803, frollapply(FUN=mean, 1:4, list(1:4, 1:3), adaptive=TRUE, partial=TRUE), error="adaptive windows provided in 'N' must not to have different lengths") +test(6010.804, frollapply(FUN=mean, 1:4, list(1:3), adaptive=TRUE, partial=TRUE), error="length of integer vector(s) provided as list to 'N' argument must be equal to number of observations provided in 'X'") +test(6010.805, frollapply(FUN=mean, 1:4, list(rep(2L,4L)), adaptive=TRUE, partial=TRUE), c(1,1.5,2.5,3.5)) +test(6010.806, frollapply(FUN=sum, as.double(1:4), 1:4, adaptive=TRUE, partial=TRUE), c(1,3,6,10)) ## all same as index +test(6010.807, frollapply(FUN=sum, as.double(1:4), 1:4, align="left", adaptive=TRUE, partial=TRUE), c(1,5,7,4)) +test(6010.808, frollapply(FUN=sum, as.double(1:4), c(2,3,1,1), adaptive=TRUE, partial=TRUE), c(1,3,3,4)) ## leading two bigger than index +test(6010.809, frollapply(FUN=sum, as.double(1:4), c(2,3,1,1), align="left", adaptive=TRUE, partial=TRUE), c(3,9,3,4)) +test(6010.810, frollapply(FUN=sum, as.double(1:4), c(6,5,4,2), adaptive=TRUE, partial=TRUE), c(1,3,6,7)) ## leading two bigger than rev index +test(6010.811, frollapply(FUN=sum, as.double(1:4), c(6,5,4,2), align="left", adaptive=TRUE, partial=TRUE), c(10,9,7,4)) +test(6010.812, frollapply(FUN=sum, as.double(1:4), c(2,4,5,6), adaptive=TRUE, partial=TRUE), c(1,3,6,10)) ## trailing two bigger than index +test(6010.813, frollapply(FUN=sum, as.double(1:4), c(2,4,5,6), align="left", adaptive=TRUE, partial=TRUE), c(3,9,7,4)) +test(6010.814, frollapply(FUN=sum, as.double(1:4), c(1,1,3,2), adaptive=TRUE, partial=TRUE), c(1,2,6,7)) ## trailing two bigger than rev index +test(6010.815, frollapply(FUN=sum, as.double(1:4), c(1,1,3,2), align="left", adaptive=TRUE, partial=TRUE), c(1,2,7,4)) + +## give names +test(6010.951, frollapply(FUN=sum, c(1,2,3), 2, give.names=TRUE), c(NA,3,5)) +test(6010.952, frollapply(FUN=sum, list(c(1,2,3)), 2, give.names=TRUE), list(V1_rollapply2=c(NA,3,5))) +test(6010.953, frollapply(FUN=sum, list(x1=c(1,2,3)), 2, give.names=TRUE), list(x1_rollapply2=c(NA,3,5))) +test(6010.954, frollapply(FUN=sum, list(c(1,2,3)), c(n1=2), give.names=TRUE), list(V1_n1=c(NA,3,5))) +test(6010.955, frollapply(FUN=sum, list(x1=c(1,2,3)), c(n1=2), give.names=TRUE), list(x1_n1=c(NA,3,5))) +test(6010.956, frollapply(FUN=sum, c(1,2,3), 2:3, give.names=TRUE), list(rollapply2=c(NA,3,5), rollapply3=c(NA,NA,6))) +test(6010.957, frollapply(FUN=sum, list(c(1,2,3)), 2:3, give.names=TRUE), list(V1_rollapply2=c(NA,3,5), V1_rollapply3=c(NA,NA,6))) +test(6010.958, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), 2, give.names=TRUE), list(V1_rollapply2=c(NA,3,5), V2_rollapply2=c(NA,5,7))) +test(6010.959, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), 2:3, give.names=TRUE), list(V1_rollapply2=c(NA,3,5), V1_rollapply3=c(NA,NA,6), V2_rollapply2=c(NA,5,7), V2_rollapply3=c(NA,NA,9))) +test(6010.960, frollapply(FUN=sum, c(1,2,3), c(n1=2, n2=3), give.names=TRUE), list(n1=c(NA,3,5), n2=c(NA,NA,6))) +test(6010.961, frollapply(FUN=sum, list(c(1,2,3)), c(n1=2, n2=3), give.names=TRUE), list(V1_n1=c(NA,3,5), V1_n2=c(NA,NA,6))) +test(6010.962, frollapply(FUN=sum, list(x1=c(1,2,3)), 2:3, give.names=TRUE), list(x1_rollapply2=c(NA,3,5), x1_rollapply3=c(NA,NA,6))) +test(6010.963, frollapply(FUN=sum, list(x1=c(1,2,3)), c(n1=2, n2=3), give.names=TRUE), list(x1_n1=c(NA,3,5), x1_n2=c(NA,NA,6))) +test(6010.964, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), c(n1=2), give.names=TRUE), list(V1_n1=c(NA,3,5), V2_n1=c(NA,5,7))) +test(6010.965, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), c(n1=2, n2=3), give.names=TRUE), list(V1_n1=c(NA,3,5), V1_n2=c(NA,NA,6), V2_n1=c(NA,5,7), V2_n2=c(NA,NA,9))) +test(6010.966, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), 2, give.names=TRUE), list(x1_rollapply2=c(NA,3,5), x2_rollapply2=c(NA,5,7))) +test(6010.967, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), 2:3, give.names=TRUE), list(x1_rollapply2=c(NA,3,5), x1_rollapply3=c(NA,NA,6), x2_rollapply2=c(NA,5,7), x2_rollapply3=c(NA,NA,9))) +test(6010.968, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), c(n1=2), give.names=TRUE), list(x1_n1=c(NA,3,5), x2_n1=c(NA,5,7))) +test(6010.969, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), c(n1=2, n2=3), give.names=TRUE), list(x1_n1=c(NA,3,5), x1_n2=c(NA,NA,6), x2_n1=c(NA,5,7), x2_n2=c(NA,NA,9))) +test(6010.971, frollapply(FUN=sum, c(1,2,3), c(2,2,2), adaptive=TRUE, give.names=TRUE), c(NA,3,5)) ## adaptive +test(6010.972, frollapply(FUN=sum, c(1,2,3), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), c(NA,3,5)) +test(6010.973, frollapply(FUN=sum, list(c(1,2,3)), c(2,2,2), adaptive=TRUE, give.names=TRUE), list(V1=c(NA,3,5))) +test(6010.974, frollapply(FUN=sum, list(c(1,2,3)), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(V1_arollapply1=c(NA,3,5))) +test(6010.975, frollapply(FUN=sum, list(x1=c(1,2,3)), c(2,2,2), adaptive=TRUE, give.names=TRUE), list(x1=c(NA,3,5))) +test(6010.976, frollapply(FUN=sum, list(x1=c(1,2,3)), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(x1_arollapply1=c(NA,3,5))) +test(6010.977, frollapply(FUN=sum, list(c(1,2,3)), list(n1=c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(V1_n1=c(NA,3,5))) +test(6010.978, frollapply(FUN=sum, list(x1=c(1,2,3)), list(n1=c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(x1_n1=c(NA,3,5))) +test(6010.979, frollapply(FUN=sum, c(1,2,3), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(arollapply1=c(NA,3,5), arollapply2=c(NA,NA,6))) +test(6010.980, frollapply(FUN=sum, list(c(1,2,3)), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(V1_arollapply1=c(NA,3,5), V1_arollapply2=c(NA,NA,6))) +test(6010.981, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), c(2,2,2), adaptive=TRUE, give.names=TRUE), list(V1=c(NA,3,5), V2=c(NA,5,7))) +test(6010.982, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(V1_arollapply1=c(NA,3,5), V2_arollapply1=c(NA,5,7))) +test(6010.983, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(V1_arollapply1=c(NA,3,5), V1_arollapply2=c(NA,NA,6), V2_arollapply1=c(NA,5,7), V2_arollapply2=c(NA,NA,9))) +test(6010.984, frollapply(FUN=sum, c(1,2,3), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(n1=c(NA,3,5), n2=c(NA,NA,6))) +test(6010.985, frollapply(FUN=sum, list(c(1,2,3)), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(V1_n1=c(NA,3,5), V1_n2=c(NA,NA,6))) +test(6010.986, frollapply(FUN=sum, list(x1=c(1,2,3)), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(x1_arollapply1=c(NA,3,5), x1_arollapply2=c(NA,NA,6))) +test(6010.987, frollapply(FUN=sum, list(x1=c(1,2,3)), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(x1_n1=c(NA,3,5), x1_n2=c(NA,NA,6))) +test(6010.988, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), list(n1=c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(V1_n1=c(NA,3,5), V2_n1=c(NA,5,7))) +test(6010.989, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(V1_n1=c(NA,3,5), V1_n2=c(NA,NA,6), V2_n1=c(NA,5,7), V2_n2=c(NA,NA,9))) +test(6010.990, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), c(2,2,2), adaptive=TRUE, give.names=TRUE), list(x1=c(NA,3,5), x2=c(NA,5,7))) +test(6010.991, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), list(c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(x1_arollapply1=c(NA,3,5), x2_arollapply1=c(NA,5,7))) +test(6010.992, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), list(c(2,2,2), c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(x1_arollapply1=c(NA,3,5), x1_arollapply2=c(NA,NA,6), x2_arollapply1=c(NA,5,7), x2_arollapply2=c(NA,NA,9))) +test(6010.993, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), list(n1=c(2,2,2)), adaptive=TRUE, give.names=TRUE), list(x1_n1=c(NA,3,5), x2_n1=c(NA,5,7))) +test(6010.994, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), list(n1=c(2,2,2), n2=c(3,3,3)), adaptive=TRUE, give.names=TRUE), list(x1_n1=c(NA,3,5), x1_n2=c(NA,NA,6), x2_n1=c(NA,5,7), x2_n2=c(NA,NA,9))) +test(6010.9950, frollapply(FUN=sum, c(1,2,3), 2, partial=TRUE, give.names=TRUE), c(1,3,5)) ## partial +test(6010.9951, frollapply(FUN=sum, c(1,2,3), c(n1=2), partial=TRUE, give.names=TRUE), c(1,3,5)) +test(6010.9952, frollapply(FUN=sum, list(c(1,2,3)), 2, partial=TRUE, give.names=TRUE), list(V1_rollapply2=c(1,3,5))) +test(6010.9953, frollapply(FUN=sum, list(x1=c(1,2,3)), 2, partial=TRUE, give.names=TRUE), list(x1_rollapply2=c(1,3,5))) +test(6010.9954, frollapply(FUN=sum, list(c(1,2,3)), c(n1=2), partial=TRUE, give.names=TRUE), list(V1_n1=c(1,3,5))) +test(6010.9955, frollapply(FUN=sum, list(x1=c(1,2,3)), c(n1=2), partial=TRUE, give.names=TRUE), list(x1_n1=c(1,3,5))) +test(6010.9956, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), c(2, 3), partial=TRUE, give.names=TRUE), list(V1_rollapply2=c(1,3,5), V1_rollapply3=c(1,3,6), V2_rollapply2=c(2,5,7), V2_rollapply3=c(2,5,9))) +test(6010.9957, frollapply(FUN=sum, list(c(1,2,3), c(2,3,4)), c(n1=2, n2=3), partial=TRUE, give.names=TRUE), list(V1_n1=c(1,3,5), V1_n2=c(1,3,6), V2_n1=c(2,5,7), V2_n2=c(2,5,9))) +test(6010.9958, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), c(2, 3), partial=TRUE, give.names=TRUE), list(x1_rollapply2=c(1,3,5), x1_rollapply3=c(1,3,6), x2_rollapply2=c(2,5,7), x2_rollapply3=c(2,5,9))) +test(6010.9959, frollapply(FUN=sum, list(x1=c(1,2,3), x2=c(2,3,4)), c(n1=2, n2=3), partial=TRUE, give.names=TRUE), list(x1_n1=c(1,3,5), x1_n2=c(1,3,6), x2_n1=c(2,5,7), x2_n2=c(2,5,9))) +test(6010.9960, frollapply(FUN=sum, c(1,2,3), c(2,2,2), adaptive=TRUE, partial=TRUE, give.names=TRUE), c(1,3,5)) ## adaptive partial +test(6010.9961, frollapply(FUN=sum, c(1,2,3), list(c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), c(1,3,5)) +test(6010.9962, frollapply(FUN=sum, list(c(1,2,3)), c(2,2,2), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(V1=c(1,3,5))) +test(6010.9963, frollapply(FUN=sum, list(c(1,2,3)), list(c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(V1_arollapply1=c(1,3,5))) +test(6010.9964, frollapply(FUN=sum, list(x1=c(1,2,3)), c(2,2,2), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(x1=c(1,3,5))) +test(6010.9965, frollapply(FUN=sum, list(x1=c(1,2,3)), list(c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(x1_arollapply1=c(1,3,5))) +test(6010.9966, frollapply(FUN=sum, c(1,2,3), list(c(n1=c(2,2,2))), adaptive=TRUE, partial=TRUE, give.names=TRUE), c(1,3,5)) +test(6010.9967, frollapply(FUN=sum, list(c(1,2,3)), c(2,2,2), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(V1=c(1,3,5))) +test(6010.9968, frollapply(FUN=sum, list(c(1,2,3)), list(n1=c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(V1_n1=c(1,3,5))) +test(6010.9969, frollapply(FUN=sum, list(x1=c(1,2,3)), list(n1=c(2,2,2)), adaptive=TRUE, partial=TRUE, give.names=TRUE), list(x1_n1=c(1,3,5))) + +## frolladapt +test(6015.001, frolladapt(integer(), -1L), error="n must be positive integer values") +test(6015.002, frolladapt(integer(), 0L), error="n must be positive integer values") +test(6015.003, frolladapt(integer(), 1L), integer()) +test(6015.004, frolladapt(integer(), 2L), integer()) +test(6015.005, frolladapt(integer(), integer()), error="must be non-zero length and must not have NAs") +test(6015.006, frolladapt(integer(), NA_integer_), error="must be non-zero length and must not have NAs") +test(6015.007, frolladapt(integer(), 0), error="n must be positive integer values") +test(6015.008, frolladapt(integer(), 1), integer()) +test(6015.011, frolladapt(0L, 0L), error="n must be positive integer values") +test(6015.012, frolladapt(0L, 0L, partial=TRUE), error="n must be positive integer values") +test(6015.013, frolladapt(0L, 1L), 1L) +test(6015.014, frolladapt(0L, 1L, partial=TRUE), 1L) +test(6015.015, frolladapt(0L, 2L), NA_integer_) +test(6015.016, frolladapt(0L, 2L, partial=TRUE), 1L) +test(6015.017, frolladapt(0L, 0), error="n must be positive integer values") +test(6015.018, frolladapt(0L, 0, partial=TRUE), error="n must be positive integer values") +test(6015.019, frolladapt(0L, 1), 1L) +test(6015.020, frolladapt(0L, 1, partial=TRUE), 1L) +test(6015.021, frolladapt(c(1,3,5), 2), c(NA,1L,1L)) +test(6015.022, frolladapt(c(1,3,5), 2, partial=TRUE), c(1L,1L,1L)) +test(6015.023, frolladapt(c(2,4,6), 2), c(NA,1L,1L)) +test(6015.024, frolladapt(c(2,4,6), 2, partial=TRUE), c(1L,1L,1L)) +test(6015.025, frolladapt(c(1,3,5), 1), c(1L,1L,1L)) +test(6015.026, frolladapt(c(1,3,5), 1, partial=TRUE), c(1L,1L,1L)) +test(6015.027, frolladapt(c(1,3,4,5), 2), c(NA,1L,2L,2L)) +test(6015.028, frolladapt(c(1,3,4,5), 2, partial=TRUE), c(1L,1L,2L,2L)) +test(6015.029, frolladapt(c(-3,-2,-1,1,3), 2), c(NA,2L,2L,1L,1L)) +test(6015.030, frolladapt(c(-3,-2,-1,1,3), 2, partial=TRUE), c(1L,2L,2L,1L,1L)) +test(6015.031, frolladapt(c(-3,-2,-1,1,3), 3), c(NA,NA,3L,2L,2L)) +test(6015.032, frolladapt(c(-3,-2,-1,1,3), 3, partial=TRUE), c(1L,2L,3L,2L,2L)) +idx = c(1:4,6:7,10:14,16:17,23:24) +test(6015.041, frolladapt(idx, 5), c(NA, NA, NA, NA, 4L, 4L, 3L, 3L, 3L, 4L, 5L, 4L, 4L, 1L, 2L)) +test(6015.042, frolladapt(idx, 5, partial=TRUE), c(1L, 2L, 3L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 5L, 4L, 4L, 1L, 2L)) +test(6015.051, frolladapt(list(c(1,3,5), c(2,4,6)), 2), error="Index vector 'x' must of numeric type") +test(6015.052, frolladapt(list(c(1,3,5), c(2,4,6)), 2:3), error="Index vector 'x' must of numeric type") +test(6015.053, frolladapt(c(1,3,5), 2:3), list(c(NA,1L,1L), c(NA,2L,2L))) +test(6015.054, frolladapt(c(1,3,5), 2:3, partial=TRUE), list(c(1L,1L,1L), c(1L,2L,2L))) +test(6015.061, frolladapt(c(1,3,5), 2, give.names=TRUE), c(NA,1L,1L)) +test(6015.062, frolladapt(c(1,3,5), 2:3, give.names=TRUE), list(n2=c(NA,1L,1L), n3=c(NA,2L,2L))) +test(6015.063, frolladapt(c(1,3,5), c(a=2, b=3)), list(c(NA,1L,1L), c(NA,2L,2L))) +test(6015.064, frolladapt(c(1,3,5), c(a=2, b=3), give.names=TRUE), list(a=c(NA,1L,1L), b=c(NA,2L,2L))) + +### verified against slider pkg + +#library(slider) +#library(data.table) +#set.seed(108) +#N = 64 +#n = 8 +#x = sample(N, N, TRUE) +#idx = sort(sample(N*64, N)) ## update this for each sparsity +#system.time(s <- slide_index_dbl(x, idx, mean, .before=n-1L, .complete=TRUE)) +#system.time(d <- frollmean(x, frolladapt(idx, n), adaptive=TRUE)) +#all.equal(d, s) +#cat("x = "); dput(x); cat("idx = "); dput(idx); cat("ans = "); dput(s) + +n = 8 +#### completely dense: sort(sample(N*1.0, N)) = 1:N +x = c(63L, 42L, 47L, 16L, 39L, 45L, 47L, 43L, 54L, 29L, 47L, 45L, 45L, 27L, 39L, 26L, 6L, 39L, 23L, 57L, 6L, 22L, 20L, 14L, 24L, 53L, 58L, 31L, 54L, 51L, 55L, 19L, 22L, 21L, 4L, 53L, 14L, 35L, 13L, 49L, 51L, 42L, 46L, 47L, 24L, 59L, 58L, 53L, 36L, 41L, 5L, 57L, 51L, 44L, 21L, 3L, 45L, 12L, 61L, 25L, 47L, 57L, 52L, 57L) +idx = 1:64 +ans = c(NA, NA, NA, NA, NA, NA, NA, 42.75, 41.625, 40, 40, 43.625, 44.375, 42.125, 41.125, 39, 33, 34.25, 31.25, 32.75, 27.875, 27.25, 24.875, 23.375, 25.625, 27.375, 31.75, 28.5, 34.5, 38.125, 42.5, 43.125, 42.875, 38.875, 32.125, 34.875, 29.875, 27.875, 22.625, 26.375, 30, 32.625, 37.875, 37.125, 38.375, 41.375, 47, 47.5, 45.625, 45.5, 40.375, 41.625, 45, 43.125, 38.5, 32.25, 33.375, 29.75, 36.75, 32.75, 32.25, 33.875, 37.75, 44.5) +test(6015.301, frollmean(x, frolladapt(idx, n), adaptive=TRUE), ans) +#### very dense: sort(sample(N*1.1, N)) +x = c(63L, 42L, 47L, 16L, 39L, 45L, 47L, 43L, 54L, 29L, 47L, 45L, 45L, 27L, 39L, 26L, 6L, 39L, 23L, 57L, 6L, 22L, 20L, 14L, 24L, 53L, 58L, 31L, 54L, 51L, 55L, 19L, 22L, 21L, 4L, 53L, 14L, 35L, 13L, 49L, 51L, 42L, 46L, 47L, 24L, 59L, 58L, 53L, 36L, 41L, 5L, 57L, 51L, 44L, 21L, 3L, 45L, 12L, 61L, 25L, 47L, 57L, 52L, 57L) +idx = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 22L, 23L, 24L, 25L, 26L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 63L, 64L, 65L, 66L, 68L, 69L, 70L) +ans = c(NA, NA, NA, NA, NA, NA, NA, 42.75, 41.625, 40, 40, 43.625, 44.375, 42.125, 41.125, 39, 33, 34.25, 31.25, 31.6666666666667, 26.1666666666667, 25.5, 27.8333333333333, 23.6666666666667, 23.8333333333333, 28, 28.1428571428571, 31.7142857142857, 36.2857142857143, 40.7142857142857, 46.5714285714286, 43.125, 42.875, 38.875, 32.125, 34.875, 29.875, 24, 23.1428571428571, 27, 31.2857142857143, 36.7142857142857, 35.7142857142857, 40.4285714285714, 38.375, 41.375, 47, 47.5, 45.625, 45.5, 40.375, 41.625, 45, 43.125, 38.5, 32.25, 33.375, 33.2857142857143, 33.8571428571429, 30.1428571428571, 30.5714285714286, 41.1666666666667, 42.3333333333333, 44.4285714285714) +test(6015.302, frollmean(x, frolladapt(idx, n), adaptive=TRUE), ans) +#### moderately dense: sort(sample(N*1.5, N)) +x = c(63L, 42L, 47L, 16L, 39L, 45L, 47L, 43L, 54L, 29L, 47L, 45L, 45L, 27L, 39L, 26L, 6L, 39L, 23L, 57L, 6L, 22L, 20L, 14L, 24L, 53L, 58L, 31L, 54L, 51L, 55L, 19L, 22L, 21L, 4L, 53L, 14L, 35L, 13L, 49L, 51L, 42L, 46L, 47L, 24L, 59L, 58L, 53L, 36L, 41L, 5L, 57L, 51L, 44L, 21L, 3L, 45L, 12L, 61L, 25L, 47L, 57L, 52L, 57L) +idx = c(2L, 3L, 4L, 6L, 7L, 10L, 12L, 13L, 14L, 15L, 17L, 18L, 19L, 20L, 21L, 24L, 25L, 26L, 27L, 29L, 30L, 32L, 33L, 34L, 35L, 36L, 37L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 48L, 54L, 55L, 56L, 57L, 59L, 60L, 62L, 63L, 65L, 66L, 67L, 68L, 69L, 71L, 73L, 76L, 77L, 79L, 80L, 84L, 85L, 86L, 87L, 90L, 91L, 92L, 94L, 95L, 96L) +ans = c(NA, NA, NA, NA, NA, 37.8, 36.75, 38, 45.6, 43.6, 44.1666666666667, 44.1666666666667, 44.2857142857143, 41.4285714285714, 40.8571428571429, 38.1666666666667, 31.3333333333333, 30.3333333333333, 26.6666666666667, 30.2, 26.1666666666667, 25.5, 27.8333333333333, 23.6666666666667, 23.8333333333333, 28, 28.1428571428571, 31.7142857142857, 36.2857142857143, 40.7142857142857, 46.5714285714286, 45.8571428571429, 41.4285714285714, 36.1428571428571, 28.6666666666667, 28.5, 23.6666666666667, 34, 28.75, 32.8, 35.8333333333333, 34, 39.3333333333333, 47, 43.1666666666667, 44.8333333333333, 46, 47, 46.1666666666667, 45.1666666666667, 33.75, 34.75, 38.5, 39.6, 43.25, 29.75, 32.8, 25, 28.4, 27.8333333333333, 32.1666666666667, 40.4, 48.4, 49.8333333333333) +test(6015.303, frollmean(x, frolladapt(idx, n), adaptive=TRUE), ans) +#### moderately sparse: sort(sample(N*2, N)) +x = c(63L, 42L, 47L, 16L, 39L, 45L, 47L, 43L, 54L, 29L, 47L, 45L, 45L, 27L, 39L, 26L, 6L, 39L, 23L, 57L, 6L, 22L, 20L, 14L, 24L, 53L, 58L, 31L, 54L, 51L, 55L, 19L, 22L, 21L, 4L, 53L, 14L, 35L, 13L, 49L, 51L, 42L, 46L, 47L, 24L, 59L, 58L, 53L, 36L, 41L, 5L, 57L, 51L, 44L, 21L, 3L, 45L, 12L, 61L, 25L, 47L, 57L, 52L, 57L) +idx = c(2L, 3L, 4L, 10L, 12L, 13L, 14L, 15L, 17L, 19L, 20L, 21L, 25L, 26L, 27L, 29L, 30L, 32L, 34L, 35L, 37L, 42L, 44L, 45L, 48L, 54L, 55L, 56L, 57L, 60L, 61L, 63L, 67L, 69L, 71L, 76L, 81L, 82L, 85L, 86L, 87L, 90L, 91L, 92L, 97L, 99L, 101L, 103L, 104L, 105L, 107L, 109L, 112L, 113L, 114L, 115L, 116L, 119L, 120L, 121L, 122L, 124L, 125L, 126L) +ans = c(NA, NA, NA, 35, 27.5, 33.3333333333333, 36.75, 38, 40.6666666666667, 42.8333333333333, 44.1666666666667, 44.1666666666667, 41.5, 38.6, 40.6, 34.25, 28.6, 30.3333333333333, 26.6, 30.2, 26.2, 28.3333333333333, 16, 18.6666666666667, 20, 38.5, 45, 47.3333333333333, 49, 49.4, 50.3333333333333, 42, 36.75, 20.6666666666667, 15.6666666666667, 26, 33.5, 34, 20.6666666666667, 27.75, 32.4, 38.75, 40.2, 41.3333333333333, 39.75, 43.3333333333333, 47, 48.5, 46, 49.4, 38.6, 38.4, 38.5, 39.25, 35.6, 35.2, 36.8333333333333, 29.3333333333333, 31, 27.8333333333333, 32.1666666666667, 40.4, 42.3333333333333, 44.4285714285714) +test(6015.304, frollmean(x, frolladapt(idx, n), adaptive=TRUE), ans) +#### very sparse: sort(sample(N*8, N)) +x = c(63L, 42L, 47L, 16L, 39L, 45L, 47L, 43L, 54L, 29L, 47L, 45L, 45L, 27L, 39L, 26L, 6L, 39L, 23L, 57L, 6L, 22L, 20L, 14L, 24L, 53L, 58L, 31L, 54L, 51L, 55L, 19L, 22L, 21L, 4L, 53L, 14L, 35L, 13L, 49L, 51L, 42L, 46L, 47L, 24L, 59L, 58L, 53L, 36L, 41L, 5L, 57L, 51L, 44L, 21L, 3L, 45L, 12L, 61L, 25L, 47L, 57L, 52L, 57L) +idx = c(3L, 10L, 42L, 57L, 60L, 97L, 116L, 118L, 130L, 141L, 143L, 154L, 158L, 160L, 172L, 183L, 184L, 185L, 191L, 210L, 214L, 218L, 227L, 229L, 239L, 285L, 293L, 298L, 300L, 301L, 304L, 310L, 325L, 327L, 337L, 348L, 353L, 360L, 362L, 364L, 366L, 376L, 378L, 379L, 383L, 396L, 399L, 401L, 416L, 419L, 429L, 438L, 441L, 447L, 459L, 460L, 474L, 481L, 484L, 489L, 504L, 508L, 509L, 510L) +ans = c(NA, 52.5, 47, 16, 27.5, 45, 47, 45, 54, 29, 38, 45, 45, 39, 39, 26, 16, 23.6666666666667, 22.6666666666667, 57, 31.5, 14, 20, 17, 24, 53, 58, 44.5, 47.6666666666667, 45.3333333333333, 47.75, 37, 22, 21.5, 4, 53, 33.5, 24.5, 24, 32.3333333333333, 37, 42, 44, 45, 39.75, 59, 58.5, 56.6666666666667, 36, 38.5, 5, 57, 54, 47.5, 21, 12, 45, 28.5, 36.5, 43, 47, 52, 52, 53.25) +test(6015.305, frollmean(x, frolladapt(idx, n), adaptive=TRUE), ans) +#### "completely" sparse: sort(sample(N*64, N)) +x = c(63L, 42L, 47L, 16L, 39L, 45L, 47L, 43L, 54L, 29L, 47L, 45L, 45L, 27L, 39L, 26L, 6L, 39L, 23L, 57L, 6L, 22L, 20L, 14L, 24L, 53L, 58L, 31L, 54L, 51L, 55L, 19L, 22L, 21L, 4L, 53L, 14L, 35L, 13L, 49L, 51L, 42L, 46L, 47L, 24L, 59L, 58L, 53L, 36L, 41L, 5L, 57L, 51L, 44L, 21L, 3L, 45L, 12L, 61L, 25L, 47L, 57L, 52L, 57L) +idx = c(3L, 118L, 130L, 301L, 401L, 670L, 684L, 730L, 739L, 741L, 751L, 874L, 878L, 950L, 959L, 1178L, 1309L, 1361L, 1372L, 1453L, 1462L, 1633L, 1652L, 1666L, 1746L, 1840L, 1977L, 2047L, 2105L, 2232L, 2262L, 2287L, 2346L, 2401L, 2431L, 2447L, 2555L, 2563L, 2570L, 2620L, 2745L, 2853L, 2887L, 2938L, 2939L, 2956L, 3020L, 3049L, 3075L, 3114L, 3215L, 3226L, 3232L, 3255L, 3382L, 3397L, 3436L, 3488L, 3725L, 3884L, 3937L, 3944L, 3960L, 4041L) +ans = c(NA, 42, 47, 16, 39, 45, 47, 43, 54, 41.5, 47, 45, 45, 27, 39, 26, 6, 39, 23, 57, 6, 22, 20, 14, 24, 53, 58, 31, 54, 51, 55, 19, 22, 21, 4, 53, 14, 35, 24, 49, 51, 42, 46, 47, 35.5, 59, 58, 53, 36, 41, 5, 57, 54, 44, 21, 3, 45, 12, 61, 25, 47, 52, 52, 57) +test(6015.306, frollmean(x, frolladapt(idx, n), adaptive=TRUE), ans) + +#### Time classes + +x = as.Date("2022-10-24") + c(0:1,4,6,7) +test(6015.401, frolladapt(x, 2L), c(NA,2L,1L,1L,2L)) +test(6015.402, frolladapt(x, 2L, partial=TRUE), c(1L,2L,1L,1L,2L)) + +x = as.IDate("2022-10-24") + c(0:1,4,6,7) +test(6015.501, frolladapt(x, 2L), c(NA,2L,1L,1L,2L)) +test(6015.502, frolladapt(x, 2L, partial=TRUE), c(1L,2L,1L,1L,2L)) + +x = as.ITime(as.ITime("19:33:30") + c(0:1,4,6,7)*60) +test(6015.601, frolladapt(x, 2*60), c(NA,NA,1L,1L,2L)) ## 2 minutes +test(6015.602, frolladapt(x, 2*60, partial=TRUE), c(1L,2L,1L,1L,2L)) + +x = as.POSIXct("2022-10-24 19:34:30") + c(0:1,4,6,7)*60 +test(6015.701, frolladapt(x, 2*60), c(NA,NA,1L,1L,2L)) ## 2 minutes +test(6015.702, frolladapt(x, 2*60, partial=TRUE), c(1L,2L,1L,1L,2L)) + +x = as.POSIXct("2022-10-24 19:34:30.005") + c(0:1,4,6,7)*60 +test(6015.801, frolladapt(x, 2*60), c(NA,NA,1L,1L,2L)) +test(6015.802, frolladapt(x, 2*60, partial=TRUE), c(1L,2L,1L,1L,2L)) +x = c(as.POSIXct("2022-10-24 19:34:00.900"), as.POSIXct("2022-10-24 19:36:00.005"), as.POSIXct("2022-10-24 19:38:00.006")) +test(6015.803, frolladapt(x, 2*60), c(NA,1L,1L)) ## sub seconds truncation +test(6015.804, frolladapt(x, 2*60, partial=TRUE), c(1L,1L,1L)) +test(6015.805, frolladapt(as.POSIXct(round(x)), 2*60), c(NA,2L,1L)) +test(6015.806, frolladapt(as.POSIXct(round(x)), 2*60, partial=TRUE), c(1L,2L,1L)) + +test(6015.901, frolladapt(TRUE, 1L), error="must of numeric type") +test(6015.902, frolladapt(1L, FALSE), error="must be integer") +test(6015.903, frolladapt(1L, list(1L)), error="must be integer") +test(6015.904, frolladapt(1L, 1.1), error="must be integer") +test(6015.905, frolladapt(1L, 1L, partial=NA), error="must be TRUE or FALSE") +test(6015.906, frolladapt(1L, 1L, give.names=NA), error="must be TRUE or FALSE") +test(6015.907, frolladapt(c(1L,3L,2L), 2L), error="be sorted, have no duplicates, have no NAs") +test(6015.908, frolladapt(c(1L,2L,2L), 2L), error="be sorted, have no duplicates, have no NAs") +test(6015.909, frolladapt(c(1L,2L,NA_integer_), 2L), error="be sorted, have no duplicates, have no NAs") ## loop that checks for sorted will detect NAs as well, except for first element +test(6015.910, frolladapt(c(NA_integer_,1L,2L), 2L), error="be sorted, have no duplicates, have no NAs") ## first NA is detected by extra check + +# batch validation set.seed(108) makeNA = function(x, ratio=0.1, nf=FALSE) { @@ -862,33 +1805,57 @@ makeNA = function(x, ratio=0.1, nf=FALSE) { } x } -num = 6001.0 ## against base to verify exactness of non-finite values, not handled in zoo -rollfun = function(x, n, FUN, fill=NA_real_, na.rm=FALSE, nf.rm=FALSE) { +rollfun = function(x, n, FUN, fill=NA_real_, na.rm=FALSE, nf.rm=FALSE, partial=FALSE) { ans = rep(fill, nx<-length(x)) f = match.fun(FUN) if (nf.rm) x[is.infinite(x)] = NA_real_ - for (i in n:nx) ans[i] = f(x[(i-n+1):i], na.rm=na.rm) + for (i in seq_along(x)) { + ans[i] = if (i >= n) + f(x[(i-n+1):i], na.rm=na.rm) + else if (partial) + f(x[max((i-n+1), 1L):i], na.rm=na.rm) + else + as.double(fill) + } ans } -base_compare = function(x, n, funs=c("mean","sum"), algos=c("fast","exact")) { +base_compare = function(x, n, funs=c("mean","sum","max","min","prod"), algos=c("fast","exact")) { num.step = 0.001 for (fun in funs) { for (na.rm in c(FALSE, TRUE)) { for (fill in c(NA_real_, 0)) { - for (algo in algos) { + for (partial in c(FALSE,TRUE)) { + for (has.nf in c(NA,TRUE,FALSE)) { + if (identical(has.nf, FALSE)) { + if (na.rm) + next ## errors "not make sense" + if (any(!is.finite(x))) + next ## do not test warnings (mean, sum, prod) or incorrect expect results (max, min) + } + for (algo in algos) { + num <<- num + num.step + eval(substitute( # so we can have values displayed in output/log rather than variables + test(.num, ignore.warning="no non-missing arguments", + rollfun(x, n, FUN=.fun, fill=.fill, na.rm=.na.rm, partial=.partial), + froll(.fun, x, n, fill=.fill, na.rm=.na.rm, algo=.algo, partial=.partial, has.nf=.has.nf)), + list(.num=num, .fun=fun, .fill=fill, .na.rm=na.rm, .algo=algo, .partial=partial, .has.nf=has.nf) + )) + } + } num <<- num + num.step eval(substitute( # so we can have values displayed in output/log rather than variables - test(.num, - froll(.fun, x, n, fill=.fill, na.rm=.na.rm, algo=.algo), - rollfun(x, n, FUN=.fun, fill=.fill, na.rm=.na.rm, nf.rm=.nf.rm)), - list(.num=num, .fun=fun, .fill=fill, .na.rm=na.rm, .algo=algo, .nf.rm=algo!="exact") + test(.num, ignore.warning="no non-missing arguments", + frollapply(x, n, FUN=.fun, fill=.fill, na.rm=.na.rm, partial=.partial), + froll(.fun, x, n, fill=.fill, na.rm=.na.rm, partial=.partial)), + list(.num=num, .fun=fun, .fill=fill, .na.rm=na.rm, .partial=partial) )) } } } } } +num = 7001.0 ## random NA non-finite x = makeNA(rnorm(1e3), nf=TRUE); n = 50 base_compare(x, n) @@ -898,24 +1865,43 @@ x = makeNA(rnorm(1e3), nf=TRUE); n = 51 base_compare(x, n) x = makeNA(rnorm(1e3+1), nf=TRUE); n = 51 base_compare(x, n) -num = 6002.0 + #### against zoo if (requireNamespace("zoo", quietly=TRUE)) { drollapply = function(...) as.double(zoo::rollapply(...)) # rollapply is not consistent in data type of answer, force to double - zoo_compare = function(x, n, funs=c("mean","sum"), algos=c("fast","exact")) { + zoo_compare = function(x, n, funs=c("mean","sum","max","min","prod"), algos=c("fast","exact")) { num.step = 0.0001 - #### fun, align, na.rm, fill, algo + #### fun, align, na.rm, fill, algo, partial for (fun in funs) { for (align in c("right","center","left")) { for (na.rm in c(FALSE, TRUE)) { for (fill in c(NA_real_, 0)) { - for (algo in algos) { + for (partial in c(FALSE,TRUE)) { + if (partial && align=="center") + next ## center does not work with adaptive, so partial as well + for (has.nf in c(NA,TRUE,FALSE)) { + if (identical(has.nf, FALSE)) { + if (na.rm) + next ## errors "not make sense" + if (any(!is.finite(x))) + next ## do not test warnings (mean, sum, prod) or incorrect expect results (max, min) + } + for (algo in algos) { + num <<- num + num.step + eval(substitute( # so we can have values displayed in output/log rather than variables + test(.num, ignore.warning="no non-missing arguments", + drollapply(x, n, FUN=.fun, fill=.fill, align=.align, na.rm=.na.rm, partial=.partial), + froll(.fun, x, n, align=.align, fill=.fill, na.rm=.na.rm, algo=.algo, partial=.partial, has.nf=.has.nf)), + list(.num=num, .fun=fun, .align=align, .fill=fill, .na.rm=na.rm, .algo=algo, .partial=partial, .has.nf=has.nf) + )) + } + } num <<- num + num.step eval(substitute( # so we can have values displayed in output/log rather than variables - test(.num, - froll(.fun, x, n, align=.align, fill=.fill, na.rm=.na.rm, algo=.algo), - drollapply(x, n, FUN=.fun, fill=.fill, align=.align, na.rm=.na.rm)), - list(.num=num, .fun=fun, .align=align, .fill=fill, .na.rm=na.rm, .algo=algo) + test(.num, ignore.warning="no non-missing arguments", + frollapply(x, n, FUN=.fun, fill=.fill, align=.align, na.rm=.na.rm, partial=.partial), + froll(.fun, x, n, align=.align, fill=.fill, na.rm=.na.rm, partial=.partial)), + list(.num=num, .fun=fun, .align=align, .fill=fill, .na.rm=na.rm, .partial=partial) )) } } @@ -923,6 +1909,7 @@ if (requireNamespace("zoo", quietly=TRUE)) { } } } + num = 7002.0 ## no NA x = rnorm(1e3); n = 50 # x even, n even zoo_compare(x, n) @@ -952,39 +1939,69 @@ if (requireNamespace("zoo", quietly=TRUE)) { zoo_compare(x, n) } #### adaptive moving average compare -num = 6003.0 -arollfun = function(fun, x, n, na.rm=FALSE, fill=NA, nf.rm=FALSE) { +arollfun = function(FUN, x, n, na.rm=FALSE, align=c("right","left"), fill=NA, nf.rm=FALSE, partial=FALSE) { # adaptive moving average in R stopifnot((nx<-length(x))==length(n)) - ans = rep(NA_real_, nx) + align = match.arg(align) + ans = rep(fill, nx) if (nf.rm) x[is.infinite(x)] = NA_real_ - FUN = match.fun(fun) - for (i in seq_along(x)) { - ans[i] = if (i >= n[i]) - FUN(x[(i-n[i]+1):i], na.rm=na.rm) - else as.double(fill) + f = match.fun(FUN) + if (align=="right") { + for (i in seq_along(x)) { + if (i >= n[i]) + ans[i] = f(x[(i-n[i]+1):i], na.rm=na.rm) + else if (partial) + ans[i] = f(x[1L:i], na.rm=na.rm) + } + } else { + for (i in seq_along(x)) { + if (i <= nx-n[i]+1) + ans[i] = f(x[i:(i+n[i]-1)], na.rm=na.rm) + else if (partial) + ans[i] = f(x[i:length(x)], na.rm=na.rm) + } } ans } -afun_compare = function(x, n, funs=c("mean","sum"), algos=c("fast","exact")) { +afun_compare = function(x, n, funs=c("mean","sum","max","min","prod"), algos=c("fast","exact")) { num.step = 0.0001 - #### fun, na.rm, fill, algo + #### fun, align, na.rm, fill, algo for (fun in funs) { - for (na.rm in c(FALSE, TRUE)) { - for (fill in c(NA_real_, 0)) { - for (algo in algos) { - num <<- num + num.step - eval(substitute( - test(.num, - froll(.fun, x, n, fill=.fill, na.rm=.na.rm, algo=.algo, adaptive=TRUE), - arollfun(.fun, x, n, fill=.fill, na.rm=.na.rm, nf.rm=.nf.rm)), - list(.num=num, .fun=fun, .fill=fill, .na.rm=na.rm, .algo=algo, .nf.rm=algo!="exact") - )) + for (align in c("right","left")) { + for (na.rm in c(FALSE, TRUE)) { + for (fill in c(NA_real_, 0)) { + for (partial in c(FALSE,TRUE)) { + for (has.nf in c(NA,TRUE,FALSE)) { + if (identical(has.nf, FALSE)) { + if (na.rm) + next ## errors "not make sense" + if (any(!is.finite(x))) + next ## do not test warnings (mean, sum, prod) or incorrect expect results (max, min) + } + for (algo in algos) { + num <<- num + num.step + eval(substitute( + test(.num, ignore.warning="no non-missing arguments", + arollfun(.fun, x, n, fill=.fill, na.rm=.na.rm, align=.align, partial=.partial), + froll(.fun, x, n, fill=.fill, na.rm=.na.rm, algo=.algo, adaptive=TRUE, align=.align, has.nf=.has.nf, partial=.partial)), + list(.num=num, .fun=fun, .fill=fill, .na.rm=na.rm, .algo=algo, .align=align, .has.nf=has.nf, .partial=partial) + )) + } + } + num <<- num + num.step + eval(substitute( + test(.num, ignore.warning="no non-missing arguments", + frollapply(x, n, FUN=.fun, fill=.fill, na.rm=.na.rm, adaptive=TRUE, align=.align, partial=.partial), + froll(.fun, x, n, fill=.fill, na.rm=.na.rm, adaptive=TRUE, align=.align, partial=.partial)), + list(.num=num, .fun=fun, .fill=fill, .na.rm=na.rm, .align=align, .partial=partial) + )) + } } } } } } +num = 7003.0 #### no NA x = rnorm(1e3); n = sample(50, length(x), TRUE) # x even, n even afun_compare(x, n) @@ -1022,96 +2039,3 @@ afun_compare(x, n) x = makeNA(rnorm(1e3+1), nf=TRUE); n = sample(51, length(x), TRUE) afun_compare(x, n) rm(num) - -## frollapply -x = 1:10 -test(6010.001, frollsum(x, 3L), frollapply(x, 3L, sum)) -test(6010.002, frollsum(x, 6), frollapply(x, 6, sum)) -test(6010.003, frollmean(x, 3), frollapply(x, 3, mean)) -d = as.data.table(list(1:6/2, 3:8/4)) -test(6010.004, frollsum(d, 3:4), frollapply(d, 3:4, sum)) -test(6010.005, frollmean(d, 3:4), frollapply(d, 3:4, mean)) -d = rbind(d, list(NA,NA)) -ans = list(c(NA,NA,1.5,2,1.5,2,2.5), c(NA,NA,NA,2,1,1.5,2), c(NA,NA,1.25,1.5,1.75,1.5,2), c(NA,NA,NA,1.5,1,1.25,1.5)) -test(6010.006, frollapply(d, 3:4, function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...), na.rm=TRUE), ans) -# segfault and protect limits #3993 - disabled by default due to high memory usage -if (FALSE) { - test(6010.007, frollapply(1, rep(1L, 1e5), identity), as.list(rep(1, 1e5))) - test(6010.008, frollapply(1, rep(1L, 1e6), identity), as.list(rep(1, 1e6))) - test(6010.009, frollapply(as.list(rep(1, 1e6)), 1, identity), as.list(rep(1, 1e6))) -} -#### corner cases from examples -test(6010.101, frollapply(1:5, 3, function(x) head(x, 2)), error="frollapply: results from provided FUN are not length 1") -f = function(x) { - n = length(x) - # length 1 will be returned only for first iteration where we check length - if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored -} -test(6010.102, frollapply(1:5, 3, f), c(NA,NA,1,2,3)) -options(datatable.verbose=TRUE) -x = c(1,2,1,1,1,2,3,2) -ans = c(NA,NA,2,2,1,2,3,2) -numUniqueN = function(x) as.numeric(uniqueN(x)) -test(6010.103, frollapply(x, 3, uniqueN), ans, output=c("frollapplyR: allocating memory.*","frollapply: results from provided FUN are not of type double, coercion from integer or logical will be applied on each iteration.*","frollapply: took.*","frollapplyR: processing.*took.*")) -test(6010.104, frollapply(x, 3, numUniqueN), ans, output=c("frollapplyR: allocating memory.*","frollapply: took.*","frollapplyR: processing.*took.*")) -test(6010.105, as.logical(frollapply(c(1,2,1,1,NA,2,NA,2), 3, anyNA)), c(NA,NA,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE), output=c("frollapplyR: allocating memory.*","frollapply: results from provided FUN are not of type double, coercion from integer or logical will be applied on each iteration","frollapply: took.*","frollapplyR: processing.*took.*")) -f = function(x) { - n = length(x) - # double type will be returned only for first iteration where we check type - if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double -} -#test(6010.106, head(frollapply(1:5, 3, f), 3L), c(NA_real_,NA_real_,1), output=c("frollapplyR: allocating memory.*","frollapply: took.*","frollapplyR: processing.*took.*")) # only head 3 is valid, rest is undefined as REAL is applied on logical type, can return garbage or fail with REAL error -options(datatable.verbose=FALSE) -#### test coverage -test(6010.501, frollapply(1:3, "b", sum), error="n must be integer") -test(6010.502, frollapply(1:3, 2.5, sum), error="n must be integer") -test(6010.503, frollapply(1:3, integer(), sum), error="n must be non 0 length") -test(6010.504, frollapply(1:3, 2L, sum, fill=1:2), error="fill must be a vector of length 1") -test(6010.505, frollapply(1:3, 2L, sum, fill=NA_integer_), c(NA,3,5)) -test(6010.506, frollapply(1:3, 2L, sum, fill=-1L), c(-1,3,5)) -test(6010.507, frollapply(1:3, 2L, sum, fill=-2), c(-2,3,5)) -test(6010.508, frollapply(1:3, 2L, sum, fill="z"), error="fill must be numeric") -test(6010.509, frollapply(1:3, 4L, sum), c(NA_real_,NA_real_,NA_real_)) -test(6010.510, frollapply(1:5, 3L, sum), c(NA,NA,6,9,12)) -test(6010.511, frollapply(1:5, 3L, sum, align="center"), c(NA,6,9,12,NA)) -test(6010.512, frollapply(1:5, 3L, sum, align="left"), c(6,9,12,NA,NA)) -test(6010.513, frollapply(1:5, 4L, sum), c(NA,NA,NA,10,14)) -test(6010.514, frollapply(1:5, 4L, sum, align="center"), c(NA,10,14,NA,NA)) -test(6010.515, frollapply(1:5, 4L, sum, align="left"), c(10,14,NA,NA,NA)) -test(6010.516, frollapply(1:6, 3L, sum), c(NA,NA,6,9,12,15)) -test(6010.517, frollapply(1:6, 3L, sum, align="center"), c(NA,6,9,12,15,NA)) -test(6010.518, frollapply(1:6, 3L, sum, align="left"), c(6,9,12,15,NA,NA)) -test(6010.519, frollapply(1:6, 4L, sum), c(NA,NA,NA,10,14,18)) -test(6010.520, frollapply(1:6, 4L, sum, align="center"), c(NA,10,14,18,NA,NA)) -test(6010.521, frollapply(1:6, 4L, sum, align="left"), c(10,14,18,NA,NA,NA)) -test(6010.522, frollapply(c(1:3,NA,5:6), 4L, sum), rep(NA_real_,6)) -test(6010.523, frollapply(c(1:3,NA,5:6), 4L, sum, na.rm=TRUE), c(NA,NA,NA,6,10,14)) -test(6010.524, frollapply(c(1,2,3,NA,NA,NA,NA), 3L, mean), c(NA,NA,2,NA,NA,NA,NA)) -test(6010.525, frollapply(c(1,2,3,NA,NA,NA,NA), 3L, mean, na.rm=TRUE), c(NA,NA,2,2.5,3,NaN,NaN)) -test(6010.526, frollapply(numeric(), 3L, sum), numeric()) -test(6010.527, frollapply(1:5, 3L, toString), error="frollapply: results from provided FUN are not of type double") -options(datatable.verbose=TRUE) -test(6010.551, frollapply(1:3, 4L, sum), c(NA_real_,NA_real_,NA_real_), output=c("frollapplyR: allocating memory.*","frollapply: window width longer than input vector.*","frollapplyR: processing.*took.*")) -test(6010.552, frollapply(1:5, 3L, sum), c(NA,NA,6,9,12), output=c("frollapplyR: allocating memory.*","frollapply: took.*","frollapplyR: processing.*took.*")) -test(6010.553, frollapply(1:5, 3L, sum, align="center"), c(NA,6,9,12,NA), output=c("frollapplyR: allocating memory.*","frollapply: align 0, shift.*","frollapply: took.*","frollapplyR: processing.*took.*")) -options(datatable.verbose=FALSE) -ma = function(x, n, na.rm=FALSE) { - ans = rep(NA_real_, nx<-length(x)) - for (i in n:nx) ans[i]=mean(x[(i-n+1):i], na.rm=na.rm) - ans -} -n = 4L -x = as.double(1:16) -x[5] = NaN -test(6010.601, frollapply(x, n, mean), ma(x, n)) -x[6] = NA -test(6010.602, frollapply(x, n, mean), ma(x, n)) -x[5] = NA -x[6] = NaN -test(6010.603, frollapply(x, n, mean), ma(x, n)) -x[5] = Inf -test(6010.604, frollapply(x, n, mean), ma(x, n)) -x[6] = -Inf -test(6010.605, frollapply(x, n, mean), ma(x, n)) -x[5:7] = c(NA, Inf, -Inf) -test(6010.606, frollapply(x, n, mean), ma(x, n)) diff --git a/man/froll.Rd b/man/froll.Rd index 090b397a90..d8997b424b 100644 --- a/man/froll.Rd +++ b/man/froll.Rd @@ -1,121 +1,118 @@ -\name{roll} -\alias{roll} +\name{froll} \alias{froll} +\alias{roll} \alias{rolling} \alias{sliding} \alias{moving} -\alias{rollmean} \alias{frollmean} -\alias{rollsum} \alias{frollsum} -\alias{rollapply} -\alias{frollapply} +\alias{frollmax} +\alias{frollmin} +\alias{frollprod} +\alias{rollmean} +\alias{rollsum} +\alias{rollmax} +\alias{rollmin} +\alias{rollprod} \title{Rolling functions} \description{ - Fast rolling functions to calculate aggregates on sliding windows. Function name and arguments are experimental. + Fast rolling functions to calculate aggregates on sliding windows. For user-defined rolling function see \code{\link{frollapply}}. } \usage{ -frollmean(x, n, fill=NA, algo=c("fast", "exact"), - align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) -frollsum(x, n, fill=NA, algo=c("fast","exact"), - align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) -frollapply(x, n, FUN, \dots, fill=NA, align=c("right", "left", "center")) + frollmean(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), + na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) + frollsum(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), + na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) + frollmax(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), + na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) + frollmin(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), + na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) + frollprod(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"), + na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, give.names=FALSE, hasNA) } \arguments{ \item{x}{ Vector, \code{data.frame} or \code{data.table} of integer, numeric or logical columns over which to calculate the windowed aggregations. May also be a list, in which case the rolling function is applied to each of its elements. } - \item{n}{ Integer vector giving rolling window size(s). This is the \emph{total} number of included values. Adaptive rolling functions also accept a list of integer vectors. } + \item{n}{ Integer vector giving rolling window size(s). This is the \emph{total} number of included values in aggregate function. Adaptive rolling functions also accept a list of integer vectors when applying multiple window sizes. } \item{fill}{ Numeric; value to pad by. Defaults to \code{NA}. } - \item{algo}{ Character, default \code{"fast"}. When set to \code{"exact"}, a slower (but more accurate) algorithm is used. It - suffers less from floating point rounding errors by performing an extra pass, and carefully handles all non-finite values. - It will use mutiple cores where available. See Details for more information. } + \item{algo}{ Character, default \code{"fast"}. When set to \code{"exact"}, a slower (in some cases more accurate) algorithm is used. See \emph{Implementation} section below for details. } \item{align}{ Character, specifying the "alignment" of the rolling window, defaulting to \code{"right"}. \code{"right"} covers preceding rows (the window \emph{ends} on the current value); \code{"left"} covers following rows (the window \emph{starts} on the current value); \code{"center"} is halfway in between (the window is \emph{centered} on the current value, biased towards \code{"left"} when \code{n} is even). } - \item{na.rm}{ Logical, default \code{FALSE}. Should missing values be removed when - calculating window? For details on handling other non-finite values, see Details. } - \item{hasNA}{ Logical. If it is known that \code{x} contains \code{NA} - then setting this to \code{TRUE} will speed up calculation. Defaults to \code{NA}. } - \item{adaptive}{ Logical, default \code{FALSE}. Should the rolling function be calculated adaptively? See Details below. } - \item{FUN}{ The function to be applied to the rolling window; see Details for restrictions. } - \item{\dots}{ Extra arguments passed to \code{FUN} in \code{frollapply}. } + \item{na.rm}{ Logical, default \code{FALSE}. Should missing values be removed when calculating window? } + \item{has.nf}{ Logical. If it is known that \code{x} contains (or not) non-finite values (\code{NA, NaN, Inf, -Inf}) then setting this to \code{TRUE}/\code{FALSE} may speed up computation. Defaults to \code{NA}. See \emph{has.nf argument} section below for details. } + \item{adaptive}{ Logical, default \code{FALSE}. Should the rolling function be calculated adaptively? See \emph{Adaptive rolling functions} section below for details. } + \item{partial}{ Logical, default \code{FALSE}. Should the rolling window size(s) provided in \code{n} be trimmed to available observations. See \emph{\code{partial} argument} section below for details. } + \item{give.names}{ Logical, default \code{FALSE}. When \code{TRUE}, names are automatically generated corresponding to names of \code{x} and names of \code{n}. If answer is an atomic vector, then the argument is ignored, see examples. } + \item{hasNA}{ Logical. Deprecated, use \code{has.nf} argument instead. } } \details{ - \code{froll*} functions accept vectors, lists, \code{data.frame}s or - \code{data.table}s. They always return a list except when the input is a - \code{vector} and \code{length(n)==1}, in which case a \code{vector} - is returned, for convenience. Thus, rolling functions can be used - conveniently within \code{data.table} syntax. - - Argument \code{n} allows multiple values to apply rolling functions on - multiple window sizes. If \code{adaptive=TRUE}, then \code{n} must be a list. - Each list element must be integer vector of window sizes corresponding - to every single observation in each column; see Examples. + \code{froll*} functions accept vector, list, \code{data.frame} or \code{data.table}. Functions operate on a single vector, when passing a non-atomic input, then function is applied column-by-column, not to a complete set of column at once. - When \code{algo="fast"} an \emph{"on-line"} algorithm is used, and - all of \code{NaN, +Inf, -Inf} are treated as \code{NA}. - Setting \code{algo="exact"} will make rolling functions to use a more - computationally-intensive algorithm that suffers less from floating point - rounding error (the same consideration applies to \code{\link[base]{mean}}). - \code{algo="exact"} also handles \code{NaN, +Inf, -Inf} consistently to - base R. In case of some functions (like \emph{mean}), it will additionally - make extra pass to perform floating point error correction. Error - corrections might not be truly exact on some platforms (like Windows) - when using multiple threads. + Argument \code{n} allows multiple values to apply rolling function on multiple window sizes. If \code{adaptive=TRUE}, then \code{n} can be a list to specify multiple window sizes for adaptive rolling computation. See \emph{Adaptive rolling functions} section below for details. - Adaptive rolling functions are a special case where each - observation has its own corresponding rolling window width. Due to the logic - of adaptive rolling functions, the following restrictions apply: - \itemize{ - \item{ \code{align} only \code{"right"}. } - \item{ if list of vectors is passed to \code{x}, then all - vectors within it must have equal length. } - } + When multiple columns and/or multiple windows width are provided, then computation run in parallel. The exception is for \code{algo="exact"}, which runs in parallel even for single column and single window width. By default data.table uses only half of available CPUs, see \code{\link{setDTthreads}} for details on how to tune CPU usage. - When multiple columns or multiple windows width are provided, then they - are run in parallel. The exception is for \code{algo="exact"}, which runs in - parallel already. - - \code{frollapply} computes rolling aggregate on arbitrary R functions. - The input \code{x} (first argument) to the function \code{FUN} - is coerced to \emph{numeric} beforehand and \code{FUN} - has to return a scalar \emph{numeric} value. Checks for that are made only - during the first iteration when \code{FUN} is evaluated. Edge cases can be - found in examples below. Any R function is supported, but it is not optimized - using our own C implementation -- hence, for example, using \code{frollapply} - to compute a rolling average is inefficient. It is also always single-threaded - because there is no thread-safe API to R's C \code{eval}. Nevertheless we've - seen the computation speed up vis-a-vis versions implemented in base R. + Setting \code{options(datatable.verbose=TRUE)} will display various information about how rolling function processed. It will not print information in a real-time but only at the end of the processing. } \value{ - A list except when the input is a \code{vector} and - \code{length(n)==1} in which case a \code{vector} is returned. + A list except when the input is not \emph{vectorized} (\code{x} is not a list, and \code{n} specify single rolling window), in which case a \code{vector} is returned, for convenience. Thus, rolling functions can be used conveniently within \code{data.table} syntax. } \note{ - Users coming from most popular package for rolling functions - \code{zoo} might expect following differences in \code{data.table} - implementation. + Be aware that rolling functions operates on the physical order of input. If the intent is to roll values in a vector by a logical window, for example an hour, or a day, then one has to ensure that there are no gaps in the input, or use adaptive rolling function to handle gaps, for which we provide helper function \code{\link{frolladapt}} to generate adaptive window size. +} +\section{\code{has.nf} argument}{ + \code{has.nf} can be used to speed up processing in cases when it is known if \code{x} contains (or not) non-finite values (\code{NA, NaN, Inf, -Inf}). + \itemize{ + \item{ Default \code{has.nf=NA} uses faster implementation that does not support non-finite values, but when non-finite values are detected it will re-run non-finite supported implementation. } + \item{ \code{has.nf=TRUE} uses non-finite aware implementation straightaway. } + \item{ \code{has.nf=FALSE} uses faster implementation that does not support non-finite values. Then depending on the rolling function it will either: + \itemize{ + \item{ (\emph{mean, sum, prod}) detect non-finite, re-run non-finite aware. } + \item{ (\emph{max, min}) not detect NFs and may silently give incorrect answer. } + } + In general \code{has.nf=FALSE && any(!is.finite(x))} should be considered as undefined behavior. Therefore \code{has.nf=FALSE} should be used with care. } + } +} +\section{Implementation}{ + Each rolling function has 4 different implementations. First factor that decides which implementation is being used is \code{adaptive} argument, see setion below for details. Then for each of those two algorithms (adaptive \code{TRUE/FALSE}) there are two \code{algo} argument values. \itemize{ - \item{ rolling function will always return result of the same length - as input. } + \item{ \code{algo="fast"} uses \emph{"on-line"}, single pass, algorithm. + \itemize{ + \item{ \emph{max} and \emph{min} rolling function will not do only a single pass but, on average \code{length(x)/n}, nested loops will be computed. The bigger the window the bigger advantage over algo \emph{exact} which computes \code{length(x)} nested loops. Note that \emph{exact} uses multiple CPUs so for a small window size and many CPUs it is possible it will be actually faster than \emph{fast} but in those cases elapsed timings will likely be far below a single second. } + \item{ Not all functions have \emph{fast} implementation available. As of now \emph{max} and \emph{min} in case of \code{adaptive=TRUE} do not not have \emph{fast} implementation, therefore it will automatically fall back to \emph{exact} implementation. \code{datatable.verbose} option can be used to check that. } + }} + \item{ \code{algo="exact"} will make rolling functions to use a more computationally-intensive algorithm. For each observation from input vector it will compute a function on a window from scratch (complexity \eqn{O(n^2)}). + \itemize{ + \item { Depeneding on the function, this algorithm may suffers less from floating point rounding error (the same consideration applies to base \code{\link[base]{mean}}). } + \item{ In case of \emph{mean} (and possibly other functions in future), it will additionally make extra pass to perform floating point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads. } + }} + } +} +\section{Adaptive rolling functions}{ + Adaptive rolling functions are a special case where each observation has its own corresponding rolling window width. Therefore values passed to \code{n} argument must be series corresponding to observations in \code{x}. If multiple windows is meant to be computed then a list of integer vectors is expected; each list element must be an integer vector of window size corresponding to observations in \code{x}; see Examples. Due to the logic or implementation of adaptive rolling functions, the following restrictions apply + \itemize{ + \item{ \code{align} does not support \code{"center"}. } + \item{ if list of vectors is passed to \code{x}, then all vectors within it must have equal length due to the fact that length of adaptive window widths must match the length of vectors in \code{x}. } + } +} +\section{\code{partial} argument}{ + \code{partial=TRUE} will turn a function into adaptive function and trim window size in \code{n} argument using \code{n = c(seq.int(n), rep(n, len-n))} to available observations. It inherits limitations of adaptive rolling functions, see above. Adaptive functions uses more complex algorithms, therefore if performance is important then \code{partial=TRUE} should be avoided in favour of computing only missing observations separately after the rolling function; see examples. +} +\section{\code{zoo} package users notice}{ + Users coming from most popular package for rolling functions \code{zoo} might expect following differences in \code{data.table} implementation + \itemize{ + \item{ rolling function will always return result of the same length as input. } \item{ \code{fill} defaults to \code{NA}. } - \item{ \code{fill} accepts only constant values. It does not support - for \emph{na.locf} or other functions. } + \item{ \code{fill} accepts only constant values. No support for \emph{na.locf} or other functions. } \item{ \code{align} defaults to \code{"right"}. } - \item{ \code{na.rm} is respected, and other functions are not needed - when input contains \code{NA}. } + \item{ \code{na.rm} is respected, and other functions are not needed when input contains \code{NA}. } \item{ integers and logical are always coerced to double. } - \item{ when \code{adaptive=FALSE} (default), then \code{n} must be a - numeric vector. List is not accepted. } - \item{ when \code{adaptive=TRUE}, then \code{n} must be vector of - length equal to \code{nrow(x)}, or list of such vectors. } - \item{ \code{partial} window feature is not supported, although it can - be accomplished by using \code{adaptive=TRUE}, see examples. \code{NA} is always returned for incomplete windows. } + \item{ when \code{adaptive=FALSE} (default), then \code{n} must be a numeric vector. List is not accepted. } + \item{ when \code{adaptive=TRUE}, then \code{n} must be vector of length equal to \code{nrow(x)}, or list of such vectors. } } - - Be aware that rolling functions operates on the physical order of input. - If the intent is to roll values in a vector by a logical window, for - example an hour, or a day, one has to ensure that there are no gaps in - input. For details see \href{https://github.com/Rdatatable/data.table/issues/3241}{issue #3241}. } \examples{ +# single vector and single window +frollmean(1:6, 3) + d = as.data.table(list(1:6/2, 3:8/4)) # rollmean of single vector and single window frollmean(d[, V1], 3) @@ -127,18 +124,31 @@ frollmean(d[, .(V1)], c(3, 4)) frollmean(d, c(3, 4)) ## three calls above will use multiple cores when available -# partial window using adaptive rolling function +# other functions +frollsum(d, 3:4) +frollmax(d, 3:4) +frollmin(d, 3:4) +frollprod(d, 3:4) + +# partial=TRUE +x = 1:6/2 +n = 3 +ans1 = frollmean(x, n, partial=TRUE) +# same using adaptive=TRUE an = function(n, len) c(seq.int(n), rep(n, len-n)) -n = an(3, nrow(d)) -frollmean(d, n, adaptive=TRUE) +ans2 = frollmean(x, an(n, length(x)), adaptive=TRUE) +all.equal(ans1, ans2) +# much faster by using partial only for incomplete observations +ans3 = frollmean(x, n) +ans3[seq.int(n-1L)] = frollmean(x[seq.int(n-1L)], n, partial=TRUE) +all.equal(ans1, ans3) -# frollsum -frollsum(d, 3:4) +# give.names +frollsum(list(x=1:5, y=5:1), c(tiny=2, big=4), give.names=TRUE) -# frollapply -frollapply(d, 3:4, sum) -f = function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...) -frollapply(d, 3:4, f, na.rm=TRUE) +# has.nf=FALSE should be used with care +frollmax(c(1,2,NA,4,5), 2) +frollmax(c(1,2,NA,4,5), 2, has.nf=FALSE) # performance vs exactness set.seed(108) @@ -160,7 +170,7 @@ system.time(ans1<-ma(x, n)) system.time(ans2<-fastma(x, n)) system.time(ans3<-frollmean(x, n)) system.time(ans4<-frollmean(x, n, algo="exact")) -system.time(ans5<-frollapply(x, n, mean)) +system.time(ans5<-frollapply(x, n, mean, simplify=unlist)) anserr = list( fastma = ans2-ans1, froll_fast = ans3-ans1, @@ -169,38 +179,9 @@ anserr = list( ) errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE) sapply(errs, format, scientific=FALSE) # roundoff - -# frollapply corner cases -f = function(x) head(x, 2) ## FUN returns non length 1 -try(frollapply(1:5, 3, f)) -f = function(x) { ## FUN sometimes returns non length 1 - n = length(x) - # length 1 will be returned only for first iteration where we check length - if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored! -} -frollapply(1:5, 3, f) -options(datatable.verbose=TRUE) -x = c(1,2,1,1,1,2,3,2) -frollapply(x, 3, uniqueN) ## FUN returns integer -numUniqueN = function(x) as.numeric(uniqueN(x)) -frollapply(x, 3, numUniqueN) -x = c(1,2,1,1,NA,2,NA,2) -frollapply(x, 3, anyNA) ## FUN returns logical -as.logical(frollapply(x, 3, anyNA)) -options(datatable.verbose=FALSE) -f = function(x) { ## FUN returns character - if (sum(x)>5) "big" else "small" -} -try(frollapply(1:5, 3, f)) -f = function(x) { ## FUN is not type-stable - n = length(x) - # double type will be returned only for first iteration where we check type - if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double -} -try(frollapply(1:5, 3, f)) } \seealso{ - \code{\link{shift}}, \code{\link{data.table}} + \code{\link{frollapply}}, \code{\link{frolladapt}}, \code{\link{shift}}, \code{\link{data.table}}, \code{\link{setDTthreads}} } \references{ \href{https://en.wikipedia.org/wiki/Round-off_error}{Round-off error} diff --git a/man/frolladapt.Rd b/man/frolladapt.Rd new file mode 100644 index 0000000000..477c7dd00c --- /dev/null +++ b/man/frolladapt.Rd @@ -0,0 +1,42 @@ +\name{frolladapt} +\alias{frolladapt} +\title{Adapt rolling window to irregularly spaced time series} +\description{ + Helper function to generate adaptive window size based on the irregularly spaced time series index. Generated adaptive window can be then used in rolling functions. See \code{\link{froll}} and \code{\link{frollapply}} for details. +} +\usage{ + frolladapt(x, n, align="right", partial=FALSE, give.names=FALSE) +} +\arguments{ + \item{x}{ Integer. Other objects of type numeric (including \code{Date}, \code{POSIXct} and any others numeric-based class) will be coerced to integer, which, for example, in case of \code{POSIXct} means truncating to whole seconds. Must be sorted, have no duplicate and have no missing values. } + \item{n}{ Integer vector giving rolling window size(s). This is the \emph{total} number of included values in aggregate function. Value corresponds to unit of \code{x}. When \code{x} is a \code{POSIXct} then \code{n} are seconds, when \code{x} is a \code{Date} then \code{n} are days. } + \item{align}{ Character, default \code{"right"}. Other aligntment than the ddefault has not yet been implemented. } + \item{partial}{ Logical, default \code{FALSE}. Should the rolling window size(s) provided in \code{N} be trimmed to available observations. For details see \code{\link{froll}}. } + \item{give.names}{ Logical, default \code{FALSE}. When \code{TRUE}, names are automatically generated corresponding to names of \code{N}. If answer is an integer vector, then the argument is ignored, see examples. } +} +\details{ + Argument \code{n} allows multiple values to generate multiple adaptive windows, unlike \code{x}, as mixing different time series would make no sense. +} +\value{ + When \code{length(n)==1L} then integer vector (\emph{adaptive window} size) of length of \code{x}. Otherwise a list of \code{length(n)} having integer vectors (\emph{adaptive window} sizes) of length of \code{x} for each window size provided in \code{n}. +} +\examples{ +idx = as.Date("2022-10-23") + c(0,1,4,5,6,7,9,10,14) +dt = data.table(index=idx, value=seq_along(idx)) +dt +dt[, n3 := frolladapt(index, n=3L)] +dt +dt[, rollmean3 := frollmean(value, n3, adaptive=TRUE)] +dt +dt[, n3p := frolladapt(index, n=3L, partial=TRUE)] +dt[, rollmean3p := frollmean(value, n3p, adaptive=TRUE)] +dt + +n34 = frolladapt(idx, c(small=3, big=4), give.names=TRUE) +n34 +dt[, frollmean(value, n34, adaptive=TRUE, give.names=TRUE)] +} +\seealso{ + \code{\link{froll}}, \code{\link{frollapply}} +} +\keyword{ data } diff --git a/man/frollapply.Rd b/man/frollapply.Rd new file mode 100644 index 0000000000..0acfb4f189 --- /dev/null +++ b/man/frollapply.Rd @@ -0,0 +1,272 @@ +\name{frollapply} +\alias{frollapply} +\alias{rollapply} +\title{Rolling user-defined function} +\description{ + Fast rolling user-defined function (\emph{UDF}) to calculate on sliding windows. Experimental. Please read, at least, \emph{caveats} section below. +} +\usage{ + frollapply(X, N, FUN, \dots, by.column=TRUE, fill=NA, + align=c("right","left","center"), adaptive=FALSE, partial=FALSE, + give.names=FALSE, simplify=TRUE, x, n) +} +\arguments{ + \item{X}{ Atomic vector, \code{data.frame}, \code{data.table} or \code{list}. When \code{by.column=TRUE} (default) then a non-atomic \code{X} is processed as \emph{vectorized} input, so rolling function is calculated for each column/vector (non-atomic columns/vectors are not supported). When \code{by.column=FALSE} then \code{X} expects to be a data.frame, data.table or a list of equal length vectors (non-atomic columns/vectors are not supported), so rolling function is calculated for \code{X} as data.frame/data.table/list rather than atomic vector. It supports \emph{vectorized} input as well, passing list of data.frames/data.tables, but not list of lists. } + \item{N}{ Integer vector giving rolling window size(s). This is the \emph{total} number of included values in aggregate function. Adaptive rolling functions also accept a list of integer vectors when applying multiple window sizes, see \code{adaptive} argument description for details. In both \code{adaptive} cases \code{N} may also be a list, supporting \emph{vectorized} input, then rolling function is calculated for each element of the list. } + \item{FUN}{ The function to be applied on a subsets of \code{X}. } + \item{\dots}{ Extra arguments passed to \code{FUN}. Note that arguments passed to \dots cannot have same names as arguments of \code{frollapply}. } + \item{by.column}{ Logical. When \code{TRUE} (default) then \code{X} of types list/data.frame/data.table is treated as vectorized input rather an object to apply rolling window on. Setting to \code{FALSE} allows rolling window to be applied on multiple variables, using data.frame, data.table or a list, as a whole. For details see \emph{\code{by.column} argument} section below. } + \item{fill}{ An object; value to pad by. Defaults to \code{NA}. When \code{partial=TRUE} this argument is ignored. } + \item{align}{ Character, specifying the "alignment" of the rolling window, defaulting to \code{"right"}. For details see \code{\link{froll}}. } + \item{adaptive}{ Logical, default \code{FALSE}. Should the rolling function be calculated adaptively? For details see \code{\link{froll}}. } + \item{partial}{ Logical, default \code{FALSE}. Should the rolling window size(s) provided in \code{N} be trimmed to available observations. For details see \code{\link{froll}}. } + \item{give.names}{ Logical, default \code{FALSE}. When \code{TRUE}, names are automatically generated corresponding to names of \code{X} and names of \code{N}. If answer is an atomic vector, then the argument is ignored, see examples. } + \item{simplify}{ Logical or a function. When \code{TRUE} (default) then internal \code{simplifylist} function is applied on a list storing results of all computations. When \code{FALSE} then list is returned without any post-processing. Argument can take a function as well, then the function is applied to a list that would have been returned when \code{simplify=FALSE}. If results are not automatically simplified when \code{simplify=TRUE} then, for backward compatibility, one should use \code{simplify=FALSE} explicitly. See \emph{\code{simplify} argument} section below for details. } + \item{x}{ Deprecated, use \code{X} instead. } + \item{n}{ Deprecated, use \code{N} instead. } +} +\value{ + A list except when the input is not \emph{vectorized} (\code{X} is not a list to apply function \emph{by column}, and \code{N} specify single rolling window), in which case a \code{vector} is returned, for convenience. Thus, rolling functions can be used conveniently within \code{data.table} syntax. +} +\note{ + Be aware that rolling functions operates on the physical order of input. If the intent is to roll values in a vector by a logical window, for example an hour, or a day, then one has to ensure that there are no gaps in the input, or use adaptive rolling function to handle gaps, for which we provide helper function \code{\link{frolladapt}} to generate adaptive window size. +} +\section{\code{by.column} argument}{ + Setting \code{by.column} to \code{FALSE} allows to apply function on multiple variables rather than a single vector. Then \code{X} expects to be data.table, data.table or a list of equal length vectors, and window size provided in \code{N} refers to number of rows (or length of a vectors in a list). See examples for use cases. Error \code{incorrect number of dimensions} can be commonly observed when \code{by.column} was not set to \code{FALSE} when \code{FUN} expects its input to be a data.frame/data.table. +} +\section{\code{simplify} argument}{ + One should avoid \code{simplify=TRUE} when writing robust code. One reason is performance, as explained in \emph{Performance consideration} section below. Another is backward compatibility. If results are not automatically simplified when \code{simplify=TRUE} then, for backward compatibility, one should use \code{simplify=FALSE} explicitly. In future version we may improve internal \code{simplifylist} function, then \code{simplify=TRUE} may return object of a different type, breaking downstream code. If results are already simplified with \code{simplify=TRUE}, then it can be considered backward compatible. +} +\section{Caveats}{ + With great power comes great responsibility. + \enumerate{ + \item{ An optimization used to avoid repeated allocation of window subsets (explained more deeply in \emph{Implementation} section below) may, in special cases, return rather surprising results: +\preformatted{ +setDTthreads(1) +frollapply(c(1, 9), n=1L, FUN=identity) ## unexpected +#[1] 9 9 +frollapply(c(1, 9), n=1L, FUN=list) ## unexpected +# V1 +# +#1: 9 +#2: 9 +setDTthreads(2) +frollapply(c(1, 9), n=1L, FUN=identity) ## good only because threads >= input +#[1] 1 9 +frollapply(c(1, 5, 9), n=1L, FUN=identity) ## unexpected again +#[1] 5 5 9 +} + Problem occurs, in rather unlikely scenarios for rolling computations, when objects returned from a function can be its input (i.e. \code{identity}), or a reference to it (i.e. \code{list}), then one has to add extra \code{copy} call: +\preformatted{ +setDTthreads(1) +frollapply(c(1, 9), n=1L, FUN=function(x) copy(identity(x))) ## only 'copy' would be equivalent here +#[1] 1 9 +frollapply(c(1, 9), n=1L, FUN=function(x) copy(list(x))) +# V1 +# +#1: 1 +#2: 9 +} } + \item{ \code{FUN} calls are internally passed to \code{parallel::mcparallel} to evaluate them in parallel. We inherit few limitations from \code{parallel} package explained below. This optimization can be disabled completely by calling \code{setDTthreads(1)}, then limitations listed below do not apply because all iterations of \code{FUN} evaluation will be made sequentially without use of \code{parallel} package. Note that on Windows platform this optimization is always disabled due to lack of \emph{fork} used by \code{parallel} package. One can use \code{options(datatable.verbose=TRUE)} to get extra information if \code{frollapply} is running multithreaded or not. + \itemize{ + \item{ Warnings produced inside the function are silently ignored. } + \item{ \code{FUN} should not use any on-screen devices, GUI elements, tcltk, multithreaded libraries. Note that \code{setDTthreads(1L)} is passed to forked processes, therefore any data.table code inside \code{FUN} will be forced to be single threaded. It is advised to not call \code{setDTthreads} inside \code{FUN}. \code{frollapply} is already parallelized and nested parallelism is rarely a good idea. } + \item{ Any operation that could misbehave when run in parallel has to be handled. For example writing to the same file from multiple CPU threads. +\preformatted{ +old = setDTthreads(1L) +frollapply(iris, 5L, by.column=FALSE, FUN=fwrite, file="rolling-data.csv", append=TRUE) +setDTthreads(old) +} } + \item{ Objects returned from forked processes, \code{FUN}, are serialized. This may cause problems for objects that are meant not to be serialized, like data.table. We are handling that for data.table class internally in \code{frollapply} whenever \code{FUN} is returning data.table (which is checked on the results of the first \code{FUN} call so it assumes function is type stable). If data.table is nested in another object returned from \code{FUN} then the problem may still manifest, in such case one has to call \code{setDT} on objects returned from \code{FUN}. This can be also nicely handled via \code{simplify} argument when passing a function that calls \code{setDT} on nested data.table objects returned from \code{FUN}. Anyway, returning data.table from \code{FUN} should, in majority of cases, be avoided from the performance reasons, see \emph{UDF optimization} section for details. +\preformatted{ +is.ok = function(x) {stopifnot(is.data.table(x)); format(attr(x, ".internal.selfref", TRUE))!=""} + +## frollapply will fix DT in most cases +ans = frollapply(1:2, 2, data.table, fill=data.table(NA)) +is.ok(ans) +#[1] TRUE +ans = frollapply(1:2, 2, data.table, fill=data.table(NA), simplify=FALSE) +is.ok(ans[[2L]]) +#[1] TRUE + +## nested DT not fixed +ans = frollapply(1:2, 2, function(x) list(data.table(x)), fill=list(data.table(NA)), simplify=FALSE) +is.ok(ans[[2L]][[1L]]) +#[1] FALSE +#### now if we want to use it +set(ans[[2L]][[1L]],, "newcol", 1L) +#Error in set(ans[[2L]][[1L]], , "newcol", 1L) : +# This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or setalloccol() on it first (to pre-allocate space for new columns) before assigning by reference to it. +#### fix as explained in error message +ans = lapply(ans, lapply, setDT) +is.ok(ans[[2L]][[1L]]) +#[1] TRUE + +## fix inside frollapply via simplify +simplifix = function(x) lapply(x, lapply, setDT) +ans = frollapply(1:2, 2, function(x) list(data.table(x)), fill=list(data.table(NA)), simplify=simplifix) +is.ok(ans[[2L]][[1L]]) +#[1] TRUE + +## automatic fix may not work for a non-type stable function +f = function(x) (if (x[1L]==1L) data.frame else data.table)(x) +ans = frollapply(1:3, 2, f, fill=data.table(NA), simplify=FALSE) +is.ok(ans[[3L]]) +#[1] FALSE +#### fix inside frollapply via simplify +simplifix = function(x) lapply(x, function(y) if (is.data.table(y)) setDT(y) else y) +ans = frollapply(1:3, 2, f, fill=data.table(NA), simplify=simplifix) +is.ok(ans[[3L]]) +#[1] TRUE +} } + } + } + \item{ Due to possible future improvements of handling simplification of results returned from rolling function, the default \code{simplify=TRUE} may not be backward compatible for functions that produce results that haven't been already automatically simplified. See \emph{\code{simplify} argument} section for details. } + } +} +\section{Performance consideration}{ + \code{frollapply} is meant to run any UDF function. If one needs to use a common function like \emph{mean, sum, max}, etc., then we have highly optimized, implemented in C language, rolling functions described in \code{\link{froll}} manual.\cr + Most crucial optimizations are the ones to be applied on UDF. Those are discussed in next section \emph{UDF optimization} below. + \itemize{ + \item{ When using \code{by.column=FALSE} one can subset dataset before passing it to \code{X} to keep only columns relevant for the computation: +\preformatted{ +x = setDT(lapply(1:100, function(x) as.double(rep.int(x,1e4L)))) +f = function(x) sum(x$V1*x$V2) +system.time(frollapply(x, 100, f, by.column=FALSE)) +# user system elapsed +# 0.157 0.067 0.081 +system.time(frollapply(x[, c("V1","V2"), with=FALSE], 100, f, by.column=FALSE)) +# user system elapsed +# 0.096 0.054 0.054 +} + } + \item{ Avoid partial, see \emph{\code{partial} argument} section of \code{\link{froll}} manual. } + \item{ Avoid \code{simplify=TRUE} and provide a function instead: +\preformatted{ +x = rnorm(1e5) +system.time(frollapply(x, 2, function(x) 1L, simplify=TRUE)) +# user system elapsed +# 0.308 0.076 0.196 +system.time(frollapply(x, 2, function(x) 1L, simplify=unlist)) +# user system elapsed +# 0.214 0.080 0.088 +} + } + \item{ CPU threads utilization in \code{frollapply} can be controlled by \code{\link{setDTthreads}}, which by default uses half of available CPU threads. } + \item{ Optimization that avoids repeated allocation of a window subset (see \emph{Implementation} section for details), in case of adaptive rolling function, depends on R's \emph{growable bit}. This feature has been added in R 3.4.0. Adaptive \code{frollapply} will still work on older versions of R but, due to repeated allocation of window subset, it will be much slower. } + \item{ Parallel computation of \code{FUN} is handled by \code{parallel} package (part of R core since 2.14.0) and its \emph{fork} mechanism. \emph{Fork} is not available on Windows OS therefore it will be always single threaded on that platform. } + } +} +\section{UDF optimization}{ + FUN will be evaluated many times so should be highly optimized. Tips below are not specific to \code{frollapply} and can be applied to any code is meant to run in many iterations. + \itemize{ + \item{ It is usually better to return the most lightweight objects from \code{FUN}, for example it will be faster to return a list rather a data.table. In the case presented below, \code{simplify=TRUE} is calling \code{rbindlist} on the results anyway, which makes the results equal: +\preformatted{ +fun1 = function(x) {tmp=range(x); data.table(min=tmp[1L], max=tmp[2L])} +fun2 = function(x) {tmp=range(x); list(min=tmp[1L], max=tmp[2L])} +fill1 = data.table(min=NA_integer_, max=NA_integer_) +fill2 = list(min=NA_integer_, max=NA_integer_) +system.time(a<-frollapply(1:1e4, 100, fun1, fill=fill1)) +# user system elapsed +# 2.047 0.337 0.788 +system.time(b<-frollapply(1:1e4, 100, fun2, fill=fill2)) +# user system elapsed +# 0.205 0.125 0.138 +all.equal(a, b) +#[1] TRUE +} } + \item{ Code that is not dependent on rolling window should be taken out as pre or post computation: +\preformatted{ +x = c(1L,3L) +system.time(for (i in 1:1e6) sum(x+1L)) +# user system elapsed +# 0.308 0.004 0.312 +system.time({y = x+1L; for (i in 1:1e6) sum(y)}) +# user system elapsed +# 0.203 0.000 0.202 +} } + \item{ Being strict about data types removes the need for R to handle them automatically: +\preformatted{ +x = vector("integer", 1e6) +system.time(for (i in 1:1e6) x[i] = NA) +# user system elapsed +# 0.160 0.000 0.161 +system.time(for (i in 1:1e6) x[i] = NA_integer_) +# user system elapsed +# 0.05 0.00 0.05 +} } + \item{ If a function calls another function under the hood, it is usually better to call the latter one directly: +\preformatted{ +x = matrix(c(1L,2L,3L,4L), c(2L,2L)) +system.time(for (i in 1:1e4) colSums(x)) +# user system elapsed +# 0.051 0.000 0.051 +system.time(for (i in 1:1e4) .colSums(x, 2L, 2L)) +# user system elapsed +# 0.015 0.000 0.015 +} } + \item{ There are many functions that may be optimized for scaling up for bigger input, yet for a small input they may carry bigger overhead comparing to their simpler counterparts. One may need to experiment on own data, but low overhead functions are likely be faster when evaluating in many iterations: +\preformatted{ +## uniqueN +x = c(1L,3L,5L) +system.time(for (i in 1:1e4) uniqueN(x)) +# user system elapsed +# 0.156 0.004 0.160 +system.time(for (i in 1:1e4) length(unique(x))) +# user system elapsed +# 0.040 0.004 0.043 +## column subset +x = data.table(v1 = c(1L,3L,5L)) +system.time(for (i in 1:1e4) x[, v1]) +# user system elapsed +# 3.197 0.004 3.201 +system.time(for (i in 1:1e4) x[["v1"]]) +# user system elapsed +# 0.063 0.000 0.063 +} } + } +} +\section{Implementation}{ + Evaluation of UDF comes with very limited capabilities for optimizations, therefore speed improvements in \code{frollapply} should not be expected as good as in other data.table fast functions. \code{frollapply} is implemented almost exclusively in R, rather than C. Its speed improvement comes from two optimizations that have been applied: + \enumerate{ + \item{ No repeated allocation of a rolling window subset.\cr + Object (type of \code{X} and size of \code{N}) is allocated once (for each CPU thread), and then for each iteration this object is being re-used by copying expected subset of data into it. This means we still have to subset data on each iteration, but we only copy data into pre-allocated window object, instead of allocating in each iteration. Allocation is carrying much bigger overhead than copy. The faster the \code{FUN} evaluates the more relative speedup we are getting, because allocation of a subset does not depend on how fast or slow \code{FUN} evaluates. See \emph{caveats} section for possible edge cases caused by this optimization. } + \item{ Parallel evaluation of \code{FUN} calls.\cr + Until now (October 2022) all the multithreaded code in data.table was using \emph{OpenMP}. It can be used only in C language and it has very low overhead. Unfortunately it could not be applied in \code{frollapply} because to evaluate UDF from C code one has to call R's C api that is not thread safe (can be run only from single threaded C code). Therefore \code{frollapply} uses \code{\link[parallel]{parallel-package}} to provide parallelism on R language level. It uses \emph{fork} parallelism, which has low overhead as well, unless results of computation are big in size. \emph{Fork} is not available on Windows OS. See \emph{caveats} section for limitations caused by using this optimization. } + } +} +\examples{ +frollapply(1:16, 4, median) +frollapply(1:9, 3, toString) + +## vectorized input +x = list(1:10, 10:1) +n = c(3, 4) +frollapply(x, n, sum) +## give names +x = list(data1 = 1:10, data2 = 10:1) +n = c(small = 3, big = 4) +frollapply(x, n, sum, give.names=TRUE) + +## by.column=FALSE +x = as.data.table(iris) +flow = function(x) { + v1 = x[[1L]] + v2 = x[[2L]] + (v1[2L] - v1[1L] * (1+v2[2L])) / v1[1L] +} +x[, + "flow" := frollapply(.(Sepal.Length, Sepal.Width), 2L, flow, by.column=FALSE), + by = Species + ][] + +## rolling regression: by.column=FALSE +f = function(x) coef(lm(v2 ~ v1, data=x)) +x = data.table(v1=rnorm(120), v2=rnorm(120)) +coef.fill = c("(Intercept)"=NA_real_, "v1"=NA_real_) +frollapply(x, 4, f, by.column=FALSE, fill=coef.fill) +} +\seealso{ + \code{\link{froll}}, \code{\link{frolladapt}}, \code{\link{shift}}, \code{\link{data.table}}, \code{\link{setDTthreads}} +} +\keyword{ data } diff --git a/src/data.table.h b/src/data.table.h index b966e86c08..292c944d64 100644 --- a/src/data.table.h +++ b/src/data.table.h @@ -7,6 +7,9 @@ # define USE_RINTERNALS // #3301 # define DATAPTR_RO(x) ((const void *)DATAPTR(x)) #endif +#if !defined(R_VERSION) || R_VERSION < R_Version(3, 4, 0) +# define SET_GROWABLE_BIT(x) // frollapply adaptive #5441 +#endif #include #define SEXPPTR_RO(x) ((const SEXP *)DATAPTR_RO(x)) // to avoid overhead of looped STRING_ELT and VECTOR_ELT #include // for uint64_t rather than unsigned long long @@ -196,26 +199,49 @@ void initDTthreads(); int getDTthreads(const int64_t n, const bool throttle); void avoid_openmp_hang_within_fork(); +typedef enum { // adding rolling functions here and in frollfunR in frollR.c + MEAN = 0, + SUM = 1, + MAX = 2, + MIN = 3, + PROD = 4 +} rollfun_t; // froll.c -void frollmean(unsigned int algo, double *x, uint64_t nx, ans_t *ans, int k, int align, double fill, bool narm, int hasna, bool verbose); -void frollmeanFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose); -void frollmeanExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose); -void frollsum(unsigned int algo, double *x, uint64_t nx, ans_t *ans, int k, int align, double fill, bool narm, int hasna, bool verbose); -void frollsumFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose); -void frollsumExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose); -void frollapply(double *x, int64_t nx, double *w, int k, ans_t *ans, int align, double fill, SEXP call, SEXP rho, bool verbose); +void frollfun(rollfun_t rfun, unsigned int algo, double *x, uint64_t nx, ans_t *ans, int k, int align, double fill, bool narm, int hasnf, bool verbose); +void frollmeanFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollmeanExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollsumFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollsumExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollmaxFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollmaxExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollminFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollminExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollprodFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); +void frollprodExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose); // frolladaptive.c -void fadaptiverollmean(unsigned int algo, double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose); -void fadaptiverollmeanFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose); -void fadaptiverollmeanExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose); -void fadaptiverollsum(unsigned int algo, double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose); -void fadaptiverollsumFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose); -void fadaptiverollsumExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose); +void frolladaptivefun(rollfun_t rfun, unsigned int algo, double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); +void frolladaptivemeanFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); +void frolladaptivemeanExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); +void frolladaptivesumFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); +void frolladaptivesumExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); +//void frolladaptivemaxFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); // does not exists as of now +void frolladaptivemaxExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); +//void frolladaptiveminFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); // does not exists as of now +void frolladaptiveminExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); +void frolladaptiveprodFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); +void frolladaptiveprodExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose); // frollR.c -SEXP frollfunR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP algo, SEXP align, SEXP narm, SEXP hasNA, SEXP adaptive); -SEXP frollapplyR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP align, SEXP rho); +SEXP frollfunR(SEXP fun, SEXP xobj, SEXP kobj, SEXP fill, SEXP algo, SEXP align, SEXP narm, SEXP hasnf, SEXP adaptive); +SEXP frolladapt(SEXP xobj, SEXP kobj, SEXP partial); + +// frollapply.c +SEXP memcpyVector(SEXP dest, SEXP src, SEXP offset, SEXP size); +SEXP memcpyDT(SEXP dest, SEXP src, SEXP offset, SEXP size); +SEXP memcpyVectoradaptive(SEXP dest, SEXP src, SEXP offset, SEXP size); +SEXP memcpyDTadaptive(SEXP dest, SEXP src, SEXP offset, SEXP size); +SEXP setgrowable(SEXP x); // nafill.c void nafillDouble(double *x, uint_fast64_t nx, unsigned int type, double fill, bool nan_is_na, ans_t *ans, bool verbose); @@ -247,7 +273,8 @@ SEXP coerceAs(SEXP x, SEXP as, SEXP copyArg); // types.c char *end(char *start); -void ansMsg(ans_t *ans, int n, bool verbose, const char *func); +void ansSetMsg(ans_t *ans, uint8_t status, const char *msg, const char *func); +void ansGetMsgs(ans_t *ans, int n, bool verbose, const char *func); SEXP testMsgR(SEXP status, SEXP x, SEXP k); //fifelse.c diff --git a/src/froll.c b/src/froll.c index 3ab7bd927a..d5471e027e 100644 --- a/src/froll.c +++ b/src/froll.c @@ -1,54 +1,113 @@ #include "data.table.h" -/* fast rolling mean - router +#undef SUM_WINDOW_STEP_FRONT +#define SUM_WINDOW_STEP_FRONT \ +if (R_FINITE(x[i])) { \ + w += x[i]; \ +} else if (ISNAN(x[i])) { \ + nc++; \ +} else if (x[i]==R_PosInf) { \ + pinf++; \ +} else if (x[i]==R_NegInf) { \ + ninf++; \ +} +#undef SUM_WINDOW_STEP_BACK +#define SUM_WINDOW_STEP_BACK \ +if (R_FINITE(x[i-k])) { \ + w -= x[i-k]; \ +} else if (ISNAN(x[i-k])) { \ + nc--; \ +} else if (x[i-k]==R_PosInf) { \ + pinf--; \ +} else if (x[i-k]==R_NegInf) { \ + ninf--; \ +} + +/* rolling fun - router for fun and algo * early stopping for window bigger than input - * also handles 'align' in single place - * algo = 0: frollmeanFast + * handles 'align' in single place for center or left + * rfun enum rollfun_t routes to rolling function + * algo = 0: fast * adding/removing in/out of sliding window of observations - * algo = 1: frollmeanExact - * recalculate whole mean for each observation, roundoff correction is adjusted, also support for NaN and Inf + * algo = 1: exact + * recalculate whole fun for each observation, for mean roundoff correction is adjusted */ -void frollmean(unsigned int algo, double *x, uint64_t nx, ans_t *ans, int k, int align, double fill, bool narm, int hasna, bool verbose) { - if (nx < k) { // if window width bigger than input just return vector of fill values +void frollfun(rollfun_t rfun, unsigned int algo, double *x, uint64_t nx, ans_t *ans, int k, int align, double fill, bool narm, int hasnf, bool verbose) { + double tic = 0; + if (verbose) + tic = omp_get_wtime(); + if (nx < k) { // if window width bigger than input just return vector of fill values if (verbose) snprintf(end(ans->message[0]), 500, _("%s: window width longer than input vector, returning all NA vector\n"), __func__); // implicit n_message limit discussed here: https://github.com/Rdatatable/data.table/issues/3423#issuecomment-487722586 - for (int i=0; idbl_v[i] = fill; } return; } - double tic = 0; - if (verbose) - tic = omp_get_wtime(); - if (algo==0) { - frollmeanFast(x, nx, ans, k, fill, narm, hasna, verbose); - } else if (algo==1) { - frollmeanExact(x, nx, ans, k, fill, narm, hasna, verbose); + switch (rfun) { + case MEAN : + if (algo==0) { + frollmeanFast(x, nx, ans, k, fill, narm, hasnf, verbose); + } else if (algo==1) { + frollmeanExact(x, nx, ans, k, fill, narm, hasnf, verbose); + } + break; + case SUM : + if (algo==0) { + frollsumFast(x, nx, ans, k, fill, narm, hasnf, verbose); + } else if (algo==1) { + frollsumExact(x, nx, ans, k, fill, narm, hasnf, verbose); + } + break; + case MAX : + if (algo==0) { + frollmaxFast(x, nx, ans, k, fill, narm, hasnf, verbose); + } else if (algo==1) { + frollmaxExact(x, nx, ans, k, fill, narm, hasnf, verbose); + } + break; + case MIN : + if (algo==0) { + frollminFast(x, nx, ans, k, fill, narm, hasnf, verbose); + } else if (algo==1) { + frollminExact(x, nx, ans, k, fill, narm, hasnf, verbose); + } + break; + case PROD : + if (algo==0) { + frollprodFast(x, nx, ans, k, fill, narm, hasnf, verbose); + } else if (algo==1) { + frollprodExact(x, nx, ans, k, fill, narm, hasnf, verbose); + } + break; + default: + error(_("Internal error: Unknown rfun value in froll: %d"), rfun); // #nocov } - if (ans->status < 3 && align < 1) { // align center or left, only when no errors occurred - int k_ = align==-1 ? k-1 : floor(k/2); // offset to shift + if (align < 1 && ans->status < 3) { + int k_ = align==-1 ? k-1 : floor(k/2); // offset to shift if (verbose) snprintf(end(ans->message[0]), 500, _("%s: align %d, shift answer by %d\n"), __func__, align, -k_); memmove((char *)ans->dbl_v, (char *)ans->dbl_v + (k_*sizeof(double)), (nx-k_)*sizeof(double)); // apply shift to achieve expected align - for (uint64_t i=nx-k_; idbl_v[i] = fill; } } if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: processing algo %u took %.3fs\n"), __func__, algo, omp_get_wtime()-tic); + snprintf(end(ans->message[0]), 500, _("%s: processing fun %d algo %u took %.3fs\n"), __func__, rfun, algo, omp_get_wtime()-tic); } + /* fast rolling mean - fast - * when no info on NA (hasNA argument) then assume no NAs run faster version + * when no info on NF (has.nf argument) then assume no NFs run faster version * rollmean implemented as single pass sliding window for align="right" - * if NAs detected re-run rollmean implemented as single pass sliding window with NA support + * if non-finite detected re-run rollmean implemented as single pass sliding window with NA support */ -void frollmeanFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose) { +void frollmeanFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose) { if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", window %d, hasna %d, narm %d\n"), "frollmeanFast", (uint64_t)nx, k, hasna, (int)narm); + snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollmeanFast", (uint64_t)nx, k, hasnf, (int)narm); long double w = 0.0; // sliding window aggregate - bool truehasna = hasna>0; // flag to re-run with NA support if NAs detected - if (!truehasna) { + bool truehasnf = hasnf>0; // flag to re-run with NA support if NAs detected + if (!truehasnf) { int i; // iterator declared here because it is being used after for loop for (i=0; idbl_v[i] = (double) (w / k); // rollfun to answer vector } if (!R_FINITE((double) w)) { // mark to re-run with NA care - if (hasna==-1) { // raise warning - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, re-running with extra care for NAs\n"), __func__); - w = 0.0; - truehasna = true; + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + w = 0.0; truehasnf = true; } } else { // early stopping branch when NAs detected in first k obs - if (hasna==-1) { // raise warning - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, skip non-NA attempt and run with extra care for NAs\n"), __func__); - w = 0.0; - truehasna = true; + ansSetMsg(ans, 0, "%s: non-finite values are present in input, skip non-finite inaware attempt and run with extra care for NFs straighaway\n", __func__); + w = 0.0; truehasnf = true; } } - if (truehasna) { - int nc = 0; // NA counter within sliding window + if (truehasnf) { + int nc = 0, pinf = 0, ninf = 0; // NA counter within sliding window int i; // iterator declared here because it is being used after for loop + +#undef MEAN_WINDOW_STEP_VALUE +#define MEAN_WINDOW_STEP_VALUE \ + if (nc == 0) { \ + if (pinf == 0) { \ + if (ninf == 0) { \ + ans->dbl_v[i] = (double) (w / k); \ + } else { \ + ans->dbl_v[i] = R_NegInf; \ + } \ + } else if (ninf == 0) { \ + ans->dbl_v[i] = R_PosInf; \ + } else { \ + ans->dbl_v[i] = R_NaN; \ + } \ + } else if (nc == k) { \ + ans->dbl_v[i] = narm ? R_NaN : NA_REAL; \ + } else { \ + if (narm) { \ + if (pinf == 0) { \ + if (ninf == 0) { \ + ans->dbl_v[i] = (double) (w / (k - nc)); \ + } else { \ + ans->dbl_v[i] = R_NegInf; \ + } \ + } else if (ninf == 0) { \ + ans->dbl_v[i] = R_PosInf; \ + } else { \ + ans->dbl_v[i] = R_NaN; \ + } \ + } else { \ + ans->dbl_v[i] = NA_REAL; \ + } \ + } + for (i=0; idbl_v[i] = fill; // partial window fill all } - if (R_FINITE(x[i])) { - w += x[i]; // i==k-1 - } else { - nc++; - } - if (nc == 0) { - ans->dbl_v[i] = (double) (w / k); // no NAs in first full window - } else if (nc == k) { - ans->dbl_v[i] = narm ? R_NaN : NA_REAL; // all values in sliding window are NA, expected output for fun(NA, na.rm=T/F) - } else { - ans->dbl_v[i] = narm ? (double) (w / (k - nc)) : NA_REAL; // some values in window are NA - } + SUM_WINDOW_STEP_FRONT // i==k-1 + MEAN_WINDOW_STEP_VALUE for (uint64_t i=k; idbl_v[i] = (double) (w / k); // no NAs in sliding window for present observation - } else if (nc == k) { - ans->dbl_v[i] = narm ? R_NaN : NA_REAL; // all values in window are NA, expected output for fun(NA, na.rm=T/F) - } else { - ans->dbl_v[i] = narm ? (double) (w / (k - nc)) : NA_REAL; // some values in window are NA - } + SUM_WINDOW_STEP_BACK + SUM_WINDOW_STEP_FRONT + MEAN_WINDOW_STEP_VALUE } } } /* fast rolling mean - exact - * when no info on NA (hasNA argument) then assume no NAs run faster version, also when na.rm=FALSE faster version can proceed + * when no info on NF (has.nf argument) then assume no NFs run faster version, also when na.rm=FALSE faster version can proceed * rollmean implemented as mean of k obs for each observation for align="right" - * if NAs detected and na.rm=TRUE then re-run rollmean implemented as mean of k bos for each observation with NA support + * if non-finite detected and na.rm=TRUE then re-run NF aware rollmean */ -void frollmeanExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose) { +void frollmeanExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose) { if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", window %d, hasna %d, narm %d\n"), "frollmeanExact", (uint64_t)nx, k, hasna, (int)narm); + snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollmeanExact", (uint64_t)nx, k, hasnf, (int)narm); for (int i=0; idbl_v[i] = fill; } - bool truehasna = hasna>0; // flag to re-run with NA support if NAs detected - if (!truehasna || !narm) { + bool truehasnf = hasnf>0; // flag to re-run with NA support if NAs detected + if (!truehasnf || !narm) { #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=k-1; idbl_v[i] = (double) (res + (err / k)); // adjust calculated rollfun with roundoff correction - } else { + } else if (ISNAN((double) w)) { if (!narm) { ans->dbl_v[i] = (double) (w / k); // NAs should be propagated } - truehasna = true; // NAs detected for this window, set flag so rest of windows will not be re-run + truehasnf = true; // NAs detected for this window, set flag so rest of windows will not be re-run + } else { + ans->dbl_v[i] = (double) w; // Inf and -Inf } } - if (truehasna) { - if (hasna==-1) { // raise warning - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (truehasnf) { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) { - if (narm) { - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, re-running with extra care for NAs\n"), __func__); - } else { - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, na.rm was FALSE so in 'exact' implementation NAs were handled already, no need to re-run\n"), __func__); - } + if (narm) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + else + ansSetMsg(ans, 0, "%s: non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run\n", __func__); } } } - if (truehasna && narm) { + if (truehasnf && narm) { #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=k-1; i DBL_MAX) { - ans->dbl_v[i] = R_PosInf; // handle Inf for na.rm=TRUE consistently to base R - } else if (w < -DBL_MAX) { - ans->dbl_v[i] = R_NegInf; - } else { + if (R_FINITE((double) w)) { long double res = w / k; // keep results as long double for intermediate processing long double err = 0.0; // roundoff corrector if (nc == 0) { // no NAs in current window @@ -211,47 +266,22 @@ void frollmeanExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool } else { // nc == k ans->dbl_v[i] = R_NaN; // all values NAs and narm so produce expected values } + } else { + ans->dbl_v[i] = (double) w; // Inf and -Inf } } } } -/* fast rolling sum */ -void frollsum(unsigned int algo, double *x, uint64_t nx, ans_t *ans, int k, int align, double fill, bool narm, int hasna, bool verbose) { - if (nx < k) { - if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: window width longer than input vector, returning all NA vector\n"), __func__); - for (int i=0; idbl_v[i] = fill; - } - return; - } - double tic = 0; - if (verbose) - tic = omp_get_wtime(); - if (algo==0) { - frollsumFast(x, nx, ans, k, fill, narm, hasna, verbose); - } else if (algo==1) { - frollsumExact(x, nx, ans, k, fill, narm, hasna, verbose); - } - if (ans->status < 3 && align < 1) { - int k_ = align==-1 ? k-1 : floor(k/2); - if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: align %d, shift answer by %d\n"), __func__, align, -k_); - memmove((char *)ans->dbl_v, (char *)ans->dbl_v + (k_*sizeof(double)), (nx-k_)*sizeof(double)); - for (uint64_t i=nx-k_; idbl_v[i] = fill; - } - } - if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: processing algo %u took %.3fs\n"), __func__, algo, omp_get_wtime()-tic); -} -void frollsumFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose) { +/* fast rolling sum - fast + * same as mean fast + */ +void frollsumFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose) { if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", window %d, hasna %d, narm %d\n"), "frollsumFast", (uint64_t)nx, k, hasna, (int)narm); + snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollsumFast", (uint64_t)nx, k, hasnf, (int)narm); long double w = 0.0; - bool truehasna = hasna>0; - if (!truehasna) { + bool truehasnf = hasnf>0; + if (!truehasnf) { int i; for (i=0; idbl_v[i] = (double) w; } if (!R_FINITE((double) w)) { - if (hasna==-1) { - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, re-running with extra care for NAs\n"), __func__); - w = 0.0; - truehasna = true; + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + w = 0.0; truehasnf = true; } } else { - if (hasna==-1) { - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, skip non-NA attempt and run with extra care for NAs\n"), __func__); - w = 0.0; - truehasna = true; + ansSetMsg(ans, 0, "%s: non-finite values are present in input, skip non-finite inaware attempt and run with extra care for NFs straighaway\n", __func__); + w = 0.0; truehasnf = true; } } - if (truehasna) { - int nc = 0; - int i; - for (i=0; idbl_v[i] = fill; - } - if (R_FINITE(x[i])) { - w += x[i]; - } else { - nc++; - } - if (nc == 0) { - ans->dbl_v[i] = (double) w; - } else if (nc == k) { - ans->dbl_v[i] = narm ? 0.0 : NA_REAL; - } else { - ans->dbl_v[i] = narm ? (double) w : NA_REAL; + if (truehasnf) { + int nc = 0, pinf = 0, ninf = 0; // NA counter within sliding window + int i; // iterator declared here because it is being used after for loop + +#undef SUM_WINDOW_STEP_VALUE +#define SUM_WINDOW_STEP_VALUE \ +if (nc == 0) { \ + if (pinf == 0) { \ + if (ninf == 0) { \ + ans->dbl_v[i] = (double) w; \ + } else { \ + ans->dbl_v[i] = R_NegInf; \ + } \ + } else if (ninf == 0) { \ + ans->dbl_v[i] = R_PosInf; \ + } else { \ + ans->dbl_v[i] = R_NaN; \ + } \ +} else if (nc == k) { \ + ans->dbl_v[i] = narm ? 0.0 : NA_REAL; \ +} else { \ + if (narm) { \ + if (pinf == 0) { \ + if (ninf == 0) { \ + ans->dbl_v[i] = (double) w; \ + } else { \ + ans->dbl_v[i] = R_NegInf; \ + } \ + } else if (ninf == 0) { \ + ans->dbl_v[i] = R_PosInf; \ + } else { \ + ans->dbl_v[i] = R_NaN; \ + } \ + } else { \ + ans->dbl_v[i] = NA_REAL; \ + } \ +} + + for (i=0; idbl_v[i] = fill; // partial window fill all } - for (uint64_t i=k; idbl_v[i] = (double) w; - } else if (nc == k) { - ans->dbl_v[i] = narm ? 0.0 : NA_REAL; - } else { - ans->dbl_v[i] = narm ? (double) w : NA_REAL; - } + SUM_WINDOW_STEP_FRONT // i==k-1 + SUM_WINDOW_STEP_VALUE + for (uint64_t i=k; imessage[0]), 500, _("%s: running in parallel for input length %"PRIu64", window %d, hasna %d, narm %d\n"), "frollsumExact", (uint64_t)nx, k, hasna, (int)narm); + snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollsumExact", (uint64_t)nx, k, hasnf, (int)narm); for (int i=0; idbl_v[i] = fill; } - bool truehasna = hasna>0; - if (!truehasna || !narm) { + bool truehasnf = hasnf>0; + if (!truehasnf || !narm) { #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=k-1; idbl_v[i] = (double) w; - } else { + } else if (ISNAN((double) w)) { if (!narm) { ans->dbl_v[i] = (double) w; } - truehasna = true; + truehasnf = true; + } else { + ans->dbl_v[i] = (double) w; } } - if (truehasna) { - if (hasna==-1) { - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (truehasnf) { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) { - if (narm) { - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, re-running with extra care for NAs\n"), __func__); - } else { - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, na.rm was FALSE so in 'exact' implementation NAs were handled already, no need to re-run\n"), __func__); - } + if (narm) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + else + ansSetMsg(ans, 0, "%s: non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run\n", __func__); } } } - if (truehasna && narm) { + if (truehasnf && narm) { #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=k-1; i DBL_MAX) { + if (w > DBL_MAX) { // in contrast to mean, here we can overflow long double more than DBL_MAX ans->dbl_v[i] = R_PosInf; } else if (w < -DBL_MAX) { ans->dbl_v[i] = R_NegInf; @@ -397,68 +430,552 @@ void frollsumExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool } } -/* fast rolling any R function - * not plain C, not thread safe - * R eval() allocates +static inline void wmax(double *x, uint64_t o, int k, double *w, uint64_t *iw, bool narm) { + if (narm) { + for (int i=0; i= w[0]) { // this never true if all x NAs and narm=TRUE + iw[0] = o+i-k+1; + w[0] = x[iw[0]]; + } + } + } else { + double ww = R_NegInf; + uint64_t iww = 0; + for (int i=0; i x[i]: NA > NaN + } else { // no NA in window so NaN >= than any non-NA + iww = ii; ww = R_NaN; + } + } else if (ISNAN(ww)) { + // w still within the window and is NA or NaN, x[i] is not NA - already checked above, therefore to nothing + } else if (x[ii] >= ww) { + iww = ii; ww = x[iww]; + } + } + iw[0] = iww; + w[0] = ww; + } +} +/* fast rolling max - fast + * fast online algorithm do single pass over elements keeping track of recent max and its index + * if index of max is within progressing window then it keeps running single pass + * whenever max is leaving the window (index of max is outside of iterator minus window size) then new maximum is computed via nested loop on current location + * new max is used to continue outer single pass as long as new max index is not leaving the running window + * should scale well for bigger window size, may carry overhead for small window, needs benchmarking */ -void frollapply(double *x, int64_t nx, double *w, int k, ans_t *ans, int align, double fill, SEXP call, SEXP rho, bool verbose) { - if (nx < k) { - if (verbose) - Rprintf(_("%s: window width longer than input vector, returning all NA vector\n"), __func__); - for (int i=0; imessage[0]), 500, _("%s: running for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollmaxFast", (uint64_t)nx, k, hasnf, (int)narm); + double w = R_NegInf; // window max + uint64_t cmax = 0; // counter of nested loops for verbose + uint64_t iw = 0; // index of window max + uint64_t i; + if (narm || hasnf==-1) { + for (i=0; i= w) { // >= rather than > because we track most recent maximum using iw + iw = i; w = x[iw]; + } ans->dbl_v[i] = fill; } - return; + for (i=k-1; i= w) { + iw = i; w = x[iw]; + } + ans->dbl_v[i] = w; + } + } else { + bool truehasnf = hasnf>0; + for (i=0; i x[i]: NA > NaN + } else { + iw = i; w = R_NaN; + } + } else if (x[i] >= w) { + iw = i; w = x[iw]; + } + ans->dbl_v[i] = fill; + } + if (!truehasnf) { // maybe no NAs + for (; i= w) { + iw = i; w = x[iw]; + } + ans->dbl_v[i] = w; + } + } + if (truehasnf) { + for (; i x[i]: NA > NaN + } else { // no NA in window so NaN >= than any non-NA + iw = i; w = R_NaN; + } + } else if (iw+k <= i) { // max left current window + iw = i-k; w = R_NegInf; + wmax(x, i, k, &w, &iw, false); cmax++; + } else if (ISNAN(w)) { + // w still within the window and is NA or NaN, x[i] is not NA - already checked above, therefore do nothing + } else if (x[i] >= w) { + iw = i; w = x[iw]; + } + ans->dbl_v[i] = w; + } + } } - double tic = 0; if (verbose) - tic = omp_get_wtime(); + snprintf(end(ans->message[0]), 500, _("%s: nested window max calculation called %"PRIu64" times\n"), __func__, cmax); +} +/* fast rolling max - exact + * for each observation in x compute max in window from scratch + * faster version ignores NAs (narm=T or has.nf=F), as they are not propagated by `>` operator + * otherwise we scan for NaN/NA and run either of two loops + * has.nf=FALSE can give incorrect results if NAs provided, documented to be used with care + */ +void frollmaxExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose) { + if (verbose) + snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollmaxExact", (uint64_t)nx, k, hasnf, (int)narm); for (int i=0; idbl_v[i] = fill; } - // this is i=k-1 iteration - first full window - taken out from the loop - // we use it to add extra check that results of a FUN are length 1 numeric - memcpy(w, x, k*sizeof(double)); - SEXP eval0 = PROTECT(eval(call, rho)); - if (xlength(eval0) != 1) - error(_("%s: results from provided FUN are not length 1"), __func__); - SEXPTYPE teval0 = TYPEOF(eval0); - if (teval0 == REALSXP) { - ans->dbl_v[k-1] = REAL(eval0)[0]; + if (narm || hasnf==-1) { // ignore NAs as > does not propagate + #pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=k-1; i w) + w = x[i+j]; + } + ans->dbl_v[i] = w; + } + } else { + bool *isnan = malloc(nx*sizeof(bool)); // isnan lookup - we use it to reduce ISNAN calls in nested loop + if (!isnan) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for isnan", __func__); // raise error + free(isnan); + return; + } // # nocov end + bool truehasnf = hasnf>0; + for (uint64_t i=0; i w) + w = x[i+j]; + } + ans->dbl_v[i] = w; + } + } else { // there are some NAs + #pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=k-1; i NaN + } else { + w = R_NaN; // continue nested loop in case there is NA there + } + } else if (x[i+j] > w) + w = x[i+j]; + } + } + ans->dbl_v[i] = w; + } + } + } +} + +static inline void wmin(double *x, uint64_t o, int k, double *w, uint64_t *iw, bool narm) { + if (narm) { + for (int i=0; idbl_v[k-1] = REAL(coerceVector(eval0, REALSXP))[0]; - } else { - error(_("%s: results from provided FUN are not of type double"), __func__); + double ww = R_PosInf; + uint64_t iww = 0; + for (int i=0; i x[i]: NA > NaN + } else { // no NA in window so NaN >= than any non-NA + iww = ii; ww = R_NaN; + } + } else if (ISNAN(ww)) { + // w still within the window and is NA or NaN, x[i] is not NA - already checked above, therefore to nothing + } else if (x[ii] <= ww) { + iww = ii; ww = x[iww]; + } + } + iw[0] = iww; + w[0] = ww; + } +} +/* fast rolling min - fast + * see rolling max fast details + */ +void frollminFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose) { + if (verbose) + snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollminFast", (uint64_t)nx, k, hasnf, (int)narm); + double w = R_PosInf; // window min + uint64_t cmin = 0; // counter of nested loops for verbose + uint64_t iw = 0; // index of window min + uint64_t i; + if (narm || hasnf==-1) { + for (i=0; idbl_v[i] = fill; + } + for (i=k-1; idbl_v[i] = w; + } + } else { + bool truehasnf = hasnf>0; + for (i=0; i x[i]: NA > NaN + } else { + iw = i; w = R_NaN; + } + } else if (x[i] <= w) { + iw = i; w = x[iw]; + } + ans->dbl_v[i] = fill; } + if (!truehasnf) { // maybe no NAs + for (; idbl_v[i] = w; + } + } + if (truehasnf) { + for (; i x[i]: NA > NaN + } else { // no NA in window so NaN >= than any non-NA + iw = i; w = R_NaN; + } + } else if (iw+k <= i) { // min left current window // note that it is still <= same as in max + iw = i-k; w = R_PosInf; + wmin(x, i, k, &w, &iw, false); cmin++; + } else if (ISNAN(w)) { + // w still within the window and is NA or NaN, x[i] is not NA - already checked above, therefore do nothing + } else if (x[i] <= w) { + iw = i; w = x[iw]; + } + ans->dbl_v[i] = w; + } + } + } + if (verbose) + snprintf(end(ans->message[0]), 500, _("%s: nested window min calculation called %"PRIu64" times\n"), __func__, cmin); +} +/* fast rolling min - exact + * see rolling max exact details + */ +void frollminExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasnf, bool verbose) { + if (verbose) + snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollminExact", (uint64_t)nx, k, hasnf, (int)narm); + for (int i=0; idbl_v[i] = fill; } - UNPROTECT(1); // eval0 - // for each row it copies expected window data into w - // evaluate call which has been prepared to point into w - if (teval0 == REALSXP) { - for (int64_t i=k; idbl_v[i] = REAL(eval(call, rho))[0]; // this may fail with for a not type-stable fun + if (narm || hasnf==-1) { // ignore NAs as > does not propagate +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=k-1; idbl_v[i] = w; } } else { - for (int64_t i=k; idbl_v[i] = REAL(coerceVector(evali, REALSXP))[0]; - UNPROTECT(1); // evali + bool *isnan = malloc(nx*sizeof(bool)); // isnan lookup - we use it to reduce ISNAN calls in nested loop + if (!isnan) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for isnan", __func__); // raise error + free(isnan); + return; + } // # nocov end + bool truehasnf = hasnf>0; + for (uint64_t i=0; idbl_v[i] = w; + } + } else { // there are some NAs +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=k-1; i NaN + } else { + w = R_NaN; // continue nested loop in case there is NA there + } + } else if (x[i+j] < w) + w = x[i+j]; + } + } + ans->dbl_v[i] = w; + } } } - if (ans->status < 3 && align < 1) { - int k_ = align==-1 ? k-1 : floor(k/2); - if (verbose) - Rprintf(_("%s: align %d, shift answer by %d\n"), __func__, align, -k_); - memmove((char *)ans->dbl_v, (char *)ans->dbl_v + (k_*sizeof(double)), (nx-k_)*sizeof(double)); - for (int64_t i=nx-k_; imessage[0]), 500, _("%s: running for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollprodFast", (uint64_t)nx, k, hasnf, (int)narm); + long double w = 1.0; + bool truehasnf = hasnf>0; + if (!truehasnf) { + int i; + for (i=0; idbl_v[i] = fill; } + w *= x[i]; + ans->dbl_v[i] = (double) w; + if (R_FINITE((double) w)) { + for (uint64_t i=k; idbl_v[i] = (double) w; + } + if (!R_FINITE((double) w)) { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); + if (verbose) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + w = 1.0; truehasnf = true; + } + } else { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); + if (verbose) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, skip non-finite inaware attempt and run with extra care for NFs straighaway\n", __func__); + w = 1.0; truehasnf = true; + } + } + if (truehasnf) { + int nc = 0, pinf = 0, ninf = 0; // NA counter within sliding window + int i; // iterator declared here because it is being used after for loop + +#undef PROD_WINDOW_STEP_FRONT +#define PROD_WINDOW_STEP_FRONT \ + if (R_FINITE(x[i])) { \ + w *= x[i]; \ + } else if (ISNAN(x[i])) { \ + nc++; \ + } else if (x[i]==R_PosInf) { \ + pinf++; \ + } else if (x[i]==R_NegInf) { \ + ninf++; \ + } +#undef PROD_WINDOW_STEP_BACK +#define PROD_WINDOW_STEP_BACK \ + if (R_FINITE(x[i-k])) { \ + w /= x[i-k]; \ + } else if (ISNAN(x[i-k])) { \ + nc--; \ + } else if (x[i-k]==R_PosInf) { \ + pinf--; \ + } else if (x[i-k]==R_NegInf) { \ + ninf--; \ } +#undef PROD_WINDOW_STEP_VALUE +#define PROD_WINDOW_STEP_VALUE \ + if (nc == 0) { \ + if (pinf == 0 && ninf == 0) { \ + ans->dbl_v[i] = (double) w; \ + } else { \ + ans->dbl_v[i] = (ninf+(w<0))%2 ? R_NegInf : R_PosInf; \ + } \ + } else if (nc == k) { \ + ans->dbl_v[i] = narm ? 1.0 : NA_REAL; \ + } else { \ + if (narm) { \ + if (pinf == 0 && ninf == 0) { \ + ans->dbl_v[i] = (double) w; \ + } else { \ + ans->dbl_v[i] = (ninf+(w<0))%2 ? R_NegInf : R_PosInf; \ + } \ + } else { \ + ans->dbl_v[i] = NA_REAL; \ + } \ + } + + for (i=0; idbl_v[i] = fill; // partial window fill all + } + PROD_WINDOW_STEP_FRONT // i==k-1 + PROD_WINDOW_STEP_VALUE + for (uint64_t i=k; imessage[0]), 500, _("%s: running in parallel for input length %"PRIu64", window %d, hasnf %d, narm %d\n"), "frollprodExact", (uint64_t)nx, k, hasnf, (int)narm); + for (int i=0; idbl_v[i] = fill; + } + bool truehasnf = hasnf>0; + if (!truehasnf || !narm) { +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=k-1; idbl_v[i] = (double) w; + } else if (ISNAN((double) w)) { + if (!narm) { + ans->dbl_v[i] = (double) w; + } + truehasnf = true; + } else { + ans->dbl_v[i] = (double) w; + } + } + if (truehasnf) { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); + if (verbose) { + if (narm) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + else + ansSetMsg(ans, 0, "%s: non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run\n", __func__); + } + } + } + if (truehasnf && narm) { +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=k-1; i DBL_MAX) { // in contrast to mean, here we can overflow long double more than DBL_MAX + ans->dbl_v[i] = R_PosInf; + } else if (w < -DBL_MAX) { + ans->dbl_v[i] = R_NegInf; + } else { + if (nc < k) { + ans->dbl_v[i] = (double) w; + } else { + ans->dbl_v[i] = 1.0; + } + } + } + } } diff --git a/src/frollR.c b/src/frollR.c index 74cc7dd4ef..fe2769471f 100644 --- a/src/frollR.c +++ b/src/frollR.c @@ -1,7 +1,8 @@ #include "data.table.h" // first (before Rdefines.h) for clang-13-omp, #5122 #include -SEXP coerceToRealListR(SEXP obj) { +// validate and coerce to list of real +SEXP coerceX(SEXP obj) { // accept atomic/list of integer/logical/real returns list of real int protecti = 0; if (isVectorAtomic(obj)) { @@ -20,83 +21,91 @@ SEXP coerceToRealListR(SEXP obj) { UNPROTECT(protecti); return x; } - -SEXP frollfunR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP algo, SEXP align, SEXP narm, SEXP hasna, SEXP adaptive) { +// validate and coerce to integer or list of integer +SEXP coerceK(SEXP obj, bool adaptive) { int protecti = 0; - const bool verbose = GetVerbose(); - - if (!xlength(obj)) - return(obj); // empty input: NULL, list() - double tic = 0; - if (verbose) - tic = omp_get_wtime(); - SEXP x = PROTECT(coerceToRealListR(obj)); protecti++; - R_len_t nx=length(x); // number of columns to roll on - - if (xlength(k) == 0) // check that window is non zero length - error(_("n must be non 0 length")); - - if (!IS_TRUE_OR_FALSE(adaptive)) - error(_("%s must be TRUE or FALSE"), "adaptive"); - bool badaptive = LOGICAL(adaptive)[0]; - - R_len_t nk = 0; // number of rolling windows, for adaptive might be atomic to be wrapped into list, 0 for clang -Wall - SEXP ik = R_NilValue; // holds integer window width, if doing non-adaptive roll fun - SEXP kl = R_NilValue; // holds adaptive window width, if doing adaptive roll fun - if (!badaptive) { // validating n input for adaptive=FALSE - if (isNewList(k)) + SEXP ans = R_NilValue; + if (!adaptive) { + if (isNewList(obj)) error(_("n must be integer, list is accepted for adaptive TRUE")); - - if (isInteger(k)) { // check that k is integer vector - ik = k; - } else if (isReal(k)) { // if n is double then convert to integer - ik = PROTECT(coerceVector(k, INTSXP)); protecti++; + if (isInteger(obj)) { + ans = obj; + } else if (isReal(obj)) { + ans = PROTECT(coerceVector(obj, INTSXP)); protecti++; } else { error(_("n must be integer")); } - - nk = length(k); - R_len_t i=0; // check that all window values positive - while (i < nk && INTEGER(ik)[i] > 0) i++; + int nk = length(obj); + R_len_t i = 0; + int *iik = INTEGER(ans); + while (i < nk && iik[i] > 0) i++; if (i != nk) error(_("n must be positive integer values (> 0)")); - } else { // validating n input for adaptive=TRUE - if (isVectorAtomic(k)) { // if not-list then wrap into list - kl = PROTECT(allocVector(VECSXP, 1)); protecti++; - if (isInteger(k)) { // check that k is integer vector - SET_VECTOR_ELT(kl, 0, k); - } else if (isReal(k)) { // if n is double then convert to integer - SET_VECTOR_ELT(kl, 0, coerceVector(k, INTSXP)); + } else { + if (isVectorAtomic(obj)) { + ans = PROTECT(allocVector(VECSXP, 1)); protecti++; + if (isInteger(obj)) { + SET_VECTOR_ELT(ans, 0, obj); + } else if (isReal(obj)) { + SET_VECTOR_ELT(ans, 0, coerceVector(obj, INTSXP)); } else { error(_("n must be integer vector or list of integer vectors")); } - nk = 1; } else { - nk = length(k); - kl = PROTECT(allocVector(VECSXP, nk)); protecti++; - for (R_len_t i=0; i 0 && (inx[i]!=inx[i-1])) // variable length list input not allowed for adaptive roll error(_("adaptive rolling function can only process 'x' having equal length of elements, like data.table or data.frame; If you want to call rolling function on list having variable length of elements call it for each field separately")); - if (xlength(VECTOR_ELT(kl, j))!=inx[0]) // check that length of integer vectors in n list match to xrows[0] ([0] and not [i] because there is above check for equal xrows) + if (xlength(VECTOR_ELT(k, j))!=inx[0]) // check that length of integer vectors in n list match to xrows[0] ([0] and not [i] because there is above check for equal xrows) error(_("length of integer vector(s) provided as list to 'n' argument must be equal to number of observations provided in 'x'")); } SET_VECTOR_ELT(ans, i*nk+j, allocVector(REALSXP, inx[i]));// allocate answer vector for this column-window @@ -132,11 +141,17 @@ SEXP frollfunR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP algo, SEXP align, SEX dx[i] = REAL(VECTOR_ELT(x, i)); // assign source columns to C pointers } - enum {MEAN, SUM} sfun; + rollfun_t rfun; // adding fun needs to be here and data.table.h if (!strcmp(CHAR(STRING_ELT(fun, 0)), "mean")) { - sfun = MEAN; + rfun = MEAN; } else if (!strcmp(CHAR(STRING_ELT(fun, 0)), "sum")) { - sfun = SUM; + rfun = SUM; + } else if (!strcmp(CHAR(STRING_ELT(fun, 0)), "max")) { + rfun = MAX; + } else if (!strcmp(CHAR(STRING_ELT(fun, 0)), "min")) { + rfun = MIN; + } else if (!strcmp(CHAR(STRING_ELT(fun, 0)), "prod")) { + rfun = PROD; } else { error(_("Internal error: invalid %s argument in %s function should have been caught earlier. Please report to the data.table issue tracker."), "fun", "rolling"); // # nocov } @@ -149,10 +164,10 @@ SEXP frollfunR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP algo, SEXP align, SEX bool bnarm = LOGICAL(narm)[0]; - int ihasna = // plain C tri-state boolean as integer - LOGICAL(hasna)[0]==NA_LOGICAL ? 0 : // hasna NA, default, no info about NA - LOGICAL(hasna)[0]==TRUE ? 1 : // hasna TRUE, might be some NAs - -1; // hasna FALSE, there should be no NAs + int ihasnf = // plain C tri-state boolean as integer + LOGICAL(hasnf)[0]==NA_LOGICAL ? 0 : // hasnf NA, default, no info about NA + LOGICAL(hasnf)[0]==TRUE ? 1 : // hasnf TRUE, might be some NAs + -1; // hasnf FALSE, there should be no NAs // or there must be no NAs for rollmax #5441 unsigned int ialgo; // decode algo to integer if (!strcmp(CHAR(STRING_ELT(algo, 0)), "fast")) @@ -162,15 +177,6 @@ SEXP frollfunR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP algo, SEXP align, SEX else error(_("Internal error: invalid %s argument in %s function should have been caught earlier. Please report to the data.table issue tracker."), "algo", "rolling"); // # nocov - int* iik = NULL; - if (!badaptive) { - if (!isInteger(ik)) - error(_("Internal error: badaptive=%d but ik is not integer"), badaptive); // # nocov - iik = INTEGER(ik); // pointer to non-adaptive window width, still can be vector when doing multiple windows - } else { - // ik is still R_NilValue from initialization. But that's ok as it's only needed below when !badaptive. - } - if (verbose) { if (ialgo==0) Rprintf(_("%s: %d column(s) and %d window(s), if product > 1 then entering parallel execution\n"), __func__, nx, nk); @@ -180,121 +186,76 @@ SEXP frollfunR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP algo, SEXP align, SEX #pragma omp parallel for if (ialgo==0) schedule(dynamic) collapse(2) num_threads(getDTthreads(nx*nk, false)) for (R_len_t i=0; i 0)"); + int *x = INTEGER(xobj); + int64_t len = XLENGTH(xobj); // can be 0 + + if (len && x[0] == NA_INTEGER) + error("Index provided to 'x' must: be sorted, have no duplicates, have no NAs"); // error text for consistency to the one below + for (int64_t i=1; i n) { + error("internal error: an > n, should not increment i in the first place"); // # nocov + } else if (an == n) { // an is same size as n, so we either have no gaps or will need to shrink an by j++ + if (lhs == rhs+n-1) { // no gaps - or a k gaps and a k dups? + ians[i] = n; // could skip if pre-fill + i++; + j++; + } else if (lhs > rhs+n-1) { // need to shrink an + j++; } else { - error(_("n must be integer")); + error("internal error: not sorted, should be been detected by now"); // # nocov + } + } else if (an < n) { // there are some gaps + if (lhs == rhs+n-1) { // gap and rhs matches the bound, so increment i and j + ians[i] = an; + i++; + j++; + } else if (lhs > rhs+n-1L) { // need to shrink an + ians[i] = an; // likely to be overwritten by smaller an if shrinking continues because i is not incremented in this iteration + j++; + } else if (lhs < rhs+n-1L) { + ians[i] = !p && lhsmessage[0]), 500, _("%s: algo %u not implemented, fall back to %u\n"), __func__, algo, (unsigned int) 1); + } + frolladaptivemaxExact(x, nx, ans, k, fill, narm, hasnf, verbose); + break; + case MIN : + if (algo==0 && verbose) { + //frolladaptiveminFast(x, nx, ans, k, fill, narm, hasnf, verbose); // frolladaptiveminFast does not exists as of now + snprintf(end(ans->message[0]), 500, _("%s: algo %u not implemented, fall back to %u\n"), __func__, algo, (unsigned int) 1); + } + frolladaptiveminExact(x, nx, ans, k, fill, narm, hasnf, verbose); + break; + case PROD : + if (algo==0) { + frolladaptiveprodFast(x, nx, ans, k, fill, narm, hasnf, verbose); + } else if (algo==1) { + frolladaptiveprodExact(x, nx, ans, k, fill, narm, hasnf, verbose); + } + break; + default: + error(_("Internal error: Unknown rfun value in froll: %d"), rfun); // #nocov } if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: processing algo %u took %.3fs\n"), __func__, algo, omp_get_wtime()-tic); - // implicit n_message limit discussed here: https://github.com/Rdatatable/data.table/issues/3423#issuecomment-487722586 + snprintf(end(ans->message[0]), 500, _("%s: processing fun %d algo %u took %.3fs\n"), __func__, rfun, algo, omp_get_wtime()-tic); } -/* fast adaptive rolling mean - fast - * when no info on NA (hasNA argument) then assume no NAs run faster version + +/* fast rolling adaptive mean - fast + * when no info on NF (has.nf argument) then assume no NFs run faster version * adaptive rollmean implemented as cumsum first pass, then diff cumsum by indexes `i` to `i-k[i]` - * if NAs detected re-run rollmean implemented as cumsum with NA support + * if NFs detected re-run rollmean implemented as cumsum with NF support */ -void fadaptiverollmeanFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose) { +void frolladaptivemeanFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose) { if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", hasna %d, narm %d\n"), "fadaptiverollmeanFast", (uint64_t)nx, hasna, (int) narm); - bool truehasna = hasna>0; // flag to re-run if NAs detected + snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", hasnf %d, narm %d\n"), "frolladaptivemeanFast", (uint64_t)nx, hasnf, (int) narm); + bool truehasnf = hasnf>0; // flag to re-run if NAs detected long double w = 0.0; double *cs = malloc(nx*sizeof(double)); // cumsum vector, same as double cs[nx] but no segfault if (!cs) { // # nocov start - ans->status = 3; // raise error - snprintf(ans->message[3], 500, _("%s: Unable to allocate memory for cumsum"), __func__); + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cumsum", __func__); // raise error free(cs); return; } // # nocov end - if (!truehasna) { + if (!truehasnf) { for (uint64_t i=0; idbl_v[i] = fill; // position in a vector smaller than obs window width - partial window } } - } else { // update truehasna flag if NAs detected - if (hasna==-1) { // raise warning - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + } else { // update truehasnf flag if NAs detected + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, re-running with extra care for NAs\n"), __func__); - w = 0.0; - truehasna = true; + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + w = 0.0; truehasnf = true; } } - if (truehasna) { - uint64_t nc = 0; // running NA counter + if (truehasnf) { + uint64_t nc = 0, pinf = 0, ninf = 0; // running NA counter uint64_t *cn = malloc(nx*sizeof(uint64_t)); // cumulative NA counter, used the same way as cumsum, same as uint64_t cn[nx] but no segfault if (!cn) { // # nocov start - ans->status = 3; // raise error - snprintf(ans->message[3], 500, _("%s: Unable to allocate memory for cum NA counter"), __func__); - free(cs); - free(cn); + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cum NA counter", __func__); // raise error + free(cs); free(cn); + return; + } // # nocov end + uint64_t *cpinf = malloc(nx*sizeof(uint64_t)); + if (!cpinf) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cum Inf counter", __func__); // raise error + free(cs); free(cn); free(cpinf); + return; + } // # nocov end + uint64_t *cninf = malloc(nx*sizeof(uint64_t)); + if (!cninf) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cum -Inf counter", __func__); // raise error + free(cs); free(cn); free(cpinf); free(cninf); return; } // # nocov end for (uint64_t i=0; i0) { \ + if (narm) { \ + if (wpinf > 0) { \ + if (wninf > 0) { \ + ans->dbl_v[i] = R_NaN; \ + } else { \ + ans->dbl_v[i] = R_PosInf; \ + } \ + } else if (wninf > 0) { \ + ans->dbl_v[i] = R_NegInf; \ + } else { \ + int thisk = k[i] - ((int) wn); \ + ans->dbl_v[i] = thisk==0 ? R_NaN : ws/thisk; \ + } \ + } else { \ + ans->dbl_v[i] = NA_REAL; \ + } \ + } else { \ + if (wpinf > 0) { \ + if (wninf > 0) { \ + ans->dbl_v[i] = R_NaN; \ + } else { \ + ans->dbl_v[i] = R_PosInf; \ + } \ + } else if (wninf > 0) { \ + ans->dbl_v[i] = R_NegInf; \ + } else { \ + ans->dbl_v[i] = ws/k[i]; \ + } \ + } + #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = fill; - } else if (!narm) { // this branch reduce number of branching in narm=1 below - if (i+1 == k[i]) { - ans->dbl_v[i] = cn[i]>0 ? NA_REAL : cs[i]/k[i]; - } else if (i+1 > k[i]) { - ans->dbl_v[i] = (cn[i] - cn[i-k[i]])>0 ? NA_REAL : (cs[i]-cs[i-k[i]])/k[i]; - } - } else if (i+1 == k[i]) { // window width equal to observation position in vector - int thisk = k[i] - ((int) cn[i]); // window width taking NAs into account, we assume single window width is int32, cum NA counter can be int64 - ans->dbl_v[i] = thisk==0 ? R_NaN : cs[i]/thisk; // handle all obs NAs and na.rm=TRUE - } else if (i+1 > k[i]) { // window width smaller than observation position in vector - int thisk = k[i] - ((int) (cn[i] - cn[i-k[i]])); // window width taking NAs into account, we assume single window width is int32, cum NA counter can be int64 - ans->dbl_v[i] = thisk==0 ? R_NaN : (cs[i]-cs[i-k[i]])/thisk; // handle all obs NAs and na.rm=TRUE + } else if (i+1 == k[i]) { // first full window + wn = cn[i]; + wpinf = cpinf[i]; + wninf = cninf[i]; + ws = cs[i]; + MEAN_WINDOW_STEP_VALUE + } else { // all the remaining full windows + wn = cn[i] - cn[i-k[i]]; // NAs in current window + wpinf = cpinf[i] - cpinf[i-k[i]]; // Inf in current window + wninf = cninf[i] - cninf[i-k[i]]; // -Inf in current window + ws = cs[i] - cs[i-k[i]]; // cumsum in current window + MEAN_WINDOW_STEP_VALUE } } - free(cn); - } // end of truehasna + free(cninf); free(cpinf); free(cn); + } // end of truehasnf free(cs); } -/* fast adaptive rolling mean exact +/* fast rolling adaptive mean - exact * extra nested loop to calculate mean of each obs and error correction * requires much more cpu * uses multiple cores */ -void fadaptiverollmeanExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose) { +void frolladaptivemeanExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose) { if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasna %d, narm %d\n"), "fadaptiverollmeanExact", (uint64_t)nx, hasna, (int) narm); - bool truehasna = hasna>0; // flag to re-run if NAs detected - if (!truehasna || !narm) { // narm=FALSE handled here as NAs properly propagated in exact algo + snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasnf %d, narm %d\n"), "frolladaptivemeanExact", (uint64_t)nx, hasnf, (int) narm); + bool truehasnf = hasnf>0; // flag to re-run if NAs detected + if (!truehasnf || !narm) { // narm=FALSE handled here as NAs properly propagated in exact algo #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = (double) (res + (err / k[i])); // adjust calculated fun with roundoff correction - } else { + } else if (ISNAN((double) w)) { if (!narm) { - ans->dbl_v[i] = (double) (w / k[i]); // NAs should be propagated + ans->dbl_v[i] = (double) w; } - truehasna = true; // NAs detected for this window, set flag so rest of windows will not be re-run + truehasnf = true; // NAs detected for this window, set flag so rest of windows will not be re-run + } else { + ans->dbl_v[i] = (double) w; // Inf and -Inf } } } - if (truehasna) { - if (hasna==-1) { // raise warning - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (truehasnf) { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) { - if (narm) { - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, re-running with extra care for NAs\n"), __func__); - } else { - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, na.rm was FALSE so in 'exact' implementation NAs were handled already, no need to re-run\n"), __func__); - } + if (narm) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + else + ansSetMsg(ans, 0, "%s: non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run\n", __func__); } } } - if (truehasna && narm) { + if (truehasnf && narm) { #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; i DBL_MAX) { - ans->dbl_v[i] = R_PosInf; // handle Inf for na.rm=TRUE consistently to base R - } else if (w < -DBL_MAX) { - ans->dbl_v[i] = R_NegInf; - } else { + if (R_FINITE((double) w)) { if (nc == 0) { // no NAs in current window res = w / k[i]; for (int j=-k[i]+1; j<=0; j++) { // sub-loop on window width to accumulate roundoff error @@ -194,38 +272,29 @@ void fadaptiverollmeanExact(double *x, uint64_t nx, ans_t *ans, int *k, double f } else { // nc == k[i] ans->dbl_v[i] = R_NaN; // this branch assume narm so R_NaN always here } + } else { + ans->dbl_v[i] = (double) w; } } } - } // end of truehasna + } // end of truehasnf } -/* fast adaptive rolling sum */ -void fadaptiverollsum(unsigned int algo, double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose) { - double tic = 0; - if (verbose) - tic = omp_get_wtime(); - if (algo==0) { - fadaptiverollsumFast(x, nx, ans, k, fill, narm, hasna, verbose); - } else if (algo==1) { - fadaptiverollsumExact(x, nx, ans, k, fill, narm, hasna, verbose); - } - if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: processing algo %u took %.3fs\n"), __func__, algo, omp_get_wtime()-tic); -} -void fadaptiverollsumFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose) { +/* fast rolling adaptive sum - fast + * same as adaptive mean fast + */ +void frolladaptivesumFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose) { if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", hasna %d, narm %d\n"), "fadaptiverollsumFast", (uint64_t)nx, hasna, (int) narm); - bool truehasna = hasna>0; + snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", hasnf %d, narm %d\n"), "frolladaptivesumFast", (uint64_t)nx, hasnf, (int) narm); + bool truehasnf = hasnf>0; long double w = 0.0; double *cs = malloc(nx*sizeof(double)); if (!cs) { // # nocov start - ans->status = 3; - snprintf(ans->message[3], 500, _("%s: Unable to allocate memory for cumsum"), __func__); + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cumsum", __func__); // raise error free(cs); return; } // # nocov end - if (!truehasna) { + if (!truehasnf) { for (uint64_t i=0; istatus = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, re-running with extra care for NAs\n"), __func__); - w = 0.0; - truehasna = true; + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + w = 0.0; truehasnf = true; } } - if (truehasna) { - uint64_t nc = 0; - uint64_t *cn = malloc(nx*sizeof(uint64_t)); + if (truehasnf) { + uint64_t nc = 0, pinf = 0, ninf = 0; // running NA counter + uint64_t *cn = malloc(nx*sizeof(uint64_t)); // cumulative NA counter, used the same way as cumsum, same as uint64_t cn[nx] but no segfault if (!cn) { // # nocov start - ans->status = 3; - snprintf(ans->message[3], 500, _("%s: Unable to allocate memory for cum NA counter"), __func__); - free(cs); - free(cn); + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cum NA counter", __func__); // raise error + free(cs); free(cn); return; } // # nocov end - for (uint64_t i=0; i0) { \ + if (narm) { \ + if (wpinf > 0) { \ + if (wninf > 0) { \ + ans->dbl_v[i] = R_NaN; \ + } else { \ + ans->dbl_v[i] = R_PosInf; \ + } \ + } else if (wninf > 0) { \ + ans->dbl_v[i] = R_NegInf; \ + } else { \ + int thisk = k[i] - ((int) wn); \ + ans->dbl_v[i] = thisk==0 ? 0.0 : ws; \ + } \ + } else { \ + ans->dbl_v[i] = NA_REAL; \ + } \ + } else { \ + if (wpinf > 0) { \ + if (wninf > 0) { \ + ans->dbl_v[i] = R_NaN; \ + } else { \ + ans->dbl_v[i] = R_PosInf; \ + } \ + } else if (wninf > 0) { \ + ans->dbl_v[i] = R_NegInf; \ + } else { \ + ans->dbl_v[i] = ws; \ + } \ } + #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = fill; - } else if (!narm) { - if (i+1 == k[i]) { - ans->dbl_v[i] = cn[i]>0 ? NA_REAL : cs[i]; - } else if (i+1 > k[i]) { - ans->dbl_v[i] = (cn[i] - cn[i-k[i]])>0 ? NA_REAL : cs[i]-cs[i-k[i]]; - } - } else if (i+1 == k[i]) { - int thisk = k[i] - ((int) cn[i]); - ans->dbl_v[i] = thisk==0 ? 0.0 : cs[i]; - } else if (i+1 > k[i]) { - int thisk = k[i] - ((int) (cn[i] - cn[i-k[i]])); - ans->dbl_v[i] = thisk==0 ? 0.0 : cs[i]-cs[i-k[i]]; + } else if (i+1 == k[i]) { // first full window + wn = cn[i]; + wpinf = cpinf[i]; + wninf = cninf[i]; + ws = cs[i]; + SUM_WINDOW_STEP_VALUE + } else { // all the remaining full windows + wn = cn[i] - cn[i-k[i]]; // NAs in current window + wpinf = cpinf[i] - cpinf[i-k[i]]; // Inf in current window + wninf = cninf[i] - cninf[i-k[i]]; // -Inf in current window + ws = cs[i] - cs[i-k[i]]; // cumsum in current window + SUM_WINDOW_STEP_VALUE } } - free(cn); + free(cninf); free(cpinf); free(cn); } free(cs); } -void fadaptiverollsumExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose) { +/* fast rolling adaptive sum - exact + * same as adaptive mean exact + */ +void frolladaptivesumExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose) { if (verbose) - snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasna %d, narm %d\n"), "fadaptiverollsumExact", (uint64_t)nx, hasna, (int) narm); - bool truehasna = hasna>0; - if (!truehasna || !narm) { + snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasnf %d, narm %d\n"), "frolladaptivesumExact", (uint64_t)nx, hasnf, (int) narm); + bool truehasnf = hasnf>0; + if (!truehasnf || !narm) { #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = (double) w; - } else { + } else if (ISNAN((double) w)) { if (!narm) { ans->dbl_v[i] = (double) w; } - truehasna = true; + truehasnf = true; // NAs detected for this window, set flag so rest of windows will not be re-run + } else { + ans->dbl_v[i] = (double) w; // Inf and -Inf } } } - if (truehasna) { - if (hasna==-1) { - ans->status = 2; - snprintf(end(ans->message[2]), 500, _("%s: hasNA=FALSE used but NA (or other non-finite) value(s) are present in input, use default hasNA=NA to avoid this warning"), __func__); - } + if (truehasnf) { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); if (verbose) { - if (narm) { - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, re-running with extra care for NAs\n"), __func__); - } else { - snprintf(end(ans->message[0]), 500, _("%s: NA (or other non-finite) value(s) are present in input, na.rm was FALSE so in 'exact' implementation NAs were handled already, no need to re-run\n"), __func__); - } + if (narm) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + else + ansSetMsg(ans, 0, "%s: non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run\n", __func__); } } } - if (truehasna && narm) { + if (truehasnf && narm) { #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; imessage[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasnf %d, narm %d\n"), "frolladaptivemaxExact", (uint64_t)nx, hasnf, (int) narm); + if (narm || hasnf==-1) { // fastest we can get for adaptive max as there is no algo='fast', therefore we drop any NA checks when has.nf=FALSE + #pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=0; idbl_v[i] = fill; + } else { + double w = R_NegInf; + for (int j=-k[i]+1; j<=0; j++) { + if (x[i+j] > w) + w = x[i+j]; + } + ans->dbl_v[i] = w; + } + } + } else { + bool *isnan = malloc(nx*sizeof(bool)); // isnan lookup - we use it to reduce ISNAN calls in nested loop + if (!isnan) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for isnan", __func__); // raise error + free(isnan); + return; + } // # nocov end + bool truehasnf = hasnf>0; + for (uint64_t i=0; idbl_v[i] = fill; + } else { + double w = R_NegInf; + for (int j=-k[i]+1; j<=0; j++) { + if (x[i+j] > w) + w = x[i+j]; + } + ans->dbl_v[i] = w; + } + } + } else { // there are some NAs + #pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=0; idbl_v[i] = fill; + } else { + double w = R_NegInf; + if (isnan[i] && ISNA(x[i])) { + w = NA_REAL; + } else { + for (int j=-k[i]+1; j<=0; j++) { + if (isnan[i+j]) { + if (ISNA(x[i+j])) { + w = NA_REAL; + break; + } else { + w = R_NaN; + } + } else if (x[i+j] > w) + w = x[i+j]; + } + } + ans->dbl_v[i] = w; + } + } + } + } +} + +/* fast rolling adaptive min - exact + * for has.nf=FALSE it will not detect if any NAs were in the input, therefore could produce incorrect result, well documented + */ +void frolladaptiveminExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose) { + if (verbose) + snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasnf %d, narm %d\n"), "frolladaptiveminExact", (uint64_t)nx, hasnf, (int) narm); + if (narm || hasnf==-1) { // fastest we can get for adaptive max as there is no algo='fast', therefore we drop any NA checks when has.nf=FALSE +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=0; idbl_v[i] = fill; + } else { + double w = R_PosInf; + for (int j=-k[i]+1; j<=0; j++) { + if (x[i+j] < w) + w = x[i+j]; + } + ans->dbl_v[i] = w; + } + } + } else { + bool *isnan = malloc(nx*sizeof(bool)); // isnan lookup - we use it to reduce ISNAN calls in nested loop + if (!isnan) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for isnan", __func__); // raise error + free(isnan); + return; + } // # nocov end + bool truehasnf = hasnf>0; + for (uint64_t i=0; idbl_v[i] = fill; + } else { + double w = R_PosInf; + for (int j=-k[i]+1; j<=0; j++) { + if (x[i+j] < w) + w = x[i+j]; + } + ans->dbl_v[i] = w; + } + } + } else { // there are some NAs +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=0; idbl_v[i] = fill; + } else { + double w = R_PosInf; + if (isnan[i] && ISNA(x[i])) { + w = NA_REAL; + } else { + for (int j=-k[i]+1; j<=0; j++) { + if (isnan[i+j]) { + if (ISNA(x[i+j])) { + w = NA_REAL; + break; + } else { + w = R_NaN; + } + } else if (x[i+j] < w) + w = x[i+j]; + } + } + ans->dbl_v[i] = w; + } + } + } + } +} + +/* fast rolling adaptive prod - fast + * same as adaptive mean fast + */ +void frolladaptiveprodFast(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose) { + if (verbose) + snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", hasnf %d, narm %d\n"), "frolladaptiveprodFast", (uint64_t)nx, hasnf, (int) narm); + bool truehasnf = hasnf>0; + long double w = 1.0; + double *cs = malloc(nx*sizeof(double)); + if (!cs) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cumprod", __func__); // raise error + free(cs); + return; + } // # nocov end + if (!truehasnf) { + for (uint64_t i=0; idbl_v[i] = cs[i]; + } else if (i+1 > k[i]) { + ans->dbl_v[i] = cs[i]/cs[i-k[i]]; + } else { + ans->dbl_v[i] = fill; + } + } + } else { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); + if (verbose) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + w = 1.0; truehasnf = true; + } + } + if (truehasnf) { + uint64_t nc = 0, pinf = 0, ninf = 0; // running NA counter + uint64_t *cn = malloc(nx*sizeof(uint64_t)); // cumulative NA counter, used the same way as cumprod, same as uint64_t cn[nx] but no segfault + if (!cn) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cum NA counter", __func__); // raise error + free(cs); free(cn); + return; + } // # nocov end + uint64_t *cpinf = malloc(nx*sizeof(uint64_t)); + if (!cpinf) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cum Inf counter", __func__); // raise error + free(cs); free(cn); free(cpinf); + return; + } // # nocov end + uint64_t *cninf = malloc(nx*sizeof(uint64_t)); + if (!cninf) { // # nocov start + ansSetMsg(ans, 3, "%s: Unable to allocate memory for cum -Inf counter", __func__); // raise error + free(cs); free(cn); free(cpinf); free(cninf); + return; + } // # nocov end + for (uint64_t i=0; i0) { \ + if (narm) { \ + if (wpinf == 0 && wninf == 0) { \ + int thisk = k[i] - ((int) wn); \ + ans->dbl_v[i] = thisk==0 ? 1.0 : (double) ws; \ + } else { \ + ans->dbl_v[i] = (wninf+(ws<0))%2 ? R_NegInf : R_PosInf; \ + } \ + } else { \ + ans->dbl_v[i] = NA_REAL; \ + } \ + } else { \ + if (wpinf == 0 && wninf == 0) { \ + ans->dbl_v[i] = (double) ws; \ + } else { \ + ans->dbl_v[i] = (wninf+(ws<0))%2 ? R_NegInf : R_PosInf; \ + } \ + } + +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=0; idbl_v[i] = fill; + } else if (i+1 == k[i]) { // first full window + wn = cn[i]; + wpinf = cpinf[i]; + wninf = cninf[i]; + ws = cs[i]; + PROD_WINDOW_STEP_VALUE + } else { // all the remaining full windows + wn = cn[i] - cn[i-k[i]]; // NAs in current window + wpinf = cpinf[i] - cpinf[i-k[i]]; // Inf in current window + wninf = cninf[i] - cninf[i-k[i]]; // -Inf in current window + ws = cs[i] / cs[i-k[i]]; // cumprod in current window + PROD_WINDOW_STEP_VALUE + } + } + free(cninf); free(cpinf); free(cn); + } + free(cs); +} +/* fast rolling adaptive prod - exact + * same as adaptive mean exact + */ +void frolladaptiveprodExact(double *x, uint64_t nx, ans_t *ans, int *k, double fill, bool narm, int hasnf, bool verbose) { + if (verbose) + snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasnf %d, narm %d\n"), "frolladaptiveprodExact", (uint64_t)nx, hasnf, (int) narm); + bool truehasnf = hasnf>0; + if (!truehasnf || !narm) { +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=0; idbl_v[i] = fill; + } else { + long double w = 1.0; + for (int j=-k[i]+1; j<=0; j++) { + w *= x[i+j]; + } + if (R_FINITE((double) w)) { + ans->dbl_v[i] = (double) w; + } else if (ISNAN((double) w)) { + if (!narm) { + ans->dbl_v[i] = (double) w; + } + truehasnf = true; // NAs detected for this window, set flag so rest of windows will not be re-run + } else { + ans->dbl_v[i] = (double) w; // Inf and -Inf + } + } + } + if (truehasnf) { + if (hasnf==-1) + ansSetMsg(ans, 2, "%s: has.nf=FALSE used but non-finite values are present in input, use default has.nf=NA to avoid this warning", __func__); + if (verbose) { + if (narm) + ansSetMsg(ans, 0, "%s: non-finite values are present in input, re-running with extra care for NFs\n", __func__); + else + ansSetMsg(ans, 0, "%s: non-finite values are present in input, na.rm=FALSE and algo='exact' propagates NFs properply, no need to re-run\n", __func__); + } + } + } + if (truehasnf && narm) { +#pragma omp parallel for num_threads(getDTthreads(nx, true)) + for (uint64_t i=0; idbl_v[i] = fill; + } else { + long double w = 1.0; + int nc = 0; + for (int j=-k[i]+1; j<=0; j++) { + if (ISNAN(x[i+j])) { + nc++; + } else { + w *= x[i+j]; + } + } + if (w > DBL_MAX) { + ans->dbl_v[i] = R_PosInf; + } else if (w < -DBL_MAX) { + ans->dbl_v[i] = R_NegInf; + } else { + if (nc < k[i]) { + ans->dbl_v[i] = (double) w; + } else { + ans->dbl_v[i] = 1.0; + } + } + } + } + } +} diff --git a/src/frollapply.c b/src/frollapply.c new file mode 100644 index 0000000000..cf9f97e2be --- /dev/null +++ b/src/frollapply.c @@ -0,0 +1,92 @@ +#include "data.table.h" // first (before Rdefines.h) for clang-13-omp, #5122 +#include + +#define MEMCPY \ +switch (TYPEOF(d)) { \ +case INTSXP: { \ + memcpy(INTEGER(d), INTEGER(s)+o, nrow*sizeof(int)); \ +} break; \ +case LGLSXP: { \ + memcpy(LOGICAL(d), LOGICAL(s)+o, nrow*sizeof(int)); \ +} break; \ +case REALSXP: { \ + memcpy(REAL(d), REAL(s)+o, nrow*sizeof(double)); \ +} break; \ +case STRSXP: { \ + for (int i=0; i ans->status) + ans->status = status; + snprintf(end(ans->message[status]), 500, _(msg), func); // func should be passed via ... really, thus this helper cannot replace all cases we need + // implicit n_message limit discussed here: https://github.com/Rdatatable/data.table/issues/3423#issuecomment-487722586 +} + /* * function to print verbose messages, stderr messages, warnings and errors stored in ans_t struct */ -void ansMsg(ans_t *ans, int n, bool verbose, const char *func) { +void ansGetMsgs(ans_t *ans, int n, bool verbose, const char *func) { for (int i=0; imessage[0]), 500, "%s: stdout 1 message\n", __func__); - snprintf(end(ans->message[0]), 500, "%s: stdout 2 message\n", __func__); + ansSetMsg(ans, 0, "%s: stdout 1 message\n", __func__); + ansSetMsg(ans, 0, "%s: stdout 2 message\n", __func__); } if (istatus == 1 || istatus == 12 || istatus == 13 || istatus == 123) { - snprintf(end(ans->message[1]), 500, "%s: stderr 1 message\n", __func__); - snprintf(end(ans->message[1]), 500, "%s: stderr 2 message\n", __func__); - ans->status = 1; + ansSetMsg(ans, 1, "%s: stderr 1 message\n", __func__); + ansSetMsg(ans, 1, "%s: stderr 2 message\n", __func__); } if (istatus == 2 || istatus == 12 || istatus == 23 || istatus == 123) { - snprintf(end(ans->message[2]), 500, "%s: stderr 1 warning\n", __func__); - snprintf(end(ans->message[2]), 500, "%s: stderr 2 warning\n", __func__); - ans->status = 2; + ansSetMsg(ans, 2, "%s: stderr 1 warning\n", __func__); + ansSetMsg(ans, 2, "%s: stderr 2 warning\n", __func__); } if (istatus == 3 || istatus == 13 || istatus == 23 || istatus == 123) { - snprintf(end(ans->message[3]), 500, "%s: stderr 1 error\n", __func__); - snprintf(end(ans->message[3]), 500, "%s: stderr 2 error\n", __func__); // printed too because errors appended and raised from ansMsg later on - ans->status = 3; + ansSetMsg(ans, 3, "%s: stderr 1 error\n", __func__); + ansSetMsg(ans, 3, "%s: stderr 2 error\n", __func__); // printed too because errors appended and raised from ansGetMsgs later on } - ans->int_v[0] = ans->status; + ans->int_v[0] = ans->status; // just a return value of status } SEXP testMsgR(SEXP status, SEXP x, SEXP k) { if (!isInteger(status) || !isInteger(x) || !isInteger(k)) @@ -57,7 +64,7 @@ SEXP testMsgR(SEXP status, SEXP x, SEXP k) { const bool verbose = GetVerbose(); int istatus = INTEGER(status)[0], nx = INTEGER(x)[0], nk = INTEGER(k)[0]; - // TODO below chunk into allocansList helper, not for 1.12.4 + // TODO below chunk into allocAnsList helper - not easy for variable length of inner vectors SEXP ans = PROTECT(allocVector(VECSXP, nk * nx)); protecti++; ans_t *vans = (ans_t *)R_alloc(nx*nk, sizeof(ans_t)); if (verbose) @@ -76,7 +83,7 @@ SEXP testMsgR(SEXP status, SEXP x, SEXP k) { } } - ansMsg(vans, nx*nk, verbose, __func__); + ansGetMsgs(vans, nx*nk, verbose, __func__); UNPROTECT(protecti); return ans; }