Wishlist / redesign proposal for predict_spatial() #99

CBonannella · 2025-08-15T16:56:16Z

Context

I’m using mlr3spatial::predict_spatial() for large-area species-distribution mapping (binary + probabilistic output). While porting a stacked model pipeline, I ran into several pain points that make the current helper too narrow for real-world classification workflows.

Below is a short review of the current behaviour, why it breaks, and extensions that would solve the issues without changing defaults.

Current limitations (classification)

| Issue | Why it matters |
| --- | --- |
| Always writes `pred$response` (hard labels) | No way to output class probabilities, even if the learner was trained with `predict_type = "prob"`. I saw there's a pull request up for this, but it's quite old (about 2 years). |
| Always calls `terra::categories()` | Turns the raster into a categorical one, so continuous probability rasters are impossible. |
| Fixed datatype (`FLT8S`) and no GDAL options | `FLT8S` is great for regression but not always ideal for classification. It produces huge 64-bit files, and there is no way to request a more compact datatype or to set tiling, compression, etc. |

The PR I mentioned above doesn't address GDAL options either; it focuses only on implementing probabilities. That would already be a good step forward, but you know, one can dream, no?
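For a rough sense of scale, here is a back-of-the-envelope sketch of the uncompressed size of a single-band raster under a few terra datatypes. The grid dimensions are a made-up example, not taken from a real project; the bytes per cell follow directly from the datatype names (8-byte float, 4-byte float, 2-byte and 1-byte unsigned integer).

```r
# Bytes per cell for a few terra/GDAL datatypes
bytes_per_cell <- c(FLT8S = 8, FLT4S = 4, INT2U = 2, INT1U = 1)

# Hypothetical grid size, chosen only for illustration
n_rows <- 40000
n_cols <- 40000
n_cells <- n_rows * n_cols

# Uncompressed size in gigabytes (decimal) for one band
size_gb <- n_cells * bytes_per_cell / 1e9
print(size_gb)

# Size relative to the current fixed FLT8S default
print(size_gb / size_gb[["FLT8S"]])
```

Compression options would shrink all of these further, but the ratio between datatypes is already fixed before any compression is applied.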

What a more general helper should cover

It's great that terra::writeRaster() uses data chunking, which helps a lot in real-world / production pipelines, but here is what I think could be improved:

- Respect predict_type
  - Keep the behavior as it is for response.
  - For binary classification, write a single layer (pred$prob[, "1"]); if the prob object has ncol > 2, let the user choose which classes are written as raster layers.
- Add datatype and GDAL options as function arguments
- Keep regression unchanged
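To make the datatype / GDAL point concrete, here is a minimal sketch of a chunked write with terra where both are set explicitly. It is not the code of predict_spatial(); the toy raster, the layer name `p_1`, and the chosen options (`FLT4S`, `COMPRESS=DEFLATE`) are assumptions for illustration, and the "prediction" is just the input values copied through.

```r
library(terra)

# Toy in-memory raster standing in for real predictor data (assumption)
r <- rast(nrows = 20, ncols = 20, vals = runif(400))

# Empty single-layer template for the output
out <- rast(r, nlyrs = 1)
names(out) <- "p_1"

f <- tempfile(fileext = ".tif")

# Extra arguments to writeStart() are handled as in writeRaster(),
# so datatype and gdal can be supplied here
b <- writeStart(out, filename = f, overwrite = TRUE,
                datatype = "FLT4S", gdal = "COMPRESS=DEFLATE")

# Process the raster block by block, as suggested by writeStart()
for (i in seq_len(b$n)) {
  # Read one block of rows; a real helper would predict on these values
  v <- values(r, row = b$row[i], nrows = b$nrows[i])
  writeValues(out, v, b$row[i], b$nrows[i])
}

out <- writeStop(out)
print(out)
```

A 32-bit float is usually enough precision for probabilities in [0, 1], which is why `FLT4S` is used in this sketch.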

Pseudocode / sketch

predict_spatial(
  newdata,
  learner,
  chunksize  = 200L,
  format     = c("terra","raster","stars"),
  filename   = NULL,
  prob_class = NULL,   # "auto" (binary positive), character vector, or NULL (=all)
  datatype   = "FLT8S",
  gdal_opts  = NULL
)

So basically:

- If learner$predict_type == "prob":
  - binary: write one band (prob_class defaults to the second factor level, i.e. the "positive" class)
  - multiclass: write one band per class, or only the subset given by prob_class
  - skip terra::categories() entirely; name the output layer(s) p_<class>, where <class> is the corresponding class label (coming from prob_class)
- Else (hard labels, predict_type == "response"): keep the current categorical workflow
- Add datatype and gdal_opts arguments and pass them through to terra::writeStart()
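The class-selection logic above could look roughly like the following. This is only a sketch in base R: `select_prob_columns()` is a hypothetical helper name, not an existing mlr3spatial function, and it assumes the probability matrix has one named column per class, as `pred$prob` does.

```r
# Hypothetical helper: pick the probability columns to write as layers
# and rename them p_<class>.
select_prob_columns <- function(prob, prob_class = "auto") {
  classes <- colnames(prob)
  if (is.null(prob_class)) {
    # NULL = all classes
    keep <- classes
  } else if (identical(prob_class, "auto")) {
    # Binary: second column, taken here as the "positive" class (assumption
    # following the proposal); multiclass: all classes
    keep <- if (length(classes) == 2L) classes[2L] else classes
  } else {
    unknown <- setdiff(prob_class, classes)
    if (length(unknown) > 0L) {
      stop("Unknown class(es): ", paste(unknown, collapse = ", "))
    }
    keep <- prob_class
  }
  out <- prob[, keep, drop = FALSE]
  colnames(out) <- paste0("p_", keep)
  out
}

# Binary example: only the second class is kept
prob_bin <- matrix(c(0.8, 0.2,
                     0.3, 0.7),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, c("0", "1")))
res_bin <- select_prob_columns(prob_bin, "auto")
print(res_bin)

# Multiclass example: user-selected subset of classes
prob_multi <- matrix(c(0.5, 0.3, 0.2,
                       0.1, 0.6, 0.3),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(NULL, c("a", "b", "c")))
res_multi <- select_prob_columns(prob_multi, c("a", "c"))
print(res_multi)
```

Each column of the result would map to one output band, so the number of bands is known before the raster file is opened for writing.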

I'm currently prototyping this ad hoc myself, but I'd be happy to contribute / help with these implementations, do some testing, and eventually update docs/examples where needed.

Let me know if this direction is acceptable, or if there are design constraints I missed.

Thanks for the great package!

Replies: 0 comments