could benchmark_grid error for learners with different predict_type? #1273

@tdhock

Description

Hi @sebffischer
I am using benchmark_grid, which I find very useful in general for ML experiments that compare the prediction accuracy of different learning algorithms (sometimes run on the cluster).
But sometimes I forget to set predict_type = "prob", and I don't get any message about it until score(), after which I have to go back to the beginning, set predict_type, launch a bunch more jobs, and wait for the results. For example:

lrn_list <- mlr3::lrns(c("classif.featureless", "classif.rpart"))
lrn_list[[1]]$predict_type <- "prob"  # only the first learner predicts probabilities
bgrid <- mlr3::benchmark_grid(
  mlr3::tsk("sonar"),
  lrn_list,
  mlr3::rsmp("cv"))
bresult <- mlr3::benchmark(bgrid)
bresult$score(mlr3::msr("classif.auc"))

The result I got was:

> bresult$score(mlr3::msr("classif.auc"))
       nr task_id          learner_id resampling_id iteration     prediction_test classif.auc
    <int>  <char>              <char>        <char>     <int>              <list>       <num>
 1:     1   sonar classif.featureless            cv         1 <PredictionClassif>         0.5
 2:     1   sonar classif.featureless            cv         2 <PredictionClassif>         0.5
 3:     1   sonar classif.featureless            cv         3 <PredictionClassif>         0.5
 4:     1   sonar classif.featureless            cv         4 <PredictionClassif>         0.5
 5:     1   sonar classif.featureless            cv         5 <PredictionClassif>         0.5
 6:     1   sonar classif.featureless            cv         6 <PredictionClassif>         0.5
 7:     1   sonar classif.featureless            cv         7 <PredictionClassif>         0.5
 8:     1   sonar classif.featureless            cv         8 <PredictionClassif>         0.5
 9:     1   sonar classif.featureless            cv         9 <PredictionClassif>         0.5
10:     1   sonar classif.featureless            cv        10 <PredictionClassif>         0.5
11:     2   sonar       classif.rpart            cv         1 <PredictionClassif>         NaN
12:     2   sonar       classif.rpart            cv         2 <PredictionClassif>         NaN
13:     2   sonar       classif.rpart            cv         3 <PredictionClassif>         NaN
14:     2   sonar       classif.rpart            cv         4 <PredictionClassif>         NaN
15:     2   sonar       classif.rpart            cv         5 <PredictionClassif>         NaN
16:     2   sonar       classif.rpart            cv         6 <PredictionClassif>         NaN
17:     2   sonar       classif.rpart            cv         7 <PredictionClassif>         NaN
18:     2   sonar       classif.rpart            cv         8 <PredictionClassif>         NaN
19:     2   sonar       classif.rpart            cv         9 <PredictionClassif>         NaN
20:     2   sonar       classif.rpart            cv        10 <PredictionClassif>         NaN
       nr task_id          learner_id resampling_id iteration     prediction_test classif.auc
Hidden columns: uhash, task, learner, resampling
Warning message:
Measure 'classif.auc' is missing predict type 'prob' of learner 'classif.rpart' 

Note the NaN values for classif.auc and the warning about the missing predict type.
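
Checking the learners in the list built above confirms the mismatch (only the first learner was switched to "prob"):

sapply(lrn_list, function(learner) learner$predict_type)
# classif.featureless reports "prob", classif.rpart still reports "response"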

I tried to change the predict type after the benchmark, but I got an error:

> bresult$learners$learner[[2]]$predict_type <- "prob"
Error in assert_ro_binding(rhs) : Field/Binding is read-only

In terms of user experience, I believe it would be much better to error early, with something like: Error in benchmark_grid(): all learners must have the same value of predict_type. Would that be possible, please?
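
To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind; check_same_predict_type is a hypothetical helper, not an mlr3 internal:

# Hypothetical helper sketching the proposed early check (not mlr3 code):
# fail fast in benchmark_grid() if the learners disagree on predict_type.
check_same_predict_type <- function(learners) {
  types <- vapply(learners, function(learner) learner$predict_type, character(1))
  if (length(unique(types)) > 1L) {
    stop("benchmark_grid(): all learners must have the same value of predict_type, got: ",
         paste(unique(types), collapse = ", "), call. = FALSE)
  }
  invisible(learners)
}
# e.g. check_same_predict_type(lrn_list) would error for the list above.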

By the way, is there a strong argument for keeping "response" as the default predict type? It seems that "prob" is much more generally useful, since it is compatible with more measures.
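
For completeness, the workaround I use now (when I remember) is to request probabilities for all learners at construction time. If I understand the sugar functions correctly, predict_type can be passed directly to lrns(), and the response is still derived from the probabilities, so hard-label measures keep working:

# Assuming lrns() forwards predict_type to each learner it constructs:
lrn_list <- mlr3::lrns(c("classif.featureless", "classif.rpart"), predict_type = "prob")
bgrid <- mlr3::benchmark_grid(mlr3::tsk("sonar"), lrn_list, mlr3::rsmp("cv"))
bresult <- mlr3::benchmark(bgrid)
# both AUC (needs prob) and error rate (needs response) can then be scored
bresult$aggregate(mlr3::msrs(c("classif.auc", "classif.ce")))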
