Get predictions on test sets in MLR

I'm fitting binary classification models with the mlr package in R. For each model, I run a cross-validation with embedded feature selection using the "selectFeatures" function and retrieve the mean AUC over the test sets. Next, I would like to retrieve the predictions on the test set of each fold, but this function does not seem to support that. I already tried plugging the selected predictors into the "resample" function to get them. It works, but the performance metrics differ from those reported by "selectFeatures", which is not suitable for my analysis. I also checked whether this is possible with the caret package, but I did not see a solution at first glance. Any idea how to do it?

Here is my code with synthetic data, including my attempt with the "resample" function (again: not suitable in its current version, as the performance metrics are different).



# 1. Generate a synthetic dataset for supervised learning (two classes)
#######################################################################

# install.packages("mlbench")  # run once if the package is not installed
library(mlbench)

# generate 1000 rows, 21 quantitative candidate predictors and 1 target variable
p <- mlbench.waveform(1000)

# convert list into data frame
dataset <- as.data.frame(p)

# drop third class to get 2 classes (and drop the now-empty factor level)
dataset2 <- droplevels(subset(dataset, classes != 3))

# 2. Perform cross-validation with embedded feature selection
#############################################################

library(BBmisc)
library(nnet)
library(mlr)

# Choice of algorithm, i.e. neural network
mL <- makeLearner("classif.nnet", predict.type = "prob")

# Choice of sampling plan: 10-fold cross-validation with stratification of target classes
mRD <- makeResampleDesc("CV", iters = 10, stratify = TRUE)

# Choice of feature selection strategy: sequential floating forward selection (stepwise family);
# alpha is the minimal improvement required to add a feature
ctrl <- makeFeatSelControlSequential(method = "sffs", maxit = NA, alpha = 0.001)

# Choice of seed
set.seed(12)

# Choice of data
mCT <- makeClassifTask(data = dataset2, target = "classes")

# Perform the method
result <- selectFeatures(mL, mCT, mRD, control = ctrl, measures = list(mlr::auc, mlr::acc, mlr::brier))

# Retrieve AUC and selected variables
analyzeFeatSelResult(result)
# Result: auc.test.mean=0.9614525 Variables selected: x.10, x.11, x.15, x.17, x.18

# 3. Retrieve predictions on test sets (to later perform DeLong tests on AUCs derived from multiple sets of candidate variables)
################################################################################################################################

# create new dataset with selected predictors
keep <- c("x.10", "x.11", "x.15", "x.17", "x.18", "classes")
dataset3 <- dataset2[, names(dataset2) %in% keep]

# Perform the same task with resample() instead of selectFeatures() to get predictions on the test sets
mL <- makeLearner("classif.nnet", predict.type = "prob")
mRD <- makeResampleDesc("CV", iters = 10, stratify = TRUE)
set.seed(12)
mCT <- makeClassifTask(data = dataset3, target = "classes")
r1r <- resample(mL, mCT, mRD, measures = list(mlr::auc, mlr::acc, mlr::brier))
# Result: auc.test.mean=0.9673023
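
One plausible contributor to the gap between the two AUC values is randomness: a resample description is instantiated anew by each call, and classif.nnet starts from random initial weights, so two runs need not agree even with the same seed set beforehand. The sketch below illustrates this under assumptions and is not the setup from the question: it swaps in mlr's built-in pid.task and the deterministic classif.logreg learner, uses plain "sfs" instead of "sffs", and draws the folds once with makeResampleInstance so that selectFeatures and resample score exactly the same splits. Under those assumptions the two AUC values would be expected to coincide.

```r
library(mlr)

set.seed(12)

# Assumption: pid.task (binary task shipped with mlr) and logistic regression
# stand in for the waveform data and the neural network of the question
task <- pid.task
lrn <- makeLearner("classif.logreg", predict.type = "prob")

# Draw the folds once; the same instance is reused by both calls below
rdesc <- makeResampleDesc("CV", iters = 10, stratify = TRUE)
rin <- makeResampleInstance(rdesc, task = task)

# Sequential forward selection, optimizing the first measure (AUC)
ctrl <- makeFeatSelControlSequential(method = "sfs", alpha = 0.001)
sel <- selectFeatures(lrn, task, rin, measures = list(mlr::auc, mlr::acc),
                      control = ctrl, show.info = FALSE)

# Re-evaluate the selected features on the very same folds
task_sel <- subsetTask(task, features = sel$x)
r <- resample(lrn, task_sel, rin, measures = list(mlr::auc, mlr::acc),
              show.info = FALSE)

sel$y["auc.test.mean"]   # AUC reported by selectFeatures
r$aggr["auc.test.mean"]  # AUC reported by resample on the same folds

# Test-set predictions for the selected feature set, one row per observation
pred <- as.data.frame(getRRPredictions(r))
```

With a learner that has its own random component, such as classif.nnet, sharing the resample instance removes only the fold-related part of the variation.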









cross-validation feature-selection mlr

edited Nov 9 at 10:06
asked Nov 9 at 8:57
Chris

1 Answer

ctrl is missing in your code.

To get the predictions of your resample object, just use getRRPredictions(r1r) or r1r$pred. The per-fold performance values are in r1r$measures.test.
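
As a minimal sketch of what these accessors return, the example below resamples a learner and pulls out the test-set predictions and the per-fold measures. It assumes mlr's built-in pid.task and the classif.logreg learner in place of the data and neural network from the question; the object names are illustrative only.

```r
library(mlr)

set.seed(12)

# Assumption: pid.task and logistic regression replace the question's data and learner
lrn <- makeLearner("classif.logreg", predict.type = "prob")
rdesc <- makeResampleDesc("CV", iters = 10, stratify = TRUE)
r <- resample(lrn, pid.task, rdesc, measures = list(mlr::auc, mlr::acc),
              show.info = FALSE)

# Predictions on the test sets: one row per observation, with the fold in `iter`
pred_df <- as.data.frame(getRRPredictions(r))
head(pred_df)

# Performance per fold: one row per iteration, one column per measure
fold_perf <- r$measures.test

# Aggregated performance (here the mean over the test folds)
r$aggr
```

The truth and probability columns of pred_df, together with iter, are what a later per-fold or pooled comparison of AUCs would start from.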






• Indeed, ctrl was missing; I have added it. My question is not about getting predictions from the resample object, which I already did (it was my first attempt). The issue with that attempt is that the resample function gives a different AUC than selectFeatures. I have edited my question to make it clearer. Thanks!
  – Chris
  Nov 9 at 10:10

• You could use "makeFeatSelWrapper" as an alternative. I also get different results, like you...
  – PhilippPro
  Nov 9 at 12:52

• Does makeFeatSelWrapper do the whole thing, i.e. CV + feature selection + predicted values?
  – Chris
  Nov 9 at 15:10

• That does indeed seem to be a solution. What is strange, however, is that I end up with a model without variables for logistic regression and get errors for a neural network. I'm going to open a separate question for this.
  – Chris
  Nov 12 at 12:45
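
The makeFeatSelWrapper route mentioned in the comments can be sketched as follows. This is an illustration under assumptions rather than a confirmed fix for the question: it uses mlr's built-in pid.task, the classif.logreg learner and plain "sfs" to keep the run short, and the fold counts are arbitrary. The wrapper turns feature selection into part of the learner, so an outer resample call yields test-set predictions from models whose features were chosen on the matching training fold only.

```r
library(mlr)

set.seed(12)

# Assumption: pid.task (binary task shipped with mlr) stands in for the question's data
task <- pid.task

# Base learner; logistic regression is used here only to keep the sketch fast
base_lrn <- makeLearner("classif.logreg", predict.type = "prob")

# Inner resampling drives the feature selection within each outer training set
inner <- makeResampleDesc("CV", iters = 3, stratify = TRUE)
ctrl <- makeFeatSelControlSequential(method = "sfs", alpha = 0.01)

# Learner that performs feature selection before fitting the final model
fs_lrn <- makeFeatSelWrapper(base_lrn, resampling = inner,
                             measures = list(mlr::auc),
                             control = ctrl, show.info = FALSE)

# Outer resampling produces the test-set predictions
outer <- makeResampleDesc("CV", iters = 5, stratify = TRUE)
res <- resample(fs_lrn, task, outer,
                measures = list(mlr::auc, mlr::acc, mlr::brier),
                extract = getFeatSelResult, show.info = FALSE)

# Predicted probabilities on the test sets, one row per observation
pred <- as.data.frame(getRRPredictions(res))

# Features selected in each outer fold (they may differ between folds)
selected <- lapply(res$extract, function(x) x$x)
```

Because the selection is repeated inside every outer fold, the selected feature sets can vary from fold to fold, and the aggregated AUC estimates the whole procedure rather than one fixed set of variables.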











answered Nov 9 at 9:52
PhilippPro