Using Caret to select features within folds of a cross validation

In the caret package, is there any way to use the recursive feature elimination function within the folds of a cross-validation scheme defined in trainControl and passed to a train call that uses tuning grids?



I love the recursive feature elimination function, but it really should be applied to the training folds in cross validation and then tested on the hold-out fold.



I've played around with a bunch of different methods to do this, but none are perfect. For example, I can make my own cross-validation folds and run trainControl with method = 'none', but that won't utilize the tuning grid in train (a resampled evaluation set is needed for that). I can also make my own CV folds and use method = 'cv' in trainControl (a tuning grid works here), but the best tune is chosen on the hold-out samples that trainControl generates, not on my hold-out.



Is there a way to tell caret to evaluate the models with the tuning grids on my pre-specified hold-out fold (the one taken prior to feature elimination)?
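
(A sketch of one possible route, using the index and indexOut arguments of trainControl, which the SOLUTION below also relies on. This is untested against the exact workflow here, and data, target, my_grid, and holdout_rows are placeholder names: index lists the rows each candidate model is fit on, and indexOut lists the rows the tuning grid is scored on.)

library(caret)

## Minimal sketch, assuming placeholder objects:
##   data         - full data frame with an outcome column 'target'
##   holdout_rows - pre-specified hold-out row indices (taken before RFE)
##   my_grid      - a tuning grid for the chosen model
train_rows <- setdiff(seq_len(nrow(data)), holdout_rows)

ctrl <- trainControl(index    = list(Fold1 = train_rows),    # rows each candidate model is fit on
                     indexOut = list(Fold1 = holdout_rows),  # rows the tuning grid is evaluated on
                     savePredictions = TRUE)

fit <- train(target ~ ., data = data,
             method    = "rf",       # any caret method; "rf" only for illustration
             tuneGrid  = my_grid,
             trControl = ctrl)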



In my workflow, I am testing a few different model types, each with its own tuning grid. There are parts of caret I really like, and I've spent a ton of time on this, so I'd like to use it, but this is a deal breaker if I can't get it to work. I'm open to any suggestions!



Thanks in advance-



SOLUTION:
My solution may not be the most efficient, but it seems to work. I made my cross validation folds using information here: https://stats.stackexchange.com/questions/61090/how-to-split-a-data-set-to-do-10-fold-cross-validation.
Using createFolds (a caret function) does not create equally sized folds, so I opted for the second solution from that post. It looks like you might be able to do it with caret's stratified sampling, but I haven't explored that yet.
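
(For reference, a minimal sketch of the stratified route, untested here; data$target is assumed to be the outcome column, matching the code below. createFolds() splits within each class of the outcome, so class balance is preserved even though fold sizes may differ by a few rows.)

library(caret)

## Stratified fold assignment: createFolds() partitions each outcome class
## separately, so the folds are balanced by class but only roughly equal in size.
set.seed(123)
cv_folds <- createFolds(data$target, k = 10, list = TRUE, returnTrain = FALSE)
sapply(cv_folds, length)   # inspect the (roughly equal) fold sizes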



This code uses a bootstrap approach within each CV fold and, at each bootstrap iteration, predicts all of the observations in the hold-out fold.



library(caret)
library(dplyr)

## Make the folds for the cross validation: assign rows to 10 roughly equal
## groups, then shuffle the assignments
folds <- cut(seq_len(nrow(data)), breaks = 10, labels = FALSE) %>%
  sample(., length(.), replace = FALSE)

## Specify 'features' as a data frame ahead of time (accumulates the
## features selected in each fold)
features <- data.frame(features = character(), fold = integer())

for (f in 1:10) {

  testIndexes  <- which(folds == f)
  trainIndexes <- which(folds != f)

  ## 500 bootstrap resamples of the training rows; every resample is scored
  ## on the same hold-out fold
  trainIndexList <- replicate(500, sample(trainIndexes, length(trainIndexes), replace = TRUE), simplify = FALSE)
  testIndexList  <- replicate(500, testIndexes, simplify = FALSE)

  testData  <- data[testIndexes, ]
  trainData <- data[-testIndexes, ]

  ## Make the train control object
  ## (modfun is a user-defined summary function; centering/scaling should be
  ## requested in train() via preProcess = c('center', 'scale'), not here)
  train_control <- trainControl(method = 'boot',
                                number = 1,
                                summaryFunction = modfun,
                                index = trainIndexList,
                                indexOut = testIndexList,
                                classProbs = TRUE,
                                savePredictions = TRUE)

  ## Feature selection: control for the recursive feature elimination
  rfe_control <- rfeControl(functions = rfFuncs, method = 'cv', number = 10)

  ## Run RFE on the training fold only (column 1 is assumed to be 'target')
  fs_results <- rfe(trainData[, 2:ncol(trainData)],
                    trainData[, 'target'],
                    sizes = 2:(ncol(trainData) - 1),
                    rfeControl = rfe_control)

  use_features <- c('target', predictors(fs_results))

  ## Record which features were selected in this fold
  features <- predictors(fs_results) %>%
    data.frame(features = .) %>%
    mutate(fold = f) %>%
    rbind(features, .)

  data_min <- data[, use_features] %>% data.frame()

  ## ...(modeling code, including train() calls and desired output)...

}


I haven't tried to do an lapply instead of a for loop yet. I'd appreciate any suggestions for efficiency.
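
(One possible refactor, as a sketch only: run_fold() below is a hypothetical wrapper, not the code actually used above. Moving the body of the loop into a function and iterating with lapply keeps the per-fold results in a list and makes it easy to switch to parallel::mclapply later.)

## Hypothetical wrapper: run_fold() would hold the per-fold code above
## (index lists, rfe, train calls) and return whatever the loop accumulates.
run_fold <- function(f, data, folds) {
  testIndexes  <- which(folds == f)
  trainIndexes <- which(folds != f)
  ## ... feature selection and modeling for this fold ...
  list(fold = f, test_rows = testIndexes)   # placeholder return value
}

fold_results <- lapply(1:10, run_fold, data = data, folds = folds)
## or in parallel: parallel::mclapply(1:10, run_fold, data = data, folds = folds)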










Tags: cross-validation, r-caret, feature-selection

asked Aug 3 '17 at 18:41 by RoseS · edited Nov 8 at 9:19 by jmuhlenkamp

  • had you got any solutions
    – Shubham Sharma
    Jan 24 at 13:22

  • I ended up assigning folds outside the loop: Then, for
    – RoseS
    Mar 16 at 17:15