Using caret to select features within the folds of cross-validation

Multi tool use
up vote
1
down vote
favorite
In the caret package, is there any way to use the recursive feature elimination function within the folds of a cross-validation scheme, i.e., in a trainControl object that is passed to a train call that uses tuning grids?
I love the recursive feature elimination function, but it really should be applied to the training folds in cross validation and then tested on the hold-out fold.
I've played around with a bunch of different methods to do this, but none are perfect. For example, I can make my own cross-validation folds and run trainControl with method = 'none', but that won't use the tuning grid in train (an evaluation set is needed for that). I can also make my own CV folds and use method = 'cv' in the trainControl (a tuning grid works here), but then the best tune is chosen on the hold-out samples that trainControl generates, not on my hold-out.
Is there a way to tell caret to evaluate the models with the tuning grids on my pre-specified hold-out fold (the one taken prior to feature elimination)?
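Roughly, this is the shape of what I'm after (just a sketch, not my actual setup: my_train_rows and my_holdout_rows stand in for the row indices of one pre-specified fold, and my_grid for whatever tuning grid the model type needs):
## Sketch only -- force train to tune on one pre-specified hold-out
sketch_control <- trainControl(method = 'cv',
                               index = list(Fold1 = my_train_rows),
                               indexOut = list(Fold1 = my_holdout_rows))
sketch_fit <- train(target ~ ., data = data,
                    method = 'rf',          ## placeholder model type
                    tuneGrid = my_grid,     ## placeholder tuning grid
                    trControl = sketch_control)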
In my work-flow, I am testing a few different model types with their own tuning grids. There are parts of caret I really like, and I've spent a ton of time on this so I'd like to use it, but this is a deal breaker if I can't get it to work. I'm open to any suggestions!
Thanks in advance-
SOLUTION:
My solution may not be the most efficient, but it seems to work. I made my cross-validation folds using the information here: https://stats.stackexchange.com/questions/61090/how-to-split-a-data-set-to-do-10-fold-cross-validation.
Using createFolds (a caret function) did not give me equal-sized folds, so I opted for the second solution from that post (the createFolds route is sketched below for reference). It looks like you might be able to do it with caret's stratified sampling, but I haven't explored that yet.
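For reference, the createFolds call I tried first looks roughly like this (sketch only; 'target' is my outcome column, and the folds are stratified on it when it is a factor):
## Sketch of the createFolds route (not what I ended up using) -- it returns a
## list of hold-out row indices, one element per fold
cf_folds <- createFolds(data$target, k = 10, list = TRUE, returnTrain = FALSE)
lengths(cf_folds)  ## check how balanced the fold sizes actually are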
This code uses a bootstrap approach within each CV fold and, at each iteration, predicts all of the observations in the hold-out fold.
## Load the required packages
library(caret)
library(dplyr)

## Make the folds for the cross validation
folds <- cut(seq(1, nrow(data)), breaks = 10, labels = FALSE) %>%
  sample(., length(.), replace = FALSE)

## Specify the features data frame ahead of time so it can be grown inside the loop
features <- data.frame(features = character(), fold = integer())

for (f in 1:10) {
  ## Row indices for the hold-out fold and the training folds
  testIndexes  <- which(folds == f)
  trainIndexes <- which(folds != f)

  ## 500 bootstrap resamples of the training rows; each resample is evaluated
  ## on the full hold-out fold
  trainIndexList <- replicate(500,
                              sample(trainIndexes, length(trainIndexes), replace = TRUE),
                              simplify = FALSE)
  testIndexList  <- replicate(500, testIndexes, simplify = FALSE)

  testData  <- data[testIndexes, ]
  trainData <- data[-testIndexes, ]

  ## Make the train control object (modfun is my custom summary function).
  ## The supplied index/indexOut lists define the resamples, so number = 1 is
  ## effectively ignored. Centering and scaling are requested via the
  ## preProcess argument of train(), not via trainControl.
  train_control <- trainControl(method = 'boot',
                                number = 1,
                                summaryFunction = modfun,
                                index = trainIndexList,
                                indexOut = testIndexList,
                                classProbs = TRUE,
                                savePredictions = TRUE)

  ## Feature selection
  ## Make the control for the recursive feature elimination
  rfe_control <- rfeControl(functions = rfFuncs, method = 'cv', number = 10)

  ## Run RFE on the training fold only (column 1 of the data is the target,
  ## so there are ncol - 1 predictors)
  fs_results <- rfe(trainData[, 2:ncol(trainData)],
                    trainData[, 'target'],
                    sizes = 2:(ncol(trainData) - 1),
                    rfeControl = rfe_control)

  ## Keep the target plus the selected predictors, and record them for this fold
  use_features <- c('target', predictors(fs_results))
  features <- predictors(fs_results) %>%
    data.frame(features = .) %>%
    mutate(fold = f) %>%
    rbind(features, .)
  data_min <- data[, use_features] %>% data.frame()

  ...(modeling code, including train functions and desired output)...
}
I haven't tried to do an lapply instead of a for loop yet. I'd appreciate any suggestions for efficiency.
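For what it's worth, an lapply version would probably look something like this (untested sketch; run_fold is a hypothetical wrapper around the body of the for loop above):
## Untested sketch of the lapply alternative -- run_fold() wraps the loop body
## and returns whatever per-fold output is needed (selected features, results, ...)
run_fold <- function(f, data, folds) {
  testIndexes <- which(folds == f)
  trainData   <- data[-testIndexes, ]
  testData    <- data[testIndexes, ]
  ## ...feature selection and train() calls as in the loop above...
  list(fold = f, features = NULL)  ## placeholder return value
}
fold_results <- lapply(1:10, run_fold, data = data, folds = folds)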
cross-validation r-caret feature-selection
Have you found a solution?
– Shubham Sharma
Jan 24 at 13:22
I ended up assigning folds outside the loop: Then, for
– RoseS
Mar 16 at 17:15