Using caret to select features within the folds of a cross-validation
In the caret package, is there any way to run the recursive feature elimination function within the folds of a cross-validation scheme defined in a trainControl object that is then passed to a train call using a tuning grid?
I love the recursive feature elimination function, but it really should be applied only to the training folds of the cross-validation and then evaluated on the hold-out fold.
I've played around with a bunch of different approaches, but none is quite right. For example, I can make my own cross-validation folds and run trainControl with method = 'none', but then train won't use a tuning grid (an evaluation set is needed for that). I can also make my own cv folds and use method = 'cv' in trainControl (a tuning grid works here), but the best tune is then chosen on the hold-out samples that trainControl generates, not on my hold-out.
Is there a way to tell caret to evaluate the models with the tuning grids on my pre-specified hold-out fold (the one taken prior to feature elimination)?
In my workflow, I am testing a few different model types, each with its own tuning grid. There are parts of caret I really like, and I've spent a ton of time on this, so I'd like to use it, but this is a deal-breaker if I can't get it to work. I'm open to any suggestions!
Thanks in advance-
SOLUTION:
My solution may not be the most efficient, but it seems to work. I made my cross-validation folds using the information here: https://stats.stackexchange.com/questions/61090/how-to-split-a-data-set-to-do-10-fold-cross-validation.
Using createFolds (a caret function) did not give me equal-sized folds, so I opted for the second solution from that thread. It looks like you might be able to do it with caret's stratified sampling, but I haven't explored that yet.
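For anyone who wants to try the stratified route, a minimal sketch of what that might look like with createFolds (I have not run this through the rest of the pipeline; target is the outcome column):

cv_folds <- caret::createFolds(data$target, k = 10, list = TRUE, returnTrain = FALSE)
## createFolds stratifies on the outcome; cv_folds[[f]] would then play the role of
## testIndexes for fold f in the loop below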
The code below uses a bootstrap approach within each cv fold and, in every bootstrap iteration, predicts all of the observations in that fold's hold-out set.
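The two trainControl arguments doing the work below are index and indexOut: each element of index is the set of rows a resample is fit on, and the matching element of indexOut is the set of rows the tuning grid is scored on. A toy illustration of their shape within one cv fold (row numbers are made up):

## two bootstrap resamples drawn, with replacement, from the training rows of one cv fold
index    <- list(c(2, 2, 5, 7, 9, 9), c(3, 5, 5, 8, 10, 10))
## every resample in that fold is scored on the same pre-specified hold-out rows
indexOut <- list(c(1, 4, 6), c(1, 4, 6))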
library(caret)
library(dplyr)

## Make the folds for the cross validation
folds <- cut(seq(1, nrow(data)), breaks = 10, labels = FALSE) %>%
  sample(., length(.), replace = FALSE)

## Collect the selected features across folds (specified as a data frame ahead of the loop)
features <- data.frame()

for (f in 1:10) {
  testIndexes  <- which(folds == f)
  trainIndexes <- which(folds != f)
  ## 500 bootstrap resamples of the training rows; each one is scored on the same hold-out rows
  trainIndexList <- replicate(500, sample(trainIndexes, length(trainIndexes), replace = TRUE),
                              simplify = FALSE)
  testIndexList  <- replicate(500, testIndexes, simplify = FALSE)
  testData  <- data[testIndexes, ]
  trainData <- data[-testIndexes, ]

  ## Make the train control object
  ## (modfun is my custom summary function, defined elsewhere; centering and scaling are
  ## requested through train()'s preProcess argument rather than preProcOptions)
  train_control <- trainControl(method = 'boot',
                                number = 1,  ## the resamples actually used are those in index/indexOut
                                summaryFunction = modfun,
                                index = trainIndexList,
                                indexOut = testIndexList,
                                classProbs = TRUE,
                                savePredictions = TRUE)

  ## Feature Selection
  ## Make the control for the recursive feature elimination
  rfe_control <- rfeControl(functions = rfFuncs, method = 'cv', number = 10)

  ## Run RFE on the training rows only (column 1 of data is the outcome, 'target')
  fs_results <- rfe(trainData[, 2:ncol(trainData)],
                    trainData[, 'target'],
                    sizes = 2:(ncol(trainData) - 1),  ## up to the full set of predictors
                    rfeControl = rfe_control)

  ## Keep only the selected features (plus the outcome) and record them for this fold
  use_features <- c('target', predictors(fs_results))
  features <- predictors(fs_results) %>%
    data.frame(features = .) %>%
    mutate(fold = f) %>%
    rbind(features, .)
  data_min <- data[, use_features] %>% data.frame()

  ...(modeling code, including train functions and desired output)...
}
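For completeness, the elided modelling block inside the loop would just be ordinary train calls on data_min using train_control; a minimal sketch of what one such call might look like (the random-forest method, the ROC metric, and the mtry grid are placeholders, not my actual models):

model_fit <- train(target ~ ., data = data_min,
                   method = 'rf',
                   metric = 'ROC',                     ## assumes modfun reports an ROC-type metric
                   preProcess = c('center', 'scale'),  ## centering/scaling estimated per resample
                   tuneGrid = expand.grid(mtry = c(2, 4, 8)),
                   trControl = train_control)
hold_out_preds <- model_fit$pred  ## savePredictions keeps the predictions on the indexOut rows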
I haven't tried using lapply instead of the for loop yet. I'd appreciate any suggestions for efficiency.
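On the lapply question, one option is to wrap the loop body in a function of the fold index and bind the per-fold results afterwards. A rough sketch (run_fold is a made-up name, and the elided body is the same as in the for loop above):

run_fold <- function(f) {
  testIndexes  <- which(folds == f)
  trainIndexes <- which(folds != f)
  ## ... same body as the for loop above: bootstrap index lists, trainControl, rfe, train ...
  ## return whatever per-fold output is needed, e.g. the selected features
  data.frame(features = predictors(fs_results), fold = f)
}
features <- do.call(rbind, lapply(1:10, run_fold))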
cross-validation r-caret feature-selection
Had you got any solutions? – Shubham Sharma, Jan 24 at 13:22
I ended up assigning folds outside the loop: Then, for – RoseS, Mar 16 at 17:15