Error when trying to pass custom metric in Caret package

Related question - 1

I have a dataset like so:

> head(training_data)
  year     month channelGrouping visitStartTime visitNumber timeSinceLastVisit browser
1 2016   October          Social     1477775021           1                  0  Chrome
2 2016 September          Social     1473037945           1                  0  Safari
3 2017      July  Organic Search     1500305542           1                  0  Chrome
4 2017      July  Organic Search     1500322111           2              16569  Chrome
5 2016    August          Social     1471890172           1                  0  Safari
6 2017       May          Direct     1495146428           1                  0  Chrome
  operatingSystem isMobile continent     subContinent       country      source   medium
1         Windows        0  Americas    South America        Brazil youtube.com referral
2       Macintosh        0  Americas Northern America United States youtube.com referral
3         Windows        0  Americas Northern America        Canada      google  organic
4         Windows        0  Americas Northern America        Canada      google  organic
5       Macintosh        0    Africa   Eastern Africa        Zambia youtube.com referral
6         Android        1  Americas Northern America United States    (direct)
  isTrueDirect hits pageviews positiveTransaction
1            0    1         1                  No
2            0    1         1                  No
3            0    5         5                  No
4            1    3         3                  No
5            0    1         1                  No
6            1    6         6                  No



> str(training_data)
'data.frame':   1000 obs. of  18 variables:
 $ year               : int  2016 2016 2017 2017 2016 2017 2016 2017 2017 2016 ...
 $ month              : Factor w/ 12 levels "January","February",..: 10 9 7 7 8 5 10 3 3 12 ...
 $ channelGrouping    : chr  "Social" "Social" "Organic Search" "Organic Search" ...
 $ visitStartTime     : int  1477775021 1473037945 1500305542 1500322111 1471890172 1495146428 1476003570 1488556031 1490323225 1480696262 ...
 $ visitNumber        : int  1 1 1 2 1 1 1 1 1 1 ...
 $ timeSinceLastVisit : int  0 0 0 16569 0 0 0 0 0 0 ...
 $ browser            : chr  "Chrome" "Safari" "Chrome" "Chrome" ...
 $ operatingSystem    : chr  "Windows" "Macintosh" "Windows" "Windows" ...
 $ isMobile           : int  0 0 0 0 0 1 0 1 0 0 ...
 $ continent          : Factor w/ 5 levels "Africa","Americas",..: 2 2 2 2 1 2 3 3 2 4 ...
 $ subContinent       : chr  "South America" "Northern America" "Northern America" "Northern America" ...
 $ country            : chr  "Brazil" "United States" "Canada" "Canada" ...
 $ source             : chr  "youtube.com" "youtube.com" "google" "google" ...
 $ medium             : chr  "referral" "referral" "organic" "organic" ...
 $ isTrueDirect       : int  0 0 0 1 0 1 0 0 0 0 ...
 $ hits               : int  1 1 5 3 1 6 1 1 2 1 ...
 $ pageviews          : int  1 1 5 3 1 6 1 1 2 1 ...
 $ positiveTransaction: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...

I then define my custom RMSLE function using the Metrics package:

rmsleMetric <- function(data, lev = NULL, model = NULL){
    out <- Metrics::rmsle(data$obs, data$pred)
    names(out) <- c("rmsle")
    return (out)
}

Then, I define the trainControl:

tc <- trainControl(method = "repeatedcv",
   number = 5,
   repeats = 5,
   summaryFunction = rmsleMetric,
   classProbs = TRUE)

My grid search:

tg <- expand.grid(alpha = 0, lambda = seq(0, 1, by = 0.1))
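
For reference, this grid can be inspected in base R. A small sketch, using only the call shown above: with alpha fixed at 0 and lambda stepping from 0 to 1 by 0.1, it holds 11 candidate settings, which is presumably why the error summary further down reports 11 NA values (one per setting).

```r
# Rebuild the tuning grid from the question and look at its size.
tg <- expand.grid(alpha = 0, lambda = seq(0, 1, by = 0.1))

print(nrow(tg))      # number of alpha/lambda combinations
print(head(tg, 3))   # first few candidate settings
```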

Finally, my model:

penalizedLogit_ridge <- train(positiveTransaction ~ .,
    data = training_data,
    metric = "rmsle",
    method = "glmnet",
    family = "binomial",
    trControl = tc,
    tuneGrid = tg
)

When I try to run the command above, I get an error:

Something is wrong; all the rmsle metric values are missing:
     rmsle
 Min.   : NA
 1st Qu.: NA
 Median : NA
 Mean   :NaN
 3rd Qu.: NA
 Max.   : NA
 NA's   :11
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Looking at warnings, I find:

1: In Ops.factor(1, actual) : ‘+’ not meaningful for factors
2: In Ops.factor(1, predicted) : ‘+’ not meaningful for factors
repeated 25 times

Since the same thing works fine if I change the metric to AUC using prSummary as my summary function, I don't believe that there are any issues with my data.

So, I believe that my function is wrong but I don't know how to figure out why it is wrong.

Any help is highly appreciated.

asked Nov 12 at 20:24

Akshay Gaur

577513

r logistic-regression metrics r-caret evaluation

1 Answer

accepted

Your custom metric is not defined properly. If you set classProbs = TRUE and savePredictions = "final" in trainControl, you will see that the predictions contain two columns, named after your target classes, that hold the predicted probabilities. The data$pred column holds the predicted class (a factor), which cannot be used to calculate the desired metric.
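
To see why the original function fails, here is a minimal base-R sketch. The data frame `fake` is a made-up stand-in for what a summary function receives when classProbs = TRUE (columns obs and pred plus one probability column per class); the values are invented purely for illustration.

```r
# Hypothetical stand-in for the `data` argument of a summary function
# (invented values; column layout assumes classProbs = TRUE).
fake <- data.frame(
  obs  = factor(c("No", "Yes", "No", "No"), levels = c("No", "Yes")),
  pred = factor(c("No", "No", "No", "Yes"), levels = c("No", "Yes")),
  No   = c(0.9, 0.6, 0.8, 0.3),
  Yes  = c(0.1, 0.4, 0.2, 0.7)
)

# Arithmetic on a factor yields NA (with a warning), which is what the
# RMSLE formula runs into when it is given data$obs and data$pred.
res <- suppressWarnings(1 + fake$pred)
print(res)

# The probability columns are numeric, so the same arithmetic works.
print(log(1 + fake$Yes))
```

This matches the "‘+’ not meaningful for factors" warnings in the question: the metric never gets numeric input, so every resample returns NA.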

A proper way to define the function would be to get the possible levels and use them to extract the probabilities for one of the classes:

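rmsleMetric <- function(data, lev = NULL, model = NULL){
  # class levels of the outcome, in factor order
  lvls <- levels(data$obs)
  # code the first level as 1 and the second as 0, then compare with
  # the predicted probability of the first level
  out <- Metrics::rmsle(ifelse(data$obs == lvls[2], 0, 1),
                        data[, lvls[1]])
  names(out) <- c("rmsle")
  return (out)
}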
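
A quick way to check a summary function before handing it to train is to call it on a small hand-made data frame. The sketch below makes two assumptions that are not part of the answer: `rmsle_base` reimplements the RMSLE formula in base R so the example runs without the Metrics package, and `toy` is invented data laid out like the input described above.

```r
# Base-R RMSLE: sqrt(mean((log(1 + actual) - log(1 + predicted))^2)).
# Stand-in for Metrics::rmsle so the sketch has no dependencies.
rmsle_base <- function(actual, predicted) {
  sqrt(mean((log1p(actual) - log1p(predicted))^2))
}

# Same idea as the function above: compare a 0/1 coding of the
# observed class with the predicted probability of the first level.
rmsle_summary <- function(data, lev = NULL, model = NULL) {
  lvls <- levels(data$obs)
  actual <- ifelse(data$obs == lvls[1], 1, 0)
  c(rmsle = rmsle_base(actual, data[, lvls[1]]))
}

# Invented toy data: observed class, predicted class, class probabilities.
toy <- data.frame(
  obs  = factor(c("No", "Yes", "No", "Yes"), levels = c("No", "Yes")),
  pred = factor(c("No", "No", "No", "Yes"), levels = c("No", "Yes")),
  No   = c(0.9, 0.6, 0.8, 0.3),
  Yes  = c(0.1, 0.4, 0.2, 0.7)
)

score <- rmsle_summary(toy, lev = levels(toy$obs))
print(score)
```

If this returns a single finite, named number, the function has the shape train expects from a summaryFunction.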

Does it work? A test on the Sonar data:

library(caret)
library(mlbench)
data(Sonar)

tc <- trainControl(method = "repeatedcv",
                   number = 2,
                   repeats = 2,
                   summaryFunction = rmsleMetric,
                   classProbs = TRUE,
                   savePredictions = "final")

tg <- expand.grid(alpha = 0, lambda = seq(0, 1, by = 0.1))

penalizedLogit_ridge <- train(Class ~ .,
                              data = Sonar,
                              metric = "rmsle",
                              method = "glmnet",
                              family = "binomial",
                              trControl = tc,
                              tuneGrid = tg)



#output
glmnet 

208 samples
 60 predictor
  2 classes: 'M', 'R' 

No pre-processing
Resampling: Cross-Validated (2 fold, repeated 2 times) 
Summary of sample sizes: 105, 103, 104, 104 
Resampling results across tuning parameters:

  lambda  rmsle    
  0.0     0.2835407
  0.1     0.2753197
  0.2     0.2768288
  0.3     0.2797847
  0.4     0.2827953
  0.5     0.2856088
  0.6     0.2881894
  0.7     0.2905501
  0.8     0.2927171
  0.9     0.2947169
  1.0     0.2965505

Tuning parameter 'alpha' was held constant at a value of 0
rmsle was used to select the optimal model using the largest value.
The final values used for the model were alpha = 0 and lambda = 1.
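
Note that the printed output says the model was selected "using the largest value". For an error measure such as RMSLE smaller is better, and train maximizes a custom metric unless told otherwise, so passing maximize = FALSE to train should make it pick the smallest value instead. The sketch below does not rerun the model; it only re-reads the resampling table above in base R to show which lambda each rule would choose.

```r
# Values copied from the resampling table above.
lambda <- c(0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
rmsle  <- c(0.2835407, 0.2753197, 0.2768288, 0.2797847, 0.2827953,
            0.2856088, 0.2881894, 0.2905501, 0.2927171, 0.2947169,
            0.2965505)

# What "largest value" selects (the behaviour seen in the output).
picked_by_max <- lambda[which.max(rmsle)]

# What a "smaller is better" rule would select.
picked_by_min <- lambda[which.min(rmsle)]

print(c(largest = picked_by_max, smallest = picked_by_min))
```

On these numbers the two rules disagree, so it is worth setting the direction explicitly whenever the custom metric is a loss.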

You can inspect caret::twoClassSummary - it is defined quite similarly.

edited Nov 13 at 12:59

answered Nov 13 at 9:52

missuse

11.5k2622

Sounds very promising. Let me test this really quick before I mark as answer
– Akshay Gaur
Nov 13 at 15:27

This works! Do I need to worry about this message though - Warning message: In grepl("(Intercept)", colnames(x)) : input string 73 is invalid in this locale?
– Akshay Gaur
Nov 13 at 15:47

Glad to help. In order to help with the warning message I would require a reproducible example with the data set provided since the warning is associated with the data set.
– missuse
Nov 13 at 16:41

Thank you for offering to help. I may ask a separate question if I feel that resolving that is absolutely necessary.
– Akshay Gaur
Nov 13 at 21:47

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53269560%2ferror-when-trying-to-pass-custom-metric-in-caret-package%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

A proper way to define the function would be to get the possible levels and use them to extract the probabilities for one of the classes:

rmsleMetric <- function(data, lev = NULL, model = NULL){

  lvls <- levels(data$obs)

  out <- Metrics::rmsle(ifelse(data$obs == lev[2], 0, 1),

                        data[, lvls[1]])

  names(out) <- c("rmsle")

  return (out)

}

does it work:

library(caret)

library(mlbench)

data(Sonar)

tc <- trainControl(method = "repeatedcv",

                   number = 2,

                   repeats = 2,

                   summaryFunction = rmsleMetric,

                   classProbs = TRUE,

                   savePredictions = "final")

tg <- expand.grid(alpha = 0, lambda = seq(0, 1, by = 0.1))



penalizedLogit_ridge <- train(Class ~ .,

                              data = Sonar,

                              metric="rmsle",

                              method = "glmnet",

                              family = "binomial",

                              trControl = tc,

                              tuneGrid = tg)



#output

glmnet 



208 samples

 60 predictor

  2 classes: 'M', 'R' 



No pre-processing

Resampling: Cross-Validated (2 fold, repeated 2 times) 

Summary of sample sizes: 105, 103, 104, 104 

Resampling results across tuning parameters:



  lambda  rmsle    

  0.0     0.2835407

  0.1     0.2753197

  0.2     0.2768288

  0.3     0.2797847

  0.4     0.2827953

  0.5     0.2856088

  0.6     0.2881894

  0.7     0.2905501

  0.8     0.2927171

  0.9     0.2947169

  1.0     0.2965505



Tuning parameter 'alpha' was held constant at a value of 0

rmsle was used to select the optimal model using the largest value.

The final values used for the model were alpha = 0 and lambda = 1.

You can inspect caret::twoClassSummary - it is defined quite similarly.

edited Nov 13 at 12:59

answered Nov 13 at 9:52

missuse

11.5k2622

Sounds very promising. Let me test this really quick before I mark as answer
– Akshay Gaur
Nov 13 at 15:27

This works! Do I need to worry about this message though - Warning message: In grepl("(Intercept)", colnames(x)) : input string 73 is invalid in this locale?
– Akshay Gaur
Nov 13 at 15:47

1

Glad to help. In order to help with the warning message I would require a reproducible example with the data set provided since the warning is associated with the data set.
– missuse
Nov 13 at 16:41

Thank you for offering to help. I may ask a separate question if I feel that resolving that is absolutely necessary.
– Akshay Gaur
Nov 13 at 21:47

add a comment |

up vote
1
down vote

accepted

A proper way to define the function would be to get the possible levels and use them to extract the probabilities for one of the classes:

rmsleMetric <- function(data, lev = NULL, model = NULL){

  lvls <- levels(data$obs)

  out <- Metrics::rmsle(ifelse(data$obs == lev[2], 0, 1),

                        data[, lvls[1]])

  names(out) <- c("rmsle")

  return (out)

}

does it work:

library(caret)

library(mlbench)

data(Sonar)

tc <- trainControl(method = "repeatedcv",

                   number = 2,

                   repeats = 2,

                   summaryFunction = rmsleMetric,

                   classProbs = TRUE,

                   savePredictions = "final")

tg <- expand.grid(alpha = 0, lambda = seq(0, 1, by = 0.1))



penalizedLogit_ridge <- train(Class ~ .,

                              data = Sonar,

                              metric="rmsle",

                              method = "glmnet",

                              family = "binomial",

                              trControl = tc,

                              tuneGrid = tg)



#output

glmnet 



208 samples

 60 predictor

  2 classes: 'M', 'R' 



No pre-processing

Resampling: Cross-Validated (2 fold, repeated 2 times) 

Summary of sample sizes: 105, 103, 104, 104 

Resampling results across tuning parameters:



  lambda  rmsle    

  0.0     0.2835407

  0.1     0.2753197

  0.2     0.2768288

  0.3     0.2797847

  0.4     0.2827953

  0.5     0.2856088

  0.6     0.2881894

  0.7     0.2905501

  0.8     0.2927171

  0.9     0.2947169

  1.0     0.2965505



Tuning parameter 'alpha' was held constant at a value of 0

rmsle was used to select the optimal model using the largest value.

The final values used for the model were alpha = 0 and lambda = 1.

You can inspect caret::twoClassSummary - it is defined quite similarly.

edited Nov 13 at 12:59

answered Nov 13 at 9:52

missuse

11.5k2622

Sounds very promising. Let me test this really quick before I mark as answer
– Akshay Gaur
Nov 13 at 15:27

This works! Do I need to worry about this message though - Warning message: In grepl("(Intercept)", colnames(x)) : input string 73 is invalid in this locale?
– Akshay Gaur
Nov 13 at 15:47

1

Glad to help. In order to help with the warning message I would require a reproducible example with the data set provided since the warning is associated with the data set.
– missuse
Nov 13 at 16:41

Thank you for offering to help. I may ask a separate question if I feel that resolving that is absolutely necessary.
– Akshay Gaur
Nov 13 at 21:47

add a comment |

up vote
1
down vote

accepted

A proper way to define the function would be to get the possible levels and use them to extract the probabilities for one of the classes:

rmsleMetric <- function(data, lev = NULL, model = NULL){

  lvls <- levels(data$obs)

  out <- Metrics::rmsle(ifelse(data$obs == lev[2], 0, 1),

                        data[, lvls[1]])

  names(out) <- c("rmsle")

  return (out)

}

does it work:

library(caret)

library(mlbench)

data(Sonar)

tc <- trainControl(method = "repeatedcv",

                   number = 2,

                   repeats = 2,

                   summaryFunction = rmsleMetric,

                   classProbs = TRUE,

                   savePredictions = "final")

tg <- expand.grid(alpha = 0, lambda = seq(0, 1, by = 0.1))



penalizedLogit_ridge <- train(Class ~ .,

                              data = Sonar,

                              metric="rmsle",

                              method = "glmnet",

                              family = "binomial",

                              trControl = tc,

                              tuneGrid = tg)



#output

glmnet 



208 samples

 60 predictor

  2 classes: 'M', 'R' 



No pre-processing

Resampling: Cross-Validated (2 fold, repeated 2 times) 

Summary of sample sizes: 105, 103, 104, 104 

Resampling results across tuning parameters:



  lambda  rmsle    

  0.0     0.2835407

  0.1     0.2753197

  0.2     0.2768288

  0.3     0.2797847

  0.4     0.2827953

  0.5     0.2856088

  0.6     0.2881894

  0.7     0.2905501

  0.8     0.2927171

  0.9     0.2947169

  1.0     0.2965505



Tuning parameter 'alpha' was held constant at a value of 0

rmsle was used to select the optimal model using the largest value.

The final values used for the model were alpha = 0 and lambda = 1.

You can inspect caret::twoClassSummary - it is defined quite similarly.

edited Nov 13 at 12:59

answered Nov 13 at 9:52

missuse

11.5k2622



Sounds very promising. Let me test this really quick before I mark as answer
– Akshay Gaur
Nov 13 at 15:27

This works! Do I need to worry about this message though - Warning message: In grepl("(Intercept)", colnames(x)) : input string 73 is invalid in this locale?
– Akshay Gaur
Nov 13 at 15:47

Glad to help. To diagnose the warning I would need a reproducible example that includes the data set, since the warning is tied to the data.
– missuse
Nov 13 at 16:41

Thank you for offering to help. I may ask a separate question if I feel that resolving that is absolutely necessary.
– Akshay Gaur
Nov 13 at 21:47
