Mixed integer programming R: Least absolute deviation with cost associated with each regressor
I have been given a problem about minimizing the absolute error, known as LAD (least absolute deviations). However, each regressor is the result of an expensive test with an associated cost, so the model should avoid using regressors that don't explain much of the variation. It takes the following equations:
Where N is the total number of observations, E_i the deviation associated with observation i, S the number of independent variables, lambda a penalty coefficient for the cost, and C_j the cost associated with performing test j.
So far, I have proceeded as usual. To make it linear, I split the absolute value into two errors, e_i^+ and e_i^-, where E_i = e_i^+ - e_i^- = y_i - (B_0 + sum_j(B_j*X_ij)), and added the following constraints:
- z_j in {0,1}, a binary variable indicating whether regressor j enters the model.
- B_j <= M*z_j; B_j >= -M*z_j (with M a large constant)
- e_i^+, e_i^- >= 0
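Written out in full, the linearized model would read as follows. This is a reconstruction from the definitions and constraints listed here, assuming the objective is the total absolute deviation plus lambda times the cost of the selected regressors, with M a sufficiently large constant:

```latex
\begin{aligned}
\min_{B,\,e^{+},\,e^{-},\,z}\quad
  & \sum_{i=1}^{N} \left(e_i^{+} + e_i^{-}\right) + \lambda \sum_{j=1}^{S} C_j z_j \\
\text{s.t.}\quad
  & y_i - \Big(B_0 + \sum_{j=1}^{S} B_j X_{ij}\Big) = e_i^{+} - e_i^{-},
  && i = 1,\dots,N \\
  & -M z_j \le B_j \le M z_j,
  && j = 1,\dots,S \\
  & e_i^{+},\, e_i^{-} \ge 0,
  && i = 1,\dots,N \\
  & z_j \in \{0,1\},
  && j = 1,\dots,S
\end{aligned}
```

At an optimum at most one of e_i^+ and e_i^- needs to be non-zero, so their sum equals |E_i|.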
A toy subset of the data I'm working with has the following structure:
For y:
  quality
1       5
2       5
3       5
4       6
5       7
6       5
For the regressors:
  fixed.acidity volatile.acidity citric.acid
1           7.5            0.610        0.26
2           5.6            0.540        0.04
3           7.4            0.965        0.00
4           6.7            0.460        0.24
5           6.1            0.400        0.16
6           9.7            0.690        0.32
And for the cost:
  fixed.acidity volatile.acidity citric.acid
1          0.26              0.6        0.52
So far, my code looks like this:
library(lpSolve)
# loading the matrices
y <- read.csv(file = "PATH/y.csv", header = TRUE, sep = ",")  # dim = 100*1
regresores <- read.csv(file = "PATH/regressors.csv", header = TRUE, sep = ",")  # dim = 100*11
cost <- read.csv(file = "PATH/cost.csv", header = TRUE, sep = ",")  # dim = 1*11
for (i in seq(0, 1, by = 0.1)) {  # so as to have a collection of models with different penalties
  obj.fun <- c(1, 1, i * cost)
  constr <- matrix(
    c(y, regresores, -regresores),
    c(-y, -regresores, regresores),
    sum(regresores), ncol = , byrow = TRUE)
  constr.dir <- c("<=", ">=", "<=", "==")
  rhs <- c(regresores, -regresores, 1, binary)
  sol <- lp("min", obj.fun, constr, constr.dir, rhs)
  sol$objval
  sol$solution
}
I know there is a LAD function in R, but for consistency with my colleagues, as well as a pretty annoying PhD tutor, I have to do this using lpSolve in R. I have just started with R for this project and I don't know exactly why this won't run. Is there something wrong with the syntax or with my formulation of the model? Right now, the main problem I have is:
"Error in matrix(c(y, regressors, -regressors), c(-y, -regressors, regressors), : non-numeric matrix extent".
Mainly, I intended it to build this cost-weighted LAD model and return a solution for each value of lambda, from 0 to 1 in steps of 0.1.
Thanks in advance and sorry for any inconvenience; neither English nor R is my native language.
r regression mixed-integer-programming
What is your question?
– emilliman5
Nov 9 at 12:38
Sorry, just realized I forgot the most important part. I'll edit it now.
– Aaron G.
Nov 9 at 12:54
It helps us help you if you provide 1) the smallest amount of code possible to demonstrate the problem, 2) data for that code to run on, 3) an explicit statement of expected results, 4) an explanation of why your code doesn't work (e.g., how are the results different from the desired output or what error messages do you receive).
– Lyngbakr
Nov 9 at 13:03
2) The data is kind of massive; it is a matrix of 11*100 samples. 3) I wish to have a result showing, for each lambda, a model to work with. 4) The code has given me several different errors, the main one being "non-numeric matrix extent". 1) The code I provided, is it not showing? Otherwise I don't know exactly what it is supposed to be.
– Aaron G.
Nov 9 at 13:13
The code is showing, I'm just explaining what a complete question looks like. (See here for more details.) If you can't provide all the data, provide a subset or a toy data set that can be used to demonstrate the problem. There are lots of data sets here for this sort of thing. Also, edit your question to include the exact error messages you receive. If you provide a well-structured question, you're more likely to get a good answer.
– Lyngbakr
Nov 9 at 15:58
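On the error message itself: `matrix()` takes its arguments as `matrix(data, nrow, ncol, byrow, ...)`, so the second vector in the call is read as `nrow`, and because `read.csv()` returns data frames, `c()` turns them into a list rather than a number. The snippet below is a minimal sketch that reproduces the message with small stand-in data frames (hypothetical substitutes for the CSV files) and shows one way to build a numeric block with `as.matrix()`, `cbind()` and `rbind()` instead:

```r
# Stand-ins for the data frames that read.csv() would return
y <- data.frame(quality = c(5, 5, 5, 6, 7, 5))
regressors <- data.frame(
  fixed.acidity    = c(7.5, 5.6, 7.4, 6.7, 6.1, 9.7),
  volatile.acidity = c(0.610, 0.540, 0.965, 0.460, 0.400, 0.690)
)

# The second positional argument of matrix() is nrow; here it receives
# a list built from data frames, which is not a valid matrix extent.
err <- tryCatch(
  matrix(c(y, regressors, -regressors), c(-y, -regressors, regressors)),
  error = function(e) conditionMessage(e)
)
print(err)

# Convert to plain numeric objects first, then assemble blocks explicitly.
y_vec <- y[[1]]             # numeric vector of responses
X <- as.matrix(regressors)  # numeric matrix of regressors
block <- rbind(cbind(X, -X), cbind(-X, X))
print(dim(block))
```

The `block` matrix here only illustrates the stacking mechanics; it is not the constraint matrix of the model.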
edited Nov 20 at 15:41
emilliman5
asked Nov 9 at 12:34
Aaron G.
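For the model as described in the question, a possible lpSolve setup is sketched below on the toy data. It rests on a few assumptions that are not in the original post: `lp()` treats every decision variable as non-negative, so the intercept and each coefficient are split into a positive and a negative part; `M = 100` is an arbitrary big-M bound chosen for illustration; and the column ordering and the names `fit_rows`, `bigM_pos`, `bigM_neg`, `z_idx` and `results` are my own.

```r
library(lpSolve)

# Toy data copied from the question
y <- c(5, 5, 5, 6, 7, 5)
X <- cbind(
  fixed.acidity    = c(7.5, 5.6, 7.4, 6.7, 6.1, 9.7),
  volatile.acidity = c(0.610, 0.540, 0.965, 0.460, 0.400, 0.690),
  citric.acid      = c(0.26, 0.04, 0.00, 0.24, 0.16, 0.32)
)
cost <- c(0.26, 0.6, 0.52)

N <- nrow(X)
S <- ncol(X)
M <- 100  # assumed big-M bound on |B_j|; adjust to the scale of the data

# Column order: B0+, B0-, B+ (S), B- (S), e+ (N), e- (N), z (S)
# Fit rows: B0 + sum_j B_j * X_ij + e_i^+ - e_i^- = y_i
fit_rows <- cbind(1, -1, X, -X, diag(N), -diag(N), matrix(0, N, S))
# Big-M rows: B_j^+ - M * z_j <= 0 and B_j^- - M * z_j <= 0
bigM_pos <- cbind(matrix(0, S, 2), diag(S), matrix(0, S, S),
                  matrix(0, S, 2 * N), -M * diag(S))
bigM_neg <- cbind(matrix(0, S, 2), matrix(0, S, S), diag(S),
                  matrix(0, S, 2 * N), -M * diag(S))

constr     <- rbind(fit_rows, bigM_pos, bigM_neg)
constr.dir <- c(rep("==", N), rep("<=", 2 * S))
rhs        <- c(y, rep(0, 2 * S))
z_idx      <- (2 + 2 * S + 2 * N + 1):(2 + 2 * S + 2 * N + S)

# One model per penalty value, from 0 to 1 in steps of 0.1
lambdas <- seq(0, 1, by = 0.1)
results <- lapply(lambdas, function(lambda) {
  # Objective: sum of e+ and e-, plus lambda * cost for each selected regressor
  obj.fun <- c(0, 0, rep(0, 2 * S), rep(1, 2 * N), lambda * cost)
  sol <- lp("min", obj.fun, constr, constr.dir, rhs, binary.vec = z_idx)
  beta <- sol$solution[3:(2 + S)] - sol$solution[(3 + S):(2 + 2 * S)]
  list(lambda    = lambda,
       status    = sol$status,  # 0 means lp() reports success
       objval    = sol$objval,
       intercept = sol$solution[1] - sol$solution[2],
       beta      = setNames(beta, colnames(X)),
       z         = sol$solution[z_idx])
})

# Objective value and selected regressors for each lambda
for (r in results) cat(r$lambda, r$objval, r$z, "\n")
```

Collecting the solutions in a list, rather than printing `sol$objval` inside a `for` loop, keeps every model available afterwards; inside a loop a bare `sol$objval` is not auto-printed. For the full 100*11 data set the same construction should carry over by replacing `y`, `X` and `cost` with `y[[1]]`, `as.matrix(regresores)` and `unlist(cost)`, though the value of `M` would need checking against the actual coefficient magnitudes.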