R data.table: Filter for rows by condition in multiple variables
I have a filter problem with the following data.table and really hope that someone can help me with that. I am not sure if there is an easy way of doing that and hope that it is not too much to ask for. So this is my problem:
A B C Area
aa M+H 1 127427
aa M+H 2 204051.5
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
bb M+K 2 61715568
For each group A B (aa M+H, aa M+Na, bb M+H, bb M+K) all rows with a value C > 1 should be filtered out if their Area value is higher than in the row with the same A B combination and a C value 1 (each A B C combination exists only once in the table). After that step the following rows should be left:
A B C Area
aa M+H 1 127427
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
and after that i would like to filter out all rows which are in the same A C group (aa 1, aa 2, bb 1, bb2) but with a higher Area value than in the row with an "M+H" as B value. So this should be left:
A B C Area
aa M+H 1 127427
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
And in the end get rid of all A B groups (aa M+H, aa M+Na, bb M+H, bb M+K) that do not one row with a value of 1 in C left. So there should only be:
A B C Area
aa M+H 1 127427
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
I was trying to get it done using data.table but if someone tells me that dplyr is much better for it I would also be happy for a solution there. Anyway thank you a lot for your time and effort!
Yasel
r data.table
add a comment |
I have a filter problem with the following data.table and really hope that someone can help me with that. I am not sure if there is an easy way of doing that and hope that it is not too much to ask for. So this is my problem:
A B C Area
aa M+H 1 127427
aa M+H 2 204051.5
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
bb M+K 2 61715568
For each group A B (aa M+H, aa M+Na, bb M+H, bb M+K) all rows with a value C > 1 should be filtered out if their Area value is higher than in the row with the same A B combination and a C value 1 (each A B C combination exists only once in the table). After that step the following rows should be left:
A B C Area
aa M+H 1 127427
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
and after that i would like to filter out all rows which are in the same A C group (aa 1, aa 2, bb 1, bb2) but with a higher Area value than in the row with an "M+H" as B value. So this should be left:
A B C Area
aa M+H 1 127427
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
And in the end get rid of all A B groups (aa M+H, aa M+Na, bb M+H, bb M+K) that do not one row with a value of 1 in C left. So there should only be:
A B C Area
aa M+H 1 127427
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
I was trying to get it done using data.table but if someone tells me that dplyr is much better for it I would also be happy for a solution there. Anyway thank you a lot for your time and effort!
Yasel
r data.table
can you clarify the rules for filtering out the first table. why wasn'tbb M+H 2 214221
filtered out? and yes, dplyr is well suited for this task
– Wally Ali
Nov 20 '18 at 19:00
here is something to get you started:DT[, .SD[!(C>1 & Area > Area[C==1])], by=.(A, B)][, .SD[Area<=Area[B=="M+H"]], by=.(A, C)]
– chinsoon12
Nov 21 '18 at 0:19
add a comment |
I have a filter problem with the following data.table and really hope that someone can help me with that. I am not sure if there is an easy way of doing that and hope that it is not too much to ask for. So this is my problem:
A B C Area
aa M+H 1 127427
aa M+H 2 204051.5
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
bb M+K 2 61715568
For each group A B (aa M+H, aa M+Na, bb M+H, bb M+K) all rows with a value C > 1 should be filtered out if their Area value is higher than in the row with the same A B combination and a C value 1 (each A B C combination exists only once in the table). After that step the following rows should be left:
A B C Area
aa M+H 1 127427
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
and after that i would like to filter out all rows which are in the same A C group (aa 1, aa 2, bb 1, bb2) but with a higher Area value than in the row with an "M+H" as B value. So this should be left:
A B C Area
aa M+H 1 127427
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
And in the end get rid of all A B groups (aa M+H, aa M+Na, bb M+H, bb M+K) that do not one row with a value of 1 in C left. So there should only be:
A B C Area
aa M+H 1 127427
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
I was trying to get it done using data.table but if someone tells me that dplyr is much better for it I would also be happy for a solution there. Anyway thank you a lot for your time and effort!
Yasel
r data.table
I have a filter problem with the following data.table and really hope that someone can help me with that. I am not sure if there is an easy way of doing that and hope that it is not too much to ask for. So this is my problem:
A B C Area
aa M+H 1 127427
aa M+H 2 204051.5
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
bb M+K 2 61715568
For each group A B (aa M+H, aa M+Na, bb M+H, bb M+K) all rows with a value C > 1 should be filtered out if their Area value is higher than in the row with the same A B combination and a C value 1 (each A B C combination exists only once in the table). After that step the following rows should be left:
A B C Area
aa M+H 1 127427
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
and after that i would like to filter out all rows which are in the same A C group (aa 1, aa 2, bb 1, bb2) but with a higher Area value than in the row with an "M+H" as B value. So this should be left:
A B C Area
aa M+H 1 127427
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
And in the end get rid of all A B groups (aa M+H, aa M+Na, bb M+H, bb M+K) that do not one row with a value of 1 in C left. So there should only be:
A B C Area
aa M+H 1 127427
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
I was trying to get it done using data.table but if someone tells me that dplyr is much better for it I would also be happy for a solution there. Anyway thank you a lot for your time and effort!
Yasel
r data.table
r data.table
edited Nov 20 '18 at 22:26
yasel
asked Nov 20 '18 at 18:22
yaselyasel
727
727
can you clarify the rules for filtering out the first table. why wasn'tbb M+H 2 214221
filtered out? and yes, dplyr is well suited for this task
– Wally Ali
Nov 20 '18 at 19:00
here is something to get you started:DT[, .SD[!(C>1 & Area > Area[C==1])], by=.(A, B)][, .SD[Area<=Area[B=="M+H"]], by=.(A, C)]
– chinsoon12
Nov 21 '18 at 0:19
add a comment |
can you clarify the rules for filtering out the first table. why wasn'tbb M+H 2 214221
filtered out? and yes, dplyr is well suited for this task
– Wally Ali
Nov 20 '18 at 19:00
here is something to get you started:DT[, .SD[!(C>1 & Area > Area[C==1])], by=.(A, B)][, .SD[Area<=Area[B=="M+H"]], by=.(A, C)]
– chinsoon12
Nov 21 '18 at 0:19
can you clarify the rules for filtering out the first table. why wasn't
bb M+H 2 214221
filtered out? and yes, dplyr is well suited for this task– Wally Ali
Nov 20 '18 at 19:00
can you clarify the rules for filtering out the first table. why wasn't
bb M+H 2 214221
filtered out? and yes, dplyr is well suited for this task– Wally Ali
Nov 20 '18 at 19:00
here is something to get you started:
DT[, .SD[!(C>1 & Area > Area[C==1])], by=.(A, B)][, .SD[Area<=Area[B=="M+H"]], by=.(A, C)]
– chinsoon12
Nov 21 '18 at 0:19
here is something to get you started:
DT[, .SD[!(C>1 & Area > Area[C==1])], by=.(A, B)][, .SD[Area<=Area[B=="M+H"]], by=.(A, C)]
– chinsoon12
Nov 21 '18 at 0:19
add a comment |
2 Answers
2
active
oldest
votes
Welcome to SO!
Following your instructions I'm coming to a different result as yours, but you might be able to adapt it to your needs:
library(data.table)
DT <- data.table(stringsAsFactors=FALSE,
A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K",
"M+K"),
C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
11357158, 2140221, 61715568)
)
DT <- DT[DT[C==1], on=.(A, B)][i.Area-Area > 0 | C==1]
DT[, c("i.C", "i.Area") := NULL]
DT <- DT[DT[B=="M+H"], on=.(A, C)][i.Area-Area <= 0]
DT[, c("i.B", "i.Area") := NULL]
DT <- DT[DT[C==1], on=.(A, B)]
DT[, c("i.C", "i.Area") := NULL]
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
This isn't the most glamorous solution, but some variation of it might get you there:
library(data.table)
A <- c(rep("aa",4),rep("bb",5))
B <- c(rep("M+H",2),rep("M+Na",2),rep("M+H",3),rep("M+K",2))
C <- c(1,2,1,2,1,2,3,1,2)
Area <- c(127427,204051.5,6855539.48777,6469689,15330650,214221,11357158,2140221,61715568)
DT <- as.data.table(cbind(A,B,C,Area))
DT <- setorder(DT,A,B)
DT$ABFilter <- sapply(1:nrow(DT), function(x) ifelse((C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && (Area[x] < Area[x-1]))
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ABFilter=="Keep",]
DT$ABFilter <- NULL
DT
DT <- setorder(DT,A,C)
DT$ACFilter <- sapply(1:nrow(DT), function(x) ifelse((B[x]=="M+H"
|| (B[x]!="M+H" && A[x]==A[x-1] && C[x]==C[x-1] && B[x-1]=="M+H" && (Area[x] < Area[x-1]))
|| (B[x]!="M+H" && A[x]==A[x-2] && C[x]==C[x-2] && B[x-2]=="M+H" && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ACFilter=="Keep",]
DT$ACFilter <- NULL
DT
DT <- setorder(DT,A,B,C)
DT$ABCFilter <- sapply(1:nrow(DT), function(x) ifelse(C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && C[x-2]==1)
, "Keep", "Discard"))
DT <- DT[ABCFilter=="Keep",]
DT$ABCFilter <- NULL
DT
I'm not so clear on the rules you're using either. It looks like the row with Area = 11357158 should get retained because it is lesser than the corresponding row with C = 1, and the row with Area = 6855539.48777 should get retained because it is greater than the corresponding row with B = M+H:
A B C Area
1: aa M+H 1 127427
2: aa M+Na 1 6855539.48777
3: aa M+Na 2 6469689
4: bb M+H 1 15330650
5: bb M+H 2 214221
6: bb M+H 3 11357158
7: bb M+K 1 2140221
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53399216%2fr-data-table-filter-for-rows-by-condition-in-multiple-variables%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Welcome to SO!
Following your instructions I'm coming to a different result as yours, but you might be able to adapt it to your needs:
library(data.table)
DT <- data.table(stringsAsFactors=FALSE,
A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K",
"M+K"),
C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
11357158, 2140221, 61715568)
)
DT <- DT[DT[C==1], on=.(A, B)][i.Area-Area > 0 | C==1]
DT[, c("i.C", "i.Area") := NULL]
DT <- DT[DT[B=="M+H"], on=.(A, C)][i.Area-Area <= 0]
DT[, c("i.B", "i.Area") := NULL]
DT <- DT[DT[C==1], on=.(A, B)]
DT[, c("i.C", "i.Area") := NULL]
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
Welcome to SO!
Following your instructions I'm coming to a different result as yours, but you might be able to adapt it to your needs:
library(data.table)
DT <- data.table(stringsAsFactors=FALSE,
A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K",
"M+K"),
C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
11357158, 2140221, 61715568)
)
DT <- DT[DT[C==1], on=.(A, B)][i.Area-Area > 0 | C==1]
DT[, c("i.C", "i.Area") := NULL]
DT <- DT[DT[B=="M+H"], on=.(A, C)][i.Area-Area <= 0]
DT[, c("i.B", "i.Area") := NULL]
DT <- DT[DT[C==1], on=.(A, B)]
DT[, c("i.C", "i.Area") := NULL]
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
Welcome to SO!
Following your instructions I'm coming to a different result as yours, but you might be able to adapt it to your needs:
library(data.table)
DT <- data.table(stringsAsFactors=FALSE,
A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K",
"M+K"),
C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
11357158, 2140221, 61715568)
)
DT <- DT[DT[C==1], on=.(A, B)][i.Area-Area > 0 | C==1]
DT[, c("i.C", "i.Area") := NULL]
DT <- DT[DT[B=="M+H"], on=.(A, C)][i.Area-Area <= 0]
DT[, c("i.B", "i.Area") := NULL]
DT <- DT[DT[C==1], on=.(A, B)]
DT[, c("i.C", "i.Area") := NULL]
Welcome to SO!
Following your instructions I'm coming to a different result as yours, but you might be able to adapt it to your needs:
library(data.table)
DT <- data.table(stringsAsFactors=FALSE,
A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K",
"M+K"),
C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
11357158, 2140221, 61715568)
)
DT <- DT[DT[C==1], on=.(A, B)][i.Area-Area > 0 | C==1]
DT[, c("i.C", "i.Area") := NULL]
DT <- DT[DT[B=="M+H"], on=.(A, C)][i.Area-Area <= 0]
DT[, c("i.B", "i.Area") := NULL]
DT <- DT[DT[C==1], on=.(A, B)]
DT[, c("i.C", "i.Area") := NULL]
edited Nov 21 '18 at 8:06
answered Nov 20 '18 at 22:21
ismirsehregalismirsehregal
1,7701212
1,7701212
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
This isn't the most glamorous solution, but some variation of it might get you there:
library(data.table)
A <- c(rep("aa",4),rep("bb",5))
B <- c(rep("M+H",2),rep("M+Na",2),rep("M+H",3),rep("M+K",2))
C <- c(1,2,1,2,1,2,3,1,2)
Area <- c(127427,204051.5,6855539.48777,6469689,15330650,214221,11357158,2140221,61715568)
DT <- as.data.table(cbind(A,B,C,Area))
DT <- setorder(DT,A,B)
DT$ABFilter <- sapply(1:nrow(DT), function(x) ifelse((C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && (Area[x] < Area[x-1]))
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ABFilter=="Keep",]
DT$ABFilter <- NULL
DT
DT <- setorder(DT,A,C)
DT$ACFilter <- sapply(1:nrow(DT), function(x) ifelse((B[x]=="M+H"
|| (B[x]!="M+H" && A[x]==A[x-1] && C[x]==C[x-1] && B[x-1]=="M+H" && (Area[x] < Area[x-1]))
|| (B[x]!="M+H" && A[x]==A[x-2] && C[x]==C[x-2] && B[x-2]=="M+H" && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ACFilter=="Keep",]
DT$ACFilter <- NULL
DT
DT <- setorder(DT,A,B,C)
DT$ABCFilter <- sapply(1:nrow(DT), function(x) ifelse(C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && C[x-2]==1)
, "Keep", "Discard"))
DT <- DT[ABCFilter=="Keep",]
DT$ABCFilter <- NULL
DT
I'm not so clear on the rules you're using either. It looks like the row with Area = 11357158 should get retained because it is lesser than the corresponding row with C = 1, and the row with Area = 6855539.48777 should get retained because it is greater than the corresponding row with B = M+H:
A B C Area
1: aa M+H 1 127427
2: aa M+Na 1 6855539.48777
3: aa M+Na 2 6469689
4: bb M+H 1 15330650
5: bb M+H 2 214221
6: bb M+H 3 11357158
7: bb M+K 1 2140221
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
This isn't the most glamorous solution, but some variation of it might get you there:
library(data.table)
A <- c(rep("aa",4),rep("bb",5))
B <- c(rep("M+H",2),rep("M+Na",2),rep("M+H",3),rep("M+K",2))
C <- c(1,2,1,2,1,2,3,1,2)
Area <- c(127427,204051.5,6855539.48777,6469689,15330650,214221,11357158,2140221,61715568)
DT <- as.data.table(cbind(A,B,C,Area))
DT <- setorder(DT,A,B)
DT$ABFilter <- sapply(1:nrow(DT), function(x) ifelse((C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && (Area[x] < Area[x-1]))
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ABFilter=="Keep",]
DT$ABFilter <- NULL
DT
DT <- setorder(DT,A,C)
DT$ACFilter <- sapply(1:nrow(DT), function(x) ifelse((B[x]=="M+H"
|| (B[x]!="M+H" && A[x]==A[x-1] && C[x]==C[x-1] && B[x-1]=="M+H" && (Area[x] < Area[x-1]))
|| (B[x]!="M+H" && A[x]==A[x-2] && C[x]==C[x-2] && B[x-2]=="M+H" && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ACFilter=="Keep",]
DT$ACFilter <- NULL
DT
DT <- setorder(DT,A,B,C)
DT$ABCFilter <- sapply(1:nrow(DT), function(x) ifelse(C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && C[x-2]==1)
, "Keep", "Discard"))
DT <- DT[ABCFilter=="Keep",]
DT$ABCFilter <- NULL
DT
I'm not so clear on the rules you're using either. It looks like the row with Area = 11357158 should get retained because it is lesser than the corresponding row with C = 1, and the row with Area = 6855539.48777 should get retained because it is greater than the corresponding row with B = M+H:
A B C Area
1: aa M+H 1 127427
2: aa M+Na 1 6855539.48777
3: aa M+Na 2 6469689
4: bb M+H 1 15330650
5: bb M+H 2 214221
6: bb M+H 3 11357158
7: bb M+K 1 2140221
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
This isn't the most glamorous solution, but some variation of it might get you there:
library(data.table)
A <- c(rep("aa",4),rep("bb",5))
B <- c(rep("M+H",2),rep("M+Na",2),rep("M+H",3),rep("M+K",2))
C <- c(1,2,1,2,1,2,3,1,2)
Area <- c(127427,204051.5,6855539.48777,6469689,15330650,214221,11357158,2140221,61715568)
DT <- as.data.table(cbind(A,B,C,Area))
DT <- setorder(DT,A,B)
DT$ABFilter <- sapply(1:nrow(DT), function(x) ifelse((C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && (Area[x] < Area[x-1]))
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ABFilter=="Keep",]
DT$ABFilter <- NULL
DT
DT <- setorder(DT,A,C)
DT$ACFilter <- sapply(1:nrow(DT), function(x) ifelse((B[x]=="M+H"
|| (B[x]!="M+H" && A[x]==A[x-1] && C[x]==C[x-1] && B[x-1]=="M+H" && (Area[x] < Area[x-1]))
|| (B[x]!="M+H" && A[x]==A[x-2] && C[x]==C[x-2] && B[x-2]=="M+H" && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ACFilter=="Keep",]
DT$ACFilter <- NULL
DT
DT <- setorder(DT,A,B,C)
DT$ABCFilter <- sapply(1:nrow(DT), function(x) ifelse(C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && C[x-2]==1)
, "Keep", "Discard"))
DT <- DT[ABCFilter=="Keep",]
DT$ABCFilter <- NULL
DT
I'm not so clear on the rules you're using either. It looks like the row with Area = 11357158 should get retained because it is lesser than the corresponding row with C = 1, and the row with Area = 6855539.48777 should get retained because it is greater than the corresponding row with B = M+H:
A B C Area
1: aa M+H 1 127427
2: aa M+Na 1 6855539.48777
3: aa M+Na 2 6469689
4: bb M+H 1 15330650
5: bb M+H 2 214221
6: bb M+H 3 11357158
7: bb M+K 1 2140221
This isn't the most glamorous solution, but some variation of it might get you there:
library(data.table)
A <- c(rep("aa",4),rep("bb",5))
B <- c(rep("M+H",2),rep("M+Na",2),rep("M+H",3),rep("M+K",2))
C <- c(1,2,1,2,1,2,3,1,2)
Area <- c(127427,204051.5,6855539.48777,6469689,15330650,214221,11357158,2140221,61715568)
DT <- as.data.table(cbind(A,B,C,Area))
DT <- setorder(DT,A,B)
DT$ABFilter <- sapply(1:nrow(DT), function(x) ifelse((C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && (Area[x] < Area[x-1]))
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ABFilter=="Keep",]
DT$ABFilter <- NULL
DT
DT <- setorder(DT,A,C)
DT$ACFilter <- sapply(1:nrow(DT), function(x) ifelse((B[x]=="M+H"
|| (B[x]!="M+H" && A[x]==A[x-1] && C[x]==C[x-1] && B[x-1]=="M+H" && (Area[x] < Area[x-1]))
|| (B[x]!="M+H" && A[x]==A[x-2] && C[x]==C[x-2] && B[x-2]=="M+H" && (Area[x] < Area[x-2])))
, "Keep", "Discard"))
DT <- DT[ACFilter=="Keep",]
DT$ACFilter <- NULL
DT
DT <- setorder(DT,A,B,C)
DT$ABCFilter <- sapply(1:nrow(DT), function(x) ifelse(C[x]==1
|| (C[x]==2 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-1] && B[x]==B[x-1] && C[x-1]==1)
|| (C[x]==3 && A[x]==A[x-2] && B[x]==B[x-2] && C[x-2]==1)
, "Keep", "Discard"))
DT <- DT[ABCFilter=="Keep",]
DT$ABCFilter <- NULL
DT
I'm not so clear on the rules you're using either. It looks like the row with Area = 11357158 should get retained because it is lesser than the corresponding row with C = 1, and the row with Area = 6855539.48777 should get retained because it is greater than the corresponding row with B = M+H:
A B C Area
1: aa M+H 1 127427
2: aa M+Na 1 6855539.48777
3: aa M+Na 2 6469689
4: bb M+H 1 15330650
5: bb M+H 2 214221
6: bb M+H 3 11357158
7: bb M+K 1 2140221
edited Nov 20 '18 at 22:43
answered Nov 20 '18 at 22:11
cgrafecgrafe
814
814
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
thanks a lot for your help! you are right, I made a mistake in my example, (my had was already smoking at the time a wrote it). I edited now in a way that its correct. I also had to replace a "lower" with an "higher" for the second condition. I will test the solution as soon as I get back to my main computer tomorrow!
– yasel
Nov 20 '18 at 22:29
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53399216%2fr-data-table-filter-for-rows-by-condition-in-multiple-variables%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
can you clarify the rules for filtering out the first table. why wasn't
bb M+H 2 214221
filtered out? and yes, dplyr is well suited for this task– Wally Ali
Nov 20 '18 at 19:00
here is something to get you started:
DT[, .SD[!(C>1 & Area > Area[C==1])], by=.(A, B)][, .SD[Area<=Area[B=="M+H"]], by=.(A, C)]
– chinsoon12
Nov 21 '18 at 0:19