Usage of 'for loop' in R to split a dataframe into several dataframes













I have a problem with a for loop.
I have a data frame with 120 unique IDs, and I want to split it into 120 separate data frames based on the ID. I split it using the following code:



split_part0 <- split(PART0_DF, PART0_DF$sysid)


Now I want to do something like



for(i in 1:120){ 
sys[i] <- as.data.frame(split_part0[[i]])}


This way I would have 120 data frames with unique names that I can use for further analysis.
Is a for loop not possible in this particular case? If so, what other commands can I use?
Dummy data for PART0_DF:



Date      sysid  power  temperature
1.1.2018  1      1000   14
2.1.2018  1      1200   16
3.1.2018  1      800    18
1.1.2018  2      1500   8
2.1.2018  2      800    18
3.1.2018  2      1300   11


I want the output to look like this:



>>sys1
Date      sysid  power  temperature
1.1.2018  1      1000   14
2.1.2018  1      1200   16
3.1.2018  1      800    18
>>sys2
Date      sysid  power  temperature
1.1.2018  2      1500   8
2.1.2018  2      800    18
3.1.2018  2      1300   11










r for-loop







edited Nov 21 '18 at 14:42







Shruthi Patil

















asked Nov 21 '18 at 11:04









Shruthi Patil


  • If you provide a small dummy example of PART0_DF it would be easier to understand what it is you need.

    – rookie
    Nov 21 '18 at 11:47




















2 Answers

An easy way to do this is to create a grouping vector by prepending the string sys to the id numbers, and to use it to split the data (split() converts the vector to a factor). There is no need for a for() loop to produce the desired output: when the input is a data frame, split() already returns a list of data frames.



The factor levels are used to name the elements of the list returned by split(), as explained in the help for split(). In the OP's case, because sysid is numeric and starts at 1, it's not obvious that the resulting data frames are already named after the id values ("1", "2", ...).
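To see that the names really come from the id values, here is a small sketch with made-up ids that neither start at 1 nor are contiguous (the data frame d and its values are hypothetical). Positional and name-based lookups only coincide when the ids happen to be 1, 2, ..., n:

```r
# Hypothetical data: ids 5, 12 and 40 instead of 1, 2, 3
d <- data.frame(
  sysid = c(5, 5, 12, 12, 40),
  power = c(1000, 1200, 1500, 800, 1300)
)
s <- split(d, d$sysid)

names(s)          # "5" "12" "40": the id values, sorted numerically
s[["12"]]$power   # look up by name: the rows with sysid 12
s[[2]]$power      # look up by position: the second group, also sysid 12
s[["2"]]          # NULL: there is no group named "2"
```

With the OP's ids 1 to 120, split_part0[[i]] and split_part0[[as.character(i)]] happen to return the same data frame, which hides this distinction.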



Using the OP's data, we'll illustrate how to combine the string sys with the sysid values into a grouping variable, and how to use it to split the data into a list of data frames that can be accessed by name.



rawData <- "Date      sysid   power   temperature
1.1.2018 1 1000 14
2.1.2018 1 1200 16
3.1.2018 1 800 18
1.1.2018 2 1500 8
2.1.2018 2 800 18
3.1.2018 2 1300 11"

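# read the sample data from the string above
data <- read.table(text = rawData, header = TRUE)
# build one name per row, e.g. "sys1", "sys2"
sysidName <- paste0("sys", data$sysid)

# split() returns a list of data frames, one per distinct name
splitData <- split(data, sysidName)

splitData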


...and the output:



> splitData
$`sys1`
Date sysid power temperature
1 1.1.2018 1 1000 14
2 2.1.2018 1 1200 16
3 3.1.2018 1 800 18

$sys2
Date sysid power temperature
4 1.1.2018 2 1500 8
5 2.1.2018 2 800 18
6 3.1.2018 2 1300 11

>


At this point one can access individual data frames in the list by using the $ form of the extract operator:



> splitData$sys1
Date sysid power temperature
1 1.1.2018 1 1000 14
2 2.1.2018 1 1200 16
3 3.1.2018 1 800 18
>


Also, by using the names() function one can obtain a vector of all the named elements in the list of data frames.



> names(splitData)
[1] "sys1" "sys2"
>
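Because all the pieces live in one named list, the "further analysis" can be run over every system at once with sapply() or lapply() instead of handling 120 separate objects. A minimal sketch, assuming mean power per system is the statistic of interest (meanPower is a hypothetical name):

```r
# Rebuild the sample data so this block runs on its own
data <- data.frame(
  Date = c("1.1.2018", "2.1.2018", "3.1.2018", "1.1.2018", "2.1.2018", "3.1.2018"),
  sysid = c(1, 1, 1, 2, 2, 2),
  power = c(1000, 1200, 800, 1500, 800, 1300),
  temperature = c(14, 16, 18, 8, 18, 11)
)
splitData <- split(data, paste0("sys", data$sysid))

# Apply the same summary to every data frame; the result is named sys1, sys2, ...
meanPower <- sapply(splitData, function(d) mean(d$power))
meanPower
# sys1 sys2
# 1000 1200
```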


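To reiterate the main point from the top of the answer: when split() is applied to a data frame, each element of the resulting list is a data.frame. For example (this output is from R 3.x; since R 4.0.0, read.table() defaults to stringsAsFactors = FALSE, so Date is shown as chr rather than Factor):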



> str(splitData["sys1"])
List of 1
$ sys1:'data.frame': 3 obs. of 4 variables:
..$ Date : Factor w/ 3 levels "1.1.2018","2.1.2018",..: 1 2 3
..$ sysid : int [1:3] 1 1 1
..$ power : int [1:3] 1000 1200 800
..$ temperature: int [1:3] 14 16 18
>
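If separate objects named sys1, sys2, ... really are needed in the workspace, as the question asks, base R's list2env() can copy each element of the named list into an environment. A sketch, assuming the global environment is the target; with 120 systems, keeping the list is usually more convenient, because it can be processed in one step with lapply() and similar functions:

```r
# Rebuild the sample data and the named list so this block runs on its own
data <- data.frame(
  Date = c("1.1.2018", "2.1.2018", "3.1.2018", "1.1.2018", "2.1.2018", "3.1.2018"),
  sysid = c(1, 1, 1, 2, 2, 2),
  power = c(1000, 1200, 800, 1500, 800, 1300),
  temperature = c(14, 16, 18, 8, 18, 11)
)
splitData <- split(data, paste0("sys", data$sysid))

# Copy each list element into the global environment as its own object
# (sys1, sys2, ...); invisible() hides the environment list2env() returns
invisible(list2env(splitData, envir = .GlobalEnv))

sys2
```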


If you must use a for() loop...



The OP asked whether the problem could be solved with a for() loop, and the answer is "yes."



# create a vector containing the unique values of sysid
ids <- unique(data$sysid)
# initialize an empty list for the output data frames
dfList <- list()
# loop through the unique values and store one named data frame per id
for (i in ids) {
  dfname <- paste0("sys", i)
  dfList[[dfname]] <- data[data$sysid == i, ]
}
dfList


...and the output:



> for(i in ids){
+ dfname <- paste0("sys",i)
+ dfList[[dfname]] <- data[data$sysid == i,]
+ }
> dfList
$`sys1`
Date sysid power temperature
1 1.1.2018 1 1000 14
2 2.1.2018 1 1200 16
3 3.1.2018 1 800 18

$sys2
Date sysid power temperature
4 1.1.2018 2 1500 8
5 2.1.2018 2 800 18
6 3.1.2018 2 1300 11
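For comparison, the OP's original loop fails for two reasons: sys is never created before the loop, and even once sys exists as a list, single-bracket assignment such as sys[i] <- ... stores only the first column of the data frame (with a warning) instead of the whole data frame. A sketch of that loop with both problems fixed, using a stand-in PART0_DF built from the question's sample data:

```r
# Stand-in for the OP's PART0_DF, built from the sample data in the question
PART0_DF <- data.frame(
  Date = c("1.1.2018", "2.1.2018", "3.1.2018", "1.1.2018", "2.1.2018", "3.1.2018"),
  sysid = c(1, 1, 1, 2, 2, 2),
  power = c(1000, 1200, 800, 1500, 800, 1300),
  temperature = c(14, 16, 18, 8, 18, 11)
)
split_part0 <- split(PART0_DF, PART0_DF$sysid)

# Fix 1: create the result list before the loop (the original never defines sys)
sys <- vector("list", length(split_part0))

# Fix 2: loop over the actual number of groups and assign with [[ ]];
# each piece is already a data frame, so as.data.frame() is not needed
for (i in seq_along(split_part0)) {
  sys[[i]] <- split_part0[[i]]
}

# name the elements sys1, sys2, ... from the id values split() used
names(sys) <- paste0("sys", names(split_part0))
sys$sys2
```

The result is the same list that split() returns, only renamed, which is why split() on its own is enough.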


Choosing the "best" answer



Between split(), for() and the other answer using by(), how do we choose the best answer?



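One way is to see which version runs fastest, since the real data will be much larger than the sample data in the original post. Keep in mind that the timings below were measured on the six-row sample, so they show per-call overhead rather than how each approach scales.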



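We can use the microbenchmark package to compare the performance of the three approaches. Note that the by() timing uses df from the other answer, which has no temperature column, so that comparison is approximate.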




split() performance



> library(microbenchmark)
> microbenchmark(splitData <- split(data,sysidName),unit="us")
Unit: microseconds
expr min lq mean median uq max neval
splitData <- split(data, sysidName) 144.594 147.359 185.7987 150.1245 170.4705 615.507 100
>



for() performance



> microbenchmark(for(i in ids){
+ dfname <- paste0("sys",i)
+ dfList[[dfname]] <- data[data$sysid == i,]
+ },unit="us")
Unit: microseconds
expr min lq mean
for (i in ids) { dfname <- paste0("sys", i) dfList[[dfname]] <- data[data$sysid == i, ] } 2643.755 2857.286 3457.642
median uq max neval
3099.064 3479.311 8511.609 100
>



by() performance



> microbenchmark(df_list <- by(df, df$sysid, function(unique) unique),unit="us")
Unit: microseconds
expr min lq mean median uq max neval
df_list <- by(df, df$sysid, function(unique) unique) 256.791 260.5445 304.9296 275.9515 309.5325 1218.372 100
>


...and the winner is:



split(), with a mean runtime of about 186 microseconds, versus about 305 microseconds for by() and a whopping 3,458 microseconds for the for() loop (all measured on the six-row sample data).
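Since the real data has 120 systems, it may be worth repeating the comparison on data of that shape. The sketch below builds hypothetical synthetic data (120 ids with 365 rows each; the sizes and value ranges are assumptions), times split() against the for() loop with base R's system.time(), and checks that both return the same data frames. No timings are claimed here, because they depend on the machine. Note that split() orders the list by sorting the names as character strings (sys1, sys10, sys100, ...), while the loop follows the order of unique(), so the names are aligned before comparing:

```r
# Hypothetical synthetic data shaped like the OP's: 120 systems, 365 rows each.
# The sizes and value ranges are assumptions for illustration only.
set.seed(1)
n_ids <- 120
n_days <- 365
big <- data.frame(
  sysid = rep(seq_len(n_ids), each = n_days),
  power = round(runif(n_ids * n_days, 500, 1500)),
  temperature = round(runif(n_ids * n_days, 0, 30))
)
grp <- paste0("sys", big$sysid)

# split(): a single call
t_split <- system.time(res_split <- split(big, grp))

# for() loop: one subset per id, as in the answer above
t_loop <- system.time({
  res_loop <- list()
  for (i in unique(big$sysid)) {
    res_loop[[paste0("sys", i)]] <- big[big$sysid == i, ]
  }
})

# elapsed seconds for each approach (machine-dependent)
c(split = t_split[["elapsed"]], loop = t_loop[["elapsed"]])

# align the names before comparing, since split() sorts them as strings
all.equal(res_split[names(res_loop)], res_loop)
```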








edited Nov 22 '18 at 6:39

























answered Nov 21 '18 at 14:56









Len Greski


  • I would like to access each data frame to do further analysis separately. Hence, the first solution is perfect. Thank you very much.

    – Shruthi Patil
    Nov 23 '18 at 9:58




















Another option is using the function by():



df <- data.frame(
Date = c("1.1.2018", "2.1.2018", "3.1.2018", "1.1.2018", "2.1.2018", "3.1.2018"),
sysid = c(1, 1, 1, 2, 2, 2),
power = c(1000, 1200, 800, 1500, 800, 1300)
)
df
Date sysid power
1 1.1.2018 1 1000
2 2.1.2018 1 1200
3 3.1.2018 1 800
4 1.1.2018 2 1500
5 2.1.2018 2 800
6 3.1.2018 2 1300


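Now split df into as many data frames as there are distinct sysid values, using by() with an identity function. (The function's argument is merely named `unique`; it returns each subset unchanged and does not call base R's unique().)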



df_list <- by(df, df$sysid, function(unique) unique)
df_list
df$sysid: 1
Date sysid power
1 1.1.2018 1 1000
2 2.1.2018 1 1200
3 3.1.2018 1 800
----------------------------------------------------------------------------------------------
df$sysid: 2
Date sysid power
4 1.1.2018 2 1500
5 2.1.2018 2 800
6 3.1.2018 2 1300
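Note that by() returns an object of class "by" (a one-dimensional list array), not a plain list. A minimal sketch of converting it into an ordinary named list, assuming the same dummy data; the names `sys_list` and `k` are illustrative:

```r
df <- data.frame(
  Date = c("1.1.2018", "2.1.2018", "3.1.2018", "1.1.2018", "2.1.2018", "3.1.2018"),
  sysid = c(1, 1, 1, 2, 2, 2),
  power = c(1000, 1200, 800, 1500, 800, 1300)
)

# Identity function: same behaviour as function(unique) unique
df_list <- by(df, df$sysid, function(x) x)

class(df_list)   # "by", not "list"

# Convert to an ordinary named list: sys1, sys2, ...
sys_list <- lapply(seq_along(df_list), function(k) df_list[[k]])
names(sys_list) <- paste0("sys", dimnames(df_list)[[1]])

sys_list$sys2
```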















answered Nov 21 '18 at 15:52









Chris Ruehlemann

469210

















  • I tried this too. But, the solution provided by @LenGreski is more suited for my requirement. Thanks for your help

    – Shruthi Patil
    Nov 23 '18 at 10:00











  • On Stack Overflow it is customary to click the upward arrow if a given answer is useful.

    – Chris Ruehlemann
    Nov 23 '18 at 14:49











  • I have already. It says I don't have enough reputation for it to get displayed. However, it is recorded.

    – Shruthi Patil
    Nov 24 '18 at 16:17




































