How to calculate regression residuals in R for each individual in a longitudinal analysis?












I am working on a longitudinal/repeated measures multilevel model (MLM). Usually, for time-varying covariates (in my case "weekly gross income/1000"), you would calculate a person-mean centered version of the variable (i.e. subtract the person's average weekly income across all of that person's time points from each person-year income value). However, this can lead to bias (see here), so a better (more generalisable) approach is to center around a regression line for each individual; the residuals from that regression serve this purpose.
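To make the distinction concrete, the two centering schemes can be written out as below. The notation is introduced here only for illustration and is not taken from the linked source: $x_{ti}$ is person $i$'s income at time $t$, $T_i$ is that person's number of observations, and $\hat{a}_i$, $\hat{b}_i$ are the least-squares intercept and slope from that person's own regression of income on time.

```latex
% Person-mean centering: deviation from the person's own mean
\tilde{x}_{ti} = x_{ti} - \bar{x}_i,
\qquad
\bar{x}_i = \frac{1}{T_i} \sum_{t} x_{ti}

% Centering around a person-specific regression line: the OLS residual
\hat{e}_{ti} = x_{ti} - \left( \hat{a}_i + \hat{b}_i \, t \right)
```

If a person's fitted slope $\hat{b}_i$ is zero, then $\hat{a}_i = \bar{x}_i$ and the two versions coincide; they differ only to the extent that the person's income trends over time.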



Therefore, I need to calculate the following regression, but for each individual (roughly 10,000 individuals with 25,000 observations):



lm(Weekly_Gross_Pay_Main_Job~nYear, data=df)


Then, the really critical part is that I need to extract the residuals to a separate column in my main dataset, matched up with each person. These residuals will take the place of my group-mean centered variable (which will in turn be used in my MLM).



Here is a possible starting point using the function that I have for the group-mean centering. If this could be updated to fit a regression with the residuals output for each person, then that would be ideal (if not, then I am open to other approaches):



# Group-mean centering a variable. Relevant for L1 variables only.
gmc <- function(variable, group) {
  ave(variable, group, FUN = function(x) x - mean(x))
}

df$Weekly_Gross_Pay_Main_Jobgmc <- gmc(df$Weekly_Gross_Pay_Main_Job, df$Person_ID)


Data extract in long format (where Person_ID is the person, nYear is time, Weekly_Gross_Pay_Main_Job is weekly income/1000 and Weekly_Gross_Pay_Main_Jobgmc is the group-mean centered version):



structure(list(Person_ID = c(100003L, 100003L, 100003L, 100006L, 
100006L, 100006L, 100006L, 100010L, 100010L, 100010L, 100010L,
100010L, 100010L, 100011L, 100014L, 100014L, 100014L, 100014L,
100014L, 100016L, 100018L, 100018L, 100018L, 100018L, 100018L,
100018L, 100018L, 100018L, 100018L, 100020L, 100020L, 100020L,
100020L, 100020L, 100020L, 100020L, 100020L, 100020L, 100021L,
100021L, 100024L, 100024L, 100024L, 100024L, 100024L, 100024L,
100024L, 100024L, 100024L, 100024L, 100025L, 100025L, 100025L,
100025L, 100025L, 100025L, 100025L, 100025L, 100027L, 100027L,
100027L, 100027L, 100029L, 100029L, 100029L, 100029L, 100029L,
100031L, 100031L, 100031L, 100032L, 100032L, 100032L, 100033L,
100033L, 100033L, 100033L, 100033L, 100033L, 100034L, 100034L,
100034L, 100037L, 100037L, 100037L, 100037L, 100037L, 100037L,
100037L, 100044L, 100044L, 100044L, 100044L, 100044L, 100044L,
100044L, 100045L, 100045L, 100045L, 100045L), nYear = c(5L, 6L,
7L, 2L, 3L, 4L, 6L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 5L, 6L, 7L,
8L, 9L, 5L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 1L, 2L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 4L, 5L, 6L, 1L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 8L,
2L, 3L, 5L, 5L, 6L, 7L, 8L, 9L, 11L, 13L, 2L, 3L, 4L, 6L, 7L,
8L, 9L, 4L, 5L, 6L, 7L), Weekly_Gross_Pay_Main_Job = c(0, 0.58,
0.35, 0.035, 0.65, 0.195, 0.43, 0, 0, 0, 0, 0, 0, 0.12, 1.653,
0.967, 1.742, 1.323, 0, 0.709, 0.155, 0.431, 0.235, 0.17, 0.285,
0.357, 0.28, 0.335, 0.375, 0.111, 0.333, 0.582, 0.882, 0.85,
0.944, 1.615, 1.615, 1.35, 0.168, 0.08, 0, 0, 0, 0, 0, 0, 0,
0.134, 0.737, 0, 0.02, 0.372, 0.1, 0.014, 0.307, 0.39, 0.671,
0.5, 0.278, 0.32, 0.425, 0.4, 0.57, 0.917, 0.75, 0.402, 0.437,
0.211, 0.537, 0.54, 0.135, 0.15, 0.65, 0.324, 0.399, 0.497, 0.67,
0.825, 0.825, 0.25, 0.319, 0.35, 0.885, 0.941, 0.975, 0.975,
1.02, 1.096, 1.148, 0.1, 0.11, 0.413, 0.477, 0.578, 0.686, 0.686,
0.511, 0.578, 0.8, 0.75), Weekly_Gross_Pay_Main_Jobgmc = c(-0.31,
0.27, 0.04, -0.2925, 0.3225, -0.1325, 0.1025, 0, 0, 0, 0, 0,
0, 0, 0.516, -0.17, 0.605, 0.186, -1.137, 0, -0.136444444444444,
0.139555555555556, -0.0564444444444445, -0.121444444444444, -0.00644444444444447,
0.0655555555555555, -0.0114444444444444, 0.0435555555555556,
0.0835555555555555, -0.809222222222222, -0.587222222222222, -0.338222222222222,
-0.0382222222222223, -0.0702222222222223, 0.0237777777777777,
0.694777777777778, 0.694777777777778, 0.429777777777778, 0.044,
-0.044, -0.0871, -0.0871, -0.0871, -0.0871, -0.0871, -0.0871,
-0.0871, 0.0469, 0.6499, -0.0871, -0.27675, 0.07525, -0.19675,
-0.28275, 0.01025, 0.09325, 0.37425, 0.20325, -0.07775, -0.03575,
0.06925, 0.04425, -0.0452, 0.3018, 0.1348, -0.2132, -0.1782,
-0.218333333333333, 0.107666666666667, 0.110666666666667, -0.176666666666667,
-0.161666666666667, 0.338333333333333, -0.266, -0.191, -0.093,
0.0800000000000001, 0.235, 0.235, -0.0563333333333333, 0.0126666666666667,
0.0436666666666666, -0.120714285714286, -0.0647142857142858,
-0.0307142857142858, -0.0307142857142858, 0.0142857142857142,
0.0902857142857143, 0.142285714285714, -0.335714285714286, -0.325714285714286,
-0.0227142857142857, 0.0412857142857143, 0.142285714285714, 0.250285714285714,
0.250285714285714, -0.1368, -0.0698000000000001, 0.1522, 0.1022
)), row.names = c(NA, 100L), class = "data.frame")









      r regression longitudinal multilevel-analysis
















asked Nov 20 '18 at 3:16 by aspark2020
























          2 Answers














I'm not sure I'm reading you right, and this may be a naive answer that misses the point, but doesn't residuals() just work?
Here's a linear mixed-effects model fitted to some data I had lying around:



    library(nlme)  # provides lme()

    some.model <- lme(DV ~ IV, random = ~1 | Id, data = df)
    head(residuals(some.model))
    #            7            7           24           24           32           32
    # -0.054135825 -0.054135825  0.064271638  0.064271638 -0.001975424 -0.001975424


If you really want to put it into a column with the ID number next to it, it takes a few more steps. It can probably be done in a single step, but I'm not good enough at this to know how.



    extra.column <- residuals(some.model)
    extra.column.id <- names(residuals(some.model))
    # cbind() of a numeric and a character vector yields a character matrix
    cbind(extra.column, extra.column.id)
    #    extra.column          extra.column.id
    # 7  "-0.0541358252373243" "7"
    # 7  "-0.0541358252373243" "7"
    # 24 "0.0642716380035857"  "24"
    # 24 "0.0642716380035857"  "24"
    # 32 "-0.0019754241828096" "32"
    # 32 "-0.0019754241828096" "32"


Sorry if this is not what you're looking for, but check out the residuals() function.
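If the goal is a separate lm() per person rather than one pooled model, residuals() can be combined with the ave() pattern from the question. The following is only a minimal sketch under stated assumptions: the helper name `rc`, the output column `Weekly_Gross_Pay_Main_Job_res`, and the small data frame are all made up for illustration. It passes row indices to ave() so that both the outcome and the time variable are available inside each group.

```r
# Tiny made-up data set (values are illustrative, not the question's data).
df <- data.frame(
  Person_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  nYear     = c(1L, 2L, 3L, 4L, 2L, 3L, 5L, 1L),
  Weekly_Gross_Pay_Main_Job = c(0.10, 0.35, 0.30, 0.60, 1.20, 0.90, 1.10, 0.50)
)

# Hypothetical helper: residuals from a separate lm(variable ~ time) per group,
# returned in the original row order so they can be assigned as a column.
rc <- function(variable, time, group) {
  # ave() splits the row indices by group and calls FUN once per group
  ave(seq_along(variable), group, FUN = function(i) {
    # na.exclude pads the residuals with NA for rows that have missing values
    fit <- lm(variable[i] ~ time[i], na.action = na.exclude)
    as.numeric(residuals(fit))
  })
}

df$Weekly_Gross_Pay_Main_Job_res <- rc(df$Weekly_Gross_Pay_Main_Job,
                                       df$nYear, df$Person_ID)
df
```

People with only one or two observations are fitted exactly, so their residuals come out as (numerically) zero. A person whose values are all missing would make lm() stop with an error and would need to be handled separately.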




















Here is how I ended up doing it:

    library(broom)    # augment()
    library(ggplot2)  # ggplot()

    # Before you begin, grand-mean center time.
    df$nYearmc <- df$nYear - mean(df$nYear, na.rm = TRUE)

    # Now regress the time-varying covariate on grand-mean centered time and complete the process.

    # First, build a Person_Year merge key and group the data by person (`by_Person`).
    df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove = FALSE)
    by_Person <- dplyr::group_by(df, Person_ID)

    # Second, fit the regression separately for each person and merge the augmented output with the main data frame.
    df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, augment(lm(Weekly_Gross_Pay_Main_Job ~ nYearmc, data = .)))
    df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove = FALSE)
    df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by = "Person_Year")

    # Third, copy over the required column (renaming it would be more efficient, but either way works).
    df$RegResGrossPay <- df$.resid

    # Fourth, do an optional tidy-up: restore the original column names and drop the merge leftovers.
    colnames(df)[colnames(df) == "Person_ID.x"] <- "Person_ID"
    colnames(df)[colnames(df) == "nYearmc.x"] <- "nYearmc"
    colnames(df)[colnames(df) == "Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"
    df$Person_ID.y <- NULL
    df$nYearmc.y <- NULL
    df$Weekly_Gross_Pay_Main_Job.y <- NULL
    df$.fitted <- NULL
    df$.se.fit <- NULL
    df$.resid <- NULL
    df$.hat <- NULL
    df$.sigma <- NULL
    df$.cooksd <- NULL
    df$.std.resid <- NULL
    rm(df.Weekly_Gross_Pay_Main_Job)

    # Fifth, generate plots of the variables you need.
    ggplot(df, aes(nYearmc, RegResGrossPay)) +
      geom_line(aes(group = Person_ID), alpha = 1/3) +
      geom_smooth(method = "lm", se = FALSE)
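A possibly shorter route to the same residual column, offered only as a sketch and not as what I actually ran: dplyr's group_by() plus mutate() can fit the per-person regression and write the residuals straight back, which avoids the merge key and the tidy-up. The small data frame below is made up for illustration; the column names follow the code above.

```r
library(dplyr)

# Tiny made-up data set (values are illustrative, not the question's data).
df <- data.frame(
  Person_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  nYear     = c(1L, 2L, 3L, 4L, 2L, 3L, 5L, 1L),
  Weekly_Gross_Pay_Main_Job = c(0.10, 0.35, 0.30, 0.60, 1.20, 0.90, 1.10, 0.50)
)

# Grand-mean center time, as in the steps above.
df$nYearmc <- df$nYear - mean(df$nYear, na.rm = TRUE)

df <- df %>%
  group_by(Person_ID) %>%
  # Inside a grouped mutate(), lm() sees only the current person's rows;
  # as.numeric() drops the names that resid() attaches.
  mutate(RegResGrossPay = as.numeric(
    resid(lm(Weekly_Gross_Pay_Main_Job ~ nYearmc, na.action = na.exclude))
  )) %>%
  ungroup()

df
```

Rows stay in their original order, so no join is needed. One side note: shifting a predictor by a constant changes only the fitted intercept, so in a model with an intercept these residuals should be the same whether nYear or nYearmc is used; the centering matters for interpreting coefficients in later models, not for the residuals themselves.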





            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385696%2fhow-to-calculate-regression-residuals-in-r-for-each-individual-in-a-longitudinal%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              1














              not sure if I'm reading you right, this might be a very naive answer missing the point, but doesn't "residuals" just work.
              Here's a linear mixed effects model with some data i had lying around



                  some.model<-lme(DV~IV, random=~1|Id, data=df)
              head(residuals(some.model))
              7 7 24 24 32 32
              -0.054135825 -0.054135825 0.064271638 0.064271638 -0.001975424 -0.001975424


              If you really want to put it into a column with the idnumber next to it it takes a few more steps. It probably can be done in a single step but i'm really bad.



                 extra.column<-residuals(some.model)
              extra.column.id<-names(residuals(some.model))
              extra.column<-residuals(some.model)
              cbind(extra.column,extra.column.id)
              extra.column extra.column.id
              7 "-0.0541358252373243" "7"
              7 "-0.0541358252373243" "7"
              24 "0.0642716380035857" "24"
              24 "0.0642716380035857" "24"
              32 "-0.0019754241828096" "32"
              32 "-0.0019754241828096" "32"


              Sorry if this is not what you're looking for, but check out the residuals command.






              share|improve this answer




























                1














                not sure if I'm reading you right, this might be a very naive answer missing the point, but doesn't "residuals" just work.
                Here's a linear mixed effects model with some data i had lying around



                    some.model<-lme(DV~IV, random=~1|Id, data=df)
                head(residuals(some.model))
                7 7 24 24 32 32
                -0.054135825 -0.054135825 0.064271638 0.064271638 -0.001975424 -0.001975424


                If you really want to put it into a column with the idnumber next to it it takes a few more steps. It probably can be done in a single step but i'm really bad.



                   extra.column<-residuals(some.model)
                extra.column.id<-names(residuals(some.model))
                extra.column<-residuals(some.model)
                cbind(extra.column,extra.column.id)
                extra.column extra.column.id
                7 "-0.0541358252373243" "7"
                7 "-0.0541358252373243" "7"
                24 "0.0642716380035857" "24"
                24 "0.0642716380035857" "24"
                32 "-0.0019754241828096" "32"
                32 "-0.0019754241828096" "32"


                Sorry if this is not what you're looking for, but check out the residuals command.
















                  answered Nov 22 '18 at 9:39









                  Huy Pham







































                      Here is how I ended up doing it:



                      #Before you begin, time needs to be grand-mean centered.
                      df$nYearmc <- df$nYear-mean(df$nYear, na.rm=TRUE)

                      #Now to regress the time-varying covariate onto grand-mean centered time and complete the process.

                      #First, build a Person_Year merge key and create a grouped data frame called `by_Person`.
                      df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
                      by_Person <- dplyr::group_by(df, Person_ID)

                      #Second, regress the time-varying covariate onto the newly created grand-mean centered time variable (one lm per person) and merge with the main data frame.
                      #augment() comes from the broom package and returns the model data plus columns such as .fitted and .resid.
                      df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, broom::augment(lm(Weekly_Gross_Pay_Main_Job~nYearmc, data=.)))
                      df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
                      df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by="Person_Year")

                      #Third, copy over the required columns (renaming them would be more efficient, but either way).
                      df$RegResGrossPay <- df$.resid

                      #Fourth, do an optional tidy up.
                      colnames(df)[colnames(df)=="Person_ID.x"] <- "Person_ID"
                      colnames(df)[colnames(df)=="nYearmc.x"] <- "nYearmc"
                      colnames(df)[colnames(df)=="Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"
                      df$Person_ID.y <- NULL
                      df$nYearmc.y <- NULL
                      df$Weekly_Gross_Pay_Main_Job.y <- NULL
                      df$.fitted <- NULL
                      df$.se.fit <- NULL
                      df$.resid <- NULL
                      df$.hat <- NULL
                      df$.sigma <- NULL
                      df$.cooksd <- NULL
                      df$.std.resid <- NULL
                      df.Weekly_Gross_Pay_Main_Job <- NULL

                      #Fifth, generate plots of the variables you need (requires ggplot2).
                      library(ggplot2)
                      ggplot(df, aes(nYearmc, RegResGrossPay))+geom_line(aes(group=Person_ID), alpha =1/3)+geom_smooth(method="lm",se=FALSE)
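For comparison, here is a more compact base-R sketch of the per-person residual step, modelled on the gmc() helper from the question. It is not the approach used above, only an alternative under stated assumptions: the helper name resid_by_group is hypothetical, and the toy data frame is made up purely for illustration. The idea is to let ave() hand each person's row indices to a function that fits lm() on those rows and returns the residuals in the original row order, so no merge key is needed.

```r
# Toy data standing in for the real `df` (values invented for illustration).
df <- data.frame(
  Person_ID = c(1, 1, 1, 2, 2, 2, 2),
  nYear     = c(1, 2, 3, 1, 2, 3, 4),
  Weekly_Gross_Pay_Main_Job = c(0.5, 0.7, 0.8, 1.2, 1.1, 1.5, 1.4)
)

# Hypothetical helper: residuals of y ~ x, fitted separately within each group.
resid_by_group <- function(y, x, group) {
  idx <- as.numeric(seq_along(y))  # row positions, split by group inside ave()
  ave(idx, group, FUN = function(i) {
    # na.exclude pads residuals with NA so the length matches the group size
    residuals(lm(y[i] ~ x[i], na.action = na.exclude))
  })
}

# Residuals land in the same row order as `df`, one value per observation.
df$RegResGrossPay <- resid_by_group(
  df$Weekly_Gross_Pay_Main_Job, df$nYear, df$Person_ID
)
```

Shifting the predictor by a constant does not change least-squares residuals, so passing nYear or the grand-mean centered nYearmc should give the same column. People with only one or two observations get residuals of zero, which may or may not be what the later multilevel model needs.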















                          answered Nov 27 '18 at 6:17









                          aspark2020





























