Correlations across columns in AWK












I need to calculate correlations across columns.

The code below works when calculating correlations across rows.

What needs to be modified to calculate them across columns instead?



Input file:  
Name C1 C2 C3 C4 C5 C6
R1 1 2 3 4 5 6
R2 2 1 1 0 1 0
R3 1 3 1 1 2 1
R4 1 1 0 2 0 1
R5 1 2 2 2 0 2
R6 1 1 0 1 2 0

Desired Output:
C1 C1 1.00
C1 C2 -0.4
C1 C3 -0.069
C1 C4 -0.597
C1 C5 -0.175
C1 C6 -0.362
C2 C2 1.00
C2 C3 0.4889
etc.


Code:
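awk '{
    # mean of the data fields (field 1 is the row label)
    a = 0; for (i = 2; i <= NF; ++i) a += $i; a /= NF-1
    # Euclidean norm of the centred row
    b = 0; for (i = 2; i <= NF; ++i) b += ($i - a) ^ 2; b = sqrt(b)
    # skip constant rows; this also drops the non-numeric header line
    if (b <= 0) next
    # store the row centred and scaled to unit length
    for (i = 2; i <= NF; ++i) x[NR, i] = ($i - a) / b
    n[NR] = $1
    # correlate the current row with every row kept so far (itself included)
    for (i = 2; i <= NR; ++i) {
        if (!(i in n)) continue
        a = 0
        for (k = 2; k <= NF; ++k)
            a += x[NR, k] * x[i, k]
        print n[NR], n[i], a
    }
}'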
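
For reference, reading the script above: each row is centred on its mean and scaled to unit length, so the final dot product of two stored rows appears to be the Pearson correlation coefficient. In symbols (the notation is introduced here for illustration and is not in the original post; m stands for the NF-1 data values in a row, the mean corresponds to a, and the square root in the denominator to b):

```latex
\bar{x} = \frac{1}{m}\sum_{k=1}^{m} x_k, \qquad
\tilde{x}_k = \frac{x_k - \bar{x}}{\sqrt{\sum_{l=1}^{m} (x_l - \bar{x})^2}}, \qquad
r_{xy} = \sum_{k=1}^{m} \tilde{x}_k \, \tilde{y}_k
       = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}
              {\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2}\;\sqrt{\sum_{k=1}^{m} (y_k - \bar{y})^2}}
```

For the sample columns C1 = (1, 2, 1, 1, 1, 1) and C2 = (2, 1, 3, 1, 2, 1) the numerator is 11 - 70/6 = -2/3 and the denominator is sqrt(5/6 * 10/3) = 5/3, which gives -0.4, matching the desired output.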









  • Alternative solution: use another language with mathematics libraries. I don't think that awk is really appropriate for this.

    – Corentin Limier
    Nov 21 '18 at 16:29
















awk






asked Nov 21 '18 at 14:57









roddy

1 Answer
I don't know if this is the kind of solution you are looking for, but how about transposing the input first with the following awk script:



awk '
# store every field under a "column,row" key
{ for (i=1;i<=NF;i++) arr[i","NR]=$i; }
END {
    # emit each input column as one output row
    for (i=1;i<=NF;i++) {
        for (j=1;j<=NR;j++) printf("%s%s",arr[i","j],FS);
        printf("%s",RS);
    }
}
'


Output:



Name R1 R2 R3 R4 R5 R6
C1 1 2 1 1 1 1
C2 2 1 3 1 2 1
C3 3 1 1 0 2 0
C4 4 0 1 2 2 1
C5 5 1 2 0 0 2
C6 6 0 1 1 2 0


Then pipe the transposed output into your script to calculate the column-to-column correlations:



awk '
{ for (i=1;i<=NF;i++) arr[i","NR]=$i; }
END {
    for (i=1;i<=NF;i++) {
        for (j=1;j<=NR;j++) printf("%s%s",arr[i","j],FS);
        printf("%s",RS);
    }
}
' roddy.txt | awk '{
    a = 0; for (i = 2; i <= NF; ++i) a += $i; a /= NF-1
    b = 0; for (i = 2; i <= NF; ++i) b += ($i - a) ^ 2; b = sqrt(b)
    if (b <= 0) next
    for (i = 2; i <= NF; ++i) x[NR, i] = ($i - a) / b
    n[NR] = $1
    for (i = 2; i <= NR; ++i) {
        if (!(i in n)) continue
        a = 0
        for (k = 2; k <= NF; ++k)
            a += x[NR, k] * x[i, k]
        print n[NR], n[i], a
    }
}'


Output:



C1 C1 1
C2 C1 -0.4
C2 C2 1
C3 C1 -0.069843
C3 C2 0.488901
C3 C3 1
C4 C1 -0.597614
C4 C2 0.239046
C4 C3 0.667827
C4 C4 1
C5 C1 -0.175412
C5 C2 0.30697
C5 C3 0.581936
C5 C4 0.576557
C5 C5 1
C6 C1 -0.362738
C6 C2 0.362738
C6 C3 0.861381
C6 C4 0.932143
C6 C5 0.731727
C6 C6 1
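
As an alternative to transposing, the same numbers can probably be obtained in a single pass by accumulating per-column sums and pairwise sums of products, then applying the Pearson formula in the END block. The following is only a sketch under stated assumptions, not part of the original answer: the function name col_corr is made up for illustration, the first row and first column are assumed to hold labels, and every other field is assumed to be numeric.

```shell
# col_corr is a hypothetical helper name, not part of the original answer.
# It prints the Pearson correlation for every pair of data columns of a
# whitespace-separated table whose first row and first column hold labels.
col_corr() {
  awk '
    NR == 1 {                        # header row: remember the column names
      for (i = 2; i <= NF; ++i) name[i] = $i
      nc = NF
      next
    }
    {
      ++n                            # number of data rows seen so far
      for (i = 2; i <= nc; ++i) {
        s[i] += $i                   # running sum of column i
        for (j = 2; j <= i; ++j)
          p[i, j] += $i * $j         # running sum of products for the pair (i, j)
      }
    }
    END {
      if (n < 2) exit                # nothing meaningful to correlate
      for (i = 2; i <= nc; ++i)
        for (j = 2; j <= i; ++j) {
          cov = p[i, j] - s[i] * s[j] / n
          vi  = p[i, i] - s[i] * s[i] / n
          vj  = p[j, j] - s[j] * s[j] / n
          if (vi <= 0 || vj <= 0) continue   # constant column: correlation undefined
          printf "%s %s %.6g\n", name[i], name[j], cov / sqrt(vi * vj)
        }
    }
  ' "$@"
}
```

Called as col_corr roddy.txt, this is expected to print the same pairs in the same order as the output above, up to rounding. The sum-of-products form avoids storing the whole table, at the cost of being more sensitive to cancellation when values are large and nearly constant; the two-step transpose approach does not have that weakness.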





  • I was writing the exact same solution. I would just change "," to FS and add an if(j!=NR) before writing FS so there is no field separator at the end of the line.

    – Corentin Limier
    Nov 21 '18 at 16:28











  • Thanks, agree with both improvements.

    – Kubator
    Nov 21 '18 at 16:36











  • Thanks to you both. I can work with it that way.

    – roddy
    Nov 21 '18 at 17:59
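
The two improvements agreed on in the comments could be sketched roughly as follows. This is one reading of them rather than the commenters' own code: the hand-built "," key is replaced by awk's comma subscript (which joins the indices with SUBSEP), the separator is written only between fields, and the function name transpose is hypothetical.

```shell
# transpose is a hypothetical helper name used only for this sketch.
transpose() {
  awk '
    {
      for (i = 1; i <= NF; ++i) cell[i, NR] = $i   # key on (column, row)
      if (NF > nf) nf = NF                          # widest row seen so far
    }
    END {
      for (i = 1; i <= nf; ++i)
        for (j = 1; j <= NR; ++j)
          # separator between fields, newline after the last one
          printf "%s%s", cell[i, j], (j < NR ? FS : "\n")
    }
  ' "$@"
}
```

It can be used in place of the first awk command of the pipeline, for example transpose roddy.txt piped into the correlation script. Tracking the widest row in nf also avoids relying on the value of NF inside END.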











answered Nov 21 '18 at 16:16









Kubator
