correlations across columns AWK
I need to calculate correlations across columns.
The code below works when calculating correlations across rows.
What needs to be modified to calculate them across columns?
Input file:
Name C1 C2 C3 C4 C5 C6
R1 1 2 3 4 5 6
R2 2 1 1 0 1 0
R3 1 3 1 1 2 1
R4 1 1 0 2 0 1
R5 1 2 2 2 0 2
R6 1 1 0 1 2 0
Desired Output:
C1 C1 1.00
C1 C2 -0.4
C1 C3 -0.069
C1 C4 -0.597
C1 C5 -0.175
C1 C6 -0.362
C2 C2 1.00
C2 C3 0.4889
etc.
Code:
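awk '{
    # a: mean of the numeric fields of this row
    a = 0; for (i = 2; i <= NF; ++i) a += $i; a /= NF-1
    # b: square root of the summed squared deviations from the mean
    b = 0; for (i = 2; i <= NF; ++i) b += ($i - a) ^ 2; b = sqrt(b)
    # skip rows with no variation (this also skips the header line)
    if (b <= 0) next
    # store the normalized row and its label
    for (i = 2; i <= NF; ++i) x[NR, i] = ($i - a) / b
    n[NR] = $1
    # correlate this row with every row stored so far, itself included
    for (i = 2; i <= NR; ++i) {
        if (!(i in n)) continue
        a = 0
        for (k = 2; k <= NF; ++k)
            a += x[NR, k] * x[i, k]
        print n[NR], n[i], a
    }
}'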
awk
asked Nov 21 '18 at 14:57 by roddy
Alternative solution: use another language with mathematics libraries. I don't think awk is really appropriate for this.
– Corentin Limier
Nov 21 '18 at 16:29
1 Answer
I don't know if this is the kind of solution you are looking for, but how about transposing the input first with the following awk script:
awk '
{ for (i=1;i<=NF;i++) arr[i","NR]=$i; }
END {
    for (i=1;i<=NF;i++) {
        for (j=1;j<=NR;j++) printf("%s%s",arr[i","j],FS);
        printf("%s",RS);
    }
}
'
Output:
Name R1 R2 R3 R4 R5 R6
C1 1 2 1 1 1 1
C2 2 1 3 1 2 1
C3 3 1 1 0 2 0
C4 4 0 1 2 2 1
C5 5 1 2 0 0 2
C6 6 0 1 1 2 0
Then just combine it with your script to calculate the column-to-column correlations:
awk '
{ for (i=1;i<=NF;i++) arr[i","NR]=$i; }
END {
    for (i=1;i<=NF;i++) {
        for (j=1;j<=NR;j++) printf("%s%s",arr[i","j],FS);
        printf("%s",RS);
    }
}
' roddy.txt | awk '{
    a = 0; for (i = 2; i <= NF; ++i) a += $i; a /= NF-1
    b = 0; for (i = 2; i <= NF; ++i) b += ($i - a) ^ 2; b = sqrt(b)
    if (b <= 0) next
    for (i = 2; i <= NF; ++i) x[NR, i] = ($i - a) / b
    n[NR] = $1
    for (i = 2; i <= NR; ++i) {
        if (!(i in n)) continue
        a = 0
        for (k = 2; k <= NF; ++k)
            a += x[NR, k] * x[i, k]
        print n[NR], n[i], a
    }
}'
Output:
C1 C1 1
C2 C1 -0.4
C2 C2 1
C3 C1 -0.069843
C3 C2 0.488901
C3 C3 1
C4 C1 -0.597614
C4 C2 0.239046
C4 C3 0.667827
C4 C4 1
C5 C1 -0.175412
C5 C2 0.30697
C5 C3 0.581936
C5 C4 0.576557
C5 C5 1
C6 C1 -0.362738
C6 C2 0.362738
C6 C3 0.861381
C6 C4 0.932143
C6 C5 0.731727
C6 C6 1
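For comparison, the transpose step can be avoided by accumulating per-column sums in a single pass. What follows is only a sketch under assumptions, not the method posted in this answer: it assumes a header line, a row label in the first field, and purely numeric data, and the names input, corr_out, name, s, and p are made up for illustration. It uses the textbook sums form of the Pearson coefficient, which can lose precision on large or badly scaled data.

```shell
# Sample data from the question, written to a scratch file (hypothetical name: input).
input=$(mktemp)
cat > "$input" <<'EOF'
Name C1 C2 C3 C4 C5 C6
R1 1 2 3 4 5 6
R2 2 1 1 0 1 0
R3 1 3 1 1 2 1
R4 1 1 0 2 0 1
R5 1 2 2 2 0 2
R6 1 1 0 1 2 0
EOF

corr_out=$(awk '
NR == 1 {                      # header line: remember the column names
    for (i = 2; i <= NF; i++) name[i] = $i
    nc = NF
    next
}
{
    n++                        # number of data rows seen so far
    for (i = 2; i <= nc; i++) {
        s[i] += $i             # running sum of column i
        for (j = 2; j <= i; j++)
            p[i, j] += $i * $j # running sum of products for the pair (i, j)
    }
}
END {
    for (i = 2; i <= nc; i++)
        for (j = 2; j <= i; j++) {
            cov = p[i, j] - s[i] * s[j] / n
            vi  = p[i, i] - s[i] * s[i] / n
            vj  = p[j, j] - s[j] * s[j] / n
            if (vi <= 0 || vj <= 0) continue   # constant column: undefined
            printf "%s %s %.6f\n", name[i], name[j], cov / sqrt(vi * vj)
        }
}' "$input")

printf '%s\n' "$corr_out"
rm -f "$input"
```

On the sample input this prints the same pairs in the same order as the pipeline above, formatted to six decimals (for example C2 C1 -0.400000).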
answered Nov 21 '18 at 16:16 by Kubator
I was writing the exact same solution. I would just change "," to FS and add an if(j!=NR) before writing FS, so there is no field separator at the end of the line.
– Corentin Limier
Nov 21 '18 at 16:28
Thanks, agree with both improvements.
– Kubator
Nov 21 '18 at 16:36
Thanks to you both. I can work with it that way.
– roddy
Nov 21 '18 at 17:59
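The separator tweak discussed in these comments could look roughly like the sketch below. It is an illustration, not code posted in the thread: cell, nf, and transposed are hypothetical names, and the three-line input is made up. It prints OFS between fields and ORS only after the last field of each line, so no trailing separator is emitted, and it tracks the widest record itself instead of relying on NF inside END.

```shell
# Transpose a small whitespace-separated table without a trailing separator.
transposed=$(printf '%s\n' 'Name C1 C2' 'R1 1 2' 'R2 3 4' | awk '
{
    # remember every cell, keyed by (column number, row number)
    for (i = 1; i <= NF; i++) cell[i, NR] = $i
    if (NF > nf) nf = NF       # widest record seen so far
}
END {
    for (i = 1; i <= nf; i++)
        for (j = 1; j <= NR; j++)
            # separator between fields, record terminator after the last one
            printf "%s%s", cell[i, j], (j < NR ? OFS : ORS)
}')

printf '%s\n' "$transposed"
```

The output can be piped into the correlation script exactly like the output of the original transpose step.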