Matlab: Euclidean norm (or difference) between two vectors
I'd like to calculate the Euclidean distance between a vector G and each row of an array C, dividing each difference element-wise by the corresponding entry of a vector GSD. What I've done seems very inefficient. What's my biggest overhead?
Could I speed it up?
m=1E7;
G=1E5*rand(1,8);
C=1E5*[zeros(m,1),rand(m,8)];
GSD=10*rand(1,8);
%I've taken the log10 of the values because G and C are very large in magnitude.
%Don't know if it's worth it.
for i=1:m
dG(i,1)=norm((log10(G)-log10(C(i,2:end)))/log10(GSD));
end
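As an aside (not in the original post), here is a sketch of the same loop with dG preallocated and the loop-invariant log10 terms computed once, which is typically where most of the loop's overhead comes from:
% Sketch: preallocate dG and hoist the constant logs out of the loop
% (uses ./ on the assumption element-wise division is intended; see the answers below).
dG   = zeros(m,1);         % preallocating avoids regrowing dG on every iteration
logG = log10(G);
logS = log10(GSD);
logC = log10(C(:,2:end));  % one pass over C instead of m passes
for i = 1:m
    dG(i) = norm((logG - logC(i,:))./logS);
end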
Using the examples from the answers below, they don't all give the same answer; in fact none of them agree (see the comparison figure, produced with the following variants):
dG = pdist2(log10(G),log10(C(:,2:end)),'mahalanobis',diag(log10(GSD))); %(1)
dG = sqrt(sum((log10(G)-log10(C(:,2:end))./log10(GSD)).^2,2));
tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD)); %(4)
dG = sqrt(sum(tmp.^2,2));
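As an aside (not part of the original question): one reason the variants can disagree is operator precedence. In MATLAB ./ binds tighter than -, so the second variant divides only log10(C(:,2:end)) by log10(GSD) rather than the whole difference, unlike the bsxfun version; the pdist2 variant also passes diag(log10(GSD)) rather than diag(log10(GSD).^2) as the covariance. A minimal illustration of the precedence point, with toy numbers:
a = [10 20 30];  b = [1 2 3];  c = [2 2 2];
(a - b)./c   % [4.5  9  13.5]  -- the whole difference divided element-wise
a - b./c     % [9.5 19  28.5]  -- only b is divided, because ./ binds tighter than -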
Which one of the three gives the same answer as your own code? I would guess that the Mahalanobis one is wrong, but I don’t immediately see what is different about the other two.
– Cris Luengo
Nov 22 '18 at 16:55
Also, you do change the math when taking the logarithm of your data. I don’t see why you do this, and why you think you’re computing the Euclidean distance. I would not use log10 in your code.
– Cris Luengo
Nov 22 '18 at 16:58
@CrisLuengo G is some experimental data and GSD the standard deviation for each point in G. C contains predictions of G from a mathematical model, obtained by trying different parameter values. I want to find the parameter value whose row of C minimises the difference between C and G, while taking the standard deviations into account. So I thought I should take logs in case there are differences in order of magnitude. What do you think?
– HCAI
Nov 23 '18 at 8:31
Well, note that log(std(x))~=std(log(x)), so I don’t think your normalization is correct any more. If you want to compute the Euclidean distance of the logarithm of your data, that’s fine, but take the logarithm first, then normalize, or normalize first, then take the logarithm. You are doing neither. And typically, the normalization (whitening) should take care of orders-of-magnitude differences. Also note that log(G)-log(C) = log(G./C); think about what that means for your data!
– Cris Luengo
Nov 23 '18 at 19:48
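A small numerical sketch (editorial, toy numbers) of the two identities mentioned in this comment:
x = [10 100 1000 1e5];
[log(std(x)), std(log(x))]               % clearly different: log and std do not commute
G = [2 20 200];  C = [1 4 400];
max(abs((log(G)-log(C)) - log(G./C)))    % ~1e-16: log(G)-log(C) equals log(G./C)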
2 Answers
You can use pdist2(x,y) to calculate the pairwise distance between all elements in x and y, so your example would be something like
dG = pdist2(log10(G),log10(C(:,2:end)),'mahalanobis',diag(log10(GSD)).^2);
where the name-value pair 'mahalanobis',diag(log10(GSD)).^2 puts log10(GSD) as weights on the Euclidean distance, which is known as the Mahalanobis distance.
Note that the Mahalanobis distance was originally intended for normalising data, so it is the "covariance" that has to be passed as the fourth input; MATLAB then takes its Cholesky decomposition (the element-wise square root when the matrix is diagonal, as here).
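A quick check of this equivalence on a toy example (a sketch, not from the original answer; pdist2 requires the Statistics and Machine Learning Toolbox):
rng(0);                                          % reproducible toy data
x = rand(1,4);                                   % one reference row
Y = rand(5,4);                                   % five rows to compare against
w = 0.5 + rand(1,4);                             % positive per-column weights
d1 = pdist2(x, Y, 'mahalanobis', diag(w.^2));    % 1-by-5 Mahalanobis distances
d2 = sqrt(sum(((x - Y)./w).^2, 2))';             % weighted Euclidean norm (implicit expansion, R2016b+)
max(abs(d1 - d2))                                % agrees to round-off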
Implicit expansion
In newer MATLAB versions (R2016b and later), one can also just use implicit expansion, since the first operand is only a single row vector.
dG = sqrt(sum(((log10(G)-log10(C(:,2:9)))./log10(GSD)).^2,2));
which is probably a tad faster. I do, however, prefer the pdist2 solution as I find it clearer.
Thanks Nicky, here my vector has 1 element less than my array's rows. Would it be faster to create a new dummy variable Cdum=C(:,2:9);?
– HCAI
Nov 19 '18 at 10:36
No that does not matter. You will have to index into it anyway.
– Nicky Mattsson
Nov 19 '18 at 10:39
Sorry, I misunderstood. What do you mean by 'you will have to index into it anyway'?
– HCAI
Nov 19 '18 at 10:41
That is not the problem, C(:,2:9) should work; the variable C is 10 long in your example, so C(:,2:9)~=C(:,2:end). The problem is that I misinterpreted the use of GSD. Give me a second to fix it.
– Nicky Mattsson
Nov 19 '18 at 10:46
I made a mistake, size(C)=[m,9].
– HCAI
Nov 19 '18 at 10:50
Floating point should handle the large magnitude of the input data, up to a point with single (float) data and for any reasonable value with double data:
realmax('single')
ans =
3.4028e+38
realmax('double')
ans =
1.7977e+308
With 1e7 rows of values in the +/- 1e5 range, even the total sum of squares is only of order 1e5*1e5*1e7 = 1e17 (and each row-wise squared distance is far smaller), which both formats handle with ease.
In any case, you should vectorize the code to remove the loop, which MATLAB has a history of handling very inefficiently, especially in older versions.
With newer versions (R2016b and later), simply use:
tmp=(log10(G)-log10(C(:,2:end)))./log10(GSD);
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm
Note that you have to use ./, which is element-wise division, not /, which is matrix right division.
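A small illustration (editorial, toy numbers) of the difference on row vectors; this is also why the original loop, which used /, ends up taking the norm of a single scalar rather than of an 8-element difference vector:
a = [2 4 6];  b = [1 2 3];
a./b   % element-wise division: [2 2 2]
a/b    % matrix right division: solves x*b = a in the least-squares sense, giving the scalar 2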
The following code will work everywhere
tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD));
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm
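For a rough speed comparison of the loop and the two vectorized forms, here is a sketch using tic/toc on a smaller m (editorial; absolute timings depend on the machine and MATLAB version):
m   = 1e5;  G = 1e5*rand(1,8);  C = 1e5*[zeros(m,1), rand(m,8)];  GSD = 10*rand(1,8);

tic                                    % original loop, preallocated, with ./ for a like-for-like result
dG1 = zeros(m,1);
for i = 1:m
    dG1(i) = norm((log10(G)-log10(C(i,2:end)))./log10(GSD));
end
tLoop = toc;

tic                                    % bsxfun version (any MATLAB version)
tmp = bsxfun(@rdivide, bsxfun(@minus, log10(G), log10(C(:,2:end))), log10(GSD));
dG2 = sqrt(sum(tmp.^2,2));
tBsx = toc;

tic                                    % implicit expansion (R2016b and later)
dG3 = sqrt(sum(((log10(G)-log10(C(:,2:end)))./log10(GSD)).^2,2));
tImp = toc;

fprintf('loop %.3fs, bsxfun %.3fs, implicit %.3fs, max diff %.2g\n', ...
        tLoop, tBsx, tImp, max(abs(dG1-dG3)));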
However, I believe that the use of log10 is a mathematical error: the result dG will not be the Euclidean norm. You should stick with the norm of the weighted difference:
dG = sqrt(sum(bsxfun(@rdivide,bsxfun(@minus,G,C(:,2:end)),GSD).^2,2)); % all versions
dG = sqrt(sum(((G-C(:,2:end))./GSD).^2,2)); % R2016b and later (note the parentheses: ./ binds tighter than -)
Thank you very much Brice for these answers. Why do you say that the log10 is a mathematical error? I don't absolutely have to use the Euclidean distance, but it does need to be something that takes the values in GSD into consideration. GSD are actually the standard deviations of each point in G. Each row of C is a prediction of G.
– HCAI
Nov 19 '18 at 12:25
I'm just looking at these again. They don't give the same distance (see pic above). Any thoughts about what's going on?
– HCAI
Nov 22 '18 at 14:37