Matlab: Euclidean norm (or difference) between two vectors












I'd like to calculate the Euclidean distance between a vector G and each row of an array C, dividing each element of the difference by the corresponding value in a vector GSD. What I've done seems very inefficient. What's my biggest overhead, and could I speed it up?



m=1E7;
G=1E5*rand(1,8);
C=1E5*[zeros(m,1),rand(m,8)];
GSD=10*rand(1,8);

%I've taken the log10 of the values because G and C are very large in magnitude.
%Don't know if it's worth it.

for i=1:m
    dG(i,1)=norm((log10(G)-log10(C(i,2:end)))/log10(GSD));
end


The examples from the answers below don't all give the same result. In fact, no two of them agree (see the following figure), using:



dG = pdist2(log10(G),log10(C(:,2:end)),'mahalanobis',diag(log10(GSD))); %(1)

dG = sqrt(sum((log10(G)-log10(C(:,2:end))./log10(GSD)).^2,2));

tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD)); %(4)
dG = sqrt(sum(tmp.^2,2));


[Figure: comparison of the dG values from the expressions above; image not available]







matlab






asked Nov 19 '18 at 10:25 by HCAI, edited Nov 22 '18 at 14:25













  • Which one of the three gives the same answer as your own code? I would guess that the Mahalanobis one is wrong, but I don’t immediately see what is different about the other two.

    – Cris Luengo
    Nov 22 '18 at 16:55











  • Also, you do change the math when taking the logarithm of your data. I don’t see why you do this, and why you think you’re computing the Euclidean distance. I would not use log10 in your code.

    – Cris Luengo
    Nov 22 '18 at 16:58











  • @CrisLuengo G is some experimental data and GSD is the standard deviation for each point in G. The rows of C are predictions of G from a mathematical model, obtained by trying different parameter values. I want to find the parameter value whose row in C minimises the difference between C and G, taking into account the effect of the standard deviations. So I thought I should take logs in case there are differences in orders of magnitude. What do you think?

    – HCAI
    Nov 23 '18 at 8:31











  • Well, note that log(std(x))~=std(log(x)), so I don’t think your normalization is correct any more. If you want to compute the Euclidean distance of the logarithm of your data, that’s fine, but take the logarithm first, then normalize, or normalize first, then take the logarithm. You are doing neither. And typically, the normalization (whitening) should take care of orders of magnitude difference. Also note that log(G)-log(C) = log(G./C), think about what that means for your data!

    – Cris Luengo
    Nov 23 '18 at 19:48
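
The two identities in the comment above can be checked numerically. This is only a minimal sketch on made-up values: x, G1 and C1 are hypothetical and do not come from the question.

```matlab
% Hypothetical sample spanning two orders of magnitude
x = [1 10 100];

sd_of_log = std(log10(x));   % std of [0 1 2]
log_of_sd = log10(std(x));   % log10 of the std of the raw values

% Hypothetical measurement and prediction
G1 = 2e5;
C1 = 5e4;
lhs = log10(G1) - log10(C1);
rhs = log10(G1 / C1);        % the log-difference is the log of the ratio

fprintf('std(log10(x)) = %.4f, log10(std(x)) = %.4f\n', sd_of_log, log_of_sd);
fprintf('log10(G1)-log10(C1) = %.4f, log10(G1/C1) = %.4f\n', lhs, rhs);
```

The first pair of numbers differs, so dividing a log-difference by log10(GSD) does not normalise by the spread of the log-data. The second pair agrees, which shows that the log-difference measures the ratio G./C rather than the difference G-C.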





















2 Answers
You can use pdist2(x,y) to calculate the pairwise distance between all rows of x and y, so your example would be something like



dG = pdist2(log10(G),log10(C(:,2:end)),'mahalanobis',diag(log10(GSD)).^2);


where the argument pair 'mahalanobis',diag(log10(GSD)).^2 applies log10(GSD) as weights on the Euclidean distance, which is known as the Mahalanobis distance.



Note that the Mahalanobis distance was originally intended for normalising data, so it is the "covariance" that has to be passed as the fourth input. MATLAB then computes its Cholesky decomposition (which is the element-wise square root when the matrix is diagonal, as here).
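
To see why a diagonal covariance reduces to per-element weighting, here is a minimal sketch that avoids pdist2 and writes the Mahalanobis distance out from its definition. The data x, Y and w are made up for illustration and are not from the question.

```matlab
% Hypothetical small example: one reference row, three candidate rows
x = [1 2 3];
Y = [2 2 5; 0 4 3; 1 2 3];
w = [0.5 2 4];               % per-column weights (standard deviations)

S = diag(w.^2);              % diagonal "covariance" matrix

% Mahalanobis distance written out from its definition, row by row
d_maha = zeros(size(Y,1),1);
for k = 1:size(Y,1)
    delta = x - Y(k,:);
    d_maha(k) = sqrt(delta / S * delta');   % delta * inv(S) * delta'
end

% Weighted Euclidean distance (implicit expansion, R2016b and later)
d_weighted = sqrt(sum(((x - Y) ./ w).^2, 2));

disp([d_maha, d_weighted]);  % the two columns should agree
```

With a diagonal S, inv(S) simply divides each squared difference by w.^2, which is the same as dividing the difference by w before squaring.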



Implicit expansion



In newer MATLAB releases (R2016b and later), one can also just use implicit expansion, since the first input is a single row vector.



dG = sqrt(sum(((log10(G)-log10(C(:,2:9)))./log10(GSD)).^2,2));


which is probably a tad faster. I do, however, prefer the pdist2 solution, as I find it clearer.







answered Nov 19 '18 at 10:31 by Nicky Mattsson, edited Nov 26 '18 at 7:19













  • Thanks Nicky, here my vector has one element fewer than the rows of my array. Would it be faster to create a new dummy variable Cdum=C(:,2:9);?

    – HCAI
    Nov 19 '18 at 10:36











  • No, that does not matter. You will have to index into it anyway.

    – Nicky Mattsson
    Nov 19 '18 at 10:39











  • Sorry, I misunderstand. What do you mean 'you will have to index into it anyway'?

    – HCAI
    Nov 19 '18 at 10:41













  • That is not the problem; C(:,2:9) should work. The variable C has 10 columns in your example, so C(:,2:9)~=C(:,2:end). The problem is that I misinterpreted the use of GSD. Give me a second to fix it.

    – Nicky Mattsson
    Nov 19 '18 at 10:46











  • I made a mistake, size(C)=[m,9].

    – HCAI
    Nov 19 '18 at 10:50



















Floating point should handle the large magnitude of the input data: up to a certain point with single-precision data, and for any reasonable value with double-precision data.



realmax('single')
ans =
3.4028e+38

realmax('double')
ans =
1.7977e+308


With 1e7 values of magnitude up to 1e5, the square of the Euclidean distance is at most of order 1e17 (exponents 5+5+7), which both formats will handle with ease.



In any case, you should vectorize the code to remove the loop (which MATLAB has a history of handling very inefficiently, especially in older versions).



With newer versions (R2016b and later), simply use:



tmp=(log10(G)-log10(C(:,2:end)))./log10(GSD);
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm


Note that you have to use ./, which is element-wise division, not /, which is matrix right division.
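
The difference between the two operators is easy to see on a pair of row vectors. A minimal sketch, where a and b are hypothetical stand-ins for the difference vector and the weights:

```matlab
% Hypothetical row vectors standing in for the difference and the weights
a = [1 2 3];
b = [2 2 2];

q_matrix = a / b;    % matrix right division: least-squares x with x*b ~ a, a scalar here
q_elem   = a ./ b;   % element-wise division: [0.5 1 1.5]

n_matrix = norm(q_matrix);   % norm of a scalar is its absolute value
n_elem   = norm(q_elem);     % Euclidean norm of the weighted vector

fprintf('a/b  -> size %s, norm %.4f\n', mat2str(size(q_matrix)), n_matrix);
fprintf('a./b -> size %s, norm %.4f\n', mat2str(size(q_elem)),   n_elem);
```

For two row vectors of equal length, / collapses everything into a single number, so taking norm of the result no longer measures a per-element weighted distance.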



The following code works in all versions:



tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD));
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm


However, I believe that the use of log10 is a mathematical error: the result dG will not be the Euclidean norm. You should stick with the root sum of squares of the weighted difference:



dG = sqrt(sum(bsxfun(@rdivide,bsxfun(@minus,G,C(:,2:end)),GSD).^2,2)); % all versions
dG = sqrt(sum(((G-C(:,2:end))./GSD).^2,2)); % R2016b and later
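
As a sanity check, the vectorized form (whole difference divided element-wise by GSD) can be compared with an explicit loop on a small problem. This is a sketch under assumptions: m is reduced to 1000, and GSD is shifted away from zero so that no weight is tiny. Neither choice comes from the question.

```matlab
% Small hypothetical data set with the same layout as in the question
m   = 1000;
G   = 1e5 * rand(1,8);
C   = 1e5 * [zeros(m,1), rand(m,8)];
GSD = 1 + 9 * rand(1,8);          % kept away from zero for this sketch

% Loop version with a preallocated result and element-wise division
dG_loop = zeros(m,1);
for i = 1:m
    dG_loop(i) = norm((G - C(i,2:end)) ./ GSD);
end

% Vectorized version using implicit expansion (R2016b and later)
dG_vec = sqrt(sum(((G - C(:,2:end)) ./ GSD).^2, 2));

% Relative difference between the two results
rel_err = max(abs(dG_loop - dG_vec) ./ dG_vec);
fprintf('largest relative difference: %g\n', rel_err);
```

Preallocating dG_loop avoids growing the array on every iteration, and the vectorized line replaces the loop entirely; the two should agree to rounding error.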





  • Thank you very much Brice for these answers. Why do you say that the log10 is a mathematical error? I don't absolutely have to use the Euclidean distance, but it does need to be something that takes the values in GSD into consideration. GSD actually holds the standard deviation of each point in G. Each row of C is a prediction of G.

    – HCAI
    Nov 19 '18 at 12:25











  • I'm just looking at these again. They don't give the same distance (see pic above). Any thoughts about what's going on?

    – HCAI
    Nov 22 '18 at 14:37
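
One possible reason for the disagreement, judging only from the expressions as posted in the question: in the second expression the closing parenthesis sits after the division, so only the C term is divided by the weights. A minimal sketch on made-up values, where a, B and w are hypothetical stand-ins for log10(G), log10(C(:,2:end)) and log10(GSD):

```matlab
a = [4 5 6];                 % stands in for log10(G)
B = [1 2 3; 4 5 6];          % stands in for log10(C(:,2:end))
w = [2 4 8];                 % stands in for log10(GSD)

% Parenthesised as in the question: only B is divided by w
d_question = sqrt(sum((a - B ./ w).^2, 2));

% Parenthesised so that the whole difference is divided by w
d_intended = sqrt(sum(((a - B) ./ w).^2, 2));

disp([d_question, d_intended]);   % the columns differ
```

The second row of B equals a, so the intended distance there is zero while the expression as posted is not. The pdist2 call in the question also passes diag(log10(GSD)) rather than its square, which would change that result as well.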
















1














The floating point should handle the large magnitude of the input data, up to a certain point with float data and with any reasonable value with double data



realmax('single')
ans =
3.4028e+38

realmax('double')
ans =
1.7977e+308


With 1e7 values in the +/- 1e5 range, you may expect the square of the Euclidian distance to be in the +/- 1e17 range (5+5+7), which both formats will handle with ease.



In any case, you should vectorize the code to remove the loop (which Matlab has a history of handling very inefficiently, especially in older versions)



With new versions (2016b and later), simply use:



tmp=(log10(G)-log10(C(:,2:end)))./log10(GSD);
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm


Note that you have to use ./ which is a element-wise division, not / which is matrix right division.



The following code will work everywhere



tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD));
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm


I however believe that the use of log10 is a mathematical error. The result dG will not be the Euclidian norm. You should stick with the root mean square of the weighted difference:



dG = sqrt(sum(bsxfun(@rdivide,bsxfun(@minus,G,C(:,2:end)),GSD).^2,2)); % all versions
dG = sqrt(sum((G-C(:,2:end)./GSD).^2,2)); %R2016b and later





share|improve this answer


























  • Thank you very much Brice for these answers. Why do you say that the log10 is a mathematical error? I don't absolutely have to use the Euclidean distance, but it does need to be something that takes into consideration the values GSD. GSD actually at standard deviations of each point in G. Each row of C are predictions of G.

    – HCAI
    Nov 19 '18 at 12:25











  • I'm just looking at these again. They don't give the same distance (see pic above). Any thoughts about what's going on?

    – HCAI
    Nov 22 '18 at 14:37














1












1








1















edited Nov 19 '18 at 11:09

























answered Nov 19 '18 at 11:04









Brice































