Matlab: Euclidean norm (or difference) between two vectors
I'd like to calculate the Euclidean distance between a vector G and each row of an array C, dividing each difference element-wise by the corresponding entry of a vector GSD. What I've done seems very inefficient. What's my biggest overhead?
Could I speed it up?
m=1E7;
G=1E5*rand(1,8);
C=1E5*[zeros(m,1),rand(m,8)];
GSD=10*rand(1,8);
%I've taken the log10 of the values because G and C are very large in magnitude.
%Don't know if it's worth it.
for i=1:m
dG(i,1)=norm((log10(G)-log10(C(i,2:end)))/log10(GSD));
end
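As an aside (not in the original post), here is a sketch of the same loop with dG preallocated and the loop-invariant log10 terms computed once, which is typically where most of the loop's overhead comes from:
% Sketch: preallocate dG and hoist the constant logs out of the loop
% (uses ./ on the assumption element-wise division is intended; see the answers below).
dG   = zeros(m,1);         % preallocating avoids regrowing dG on every iteration
logG = log10(G);
logS = log10(GSD);
logC = log10(C(:,2:end));  % one pass over C instead of m passes
for i = 1:m
    dG(i) = norm((logG - logC(i,:))./logS);
end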
Using the examples from the answers below, they don't all give the same answer; in fact none of them agree (see the comparison figure, produced with the following variants):
dG = pdist2(log10(G),log10(C(:,2:end)),'mahalanobis',diag(log10(GSD))); %(1)
dG = sqrt(sum((log10(G)-log10(C(:,2:end))./log10(GSD)).^2,2));
tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD)); %(4)
dG = sqrt(sum(tmp.^2,2));
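As an aside (not part of the original question): one reason the variants can disagree is operator precedence. In MATLAB ./ binds tighter than -, so the second variant divides only log10(C(:,2:end)) by log10(GSD) rather than the whole difference, unlike the bsxfun version; the pdist2 variant also passes diag(log10(GSD)) rather than diag(log10(GSD).^2) as the covariance. A minimal illustration of the precedence point, with toy numbers:
a = [10 20 30];  b = [1 2 3];  c = [2 2 2];
(a - b)./c   % [4.5  9  13.5]  -- the whole difference divided element-wise
a - b./c     % [9.5 19  28.5]  -- only b is divided, because ./ binds tighter than -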
Which one of the three gives the same answer as your own code? I would guess that the Mahalanobis one is wrong, but I don’t immediately see what is different about the other two.
– Cris Luengo
Nov 22 '18 at 16:55
Also, you do change the math when taking the logarithm of your data. I don’t see why you do this, and why you think you’re computing the Euclidean distance. I would not use log10 in your code.
– Cris Luengo
Nov 22 '18 at 16:58
@CrisLuengo G is some experimental data and GSD the standard deviation for each point in G. C contains predictions of G from a mathematical model, obtained by trying different parameter values. I want to find the parameter value whose row of C minimises the difference between C and G, while taking the standard deviations into account. So I thought I should take logs in case there are differences in order of magnitude. What do you think?
– HCAI
Nov 23 '18 at 8:31
Well, note that log(std(x))~=std(log(x)), so I don’t think your normalization is correct any more. If you want to compute the Euclidean distance of the logarithm of your data, that’s fine, but take the logarithm first, then normalize, or normalize first, then take the logarithm. You are doing neither. And typically, the normalization (whitening) should take care of orders-of-magnitude differences. Also note that log(G)-log(C) = log(G./C); think about what that means for your data!
– Cris Luengo
Nov 23 '18 at 19:48
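A small numerical sketch (editorial, toy numbers) of the two identities mentioned in this comment:
x = [10 100 1000 1e5];
[log(std(x)), std(log(x))]               % clearly different: log and std do not commute
G = [2 20 200];  C = [1 4 400];
max(abs((log(G)-log(C)) - log(G./C)))    % ~1e-16: log(G)-log(C) equals log(G./C)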
2 Answers
You can use pdist2(x,y) to calculate the pairwise distance between all elements in x and y, so your example would be something like
dG = pdist2(log10(G),log10(C(:,2:end)),'mahalanobis',diag(log10(GSD)).^2);
where the name-value pair 'mahalanobis',diag(log10(GSD)).^2 puts log10(GSD) as weights on the Euclidean distance, which is known as the Mahalanobis distance.
Note that the Mahalanobis distance was originally intended for normalising data, so it is the "covariance" that has to be passed as the fourth input; MATLAB then takes its Cholesky decomposition (the element-wise square root when the matrix is diagonal, as here).
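A quick check of this equivalence on a toy example (a sketch, not from the original answer; pdist2 requires the Statistics and Machine Learning Toolbox):
rng(0);                                          % reproducible toy data
x = rand(1,4);                                   % one reference row
Y = rand(5,4);                                   % five rows to compare against
w = 0.5 + rand(1,4);                             % positive per-column weights
d1 = pdist2(x, Y, 'mahalanobis', diag(w.^2));    % 1-by-5 Mahalanobis distances
d2 = sqrt(sum(((x - Y)./w).^2, 2))';             % weighted Euclidean norm (implicit expansion, R2016b+)
max(abs(d1 - d2))                                % agrees to round-off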
Implicit expansion
In newer MATLAB versions (R2016b and later), one can also just use implicit expansion, since the first operand is only a single row vector.
dG = sqrt(sum(((log10(G)-log10(C(:,2:9)))./log10(GSD)).^2,2));
which is probably a tad faster. I do, however, prefer the pdist2 solution as I find it clearer.
Thanks Nicky, here my vector has 1 element less than my array's rows. Would it be faster to create a new dummy variable Cdum=C(:,2:9);?
– HCAI
Nov 19 '18 at 10:36
No that does not matter. You will have to index into it anyway.
– Nicky Mattsson
Nov 19 '18 at 10:39
Sorry, I misunderstood. What do you mean by 'you will have to index into it anyway'?
– HCAI
Nov 19 '18 at 10:41
That is not the problem, C(:,2:9) should work; the variable C is 10 long in your example, so C(:,2:9)~=C(:,2:end). The problem is that I misinterpreted the use of GSD. Give me a second to fix it.
– Nicky Mattsson
Nov 19 '18 at 10:46
I made a mistake, size(C)=[m,9].
– HCAI
Nov 19 '18 at 10:50
Floating point should handle the large magnitude of the input data, up to a point with single (float) data and for any reasonable value with double data:
realmax('single')
ans =
3.4028e+38
realmax('double')
ans =
1.7977e+308
With 1e7 rows of values in the +/- 1e5 range, even the total sum of squares is only of order 1e5*1e5*1e7 = 1e17 (and each row-wise squared distance is far smaller), which both formats handle with ease.
In any case, you should vectorize the code to remove the loop, which MATLAB has a history of handling very inefficiently, especially in older versions.
With newer versions (R2016b and later), simply use:
tmp=(log10(G)-log10(C(:,2:end)))./log10(GSD);
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm
Note that you have to use ./, which is element-wise division, not /, which is matrix right division.
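A small illustration (editorial, toy numbers) of the difference on row vectors; this is also why the original loop, which used /, ends up taking the norm of a single scalar rather than of an 8-element difference vector:
a = [2 4 6];  b = [1 2 3];
a./b   % element-wise division: [2 2 2]
a/b    % matrix right division: solves x*b = a in the least-squares sense, giving the scalar 2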
The following code will work everywhere
tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD));
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm
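For a rough speed comparison of the loop and the two vectorized forms, here is a sketch using tic/toc on a smaller m (editorial; absolute timings depend on the machine and MATLAB version):
m   = 1e5;  G = 1e5*rand(1,8);  C = 1e5*[zeros(m,1), rand(m,8)];  GSD = 10*rand(1,8);

tic                                    % original loop, preallocated, with ./ for a like-for-like result
dG1 = zeros(m,1);
for i = 1:m
    dG1(i) = norm((log10(G)-log10(C(i,2:end)))./log10(GSD));
end
tLoop = toc;

tic                                    % bsxfun version (any MATLAB version)
tmp = bsxfun(@rdivide, bsxfun(@minus, log10(G), log10(C(:,2:end))), log10(GSD));
dG2 = sqrt(sum(tmp.^2,2));
tBsx = toc;

tic                                    % implicit expansion (R2016b and later)
dG3 = sqrt(sum(((log10(G)-log10(C(:,2:end)))./log10(GSD)).^2,2));
tImp = toc;

fprintf('loop %.3fs, bsxfun %.3fs, implicit %.3fs, max diff %.2g\n', ...
        tLoop, tBsx, tImp, max(abs(dG1-dG3)));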
However, I believe that the use of log10 is a mathematical error: the result dG will not be the Euclidean norm. You should stick with the norm of the weighted difference:
dG = sqrt(sum(bsxfun(@rdivide,bsxfun(@minus,G,C(:,2:end)),GSD).^2,2)); % all versions
dG = sqrt(sum(((G-C(:,2:end))./GSD).^2,2)); % R2016b and later (note the parentheses: ./ binds tighter than -)
Thank you very much Brice for these answers. Why do you say that the log10 is a mathematical error? I don't absolutely have to use the Euclidean distance, but it does need to be something that takes the values in GSD into consideration. GSD are actually the standard deviations of each point in G. Each row of C is a prediction of G.
– HCAI
Nov 19 '18 at 12:25
I'm just looking at these again. They don't give the same distance (see pic above). Any thoughts about what's going on?
– HCAI
Nov 22 '18 at 14:37