Further optimization of simple cython code
I have a function written in cython that computes a certain measure of correlation (distance correlation) via a double for loop:
%%cython -a
import numpy as np

def distances_Matrix(X):
    return np.array([[np.linalg.norm(xi - xj) for xi in X] for xj in X])

def c_dCov(double[:, :] a, double[:, :] b, int n):
    cdef int i
    cdef int j
    cdef double U = 0
    cdef double W1 = n/(n-1)
    cdef double W2 = 2/(n-2)
    cdef double[:] a_M = np.mean(a, axis=1)
    cdef double a_ = np.mean(a)
    cdef double[:] b_M = np.mean(b, axis=1)
    cdef double b_ = np.mean(b)
    for i in range(n):
        for j in range(n):
            if i != j:
                U = U + (a[i][j] + W1*(-a_M[i]-a_M[j]+a_)) * (b[i][j] + W1*(-b_M[i]-b_M[j]+b_))
            else:
                U = U - W2*(W1**2)*(a_M[i] - a_) * (b_M[i] - b_)
    return U/(n*(n-3))

def c_dCor(X, Y):
    n = len(X)
    a = distances_Matrix(X)
    b = distances_Matrix(Y)
    V_XX = c_dCov(a, a, n)
    V_YY = c_dCov(b, b, n)
    V_XY = c_dCov(a, b, n)
    return V_XY/np.sqrt(V_XX*V_YY)
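(As a usage aside, not part of the original question: the function might be exercised as below. The shapes and the random data are arbitrary choices for illustration; in a notebook, the %%cython cell above makes c_dCor available to later cells.)

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 3))   # 100 observations, 3 features (arbitrary)
    Y = rng.standard_normal((100, 3))

    # Distance correlation estimate: near 0 for independent X and Y,
    # larger for dependent ones (the bias-corrected statistic can dip
    # slightly below 0 on independent data).
    print(c_dCor(X, Y))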
When I compile this fragment of code, the compiler's annotated optimization report (screenshot not reproduced here) shows line 23 as still quite yellow, which indicates significant Python interaction. How can I optimize that line further?
The operations on that line are quite trivial, just products and sums. Since I specified the types of every array and variable used in that function, why do I get such bad performance on that line?
Thanks in advance.
python numpy optimization cython
I was able to optimize line 23 by adding the @cython.boundscheck(False) decorator to c_dCov. That alone was enough to turn yellow to white. Though @cython.boundscheck(False) isn't in your code snippet, it does look like it's in the optimization report. Did you already try using boundscheck, and did it not work? If so, you might just need to upgrade your Cython package. What version are you currently using?
– tel, Nov 13 at 4:43
Changing a[i][j] to a[i,j] may also be useful.
– DavidW, Nov 13 at 7:14
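(Concretely, DavidW's suggestion replaces chained indexing with tuple indexing in the hot line; a sketch of just that line:)

    # a[i][j] first builds an intermediate 1-D slice and then indexes it;
    # a[i, j] indexes the typed 2-D memoryview directly in one C-level step.
    U = U + (a[i, j] + W1*(-a_M[i] - a_M[j] + a_)) * (b[i, j] + W1*(-b_M[i] - b_M[j] + b_))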
The most obvious thing to optimize is the distances_Matrix function, e.g. stackoverflow.com/q/50675705/4045774 (don't forget to add a sqrt to euclidean_distance_square_einsum).
– max9111, Nov 13 at 10:03
@max9111 I will now attempt to optimize distances_Matrix. Do you think doing it all with pure Python loops in order to cythonize the function is a good approach? Thanks in advance for your insight.
– Ettore Majorana, Nov 13 at 15:37
At least for somewhat larger problems there isn't much optimization potential, apart from a fast BLAS backend (e.g. Intel MKL). You can simply take the euclidean_distance_square_einsum function and add an np.sqrt.
– max9111, Nov 13 at 15:59
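(To make max9111's suggestion concrete, here is a minimal vectorized sketch of distances_Matrix based on the einsum approach from the linked question, with the np.sqrt added. The reshape to 2-D and the clip against round-off negatives are my own additions, not part of the linked answer:)

    import numpy as np

    def distances_matrix_fast(X):
        # Pairwise Euclidean distances via
        # ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 * xi.xj
        X = np.asarray(X, dtype=np.float64).reshape(len(X), -1)
        sq = np.einsum('ij,ij->i', X, X)               # squared norm of each row
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        np.clip(d2, 0.0, None, out=d2)                 # round-off can go slightly negative
        return np.sqrt(d2)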
1 Answer
Short answer: disable bounds checking in the c_dCov function by adding the following decorator on the line right before it:
cimport cython
@cython.boundscheck(False) # Deactivate bounds checking
def c_dCov(double[:, :] a, double[:, :] b, int n):
Alternatively, you can add a compiler directive to the top of your code. Right after your Cython magic line you would put:
%%cython -a
#cython: language_level=3, boundscheck=False
If you had a setup.py file, you could also turn bounds checking off globally there:
from distutils.core import setup
from Cython.Build import cythonize

setup(
    name="foo",
    ext_modules=cythonize('foo.pyx', compiler_directives={'boundscheck': False}),
)
Regardless of how it was done, disabling bounds checks was by itself enough to get a clean optimization report (screenshot not reproduced here), with the formerly yellow line turned white.
Some other optimizations suggested by the Cython docs are turning off indexing with negative numbers, and declaring that your arrays are guaranteed to have a contiguous layout in memory. With all of those optimizations, the signature of c_dCov would become:
cimport cython
@cython.boundscheck(False) # Deactivate bounds checking
@cython.wraparound(False) # Deactivate negative indexing.
def c_dCov(double[:, ::1] a, double[:, ::1] b, int n):
but only @cython.boundscheck(False) was needed to get the better optimization report.
Now that I look closer though, even though you don't have those optimizations in your code snippet, you do have the boundscheck(False) and wraparound(False) decorators in the code in your optimization report. Did you already try those and they didn't work? What version of Cython are you running? Maybe you need an upgrade.
Explanation
Every time you access an array by index, a bounds check occurs: when you have an array arr of shape (5, 5) and you try to access arr[19, 27], your program spits out an error instead of letting you read out-of-bounds data. For the sake of speed, however, some languages don't bounds-check array accesses (e.g. C/C++). Cython lets you optionally turn bounds checks off to gain performance, either globally for a whole program with the boundscheck compiler directive, or for a single function with the @cython.boundscheck(False) decorator.
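(The directive can also be scoped to just a block with a with statement, keeping checks active elsewhere in the function; a minimal sketch using a made-up sum1d function, not taken from the answer:)

    cimport cython

    def sum1d(double[:] arr):
        cdef Py_ssize_t i
        cdef double total = 0.0
        # Only the loop below runs without bounds/wraparound checks;
        # indexing elsewhere in the function stays checked.
        with cython.boundscheck(False), cython.wraparound(False):
            for i in range(arr.shape[0]):
                total += arr[i]
        return total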
Thanks a lot! Indeed I had those decorators, but they didn't seem to have any effect on the optimization, so I removed them in the end. I updated my Cython version and now the report looks like yours.
– Ettore Majorana, Nov 13 at 15:31
@Ettore Majorana Declaring contiguous arrays, as in c_dCov(double[:, ::1] a, ...), or disabling bounds checking can have a quite significant effect if it enables SIMD vectorization; this also depends heavily on the compiler flags. Nevertheless, building the distance matrix should be more time-consuming than the simple loop you wanted to optimize in the first place.
– max9111, Nov 13 at 19:19