Further optimization of simple cython code
I have a function written in cython that computes a certain measure of correlation (distance correlation) via a double for loop:



%%cython -a
import numpy as np

def distances_Matrix(X):
    return np.array([[np.linalg.norm(xi - xj) for xi in X] for xj in X])

def c_dCov(double[:, :] a, double[:, :] b, int n):
    cdef int i
    cdef int j
    cdef double U = 0
    cdef double W1 = n/(n-1)
    cdef double W2 = 2/(n-2)
    cdef double[:] a_M = np.mean(a, axis=1)
    cdef double a_ = np.mean(a)
    cdef double[:] b_M = np.mean(b, axis=1)
    cdef double b_ = np.mean(b)

    for i in range(n):
        for j in range(n):
            if i != j:
                U = U + (a[i][j] + W1*(-a_M[i]-a_M[j]+a_)) * (b[i][j] + W1*(-b_M[i]-b_M[j]+b_))
            else:
                U = U - W2*(W1**2)*(a_M[i] - a_) * (b_M[i] - b_)
    return U/(n*(n-3))

def c_dCor(X, Y):
    n = len(X)
    a = distances_Matrix(X)
    b = distances_Matrix(Y)
    V_XX = c_dCov(a, a, n)
    V_YY = c_dCov(b, b, n)
    V_XY = c_dCov(a, b, n)
    return V_XY/np.sqrt(V_XX*V_YY)


When I compile this fragment of code, I get the following optimization report from the compiler:



[screenshot: Cython annotation report]



Line 23 is still quite yellow, which indicates significant Python interaction. How can I get that line further optimized?



The operations done on that line are quite trivial, just products and sums. Since I did specify the types of every array and variable used in that function, why do I get such bad performance on that line?



Thanks in advance.
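For reference, the double loop in c_dCov can be cross-checked against a vectorized pure-NumPy version. The sketch below (the name dcov_numpy is hypothetical, added here only for verification) computes the same U without an explicit loop:

    import numpy as np

    def dcov_numpy(a, b, n):
        # Vectorized equivalent of the double loop in c_dCov (sketch).
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        W1 = n / (n - 1)
        W2 = 2 / (n - 2)
        a_M = a.mean(axis=1); a_ = a.mean()
        b_M = b.mean(axis=1); b_ = b.mean()
        A = a + W1 * (-a_M[:, None] - a_M[None, :] + a_)
        B = b + W1 * (-b_M[:, None] - b_M[None, :] + b_)
        # off-diagonal terms: full elementwise sum minus the diagonal of A*B
        U = (A * B).sum() - np.sum(np.diag(A) * np.diag(B))
        # diagonal correction from the else-branch of the loop
        U -= np.sum(W2 * W1**2 * (a_M - a_) * (b_M - b_))
        return U / (n * (n - 3))

Comparing its output against the loop on random input is an easy way to confirm any optimized rewrite stays numerically faithful.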
  • I was able to optimize line 23 by adding the @cython.boundscheck(False) decorator to c_dCov . That alone was enough to turn yellow to white. Though @cython.boundscheck(False) isn't in your code snippet, it does look like it's in the optimization report. Did you already try using boundscheck, and did it not work? If so, you might just need to upgrade your Cython package. What version are you currently using?
    – tel
    Nov 13 at 4:43
  • a[i][j] to a[i,j] may also be useful
    – DavidW
    Nov 13 at 7:14
  • The most obvious thing to optimize is the distances_Matrix function. eg. stackoverflow.com/q/50675705/4045774 (Don't forget to add a sqrt to euclidean_distance_square_einsum).
    – max9111
    Nov 13 at 10:03
  • @max9111 I will now attempt to optimize distances_Matrix. Do you think doing it all with pure python loops in order to cythonize the function is a good approach? Thanks in advance for your insight.
    – Ettore Majorana
    Nov 13 at 15:37
  • At least for a bit larger problems, there isn't so much optimization potential, apart from a fast BLAS backend (eg. Intel MKL). You can simply take the euclidean_distance_square_einsum function and add a np.sqrt .
    – max9111
    Nov 13 at 15:59
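The einsum-based approach max9111 links to can be sketched as follows. This is my own sketch under stated assumptions: the function name distances_matrix_fast is hypothetical, and it assumes X is a 2-D array of row vectors. It uses the identity |xi - xj|^2 = |xi|^2 + |xj|^2 - 2*xi.xj, then takes a single square root:

    import numpy as np

    def distances_matrix_fast(X):
        # Pairwise Euclidean distances without a Python-level double loop.
        X = np.asarray(X, dtype=np.float64)
        sq = np.einsum('ij,ij->i', X, X)            # row-wise squared norms
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        np.maximum(d2, 0.0, out=d2)                 # clamp tiny negatives from rounding
        return np.sqrt(d2)

The result should match the list-comprehension distances_Matrix up to floating-point rounding, while delegating the heavy lifting to BLAS via the matrix product.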
python numpy optimization cython
edited Nov 13 at 1:43
asked Nov 13 at 1:12
Ettore Majorana
1 Answer
Short answer: disable bounds checking in the c_dCov function by adding the following decorator on the line right before it:



cimport cython
@cython.boundscheck(False) # Deactivate bounds checking
def c_dCov(double[:, :] a, double[:, :] b, int n):


Alternatively, you can add a compiler directive to the top of your code. Right after your Cython magic line you would put:



%%cython -a
#cython: language_level=3, boundscheck=False


If you had a setup.py file, you could also globally turn bounds checking off there:



from distutils.core import setup
from Cython.Build import cythonize

setup(
    name="foo",
    ext_modules=cythonize('foo.pyx', compiler_directives={'boundscheck': False}),
)


Regardless of how it was done, disabling bounds checks was by itself enough to get the following optimization report:



[screenshot: Cython annotation report with line 23 now white]



Some other optimizations suggested by the Cython docs are turning off indexing with negative numbers, and declaring that your arrays are guaranteed to have a contiguous layout in memory. With all of those optimizations, the signature of c_dCov would become:



cimport cython
@cython.boundscheck(False) # Deactivate bounds checking
@cython.wraparound(False) # Deactivate negative indexing.
def c_dCov(double[:, ::1] a, double[:, ::1] b, int n):


but only @cython.boundscheck(False) was needed to get the better optimization report.



Now that I look closer though, even though you don't have those optimizations in your code snippet, you do have the boundscheck(False) and wraparound(False) decorators in the code in your optimization report. Did you already try those and they didn't work? What version of Cython are you running? Maybe you need an upgrade.



Explanation



Every time you access an array by index, a bounds check occurs. This is so that when you have an array arr of shape (5,5) and you try to access arr[19,27], your program will raise an error instead of letting you read out-of-bounds data. However, for the sake of speed, some languages don't do bounds checks on array access (e.g. C/C++). Cython lets you optionally turn bounds checks off in order to optimize performance: either globally for a whole program with the boundscheck compiler directive, or for a single function with the @cython.boundscheck(False) decorator.
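As a minimal illustration of that guard (plain NumPy here, outside Cython, since the checked behavior is the same):

    import numpy as np

    arr = np.zeros((5, 5))
    try:
        arr[19, 27]  # out of bounds: the index is checked and an error is raised
    except IndexError:
        print("IndexError caught")
    # With @cython.boundscheck(False), the generated C code skips this check
    # and would instead read whatever memory sits at that offset.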
  • Thanks a lot! Indeed I had those decorators but they didn't seem to have any effect on the optimization, so I removed them in the end. I updated my cython version and now the report looks like yours.
    – Ettore Majorana
    Nov 13 at 15:31
  • @Ettore Majorana Declaring contiguous arrays like c_dCov(double[:, ::1] ...) or disabling bounds checking can have a quite significant effect if you get working SIMD vectorization out of it. This also depends heavily on the compiler flags. Nevertheless, the creation of the distances matrix should be more time-consuming than the simple loop you wanted to optimize in the first place.
    – max9111
    Nov 13 at 19:19











edited Nov 13 at 7:05
answered Nov 13 at 4:38
tel