Why are Python operations 30× slower after calling time.sleep or subprocess.Popen?























Consider the following loop:



import subprocess
import time

for i in range(20):
    if i == 10:
        subprocess.Popen(["echo"])  # command 1
    t_start = time.time()
    1 + 1  # command 2
    t_stop = time.time()
    print(t_stop - t_start)


“Command 2” systematically takes longer to run when “command 1” is run before it. The following plot shows the execution time of 1+1 as a function of the loop index i, averaged over 100 runs.



[Plot: execution time of 1+1 as a function of loop index. Execution of 1+1 is 30 times slower when preceded by subprocess.Popen.]





It gets even weirder. One may think that only the first command run after subprocess.Popen() is affected, but that is not the case. The following loop shows that all commands in the current loop iteration are affected, while subsequent loop iterations seem to be mostly OK.



import subprocess
import time

var = 0
for i in range(20):
    if i == 10:
        # command 1
        subprocess.Popen(['echo'])
    # command 2a
    t_start = time.time()
    1 + 1
    t_stop = time.time()
    print(t_stop - t_start)
    # command 2b
    t_start = time.time()
    print(1)
    t_stop = time.time()
    print(t_stop - t_start)
    # command 2c
    t_start = time.time()
    var += 1
    t_stop = time.time()
    print(t_stop - t_start)


Here’s a plot of the execution times for this loop, averaged over 100 runs:



[Plot: execution times of 1+1, print(1), and var += 1 as a function of loop index.]





More remarks:




  • We get the same effect when replacing subprocess.Popen() (“command 1”) with time.sleep(), or with rawkit’s libraw C++ bindings initialization (libraw.bindings.LibRaw()). However, using other libraries with C++ bindings, such as libraw.py or OpenCV’s cv2.warpAffine(), does not affect execution times. Opening files doesn’t either.

  • The effect is not caused by time.time() itself: it is also visible with timeit.timeit(), and even when measuring manually when print() results appear.

  • It also happens without a for-loop.

  • This happens even when a lot of different (possibly CPU- and memory-consuming) operations are performed between “command 1” (subprocess.Popen) and “command 2”.

  • With NumPy arrays, the slowdown appears to be proportional to the size of the array. With relatively big arrays (~60 M points), a simple arr += 1 operation can take up to 300 ms! (A minimal way to reproduce this measurement is sketched right after this list.)
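
A minimal sketch of that last measurement (my own addition, not from the original post; the array size and repeat count are illustrative):

import subprocess
import time

import numpy as np

arr = np.zeros(60_000_000)  # ~60 M float64 elements, as in the remark above

def time_incr(a):
    # time a single in-place increment of the whole array
    t0 = time.perf_counter()
    a += 1
    return time.perf_counter() - t0

baseline = min(time_incr(arr) for _ in range(5))
subprocess.Popen(["echo"], stdout=subprocess.DEVNULL)
print(f"baseline: {baseline:.4f} s, right after Popen: {time_incr(arr):.4f} s")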




Question: What may cause this effect, and why does it affect only the current loop iteration?



I suspect that it could be related to context switching, but this doesn’t seem to explain why a whole loop iteration would be affected. If context switching is indeed the cause, why do some commands trigger it while others don’t?
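
One way to probe the context-switching hypothesis is to read the process’s context-switch counters around the measured region. The following is a sketch of my own (Unix-only, using the standard resource module), not part of the original question:

import resource
import subprocess
import time

def ctx_switches():
    # voluntary and involuntary context switches for this process
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_nvcsw, ru.ru_nivcsw

v0, i0 = ctx_switches()
subprocess.Popen(["echo"], stdout=subprocess.DEVNULL)
t0 = time.perf_counter()
1 + 1
dt = time.perf_counter() - t0
v1, i1 = ctx_switches()
print(f"1+1 took {dt * 1e6:.1f} us; context switches since before Popen: "
      f"voluntary={v1 - v0}, involuntary={i1 - i0}")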










python performance subprocess python-performance

asked Nov 9 at 14:20, edited Nov 9 at 17:21 – Arcturus B

  • For the record, I've been able to reproduce this with Python 3.7, using time.perf_counter_ns() instead of time.time(), and using 1+1 or even doing nothing at all between the calls to time.perf_counter_ns()... – Schnouki, Nov 9 at 15:37
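
A minimal version of that reproduction, following the comment’s description (my sketch, not Schnouki’s exact code):

import subprocess
import time

for i in range(20):
    if i == 10:
        subprocess.Popen(["echo"], stdout=subprocess.DEVNULL)
    t0 = time.perf_counter_ns()
    # nothing at all between the two counter reads
    t1 = time.perf_counter_ns()
    print(i, t1 - t0, "ns")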















1 Answer
My guess would be that this is due to the Python code being evicted from various caches in the CPU/memory system.



The perflib package can be used to extract more detailed CPU-level stats about the state of the cache, i.e. the number of hits/misses.



The LIBPERF_COUNT_HW_CACHE_MISSES counter I get is ~5 times higher after the Popen() call:



from subprocess import Popen, DEVNULL
from perflib import PerfCounter
import numpy as np

arr = []
p = PerfCounter('LIBPERF_COUNT_HW_CACHE_MISSES')

for i in range(100):
    ti = []
    p.reset()
    p.start()
    ti.extend(p.getval() for _ in range(7))  # 7 readings before Popen
    Popen(['echo'], stdout=DEVNULL)
    ti.extend(p.getval() for _ in range(7))  # 7 readings after Popen
    p.stop()
    arr.append(ti)

np.diff(np.array(arr), axis=1).mean(axis=0).astype(int).tolist()


gives me:



 2605,  2185,  2127,  2099,  2407,  2120,
5481210,
16499, 10694, 10398, 10301, 10206, 10166


(The line breaks mirror the code flow: the first row holds the six baseline diffs before Popen(), the large middle value is the single diff spanning the Popen() call, and the last row shows the still-elevated miss counts afterwards.)






answered Nov 9 at 16:14, edited Nov 9 at 16:22 – Sam Mason (accepted)

  • Just to make sure I understand, you mean it's because the new process takes over the processor's caches, right? (Roughly speaking, of course.) – jpmc26, Nov 9 at 23:52












  • @jpmc26 Sort of… doing anything will put pressure on caches; the more work you do, the more things move around. With high probability this will result in old things being evicted from the cache, and when those old things are needed again they have to be repopulated. Starting up a process will involve churning through a few MB of RAM, as will creating a large NumPy array, etc. – Sam Mason, Nov 10 at 0:20
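
A minimal sketch illustrating that point (my addition, not from the thread; the buffer size is illustrative): pure memory churn, with no subprocess at all, should produce a similar one-off slowdown.

import time

import numpy as np

def time_op():
    t0 = time.perf_counter()
    1 + 1
    return time.perf_counter() - t0

print("before churn:", min(time_op() for _ in range(1000)))
churn = np.empty(8_000_000)  # ~64 MB of float64
churn.fill(0.0)              # actually touch every page
print("right after churn:", time_op())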










