GHC, Simple Performance Test yields batcrazy sawtooth graph
I am trying to do some very simple performance testing in Haskell using the clock library, with code that looks like this:

    timeBefore <- getTime Monotonic
    --do something
    timeAfter <- getTime Monotonic
    putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)
This works, but the times are a bit noisy, so I repeat the measurement many times to get an average, like this:

    replicateM_ 10000 $ do
      timeBefore <- getTime Monotonic
      --something
      timeAfter <- getTime Monotonic
      putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)
This also works, until I remove the underscore from replicateM_ (in an attempt to return all the times so I can calculate an average, though the problem happens even when I don't return numerical values). As soon as I replace replicateM_ with replicateM, the values scrolling by start to look pretty erratic, and often very large. Confused, I graphed the result and was pretty surprised at what I saw.
This is completely bizarre: it looks like the overall program is constantly slowing down and speeding up in a sawtooth fashion. I can't imagine what would cause this to happen (something to do with garbage collection? Something to do with memory allocation for a bunch of returned ()s?). Note that the baseline timing values I get with replicateM_ are about 100 ns, so the sawtooth behavior is adding to the overall time by two orders of magnitude.
The weird behavior remains even when I remove literally everything from my source code but this:
    module Main where

    import Control.Monad
    import System.Clock

    main :: IO ()
    main = do
      replicateM_ 10000 $ do
        timeBefore <- getTime Monotonic
        --I don't even need to put anything here for the problem to exist
        timeAfter <- getTime Monotonic
        putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)
      return ()
All you need to do is add or remove that underscore, and the times go from an expected flat line at about 100 ns to a batcrazy sawtooth that jumps from 100 ns to 8000 ns over and over again.
What am I missing here?
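(For context, the semantic difference between the two combinators is easy to see in isolation: replicateM retains every result in a list until the whole loop finishes, while replicateM_ discards each result as it goes. A minimal sketch, using pure stand-in actions instead of the real timing calls:)

```haskell
import Control.Monad (replicateM, replicateM_)

main :: IO ()
main = do
  -- replicateM builds and returns the full list of results, so all
  -- 10000 cells stay live on the heap (and must be traversed by every
  -- GC) until the whole loop has finished.
  xs <- replicateM 10000 (pure (1 :: Int))
  print (sum xs)
  -- replicateM_ runs the same action the same number of times but
  -- discards each result, so nothing accumulates across iterations.
  replicateM_ 10000 (pure (1 :: Int))
```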
haskell
GC is a good guess - as more and more heap is allocated, the copying collector has more work. I had a similar, though less pronounced, benchmark issue due (I believe) to Intel CPU frequency scaling and thermal profiles. Ultimately, it seems unlikely anyone here can tell you with certainty what is going on, since you didn't share the full system. The best I'd hope for in your position is educated guessing, and people telling you to do it correctly via a benchmarking library.
– Thomas M. DuBuisson
Nov 21 '18 at 1:13
You're almost certainly seeing the cost of the garbage collector traversing something that is still alive repeatedly, until it's finally freed and the GC doesn't have to traverse it to reclaim the memory. Given that the key difference is mapM vs mapM_, it has to be the list mapM produces. All other details depend on things like runtime options, GHC version, and compile flags.
– Carl
Nov 21 '18 at 5:46
If you really just need the mean, don't capture the times; just calculate the sum as you go. If you need more statistics (standard deviation, median, interquartile range, etc.), then you should consider recording the times in an (unboxed) array or vector instead of a list.
– dfeuer
Nov 21 '18 at 17:36
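(The running-sum approach suggested above can be sketched like this, using only base. The constant t is a stand-in for a measured nanosecond value, since the real getTime calls are beside the point here:)

```haskell
import Control.Monad (replicateM_)
import Data.IORef

main :: IO ()
main = do
  total <- newIORef (0 :: Integer)
  replicateM_ 10000 $ do
    let t = 100  -- stand-in for toNanoSecs (timeAfter `diffTimeSpec` timeBefore)
    modifyIORef' total (+ t)  -- strict update: no thunk chain, no retained list
  grand <- readIORef total
  print (grand `div` 10000)  -- mean time per iteration, in nanoseconds
```

Because modifyIORef' forces the sum at each step, the heap stays flat: only one Integer is live across iterations, instead of a 10000-element list.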
Nobody mentioned Criterion yet?
– MathematicalOrchid
Nov 23 '18 at 10:50
asked Nov 20 '18 at 23:57
jamshidh