GHC, Simple Performance Test yields batcrazy sawtooth graph












2















I am trying to do some very simple performance testing in Haskell using the clock library and code that looks like this



timeBefore <- getTime Monotonic
--do something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This works, but the times are a bit noisy, so I repeat multiple times to get an average, like this



replicateM_ 10000 $ do
timeBefore <- getTime Monotonic
--something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This also works, until I remove the underscore from replicateM_ (in an attempt to return all the times to calculate an average.... but the problem happens even when I don't return numerical values). As soon as I replace replicateM_ with replicateM, the values scrolling by started to look pretty erratic, and often very large. Confused, I graphed the result and was pretty surprised at what I saw



enter image description here



This is completely bizarre.... It looks like the overall program is constantly slowing down and speeding up in a sawtooth fashion. I can't imagine what would cause this to happen (something to do with garbage collection? Something to do with memory allocation for a bunch of returned ()s?). Note that the baseline timing values I get with the underscore 'replicateM_' is about 100nm, so all of the sawtooth behavior is adding to the overall time, by two orders of magnitude.



The weird behavior remains when I remove literally everything from my source code but this



module Main where

import Control.Monad
import System.Clock

main :: IO ()
main = do
replicateM_ 10000 $ do
timeBefore <- getTime Monotonic
--I don't even need to put anything here for the problem to exist
timeAfter <- getTime Monotonic
putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)

return ()


All you need to do is add or remove that underscore, and the times go from an expected 100nm flat line to the batcrazy sawtooth that jumps from 100nm to 8000nm over and over again.



What am I missing here?










share|improve this question


















  • 1





    GC is a good guess - as more and more heap is allocatec the copying collector has more work. I had a similar, though less pronounced, benchmark issue due (I believe) to intel cpu frequency scaling and thermal profiles. Ultimately, it seems unlikely anyone here can tell you with certainty what is going on since you didn't share the full system. Best I'd hope for in your position is educated guessing and people telling you to do it correctly via a benchmarking library.

    – Thomas M. DuBuisson
    Nov 21 '18 at 1:13






  • 2





    You're almost certainly seeing the cost of the garbage collector traversing something that is still alive repeatedly, until it's finally freed and the GC doesn't have to traverse it to reclaim the memory. Given that the key difference is mapM vs mapM_, it has to be the list mapM produces. All other details depend on things like runtime options, ghc version, and compile flags.

    – Carl
    Nov 21 '18 at 5:46











  • If you really just need the mean, don't capture the times; just calculate the sum as you go. If you need more statistics (standard deviation, median, interquartile range, etc.), then you should consider recording the times in an (unboxed) array or vector instead of a list.

    – dfeuer
    Nov 21 '18 at 17:36













  • Nobody mentioned Criterion yet?

    – MathematicalOrchid
    Nov 23 '18 at 10:50
















2















I am trying to do some very simple performance testing in Haskell using the clock library and code that looks like this



timeBefore <- getTime Monotonic
--do something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This works, but the times are a bit noisy, so I repeat multiple times to get an average, like this



replicateM_ 10000 $ do
timeBefore <- getTime Monotonic
--something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This also works, until I remove the underscore from replicateM_ (in an attempt to return all the times to calculate an average.... but the problem happens even when I don't return numerical values). As soon as I replace replicateM_ with replicateM, the values scrolling by started to look pretty erratic, and often very large. Confused, I graphed the result and was pretty surprised at what I saw



enter image description here



This is completely bizarre.... It looks like the overall program is constantly slowing down and speeding up in a sawtooth fashion. I can't imagine what would cause this to happen (something to do with garbage collection? Something to do with memory allocation for a bunch of returned ()s?). Note that the baseline timing values I get with the underscore 'replicateM_' is about 100nm, so all of the sawtooth behavior is adding to the overall time, by two orders of magnitude.



The weird behavior remains when I remove literally everything from my source code but this



module Main where

import Control.Monad
import System.Clock

main :: IO ()
main = do
replicateM_ 10000 $ do
timeBefore <- getTime Monotonic
--I don't even need to put anything here for the problem to exist
timeAfter <- getTime Monotonic
putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)

return ()


All you need to do is add or remove that underscore, and the times go from an expected 100nm flat line to the batcrazy sawtooth that jumps from 100nm to 8000nm over and over again.



What am I missing here?










share|improve this question


















  • 1





    GC is a good guess - as more and more heap is allocatec the copying collector has more work. I had a similar, though less pronounced, benchmark issue due (I believe) to intel cpu frequency scaling and thermal profiles. Ultimately, it seems unlikely anyone here can tell you with certainty what is going on since you didn't share the full system. Best I'd hope for in your position is educated guessing and people telling you to do it correctly via a benchmarking library.

    – Thomas M. DuBuisson
    Nov 21 '18 at 1:13






  • 2





    You're almost certainly seeing the cost of the garbage collector traversing something that is still alive repeatedly, until it's finally freed and the GC doesn't have to traverse it to reclaim the memory. Given that the key difference is mapM vs mapM_, it has to be the list mapM produces. All other details depend on things like runtime options, ghc version, and compile flags.

    – Carl
    Nov 21 '18 at 5:46











  • If you really just need the mean, don't capture the times; just calculate the sum as you go. If you need more statistics (standard deviation, median, interquartile range, etc.), then you should consider recording the times in an (unboxed) array or vector instead of a list.

    – dfeuer
    Nov 21 '18 at 17:36













  • Nobody mentioned Criterion yet?

    – MathematicalOrchid
    Nov 23 '18 at 10:50














2












2








2








I am trying to do some very simple performance testing in Haskell using the clock library and code that looks like this



timeBefore <- getTime Monotonic
--do something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This works, but the times are a bit noisy, so I repeat multiple times to get an average, like this



replicateM_ 10000 $ do
timeBefore <- getTime Monotonic
--something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This also works, until I remove the underscore from replicateM_ (in an attempt to return all the times to calculate an average.... but the problem happens even when I don't return numerical values). As soon as I replace replicateM_ with replicateM, the values scrolling by started to look pretty erratic, and often very large. Confused, I graphed the result and was pretty surprised at what I saw



enter image description here



This is completely bizarre.... It looks like the overall program is constantly slowing down and speeding up in a sawtooth fashion. I can't imagine what would cause this to happen (something to do with garbage collection? Something to do with memory allocation for a bunch of returned ()s?). Note that the baseline timing values I get with the underscore 'replicateM_' is about 100nm, so all of the sawtooth behavior is adding to the overall time, by two orders of magnitude.



The weird behavior remains when I remove literally everything from my source code but this



module Main where

import Control.Monad
import System.Clock

main :: IO ()
main = do
replicateM_ 10000 $ do
timeBefore <- getTime Monotonic
--I don't even need to put anything here for the problem to exist
timeAfter <- getTime Monotonic
putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)

return ()


All you need to do is add or remove that underscore, and the times go from an expected 100nm flat line to the batcrazy sawtooth that jumps from 100nm to 8000nm over and over again.



What am I missing here?










share|improve this question














I am trying to do some very simple performance testing in Haskell using the clock library and code that looks like this



timeBefore <- getTime Monotonic
--do something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This works, but the times are a bit noisy, so I repeat multiple times to get an average, like this



replicateM_ 10000 $ do
timeBefore <- getTime Monotonic
--something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This also works, until I remove the underscore from replicateM_ (in an attempt to return all the times to calculate an average.... but the problem happens even when I don't return numerical values). As soon as I replace replicateM_ with replicateM, the values scrolling by started to look pretty erratic, and often very large. Confused, I graphed the result and was pretty surprised at what I saw



enter image description here



This is completely bizarre.... It looks like the overall program is constantly slowing down and speeding up in a sawtooth fashion. I can't imagine what would cause this to happen (something to do with garbage collection? Something to do with memory allocation for a bunch of returned ()s?). Note that the baseline timing values I get with the underscore 'replicateM_' is about 100nm, so all of the sawtooth behavior is adding to the overall time, by two orders of magnitude.



The weird behavior remains when I remove literally everything from my source code but this



module Main where

import Control.Monad
import System.Clock

main :: IO ()
main = do
replicateM_ 10000 $ do
timeBefore <- getTime Monotonic
--I don't even need to put anything here for the problem to exist
timeAfter <- getTime Monotonic
putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)

return ()


All you need to do is add or remove that underscore, and the times go from an expected 100nm flat line to the batcrazy sawtooth that jumps from 100nm to 8000nm over and over again.



What am I missing here?







haskell






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 20 '18 at 23:57









jamshidhjamshidh

10.7k1328




10.7k1328








  • 1





    GC is a good guess - as more and more heap is allocatec the copying collector has more work. I had a similar, though less pronounced, benchmark issue due (I believe) to intel cpu frequency scaling and thermal profiles. Ultimately, it seems unlikely anyone here can tell you with certainty what is going on since you didn't share the full system. Best I'd hope for in your position is educated guessing and people telling you to do it correctly via a benchmarking library.

    – Thomas M. DuBuisson
    Nov 21 '18 at 1:13






  • 2





    You're almost certainly seeing the cost of the garbage collector traversing something that is still alive repeatedly, until it's finally freed and the GC doesn't have to traverse it to reclaim the memory. Given that the key difference is mapM vs mapM_, it has to be the list mapM produces. All other details depend on things like runtime options, ghc version, and compile flags.

    – Carl
    Nov 21 '18 at 5:46











  • If you really just need the mean, don't capture the times; just calculate the sum as you go. If you need more statistics (standard deviation, median, interquartile range, etc.), then you should consider recording the times in an (unboxed) array or vector instead of a list.

    – dfeuer
    Nov 21 '18 at 17:36













  • Nobody mentioned Criterion yet?

    – MathematicalOrchid
    Nov 23 '18 at 10:50














  • 1





    GC is a good guess - as more and more heap is allocatec the copying collector has more work. I had a similar, though less pronounced, benchmark issue due (I believe) to intel cpu frequency scaling and thermal profiles. Ultimately, it seems unlikely anyone here can tell you with certainty what is going on since you didn't share the full system. Best I'd hope for in your position is educated guessing and people telling you to do it correctly via a benchmarking library.

    – Thomas M. DuBuisson
    Nov 21 '18 at 1:13






  • 2





    You're almost certainly seeing the cost of the garbage collector traversing something that is still alive repeatedly, until it's finally freed and the GC doesn't have to traverse it to reclaim the memory. Given that the key difference is mapM vs mapM_, it has to be the list mapM produces. All other details depend on things like runtime options, ghc version, and compile flags.

    – Carl
    Nov 21 '18 at 5:46











  • If you really just need the mean, don't capture the times; just calculate the sum as you go. If you need more statistics (standard deviation, median, interquartile range, etc.), then you should consider recording the times in an (unboxed) array or vector instead of a list.

    – dfeuer
    Nov 21 '18 at 17:36













  • Nobody mentioned Criterion yet?

    – MathematicalOrchid
    Nov 23 '18 at 10:50








1




1





GC is a good guess - as more and more heap is allocatec the copying collector has more work. I had a similar, though less pronounced, benchmark issue due (I believe) to intel cpu frequency scaling and thermal profiles. Ultimately, it seems unlikely anyone here can tell you with certainty what is going on since you didn't share the full system. Best I'd hope for in your position is educated guessing and people telling you to do it correctly via a benchmarking library.

– Thomas M. DuBuisson
Nov 21 '18 at 1:13





GC is a good guess - as more and more heap is allocatec the copying collector has more work. I had a similar, though less pronounced, benchmark issue due (I believe) to intel cpu frequency scaling and thermal profiles. Ultimately, it seems unlikely anyone here can tell you with certainty what is going on since you didn't share the full system. Best I'd hope for in your position is educated guessing and people telling you to do it correctly via a benchmarking library.

– Thomas M. DuBuisson
Nov 21 '18 at 1:13




2




2





You're almost certainly seeing the cost of the garbage collector traversing something that is still alive repeatedly, until it's finally freed and the GC doesn't have to traverse it to reclaim the memory. Given that the key difference is mapM vs mapM_, it has to be the list mapM produces. All other details depend on things like runtime options, ghc version, and compile flags.

– Carl
Nov 21 '18 at 5:46





You're almost certainly seeing the cost of the garbage collector traversing something that is still alive repeatedly, until it's finally freed and the GC doesn't have to traverse it to reclaim the memory. Given that the key difference is mapM vs mapM_, it has to be the list mapM produces. All other details depend on things like runtime options, ghc version, and compile flags.

– Carl
Nov 21 '18 at 5:46













If you really just need the mean, don't capture the times; just calculate the sum as you go. If you need more statistics (standard deviation, median, interquartile range, etc.), then you should consider recording the times in an (unboxed) array or vector instead of a list.

– dfeuer
Nov 21 '18 at 17:36







If you really just need the mean, don't capture the times; just calculate the sum as you go. If you need more statistics (standard deviation, median, interquartile range, etc.), then you should consider recording the times in an (unboxed) array or vector instead of a list.

– dfeuer
Nov 21 '18 at 17:36















Nobody mentioned Criterion yet?

– MathematicalOrchid
Nov 23 '18 at 10:50





Nobody mentioned Criterion yet?

– MathematicalOrchid
Nov 23 '18 at 10:50












0






active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53403390%2fghc-simple-performance-test-yields-batcrazy-sawtooth-graph%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53403390%2fghc-simple-performance-test-yields-batcrazy-sawtooth-graph%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Guess what letter conforming each word

Run scheduled task as local user group (not BUILTIN)

Port of Spain