GHC, Simple Performance Test yields batcrazy sawtooth graph

I am trying to do some very simple performance testing in Haskell using the clock library, with code that looks like this:



timeBefore <- getTime Monotonic
--do something
timeAfter <- getTime Monotonic

putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This works, but the times are a bit noisy, so I repeat the measurement multiple times to get an average, like this:



replicateM_ 10000 $ do
  timeBefore <- getTime Monotonic
  --something
  timeAfter <- getTime Monotonic

  putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)


This also works, until I remove the underscore from replicateM_ (in an attempt to return all the times so that I can calculate an average... but the problem happens even when I don't return numerical values). As soon as I replace replicateM_ with replicateM, the values scrolling by start to look pretty erratic, and are often very large. Confused, I graphed the result and was pretty surprised at what I saw:



[graph: the per-iteration times form a repeating sawtooth instead of a flat line]



This is completely bizarre... It looks like the overall program is constantly slowing down and speeding up in a sawtooth fashion. I can't imagine what would cause this to happen (something to do with garbage collection? Something to do with memory allocation for the list of returned ()s?). Note that the baseline timing value I get with the underscore version (replicateM_) is about 100 ns, so all of the sawtooth behavior is time added on top of that, up to two orders of magnitude more.
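
For reference, the only difference between the two combinators is whether the per-iteration results are kept; their signatures in Control.Monad are:

replicateM  :: Applicative m => Int -> m a -> m [a]  -- collects every result into a list
replicateM_ :: Applicative m => Int -> m a -> m ()   -- runs the action and throws the results away

so the replicateM version keeps all 10,000 per-iteration results live until the loop finishes, whereas replicateM_ retains nothing.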



The weird behavior remains even when I remove literally everything from my source code but this:



module Main where

import Control.Monad
import System.Clock

main :: IO ()
main = do
  replicateM_ 10000 $ do
    timeBefore <- getTime Monotonic
    --I don't even need to put anything here for the problem to exist
    timeAfter <- getTime Monotonic
    putStrLn $ show $ toNanoSecs (timeAfter `diffTimeSpec` timeBefore)

  return ()


All you need to do is add or remove that underscore, and the times go from an expected flat line around 100 ns to the batcrazy sawtooth that jumps from 100 ns to 8000 ns over and over again.



What am I missing here?










haskell

asked Nov 20 '18 at 23:57 by jamshidh
  • GC is a good guess - as more and more heap is allocated, the copying collector has more and more work to do. I had a similar, though less pronounced, benchmark issue due (I believe) to Intel CPU frequency scaling and thermal profiles. Ultimately, it seems unlikely anyone here can tell you with certainty what is going on, since you didn't share the full system. The best I'd hope for in your position is educated guessing, and people telling you to do it correctly via a benchmarking library.

    – Thomas M. DuBuisson, Nov 21 '18 at 1:13
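
One way to test the GC guess directly is to read the RTS's own collection counters around the loop. This is only a sketch, assuming GHC 8.2 or later (GHC.Stats from base); the iteration count is illustrative, and the binary has to be built with -rtsopts and run with +RTS -T for the counters to be populated:

module Main where

import Control.Monad (replicateM, void)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Clock (Clock (Monotonic), getTime)

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "Run the binary with +RTS -T to enable RTS statistics."
    else do
      before <- getRTSStats
      -- the list-building variant under suspicion
      void $ replicateM 10000 (getTime Monotonic)
      after <- getRTSStats
      putStrLn $ "collections during the loop: " ++ show (gcs after - gcs before)
      putStrLn $ "major collections:           " ++ show (major_gcs after - major_gcs before)

A large delta for the replicateM version and a near-zero one for replicateM_ would point at collector pauses landing inside whichever iterations happen to trigger a GC.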






  • You're almost certainly seeing the cost of the garbage collector repeatedly traversing something that is still alive, until it's finally freed and the GC doesn't have to traverse it to reclaim the memory. Given that the key difference is replicateM vs replicateM_, it has to be the list replicateM produces. All other details depend on things like runtime options, GHC version, and compile flags.

    – Carl, Nov 21 '18 at 5:46











  • If you really just need the mean, don't capture the times; just calculate the sum as you go. If you need more statistics (standard deviation, median, interquartile range, etc.), then you should consider recording the times in an (unboxed) array or vector instead of a list.

    – dfeuer, Nov 21 '18 at 17:36
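
A sketch of both suggestions; measureOnce and the 10,000-iteration count are illustrative choices rather than anything from the question, the times are taken as Int64 nanoseconds, and the second option assumes the vector package is available:

{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Monad (foldM)
import Data.Int (Int64)
import qualified Data.Vector.Unboxed as VU
import System.Clock (Clock (Monotonic), diffTimeSpec, getTime, toNanoSecs)

-- One timed iteration, in nanoseconds; the code under test goes between the two getTime calls.
measureOnce :: IO Int64
measureOnce = do
  timeBefore <- getTime Monotonic
  -- code under test would go here
  timeAfter <- getTime Monotonic
  pure (fromInteger (toNanoSecs (timeAfter `diffTimeSpec` timeBefore)))

main :: IO ()
main = do
  -- Option 1: only the mean is needed, so keep a strict running sum and retain nothing.
  total <- foldM (\ !acc _ -> (acc +) <$> measureOnce) 0 [1 .. 10000 :: Int]
  putStrLn $ "mean (running sum):    " ++ show (total `div` 10000) ++ " ns"

  -- Option 2: keep every sample in a single unboxed vector for richer statistics.
  samples <- VU.replicateM 10000 measureOnce
  putStrLn $ "mean (unboxed vector): " ++ show (VU.sum samples `div` 10000) ++ " ns"

Either way the collector has far less live structure to keep walking than with a 10,000-element list of boxed results.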













  • Nobody mentioned Criterion yet?

    – MathematicalOrchid, Nov 23 '18 at 10:50
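
For completeness, the criterion route looks roughly like this (somethingToMeasure is just a stand-in for whatever is really being timed):

module Main where

import Criterion.Main (bench, defaultMain, whnfIO)

-- Stand-in for the real code under test.
somethingToMeasure :: IO Int
somethingToMeasure = pure $! sum [1 .. 1000 :: Int]

main :: IO ()
main = defaultMain
  [ bench "something" (whnfIO somethingToMeasure)
  ]

criterion takes care of warm-up, repetition, clock-overhead estimation, and outlier analysis, which is exactly the kind of bookkeeping the hand-rolled loop is tripping over here.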















