What cost does a bloated object file carry?



























While working on an embedded project, I encountered a function that is called thousands of times over the application's lifetime, often in loops, dozens of times per second. I wondered whether I could reduce its cost, and I found out that most of its parameters are known at compile time.



Let me illustrate it with an example.



The original hpp/cpp files can be approximated like this:



original.hpp:




void example(bool arg1, bool arg2, const char* data);


original.cpp:



#include "ex1.hpp"
#include <iostream>

void example(bool arg1, bool arg2, const char* data)
{
if (arg1 && arg2)
{
std::cout << "Both true " << data << std::endl;
}
else if (!arg1 && arg2)
{
std::cout << "False and true " << data << std::endl;
}
else if (arg1 && !arg2)
{
std::cout << "True and false " << data << std::endl;
}
else
{
std::cout << "Both false " << data << std::endl;
}
}


Let's assume that every single time the function is called, arg1 and arg2 are known at compile time. The data argument isn't, and for a variety of reasons its processing cannot be moved into the header file.



However, all those if statements can be handled by the compiler with a little bit of template magic:



magic.hpp:



template<bool arg1, bool arg2>
void example(const char* data);


magic.cpp:



#include "ex1.hpp"    
#include <iostream>

template<bool arg1, bool arg2>
struct Processor;

template<>
struct Processor<true, true>
{
static void process(const char* data)
{
std::cout << "Both true " << data << std::endl;
}
};

template<>
struct Processor<false, true>
{
static void process(const char* data)
{
std::cout << "False and true " << data << std::endl;
}
};

template<>
struct Processor<true, false>
{
static void process(const char* data)
{
std::cout << "True and false " << data << std::endl;
}
};

template<>
struct Processor<false, false>
{
static void process(const char* data)
{
std::cout << "Both false " << data << std::endl;
}
};

template<bool arg1, bool arg2>
void example(const char* data)
{
Processor<arg1, arg2>::process(data);
}

template void example<true, true>(const char*);
template void example<false, true>(const char*);
template void example<true, false>(const char*);
template void example<false, false>(const char*);


As you can see, even in this tiny example the cpp file got significantly bigger than the original. But I did remove a few assembly instructions!
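For context, this is roughly how a call site changes between the two approaches (the calling function below is hypothetical, not part of my code):

#include "original.hpp"
#include "magic.hpp"

void report(const char* payload)
{
    // Original version: arg1 and arg2 are passed at run time,
    // so the if/else chain inside example() executes on every call.
    example(true, false, payload);

    // Template version: the flags are template arguments, so the
    // compiler picks the <true, false> instantiation and no run-time
    // branching on the flags remains.
    example<true, false>(payload);
}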



Now, in my real-life case things are a bit more complex, because instead of two bool arguments I have enums and structures. Long story short, there are about one thousand combinations in total, so I have that many instances of the line template void example<something>(const char*);



Of course I do not generate them manually but with macros, yet the cpp file still gets humongous compared to the original, and the object file is even worse.
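For the simple bool example above, the macro generation could look roughly like this (an illustrative sketch, not my actual macros):

// Illustrative X-macro: list every combination of the template arguments once...
#define EXAMPLE_COMBINATIONS(X) \
    X(true,  true)              \
    X(true,  false)             \
    X(false, true)              \
    X(false, false)

// ...then expand the list into the explicit instantiation definitions.
#define EXAMPLE_INSTANTIATE(A1, A2) template void example<A1, A2>(const char*);
EXAMPLE_COMBINATIONS(EXAMPLE_INSTANTIATE)
#undef EXAMPLE_INSTANTIATE
#undef EXAMPLE_COMBINATIONS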



All this in the name of removing several if statements and one switch.



My question is: is size the only problem with the template-magic approach? I wonder if there is some hidden cost in having so many versions of the same function. Did I really save some resources, or just the opposite?










c++ templates optimization embedded

asked Nov 20 '18 at 21:41 by Darth Hunterix, edited Nov 22 '18 at 14:00

  • Comments are not for extended discussion; this conversation has been moved to chat.

    – Samuel Liew
    Nov 21 '18 at 2:13

1 Answer

The problem with an increased binary size is almost never the storage of the file itself; the problem is that more code means a smaller fraction of the program's instructions can be in cache at any point, leading to cache misses. If you're calling the same instantiation in a tight loop, then having it do less work is great. But if you're constantly bouncing around between different template instantiations, the cost of going to main memory to load instructions may be far higher than what you save by removing some instructions from inside the function.



This kind of thing can be VERY difficult to predict, though. The way to find the sweet spot in this (and any) type of optimization is to measure. It is also likely to change across platforms, especially in the embedded world.
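As a rough illustration of what "measure" can mean here, a minimal benchmark on a hosted platform might look like the sketch below (the call sites and iteration count are made up; on a real embedded target you would typically read a hardware cycle counter instead):

#include <chrono>
#include <cstdio>

#include "original.hpp"  // run-time branching version
#include "magic.hpp"     // template-specialized version

int main()
{
    using clock = std::chrono::steady_clock;
    constexpr int iterations = 100000;

    // Time the run-time branching version.
    auto t0 = clock::now();
    for (int i = 0; i < iterations; ++i)
        example(true, false, "payload");
    auto t1 = clock::now();

    // Time the compile-time specialized version.
    for (int i = 0; i < iterations; ++i)
        example<true, false>("payload");
    auto t2 = clock::now();

    // NB: in this toy example the iostream output dominates the timing;
    // in real code the measured work would be the function body itself.
    auto us = [](auto d) {
        return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    };
    std::printf("original: %lld us, template: %lld us\n",
                static_cast<long long>(us(t1 - t0)),
                static_cast<long long>(us(t2 - t1)));
}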






answered Nov 20 '18 at 21:57 by xaxxon, edited Nov 20 '18 at 22:02

  • What makes you think the OP is using a system with data or instruction cache? He says he's using a microcontroller. Which can be anything from an antique 8051 to Cortex A or PowerPC. I don't see how this answers the question.

    – Lundin
    Nov 21 '18 at 11:58











  • I can now confirm that there is a bit of cache, small as it might be, so it may be a problem. The answer is also useful for other people with a similar question. In the end, the concept was abandoned.

    – Darth Hunterix
    Nov 21 '18 at 21:55










