Is there a limit on the number of hugepage entries that can be stored in the TLB
I'm trying to analyze the network performance boost that VMs get when they use hugepages. For this I configured the hypervisor to have several 1G hugepages (36) by changing the GRUB command line and rebooting, and when launching the VMs I made sure the hugepages were passed through to them. After launching 8 VMs (each with two 1G hugepages) and running network throughput tests between them, I found that the throughput was drastically lower than when running without hugepages. That got me wondering whether it had something to do with the number of hugepages I was using. Is there a limit on the number of 1G hugepages that can be referenced through the TLB, and if so, is it lower than the limit for regular-sized pages? How can I find this information? In this scenario I was using an Ivy Bridge system, and the cpuid command showed something like:
cache and TLB information (2):
0x63: data TLB: 1G pages, 4-way, 4 entries
0x03: data TLB: 4K pages, 4-way, 64 entries
0x76: instruction TLB: 2M/4M pages, fully, 8 entries
0xff: cache data is in CPUID 4
0xb5: instruction TLB: 4K, 8-way, 64 entries
0xf0: 64 byte prefetching
0xc1: L2 TLB: 4K/2M pages, 8-way, 1024 entries
Does this mean I can have only 4 1G hugepage mappings in the TLB at any given time?
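For reference, this is roughly how I set up and verified the 1G hugepages on the host (the exact paths and counts are from my setup; adjust as needed):
# kernel boot parameters added to the GRUB command line (then reboot)
default_hugepagesz=1G hugepagesz=1G hugepages=36
# verify the 1G hugepage pool after boot
grep Huge /proc/meminfo
cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# check the 1G data TLB size reported by the CPU
cpuid | grep -i '1G pages'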
cpu cpu-architecture tlb huge-pages
asked Nov 7 at 20:21
Sai Malleni
Welcome to Stack Overflow. While your question is set within the scenario of virtualization and involves different CPUs, it is substantially answered by this question: stackoverflow.com/questions/40649655/…. Effectively, yes, the processor's TLB has dedicated space for the different types of entries, with very limited space for huge pages.
– Brian
Nov 7 at 20:29
Yes, you've found a way to create very poor hugepage locality. Most workloads that do a lot of kernel access to memory have more accesses within the same 1G hugepage. (User-space memory on Linux usually uses 2M hugepages, when it uses anonymous hugepages at all.) In Haswell, for example, 2M and 4K TLB entries can go into the 2nd-level TLB victim cache, but apparently 1G entries can't, if 7-cpu.com/cpu/Haswell.html is fully accurate.
– Peter Cordes
Nov 8 at 8:57
1 Answer
Yes, of course. Having no upper limit on the number of TLB entries would require an unbounded amount of physical space in the CPU die.
Every TLB in every architecture has an upper limit on the number of entries it can hold.
For 1 GiB pages on x86 this number is smaller than you probably expected: it is 4.
It was 4 in your Ivy Bridge and it is still 4 in my Kaby Lake, four generations later.
It's worth noting that 4 entries cover 4 GiB of RAM (4 x 1 GiB), which seems enough to handle networking if used properly.
Finally, TLBs are per-core resources: each core has its own set of TLBs.
If you disable SMT (e.g. Intel Hyper-Threading) or assign both threads on a core to the same VM, the VMs won't be competing for TLB entries.
However, each VM can have at most 4*C hugepage entries cached, where C is the number of cores dedicated to that VM.
How fully a VM can exploit these entries depends on how the host OS, the hypervisor and the guest OS work together, and on the memory layout of the guest application of interest (pages shared across cores have duplicated TLB entries in each core).
It's hard (almost impossible?) to use 1 GiB pages transparently; I'm not sure how the hypervisor and the VM are going to use those pages. I'd say you need specific support for that, but I'm not sure.
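For example, with QEMU one common approach is to back guest RAM with a hugetlbfs mount. This is only a sketch (the mount point is made up, and the exact options depend on the QEMU version and how guest memory is defined):
# host: mount a 1 GiB hugetlbfs and launch the guest with its memory backed by it
mount -t hugetlbfs -o pagesize=1G none /mnt/huge1G
qemu-system-x86_64 -m 2048 -mem-prealloc -mem-path /mnt/huge1G ...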
As Peter Cordes noted, 1 GiB pages use a single-level TLB (though in Skylake there is apparently also a second-level TLB with 16 entries for 1 GiB pages).
A miss in the 1 GiB TLB will result in a page walk, so it's very important that all the software involved uses page-aware code.
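If you want to confirm that 1 GiB dTLB misses (and the resulting page walks) are actually what is hurting you, perf can count them. This is a sketch; the exact event names vary by microarchitecture and kernel version, so check perf list first:
# find the TLB-related events your CPU exposes
perf list | grep -i tlb
# count dTLB misses that trigger a page walk, system-wide, for 10 seconds
perf stat -a -e dtlb_load_misses.miss_causes_a_walk,dtlb_store_misses.miss_causes_a_walk sleep 10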
edited Nov 8 at 22:21
BeeOnRope
answered Nov 8 at 9:57
Margaret Bloom
Worth mentioning that at least according to 7-cpu.com/cpu/Haswell.html, the 2nd level TLB victim cache doesn't hold 1G TLB entries in Haswell, so if you have misses they have to come from the page-walker. But Skylake has a 16-entry 2nd-level TLB for 1G pages to back up the 4-entry 1st level TLB. 7-cpu.com/cpu/Skylake.html.
– Peter Cordes
Nov 8 at 11:08
Thanks @PeterCordes, that's nice to know and have in the answer.
– Margaret Bloom
Nov 8 at 11:19