Improving Virtual Memory Performance in Virtualized Environments

Improving Virtual Memory Performance in Virtualized Environments
Author: Yashwant Marathe
Publisher:
Total Pages: 250
Release: 2018
Genre:
ISBN:


Download Improving Virtual Memory Performance in Virtualized Environments Book in PDF, Epub and Kindle

Virtual Memory is a major system performance bottleneck in virtualized environments. In addition to expensive address translations, frequent virtual machine context switches are common in virtualized environments, resulting in increased TLB miss rates, subsequent expensive page walks and data cache contention due to incoming page table entries evicting useful data. Orthogonally, translation coherence, which is currently an expensive operation implemented in software, can consume up to 50% of the runtime of an application executing on the guest. To improve the performance of virtual memory in virtualized environments, two solutions have been proposed in this thesis - namely, (1) Context Switch Aware Large TLB (CSALT), an architecture which addresses the problem of increased TLB miss rates and their adverse impact on data caches. CSALT copes with the increased demand of context switches by storing a large number TLB entries. It mitigates data cache contention by employing a novel TLB-aware cache partitioning scheme. On 8-core systems that switch between two virtual machine contexts executing multi-threaded workloads, CSALT achieves an average performance improvement of 85% over a baseline with conventional L1-L2 TLBs and 25% over a baseline which has a large L3 TLB (2) Translation Coherence using Addressable TLBs (TCAT), a hardware translation coherence scheme which eliminates almost all of the overheads associated with address translation coherence. TCAT overlays translation coherence atop cache coherence to accurately identify slave cores. It then leverages the addressable Part-Of-Memory TLB (POM-TLB) to eliminate expensive Inter Processor Interrupts (IPI) and achieve precise invalidations on the slave core. On 8-core systems with one virtual machine context executing multi-threaded workloads, TCAT achieves an average performance improvement of 13% over the kvmtlb baseline