Improving the Performance and Energy-efficiency of Virtual Memory

Improving the Performance and Energy-efficiency of Virtual Memory
Author: Vasileios Karakostas
Publisher:
Total Pages: 173
Release: 2016
Genre:
ISBN:


Download Improving the Performance and Energy-efficiency of Virtual Memory Book in PDF, Epub and Kindle

Virtual memory improves programmer productivity, enhances process security, and increases memory utilization. However, virtual memory requires an address translation from the virtual to the physical address space on every memory operation. Page-based implementations of virtual memory divide physical memory into fixed size pages, and use a per-process page table to map virtual pages to physical pages. The hardware key component for accelerating address translation is the Translation Lookaside Buffer (TLB), that holds recently used mappings from the virtual to the physical address space. However, address translation still incurs high (i) performance overheads due to costly page table walks after TLB misses, and (ii) energy overheads due to frequent TLB lookups on every memory operation. This thesis quantifies these overheads and proposes techniques to mitigate them. In this thesis we argue that fixed size page-based approaches for address translation exhibit limited potential for improving TLB performance because they increase the TLB reach by a fixed amount. To overcome the limitations of such approaches, we introduce the concept of range translations and we show how they can significantly improve the performance and energy-efficiency of address translation. We first comprehensively quantify the address translation performance overhead on a collection of emerging scale-out applications. We show that address translation accounts for up to 16% of the total execution time. We find that huge pages may improve the application performance by reducing the time spent in page walks, enabling better exploitation of the available execution resources. However, the limited hardware support for huge pages in combination with the workloads' low memory locality leave ample space for performance optimizations. To reduce the performance overheads of address translation, we propose Redundant Memory Mappings (R10). R10 provides an efficient alternative representation of many virtual-to-physical mappings. We define a range translation be a subset of a process's pages that are virtually and physically contiguous. R10 translates each range translation with a single range table entry, enabling a modest number of entries to translate most of the process's address space. R10 operates in parallel with standard paging and introduces a software range table and a hardware range TLB with arbitrarily large reach that is accessed in parallel with the regular L2-page TLB. We modify the operating system to automatically detect ranges and to increase their likelihood with eager paging. R10 is thus transparent to applications. We prototype R10 software in Linux and emulate the hardware. R10 reduces the overhead of virtual memory to less than 1% on average on a wide range of workloads. To reduce the energy cost of address translation, we propose the Lite mechanism and the TLB-Lite and R10-Lite designs. Lite monitors the performance and utility of L1 TLBs, and adaptively changes their sizes with way-disabling. The resulting TLB-Lite design targets commodity processors with TLB support for huge pages and opportunistically reduces the dynamic energy spent in address translation with minimal impact on TLB miss cycles. To further provide more energy-efficient address translation, we propose R10-Lite that adds to R10 an L1-range TLB, that is accessed in parallel with the regular L1-page TLB, and the Lite mechanism. The high hit ratio of the L1-range TLB allows Lite to downsize the L1-page TLBs more aggressively. R10-Lite reduces the dynamic energy spent in address translation by 71% on average. Above the near-zero L2 TLB misses from R10, R10-Lite further reduces the overhead from L1 TLB misses by 99\%. The proposed designs target current and future high-performance and energy-efficient memory systems to meet the ever increasing memory demands of applications.