Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity

2018-12-14T17:55:39Z (GMT) by Donghyuk Lee
In modern systems, DRAM-based main memory is significantly slower than the processor. Consequently, processors spend a long time waiting to access data from main memory, making the long main memory access latency one of the most critical bottlenecks to achieving high system performance. Unfortunately, DRAM latency has remained almost constant over the past decade, mainly because DRAM has been optimized for cost-per-bit rather than access latency. As a result, DRAM latency is not reducing with technology scaling, and it continues to be an important performance bottleneck in modern and future systems.

This dissertation seeks to achieve low-latency DRAM-based memory systems at low cost in three major directions. The key idea common to all three directions is to enable and exploit latency heterogeneity in the DRAM architecture.

First, based on the observation that long bitlines in DRAM are one of the dominant sources of DRAM latency, we propose a new DRAM architecture, Tiered-Latency DRAM (TL-DRAM), which divides each long bitline into two shorter segments using an isolation transistor, allowing one segment to be accessed with reduced latency.

Second, we propose a fine-grained DRAM latency reduction mechanism, Adaptive-Latency DRAM (AL-DRAM), which optimizes DRAM latency for the common operating conditions of each individual DRAM module. We observe that DRAM manufacturers incorporate a very large timing margin as a provision against the worst case: accessing the slowest cell across all DRAM products at the highest supported temperature, even though such a slow cell and such an operating condition are rare. Our mechanism dynamically optimizes DRAM latency to the current operating condition of the accessed DRAM module, thereby reliably improving system performance.
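To make the bitline intuition concrete, here is a minimal Python sketch of the Tiered-Latency DRAM idea. It is not from the dissertation: the latency model and all numbers (`ns_per_cell`, `base_ns`, segment sizes, the isolation-transistor overhead) are hypothetical, chosen only to illustrate the trend that access latency grows with the number of cells loading a bitline, so a short near segment isolated from the rest can be accessed faster.

```python
def bitline_latency(cells_on_bitline, ns_per_cell=0.1, base_ns=10.0):
    """Toy model: access latency grows with the parasitic capacitance of
    the bitline, which is roughly proportional to the number of cells
    attached to it. All constants are hypothetical."""
    return base_ns + ns_per_cell * cells_on_bitline

CELLS_PER_BITLINE = 512   # conventional long bitline
NEAR_SEGMENT_CELLS = 32   # short segment next to the sense amplifiers

# Conventional DRAM: every access sees the full bitline capacitance.
conventional = bitline_latency(CELLS_PER_BITLINE)

# TL-DRAM near segment: isolation transistor turned off, so the sense
# amplifier sees only the short segment's capacitance.
near = bitline_latency(NEAR_SEGMENT_CELLS)

# TL-DRAM far segment: isolation transistor turned on, so the access sees
# the full bitline plus a small (hypothetical) transistor overhead.
far = bitline_latency(CELLS_PER_BITLINE) + 1.0
```

Under this toy model the near segment is accessed several times faster than a conventional bitline, while far-segment accesses pay a small extra cost, which is the latency-heterogeneity trade-off TL-DRAM exploits.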
Third, we observe that cells closer to the peripheral logic can be much faster than cells farther from the peripheral logic (a phenomenon we call architectural variation). Based on this observation, we propose a new technique, Architectural-Variation-Aware DRAM (AVA-DRAM), which reduces DRAM latency at low cost by profiling and identifying only the inherently slower regions in DRAM, to dynamically determine the lowest latency at which DRAM can operate without causing failures.

This dissertation provides a detailed analysis of DRAM latency using both circuit-level simulation with a detailed DRAM model and FPGA-based profiling of real DRAM modules. Our latency analysis shows that our low-latency DRAM mechanisms enable significant latency reductions, leading to large improvements in both system performance and energy efficiency across a variety of workloads in our evaluated systems, while ensuring reliable DRAM operation.
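The profiling-based idea can be sketched as follows. This is an illustrative Python model, not the dissertation's mechanism: the region names, profiled latencies, threshold, and datasheet default are all hypothetical. The point it shows is that once the few inherently slow regions are identified by profiling, the remaining (majority) regions determine a reduced latency that is safe for most accesses, while only the slow regions keep conservative timing.

```python
# Hypothetical profiling results: the lowest failure-free latency (ns)
# measured for each DRAM region.
profiled_min_latency = {
    "region_0": 11.0,
    "region_1": 10.5,
    "region_2": 17.5,  # inherently slow region, e.g. far from peripheral logic
    "region_3": 11.5,
}

DEFAULT_LATENCY = 18.0  # hypothetical conservative (worst-case) timing


def choose_latency(profile, slow_threshold=15.0):
    """Return (fast_latency, slow_regions): the reduced latency that all
    non-slow regions can sustain, and the set of regions that must keep
    the conservative default timing."""
    slow = {r for r, t in profile.items() if t > slow_threshold}
    fast = max(t for r, t in profile.items() if r not in slow)
    return fast, slow


fast_latency, slow_regions = choose_latency(profiled_min_latency)
```

In this toy example, only one region is flagged as slow, and all other accesses can run at the reduced latency instead of the conservative default.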