Lifetime-Aware Task Mapping for Embedded Chip Multiprocessors

2018-12-12T21:24:14Z (GMT) by Adam Hartman
The International Technology Roadmap for Semiconductors has identified reliability as a growing challenge for users and designers of all types of integrated circuits. In particular, the occurrence of wearout faults is expected to increase exponentially as manufacturing processes scale below 65 nm. By acknowledging the importance of these faults and the resulting failures, designers can take steps to improve the expected lifetime of the system. Several system-level techniques, such as communication architecture design and slack allocation, are capable of mitigating the effects of wearout faults and improving system lifetime. Task mapping optimization is another system-level technique that can be applied at both design time and runtime to enhance system lifetime and has several advantages over other lifetime optimization techniques.<br>The first advantage that task mapping has over other system-level techniques is that it is more flexible. We show that task mapping can positively impact system lifetime in a number of scenarios and does not rely on redundancy or complex reconfiguration mechanisms, although both of those provide additional benefit. The second advantage of using task mapping to improve system lifetime is a lower cost compared to other techniques. Other lifetime improvement techniques seek to augment systems in a cost-effective way to mitigate the effects of wearout faults, while task mapping does not necessarily require additional investment in hardware to achieve similar effects. The final, and perhaps most significant, advantage of task mapping is its ability to dynamically manage lifetime as the system is running.
While decisions made by other system- and circuit-level techniques must be finalized before the system is manufactured, the task mapping can continue to change to account for the actual state of the system in real time.<br>We propose two distinct task mapping techniques to be used at the two different times during which optimization can occur. At design time, we take advantage of abundant computational resources to perform an intelligent search of the initial task mapping solution space using ant colony optimization. At runtime, we leverage information from hardware sensors to quickly select good task mappings using a meta-heuristic. Our two techniques can be used together or in isolation depending on the use case and design requirements of the system.<br>This thesis makes the following intellectual contributions:<br>• Lifetime-aware design-time task mapping - Ours is the first approach to search for initial task mappings that directly optimize system lifetime rather than optimizing other metrics which only influence system lifetime, like temperature and power. Because this technique is meant for use at design time, we employ a powerful search algorithm called ant colony optimization, which takes advantage of a designer’s computational resources to find a near-optimal task mapping. Our lifetime-aware design-time task mapping improves system lifetime by an average of 32.3% compared to a lifetime-agnostic approach across a range of real-world benchmarks.<br>• Lifetime-aware runtime task mapping - Ours is the first approach to dynamically manage the lifetime of embedded chip multiprocessors at runtime through the use of task mapping. By leveraging data from hardware sensors and information about the system state, our meta-heuristic approach is able to find high-quality task mappings which extend system lifetime without performing a costly search of the solution space.
Our lifetime-aware runtime task mapping improves system lifetime by an average of 7.1% compared to a runtime temperature-aware task mapping approach, and by 17.4% in the best case. Our approach also extends the time until the first component failure by 14.6% on average and 33.9% in the best case.<br>• Evaluation of lifetime-aware task mapping - We measure the improvement in system lifetime resulting from our task mapping techniques across a range of benchmarks. We also compare our lifetime-aware techniques to others which attempt to indirectly optimize system lifetime to show that direct optimization is the only way to achieve maximum lifetime. For example, we show that task mappings that are near-optimal in terms of average initial component temperature can result in a range of system lifetimes that is up to 53.2% of the optimal lifetime; clearly, low temperature does not imply long lifetime.<br>• Co-optimization of competing lifetime metrics - The wide range of use cases for embedded chip multiprocessors means that different systems will have different design goals. We consider how the pertinent measure of lifetime changes in different use cases, and analyze the degree to which these competing lifetime metrics can be co-optimized.<br>• Best practices for a system lifetime simulator - We created a simulator which estimates the lifetime of an embedded chip multiprocessor executing one or more applications. The simulator is detailed enough to capture the effects of various system-level design techniques on lifetime, and thus, it is valuable to the field of lifetime optimization research even outside the context of task mapping.<br>In summary, lifetime optimization for embedded chip multiprocessors is required so that cutting-edge manufacturing processes can continue to be used for a wide range of systems.
Our research mitigates the problem of increasingly common wearout faults by proposing and evaluating a pair of design-time and runtime task mapping techniques that enhance system lifetime across a broad range of use cases.<br>
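To make the design-time idea above concrete, here is a minimal sketch of ant colony optimization applied to task mapping. It is not the thesis's implementation. The names `lifetime_proxy` and `aco_task_mapping`, the per-task load values, and all parameters are assumptions made for illustration. In particular, the scoring function is a toy stand-in (the most heavily loaded processing element is assumed to wear out first); the thesis instead evaluates mappings with a detailed system lifetime simulator.

```python
import random


def lifetime_proxy(mapping, task_load, n_pes):
    """Toy stand-in for a lifetime simulator (an assumption, not the thesis model).

    The score falls as the most heavily loaded processing element (PE) gets busier.
    """
    load = [0.0] * n_pes
    for task, pe in enumerate(mapping):
        load[pe] += task_load[task]
    return 1.0 / (1.0 + max(load))


def aco_task_mapping(task_load, n_pes, n_ants=20, n_iters=50, rho=0.1, seed=0):
    """Search for a task-to-PE mapping that maximizes lifetime_proxy."""
    rng = random.Random(seed)
    n_tasks = len(task_load)
    # tau[t][p]: pheromone expressing how promising it is to place task t on PE p.
    tau = [[1.0] * n_pes for _ in range(n_tasks)]
    best_map, best_score = None, float("-inf")

    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant builds a full mapping, sampling a PE per task by pheromone.
            mapping = [rng.choices(range(n_pes), weights=tau[t])[0]
                       for t in range(n_tasks)]
            score = lifetime_proxy(mapping, task_load, n_pes)
            if score > best_score:
                best_map, best_score = mapping, score
        # Evaporate all trails, then reinforce the best mapping found so far.
        for t in range(n_tasks):
            for p in range(n_pes):
                tau[t][p] *= 1.0 - rho
            tau[t][best_map[t]] += best_score

    return best_map, best_score


if __name__ == "__main__":
    loads = [0.4, 0.3, 0.2, 0.1]  # hypothetical per-task utilizations
    mapping, score = aco_task_mapping(loads, n_pes=2)
    print("mapping:", mapping, "score:", round(score, 3))
```

Swapping `lifetime_proxy` for a call into a lifetime simulator is what would turn this from a load balancer into a search that optimizes lifetime directly, which is the distinction the abstract draws against temperature- or power-driven mapping.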