High-Radix Scalable Modular Crossbar Switches

2018-12-14T17:53:01Z (GMT) by Cagla Cakir
As process technologies have scaled, the increasing number of processor cores and memories on a single die has also driven the need for more complex on-chip interconnection networks. Crossbar switches are primary building blocks in such networks-on-chip, as they can be used as fast single-stage networks or as the core of the router switch in multi-stage networks. While crossbars offer non-blocking, single-hop, all-to-all communication, they tend to scale poorly with the number of nodes due to the latency and energy of the long wires and high-radix multiplexor structures needed. In this work, we investigate how to improve crossbar performance, energy-efficiency, and scalability.

To better understand the design space and scaling limitations, we have developed an on-chip switch modeling tool calibrated using circuit-level simulations. The tool enables a design space exploration showing how area, power, and performance vary across radix, data width, wire parameters, and circuit implementation. In addition to conventional design options, we examined capacitively coupled low-swing signaling to improve the energy consumption of the I/O wires. This exploration shows that the main bottlenecks are the long I/O wires and that the key to improving performance and efficiency is to minimize the area. Using these insights, we present modular crossbar switches that can perform better at high radices than the monolithic designs. The modular sub-blocks are arranged in a controlled flow-through, pipelined scheme to eliminate global connections and maintain linear performance scaling and high throughput. Modularity also enables energy savings via deactivation of unused I/O wires.

To evaluate our design, we implemented a prototype radix-64 modular crossbar switch testchip in a 40nm bulk CMOS process. The testchip operates at 2.38GHz at 1V nominal supply voltage and consumes 1.2W power. It offers 2.2X better throughput and 2.4X better energy-efficiency than published state-of-the-art designs. We further evaluated modular crossbar networks with the proposed crossbar switches using BookSim2, a network-on-chip evaluation tool. The proposed design achieves more than 90% saturation throughput with an internal speedup of 1.5, supports high data line rates, and offers lower average network latency compared to conventional crossbars. Evaluation results show that modular crossbars are scalable to high radices while still offering high performance, energy-efficiency, and one-hop simplicity.
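
The observation that long I/O wires and area dominate can be illustrated with a first-order geometric sketch. This is not the calibrated modeling tool described above. It only assumes a generic matrix-style monolithic crossbar in which every port contributes a bus of `data_width_bits` wires at a fixed `wire_pitch_um`. The function name, parameter names, and the numeric values (64-bit data width, 0.2 um pitch) are hypothetical and chosen purely for illustration.

```python
def monolithic_crossbar_geometry(radix, data_width_bits, wire_pitch_um):
    """First-order geometry of a matrix-style crossbar (illustrative assumption).

    Returns (side_um, area_mm2): the approximate length each I/O wire must
    span, and the resulting square footprint.
    """
    # Each of the `radix` ports adds a bus of `data_width_bits` parallel wires,
    # so one side of the switch grows linearly with the radix.
    side_um = radix * data_width_bits * wire_pitch_um
    # Input buses run along one axis and output buses along the other,
    # so the footprint grows with the square of the radix.
    area_mm2 = (side_um / 1000.0) ** 2
    return side_um, area_mm2


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the thesis.
    for radix in (16, 32, 64):
        side, area = monolithic_crossbar_geometry(
            radix, data_width_bits=64, wire_pitch_um=0.2
        )
        print(f"radix={radix:3d}  wire span ~{side:8.1f} um  area ~{area:6.3f} mm^2")
```

Under these assumptions, doubling the radix doubles the span of every I/O wire and quadruples the footprint, which is consistent with the motivation for splitting a large switch into smaller sub-blocks without global connections.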