The Teraflops Research Chip (also called Polaris) is an 80-core research processor, a manycore (multicore) design, developed by Intel Corporation's Tera-Scale Computing Research Program. The processor was officially announced on February 11, 2007, and shown working at the 2007 International Solid-State Circuits Conference. Its features include dual floating-point engines, sleeping-core technology, self-correction, fixed-function cores, and three-dimensional memory stacking. The chip was built to explore the possibilities of tera-scale architecture (designing processors with more than four cores) and to experiment with various forms of networking and communication within the next generation of processors.
The processor consists of 80 individual cores on a single chip. These cores are much simpler in design than the cores used in the mainstream multicore processors of the time. They are built from the same components and design ideas that went into contemporary processors, but those components are arranged in a way that allows more than four cores to work together on one chip, which is what defines the tera-scale approach to processor architecture.
Each core on the Teraflops Research Chip contains two floating-point engines.
The tera-scale technology that allows so many cores to be integrated on one chip also allows better load distribution and a lower risk of overheating. An overloaded core produces more heat, which lowers its efficiency and wastes energy. On the Teraflops Research Chip, work from overloaded cores can be delegated to other cores, spreading the load so that less heat is produced. The processor also introduces the notion of sleeping cores: to improve power efficiency and the ratio between useful computation and power consumption, cores that are not in use or not needed are put to sleep. A sleeping core is neither powered nor operational, except to carry out its communication duties.
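The load-delegation and sleeping-core ideas can be illustrated with a toy scheduler. This is a hypothetical policy, not Intel's: `Core`, `assign` and `sleep_idle` are illustrative names, and loads are expressed as fractions of one core's capacity. New work goes to the least-loaded awake core that has room, a sleeping core is woken only when no awake core can take the task, and cores left with no work are put back to sleep.

```python
from dataclasses import dataclass

# Hypothetical scheduling policy for illustration only; not the chip's actual mechanism.


@dataclass
class Core:
    core_id: int
    load: float = 0.0    # fraction of the core's capacity in use (0.0 to 1.0)
    asleep: bool = True  # a sleeping core only keeps its communication duties


def assign(cores, task_load, capacity=1.0):
    """Place a task on the least-loaded awake core with room; wake a sleeping core only if none has room."""
    if task_load > capacity:
        raise ValueError("task is larger than a single core's capacity")
    awake = [c for c in cores if not c.asleep and c.load + task_load <= capacity]
    if awake:
        target = min(awake, key=lambda c: c.load)  # spread work to avoid overloading one core
    else:
        sleeping = [c for c in cores if c.asleep]
        if not sleeping:
            raise RuntimeError("every core is awake and full")
        target = sleeping[0]
        target.asleep = False  # wake a core instead of overloading an awake one
    target.load += task_load
    return target.core_id


def sleep_idle(cores):
    """Put every awake core with no work back to sleep and return how many were put to sleep."""
    idle = [c for c in cores if not c.asleep and c.load == 0.0]
    for c in idle:
        c.asleep = True
    return len(idle)
```

Starting from 80 sleeping cores, two tasks with a load of 0.6 each wake cores 0 and 1, because neither fits alongside the other on one core; a third task with a load of 0.3 then goes to core 0, the first of the two equally loaded awake cores.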
Along with its 80 cores, the chip contains 80 routers. Each core has a dedicated router that is responsible for communication between that core and all other cores and components of the processor. The router has five ports: one to each of the four neighboring cores and one to the DRAM (the processor's local memory). The cores are laid out in a grid of 10 rows of 8 cores each. Each row is called a node, and the cores within a node can communicate directly with one another. Communication between nodes and with other processor components is directed through the routing system. The on-die interconnect fabric that the cores use to communicate with each other was still being researched when the chip was announced. One option under consideration was a ring topology, in which ring networks of various sizes are nested within each other to connect the cores. A more flexible and more likely solution was a mesh topology, in which the cores are connected in a grid.
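To make the grid layout concrete, here is a minimal sketch of how a packet could travel between two cores on the 8 × 10 grid. It assumes dimension-order (X-then-Y) routing, row-by-row core numbering, and the port names N, S, E, W and LOCAL; none of these details come from the description above, and the chip's actual routing scheme may differ. Each hop uses one of the four neighbor ports, and the final LOCAL entry stands in for the router's fifth, non-neighbor port.

```python
# Hypothetical X-then-Y routing on the 8 x 10 core grid (assumed scheme, not Intel's documented one).
COLS, ROWS = 8, 10  # 10 rows of 8 cores


def core_xy(core_id):
    """Map core number 0-79 to (column, row), numbering row by row (an assumption)."""
    if not 0 <= core_id < COLS * ROWS:
        raise ValueError(f"core_id {core_id} is outside 0..{COLS * ROWS - 1}")
    return core_id % COLS, core_id // COLS


def xy_route(src, dst):
    """List the output port used at each router from src to dst: all X hops first, then Y hops."""
    x, y = core_xy(src)
    dest_x, dest_y = core_xy(dst)
    ports = []
    while x != dest_x:  # travel along the row
        if dest_x > x:
            ports.append("E")
            x += 1
        else:
            ports.append("W")
            x -= 1
    while y != dest_y:  # then travel along the column (row numbers grow southward)
        if dest_y > y:
            ports.append("S")
            y += 1
        else:
            ports.append("N")
            y -= 1
    ports.append("LOCAL")  # the router's fifth, non-neighbor port delivers the packet
    return ports
```

With this numbering, the farthest pair of cores (0 and 79) is 7 + 9 = 16 hops apart, while a packet from core 9 to core 0 takes one step west and one step north.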
The processor supports a self-correction system: if a core stops functioning, its entire workload can be permanently delegated to another core without modifying the software that interacts with the processor.
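One way to picture self-correction is a remapping table between the logical cores that software addresses and the physical cores that do the work. The sketch below is hypothetical: `CoreMap` is an illustrative name, and keeping spare cores in reserve is an assumption, since the text does not say how the replacement core is chosen. Because software only ever uses logical ids, a failed core's work can move without any software change.

```python
class CoreMap:
    """Hypothetical logical-to-physical core table; software only ever names logical cores."""

    def __init__(self, n_logical, n_physical):
        if n_logical > n_physical:
            raise ValueError("need at least as many physical cores as logical cores")
        self.table = {i: i for i in range(n_logical)}     # logical id -> physical id
        # Assumption: physical cores beyond the logical count are held in reserve as spares.
        self.spares = list(range(n_logical, n_physical))
        self.failed = set()

    def physical(self, logical):
        """Translate a logical core id into the physical core currently doing its work."""
        return self.table[logical]

    def mark_failed(self, phys):
        """Retire a physical core; if it was doing work, move that work permanently to a spare.

        Returns the replacement physical core, or None if the failed core had no work assigned.
        """
        self.failed.add(phys)
        if phys in self.spares:
            self.spares.remove(phys)  # a failed spare can no longer be used as a replacement
            return None
        for logical, current in self.table.items():
            if current == phys:
                if not self.spares:
                    raise RuntimeError("no spare core left to take over the workload")
                replacement = self.spares.pop(0)
                self.table[logical] = replacement
                return replacement
        return None
```

For example, with 78 logical cores on 80 physical ones, a failure of physical core 10 moves logical core 10 onto physical core 78, and later calls to `physical(10)` return 78.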
With tera-scale technology, processors such as the Teraflops Research Chip can dedicate cores to specific functions. How many cores are dedicated, and to which functions, depends on how the processor is used. Candidate functions include graphics, networking, and security, among others.
In the demonstration processor, an SRAM chip ("Freya") was stacked directly underneath the cores ("Polaris"). This vertical connection was relatively new compared with older approaches that place the memory next to the CPU or embed it within the die. The distance between DRAM main memory and the CPU is one of the obstacles to making full use of a processor's capabilities. Shortening this distance reduces signal delay and power consumption, but it also brings the DRAM closer to the CPU's heat (the maximum temperature can easily exceed 120 °C), which increases the risk of data loss. However, later research by Google found that memory errors are driven less by heat than by system utilization, which itself produces more heat because of the inefficient design of CPUs and other ICs. A cnet.com article on these findings cites a paper by Bianca Schroeder (University of Toronto), Eduardo Pinheiro (Google Inc.), and Wolf-Dietrich Weber (Google Inc.).
A research chip of this kind lets companies explore the possibilities of tera-scale computing: instead of working only in theory, engineers and researchers can experiment with the chip itself. The chip is also a wake-up call to other parts of the computer industry to advance their products to match this new level of computing power. Exploring the 80-core processor helps make concepts such as the Larrabee processor, which combines CPU and GPU capabilities on one chip, better understood and more practical. With the computing power of the Teraflops Research Chip, technologies such as graphics virtualization and visual recognition become much more practical.
As is often the case, a new processor architecture raises the problem of software development. Software tends to lag behind hardware, especially for multi-core chips. Intel aimed to address this by creating Ct, a new programming language intended especially for the 80-core processor. Intel also created a software development kit for visual recognition and multithreaded programming. In addition, Intel and Microsoft jointly donated $20 million toward preparing a new generation of programmers for this kind of hardware.
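Operating points reported for the chip:

| Frequency | Voltage | Power | Bandwidth | Performance |
|---|---|---|---|---|
| 3.16 GHz | 0.95 V | 62 W | 1.62 Terabits/s | 1.01 Teraflops |
| 5.1 GHz | 1.2 V | 175 W | 2.61 Terabits/s | 1.63 Teraflops |
| 5.7 GHz | 1.35 V | 265 W | 2.92 Terabits/s | 1.81 Teraflops |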
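The performance figures follow from the core count and clock frequency if one assumes, for illustration, that each of a core's two floating-point engines completes one multiply-accumulate (two floating-point operations) per cycle; that per-cycle figure is an assumption, not something stated above. Under it, the chip performs 80 × 2 × 2 = 320 operations per cycle. The sketch below compares the resulting estimate with the listed performance values and derives energy efficiency from the power figures.

```python
# Peak-throughput estimate for the operating points listed above.
CORES = 80
FP_ENGINES_PER_CORE = 2
# Assumption (not stated in the text): each engine finishes one
# multiply-accumulate, i.e. two floating-point operations, per cycle.
FLOPS_PER_ENGINE_PER_CYCLE = 2

# (frequency in GHz, power in W, listed performance in TFLOPS)
OPERATING_POINTS = [(3.16, 62, 1.01), (5.1, 175, 1.63), (5.7, 265, 1.81)]


def peak_tflops(freq_ghz):
    """Peak TFLOPS = operations per cycle * cycles per second."""
    flops_per_cycle = CORES * FP_ENGINES_PER_CORE * FLOPS_PER_ENGINE_PER_CYCLE  # 320
    return flops_per_cycle * freq_ghz / 1000.0  # GHz * ops/cycle gives GFLOPS; /1000 gives TFLOPS


if __name__ == "__main__":
    for freq, watts, listed in OPERATING_POINTS:
        est = peak_tflops(freq)
        gflops_per_watt = est * 1000.0 / watts
        print(f"{freq} GHz: estimate {est:.2f} TFLOPS (listed {listed}), {gflops_per_watt:.1f} GFLOPS/W")
```

The first two operating points come out at 1.01 and 1.63 TFLOPS, matching the listed values; the 5.7 GHz point gives 1.82 rather than 1.81, presumably because the listed frequency is itself rounded. The efficiency figures show the cost of the higher clock rates: roughly 16 GFLOPS per watt at 3.16 GHz, versus under 7 at 5.7 GHz.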
The processor is built on a 65 nm CMOS process. The die measures 12.64 mm by 21.72 mm (274.5 mm²) and contains 100 million transistors. The package is a 1248-pin LGA, of which 343 pins carry signals.
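As a quick arithmetic check of these figures, the sketch below recomputes the die area and two derived averages. Both averages are illustrative: the density is spread over the whole die, and the per-core figure divides the total transistor count by 80 even though routers, clocking, and I/O are also included in that total.

```python
# Die figures quoted above.
DIE_W_MM, DIE_H_MM = 12.64, 21.72
TRANSISTORS = 100e6
CORES = 80

area_mm2 = DIE_W_MM * DIE_H_MM              # about 274.5 mm^2, matching the quoted area
density_per_mm2 = TRANSISTORS / area_mm2    # average over the whole die, not just the cores
transistors_per_core = TRANSISTORS / CORES  # rough average; non-core logic is counted too

print(f"area {area_mm2:.1f} mm^2, {density_per_mm2:,.0f} transistors/mm^2, "
      f"{transistors_per_core:,.0f} transistors per core on average")
```

The result is about 364,000 transistors per square millimeter and 1.25 million transistors per core on average.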