The Tianhe-1A Supercomputer, located at National Supercomputer Center, Tianjin (NVIDIA)

November 12, 2010 (TSR) – China stunned the world on October 28, 2010, by unveiling the world’s fastest supercomputer, the Tianhe-1A. Built from 7,168 NVIDIA Tesla GPUs and 14,336 CPUs, the Tianhe-1A scores 2.507 petaflops on the LINPACK benchmark, nearly 50 percent faster than the previous record holder, despite being half the size of a regular supercomputer.

The Tianhe-1A epitomizes modern heterogeneous computing by coupling massively parallel GPUs with multi-core CPUs, which enables significant improvements in performance, size, and power. The system uses 7,168 NVIDIA® Tesla™ M2050 GPUs and 14,336 CPUs. Delivering the same performance with CPUs alone would require more than 50,000 CPUs and as much as twice the floor space.

By comparison, a 2.507 petaflop system built entirely with CPUs would consume more than 12 megawatts. By using GPUs in a heterogeneous computing environment, the Tianhe-1A consumes only 4.04 megawatts, making it about 3 times more power efficient. The difference in power consumption is enough to supply electricity to over 5,000 homes for a year.
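The power figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the article; the function names are illustrative, and because the 12-megawatt figure is stated as "more than", the results should be read as rough lower bounds rather than measurements.

```python
# Figures quoted in the article (all approximate).
CPU_ONLY_MW = 12.0   # estimated draw of an all-CPU 2.507-petaflop system
TIANHE_MW = 4.04     # reported draw of the Tianhe-1A
HOMES = 5000         # homes the article says the saving could supply


def power_ratio(cpu_mw: float, hybrid_mw: float) -> float:
    """How many times more power the CPU-only design would draw."""
    return cpu_mw / hybrid_mw


def saving_mw(cpu_mw: float, hybrid_mw: float) -> float:
    """Absolute difference in power draw, in megawatts."""
    return cpu_mw - hybrid_mw


def implied_kw_per_home(saved_mw: float, homes: int) -> float:
    """Average continuous draw per home implied by the article's claim."""
    return saved_mw * 1000.0 / homes


ratio = power_ratio(CPU_ONLY_MW, TIANHE_MW)
saved = saving_mw(CPU_ONLY_MW, TIANHE_MW)
per_home = implied_kw_per_home(saved, HOMES)

print(f"CPU-only / Tianhe-1A power: {ratio:.2f}x")
print(f"Power saved: {saved:.2f} MW")
print(f"Implied average draw per home: {per_home:.2f} kW")
```

The ratio comes out just under 3, which matches the "3 times more power efficient" claim, and the saving of roughly 8 megawatts spread over 5,000 homes implies an average draw of about 1.6 kilowatts per home.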

Tianhe-1A was designed by the National University of Defense Technology (NUDT) in China. The system is housed at the National Supercomputer Center in Tianjin and is already fully operational.

China’s supercomputer performs 2.5 × 10^15 mathematical operations per second and runs a Linux-based operating system (OS). Russia is investing millions to create its own Linux-based OS which, like India’s, is intended to protect its citizens’ national digital security. The machine harnesses 7,168 Tesla M2050 GPUs along with 14,336 CPUs; the National University of Defense Technology, its creator, calculates that matching it would normally require over 50,000 regular CPUs and double the floor space. It is also a more power-efficient configuration, consuming “only” 4.04 megawatts, about a third of what a CPU-only machine would draw. “Peak performance” is the key phrase to remember here, and the machine’s 4.7-petaflop theoretical peak means a lot.

The United States for decades has developed most of the underlying technology that goes into the massive supercomputers and has built the largest, fastest machines at research laboratories and universities. Some of the top systems simulate the effects of nuclear weapons, while others predict the weather and aid in energy research.

The United States first lost its crown as supercomputing kingpin, in stunning fashion, in 2002, when Japan unveiled a machine with more computing power than the top 20 American computers combined. The United States government responded by pouring money into supercomputing projects and forming groups to plot a comeback. The United States regained its leadership status in 2004 and has kept it until now.

The race to build the fastest supercomputer has become a source of national pride as these machines are valued for their ability to solve problems critical to national interests in areas like defense, energy, finance and science. Supercomputing technology also finds its way into mainstream business; oil and gas companies use it to find reservoirs and Wall Street traders use it for superquick automated trades. Procter & Gamble even uses supercomputers to make sure that Pringles go into cans without breaking.

Over the last decade, China has steadily inched up the supercomputer rankings. Tianhe-1A stands as the culmination of billions of dollars in investment and scientific development, and China has gone from a computing afterthought to a world technology superpower.

Supercomputers are built by combining thousands of small computer servers and by using software to turn them into a single entity. By doing so, any organization with enough money can buy expertise  and what amounts to off-the-shelf components to create a super-fast machine.

The Tianhe-1A system follows that model by linking thousands upon thousands of processors made by the American companies Intel and Nvidia. But the “secret sauce” behind the system, and its most significant technological achievement, is the interconnect, or networking technology. Developed by Chinese researchers, this interconnect shuttles data back and forth across the smaller computers at breakneck speed. It can handle data at about twice the speed of InfiniBand, a common interconnect used in many supercomputers.

Christopher Mims has written a two-part blog post at MIT’s Technology Review about the historic announcement of China’s new supercomputer, which he says is only “technically the world’s fastest.”

In the first part of his blog, Mims makes the case that this has to do with the way the machine’s performance is measured: the Linpack benchmark, the test used to officially rank the world’s fastest supercomputers. It measures a computer’s ability to perform calculations in short bursts. But in the real world of scientific computing, sustained performance is considered a more meaningful measure, as…

“the Tianhe 1A comes on strong, but American supercomputers can last all night or sometimes many days, depending on the scale of the problem they’re tackling. A distinction in peak processing power is not a predictor of sustained performance, and, according to Mims, the NVIDIA GPUs in the Tianhe-1A are not so great at the latter. With GPU-based systems, there’s a memory bottleneck that leaves the GPUs sitting idle much of the time.”

Mims also questions whether engineers working on Tianhe-1A will be able to create scientific software that can take advantage of the machine’s peak performance by rarely accessing memory; parallelizing the code to work with GPUs is perhaps an even greater challenge. That approach has stymied programmers in the West.

Of course, the USA is not going to allow itself to play second fiddle. The second part of Mims’s blog says that the US is already developing a system that is on track to become the world’s fastest supercomputer in 2012. The United States has plans in place to make much faster machines out of proprietary components, and to advance the software used by these systems so that they become increasingly easy for researchers to use. But those computers remain years away. For now, China is king. Let’s give them props and be gracious, everyone.

New research from the University of Warwick, to be presented next week at the world’s largest supercomputing conference, pits China’s new No. 1 supercomputer against alternative U.S. designs. The work provides crucial new analysis that will benefit the battle plans of both sides in an escalating war between two competing technologies.

Next week Professor Stephen Jarvis, Royal Society Industry Fellow at the University of Warwick’s Department of Computer Science, will tell the 15,000 delegates in New Orleans how the general-purpose GPU (GPGPU) designs used in China’s 2.5 Petaflops Tianhe-1A actually fare against the alternative supercomputing designs employed in the U.S., which use relatively simpler processing cores brought together in parallel by highly effective and scalable interconnects, as seen in the IBM BlueGene architectures.

“The ‘Should I buy GPGPUs or BlueGene?’ debate ticks all the boxes for a good fight,” Jarvis states. “No one is quite sure of the design that is going to get us to Exascale computing, the next milestone in 21st-century computing (one quintillion, or 10^18, floating-point operations per second). It’s not simply an architectural decision either. You could run a small town on the power required to run one of these supercomputers, and even if you plump for a design and power the thing up, programming it is currently impossible.”

Specification of the CUDA-capable GPUs used.

Jarvis’ research uses mathematical models, benchmarking and simulation to determine the likely performance of these future computing designs at scale:

“At Supercomputing in New Orleans we directly compare GPGPU designs with that of the BlueGene. If you are investing billions of Dollars or Yuan in supercomputing programs, then it is worth standing back and calculating what designs might realistically get you to Exascale, and once you have that design, mitigating for the known risks, i.e., power, resilience and programmability.”

His paper uses mathematical modeling to highlight some of the biggest challenges in the supercomputing war. The first of these is a massive programming gap: even the best computer programmers are struggling to use even a small fraction of the computing power that the latest supercomputing designs offer, and this will continue to be a problem without significant innovation.

Execution times across different workstation configurations (CPU and GPU).

“If your application fits, then GPGPU solutions will outgun ‘BlueGene’ designs on peak performance,”  but Jarvis also illustrates potential pitfalls in this approach. “The Tianhe-1A has a theoretical peak performance of 4.7 Petaflops, yet our best programming code-based measures can only deliver 2.5 Petaflops of that peak.  That’s a lot of unused computer that you are powering.  Contrast this with the Dawn BlueGene/P at Lawrence Livermore National Laboratory in the U.S.  It’s a small machine at 0.5 Petaflops peak performance, but it delivers 0.415 Petaflops of that peak. In many ways this is not surprising, as our current programming models are designed around CPUs”.
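The gap Jarvis describes is usually expressed as the fraction of theoretical peak that a benchmark actually achieves. As a minimal sketch using only the four figures he quotes (the function name and dictionary layout are illustrative, not taken from his paper):

```python
def linpack_efficiency(achieved_pflops: float, peak_pflops: float) -> float:
    """Fraction of theoretical peak performance actually delivered."""
    if peak_pflops <= 0:
        raise ValueError("peak performance must be positive")
    return achieved_pflops / peak_pflops


# (achieved, theoretical peak) in petaflops, as quoted by Jarvis.
systems = {
    "Tianhe-1A": (2.5, 4.7),
    "Dawn BlueGene/P": (0.415, 0.5),
}

efficiency = {
    name: linpack_efficiency(achieved, peak)
    for name, (achieved, peak) in systems.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.1%} of peak")
```

On these numbers the Tianhe-1A delivers a little over half of its peak, while the much smaller Dawn BlueGene/P delivers 83 percent of its peak.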

“The ‘BlueGene’ design is not without its own problems. In our paper we show that BlueGenes can require many more processing elements than a GPU-based system to do the same work. Many of our scientific algorithms just do not scale to this degree, so unless we invest in this area we are just going to end up with fantastic machines that we cannot use,” Jarvis explains.

Another key problem identified by the University of Warwick research is the fact that in the rush to use excitingly powerful GPGPUs, researchers have not yet put sufficient energy into devising the best technologies to actually link them together in parallel at massive scales.

Jarvis’ modeling found that small GPU-based systems solved problems between 3 and 7 times faster than traditional CPU-based designs. However, he also found that as the number of processing elements linked together increased, the performance of the GPU-based systems improved at a much slower rate than that of the ‘BlueGene’-style machines.
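To see how a large single-node advantage can shrink at scale, consider a toy strong-scaling model. This is not the model from Jarvis’ paper: the workload size, the communication costs and the logarithmic communication term are all hypothetical values chosen for illustration, and only the 5x single-node speedup is picked from within the 3-to-7 range reported above.

```python
import math

WORK = 1000.0  # hypothetical total work, in arbitrary time units on one CPU node


def run_time(nodes: int, per_node_speed: float, comm_cost: float) -> float:
    """Toy model: compute time shrinks with node count, while
    communication time grows with log2(nodes)."""
    compute = WORK / (nodes * per_node_speed)
    communicate = comm_cost * math.log2(nodes)
    return compute + communicate


def gpu_advantage(nodes: int) -> float:
    """Ratio of CPU-cluster time to GPU-cluster time at a given node count."""
    # Assumed parameters: GPU nodes are 5x faster but pay twice the
    # communication cost per step (both values are hypothetical).
    cpu = run_time(nodes, per_node_speed=1.0, comm_cost=0.05)
    gpu = run_time(nodes, per_node_speed=5.0, comm_cost=0.10)
    return cpu / gpu


for n in (1, 16, 256, 1024):
    print(f"{n:5d} nodes: GPU cluster is {gpu_advantage(n):.2f}x faster")
```

Under these assumed parameters the GPU cluster’s lead falls steadily as nodes are added, because communication time, which the GPUs do nothing to speed up, takes a growing share of each run.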

Execution times for large scale GPU and CPU clusters (* indicates a model-based prediction).

“Given the crossroads at which supercomputing stands, and the national pride at stake in achieving Exascale, this design battle will continue to be hotly contested. It will also need the best modelling techniques that the community can provide to discern good design from bad,” Jarvis concludes.

Mims explains that because so much time goes into the development process of the highest-end supercomputers, with long planning, design and implementation stages, generally the experts in the field can predict with some degree of certainty which systems will be game changers and to some extent, how they will measure up to one another. This is why Mims says that “it’s possible to predict with some confidence the world’s fastest supercomputers. Even, perhaps, the single fastest supercomputer, in the year 2012.”

According to Jack Dongarra, the keeper of the TOP500 list of the world’s fastest systems, there are five such systems in line to topple Tianhe-1A’s standing.

The University of Warwick paper by S.J. Pennycook, S.D. Hammond and S.A. Jarvis, to be presented on the 15th of November, is entitled “Performance Analysis of a Hybrid MPI/CUDA Implementation of the NAS-LU Benchmark.” Here’s a PDF of the paper and conference schedule in New Orleans if you wish to attend.


Lady Michelle-Jennifer Santos is the Chief Visionary Founder & Owner of The Santos Republic.





