Sheer power
By Simon Williams
High performance computing systems are helping to predict the weather, design cars and calculate how much the world is heating. Simon Williams multi-threads.
HardCopy Issue: 43 | Found In: Systems | Published: 01/02/2009 | Last Revision: 07/07/2010
To most people a computer is a PC, or Personal Computer. That’s what they work with on the desk or on the lap, so that’s what the word means to them. They may have some idea that the computers that link up the Internet are a bit different, but unless they work in a data centre, the distinction probably won’t be of much interest.
Those who do work in data centres, academic institutions or research laboratories know there are other types of computer. Among them are High Performance Computers (HPCs), which operate at a much higher level of performance than even a multi-core PC with gigabytes of memory.
These days, an HPC will be a 64-bit machine running many processors in a massively parallel array. The days of a single processor linked to a bank of addressable memory, often described as the ‘von Neumann model’, have been superseded. Instead each computing task is shared among many processors with the results integrated at the end. Such an approach dramatically increases the overall performance of the computer.
HPCs can be put to a wide variety of uses, addressing problems in engineering, mathematics, geology, meteorology and aviation. Where, even a few years ago, such problems would have required time-shared access to a mainframe facility, it’s now often possible to run them on an HPC within your own organisation.
History lessons
There have always been computers designed for specific tasks. You wouldn’t use a personal computer to predict weather cycles, or a Cray supercomputer for a quick game of Luxor II. Back when the IBM PC was launched in 1981, a typical HPC would have been something like the DEC PDP-11.
Cray supercomputers weren't just round to look good in Bond movies: the shape also meant shorter interconnections between processors and other key components.
The PDP-11 machines were generally termed minicomputers and could be used for anything from handling payrolls to controlling machine tools and running complex computations in university departments. However, their days were numbered: the PC started to erode the bottom end of their market, while more exotic supercomputers took over the high end.
Supercomputers differ from mainstream machines in being geared specifically for high levels of calculation. They’re not normally used for transaction processing, so the memory hierarchy is built around moving large volumes of data, and high bandwidth can often take precedence over low latency.
As computation becomes faster, the time taken for signals to travel between processors becomes significant, so the length of each connection needs to be kept to a minimum. It’s not for aesthetic reasons that Cray supercomputers adopted a cylindrical design: with processor boards radiating from a central core, interconnect lengths can be reduced in comparison with more conventional cuboid designs. The Borg missed a trick there.
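A simple way to see why bandwidth can matter more than latency is the common first-order model of transfer time, in which moving n bytes costs a fixed latency α plus n divided by the bandwidth β. The figures used here (α = 100 ns, β = 10 GB/s) are illustrative assumptions, not measurements of any particular machine:

```latex
T(n) = \alpha + \frac{n}{\beta}, \qquad
T(8\ \text{B}) \approx 100\ \text{ns} + 0.8\ \text{ns}, \qquad
T(1\ \text{GB}) \approx 100\ \text{ns} + 0.1\ \text{s}
```

For a single 8-byte value the latency dominates, but for the large arrays typical of scientific calculation almost all the time is spent streaming data, so bandwidth sets the pace.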
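To put rough numbers on it, assume signals travel along wiring at about two-thirds of the speed of light, roughly 2 × 10^8 m/s (a typical figure for cabling, not a published Cray specification). The delay over a connection of length L is then:

```latex
t = \frac{L}{v}, \qquad
t_{1\,\text{m}} = \frac{1\ \text{m}}{2 \times 10^{8}\ \text{m/s}} = 5\ \text{ns}
```

At an illustrative clock rate of 1 GHz, one cycle lasts 1 ns, so a single metre of wire would cost around five clock cycles each way.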
The development of high performance supercomputers gave rise to a number of technologies that we now take for granted in everyday computing. They include liquid cooling of components, parallel file systems and striped disks (the basis of RAID arrays). Indeed, the most popular operating system for supercomputers is now Linux, which took the flag from Unix in 2004.
Supercomputers now tend to be relatively few in number, with individual designs customised for specific tasks. Below this level of performance, however, a much larger market has grown up for HPCs. These still use the massively parallel computing paradigm developed for the supercomputer, but with arrays of lower-cost, off-the-peg processors in conventional rack-mounted installations, usually referred to as computer clusters.
Big players in HPC hardware include IBM, HP and Sun. As well as their academic and research uses, HPCs are increasingly being used for business applications such as data warehousing and transaction processing. Indeed, there’s a move to change the meaning of the initials ‘HPC’ from High Performance Computing to High Productivity Computing.
The HPC in action
HPCs are in use today in many different ways. The University of Cambridge, for example, has one of the largest HPC clusters in the UK. Called Darwin, it is designed to facilitate all kinds of research within the University (Cambridge being the largest producer of research papers in the country). The cluster is built on 585 Dell servers with a total of 2,300 Intel Core processors held in just 18 racks.
Parallel standards
There are two key standards for programming on massively parallel computing systems, namely MPI and OpenMP. The Message Passing Interface (MPI) was developed by the MPI Forum, with William Gropp of the University of Illinois and Ewing Lusk of the Argonne National Laboratory among its leading contributors. As the name suggests, it is designed to handle the passing of messages between processors in a multi-processor array such as you might find in an HPC cluster.
MPI is specified independently of any one language or platform, so as well as the standard’s own C and Fortran bindings there are implementations for .NET, Java and other application platforms. It defines functions, with specific syntax and semantics, for handling communication between processes (which may run on the same processor or on different ones), for defining suitable datatypes and for managing how those processes interrelate.
There are two common levels: MPI 1.2 and MPI 2. MPI 2 adds a lot of extra functions to the earlier standard but many programmers have stuck to those in 1.2 to ensure compatibility on a wide range of systems. More and more implementations now support MPI 2, though, so the compatibility requirement is easing.
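Real MPI programs are usually written in C or Fortran against calls such as MPI_Send and MPI_Recv. As a dependency-free sketch of the same idea, the Python below uses threads and queues to stand in for MPI processes: each hypothetical worker ‘rank’ shares no data with the others and communicates only by receiving and sending messages, while rank 0 scatters the work and gathers the partial results. It illustrates the message-passing pattern, not the MPI API itself, and the function names are invented for the example.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """One hypothetical 'rank': it shares no data, it only receives and sends messages."""
    chunk = inbox.get()                                  # blocking receive (loosely like MPI_Recv)
    outbox.put((rank, sum(x * x for x in chunk)))        # send a reply (loosely like MPI_Send)


def sum_of_squares(data, n_workers=4):
    """Rank 0 scatters slices of data to the workers, then gathers their partial sums."""
    replies = queue.Queue()                              # channel back to rank 0
    inboxes = [queue.Queue() for _ in range(n_workers)]  # one channel per worker
    threads = [
        threading.Thread(target=worker, args=(r + 1, inboxes[r], replies))
        for r in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Scatter: worker r receives every n_workers-th element, starting at index r
    for r in range(n_workers):
        inboxes[r].put(data[r::n_workers])
    # Gather: exactly one reply arrives from each worker
    partials = dict(replies.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return sum(partials.values())


print(sum_of_squares(list(range(10))))  # 285
```

Because the workers never touch shared variables, the same structure would carry over to processes on separate machines, which is the point of the message-passing model.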
OpenMP is an Application Programming Interface (API) aimed at shared-memory parallel processing. It enables multi-threading: a master thread forks a specified number of worker threads, each of which can be allocated to and run on a separate processor according to instructions from the runtime environment. At the end of the parallel section, the threads join back together and their results are combined as appropriate. This fork-join model is the basis of much parallel programming, and OpenMP constructs can easily be built into C, C++ and FORTRAN code.
OpenMP is currently at version 3.0 and is a very useful, if not essential, tool when writing code for parallel execution. Again, support for the standard is wide, providing a portable, platform-independent way of creating applications to run on HPCs.
Darwin is available to all departments within the University, although initially it was more heavily used by the more technically aware, partly because it ran a Linux operating system that some departments found unfamiliar.
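In C, the fork-join pattern just described is typically a single directive such as #pragma omp parallel for reduction(+:sum) placed above a loop. To keep the examples in one language, the sketch below mimics the same shape in Python with a thread pool: fork a chosen number of workers, give each a block of the data, wait for them all, then combine the partial results. Note that CPython threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this shows the structure of the pattern rather than a real performance gain; parallel_sum is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, num_threads=4):
    """Fork-join sum: split the data into blocks, sum each block in a worker, combine."""
    n = len(values)
    # Contiguous, near-equal index ranges, one per worker (like a static loop schedule)
    bounds = [(i * n // num_threads, (i + 1) * n // num_threads) for i in range(num_threads)]
    # Fork: the pool runs one task per block
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(lambda b: sum(values[b[0]:b[1]]), bounds))
    # Join: leaving the 'with' block waits for every worker to finish.
    # Combine: add the partial sums, as a reduction(+:sum) clause would.
    return sum(partials)


print(parallel_sum(list(range(1, 101))))  # 5050
```

For genuine speedups on CPU-bound Python code, a ProcessPoolExecutor would be the closer analogue, at the cost of copying data between processes.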
To overcome this difficulty and increase use in some of the less technical departments, the HPC Department implemented Microsoft Windows HPC Server 2008 as an alternative operating system. The familiar, Windows-like environment encouraged many more students to use the HPC facility rather than relying on their own departments’ slower mini-clusters.
The cluster is arranged so that jobs are scheduled according not only to the computing power they request but also to whether they are designed for Linux or HPC Server. Individual nodes can be rebooted on the fly between the two operating systems, so that the load is distributed across the cluster.
Sharing an HPC facility among many different people can make it a very cost-effective tool and provide higher overall performance than several smaller computing clusters.
Based in Norway but a major player in the global oil and gas markets, Statoil has a workforce of 24,000 and uses a number of HPC clusters to analyse huge sets of seismic data and create simulations from them. The largest single cluster contains 512 nodes and 1,024 Intel Xeon processors, and has a theoretical peak performance of 6.3 teraflops. This particular cluster processes geological data as part of Statoil’s search for new oil resources.
The cluster runs Red Hat Enterprise Linux, following a company project that rationalised its computing environments from seven different variants of UNIX to just Linux and Windows, with considerable savings. Ole Petter Drange, Operation Manager for servers and data storage, says that “...the total cost of ownership was always far lower with Linux than it was with UNIX, so it was worth it.”
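The actual scheduling policy used on Darwin isn’t described here. As a minimal sketch under an assumed rule, a dispatcher might prefer an idle node already running the operating system a job needs, and only reboot an idle node into the other OS when none is available. The Node class, the "linux" and "windows" labels and dispatch() are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    os: str            # hypothetical labels: "linux" or "windows"
    busy: bool = False


def dispatch(job_os, nodes):
    """Assign a job needing job_os to a node; returns (node, rebooted) or (None, False)."""
    idle = [n for n in nodes if not n.busy]
    # First choice: an idle node already booted into the required OS
    for node in idle:
        if node.os == job_os:
            node.busy = True
            return node, False
    # Second choice: reboot an idle node into the required OS
    if idle:
        node = idle[0]
        node.os = job_os
        node.busy = True
        return node, True
    # Every node is busy: the job has to wait in the queue
    return None, False


cluster = [Node("n1", "linux"), Node("n2", "linux")]
node, rebooted = dispatch("windows", cluster)
print(node.name, node.os, rebooted)  # n1 windows True
```

Preferring an already-matching node avoids the cost of a reboot, which is why rebooting is only the fallback in this sketch.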
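Theoretical peak performance is the sum of the peak floating-point rates of all the processors, so dividing the quoted figure back down gives the average share per node and per processor. The Xeon model and clock speed aren’t given, so these are simple averages derived from the 6.3 teraflop figure, not chip specifications:

```latex
\frac{6.3\ \text{TFLOPS}}{512\ \text{nodes}} \approx 12.3\ \text{GFLOPS per node}, \qquad
\frac{6.3\ \text{TFLOPS}}{1024\ \text{processors}} \approx 6.15\ \text{GFLOPS per processor}
```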
The science fiction exploration game EVE Online sits on a massive HPC array and holds the record for simultaneous players in a single game universe, at 45,186. There are games with more subscribers than EVE Online, but most, like World of Warcraft, restrict their players to particular groups of servers, so not all active players share the same universe.
Perhaps surprisingly, when CCP started EVE Online seven years ago, the company built the system on Windows 2000 Server, using a single computer to manage and hand out computing tasks to a hierarchy of ‘compute nodes’ further down the tree. Now running on Windows HPC Server 2008, the EVE Online system uses a 1.5TB database, which effectively describes the state of the game itself. SQL Server handles 40,000 operations a second to service the game’s subscribers as they interact with each other and move around the EVE universe.
Holding the record for the most players online together in a single universe, EVE Online is also one of the best-looking multi-player games around.
CCP Games sees the computing challenges of servicing its subscriber base as similar to those facing many organisations involved in large-scale computing. As its Vice President of Engineering, Gabe Mahoney, says, “Just like so many large enterprises today, we’re looking at high-performance computing to give us the performance and scale that we need to meet challenging technical difficulties.”
The company has ambitious plans for other online games using the architecture it has developed for EVE Online, with the potential to reach 1 million concurrent players.
These are just three of the varied tasks for which HPC has proved invaluable. Because an HPC cluster is easily configured and scaled, once the infrastructure is in place and a system is running, it’s comparatively easy to increase its performance: simply add extra compute nodes by putting more servers in the rack.
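CCP’s actual dispatch logic isn’t described. As a rough sketch of the general idea of a management hierarchy, assuming each manager simply deals its tasks out round-robin to the nodes below it, tasks can be handed down a tree until they reach the leaf compute nodes. The tree layout and the distribute() function are hypothetical.

```python
def distribute(tasks, tree, node="root", assignment=None):
    """Hand tasks down a management tree until they reach leaf compute nodes.

    tree maps each node name to a list of its children; a node with no
    children is a compute node. Returns {compute_node: [tasks]}.
    """
    if assignment is None:
        assignment = {}
    children = tree.get(node, [])
    if not children:
        # Leaf: this compute node is responsible for running the tasks
        assignment.setdefault(node, []).extend(tasks)
        return assignment
    # Manager: deal tasks round-robin to the children, then let each one recurse
    for i, child in enumerate(children):
        distribute(tasks[i::len(children)], tree, child, assignment)
    return assignment


hierarchy = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
print(distribute(list(range(6)), hierarchy))
# {'a1': [0, 4], 'a2': [2], 'b1': [1, 3, 5]}
```

The appeal of a tree is that no single manager has to talk to every compute node directly; each level only handles its own children.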
Managing the load on individual processors is largely transparent to users, but it does require a specialist HPC operating system. Operating systems designed for parallel clusters are available from Microsoft, Red Hat and Sun, all of which deliver high performance without requiring reams of detailed technical knowledge.
HPC operating systems
With Microsoft Windows HPC Server 2008, you can see at a glance which processors are taking the most hammering.
For general-purpose HPC clusters built on Intel or AMD processors, there are two main operating systems to choose from. Microsoft offers Windows HPC Server 2008, while Red Hat has both HPC Solution and Enterprise Linux for HPC Compute Nodes.
One big advantage that Microsoft claims for HPC Server 2008 is its similarity to the Windows desktop interface. The argument goes that anyone used to Windows on the desktop will find switching to an HPC straightforward, as the look and feel of the front end will be very familiar.
The operating system itself includes a job scheduler which can work on its own or with third-party products to structure the way tasks are distributed through the cluster. This is backed up by monitoring controls so you can be alerted quickly if any processor is running outside its thermal limits.
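The monitoring side boils down to comparing each processor’s reported temperature against an allowed range. A minimal sketch, with hypothetical limits supplied by the caller rather than any Windows HPC Server default, and an invented thermal_alerts() function, might flag readings like this:

```python
def thermal_alerts(readings, limits):
    """List (node, temperature, breached_limit) for every reading outside the allowed range.

    readings: {node_name: temperature in degrees Celsius}
    limits:   (low, high) in degrees Celsius; hypothetical values chosen by the caller
    """
    low, high = limits
    return [
        (node, temp, high if temp > high else low)
        for node, temp in sorted(readings.items())
        if not low <= temp <= high
    ]


print(thermal_alerts({"n1": 55, "n2": 91, "n3": 5}, (10, 85)))
# [('n2', 91, 85), ('n3', 5, 10)]
```

In a real deployment the readings would come from the nodes’ hardware sensors and the alert would be raised to an administrator rather than printed.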
Parallel programming
Even after training, the concepts required to write efficient and accurate code for parallel execution are not straightforward. To help with this, there are a number of tools that can take some of the strain out of creating multi-processing code.
The Community Technology Preview (CTP) of Microsoft Visual Studio 2010 shows increased support for parallel development. It adds parallel support to the IDE, plus native C++ libraries and compiler support for parallel applications, while .NET Framework 4.0 includes parallel language semantics and components.
Additionally, the new version includes a performance analyser that works with parallel code to show concurrency issues in the applications you’re writing, as well as more conventional, sequential problems that need sorting out. The debugger can show you where the various segments of your code are executing so that you can tune it for efficient parallel running.
As the prime mover in building processors with multiple cores and in creating multi-processor clusters, Intel has been supporting programmers with a variety of parallel-capable compilers and debugging tools. Its new Intel Parallel Studio, now available in beta, is designed to work alongside Microsoft Visual Studio and provide C and C++ programmers with a variety of extra tools.
Intel Parallel Advisor shows where parallelism could benefit existing source code, so that you can take advantage of multiple processors in the applications you’re already working on. It can identify conflicts between different threads and suggest ways of resolving them to produce faster, more efficient code.
Another tool in the suite, Intel Parallel Amplifier, shows where code is executing serially and could benefit from more parallelism. The suite also helps determine how well code will scale from a few cores or processors to many; planning code now for massive parallelism later can save heavy rewrites further down the line.
Windows HPC Server 2008 also includes failover handling for the head node. If the head node fails, the job scheduler is transferred to a failover node in such a way that the compute nodes working to the schedule see no interruption in instruction flow.
One of the important aspects of running on an HPC cluster is the ability to handle parallel programming. As you would expect, HPC Server 2008 integrates with Microsoft’s Visual Studio 2008 to provide an environment for developing and running parallel applications.
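The classic conflict between threads is a data race: two threads read, modify and write the same variable at once, and one update is lost. The generic Python sketch below (not output from, or an example of, the Intel tool) shows the usual fix of guarding the shared update with a lock; Counter and run() are hypothetical names.

```python
import threading


class Counter:
    """A counter shared between threads, protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # 'self.value += 1' is a read, an add and a write; without the lock two
        # threads can read the same old value and one of the updates is lost.
        with self._lock:
            self.value += 1


def run(counter, n_threads=8, per_thread=10_000):
    """Start n_threads threads that each increment the shared counter per_thread times."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run(Counter()))  # 80000
```

Without the lock, the final count could fall short of n_threads × per_thread; with it, the result is always exact, at the cost of the threads queuing for the lock.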
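Amdahl’s law, a standard result rather than anything the Intel tools are claimed to implement, makes the point about serial code concrete: if a fraction p of the run time can be parallelised, the best possible speedup on N processors is S(N) = 1 / ((1 − p) + p / N). A minimal sketch:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Best-case speedup on n_processors when parallel_fraction of the run time parallelises."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial = 1.0 - parallel_fraction        # this part gains nothing from extra processors
    return 1.0 / (serial + parallel_fraction / n_processors)


for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 1))
```

With p = 0.95 the printed speedups climb towards, but never reach, 1 / 0.05 = 20, however many processors are added, which is why finding and removing serial sections pays off.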
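HPC Server’s own failover mechanism isn’t detailed here. As a minimal sketch of the general heartbeat idea, assuming an active and a standby head node, the standby takes over the scheduler role when the active node has been silent for longer than a timeout. HeadNodePair and its methods are hypothetical, and a real system would also need to replicate the scheduler’s job state so that compute nodes see no interruption.

```python
import time


class HeadNodePair:
    """Hypothetical active/standby head nodes with heartbeat-based failover."""

    def __init__(self, active, standby, timeout=5.0, clock=time.monotonic):
        self.active, self.standby = active, standby
        self.timeout = timeout            # seconds of silence before the standby takes over
        self.clock = clock                # injectable so the logic can be tested without waiting
        self.last_heartbeat = clock()

    def heartbeat(self):
        """Called periodically by the active head node while it is healthy."""
        self.last_heartbeat = self.clock()

    def check(self):
        """Promote the standby if the active node has gone silent; return the scheduler host."""
        if self.clock() - self.last_heartbeat > self.timeout:
            self.active, self.standby = self.standby, self.active
            self.last_heartbeat = self.clock()  # give the new active node a fresh window
        return self.active


fake_now = [0.0]
pair = HeadNodePair("head1", "head2", timeout=5.0, clock=lambda: fake_now[0])
fake_now[0] = 3.0
print(pair.check())   # head1: only 3 s since the last heartbeat
fake_now[0] = 9.0
print(pair.check())   # head2: 9 s of silence exceeds the 5 s timeout
```

Passing a fake clock function makes the takeover easy to exercise without waiting in real time.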
Red Hat HPC Solution is, in fact, two related pieces of software working together. There’s Red Hat’s own Enterprise Linux and there’s Platform Computing’s Open Cluster Stack 5 (OCS5). Enterprise Linux is the company’s corporate offering for large-scale network use while Open Cluster Stack 5 includes cluster management tools, resource and application monitors, interconnect support, and a job scheduler in the form of Platform Lava.
Installation of Platform OCS5 is simplified by the use of node images: you build up the configuration of one cluster node and effectively copy it to all the nodes in your system. Lava then manages the load on each compute node, making sure that work is distributed across the cluster and that no nodes are over-stretched or left idle.
If you’re considering deploying an HPC cluster in an academic, scientific, engineering or business environment, these are good, off-the-shelf operating systems that will help you manage the underlying complexity of a multi-processor system.
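Lava’s real scheduling policies are configurable and aren’t covered here. As a stand-in, a common greedy heuristic places each job, largest first, on whichever node currently has the least work, using a heap to find that node quickly. balance() is a hypothetical name, and job ‘costs’ are assumed to be known in advance.

```python
import heapq


def balance(job_costs, node_names):
    """Greedy least-loaded placement: returns {node: [job costs placed on it]}.

    Jobs are placed largest first (the 'longest processing time' heuristic),
    each on whichever node currently carries the least total work.
    """
    heap = [(0, name) for name in node_names]       # (current load, node name)
    heapq.heapify(heap)
    placement = {name: [] for name in node_names}
    for cost in sorted(job_costs, reverse=True):
        load, name = heapq.heappop(heap)            # least-loaded node (ties broken by name)
        placement[name].append(cost)
        heapq.heappush(heap, (load + cost, name))
    return placement


print(balance([4, 4, 2, 2], ["a", "b"]))  # {'a': [4, 2], 'b': [4, 2]}
```

The heuristic is not guaranteed to find the perfect split, but it is fast and keeps any node from being left idle while work remains.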