This week, Argonne National Laboratory announced that its Aurora supercomputer is now fully operational and accessible to researchers. Originally unveiled in 2015, the system faced significant delays but now delivers over one ExaFLOPS of FP64 performance for simulations and 11.6 ExaFLOPS of mixed-precision performance for artificial intelligence and machine learning workloads.
“The launch of Aurora for open scientific research is a thrilling milestone,” stated Michael Papka, director of the Argonne Leadership Computing Facility (ALCF), a user facility supported by the DOE Office of Science. “Preliminary feedback from early users has showcased Aurora’s extensive capabilities. We are excited to see how it will revolutionize scientific inquiry across various fields.”
The release of Aurora to the scientific community signifies its official acceptance by Argonne National Laboratory, a significant milestone for the project. Initially set for a 2018 launch, the supercomputer's timeline was disrupted when Intel discontinued its Xeon Phi processors. Subsequent redesigns of the system and delays related to Intel's 7nm process technology further postponed its completion to 2021, and eventually to 2023.
Although the hardware installation was completed in June 2023, achieving exascale performance took until May 2024. During this period, only a select group of researchers had access to the system.
While Aurora is not the leading supercomputer for simulations, with FP64 performance just surpassing one ExaFLOPS, it stands out in AI capabilities, reaching 11.6 ExaFLOPS of mixed-precision throughput on the HPL-MxP benchmark.
“A major focus for Aurora is the development of large language models specifically for scientific applications,” explained Rick Stevens, Argonne’s Associate Laboratory Director for Computing, Environment and Life Sciences. “For instance, the AuroraGPT project aims to create a foundational model that consolidates knowledge from various fields like biology and chemistry. Aurora is designed to help researchers develop new AI tools that keep pace with the speed of their ideas, not just the speed of their computational resources.”
Among the first research projects running on Aurora are detailed simulations of complex systems such as the human circulatory system, nuclear reactors, and supernovae. The supercomputer's capabilities are also pivotal in analyzing data from major research facilities such as Argonne's Advanced Photon Source (APS) and CERN's Large Hadron Collider.
“The projects currently running on Aurora are among the most cutting-edge and dynamic in science today,” said Katherine Riley, Director of Science at ALCF. “Aurora will expedite scientific discoveries by enabling more sophisticated modeling of complex physical systems and processing vast datasets.”
On the technical front, Aurora is exceptionally robust. It consists of 166 racks, each containing 64 blades, for a total of 10,624 blades. Each blade is equipped with two Xeon Max processors (each with 64 GB of on-package HBM2E memory) and six Intel Data Center GPU Max 'Ponte Vecchio' GPUs, all cooled by an advanced liquid-cooling system.
In all, Aurora features 21,248 CPUs with more than 1.1 million high-performance x86 cores, 19.9 PB of DDR5 memory, and 1.36 PB of HBM2E memory connected to the CPUs. It also includes 63,744 GPUs optimized for AI and HPC applications with 8.16 PB of HBM2E memory. The system’s storage capabilities are supported by 1,024 nodes with solid-state drives, offering a total capacity of 220 PB and a bandwidth of 31 TB/s. The infrastructure is based on HPE’s Shasta supercomputer architecture and uses Slingshot interconnects.
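For readers who want to see how these headline figures fit together, below is a minimal back-of-envelope sketch that recomputes the system totals from the per-blade configuration described above. The per-device HBM2E capacities used here (64 GB per Xeon Max CPU, 128 GB per Ponte Vecchio GPU) are assumptions chosen to be consistent with the quoted totals, not figures stated in the article.

```python
# Back-of-envelope consistency check of Aurora's published configuration.
# Rack/blade counts come from the article; per-device HBM2E capacities are
# assumptions that reproduce the quoted 1.36 PB (CPU) and 8.16 PB (GPU) totals.

RACKS = 166
BLADES_PER_RACK = 64
CPUS_PER_BLADE = 2
GPUS_PER_BLADE = 6
HBM_PER_CPU_GB = 64    # assumed on-package HBM2E per Xeon Max CPU
HBM_PER_GPU_GB = 128   # assumed HBM2E per Data Center GPU Max

blades = RACKS * BLADES_PER_RACK           # 10,624 blades
cpus = blades * CPUS_PER_BLADE             # 21,248 CPUs
gpus = blades * GPUS_PER_BLADE             # 63,744 GPUs
cpu_hbm_pb = cpus * HBM_PER_CPU_GB / 1e6   # ~1.36 PB of CPU-attached HBM2E
gpu_hbm_pb = gpus * HBM_PER_GPU_GB / 1e6   # ~8.16 PB of GPU HBM2E

print(f"blades={blades:,} cpus={cpus:,} gpus={gpus:,}")
print(f"CPU HBM2E: {cpu_hbm_pb:.2f} PB, GPU HBM2E: {gpu_hbm_pb:.2f} PB")
```

Running this reproduces the blade, CPU, and GPU counts as well as the approximate HBM2E totals quoted above, which is a quick sanity check that the published numbers are internally consistent.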