Moving Along the Frontier: Ready to Crush Science

Computational users at the Oak Ridge Leadership Computing Facility, or OLCF, are running scientific codes on Frontier’s architecture in the form of a powerful test system at the OLCF called Crusher.

Frontier, an HPE Cray EX supercomputer capable of 10^18 calculations per second – a 1 followed by 18 zeros – was installed in late 2021 and is undergoing integration and testing. Frontier is on track to become the nation’s first exascale supercomputer this year.

Early science users are accessing Frontier’s architecture through Crusher, and the Frontier system is scheduled to enter full user operations with INCITE program users on January 1, 2023. Crusher is a 1.5-cabinet iteration of the massive HPE system, featuring 192 nodes connected by Slingshot interconnects. Each node contains an optimized third-generation AMD EPYC™ CPU and four AMD Instinct™ MI250X accelerators.

ORNL’s Frontier, an HPE Cray EX supercomputer capable of 10^18 calculations per second – a 1 followed by 18 zeros – is on track to become the nation’s first exascale supercomputer this year. Image credit: ORNL

Four well-established projects – the Cancer Distributed Learning Environment, or CANDLE, project; the Computational Hydrodynamics on Parallel Architectures, or Cholla, project; the Locally Self-consistent Multiple Scattering, or LSMS, project; and the Nuclear Coupled-Cluster Oak Ridge, or NuCCOR, project – have successfully run optimized codes on Frontier’s architecture via Crusher.

Some of these codes have run on OLCF platforms since the facility’s first hybrid-architecture system, the 27-petaflop Cray XK7 Titan supercomputer, debuted 10 years ago this year. Occupying only 44 square feet of floor space, Crusher is 1/100th the size of the former Titan supercomputer but faster than that entire 4,352-square-foot system was, packing a huge computing punch for its small size.

The OLCF, a US Department of Energy Office of Science user facility at DOE’s Oak Ridge National Laboratory, has built a reputation for developing and deploying some of the most powerful high-performance computing resources for open science, and Frontier will continue that record as the nation’s first exascale supercomputer. Frontier provides an 8-fold increase in computational power over the center’s current 200-petaflop IBM AC922 Summit supercomputer.
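The "exascale" label can be checked against the round numbers quoted here. The sketch below is only back-of-the-envelope arithmetic on the article's figures (200 petaflops for Summit, an 8-fold increase for Frontier); the function and constant names are illustrative, and the result is not an official performance figure.

```python
# Back-of-the-envelope check using only the round figures quoted in the article.
# Names here are illustrative assumptions, not OLCF terminology.
PETA = 10**15  # calculations per second in one petaflop
EXA = 10**18   # calculations per second in one exaflop


def scaled_flops(base_petaflops: int, factor: int) -> int:
    """Return base_petaflops scaled by factor, in calculations per second."""
    return base_petaflops * factor * PETA


# Summit's quoted 200 petaflops, scaled by the quoted 8-fold increase.
frontier_estimate = scaled_flops(200, 8)

print(f"Estimated rate: {frontier_estimate / EXA} exaflops")
print(f"At or above the exascale threshold: {frontier_estimate >= EXA}")
```

Integer arithmetic is used so the comparison against the 10^18 threshold is exact.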

“Crusher is the latest in a long line of test and development systems that we have deployed for early users of OLCF platforms and is easily the most powerful of these,” said ORNL’s Bronson Messer, OLCF director of science. “The results these code teams are getting on the machine are very encouraging as we look to the beginning of the exascale era with Frontier.”

The OLCF is hosting a series of hackathons geared toward getting users up and running on Crusher and, soon, Frontier. Hosted by the User Assistance group at the OLCF, the 3-day events target Frontier’s architecture and are extremely valuable to the facility, the vendors, and the user community.

“As more people move to this hardware – when we have more codes and styles of programming on the system – it gives us opportunities to discover and overcome challenges and prepares us to run science on Frontier,” said ORNL’s Balint Joo, group leader of the OLCF’s Advanced Computing for Nuclear, Particle and Astrophysics group. More hackathons will be held in the coming months.

Below are details of these four projects currently running on the Crusher cabinets and the breakthrough science they will enable on Frontier in the areas of cancer research, astrophysics, materials science and nuclear physics.

“Transformer” Deep Learning Model

Formed from a partnership between DOE and the National Cancer Institute, or NCI, CANDLE is part of the Cancer Moonshot effort and exists within DOE’s Exascale Computing Project. It aims to develop applications from pilot projects in the first Cancer Moonshot effort, extend them to the next generation of supercomputers, and support their deep learning components to accelerate cancer research on machines like Frontier.

The CANDLE project is currently developing next-generation natural language processing models for precision medicine using “transformers,” which are deep learning models that identify unseen connections between words in clinical text.

Led by Gina Tourassi, director of the National Center for Computational Sciences at ORNL, the CANDLE team has successfully run one of its transformer models on Crusher, achieving an 80% speedup on a Crusher node over previous systems. The work to optimize and run the code on Crusher was done by John Gounley, a computational scientist and technical lead of ORNL’s CANDLE team.

Frontier will enable the team to use much larger natural language processing models with many more parameters. Ultimately, the team aims to provide the NCI with a better, more accurate model for cancer surveillance.

“We expect that our next generation of models trained on systems like Frontier are going to be based on this transformer architecture and going to be significantly more accurate than the models we have today,” Gounley said.

Cholla

Cholla is an astrophysical hydrodynamics code used to simulate the dynamics of galaxies, revealing how they form and evolve. Part of the OLCF’s Center for Accelerated Application Readiness, or CAAR, program, Cholla was one of the first codes to be rewritten for Frontier. Now the code is running on Crusher, and the team is seeing promising results that are leading it toward an understanding of the physics driving star formation and why galaxies stop forming stars.

“We are seeing about 15-fold speedups on Crusher compared with our baseline tests on Summit from 2019,” said Evan Schneider, an assistant professor at the University of Pittsburgh and Cholla’s principal investigator. “About 3x of the improvement is hardware based, and about 5x is from software development improvements made through the CAAR project.”
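The two contributions Schneider cites combine multiplicatively rather than additively: a roughly 3x hardware gain applied on top of a roughly 5x software gain yields the roughly 15x overall figure. A minimal sketch of that arithmetic follows, using only the rounded numbers from the quote; the function name is illustrative and is not part of Cholla.

```python
# Illustrative arithmetic only; the factors are the rounded values from the quote.
def combined_speedup(*factors: float) -> float:
    """Multiply independent speedup factors into one overall speedup."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


hardware_gain = 3.0  # approximate gain attributed to the newer hardware
software_gain = 5.0  # approximate gain attributed to CAAR software work

overall = combined_speedup(hardware_gain, software_gain)
print(f"Overall speedup: about {overall:.0f}x")

# Equivalent view: a run that took 15 hours at baseline would take about 1 hour.
baseline_hours = 15.0
print(f"Estimated runtime: about {baseline_hours / overall:.1f} hours")
```

Because the factors multiply, neither one alone accounts for most of the gain; removing either would leave only a 3x or 5x improvement.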

The promising performance on Crusher points to success on the full Frontier system, which will be operational in the second half of 2022.

LSMS

LSMS is a first-principles code used to calculate the properties of materials, including magnetic materials, metallic systems, and alloys. LSMS, one of the OLCF’s CAAR codes, can calculate the physics of extremely large material systems – more than 100,000 atoms – as determined by the motion of electrons in a solid. The code is currently running on Crusher and will soon be able to scale to the full Frontier system.

“With Frontier, we will be able to perform LSMS calculations of large systems and study new physics,” said ORNL’s Markus Eisenbach, a senior computational scientist at the OLCF and principal investigator of LSMS. “Since we will have significantly more computational power available on Frontier, we can actually use physics models that include more correlation effects that we cannot easily capture on current systems.”

Eisenbach and his team also look forward to combining classical statistical mechanics, which describes the behavior of materials at different temperatures, with machine learning workflows on Frontier to calculate material behavior more quickly.

NuCCOR

NuCCOR is a nuclear physics code in the OLCF’s CAAR program that is used to calculate the properties of atomic nuclei and their reactions. The code is an ab initio quantum many-body application, meaning that it computes nuclear properties “from the beginning” rather than making assumptions about their behavior.

NuCCOR is able to calculate the properties of large nuclei, breaking new ground to reach the nuclear sizes that define the limits of matter’s existence. NuCCOR is currently running on Crusher and is expected to scale to the full Frontier system.

“With Frontier, we arrive at a paradigm shift in nuclear physics,” said ORNL’s Gustav Jansen, a computational scientist at the National Center for Computational Sciences and the CAAR liaison for NuCCOR.

“The use of ab initio methods, methods that use only the forces between protons and neutrons as input, will no longer be limited by the size of the nucleus, and the entire nuclear chart will be within reach. It will pave the way for more accurate and precise calculations, a better understanding of the fundamental interactions between protons and neutrons, and the discovery of nuclear isotopes that have yet to be discovered.”

During the NuCCOR team’s initial testing on Crusher, its computational kernels ran 8 times faster on one of the AMD Instinct™ MI250X GPUs that power Frontier than on one of Summit’s NVIDIA V100 GPUs.

Source: ORNL

