Data processing demand for the LHC’s Run 3 is on the rise, and the four large experiments are increasing their use of GPUs to strengthen their computing infrastructure.
Analyzing as many as one billion proton collisions per second, or tens of thousands of complex lead collisions, is a demanding task for a traditional computer farm. With the LHC experiment upgrades due to come into operation next year, the demand for data processing has increased significantly. To meet these computational challenges, the four large experiments have adopted graphics processing units (GPUs).
GPUs are highly efficient processors specialized in image processing; they were originally designed to accelerate the rendering of three-dimensional computer graphics. Over the last few years, their use has been studied by the LHC experiments, the Worldwide LHC Computing Grid (WLCG), and CERN openlab. Increasing the use of GPUs will improve both the quality and the efficiency of high-energy physics computing infrastructure.
“The LHC’s ambitious upgrade program presents a variety of computing challenges, and GPUs can help support machine-learning approaches to tackling many of them,” says Enrica Porcari, Head of the CERN IT department. “CERN IT has offered access to GPU platforms in its data center since 2020, and these have proved popular for a variety of applications. CERN openlab, which collaborates with industry on research into GPUs for machine learning, is also conducting important studies, and the Scientific Computing Collaborations group is working to port and optimize key code from the experiments.”
ALICE has pioneered the use of GPUs in its high-level trigger (HLT) online computer farm since 2010, and it is the only experiment to use them to this extent. The newly upgraded ALICE detector has more than 12 billion electronic sensor elements, creating a data stream of more than 3.5 terabytes per second. After first-level data processing, the stream is reduced to at most 600 gigabytes per second. These data are analyzed online on a high-performance computer farm of 250 nodes, each equipped with eight GPUs and two 32-core CPUs. Most of the software that reconstructs events from individual particle detector signals has been adapted to run on GPUs.
The GPU-based online reconstruction and compression of data from the Time Projection Chamber allows ALICE to reduce the data rate to a maximum of 100 gigabytes per second before writing it to disk. Without GPUs, processing lead collision data online at a 50 kHz interaction rate would require about eight times as many servers and other resources.
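To make the ALICE figures easier to compare, the following Python sketch derives totals and data-reduction factors from the numbers quoted above and nothing else. It is a back-of-the-envelope illustration, not ALICE software: it assumes decimal units (1 TB = 1000 GB), treats the quoted rates as exact even though the raw rate is a lower bound, and applies the “eight times as many servers” estimate to the same 250-node baseline. The helper name `alice_farm_summary` is hypothetical.

```python
# Back-of-the-envelope arithmetic for the ALICE online farm, using only the
# figures quoted in the article. Assumptions: decimal units (1 TB = 1000 GB),
# quoted rates treated as exact, and the "eight times as many servers"
# estimate applied to the same 250-node baseline.

RAW_RATE_GB_S = 3500.0           # detector readout: "more than 3.5 TB/s" (a lower bound)
AFTER_FIRST_LEVEL_GB_S = 600.0   # up to 600 GB/s after first-level processing
TO_DISK_GB_S = 100.0             # at most 100 GB/s written to disk

NODES = 250
GPUS_PER_NODE = 8
CPUS_PER_NODE = 2
CORES_PER_CPU = 32
SERVER_FACTOR_WITHOUT_GPUS = 8   # "about eight times as many servers"


def alice_farm_summary() -> dict[str, float]:
    """Return derived totals and data-reduction factors (hypothetical helper)."""
    return {
        "total_gpus": NODES * GPUS_PER_NODE,
        "total_cpu_cores": NODES * CPUS_PER_NODE * CORES_PER_CPU,
        # Average input rate per node if the 600 GB/s were spread evenly.
        "gb_s_per_node": AFTER_FIRST_LEVEL_GB_S / NODES,
        "first_level_reduction": RAW_RATE_GB_S / AFTER_FIRST_LEVEL_GB_S,
        # Reduction from the first-level stream to disk, which the article
        # attributes mainly to GPU-based TPC reconstruction and compression.
        "online_reduction": AFTER_FIRST_LEVEL_GB_S / TO_DISK_GB_S,
        "overall_reduction": RAW_RATE_GB_S / TO_DISK_GB_S,
        "servers_without_gpus": NODES * SERVER_FACTOR_WITHOUT_GPUS,
    }


if __name__ == "__main__":
    for name, value in alice_farm_summary().items():
        print(f"{name}: {value:g}")
```

Under these assumptions the farm holds 2000 GPUs and 16 000 CPU cores, each node handles about 2.4 GB/s on average, and the chain from detector readout to disk reduces the rate by roughly a factor of 35, with the final online stage alone contributing a factor of 6.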
ALICE successfully used GPU-based online reconstruction during the LHC pilot-beam data taking at the end of October 2021. When there is no beam in the LHC, the online computer farm is used for offline reconstruction. To exploit the full potential of the GPUs, the full ALICE reconstruction software has been developed with GPU support, so that more than 80% of the reconstruction workload can run on GPUs.
LHCb researchers have been investigating parallel computing architectures since 2013, with the aim of replacing some of the processing that traditionally runs on CPUs. The Allen project, a first-level real-time processing system implemented entirely on GPUs, can handle LHCb’s high data rate with only 200 GPU cards. It lets LHCb find charged-particle trajectories from the very beginning of real-time processing; these are used to reduce the data rate by a factor of 30 to 60 before the detector is aligned and calibrated and a more comprehensive CPU-based full detector reconstruction is performed. Such a compact system also leads to significant energy savings.
Starting in 2022, LHCb will process 4 terabytes of data per second in real time, selecting 10 gigabytes of the most interesting LHC collisions each second for physics analysis. LHCb’s approach is unique: instead of offloading only part of the work, it will analyze the full 30 million particle-bunch crossings per second on GPUs.
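The LHCb rates quoted in the two paragraphs above can be related with a few lines of arithmetic. This Python sketch is illustrative only: it assumes decimal units, an even split of bunch crossings across the 200 GPU cards, and that the 30 to 60 reduction factor applies to the full 4 TB/s input. All function names are hypothetical and do not correspond to the Allen codebase.

```python
# Rough arithmetic for the LHCb real-time processing figures quoted above.
# Assumptions: decimal units (1 TB = 1e12 bytes), an even split of work across
# GPU cards, and that the 30-60 reduction factor applies to the full 4 TB/s input.

INPUT_BYTES_S = 4e12              # 4 TB/s processed in real time
OUTPUT_BYTES_S = 10e9             # 10 GB/s selected for physics analysis
CROSSINGS_PER_S = 30e6            # 30 million bunch crossings per second
FIRST_STAGE_REDUCTION = (30, 60)  # GPU first stage reduces the rate 30- to 60-fold
GPU_CARDS = 200                   # size of the GPU farm quoted for Allen


def bytes_per_crossing() -> float:
    """Average data volume per bunch crossing entering the GPU stage."""
    return INPUT_BYTES_S / CROSSINGS_PER_S


def overall_selection_factor() -> float:
    """Ratio of the real-time input rate to the rate kept for analysis."""
    return INPUT_BYTES_S / OUTPUT_BYTES_S


def rate_after_first_stage_gb_s() -> tuple[float, float]:
    """(lowest, highest) output rate of the GPU first stage, in GB/s."""
    low_factor, high_factor = FIRST_STAGE_REDUCTION
    # The larger reduction factor gives the lower output rate.
    return (INPUT_BYTES_S / high_factor / 1e9, INPUT_BYTES_S / low_factor / 1e9)


def crossings_per_gpu() -> float:
    """Bunch crossings each GPU card handles per second under an even split."""
    return CROSSINGS_PER_S / GPU_CARDS


if __name__ == "__main__":
    print(f"bytes per crossing: {bytes_per_crossing():,.0f}")
    print(f"overall selection factor: {overall_selection_factor():g}")
    low, high = rate_after_first_stage_gb_s()
    print(f"first-stage output: {low:.1f} to {high:.1f} GB/s")
    print(f"crossings per GPU per second: {crossings_per_gpu():,.0f}")
```

Under these assumptions, each bunch crossing carries about 133 kB on average, the full chain keeps roughly one part in 400 of the input, the GPU first stage would emit roughly 67 to 133 GB/s, and each card processes about 150 000 crossings per second.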
Together with improvements to its CPU processing, LHCb has gained almost a factor of 20 in the energy efficiency of its detector reconstruction since 2018. LHCb researchers look forward to commissioning the new system with the first data of 2022 and building on it to realize the full potential of the upgraded LHCb detector.
CMS reconstructed LHC collision data with GPUs for the first time in October 2021, during the LHC pilot beams. During the LHC’s first two runs, the CMS HLT ran on a traditional computer farm with over 30 000 CPU cores. Studies for Phase 2 of CMS show that GPUs can help keep the cost, size, and power consumption of the HLT farm under control, even at higher LHC luminosities. To gain experience with a heterogeneous farm and with GPUs in a production environment, CMS will equip its entire HLT with GPUs: the new farm will have a total of 25 600 CPU cores and 400 GPUs.
The additional computing power of these GPUs will allow CMS not only to improve the quality of its online reconstruction but also to extend its physics program, running the online data-scouting analysis at a much higher rate than before. Today, about 30% of HLT processing can be offloaded to GPUs, including the calorimeter and pixel-tracker reconstruction and the pixel-only track and vertex reconstruction. The number of algorithms that can run on GPUs will grow during Run 3 as other components are developed.
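One way to read the “30% offloaded” figure is through Amdahl’s law. The Python sketch below is an illustration added here, not a CMS performance model: it assumes the offloaded share costs nothing on the CPU side, so the remaining 70% sets an upper bound of 1 / (1 - p) on the per-event CPU speedup. It also computes the ratio of CPU cores to GPUs in the new farm. The function names are hypothetical.

```python
# Illustrative arithmetic for the CMS HLT figures quoted above.
# Assumption (added for illustration, not from CMS): an Amdahl's-law style
# bound that treats the offloaded share of HLT processing as costing nothing
# on the CPU side, which gives an upper limit on the per-event CPU speedup.

RUN3_CPU_CORES = 25_600
RUN3_GPUS = 400
OFFLOADED_FRACTION = 0.30  # "about 30% of HLT processing can be offloaded"


def cores_per_gpu() -> float:
    """CPU cores in the Run 3 HLT farm for each GPU."""
    return RUN3_CPU_CORES / RUN3_GPUS


def max_cpu_speedup(offloaded_fraction: float) -> float:
    """Amdahl's law with an infinitely fast accelerator: 1 / (1 - p)."""
    if not 0.0 <= offloaded_fraction < 1.0:
        raise ValueError("offloaded_fraction must be in [0, 1)")
    return 1.0 / (1.0 - offloaded_fraction)


if __name__ == "__main__":
    print(f"CPU cores per GPU: {cores_per_gpu():g}")
    print(f"upper bound on CPU-side speedup: {max_cpu_speedup(OFFLOADED_FRACTION):.2f}")
```

With the quoted numbers, the farm has 64 CPU cores per GPU, and offloading 30% of the work caps the CPU-side speedup at about 1.43 times. Moving more algorithms to GPUs during Run 3 raises that ceiling, which is why the growing list of GPU-capable components matters.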
ATLAS is engaged in several R&D projects aimed at using GPUs both in the online trigger system and more broadly across the experiment. GPUs are already used in many analyses, and they are particularly useful for machine-learning applications, where they make training much faster. Beyond machine learning, ATLAS R&D has focused on improving the software infrastructure so that it can make use of GPUs and other, more exotic processors that may become available within the next few years. A few complete applications, such as a fast calorimeter simulation, now run on GPUs; these will serve as key examples for validating the infrastructure improvements.