(Video: The Little Big Studio/CERN IT department)
The Large Hadron Collider (LHC) produces millions of collisions every second in each detector, generating approximately one petabyte of data per second. No computing system today is capable of recording data at such rates, so a sophisticated selection system performs a first, fast electronic pre-selection, passing only one out of every 10,000 events. Tens of thousands of processor cores then select 1% of the remaining events for analysis.
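As a back-of-the-envelope illustration (not an official figure), the short sketch below combines the two selection stages. The 1 PB/s raw rate and the pass fractions come from the text above; the assumption that the recorded data volume scales in proportion to the number of kept events is ours.

```python
# Rough arithmetic for the two-stage LHC event selection described above.
# The raw rate and pass fractions are the round numbers quoted in the text;
# the proportional-scaling assumption is ours, for illustration only.

RAW_DATA_RATE_PB_PER_S = 1.0          # ~1 PB/s produced by the detectors
HARDWARE_PASS_FRACTION = 1 / 10_000   # fast electronic pre-selection
SOFTWARE_PASS_FRACTION = 1 / 100      # processor farm keeps ~1% of those

overall_fraction = HARDWARE_PASS_FRACTION * SOFTWARE_PASS_FRACTION

# 1 PB = 1,000,000 GB (decimal units)
recorded_rate_gb_per_s = RAW_DATA_RATE_PB_PER_S * 1_000_000 * overall_fraction

print(f"Overall fraction of events kept: 1 in {1 / overall_fraction:,.0f}")
print(f"Recorded data rate, if volume scales with event count: "
      f"~{recorded_rate_gb_per_s:.0f} GB/s")
```

Run as written, this prints an overall reduction of 1 in 1,000,000 and a recorded rate on the order of 1 GB/s, which is why even drastic selection still leaves a very large volume of data to store each year.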
Even after such drastic data reduction, the four big experiments, ALICE, ATLAS, CMS and LHCb, together need to store over 25 petabytes per year. The LHC data are aggregated in the CERN Data Centre, where initial data reconstruction is performed and a copy is archived to long-term tape storage. Another copy is sent to several large data centres around the world. Subsequently, hundreds of thousands of computers from around the world come into action: harnessed in a distributed computing service, they form the Worldwide LHC Computing Grid (WLCG), which provides the resources to store, distribute and process the LHC data. WLCG combines the power of more than 170 collaborating centres in 36 countries around the world, all linked to CERN. Every day WLCG processes more than 1.5 million "jobs", the equivalent of a single computer running for more than 600 years.
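To see how the "600 years on a single computer" comparison holds together, the sketch below back-calculates the average job length implied by the two figures in the text. That average is not given in the source; the calculation only shows what it would have to be for the numbers to line up.

```python
# Rough consistency check of the WLCG figures quoted above:
# 1.5 million jobs per day equated with one computer running 600+ years.

JOBS_PER_DAY = 1_500_000          # jobs processed by WLCG each day (from the text)
SINGLE_COMPUTER_YEARS = 600       # quoted single-machine equivalent (from the text)

total_compute_hours = SINGLE_COMPUTER_YEARS * 365 * 24   # one machine, 600 years
avg_job_hours = total_compute_hours / JOBS_PER_DAY       # implied average job length

print(f"Implied average job length: ~{avg_job_hours:.1f} hours")
```

This gives an implied average of roughly 3.5 hours of processing per job, a plausible order of magnitude for reconstruction and analysis workloads, though the real mix of job lengths on the grid varies widely.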