Data Logistics and the CMS Analysis Model
Managan, Julie E.
The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN has brilliant prospects for uncovering new information about the physical structure of our universe. Soon physicists around the world will collaborate in analyzing CMS data in search of new physics phenomena and the Higgs boson. However, they face a significant problem: with 5 petabytes (PB) of data needing distribution each year, how will physicists get the data they need, and how and where will they be able to analyze it? Computing resources and scientists are scattered around the world, while CMS data exists in localized chunks. The CMS computing model only allows analysis of locally stored data, "tethering" analysis to storage. The Vanderbilt CMS team is actively working to solve this problem with the Research and Education Data Depot Network (REDDnet), a program run by Vanderbilt's Advanced Computing Center for Research and Education (ACCRE). I participated in this effort by testing data transfers into REDDnet via a GridFTP server; GridFTP extends the File Transfer Protocol (FTP) with an LHC Computing Grid security layer. I created a test suite that helped identify and solve a large number of problems with GridFTP. Once the system was optimized, I achieved sustained throughputs of 700-800 Megabits per second (Mbps) over a 1 Gigabit per second (Gbps) connection, with remarkably few failures. GridFTP is the gateway between REDDnet and CMS, and my tests were designed to exercise and harden this important tool. My results support other indications that REDDnet will be a successful solution to the limitations of data tethering in the CMS computing model.
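The scale of the problem can be checked with simple arithmetic from the figures above (5 PB of data per year; 700-800 Mbps sustained on a single 1 Gbps link). The sketch below is purely illustrative: the 750 Mbps rate is an assumed midpoint of the measured range, and 1 PB is taken as 10^15 bytes.

```python
# Back-of-the-envelope data-logistics arithmetic using the figures
# quoted in the abstract. 750 Mbps is the midpoint of the measured
# 700-800 Mbps range, chosen here only for illustration.

ANNUAL_DATA_BITS = 5e15 * 8          # 5 PB/year in bits (1 PB = 1e15 bytes)
SECONDS_PER_YEAR = 365 * 24 * 3600
LINK_THROUGHPUT_BPS = 750e6          # assumed sustained rate of one 1 Gbps link

# Average sustained rate needed to move one year's data in one year.
required_gbps = ANNUAL_DATA_BITS / SECONDS_PER_YEAR / 1e9

# Days a single link at the measured rate would need for the annual dataset.
transfer_days = ANNUAL_DATA_BITS / LINK_THROUGHPUT_BPS / 86400

print(f"Required sustained rate: {required_gbps:.2f} Gbps")
print(f"One 750 Mbps link needs: {transfer_days:.0f} days per year of data")
```

The required average rate (about 1.3 Gbps) exceeds what any single 1 Gbps link can sustain, which is why distribution across a network of depots such as REDDnet, rather than a single transfer path, is essential.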