(1) University of Glasgow, Glasgow, G12 8QQ, Scotland
(2) CERN, European Organization for Nuclear Research, 1211 Geneva, Switzerland
The Large Hadron Collider (LHC) at CERN, the European Organization for
Nuclear Research, will produce unprecedented volumes of data when it starts
operation in 2007. To provide for its computational needs, the
LHC Computing Grid (LCG) will be deployed as a worldwide
computational grid service, providing the middleware upon which
the physics analysis for the LHC will be carried
out. In 2003, versions of this middleware were deployed, based on the
middleware produced by the European Data Grid project (EDG). In 2004 the
LCG-2 release, which consisted of the EDG middleware with some minor
modifications, was deployed for use by the LHC experiments.
A series of data challenges run by these experiments was the first real
production use of LCG. During the data challenges, many issues and problems
were exposed which had not shown up in more limited tests. The deployment,
service and development teams worked closely with the experiments to
understand these issues. Some of the problems were solved during the data
challenges, but others exposed fundamental problems with the middleware as
deployed in LCG-2.
One of these fundamental problems was the performance under real load of
the catalog component provided by EDG, the Replica Location Service. To
address this, a new component was designed, the LCG File Catalog (LFC). The
LFC moves away from the Replica Location Service model used in previous LCG
releases, towards a hierarchical filesystem model that resembles a UNIX
filesystem. It also adds previously missing functionality requested by the
experiments.
This paper presents the architecture and implementation of the LFC and
evaluates it in a series of performance tests, with up to forty million
entries and one hundred requesting threads from multiple clients. The
results show good scalability up to the limits of these tests, and
compare favourably with other Grid catalog implementations.
14th IEEE International Symposium on High Performance Distributed Computing
(HPDC-14)
Research Triangle Park, North Carolina, USA