20.1.2010
Scalable SRM
Distributed Scalable dCache SRM powered by Terracotta
The Storage Resource Manager (SRM) protocol used by LHC experiments relies on an X.509-based GSI security mechanism for user authentication, delegation of user credentials and control channel encryption. While secure, flexible and extensible, GSI has drawbacks: the algorithms that implement it use a great deal of CPU and require large amounts of memory. These limitations, together with the fact that the dCache SRM server can be deployed only on a single node, mean that a dCache SRM can service only a limited rate of operations, bounded by the CPU resources of a single computer. Depending on the performance of the LHC experiments and the size of the Storage Element, the maximum processing rate of a single SRM frontend could fall below the rate of operations that LHC experiments will require from their storage systems once LHC production reaches its peak.
One way to increase the performance of the SRM server is to eliminate bottlenecks in the underlying GSI libraries. The dCache collaboration has identified several such bottlenecks, and fixing them has increased the SRM's performance manyfold. These fixes have been offered to Globus and are now included in the current releases of their GSI libraries.
Another way to address the LHC needs is to make the SRM server scale horizontally, so that several computers work together to provide a unified service to end users. This is the direction of the Terracotta work: it aims to provide support for a distributed SRM server using Terracotta[1]. The introduction of Terracotta allows the deployment of several dCache SRM servers within the same dCache instance, each running on a different machine. Behind a Network Load Balancer, these SRM servers appear as a single endpoint. In addition to increased performance, this architecture also gives users a more stable service: the SRM service continues even if one of the SRM servers fails.
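The single-endpoint behaviour can be illustrated with a minimal sketch. In a real deployment this job is done by a network load balancer rather than application code; the classes below (`Backend`, `RoundRobinBalancer`) and the server names are hypothetical and only show how requests keep being served when one SRM server is marked as failed.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: not part of dCache or Terracotta.
class SrmBalancerSketch {

    /** Hypothetical stand-in for one SRM server behind the balancer. */
    static final class Backend {
        final String name;
        volatile boolean alive = true;

        Backend(String name) {
            this.name = name;
        }
    }

    /** Hands out backends in turn and skips any that are marked as failed. */
    static final class RoundRobinBalancer {
        private final List<Backend> backends;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobinBalancer(List<Backend> backends) {
            this.backends = List.copyOf(backends);
        }

        /** Returns the next live backend; throws if every backend is down. */
        Backend pick() {
            for (int i = 0; i < backends.size(); i++) {
                int index = Math.floorMod(next.getAndIncrement(), backends.size());
                Backend candidate = backends.get(index);
                if (candidate.alive) {
                    return candidate;
                }
            }
            throw new IllegalStateException("no SRM server available");
        }
    }

    public static void main(String[] args) {
        Backend a = new Backend("srm-a");
        Backend b = new Backend("srm-b");
        RoundRobinBalancer balancer = new RoundRobinBalancer(List.of(a, b));

        System.out.println(balancer.pick().name); // first request goes to srm-a
        a.alive = false;                          // simulate a failure of srm-a
        System.out.println(balancer.pick().name); // srm-b takes over
        System.out.println(balancer.pick().name); // srm-a is skipped, srm-b again
    }
}
```

Clients only ever see the balancer, so the loss of one backend reduces capacity but does not interrupt the service.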
In a nutshell, here's how it works: the desired number of computers are configured to run as SRM servers. Each of these servers is also configured to be part of a Terracotta cluster, which additionally needs at least one Terracotta server. When the Terracotta server(s) and the SRM servers are started, they establish a cluster of cooperating computers that can store shared information. Using this Terracotta cluster, the SRM servers share information about current user activity. This allows a user who has asked one SRM server to store data in dCache to query the progress of that request through any of the SRM servers. By storing all SRM activity in the Terracotta “Network Attached Shared Memory”, the SRM servers can work together and act as a single SRM instance.
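The idea of shared request state can be sketched in a few lines of Java. This is a simplified illustration, not the dCache implementation: an ordinary in-process `ConcurrentHashMap` stands in for the clustered shared memory, and the names `SrmFrontend`, `submitPut`, `status` and the example path are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: a plain in-JVM map plays the role of the shared memory.
class SharedSrmStateSketch {

    /** Hypothetical SRM front end; every front end references the same request table. */
    static final class SrmFrontend {
        private final String name;
        private final Map<String, String> sharedRequests;
        private final AtomicLong sharedIds;

        SrmFrontend(String name, Map<String, String> sharedRequests, AtomicLong sharedIds) {
            this.name = name;
            this.sharedRequests = sharedRequests;
            this.sharedIds = sharedIds;
        }

        /** Records a new store request in the shared table and returns its id. */
        String submitPut(String path) {
            String id = "req-" + sharedIds.incrementAndGet();
            sharedRequests.put(id, "QUEUED " + path + " (accepted by " + name + ")");
            return id;
        }

        /** Looks up a request, regardless of which front end accepted it. */
        String status(String requestId) {
            return sharedRequests.getOrDefault(requestId, "UNKNOWN");
        }
    }

    public static void main(String[] args) {
        Map<String, String> shared = new ConcurrentHashMap<>();
        AtomicLong ids = new AtomicLong();
        SrmFrontend a = new SrmFrontend("srm-a", shared, ids);
        SrmFrontend b = new SrmFrontend("srm-b", shared, ids);

        // The request is sent to one front end and queried through the other.
        String id = a.submitPut("/pnfs/example/file1");
        System.out.println(b.status(id));
    }
}
```

In the clustered setup the table would live in shared memory visible to every SRM server rather than in a single JVM, which is what makes the front ends interchangeable from the user's point of view.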
For detailed instructions on how to configure dCache SRM to run with Terracotta, please see [2].