Articles
12.09.2010
17.2.2010 Pool to pool tuning.
Improved Pool to pool tuning in 1.9.5

dCache supports a "hot-spot replication" feature. The idea is to detect "hot" pools: those that are under heavy use. When the Pool Manager receives a request to read data that is stored only on a hot pool, it copies the data from the hot pool to a less-loaded pool. Once this pool-to-pool transfer completes, the request can be satisfied by the other pool. Since the new replica is on a less-loaded pool, the Pool Manager will choose the freshly created replica in preference to the copy on the heavily loaded pool, so distributing load away from the hot pool.

The issue then is how does the Pool Manager know when a pool is "hot"?
The long-standing algorithm uses a fixed threshold value. The combined cost is compared against a threshold value; if the pool's cost exceeds that threshold (and on-demand replication is enabled) then a read request that would normally be sent to this pool will, instead, trigger a pool-to-pool copy. The cut-off value may be configured in the admin interface using the "set costcuts" command.

 [srm-devel.desy.de] (PoolManager) admin > set costcuts -p2p=0.5 
 costcuts;idle=0.0;p2p=0.5;alert=0.0;halt=0.0;fallback=0.0 
 [srm-devel.desy.de] (PoolManager) admin > save 

The disadvantage of this approach is that, rather than detecting pools that are serving "many more" read requests than their fellow pools (and so are "hot"), the algorithm selects those pools whose cost is greater than some threshold value. The threshold must be chosen carefully so that it selects only pools that are hot; should the dCache system change, a different threshold may become more appropriate. This dependency on the dCache instance means that, to achieve good hot-spot replication, a site admin must continually tune the threshold to match circumstances.
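
The weakness of a fixed threshold can be illustrated with a minimal sketch. This is not dCache code: the function name, pool names and cost values are all hypothetical, and the only thing assumed from the text is that a pool counts as hot when its cost exceeds the configured cut-off.

```python
def hot_pools_fixed(costs, threshold):
    """Return the names of pools whose cost exceeds a fixed threshold."""
    return sorted(name for name, cost in costs.items() if cost > threshold)


# Hypothetical pool costs for a lightly loaded and a heavily loaded instance.
quiet = {"pool1": 0.10, "pool2": 0.15, "pool3": 0.40}
busy = {"pool1": 0.60, "pool2": 0.65, "pool3": 0.90}

# With the same cut-off of 0.5, the quiet instance selects no pool at all,
# even though pool3 carries far more load than its fellows ...
print(hot_pools_fixed(quiet, 0.5))
# ... while the busy instance selects every pool, hot or not.
print(hot_pools_fixed(busy, 0.5))
```

In both cases pool3 is the pool that stands out, but a single constant cannot express that across the two load levels.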


With dCache v1.9.5-nn there is a new, adaptive algorithm for triggering hot-spot replication. Instead of using a constant value as the cut-off cost for triggering pool-to-pool replication, a percentile cost is used; for example, specifying the fiftieth percentile sets the cut-off value to the median pool cost. The value is calculated dynamically, taking into account new pool costs as they are received from the pools.


To use the new algorithm, you must configure the cut-off cost to be some number with the percentage symbol as a suffix; for example, to use the median value, specify "50%". The "%" at the end of the number indicates that the new algorithm should be used.
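
As a rough illustration of the rule above, the following sketch classifies a cut-off string by its suffix. It is not taken from dCache: the function name, the returned tuple and the accepted range of 0 to 100 are assumptions made for the example.

```python
def parse_p2p_cut(value):
    """Classify a cut-off string as a fixed cost or a percentile.

    Hypothetical helper: a trailing "%" selects the adaptive (percentile)
    algorithm, anything else is treated as a fixed cost threshold.
    """
    text = value.strip()
    if text.endswith("%"):
        percent = float(text[:-1])
        # The permitted range is an assumption made for this sketch.
        if not 0.0 <= percent <= 100.0:
            raise ValueError("percentile must lie between 0 and 100")
        return ("percentile", percent)
    return ("fixed", float(text))


print(parse_p2p_cut("50%"))  # median pool cost is used as the cut-off
print(parse_p2p_cut("0.5"))  # long-standing fixed threshold
```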


Specifying the ninety-fifth percentile (configured as "95%") means that the cost cut-off for hot-spot replication is the ninety-fifth percentile cost: the cost of the pool that is 95% of the way along a list of dCache pools sorted in ascending order of cost. With "95%" as the cut-off, (roughly) 95% of pools will have a cost below the cut-off value, and read requests to those pools will not trigger pool-to-pool transfers.

 [srm-devel.desy.de] (PoolManager) admin > set costcuts -p2p=95% 
 costcuts;idle=0.0;p2p=95.0%;alert=0.0;halt=0.0;fallback=0.0 
 [srm-devel.desy.de] (PoolManager) admin > save 
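
The percentile cut-off described above can be sketched as follows. The nearest-rank convention used here is an assumption, since the text does not say how dCache picks the pool "95% along" the sorted list; the function name and the pool costs are hypothetical as well.

```python
import math


def percentile_cut(costs, percent):
    """Return the cost of the pool `percent` of the way along the sorted costs."""
    ordered = sorted(costs)
    if not ordered:
        raise ValueError("at least one pool cost is required")
    # Nearest-rank convention (an assumption): 1-based rank ceil(percent * n / 100).
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Twenty hypothetical pools with costs 0.01, 0.02, ..., 0.20.
costs = [i / 100 for i in range(1, 21)]

cut = percentile_cut(costs, 95)
hot = [c for c in costs if c > cut]

print(cut)  # cost of the 19th pool in ascending order
print(hot)  # only the single most loaded pool lies above the cut-off
```

Because the cut-off is recomputed from the current costs, scaling every cost up or down moves the cut-off with it, and the number of pools above it stays the same.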


What this means is that the adaptive algorithm triggers p2p replication for a fixed number of pools, rather than for a fixed cost. As the distribution of load across the pools changes, read requests destined for a particular pool may or may not trigger replication; however, at any one time, read requests that target pools from a fixed-size list will trigger replication. For the above "95%" example, read requests to the most heavily loaded 5% of pools will trigger pool-to-pool replication.


This approach works because the likelihood of a read request triggering replication depends on how likely it is that the request lands on a "hot" pool. The number of pools that trigger replication, as a percentage of all pools, is fixed. If the requests are evenly spread over all available pools then the likelihood of a read request triggering replication is simply (100% - cut-off). For the "95%" example, read requests involving 5% of pools will trigger replication; if the requests are evenly spread then the likelihood of a read request triggering a replication is 5%.


However, if the requests are somehow correlated then the likelihood that a read request will involve a hot pool will increase. For example, if all the files from some experiment's dataset are stored on the same pool then jobs in the batch system that are processing the data will introduce a correlation; the likelihood of a read request using this particular pool will increase. If a batch farm is processing jobs that read files from the same dataset, and that dataset is stored exclusively on a single pool, then the likelihood of read requests involving a "hot" pool (so triggering replication) will be high. If there is no other activity in the storage element, it can be 100%.
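
The two cases above, evenly spread and fully correlated requests, can be compared with a small calculation. This is only a sketch: the function, the twenty pools and the request counts are hypothetical, and the hot set is simply taken as given rather than derived from pool costs.

```python
def replication_probability(request_share, hot_pools):
    """Fraction of read requests that land on one of the hot pools."""
    total = sum(request_share.values())
    on_hot = sum(share for pool, share in request_share.items() if pool in hot_pools)
    return on_hot / total


pools = [f"pool{i}" for i in range(1, 21)]
hot = {"pool20"}  # with a "95%" cut-off, 1 pool in 20 is hot

# Requests spread evenly over all pools.
uniform = {pool: 1 for pool in pools}

# All requests read one dataset that is stored only on pool20.
correlated = {pool: 0 for pool in pools}
correlated["pool20"] = 1

print(replication_probability(uniform, hot))     # 0.05
print(replication_probability(correlated, hot))  # 1.0
```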


After a file has been replicated, the likelihood that a read request for that file will involve one of the fixed number of hot pools decreases. Depending on demand, additional replicas may be made until there is a balance between the number of replicas and the load.