File Version: $Id: Resilient_dCache_v1_0.html,v 1.4 2005/02/10 19:42:14 cvs Exp $
| RM |
Resilience Manager, internally a.k.a. Replica
Manager |
| File |
You know what it is, but where is it? When you have multiple copies of the same file in distributed storage,
a "file" becomes an abstract concept that represents a set of data somewhere in
the system. In dCache, a file is identified by its pnfsId and is represented
by its replicas in the pools. Replicas may be copied to other pools or
deleted. |
| Replica |
A copy of the file in a pool. In dCache, a
replica is identified by its pnfsId and pool name. |
| Pool |
Disk space used to store file replicas ;-) |
| Adjustment |
The process of keeping the number of replicas within
the valid range, either by deleting ('reducing') extra ('redundant') replicas or
by creating new replicas of files that have too few. |
| Unique replica |
A replica that can be found only in drainoff or offline
pools and not in any online pool. See more in pool states. |
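The adjustment rule above can be illustrated with a minimal Python sketch of the decision for a single file. It assumes that the replica count passed in covers only replicas in 'available' pools, as the -min/-max option describes. It also assumes that a deficient or redundant file is brought back to the nearest bound of the range. The names `Action` and `adjust` are hypothetical, and this is not the Replica Manager's actual code.

```python
from enum import Enum


class Action(Enum):
    """What the adjuster should do for one file (pnfsId)."""
    REPLICATE = "replicate"
    REDUCE = "reduce"
    NONE = "none"


def adjust(replica_count: int, min_replicas: int = 2,
           max_replicas: int = 3) -> tuple[Action, int]:
    """Compare a file's replica count with [min_replicas, max_replicas].

    Returns the action and how many replicas to create or delete.
    """
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    if replica_count < min_replicas:
        # Too few copies: replicate the deficient file up to the minimum.
        return Action.REPLICATE, min_replicas - replica_count
    if replica_count > max_replicas:
        # Too many copies: reduce redundant replicas down to the maximum.
        return Action.REDUCE, replica_count - max_replicas
    return Action.NONE, 0


if __name__ == "__main__":
    for count in (1, 2, 4):
        action, n = adjust(count)
        print(count, action.value, n)
```

With the default range 2..3, one replica triggers a single replication, two replicas need nothing, and four replicas trigger one reduction.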
| set pool <pool> <state> |
set pool state |
| show pool <pool> |
show pool state |
| ls unique <pool> | Check whether the pool has been drained off, i.e. whether it still holds unique pnfsIds.
Reports the number of such replicas in this pool; zero if there are no locked replicas. |
| exclude <pnfsId> |
exclude <pnfsId> from adjustments |
| release <pnfsId> |
Removes the transaction/'BAD' status from the pnfsId |
| debug true / false |
Enable / disable DEBUG messages in the log file |
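The following minimal Python sketch makes the notion of a unique replica concrete by showing what a check like `ls unique <pool>` could compute. The data layout is a hypothetical assumption, not the Replica Manager's internals. It uses a dict from pnfsId to the set of pools holding a replica, plus a dict of pool states. The function name and the sample pnfsIds and pool names are also hypothetical.

```python
def unique_pnfs_ids(pool: str,
                    replicas: dict[str, set[str]],
                    pool_state: dict[str, str]) -> set[str]:
    """Return pnfsIds held by `pool` that have no replica in any online pool."""
    unique = set()
    for pnfs_id, pools in replicas.items():
        if pool not in pools:
            continue  # this file has no replica in the pool being checked
        if not any(pool_state.get(p) == "online" for p in pools):
            unique.add(pnfs_id)  # only drainoff/offline copies remain
    return unique


if __name__ == "__main__":
    # Hypothetical pnfsIds and pool names, for illustration only.
    replicas = {
        "pnfsA": {"pool1", "pool2"},
        "pnfsB": {"pool1"},
        "pnfsC": {"pool2", "pool3"},
    }
    states = {"pool1": "drainoff", "pool2": "online", "pool3": "offline"}
    print(len(unique_pnfs_ids("pool1", replicas, states)))  # 1 (pnfsB)
    print(len(unique_pnfs_ids("pool3", replicas, states)))  # 0
```

In this sketch, a count of zero means every file in the pool also has a copy in some online pool. Taking such a pool away would therefore not lose the last copy of any file.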
psu create pgroup ResilientPools
psu addto pgroup ResilientPools <myPoolName001>
psu addto pgroup ResilientPools <myPoolName002>
psu addto pgroup ResilientPools <myPoolName003>
Pools included in the resilient pool group can also be included in other pool groups.
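With many pools, the commands above can be generated rather than typed. The small Python sketch below assumes the pool names follow the placeholder myPoolNameNNN pattern used in the example. The function name is hypothetical, and the script only prints the commands for pasting into the admin interface; it does not talk to dCache itself.

```python
def resilient_pgroup_commands(pools: list[str],
                              group: str = "ResilientPools") -> list[str]:
    """Build the psu commands that create `group` and add each pool to it."""
    commands = [f"psu create pgroup {group}"]
    commands += [f"psu addto pgroup {group} {pool}" for pool in pools]
    return commands


if __name__ == "__main__":
    # Hypothetical pool names following the example above.
    pools = [f"myPoolName{i:03d}" for i in range(1, 4)]
    print("\n".join(resilient_pgroup_commands(pools)))
```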
| General |
|
| -min=2 -max=3 |
Valid range for the replica count in 'available'
pools. |
| -debug=false / true |
Disable / enable debug messages in the log file |
| Startup mode |
|
| -hotRestart default |
Speeds up startup: all "known" pools registered in the DB as 'online'
before the crash are expected to reconnect during the hot restart.
Opposite of -coldStart. |
| -coldStart optional |
Use for the first start or after major changes to the pool
configuration. Creates a new pool configuration in the DB. Opposite of -hotRestart. |
| -delayDBStartTO=1200 20 min |
On cold start, the DB init thread sleeps for this time (in seconds) to give the pools a chance to connect. This prevents massive replication caused by replication starting before all pools have connected. |
| -delayAdjStartTO=1260 21 min |
Normally the Adjuster waits for the DB init thread to
finish. If for some abnormal reason it cannot find the DB thread, it
sleeps for this delay instead. It should be slightly more than "delayDBStartTO". |
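The constraints stated for these options can be checked before startup. They are: -min not above -max, -delayAdjStartTO slightly more than -delayDBStartTO, and -hotRestart and -coldStart being opposites. The Python sketch below assumes the options are given as "-name=value" or bare "-name" strings. The helper names are hypothetical, and the defaults are simply the values listed in this table.

```python
def parse_options(args: list[str]) -> dict[str, object]:
    """Turn ["-min=2", "-coldStart", ...] into {"min": "2", "coldStart": True}."""
    opts: dict[str, object] = {}
    for arg in args:
        if not arg.startswith("-"):
            raise ValueError(f"unexpected argument: {arg}")
        key, sep, value = arg[1:].partition("=")
        opts[key] = value if sep else True  # bare flags become True
    return opts


def check_options(opts: dict[str, object]) -> list[str]:
    """Return a list of problems; an empty list means the options look consistent."""
    problems = []
    lo = int(opts.get("min", 2))
    hi = int(opts.get("max", 3))
    if lo > hi:
        problems.append(f"-min={lo} exceeds -max={hi}")
    db_delay = int(opts.get("delayDBStartTO", 1200))
    adj_delay = int(opts.get("delayAdjStartTO", 1260))
    if adj_delay <= db_delay:
        problems.append("-delayAdjStartTO should be slightly more than -delayDBStartTO")
    if "hotRestart" in opts and "coldStart" in opts:
        problems.append("-hotRestart and -coldStart are opposites; give only one")
    return problems


if __name__ == "__main__":
    args = ["-min=2", "-max=3", "-coldStart",
            "-delayDBStartTO=1200", "-delayAdjStartTO=1260"]
    print(check_options(parse_options(args)))  # []
```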
| DB connection |
|
| -dbURL=jdbc:postgresql://dbservernode.domain.edu:5432/replicas |
Configures the host:port where the DB server is running and
the database name. If the DB is on a remote host, you must enable TCP connections to the DB
from your host (see installation instructions). |
| -jdbcDrv=org.postgresql.Driver |
DB driver. Replica Manager has been tested with PostgreSQL
only. |
| -dbUser=myDBUserName |
Configures the DB user name. |
| -dbPass=myDBUserPassword |
Configures the DB user password. |
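The -dbURL value follows the usual PostgreSQL JDBC form jdbc:postgresql://host:port/database. The minimal Python sketch below pulls out the host, port and database name, which can be useful for testing TCP connectivity to the DB server before starting the Replica Manager. The function name is hypothetical. The sketch assumes 5432, PostgreSQL's default port, when the URL gives none.

```python
from urllib.parse import urlparse


def split_jdbc_url(jdbc_url: str) -> tuple[str, int, str]:
    """Return (host, port, database) from a jdbc:postgresql:// URL."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"not a JDBC URL: {jdbc_url}")
    parsed = urlparse(jdbc_url[len(prefix):])
    if parsed.scheme != "postgresql" or not parsed.hostname:
        raise ValueError(f"expected jdbc:postgresql://host[:port]/database, got {jdbc_url}")
    port = parsed.port or 5432  # PostgreSQL's default port
    database = parsed.path.lstrip("/")
    return parsed.hostname, port, database


if __name__ == "__main__":
    url = "jdbc:postgresql://dbservernode.domain.edu:5432/replicas"
    print(split_jdbc_url(url))  # ('dbservernode.domain.edu', 5432, 'replicas')
```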
| Delays |
|
| -maxWorkers=4 |
Number of worker threads used for replication;
the same number of worker threads is used for reduction. Increase it for
larger systems, but avoid values so large that requests get queued in the pools. |
| -waitReplicateTO=43200 12 hours |
Timeout for a pool-to-pool replica copy transfer. |
| -waitReduceTO=43200 12 hours |
Timeout for deleting a replica from a pool. |
| -waitDBUpdateTO=600 10 min |
Adjuster cycle period. If nothing has changed, the Adjuster sleeps
for this time, then restarts the adjustment cycle to query the DB and check whether
there is work to do. |
| -poolWatchDogPeriod=600 10 min |
Pool watchdog polling period. The pools are polled
with this period to detect any pool that has gone down without sending a notice
(message). It cannot be too short, because a heavily loaded pool may
not send pings for some time. It cannot be less than the pool ping period. |
| For "Hybrid" dCache replicaManager ONLY |
|