Abstract
The dCache Book is the guide for administrators of dCache systems. The first part describes the installation of a simple single-host dCache instance. The second part describes the components of dCache and in what ways they can be configured. This is the place for finding information about the role and functionality of components in dCache as needed by an administrator. The third part contains solutions for several problems and tasks which might occur during the operation of a dCache system. Finally, the last two parts contain a glossary and a parameter and command reference.
This part is intended for people who are new to dCache. It gives an introduction to dCache, including how to configure a simple setup, and details some simple and routine administrative operations.
dCache is a distributed storage solution. It organises storage across computers so the combined storage can be used without the end-users being aware of precisely which computer their data is stored on; end-users simply see a large amount of storage.
Because end-users need not know on which computer their data is stored, their data can be migrated from one computer to another without any interruption of service. This allows dCache storage computers to be taken out of service or additional machines (with additional storage) to be added without interrupting the service the end-users enjoy.
dCache supports requesting data from a tertiary storage system. A tertiary storage system typically uses a robotic tape system, where data is stored on a tape from a library of available tapes, which must be loaded and unloaded using a tape robot. Tertiary storage systems typically have a higher initial cost, but can be extended cheaply by adding additional tapes. This results in tertiary storage systems being popular where large amounts of data must be read.
dCache also provides many transfer protocols (allowing users to read and write data). These have a modular deployment, allowing dCache to support expanded capacity by providing additional front-end machines.
Another performance feature of dCache is hot-spot data migration. In this process, dCache will detect when a few files are being requested very often. If this happens, dCache can make duplicate copies of the popular files on other computers. This allows the load to be spread across multiple machines, so increasing throughput.
The flow of data within dCache can also be carefully controlled. This is especially important for large sites as chaotic movement of data may lead to suboptimal usage; instead, incoming and outgoing data can be marshaled so they use designated resources, allowing better throughput and guaranteeing end-user experience.
dCache provides a comprehensive administrative interface for configuring the dCache instance. This is described in the later sections of this book.
The layer model shown in Figure 1.1, “The dCache Layer Model” gives an overview of the architecture of the dCache system.
The first section describes the installation of a fresh dCache instance using RPM files downloaded from the dCache homepage. A guide to upgrading an existing installation follows. In both cases we assume standard requirements of a small to medium sized dCache instance without an attached tertiary storage system. The third section contains some pointers on extended features.
In the following the installation of a central admin node of a
dCache instance and of an arbitrary number of dCache nodes
will be described. These nodes may each contain several
dCache pools and
optionally one SRM, one GridFTP door, and/or one GSIdCap
door. On the admin node, a pnfs server and several central
dCache components are installed. The pnfs server, some
central components, and each SRM need a PostgreSQL server
installed locally on the node. The first section describes the
configuration of a PostgreSQL server. After that the
installation of the pnfs server and of the dCache
components will follow. During the whole installation process
root access is required.
In order to install dCache the following requirements must be met:
An RPM-based Linux distribution is required.
dCache 1.8 requires Java 1.5 or 1.6 SDK. Previous releases of dCache can use Java 1.4.2. It is recommended to use the newest Java release available within the release series used.
PostgreSQL must be installed and running. See the section called “Installing a PostgreSQL Server” for more details. It is strongly recommended to use version 8 or higher.
The RPM packages may be installed right away on each node, for example using the command:
[root] # rpm -ivh dcache-server-<version>-<release>.i386.rpm
[root] # rpm -ivh dcache-client-<version>-<release>.i386.rpm
The pnfs server software on the admin node can be installed
with the command:
[root] # rpm -ivh pnfs-postgresql-<version>-<release>.i386.rpm
You must configure PostgreSQL for use by dCache and create the necessary PostgreSQL user accounts and database structure. This section describes how to do this.
Using a PostgreSQL server with dCache places a number of requirements on the database. This section describes what configuration is necessary to ensure PostgreSQL operates so dCache can use it.
If you have edited PostgreSQL configuration files, you must restart PostgreSQL for those changes to take effect. On many systems, this can be done with the following command:
[root] # /etc/init.d/postgresql restart
When connecting to PostgreSQL, dCache will always use TCP
connections. So, for dCache to use PostgreSQL, support for
TCP sockets must be enabled.
In contrast to dCache, the PostgreSQL stand-alone client
application psql can connect using
either a TCP socket or via the local filesystem (a
“UNIX” socket). Because of this, it is
common for PostgreSQL to disable TCP sockets by default,
requiring the admin to explicitly configure PostgreSQL so
connecting via a TCP socket is supported.
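To enable TCP sockets, edit the PostgreSQL configuration
file postgresql.conf. This is often
found in /var/lib/pgsql/data, but may
be located elsewhere. In PostgreSQL 7.x, ensure that
tcpip_socket is set to true. (PostgreSQL 8.0 and later
replace this option with listen_addresses, which by default
accepts TCP connections from localhost.) For example: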
tcpip_socket = true
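A quick way to confirm the setting before restarting PostgreSQL is to search the configuration file for an uncommented tcpip_socket line. The following is a minimal sketch, not part of dCache or PostgreSQL: the function name tcpip_socket_enabled is hypothetical, and it only recognises the spelling "true" used in the example above.

```shell
# tcpip_socket_enabled FILE
# Hypothetical helper: succeeds (exit status 0) if FILE contains an
# uncommented line that sets tcpip_socket to true.
# Only the spelling "true" is checked; other boolean spellings are not.
tcpip_socket_enabled() {
    grep -Eq '^[[:space:]]*tcpip_socket[[:space:]]*=[[:space:]]*true' "$1"
}

# Possible use (the path is the common default and may differ on your system):
# tcpip_socket_enabled /var/lib/pgsql/data/postgresql.conf || echo "TCP sockets are not enabled"
```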
Perhaps the simplest configuration is to allow password-less access to the database and the following documentation assumes this is so.
To allow local users to access PostgreSQL without requiring a
password, ensure the file
pg_hba.conf, usually located in
/var/lib/pgsql/data,
contains the following lines.
local   all         all                        trust
host    all         all         127.0.0.1/32   trust
host    all         all         ::1/128        trust
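Whether all three rules are present can be checked with a few lines of awk. This is only an illustrative sketch: the function name has_trust_rules is hypothetical, and it looks for exactly the three rules listed above, ignoring column spacing and commented-out lines.

```shell
# has_trust_rules FILE
# Hypothetical helper: succeeds if FILE contains the local, IPv4 and IPv6
# "trust" rules for all databases and all users.
has_trust_rules() {
    awk '
        $1 == "local" && $2 == "all" && $3 == "all" && $4 == "trust" { l = 1 }
        $1 == "host" && $2 == "all" && $3 == "all" && $4 == "127.0.0.1/32" && $5 == "trust" { v4 = 1 }
        $1 == "host" && $2 == "all" && $3 == "all" && $4 == "::1/128" && $5 == "trust" { v6 = 1 }
        END { exit ((l && v4 && v6) ? 0 : 1) }
    ' "$1"
}

# Possible use (default location; adjust as needed):
# has_trust_rules /var/lib/pgsql/data/pg_hba.conf || echo "trust rules are incomplete"
```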
Please note it is also possible to run dCache with all PostgreSQL accounts requiring passwords.
Prepare the PostgreSQL users and databases as they are
needed for the components of dCache and/or the pnfs
server: The pnfs server only needs a database user. We
suggest calling it pnfsserver. Create it with:
[root] # createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt pnfsserver
Several databases will be created by this user. At initial
installation, as described below, two databases will be
created: admin and
data1. These databases will contain the
information about the namespace of the pnfs filesystem. If
the information in these databases is lost, all data
in the dCache instance becomes
inaccessible. Therefore, make sure these databases are backed up
regularly and also stored on appropriately reliable
hardware. Further advice may be found in Chapter 21, PostgreSQL and dCache.
The dCache components will access the database server with the user srmdcache, which can be created with the createuser command; for example:
[root] # createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt srmdcache
Several central components running on the admin node as well
as each SRM will use the database
dcache for state information:
[root] # createdb -U srmdcache dcache
There might be several of these on several hosts. Each is used by the dCache components running on the respective host.
The pnfs companion uses the database
companion to store, for each file, the pools
on which it is located. On the admin node create and initialize it
with the commands:
[root] # createdb -U srmdcache companion
[root] # psql -U srmdcache companion -f /opt/d-cache/etc/psql_install_companion.sql
(It has to be located on the same host as the pnfs
server.)
If the resilience feature provided by the replica manager is used, the database “replicas” has to be prepared on the admin node with the command:
[root] # createdb -U srmdcache replicas
[root] # psql -U srmdcache replicas -f /opt/d-cache/etc/psql_install_replicas.sql
Note that the usable disk space will be at least halved if the replica manager is used.
If the billing information should also be stored in a database (in addition to files) the database billing has to be created:
[root] # createdb -U srmdcache billing
However, we strongly advise against using the same database
server for the pnfs server and the billing information.
For how to configure the billing cell to write into this
database, see below.
The pnfs server software is installed in the directory
/opt/pnfs/. For the
installation copy the file
/opt/pnfs/etc/pnfs_config.template to
/opt/pnfs/etc/pnfs_config. The default
should be suitable for most installations. It contains:
PNFS_INSTALL_DIR = /opt/pnfs
PNFS_ROOT = /pnfs
PNFS_DB = /opt/pnfsdb
PNFS_LOG = /var/log
PNFS_OVERWRITE = no
PNFS_PSQL_USER = pnfsserver
Next run
/opt/pnfs/install/pnfs-install.sh. This
will write the central configuration file
/usr/etc/pnfsSetup and initialize the
databases in the PostgreSQL server as well as configuration
information below /opt/pnfsdb/ (as configured by
PNFS_DB in
/opt/pnfs/etc/pnfs_config). For
example:
[root] # /opt/pnfs/install/pnfs-install.sh
PNFS_PSQL_USER = pnfsserver
Checking nfs servers : Ok
Preparing setup : Ok
Creating database admin
Creating database data1
Starting pnfs server ... Ok
Trying to talk to dbserver 0 [1122] ... Ok
Trying to talk to dbserver 1 [1122] ... Ok
Trying to mount 'pnfs' : Ok
Correcting pnfs permissions : Ok
Detecting wormhole target (config) : 0000000000000000000010E0
Digging wormholes : dig-0-ok dig-1-ok Done
Creating database link : Ok
Setting mount permissions to world : Ok
Remarks :
ii) Any host may now mount this pnfs server
mount -o intr,rw,noac,hard <thisServerName>:/pnfs /<mountdir>
Installation of PNFS completed - stop PNFS
Stopping Heartbeat .... Ready
Killing pnfsd Done
Killing pmountd Done
Killing dbserver . Done
Removing 8 Clients 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+
Removing 8 Servers 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+
Removing main switchboard ... O.K.
The pnfs server may now be started with
/opt/pnfs/bin/pnfs start, for example:
[root] # /opt/pnfs/bin/pnfs start
Starting dcache services: Shmcom : Installed 8 Clients and 8 Servers
Starting database server for admin (/opt/pnfsdb/pnfs/databases/admin) ... O.K.
Starting database server for data1 (/opt/pnfsdb/pnfs/databases/data1) ... O.K.
Waiting for dbservers to register ... Ready
Starting Mountd : pmountd
Starting nfsd : pnfsd
This script may be linked from /etc/init.d/. Please do not copy
the init script as it may change between releases. On Red Hat
derived Linux distributions pnfs may be configured to
start at boot time with
[root] # chkconfig --add pnfs
[root] # chkconfig pnfs on
It is advisable to create a basic directory structure in the
pnfs namespace where each directory uses a separate pnfs
database. This is still true for the PostgreSQL version of
pnfs since the pnfs server uses global locks on each
database. The section called “The Databases of pnfs” describes how it is
done. WLCG sites should use at least two databases for each VO
they support: one for the directory
/pnfs/<domainName>/data/<voName>/
and one for the generated files, i.e. for
/pnfs/<domainName>/data/<voName>/generated/.
Use the templates of the configuration files found in
/opt/d-cache/etc/ to
create the following files.
The central configuration file of a dCache instance is
/opt/d-cache/config/dCacheSetup. For most
installations it is only necessary to set the variable
java to the binary of the Java VM and the
variable serviceLocatorHost to the hostname
of the admin node. Note that the file has to go into the
subdirectory config/
even though the template is found in etc/.
The installation and start-up scripts use the information in
/opt/d-cache/etc/node_config. The
variable NODE_TYPE controls whether the
admin node should be installed or just pools and/or
doors. Accordingly set it to “admin” or
“pool” (for doors, as well). All other variables
may be left at their default value.
For authorization of grid users the file
/opt/d-cache/etc/dcache.kpwd is
needed. Note that it may be generated from the standard
/etc/grid-security/grid-mapfile with the
tool grid-mapfile2dcache-kpwd which is
distributed with the WLCG software.
How to proceed from here depends on whether the release to be installed is older than 1.8.0-14 or not. In particular init scripts and pool creation procedures have changed in dCache 1.8.0-14. Both procedures are described in the following sections.
Once dCache is installed, the section called “Using dCache as an LCG Storage Element” may be consulted about how to activate the info provider included since version 1.6.6-2 based on an earlier WLCG installation.
Whether and how many pools should be installed on the current
node is configured by
/opt/d-cache/etc/pool_path. Each line in
this file describes one pool. The format is as follows:
<poolDataDirectory> <poolSizeInGB> <reinstallPoolYesNo>
where <poolDataDirectory> is the full path to the directory which will contain the data files as well as some of the configuration of the pool, <poolSizeInGB> is the size of the pool. Make sure that there is always enough space under <poolDataDirectory>. Be aware that only pure data content is counted by dCache. Leave enough room for configuration files and filesystem overhead.
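Since a malformed line in pool_path is easy to overlook, it can help to validate the file before running the install script. The sketch below is not part of dCache; the function name check_pool_path is hypothetical, and it assumes that the size is a whole number of gigabytes and that the third field is spelled "yes" or "no" in any capitalisation.

```shell
# check_pool_path FILE
# Hypothetical helper: verifies that every non-empty line has three fields:
# an absolute directory path, an integer size in GB, and yes/no.
# Prints each malformed line and returns non-zero if any was found.
check_pool_path() {
    awk '
        NF == 0 { next }    # skip blank lines
        NF != 3 || $1 !~ /^\// || $2 !~ /^[0-9]+$/ || tolower($3) !~ /^(yes|no)$/ {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Possible use:
# check_pool_path /opt/d-cache/etc/pool_path && echo "pool_path looks well-formed"
```

Under these assumptions, a line such as "/q/pool1 500 no" passes the check, while a relative path or a non-numeric size is reported.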
When all configuration files are prepared, configure the system with
[root] # /opt/d-cache/install/install.sh
[INFO] No 'SERVER_ID' set in 'node_config'. Using SERVER_ID=<your.domain>.
[INFO] Moving /opt/d-cache/bin/dcache-opt out of the way, because it is obsolete.
[INFO] Creating link /pnfs/ftpBase --> /pnfs/fs which is used by the GridFTP door.
[INFO] Creating link /pnfs/<your.domain> --> /pnfs/fs/usr/
[INFO] Checking on a possibly existing dCache/PNFS configuration ...
[INFO] Configuring pnfs export '/pnfsdoors' (needed from version 1.6.6 on)
mountable by world.
[INFO] You may restrict access to this export to the GridFTP doors which
are not on the admin node. See the documentation.
[INFO] Generating ssh keys:
Generating public/private rsa1 key pair.
Your identification has been saved in ./server_key.
Your public key has been saved in ./server_key.pub.
The key fingerprint is:
e3:4a:13:d5:33:45:e0:cd:69:a3:fb:d7:a8:64:df:73 root@grid-se3.desy.de
[INFO] Creating Pool <hostname>_1
[INFO] Creating Pool <hostname>_2
and start the central components (only on the admin node) with
[root] # /opt/d-cache/bin/dcache-core start
Starting dcache services:
Starting lmDomain 6 5 4 3 2 1 0 Done (pid=6802)
Starting dCacheDomain 6 5 4 3 2 1 0 Done (pid=6875)
Starting dirDomain 6 5 4 3 2 1 0 Done (pid=6973)
Starting doorDomain 6 5 4 3 2 1 0 Done (pid=7058)
Starting adminDoorDomain 6 5 4 3 2 1 0 Done (pid=7144)
Starting httpdDomain 6 5 4 3 2 1 0 Done (pid=7234)
Starting utilityDomain 6 5 4 3 2 1 0 Done (pid=7330)
Starting pnfsDomain 6 5 4 3 2 1 0 Done (pid=7436)
Starting gridftp-clintonDomain 6 5 4 3 2 1 0 Done (pid=7569)
Starting gsidcap-clintonDomain 6 5 4 3 2 1 0 Done (pid=7672)
Starting srm-clintonDomain 6 5 4 3 2 1 0 Done (pid=7777)
The configured pools are started with
[root] # /opt/d-cache/bin/dcache-pool start
Starting dcache pool: Starting clintonDomain 6 5 4 3 2 1 0 Done (pid=7990)
These scripts may be linked from /etc/init.d/. Please do not
copy the init scripts as they may change between
releases. On Red Hat derived Linux distributions dCache
may be configured to start at boot-time using
chkconfig, for example:
[root] # chkconfig --add dcache-core
[root] # chkconfig --add dcache-pool
[root] # chkconfig dcache-core on
[root] # chkconfig dcache-pool on
A new init script was introduced in release 1.8.0-14. In addition to being able to start and stop dCache, it provides commands for creating and configuring pools. Thus pool creation is no longer part of the install script and pools can be created after the install script has been executed. We therefore proceed by finalising the initial configuration by executing /opt/d-cache/install/install.sh, for example:
[root] # /opt/d-cache/install/install.sh
INFO:Skipping ssh key generation
Checking MasterSetup ./config/dCacheSetup O.k.
Sanning dCache batch files
Processing adminDoor
Processing chimera
Processing dCache
Processing dir
Processing door
Processing gPlazma
Processing gridftpdoor
Processing gsidcapdoor
Processing httpd
Processing info
Processing infoProvider
Processing lm
Processing maintenance
Processing pnfs
Processing pool
Processing replica
Processing srm
Processing statistics
Processing utility
Processing xrootdDoor
Checking Users database .... Ok
Checking Security .... Ok
Checking JVM ........ Ok
Checking Cells ...... Ok
dCacheVersion ....... Version production-1-8-0-14
No pools have been created on the node yet. Adding pools to a node is a two step process:
The directory layout of the pool is created and filled with a skeleton configuration using dcache pool create <poolSize> <poolDirectory>, where <poolDirectory> is the full path to the directory which will contain the data files as well as some of the configuration of the pool, and <poolSize> is the size of the pool, specified in bytes or with a M, G, or T suffix (for mebibytes, gibibytes and tebibytes, respectively).
Make sure that there is always enough space under <poolDirectory>. Be aware that only pure data content is counted by dCache. Leave enough room for configuration files and filesystem overhead.
Creating a pool does not modify the dCache configuration.
The pool is given a unique name and added to the dCache configuration using dcache pool add <poolName> <poolDirectory>, where <poolDirectory> is the directory in which the pool was created and <poolName> is a name for the pool. The name must be unique throughout the whole dCache installation, not just on the node.
Adding a pool to a configuration does not modify the pool or the data in it and can thus safely be undone or repeated.
An example may help to clarify the use of these commands:
[root] # /opt/d-cache/bin/dcache pool create 500G /q/pool1
Created a 500 GiB pool in /q/pool1. The pool cannot be used until it has
been added to a domain. Use 'pool add' to do so.
Please note that this script does not set the owner of the pool directory.
You may need to adjust it.
[root] # /opt/d-cache/bin/dcache pool add myFirstPool /q/pool1/
Added pool myFirstPool in /q/pool1 to dcache-vmDomain.
The pool will not be operational until the domain has been started. Use
'start dcache-vmDomain' to start the pool domain.
[user] $ /opt/d-cache/bin/dcache pool ls
Pool         Domain           Size  Free  Path
myFirstPool  dcache-vmDomain   500   550  /q/pool1
Disk space is measured in GiB.
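Because the M, G and T suffixes denote binary units, the number of bytes a pool size stands for is larger than a decimal reading would suggest. The following sketch shows the arithmetic only; pool_size_to_bytes is a hypothetical helper, not the parser used by the dcache script.

```shell
# pool_size_to_bytes SIZE
# Hypothetical helper: converts a size such as "500G", "2T", "512M" or a
# plain byte count into bytes, using binary multiples (1 G = 1024^3 bytes).
pool_size_to_bytes() {
    num=${1%[MGT]}    # strip a trailing M, G or T, if any
    case $1 in
        *M) echo $((num * 1024 * 1024)) ;;
        *G) echo $((num * 1024 * 1024 * 1024)) ;;
        *T) echo $((num * 1024 * 1024 * 1024 * 1024)) ;;
        *)  echo "$1" ;;
    esac
}

pool_size_to_bytes 500G    # prints 536870912000
```

Comparing this figure with the free space reported for the file system is one way to make sure room is left for configuration files and file system overhead.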
All configured components can now be started with dcache start, for example:
[root] # /opt/d-cache/bin/dcache start
Starting lmDomain Done (pid=7514)
Starting dCacheDomain Done (pid=7574)
Starting pnfsDomain Done (pid=7647)
Starting dirDomain Done (pid=7709)
Starting adminDomain Done (pid=7791)
Starting httpdDomain Done (pid=7849)
Starting utilityDomain Done (pid=7925)
Starting gPlazma-dcache-vmDomain Done (pid=8002)
Starting infoProviderDomain Done (pid=8081)
Starting dcap-dcache-vmDomain Done (pid=8154)
Starting gridftp-dcache-vmDomain Done (pid=8221)
Starting gsidcap-dcache-vmDomain Done (pid=8296)
Starting dcache-vmDomain Done (pid=8369)
Upgrading to bugfix releases within one version (e.g. from 1.6.6-1 to 1.6.6-3) may be done by shutting down the server and upgrading the packages with
[root] # rpm -Uvh <packageName>
For details on the changes, please refer to the change log.
This section describes the upgrade of dCache instances installed with the previous version (currently version 1.6.5) and also some earlier versions - notably the one distributed with the previous or current WLCG software. The first section will give a quick upgrade guide. It might not be applicable to complex setups. Not all features of the new version will be enabled after the quick upgrade guide. The next section will give pointers on how to enable them.
Note that the upgrade of dCache is independent of a
conversion of the pnfs database from GDBM to PostgreSQL. The
conversion and upgrade to the PostgreSQL version of pnfs may
be performed any time before or after the upgrade of
dCache. Do not perform the dCache upgrade and the pnfs
database conversion simultaneously. It is better to do them one
after the other, and test the system in between. Do not be
misled by the fact that the dCache release only contains the
PostgreSQL version of pnfs (since version 1.6.6). See Chapter 19, Moving the pnfs Data from GDBM to PostgreSQL for a guide to converting and upgrading
the pnfs server.
In case you are already using PostgreSQL (e.g. for the SRM),
it is a good idea to upgrade to version 8 now, because prior to
dCache Version 1.6.6 no precious data is stored and therefore
can be wiped off, allowing a PostgreSQL 8 installation from
scratch. Starting from dCache version 1.6.6 PostgreSQL will
be utilized more heavily, making migration a complex
task. Another advantage of PostgreSQL 8 is an integrated
mechanism for automatic backups.
Stop the dCache services on all nodes of the instance:
[root] # /opt/d-cache/bin/dcache-pool stop
[root] # /opt/d-cache/bin/dcache-opt stop
[root] # /opt/d-cache/bin/dcache-core stop
Leave the pnfs server running. In WLCG installations, there
might be a “meta-package” installed which can
prevent the update to the current version. It should be
uninstalled. The following command will do that and does no
harm if the meta-package is not installed. Therefore, go ahead
and run it anyway:
[root] # rpm -e lcg-SE_dcache
Upgrade the dCache RPM packages with
[root] # rpm -Uvh dcache-server-1.6.6-1.i386.rpm dcache-client-1.6.6-1.i386.rpm
For this quick upgrade you have to keep your old configuration
files (i.e. config/dCacheSetup,
config/PoolManager.conf,
etc/node_config,
etc/door_config, and
etc/pool_path). Do not use the
templates.
However, make sure that etc/node_config
contains
PNFS_OVERWRITE=no
and that a single value is assigned to
NODE_TYPE. Check that
etc/pool_path contains
“no” in the third field of
each line. You might also want to double-check the contents of
etc/door_config on each node.
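The checks described above can be run in one go before starting the install script. The following is a minimal sketch under stated assumptions: check_upgrade_config is a hypothetical helper, it expects node_config and pool_path in the directory passed as its argument, and it treats NODE_TYPE as valid only if its value is a single unquoted word.

```shell
# check_upgrade_config ETC_DIR
# Hypothetical helper: returns non-zero if ETC_DIR/node_config or
# ETC_DIR/pool_path does not satisfy the conditions for the quick upgrade.
check_upgrade_config() {
    etc=$1
    status=0
    grep -Eq '^[[:space:]]*PNFS_OVERWRITE[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$etc/node_config" \
        || { echo "PNFS_OVERWRITE is not set to no"; status=1; }
    grep -Eq '^[[:space:]]*NODE_TYPE[[:space:]]*=[[:space:]]*[^[:space:]]+[[:space:]]*$' "$etc/node_config" \
        || { echo "NODE_TYPE is missing or has several words"; status=1; }
    # Every non-empty pool_path line must have "no" in the third field.
    awk 'NF && tolower($3) != "no" { bad = 1 } END { exit bad + 0 }' "$etc/pool_path" \
        || { echo "pool_path has a line without no in the third field"; status=1; }
    return $status
}

# Possible use:
# check_upgrade_config /opt/d-cache/etc && /opt/d-cache/install/install.sh
```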
Run the install script:
[root] # /opt/d-cache/install/install.sh
And start the server again with
[root] # /opt/d-cache/bin/dcache-core start
[root] # /opt/d-cache/bin/dcache-pool start
Note that the start-up script for the optional components is not needed anymore. Therefore, it is probably best to remove it:
[root] # rm /opt/d-cache/bin/dcache-opt /etc/init.d/dcache-opt
This section gives a few hints for solving problems and fine-tuning after the upgrade.
Check if the information given in the files
/opt/d-cache/etc/node_config and
/opt/d-cache/etc/door_config is correct:
Check that a single value is assigned to
NODE_TYPE in
/opt/d-cache/etc/node_config. If the
assignment contains several words, the behaviour of some
previous versions might be different from the new one.
Check that the doors which are started in the (now obsolete)
/opt/d-cache/bin/dcache-opt start-up
script are also enabled in
/opt/d-cache/etc/door_config. The latter
file is now evaluated by the start-up script
/opt/d-cache/bin/dcache-core and not by
the install script any more. This might lead to a different
behaviour.
If, prior to the upgrade, you changed anything in a batch file
(config/<domainName>.batch)
these changes will be moved to files with names
config/<domainName>.batch.rpmsave.
There have been major changes to the batch files. Therefore it
is necessary to reapply your changes. However, keep in mind
that the batch files are considered to be part of the software
and not configuration files.
It should not be necessary to change them in most
situations. Try to find a proper configuration variable in
config/dCacheSetup. (See the template in
etc/dCacheSetup.template for hints.) If
it should still be necessary to change a batch file, contact
<support@dcache.org> and report a “request
for enhancement”. (See Chapter 5, The Cell Package
for background information on the batch files.)
Your old config/PoolManager.conf will not
be overwritten by the upgrade. Its format did not
change. Therefore, it is fine to keep your old one. In case
you did not customize the pool manager configuration, make
sure that the set costcuts line reads
set costcuts -idle=0.0 -p2p=2.0 -alert=0.0 -halt=0.0 -fallback=0.0
Prior versions installed a
config/PoolManager.conf with
-idle=1.0 which will lead to undesired
behaviour of the pool manager.
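Whether an existing PoolManager.conf still carries the old idle value can be checked with grep. This is an illustrative sketch; idle_cut_is_zero is a hypothetical helper, and it only looks for a set costcuts line containing -idle=0.0.

```shell
# idle_cut_is_zero FILE
# Hypothetical helper: succeeds if FILE has a "set costcuts" line
# whose -idle option is 0.0.
idle_cut_is_zero() {
    grep -Eq '^[[:space:]]*set[[:space:]]+costcuts[[:space:]].*-idle=0\.0([[:space:]]|$)' "$1"
}

# Possible use:
# idle_cut_is_zero /opt/d-cache/config/PoolManager.conf || echo "check the set costcuts line"
```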
Before switching on the companion in
config/dCacheSetup with the line
cacheInfo=companion
you have to be aware of the following:
For each file the list of pools the file is stored on (the
cache info) is now stored within the pnfs namespace
metadata. When switching on the companion, the dCache
system expects it to be stored in a PostgreSQL database. You
should first create this database:
[root] # createdb -U srmdcache companion
[root] # psql -U srmdcache companion -f /opt/d-cache/etc/psql_install_companion.sql
on the node where the PnfsManager will run (normally the admin node). Now, put
cacheInfo=companion
into config/dCacheSetup and restart the
PnfsManager:
[root] # /opt/d-cache/bin/dcache-core restart
Now the dCache system will not be aware of any files stored on the pools. To make it aware again, each pool has to register its files with the PnfsManager as described in the following steps. Since this will take a while and will put a considerable load on the PnfsManager, take care that this is done with one pool at a time. You should also plan for a downtime.
In the admin interface (see the section called “The Admin Interface”) go to a pool, e.g.
(local) admin > cd <hostname>_1
and issue the command
(<poolname>) admin > pnfs register
Then go to the pnfs manager:
(<poolname>) admin > ..
(local) admin > cd PnfsManager
Check the output of the “info” command repeatedly:
(PnfsManager) admin > info
...
Threads (4) Queue
[0] 10
[1] 12
[2] 9
[3] 13
...
and wait until the value for all four queues is zero. Then go to the next pool and repeat the process.
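If the output of the info command has been saved as text, the "all queues are zero" condition can be tested mechanically. The sketch below assumes the queue lines look exactly like the "[n] <length>" lines shown above; queues_drained is a hypothetical helper, and how the output is captured from the admin interface is left open.

```shell
# queues_drained
# Hypothetical helper: reads PnfsManager "info" output on standard input and
# succeeds only if at least one "[n] <length>" line is present and every
# such line reports a queue length of 0.
queues_drained() {
    awk '
        NF == 2 && $1 ~ /^\[[0-9]+\]$/ && $2 ~ /^[0-9]+$/ {
            seen = 1
            if ($2 != 0) busy = 1
        }
        END { exit ((seen && !busy) ? 0 : 1) }
    '
}

# Possible use, with the info output stored in a file of your choosing:
# queues_drained < info-output.txt && echo "safe to continue with the next pool"
```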
Some features of dCache are switched off by default after an installation. The following describes how to put them to use:
The billing information which is normally written to /opt/d-cache/billing/ on the
admin node will also be written to a database if
config/dCacheSetup contains
billingToDb=yes
A PostgreSQL server is expected to run on the admin node with
a database user “srmdcache”
and a database “billing”, which can be created with
[root] # createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt srmdcache
[root] # createdb -U srmdcache billing
The space reservation feature between the SRM and the
GridFTP door may be switched on with
spaceReservation=true
The latest release of the dCache distribution is version 1-6-5.
Release notes describing new features and bug fixes can be found at
http://www.dcache.org/manuals/experts_docs/rel-dcache-1-6-5.html
Note for SRM users
------------------
- The latest SRM client (used with srmcp) has an extended set of
parameters. Therefore it is necessary to renew the config file
that is typically located in the home directory of the user running
the client command (~/.srmconfig/config.xml). Simply remove the
file config.xml, it will be re-generated following the new format
when running srmcp again.
- The SRM client code in release 1-6-5 provides compatibility with
CERN's CASTOR SRM implementation. Therefore all users that are
interested in SRM based data transfer between CASTOR and their
dCache instance should upgrade to the client RPM that is part of
the 1-6-5 distribution.
=======================================================================
Note: From version 1.2.2-6 on it is required to have a Postgres database
installed and activated on the node that is running the SRM server.
Note: If you have installed version 1.2.2-6(-1) or a later version
you need to drop the postgres tables. This is required because of
a db schema change.
Perform the following steps to remove the tables:
1. Locate the configuration file srm.batch for the dCache SRM and find the values of
the parameters jdbcUrl, jdbcUser and jdbcPass. The last element of the JDBC URL
is your database name; for example, if the
value of jdbcUrl is jdbc:postgresql://host/dcache then the name of the
database is dcache.
2. Use these parameters and the "psql" PostgreSQL client to connect to the
SQL server:
$ psql -U <user> -h <host> <database name>
Once psql connects to the server the command prompt will appear:
dbname=>
3. Execute the following commands (you can just cut-and-paste the following text
into the psql):
DROP TABLE copyfilerequests ;
DROP TABLE copyfilerequests_b ;
DROP TABLE copyrequests ;
DROP TABLE copyrequests_b ;
DROP TABLE getrequests_protocols ;
DROP TABLE getrequests_protocols_b ;
DROP TABLE getfilerequests ;
DROP TABLE getfilerequests_b ;
DROP TABLE getrequests ;
DROP TABLE getrequests_b ;
DROP TABLE pins ;
DROP TABLE pinrequests ;
DROP TABLE srmnextrequestid ;
DROP TABLE putrequests_protocols ;
DROP TABLE putrequests_protocols_b ;
DROP TABLE putfilerequests ;
DROP TABLE putfilerequests_b ;
DROP TABLE putrequests ;
DROP TABLE putrequests_b ;
DROP TABLE srmrequestcredentials ;
4. Make sure all tables have been dropped, type at the prompt to list all remaining tables
\dt;
Drop any remaining tables as described above.
You are done.
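The database name needed in step 1 is simply the part of the JDBC URL after the last slash. A minimal sketch of that extraction follows; jdbc_database_name is a hypothetical helper, and it assumes the URL carries no trailing query parameters.

```shell
# jdbc_database_name URL
# Hypothetical helper: prints the last path element of a JDBC URL,
# which step 1 identifies as the database name.
jdbc_database_name() {
    printf '%s\n' "${1##*/}"
}

jdbc_database_name jdbc:postgresql://host/dcache    # prints dcache
```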
The database is used to maintain state information about ongoing
transfers in order to make them persistent to allow a restart of
transfers in case of an interrupt (e.g. server failure/maintenance,
network disconnect etc.).
Though there is no specific version required we recommend using a
recent version that is usually part of the Linux distribution
running on your system.
Hints concerning the PostgreSQL configuration are provided below.
------------------------------------------------------------------------
Note: From version 1.2.2-7 on doors (GridFTP, SRM, gsidcap) can be
installed and configured on a selective basis, and, if required, on
a node other than the admin node.
Find the details below.
------------------------------------------------------------------------
How to update a standard stand-alone dCache installation
------------------------------------------------------------------------
Because of the new SRM there are quite a few changes in the configuration
files. An old installation which has not been customized very much
is therefore updated most easily by doing a reinstall following these
rules: (The data in the pools will be preserved.)
-- Save copies of your old config files in /opt/d-cache/etc/ and
/opt/d-cache/config/. Remove the old packages with 'rpm -e' or just
by removing the whole directory /opt/d-cache/.
-- Install the d-cache packages according to the guide below with the
following additions:
- The PNFS system should stay as it is.
- Use your old "etc/pool_path", but set the last column
to "No". Otherwise the data in your pools would be deleted!
- Use the old "etc/node_config" and "etc/dcache.kpwd" if needed
- Create a new "config/dCacheSetup" starting from
"etc/dCacheSetup.template" as described or with the aid of the old
file.
(Try: diff old-etc/dCacheSetup.template old-config/dCacheSetup)
For a customized installation it might be better to use the existing
configuration directories /opt/d-cache/etc/ and /opt/d-cache/config/ and
adjust them to the new version of the SRM. Especially the file
"config/srm.batch" has to be adjusted. A detailed description of the
parameters in this file is given at the end of these instructions.
------------------------------------------------------------------------
Find a set of rpms (as of 01/23/2005) to install a dCache based Disk Pool
Management system (no HSM support) at
http://www.dcache.org/downloads/dcache-v1.2.2-7-j14.tgz
Get the tarball
wget http://www.dcache.org/downloads/dcache-v1.2.2-7-j14.tgz
Unzip the tarball
tar xvzf dcache-v1.2.2-7-j14.tgz
You should find the following files
Release.notes
d-cache-core-1.5.2-xx.i386.rpm
dCache-installation-instructions.txt
d-cache-client-1.0-xx.i386.rpm
d-cache-opt-1.5.3-xx.i386.rpm
dcache-user-instructions.txt
pnfs-3.1.10-xx.i386.rpm
The tar file contains 4 rpms:
- pnfs manager
- dCache core (admin/pool node)
- dCache optional components for admin node
(srm/gridftp servers and the gsidcapdoor)
- client (32 and 64 bit support for dcap access combined in a single lib
(/opt/d-cache/dcap/lib/libdcap.so), e.g. dc_lseek, dc_lseek64)
To set up a dCache instance that can be accessed via the dCap protocol,
the following components need to be installed
- pnfs The namespace manager (appears as a filesystem to the user)
- admin node Provides all functionalities to manage a distributed disk pool
(can also hold a pool)
- pool node A node that provides storage capacity to the dCache instance, which
is managed by the PoolManager running on the admin node
To extend accessibility through GridFTP and SRM optional software components can
be installed in addition to the core RPM. It is sufficient to just install the
RPM. No additional step is required.
Note: With the installation of the dCache core some configuration parameters
are stored in pnfs. Therefore the pnfs manager needs to be installed
first.
Though the pnfs manager and the dCache core (admin node) by design can
be installed on different nodes this version of the installation package
assumes that both are installed on the same physical machine.
Prerequisites
-------------
I. The dCache software is written in Java and requires a recent version of either
the JAVA developer kit (jdk) or the runtime environment (jre) to be installed.
II. In case the dCache is going to be accessed via GridFTP and/or SRM a host
certificate is required. Contact the CA responsible for your community for
details. The certificate is expected to be installed in
/etc/grid-security.
III. PostgreSQL needs to be installed on the node running the central dCache
services (i.e. the admin node). The db is used by the SRM server, SRM Pin Manager
and the Resilience Manager. If these services do not run on the node the db
is installed on, make sure their node is allowed to connect to the db: add
a "host" entry to the table as described below.
Get a recent version from the Linux distribution that is running on your system.
Alternatively, RPMs can be found at
http://www.postgresql.org/ftp/
A version that is suitable for current versions of RH SL3 can be found at
http://www.postgresql.org/ftp/binary/v8.0.4/linux/rpms/redhat/rhel-es-3.0/
Client, Server and JDBC support is needed.
The following instructions configure and initialize the
databases. They need to be executed only once, after the installation of
the database. An upgrade of the dCache code does not require the
commands to be executed again. All commands must be carried out by
user 'postgres'
su postgres
# Create directory the db will live in
mkdir <database_directory_name>/data
# Command to initialize DB
initdb -D <database_directory_name>/data
# Enable network access in postgres config file (default port 5432 is used)
<database_directory_name>/data/postgresql.conf
#
tcpip_socket = true
# Edit <database_directory_name>/data/pg_hba.conf to allow hosts to connect
# to the DB (records at the bottom of the file)
# TYPE DATABASE USER IP-ADDRESS IP-MASK METHOD
local all all trust
host all all 127.0.0.1 255.255.255.255 trust
host all all <IP of DB host> 255.255.255.255 trust
host all all <IP of SRM host> 255.255.255.255 trust (if SRM host != DB host)
#
# Command to start the DBMS, make sure the log file exists and
# user 'postgres' has write permission
postmaster -i -D <database_directory_name>/data >logfile 2>&1 &
[Note: You may want to create an rc-script under /etc/init.d
to automatically start the DB upon start of the system]
# Command to create the DB for the SRM
createdb dcache
# Command to connect to the DB
psql -U postgres dcache
# Create DB user 'srmdcache'
create user srmdcache password 'srmdcache';
# Disconnect from dcache db
\q
# All tables required for SRM operation will be created by the SRM
# server
# Command to create the DB for the Resilience Manager
createdb -O srmdcache replicas
# Initialize db tables for the Resilience Manager
# This step requires the dcache-core RPM (v 1.5.2-80 or higher) to be installed
psql -d replicas -U srmdcache -f /opt/d-cache/etc/pd_dump-s-U_enstore.sql
# Just for completeness: Command to stop the DBMS (as user 'postgres')
# pg_ctl stop -D <database_directory_name>/data
To install the pnfs manager follow the instructions below
-------------------------------------------------------
1. install the pnfs rpm
2. copy the template /opt/pnfs.3.1.10/pnfs/etc/pnfs_config.template =>
/opt/pnfs.3.1.10/pnfs/etc/pnfs_config
and customize pnfs_config according to your needs
The pnfs config file contains
PNFS_INSTALL_DIR = /opt/pnfs.3.1.10/pnfs
PNFS_ROOT = /pnfs
PNFS_DB = /opt/pnfsdb
PNFS_LOG = /var/log/pnfsd.log
PNFS_OVERWRITE = no
- don't overwrite pnfsdb if one exists in the place specified above
3. run the install script at
/opt/pnfs.3.1.10/pnfs/install/pnfs-install.sh
- It generates the file "pnfsSetup" in /usr/etc/
4. Start/Stop pnfs
/opt/pnfs.3.1.10/pnfs/bin/pnfs start|stop
- starts pnfs and mounts it at /pnfs/fs
5. Security
In order to minimize the administrative overhead the pnfs filesystem (/pnfs
and /fs) is exported world-wide by default. /pnfs is required by local clients
utilizing the dcap protocol, while /fs is needed by dCache doors (SRM, GridFTP,
gsidcap) that are not running on the host the pnfs filesystem is installed on.
The ability to mount these filesystems can be limited by applying a kind of
"network mask" as a file name.
The installation of pnfs installs a file called 0.0.0.0..0.0.0.0 in
/pnfs/fs/admin/etc/exports. Suppose the ability to mount /pnfs (/fs) should be
limited to local hosts living in network 123.111. Therefore the file would have
to be renamed to 255.255.0.0..123.111.0.0 or 255.255.255.0..123.111.1.0 for a
class C network. This can further be limited to individual hosts and particular
pnfs subtrees, e.g. the host with IP address 123.111.1.1 is allowed to mount
/pnfs/theorie
- create a file named 123.111.1.1
- content of the file is (one line)
/theorie /0/root/fs/usr/data/theorie 30 rw,soft
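Which exports file applies to a given client (a file named exactly after the
host IP is preferred, otherwise a "mask" file is used) can be sketched as
follows. This is only an illustration in Python, not pnfs code: the function
name select_export_file is hypothetical, the "<mask>..<network>" file name
layout is taken from the examples in this section, and the order in which
several matching mask files would be tried is an assumption.

```python
import ipaddress

def select_export_file(client_ip, file_names):
    """Return the exports file name that applies to client_ip, or None.

    Illustrative only: an exact host file wins; otherwise the first
    "<mask>..<network>" file whose network contains the client is used.
    """
    if client_ip in file_names:
        return client_ip
    addr = int(ipaddress.IPv4Address(client_ip))
    for name in file_names:
        mask_text, sep, net_text = name.partition("..")
        if not sep:
            continue  # not a mask file
        try:
            mask = int(ipaddress.IPv4Address(mask_text))
            net = int(ipaddress.IPv4Address(net_text))
        except ValueError:
            continue  # skip names that are not valid IPv4 mask/network pairs
        if addr & mask == net & mask:
            return name
    return None
```

With the files 255.255.0.0..123.111.0.0 and 123.111.1.1 present, the host
123.111.1.1 gets its own file, while any other host in 123.111 falls back to
the mask file.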
The mechanism first looks for a file named after the host IP address and
applies the rule in it if the file exists. If it doesn't, the mechanism selects
the file with the matching "mask" and applies the rule therein.
To install the admin node and/or pool node(s) follow the instructions below
---------------------------------------------------------------------------
1. Install the dCache core rpm. In case you want to install optional
components, like the srm/gridftp servers and/or the client
components, it's a good time to install the "d-cache-opt" rpm(s) as well.
(Can also be done later.)
NOTE: If data is to be accessed using SRM based transfers
(srmcp) in an installation with multiple pool nodes, the
d-cache-opt rpm needs to be installed on every pool node. In
addition to the software components each pool node needs a
host certificate and full access to the public Internet for
TCP connections in the port range 20000 - 50000.
From dcache-core RPM rev 1.5.2-80 on the Resilience Manager is included.
Please find more information about its functionality and configuration at
http://cmsdcam.fnal.gov/dcache/resilient/Resilient_dCache_v1_0.html.
The Resilience Manager is preconfigured but not automatically started
with the core services. The dCache core start-up script contains the
instructions required to start/stop the replica domain, but they are
commented out. Remove the "#" at the beginning of the related lines.
2. configure the installation by using the following template files
in /opt/d-cache/etc. The arrow indicates the name of the customized file
- node_config.template --> /opt/d-cache/etc/node_config
- dcache.kpwd.template --> /opt/d-cache/etc/dcache.kpwd
- dCacheSetup.template --> /opt/d-cache/config/dCacheSetup
- pool_path.template --> /opt/d-cache/etc/pool_path
- door_config.template --> /opt/d-cache/etc/door_config (!!! NEW !!!)
In case of a virgin machine (not an upgrade of an existing dCache
installation) copy the .template file to its base name (e.g.
cp node_config.template node_config) and customize the latter
according to your requirements.
Note: the final place of the dCacheSetup file is
/opt/d-cache/config/dCacheSetup. You need to copy it
manually from /opt/d-cache/etc to the config directory.
2.1. etc/node_config
There is no dedicated rpm for the installation of a pool-node any
longer. Selection of admin vs. pool node is done via the NODE_TYPE
parameter in the node_config file. The admin node can also contain
pools.
NODE_TYPE = dummy # either admin or pool
DCACHE_BASE_DIR = /opt/d-cache
PNFS_ROOT = /pnfs
PNFS_INSTALL_DIR = /opt/pnfs.3.1.10/pnfs
PNFS_START = yes (start pnfs in case it's not running)
PNFS_OVERWRITE = no (in case dCache config exists in pnfs)
POOL_PATH = /opt/d-cache/etc (in case pools are to be configured on
admin node; for details see pool instr.)
NUMBER_OF_MOVERS = 100
Copy the template to its base name, if required, and edit the resulting
file as desired.
2.2. etc/dcache.kpwd
The dcache.kpwd authentication file.
The template needs to be customized and is expected as
/opt/d-cache/etc/dcache.kpwd
In case there is an existing dcache.kpwd it will not be overwritten
See the release notes for further information on the format.
2.3. config/dCacheSetup
Important note: If an existing dCacheSetup file is going to be re-used
make sure the Java classpath setting is up to date. The
setting that is required by this version of the software
can be found in /opt/d-cache/etc/dCacheSetup.template
From version 1.2.2-7 on there is a new parameter to
support remote db connections for the SRM server
"srmDbHost=<your.dbHost.org>"
This is the primary configuration file for the dCache core and
optional components, i.e. srm/gridftp
Things that need attention (anything else has reasonable defaults)
- java path
- serviceLocatorHost
- use the host name of the node that is running pnfs as it is
defined in your DNS, replace string "SERVER" by the host name
- pnfsSrmPath (default is /)
- srmDbHost=<your.dbHost.org> to let the SRM server know about the db host
- The following parameters should be set to "true" if the dCache
installation is going to be used as a LCG Storage Element
- RecursiveDirectoryCreation=true
- AdvisoryDelete=true
If dCache was previously not running on this machine or if there is no
dCacheSetup file in the config directory copy the dCacheSetup.template
file to /opt/d-cache/config and customize it according to your needs.
2.4. etc/pool_path
The template contains pool parameters (path, size, etc)
The format of the pool_path file is (3 columns)
/path/to/pool size[GB] "overwrite if exists (yes/no)"
[Note: GB means 1024^3; space for inodes etc. is not accounted for]
Copy the template to its base name, if required, and edit the resulting file
as desired. Use an empty file if no pools are wanted (e.g. on a pure admin node).
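The three-column pool_path format can be illustrated with a small parser. This
is a sketch only, not part of the dCache installer: the function name
parse_pool_path is hypothetical, and skipping blank lines and lines starting
with "#" is an assumption rather than documented behaviour.

```python
def parse_pool_path(text):
    """Parse pool_path lines of the form: /path/to/pool size[GB] yes|no."""
    pools = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines are ignored
        path, size_gb, overwrite = line.split()
        pools.append({
            "path": path,
            "size_bytes": int(size_gb) * 1024 ** 3,  # GB means 1024^3 here
            "overwrite": overwrite.lower() == "yes",
        })
    return pools
```

An empty file yields an empty list, which corresponds to a node without pools.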
2.5. Install and configure the doors (GridFTP, SRM, gsidcap) -
etc/door_config
A "door" node (neither an "admin" nor a "pool" node) requires the core
and the opt RPMs to be installed. However, only the following
installation script needs to be executed
/opt/d-cache/install/install_doors.sh
(DON'T run /opt/d-cache/install/install.sh)
Make sure the template (/opt/d-cache/etc/door_config.template) was copied
to /opt/d-cache/etc/door_config and customized before running the install_doors.sh
script.
The format of the door_config file is (2 columns)
ADMIN_NODE <name of admin node running pnfs>
door active (default is all active)
--------------------
GSIDCAP yes (or "no")
GRIDFTP yes (or "no")
SRM yes (or "no")
Also the dCacheSetup.template file needs to be copied to
/opt/d-cache/config/dCacheSetup and customized accordingly.
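The two-column door_config format shown above can be read as in the following
sketch. It is an illustration only and not what install_doors.sh actually
does; the function name parse_door_config and the host name in the usage
example are hypothetical.

```python
def parse_door_config(text):
    """Return (admin_node, {door_name: active}) from door_config text."""
    admin_node = None
    doors = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip separators, blank lines and comments
        key, value = fields[0], fields[1]
        if key == "ADMIN_NODE":
            admin_node = value
        elif value.lower() in ("yes", "no"):
            doors[key] = value.lower() == "yes"
    return admin_node, doors
```

For example, a file containing "ADMIN_NODE admin.example.org", "GSIDCAP yes",
"GRIDFTP no" and "SRM yes" on separate lines yields the admin node name and a
mapping that marks only GRIDFTP as inactive.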
If a door or multiple different doors are to be added to an "admin"
and/or a "pool" node
- Install the dcache-opt RPM on each node
- Make sure the template (/opt/d-cache/etc/door_config.template) was
copied to /opt/d-cache/etc/door_config and customized before running
the install_doors.sh script
- On a "pool" node, copy the dcache authentication file (../etc/dcache.kpwd)
from the admin node to /opt/d-cache/etc on the "pool" node(s)
3. To install an "admin" or a "pool" node run the install script at
- /opt/d-cache/install/install.sh
For an "admin" node this will do all dCache specific preparations in pnfs, etc.
If there is a pool location configured in pool_path it will also install a pool
(in case the file is empty it will not install/configure any pool related
stuff).
- /opt/d-cache/install/install_doors.sh
in case one or more of the following doors are to
be installed on the admin and/or the pool node
- GridFTP
- SRM
- gsidcap
This will update the ../bin/dcache-opt script accordingly.
Note: "door" nodes need to mount the pnfs fs. Make sure NFS related
communication is enabled between the "admin" and the "door" node(s).
For pnfs installations prior to the one which is part of the 1.2.2-7
distribution do the following
- Make sure pnfs is running
- cp /pnfs/fs/admin/etc/exports/127.0.0.1 \
/pnfs/fs/admin/etc/exports/0.0.0.0..0.0.0.0
(overwrite existing file)
- Monitoring of the door domains via the Web page
In the recent setup, the srm and the gridftp door(s) have changed their
name(s), so the web page asks the wrong cell whether or not it's alive.
The srm/gridftp name changed from SRM/GFTP to SRM/GFTP-<HOSTNAME> (where HOSTNAME is
the host the SRM/GFTP door is running on). To properly update the status page you
have to manually modify the config/httpd.batch on the headnode (or the node where the
http service is running).
At the end of the httpd.batch file you will find a list of 'cells'. The ones you
need to change are called SRM and GFTP. Please change them to SRM-<HOSTNAME> and
GFTP-<HOSTNAME> respectively. In case multiple gridftp doors are configured you need
to add as many lines as there are gridftp doors.
You need to restart the httpd service to activate the changes.
If you don't want to restart the services you may
make the changes in the batch file (for future
restarts) and additionally use the ssh interface to make
temporary changes:
(local) admin > cd collector@httpdDomain
(collector@httpdDomain) admin > unwatch SRM
(collector@httpdDomain) admin > unwatch GFTP
(collector@httpdDomain) admin > unwatch DCap-gsi
(collector@httpdDomain) admin > watch SRM-<HOSTNAME>
(collector@httpdDomain) admin > watch GFTP-<HOSTNAME>
(collector@httpdDomain) admin > watch DCap-gsi-<HOSTNAME>
4. Start/stop the dCache services
Make sure pnfs is running and the pnfs filesystem is mounted
- To start/stop pnfs
/opt/pnfs.3.1.10/pnfs/bin/pnfs start|stop
Starting pnfs will also mount the fs, stopping it will unmount pnfs.
Start the core services
/opt/d-cache/bin/dcache-core start|stop
[in case dCache optional components (srm/gridftp/gsidcapdoor) are
installed on an "admin", a "pool" or a "door" node they are
started/stopped with
/opt/d-cache/bin/dcache-opt start|stop ]
To start|stop a pool use
/opt/d-cache/bin/dcache-pool start|stop
5. Client installation
Client components are installed under /opt/d-cache.
The libraries (32 and 64-bit versions) can be found under
/opt/d-cache/dcap/lib. libdcap.so and libpdcap.so
are symbolic links pointing to the 64-bit version.
Also the gsidcap tunnel lib (libgsiTunnel.so) is
installed here. In case the 32 bit version is supposed to be
the default the links can be customized accordingly.
Besides the libraries, header files (/opt/d-cache/dcap/include)
and the dccp binary (/opt/d-cache/dcap/bin) are installed with
the Client RPM.
It is sufficient to install the Client RPM. No further installation
step is required to make the client functions operational.
6. Log files
Common location for all dCache related log files is
/opt/d-cache/log
The default location for the PNFS log file is
/var/log/pnfsd.log
By default all pools register automatically with the default
pgroup.
dCache SRM installation and configuration instructions
======================================================
Requirements on the srm and pool nodes
1. The nodes on which srm server (srm cell) and pool
nodes are installed need to have the grid host certificate
installed. Please refer to the instructions from your Certification
Authority on how to obtain a grid host certificate.
2. There should be a postgres database server running on a
machine accessible by the srm server, and there should be a
postgres user account created, capable of creating new
tables.
Instructions on the installation are provided in section
"Prerequisites", item III
Non-standard dCache Services that srm relies upon
[Note: The configuration options as they are described below are
already part of the srm.batch file as it is coming with
this package. Usually no modifications are necessary.
Only when an existing installation is upgraded _AND_ the
admin is going to reuse the old srm.batch file the "pin
manager" config entries need to be added. The standard
RPM upgrade mechanism overwrites that file.]
1. Pin Manager.
Pin Manager is used by srm to perform the so
called file "pin in cache" operation. When a file is in pinned
state, it will not be deleted from the cache to make room for
other incoming files. The Pin Manager cell can be started by
adding the following lines to one of the dcache domain
configuration "batch" files (e.g. srm.batch):
#
#pin manager
#
create diskCacheV111.services.PinManager PinManager \
" default -export \
-jdbcUrl=jdbc:postgresql://localhost/dcache \
-jdbcDriver=org.postgresql.Driver \
-dbUser=<user> \
-dbPass=<password>"
[Note: defaults for <user>=srmdcache, <password>=srmdcache]
The configurable parameters are the following:
-jdbcUrl url is pointing to the type and the location of the
database, which will be used by the pin manager. For example,
if the database is running on a host "hosta" on a nonstandard
port "12345", and the database name is "name1", this option
value would be "jdbc:postgresql://hosta:12345/name1".
-jdbcDriver specifies the class name for the driver. Should
remain the same for the postgres database.
-dbUser a name of the database user
-dbPass a password for the database user, could be an arbitrary
string if the host on which the pin manager is running is
included in the postgres list of the trusted hosts.
The two optional parameters are the
-poolManager and -pnfsManager, which allow the specification
of alternative names for the PoolManager and PnfsManager cells.
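The structure of the -jdbcUrl value can be illustrated with a small helper.
This is a sketch only; the function name postgres_jdbc_url is hypothetical and
merely assembles the string in the form given in the example above.

```python
def postgres_jdbc_url(host, database, port=None):
    """Build a PostgreSQL JDBC URL; omit the port to use the default."""
    location = host if port is None else f"{host}:{port}"
    return f"jdbc:postgresql://{location}/{database}"
```

Calling postgres_jdbc_url("hosta", "name1", 12345) reproduces the example
value, and postgres_jdbc_url("localhost", "dcache") gives the value used in
the batch command above.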
2. GsiftpTransferManager
This service is used by srm to perform the transfers
from a remote server to the dcache via the gsiftp protocol.
The GsiftpTransferManager cell is started by the following
"batch" command:
#
# RemoteGsiftpTransferManager
#
create diskCacheV111.services.GsiftpTransferManager RemoteGsiftpTransferManager \
"default -export \
-pool_manager_timeout=60 \
-pnfs_manager_timeout=60 \
-pool_timeout=300 \
-mover_timeout=86400 \
-max_transfers=30 \
"
The configurable parameters are the following:
-pool_manager_timeout is the timeout in seconds for PoolManager
message exchanges.
-pnfs_manager_timeout is the timeout in seconds for PnfsManager
message exchanges.
-pool_timeout is the timeout in seconds for waiting for the first pool
message, which confirms the creation of the mover
-mover_timeout is the time before the transfer manager will
stop waiting for the completion of the started transfer. If
expired it will try to kill the mover, and report the error
back to the caller (srm).
-max_transfers is the maximum number of simultaneous transfers.
If more transfers are scheduled, the transfer manager will fail
them.
3. Copy Manager
This service is used by the srm when the source and the destination
files in the srm copy request are both local to the storage.
Its configuration parameters are mostly the same as those of the
GsiftpTransferManager. An example startup command follows:
create diskCacheV111.doors.CopyManager CopyManager \
"default -export \
-pool_manager_timeout=60 \
-pool_timeout=300 \
-mover_timeout=86400 \
-max_transfers=30 \
"
SRM Configuration Guide
-----------------------
Note: The package is coming with a set of suitable parameters. We expect that
modifications are necessary in rare cases only.
We provide deep technical information for those who are interested in
the details. Those details are not required to operate the SRM/dCache.
In order to start the srm server, the instance of the cell
diskCacheV111.srm.dcache.Storage needs to be created. Here is
the example of a dcache batch file command starting the srm cell,
illustrating most of the configurable parameters:
create diskCacheV111.srm.dcache.Storage SRM \
"default -srmport=${srmPort1} \
-export \
-kpwd-file=${config}/dcache.kpwd \
-pnfs-srm-path=/ \
-buffer_size=1048576 \
-tcp_buffer_size=1048576 \
-parallel_streams=10 \
-debug=true \
-get-lifetime=86400000 \
-put-lifetime=86400000 \
-copy-lifetime=86400000 \
-get-req-thread-queue-size=1000 \
-get-req-thread-pool-size=30 \
-get-req-max-waiting-requests=1000 \
-get-req-ready-queue-size=1000 \
-get-req-max-ready-requests=30 \
-get-req-max-number-of-retries=10 \
-get-req-retry-timeout=60000 \
-get-req-max-num-of-running-by-same-owner=10 \
-put-req-thread-queue-size=1000 \
-put-req-thread-pool-size=30 \
-put-req-max-waiting-requests=1000 \
-put-req-ready-queue-size=1000 \
-put-req-max-ready-requests=30 \
-put-req-max-number-of-retries=10 \
-put-req-retry-timeout=60000 \
-put-req-max-num-of-running-by-same-owner=10 \
-copy-req-thread-queue-size=1000 \
-copy-req-thread-pool-size=8 \
-copy-req-max-waiting-requests=1000 \
-copy-req-max-number-of-retries=30 \
-copy-req-retry-timeout=6000 \
-copy-req-max-num-of-running-by-same-owner=10 \
-recursive-dirs-creation=true \
-jdbcUrl=jdbc:postgresql://localhost/dcache \
-jdbcDriver=org.postgresql.Driver \
-dbUser=srmdcache \
-dbPass=srmdcache \
"
The available configuration options are:
-kpwd-file specifies the location of the dcache authorization
"database" file.
-pnfs-srm-path specifies the root of the srm within the pnfs
namespace. Essentially this means that the value of this option
will be prepended to all the local storage paths given to the srm
server.
-buffer_size and -tcp_buffer_size specify the size of memory, in
bytes, and socket buffer size, in bytes, to be used with the embedded
gsiftp clients, when performing transfers between the storage and
a gsiftp server.
-parallel_streams specifies the max. number of parallel streams to be
used by the embedded gsiftp client.
-debug tells if extra debug info should be logged. Most of the debug
logging can be turned off by setting the printout domain variable to
error (2). Usually this is done in the first line of the dcache batch
file.
-recursive-dirs-creation turns on and off the automatic creation
of nonexistent directories in case of put/copy requests.
-jdbcUrl, -jdbcDriver, -dbUser, -dbPass: these options have exactly
the same meaning as the same options of the PinManager.
-jdbcUrl url is pointing to the type and the location of the
database, which will be used by the srm. For example,
if the database is running on a host "hosta" on a nonstandard
port "12345", and the database name is "name1", this option
value would be "jdbc:postgresql://hosta:12345/name1".
-jdbcDriver specifies the class name for the driver. Should
remain the same for the postgres database.
-get-lifetime, -put-lifetime, -copy-lifetime specify the lifetimes, in
milliseconds, of the srm get, put and copy requests respectively.
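Because the lifetimes are given in milliseconds, the values are easy to
misread. The following sketch converts a duration into the number expected by
these options; the helper name lifetime_ms is hypothetical.

```python
from datetime import timedelta

def lifetime_ms(hours=0, minutes=0, seconds=0):
    """Convert a duration into whole milliseconds."""
    duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return duration // timedelta(milliseconds=1)
```

The value 86400000 used in the example batch command corresponds to
lifetime_ms(hours=24), i.e. one day.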
In order to develop a better understanding of the rest of the
parameters we will first describe how the request scheduler works.
Please note that the following explanation is a simplification.
The SRM Scheduler executes the instances of the SRM Job classes. For
the scheduler, execution of the job is the execution of the job's run
method in one of the threads. Jobs are initially in the Pending state.
Once the scheduler receives the job, it puts it in the TQueued state and
transfers it into the Thread Queue.
The Scheduler takes java threads from a thread pool; these threads actually execute
the jobs' "run" methods. Once a thread in the pool becomes available,
the first job from the Thread Queue is removed, and this job's state is
changed to "Running". The thread starts the execution of the job's run method.
Once the run method returns, the state of the job can still remain
"Running", or it might have changed to "Done" or "AsyncWait".
If the state is still "Running", the job execution is completed: the job is
placed on the Ready queue, where it waits to be put into the "Ready"
state by the scheduler.
If it is "AsyncWait" this means that the job is partially completed, and
it now waits for an internal event to continue execution; if it is "Done",
the job needs no further processing.
To limit the number of simultaneous transfers by the srm client/user (in
cases when the srm server does not perform the transfer itself, i.e.
"get" and "put" requests) the number of jobs that are "Ready" can be
limited according to the number set in the configuration. The rest of the
requests, which are prepared to be "Ready" are put on the "Ready" queue.
Once the clients finish the transfers, they notify the system by changing
the state of the put or get file requests to "Done". If users/clients never
perform this state change, the request changes its state automatically
upon the expiration of the request's lifetime. If "Ready" spots become
available, corresponding requests are removed from the Ready queue and
their state changes to "Ready".
In the dcache srm there are three instances of the scheduler, one for
each possible type of srm requests: copy, get and put.
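The limit on "Ready" requests described above can be modelled with a toy
class. This is a simplified sketch in Python and not the dCache scheduler: the
class name ReadyGate and its method names are hypothetical, and the size limit
of the Ready queue itself is not modelled.

```python
from collections import deque

class ReadyGate:
    """Toy model of the "Ready" limit; not the dCache implementation."""

    def __init__(self, max_ready):
        self.max_ready = max_ready
        self.ready = set()     # requests currently in the "Ready" state
        self.queue = deque()   # requests waiting on the Ready queue

    def finished_running(self, request):
        """A request whose run method returned asks to become Ready."""
        self.queue.append(request)
        self._promote()

    def done(self, request):
        """The client finished the transfer or the lifetime expired."""
        self.ready.discard(request)
        self._promote()

    def _promote(self):
        # Fill free "Ready" spots from the head of the Ready queue.
        while self.queue and len(self.ready) < self.max_ready:
            self.ready.add(self.queue.popleft())
```

With max_ready=2 and three finished requests, the third one stays on the
Ready queue until one of the first two is marked done.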
The remaining options are described as follows:
-[type]-req-thread-queue-size - maximum number of requests in the
thread queue
-[type]-req-thread-pool-size - maximum number of threads in the thread
pool. This parameter is especially important for copy requests,
since the copy operation for each file is performed in a separate thread.
The number for copy requests should be less than the -max_transfers
parameter of the transfer managers.
-[type]-req-max-waiting-requests - maximum number of requests in
the "AsyncWait" state
-[type]-req-ready-queue-size - maximum number of requests in the
ready queue. This and the following parameters are not important
for the copy scheduler.
-[type]-req-max-ready-requests
this parameter is important for put and get requests, and it is
equivalent to the number of transfer urls given out to the clients,
which are actively transferring (or intend to transfer).
-[type]-req-max-number-of-retries
number of times the job is allowed to fail and to be retried
before SRM should give up and return an error to the user.
-[type]-req-max-num-of-running-by-same-owner
The job owner is roughly equivalent to the user account in the kpwd
file.
When the jobs are removed from the Thread and Ready Queue, their
"owner" is taken into consideration. If the number of jobs submitted by
the user exceeds the number in the configuration, jobs of this particular
user will not be removed from the queue, even if they are first.
If there are jobs belonging to another user, for whom this number has not
been reached yet, these will be executed first. This will not lead to
underutilization of the system: if only one owner is running jobs,
they will all get scheduled to occupy all available scheduler threads
or ready spots.
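The per-owner rule can be sketched as a selection function. This is an
interpretation of the description above and not the actual scheduler code: the
function name next_job and its arguments are hypothetical, and the fallback to
the first queued job when no owner is below the limit is an assumption derived
from the statement that the system is not underutilized.

```python
from collections import Counter

def next_job(queue, running_owners, max_per_owner):
    """Pick the next job to take from a queue.

    queue: list of (job_id, owner) tuples in arrival order
    running_owners: owners of the jobs currently running
    """
    counts = Counter(running_owners)
    for job_id, owner in queue:
        if counts[owner] < max_per_owner:
            return job_id  # first job whose owner is below the limit
    # Assumption: nobody is below the limit, so avoid idling and
    # take the first job anyway.
    return queue[0][0] if queue else None
```

If "alice" already runs two jobs with a limit of two, a queued job of "bob" is
chosen ahead of an earlier job of "alice"; if only jobs of "alice" are queued,
they are still scheduled.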
Other available options are (these are not recommended to be used so
they are not explained here) :
-poolManager,
-pnfsManager,
-proxies-directory
-url-copy-command
-timeout-command
-usekftp
-globus-url-copy
-use-urlcopy-script
-use-dcap-for-srm-copy
-use-gsiftp-for-srm-copy
-use-http-for-srm-copy
-use-ftp-for-srm-copy
-save-memory
For more information refer to the dCache web site at http://www.dcache.org and the
FNAL SRM web site at http://www-isd.fnal.gov/srm.
System Monitoring
=================
http://admin.node.org:2288
- allows monitoring of services only
Admin Interface
===============
The admin interface offers a very rich set of commands (to
be described elsewhere) that allow the system configuration to be altered
while the system is running and any problems to be solved.
Since the distribution comes with an initial password it is
important to log in right after the installation in order to
change it.
Note: The following is supported in version 1.2.2-1 for the
first time and will be supported in future releases.
Assuming you are logged in on the admin node log in to the
admin interface to set the password
ssh -l admin -c blowfish -p 22223 localhost
(passwd : dickerelch)
(local) admin > cd acm
(acm) admin > create user admin
(acm) admin > set passwd <newPasswd> <newPasswd>
(acm) admin > ..
(local) admin > logoff
From now on login as user admin will only be successful if
newPasswd is presented.
Note: When setting the password string in the shell one can
disable the echo by typing "ctrl I" following "set
password".
Because of the widespread usage of dCache in the grid world, we provide the dCache binary packages mainly for a certain Linux flavour compliant with the rest of the WLCG, gLite or OSG software. The dCache system itself is, except for the filesystem engine (pnfs) and the dCap client, pure Java, so that a port should be easy as long as the OS comes with a reasonably modern Java virtual machine. For now dCache is fine with Java 1.5. This chapter should give an estimate of how much effort is necessary to port dCache to other operating systems.
The dCache pool code is certainly the easiest part to port.
This section is a guide for exploring a newly installed dCache system. The confidence obtained by this exploration will prove very helpful when encountering problems in the running system. This forms the basis for the more detailed material in the later parts of this book.
The starting point is a fresh installation according to the
installation instructions shipped with any distribution of
dCache. All components (pnfs,
dcache-core, dcache-pool,
and dcache-opt) are started on the same
host. Additional pools on other hosts may also be running.
First, we will get used to the client tools. On the dCache
admin host, change into the pnfs directory, where the users
are going to store their data:
[user] $ cd /pnfs/<site.de>/data/
[user] $
Note that on the dCache admin node this directory is a link to
/pnfs/fs/usr/ and the NFS
export localhost:/fs is mounted to /pnfs/fs. The NFS server running on
the dCache admin node is part of the pnfs system which is
not part of dCache but heavily used by it.
The pnfs filesystem is not intended for reading or writing
actual data with regular file operations via the NFS
protocol. However, localhost:/fs is a
special, privileged NFS export that allows reading and writing
for administrative tasks. It should only be mounted by the admin
node.
Reading and writing data to and from a dCache instance can be
done with a number of protocols. After a standard installation,
these protocols are dCap, GSIdCap, and GridFTP. In
addition dCache comes with an implementation of the SRM
protocol which negotiates the actual data transfer protocol.
We will first try dCap with the dccp
command:
[user] $ export PATH=/opt/d-cache/dcap/bin/:$PATH
[user] $ cd /pnfs/<site.de>/data/
[user] $ dccp /bin/sh my-test-file
541096 bytes in 0 seconds
This command succeeds if the user has the Unix rights to write
to the current directory /pnfs/<site.de>/data/.
The dccp command also accepts URLs. We can copy the data back
using the dccp command and the dCap protocol but this time
describing the location of the file using a URL.
[user] $ dccp dcap://<adminNode>/pnfs/<site.de>/data/my-test-file /tmp/test.tmp
541096 bytes in 0 seconds
However, this command only succeeds if the file is world readable. The following shows how to ensure the file is not world readable and illustrates dccp consequently failing to copy the file.
[user] $ chmod o-r my-test-file
[user] $ dccp dcap://<adminNode>/pnfs/<site.de>/data/my-test-file /tmp/test2.tmp
Command failed!
Server error message for [1]: "Permission denied" (errno 2).
Failed open file in the dCache.
Can't open source file : "Permission denied"
System error: Input/output error
This command did not succeed, because dCap access is
unauthenticated and the user is mapped to a non-existent user in
order to determine the access rights. However, you should be
able to access the file with the NFS mount:
[user] $ dccp my-test-file /tmp/test2.tmp
541096 bytes in 0 seconds
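Because unauthenticated dCap access depends on the world-readable bit, it can be useful to check that bit over the NFS mount before attempting a URL-based copy. The following is a minimal sketch; the function name is_world_readable is hypothetical and not part of dCache, and it assumes the usual mode string printed by ls -l.

```shell
# Hypothetical helper (not part of dCache): succeed if "others" may read
# the given file. Character 8 of the mode string printed by "ls -ld" is
# the read bit for others.
is_world_readable() {
    mode=$(ls -ld -- "$1" | cut -c8)
    [ "$mode" = "r" ]
}
```

For example, `is_world_readable my-test-file || echo "unauthenticated dCap will be denied"` reports the situation shown above without contacting the door.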
If you have a valid grid proxy with a certificate subject which
is properly mapped in the configuration file
/opt/d-cache/etc/dcache.kpwd you can also
try grid-authenticated access via the GSI-authenticated version
of dCap:
[user] $ chgrp <yourVO> my-test-file
[user] $ export LD_LIBRARY_PATH=/opt/d-cache/dcap/lib/:$LD_LIBRARY_PATH
[user] $ dccp gsidcap://<adminNode>:22128/pnfs/<site.de>/data/my-test-file /tmp/test3.tmp
541096 bytes in 0 seconds
Or we let the SRM negotiate the protocol:
[user] $ export PATH=/opt/d-cache/srm/bin/:$PATH
[user] $ srmcp srm://<adminNode>:8443/pnfs/<site.de>/data/my-test-file file:////tmp/test4.tmp
configuration file not found, configuring srmcp
created configuration file in ~/.srmconfig/config.xml
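The URL forms used in the examples so far differ only in scheme and port. The sketch below collects them in one place; the function name dcache_url is hypothetical, and the ports are simply the ones appearing in the commands above (none for dcap, 22128 for gsidcap, 8443 for srm), which a site may have configured differently.

```shell
# Hypothetical helper (not part of dCache): print the URL for a file as it
# is written in the examples of this section.
#   $1 = protocol (dcap, gsidcap or srm)
#   $2 = door host name
#   $3 = absolute pnfs path of the file
dcache_url() {
    case $1 in
        dcap)    printf 'dcap://%s%s\n' "$2" "$3" ;;
        gsidcap) printf 'gsidcap://%s:22128%s\n' "$2" "$3" ;;
        srm)     printf 'srm://%s:8443%s\n' "$2" "$3" ;;
        *)       echo "unsupported protocol: $1" >&2; return 1 ;;
    esac
}
```

A call such as `dccp "$(dcache_url dcap <adminNode> /pnfs/<site.de>/data/my-test-file)" /tmp/test.tmp` is then equivalent to the dccp command shown earlier.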
If the dCache instance is registered as a storage element in
the LCG/EGEE grid and the LCG user interface software is
available the file can be accessed via SRM:
[user] $ lcg-cp -v --vo <yourVO> \
srm://<dCacheAdminFQN>/pnfs/<site.de>/data/my-test-file \
file:///tmp/test5.tmp
Source URL: srm://<dCacheAdminFQN>/pnfs/<site.de>/data/my-test-file
File size: 541096
Source URL for copy: gsiftp://<dCacheAdminFQN>:2811//pnfs/site.de/data/my-test-file
Destination URL: file:///tmp/test5.tmp
# streams: 1
Transfer took 770 ms
and it can be deleted with the help of the SRM interface:
[user] $ srm-advisory-delete srm://<dCacheAdminFQN>:8443/pnfs/<site.de>/data/my-test-file
srmcp error : advisoryDelete(User [name=...],pnfs/<site.de>/data/my-test-file)
Error user User [name=...] has no permission to delete 000100000000000000BAF0C0
This works only if the grid certificate subject is mapped to a user which has permissions to delete the file:
[user] $ chown <yourVO>001 my-test-file
[user] $ srm-advisory-delete srm://<dCacheAdminFQN>:8443/pnfs/<site.de>/data/my-test-file
If the grid functionality is not required the file can be
deleted with the NFS mount of the pnfs filesystem:
[user] $ rm my-test-file
In the standard configuration the dCache web interface is
started on the admin node and can be reached via port
2288. Point a web
browser to http://<adminNode>:2288/
to get to the main menu of the dCache web interface. The
contents of the web interface are self-explanatory and are the
primary source for most monitoring and trouble-shooting tasks.
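Whether the web interface is reachable can also be checked from the command line. The following is a sketch only: the function names are hypothetical, 2288 is the default port named above, and the second function assumes that curl is installed on the host running the check.

```shell
# Hypothetical helper: build the URL of the web interface.
#   $1 = admin node host name, $2 = port (defaults to 2288)
dcache_web_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-2288}"
}

# Hypothetical helper: print the HTTP status code returned by the web
# interface. curl prints 000 if no HTTP response was received within
# five seconds.
dcache_web_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$(dcache_web_url "$@")"
}
```

Running `dcache_web_status <adminNode>` prints the status code; any value other than a 2xx code suggests that the httpd domain should be inspected.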
The “Cell Services” page displays the status of
some important cells of
the dCache instance. You might observe that some cells are
marked “OFFLINE” even though you know that they are
running and fine. These might be the cells “SRM”,
“GFTP”, and “DCap-gsi”. The reason is
that the names of the cells monitored by the web interface are
explicitly configured in the file
/opt/d-cache/config/httpd.batch. However,
the cells have other names. To correct this, change the following section
near the end of the file
#
create diskCacheV111.cells.WebCollectorV3 collector \
"PnfsManager \
PoolManager \
GFTP \
SRM \
DCap-gsi \
-replyObject"
#
such that it reads
#
create diskCacheV111.cells.WebCollectorV3 collector \
"PnfsManager \
PoolManager \
GFTP-<adminNode> \
SRM-<adminNode> \
DCap-gsi-<adminNode> \
-replyObject"
#
and restart the httpDomain domain by executing
[root] # /opt/d-cache/jobs/httpd stop
[root] # /opt/d-cache/jobs/httpd -logfile=/opt/d-cache/log/http.log start
More information about the
<domainName>.batch
files will follow in the next section.
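The renaming of the three door cells described above can also be expressed as a filter. This is a sketch under assumptions: the function name rename_door_cells is hypothetical, it expects each cell name on its own line followed by a backslash as in the section shown above, and it relies on the -E option accepted by GNU and BSD sed.

```shell
# Hypothetical helper: append "-<host>" to the GFTP, SRM and DCap-gsi cell
# names of the collector section. Reads the batch file on stdin and writes
# the modified text to stdout; all other lines pass through unchanged.
#   $1 = host name suffix used in the cell names
rename_door_cells() {
    host=$1
    sed -E 's/^([[:space:]]*)(GFTP|SRM|DCap-gsi)([[:space:]]*\\)$/\1\2-'"$host"'\3/'
}
```

Writing the result to a separate file first, for example `rename_door_cells <adminNode> < config/httpd.batch > /tmp/httpd.batch.new`, allows the change to be reviewed before the original file is replaced.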
The “Pool Usage” page gives a good overview of the current space usage of the whole dCache instance. In the graphs, free space is marked yellow, space occupied by cached files (which may be deleted when space is needed) is marked green, and space occupied by precious files, which cannot be deleted, is marked red. Other states (e.g., files which are currently being written) are marked purple.
The page “Pool Request Queues” (or “Pool Transfer Queues”) gives information about the number of current requests handled by each pool. “Actions Log” keeps track of all the transfers performed by the pools up to now.
The remaining pages are only relevant with more advanced configurations: The page “Pools” (or “Pool Attraction Configuration”) can be used to analyze the current configuration of the pool selection unit in the pool manager. The remaining pages are relevant only if a tertiary storage system (HSM) is connected to the dCache instance.
In this section we will have a look at the configuration and log
files of pnfs and dCache.
The primary configuration file for pnfs is
/usr/etc/pnfsSetup. For administrative
tasks in pnfs it has to be sourced and the path to the
administration tools should be added to $PATH:
[root] # . /usr/etc/pnfsSetup
[root] # PATH=$PATH:$pnfs/tools
Stopping and starting the pnfs server can now be done with
[root] # pnfs stop
[root] # pnfs start
The log files for the three pnfs server daemons are
/var/log/pmountd.log,
/var/log/dbserver.log, and
/var/log/pnfsd.log. The
pnfsd is the NFS server implementation. Its
log file contains one line for each NFS filesystem
operation. The result code of that operation can be found at the
end of the line.
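Since the result code is the last item on each line of the pnfsd log, a quick summary of how often each code occurs can be produced with standard tools. This is a sketch under assumptions: the function name is hypothetical, and it assumes that the result code is the last whitespace-separated field of the line, which should be verified against the actual log file.

```shell
# Hypothetical helper: count how often each result code occurs.
# Reads log lines on stdin and prints "<code> <count>" pairs, sorted.
# Assumes the result code is the last whitespace-separated field.
count_result_codes() {
    awk 'NF { n[$NF]++ } END { for (c in n) print c, n[c] }' | sort
}
```

It can be applied to the log file named above with `count_result_codes < /var/log/pnfsd.log`.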
The pnfsd daemons communicate with the
dbserver daemons via a shared memory
area. These perform the actual operations on the database files
which can be found in /opt/pnfsdb/pnfs/databases/. The
pnfs system finds them via the information in the directory
/opt/pnfsdb/pnfs/info/.
Each database file is handled by one dbserver
daemon and each access will lock the database file. Each
database file/server is the container for one directory
sub-tree. Similar to mounts in the UNIX filesystem, the
filesystem tree is stitched together from several such
sub-trees. In the standard configuration the
admin database contains the root of the
filesystem (the one mounted on /pnfs/fs/) and the
data1 database the sub-tree starting at
/pnfs/fs/usr/data/. The section called “The Databases of pnfs” describes how to create new databases.
The dCache software is installed in one directory, normally
/opt/d-cache/. All
configuration and log files can be found here. In the following,
filenames are always given relative to this directory.
In the previous section we have already seen how a domain is restarted:
[root] # /opt/d-cache/jobs/<domainName> stop
[root] # /opt/d-cache/jobs/<domainName> \
    -logfile=/opt/d-cache/log/<domainName>.log start
Listing the contents of jobs/ you will see that these
scripts are all in fact links to
jobs/wrapper2.sh. This script will
determine the domain to start by its basename. From the above
commands you can already read off the standard location for the
log files.
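The basename mechanism can be illustrated with a few lines of shell. This is not the content of jobs/wrapper2.sh, only a sketch of the idea: both function names are hypothetical, and the log file location follows the pattern of the restart commands shown above.

```shell
# Hypothetical illustration: a script reached through a link can learn
# the domain name from the name it was invoked under.
#   $1 = path the script was invoked as (what "$0" would contain)
domain_of() {
    basename "$1"
}

# Hypothetical helper: print the stop and start commands for a domain,
# following the pattern shown above.
#   $1 = domain name, $2 = installation directory (default /opt/d-cache)
restart_commands() {
    base=${2:-/opt/d-cache}
    printf '%s/jobs/%s stop\n' "$base" "$1"
    printf '%s/jobs/%s -logfile=%s/log/%s.log start\n' "$base" "$1" "$base" "$1"
}
```

For a link named jobs/<domainName>, `domain_of "$0"` inside the script yields <domainName>, which is all the wrapper needs to select the matching Setup and batch files.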
The files
config/<domainName>Setup
will be used by the wrapper script and by the domains at
start-up. These files are also links to
config/dCacheSetup. This is the primary
configuration file of the dCache system.
The only files which are different for each domain are
config/<domainName>.batch.
They describe which cells are
started in the domains. Normally,
changes in these files should not be necessary. However, if you need to
change something, consider the following:
Since the standard
config/<domainName>.batch
files will be overwritten when updating to a newer version of dCache
(e.g. with RPM), it is a good idea to modify only private copies of them.
When choosing a name like
config/<newDomainName>.batch
you give the domain the name
<newDomainName>. The necessary links can be
created with
[root] # cd /opt/d-cache/config/
[root] #