The dCache Book

Mathias de Riese

Patrick Fuhrmann

Tigran Mkrtchyan

Michael Ernst

Alex Kulyavtsev

Vladimir Podstavkov

Martin Radicke

Neha Sharma

Dmitry Litvintsev

Timur Perelmutov

Ted Hesselroth

Abstract

The dCache Book is the guide for administrators of dCache systems. The first part describes the installation of a simple single-host dCache instance. The second part describes the components of dCache and the ways in which they can be configured. This is the place to find information about the role and functionality of dCache components as needed by an administrator. The third part contains solutions for several problems and tasks which might occur while operating a dCache system. Finally, the last two parts contain a glossary and a parameter and command reference.


Table of Contents

I. Getting started
1. Introduction
2. Installing dCache
3. Getting in Touch with dCache
II. Configuration of dCache
4. Configuration in pnfs
5. The Cell Package
6. Resilience with the Replica Manager
7. Configuring the Pool Manager
8. The Interface to a Tertiary Storage System
9. File Hopping
10. dCache Partitioning
11. Central Flushing to tertiary storage systems
12. gPlazma authorization in dCache
13. dCache as xRootd-Server
14. dCache Storage Resource Manager
15. dCache Web Monitoring
III. Cookbook
16. General
17. The Maintenance Module
18. Pool Operations
19. Moving the pnfs Data from GDBM to PostgreSQL
20. Migration of classic SE ( nfs, disk ) to dCache
21. PostgreSQL and dCache
22. Complex Network Configuration
23. Accounting
24. Protocols
25. Advanced Tuning
26. Statistics Module for pre 1.6.7 releases
IV. Reference
27. dCache Clients
28. dCache Cell Commands
29. dCache Developers Corner
30. dCache Glossary
31. Changelog

List of Figures

1.1. The dCache Layer Model
6.1. Pool State Diagram

List of Tables

8.1. Mandatory StorageInfo keys
8.2. Optional StorageInfo keys but used by all HSM's
8.3. Enstore specific
8.4. OSM specific
8.5. Return codes
9.1. PoolManager Hopping Request Attributes
10.1. New and old PoolManager parameter names
11.1. Driver Properties
18.1. Checksum calculation flow
22.1. Protocol Overview
24.1. Open Timeout mechanisms
25.1. Variable Overview
25.2. Variable Overview
25.3. Variable Overview
26.1. File Format
29.1. Parameter setting reference

List of Examples

5.1. Example batch file config/gridftpdoor.batch
15.1. Fragment of tableConfig.xml configuration file
15.2. Fragment of plotConfig.xml configuration file
15.3. Fragment of pltnames.xml configuration file
22.1. Batch file for two GridFTP doors serving separate network interfaces
22.2. Batch file for two GridFTP doors serving separate network interfaces
25.1. To Concurrent Usage Patterns
25.2. Modified config/pool.batch file for multiple mover queues
25.3. Batch file for a GridFTP door using a mover queue
25.4. Batch file for a dCap door for allowing the client to select the mover queue

Part I. Getting started

Chapter 1. Introduction

dCache is a distributed storage solution. It organises storage across computers so that the combined storage can be used without end-users being aware of precisely which computer holds their data; end-users simply see a large amount of storage.

Because end-users need not know on which computer their data is stored, their data can be migrated from one computer to another without any interruption of service. This allows dCache storage computers to be taken out of service or additional machines (with additional storage) to be added without interrupting the service the end-users enjoy.

dCache supports requesting data from a tertiary storage system. A tertiary storage system typically uses a robotic tape system, where data is stored on a tape from a library of available tapes, which must be loaded and unloaded using a tape robot. Tertiary storage systems typically have a higher initial cost, but can be extended cheaply by adding additional tapes. This makes tertiary storage systems popular where large amounts of data must be read.

dCache also provides many transfer protocols (allowing users to read and write data). These are deployed in a modular fashion, allowing dCache to expand its capacity by adding front-end machines.

Another performance feature of dCache is hot-spot data migration. In this process, dCache detects when a few files are being requested very often. If this happens, dCache can make duplicate copies of the popular files on other computers. This spreads the load across multiple machines and so increases throughput.

The flow of data within dCache can also be carefully controlled. This is especially important for large sites, as chaotic movement of data may lead to suboptimal usage. Instead, incoming and outgoing data can be marshaled to use designated resources, allowing better throughput and guaranteeing the end-user experience.

dCache provides a comprehensive administrative interface for configuring the dCache instance. This is described in the later sections of this book.

Figure 1.1. The dCache Layer Model

The layer model shown in Figure 1.1, “The dCache Layer Model” gives an overview of the architecture of the dCache system.

Chapter 2. Installing dCache

Michael Ernst

FNAL

Patrick Fuhrmann

DESY

Mathias de Riese

DESY

The first section describes the installation of a fresh dCache instance using RPM files downloaded from the dCache homepage. The second section is a guide to upgrading an existing installation. In both cases we assume the standard requirements of a small to medium-sized dCache instance without an attached tertiary storage system. The third section contains some pointers on extended features.

Installing a Basic dCache Instance

This section describes the installation of the central admin node of a dCache instance and of an arbitrary number of further dCache nodes. These nodes may each contain several dCache pools and optionally one SRM, one GridFTP door, and/or one GSIdCap door. On the admin node, a pnfs server and several central dCache components are installed. The pnfs server, some central components, and each SRM need a PostgreSQL server installed locally on the node. The following sections first describe the configuration of a PostgreSQL server and then the installation of the pnfs server and of the dCache components. Root access is required throughout the installation process.

Prerequisites

In order to install dCache the following requirements must be met:

  • An RPM-based Linux distribution is required.

  • dCache 1.8 requires Java 1.5 or 1.6 SDK. Previous releases of dCache can use Java 1.4.2. It is recommended to use the newest Java release available within the release series used.

  • PostgreSQL must be installed and running. See the section called “Installing a PostgreSQL Server” for more details. It is strongly recommended to use version 8 or higher.

Installation of the dCache Software

The RPM packages may be installed right away on each node, for example using the command:

[root] # rpm -ivh dcache-server-<version>-<release>.i386.rpm
[root] # rpm -ivh dcache-client-<version>-<release>.i386.rpm

The pnfs server software on the admin node can be installed with the command:

[root] # rpm -ivh pnfs-postgresql-<version>-<release>.i386.rpm

Readying the PostgreSQL server

You must configure PostgreSQL for use by dCache and create the necessary PostgreSQL user accounts and database structure. This section describes how to do this.

Configuring the PostgreSQL server

Using a PostgreSQL server with dCache places a number of requirements on the database. This section describes what configuration is necessary to ensure PostgreSQL operates so dCache can use it.

Restarting PostgreSQL

If you have edited PostgreSQL configuration files, you must restart PostgreSQL for those changes to take effect. On many systems, this can be done with the following command:

[root] # /etc/init.d/postgresql restart

Enabling TCP connections

When connecting to PostgreSQL, dCache will always use TCP connections. So, for dCache to use PostgreSQL, support for TCP sockets must be enabled.

In contrast to dCache, the PostgreSQL stand-alone client application psql can connect using either a TCP socket or via the local filesystem (a “UNIX” socket). Because of this, it is common for PostgreSQL to disable TCP sockets by default, requiring the admin to explicitly configure PostgreSQL so connecting via a TCP socket is supported.

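To enable TCP sockets, edit the PostgreSQL configuration file postgresql.conf. This is often found in the /var/lib/pgsql/data directory, but may be located elsewhere. Ensure that the parameter tcpip_socket is set to true. (PostgreSQL 8.0 and later replaced this parameter with listen_addresses; on those versions, set listen_addresses instead.) For example: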

tcpip_socket = true
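
The setting can also be checked with a short script. The following is a minimal Python sketch and not a dCache tool: the function name tcpip_socket_enabled is hypothetical, and the sketch assumes the plain "name = value" syntax of postgresql.conf with "#" comments. It only inspects an explicit tcpip_socket setting.

```python
def tcpip_socket_enabled(conf_text):
    """Return True if conf_text explicitly sets tcpip_socket to a true value."""
    enabled = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip().lower() == "tcpip_socket":
            value = value.strip().strip("'\"").lower()
            # If the parameter appears more than once, the last one wins.
            enabled = value in ("true", "on", "yes", "1")
    return enabled
```

To use it, read the contents of postgresql.conf into a string and pass that string to the function.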

Enabling local trust

Perhaps the simplest configuration is to allow password-less access to the database. The following documentation assumes this configuration.

To allow local users to access PostgreSQL without requiring a password, ensure the file pg_hba.conf, usually located in /var/lib/pgsql/data, contains the following lines.

local   all         all                        trust
host    all         all         127.0.0.1/32   trust
host    all         all         ::1/128        trust

Note

Please note it is also possible to run dCache with all PostgreSQL accounts requiring passwords.
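
Whether the three trust entries listed above are present can be verified with a small script. The following Python sketch is illustrative only: the names REQUIRED_ENTRIES, parse_hba and missing_trust_entries are hypothetical, and the parser assumes that addresses are written in CIDR form (as in the lines above) rather than with a separate netmask column.

```python
# The three entries shown above: (type, database, user, address, method).
REQUIRED_ENTRIES = [
    ("local", "all", "all", None, "trust"),
    ("host", "all", "all", "127.0.0.1/32", "trust"),
    ("host", "all", "all", "::1/128", "trust"),
]


def parse_hba(text):
    """Parse pg_hba.conf text into (type, database, user, address, method) tuples."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments
        if not fields:
            continue
        if fields[0] == "local" and len(fields) >= 4:
            # "local" entries have no address column.
            entries.append((fields[0], fields[1], fields[2], None, fields[3]))
        elif fields[0] != "local" and len(fields) >= 5:
            entries.append(tuple(fields[:5]))
    return entries


def missing_trust_entries(text):
    """Return the required entries that do not appear in the given text."""
    present = set(parse_hba(text))
    return [entry for entry in REQUIRED_ENTRIES if entry not in present]
```

An empty result means all three entries were found. The sketch does not evaluate the order of the entries, which PostgreSQL does take into account when matching a connection.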

Creating data users and structure

Prepare the PostgreSQL users and databases needed by the dCache components and/or the pnfs server. The pnfs server only needs a database user. We suggest calling it pnfsserver. Create it with:

[root] # createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt pnfsserver

Several databases will be created by this user. At initial installation, as described below, two databases will be created: admin and data1. These databases will contain the information about the namespace of the pnfs filesystem. If the information in these databases is lost, all data in the dCache instance becomes inaccessible. Therefore, make sure these databases are backed up regularly and stored on appropriately reliable hardware. Further advice may be found in Chapter 21, PostgreSQL and dCache.

The dCache components will access the database server as the user srmdcache, which can be created with the createuser command; for example:

[root] # createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt srmdcache

Several central components running on the admin node as well as each SRM will use the database dcache for state information:

[root] # createdb -U srmdcache dcache

There might be several of these on several hosts. Each is used by the dCache components running on the respective host.

The pnfs companion uses the database companion to record the pools on which each file is located. On the admin node, create and initialize it with the commands:

[root] # createdb -U srmdcache companion
[root] # psql -U srmdcache companion -f /opt/d-cache/etc/psql_install_companion.sql

(It has to be located on the same host as the pnfs server.)

If the resilience feature provided by the replica manager is used, the database “replicas” has to be prepared on the admin node with the command:

[root] # createdb -U srmdcache replicas
[root] # psql -U srmdcache replicas -f /opt/d-cache/etc/psql_install_replicas.sql

Note

Note that the usable disk space will be reduced by at least half if the replica manager is used.

If the billing information should also be stored in a database (in addition to files) the database billing has to be created:

[root] # createdb -U srmdcache billing

However, we strongly advise against using the same database server for the pnfs server and the billing information. For how to configure the billing cell to write into this database, see below.
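
The commands of this section can be collected into a single list that depends on which optional features are used. The following Python sketch only assembles the command strings shown above for the admin node and does not execute anything; the function name database_commands and its parameters are hypothetical.

```python
def database_commands(replica_manager=False, billing=False):
    """Return the database preparation commands for an admin node, in order."""
    createuser = ("createuser -U postgres --no-superuser --no-createrole "
                  "--createdb --pwprompt")
    commands = [
        f"{createuser} pnfsserver",
        f"{createuser} srmdcache",
        "createdb -U srmdcache dcache",
        "createdb -U srmdcache companion",
        "psql -U srmdcache companion -f /opt/d-cache/etc/psql_install_companion.sql",
    ]
    if replica_manager:
        # Only needed if the resilience feature is used.
        commands += [
            "createdb -U srmdcache replicas",
            "psql -U srmdcache replicas -f /opt/d-cache/etc/psql_install_replicas.sql",
        ]
    if billing:
        # Only needed if billing information is also stored in a database.
        commands.append("createdb -U srmdcache billing")
    return commands
```

Printing the returned list gives a checklist that can be reviewed before the commands are run as root.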

Installing the pnfs Server

The pnfs server software is installed in the directory /opt/pnfs/. For the installation copy the file /opt/pnfs/etc/pnfs_config.template to /opt/pnfs/etc/pnfs_config. The default should be suitable for most installations. It contains:

PNFS_INSTALL_DIR = /opt/pnfs
PNFS_ROOT = /pnfs
PNFS_DB = /opt/pnfsdb
PNFS_LOG = /var/log
PNFS_OVERWRITE = no
PNFS_PSQL_USER = pnfsserver

Next run /opt/pnfs/install/pnfs-install.sh. This will write the central configuration file /usr/etc/pnfsSetup and initialize the databases in the PostgreSQL server as well as configuration information below /opt/pnfsdb/ (as configured by PNFS_DB in /opt/pnfs/etc/pnfs_config). For example:

[root] # /opt/pnfs/install/pnfs-install.sh
PNFS_PSQL_USER = pnfsserver
 Checking nfs servers : Ok
      Preparing setup : Ok

 Creating database admin
 Creating database data1

 Starting pnfs server   ... Ok
 Trying to talk to dbserver 0 [1122] ... Ok
 Trying to talk to dbserver 1 [1122] ... Ok
             Trying to mount 'pnfs' : Ok
        Correcting pnfs permissions : Ok
 Detecting wormhole target (config) : 0000000000000000000010E0
                  Digging wormholes :  dig-0-ok dig-1-ok Done
             Creating database link : Ok
 Setting mount permissions to world : Ok

 Remarks :
   ii) Any host may now mount this pnfs server
         mount -o intr,rw,noac,hard  <thisServerName>:/pnfs /<mountdir>

Installation of PNFS completed - stop PNFS
 Stopping Heartbeat ....  Ready
 Killing pnfsd  Done
 Killing pmountd  Done
 Killing dbserver . Done
 Removing 8 Clients  0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+
 Removing 8 Servers  0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+
 Removing main switchboard ... O.K.

The pnfs server may now be started with /opt/pnfs/bin/pnfs start, for example:

[root] # /opt/pnfs/bin/pnfs start
Starting dcache services:  Shmcom : Installed 8 Clients and 8 Servers
Starting database server for admin (/opt/pnfsdb/pnfs/databases/admin) ... O.K.
Starting database server for data1 (/opt/pnfsdb/pnfs/databases/data1) ... O.K.
Waiting for dbservers to register ... Ready
Starting Mountd : pmountd
Starting nfsd : pnfsd

This script may be linked from /etc/init.d/. Please do not copy the init script as it may change between releases. On Red Hat derived Linux distributions, pnfs may be configured to start at boot time with

[root] # chkconfig --add pnfs
[root] # chkconfig pnfs on

It is advisable to create a basic directory structure in the pnfs namespace where each directory uses a separate pnfs database. This is still true for the PostgreSQL version of pnfs since the pnfs server uses global locks on each database. The section called “The Databases of pnfs” describes how this is done. WLCG sites should use at least two databases for each VO they support: one for the directory /pnfs/<domainName>/data/<voName>/ and one for the generated files, i.e. for /pnfs/<domainName>/data/<voName>/generated/.

Installing dCache Components

Use the templates of the configuration files found in /opt/d-cache/etc/ to create the following files.

The central configuration file of a dCache instance is /opt/d-cache/config/dCacheSetup. For most installations it is only necessary to set the variable java to the binary of the Java VM and the variable serviceLocatorHost to the hostname of the admin node. Note that the file has to go into the subdirectory config/ even though the template is found in etc/.

The installation and start-up scripts use the information in /opt/d-cache/etc/node_config. The variable NODE_TYPE controls whether the admin node should be installed or just pools and/or doors. Accordingly, set it to “admin” or to “pool” (the latter also for nodes that only run doors). All other variables may be left at their default values.

For authorization of grid users the file /opt/d-cache/etc/dcache.kpwd is needed. Note that it may be generated from the standard /etc/grid-security/grid-mapfile with the tool grid-mapfile2dcache-kpwd which is distributed with the WLCG software.

How to proceed from here depends on whether the release to be installed is older than 1.8.0-14 or not. In particular init scripts and pool creation procedures have changed in dCache 1.8.0-14. Both procedures are described in the following sections.

Once dCache is installed, consult the section called “Using dCache as an LCG Storage Element” for how to activate the info provider (included since version 1.6.6-2) based on an earlier WLCG installation.

Prior to 1.8.0-14

Whether and how many pools should be installed on the current node is configured by /opt/d-cache/etc/pool_path. Each line in this file describes one pool. The format is as follows:

<poolDataDirectory> <poolSizeInGB> <reinstallPoolYesNo>

where <poolDataDirectory> is the full path to the directory which will contain the data files as well as some of the configuration of the pool, <poolSizeInGB> is the size of the pool. Make sure that there is always enough space under <poolDataDirectory>. Be aware that only pure data content is counted by dCache. Leave enough room for configuration files and filesystem overhead.
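
The three-field format can be validated before the install script is run. The following Python sketch is an illustration under stated assumptions, not part of dCache: the names PoolEntry and parse_pool_path are hypothetical, the size is assumed to be a whole number of GB, and the third field is assumed to be either "yes" or "no".

```python
from typing import NamedTuple


class PoolEntry(NamedTuple):
    directory: str   # <poolDataDirectory>
    size_gb: int     # <poolSizeInGB>
    reinstall: bool  # <reinstallPoolYesNo>


def parse_pool_path(text):
    """Parse the contents of a pool_path file into a list of PoolEntry."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        directory, size, reinstall = fields
        if not directory.startswith("/"):
            raise ValueError(f"line {number}: pool directory must be a full path")
        if reinstall.lower() not in ("yes", "no"):
            raise ValueError(f"line {number}: third field must be yes or no")
        entries.append(PoolEntry(directory, int(size), reinstall.lower() == "yes"))
    return entries
```

A line such as "/data/pool1 100 no" would be parsed as a pool in /data/pool1 with a size of 100 GB that is not to be reinstalled; the path is an example only.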

When all configuration files are prepared, configure the system with

[root] # /opt/d-cache/install/install.sh

[INFO]  No 'SERVER_ID' set in 'node_config'. Using SERVER_ID=<your.domain>.

[INFO]  Moving /opt/d-cache/bin/dcache-opt out of the way, because it is obsolete.

[INFO]  Creating link /pnfs/ftpBase --> /pnfs/fs which is used by the GridFTP door.
[INFO]  Creating link /pnfs/<your.domain> --> /pnfs/fs/usr/


[INFO]  Checking on a possibly existing dCache/PNFS configuration ...

[INFO]  Configuring pnfs export '/pnfsdoors' (needed from version 1.6.6 on)
        mountable by world.
[INFO]  You may restrict access to this export to the GridFTP doors which
        are not on the admin node. See the documentation.

[INFO]  Generating ssh keys:
Generating public/private rsa1 key pair.
Your identification has been saved in ./server_key.
Your public key has been saved in ./server_key.pub.
The key fingerprint is:
e3:4a:13:d5:33:45:e0:cd:69:a3:fb:d7:a8:64:df:73 root@grid-se3.desy.de


[INFO]  Creating Pool <hostname>_1
[INFO]  Creating Pool <hostname>_2

and start the central components (only on the admin node) with

[root] # /opt/d-cache/bin/dcache-core start
Starting dcache services:
Starting lmDomain  6 5 4 3 2 1 0 Done (pid=6802)
Starting dCacheDomain  6 5 4 3 2 1 0 Done (pid=6875)
Starting dirDomain  6 5 4 3 2 1 0 Done (pid=6973)
Starting doorDomain  6 5 4 3 2 1 0 Done (pid=7058)
Starting adminDoorDomain  6 5 4 3 2 1 0 Done (pid=7144)
Starting httpdDomain  6 5 4 3 2 1 0 Done (pid=7234)
Starting utilityDomain  6 5 4 3 2 1 0 Done (pid=7330)
Starting pnfsDomain  6 5 4 3 2 1 0 Done (pid=7436)
Starting gridftp-clintonDomain  6 5 4 3 2 1 0 Done (pid=7569)
Starting gsidcap-clintonDomain  6 5 4 3 2 1 0 Done (pid=7672)
Starting srm-clintonDomain  6 5 4 3 2 1 0 Done (pid=7777)

The configured pools are started with

[root] # /opt/d-cache/bin/dcache-pool start
Starting dcache pool: Starting clintonDomain  6 5 4 3 2 1 0 Done (pid=7990)

These scripts may be linked from /etc/init.d/. Please do not copy the init scripts as they may change between releases. On Red Hat derived Linux distributions dCache may be configured to start at boot-time using chkconfig, for example:

[root] # chkconfig --add dcache-core
[root] # chkconfig --add dcache-pool
[root] # chkconfig dcache-core on
[root] # chkconfig dcache-pool on

Since 1.8.0-14

A new init script was introduced in release 1.8.0-14. In addition to being able to start and stop dCache, it provides commands for creating and configuring pools. Thus pool creation is no longer part of the install script and pools can be created after the install script has been executed. We therefore proceed by finalising the initial configuration by executing /opt/d-cache/install/install.sh, for example:

[root] # /opt/d-cache/install/install.sh
INFO:Skipping ssh key generation

 Checking MasterSetup  ./config/dCacheSetup O.k.

   Sanning dCache batch files

    Processing adminDoor
    Processing chimera
    Processing dCache
    Processing dir
    Processing door
    Processing gPlazma
    Processing gridftpdoor
    Processing gsidcapdoor
    Processing httpd
    Processing info
    Processing infoProvider
    Processing lm
    Processing maintenance
    Processing pnfs
    Processing pool
    Processing replica
    Processing srm
    Processing statistics
    Processing utility
    Processing xrootdDoor


 Checking Users database .... Ok
 Checking Security       .... Ok
 Checking JVM ........ Ok
 Checking Cells ...... Ok
 dCacheVersion ....... Version production-1-8-0-14
        

No pools have been created on the node yet. Adding pools to a node is a two step process:

  1. The directory layout of the pool is created and filled with a skeleton configuration using dcache pool create <poolSize> <poolDirectory>, where <poolDirectory> is the full path to the directory which will contain the data files as well as some of the configuration of the pool, and <poolSize> is the size of the pool, specified in bytes or with an M, G, or T suffix (for mebibytes, gibibytes and tebibytes, respectively).

    Make sure that there is always enough space under <poolDirectory>. Be aware that only pure data content is counted by dCache. Leave enough room for configuration files and filesystem overhead.

    Creating a pool does not modify the dCache configuration.

  2. The pool is given a unique name and added to the dCache configuration using dcache pool add <poolName> <poolDirectory>, where <poolDirectory> is the directory in which the pool was created and <poolName> is a name for the pool. The name must be unique throughout the whole dCache installation, not just on the node.

    Adding a pool to a configuration does not modify the pool or the data in it and can thus safely be undone or repeated.

An example may help to clarify the use of these commands:

[root] # /opt/d-cache/bin/dcache pool create 500G /q/pool1
Created a 500 GiB pool in /q/pool1. The pool cannot be used until it has
been added to a domain. Use 'pool add' to do so.

Please note that this script does not set the owner of the pool directory.
You may need to adjust it.
[root] # /opt/d-cache/bin/dcache pool add myFirstPool /q/pool1/

Added pool myFirstPool in /q/pool1 to dcache-vmDomain.

The pool will not be operational until the domain has been started. Use
'start dcache-vmDomain' to start the pool domain.
[user] $ /opt/d-cache/bin/dcache pool ls
Pool        Domain                       Size   Free Path
myFirstPool dcache-vmDomain               500    550 /q/pool1
Disk space is measured in GiB.
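
The size argument of dcache pool create, as described in step 1 above, is given in bytes or with an M, G, or T suffix. The following Python sketch shows how such a value maps to a number of bytes. It is an illustration of the suffix rule only, not the code used by the dcache script; the function name pool_size_in_bytes is hypothetical, and only the upper-case suffixes named in the text are handled.

```python
# Binary multipliers for the suffixes named in the text.
_MULTIPLIERS = {"M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def pool_size_in_bytes(size):
    """Convert a pool size such as '500G' or '1048576' to a number of bytes."""
    size = size.strip()
    if size and size[-1] in _MULTIPLIERS:
        return int(size[:-1]) * _MULTIPLIERS[size[-1]]
    return int(size)  # no suffix: the value is already in bytes
```

With this rule, the 500G used in the example corresponds to 500 * 1024^3 bytes, which matches the "500 GiB" reported by the script.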
        

All configured components can now be started with dcache start, for example:

[root] # /opt/d-cache/bin/dcache start
Starting lmDomain  Done (pid=7514)
Starting dCacheDomain  Done (pid=7574)
Starting pnfsDomain  Done (pid=7647)
Starting dirDomain  Done (pid=7709)
Starting adminDomain  Done (pid=7791)
Starting httpdDomain  Done (pid=7849)
Starting utilityDomain  Done (pid=7925)
Starting gPlazma-dcache-vmDomain  Done (pid=8002)
Starting infoProviderDomain  Done (pid=8081)
Starting dcap-dcache-vmDomain  Done (pid=8154)
Starting gridftp-dcache-vmDomain  Done (pid=8221)
Starting gsidcap-dcache-vmDomain  Done (pid=8296)
Starting dcache-vmDomain  Done (pid=8369)
        

Upgrading a dCache Instance

Upgrading to bugfix releases within one version (e.g. from 1.6.6-1 to 1.6.6-3) may be done by shutting down the server and upgrading the packages with

[root] # rpm -Uvh <packageName>

For details on the changes, please refer to the change log.

This section describes the upgrade of dCache instances installed with the previous version (currently version 1.6.5) and also some earlier versions - notably the one distributed with the previous or current WLCG software. The first section will give a quick upgrade guide. It might not be applicable to complex setups. Not all features of the new version will be enabled after the quick upgrade guide. The next section will give pointers on how to enable them.

Note that the upgrade of dCache is independent of a conversion of the pnfs database from GDBM to PostgreSQL. The conversion and upgrade to the PostgreSQL version of pnfs may be performed any time before or after the upgrade of dCache. Do not perform the dCache upgrade and the pnfs database conversion simultaneously. It is better to do them one after the other, and test the system in between. Do not be misled by the fact that the dCache release only contains the PostgreSQL version of pnfs (since version 1.6.6). See Chapter 19, Moving the pnfs Data from GDBM to PostgreSQL for a guide to converting and upgrading the pnfs server.

In case you are already using PostgreSQL (e.g. for the SRM), it is a good idea to upgrade to version 8 now: prior to dCache version 1.6.6 no precious data is stored in the database, so it can be wiped, allowing a PostgreSQL 8 installation from scratch. Starting with dCache version 1.6.6, PostgreSQL will be used more heavily, making migration a complex task. Another advantage of PostgreSQL 8 is an integrated mechanism for automatic backups.

Quick Upgrade Guide

Stop the dCache services on all nodes of the instance:

[root] # /opt/d-cache/bin/dcache-pool stop
[root] # /opt/d-cache/bin/dcache-opt stop
[root] # /opt/d-cache/bin/dcache-core stop

Leave the pnfs server running. In WLCG installations, there might be a “meta-package” installed which can prevent the update to the current version. It should be uninstalled. The following command does that and does no harm if the meta-package is not installed, so run it in any case:

[root] # rpm -e lcg-SE_dcache

Upgrade the dCache RPM packages with

[root] # rpm -Uvh dcache-server-1.6.6-1.i386.rpm dcache-client-1.6.6-1.i386.rpm

For this quick upgrade you have to keep your old configuration files (i.e. config/dCacheSetup, config/PoolManager.conf, etc/node_config, etc/door_config, and etc/pool_path). Do not use the templates.

However, make sure that etc/node_config contains

PNFS_OVERWRITE=no

and that a single value is assigned to NODE_TYPE. Check that etc/pool_path contains “no” in the third field of each line. You might also want to double-check the contents of etc/door_config on each node.
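
These checks can be automated before the install script is run. The following Python sketch is a hedged illustration: the function names check_node_config and pools_marked_for_reinstall are hypothetical, and etc/node_config is assumed to consist of shell-style NAME=value lines with "#" comments.

```python
def check_node_config(text):
    """Return a list of problems found in the contents of etc/node_config."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        values[name.strip()] = value.strip().strip("\"'")
    problems = []
    if values.get("PNFS_OVERWRITE") != "no":
        problems.append("PNFS_OVERWRITE must be set to no")
    if len(values.get("NODE_TYPE", "").split()) != 1:
        problems.append("NODE_TYPE must have exactly one value")
    return problems


def pools_marked_for_reinstall(pool_path_text):
    """Return the pool directories whose third field in etc/pool_path is not 'no'."""
    marked = []
    for raw in pool_path_text.splitlines():
        fields = raw.split()
        if len(fields) == 3 and fields[2].lower() != "no":
            marked.append(fields[0])
    return marked
```

Both functions should return an empty list before the upgrade proceeds.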

Run the install script:

[root] # /opt/d-cache/install/install.sh

And start the server again with

[root] # /opt/d-cache/bin/dcache-core start
[root] # /opt/d-cache/bin/dcache-pool start

Note that the start-up script for the optional components is not needed anymore. Therefore, it is probably best to remove it:

[root] # rm /opt/d-cache/bin/dcache-opt /etc/init.d/dcache-opt

After The Upgrade

This section gives a few hints for solving problems and fine-tuning after the upgrade.

Check if the information given in the files /opt/d-cache/etc/node_config and /opt/d-cache/etc/door_config is correct:

Check that a single value is assigned to NODE_TYPE in /opt/d-cache/etc/node_config. If the assignment contains several words, the behaviour of some previous versions might be different from the new one.

Check that the doors which are started in the (now obsolete) /opt/d-cache/bin/dcache-opt start-up script are also enabled in /opt/d-cache/etc/door_config. The latter file is now evaluated by the start-up script /opt/d-cache/bin/dcache-core and not by the install script any more. This might lead to a different behaviour.

If, prior to the upgrade, you changed anything in a batch file (config/<domainName>.batch) these changes will be moved to files with names config/<domainName>.batch.rpmsave. There have been major changes to the batch files. Therefore it is necessary to reapply your changes. However, keep in mind that the batch files are considered to be part of the software and not configuration files.

It should not be necessary to change them in most situations. Try to find a proper configuration variable in config/dCacheSetup. (See the template in etc/dCacheSetup.template for hints.) If it should still be necessary to change a batch file, please report this as a “request for enhancement”. (See Chapter 5, The Cell Package for background information on the batch files.)

Your old config/PoolManager.conf will not be overwritten by the upgrade. Its format did not change. Therefore, it is fine to keep your old one. In case you did not customize the pool manager configuration, make sure that the set costcuts line reads

set costcuts -idle=0.0 -p2p=2.0 -alert=0.0 -halt=0.0 -fallback=0.0

Prior versions installed a config/PoolManager.conf with -idle=1.0 which will lead to undesired behaviour of the pool manager.
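
Whether an existing config/PoolManager.conf still carries the old -idle value can be checked with a few lines of code. The following Python sketch is illustrative only; the function names parse_costcuts and find_costcuts are hypothetical, and the option syntax is taken from the set costcuts line shown above.

```python
def parse_costcuts(line):
    """Parse a 'set costcuts' line into a dict of option name -> float."""
    tokens = line.split()
    if tokens[:2] != ["set", "costcuts"]:
        raise ValueError("not a 'set costcuts' line")
    options = {}
    for token in tokens[2:]:
        # Each option has the form -name=value.
        name, _, value = token.lstrip("-").partition("=")
        options[name] = float(value)
    return options


def find_costcuts(conf_text):
    """Return the parsed options of the first 'set costcuts' line, or None."""
    for raw in conf_text.splitlines():
        if raw.split()[:2] == ["set", "costcuts"]:
            return parse_costcuts(raw)
    return None
```

If the returned dictionary has a value other than 0.0 for "idle", the file was probably installed by a prior version and the line should be corrected as shown above.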

Before switching on the companion in config/dCacheSetup with the line

cacheInfo=companion

you have to be aware of the following:

For each file, the list of pools the file is stored on (the cache info) is currently stored within the pnfs namespace metadata. When the companion is switched on, the dCache system expects it to be stored in a PostgreSQL database instead. You should first create this database:

[root] # createdb -U srmdcache companion
[root] # psql -U srmdcache companion -f /opt/d-cache/etc/psql_install_companion.sql

on the node where the PnfsManager will run (normally the admin node). Now, put

cacheInfo=companion

into config/dCacheSetup and restart the PnfsManager:

[root] # /opt/d-cache/bin/dcache-core restart

Now the dCache system will not be aware of any files stored on the pools. To make it aware again, you have to go through the steps below. Since this will take a while and will put a considerable load on the PnfsManager, take care to do it with one pool at a time. You should also plan for a downtime.

In the admin interface (see the section called “The Admin Interface”) go to a pool, e.g.

(local) admin > cd <hostname>_1

and issue the command

(<poolname>) admin > pnfs register

Then go to the pnfs manager:

(<poolname>) admin > ..
(local) admin > cd PnfsManager

Check the output of the “info” command repeatedly:

(PnfsManager) admin > info
...
Threads (4) Queue
    [0] 10
    [1] 12
    [2] 9
    [3] 13
...

and wait until the value for all four queues is zero. Then go to the next pool and repeat the process.
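
Deciding when the queues have drained can be scripted if the output of the “info” command is captured as text. The following Python sketch parses output of the form shown above. How the output is obtained from the admin interface is outside its scope, and the function names queue_lengths and queues_drained are hypothetical.

```python
import re

# Matches queue lines such as "    [0] 10".
_QUEUE_LINE = re.compile(r"^\s*\[(\d+)\]\s+(\d+)\s*$")


def queue_lengths(info_output):
    """Return {thread index: queue length} from PnfsManager 'info' output."""
    lengths = {}
    in_queue_section = False
    for line in info_output.splitlines():
        if line.startswith("Threads"):
            in_queue_section = True
            continue
        match = _QUEUE_LINE.match(line)
        if in_queue_section and match:
            lengths[int(match.group(1))] = int(match.group(2))
        elif in_queue_section:
            in_queue_section = False  # first non-queue line ends the section
    return lengths


def queues_drained(info_output):
    """Return True if at least one queue was found and all queues are empty."""
    lengths = queue_lengths(info_output)
    return bool(lengths) and all(length == 0 for length in lengths.values())
```

For the sample output above, queues_drained returns False because all four queues still hold requests.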

Additional Components

Some features of dCache are switched off by default after an installation. The following describes how to put them to use:

Accounting/Billing Information in Database

The billing information which is normally written to /opt/d-cache/billing/ on the admin node will also be written to a database if config/dCacheSetup contains

billingToDb=yes

A PostgreSQL server is expected to run on the admin node with a database user “srmdcache” and a database “billing”, which can be created with

[root] # createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt srmdcache
[root] # createdb -U srmdcache billing

Space Reservation

The space reservation feature between the SRM and the GridFTP door may be switched on with

spaceReservation=true

Old Installation Manual

The latest release of the dCache distribution is version 1-6-5.
Release notes describing new features and bug fixes can be found at

http://www.dcache.org/manuals/experts_docs/rel-dcache-1-6-5.html 

Note for SRM users
------------------
 - The latest SRM client (used with srmcp) has an extended set of
   parameters. Therefore it is necessary to renew the config file
   that is typically located in the home directory of the user running
   the client command (~/.srmconfig/config.xml). Simply remove the
   file config.xml, it will be re-generated following the new format 
   when running srmcp again.
 - The SRM client code in release 1-6-5 provides compatibility with 
   CERN's CASTOR SRM implementation. Therefore all users that are
   interested in SRM based data transfer between CASTOR and their
   dCache instance should upgrade to the client RPM that is part of
   the 1-6-5 distribution.
=======================================================================

Note: From version 1.2.2-6 on it is required to have a Postgres database
      installed and activated on the node that is running the SRM server.

      Note: If you have installed version 1.2.2-6(-1) or a later version 
      you need to drop the postgres tables. This is required because of 
      a db schema change.

Perform the following steps to remove the tables:
1. Locate the configuration file srm.batch for the dCache SRM and find the
values of the parameters jdbcUrl, jdbcUser and jdbcPass. The last element of
the JDBC URL is your database name. For example, if the value of jdbcUrl is
jdbc:postgresql://host/dcache, then the name of the database is dcache.

2. Use these parameters and the "psql" PostgreSQL client to connect to the
SQL server:

$ psql -U <user> -h <host> <database name>

Once psql connects to the server the command prompt will appear:

dbname=>

3. Execute the following commands (you can just cut-and-paste the following text
into the psql):

DROP TABLE  copyfilerequests ;
DROP TABLE  copyfilerequests_b ;
DROP TABLE  copyrequests ;
DROP TABLE  copyrequests_b ;
DROP TABLE  getrequests_protocols ;
DROP TABLE  getrequests_protocols_b ;
DROP TABLE  getfilerequests ;
DROP TABLE  getfilerequests_b ;
DROP TABLE  getrequests ;
DROP TABLE  getrequests_b ;
DROP TABLE  pins ;
DROP TABLE  pinrequests ;
DROP TABLE  srmnextrequestid ;
DROP TABLE  putrequests_protocols ;
DROP TABLE  putrequests_protocols_b ;
DROP TABLE  putfilerequests ;
DROP TABLE  putfilerequests_b ;
DROP TABLE  putrequests ;
DROP TABLE  putrequests_b ;
DROP TABLE  srmrequestcredentials ;

4. Make sure all tables have been dropped. To list all remaining tables, type at the prompt
   \dt;
   Drop any remaining tables as described above.
 
You are done.
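
Step 1 of the procedure above, reading the database name off the JDBC URL, can be sketched in Python as follows. This is an illustration only: the helper names are hypothetical, and the port falls back to 5432 (the PostgreSQL default) when the URL does not name one.

```python
from urllib.parse import urlparse


def parse_jdbc_url(jdbc_url):
    """Split jdbc:postgresql://host[:port]/database into its parts."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: %r" % jdbc_url)
    parts = urlparse(jdbc_url[len(prefix):])
    # The last path element of the URL is the database name.
    database = parts.path.rsplit("/", 1)[-1]
    return {"host": parts.hostname, "port": parts.port or 5432, "database": database}


def psql_command(jdbc_url, user):
    """Build the argument list for the psql call shown in step 2."""
    info = parse_jdbc_url(jdbc_url)
    return ["psql", "-U", user, "-h", info["host"], info["database"]]


print(parse_jdbc_url("jdbc:postgresql://host/dcache")["database"])  # dcache
print(" ".join(psql_command("jdbc:postgresql://host/dcache", "srmdcache")))
```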



      The database is used to maintain state information about ongoing
      transfers in order to make them persistent to allow a restart of
      transfers in case of an interrupt (e.g. server failure/maintenance,
      network disconnect etc.). 

      Though there is no specific version required we recommend using a
      recent version that is usually part of the Linux distribution
      running on your system.
      Hints concerning the PostgreSQL configuration are provided below.
------------------------------------------------------------------------
Note: From version 1.2.2-7 on doors (GridFTP, SRM, gsidcap) can be 
      installed and configured on a selective basis, and, if required, on
      a node other than the admin node.
      Find the details below.

------------------------------------------------------------------------
       How to update a standard stand-alone dCache installation
------------------------------------------------------------------------
Because of the new SRM there are quite a few changes in the configuration
files. An old installation which has not been customized very much
is therefore updated most easily by doing a reinstall following these
rules: (The data in the pools will be preserved.)
-- Save copies of your old config files in /opt/d-cache/etc/ and
   /opt/d-cache/config/. Remove the old packages with 'rpm -e' or just
   by removing the whole directory /opt/d-cache/.
-- Install the d-cache packages according to the guide below with the
   following additions:
   - The PNFS system should stay as it is.
   - Use your old "etc/pool_path", but set the last column
      to "No". Otherwise the data in your pools would be deleted!
   - Use the old "etc/node_config" and "etc/dcache.kpwd" if needed
   - Create a new "config/dCacheSetup" starting from
     "etc/dCacheSetup.template" as described or with the aid of the old
     file.
     (Try: diff old-etc/dCacheSetup.template old-config/dCacheSetup)

For a customized installation it might be better to use the existing
configuration directories /opt/d-cache/etc/ and /opt/d-cache/config/ and
adjust them to the new version of the SRM. Especially the file
"config/srm.batch" has to be adjusted. A detailed description of the
parameters in this file is given at the end of these instructions.
------------------------------------------------------------------------

Find a set of rpms (as of 01/23/2005) to install a dCache based Disk Pool
Management system (no HSM support) at

 http://www.dcache.org/downloads/dcache-v1.2.2-7-j14.tgz


Get the tarball

 wget http://www.dcache.org/downloads/dcache-v1.2.2-7-j14.tgz

Unzip the tarball

 tar xvzf dcache-v1.2.2-7-j14.tgz

You should find the following files

Release.notes	
d-cache-core-1.5.2-xx.i386.rpm
dCache-installation-instructions.txt       
d-cache-client-1.0-xx.i386.rpm	
d-cache-opt-1.5.3-xx.i386.rpm
dcache-user-instructions.txt  
pnfs-3.1.10-xx.i386.rpm


The tar file contains 4 rpms:
 - pnfs manager
 - dCache core (admin/pool node)
 - dCache optional components for admin node 
   (srm/gridftp servers and the gsidcapdoor)
 - client (32 and 64 bit support for dcap access combined in a single lib
   (/opt/d-cache/dcap/lib/libdcap.so), e.g. dc_lseek, dc_lseek64)
           

To set up a dCache instance that can be accessed via the dCap protocol,
the following components need to be installed:
 - pnfs        The namespace manager (appears as a filesystem to the user)
 - admin node  Provides all functionality needed to manage a distributed disk pool
               (can also hold a pool)
 - pool node   A node that provides storage capacity to the dCache instance, which
               is managed by the PoolManager running on the admin node
To extend accessibility through GridFTP and SRM, optional software components can
be installed in addition to the core RPM. It is sufficient to just install the
RPM. No additional step is required.

Note: With the installation of the dCache core some configuration parameters 
      are stored in pnfs. Therefore the pnfs manager needs to be installed
      first.
      Though the pnfs manager and the dCache core (admin node) by design can
      be installed on different nodes this version of the installation package
      assumes that both are installed on the same physical machine. 


Prerequisites
-------------

I.   The dCache software is written in Java and requires a recent version of either
     the JAVA developer kit (jdk) or the runtime environment (jre) to be installed.

II.  In case the dCache is going to be accessed via GridFTP and/or SRM a host
     certificate is required. Contact the CA responsible for your community for
     details. The certificate is expected to be installed in
     /etc/grid-security.


III. PostgreSQL needs to be installed on the node running the central dCache
     services (i.e. the admin node). The db is used by the SRM server, SRM Pin Manager
     and the Resilience Manager. If these services are not running on the node
     the db is installed on, make sure their host is allowed to connect to the db: add
     a "host" entry to the pg_hba.conf table as described below.
     Get a recent version from the Linux distribution that is running on your system.
     Alternatively, RPMs can be found at

     http://www.postgresql.org/ftp/

     A version that is suitable for current versions of RH SL3 can be found at
     http://www.postgresql.org/ftp/binary/v8.0.4/linux/rpms/redhat/rhel-es-3.0/
     
     Client, Server and JDBC support is needed.

     The following instructions shall be used to configure and initialize the
     databases. They need to be executed only following the installation of
     the database. An upgrade of the dCache code does not require the 
     commands to be executed again. All commands shall be carried out by
     user 'postgres'

     su postgres
     # Create directory the db will live in
     mkdir <database_directory_name>/data

     # Command to initialize DB
     initdb -D <database_directory_name>/data
     
     # Enable network access in postgres config file (default port 5432 is used)
     <database_directory_name>/data/postgresql.conf
     #
     tcpip_socket = true

     # Edit <database_directory_name>/data/pg_hba.conf to allow hosts to connect
     # to the DB (records at the bottom of the file)
     # TYPE  DATABASE    USER        IP-ADDRESS        IP-MASK           METHOD

     local   all         all                                             trust
     host    all         all         127.0.0.1         255.255.255.255   trust
     host    all         all         <IP of DB host>   255.255.255.255   trust
     host    all         all         <IP of SRM host>  255.255.255.255   trust (if SRM host != DB host)
     #
     # Command to start the DBMS, make sure the log file exists and
     # user 'postgres' has write permission
     postmaster -i -D <database_directory_name>/data >logfile 2>&1 &
     [Note: You may want to create an rc-script under /etc/init.d 
            to automatically start the DB upon start of the system]

     # Command to create the DB for the SRM
     createdb dcache
     
     # Command to connect to the DB
     psql -U postgres dcache

     # Create DB user 'srmdcache'
     create user srmdcache password 'srmdcache';
     # Disconnect from dcache db
     \q

     # All tables required for SRM operation will be created by the SRM
     # server

     # Command to create the DB for the Resilience Manager
     createdb -O srmdcache replicas

     # Initialize db tables for the Resilience Manager
     # This step requires the dcache-core RPM (v 1.5.2-80 or higher) to be installed
     psql -d replicas -U srmdcache -f /opt/d-cache/etc/pd_dump-s-U_enstore.sql     

     # Just for completeness: Command to stop the DBMS (as user 'postgres')
     # pg_ctl stop -D <database_directory_name>/data
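
The pg_hba.conf records listed above follow a fixed pattern, with the extra SRM line needed only when the SRM host differs from the DB host. A minimal Python sketch that prints such records is given below. The function name is hypothetical, and the IP addresses in the usage line are placeholders from the documentation address range, not values from this guide.

```python
def pg_hba_records(db_host_ip, srm_host_ip=None):
    """Return pg_hba.conf lines in the column layout shown above."""
    full_mask = "255.255.255.255"
    rows = [
        ("local", "all", "all", "", "", "trust"),
        ("host", "all", "all", "127.0.0.1", full_mask, "trust"),
        ("host", "all", "all", db_host_ip, full_mask, "trust"),
    ]
    # The SRM entry is only needed if the SRM runs on a different host.
    if srm_host_ip and srm_host_ip != db_host_ip:
        rows.append(("host", "all", "all", srm_host_ip, full_mask, "trust"))
    return ["%-7s %-11s %-11s %-17s %-17s %s" % row for row in rows]


for record in pg_hba_records("192.0.2.10", "192.0.2.20"):
    print(record)
```

Note that "trust" lets the listed hosts connect without a password, exactly as in the records above; whether that is acceptable depends on the site.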

To install the pnfs manager follow the instructions below
-------------------------------------------------------

 1. install the pnfs rpm

 2. copy the template /opt/pnfs.3.1.10/pnfs/etc/pnfs_config.template =>
                      /opt/pnfs.3.1.10/pnfs/etc/pnfs_config
    and customize pnfs_config according to your needs

    The pnfs config file contains

PNFS_INSTALL_DIR = /opt/pnfs.3.1.10/pnfs
PNFS_ROOT = /pnfs
PNFS_DB = /opt/pnfsdb
PNFS_LOG = /var/log/pnfsd.log
PNFS_OVERWRITE = no
    - don't overwrite pnfsdb if one exists in the place specified above

 3. run the install script at
    /opt/pnfs.3.1.10/pnfs/install/pnfs-install.sh
    - It generates the file "pnfsSetup" in /usr/etc/

 4. Start/Stop pnfs
    /opt/pnfs.3.1.10/pnfs/bin/pnfs start|stop
    - starts pnfs and mounts it at /pnfs/fs

 5. Security
    In order to minimize the administrative overhead the pnfs filesystem (/pnfs
    and /fs) is exported world-wide by default. /pnfs is required by local clients 
    utilizing the dcap protocol, while /fs is needed by dCache doors (SRM, GridFTP,
    gsidcap) that are not running on the host the pnfs filesystem is installed on.
    
    The ability to mount these filesystems can be limited by applying a kind of 
    "network mask" as a file name.
    The installation of pnfs installs a file called 0.0.0.0..0.0.0.0 in
    /pnfs/fs/admin/etc/exports. Suppose the ability to mount /pnfs (/fs) should be 
    limited to local hosts living in network 123.111. Therefore  the file would have 
    to be renamed to 255.255.0.0..123.111.0.0 or 255.255.255.0..123.111.1.0 for a 
    class C network. This can further be limited to individual hosts and particular
    pnfs subtrees, e.g. the host with IP address 123.111.1.1 is allowed to mount 
    /pnfs/theorie
    - create a file named 123.111.1.1
    - content of the file is (one line)
      /theorie   /0/root/fs/usr/data/theorie  30 rw,soft
    The mechanism as it is implemented will first look for the host IP address and will
    apply the rule if the file exists. If it doesn't it will select the one with the
    "mask" and will apply the rule therein respectively.
    
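
The naming scheme and the lookup order described in this section can be sketched in Python. This is only an illustration of the rule as stated here, not the pnfs implementation: the function names are hypothetical, and picking the first matching mask file when several match is an assumption.

```python
import ipaddress


def export_file_name(network):
    """Build the '<mask>..<network>' file name for a network in CIDR notation."""
    net = ipaddress.ip_network(network)
    return "%s..%s" % (net.netmask, net.network_address)


def select_export_file(host_ip, existing_files):
    """Prefer a file named after the host IP, else fall back to a mask file."""
    if host_ip in existing_files:
        return host_ip
    addr = ipaddress.ip_address(host_ip)
    for name in existing_files:
        if ".." not in name:
            continue
        mask, network = name.split("..", 1)
        if addr in ipaddress.ip_network("%s/%s" % (network, mask)):
            return name
    return None


print(export_file_name("123.111.0.0/16"))  # 255.255.0.0..123.111.0.0
print(export_file_name("123.111.1.0/24"))  # 255.255.255.0..123.111.1.0
files = ["255.255.0.0..123.111.0.0", "123.111.1.1"]
print(select_export_file("123.111.1.1", files))  # 123.111.1.1
print(select_export_file("123.111.2.9", files))  # 255.255.0.0..123.111.0.0
```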

To install the admin node and/or pool node(s) follow the instructions below
---------------------------------------------------------------------------

 1. Install the dCache core rpm. In case you want to install optional
    components, like the srm/gridftp servers and/or the client 
    components, it's a good time to install the "d-cache-opt" rpm(s) as well. 
    (Can also be done later.)
    NOTE: If data is to be accessed using SRM based transfers
          (srmcp) in an installation with multiple pool nodes, the
          d-cache-opt rpm needs to be installed on every pool node. In
          addition to the software components each pool node needs a
          host certificate and full access to the public Internet for
          TCP connections in the port range 20000 - 50000.

    From dcache-core RPM rev 1.5.2-80 on the Resilience Manager is included.
    Please find more information about its functionality and configuration at
    http://cmsdcam.fnal.gov/dcache/resilient/Resilient_dCache_v1_0.html.

    The Resilience Manager is preconfigured but not automatically started
    with the core services. The dCache core start-up script contains the
    instructions required to start/stop the replica domain, but they are
    commented out. Remove the "#" at the beginning of the related lines.


 2. configure the installation by using the following template files
    in /opt/d-cache/etc. The arrow indicates the name of the customized file
    - node_config.template  --> /opt/d-cache/etc/node_config
    - dcache.kpwd.template  --> /opt/d-cache/etc/dcache.kpwd
    - dCacheSetup.template  --> /opt/d-cache/config/dCacheSetup
    - pool_path.template    --> /opt/d-cache/etc/pool_path
    - door_config.template  --> /opt/d-cache/etc/door_config   (!!! NEW !!!)
    
    In case of a virgin machine (not an upgrade of an existing dCache
    installation) copy the .template file to its base name (e.g.
    cp node_config.template node_config) and customize the latter
    according to your requirements.
    Note: the final place of the dCacheSetup file is 
          /opt/d-cache/config/dCacheSetup. You need to copy it
          manually from /opt/d-cache/etc to the config directory.

 2.1. etc/node_config
    There is no dedicated rpm for the installation of a pool-node any
    longer. Selection of admin vs. pool node is done via the NODE_TYPE
    parameter in the node_config file. The admin node can also contain
    pools.

NODE_TYPE = dummy # either admin or pool
DCACHE_BASE_DIR = /opt/d-cache
PNFS_ROOT = /pnfs
PNFS_INSTALL_DIR = /opt/pnfs.3.1.10/pnfs
PNFS_START = yes               (start pnfs in case it's not running)
PNFS_OVERWRITE = no            (in case dCache config exists in pnfs)
POOL_PATH = /opt/d-cache/etc   (in case pools are to be configured on
                                admin node; for details see pool instr.)
NUMBER_OF_MOVERS = 100
      
      Copy the template to its base name, if required, and edit the resulting
      file as desired.

 2.2. etc/dcache.kpwd

      The dcache.kpwd authentication file.
      The template needs to be customized and is expected as
      /opt/d-cache/etc/dcache.kpwd

      In case there is an existing dcache.kpwd it will not be overwritten
      See the release notes for further information on the format.

 2.3. config/dCacheSetup

      Important note: If an existing dCacheSetup file is going to be re-used
                      make sure the Java classpath setting is up to date. The
                      setting that is required by this version of the software
                      can be found in /opt/d-cache/etc/dCacheSetup.template

                      From version 1.2.2-7 on there is a new parameter to
                      support remote db connections for the SRM server
                      "srmDbHost=<your.dbHost.org>"

      This is the primary configuration file for the dCache core and
      optional components, i.e. srm/gridftp.
      Things that need attention (anything else has reasonable defaults):
      - java path
      - serviceLocatorHost
        - use the host name of the node that is running pnfs as it is
          defined in your DNS, replace string "SERVER" by the host name
      - pnfsSrmPath (default is /)
      - srmDbHost=<your.dbHost.org> to let the SRM server know about the db host
      - The following parameters should be set to "true" if the dCache
        installation is going to be used as a LCG Storage Element
        - RecursiveDirectoryCreation=true
        - AdvisoryDelete=true
      If dCache was previously not running on this machine or if there is no
      dCacheSetup file in the config directory copy the dCacheSetup.template
      file to /opt/d-cache/config and customize it according to your needs.

 2.4. etc/pool_path

      The template contains pool parameters (path, size, etc)

      The format of the pool_path file is (3 columns)

      /path/to/pool  size[GB]  "overwrite if exists (yes/no)"
   
      [Note: GB means 1024^3; space for inodes etc. is not accounted for]
      Copy the template to its base name, if required, and edit the resulting file
      as desired. Use an empty file if no pools are wanted (e.g. on a pure admin node).
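
To make the three-column format concrete, here is a minimal Python sketch that reads pool_path text. It assumes whitespace-separated columns and an integer size, and it does not handle comment lines, since the format description above does not mention any. The function name is hypothetical.

```python
GB = 1024 ** 3  # the note above defines GB as 1024^3 bytes


def parse_pool_path(text):
    """Parse 'path size[GB] overwrite' lines; an empty file yields no pools."""
    pools = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError("expected 3 columns: %r" % line)
        path, size_gb, overwrite = fields
        pools.append({
            "path": path,
            "size_bytes": int(size_gb) * GB,
            "overwrite": overwrite.lower() == "yes",
        })
    return pools


print(parse_pool_path("/path/to/pool 100 no\n"))
print(parse_pool_path(""))  # [] (pure admin node, no pools)
```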

 2.5. Install and configure the doors (GridFTP, SRM, gsidcap) -
      etc/door_config
 
      A "door" node (neither an "admin" nor a "pool" node) requires the core
      and the opt RPMs to be installed. However, only the following
      installation script needs be executed

      /opt/d-cache/install/install_doors.sh 
      (DON'T run /opt/d-cache/install/install.sh)

      Make sure the template (/opt/d-cache/etc/door_config.template) was copied
      to /opt/d-cache/etc/door_config and customized before running the install_doors.sh
      script.

      The format of the door_config file is (2 columns)

      ADMIN_NODE      <name of admin node running pnfs>

      door          active   (default is all active)   
      --------------------
      GSIDCAP         yes    (or "no")
      GRIDFTP         yes    (or "no")
      SRM             yes    (or "no")

      Also the dCacheSetup.template file needs to be copied to
      /opt/d-cache/config/dCacheSetup and customized accordingly.
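
The two-column door_config format can be read with a few lines of Python. The sketch below assumes that lines other than ADMIN_NODE and the three door names (such as a header or separator line) can be ignored, and that a door missing from the file is active, following the "default is all active" remark. The function name is hypothetical.

```python
DOORS = ("GSIDCAP", "GRIDFTP", "SRM")


def parse_door_config(text):
    """Return the admin node name and a door -> active mapping."""
    config = {"admin_node": None, "doors": {door: True for door in DOORS}}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        key, value = fields[0], fields[1]
        if key == "ADMIN_NODE":
            config["admin_node"] = value
        elif key in DOORS:
            config["doors"][key] = value.lower() == "yes"
    return config


example = """ADMIN_NODE      admin.example.org
GSIDCAP         yes
GRIDFTP         no
"""
print(parse_door_config(example))
```

The host name admin.example.org is a placeholder.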

      If one or more doors are to be added to an "admin"
      and/or a "pool" node:

      - Install the dcache-opt RPM on each node
      - Make sure the template (/opt/d-cache/etc/door_config.template) was
        copied to /opt/d-cache/etc/door_config and customized before running
        the install_doors.sh script
      - On a "pool" node, copy the dcache authentication file (../etc/dcache.kpwd) 
        from the admin node to /opt/d-cache/etc on the "pool" node(s)

 3. To install an "admin" or a "pool" node run the install script at

    - /opt/d-cache/install/install.sh

      For an "admin" node this will do all dCache specific preparations in pnfs, etc. 
      If there is a pool location configured in pool_path it will also install a pool
      (in case the file is empty it will not install/configure any pool related
      stuff).

    - /opt/d-cache/install/install_doors.sh

      in case one or more of the following doors are supposed to
      be installed on the admin and/or the pool node:
      - GridFTP
      - SRM
      - gsidcap
      This will update the ../bin/dcache-opt script accordingly.         
      Note: "door" nodes need to mount the pnfs fs. Make sure NFS related
            communication is enabled between the "admin" and the "door" node(s).
            For pnfs installations prior to the one which is part of the 1.2.2-7
            distribution do the following
            
            - Make sure pnfs is running
            - cp /pnfs/fs/admin/etc/exports/127.0.0.1 \
              /pnfs/fs/admin/etc/exports/0.0.0.0..0.0.0.0
              (overwrite existing file)

    - Monitoring of the door domains via the Web page
      In the recent setup, the srm and the gridftp door(s) have changed their
      name(s), so the web page asks the wrong cell whether or not it is alive.
      The srm/gridftp name changed from SRM/GFTP to SRM/GFTP-<HOSTNAME> (where HOSTNAME is
      the host the SRM/GFTP door is running on). To properly update the status page you
      have to manually modify config/httpd.batch on the headnode (or the node where the
      http service is running).
      At the end of the httpd.batch file you will find a list of 'cells'. The ones you
      need to change are called SRM and GFTP. Please change them to SRM-<HOSTNAME> and
      GFTP-<HOSTNAME> respectively. In case multiple gridftp doors are configured you need
      to add as many lines as there are gridftp doors.
      You need to restart the httpd service to activate the changes.
      If you don't want to restart the services, you may
      as well make the changes in the batch file (for future
      restarts) and use the ssh interface to make temporary
      changes:

      (local) admin > cd collector@httpdDomain
      (collector@httpdDomain) admin > unwatch SRM
      (collector@httpdDomain) admin > unwatch GFTP
      (collector@httpdDomain) admin > unwatch DCap-gsi
      (collector@httpdDomain) admin > watch SRM-<HOSTNAME>
      (collector@httpdDomain) admin > watch GFTP-<HOSTNAME>
      (collector@httpdDomain) admin > watch DCap-gsi-<HOSTNAME>
            
 4. Start/stop the dCache services

    Make sure pnfs is running and the pnfs filesystem is mounted
    - To start/stop pnfs 
     /opt/pnfs.3.1.10/pnfs/bin/pnfs start|stop
     Starting pnfs will also mount the fs, stopping it will unmount pnfs.

    Start the core services
    /opt/d-cache/bin/dcache-core start|stop

    [in case dCache optional components (srm/gridftp/gsidcapdoor) are
    installed on an "admin", a "pool" or a "door" node they are
    started/stopped with
     /opt/d-cache/bin/dcache-opt start|stop ]

    To start|stop a pool use
    /opt/d-cache/bin/dcache-pool start|stop


 5. Client installation
    Client components are installed under /opt/d-cache.

    The libraries (32 and 64-bit versions) can be found under
     /opt/d-cache/dcap/lib. libdcap.so and libpdcap.so
    are symbolic links pointing to the 64-bit version.
    Also the gsidcap tunnel lib (libgsiTunnel.so) is
    installed here. In case the 32 bit version is supposed to be
    the default the links can be customized accordingly.

    Besides the libraries, header files (/opt/d-cache/dcap/include)
    and the dccp binary (/opt/d-cache/dcap/bin) are installed with
    the Client RPM.
    It is sufficient to install the Client RPM. No further installation
    step is required to make the client functions operational.

 6. Log files

    Common location for all dCache related log files is
    /opt/d-cache/log
    The default location for the PNFS log file is
    /var/log/pnfsd.log

By default, all pools register automatically with the default
pgroup.

dCache SRM installation and configuration instructions
======================================================

Requirements on the srm and pool nodes

   1. The nodes on which srm server (srm cell) and pool 
      nodes are installed need to have the grid host certificate 
      installed. Please refer to the instructions from your Certification 
      Authority on how to obtain a grid host certificate. 
   2. There should be a postgres database server running on a 
      machine accessible by the srm server, and there should be a 
      postgres user account created, capable of creating new 
      tables.
      Instructions on the installation are provided in section
      "Prerequisites", item III

Non-standard DCache Services that srm relies upon
[Note: The configuration options described below are
       already part of the srm.batch file that comes with
       this package. Usually no modifications are necessary.
       Only when an existing installation is upgraded _AND_ the
       admin is going to reuse the old srm.batch file do the "pin
       manager" config entries need to be added. The standard
       RPM upgrade mechanism overwrites that file.]

   1. Pin Manager.
      Pin Manager is used by srm to perform the so 
      called file "pin in cache" operation. When a file is in pinned 
      state, it will not be deleted from the cache to make room for 
      other incoming files. The Pin Manager cell can be started by 
      adding the following lines to one of the dcache domain 
      configuration "batch" files (e.g. srm.batch):
    
    #
    #pin manager
    #
    create diskCacheV111.services.PinManager PinManager \
    " default -export  \
     -jdbcUrl=jdbc:postgresql://localhost/dcache \
     -jdbcDriver=org.postgresql.Driver \
     -dbUser=<user> \
     -dbPass=<password>"

     [Note: defaults for <user>=srmdcache, <password>=srmdcache]

     The configurable parameters are the following:
    -jdbcUrl url is pointing to the type and the location of the 
	database, which will be used by the pin manager. For example, 
	if the database is running on a host "hosta" on a nonstandard 
        port "12345", and the database name is "name1", this option 
        value would be "jdbc:postgresql://hosta:12345/name1".
     -jdbcDriver specifies the class name for the driver. Should 
        remain the same for the postgres database.
     -dbUser a name of the database user
     -dbPass a password for the database user, could be an arbitrary 
	string if the host on which the pin manager is running is 
	included in the postgres list of the trusted hosts.
     The two optional parameters are the 
     -poolManager and -pnfsManager, which allow the specification 
       of alternative names for the PoolManager and PnfsManager cells.
    2. GsiftpTransferManager
       This service is used by srm to perform the transfers 
       from a remote server to the dcache via the gsiftp protocol.
       The GsiftpTransferManager cell is started by the following
       "batch" command:

      #
      # RemoteGsiftpTransferManager
      #
      create diskCacheV111.services.GsiftpTransferManager RemoteGsiftpTransferManager \
        "default -export \
        -pool_manager_timeout=60 \
        -pnfs_manager_timeout=60 \
        -pool_timeout=300 \
        -mover_timeout=86400 \
        -max_transfers=30 \
      "
      The configurable parameters are the following:

      -pool_manager_timeout is the timeout in seconds for PoolManager 
       message exchanges.

      -pnfs_manager_timeout is the timeout in seconds for PnfsManager 
       message exchanges.

      -pool_timeout is the timeout in seconds for the first pool
       message, which confirms the creation of the mover.

      -mover_timeout is the time before the transfer manager will 
       stop waiting for the completion of the started transfer. If
       expired it will try to kill the mover, and report the error 
       back to the caller (srm).

      -max_transfers is the maximum number of simultaneous transfers.
       If more transfers are scheduled, the transfer manager will fail 
       them.
      
      3. Copy Manager
        This service is used by the srm when the source and the destination
        files in the srm copy request are both local to the storage.
        Its configuration parameters are mostly the same as those of the
        GsiftpTransferManager. An example startup command follows:
        
	create diskCacheV111.doors.CopyManager CopyManager \
	"default -export \
	-pool_manager_timeout=60 \
	-pool_timeout=300 \
	-mover_timeout=86400 \
	-max_transfers=30 \
        "

SRM Configuration Guide
-----------------------
Note: The package comes with a set of suitable parameters. We expect that
      modifications are necessary in rare cases only.
  
      We provide deep technical information for those who are interested in
      the details. Those details are not required to operate the SRM/dCache. 


        In order to start the srm server, an instance of the cell
        diskCacheV111.srm.dcache.Storage needs to be created. Here is
        an example of a dcache batch file command starting the srm cell,
        illustrating most of the configurable parameters:

	create diskCacheV111.srm.dcache.Storage  SRM \
            "default -srmport=${srmPort1} \
            -export \
            -kpwd-file=${config}/dcache.kpwd \
            -pnfs-srm-path=/ \
            -buffer_size=1048576 \
            -tcp_buffer_size=1048576 \
            -parallel_streams=10 \
            -debug=true \
            -get-lifetime=86400000 \
            -put-lifetime=86400000 \
            -copy-lifetime=86400000 \
            -get-req-thread-queue-size=1000 \
            -get-req-thread-pool-size=30 \
            -get-req-max-waiting-requests=1000 \
            -get-req-ready-queue-size=1000 \
            -get-req-max-ready-requests=30 \
            -get-req-max-number-of-retries=10 \
            -get-req-retry-timeout=60000 \
            -get-req-max-num-of-running-by-same-owner=10 \
            -put-req-thread-queue-size=1000 \
            -put-req-thread-pool-size=30 \
            -put-req-max-waiting-requests=1000 \
            -put-req-ready-queue-size=1000 \
            -put-req-max-ready-requests=30 \
            -put-req-max-number-of-retries=10 \
            -put-req-retry-timeout=60000 \
            -put-req-max-num-of-running-by-same-owner=10 \
            -copy-req-thread-queue-size=1000 \
            -copy-req-thread-pool-size=8 \
            -copy-req-max-waiting-requests=1000 \
            -copy-req-max-number-of-retries=30 \
            -copy-req-retry-timeout=6000 \
            -copy-req-max-num-of-running-by-same-owner=10 \
            -recursive-dirs-creation=true \
            -jdbcUrl=jdbc:postgresql://localhost/dcache \
            -jdbcDriver=org.postgresql.Driver \
            -dbUser=srmdcache \
            -dbPass=srmdcache \
            "    
       
	The available configuration options are:

        -kpwd-file specifies the location of the dcache authorization 
	"database" file.

        -pnfs-srm-path specifies the root of the srm within the pnfs
	namespace. Essentially this means that the value of this option 
	will be prepended to all the local storage paths given to the srm 
        server.

        -buffer_size and -tcp_buffer_size specify the size of memory, in
	bytes, and socket buffer size, in bytes, to be used with the embedded 
	gsiftp clients, when performing transfers between the storage and 
	a gsiftp server.

        -parallel_streams specifies the max. number of parallel streams to be
	used by the embedded gsiftp client.

        -debug tells if extra debug info should be logged. Most of the debug 
        logging can be turned off by setting the printout domain variable to
	error (2). Usually this is done in the first line of the dcache batch
	file.

        -recursive-dirs-creation turns on and off the automatic creation
         of nonexistent directories, in case of put/copy requests.

        -jdbcUrl, -jdbcDriver, -dbUser, -dbPass: these options have exactly
         the same meaning as the same options of the PinManager.
         -jdbcUrl url is pointing to the type and the location of the
          database, which will be used by the pin manager. For example,
          if the database is running on a host "hosta" on a nonstandard
          port "12345", and the database name is "name1", this option
          value would be "jdbc:postgresql://hosta:12345/name1".
         -jdbcDriver specifies the class name for the driver. Should  
          remain the same for the postgres database.

        -get-lifetime, -put-lifetime, -copy-lifetime specify the lifetimes, in 
        milliseconds, of the srm get, put and copy requests respectively.
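
Since the lifetimes are given in milliseconds, the value 86400000 used in the example command corresponds to 24 hours. A small Python sketch for converting in both directions follows; the helper names are illustrative only.

```python
from datetime import timedelta


def lifetime(milliseconds):
    """Interpret a -get/-put/-copy-lifetime value as a duration."""
    return timedelta(milliseconds=milliseconds)


def to_milliseconds(**duration):
    """Convert e.g. hours=24 into the millisecond value the options expect."""
    return timedelta(**duration) // timedelta(milliseconds=1)


print(lifetime(86400000))         # 1 day, 0:00:00
print(to_milliseconds(hours=24))  # 86400000
```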

        In order to develop a better understanding of the rest of the 
        parameters we will first describe how the request scheduler works.
        Please note that the following explanation is a simplification.
        The SRM Scheduler executes the instances of the SRM Job classes. For
	the scheduler, execution of the job is the execution of the job's run
	method in one of the threads. Jobs are initially in the Pending state. 
	Once the scheduler receives the job, it puts it in the TQueued state and 
	transfers it into the Thread Queue.
        The scheduler uses Java threads from a thread pool to execute
        the jobs' run methods. Once a thread in the pool becomes available,
        the first job is removed from the Thread Queue and its state is
        changed to "Running". The thread then starts executing the job's run method.
        Once the run method returns, the state of the job can still remain
	"Running", or it might have changed to  "Done" or "AsyncWait". 

        If the state is still "Running", the execution of the job is complete:
        it is placed on the Ready queue, where it waits to be put into the
        "Ready" state by the scheduler.
        If it is "AsyncWait", the job is only partially completed and
        waits for an internal event before continuing execution. If it is "Done",
        the job needs no further processing.
        
        To limit the number of simultaneous transfers by the srm client/user (in
	cases when the srm server does not perform the transfer itself, i.e.
	"get" and "put" requests) the number of jobs that are "Ready" can be
	limited according to the number set in the configuration. The rest of the 
        requests, which are prepared to be "Ready" are put on the "Ready" queue. 
        Once the clients finish the transfers, they notify the system by changing 
        the state of the put or get file requests to "Done". If users/clients never 
        perform this state change, the request changes its state automatically 
        upon the expiration of the request's lifetime. If "Ready" spots become 
        available, corresponding requests are removed from the Ready queue and 
        their state changes to "Ready".
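
The life cycle described above can be sketched as a toy model in Python. This is a single-threaded simplification under stated assumptions, not the SRM implementation: the class and method names (Job, Scheduler, step, promote_ready, client_done) are hypothetical, and each job simply carries the state it will be in when its run method returns.

```python
from collections import deque
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    TQUEUED = "TQueued"
    RUNNING = "Running"
    ASYNC_WAIT = "AsyncWait"
    READY = "Ready"
    DONE = "Done"


class Job:
    def __init__(self, name, outcome):
        self.name = name
        self.state = State.PENDING
        self._outcome = outcome  # assumed state once run() has returned

    def run(self):
        self.state = self._outcome


class Scheduler:
    def __init__(self, max_ready):
        self.max_ready = max_ready   # limit on jobs in the "Ready" state
        self.thread_queue = deque()  # jobs waiting for a free thread
        self.ready_queue = deque()   # finished jobs waiting for a Ready spot
        self.ready = []              # jobs currently "Ready"
        self.async_wait = []         # jobs waiting for an internal event

    def submit(self, job):
        job.state = State.TQUEUED
        self.thread_queue.append(job)

    def step(self):
        """Do what one free thread would do: run the first queued job."""
        if not self.thread_queue:
            return
        job = self.thread_queue.popleft()
        job.state = State.RUNNING
        job.run()
        if job.state is State.RUNNING:
            self.ready_queue.append(job)
        elif job.state is State.ASYNC_WAIT:
            self.async_wait.append(job)
        # State.DONE needs no further processing.

    def promote_ready(self):
        """Fill free Ready spots from the Ready queue."""
        while self.ready_queue and len(self.ready) < self.max_ready:
            job = self.ready_queue.popleft()
            job.state = State.READY
            self.ready.append(job)

    def client_done(self, job):
        """The client finished its transfer: free the spot and refill it."""
        self.ready.remove(job)
        job.state = State.DONE
        self.promote_ready()
```

With max_ready set to 1 and two jobs whose run methods return in the "Running" state, only the first becomes "Ready" after promote_ready(). The second stays on the Ready queue until client_done() is called for the first.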
        
        In the dcache srm there are three instances of the scheduler, one for
	each possible type of srm requests: copy, get and put.

        The remaining options are described below; [type] stands for one of
        copy, get or put:
         
        -[type]-req-thread-queue-size - maximum number of requests in the
	thread queue

        -[type]-req-thread-pool-size - maximum number of threads in the thread 
	 pool. This parameter is especially important for copy requests,
	 since the copy operation for each file is performed in a separate thread. 
         The number for copy requests should be less than the -max_transfers 
         parameter of the transfer managers.

        -[type]-req-max-waiting-requests - maximum number of requests in
         the AsyncWait state
         
        -[type]-req-ready-queue-size - maximum number of requests in the 
	 ready queue. This and the following parameters are not important 
         for the copy scheduler.

        -[type]-req-max-ready-requests
	 this parameter is important for put and get requests, and it is 
         equivalent to the number of transfer urls given out to the clients,
         which are actively transferring (or intend to transfer).

        -[type]-req-max-number-of-retries
         number of times the job is allowed to fail and to be retried
	 before SRM should give up and return an error to the user.

        -[type]-req-max-num-of-running-by-same-owner
         The job owner is roughly equivalent to the user account in the kpwd
	 file.
         When the jobs are removed from the Thread and Ready Queue, their
	 "owner" is taken in consideration. If the number of jobs submitted by 
         the user exceeds the number in the configuration, jobs of this particular 
         user will not be removed from the queue, even if they are first. 
         If there are jobs belonging to another user, for whom this number was not 
	 reached yet, these will be executed first. This will not lead to 
         underutilization of the system. If only one owner is running jobs 
         they all will get scheduled to occupy all available scheduler threads 
         or ready spots.
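
The owner-based selection can be sketched in Python as follows. This is one possible reading of the description above, not the actual SRM code: the function name next_job and the data layout (a queue of (owner, job id) pairs and a counter of running jobs per owner) are assumptions made for illustration.

```python
from collections import Counter, deque


def next_job(queue, running_by_owner, max_running_by_same_owner):
    """Remove and return the next (owner, job_id) pair to execute.

    Jobs of owners who have reached the limit are skipped in favour of
    other owners. If no owner is below the limit, the first job is taken
    anyway, so that free threads or ready spots do not stay idle.
    """
    if not queue:
        return None
    index = 0  # fallback: the first job in the queue
    for i, (owner, _job_id) in enumerate(queue):
        if running_by_owner[owner] < max_running_by_same_owner:
            index = i
            break
    owner, job_id = queue[index]
    del queue[index]
    running_by_owner[owner] += 1
    return owner, job_id


queue = deque([("alice", "a1"), ("alice", "a2"), ("bob", "b1")])
running = Counter({"alice": 2})  # alice already has two running jobs
print(next_job(queue, running, 2))
```

Here the job of "bob" is selected first, although two jobs of "alice" are ahead of it in the queue. A later call with only "alice" left in the queue still returns her first job.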

        Other available options (their use is not recommended, so
        they are not explained here):
        -poolManager
        -pnfsManager
        -proxies-directory
        -url-copy-command
        -timeout-command
        -usekftp
        -globus-url-copy
        -use-urlcopy-script
        -use-dcap-for-srm-copy
        -use-gsiftp-for-srm-copy
        -use-http-for-srm-copy
        -use-ftp-for-srm-copy
        -save-memory

        For more information refer to the dCache web site at http://www.dcache.org and the 
        FNAL SRM web site at http://www-isd.fnal.gov/srm.

System Monitoring
=================

http://admin.node.org:2288

 - allows monitoring of services only

Admin Interface
===============

The admin interface offers a very rich set of commands (to
be described elsewhere) for altering the system configuration
while the system is running and for solving any problems that arise.

Since the distribution comes with an initial password, it is
important to log in right after the installation in order to
change it.
Note: The following is supported in version 1.2.2-1 for the
      first time and will be supported in future releases.

   Assuming you are logged in on the admin node, log in to the
   admin interface to set the password:

    ssh -l admin -c blowfish -p 22223 localhost
    (passwd : dickerelch)

    (local) admin > cd acm
    (acm) admin > create user admin
    (acm) admin > set passwd <newPasswd> <newPasswd>
    (acm) admin > ..
    (local) admin > logoff

  From now on, logging in as user admin will only succeed if
  newPasswd is presented.

  Note: When setting the password string in the shell one can
        disable the echo by typing "ctrl I" following "set
        password".

Running dCache on non Linux operating systems

Because of the widespread usage of dCache in the grid world, we provide the dCache binary packages mainly for a Linux flavour compliant with the rest of the WLCG, gLite or OSG software. The dCache system itself is, except for the filesystem engine (pnfs) and the dCap client, pure Java, so a port should be easy as long as the OS comes with a reasonably modern Java virtual machine. For now dCache is fine with Java 1.5. This chapter gives an estimate of how much effort is necessary to port dCache to other operating systems.

dCache Pools

The dCache pool code is certainly the easiest part to port.

Chapter 3. Getting in Touch with dCache

This section is a guide for exploring a newly installed dCache system. The confidence obtained by this exploration will prove very helpful when encountering problems in the running system. It forms the basis for the more detailed material in the later parts of this book.

The starting point is a fresh installation according to the installation instructions shipped with any distribution of dCache. All components (pnfs, dcache-core, dcache-pool, and dcache-opt) are started on the same host. Additional pools on other hosts may also be running.

Checking the Functionality

First, we will get used to the client tools. On the dCache admin host, change into the pnfs directory, where the users are going to store their data:

[user] $ cd /pnfs/<site.de>/data/
[user] $

Note that on the dCache admin node this directory is a link to /pnfs/fs/usr/ and the NFS export localhost:/fs is mounted to /pnfs/fs. The NFS server running on the dCache admin node is part of the pnfs system which is not part of dCache but heavily used by it.

The pnfs filesystem is not intended for reading or writing actual data with regular file operations via the NFS protocol. However, localhost:/fs is a special, privileged NFS export that allows reading and writing for administrative tasks. It should only be mounted by the admin node.

Reading and writing data to and from a dCache instance can be done with a number of protocols. After a standard installation, these protocols are dCap, GSIdCap, and GridFTP. In addition dCache comes with an implementation of the SRM protocol which negotiates the actual data transfer protocol.

We will first try dCap with the dccp command:

[user] $ export PATH=/opt/d-cache/dcap/bin/:$PATH
[user] $ cd /pnfs/<site.de>/data/
[user] $ dccp /bin/sh my-test-file
541096 bytes in 0 seconds

This command succeeds if the user has the Unix rights to write to the current directory /pnfs/<site.de>/data/.

The dccp command also accepts URLs. We can copy the data back using the dccp command and the dCap protocol but this time describing the location of the file using a URL.

[user] $ dccp dcap://<adminNode>/pnfs/<site.de>/data/my-test-file /tmp/test.tmp
541096 bytes in 0 seconds

However, this command only succeeds if the file is world readable. The following shows how to make the file unreadable for others and illustrates that dccp consequently fails to copy the file.

[user] $ chmod o-r my-test-file
[user] $ dccp dcap://<adminNode>/pnfs/<site.de>/data/my-test-file /tmp/test2.tmp
Command failed!
Server error message for [1]: "Permission denied" (errno 2).
Failed open file in the dCache.
Can't open source file : "Permission denied"
System error: Input/output error

This command did not succeed, because dCap access is unauthenticated and the user is mapped to a non-existent user in order to determine the access rights. However, you should be able to access the file with the NFS mount:

[user] $ dccp my-test-file /tmp/test2.tmp
541096 bytes in 0 seconds
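
The effect of chmod o-r on the failed dCap transfer shown earlier can be illustrated with the Unix permission bits alone. The following Python sketch is a simplification under stated assumptions: it models only the "read by others" bit, which is what an unauthenticated, unmapped user depends on, and not the access check dCache actually performs. The function names are hypothetical.

```python
import stat


def is_world_readable(mode: int) -> bool:
    """True if the 'read by others' permission bit is set."""
    return bool(mode & stat.S_IROTH)


def remove_other_read(mode: int) -> int:
    """Return the mode as it would be after 'chmod o-r'."""
    return mode & ~stat.S_IROTH


before = 0o644                     # rw-r--r--, an assumed initial mode
after = remove_other_read(before)  # rw-r-----
print(oct(after), is_world_readable(before), is_world_readable(after))
```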

If you have a valid grid proxy with a certificate subject which is properly mapped in the configuration file /opt/d-cache/etc/dcache.kpwd you can also try grid-authenticated access via the GSI-authenticated version of dCap:

[user] $ chgrp <yourVO> my-test-file
[user] $ export LD_LIBRARY_PATH=/opt/d-cache/dcap/lib/:$LD_LIBRARY_PATH
[user] $ dccp gsidcap://<adminNode>:22128/pnfs/<site.de>/data/my-test-file /tmp/test3.tmp
541096 bytes in 0 seconds

Or we let the SRM negotiate the protocol:

[user] $ export PATH=/opt/d-cache/srm/bin/:$PATH
[user] $ srmcp srm://<adminNode>:8443/pnfs/<site.de>/data/my-test-file file:////tmp/test4.tmp
configuration file not found, configuring srmcp
created configuration file in ~/.srmconfig/config.xml

If the dCache instance is registered as a storage element in the LCG/EGEE grid and the LCG user interface software is available, the file can be accessed via SRM:

[user] $ lcg-cp -v --vo <yourVO> \
srm://<dCacheAdminFQN>/pnfs/<site.de>/data/my-test-file \
file:///tmp/test5.tmp
Source URL: srm://<dCacheAdminFQN>/pnfs/<site.de>/data/my-test-file
File size: 541096
Source URL for copy: gsiftp://<dCacheAdminFQN>:2811//pnfs/<site.de>/data/my-test-file
Destination URL: file:///tmp/test5.tmp
# streams: 1
Transfer took 770 ms

and it can be deleted with the help of the SRM interface:

[user] $ srm-advisory-delete srm://<dCacheAdminFQN>:8443/pnfs/<site.de>/data/my-test-file
 srmcp error :  advisoryDelete(User [name=...],pnfs/<site.de>/data/my-test-file) 
Error user User [name=...] has no permission to delete 000100000000000000BAF0C0

This works only if the grid certificate subject is mapped to a user which has permissions to delete the file:

[user] $ chown <yourVO>001 my-test-file
[user] $ srm-advisory-delete srm://<dCacheAdminFQN>:8443/pnfs/<site.de>/data/my-test-file

If the grid functionality is not required, the file can be deleted via the NFS mount of the pnfs filesystem:

[user] $ rm my-test-file

The Web Interface for Monitoring dCache

In the standard configuration the dCache web interface is started on the admin node and can be reached via port 2288. Point a web browser to http://<adminNode>:2288/ to get to the main menu of the dCache web interface. The contents of the web interface are self-explanatory and are the primary source for most monitoring and troubleshooting tasks.

The “Cell Services” page displays the status of some important cells of the dCache instance. You might observe that some cells are marked “OFFLINE” even though you know that they are running and fine. These might be the cells “SRM”, “GFTP”, and “DCap-gsi”. The reason is that the names of the cells monitored by the web interface are explicitly configured in the file /opt/d-cache/config/httpd.batch. However, the cells have other names. If you change the following section near the end of the file

#
create diskCacheV111.cells.WebCollectorV3 collector \
    "PnfsManager \
     PoolManager \
     GFTP \
     SRM \
     DCap-gsi \
     -replyObject"
#

such that it reads

#
create diskCacheV111.cells.WebCollectorV3 collector \
    "PnfsManager \
     PoolManager \
     GFTP-<adminNode> \
     SRM-<adminNode> \
     DCap-gsi-<adminNode> \
     -replyObject"
#

and restart the httpDomain domain by executing

[root] # /opt/d-cache/jobs/httpd stop
[root] # /opt/d-cache/jobs/httpd -logfile=/opt/d-cache/log/http.log start

the web interface should show the correct status of these cells. More information about the <domainName>.batch files will follow in the next section.

The “Pool Usage” page gives a good overview of the current space usage of the whole dCache instance. In the graphs, free space is marked yellow, space occupied by cached files (which may be deleted when space is needed) is marked green, and space occupied by precious files, which cannot be deleted, is marked red. Other states (e.g., files which are currently being written) are marked purple.

The page “Pool Request Queues” (or “Pool Transfer Queues”) gives information about the number of current requests handled by each pool. “Actions Log” keeps track of all the transfers performed by the pools up to now.

The remaining pages are only relevant with more advanced configurations: The page “Pools” (or “Pool Attraction Configuration”) can be used to analyze the current configuration of the pool selection unit in the pool manager. The remaining pages are relevant only if a tertiary storage system (HSM) is connected to the dCache instance.

In this section we will have a look at the configuration and log files of pnfs and dCache.

The primary configuration file for pnfs is /usr/etc/pnfsSetup. For administrative tasks in pnfs it has to be sourced, and the path to the administration tools should be added to $PATH:

[root] # . /usr/etc/pnfsSetup
[root] # PATH=$PATH:$pnfs/tools

Stopping and starting the pnfs server can now be done with

[root] # pnfs stop
[root] # pnfs start

The log files for the three pnfs server daemons are /var/log/pmountd.log, /var/log/dbserver.log, and /var/log/pnfsd.log. The pnfsd is the NFS server implementation. Its log file contains one line for each NFS filesystem operation. The result code of that operation can be found at the end of the line.

The pnfsd daemons communicate with the dbserver daemons via a shared memory area. These perform the actual operations on the database files which can be found in /opt/pnfsdb/pnfs/databases/. The pnfs system finds them via the information in the directory /opt/pnfsdb/pnfs/info/.

Each database file is handled by one dbserver daemon and each access will lock the database file. Each database file/server is the container for one directory sub-tree. Similar to mounts in the UNIX filesystem, the filesystem tree is stitched together from several such sub-trees. In the standard configuration the admin database contains the root of the filesystem (the one mounted on /pnfs/fs/) and the data1 database the sub-tree starting at /pnfs/fs/usr/data/. The section called “The Databases of pnfs” describes how to create new databases.

The dCache software is installed in one directory, normally /opt/d-cache/. All configuration and log files can be found here. In the following, filenames will always be given relative to this directory.

In the previous section we have already seen how a domain is restarted:

[root] # /opt/d-cache/jobs/<domainName> stop
[root] # /opt/d-cache/jobs/<domainName> \
-logfile=/opt/d-cache/log/<domainName>.log start

Listing the contents of jobs/, you will see that these scripts are in fact all links to jobs/wrapper2.sh. This script determines the domain to start from its basename. From the above commands you can already read off the standard location of the log files.

The files config/<domainName>Setup will be used by the wrapper script and by the domains at start-up. These files are also links to config/dCacheSetup. This is the primary configuration file of the dCache system.
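
The naming convention described above can be sketched in Python. This is only an illustration of how a domain name follows from the basename of the script and which files belong to it. The function name domain_files is hypothetical, the installation directory /opt/d-cache is the default mentioned in the text, and the log file path is merely the one used in the -logfile example (it is passed explicitly on the command line).

```python
import posixpath

DCACHE_HOME = "/opt/d-cache"  # default installation directory from the text


def domain_files(script_path: str, home: str = DCACHE_HOME) -> dict:
    """Derive the domain name and its conventional files from a jobs/ script path."""
    domain = posixpath.basename(script_path)
    return {
        "domain": domain,
        "setup": posixpath.join(home, "config", domain + "Setup"),
        "log": posixpath.join(home, "log", domain + ".log"),
    }


print(domain_files("/opt/d-cache/jobs/httpd"))
```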

The only files which are different for each domain are config/<domainName>.batch. They describe which cells are started in the domains. Normally, changes in these files should not be necessary. However, if you need to change something, consider the following:

Since the standard config/<domainName>.batch files will be overwritten when updating to a newer version of dCache (e.g. with RPM), it is a good idea to modify only private copies of them. When choosing a name like config/<newDomainName>.batch you give the domain the name <newDomainName>. The necessary links can be created with

[root] # cd /opt/d-cache/config/
[root] #