The Ultimate Golden Release Upgrade Guide

How to get from 1.9.12 to 2.2

By Gerd Behrmann <behrmann@nordu.net>

Table of contents

Introduction

dCache 2.2 is the third long term support release (aka golden release). It is the result of 12 months of development since the release of dCache 1.9.12, the previous long term support release. During these 12 months 4 feature releases (1.9.13, 2.0, 2.1, and 2.2) were made at regular intervals. This document compiles the most important information from these four feature releases.

dCache 2.2 will be supported with regular bug fix releases at least until April 2014. dCache 1.9.12 will be supported until April 2013; however, one should expect its release frequency to drop off as only the most critical bugs and security issues will be fixed. While the upgrade path from 1.9.12 to 2.2 is easy, no direct upgrade path should be expected from releases prior to 2.2 to a future (fourth) long term support release.

Many things have changed between 1.9.12 and 2.2 and this document does not attempt to describe every little change. The focus is on changes that affect the upgrade process and on new features. Minor feature changes and bug fixes are often excluded. There is more information scattered in the release notes of each individual release.

The last section of this document contains useful reference material that should be consulted while reading this document. The reference material also includes a proposed checklist that may be used while planning an upgrade to 2.2.

The filesystem hierarchy standard

The filesystem hierarchy standard (FHS) provides a set of requirements and guidelines for file and directory placement under UNIX-like operating systems.

dCache has traditionally been installed in /opt/d-cache. Although /opt is specified by the FHS, dCache did not follow the FHS recommendations for installations in /opt.

dCache is now distributed in two different layouts:

The FHS packages install the bulk of the static files in /usr/share/dcache, with a few files in /usr/bin, /usr/sbin and /usr/share/man. Configuration files are stored under /etc/dcache and non-static files are stored under /var/lib/dcache with log files in /var/log/dcache.

The FHS packages automatically create a user account (dcache) during installation and dCache will drop privileges to this account during startup. An init script and a logrotation configuration are automatically installed. Admin door SSH keys are automatically created during installation.

Both layouts are distributed via www.dcache.org, but the FHS packages should be preferred. It should be expected that future feature releases will only use the FHS layout.

Migration from the classic layout to the FHS layout is possible and recommended, however the procedure is manual and not covered by this document. This process is described in a separate document.

We recommend and expect that all users will transition to the FHS packages. Users that wish to continue installing in /opt may do so using the FHS tarball. This tarball uses an internal layout similar to the other FHS packages, but can be installed in any directory. One loses the convenience of the package manager, but gains flexibility in where to install dCache.

Version number scheme

Veteran dCache administrators will note the change in version number scheme. Starting with the release of dCache 2.0.0 we use the second digit to distinguish feature releases and the third digit to distinguish bug fix releases.

Upgrading from 1.9.12

No direct upgrade path is provided from releases earlier than 1.9.12. Users of earlier releases should upgrade to 1.9.12 first, verify that everything works, and subsequently upgrade to 2.2.

We strongly recommend upgrading to the latest release of 1.9.12 before upgrading to 2.2. At the time of writing this is 1.9.12-17.

Important: Before upgrading to 2.2, the pool manager's configuration file has to be regenerated by issuing the save command in the pool manager's admin interface. This has to be done using at least version 1.9.12-9 or 1.9.13-3. Failing to do so will prevent the pool manager from starting after the upgrade, and the configuration file will have to be rewritten by hand.

Head nodes of 2.2 are compatible with pools 1.9.12-11 and newer, 1.9.13-4 and newer, 2.0.0 and newer, and 2.1.0 and newer, up to and including 2.2. Pools from any of these releases can be mixed. The exception to this rule is if NFS 4 is used; in that case all pools and head nodes have to be upgraded to 2.2 together.

Beginning with the release of 2.3, head nodes will only be compatible with pools belonging to release 2.2 and newer.

Assuming that NFS 4 is not used, a staged upgrade can be performed by first updating all head nodes (PNFS manager, pool manager, pin manager, space manager, SRM, all doors, monitoring services, etc) while leaving pools on the earlier release. Once the head nodes are online again and confirmed to be working, pools can be upgraded one by one while dCache is online. Obviously the service will be slightly degraded, as files on a pool that is being upgraded will be momentarily unavailable. Please note that a staged upgrade is not possible if pool nodes run any other dCache service (such as doors) besides the pool service.

The alternative to a staged upgrade is to shutdown all dCache services and upgrade all nodes.

In either case an in-place upgrade is possible and recommended.

Many components have been modified to improve consistency, robustness and latency, and to add new features. In several cases this has affected the semantics of common operations in dCache. We recommend reading through the following sections, paying attention to issues like authorization, file ownership, multihoming, obsolete and forbidden configuration properties, and init scripts.

File access authorization

ACLs are now tightly integrated into Chimera. Consequently, ACL support for PNFS has been removed. We recommend upgrading to Chimera if ACL support is needed.

The ACL command line tools chimera-get-acl and chimera-set-acl have been replaced by the getfacl and setfacl subcommands of the Chimera command line tool chimera-cli. Eg. rather than chimera-get-acl one now uses chimera-cli getfacl. The arguments of the setfacl command have changed to no longer rely on an explicit index to order the ACEs. The following is an example of using the setfacl command:

$ chimera-cli setfacl /pnfs/desy.de/data USER:123:lfx:A GROUP:123:lfx:D

The ACLs can now also be queried and updated through NFS 4. Assuming an NFS 4 file system is mounted, one can query and update ACLs like so:

$ nfs4_getfacl /pnfs/desy.de/data/generated/acl-test
$ nfs4_setfacl -a A::tigran@desy.afs:arw /pnfs/desy.de/data/generated/acl-test

The admin shell commands previously provided by the acl service, that is, setfacl and getfacl, are now integrated into the pnfsmanager service. The acl service is obsolete and should be removed from layout files.

By default, dCache does not implement the correct POSIX semantics for lookup permissions: Only lookup permissions of the parent directory are enforced. This was traditionally done to improve performance with the PNFS backend, but is now only kept to maintain backwards compatibility. The default behaviour is unchanged, however setting the new pnfsmanager configuration property pnfsVerifyAllLookups to true will enable POSIX semantics. The property is only supported for Chimera.
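For example, full POSIX lookup semantics could be enabled on a Chimera based installation by adding the property to dcache.conf or to the pnfsmanager section of the layout file. This is an illustrative fragment using the property introduced above:

pnfsVerifyAllLookups=true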

For Chimera, the authorization checks have been optimized to reduce the number of database queries involved. This reduces latency on name space operations and improves Chimera throughput.

Authentication

gPlazma 1

gPlazma 1 is no longer supported as a standalone service. The gplazma.version property is obsolete. Support for legacy authentication and mapping schemes is provided through the new gplazma1 plugin for gPlazma 2. This plugin uses the legacy /etc/dcache/dcachesrm-gplazma.policy configuration file. The default gPlazma 2 configuration in /etc/dcache/gplazma.conf loads the gplazma1 plugin, which means that existing users of gPlazma 1 should not have to make any modifications when upgrading to dCache 2.2. We do however recommend that users migrate away from the gplazma1 plugin as soon as possible. Henceforth we will no longer refer to specific versions of gPlazma.

New plugins

gplazma1

Supports the legacy /etc/dcache/dcachesrm-gplazma.policy configuration file. Should be used like this:

auth    requisite gplazma1
map     requisite gplazma1
session requisite gplazma1

Although mixing the gplazma1 plugin with other plugins is possible, we recommend migrating away from this plugin as soon as possible.

kpwd

gPlazma now supports password authentication through the kpwd plugin. The kpwd plugin is not new, however the support for password based authentication is. The dcache kpwd subcommand of the dcache script allows kpwd files to be manipulated.

Should be used like this:

auth    sufficient kpwd
map     sufficient kpwd
account sufficient kpwd
session sufficient kpwd
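The kpwd file itself can be maintained with the dcache kpwd subcommand mentioned above. The exact sub-commands and options may differ between releases, so the lines below are only a sketch (user name, UID, GID and password are placeholders); run dcache kpwd without arguments to see the supported operations:

$ dcache kpwd dcuseradd -u 1000 -g 1000 -h / -r / -f / -w read-write -p PASSWORD testuser
$ dcache kpwd dcuserlist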

jaas

The new jaas authentication plugin for gPlazma delegates password authentication to the Java Authentication and Authorization Services (JAAS). A valid JAAS configuration has to be provided in /etc/dcache/jgss.conf. JAAS has traditionally been used to support Kerberos authentication in dCache, however the jaas plugin is not limited to the Kerberos use case. Successful authentication results in a user name principal, which can be further mapped using one of the mapping plugins.

Should be used like this:

auth sufficient jaas gplazma.jaas.name=gplazma

with a /etc/dcache/jgss.conf containing something like:

gplazma {
    com.sun.security.auth.module.JndiLoginModule required
        user.provider.url="nis://NISServerHostName/NISDomain/user"
        group.provider.url="nis://NISServerHostName/NISDomain/system/group";
}

This would cause JAAS to use an external directory service to verify the password (the JndiLoginModule also supports LDAP).

Note that the gPlazma jaas module only supports the auth step. It cannot associate the session with UID, GID or other information provided by JAAS.

There are lots of third party JAAS login modules available, allowing you to easily use external password validation services with dCache.

krb5

The new krb5 mapping plugin for gPlazma is to be used in conjunction with the nfsv41 service for Kerberos authentication (see Kerberos authentication). The nfsv41 service submits KerberosPrincipals of the form user@example.org to gPlazma. The krb5 plugin strips the domain suffix, leaving only the user name in a user name principal. Other plugins (eg nsswitch, nis, authzdb) can be used to map the user name to UID and GID.

Use the plugin like this:

map optional krb5

nsswitch

The new nsswitch mapping, identity, and session plugin for gPlazma allows the system's native name service switch to be used for mapping user name principals to UID and GID.

nis

The new nis mapping, identity, and session plugin for gPlazma allows the Network Information System to be used to map user name principals to UID, GID and home directory.
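To illustrate how these mapping plugins can be combined, the following gplazma.conf fragment is a sketch of a Kerberos/NFS setup: the krb5 plugin strips the realm from the Kerberos principal and nsswitch resolves the resulting user name to UID, GID and session information (whether nsswitch or nis fits better depends on the site):

map      optional  krb5
map      requisite nsswitch
session  requisite nsswitch
identity requisite nsswitch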

FTP door

The ftp service now supports gPlazma for password authentication. Use the useGPlazmaAuthorizationModule and useGPlazmaAuthorizationCell configuration properties to control whether gPlazma is used or not. Note that by default gPlazma is used. Existing deployments will either have to update their gPlazma configuration to support password authentication or explicitly disable the use of gPlazma for the ftp service.
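As an example, a site that does not want password logins on the plain FTP door to go through gPlazma could disable it in the layout file. The domain name and values below are illustrative only:

[ftpDomain]
[ftpDomain/ftp]
useGPlazmaAuthorizationCell=false
useGPlazmaAuthorizationModule=false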

HTTP Basic Auth

The webdav service has been updated to support HTTP Basic authentication. Password verification is done through gPlazma. Please note that HTTP Basic authentication over an unencrypted channel is vulnerable to man-in-the-middle attacks. We strongly recommend only using HTTP Basic authentication over HTTPS.
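A minimal sketch of enabling HTTP Basic authentication for a WebDAV door, assuming the door is already configured to serve HTTPS (the domain name is a placeholder):

[webdavDomain]
[webdavDomain/webdav]
webdavBasicAuthentication=true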

gPlazma cell commands

Two new commands were added to the gPlazma admin interface.

test login replaces the existing get mapping command. It shows the result of a login attempt but is more flexible when describing which principals have been identified for the user.

explain login uses the same format as test login, but provides detailed information on how the login was processed. The result of each processed plugin is explained in a tree structure.

Pool manager

Pool manager is used by doors to perform pool selection. Essentially, pool manager routes transfers to pools, controls staging from tape, and coordinates pool to pool transfers. Some of the biggest changes between 1.9.12 and 2.2 happened in the pool manager and how it is used by doors and pin manager.

Retry logic

In previous versions the retry logic in case of pool selection failures was placed inside pool manager. The consequence of that design decision was that doors and pin manager would never know what was happening inside pool manager: was a file being staged or copied, or was the transfer suspended because the pool with the file was offline? Another consequence was that pool manager needed logic to query file meta data from PNFS manager. The query logic replicated similar logic already present in doors and would add latency to the pool selection process.

This behaviour was changed such that pool manager never retries requests internally. Instead, a pool selection failure causes the request to fail and be sent back to the door or pin manager. It is at the discretion of the requester to query PNFS manager for updated meta data and to retry the request. A consequence is that pool selection latency is reduced and that the retry logic can be tuned for every type of door. For instance, xrootd doors can rely on clients retrying requests and the door thus propagates a failure all the way back to the client. The SRM door on the other hand may return SRM_FILE_UNAVAILABLE, letting the client know that the pool with the file is offline. An FTP door will retry the pool selection internally.

The logic for suspending requests has not changed. A request that repeatedly fails will eventually get suspended. As before, doors will wait for a suspended request to be unsuspended.

Partition manager

The pool selection process consists of two parts: The first part uses an admin configured rule engine. This is called the pool selection unit and controls to and from which pools particular files may be written, read or staged. Once a set of candidate pools has been determined, the second step chooses one of those based on criteria such as free space and load.

The second step is now pluggable. This means that the admin may choose among several selection algorithms, and that third party developers may write custom plugins to further tweak and tune dCache.

To support such pluggable selection algorithms, the partition manager was rewritten. Partition manager allows several sets of parameters (partitions) to be defined and associated with different links. This mechanism has now been extended such that the pool selection logic itself is part of a partition. The partition is pluggable such that different links may use different pool selection strategies.

As part of the rewrite, several partition manager related commands have changed. In particular pm create now takes a plugin type parameter and the output format of pm ls has changed.

Important: Before upgrading, PoolManager.conf MUST be regenerated by using the save command using either 1.9.12-9, 1.9.13-3 or newer. dCache will fail to start if PoolManager.conf contains any of the obsolete commands. If third party scripts are used to generate PoolManager.conf, then these scripts will likely have to be updated (see the release notes of dCache 2.0 for details).

As before, a partition named default is hardcoded into the system. It is used by all links that do not explicitly define the partition to use. The default partition can, however, be recreated/overwritten using the pm create command. Doing so allows the partition type to be set.

Each partition has its own set of parameters and different types of partitions may support different sets of parameters. All partitions inherit from a global set of parameters. This set is modified using pm set -option=value (ie without specifying a partition name). Note: For legacy installations in which the default partition has not been explicitly recreated, the parameters of the default partition and the set of parameters inherited by other partitions are identical. This is done to ensure backwards compatibility with old pool manager configurations. Once pm create default is used this coupling is removed. We recommend that you create the default partition explicitly and regenerate the pool manager configuration. Support for legacy configurations will be removed in a future update.
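For example, the recommended decoupling could be performed in the pool manager admin interface as follows; -type=classic keeps the previous selection algorithm (use wass to switch at the same time), and the parameter values shown are just the defaults:

(PoolManager) admin > pm create -type=classic default
(PoolManager) admin > pm set default -spacecostfactor=1.0 -cpucostfactor=1.0
(PoolManager) admin > save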

The following partition types are supported:

classic
The pool selection algorithm used in previous versions of dCache.
random
Selects a pool randomly.
lru
Selects the pool that has not been used the longest.
wass
A new selection algorithm that selects pools randomly, weighted by available space, while incorporating the age and amount of garbage collectible files and information about load (see Weighted Available Space Selection). For fresh installations this is the default partition type.
wrandom
A simplified version of wass. The behaviour corresponds to wass with breakeven=0, gap=0, cpucostfactor=0, spacecostfactor=1, p2p=0, alert=0, halt=0, fallback=0, slope=0, idle=0.

Third party plugins providing additional partition types can be installed.

Weighted Available Space Selection

wass is the new default pool selection algorithm for new installations. Existing installations will continue to use the classic algorithm, however we encourage sites to transition to the wass algorithm.

The internals of the wass algorithm are much more complicated than those of the classic algorithm, however tuning it should be considerably easier, with less request clumping and more uniform filling of pools.

How to switch to WASS

First, inspect the existing partitions and parameters using pm ls -l. Keep the output for reference. The output will also show the partition type of each partition. To change the type the partition has to be recreated using the pm create command, eg:

(PoolManager) admin > pm create -type=wass default

Note that this will reset the parameters of the partition. If you have partition specific parameters, like a replication threshold, then these need to be set again using pm set. In either case, it is a good idea to reset the cpucostfactor and spacecostfactor to their default values, eg:

(PoolManager) admin > pm set default -spacecostfactor=1.0 -cpucostfactor=1.0

The read pool selection is identical to the classic algorithm: In essence, the set of pools able to serve the file is computed, and the pool with the lowest performance cost is selected. Idle cost cuts, fall back cost cuts, etc are processed as before. It should be noted that space cost factor and cpu cost factor have no influence on read pool selection.

The crucial difference in wass is the write pool selection step: Essentially, pools are selected randomly with a weight computed from the available space of the pool. The weight is however adjusted to take the current write load and the garbage collectible files into account. The higher the current write load (number of write movers), the less likely a pool is selected. The more garbage collectible files and the older the last access time of those files, the more likely a pool is selected.

WASS parameters

The WASS algorithm can be tuned by using the following parameters:

breakeven
Set per pool. The value must be between 0.0 and 1.0. High values of breakeven mean that old files remain valuable for longer. Low values mean that old files quickly become worthless. Pools holding worthless files are more likely to be selected.
mover cost factor and cpu cost factor
The mover cost factor is set per pool, while the cpu cost factor is set per partition in the pool manager. The product of these two factors allows the aggressiveness of the write load feedback signal to be adjusted. A low value means we expect pools to scale well with load. A value of 0.0 means that load information is ignored completely; no feedback is used. A negative value would mean that a busy pool becomes more attractive; hardly a useful configuration. The intuitive meaning of the product is that for a value of f, the probability of choosing a pool is halved for every 1/f concurrent writers.
space cost factor
Set per partition in pool manager. Intuitively, the larger the value the more we prefer pools with free space. For a value of 1.0 the probability of selecting a pool is proportional with available space. With smaller values the role of available space drops, until at 0.0 available space no longer influences pool selection. For negative values the algorithm will give higher preference to pools with less free space, but that is hardly a useful configuration. At values higher than 1.0 we give additional preference to pools with free space.

The following table lists the useful range, the default value, and special values for all four parameters.

Parameter Useful range Default Special values
mover cost factor Non-negative 0.5 0.0 means that write load has no influence on pool selection for this pool. The useful range of the product of mover cost factor and cpu cost factor is between 0.0 and 1.0.
cpu cost factor Non-negative 1.0 0.0 means that write load has no influence on pool selection for this partition. The useful range of the product of mover cost factor and cpu cost factor is between 0.0 and 1.0.
space cost factor Non-negative 1.0 0.0 means that free or garbage collectable space has no influence on pool selection. 1.0 means that pools are selected with a probability proportional to free space.
breakeven [0;1] 0.7 0.0 means that garbage collectible files are considered as free space. 1.0 means that garbage collectible files are considered as used space.

It is unlikely that large values for any of the above parameters lead to useful results.

Except for the mover cost factor, all parameters exist for the classic algorithm too. They serve similar roles, and increasing or decreasing either parameter has similar effects. However, the details of how these parameters are used have changed significantly and we strongly recommend starting out with the default values. The exact mathematical meaning is unimportant at this stage. In our experience the default values are pretty good for most cases. We expect the tuning process to be iterative, with small changes to the above four parameters being applied, followed by an observation phase.

Here are some general tips for tuning: If you want pools with more free space to fill more quickly then increase the space cost factor. If you want pools with free space to attract fewer transfers then reduce the space cost factor. If you want unaccessed files to be garbage collected more aggressively then reduce the breakeven parameter. If you want unaccessed files to be kept longer then increase the breakeven value. If you want the write load to have a higher impact on write pool selection then increase either the cpu cost factor or the mover cost factor (depending on whether the effect should be for individual pools or for all pools). Conversely, reduce either factor to reduce the effect write load has on pool selection. Remember that these two factors are multiplied, so setting either to zero means write load is not taken into account.

One final point to note compared to the classic algorithm is that read movers have no direct influence on write pool selection. This is on purpose and prevents popular files on particular pools from leading to increased write clumping (which in turn would lead to additional read clumping in the future, creating a self-reinforcing feedback loop). If pool performance is significantly degraded by read access patterns, then write movers will eventually accumulate and result in a lower probability for the pool to be selected for further writes.

Wildcards in PSU commands

Several psu commands now accept wildcards (glob patterns). Use help psu to see which commands accept wildcards.

Staging without location

In previous releases pool manager would initiate a stage for any file if a disk copy was not online. It did so even for files for which no tape location was known. Starting with dCache 2.1, pool manager will only generate a stage request for files with a known tape location.

Sites that rely on the previous behaviour to import data stored to tape without dCache should contact support@dcache.org.

Pinning

Pin manager is used by SRM and DCAP to trigger staging from tape and to ensure that the file is not garbage collected for a certain amount of time. It does this by placing a sticky flag (a pin) on the file on one of the pools.

In previous versions pin manager would unconditionally delegate pool selection to pool manager. Now, pin manager will handle some cases without delegating pool selection to pool manager. This is the case when a file is already online, or when a disk only file is offline. In other cases, eg when a pool to pool transfer or a stage from tape is required, pin manager continues to delegate pool selection to pool manager.

The benefit of running the pool selection algorithm in pin manager is that it reduces latency for the common cases that don't require any internal transfers. It also reduces load on pool manager.

Pool selection in pin manager is implemented by periodically exporting a snapshot of the configuration and pool status information from pool manager. Changes to the pool manager configuration may take up to 30 seconds to propagate to pin manager.

Chimera

Update of stored procedures

The stored PostgreSQL procedures used by Chimera have been updated. During upgrade, the SQL script to create/update the stored procedures has to be applied:

$ psql -U postgres -f /usr/share/dcache/chimera/sql/pgsql-procedures.sql chimera

Directory tags

The mode, owner and group of directory tags can now be changed (using the regular chmod, chown and chgrp utilities).
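Tags are reachable through the usual '.(tag)(name)' magic path of the mounted name space. Assuming the name space is mounted under /pnfs/desy.de/data and a tag called sGroup exists (both are examples), ownership and mode could be changed like this:

$ cd /pnfs/desy.de/data
$ chown 1000:1000 '.(tag)(sGroup)'
$ chmod 644 '.(tag)(sGroup)'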

New checksum command

The command chimera-cli checksum was added to query the checksum of a file.
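Following the pattern of the other chimera-cli sub-commands, the checksum of a file would be queried like this (the path is an example):

$ chimera-cli checksum /pnfs/desy.de/data/some-file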

Pools

Continuous checksum validation

The checksum scanner has been extended with configurable continuous background checksumming. Any checksum errors are logged, and affected files are marked as broken and will not be available for download. The new -scrub option of the csm set policy command allows the feature to be enabled. Eg.

(pool_0) admin > csm set policy -scrub -limit=2 -period=720

Consult the help output of that command for information about setting throughput limits and scan frequency.

Pool to pool transfers

Pool to pool transfers used to use DCAP. Pool to pool transfers now use HTTP. This change should be transparent and a WebDAV door is not required. The primary observable change is that the TCP connection between source and destination pool is now created from the destination pool to the source pool. In previous versions the direction was reversed. Since HTTP is classified as a WAN protocol, the port used by the source pool to listen for the TCP connection will be allocated from the WAN port range. The pool CLI commands pp set port and pp set listen are deprecated and replaced by the command pp interface. The pp set listen command however calls through to the pp interface command, meaning that old pool configurations will work as is. Important: Due to this change, new pools are only compatible with pools of releases 1.9.12-11, 1.9.13-4 and newer.

Info provider

Parts of the info-provider configuration that used to be in /etc/dcache/info-provider.xml were moved to /etc/dcache/dcache.conf. Therefore, you will need to edit dcache.conf so that it contains your configuration. See the defaults file /usr/share/dcache/defaults/info-provider.properties for the list of affected properties. You may want to recreate /etc/dcache/info-provider.xml to get rid of the excess configuration.

Note that, if you are using the default value for an info-provider property then you do not need to configure that property in dcache.conf: the default value will be used automatically.
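As an illustration, a dcache.conf fragment for the info provider might look like the following. The values are placeholders; the allowed values for each property are documented in info-provider.properties:

info-provider.site-unique-id=EXAMPLESITE-ID
info-provider.se-unique-id=dcache-srm.example.org
info-provider.se-name=Example dCache instance
info-provider.glue-se-status=Production
info-provider.dcache-quality-level=production
info-provider.dcache-architecture=multidisk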

GLUE2 compliance has been improved. Be sure you have at least v2.0.8 of glue-schema RPM installed on the node running the info provider.

Admin

The admin shell has received numerous improvements. Support for version 2 of the SSH protocol has been added and is available on port 22224. Support for version 1 still exists and is needed for the dCache GUI. Although the SSH protocol allows it, we have not been able to implement support for both protocol versions on the same TCP port. Support for version 1 will eventually be removed.
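Assuming a public key has been added to /etc/dcache/admin/authorized_keys2 (the default location in the FHS layout, see the new properties table), the SSH 2 interface can be reached with a standard OpenSSH client:

$ ssh -p 22224 admin@adminhost.example.org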

Color highlighting as well as limited tab completion has been added.

Billing

The output format of messages written to billing files is now configurable. Have a look at /usr/share/dcache/defaults/billing.properties for details about available formats.

The billing database schema has changed. The schema is automatically updated during upgrade. Downgrade is not possible once upgraded.

When using the billing database, the httpd service is able to generate plots from the information in the database. Support is enabled by setting the billingToDb property to yes for the httpd service. The plots are available under http://admin.example.org:2288/billingHistory/.
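One way to enable this is to set the property globally in dcache.conf, so that both the billing and httpd services pick it up; a minimal sketch:

billingToDb=yes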

SRM

Listing

The SRM list operation provides information about file locality, among other things. In previous versions the SRM door would query pool manager to compute the file locality for each file being listed. dCache now computes the file locality internally in the SRM. The effect is that latency is reduced. The algorithm relies on a periodic snapshot of the pool manager configuration and pool state being transferred from pool manager to the SRM door (similar to how it is now done in pin manager).

Pinning

The new srmPinOnlineFiles property controls whether dCache pins files that have ONLINE access-latency. If set to false then dCache will refrain from pinning ONLINE files; dCache still ensures that the file is available on a read pool before returning the transfer URL to the client, but no guarantee is made that the file will not be garbage collected before the transfer URL expires.

Pinning ONLINE files

In previous versions of dCache, when an SRM client asked dCache to prepare a file for download, the SRM door would always ask the pin manager to pin the file. This was to ensure that the file is indeed online, that the file's data is available on a pool the user may read from, and that the data will not be garbage collected during the transfer URL's lifetime. A correct implementation of the SRM protocol must provide these three guarantees, so, for the general case, pinning is required even when the access latency is ONLINE.

The disadvantage of always pinning ONLINE files is that it introduces latency that, in many cases, is unnecessary; for example, if a file is permanently available on a pool that the end user can read from then pinning the file is unnecessary.

Some dCache deployments only store files on pools that are readable: they have no pools dedicated for writing or staging. Pinning ONLINE files isn't required for such deployments as dCache already makes the necessary guarantees.

Other sites may know that the risk of a replicated file becoming garbage-collected during the lifetime of the transfer URL is small. If it is garbage-collected then opening the file will still succeed, but will incur a delay. The site-admin may know that their user community will accept this small risk in exchange for improved throughput, in which case pinning ONLINE files is unnecessary.

A side effect of disabling srmPinOnlineFiles is that it becomes possible to set up a tapeless system without a pin manager. The default access latency in dCache is however NEARLINE, even when no HSM system is attached. The access latency has to be changed to ONLINE if dCache is to run without a pin manager (the system wide default is controlled through the DefaultAccessLatency property in PNFS manager).
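A disk-only installation that chooses to run without a pin manager might therefore use a configuration along the following lines in dcache.conf (a sketch using the properties mentioned above):

srmPinOnlineFiles=false
DefaultAccessLatency=ONLINE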

Unavailable files

Due to the changes to pin manager, SRM can now report SRM_FILE_UNAVAILABLE if files are offline, that is, when the pools holding the file are down and no tape copy is available.

Overwrite flag

The SRM protocol allows the client to request that existing files are overwritten upon upload. By default dCache does not honor this option; it always refuses to overwrite a file.

The configuration property overwriteEnabled allowed this behaviour to be changed. When set to true the SRM would respect the overwrite request of the client. The way this was implemented, however, meant that the option had to be set in all doors that the SRM could redirect to. Thus one was faced with the choice of either not honoring the client's request or enabling overwrite by default for all other protocols too.

The handling of the overwrite flag in the SRM has changed. When overwriteEnabled is set to true and the client requests to overwrite a file, then the SRM will delete the file before redirecting to another door. This means that an SRM door can now honor the client's request even when all other doors are configured not to overwrite existing files.
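It is therefore now sufficient to enable overwriting for the srm service alone, e.g. with a layout fragment like this (the domain name is a placeholder):

[srmDomain]
[srmDomain/srm]
overwriteEnabled=true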

Internal delegation of credentials

srmCopy with GridFTP used to use GSI delegation to transfer credentials from the SRM door to the pool. This was a slow and CPU intensive process. The SRM door and pool have now been updated to transfer the credentials through the dCache internal message passing mechanism.

We assume that the message passing mechanism is secure. Care should be taken to properly firewall access to the message passing system. This should be done whether srmCopy is used or not.

The file protocol

nfsv41 doors now register with LoginBroker using the file:// protocol. This allows SRM to produce TURLs for this protocol.

No kpwd support

Support for running the srm service so that it directly uses a custom kpwd file (independent of the rest of dCache) has been removed. All SRM authentication and authorisation activity must now go through gPlazma. Note that it is possible to configure SRM to use a different gPlazma configuration from the rest of dCache by 'embedding' gPlazma (see useGPlazmaAuthorizationModule and useGPlazmaAuthorizationCell options). These options, along with gPlazma's kpwd plugin, allow for an equivalent configuration.

SSL

The srm service now listens on two ports. These are, by default, 8443 (as before) and 8445. Port 8443 continues to be for SRM clients that use GSI-based communication and the new port is for SRM clients that use SSL. While the SSL port is configurable, it is recommended to use the default as this is an agreed port number for SSL-based SRM traffic.

NFS

Kerberos authentication

Support for RPCSEC_GSS security was added to the NFS 4 door. It is disabled by default and is enabled by setting the nfs.rpcsec_gss configuration property to true for the nfsv41 service.
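A layout fragment for a Kerberos enabled NFS 4.1 door might look like this (the domain name is a placeholder; the usual Kerberos properties such as kerberos.realm must be configured as well):

[namespaceDomain]
[namespaceDomain/nfsv41]
nfs.rpcsec_gss=true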

The dCache NFS implementation supports the following RPCSEC_GSS QOPs (quality of protection):

NONE
authentication only
INTEGRITY
RPC requests integrity validation
PRIVACY
RPC requests encryption

These correspond to krb5, krb5i and krb5p mount options, for example:

# mount -o krb5i server:/export /local/path

Note that all data access with NFS 4.1 uses the same QOP as was specified for the mount, e.g. if privacy was requested at mount time, then all NFS traffic, including data coming from pools, will be encrypted.

ACL query and update

ACLs can now be queried and updated through a mounted NFS 4.1 file system. No special configuration is required. Eg:

$ nfs4_getfacl /pnfs/desy.de/data/generated/acl-test
$ nfs4_setfacl -a A::tigran@desy.afs:arw /pnfs/desy.de/data/generated/acl-test

FTP

FTP doors now support renaming of files. This is provided through the RNFR and RNTO commands defined by RFC 959.

To use this functionality you must use a client that supports renaming. UberFTP supports renaming.

Hopping manager

A service definition for hopping manager was added. The name of the new service is hopping.

dcache script

Tab completion for the Bash shell was added. The FHS RPM and DEB packages automatically install the dcache.bash-completion script.

The subcommand dcache ports lists all used TCP and UDP ports and port ranges of configured services. Use the command like this:

$ dcache ports
DOMAIN          CELL               SERVICE PROTO PORT
dCacheDomain    -                  -       TCP   11112
dCacheDomain    -                  -       TCP   11113
dCacheDomain    -                  -       TCP   11111
dCacheDomain    -                  -       UDP   11111
dCacheDomain    -                  -       UDP   0
dCacheDomain    httpd              httpd   TCP   2288
dCacheDomain    info               info    TCP   22112
dCacheDomain    DCap-gsi-dcache-vm gsidcap TCP   22128
dCacheDomain    SRM-dcache-vm      srm     TCP   8443
dCacheDomain    Xrootd-dcache-vm   xrootd  TCP   1094
dCacheDomain    WebDAV-dcache-vm   webdav  TCP   2880
adminDomain     -                  -       UDP   0
adminDomain     alm                admin   TCP   22223
namespaceDomain -                  -       UDP   0
namespaceDomain NFSv3-dcache-vm    nfsv3   TCP   (111)
namespaceDomain NFSv3-dcache-vm    nfsv3   TCP   2049
namespaceDomain NFSv3-dcache-vm    nfsv3   UDP   (111)
namespaceDomain NFSv3-dcache-vm    nfsv3   UDP   2049
pool            -                  -       UDP   0
pool            pool_0             pool    TCP   20000-25000
pool            pool_0             pool    TCP   33115-33145
pool            pool_1             pool    TCP   20000-25000
pool            pool_1             pool    TCP   33115-33145
pool            pool_2             pool    TCP   20000-25000
pool            pool_2             pool    TCP   33115-33145
gridftp-Domain  -                  -       UDP   0
gridftp-Domain  GFTP-dcache-vm     gridftp TCP   2811
gridftp-Domain  GFTP-dcache-vm     gridftp TCP   20000-25000
testDomain      -                  -       UDP   0
testDomain      pool10             pool    TCP   20000-25000
testDomain      pool10             pool    TCP   33115-33145

Ports with '-' under the CELL and SERVICE columns provide inter-domain
communication for dCache. They are established independently of any service
in the layouts file and are configured by the broker.* family of
properties.

Entries where the port number is zero indicates that a random port number
is chosen. The chosen port is guaranteed not to conflict with already open
ports.

The dcache pool convert command replaces the existing procedure for converting between pool meta data repository formats. The subcommand supports conversion from file to db and from db to file. The metaDataRepositoryImport configuration property is no longer supported. Use the command like this:

$ dcache pool convert pool_0 db
INFO  - Copying 000097E9203C0F264F8380C3014BCF405783 (1 of 540)
INFO  - Copying 0000F881F830AF3E471C9263D8752D5A4BA2 (2 of 540)
...
INFO  - Copying 0000DBFEDCB237C24DFA92E48ED9FCD6782D (539 of 540)
INFO  - Copying 00009AAC974918F4452AB7E08DF97DF477B3 (540 of 540)

The pool meta data database of 'pool_0' was converted from type
org.dcache.pool.repository.meta.file.FileMetaDataRepository to type
org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository. Note that
to use the new meta data store, the pool configuration must be updated by
adjusting the metaDataRepository property, eg, in the layout file:


metaDataRepository=org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository

The dcache pool yaml subcommand replaces the meta2yaml utility. The command dumps the pool meta data repository data to YAML format. Both the db and file pool backends are supported. Use the command like this:

$ dcache pool yaml pool_0
000097E9203C0F264F8380C3014BCF405783:
  state: CACHED
  sticky:
    system: -1
  storageclass: myStore:STRING
  cacheclass: null
  bitfileid: <Unknown>
  locations:
  hsm: osm
  filesize: 954896
  map:
    uid: -1
    StoreName: myStore
    gid: -1
    path: /pnfs/dcache-vm/data/test-1314688795-15
    SpaceToken: 18090188
  retentionpolicy: REPLICA
  accesslatency: ONLINE
0000F881F830AF3E471C9263D8752D5A4BA2:
  state: CACHED
  sticky:
    system: -1
  storageclass: myStore:STRING
  cacheclass: null
  bitfileid: <Unknown>
  locations:
  hsm: osm
  filesize: 954896
  map:
    uid: -1
    StoreName: myStore
    gid: -1
    path: /pnfs/dcache-vm/data/test-1314817040-7
    SpaceToken: 18100092
  retentionpolicy: REPLICA
  accesslatency: ONLINE
....

Parsers for YAML are available for all major scripting languages. The dcache pool yaml command is the preferred mechanism for direct access (ie without using the pool) to the meta data of a dCache pool.

Configuration

Two minor additions to the configuration language have been made. Both additions allow us to catch more misconfigurations and the additions provide added documentation value.

The immutable annotation means that the value of a property cannot be modified. We use this annotation to mark properties for internal use.

The one-of annotation limits properties to have one of a limited set of values. We typically use this annotation for boolean properties or enumerations.
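For illustration, annotated properties in the defaults files look roughly like the lines below. The property names here are made up; consult the files under /usr/share/dcache/defaults for real examples of both annotations:

(immutable)example.paths.share = /usr/share/dcache
(one-of?true|false)exampleFeatureEnabled = false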

Reference material

Upgrade checklist

Use this checklist to plan the upgrade to 2.2. The checklist covers a manual upgrade of a single node. Upgrades involving installation scripts like dCacheConfigure, YAIM, or site specific deployment scripts are not covered by this process.

  1. Login to the admin shell and execute:
    (local) admin > cd PoolManager
    (PoolManager) admin > save
  2. Execute /usr/bin/dcache stop.
  3. Install the 2.2 package (RPM, DEB, or PKG).
  4. Execute /usr/bin/dcache check-config.
  5. For each generated warning and error, update /etc/dcache/dcache.conf and the layout file (check the tables below).
  6. Repeat the previous two steps until all warnings and errors are gone.
  7. If Chimera is used then recreate the stored procedures in the database by executing:
    $ psql -U postgres -f /usr/share/dcache/chimera/sql/pgsql-procedures.sql chimera
    You may have to provide a user name or password depending on your PostgreSQL configuration.
  8. If this node publishes GLUE information then ensure that the relevant info provider properties are defined in /etc/dcache/dcache.conf. Consult the tables below for a list of properties that may need to be defined.
  9. Start dCache by executing /usr/bin/dcache start.
  10. Carefully monitor the log files for any sign of trouble.

Terminology

Term Description
cell A component of dCache. dCache consists of many cells. A cell must have a name which is unique within the domain hosting the cell.
domain A container hosting one or more dCache cells. A domain runs within its own process. A domain must have a name which is unique throughout the dCache instance.
service An abstraction used in dCache configuration files to describe atomic units to add to a domain. A service is typically implemented through one or more cells.
layout Definition of domains and services on a given host. The layout is specified in the layout file. The layout file may contain both domain- and service- specific configuration values.
pool A service providing physical data storage.

Services

This section lists all supported services. Those marked with a * are services that dCache requires to function correctly.

Core services
Name Description
broadcast Internal message broadcast service.
cleaner Service to remove files from pools and tapes when the name space entry is deleted.
cns Cell naming service used in conjunction with JMS for well known name lookup.
dir Directory listing support for DCAP.
gplazma Authorization cell
hopping Internal file transfer orchestration.
loginbroker Central registry of all doors. Provides data to SRM for load balancing.
pinmanager Pinning and staging support for SRM and DCAP.
pnfsmanager Gateway to name space (either PNFS or Chimera).
pool Provides physical data storage.
poolmanager Central registry of all pools. Routes transfers to pools, triggers staging from tape, performs hot spot detection.
replica Manages file replication for Resilient dCache.
spacemanager Space reservation support for SRM.
srm-loginbroker Central registry of all SRM doors.
Admin and monitoring services
Name Description
admin SSH based admin shell.
billing Service for logging to billing files or the billing database.
httpd Legacy monitoring portal. Depends on: loginbroker, topo.
info Info service that collects information about the dCache instance. Recommends: httpd
statistics Collects usage statistics from all pools and generates reports in HTML.
topo Builds a topology map of all domains and cells in the dCache instance.
webadmin Web admin portal. Depends on: info
Doors
Name Description
authdcap Authenticated DCAP door. Depends on: dir. Recommends: pinmanager.
dcap dCap door. Depends on: dir. Recommends: pinmanager.
gsidcap GSI dCap door. Depends on: dir. Recommends: pinmanager.
kerberosdcap Kerberized dCap door. Depends on: dir. Recommends: pinmanager.
ftp Regular FTP door without strong authentication.
gridftp GridFTP door.
kerberosftp Kerberized FTP door.
nfsv3 NFS 3 name space export (only works with Chimera).
nfsv41 NFS 4.1 door (only works with Chimera).
srm SRM door. Depends on: pinmanager, loginbroker, srm-loginbroker. Recommends: transfermanagers, spacemanager.
transfermanagers Server side srmCopy support for SRM.
webdav HTTP and WebDAV door.
xrootd XROOT door.
Obsolete services
Name Reason
acl Integrated into pnfsmanager service.
dummy-prestager DCAP uses pinmanager for staging.

gPlazma 2 plugins

The following gPlazma 2 plugins ship with dCache and can be used in gplazma.conf. Note that several plugins implement more than one type. Usually such plugins should be added to all phases supported by the plugin.

gPlazma 2 plugins
Name Type Description
gplazma1 auth Legacy support for dcachesrm-gplazma.policy configuration.
jaas auth Implements password authentication through the Java Authentication and Authorization Services (JAAS). A valid JAAS setup for password verification has to be defined in /etc/dcache/jgss.conf. Fails if no password credential is provided or if JAAS denies the login. A username principal is generated upon success.
kpwd auth Implements password authentication using the kpwd file. Fails if no password credential is provided, if the username is not defined in the kpwd file, if the password is invalid, or if the entry has been disabled in the kpwd file.
voms auth Validates any VOMS attributes in an X.509 certificate and extracts all valid FQANs. Requires that a vomsdir is configured. Fails if no valid FQAN is found.
x509 auth Extracts the DN from an X.509 certificate. The certificate chain is not validated (it is assumed that the door already validated the chain). The plugin fails if no certificate chain is provided.
xacml auth
authzdb map Maps user and group name principals to UID and GID principals according to a storage authzdb file. The file format does not distinguish between user names and group names and hence each entry in the file maps to both a UID and one or more GIDs. Therefore the UID and the primary GID are determined by the mapping for the primary group name or user name. The name of that mapping is kept as the user name of the login and may be used for a session plugin or for authorization in space manager. Remaining GIDs are collected from other mappings of available group names.
gplazma1 map Legacy support for dcachesrm-gplazma.policy configuration.
gridmap map Maps DN principals to user name principals according to a grid-mapfile. Fails if no DN was provided or no mapping is found.
kpwd map Maps user names, DNs and Kerberos principals according to the kpwd file. Only user names verified by the kpwd auth plugin are mapped. Fails if nothing was mapped or if the kpwd entry has been disabled. Maps to user name, UID and GID principals.
krb5 map Maps Kerberos principals to username principals by stripping the domain suffix.
nis map Maps user name principals to UID and GID through lookup in NIS.
nsswitch map Maps user name to UID and GID according to the system native Name Service Switch.
vorolemap map Maps FQAN principals to group name principals according to a grid-vorolemap file. Each FQAN is mapped to the first entry that is a prefix of the FQAN. The primary FQAN (the first in the certificate) is mapped to the primary group name. Fails if no FQAN was provided or no mapping was found.
argus account
kpwd account Fails if the kpwd entry used during the map has been disabled.
authzdb session Associates a user name with root and home directory and read-only status according to a storage authzdb file.
gplazma1 session Legacy support for dcachesrm-gplazma.policy configuration.
kpwd session Adds home and root directories and read-only status to the session. Only applies to mappings generated by the kpwd map plugin.
nis session Associates a user name with a home directory through NIS lookup. The session's root directory is always set to the file system root and the session is never read-only.
nsswitch session Sets the session home directory and root directory to the file system root, and sets the session's read-only status to false.
nis identity Maps user name principals to UID and group name principals to GID.
nsswitch identity Maps user name principals to UID and group name principals to GID.

Please consult /usr/share/dcache/defaults/gplazma.properties for details about available configuration properties.

Changed properties

Most configuration properties are unchanged. Some have however been removed or replaced and others have been added. The following tables provide an overview of the properties that may need to be changed when upgrading from dCache 1.9.12 to 2.2.

Deprecated properties
Property Alternative Description
gplazmaPolicy gplazma.legacy.config Location of legacy gPlazma configuration file.
New properties
Property Default Description
pnfsVerifyAllLookups false Whether to verify lookup permissions for the entire path.
srmPinOnlineFiles true Whether to pin files with ONLINE access latency.
nfs.port 2049 TCP port used by NFS doors.
nfs.v3 false Whether to enable NFS 3 support in NFS 4.1 door.
nfs.domain The local NFSv4 domain name.
nfs.idmap.cache.size 512 Principal cache size of NFS door.
nfs.idmap.cache.timeout 30 Principal cache timeout of NFS door.
nfs.idmap.cache.timeout.unit SECONDS Principal cache timeout unit of NFS door.
nfs.rpcsec_gss false Whether to enable RPCSEC_GSS for NFS 4.1 door.
info-provider.site-unique-id EXAMPLESITE-ID Single words or phrases that describe your site.
info-provider.se-unique-id dcache-srm.example.org Your dCache's Unique ID.
info-provider.se-name A human understandable name for your SE.
info-provider.glue-se-status UNDEFINEDVALUE Current status of dCache.
info-provider.dcache-quality-level UNDEFINEDVALUE Maturity of the service in terms of quality of the software components.
info-provider.dcache-architecture UNDEFINEDVALUE The architecture of the storage.
info-provider.dit resource
info-provider.paths.tape-info /usr/share/dcache/xml/tape-info-empty.xml Location of tape accounting information.
collectorTimeout 5000 Webadmin timeout for the data collecting cell.
transfersCollectorUpdate 60 Webadmin update time for the data collecting cell.
webdavBasicAuthentication false Whether HTTP Basic authentication is enabled.
admin.colors.enable true Whether to enable color output of admin door.
sshVersion both Which version of the SSH protocol to use for the admin door.
admin.ssh2AdminPort 22224 Port to use for SSH 2.
admin.authorizedKey2 /etc/dcache/admin/authorized_keys2 Authorized keys for SSH 2.
admin.dsaHostKeyPrivate /etc/dcache/admin/ssh_host_dsa_key Location of SSH 2 private key.
admin.dsaHostKeyPublic /etc/dcache/admin/ssh_host_dsa_key.pub Location of SSH 2 public key.
broker.messaging.port 11111 TCP port used for cells messaging.
broker.client.port 0 UDP port for cells messaging client.
billing.formats.MoverInfoMessage See /usr/share/dcache/defaults/billing.properties
billing.formats.RemoveFileInfoMessage See /usr/share/dcache/defaults/billing.properties
billing.formats.DoorRequestInfoMessage See /usr/share/dcache/defaults/billing.properties
billing.formats.StorageInfoMessage See /usr/share/dcache/defaults/billing.properties
gplazma.nis.server nisserv.domain.com NIS server contacted by gPlazma NIS plugin.
gplazma.nis.domain domain.com NIS domain used by gPlazma NIS plugin.
xrootd.gsi.hostcert.key /etc/grid-security/hostkey.pem Host key used by xrootd GSI plugin.
xrootd.gsi.hostcert.cert /etc/grid-security/hostcert.pem Host certificate used by xrootd GSI plugin.
xrootd.gsi.hostcert.refresh 43200 Host certificate reload period used by xrootd GSI plugin.
xrootd.gsi.hostcert.verify true
xrootd.gsi.ca.path /etc/grid-security/certificates CA certificates used by xrootd GSI plugin.
xrootd.gsi.ca.refresh 43200 CA certificate reload period used by xrootd GSI plugin.
webadmin.warunpackdir /var/tmp Place to unpack Webadmin WAR file.
gplazma.jaas.name gplazma JAAS application name.
ftp.read-only false (true for weak ftp) Whether FTP door allows users to modify content.
srm.ssl.port 8445 TCP port for SRM over SSL.
httpd.static-content.plots /var/lib/dcache/plots Where to look for billing plot files.
httpd.static-content.plots.subdir /plots URI path element for billing plot files.
Obsolete properties
Name Description
aclTable ACLs are now part of Chimera
aclConnDriver ACLs are now part of Chimera
aclConnUrl ACLs are now part of Chimera
aclConnUser ACLs are now part of Chimera
aclConnPaswd ACLs are now part of Chimera
gplazma.version gPlazma 2 is the only version of gPlazma included.
srmGssMode SRM over SSL now has a dedicated port.
billingDb Use billingLogsDir.

Most of the following forbidden properties were marked as obsolete or deprecated in 1.9.12.

Forbidden properties
Property Alternative
namespaceProvider dcache.namespace
webdav.templates.list webdav.templates.html
metaDataRepositoryImport Use dcache pool convert command.
SpaceManagerDefaultAccessLatency DefaultAccessLatencyForSpaceReservation
keyBase dcache.paths.ssh-keys
kerberosScvPrincipal kerberos.service-principle-name
gsiftpDefaultStreamsPerClient Forbidden by GridFTP protocol.
gPlazmaNumberOfSimutaneousRequests gPlazmaNumberOfSimultaneousRequests
srmDbHost srmDatabaseHost
srmPnfsManager pnfsmanager
srmPoolManager poolmanager
srmNumberOfDaysInDatabaseHistory srmKeepRequestHistoryPeriod
srmOldRequestRemovalPeriodSeconds srmExpiredRequestRemovalPeriod
srmJdbcMonitoringLogEnabled srmRequestHistoryDatabaseEnabled
srmJdbcSaveCompletedRequestsOnly srmStoreCompletedRequestsOnly
srmJdbcEnabled srmDatabaseEnabled
java Use JAVA_HOME environment variable.
java_options dcache.java.options or dcache.java.options.extra
user dcache.user
pidDir dcache.pid.dir
logArea dcache.log.dir
logMode dcache.log.mode
classpath dcache.java.classpath
librarypath dcache.java.library.path
kerberosRealm kerberos.realm
kerberosKdcList kerberos.key-distribution-center-list
authLoginConfig kerberos.jaas.config
messageBroker broker.scheme
serviceLocatorHost broker.host
serviceLocatorPort broker.port
amqHost broker.amq.host
amqPort broker.amq.port
amqSSLPort broker.amq.ssl.port
amqUrl broker.amq.url
ourHomeDir dcache.home
portBase set protocol-specific default ports
httpPortNumber webdavPort
httpRootPath webdavRootPath
httpAllowedPaths webdavAllowedPaths
webdavContextPath webdav.static-content.location
cleanerArchive cleaner.archive
cleanerDB cleaner.book-keeping.dir
cleanerPoolTimeout cleaner.pool-reply-timeout
cleanerProcessFilesPerRun cleaner.max-files-in-message
cleanerRecover cleaner.pool-retry
cleanerRefresh cleaner.period
hsmCleaner cleaner.hsm
hsmCleanerFlush cleaner.hsm.flush.period
hsmCleanerRecover cleaner.pool-retry
hsmCleanerRepository cleaner.hsm.repository.dir
hsmCleanerRequest cleaner.hsm.max-files-in-message
hsmCleanerScan cleaner.period
hsmCleanerTimeout cleaner.hsm.pool-reply-timeout
hsmCleanerTrash cleaner.hsm.trash.dir
hsmCleanerQueue cleaner.hsm.max-concurrent-requests
trash cleaner.trash.dir
httpHost info-provider.http.host
xsltProcessor info-provider.processor
xylophoneConfigurationFile info-provider.configuration.file
saxonDir info-provider.saxon.dir
xylophoneXSLTDir info-provider.xylophone.dir
xylophoneConfigurationDir info-provider.configuration.dir
images httpd.static-content.images
styles httpd.static-content.styles