dCache 1.9.1 Release Notes

Introduction

It is our long-term goal to renovate the entire dCache code base, focusing on increased modularity, consistency and orthogonality. As a first step in this direction, the 1.9.1 release introduces a refactored pool component.

The new pool component is a drop-in replacement for the old pool, with an unaltered external interface to ease testing and deployment. The new pool has run in limited production at NDGF for several months, and we are fairly confident that, although not perfect, it is safe to use. That being said, other sites have different usage patterns, and we urge everybody to test the new pool carefully before widespread deployment. We provide an upgrade strategy below.

It has been a design requirement to make the new pool a drop-in replacement for the existing pool. Therefore the administrative interface is mostly unchanged, and the feature set is more or less the same. We provide a detailed list of new and dropped features below.

Further iterations of the pool code are planned. Where the current release focuses on modularity and internal consistency, future releases will focus on mover management and improved support for advanced protocols like NFS 4.1 and xrootd. These protocols incorporate knowledge of the distributed nature of dCache and do not fit well with the current mover-based I/O model.

Work has already begun on 1.9.2, our next feature release. It is our intention to make bug fix releases for both 1.9.0 and 1.9.1, should this become necessary.

Upgrading from 1.9.0

There are no restrictions on the order in which components can be upgraded to 1.9.1: Any mix of 1.9.0 and 1.9.1 is supported.

The new pool is a drop-in replacement. It is therefore possible to upgrade a single pool to 1.9.1 in an existing 1.9.0 installation for the purpose of evaluation and testing. At any time the installation can be downgraded to 1.9.0. The restrictions described in the 1.9.0 release notes also apply to 1.9.1, and a 1.9.1 pool should not be used with any 1.8.0 head nodes.

Besides the new pool, 1.9.1 introduces significant changes to the logging infrastructure (see below). To take advantage of these changes on a dCache node, that node needs to be updated to 1.9.1.

Important: Apparently, config/log4j.properties is not always automatically updated upon upgrade. Please make sure to use the version shipped with the 1.9.1 release, as logging will otherwise be very noisy and likely in the wrong format.

Known Issues

Modified Handling of Transfer Failures

The new pool treats any upload failure as an error and marks the replica as broken (using the flags BAD and FROM_CLIENT). This is a change from the old pool, which would mark such files as PRECIOUS.

The new behaviour is currently being debated by the dCache team and will be altered again in the future.

Changed and New Features in 1.9.1

Refactored Pool

As described above, 1.9.1 ships with a refactored pool. The new pool is an evolution of the old pool, not a complete rewrite. The old pool is still shipped with 1.9.1, although deactivated by default. A batch file is available for activating the old pool.

Modular Pool Design

Although not really a new feature, the modular structure is evident through several commands in the pool. In particular the output of the info command has changed. The output is now grouped per module.

A new command set starting with the prefix bean was added. The bean commands allow inspection of the loaded modules. The commands were mainly introduced for debugging purposes.

Removed pool functionality

Several features have been removed from the new pool compared to the old one. Most of these features did not work in the old pool anyway or have been replaced by new concepts. The following features are not supported:

Log4j

For some time now dCache has used two logging systems: an old logging system based on a simple printout level per cell, and a new logging system based on Log4j. This has led to some confusion, as the two systems used incompatible configuration mechanisms and inconsistent formatting. This has been resolved in version 1.9.1.

Internally, dCache still uses two logging systems, but the old system has been refurbished to call through to Log4j. That means the user only has to deal with one logging system.

The old logging system only had two log levels (debug and error). These are mapped to equivalent log levels in Log4j, but the mapping may be adjusted through the printout level. The printout level is a stopgap solution while some code still uses the old logging calls, and it will eventually be removed. Code logging to Log4j directly is not affected by the printout level. The Log4j runtime user interface introduced in version 1.9.0 should be preferred to adjusting the printout level.

The Cells pinboard system has been restructured. Previously, code would write directly to the pinboard using special log calls. This has been replaced by a custom Log4j appender, which redirects messages to the appropriate cell's pinboard. Which messages are added to the pinboard is configurable through Log4j.

Log Context

We have introduced a logging context. Using our default Log4j configuration, the log context is printed in square brackets in front of any log message. The content of the log context depends on the context (hence the name), but will in many cases contain:

  1. The source of the request (cell name)
  2. Session ID
  3. Message type
  4. PNFS ID

Session ID uniquely identifies an activity across multiple cells. Support for session ID is limited in 1.9.1, since not all components generate or forward the session ID. Support will improve over time.

Migration Module

A migration module was written for the new pool. This module subsumes the functionality of the copy module for the maintenance cell. The migration module allows replicas to be copied or moved between pools. If 1.9.0 head nodes are used, then the only supported value for the -target option is pool.

In contrast to the old copy module, the migration module can maintain sticky flags and can update the state of the source after transfer, including deleting the source replica. Notice that the migration module is unaware of the pin manager and the space manager.

More development is planned for future releases. Known issues: Several migration tasks running from the same source pool may in rare cases interact in unforeseen ways. Logging is almost non-existent. In case of failures, the module retries eagerly without pausing.

The migration module is accessed through the admin commands of the pool. All commands are prefixed with migration. Please use help migration copy for usage information. For your convenience, the documentation is reprinted below:

Copies files to other pools. Unless filter options are specified,
all files on the source pool are copied.

The operation is idempotent, that is, it can safely be repeated
without creating extra copies of the files. If the replica exists
on any of the target pools, then it is not copied again.

Both the state of the local replica and that of the target replica
can be specified. If the target replica already exists, the state
is updated to be at least as strong as the specified target state,
that is, the lifetime of sticky bits is extended, but never reduced,
and cached can be changed to precious, but never the opposite.

Syntax:
  copy [options] <target> ...

Options:
  -state=cached|precious
          Only copy replicas in the given state.
  -sticky[=<owner>[,<owner> ...]]
          Only copy sticky replicas. Can optionally be limited to
          the list of owners. A sticky flag for each owner must be
          present for the replica to be selected.
  -storage=<class>
          Only copy replicas with the given storage class.
  -pnfsid=<pnfsid>
          Only copy the replica with the given PNFS ID.
  -smode=same|cached|precious|removable|delete[+<owner>[(<lifetime>)] ...]
          Update the local replica to the given mode after transfer.
          'same' does not change the local state (this is the
          default), 'cached' marks it cached, 'precious' marks it
          precious, 'removable' marks it cached and strips all
          existing sticky flags, and 'delete' deletes the replica.
          An optional list of sticky flags can be specified. The
          lifetime is in seconds. A lifetime of 0 causes the flag
          to expire immediately. Notice that existing sticky flags
          of the same owner are overwritten.
  -tmode=same|cached|precious[+<owner>[(<lifetime>)] ...]
          Set the mode of the target replica. 'same' applies the
          state and sticky bits of the local replica (this is the
          default), 'cached' marks it cached, 'precious' marks it
          precious. An optional list of sticky flags can be
          specified. The lifetime is in seconds.
  -select=proportional|best|random
          Determines how a pool is selected from the set of target
          pools. 'proportional' selects a pool with a probability
          inversely proportional to the cost of the pool. 'best'
          selects the pool with the lowest cost. 'random' selects
          a pool randomly. The default is 'proportional'.
  -target=pool|pgroup|link
          Determines the interpretation of the target names. 'pool'
          is the default.
  -refresh=<time>
          Specifies the period in seconds of when target pool
          information is queried from the pool manager. The
          default is 300 seconds.
  -exclude=<pool>[,<pool> ...]
          Exclude target pools.
  -concurrency=<concurrency>
          Specifies how many concurrent transfers to perform.
          Defaults to 1.
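
The rule that an existing target replica is updated to be "at least as strong" as the requested state can be illustrated with a small sketch. This is not dCache code: the function name, the representation of sticky flags as a mapping from owner to expiry time, and the treatment of each flag independently are assumptions made for illustration only.

```python
# Illustrative sketch only; not taken from the dCache source.
# Assumption: precious is "stronger" than cached, as the help text implies.
STRENGTH = {"cached": 0, "precious": 1}


def merge_target(existing_state, existing_sticky, wanted_state, wanted_sticky):
    """Combine an existing target replica with the requested target mode.

    States are "cached" or "precious". Sticky flags are given as
    {owner: expiry_time_in_seconds} (a hypothetical representation).
    The result is never weaker than what already exists on the target.
    """
    # cached may become precious, but precious never becomes cached.
    if STRENGTH[wanted_state] > STRENGTH[existing_state]:
        state = wanted_state
    else:
        state = existing_state

    # Sticky lifetimes are extended, never reduced.
    sticky = dict(existing_sticky)
    for owner, expiry in wanted_sticky.items():
        sticky[owner] = max(sticky.get(owner, expiry), expiry)
    return state, sticky
```

Under these assumptions, repeating the same copy leaves the target unchanged, which matches the idempotency described in the help text.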

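The three -select strategies can be sketched as follows. Again this is only an illustration under stated assumptions: the function name and the cost dictionary are hypothetical, costs are assumed to be strictly positive, and how dCache actually computes or combines pool costs is not covered here.

```python
import random


def select_pool(costs, mode="proportional", rng=random):
    """Pick a target pool from {pool_name: cost} (hypothetical input).

    'best'         returns the pool with the lowest cost.
    'random'       returns a uniformly random pool.
    'proportional' returns a pool with probability inversely
                   proportional to its cost (assumes every cost > 0).
    """
    pools = list(costs)
    if mode == "best":
        return min(pools, key=lambda p: costs[p])
    if mode == "random":
        return rng.choice(pools)
    if mode == "proportional":
        weights = [1.0 / costs[p] for p in pools]
        return rng.choices(pools, weights=weights, k=1)[0]
    raise ValueError("unknown selection mode: %s" % mode)
```

With 'proportional', a pool with cost 1 is chosen about nine times as often as a pool with cost 9, so cheaper pools are preferred without starving the more expensive ones.
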
dCache 1.9.2 Goals

Changelog since 1.9.0-4

Pool

pnfsDomain

dCacheDomain

Cells

infoDomain

Misc

SRM Client Tools

FTP Door

Xrootd

hsmcp.rb