1.9.5 Release Notes

The main focus areas of the 1.9.5 release are optimisations and polish ahead of LHC data taking. This release is expected to be maintained throughout the first LHC data taking run.

Upgrade Instructions

Incompatibilities

Please consider the following changes when upgrading from a version before 1.9.5-1:

Compatibility

It is safe to mix pools of releases 1.9.4 and 1.9.5. Head nodes and doors must be upgraded to 1.9.5 together and cannot be mixed with head nodes or doors of releases before 1.9.5. Components of different 1.9.5 releases can be mixed freely.

Compatibility Matrix

We distinguish between pool components and head nodes. Any component which is not a pool is considered a head node, including doors. The following table shows compatibility between different releases.

                                1.9.5-4 Head    1.9.5-4 Pool
Head    1.9.1-1..7,9..11        no              no
        1.9.1-8                 no              no
        1.9.2-1..5,8..11        no              no
        1.9.2-6,7               no              no
        1.9.3-1..4              no              no
        1.9.4-1..4              no              no
        1.9.5-1..4              yes             yes
Pool    1.9.1-1..7,9..11        yes             yes[2]
        1.9.1-8                 yes[1]          yes[2]
        1.9.2-1..5,8..11        yes             yes[2]
        1.9.2-6,7               yes[1]          yes[2]
        1.9.3-1..4              yes             yes[2]
        1.9.4-1..4              yes             yes
        1.9.5-1..4              yes             yes

  [1] The migration module will not work for -target=pgroup and -target=link.
  [2] The migration module does not work.

1.9.5-4

The previous patch level release contained an RPM dependency on the java-package RPM. This broke compatibility with a number of distributions that did not provide this package. The dCache 1.9.5-4 release removes this dependency.

In addition to the dependency change, the following fixes are included:

Detailed changelog 1.9.5-3 to 1.9.5-4

1.9.5-3

Detailed changelog 1.9.5-2 to 1.9.5-3

1.9.5-2

Detailed changelog 1.9.5-1 to 1.9.5-2

1.9.5-1

Permission Checking and ACLs

In dCache, file permission checking such as for create, read, and delete has traditionally been the responsibility of the doors. Starting with the 1.9.5 release, this check can optionally be moved to PnfsManager. Besides the structural benefits of enforcing permissions at a single point, there are performance gains from avoiding extra round trips between the door and PnfsManager. To enable permission checking inside PnfsManager, set permissionPolicyEnforcementPoint to PnfsManager in config/dCacheSetup on the doors:

permissionPolicyEnforcementPoint=PnfsManager

Currently, PnfsManager-based permission checking is only fully supported by the DCAP and FTP doors. For the SRM door, permission checking for srmRm, srmMove, srmMkdir and srmLs is always delegated to PnfsManager, regardless of the value of permissionPolicyEnforcementPoint. This speeds up those operations and enforces ACLs if ACLs are enabled in the PnfsManager. Permission checking for other SRM operations is still performed in the SRM door and is not yet subject to ACLs.

The configuration parameter PermissionHandlerDataSource was removed. Permission handlers in doors now always query meta data from PnfsManager rather than from the mounted name space file system.

Enabling ACLs has been simplified in dCache 1.9.5. To enable ACLs, set aclEnabled to true in config/dCacheSetup on the doors and PnfsManager. This must be done in addition to defining the database connection parameters. There is no longer a need to redefine the permissionHandler parameter; it is, however, still respected if defined. If ACLs are used, these must now also be configured in pnfsDomain or chimeraDomain.

Directory Listing

Directory listing in FTP, SRM and the dirDomain used to be performed on the mounted name space. Starting with 1.9.5, doors request the directory listing from the PnfsManager.

One consequence is that it is no longer required to mount the name space on FTP doors and the dirDomain. SRM still uses the mounted file system for some other list related operations. With PNFS, PnfsManager must have access to the mounted name space. With Chimera, even PnfsManager does not require the mounted file system.

Another consequence is that directory listing through FTP is now significantly faster.

PnfsManager executes the directory listing on dedicated threads. The number of threads used is defined by the parameter pnfsNumberOfListThreads in config/dCacheSetup.

For Chimera, the directoryLookupPool was previously started inside the chimeraDomain. Starting with version 1.9.5, the regular directoryLookupPool in the dirDomain works with Chimera, and thus chimeraDomain no longer contains directoryLookupPool.

Starting dCache as an unprivileged user

Until dCache 1.9.5, there was no support for running dCache as a user different from root. This has now changed. If the variable user is defined in config/dCacheSetup, then the init scripts will drop privileges and start dCache as that user.

Log files are still generated as root, which means they can still be written to the default location of /var/log/. Ownership of PID files is changed to the unprivileged user, which means they can still be written to the default location of /var/run/. To support automatic restart, the dCache init script generates a stop file to suppress restarts when dcache stop is executed. This used to be generated in the jobs/ directory. Starting with dCache 1.9.5, these files are generated as hidden files in /tmp.

Please take care that the user under which dCache is executed has sufficient privileges. Watch out for the following:

Hot-spot detection

The trigger mechanism for hot-pool replication has been enhanced by integrating an algorithm contributed by Jon Bakken, FNAL. The algorithm ranks pools by their CPU cost and selects the cost found at the configured n-th percentile of that ranking: 0% selects the lowest pool cost, 50% selects the median cost and 100% selects the highest pool cost. This cost is used as the threshold for establishing pool-to-pool "on cost" transfers.

In PoolManager, specifying an on-cost value as a number not ending with "%" results in the old behaviour, in which the number itself is the threshold. All current dCache deployments have such a value. Specifying a value ending with "%" causes the percentile cost to be calculated dynamically, and the resulting value is used as the threshold for on-cost pool-to-pool transfers.
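The two interpretations of the on-cost value can be sketched as follows. This is a minimal Python illustration and not the PoolManager implementation: the function names, the nearest-rank selection between ranks, and the sample costs are assumptions made for the example.

```python
import math


def percentile_cost(costs, percent):
    """Return the pool cost found at the given percentile of the ranking.

    0 selects the lowest cost and 100 the highest. Rounding to the
    nearest rank is an assumption; the release notes do not say how
    positions between two ranks are resolved.
    """
    if not costs:
        raise ValueError("no pool costs available")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    ranked = sorted(costs)
    # Map 0..100 % onto index 0..len-1 and round half up.
    index = int(math.floor(percent / 100 * (len(ranked) - 1) + 0.5))
    return ranked[index]


def on_cost_threshold(value, costs):
    """Interpret an on-cost setting.

    A value ending in '%' is a percentile of the current pool costs;
    any other value is used directly as a fixed threshold.
    """
    value = value.strip()
    if value.endswith("%"):
        return percentile_cost(costs, float(value[:-1]))
    return float(value)
```

With hypothetical CPU costs of 0.9, 0.1, 0.5, 0.3 and 0.7, a setting of "50%" yields the median cost 0.5, while a setting of "2.0" yields the fixed threshold 2.0 regardless of the pool costs.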

Stage Protection

Stage protection was added in dCache 1.9.4. In version 1.9.4, stage protection had to be configured in every door and in the PinManager. In version 1.9.5, the stage protection can now optionally be configured in the PoolManager rather than in the doors and PinManager. Thus the white-list only needs to be present on a single node. To enable this, define the following in config/dCacheSetup:

stagePolicyEnforcementPoint=PoolManager

The file name of the white-list must still be configured by setting the stageConfigurationFilePath parameter. However, the parameter only needs to be defined on the nodes which enforce the stage protection, i.e. either on the doors and PinManager, or in PoolManager.

Cell Communication

Robustness of the cell message tunnel has been improved. In particular we moved to the Java NIO API for I/O and disabled Nagle's algorithm on the TCP connections used for cell communication. This has dramatically reduced the latency of cell communication.
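Disabling Nagle's algorithm means setting the TCP_NODELAY option on a socket, so that small messages are sent immediately instead of being held back and coalesced. dCache does this through the Java NIO API; the Python sketch below only illustrates the socket option itself, and the function name is an assumption.

```python
import socket


def open_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small writes right away rather than
    # waiting to combine them with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The trade-off is more, smaller packets on the wire in exchange for lower latency, which suits short request and reply messages.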

PnfsManager

PnfsManager was restructured internally. As a consequence the option -storageinfo-provider is no longer accepted.

PnfsManager now supports an operation to set several attributes of a file in one operation. This has cut down on the number of messages a door or a pool needs to send to PnfsManager during upload of a file. This also reduces the number of PNFS operations required for setting these attributes. However as a consequence of this change, pools from 1.9.5 releases will not work with older head nodes.

SrmSpaceManager

SrmSpaceManager supports changing the lifetime of a reservation with the update space reservation command.

The SpaceManagerDefaultRetentionPolicy parameter was removed, as it was no longer used. The SpaceManagerDefaultAccessLatency parameter was renamed to DefaultAccessLatencyForSpaceReservation to better reflect its purpose. The old parameter is still respected if it is defined.

PinManager

PinManager now supports the command bulk pin for administratively pinning a large number of files. Which files to pin is defined by a local file on the node hosting the PinManager.

Pools

The migration module can now filter on access latency and retention policy using the -al and -rp options. The -exclude option now supports single character and multi character wildcards.
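The effect of wildcard exclusion can be sketched as below. The release notes do not state the wildcard syntax, so this Python example assumes glob-style patterns, with "?" for a single character and "*" for any number of characters; the pool names and the function name are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_targets(pools, exclude_patterns):
    """Return the pools that match none of the exclude patterns."""
    return [
        pool
        for pool in pools
        if not any(fnmatchcase(pool, pattern) for pattern in exclude_patterns)
    ]
```

Under these assumptions, excluding "pool_?" from pool_1, pool_2, pool_10 and tape_1 leaves pool_10 and tape_1, while excluding "pool_*" leaves only tape_1.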

The flush logic was extended to handle FILE_NOT_FOUND errors from PnfsManager: such an error now causes the file to be deleted from the pool, thus avoiding an infinite retry loop in case the file was not properly registered in the companion. This change only has an effect with Chimera or with PNFS supporting a trash table.
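The control flow of this change can be sketched as follows. The Python code is an illustration only: the exception class stands in for a FILE_NOT_FOUND reply from PnfsManager, and all names and the two callbacks are hypothetical.

```python
class NamespaceFileNotFound(Exception):
    """Stand-in for a FILE_NOT_FOUND reply from PnfsManager (hypothetical)."""


def flush_once(pnfs_id, flush_to_tape, delete_replica):
    """Attempt one flush and report what happened to the file.

    Returns "flushed" on success. If the name space does not know the
    file, the replica is deleted and "deleted" is returned, so the
    caller has nothing left to retry. Other errors propagate unchanged.
    """
    try:
        flush_to_tape(pnfs_id)
    except NamespaceFileNotFound:
        # Retrying cannot succeed because the file no longer exists
        # in the name space; remove the replica from the pool.
        delete_replica(pnfs_id)
        return "deleted"
    return "flushed"
```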

Every 60 seconds, the pool checks the amount of free space on the file system. If the file system has less free space than the configured amount of free space in the pool, the pool size is adjusted accordingly.
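One way to picture this check is the Python sketch below. How dCache computes the adjusted size is not stated in these notes; the example assumes the pool shrinks to the space it already uses plus what the file system can still hold. All names are hypothetical and all sizes are in bytes.

```python
import shutil


def effective_pool_size(configured_size, used_by_pool, fs_free):
    """Return the pool size after accounting for file system free space.

    If the file system has less free space than the pool expects to
    have, the size shrinks to what actually fits (an assumption).
    """
    configured_free = configured_size - used_by_pool
    if fs_free < configured_free:
        return used_by_pool + fs_free
    return configured_size


def check_pool(path, configured_size, used_by_pool):
    """Run one periodic check against the file system holding the pool."""
    fs_free = shutil.disk_usage(path).free
    return effective_pool_size(configured_size, used_by_pool, fs_free)
```

For a pool configured with 100 bytes of which 40 are used, a file system with 30 bytes free gives an effective size of 70, whereas one with 80 bytes free leaves the size at 100.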

Info Service

The info service has been refactored to ease unit testing. It is now the best-tested component in dCache.

External libraries

The JGlobus library used for GSI and GridFTP handling has been updated to version 1.7.

The log4j logging library has been updated to version 1.2.15.

Chimera

Scalability of the Chimera NFS daemon was improved. Log and PID files are now stored in /var/log/ and /var/run/, respectively.

Protocol support

SRM

Verbosity of logging in the SRM has been reduced. Some of the code has been transitioned to use log4j, which exposes more log levels than used in previous versions.

SRM has seen a few performance related changes. In particular the srmPutDone operation performed at the end of an upload is now faster.

The SRM code has been refactored internally to prepare for multiple SRM doors running on top of the Terracotta distributed shared memory framework. Running multiple SRM doors is not yet supported for production setups, however much of the infrastructure is in place to support such setups.

Error reporting of srmMkdir and srmRmdir has been improved. In particular we now use specific error codes rather than the generic SRM_FAILURE.

FTP

The legacy callouts to encp from inside the GridFTP door have been removed. Hence the option -encp-put is no longer supported.

Xrootd

In version 1.9.4, the XROOTD door (also known as the XROOTD redirector) was reimplemented for better scalability. In version 1.9.5, the XROOTD mover (also known as the XROOTD data server) underwent the same kind of transformation. The new version uses significantly fewer threads, and we hope it will scale better and be more robust under load than the old version.

The old mover is still shipped with dCache and can be activated by modifying the pool movermap.

NFS 4.1

The NFS 4.1 implementation was refactored for better scalability and thread management.

Detailed changelog 1.9.4-1 to 1.9.5-1