dCache 2.5 Release Notes

Upgrade Instructions

Starting with version 2.4.0, dCache requires Java 7.

Compatibility

If NFS4.1 and HTTP PUT redirect are not used, then 2.5 doors and head nodes can be mixed with pools of release 2.2 or newer. Doors and head nodes have to be updated at the same time.

If server-side SRM copy is used, dCache 2.5 is not compatible with pools running dCache 2.2.8 or earlier. Please upgrade to dCache 2.2.9 in this case.

Please note that, due to a regression introduced in dCache 2.5.0, dCache 2.5.0 and dCache 2.5.1 are not compatible with dCache 2.2 to 2.4. This was fixed in version 2.5.2; as a consequence, dCache 2.5.2 is not compatible with dCache 2.5.0 or dCache 2.5.1.
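
The compatibility rules above can be summarised in a short sketch. The function name, its parameters and the tuple representation of versions are illustrative assumptions and not part of dCache. The sketch covers only 2.5.x doors and head nodes, ignores the NFS4.1 and HTTP PUT redirect caveats, and assumes that 2.5.0 and 2.5.1 interoperate with each other, which these notes do not state.

```python
def pool_compatible(head, pool, srm_copy=False):
    """Rough check whether a pool can run with 2.5.x doors and head nodes.

    Versions are (major, minor, patch) tuples. Hypothetical helper that
    only encodes the rules stated in the Compatibility section.
    """
    if head[:2] != (2, 5):
        raise ValueError("sketch only covers 2.5.x doors and head nodes")
    if pool[:2] > (2, 5):
        raise ValueError("pools newer than 2.5 are not covered by this sketch")
    if pool < (2, 2, 0):
        return False

    regressed = {(2, 5, 0), (2, 5, 1)}  # releases affected by the regression

    if pool[:2] == (2, 5):
        # 2.5.2 does not mix with 2.5.0 or 2.5.1.
        # Assumption: 2.5.0 and 2.5.1 mix with each other.
        return (head in regressed) == (pool in regressed)

    # Pool is 2.2.x to 2.4.x from here on.
    if head in regressed:
        return False
    if srm_copy and pool <= (2, 2, 8):
        # Server-side SRM copy needs pools of 2.2.9 or newer.
        return False
    return True
```

For example, pool_compatible((2, 5, 2), (2, 2, 8), srm_copy=True) reports an incompatible combination, while the same pool is accepted when server-side SRM copy is not used.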

dCache 2.5.2


Chimera

Fixed compatibility issues with uberftp and list operations on the root directory.

Service: pool

Fixed a bug that was causing the number of movers shown in the info page and on the httpd pages to be negative.

Fixed a regression in the migration module. The regression broke the functionality of the migration info command.

Service: poolmanager

Fixed glob matching for psu commands.

Service: pnfsmanager

Fixed compatibility issues with uberftp and list operations on the root directory.

doors

Fixed compatibility issues with uberftp and list operations on the root directory.

Miscellaneous

Added a producer of StAR accounting records. The StAR producer works with EGI's Apel SSM v2. Sites publishing StAR records for EGI must update their SSM RPM to at least v2.0.1.

When the OS does not allow dCache to create more threads, dCache attempts to restart the affected service. Fixed a bug that prevented dCache from stopping the service in this case.

Fixed an infinite loop and the false positive logging of a stack trace in message handling.

Fixed the incompatibility of dCache 2.5 with dCache 2.2 to 2.4. As a side effect 2.5.2 is incompatible with 2.5.0 and 2.5.1.

Improved compatibility with Java 7.

Fixed a problem with DNs that contain double slashes, as in http://.

Changelog 2.5.1 to 2.5.2

dCache 2.5.1


Chimera

Updated output from chimera-cli ls command to match ls -l.

Service: pool

Fixed a timeout bug in the migration module.

Service: pinmanager

Fixed a regression that was introduced in dCache 1.9.13 and prevented pinmanager from retrying stage failures.

Service: spacemanager

Fixed a bug that produced a stack trace if spacemanager operations were attempted while the spacemanager service was disabled.

Service: httpd

Fixed a problem with usage charts on the pool and pool groups pages not being shown when using certain settings for system locales.

Fixed the bad behavior of the webadmin whereby the user was redirected back to an unencrypted connection after logging on to an admin page.

Made webadmin login more intuitive and friendly. Now, a valid certificate which is mapped to an admin user allows immediate access to the page without having to go through the login page.

Service: billing

Fixed a problem where the billing service would spontaneously restart, due to running out of memory when computing the 24-hour histograms from fine-grained data tables.

Service: alarms

Modified the initialization process of the alarms service to make it easier to use an RDBMS instead of the XML file to store the alarms. Just create the alarms database with

createdb -U srmdcache alarms

and set the property alarms.store.db.type=rdbms in the dcache.conf file.

Added support for group tags in the include-in-key of an alarm definition.
As an example we show the definition of the checksum alarm:

<alarmType>
     logger:org.dcache.pool.classic.ChecksumScanner,
     regex:"Checksum mismatch detected for (.+) - marking as BROKEN",
     type:CHECKSUM,
     level:ERROR,
     severity:MODERATE,
     include-in-key:group1 type host service domain
</alarmType>

Here the tag group1 extracts the pnfsid from the message and includes only that portion of the message string as an identifier.
The tag must be expressed as "group + number" without any whitespace. group0 is identical to "message".
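
To illustrate what group1 captures, here is a minimal sketch that uses Python's re module rather than the alarms service itself. The regular expression is copied from the definition above; the function name, the sample message and the host, service and domain values are made up for the example, and the tuple is only a stand-in for however the service actually builds its key.

```python
import re

# Regex copied from the checksum alarm definition.
CHECKSUM_RE = re.compile(
    r"Checksum mismatch detected for (.+) - marking as BROKEN"
)


def alarm_key(message, alarm_type, host, service, domain):
    """Combine group1 with type, host, service and domain (illustration only)."""
    match = CHECKSUM_RE.search(message)
    if match is None:
        return None  # the message does not belong to this alarm type
    # group(1) is the first parenthesised group, i.e. the pnfsid here;
    # group(0) would be the whole matched text.
    return (match.group(1), alarm_type, host, service, domain)


key = alarm_key(
    "Checksum mismatch detected for 00002ECA2067CC954F8CBEBD872D20D04B2B"
    " - marking as BROKEN",
    "CHECKSUM", "pool1.example.org", "pool1", "pool1Domain",
)
print(key[0])
```

Because only the captured pnfsid enters the key, two messages about the same file yield the same identifier even if the rest of the message text were to differ.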

Defined the following alarms:

TYPE [SEVERITY]
--------------------------------------------------------------------------------------
SERVICE_CREATION_FAILURE [CRITICAL]
DB_OUT_OF_CONNECTIONS [CRITICAL]
DB_UNAVAILABLE [CRITICAL]
JVM_OUT_OF_MEMORY [CRITICAL]
OUT_OF_FILE_DESCRIPTORS [CRITICAL]
IO_ERROR [HIGH]
HSM_READ_FAILURE [HIGH]
HSM_WRITE_FAILURE [HIGH]
LOCATION_MANAGER_UNAVAILABLE [HIGH]
POOL_MANAGER_UNAVAILABLE [HIGH]
POOL_DISABLED [MODERATE]
CHECKSUM [MODERATE]

Added the property alarms.server.host to configure the host on which the alarms server runs.

Service: srm

Fixed a bug that prevented clean shutdown of the SRM.

Fixed credential delegation for srmCopy transfers which was broken due to the jGlobus 2 update.

dcache script

Fixed a bug that led to an error message on attempts to run the dcache command. The problem occurred when the partition storing the configuration cache ran out of space. Since the cache is stored as /var/lib/dcache/config/cache by default, this can happen if log files use up all available space and the dCache configuration has changed since the last dcache command. The error looked like this:

# dcache status
/usr/share/dcache/lib/loadConfig.sh: line 89: getProperty: command not found
/usr/share/dcache/lib/loadConfig.sh: line 90: getProperty: command not found
/usr/bin/dcache: line 370: getProperty: command not found

Miscellaneous

Fixed the incompatibility of dCache 2.5.0 pools with pools older than 2.5.

Added the ability to catch system start-up exceptions for a given domain and process them as alarms.

Increased speed of system shutdown by fixing several small problems.

Changelog 2.5.0 to 2.5.1

dCache 2.5.0


Service: pool

Introduced the new property destroyOrphanReplicaOnFlush. After a failure such as a power outage, a pool may hold files that have no corresponding namespace entry. If destroyOrphanReplicaOnFlush is set to true, these files are destroyed on flush; set the property to false to avoid this. The default is true, which preserves the former behaviour.

Enabled the migration module to work even if the pool it wants to read from is disabled.

Service: poolmanager

Added the ability to see clients in the request container. With the option -l, the command rc ls lists not only the number of requests but also the actual clients.

Example:

[example.org] (PoolManager) admin > rc ls -l
00002ECA2067CC954F8CBEBD872D20D04B2B@0.0.0.0/0-*/* m=2 r=0 [<unknown>] \
 [Suspended (pool unavailable) 11.29 16:19:03] {0,}
    DCap-3.0,131.169.185.68:35150
    NFS4-4.1:example.desy.de/131.169.185.68:714

Service: alarms

Added the new service alarms.

Any logging event can be defined as an alarm. Administrators can thus be directly notified of problems which need immediate attention and rectification.

To enable the alarms service, it is recommended to add it to a new domain which needs to be on the same node as the httpd service, e.g.:

[alarmserverDomain]
[alarmserverDomain/alarms]

To be able to use the alarms webpage, you need to be able to log in to the webadmin.

Service: gplazma2

Restored auth dcap functionality.

Made the location of the VOMS directory configurable for the VOMS and XACML plugins. It remains fixed (as it was originally) at the standard location for the GsiTunnel and the SRMAuthorizer.

Service: srm

Added the missing 'export SRM_PATH' statement to srmrmdir.

Added timeout arguments to GridFtpClient. The timeouts -first_byte_timeout and -next_byte_timeout (in seconds) can be specified via the srmcp client. When transferring large files (around 10 GB), it is advisable to increase -next_byte_timeout, e.g. to 1200 seconds.

Service: nfs

Fixed last access time calculation in the mover.

Service: nfsv4.1

Added support for Linux clients with numeric ID mappings. A typical NFSv4 installation requires an LDAP or NIS server for user identity management. To allow NFSv3-style mapping based on numeric IDs, enable the legacy mode in the dcache.conf file or in the layout file:

nfs.idmap.legacy=true

Service: dcap

Fixed GSS error message.

Modified dcap to not request GSI delegation.

Introduced the two new properties dcapAnonymousAccessLevel and dcapReadOnly to restore the read-only dcap functionality, which was lost at some point between 1.9.5 and 2.2.
The property dcapAnonymousAccessLevel controls the access level of anonymous users and defaults to READONLY. This is the level of access granted when authenticated login fails (e.g. for Kerberos or GSI dcap). The plain dcap door provides unauthenticated, and therefore anonymous, access by definition. To enable writes via plain dcap doors, this property must be set to FULL.
The property dcapReadOnly controls write access to any dcap door (regardless of anonymous or authenticated access): setting it to true disables writes. By default it is set to false.
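
The way the two properties combine can be sketched as a small decision function. This is a simplified model based only on the description above: the function and parameter names are hypothetical, only the values READONLY and FULL mentioned in these notes are considered, and ordinary file permission checks are left out.

```python
def dcap_write_allowed(authenticated,
                       anonymous_access_level="READONLY",
                       read_only=False):
    """Model whether a dcap door would accept a write (illustration only).

    anonymous_access_level stands for dcapAnonymousAccessLevel and
    read_only for dcapReadOnly; the defaults mirror the documented ones.
    """
    if read_only:
        # dcapReadOnly=true blocks writes for every user of the door.
        return False
    if authenticated:
        return True
    # Anonymous access: plain dcap, or a failed authenticated login.
    return anonymous_access_level == "FULL"
```

With the defaults, an anonymous client cannot write, so a plain dcap door is read-only until dcapAnonymousAccessLevel is set to FULL.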

Renamed the authenticated dcap door cell name from DCap-${host.name} to DCap-auth-${host.name} and introduced the new property dCapAuthPort with default value 22129 for its port.

Service: missing-files

Introduced this new service. This is an optional, pluggable component that allows dCache to respond to missing files. This central service instructs the door to either fail the request or retry (which makes sense only if the file has been fetched from some external source).

Initially, the missing-files service can be enabled only for the WebDAV door; the other doors will follow. To enable it, add something like

  [someDoors/webdav]
      missing-files.enabled = true

to your layout file.

We currently provide a single plugin, semsg, which uses an external program to send a notification when a user tries to read a file that doesn't exist. The default configuration is in the missingfiles-semsg.properties file. Anyone may write their own plugins and add them to the comma-separated list missing-files.plugin.list, for example:

  [someDoors/webdav]
      missing-files.enabled=true
      missing-files.plugin.list=plugin1,plugin2,plugin3

These plugins are used to determine how dCache should react when a user attempts to read a missing file. Each plugin is asked in turn what to do until a plugin replies with a terminating answer or the list of plugins is exhausted. A plugin replies with one of three answers: fail the request, retry the request, or ask the next plugin in the chain. If the last plugin defers the request, then the missing-files service will instruct the door to fail the request.
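
The plugin chain described above can be sketched as follows. This is a minimal model of the decision logic only, not the actual plugin interface: the Reply enum, the decide function and the idea of representing a plugin as a plain callable are all assumptions made for the example.

```python
from enum import Enum


class Reply(Enum):
    FAIL = "fail"    # terminating: the door fails the request
    RETRY = "retry"  # terminating: the door retries the request
    DEFER = "defer"  # non-terminating: ask the next plugin


def decide(plugins, path):
    """Ask each plugin in turn until one gives a terminating answer."""
    for plugin in plugins:
        reply = plugin(path)
        if reply is not Reply.DEFER:
            return reply
    # Every plugin deferred, or the list was empty: fail the request.
    return Reply.FAIL


# Hypothetical plugins: the first always defers, the second asks for a retry.
chain = [lambda path: Reply.DEFER, lambda path: Reply.RETRY]
print(decide(chain, "/data/missing-file"))
```

The order of the list matters in this model: the first plugin that gives a terminating answer decides the outcome, and later plugins are not consulted.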

Note that populating dCache with files from a remote storage system is currently not possible since the plugin interface does not allow triggering 3rd-party copies.

Miscellaneous

Fixed an incompatibility of the JAIDA libraries with changes to the java.awt package in Java 1.7, which prevented the generation of the billing histograms. The fix is backward compatible with Java 1.6.

Due to the imminent transition to digital signatures that use SHA-2 (Secure Hash Algorithm 2), we migrated to the jglobus-2 libraries to support SHA-2 signed certificates. Unlike the previously used cog-jglobus libraries, jglobus-2 supports both SHA-1 and SHA-2 signed certificates.

Changelog 2.4.0 to 2.5.0

Crossed out entries have been merged into the 2.4 branch.