Storage Resource Reporting / Storage Descriptor

WLCG Storage Resource Reporting

WLCG Storage Resource Reporting (SRR) is a JSON-based file that describes storage resources in the format specified by the WLCG operations team.

The dCache implementation is integrated into the frontend service and exposed through its REST API. To access SRR information, a frontend service must be defined if one does not already exist:

[srrDomain]

[srrDomain/frontend]
frontend.authn.basic=true
frontend.authn.protocol=http
frontend.authz.anonymous-operations=READONLY
frontend.srr.shares=user:/cms,store:/cms

NOTE: By default, access to SRR information is restricted to localhost; the frontend listens on port 3880.

If desired, access to the SRR information can be made public with the following configuration:

frontend.srr.public=true

To inspect the SRR information, a simple curl command can be used:

$ curl http://localhost:3880/api/v1/srr

{
  "storageservice" : {
    "name" : "TEST",
    "id" : "dcache-se",
    "servicetype" : "multidisk",
    "implementation" : "dCache",
    "implementationversion" : "7.2.6",
    "qualitylevel" : "production",
    "latestupdate" : 1658136506,
    "storagecapacity" : {
      "online" : {
        "totalsize" : 11223418619294662,
        "usedsize" : 9799559662614017
      }
    },
    ....
}
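For scripted checks, the same report can be read programmatically. The following is a minimal Python sketch, not part of dCache, that derives the free space and the used fraction from the storagecapacity.online block shown above. The function name summarize_online_capacity is hypothetical, and the embedded sample contains only fields visible in the example output.

```python
import json

# Trimmed copy of the example output above; a real report has more fields.
SAMPLE_SRR = """
{
  "storageservice": {
    "name": "TEST",
    "storagecapacity": {
      "online": {
        "totalsize": 11223418619294662,
        "usedsize": 9799559662614017
      }
    }
  }
}
"""


def summarize_online_capacity(srr: dict) -> dict:
    """Return total, used and free bytes plus the used fraction."""
    online = srr["storageservice"]["storagecapacity"]["online"]
    total = online["totalsize"]
    used = online["usedsize"]
    return {
        "total": total,
        "used": used,
        "free": total - used,
        # Guard against a zero-sized report to avoid division by zero.
        "used_fraction": used / total if total else 0.0,
    }


if __name__ == "__main__":
    summary = summarize_online_capacity(json.loads(SAMPLE_SRR))
    print(f"free bytes: {summary['free']}")
    print(f"used: {summary['used_fraction']:.1%}")
```

To run this against a live instance, the sample could be replaced with the body returned by the curl URL above, for example by passing the result of urllib.request.urlopen to json.load.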

The storage shares that represent space reservations are published automatically, under names that match the description of each space reservation. To publish shares that represent pool groups, explicit configuration is required. The frontend.srr.shares property controls which pool groups are published; for example:

frontend.srr.shares=user:/cms,store:/cms

publishes pool groups user and store for VO cms and will produce output like:

    "storageshares" : [ {
      "name" : "store",
      "timestamp" : 1601977212,
      "totalsize" : 5973622320626816,
      "usedsize" : 4904609242438918,
      "assignedendpoints" : [ "all" ],
      "vos" : [ "/cms" ]
    }, {
      "name" : "user",
      "timestamp" : 1601977212,
      "totalsize" : 4078599175816242,
      "usedsize" : 4025729976567280,
      "assignedendpoints" : [ "all" ],
      "vos" : [ "/cms" ]
    } ]
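A monitoring script might want to flag shares that are close to full. The sketch below is an illustration under assumptions rather than a dCache tool: it takes the storageshares entries from the example output above (trimmed to the fields it uses) and reports free space per share and the shares above a chosen usage threshold. The function names and the 95% threshold are hypothetical.

```python
# Shares copied from the example output above, trimmed to the fields used.
SAMPLE_SHARES = [
    {"name": "store", "totalsize": 5973622320626816,
     "usedsize": 4904609242438918, "vos": ["/cms"]},
    {"name": "user", "totalsize": 4078599175816242,
     "usedsize": 4025729976567280, "vos": ["/cms"]},
]


def free_space_by_share(shares: list) -> dict:
    """Map each share name to its free bytes (totalsize - usedsize)."""
    return {s["name"]: s["totalsize"] - s["usedsize"] for s in shares}


def shares_above(shares: list, threshold: float) -> list:
    """Return the names of shares whose used fraction exceeds threshold."""
    return [
        s["name"]
        for s in shares
        if s["totalsize"] > 0 and s["usedsize"] / s["totalsize"] > threshold
    ]


if __name__ == "__main__":
    print(free_space_by_share(SAMPLE_SHARES))
    print(shares_above(SAMPLE_SHARES, 0.95))
```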

The WLCG SRR uses the following info-provider properties to control the generated JSON output:

info-provider.se-unique-id=
info-provider.se-name=
info-provider.dcache-architecture=
info-provider.dcache-quality-level=
storage-descriptor.door.tag=

The Storage Descriptor format, also known as a Storage Resource Reporting (SRR) record, is a JSON object that describes a storage system. This includes information about which protocol endpoints are available and about storage accounting information.

The WLCG Storage Space accounting project aims to provide a high-level overview of the total and available space provided by the WLCG infrastructure. dCache supports WLCG Storage Resource Reporting, which is used for service discovery and for reporting capacity usage.

The storage-descriptor/SRR JSON object is stored as a regular file within dCache. This allows clients to download the information using any of the supported transfer protocols. Through the file’s permissions, it is also possible to control who is able to obtain this information.

Legacy storage descriptor

How the file is generated

dCache is supplied with a script called dcache-storage-descriptor. Running this script will combine static information (from dCache configuration) and dynamic information (from the info service) to generate a file containing the Storage Descriptor JSON Object. This file is stored on the local filesystem, as /var/spool/dcache/storage-descriptor.json by default.

Once the file is written, it must be imported into dCache. This could be achieved by NFS-mounting dCache and configuring the script to write into dCache directly. Alternatively, it may be uploaded using any of the supported protocols (HTTP, FTP, dcap, xroot) with any of the supported authentication schemes (username+password, X.509, Kerberos, OIDC, macaroons). Note that the dcache-storage-descriptor script does not upload the file itself (unless it writes into a mounted dCache).

To keep the information current, it is recommended to run the dcache-storage-descriptor script periodically using a cron job. This cron job could also upload the file into dCache.

Setting up dCache for Storage Descriptor

The following steps are needed to enable Storage Descriptor.

1. Ensure services are running

There are two dCache services on which the dcache-storage-descriptor script relies: info and httpd.

The info service collects information from other parts of dCache and caches the results.

The httpd service provides an HTTP endpoint for admin and script access. The dcache-storage-descriptor script uses the httpd service to obtain the dynamic information it needs from the info service.

These services may already be running; however, if not, dCache layout files need to be updated so they are running. There are no requirements on the domains within which these services run, nor on the hosts on which these domains run.

2. Review the storage-descriptor properties

The dcache-storage-descriptor script uses dCache’s standard configuration system to adjust how it collects information and to provide some static details.

The script is run independently from dCache: it is not a service and is not run within a domain. Therefore, configuration for the script must not appear in any of the domain- or service-specific sections of the layout file. Instead, the script’s behaviour may be adjusted by adding configuration to the top of the layout file, or in the /etc/dcache/dcache.conf file.

The default values of some properties are placeholders and must be modified. In particular, the storage-descriptor.unique-id property must be configured with the unique identifier for this dCache instance. The default value (dcache.example.org) is not appropriate.

Some properties have default values that may be correct, but should be reviewed. The storage-descriptor.http.host property is an example. This describes the host name of the machine running the httpd service. The default (localhost) is good if the httpd service is running on the same machine as the script.

In general, you should look at all the configuration options, as listed in the defaults file /usr/share/dcache/defaults/storage-descriptor.properties, and consider which values should be adjusted.

3. Configure doors

The Storage Descriptor format includes information about which endpoints are available in your dCache instance. The loginbroker tags published by each door control whether or not that door is included in the output.

The storage-descriptor.door.tag property controls the tag name used by the script to select doors for publishing. The default value is storage-descriptor, so (by default) any door publishing this tag is described by Storage Descriptor output.

By default, all doors include the storage-descriptor tag, and consequently are published in the Storage Descriptor output. To suppress publishing a door, configure the door’s loginbroker tags so they exclude this storage-descriptor tag. This may be done within the dCache configuration or dynamically through the admin interface. The former is persistent when restarting dCache while the latter does not require restarting the door’s domain.

Using dCache configuration

Each door has its own property for controlling which tags are published; for example, WebDAV doors use the webdav.loginbroker.tags configuration property and FTP doors use the ftp.loginbroker.tags configuration property. By default, all doors inherit tags from a common default set of tags: dcache.loginbroker.tags, which (by default) contains the storage-descriptor tag.

To prevent all doors from being published, update the dcache.loginbroker.tags configuration property, removing the storage-descriptor tag.

To prevent publishing doors of a particular type, remove the storage-descriptor tag from dcache.loginbroker.tags and add it to the door-specific configuration (e.g., webdav.loginbroker.tags for WebDAV doors) for all protocols that should be published. Alternatively, you can update the door-specific configuration for all protocols that should not be published, copying the desired tags from dcache.loginbroker.tags.

To prevent a specific door from being published, update that door’s definition (in the layout file) to configure the door-specific property; e.g.,

[dCacheDomain/webdav]
webdav.cell.name=WebDAV-S-${host.name}
webdav.net.port=2881
webdav.authz.anonymous-operations=READONLY
webdav.authn.protocol=https
webdav.redirect.on-read=false
webdav.redirect.on-write=false
webdav.loginbroker.tags = dcache-view

Note that any change to configuration properties takes effect only after the corresponding domains have been restarted.
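The selection rule described in this section can be summarised as a small model. The Python sketch below is an illustration only and is not dCache code: it assumes that a door-specific tag list, when set, replaces the common default, and that a door is published when its effective tags contain the value of storage-descriptor.door.tag. The default tag set used here is an assumed value for dcache.loginbroker.tags, and the function names are hypothetical.

```python
from typing import Optional

# Assumed value of dcache.loginbroker.tags, for illustration only.
DEFAULT_TAGS = ["cdmi", "dcache-view", "glue", "srm", "storage-descriptor"]


def effective_tags(door_tags: Optional[list], default_tags: list) -> list:
    """A door-specific tag list, when set, replaces the common default."""
    return default_tags if door_tags is None else door_tags


def is_published(door_tags: Optional[list],
                 default_tags: list = DEFAULT_TAGS,
                 selector: str = "storage-descriptor") -> bool:
    """True if the door's effective tags contain the selector tag."""
    return selector in effective_tags(door_tags, default_tags)


if __name__ == "__main__":
    # A door that inherits the common default tags.
    print(is_published(None))
    # The WebDAV door from the layout example, which sets only dcache-view.
    print(is_published(["dcache-view"]))
```

Under these assumptions, the WebDAV door in the layout example above is not published, because its door-specific tag list omits the storage-descriptor tag.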

Using the admin interface

Connect to the door using the \c command:

[celebrimbor] (local) admin > \c WebDAV-celebrimbor

The prompt will change to indicate that you are now connected to that specific door.

The info command will show the current loginbroker tags:

[celebrimbor] (WebDAV-celebrimbor@dCacheDomain) admin > info
--- cache-login-strategy (Processes mapping requests) ---
gPlazma login cache: CacheStats{hitCount=0, missCount=0, loadSuccessCount=0, loadExceptionCount=0, totalLoadTime=0, evictionCount=0}
gPlazma map cache: CacheStats{hitCount=0, missCount=0, loadSuccessCount=0, loadExceptionCount=0, totalLoadTime=0, evictionCount=0}
gPlazma reverse map cache: CacheStats{hitCount=0, missCount=0, loadSuccessCount=0, loadExceptionCount=0, totalLoadTime=0, evictionCount=0}

--- lb (Registers the door with a LoginBroker) ---
    LoginBroker      : LoginBrokerTopic@local
    Protocol Family  : http
    Protocol Version : 1.1
    Port             : 2880
    Addresses        : [localhost/127.0.0.1]
    Tags             : [cdmi, dcache-view, glue, srm, storage-descriptor]
    Root             : /
    Read paths       : [/]
    Write paths      : [/]
    Update Time      : 5 SECONDS
    Update Threshold : 10 %
    Last event       : UPDATE_SENT

--- path-mapper (Mapping between request paths and dCache paths with OwnCloud Sync client-specific path trimming.) ---
Root path : /

--- pool-monitor (Maintains runtime information about all pools) ---
last refreshed = 2019-11-22 10:38:34.824 (20 seconds ago)
refresh count = 6
active refresh target = [>SpaceManager@local]

--- resource-factory (Exposes dCache resources to Milton WebDAV library) ---
Allowed paths: /
IO queue     : 

In the above example, the door is publishing five tags: cdmi, dcache-view, glue, srm and storage-descriptor. To remove this door from storage-descriptor output, use the lb set tags command:

[celebrimbor] (WebDAV-celebrimbor@dCacheDomain) admin > lb set tags cdmi dcache-view glue srm

The info command will now show the storage-descriptor tag is no longer listed:

[celebrimbor] (WebDAV-celebrimbor@dCacheDomain) admin > info
--- cache-login-strategy (Processes mapping requests) ---
gPlazma login cache: CacheStats{hitCount=0, missCount=0, loadSuccessCount=0, loadExceptionCount=0, totalLoadTime=0, evictionCount=0}
gPlazma map cache: CacheStats{hitCount=0, missCount=0, loadSuccessCount=0, loadExceptionCount=0, totalLoadTime=0, evictionCount=0}
gPlazma reverse map cache: CacheStats{hitCount=0, missCount=0, loadSuccessCount=0, loadExceptionCount=0, totalLoadTime=0, evictionCount=0}

--- lb (Registers the door with a LoginBroker) ---
    LoginBroker      : LoginBrokerTopic@local
    Protocol Family  : http
    Protocol Version : 1.1
    Port             : 2880
    Addresses        : [localhost/127.0.0.1]
    Tags             : [cdmi, dcache-view, glue, srm]
    Root             : /
    Read paths       : [/]
    Write paths      : [/]
    Update Time      : 5 SECONDS
    Update Threshold : 10 %
    Last event       : UPDATE_SENT

--- path-mapper (Mapping between request paths and dCache paths with OwnCloud Sync client-specific path trimming.) ---
Root path : /

--- pool-monitor (Maintains runtime information about all pools) ---
last refreshed = 2019-11-22 10:41:35.064 (7 seconds ago)
refresh count = 12
active refresh target = [>SpaceManager@local]

--- resource-factory (Exposes dCache resources to Milton WebDAV library) ---
Allowed paths: /
IO queue     : 

Note that this change is not permanent: restarting the domain will return the tags to their configured values.

4. Configure tape information

If a site has no tape storage to report, this step may be skipped.

To configure tape usage reporting, the configuration property storage-descriptor.paths.tape-info must be modified to point to an XML file maintained by the site.

The default value for this property is /usr/share/dcache/xml/tape-info-empty.xml, which is a file delivered with dCache. This tape-info-empty.xml file serves two purposes: first, it serves as a realistic file for sites without any tape storage; second, it provides detailed information on how data should be structured within a “live” document for sites with tape storage.

Sites that should report tape usage information must provide a “live” document that adheres to this XML file format. It is the site’s responsibility to acquire the required information and to keep it up to date.

Note: this tape-info file has the same format as for GLUE-based tape storage accounting. A common file may be used to satisfy both uses.

5. Install dependencies

The dcache-storage-descriptor script requires the xsltproc command. As this is not included as a dependency in the dCache package, it may need to be installed manually.

In RedHat-derived distributions, this is usually located within the libxslt package, and may be installed using yum:

yum -y -d 1 install libxslt

In Debian-derived distributions, this is usually located within the xsltproc package, and may be installed using apt:

apt install xsltproc

6. Run the script manually

As a simple test, try running the script manually. This should be sufficient to provide a JSON file:

dcache-storage-descriptor
|JSON available at /var/spool/dcache/storage-descriptor.json

Note: the script does not need to be run as root; however, it must have sufficient permissions to write the output JSON file.

You can also supply the URL from which the script should take the XML data as a command-line option; e.g.,

dcache-storage-descriptor http://static-data.example.org:8080/info
|JSON available at /var/spool/dcache/storage-descriptor.json

The supplied URL may be anything that curl understands; for example, the XML info data may be stored in a file and the script run against that saved data.

curl -o /tmp/info.xml http://localhost:2288/info
|  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
|                                 Dload  Upload   Total   Spent    Left  Speed
|100  166k  100  166k    0     0  6930k      0 --:--:-- --:--:-- --:--:-- 7242k
dcache-storage-descriptor file:/tmp/info.xml
|JSON available at /var/spool/dcache/storage-descriptor.json

Once the Storage Descriptor output has been generated, it may be inspected. In particular, look for any example.org entries, which indicate that further configuration is necessary.
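The search for leftover placeholders can be automated. The following Python sketch, which is not shipped with dCache, walks a parsed JSON document and reports every string value containing example.org together with its location. The function name find_placeholders and the sample fragment are hypothetical; the real output contains more fields.

```python
import json


def find_placeholders(node, needle="example.org", path="$"):
    """Yield (path, value) for every string value that contains needle."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_placeholders(value, needle, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_placeholders(value, needle, f"{path}[{index}]")
    elif isinstance(node, str) and needle in node:
        yield path, node


if __name__ == "__main__":
    # Hypothetical fragment standing in for the generated file.
    sample = json.loads(
        '{"storageservice": {"id": "dcache.example.org", "name": "TEST"}}'
    )
    for where, value in find_placeholders(sample):
        print(f"{where}: {value}")
```

To check a generated file, load it with json.load from the configured output path (by default /var/spool/dcache/storage-descriptor.json) and pass the result to find_placeholders.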

7. Configure cron

Configure cron to run a small script that generates the Storage Descriptor output and uploads this into dCache.

To do this via NFS, mount dCache anywhere; e.g.,

mkdir /dcache
mount -o vers=4.1 localhost:/ /dcache

Update the configuration, so the script writes the output directly into dCache; e.g.,

storage-descriptor.output.path = /dcache/storage-descriptor.json.new

As a final step, the cron job would rename the file /dcache/storage-descriptor.json.new to /dcache/storage-descriptor.json. This is done to make the update atomic: a client reading the dCache file /storage-descriptor.json will either get the complete old version or the complete new version, but never see partial content.
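The generate-then-rename step could be wrapped as follows. This is a minimal Python sketch under stated assumptions, not a script supplied with dCache: it assumes dcache-storage-descriptor is on the PATH, that storage-descriptor.output.path points at the .new file as configured above, and that both paths are on the same mounted filesystem so the rename is a single operation. The function names and the empty-file check are additions for illustration.

```python
import os
import subprocess
import tempfile


def publish_atomically(new_path: str, final_path: str) -> None:
    """Rename the freshly written file over the published one."""
    # Added safety check (an assumption): never publish an empty file.
    if os.path.getsize(new_path) == 0:
        raise ValueError(f"{new_path} is empty; keeping the old version")
    os.replace(new_path, final_path)


def run_and_publish(new_path: str = "/dcache/storage-descriptor.json.new",
                    final_path: str = "/dcache/storage-descriptor.json") -> None:
    """Generate the descriptor, then publish it; intended to be run by cron."""
    subprocess.run(["dcache-storage-descriptor"], check=True)
    publish_atomically(new_path, final_path)


if __name__ == "__main__":
    # Harmless demonstration in a temporary directory instead of /dcache.
    with tempfile.TemporaryDirectory() as workdir:
        new = os.path.join(workdir, "storage-descriptor.json.new")
        final = os.path.join(workdir, "storage-descriptor.json")
        with open(new, "w") as handle:
            handle.write('{"storageservice": {}}')
        publish_atomically(new, final)
        print(sorted(os.listdir(workdir)))
```

A cron wrapper would call run_and_publish() instead of the demonstration block. Because check=True raises on a non-zero exit status, a failed run leaves the previously published file untouched.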

To upload the file using WebDAV, you can use any standard client. For example, the following curl command uploads the file using username + password authentication.

curl -u fakeUser:fakePassword -T /var/spool/dcache/storage-descriptor.json http://dcache-webdav-door.example.org:2880/specific-path/
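If the cron job is written in Python rather than shell, the same upload can be expressed with the standard library. The sketch below is an illustration, not the documented procedure: it builds an HTTP PUT request with a Basic Authorization header, reusing the placeholder credentials and URL from the curl example. The function name build_upload_request is hypothetical. Unlike curl -T with a URL ending in a slash, the file name has to be appended to the target URL explicitly.

```python
import base64
import os
import tempfile
import urllib.request


def build_upload_request(local_path: str, directory_url: str,
                         user: str, password: str) -> urllib.request.Request:
    """Build an HTTP PUT request that uploads local_path into directory_url."""
    # Append the file name to the directory URL, as curl -T would.
    target = directory_url.rstrip("/") + "/" + os.path.basename(local_path)
    with open(local_path, "rb") as source:
        body = source.read()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(target, data=body, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Demonstration only: builds the request from a temporary file and
    # prints it without contacting any server.
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "storage-descriptor.json")
        with open(path, "w") as handle:
            handle.write("{}")
        req = build_upload_request(
            path,
            "http://dcache-webdav-door.example.org:2880/specific-path/",
            "fakeUser", "fakePassword")
        print(req.get_method(), req.full_url)
```

Passing the returned request to urllib.request.urlopen performs the upload. Basic authentication over plain http sends the password unencrypted, so an https door is preferable where one is available.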

Different authentication schemes (like X.509, username+password, Kerberos, Macaroon, trusted-host, OIDC, …) are supported in dCache. For details, please see the authentication section of dCache User Guide’s WebDAV chapter.

Please note that the path to the script and the location of its output depend on the package you are running and on your configuration.

8. Ensure file is readable through HTTP

The Storage Descriptor file may be stored anywhere in dCache namespace; however, WLCG currently require that the file be readable via HTTP and the URL recorded within CRIC.

Currently, WLCG will read the file using an X.509 robot credential with VOMS asserting WLCG experiment VO membership. File permissions and ownership should be chosen to allow read access for such clients.

Configurable properties of dCache Storage Descriptor

The table below lists the configurable Storage Descriptor properties and their definitions. A default value containing ${...} depends on other dCache properties or on the installed package.

| Property | Definition | Default value | Possible values |
|---|---|---|---|
| storage-descriptor.name | The human-readable name that describes this dCache instance. | | |
| storage-descriptor.unique-id | A unique identifier for your dCache instance. | dcache.example.org | |
| storage-descriptor.quality-level | The “quality” of the dCache instance. | production | development, testing, pre-production or production |
| storage-descriptor.http.host | Configuration options on where to fetch dynamic information. The name of the machine that is running the dCache web server. This is used to build the URI for fetching dCache’s current state. | localhost | |
| storage-descriptor.http.port | The TCP port the dCache web server is running on. This is used to build the URI for fetching dCache’s current state. | 2288 | |
| storage-descriptor.paths.tape-info | Nearline accounting. The location of the nearline storage XML file. Sites with nearline storage should modify this value to point to a file that they maintain. Sites without nearline storage should leave this value alone. | ${dcache.paths.share}/xml/tape-info-empty.xml | |
| storage-descriptor.door.tag | Login-provider tag. The tag that doors identify themselves with before they are published. | storage-descriptor | |
| storage-descriptor.output.path | Output path. The location where the JSON output is written. | /var/spool/dcache/storage-descriptor.json or ${dcache.home}/var/spool/dcache/storage-descriptor.json | |
| storage-descriptor.xslt.path | XSLT path. The location of the XSLT stylesheet that transforms the info service’s XML into the Storage Descriptor JSON format. | ${dcache.paths.share}/xml/xslt/storage-descriptor.xsl | |