
File Hopping on arrival from outside dCache

File Hopping on arrival is a term denoting the possibility of initiating a pool to pool transfer as the result of a file successfully arriving on a pool from outside dCache. The file must have been written by an external client using any supported protocol (dCap, FTP, xrootd). Files restored from an HSM or arriving on a pool as the result of a pool to pool transfer are not yet forwarded.

Forwarding of incoming files is enabled per pool in the <hostname>.poollist file. The pool is requested to send a replicateFile message to either the PoolManager or to the HoppingManager, if available. The different approaches are briefly described below and in more detail in the subsequent sections.

  • The replicateFile message is sent to the PoolManager. This happens for all files arriving at that pool from outside (not for restores or pool to pool transfers). No intermediate HoppingManager is needed. The restrictions are:

    • All files are replicated. No pre-selection, e.g. on the storage class can be done.

    • The mode of the replicated file is determined by the destination pool and can’t be overwritten. See ’File mode of replicated files’.

  • The replicateFile message is sent to the HoppingManager. The HoppingManager can be configured to replicate certain storage classes only and to set the mode of the replicated file according to rules. The file mode of the source file can’t be modified. Both variants of the corresponding poollist entry are illustrated below.
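
For illustration only, reusing the pool name, path and IP number from the examples later in this chapter, the two variants of the poollist entry look as follows:

ocean  /bigdisk/pools/ocean  replicateOnArrival=PoolManager,192.1.1.1  <more options>
ocean  /bigdisk/pools/ocean  replicateOnArrival=HoppingManager         <more options>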


File mode of replicated files

The mode of a replicated file can either be determined by settings in the destination pool or by the HoppingManager.

  • If no HoppingManager is used for replication, the mode of the replicated file is determined by the p2p=<cached|precious> setting in the <hostname>.poollist file of the destination pool. The default setting is cached; see the example below.

  • If a HoppingManager is used for file replication, the mode of the replicated file is determined by the HoppingManager rule responsible for this particular replication. If the destination mode is set to keep in the rule, the mode of the destination pool determines the final mode of the replicated file.
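
For example, the following destination pool entry (the pool path is only a placeholder) would make files replicated via pool to pool transfer precious on that pool; omitting the p2p option leaves them in the default mode cached:

read-pool-1  /bigdisk/pools/read-pool-1  p2p=precious  <more options>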


File Hopping managed by the PoolManager

File hopping configuration instructs a pool to send a ’replicateFile’ request to the PoolManager as the result of a file arriving on that pool from an external client. All arriving files are treated the same. The PoolManager processes this request by trying to find a ’link’ with the following attributes:

Table 9.1. PoolManager Hopping Request Attributes

Data Flow Direction | Protocol | Storage Class | Client IP Number
Pool 2 Pool         | dCap/3   | Class of file | Configurable

In order to enable replication on arrival for a particular pool, the corresponding entry of that pool in the <hostname>.poollist file has to be extended by the replicateOnArrival key-value pair.

<PoolName> <PoolPath>  replicateOnArrival=PoolManager,<ip-number> ... 

where <ip-number> may either be a real IP number of a farm node, taken as an example of the nodes intending to read the file, or an IP number taken from a non-existing IP number range. Such a range can be used to instruct the PoolManager to replicate files from this pool to a special set of destination pools.

Please see the section "File mode of replicated files" for the mode of the file on the destination pool.


Example for File Hopping by the PoolManager only

We assume that we want all files arriving on pool ocean of host earth to be immediately replicated to a subset of read pools. This subset of pools is described by the pool group ocean-copy-pools; no other pool is a member of this pool group. In addition, files arriving at pool mountain should be replicated to all read pools from which farm nodes on the 131.169.10.0/24 subnet are allowed to read.

The earth.poollist file must be modified as follows.

ocean     /bigdisk/pools/ocean     replicateOnArrival=PoolManager,192.1.1.1    <more options>
mountain  /bigdisk/pools/mountain  replicateOnArrival=PoolManager,131.169.10.1 <more options>

While 131.169.10.1 is a legal IP address, e.g. of one of your farm nodes, the 192.1.1.1 IP address must not exist anywhere at your site.

Add the following lines to PoolManager.conf in order to instruct the PoolManager to perform the replication described above. The first block defines the farm-read-pools group and the link matching the 131.169.10.0/24 farm network, which handles files arriving at pool mountain; the second block defines the ocean-copy-pools group and the link triggered by the fake IP number 192.1.1.1, which handles files arriving at pool ocean.

#
#  define the read-pools pool group and add pool members
#
psu create pgroup farm-read-pools
#
psu addto pgroup farm-read-pools read-pool-1
psu addto pgroup farm-read-pools read-pool-2
psu addto pgroup farm-read-pools read-pool-3
psu addto pgroup farm-read-pools read-pool-4

#
psu create unit -net 131.169.10.0/255.255.255.0
#
psu create ugroup farm-network
#
psu addto ugroup farm-network  131.169.10.0/255.255.255.0
#
psu create link farm-read-link any-store any-protocol farm-network
#
psu addto link farm-read-link farm-read-pools
#
psu set link farm-read-link -p2ppref=100 -readpref=100 -writepref=0 -cachepref=XXX...
#
#
#--------------------------------------------------------------
#
# create the faked net unit
#
psu create unit -net 192.1.1.1/255.255.255.255
#
psu create ugroup ocean-copy-network
#
psu addto ugroup ocean-copy-network  192.1.1.1/255.255.255.255
#
#  define the ocean-copy pool group and add pool members
#  (the pool group must exist before it can be added to a link)
#
psu create pgroup ocean-copy-pools
#
psu addto pgroup ocean-copy-pools  read-pool-1
#
# we assume that 'any-protocol' and 'any-store' are already defined.
#
psu create link ocean-copy-link any-store any-protocol ocean-copy-network
#
psu addto link ocean-copy-link ocean-copy-pools
#
psu set link ocean-copy-link -p2ppref=100 -readpref=100 -writepref=0 -cachepref=XXX...
#

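After the PoolManager has loaded this configuration, the setup can be inspected from the admin interface. This is only a sketch; it assumes the standard psu inspection commands are available in your dCache version:

(local) admin > cd PoolManager
(PoolManager) admin > psu ls pgroup
(PoolManager) admin > psu ls link -l
(PoolManager) admin > psu dump setup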

File Hopping managed by the HoppingManager


Starting the FileHopping Manager service

The HoppingManager is not started automatically in 1.7.0. Please perform the following steps to get it started:

  • Create a file hopping.batch in the /opt/d-cache/config directory with the following content:

        set printout default 3
        set printout CellGlue none
        onerror shutdown
        #
        check -strong setupFile
        #
        copy file:${setupFile} context:setupContext
        #
        #  import the variables into our $context.
        #  don't overwrite already existing variables.
        #
        import context -c setupContext
        #
        #   Make sure we got what we need.
        #
        check -strong serviceLocatorPort serviceLocatorHost
        #
        create dmg.cells.services.RoutingManager  RoutingMgr
        #
        #   The LocationManager Part
        #
        create dmg.cells.services.LocationManager lm \
               "${serviceLocatorHost} ${serviceLocatorPort}"
        #
        #
        create diskCacheV111.services.FileHoppingManager HoppingManager \
             "${config}/HoppingManager.conf -export"
        #
  • Change to the /opt/d-cache/jobs directory

  • Run ./initPackage.sh

  • Start the service: ./hopping start
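
Taken together, the last three steps correspond to the following shell commands (assuming the /opt/d-cache installation directory used throughout this chapter):

        cd /opt/d-cache/jobs
        ./initPackage.sh
        ./hopping start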

Initially no rules are configured for the hopping manager. You may add rules either by editing /opt/d-cache/config/HoppingManager.conf and restarting the hopping service, or by using the admin interface and saving the modifications with ’save’ into HoppingManager.conf.
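
A possible admin session, sketched under the assumption that the HoppingManager cell is reachable under that name; it reuses the ’define hop’ syntax described in the reference below, and the storage class pattern and IP number are merely placeholders:

          (local) admin > cd HoppingManager
          (HoppingManager) admin > define hop replicate-raw  .*:raw@osm -host=131.169.10.1
          (HoppingManager) admin > save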


Configuring pools to use the HoppingManager

In order to instruct the pool to send a ’replicateFile’ message to the HoppingManager service, modify the <hostname>.poollist file as follows:

           ...
           ocean     /bigdisk/pools/ocean     replicateOnArrival=HoppingManager  <more options>
           ...


HoppingManager Configuration Introduction

  • The HoppingManager essentially receives ’replicateFile’ messages from pools configured to support file hopping and either discards them or modifies and forwards them to the PoolManager, depending on the rules described below.

  • The HoppingManager decides on the action to perform based on a set of configurable rules. Each rule has a name. Rules are checked in alphabetical order of their names.

  • A rule is triggered if the storage class matches the storage class pattern assigned to that rule. If a rule is triggered, it is processed and no further rule checking is performed. If no rule is found for the request, the file is not replicated.

  • If, for whatever reason, a file couldn’t be replicated, NO RETRY is performed.

  • Processing a triggered rule can result in one of the following:

    • The message is discarded. No replication is done for this particular storage class.

    • The rule modifies the ’replicate message’, before it is forwarded to the PoolManager.

      The rule can assign a ’destination’ IP number to the ’replicate file’ message before it is forwarded to the PoolManager. This has the same effect as the IP number following the PoolManager keyword in the <hostname>.poollist file in the unconditional replication section above.

      The mode of the replicated file can be specified. This can either be ’precious’, ’cached’ or ’keep’. ’keep’ means that the pool mode determines the replicated file mode.

      The requested protocol can be specified.


HoppingManager Configuration Reference

         define hop OPTIONS <name> <pattern> precious|cached|keep
            OPTIONS
              -destination=<cellDestination> # default : PoolManager
              -overwrite
              -continue
              -source=write|restore|*   #  !!!! for experts only
              -host=<destinationHostIp>
              -protType=dCap|ftp...
              -protMinor=<minorProtocolVersion>
              -protMajor=<majorProtocolVersion> 

pattern is a storage class pattern. If the incoming storage class matches this pattern, this rule is processed.

precious|cached|keep determines the mode of the replicated file. ’keep’ leaves the mode of the replicated file to the setting of the destination pool.

destination shouldn’t be used. For experts only.

overwrite In case a rule with the same name already exists, it is overwritten. If this overwrite option is not specified, an error occurs.

continue If a rule has triggered and the corresponding action has been performed, no other rules are checked. If the ’continue’ option is specified, rule checking continues. This is for debugging purposes only.

source Don’t use.

host, protType These are the ’host IP number’ and ’protocol type’ used by the PoolManager in order to find an appropriate pool for the replication request. Please note that ’host’ is not the host of the destination pool.
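
For instance, to change the farm node IP number of an already existing rule, the rule may be redefined with the overwrite option. This is only a sketch, reusing the replicate-raw rule from the examples below; the new IP number is a placeholder:

          define hop replicate-raw  .*:raw@osm -host=<new Farm Node Ip Number> -overwrite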


HoppingManager configuration examples

Define the HoppingManager as destination for the ’replicate file’ requests on the pool(s).

           ...
           ocean     /bigdisk/pools/ocean     replicateOnArrival=HoppingManager  <more options>
           ...

Replicate ’raw’ data files of all experiments.

          #
          define hop replicate-raw  .*:raw@osm -host=<Farm Node Ip Number>
          #

Replicate all CMS files to pools assigned to CMS farm nodes and all ATLAS files to pools assigned to ATLAS farm nodes. Don’t replicate any other files.

          #
          define hop replicate-cms    cms:.*@osm    -host=<Farm Node Ip Number of cms farm>
          define hop replicate-atlas  atlas:.*@osm  -host=<Farm Node Ip Number of atlas farm>
          #
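
Note that the -host IP number only has an effect if the PoolManager contains a net unit and a link matching that number, just as in the PoolManager-only example earlier in this chapter. The following is a minimal sketch; the pool group cms-read-pools is assumed to exist already, and the unit group name and preference values are placeholders:

          psu create unit -net 131.169.10.0/255.255.255.0
          psu create ugroup cms-farm-network
          psu addto ugroup cms-farm-network  131.169.10.0/255.255.255.0
          psu create link cms-read-link any-store any-protocol cms-farm-network
          psu addto link cms-read-link cms-read-pools
          psu set link cms-read-link -p2ppref=100 -readpref=100 -writepref=0 -cachepref=0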