When writing data into the dCache, and possibly later on into an HSM, checksums may be calculated at different points within this chain.
The Client Checksum is calculated by the client before or while the data is sent to the dCache. Depending on when it has been calculated, the checksum value may be sent together with the open request to the door and stored in pnfs before the data transfer begins, or it may be sent with the close operation after the data has been transferred.
The dCap protocol provides both methods, but the dCap clients use the latter by default.
The FTP protocol does not provide a mechanism to send a checksum. Nevertheless, some FTP clients can (mis-)use the “site” command to send the checksum prior to the actual data transfer.
While data is coming in, the server data mover may calculate the checksum on the fly (the Transfer Checksum).
After all the file data has been received by the dCache server and the file has been fully written to disk, the server may calculate the checksum based on the disk file (the Server File Checksum).
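The difference between a checksum calculated on the fly and one calculated from the finished disk file can be illustrated with a minimal Java sketch. It assumes Adler-32 (`java.util.zip.Adler32`) as the algorithm; the class `ChecksumPoints` and its methods are hypothetical and not part of dCache. `receive()` updates the checksum block by block while writing the file, `diskFileChecksum()` re-reads the complete file afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;

// Hypothetical helper (not dCache code); Adler-32 is assumed as the algorithm.
class ChecksumPoints {

    // Transfer Checksum: updated block by block while the data is written to disk.
    static long receive(InputStream in, Path diskFile) throws IOException {
        Adler32 transfer = new Adler32();
        byte[] buffer = new byte[8192];
        try (OutputStream out = Files.newOutputStream(diskFile)) {
            int rc;
            while ((rc = in.read(buffer, 0, buffer.length)) != -1) {
                transfer.update(buffer, 0, rc);
                out.write(buffer, 0, rc);
            }
        }
        return transfer.getValue();
    }

    // Server File Checksum: calculated afterwards by re-reading the complete disk file.
    static long diskFileChecksum(Path diskFile) throws IOException {
        Adler32 disk = new Adler32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(diskFile)) {
            int rc;
            while ((rc = in.read(buffer)) != -1) {
                disk.update(buffer, 0, rc);
            }
        }
        return disk.getValue();
    }
}
```

A client running Adler-32 over the same bytes obtains the Client Checksum; all three values agree unless the data was corrupted in transit or on disk. The on-the-fly variant costs no extra read of the file, which is why it is attractive for the mover.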
The table below sketches the different schemes for dCap and FTP, with and without client checksum calculation:
Table 20.1. Checksum calculation flow
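Step | FTP (w/o initial CRC) | FTP (with initial CRC) | dCap |
---|---|---|---|
1 | Create Entry | Create Entry | Create Entry |
2 | | Store Client CRC in pnfs | |
3 | Server calculates transfer CRC | Server calculates transfer CRC | Server calculates transfer CRC |
4 | | Get Client CRC from pnfs | Get Client CRC from mover |
5 | | Compare Client and Server CRC | Compare Client and Server CRC |
6 | Store transfer CRC in pnfs | | Store client CRC in pnfs |
7 | Server calculates disk file CRC | Server calculates disk file CRC | Server calculates disk file CRC |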
As far as the server data mover is concerned, only the Client Checksum and the Transfer Checksum are of interest. While the client checksum is simply delivered to the server mover as part of the protocol (e.g. with the close operation for dCap), the transfer checksum has to be calculated by the server mover on the fly. In order to communicate the different checksums to the embedding pool, the server mover has to implement the ChecksumMover interface in addition to the MoverProtocol interface. A mover not implementing the ChecksumMover interface is assumed not to handle checksums at all. The Disk File Checksum is calculated independently of the mover, within the pool itself.
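```java
public interface ChecksumMover {
    // Called (optionally) by the pool to select the transfer checksum algorithm
    public void setDigest( Checksum transferChecksum ) ;
    // Checksum delivered by the client, or null
    public Checksum getClientChecksum() ;
    // Checksum calculated by the mover on the fly, or null
    public Checksum getTransferChecksum() ;
}
```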
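The pool may or may not call the setDigest method to tell the mover which checksum algorithm to use. If setDigest is not called, the mover is not expected to calculate the Transfer Checksum. If it is called, the mover feeds every block it receives into the MessageDigest obtained from the given Checksum: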
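```java
// Digest for the algorithm the pool selected via setDigest()
java.security.MessageDigest transferDigest = transferChecksum.getMessageDigest() ;
// ...
while( ... ){
    int rc = read( buffer , 0 , buffer.length ) ;
    // ...
    if( rc > 0 ){
        // add the received block to the Transfer Checksum
        transferDigest.update( buffer , 0 , rc ) ;
    }
}
```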
getClientChecksum and getTransferChecksum are called by the pool after the MoverProtocol's runIO method has completed successfully. They should return null if the corresponding checksum could not be determined for whatever reason.
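```java
private Checksum transferChecksum ;     // stays null unless setDigest() is called
private String clientChecksumString ;   // as received from the client, or null

public void setDigest( Checksum transferChecksum ){
    this.transferChecksum = transferChecksum ;
}
public Checksum getClientChecksum(){
    return clientChecksumString == null ? null : new Checksum( clientChecksumString ) ;
}
public Checksum getTransferChecksum(){
    return transferChecksum ;
}
```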
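The pieces above can be combined into a small, self-contained Java sketch of the mover side of the contract. Since dCache's own `Checksum` class is not available here, the sketch uses a hypothetical stand-in with two invented factory methods (`forAlgorithm`, `ofValue`) and a `getValue()` hex accessor; `ToyMover` and its `runIO(InputStream, OutputStream)` signature are likewise hypothetical and only play the role of `MoverProtocol.runIO`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for dCache's Checksum class, reduced to what this sketch needs.
class Checksum {
    private final MessageDigest digest; // null for a value received from the client
    private String value;               // hex string, computed lazily for digests

    private Checksum(MessageDigest digest, String value) {
        this.digest = digest;
        this.value = value;
    }

    static Checksum forAlgorithm(String algorithm) throws NoSuchAlgorithmException {
        return new Checksum(MessageDigest.getInstance(algorithm), null);
    }

    static Checksum ofValue(String hex) {
        return new Checksum(null, hex.toLowerCase());
    }

    MessageDigest getMessageDigest() {
        return digest;
    }

    String getValue() {
        if (value == null) { // finish the digest once and remember the result
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            value = hex.toString();
        }
        return value;
    }
}

interface ChecksumMover {
    void setDigest(Checksum transferChecksum);
    Checksum getClientChecksum();
    Checksum getTransferChecksum();
}

// Hypothetical toy mover: runIO() plays the role of MoverProtocol.runIO().
class ToyMover implements ChecksumMover {
    private Checksum transferChecksum;         // stays null unless the pool calls setDigest()
    private final String clientChecksumString; // as delivered by the protocol, or null

    ToyMover(String clientChecksumString) {
        this.clientChecksumString = clientChecksumString;
    }

    public void setDigest(Checksum transferChecksum) {
        this.transferChecksum = transferChecksum;
    }

    public Checksum getClientChecksum() {
        return clientChecksumString == null ? null : Checksum.ofValue(clientChecksumString);
    }

    public Checksum getTransferChecksum() {
        return transferChecksum;
    }

    void runIO(InputStream in, OutputStream disk) throws IOException {
        MessageDigest transferDigest =
                transferChecksum == null ? null : transferChecksum.getMessageDigest();
        byte[] buffer = new byte[8192];
        int rc;
        while ((rc = in.read(buffer, 0, buffer.length)) != -1) {
            if (transferDigest != null) {
                transferDigest.update(buffer, 0, rc); // checksum only if requested
            }
            disk.write(buffer, 0, rc);
        }
    }
}
```

In this sketch the pool side would call `setDigest()` (or not), run `runIO()`, and then compare `getClientChecksum()` with `getTransferChecksum()`, treating null as "not available". A mover created without a client value and never given a digest reports null for both.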
The DCapProtocol_3_nio mover implements the ChecksumMover interface and is able to report both the Client Checksum and the Transfer Checksum to the pool. To enable the DCapProtocol_3_nio mover to calculate the Transfer Checksum, either the cell context dCap3-calculate-transfer-crc or the cell batch line option calculate-transfer-crc must be set to true. The latter may also be set in the *.poollist file. DCapProtocol_3_nio disables checksum calculation as soon as the mover receives any client command other than ’write’ (e.g. read, seek or seek_and_write).
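The rule that any command other than 'write' switches the calculation off presumably follows from the fact that an on-the-fly checksum is only valid if every byte arrives exactly once and in order, which a seek breaks. The following Java sketch models that rule; it is hypothetical and not taken from DCapProtocol_3_nio. The `Command` enum, the class name and the use of Adler-32 are assumptions, and the constructor flag merely mirrors calculate-transfer-crc.

```java
import java.util.zip.Adler32;

// Hypothetical sketch of the rule above, not DCapProtocol_3_nio code.
// Adler-32 is assumed as the algorithm.
class WriteOnlyTransferChecksum {
    enum Command { WRITE, READ, SEEK, SEEK_AND_WRITE }

    private final Adler32 digest = new Adler32();
    private boolean enabled;

    // calculateTransferCrc mirrors the calculate-transfer-crc option
    WriteOnlyTransferChecksum(boolean calculateTransferCrc) {
        this.enabled = calculateTransferCrc;
    }

    // payload: the bytes carried by a WRITE command; ignored for other commands
    void onCommand(Command command, byte[] payload) {
        if (command != Command.WRITE) {
            enabled = false; // data may no longer arrive once and in order
            return;
        }
        if (enabled) {
            digest.update(payload, 0, payload.length);
        }
    }

    // Transfer Checksum, or null if it could not be determined
    Long transferChecksum() {
        return enabled ? digest.getValue() : null;
    }
}
```

Returning null once the calculation has been disabled matches the contract that getTransferChecksum reports null when the checksum could not be determined.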
The checksum module (part of the pool) and its command subset (csm ...) determine the behaviour of the checksum calculation.
csm set policy -ontransfer=on
Movers implementing the ChecksumMover interface are requested to calculate the Transfer Checksum. Whether or not a mover actually performs the calculation may depend on additional, mover-specific flags, like the dCap3-calculate-transfer-crc flag for the DCapProtocol_3_nio mover.
If the mover reports the Transfer Checksum and there is a Client Checksum available, either from pnfs or from the mover protocol, the Transfer Checksum and the Client Checksum are compared. A mismatch will result in a CRC Exception.
If there is no Client Checksum available whatsoever, the Transfer Checksum is stored in pnfs.
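The two -ontransfer=on rules can be expressed as a short decision function. This Java sketch is hypothetical: a `Map` stands in for the checksum stored in pnfs, `CrcException` is an invented class named after the CRC Exception mentioned above, and checksums are compared as case-insensitive hex strings. It covers only the compare-or-store decision described in the two preceding paragraphs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the -ontransfer=on rules; not dCache code.
class OnTransferPolicy {
    static class CrcException extends RuntimeException {
        CrcException(String message) { super(message); }
    }

    // pnfs ID -> checksum; a plain Map stands in for the checksum kept in pnfs
    final Map<String, String> pnfs = new HashMap<>();

    void afterTransfer(String pnfsId, String moverClientCrc, String transferCrc) {
        if (transferCrc == null) {
            return; // the mover did not report a Transfer Checksum
        }
        // Client Checksum from pnfs (e.g. sent via FTP "site") or from the mover protocol
        String clientCrc = pnfs.getOrDefault(pnfsId, moverClientCrc);
        if (clientCrc == null) {
            pnfs.put(pnfsId, transferCrc); // no Client Checksum at all: keep the transfer one
        } else if (!clientCrc.equalsIgnoreCase(transferCrc)) { // hex: ignore case
            throw new CrcException(pnfsId + ": client " + clientCrc + " != transfer " + transferCrc);
        }
    }
}
```

Preferring the pnfs value over the one from the mover is a choice made for this sketch only; the text above just says the Client Checksum may come from either source.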
csm set policy -onwrite=on
After the dataset has been completely and successfully written to disk, the pool calculates the checksum based on the disk file (Server File Checksum). The result is compared to either the Client Checksum or the Transfer Checksum and a CRC Exception is thrown in case of a mismatch.
If there is neither the Client Checksum nor the Transfer Checksum available, the Server File Checksum is stored in pnfs.
csm set policy -enforcecrc=on
With -onwrite=off, this option enforces the calculation of the Server File Checksum ONLY if neither the Client Checksum nor the Transfer Checksum has been successfully calculated. The result is stored in pnfs.
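The interplay of -onwrite and -enforcecrc after the file is on disk can be sketched as follows. Everything here is hypothetical: checksums are `Long` values with null meaning "not available", the `LongSupplier` stands in for re-reading the disk file (so the expensive read only happens when a rule requires it), and `CrcException` is an invented class. The text does not say which checksum is used when both the Client and the Transfer Checksum exist; preferring the Client Checksum is an assumption of this sketch.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the -onwrite / -enforcecrc decisions; not dCache code.
// A null Long means "checksum not available".
class OnWritePolicy {
    static class CrcException extends RuntimeException {
        CrcException(String message) { super(message); }
    }

    private final boolean onWrite;
    private final boolean enforceCrc;

    OnWritePolicy(boolean onWrite, boolean enforceCrc) {
        this.onWrite = onWrite;
        this.enforceCrc = enforceCrc;
    }

    // diskFileCrc re-reads the file (the expensive part), so it is only invoked when needed.
    // Returns the checksum to store in pnfs, or null if nothing has to be stored.
    Long afterWrite(Long clientCrc, Long transferCrc, LongSupplier diskFileCrc) {
        // Assumption: prefer the Client Checksum when both are present
        Long reference = clientCrc != null ? clientCrc : transferCrc;
        if (onWrite) {
            long serverFileCrc = diskFileCrc.getAsLong();
            if (reference == null) {
                return serverFileCrc; // nothing to compare against: store it
            }
            if (reference.longValue() != serverFileCrc) {
                throw new CrcException(reference + " != " + serverFileCrc);
            }
            return null;
        }
        if (enforceCrc && reference == null) {
            return diskFileCrc.getAsLong(); // -onwrite=off: computed ONLY as a last resort
        }
        return null;
    }
}
```

With both options off, the pool never reads the file back, and whatever checksum the transfer produced is all that is available.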