It is advisable to run the alarms service in a separate
domain and list this domain first in the layout file. That way
the alarms service gets booted first and can catch
startup errors reported by the other domains. Since both the
httpd service and the alarms service access the storage file
generated at /opt/d-cache/alarms/alarms.xml, the alarms service
should be defined on the same host as the httpd service. You can
change where this file is placed by setting the property
httpd.alarms.db.xml.path to a different location.
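A minimal sketch of such an override follows. The path is hypothetical and chosen only for illustration. Placing the line in /opt/d-cache/etc/dcache.conf on the host that runs both services is an assumption, not something this section prescribes.

```
# Hypothetical location; choose a directory that both the httpd
# service and the alarms service can access on this host.
httpd.alarms.db.xml.path=/var/opt/d-cache/alarms/alarms.xml
```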
Add a domain for the alarms service to the layout file
where the httpd service is defined.
[alarmserverDomain]
[alarmserverDomain/alarms]
...
[httpdDomain]
If all of the dCache domains run on the same host, the default setting of the alarms.server.host property (localhost) will work.
In general, your dCache will run on more than one node. In this
case each node needs to know on which node the alarms service
is running. The alarms service and the httpd service run
together on one of the nodes. On all the other nodes you need
to set the alarms.server.host property, either in the
/opt/d-cache/etc/dcache.conf file or in the layout file, to the
host on which the alarms service is running, and then restart
dCache.
Example:
Consider a dCache which consists of a head node, some door
nodes and some pool nodes. Assume that the httpd service and
the alarms service are running on the head node. You then need
to set the property alarms.server.host on the pool nodes and on
the door nodes to the host name of the head node.
alarms.server.host=<head-node>
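Since the property may also be set in the layout file, the same setting could look roughly like this on a door node. The host name, the domain name and the choice of a dcap door are hypothetical. The sketch also assumes that a property placed before the first domain declaration applies to every domain defined in that layout file.

```
# Layout file on a door node (illustrative names only)
alarms.server.host=head-node.example.org

[dcapDomain]
[dcapDomain/dcap]
```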
The alarms defined are listed below. There are four
different levels of severity, CRITICAL,
HIGH, MODERATE and
LOW.
CRITICAL
    SERVICE_CREATION_FAILURE, DB_OUT_OF_CONNECTIONS, DB_UNAVAILABLE,
    JVM_OUT_OF_MEMORY, OUT_OF_FILE_DESCRIPTORS
    The affected dCache can't work (is down).

HIGH
    IO_ERROR, HSM_READ_FAILURE, HSM_WRITE_FAILURE,
    LOCATION_MANAGER_UNAVAILABLE, POOL_MANAGER_UNAVAILABLE
    These functions are affected and not working or not working
    properly, even though the dCache domain may be running.

MODERATE
    POOL_DISABLED, CHECKSUM
    There is an issue which should be taken care of in the interest
    of performance or usability, but which is not impeding the
    functioning of the system as a whole.

LOW
    This issue might be worth investigating if it occurs, but is
    not urgent.
When an alarm has been triggered, you will find an entry for it
in the file
/opt/d-cache/alarms/alarms.xml.
Because reading an XML file directly is inconvenient, you can use the Alarms Web Page to inspect and manage the generated alarms.