The first node in the cluster on which CTSS is started becomes the master time manager. The CTSS daemons on the other nodes communicate with the master CTSS and validate their time against it. If a time difference between hosts in the cluster is detected, CTSS adjusts the time, similar to the NTP daemon. CTSS never sets the clock backwards. Any time differences are reported in the alert.log. If the time difference between hosts during startup is too large (the limit is 1000 msec), Oracle Clusterware will not start on the newly joining nodes, and an alert is written to the alert.log of the Oracle Clusterware home, e.g. /u01/app/11.2.0/grid/log//alert.log. In that case you need to correct the time manually and then start Oracle Clusterware.
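The startup rule above can be pictured as a simple threshold test. The following is only an illustrative sketch, not Oracle's implementation: the function and constant names are hypothetical, and treating an offset of exactly 1000 msec as acceptable is an assumption. Only the 1000 msec limit itself comes from the text.

```python
# Illustrative sketch only; not Oracle code. All names are hypothetical.
# The 1000 msec value is the startup limit mentioned in the text.
STARTUP_OFFSET_LIMIT_MSEC = 1000.0


def may_join_cluster(offset_msec: float,
                     limit_msec: float = STARTUP_OFFSET_LIMIT_MSEC) -> bool:
    """Return True when the absolute clock offset is within the limit.

    Assumption: an offset exactly equal to the limit is still accepted.
    """
    return abs(offset_msec) <= limit_msec


if __name__ == "__main__":
    for offset in (0.0, -250.0, 1500.0):
        if may_join_cluster(offset):
            verdict = "Clusterware can start"
        else:
            verdict = "correct the clock manually first"
        print(f"offset {offset:8.1f} msec: {verdict}")
```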
Solution:-
How does Oracle Clusterware decide to start CTSS in observer or active mode?
CTSS is a process that runs as root on each node. As soon as Oracle Clusterware is started, the CTSS daemon checks whether the /etc/ntp.conf file exists; if it does, CTSS runs in observer mode. This check does not tell you whether the NTP daemon is actually active; use the Cluster Verification Utility (cluvfy) to get that part of the information.
root 3582 1 0 11:24 ? 00:00:12 /u01/app/11.2.0/grid/bin/octssd.bin reboot
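The file-based decision described above can be mirrored in a few lines. This is a minimal sketch of the rule as stated in this post, not the logic inside octssd.bin; the function name and the injectable `exists` parameter are hypothetical and exist only to make the rule easy to try out.

```python
import os
from typing import Callable


def expected_ctss_mode(ntp_conf_path: str = "/etc/ntp.conf",
                       exists: Callable[[str], bool] = os.path.exists) -> str:
    """Predict the CTSS mode from the presence of the NTP config file.

    Hypothetical helper: it only mirrors the rule described in the text
    (file present means observer, file absent means active).
    """
    return "observer" if exists(ntp_conf_path) else "active"


if __name__ == "__main__":
    print("Expected CTSS mode on this host:", expected_ctss_mode())
```

Note that, just like CTSS itself, this only looks at the file and says nothing about whether ntpd is running.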
How to validate CTSS is running in observer mode or active mode?
Of course there is the alert.log on cluster level, which reports the status, as well as the trace file. But the two easiest ways are to use crsctl or cluvfy. Crsctl tells you whether CTSS is running and, when the role is active, reports the offset in msec. Cluvfy reports much more information. Below are examples:
$ crsctl check ctss
$ cluvfy comp clocksync -n all
$ cluvfy comp clocksync -verbose
These commands list the status of CTSS: first whether it is running, then the current mode (active or observer). If the mode is active, they also report whether a time synchronization issue exists.
Sample output using CRSCTL:
[oracle@server1 log]$ /u01/app/11.2.0/grid/bin/crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
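If you want to use the crsctl output in a script, the two CRS lines shown above can be picked apart with regular expressions. The sketch below is an assumption-laden helper, not an Oracle tool: the function name is hypothetical, the sample text is copied from the output above, and it assumes that the observer-mode message uses the same "is in ... mode" wording.

```python
import re
from typing import Optional, Tuple

# Sample copied from the crsctl output shown above.
SAMPLE = """\
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
"""


def parse_crsctl_ctss(output: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (mode, offset_msec) parsed from 'crsctl check ctss' output.

    Either element is None when the corresponding line is missing,
    for example the offset when no offset line is printed.
    """
    mode_match = re.search(r"is in (\w+) mode", output)
    offset_match = re.search(r"Offset \(in msec\):\s*(-?\d+(?:\.\d+)?)", output)
    mode = mode_match.group(1).lower() if mode_match else None
    offset = float(offset_match.group(1)) if offset_match else None
    return mode, offset


if __name__ == "__main__":
    print(parse_crsctl_ctss(SAMPLE))
```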
Sample output CTSS in Observer mode but NTP not active:
[oracle@server1 server1]$ cluvfy comp clocksync
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed
Checking daemon liveness...
Liveness check failed for "ntpd"
Check failed on nodes: server1
PRVF-5415: Check to see if NTP daemon is running failed
Clock synchronization check using Network Time Protocol(NTP) failed
PRVF-9652: Cluster Time Synchronization Services check failed
Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes.
Here we see the error message PRVF-5415 followed by PRVF-9652, indicating there is an issue with NTP. This is correct, as NTP was not configured.
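When cluvfy output is captured in a script, the PRVF codes are the quickest thing to look for. The following is a small sketch under the assumption that every error line carries a code of the form PRVF-nnnn, as in the sample above; the function name is hypothetical.

```python
import re
from typing import List

# Error lines copied from the cluvfy sample output above.
SAMPLE = """\
PRVF-5415: Check to see if NTP daemon is running failed
Clock synchronization check using Network Time Protocol(NTP) failed
PRVF-9652: Cluster Time Synchronization Services check failed
"""


def prvf_codes(cluvfy_output: str) -> List[str]:
    """Return the distinct PRVF codes in order of first appearance."""
    seen: List[str] = []
    for code in re.findall(r"PRVF-\d+", cluvfy_output):
        if code not in seen:
            seen.append(code)
    return seen


if __name__ == "__main__":
    print(prvf_codes(SAMPLE))
```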
Sample output CTSS in Observer mode but NTP not active in verbose mode:
[oracle@server1]$ cluvfy comp clocksync -verbose
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
Node Name Status
------------------------------------------------------------
server1 passed
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
Node Name State
------------------------------------------------------------
server1 Observer
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed
Checking daemon liveness...
Check: Liveness for "ntpd"
Node Name Running?
------------------------------------------------------------
server1 no
Result: Liveness check failed for "ntpd"
PRVF-5415 : Check to see if NTP daemon is running failed
Result: Clock synchronization check using Network Time Protocol(NTP) failed
PRVF-9652 : Cluster Time Synchronization Services check failed
Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes.
The result is the same as above, but the verbose output makes it easier to see the CTSS state and that NTP is not active.
Sample output when CTSS is in active mode using cluvfy:-
[oracle@server1 ~]$ cluvfy comp clocksync -verbose
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
Node Name Status
------------------------------------ ------------------------
server1 passed
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
Node Name State
------------------------------------ ------------------------
server1 Active
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
Node Name TimeOffset Status
------------ ------------------------ ------------------------
server1 0.0 passed
Time offset is within the specified limits on the following set of nodes:
"[server1]"
Result: Check of clock time offsets passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful.
[oracle@server1 ~]$
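The verbose output in active mode contains both the reference limit and a per-node offset table, so the comparison cluvfy reports can be reproduced from the text. The sketch below is an assumption, not part of cluvfy: the function name is hypothetical, and it assumes table rows always look like "node offset passed|failed" as in the sample above.

```python
import re
from typing import List

# Lines copied from the verbose cluvfy output above.
SAMPLE = """\
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
Node Name TimeOffset Status
------------ ------------------------ ------------------------
server1 0.0 passed
"""

ROW = re.compile(r"\s*(\S+)\s+(-?\d+(?:\.\d+)?)\s+(passed|failed)\s*")


def nodes_over_limit(verbose_output: str) -> List[str]:
    """Return the nodes whose absolute offset exceeds the reported limit."""
    limit_match = re.search(
        r"Reference Time Offset Limit:\s*([\d.]+)\s*msecs", verbose_output)
    if limit_match is None:
        raise ValueError("no reference offset limit found in output")
    limit = float(limit_match.group(1))
    offenders = []
    for line in verbose_output.splitlines():
        row = ROW.fullmatch(line)
        if row and abs(float(row.group(2))) > limit:
            offenders.append(row.group(1))
    return offenders


if __name__ == "__main__":
    print(nodes_over_limit(SAMPLE))
```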
Sample output when time offset is violated using cluvfy:
[oracle@server1 ~]$ cluvfy comp clocksync -n all
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
PRVF-9661 : Time offset is NOT within the specified limits on the following nodes:"[server2]"
PRVF-9652 : Cluster Time Synchronization Services check failed
Verification of Clock Synchronization across the cluster nodes was unsuccessful.
Checks did not pass for the following node(s): server2
[oracle@server2 ~]$ crsctl check ctss
CRS-4701:
CRS-4702:
How to switch between observer mode and active mode, in either direction?
This task is very simple. CTSS runs in active or observer mode depending on whether the /etc/ntp.conf file exists. To switch to active mode, make sure /etc/ntp.conf is not available: remove or rename the file.
Every 30 seconds CTSS checks whether its current state is still correct. Once the file has been removed, CTSS discovers that its state is incorrect and automatically switches from observer to active mode. If you don’t want to use CTSS for time management, create the ntp.conf file again and the state changes on the fly.
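The periodic re-check can be illustrated with a small simulation. This is a sketch of the behaviour as described in this post, not of the daemon's code: the function name is hypothetical, each list element stands for one 30-second check, and the starting mode is an assumption.

```python
from typing import Iterable, List, Tuple

CHECK_INTERVAL_SEC = 30  # interval stated in the text


def mode_transitions(conf_present_at_check: Iterable[bool],
                     interval_sec: int = CHECK_INTERVAL_SEC,
                     start_mode: str = "observer") -> List[Tuple[int, str, str]]:
    """Simulate periodic checks and return (seconds, old_mode, new_mode).

    Each element of conf_present_at_check says whether /etc/ntp.conf
    existed at that check; a transition is recorded when the mode changes.
    """
    mode = start_mode
    events = []
    for check_number, present in enumerate(conf_present_at_check, start=1):
        wanted = "observer" if present else "active"
        if wanted != mode:
            events.append((check_number * interval_sec, mode, wanted))
            mode = wanted
    return events


if __name__ == "__main__":
    # File present, renamed before the 2nd check, restored before the 4th.
    for event in mode_transitions([True, False, False, True]):
        print(event)
```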
Trace output explaining the above:
2009-09-28 16:27:13.768: [CTSS][3010210704]sclsctss_gvss1: NTP default config file found
2009-09-28 16:27:13.768: [CTSS][3010210704]sclsctss_gvss8: Return [0] and NTP status [2].
2009-09-28 16:27:13.768: [ CTSS][3010210704]ctss_check_vendor_sw: Vendor timesync software is detected. status [2].
2009-09-28 16:27:15.375: [ CTSS][3020700560]ctsscomm_prh: Handler called
[ CTSS][3020700560]ctss_process_request_handler: Master: Received sync message event
2009-09-28 16:27:15.375: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:15.375: [ CTSS][3020700560]ctsscomm_pi: Received from slave (mode [0x46] nodenum [2] hostname [server2] )
2009-09-28 16:27:23.378: [ CTSS][3020700560]ctsscomm_prh: Handler called
[ CTSS][3020700560]ctss_process_request_handler: Master: Received sync message event
2009-09-28 16:27:23.378: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:23.378: [ CTSS][3020700560]ctsscomm_pi: Received from slave (mode [0x46] nodenum [2] hostname [server2] )
2009-09-28 16:27:31.389: [ CTSS][3020700560]ctsscomm_prh: Handler called
[ CTSS][3020700560]ctss_process_request_handler: Master: Received sync message event
2009-09-28 16:27:31.389: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:31.389: [ CTSS][3020700560]ctsscomm_pi: Received from slave (mode [0x46] nodenum [2] hostname [server2] )
2009-09-28 16:27:39.389: [ CTSS][3020700560]ctsscomm_prh: Handler called
[ CTSS][3020700560]ctss_process_request_handler: Master: Received sync message event
2009-09-28 16:27:39.389: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:39.389: [ CTSS][3020700560]ctsscomm_pi: Received from slave (mode [0x46] nodenum [2] hostname [server2] )
2009-09-28 16:27:43.383: [ CTSS][2978741136]ctss_checkcb: clsdm requested checkalive. Returns [6e]
2009-09-28 16:27:43.769: [ CTSS][3010210704]sclsctss_gvss2: NTP default pid file not found <==== here /etc/ntp.conf is renamed.
2009-09-28 16:27:43.770: [CTSS][3010210704]sclsctss_gvss8: Return [0] and NTP status [1].
2009-09-28 16:27:43.770: [ CTSS][3010210704]ctss_check_vendor_sw: Vendor timesync software is not detected. status [1].
2009-09-28 16:27:43.786: [ CTSS][3010210704]ctsselect_determine_role: node [1] with mode [0x4e] in the modes table
2009-09-28 16:27:43.799: [ CTSS][3010210704]ctsselect_determine_role: node [2] with mode [0x46] in the modes table
2009-09-28 16:27:43.799: [ CTSS][3010210704]ctsselect_determine_role: Vendor time synchronization software is not detected on any node in the cluster. Switched to active role.
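To find the moment of the switch in a long trace file, it helps to filter for the ctss_check_vendor_sw lines and keep only those where the status value changes. The sketch below is a hypothetical helper built on the line format visible in the trace above; in that trace, status [2] appears with "detected" and status [1] with "not detected".

```python
import re
from typing import List, Tuple

# Lines copied from the trace output above.
SAMPLE_TRACE = """\
2009-09-28 16:27:13.768: [ CTSS][3010210704]ctss_check_vendor_sw: Vendor timesync software is detected. status [2].
2009-09-28 16:27:15.375: [ CTSS][3020700560]ctsscomm_pi: Received sync msg
2009-09-28 16:27:43.770: [ CTSS][3010210704]ctss_check_vendor_sw: Vendor timesync software is not detected. status [1].
"""

VENDOR_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):"
    r".*ctss_check_vendor_sw:.*status \[(\d+)\]"
)


def vendor_status_changes(trace_text: str) -> List[Tuple[str, int]]:
    """Return (timestamp, status) for each change of the vendor status."""
    changes = []
    last_status = None
    for line in trace_text.splitlines():
        match = VENDOR_LINE.match(line)
        if match is None:
            continue
        status = int(match.group(2))
        if status != last_status:
            changes.append((match.group(1), status))
            last_status = status
    return changes


if __name__ == "__main__":
    for timestamp, status in vendor_status_changes(SAMPLE_TRACE):
        print(timestamp, "status", status)
```

In the sample, the two reported timestamps are about 30 seconds apart, which matches the check interval mentioned earlier.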
Output from the alert.log when there is a time synchronization issue:
[ctssd(3416)]CRS-2408:The clock on host server2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2009-10-01 13:50:51.727[ctssd(3416)]CRS-2411:The Cluster Time Synchronization Service will take a long time to perform time synchronization as local time is significantly different from mean cluster time. Details in /u01/app/11.2.0/grid/log/server2/ctssd/octssd.log.
You can find similar output in the operating system log file.
Remark:
- CTSS runs in observer mode as soon as an NTP configuration file is found. This says nothing about whether the NTP daemon is really working properly. Be aware of this! A default Linux installation has the ntp.conf file in /etc/.
- Use either NTP or CTSS for time management, not both. Don’t “play” with CTSS on production environments; discuss beforehand what you require.
Additional trace information:
When you look into the details of what cluvfy does, you will find that the following checks are performed:
- Validate if this is a cluster environment: does ocr.loc exist?
- Check if CTSS is running using: /u01/app/11.2.0/grid/bin/crsctl check ctss
- Check if the NTP configuration file exists (when found, mark as exists): /tmp/CVU_11.2.0.1.0_oracle/exectask.sh -chkfile /etc/ntp.conf
- Validate if the NTP daemon is really active using: /tmp/CVU_11.2.0.1.0_oracle/exectask.sh -chkalive ntpd
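The sequence of checks listed above can be modelled as an ordered list of named probes. This is only a sketch of the idea, not how cluvfy is implemented: the function name and the probe callables are hypothetical, and the real utility runs its own exectask.sh helpers and performs more checks than these four.

```python
from typing import Callable, List, Tuple

Probe = Callable[[], bool]


def run_checks(checks: List[Tuple[str, Probe]]) -> List[Tuple[str, str]]:
    """Run each named probe in order and report 'passed' or 'failed'."""
    return [(name, "passed" if probe() else "failed") for name, probe in checks]


if __name__ == "__main__":
    # Fixed results mirroring the observer-mode sample in this post:
    # cluster present, CTSS running, ntp.conf present, ntpd not running.
    sample_checks = [
        ("cluster environment (ocr.loc exists)", lambda: True),
        ("CTSS running", lambda: True),
        ("NTP configuration file exists", lambda: True),
        ("NTP daemon alive", lambda: False),
    ]
    for name, result in run_checks(sample_checks):
        print(f"{name}: {result}")
```

In a real script the lambdas would be replaced by functions that actually inspect the host, for example a file-existence test for /etc/ntp.conf.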
Alert.log will show when in observer mode:
[ctssd(3582)]CRS-2403:The Cluster Time Synchronization Service on host server1 is in observer mode.
2009-09-27 21:24:46.766[ctssd(3582)]CRS-2407:The new Cluster Time Synchronization Service reference node is host server2.
2009-09-27 21:24:46.938[ctssd(3582)]CRS-2412:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time. Details in /u01/app/11.2.0/grid/log/server1/ctssd/octssd.log.
2009-09-27 21:24:46.986[ctssd(3582)]CRS-2409:The clock on host server1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2009-09-27 21:24:47.277[ctssd(3582)]CRS-2401:The Cluster Time Synchronization Service started on host server1.
2009-09-27 21:26:38.926.....
[ctssd(3582)]CRS-2409:The clock on host server1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2009-09-27 22:00:32.725[ctssd(3582)]CRS-2409:The clock on host server1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
Here you can read the state of CTSS as well, and also see that there is a synchronization issue. No action is taken to fix this issue because CTSS is in observer mode.
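Because the same CRS-2409 message repeats for as long as the clocks stay apart, counting the CRS codes written by ctssd gives a quick summary of an alert.log. The sketch below is a hypothetical helper that assumes the "[ctssd(pid)]CRS-nnnn:" prefix seen in the excerpts above; the sample lines are copied from this post.

```python
import re
from collections import Counter

# Lines copied from the alert.log excerpt above.
SAMPLE_ALERT = """\
[ctssd(3582)]CRS-2403:The Cluster Time Synchronization Service on host server1 is in observer mode.
2009-09-27 21:24:46.986[ctssd(3582)]CRS-2409:The clock on host server1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
[ctssd(3582)]CRS-2409:The clock on host server1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
"""


def crs_code_counts(alert_text: str) -> Counter:
    """Count the CRS message codes written by ctssd."""
    return Counter(re.findall(r"\[ctssd\(\d+\)\](CRS-\d+):", alert_text))


if __name__ == "__main__":
    for code, count in sorted(crs_code_counts(SAMPLE_ALERT).items()):
        print(code, count)
```

A non-zero count for CRS-2409 means the node is drifting while CTSS is only observing, so the time source (NTP) needs attention.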
Alert.log will show when in active mode:
[ctssd(3578)]CRS-2401:The Cluster Time Synchronization Service started on host server1 is in active mode.
Advise: