The 20 Basic Things You Have to Know When Managing a RAC Database
- What is OHASD
- What is OCR
- What is VOTING DISK
- What is CRS
- What is CSSD
- What is CTSSD
- What is VIP
- What is SCAN IP and Listener
- What is OLOGGERD
- What is EVMD
- What is MDNSD
- What is ONS
- What is OPROCD
- What is FAN
- What is TAF
- What is FCF
- What is GCS(LMSn)
- What is GES
- What is GPNPD
- What is DISKMON
- What is OHASD
OHASD stands for Oracle High Availability Services Daemon. It is implemented by a daemon process that is also called ohasd at the OS level. It is the lower part of the Oracle Clusterware stack, which consists of processes that facilitate cluster operations in RAC databases. These include:
- GPNPD (Grid Plug and Play) provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.
- GIPC (Grid Interprocess Communication) is a helper daemon for the communication infrastructure.
- MDNS (Multicast Domain Name Service) allows DNS requests.
- GNS (Oracle Grid Naming Service) is a gateway between the cluster mDNS and external DNS servers.
Useful Commands:
- crsctl enable has --> Start the HAS services automatically after reboot
- crsctl disable has --> Do not start the HAS services after reboot
- crsctl config has --> Check whether autostart is enabled or not
- cat /etc/oracle/scls_scr/<Node_name>/root/ohasdstr --> Check whether autostart is enabled or not
- cat /etc/oracle/scls_scr/<Node_name>/root/ohasdrun --> Check whether restart is enabled if the node fails
- What is OCR, OLR
OCR stands for Oracle Cluster Registry. It holds information such as node membership (which nodes are part of this cluster), software version, location of the voting disk, and the status of the RAC databases, listeners, instances and services. The OCR is placed in ASM or OCFS (Oracle Cluster File System).
ASM can be brought up only if we have access to the OCR, but the OCR is accessible only after ASM is up. So how do the CRS services come up?
This is what the OLR (Oracle Local Registry) is for. It is a local counterpart of the OCR that is placed on the local file system of each node.
The OLR holds information such as CRS_HOME, GPnP details, active session, localhost version, OCR latest backup, node name…
- cat /etc/oracle/ocr.loc --> OCR file Details
ocrconfig_loc=<+ASM_Location>
local_only=FALSE
- cat /etc/oracle/olr.loc --> OLR file Details
olrconfig_loc=<file_name_with_location.olr>
crs_home=<CRS_HOME_Location>
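The key=value layout of these two files is simple enough to read from a script. Here is a minimal Python sketch, assuming the files follow the layout shown above. The sample values (`+DATA`, the `.olr` path and the CRS home) are hypothetical placeholders, not values from a real installation.

```python
def parse_loc_file(text):
    """Parse key=value lines (as found in ocr.loc / olr.loc) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not a key=value pair
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical sample contents; real values differ per installation
sample_ocr_loc = "ocrconfig_loc=+DATA\nlocal_only=FALSE\n"
sample_olr_loc = "olrconfig_loc=/u01/app/grid/cdata/node1.olr\ncrs_home=/u01/app/grid\n"

ocr = parse_loc_file(sample_ocr_loc)
olr = parse_loc_file(sample_olr_loc)
print(ocr["ocrconfig_loc"], ocr["local_only"])
print(olr["olrconfig_loc"], olr["crs_home"])
```

On a real node you would pass in the contents of the file itself, read with the usual `open(...).read()`.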
Useful Commands:
- ocrconfig -showbackup --> Show the OCR backups and their location
- ocrconfig -export <File_name_with_Full_Location.ocr> --> Take a logical backup (export) of the OCR
- ocrconfig -restore <File_name_with_Full_Location.ocr> --> Restore the OCR from a backup
- ocrconfig -import <File_name_with_Full_Location.dmp> --> Import OCR metadata from an export file
- ocrcheck -details --> Gives the OCR info in detail
- ocrcheck -local --> Gives the OLR info in detail
- ocrdump -local <File_name_with_Full_Location.olr> --> Take the dump of OLR
- ocrdump <File_name_with_Full_Location.ocr> --> Take the dump of OCR
- What is VOTING DISK
The voting disk comes into the picture when a node joins the cluster, when a node fails (and may be evicted), or when a VIP needs to be assigned if GNS is configured. The voting disk records which nodes are part of the cluster. When the CRS services start, each node, with the help of the OCR, registers its vote in the voting disk.
We need to back up the voting disk periodically, for example with a cron job. The voting disk does two different jobs:
- Dynamic - Heart beat information
- Static - Node information in the cluster
Useful commands:
- dd if=<Name_Of_Voting_Disk> of=<Name_Of_Voting_Disk_Backup> --> Taking backup of voting disk
- crsctl query css votedisk --> Check voting disk details
- crsctl add css votedisk Path_to_voting_disk --> To add voting disk
- crsctl add css votedisk <Path_to_voting_disk> -force --> If the cluster is down
- crsctl delete css votedisk <Path_to_voting_disk> --> Delete voting disk
- crsctl delete css votedisk <Path_to_voting_disk> -force --> If the cluster is down
- crsctl replace votedisk <+ASM_Disk_Group> --> Replace the voting disk
- What is CRS
CRS stands for Cluster Ready Services; its daemon is crsd (CRSD). It is the process responsible for monitoring, stopping, starting and failing over resources. It maintains the OCR and restarts a resource when a failure occurs.
Useful Commands:
- crs_stat -t -v --> Check crs resources (deprecated since 11gR2 in favor of crsctl stat res)
- crsctl stat res -t --> Check crs resources in a tabular, slightly more detailed view
- crsctl enable crs --> Enable automatic start of services after reboot
- crsctl check crs --> Check crs services
- crsctl disable crs --> Disable automatic start of services after reboot
- crsctl stop crs --> Stop the crs services on the local node (add -f to force)
- crsctl start crs --> To start the crs services on the respective node
- crsctl start crs -excl --> To start the crs services in exclusive mode when you have lost the voting disk (you must replace the voting disk after you start the crs)
- crsctl stop cluster -all --> Stop the crs services on all the cluster nodes
- crsctl start cluster -all --> Start the crs services on all the cluster nodes
- olsnodes --> List all the nodes in the cluster
- oclumon manage -get master --> With this you will get master node information
- cat $CRS_HOME/crs/init/<node_name>.pid --> Find the PID under which crs is running
- What is CSSD
CSSD stands for Cluster Synchronization Services Daemon. It is responsible for communication between the nodes and monitors the heartbeat messages from all nodes.
Useful Commands:
- crsctl stop css --> For stopping the css
- crsctl disable css --> Disabling automatic startup after reboot
- What is CTSSD
CTSSD stands for Cluster Time Synchronization Service Daemon. When a time synchronization service such as NTP (Network Time Protocol) is configured on the nodes, CTSSD runs in observer mode: if there is a time difference, it takes no action. To run this service in active mode, you need to disable all other time synchronization services such as NTP. However, it is recommended to keep this service in observer mode.
Useful commands:
- cluvfy comp clocksync -n all -verbose --> To check the clock synchronization across all the nodes
- crsctl check ctss --> Check the service status and time offset in msecs
- What is VIP
VIP stands for Virtual IP Address. Oracle uses the VIP for database-level access: when a connection comes from the application end, it connects using this IP address. The VIP is used for RAC failover and RAC management. Suppose the IP of one node is down. Without a VIP, the client would have to wait for the protocol timeout (90 seconds) before getting a session. This is where the VIP comes into the picture: if the VIP of one node is down, connections are routed only to the active node. The VIP must be on the same subnet as the public address.
Useful Commands:
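- srvctl start vip -n <node_name> --> To start the VIP on a node (use -i <VIP_Name> instead of -n to name the VIP)
- srvctl stop vip -n <node_name> --> To stop the VIP on a node (use -i <VIP_Name> instead of -n to name the VIP)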
- What is SCAN IP and Listener
SCAN stands for Single Client Access Name. The SCAN IPs must be on the same subnet. Three SCAN IPs is the recommended number; they redirect user sessions to the SCAN listeners. Load balancing on the SCAN listeners is done by a least-recently-loaded algorithm.
When a connection is initiated from the application end, the SCAN listener checks the load balancing information and then hands the connection over to a node listener, after which the user can run transactions.
The main benefit is that we need not change the connection string on the application servers when the cluster changes, for example when a node is added or deleted.
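To make the last point concrete, here is a small Python sketch that builds an EZConnect-style connect string from a SCAN name. The SCAN name, port and service name are hypothetical. The point is only that the string names the SCAN rather than individual nodes, so it stays the same when nodes are added or removed.

```python
def scan_connect_string(scan_name, service_name, port=1521):
    """Build an EZConnect-style string: //host:port/service_name."""
    return f"//{scan_name}:{port}/{service_name}"


# Hypothetical SCAN and service names
dsn = scan_connect_string("rac-scan.example.com", "orcl_svc")
print(dsn)  # //rac-scan.example.com:1521/orcl_svc
```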
Useful Commands:
- srvctl config scan --> Retrieves the SCAN configuration
- srvctl config scan_listener --> List of scan listeners with Port Number
- srvctl add scan -n <scan_name> --> Add a SCAN to the cluster
- srvctl add scan_listener -p <Desired_port_number> --> To add scan listener on specific port
- SQL> show parameter remote_listener --> Find the list of scan listeners
- srvctl stop scan --> Stops all SCAN VIPs when used without the -i option
- srvctl stop scan_listener --> Stops one or more SCAN listeners in the cluster
- srvctl start scan --> To start the scan VIP
- srvctl start scan_listener --> Start the scan listener
- srvctl status scan --> Verify SCAN VIP status
- srvctl modify scan_listener --> Modify the scan listener
- srvctl relocate scan_listener -i <Ordinal_Number> -n <node_name> --> Relocate the scan listener to another node
- What is OLOGGERD
Ologgerd stands for Cluster Logger Service Daemon, otherwise known as the cluster logger service. The logger service writes its data on the master node and chooses another node as standby. If a network issue occurs between the nodes and the standby is unable to contact the master, the standby takes ownership and chooses another node as the new standby.
Useful Commands:
- oclumon manage -get master --> Find which is the master
- oclumon manage -get reppath --> Will get the path of the repository logs
- oclumon manage -get repsize --> This will give you the limit on the repository size
- oclumon showobjects --> Find which nodes are connected to loggerd
- oclumon dumpnodeview --> This will give a detailed view including system, top consumers, processes, devices, NICs, filesystems status, protocol errors.
- oclumon dumpnodeview -n <node_1 node_2 node_3> -last "HH:MM:SS" --> View all the details for the given nodes over the last period you specify
- oclumon dumpnodeview -allnodes -last "HH:MM:SS" --> If we need information from all nodes
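The -last argument takes a duration written as "HH:MM:SS". The Python helper below, sketched only as an illustration, formats a duration that way and assembles the argument list for the dumpnodeview call. It does not run oclumon itself, and the node names are hypothetical.

```python
from datetime import timedelta


def format_last(duration):
    """Format a timedelta as the HH:MM:SS string used with -last."""
    total = int(duration.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def dumpnodeview_args(nodes, duration):
    """Assemble the argument list; pass it to subprocess.run() on a cluster node."""
    return ["oclumon", "dumpnodeview", "-n", *nodes, "-last", format_last(duration)]


# Hypothetical node names, last 15 minutes
args = dumpnodeview_args(["node1", "node2"], timedelta(minutes=15))
print(" ".join(args))
```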
- What is EVMD
EVMD stands for Event Manager Daemon. It handles event messaging for the processes. It sends and receives actions regarding resource state changes to and from all other nodes in the cluster, with the help of ONS (Oracle Notification Service).
Useful Commands:
- evmwatch -A -t "@timestamp @@" --> Get events generated in evmd
- evmpost -u "Message Here" -h <node_name> --> This will post message in evmd log in the mentioned node
- What is MDNSD
Mdnsd stands for Multicast Domain Name Service Daemon. This process is used by gpnpd to locate profiles in the cluster, as well as by GNS to perform name resolution. Mdnsd updates the pid file in the init directory.
- What is ONS
ONS stands for Oracle Notification Service. It is a publish-and-subscribe service that sends notifications about the state of the database and its instances. This state information is used for load balancing. ONS also communicates with the ONS daemons on other nodes to inform them of the state of the database.
Useful Commands:
- srvctl status nodeapps --> Status of nodeapps
- cat $ORACLE_HOME/opmn/conf/ons.config --> Check ons configuration
- $ORACLE_HOME/opmn/logs --> ONS logs will be in this location
- What is OPROCD
OPROCD stands for Oracle Process Monitor Daemon. It monitors the system state of cluster nodes.
Useful Commands:
- CRS_HOME/oprocd stop --> To stop the process on a single node
- What is FAN
FAN stands for Fast Application Notification. If any state change occurs in the cluster, an instance or a node, an event is triggered by the event manager and propagated by ONS. Such an event is known as a FAN event.
Useful Commands:
- onsctl ping --> To check whether ons is running or not
- onsctl debug --> Will get a detailed view of ons
- onsctl start --> Start the daemon
- onsctl stop --> Stop the daemon
- What is TAF
TAF stands for Transparent Application Failover. When a RAC node goes down, in-flight SELECT statements fail over to an active node. INSERT, DELETE, UPDATE and ALTER SESSION statements are not supported by TAF. Temporary objects and PL/SQL package states are lost during the failover.
There are two types of failover methods used in TAF:
- Basic Failover: The session connects to a single node, so there is no overhead. The end user experiences a delay in completing the transaction.
- Preconnect Failover: The session connects to the primary and the backup node at the same time. This offers faster failover, at the cost of some overhead.
Useful Commands:
SELECT machine, failover_type, failover_method, failed_over, COUNT(*)
FROM gv$session GROUP BY machine, failover_type, failover_method, failed_over; --> Check Failover Status
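The failover type and method are normally requested in the client's connect descriptor through the FAILOVER_MODE clause. The Python sketch below only assembles such a descriptor as a string. The host, service name, and the retry and delay values are hypothetical, and the layout should be checked against the Oracle Net documentation for your release. A PRECONNECT setup would normally also specify a BACKUP alias, which this sketch leaves out.

```python
def taf_descriptor(host, service, method="BASIC", port=1521,
                   failover_type="SELECT", retries=20, delay=5):
    """Assemble a connect descriptor with a TAF FAILOVER_MODE clause."""
    if method not in ("BASIC", "PRECONNECT"):
        raise ValueError("method must be BASIC or PRECONNECT")
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        "(CONNECT_DATA="
        f"(SERVICE_NAME={service})"
        f"(FAILOVER_MODE=(TYPE={failover_type})(METHOD={method})"
        f"(RETRIES={retries})(DELAY={delay}))))"
    )


# Hypothetical host and service name
print(taf_descriptor("rac-scan.example.com", "orcl_svc"))
```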
- What is FCF
FCF stands for Fast Connection Failover. It is an application-level failover process. It automatically subscribes to FAN events, which lets it react immediately to up and down events from the database cluster. All failed connections are cleaned up immediately, so that the application receives a failure message. After cleanup, a new connection request is load balanced to an active node.
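The idea can be illustrated with a toy model in Python. This is not Oracle's connection pool or any real driver API. The class, method and instance names are made up for illustration, and the "least connections" choice stands in for whatever load balancing the real pool applies.

```python
class ToyPool:
    """Toy connection pool that reacts to FAN-style up/down events."""

    def __init__(self):
        self.connections = []      # list of (connection_id, instance_name)
        self.up_instances = set()

    def on_event(self, instance, status):
        """Handle a hypothetical event; status is 'up' or 'down'."""
        if status == "up":
            self.up_instances.add(instance)
        elif status == "down":
            self.up_instances.discard(instance)
            # Clean up connections to the failed instance right away
            self.connections = [c for c in self.connections if c[1] != instance]

    def connect(self, connection_id):
        """Open a connection on the surviving instance with the fewest connections."""
        if not self.up_instances:
            raise RuntimeError("no instance available")
        load = {inst: 0 for inst in self.up_instances}
        for _, inst in self.connections:
            load[inst] += 1
        target = min(sorted(load), key=load.get)
        self.connections.append((connection_id, target))
        return target


pool = ToyPool()
pool.on_event("inst1", "up")
pool.on_event("inst2", "up")
pool.connect(1)
pool.connect(2)
pool.on_event("inst1", "down")   # connections to inst1 are dropped
print(pool.connect(3))           # inst2
```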
- What is GCS
GCS stands for Global Cache Service. It keeps track of the information about data blocks and the access privileges of the various instances. Integrity is maintained by managing access globally. It is responsible for transferring blocks from one instance to another when needed.
- What is GES
GES stands for Global Enqueue Service. It controls library and dictionary caches on all the nodes. GES manages transaction locks, table locks, library cache locks, dictionary cache locks and database locks.
- What is GPNPD
GPNPD stands for Grid Plug and Play Daemon. The GPnP profile is a file located at CRS_HOME/gpnp/<node_name>/profiles/peer/profile.xml. The profile contains the cluster name, hostname, network profiles with IP addresses, and OCR information. If the voting disk is modified, the profile is updated.
- What is DISKMON
DISKMON is the Disk Monitor daemon. It runs continuously from the time ocssd starts, and it monitors and performs I/O fencing for Exadata Storage Servers (called cells in Exadata terms). The process runs from the start of ocssd because an Exadata cell can be added to any cluster at any time.