
Wednesday 21 July 2021

Automatic Failback of a Service in an Oracle 19c RAC Database

High availability of database services has been a feature of Oracle Real Application Clusters (RAC) for many versions. 
Basically, when a database instance fails, a service that has this instance as a preferred instance fails over to another available instance. 
Unfortunately, the service did not fail back to the original instance once that instance was up again. 
The administrator had to relocate the service manually. This changed with Oracle Database 19c.

The environment is a three-node Oracle 19c cluster. 

Both Grid Infrastructure and RDBMS are Oracle 19.3.0.0.0. 

An administrator-managed database is running on all nodes:

oracle@rac01$] srvctl status database -db RACDB
 Instance RACDB01 is running on node rac01
 Instance RACDB02 is running on node rac02
 Instance RACDB03 is running on node rac03

Let’s create a simple service for this database:

oracle@rac01$] srvctl add service -db RACDB -service FOTEST -preferred RACDB02 -available RACDB01 -failback YES

Please note the new option “-failback YES”. 

This makes the service fail back to its preferred instance (in my case “RACDB02”) once that instance is available again. 

The default is “NO”, i.e. Oracle keeps the old behavior by default.

oracle@rac01$] srvctl start service -db RACDB -service FOTEST

oracle@rac01$] srvctl status service -db RACDB -service FOTEST
Service FOTEST is running on instance(s) RACDB02


oracle@rac01$] srvctl config service -db RACDB -service FOTEST

Service name: FOTEST
Server pool:
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
Failover retries:
Failover delay:
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name:
Hub service:
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Failback :  true
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: RACDB02
Available instances: RACDB01
CSS critical: no
Service uses Java: false

The line “Failback: true” shows that failback is configured. 

Unfortunately, no “Failback: false” line is printed when failback is not configured.
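Because the flag is only printed when it is set, a script that wants to check it can only test for the presence of that line. The following is a minimal sketch, assuming the `srvctl config service` output format shown above; `failback_enabled` is a hypothetical helper name, not part of Oracle's tooling:

```shell
#!/bin/sh
# failback_enabled (hypothetical helper): reads `srvctl config service`
# output on stdin and exits 0 only if a "Failback : true" line is present.
# A missing line is treated as "not configured", because srvctl prints
# no "Failback: false" line in that case.
failback_enabled() {
  grep -Eq '^Failback[[:space:]]*:[[:space:]]*true[[:space:]]*$'
}

# Usage on a cluster node:
#   srvctl config service -db RACDB -service FOTEST | failback_enabled \
#     && echo "failback configured" || echo "failback not configured"
```

The pattern tolerates the extra spaces around the colon seen in the real output (“Failback :  true”) as well as the spelling “Failback: true”.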

Let’s reboot node rac02 (which hosts instance RACDB02) from another session and see what happens:

While the instance on node rac02 is down, the service is started on node rac01 (instance RACDB01). This is the expected, well-known behavior:

oracle@rac01$] srvctl status service -db RACDB -service FOTEST
Service FOTEST is running on instance(s) RACDB01


It takes some time for node rac02 to reboot and start the whole Grid Infrastructure stack, but after a while, without any intervention by the DBA:

oracle@rac01$] srvctl status service -db RACDB -service FOTEST
Service FOTEST is running on instance(s) RACDB02

The service is back on instance RACDB02 again.
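Instead of re-running `srvctl status service` by hand while the node reboots, the check can be scripted. The sketch below is an illustration under stated assumptions, not an Oracle-supplied tool: `running_on_instance` and `wait_for_instance` are hypothetical helpers, the parsing assumes the status line format shown above, and it assumes that several instances would be listed comma-separated.

```shell
#!/bin/sh
# Hypothetical helpers (not Oracle tooling) for watching a service fail back.

# running_on_instance INSTANCE
#   Reads `srvctl status service` output on stdin and exits 0 if INSTANCE
#   appears in the list after "instance(s)". The list is assumed to be
#   comma-separated when the service runs on several instances.
running_on_instance() {
  awk -v inst="$1" '
    /is running on instance[(]s[)]/ {
      sub(/.*instance[(]s[)][ \t]*/, "")   # keep only the instance list
      n = split($0, list, ",")
      for (i = 1; i <= n; i++) {
        gsub(/[ \t]/, "", list[i])
        if (list[i] == inst) found = 1
      }
    }
    END { exit !found }
  '
}

# wait_for_instance DB SERVICE INSTANCE [TIMEOUT_SECS] [INTERVAL_SECS]
#   Polls srvctl until SERVICE of DB is reported on INSTANCE.
#   Returns 0 on success, 1 once TIMEOUT_SECS have elapsed.
#   The defaults (600 s timeout, 10 s interval) are arbitrary;
#   INTERVAL_SECS must be a positive integer.
wait_for_instance() {
  db=$1 svc=$2 inst=$3 timeout=${4:-600} interval=${5:-10}
  waited=0
  while :; do
    if srvctl status service -db "$db" -service "$svc" | running_on_instance "$inst"; then
      echo "Service $svc is running on $inst (after ${waited}s)"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${waited}s waiting for $svc on $inst" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage, e.g. right after rebooting the node that hosts RACDB02:
#   wait_for_instance RACDB FOTEST RACDB02 1200 15
```

The timeout keeps the script from hanging forever if the node or its instance never comes back; the elapsed time it reports is only as precise as the polling interval.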

To sum up, a feature that RAC administrators have wanted for years has finally been implemented in Oracle 19c. And it works fine.



