13 August, 2021

Failover and Database Incarnation

In previous blog posts, I have demonstrated : 

1.  Using Flashback Database for [destructive] testing at a Standby

2.  Using Snapshot Standby Database for [destructive] testing at a Standby



In this blog post, I will demonstrate the Database Incarnation feature when a Failover is executed.


This is the current status at the Primary :


SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      ORCLCDB
SQL> archive log list

SQL>
SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /opt/oracle/archivelog/ORCLCDB
Oldest online log sequence     401
Next log sequence to archive   403
Current log sequence           403
SQL>

SQL> select incarnation#, resetlogs_change#, to_char(resetlogs_time,'DD-MON-RR HH24:MI') resetlogs_time,
  2  status, resetlogs_id
  3  from v$database_incarnation
  4  order by 1
  5  /

INCARNATION# RESETLOGS_CHANGE# RESETLOGS_TIME           STATUS  RESETLOGS_ID
------------ ----------------- ------------------------ ------- ------------
           1                 1 17-APR-19 00:55          PARENT    1005785759
           2           1920977 04-MAY-19 23:21          PARENT    1007421686
           3           4797184 27-MAR-20 00:00          CURRENT   1036108814

SQL>
SQL> l
  1  select dest_id, db_unique_name, status, type, archived_seq#, applied_seq#, synchronized, gap_status
  2  from v$archive_dest_status
  3  where dest_id in (1,2,3)
  4* order by 1
SQL> /

   DEST_ID DB_UNIQUE_NAME STATUS    TYPE             ARCHIVED_SEQ# APPLIED_SEQ# SYN GAP_STATUS
---------- -------------- --------- ---------------- ------------- ------------ --- ------------------------
         1 NONE           VALID     LOCAL                      402            0 NO
         2 STDBYDB        VALID     PHYSICAL                   402          401 YES NO GAP
         3 STDB2          VALID     PHYSICAL                   402          384 NO  NO GAP

SQL>


And this is the current status at the first Standby (STDBYDB)


From the alert log :

2021-08-13T23:03:00.998407+08:00
Recovery of Online Redo Log: Thread 1 Group 4 Seq 403 Reading mem 0
  Mem# 0: /opt/oracle/oradata/STDBYDB/stdbredo01.log


SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      STDBYDB
SQL> select incarnation#, resetlogs_change#, to_char(resetlogs_time,'DD-MON-RR HH24:MI') resetlogs_time,
  2  status, resetlogs_id
  3  from v$database_incarnation
  4  order by 1
  5  /

INCARNATION# RESETLOGS_CHANGE# RESETLOGS_TIME           STATUS  RESETLOGS_ID
------------ ----------------- ------------------------ ------- ------------
           1                 1 17-APR-19 00:55          PARENT    1005785759
           2           1920977 04-MAY-19 23:21          PARENT    1007421686
           3           4797184 27-MAR-20 00:00          CURRENT   1036108814

SQL>


I now create a Backup of the Standby (STDBYDB) (for this test I will not be using Flashback Database or Snapshot Standby Database {see the first two blog posts listed at the beginning for demonstrations of both methods}) :


oracle19c>rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Fri Aug 13 23:07:04 2021
Version 19.12.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCLCDB (DBID=2778483057, not open)

RMAN> backup as compressed backupset database;

Starting backup at 13-AUG-21
Starting implicit crosscheck backup at 13-AUG-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=128 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=252 device type=DISK
Crosschecked 10 objects
Crosschecked 11 objects
Finished implicit crosscheck backup at 13-AUG-21

Starting implicit crosscheck copy at 13-AUG-21
using channel ORA_DISK_1
using channel ORA_DISK_2
Finished implicit crosscheck copy at 13-AUG-21

searching for all files in the recovery area
cataloging files...
no files cataloged

using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/opt/oracle/oradata/STDBYDB/system01.dbf
input datafile file number=00004 name=/opt/oracle/oradata/STDBYDB/undotbs01.dbf
channel ORA_DISK_1: starting piece 1 at 13-AUG-21
channel ORA_DISK_2: starting compressed full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00003 name=/opt/oracle/oradata/STDBYDB/sysaux01.dbf
input datafile file number=00007 name=/opt/oracle/oradata/STDBYDB/users01.dbf
channel ORA_DISK_2: starting piece 1 at 13-AUG-21


....
....
....


Starting Control File and SPFILE Autobackup at 13-AUG-21
piece handle=/opt/oracle/FRA/STDBYDB/STDBYDB/autobackup/2021_08_13/o1_mf_s_1080515319_jkf2tzhp_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 13-AUG-21

RMAN>
RMAN> quit


Recovery Manager complete.
oracle19c>



I now execute a FAILOVER command at STDBYDB :


oracle19c>sqlplus '/ as sysdba'

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Aug 13 23:11:08 2021
Version 19.12.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.12.0.0.0

SQL> alter database recover managed standby database cancel;

Database altered.

SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      STDBYDB
SQL> alter database failover to STDBYDB;

Database altered.

SQL>
SQL> shutdown immediate;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> startup open;
ORACLE instance started.

Total System Global Area 1207955544 bytes
Fixed Size                  9134168 bytes
Variable Size             436207616 bytes
Database Buffers          754974720 bytes
Redo Buffers                7639040 bytes
Database mounted.
Database opened.
SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      STDBYDB
SQL>
SQL> select incarnation#, resetlogs_change#, to_char(resetlogs_time,'DD-MON-RR HH24:MI') resetlogs_time,
  2  status, resetlogs_id
  3  from v$database_incarnation
  4  order by 1
  5  /

INCARNATION# RESETLOGS_CHANGE# RESETLOGS_TIME           STATUS  RESETLOGS_ID
------------ ----------------- ------------------------ ------- ------------
           1                 1 17-APR-19 00:55          PARENT    1005785759
           2           1920977 04-MAY-19 23:21          PARENT    1007421686
           3           4797184 27-MAR-20 00:00          PARENT    1036108814
           4          14030903 13-AUG-21 23:11          CURRENT   1080515513

SQL>

>
The STDYDB alert log also shows :


2021-08-13T23:11:18.443497+08:00
alter database recover managed standby database cancel
2021-08-13T23:11:18.475672+08:00
PR00 (PID:354): MRP0: Background Media Recovery cancelled with status 16037
2021-08-13T23:11:18.475814+08:00
Errors in file /opt/oracle/diag/rdbms/stdbydb/STDBYDB/trace/STDBYDB_pr00_354.trc:
ORA-16037: user requested cancel of managed recovery operation
PR00 (PID:354): Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 14030850
Stopping change tracking
2021-08-13T23:11:18.858990+08:00
Errors in file /opt/oracle/diag/rdbms/stdbydb/STDBYDB/trace/STDBYDB_pr00_354.trc:
ORA-16037: user requested cancel of managed recovery operation
2021-08-13T23:11:19.004299+08:00
Background Media Recovery process shutdown (STDBYDB)
2021-08-13T23:11:19.477146+08:00
Managed Standby Recovery Canceled (STDBYDB)
Completed: alter database recover managed standby database cancel
2021-08-13T23:11:40.986985+08:00
alter database failover to STDBYDB

2021-08-13T23:11:40.987075+08:00
.... (PID:4479): The Time Management Interface (TMI) is being enabled for role transition
.... (PID:4479): information.  This will result in messages beingoutput to the alert log
.... (PID:4479): file with the prefix 'TMI: '.  This is being enabled to make the timing of
.... (PID:4479): the various stages of the role transition available for diagnostic purposes.
.... (PID:4479): This output will end when the role transition is complete.
TMI: dbsdrv failover to target BEGIN 2021-08-13 23:11:40.987474
Terminal Recovery requested in process 4479
TMI: adbdrv termRecovery BEGIN 2021-08-13 23:11:40.989180
2021-08-13T23:11:41.082414+08:00
Attempt to do a Terminal Recovery (STDBYDB)
TMI: adbdrv termRecovery END 2021-08-13 23:11:41.082472
2021-08-13T23:11:41.083019+08:00
Attempt to do a Terminal Recovery (STDBYDB)
TMI: adbdrv termRecovery END 2021-08-13 23:11:41.082472
2021-08-13T23:11:41.083019+08:00
Media Recovery Start: Managed Standby Recovery (STDBYDB)
 Started logmerger process
2021-08-13T23:11:41.141008+08:00
PR00 (PID:4636): Managed Standby Recovery not using Real Time Apply
2021-08-13T23:11:42.361339+08:00
max_pdb is 5
2021-08-13T23:11:43.009492+08:00
Parallel Media Recovery started with 4 slaves
2021-08-13T23:11:44.585417+08:00
Stopping change tracking
PR00 (PID:4636): Media Recovery Waiting for T-1.S-403 (in transit)
2021-08-13T23:11:45.982280+08:00
PR00 (PID:4636): Killing 3 processes (PIDS:375,391,394) (all RFS, wait for I/O) in order to disallow current and future RFS connections. Requested by OS process 4636
2021-08-13T23:11:45.982627+08:00
Process termination requested for pid 375 [source = rdbms], [info = 2] [request issued by pid: 4636, uid: 54321]
2021-08-13T23:11:45.983224+08:00
Process termination requested for pid 391 [source = rdbms], [info = 2] [request issued by pid: 4636, uid: 54321]
2021-08-13T23:11:45.983244+08:00
Process termination requested for pid 394 [source = rdbms], [info = 2] [request issued by pid: 4636, uid: 54321]
2021-08-13T23:11:47.984399+08:00
PR00 (PID:4636): Begin: SRL archival
PR00 (PID:4636): End: SRL archival
PR00 (PID:4636): Terminal Recovery timestamp is '08/13/2021 23:11:48'
2021-08-13T23:11:49.723810+08:00
PR00 (PID:4636): Terminal Recovery: applying standby redo logs.
PR00 (PID:4636): Terminal Recovery: thread 1 seq# 403 redo required
2021-08-13T23:11:49.887210+08:00
PR00 (PID:4636): Terminal Recovery:
2021-08-13T23:11:50.007728+08:00
Recovery of Online Redo Log: Thread 1 Group 4 Seq 403 Reading mem 0
  Mem# 0: /opt/oracle/oradata/STDBYDB/stdbredo01.log
2021-08-13T23:11:50.853550+08:00
Terminal Recovery finished with No-Data-Loss
2021-08-13T23:11:51.223461+08:00
Incomplete Recovery applied until change 14030902 time 08/13/2021 23:11:45
2021-08-13T23:11:51.329028+08:00
Media Recovery Complete (STDBYDB)
2021-08-13T23:11:52.416801+08:00
Terminal Recovery: successful completion

PR00 (PID:4636): Forcing ARSCN to IRSCN for TR SCN:0x0000000000d61836
PR00 (PID:4636): Attempt to set limbo arscn SCN:0x0000000000d61836 irscn SCN:0x0000000000d61836
PR00 (PID:4636): Resetting standby activation ID 2807176397 (0xa7521ccd)
Stopping change tracking
2021-08-13T23:11:52.625495+08:00
ALTER DATABASE SWITCHOVER TO PRIMARY (STDBYDB)
2021-08-13T23:11:52.626348+08:00
Maximum wait for role transition is 15 minutes.
TMI: kcv_commit_to_so_to_primary wait for MRP to finish BEGIN 2021-08-13 23:11:52.626362
TMI: kcv_commit_to_so_to_primary wait for MRP to finish END 2021-08-13 23:11:52.626450
TMI: kcv_commit_to_so_to_primary Switchover from physical BEGIN 2021-08-13 23:11:52.626957
Backup controlfile written to trace file /opt/oracle/diag/rdbms/stdbydb/STDBYDB/trace/STDBYDB_ora_4479.trc
Standby terminal recovery start SCN: 14030850
RESETLOGS after incomplete recovery UNTIL CHANGE 14030902 time 08/13/2021 23:11:45
NET  (PID:4479): ORL pre-clearing operation disabled by switchover
Online log /opt/oracle/oradata/STDBYDB/redo01.log: Thread 1 Group 1 was previously cleared
Online log /opt/oracle/oradata/STDBYDB/redo02.log: Thread 1 Group 2 was previously cleared
Online log /opt/oracle/oradata/STDBYDB/redo03.log: Thread 1 Group 3 was previously cleared
2021-08-13T23:11:53.667669+08:00
Standby became primary SCN: 14030849
2021-08-13T23:11:53.668267+08:00
Setting recovery target incarnation to 4
2021-08-13T23:11:53.715844+08:00

NET (PID:4479): RT: Role transition work is not done
NET (PID:4479): The Time Management Interface (TMI) is being enabled for role transition
NET (PID:4479): information. This will result in messages beingoutput to the alert log
NET (PID:4479): file with the prefix 'TMI: '. This is being enabled to make the timing of
NET (PID:4479): the various stages of the role transition available for diagnostic purposes.
NET (PID:4479): This output will end when the role transition is complete.
NET (PID:4479): Redo network throttle feature is disabled at mount time
2021-08-13T23:11:53.934288+08:00
NET (PID:4479): Database role cleared from PHYSICAL STANDBY [kcvs.c:1099]
Switchover: Complete - Database mounted as primary
TMI: kcv_commit_to_so_to_primary Switchover from physical END 2021-08-13 23:11:53.935283
TMI: dbsdrv failover to target END 2021-08-13 23:11:53.935372
Failover completed with No-Data-Loss.
Completed: alter database failover to STDBYDB

2021-08-13T23:12:16.059509+08:00
ARC1 (PID:32662): Becoming the 'no SRL' ARCH
2021-08-13T23:12:26.958308+08:00
Shutting down ORACLE instance (immediate) (OS id: 4479)
....
....
....
2021-08-13T23:12:45.501291+08:00
Instance shutdown complete (OS id: 4479)
....
....
....
2021-08-13T23:12:55.538734+08:00
Starting ORACLE instance (normal) (OS id: 5005)
....
....
....
2021-08-13T23:14:09.422553+08:00
CJQ0 started with pid=64, OS id=6101
Completed: ALTER DATABASE OPEN


Meanwhile, the Primary shows that it no longer communicates with STDBYDB


SQL> l
  1  select dest_id, db_unique_name, status, type, archived_seq#, applied_seq#, synchronized, gap_status
  2  from v$archive_dest_status
  3  where dest_id in (1,2,3)
  4* order by 1
SQL> /

   DEST_ID DB_UNIQUE_NAME STATUS    TYPE             ARCHIVED_SEQ# APPLIED_SEQ# SYN GAP_STATUS
---------- -------------- --------- ---------------- ------------- ------------ --- ------------------------
         1 NONE           VALID     LOCAL                      403            0 NO
         2 STDBYDB        ERROR     PHYSICAL                   401          401 NO  RESOLVABLE GAP
         3 STDB2          VALID     PHYSICAL                   403          386 NO  NO GAP

SQL>


from the alert log :
2021-08-13T23:02:58.137970+08:00
NET  (PID:32207): Archived Log entry 956 added for T-1.S-401 ID 0xa7521ccd LAD:1
2021-08-13T23:03:00.379365+08:00
ALTER SYSTEM ARCHIVE LOG
2021-08-13T23:03:01.008064+08:00
Thread 1 advanced to log sequence 403 (LGWR switch),  current SCN: 14029372
  Current log# 1 seq# 403 mem# 0: /opt/oracle/oradata/ORCLCDB/redo01.log
2021-08-13T23:03:01.143735+08:00
NET  (PID:32207): Archived Log entry 959 added for T-1.S-402 ID 0xa7521ccd LAD:1
2021-08-13T23:11:46.487983+08:00
LGWR (PID:3724): Attempting LAD:2 network reconnect (3113)
LGWR (PID:3724): LAD:2 network reconnect abandoned
2021-08-13T23:11:46.488223+08:00
Errors in file /opt/oracle/diag/rdbms/orclcdb/ORCLCDB/trace/ORCLCDB_lgwr_3724.trc:
ORA-03113: end-of-file on communication channel
LGWR (PID:3724): Error 3113 for LNO:1 to 'STDBYDB'
2021-08-13T23:11:46.532708+08:00
LGWR (PID:3724): LAD:2 is UNSYNCHRONIZED
2021-08-13T23:11:48.149800+08:00
LGWR (PID:3724): Failed to archive LNO:1 T-1.S-403, error=3113
LGWR (PID:3724): Error 1041 disconnecting from LAD:2 standby host 'STDBYDB'
2021-08-13T23:11:48.300060+08:00
Thread 1 advanced to log sequence 404 (LGWR switch),  current SCN: 14030909
  Current log# 2 seq# 404 mem# 0: /opt/oracle/oradata/ORCLCDB/redo02.log
2021-08-13T23:11:49.005544+08:00
ARC1 (PID:3853): Archived Log entry 961 added for T-1.S-403 ID 0xa7521ccd LAD:1
2021-08-13T23:13:23.840728+08:00
 rfs (PID:5505): krsr_rfs_atc: Identified database type as 'PRIMARY': Client is Foreground (PID:5166)
 rfs (PID:5505): Database mount ID mismatch [0xa9f74b79:0xa9f7dc16] (2851556217:2851593238)
 rfs (PID:5505): Not using real application clusters


So, now the STDBYDB database has diverged from the Primary at SCN#14030903 and no longer accepts Redo from the Primary.
The STDBYDB database is now at Incarnation#4


I can stop Redo Shipping from the Primary


SQL> show parameter db_unique_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      ORCLCDB
SQL> select incarnation#, resetlogs_change#, to_char(resetlogs_time,'DD-MON-RR HH24:MI') resetlogs_time,
  2  status, resetlogs_id
  3  from v$database_incarnation
  4  order by 1
  5  /

INCARNATION# RESETLOGS_CHANGE# RESETLOGS_TIME           STATUS  RESETLOGS_ID
------------ ----------------- ------------------------ ------- ------------
           1                 1 17-APR-19 00:55          PARENT    1005785759
           2           1920977 04-MAY-19 23:21          PARENT    1007421686
           3           4797184 27-MAR-20 00:00          CURRENT   1036108814

SQL>
SQL> l
  1  select dest_id, db_unique_name, status, type, archived_seq#, applied_seq#, synchronized, gap_status
  2  from v$archive_dest_status
  3  where dest_id in (1,2,3)
  4* order by 1
SQL> /

   DEST_ID DB_UNIQUE_NAME STATUS    TYPE             ARCHIVED_SEQ# APPLIED_SEQ# SYN GAP_STATUS
---------- -------------- --------- ---------------- ------------- ------------ --- ------------------------
         1 NONE           VALID     LOCAL                      403            0 NO
         2 STDBYDB        ERROR     PHYSICAL                   401          401 NO  RESOLVABLE GAP
         3 STDB2          VALID     PHYSICAL                   403          386 NO  NO GAP

SQL> 
SQL> show parameter log_archive_dest_state_2

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2             string      enable
log_archive_dest_state_20            string      enable
log_archive_dest_state_21            string      enable
log_archive_dest_state_22            string      enable
log_archive_dest_state_23            string      enable
log_archive_dest_state_24            string      enable
log_archive_dest_state_25            string      enable
log_archive_dest_state_26            string      enable
log_archive_dest_state_27            string      enable
log_archive_dest_state_28            string      enable
log_archive_dest_state_29            string      enable
SQL> alter system set log_archive_dest_state_2='DEFER';

System altered.

SQL>

from the alert log
2021-08-13T23:21:58.120088+08:00
ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;


Notice how v$archive_dest_status at the Primary now show that Sequence#401 is the last Archived and Applied at STDBYDB. Sequence#402 and subsequent ones are no longer relevant to STDBYDB.

You might note that the STDB2 which I have configured as a Lagging Standby (DEST_ID=3) is still receiving Redo from the Primary.  So, that Standby is still active.  (See here and here as to how I configured STDB2 as a second Standby that is lagging)


In the next Blog Post, I will show what happens if I attempt to restore the backup of STDBYDB and resync with the Primary after executing multiple changes (redo) in this database want to discard it, inspite of it now running at a different Incarnation# (Incarnation=4)

No comments: