One scenario I consider when testing a Standby Database -- actually testing a Disaster Recovery site -- is to simulate complete loss of the Primary Database (at the Production site)
Previously, I have demonstrated two methods of "Destructive" Disaster Recovery site testing :
1. Being able to Switchover to the Standby (at a Disaster Recovery site), verifying that transactions created at the Disaster Recovery site do actually flow back to the Production site and, finally, reverting both databases to their pre-test image)
2. Opening a Standby Database as a Snapshot Standby, verify that it is writable and, finally, reverting it to being a "normal" Standby
But what if you want to simulate complete loss of the Production site (Primary database) -- signifying that you may have even lost some transactions, but needing to open the Standby for Read-Write operations ? OR you actually encounter such a situation where the Primary goes down when the Standby is lagging ?
On my Primary I had these configured for MAXIMUM AVAILABILITY (with SYNC and AFFIRM for log_archive_dest_2) :
SQL>alter system set log_archive_dest_2='SERVICE=STDBYDB SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STDBYDB'; System altered. SQL> SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup mount; ORACLE instance started. Total System Global Area 1207955552 bytes Fixed Size 9134176 bytes Variable Size 822083584 bytes Database Buffers 369098752 bytes Redo Buffers 7639040 bytes Database mounted. SQL> SQL> alter database SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY ; Database altered. SQL> shutdown ORA-01109: database not open Database dismounted. ORACLE instance shut down. SQL> startup; ORACLE instance started. Total System Global Area 1207955552 bytes Fixed Size 9134176 bytes Variable Size 822083584 bytes Database Buffers 369098752 bytes Redo Buffers 7639040 bytes Database mounted. Database opened. SQL> SQL> select protection_mode, open_mode, database_role from v$database; PROTECTION_MODE OPEN_MODE DATABASE_ROLE -------------------- -------------------- ---------------- MAXIMUM AVAILABILITY READ WRITE PRIMARY SQL>
This is information of the Primary Database (at the Production site) when Primary and Standby are in sync :
SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /opt/oracle/archivelog/ORCLCDB Oldest online log sequence 235 Next log sequence to archive 237 Current log sequence 237 SQL>
And this is at the Standby Database (at the Disaster Recovery site) :
rfs (PID:4415): Changing standby controlfile to MAXIMUM AVAILABILITY mode rfs (PID:4415): Changing standby controlfile to RESYNCHRONIZATION level rfs (PID:4415): Standby controlfile consistent with primary Recovery of Online Redo Log: Thread 1 Group 5 Seq 237 Reading mem 0 Mem# 0: /opt/oracle/oradata/STDBYDB/stdbredo02.log
My Standby database / site goes down for a short while.
LGWR (PID:4280): Failed to archive LNO:1 T-1.S-238, error=3113 LGWR (PID:4280): Error 1041 disconnecting from LAD:2 standby host 'STDBYDB' 2021-07-10T17:58:05.264661+08:00 Thread 1 advanced to log sequence 239 (LGWR switch) Current log# 2 seq# 239 mem# 0: /opt/oracle/oradata/ORCLCDB/redo02.log 2021-07-10T17:58:05.390312+08:00 ARC1 (PID:4345): Archived Log entry 524 added for T-1.S-238 ID 0xa7521ccd LAD:1 and sometime later SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /opt/oracle/archivelog/ORCLCDB Oldest online log sequence 242 Next log sequence to archive 244 Current log sequence 244 SQL> select current_scn from v$database; CURRENT_SCN ----------- 12768459 SQL>
We now know that the Standby (last applying Online Redo Sequence#237 [it went on to 238 before the Instance/Server crashed]) is lagging behind the Primary (current Redo Log 244, SCN 12768459 .
So the FAILOVER must be issued from the Standby when I bring it up. Let my try that now
SQL> startup mount; ORACLE instance started. Total System Global Area 1207955552 bytes Fixed Size 9134176 bytes Variable Size 436207616 bytes Database Buffers 754974720 bytes Redo Buffers 7639040 bytes Database mounted. SQL> create restore point before_failover guarantee flashback database; Restore point created. SQL> SQL> select protection_mode, open_mode, database_role from v$database; PROTECTION_MODE OPEN_MODE DATABASE_ROLE -------------------- -------------------- ---------------- MAXIMUM AVAILABILITY MOUNTED PHYSICAL STANDBY SQL> show parameter db_unique_name NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_unique_name string STDBYDB SQL> SQL> alter database failover to STDBYDB; Database altered. SQL> SQL> select protection_mode, open_mode, database_role from v$database; PROTECTION_MODE OPEN_MODE DATABASE_ROLE -------------------- -------------------- ---------------- MAXIMUM PERFORMANCE MOUNTED PRIMARY SQL> alter database open; Database altered. SQL>
Let's see what the alert log on the Standby shows
ARC1 (PID:6205): FAL: Error 12541 connecting to ORCLCDB for fetching gap sequence 2021-07-10T18:04:34.507622+08:00 RVWR started with pid=45, OS id=11261 2021-07-10T18:04:38.704492+08:00 Allocating 8388608 bytes in shared pool for flashback generation buffer. Allocated 5430064 bytes in shared pool for flashback generation buffer 2021-07-10T18:04:39.802983+08:00 Created guaranteed restore point BEFORE_FAILOVER 2021-07-10T18:05:50.402988+08:00 alter database failover to STDBYDB 2021-07-10T18:05:50.403086+08:00 .... (PID:6240): The Time Management Interface (TMI) is being enabled for role transition .... (PID:6240): information. This will result in messages beingoutput to the alert log .... (PID:6240): file with the prefix 'TMI: '. This is being enabled to make the timing of .... (PID:6240): the various stages of the role transition available for diagnostic purposes. .... (PID:6240): This output will end when the role transition is complete. TMI: dbsdrv failover to target BEGIN 2021-07-10 18:05:50.403742 Terminal Recovery requested in process 6240 TMI: adbdrv termRecovery BEGIN 2021-07-10 18:05:50.408509 2021-07-10T18:05:50.494984+08:00 Attempt to do a Terminal Recovery (STDBYDB) TMI: adbdrv termRecovery END 2021-07-10 18:05:50.495053 2021-07-10T18:05:50.495642+08:00 Media Recovery Start: Managed Standby Recovery (STDBYDB) Started logmerger process 2021-07-10T18:05:50.532463+08:00 PR00 (PID:11393): Managed Standby Recovery not using Real Time Apply max_pdb is 5 2021-07-10T18:05:51.094491+08:00 Parallel Media Recovery started with 2 slaves 2021-07-10T18:05:51.233396+08:00 stopping change tracking 2021-07-10T18:05:51.583393+08:00 PR00 (PID:11393): Begin: SRL archival PR00 (PID:11393): End: SRL archival PR00 (PID:11393): Terminal Recovery timestamp is '07/10/2021 18:05:51' PR00 (PID:11393): Terminal Recovery: applying standby redo logs. PR00 (PID:11393): Terminal Recovery: thread 1 seq# 238 redo required 2021-07-10T18:05:51.682050+08:00 PR00 (PID:11393): Terminal Recovery: 2021-07-10T18:05:51.685886+08:00 Recovery of Online Redo Log: Thread 1 Group 4 Seq 238 Reading mem 0 Mem# 0: /opt/oracle/oradata/STDBYDB/stdbredo01.log Terminal Recovery finished with No-Data-Loss 2021-07-10T18:05:51.985957+08:00 Incomplete Recovery applied until change 12767732 time 07/10/2021 17:57:59 2021-07-10T18:05:52.074427+08:00 Media Recovery Complete (STDBYDB) 2021-07-10T18:05:52.655358+08:00 Terminal Recovery: successful completion PR00 (PID:11393): Forcing ARSCN to IRSCN for TR SCN:0x0000000000c2d1f4 PR00 (PID:11393): Attempt to set limbo arscn SCN:0x0000000000c2d1f4 irscn SCN:0x0000000000c2d1f4 PR00 (PID:11393): Resetting standby activation ID 2807176397 (0xa7521ccd) stopping change tracking 2021-07-10T18:05:52.873082+08:00 ALTER DATABASE SWITCHOVER TO PRIMARY (STDBYDB) 2021-07-10T18:05:52.873684+08:00 Maximum wait for role transition is 15 minutes. TMI: kcv_commit_to_so_to_primary wait for MRP to finish BEGIN 2021-07-10 18:05:52.873693 TMI: kcv_commit_to_so_to_primary wait for MRP to finish END 2021-07-10 18:05:52.873771 TMI: kcv_commit_to_so_to_primary Switchover from physical BEGIN 2021-07-10 18:05:52.874148 Backup controlfile written to trace file /opt/oracle/diag/rdbms/stdbydb/STDBYDB/trace/STDBYDB_ora_6240.trc 2021-07-10T18:05:53.571015+08:00 Standby terminal recovery start SCN: 12767670 RESETLOGS after complete recovery through change 12767732 NET (PID:6240): ORL pre-clearing operation disabled by switchover Online log /opt/oracle/oradata/STDBYDB/redo01.log: Thread 1 Group 1 was previously cleared Online log /opt/oracle/oradata/STDBYDB/redo02.log: Thread 1 Group 2 was previously cleared Online log /opt/oracle/oradata/STDBYDB/redo03.log: Thread 1 Group 3 was previously cleared 2021-07-10T18:05:53.907884+08:00 Standby became primary SCN: 12767669 2021-07-10T18:05:53.908157+08:00 Setting recovery target incarnation to 4 2021-07-10T18:05:53.966494+08:00 NET (PID:6240): RT: Role transition work is not done NET (PID:6240): The Time Management Interface (TMI) is being enabled for role transition NET (PID:6240): information. This will result in messages beingoutput to the alert log NET (PID:6240): file with the prefix 'TMI: '. This is being enabled to make the timing of NET (PID:6240): the various stages of the role transition available for diagnostic purposes. NET (PID:6240): This output will end when the role transition is complete. NET (PID:6240): Redo network throttle feature is disabled at mount time 2021-07-10T18:05:54.087343+08:00 NET (PID:6240): Database role cleared from PHYSICAL STANDBY [kcvs.c:1030] Switchover: Complete - Database mounted as primary TMI: kcv_commit_to_so_to_primary Switchover from physical END 2021-07-10 18:05:54.088069 TMI: dbsdrv failover to target END 2021-07-10 18:05:54.088135 Failover completed with No-Data-Loss. Completed: alter database failover to STDBYDB 2021-07-10T18:06:33.754882+08:00 alter database open 2021-07-10T18:06:33.755022+08:00 TMI: adbdrv open database BEGIN 2021-07-10 18:06:33.754898 Ping without log force is disabled: instance mounted in exclusive mode. 2021-07-10T18:06:33.945688+08:00 Crash Recovery excluding pdb 2 which was cleanly closed. 2021-07-10T18:06:33.945807+08:00 Crash Recovery excluding pdb 4 which was cleanly closed. Endian type of dictionary set to little 2021-07-10T18:06:33.999216+08:00 Assigning activation ID 2848626420 (0xa9ca96f4) 2021-07-10T18:06:34.009104+08:00 LGWR (PID:6045): LAD:2 is UNSYNCHRONIZED 2021-07-10T18:06:34.503600+08:00 Thread 1 advanced to log sequence 2 (thread open) 2021-07-10T18:06:34.503635+08:00 ARC3 (PID:6209): Becoming the 'no SRL' ARCH 2021-07-10T18:06:34.503832+08:00 Redo log for group 2, sequence 2 is not located on DAX storage Thread 1 opened at log sequence 2 Current log# 2 seq# 2 mem# 0: /opt/oracle/oradata/STDBYDB/redo02.log Successful open of redo thread 1 2021-07-10T18:06:34.521347+08:00 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set stopping change tracking 2021-07-10T18:06:34.662361+08:00 TT03 (PID:11461): Sleep 5 seconds and then try to clear SRLs in 2 time(s) 2021-07-10T18:06:34.883461+08:00 ARC0 (PID:6199): Archived Log entry 259 added for T-1.S-1 ID 0xa9ca96f4 LAD:1 2021-07-10T18:06:35.784688+08:00 Undo initialization recovery: err:0 start: 1407469 end: 1407660 diff: 191 ms (0.2 seconds) [6240] Successfully onlined Undo Tablespace 2. Undo initialization online undo segments: err:0 start: 1407660 end: 1408339 diff: 679 ms (0.7 seconds) Undo initialization finished serial:0 start:1407469 end:1408406 diff:937 ms (0.9 seconds) Dictionary check beginning 2021-07-10T18:06:36.927538+08:00 Dictionary check complete Verifying minimum file header compatibility for tablespace encryption for pdb 1.. Verifying file header compatibility for tablespace encryption completed for pdb 1 Database Characterset is AL32UTF8 No Resource Manager plan active 2021-07-10T18:06:39.177858+08:00 TT00 (PID:6201): Error 12541 received logging on to the standby TT00 (PID:6201): Check whether the listener is up and running. 2021-07-10T18:06:39.185611+08:00 TT03 (PID:11461): Sleep 5 seconds and then try to clear SRLs in 3 time(s) 2021-07-10T18:06:39.733886+08:00 joxcsys_required_dirobj_exists: directory object exists with required path /opt/oracle/product/19c/dbhome_1/javavm/admin/, pid 6240 cid 1 replication_dependency_tracking turned off (no async multimaster replication found) 2021-07-10T18:06:42.855357+08:00 LOGSTDBY: Validating controlfile with logical metadata LOGSTDBY: Validation complete Starting background process AQPC 2021-07-10T18:06:43.050601+08:00 AQPC started with pid=50, OS id=11483 PDB$SEED(2):Autotune of undo retention is turned on. 2021-07-10T18:06:44.060956+08:00 PDB$SEED(2):Endian type of dictionary set to little 2021-07-10T18:06:44.314613+08:00 TT03 (PID:11461): Enabling archival of deferred physical standby SRLs 2021-07-10T18:06:44.532641+08:00 TT03 (PID:11461): Archived Log entry 260 added for T-1.S-238 ID 0xa7521ccd LAD:1 2021-07-10T18:06:44.681960+08:00 PDB$SEED(2):Undo initialization finished serial:0 start:1416557 end:1416557 diff:0 ms (0.0 seconds) PDB$SEED(2):Pluggable database PDB$SEED dictionary check beginning 2021-07-10T18:06:48.986311+08:00 PDB$SEED(2):Pluggable Database PDB$SEED Dictionary check complete PDB$SEED(2):Database Characterset for PDB$SEED is AL32UTF8 2021-07-10T18:07:04.127503+08:00 PDB$SEED(2):Opening pdb with no Resource Manager plan active 2021-07-10T18:07:05.037256+08:00 QPI: opatch file present, opatch QPI: qopiprep.bat file present 2021-07-10T18:07:06.501677+08:00 Starting background process CJQ0 2021-07-10T18:07:06.511870+08:00 CJQ0 started with pid=47, OS id=11538 Completed: alter database open
Note the messages
PR00 (PID:11393): Terminal Recovery: 2021-07-10T18:05:51.685886+08:00 Recovery of Online Redo Log: Thread 1 Group 4 Seq 238 Reading mem 0 Mem# 0: /opt/oracle/oradata/STDBYDB/stdbredo01.log Terminal Recovery finished with No-Data-Loss 2021-07-10T18:05:51.985957+08:00 Incomplete Recovery applied until change 12767732 time 07/10/2021 17:57:59 2021-07-10T18:05:52.074427+08:00 Media Recovery Complete (STDBYDB) 2021-07-10T18:05:52.655358+08:00 Terminal Recovery: successful completion Standby terminal recovery start SCN: 12767670 Standby became primary SCN: 12767669 Failover completed with No-Data-Loss. Completed: alter database failover to STDBYDB
The Standby completed Recovery to Sequence#238 and SCN 12767669 which are much lower then the last known ArchiveLog (Sequence#243) and SCN (something higher than 12768459).
SQL> select scn_to_timestamp(12767669) from dual; SCN_TO_TIMESTAMP(12767669) --------------------------------------------------------------------------- 10-JUL-21 05.48.25.000000000 PM SQL>
which certainly tells me that the last transaction at the Standby is as of 05:48pm (approximately, the SCN_TO_TIMESTAMP provides us an approximate time only), even though I startup the Standby at 06:04pm.
2021-07-10T18:00:12.509208+08:00 Thread 1 advanced to log sequence 244 (LGWR switch) Current log# 1 seq# 244 mem# 0: /opt/oracle/oradata/ORCLCDB/redo01.log 2021-07-10T18:00:12.634091+08:00 ARC2 (PID:4347): Archived Log entry 529 added for T-1.S-243 ID 0xa7521ccd LAD:1
So I can confirm that there IS Data Loss !
oracle19c>oerr ora 16472 16472, 0000, "failover failed due to data loss" // *Cause: The database was in MaxAvailability or MaxProtection mode but // terminal recovery finished with data loss. // *Action: For a data loss failover, issue ALTER DATABASE ACTIVATE STANDBY // DATABASE command. oracle19c>
This signifies that a FAILOVER command has not succeeded and an ALTER DATABASE ACTIVATE STANDBY DATABASE must be issued.
No comments:
Post a Comment