Blog

  • ORA-00257: Archiver Error in Oracle 19c CDB — Diagnosis and Resolution

    Background

    This post documents a real production incident where an Oracle E-Business Suite (EBS) environment became completely unresponsive due to a full Fast Recovery Area (FRA). Users were reporting that everything was either slow or hanging — a classic sign that something fundamental had broken at the database layer.

    Symptoms

    • EBS application users reporting sessions hanging or extremely slow response
    • sqlplus apps/<password> failing with:
    ORA-00257: Archiver error. Connect AS SYSDBA only until resolved.
    • Non-SYSDBA connections completely blocked
    • 30 FNDLIBR (Concurrent Manager) processes running on the application server — higher than expected for a QA environment

    Environment

    • Oracle Database 19c (CDB/PDB architecture)
    • PDB: Application database PDB
    • EBS 12.2 on AIX
    • FRA configured on ASM diskgroup (+RECO), size 4095 GB
    • No explicit log_archive_dest — archivelogs defaulting to FRA

    Step 1 — Identify the Root Cause

    Connected to the CDB as SYSDBA and confirmed the error:

    SHOW PARAMETER db_recovery_file_dest;
    NAME                        TYPE        VALUE
    --------------------------- ----------- ------
    db_recovery_file_dest       string      +RECO
    db_recovery_file_dest_size  big integer 4095G

    Checked FRA usage:

    COLUMN name           FORMAT A10
    COLUMN limit_gb       FORMAT 999,999.99 HEADING 'LIMIT GB'
    COLUMN used_gb        FORMAT 999,999.99 HEADING 'USED GB'
    COLUMN reclaimable_gb FORMAT 999,999.99 HEADING 'RECLAIMABLE GB'
    COLUMN number_of_files FORMAT 99999     HEADING 'FILES'
    
    SELECT name,
           ROUND(space_limit/1024/1024/1024,2)       limit_gb,
           ROUND(space_used/1024/1024/1024,2)        used_gb,
           ROUND(space_reclaimable/1024/1024/1024,2) reclaimable_gb,
           number_of_files
    FROM   v$recovery_file_dest;

    Output:

    NAME         LIMIT GB     USED GB  RECLAIMABLE GB  FILES
    ---------- ----------- ----------- -------------- ------
    +RECO         4,095.00    4,075.65            .00   2426

    FRA was at 99.5% — 4,075 GB used out of 4,095 GB, zero reclaimable, 2,426 archivelog files.

    This was the root cause. With the FRA full and nothing reclaimable, the archiver (ARCn) process could not write new archive logs, blocking all database activity.

    Step 2 — Assess Backup Status Before Taking Action

    Before deleting any archivelogs, it is critical to understand the backup posture. Blindly deleting archivelogs without knowing the backup state can leave the database unrecoverable.

    Started an RMAN session and ran a crosscheck first:

    rman target /
    RMAN> crosscheck archivelog all;

    All 2,416 objects returned “validation succeeded” — no expired logs. This meant the FRA was genuinely full of valid, undeleted archivelogs.

    Next, checked backup history:

    RMAN> list backup summary;

    Backups were listed daily going back several weeks — however, on closer inspection:

    SELECT session_key,
           TO_CHAR(start_time,'DD-MON-YY HH24:MI') start_time,
           TO_CHAR(end_time,'DD-MON-YY HH24:MI')   end_time,
           status,
           input_bytes,
           output_bytes
    FROM   v$rman_backup_job_details
    ORDER  BY start_time DESC
    FETCH FIRST 10 ROWS ONLY;

    Output revealed a critical finding:

    SESSION_KEY  START_TIME         END_TIME           STATUS    INPUT_BYTES  OUTPUT_BYTES
    -----------  -----------------  -----------------  --------  -----------  ------------
          19986  30-APR-26 01:00    30-APR-26 01:00    COMPLETED           0         98304
          19904  29-APR-26 01:00    29-APR-26 01:00    COMPLETED           0         98304
    • INPUT_BYTES = 0 — no datafiles were being backed up
    • OUTPUT_BYTES = 98304 (96 KB) — only a controlfile/SPFILE autobackup
    • Jobs completing in under a minute — impossible for a real full DB backup

    Additionally:

    RMAN> list backup of archivelog all;
    -- specification does not match any backup in the repository

    No archivelog backups existed at all. The scheduled backup job was not performing proper datafile or archivelog backups — a separate issue to be addressed post-incident.

    Step 3 — Escalation and Approval

    Given that there were no archivelog backups and the datafile backup was questionable, the decision to delete archivelogs was escalated for Tier 3 approval. Approval was obtained with the instruction to start conservatively: delete older than 15 days first, then 7 days if needed.

    Step 4 — Resolution

    First attempt — delete older than 15 days:

    RMAN> delete noprompt archivelog until time 'SYSDATE-15';

    RMAN returned warnings:

    RMAN-08138: warning: archived log not deleted - must create more backups

    This is because the RMAN retention policy was blocking deletion of logs that had never been backed up. With Tier 3 approval, used the force option to override:

    RMAN> delete noprompt force archivelog until time 'SYSDATE-15';

    This ran successfully, deleting archivelogs from January through late April.

    FRA usage after deletion:

    USED_GB  RECLAIMABLE_GB  NUMBER_OF_FILES
    -------  --------------  ---------------
     246.71               0              149

    Over 3,800 GB freed — FRA dropped from 99.5% to ~6%.

    Step 5 — Verification

    Forced a log switch and verified archiver status:

    ALTER SYSTEM ARCHIVE LOG CURRENT;
    
    SELECT dest_id, status, error
    FROM   v$archive_dest
    WHERE  status != 'INACTIVE';

    Output:

    DEST_ID  STATUS    ERROR
    -------  --------  -----
          1  VALID

    Archiver resumed successfully. Application connectivity was restored and users confirmed sessions were working normally.

    Redo Log Status Check

    As part of the verification, also confirmed redo log health:

    SELECT thread#,
           sequence#,
           TO_CHAR(first_time,'DD-MON-YY HH24:MI:SS') first_time,
           first_change#,
           archived,
           status
    FROM   v$log
    ORDER  BY thread#, sequence#;
    THREAD#  SEQUENCE#  FIRST_TIME                FIRST_CHANGE#  ARC  STATUS
    -------  ---------  ------------------------  -------------  ---  -------
          1       2417  05-MAY-26 01:56:44         1.7867E+10    NO   INACTIVE
          1       2418  05-MAY-26 03:27:20         1.7867E+10    NO   INACTIVE
          1       2419  05-MAY-26 07:57:10         1.7868E+10    NO   CURRENT

    No stuck or unarchived redo logs — database in healthy state.

    Incident Summary

    Item Detail
    Error ORA-00257: Archiver error
    Root Cause FRA (+RECO ASM diskgroup) at 99.5% capacity
    FRA Before 4,075 GB used / 2,426 files
    FRA After 246 GB used / 149 files
    Action Taken delete noprompt force archivelog until time 'SYSDATE-15'
    Space Freed ~3,829 GB
    Resolution Time ~45 minutes from identification to restoration

    Key Lessons Learned

    1. Monitor FRA Proactively

    Set up OEM 13c threshold alerts on FRA usage at 70% and 85%. Do not wait for ORA-00257 to discover the problem.

    SELECT ROUND(space_used/space_limit*100,2) pct_used
    FROM   v$recovery_file_dest;

    Alert when this exceeds 80%.
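    For unattended monitoring, a small cron-driven script can run the same query and mail an alert. The sketch below is illustrative only: the SID, threshold, and recipient are placeholders, and it assumes a local SYSDBA connection and a working mailx.

    #!/bin/bash
    # fra_usage_check.sh - minimal sketch: mail an alert when FRA usage crosses a threshold
    export ORACLE_SID=PRODCDB                          # placeholder SID
    export ORACLE_HOME=/u01/app/oracle/product/19c/db_1
    export PATH=$ORACLE_HOME/bin:$PATH
    THRESHOLD=80
    MAILTO="dba-team@example.com"                      # placeholder recipient

    # pull the FRA usage percentage as a bare number
    PCT=$(printf "set heading off feedback off pages 0\nselect round(space_used/space_limit*100) from v\$recovery_file_dest;\nexit;\n" \
          | sqlplus -s / as sysdba | tr -d '[:space:]')

    if [ "${PCT:-0}" -gt "$THRESHOLD" ]; then
      echo "FRA usage is ${PCT}% on $(hostname)" | mailx -s "FRA usage alert: ${PCT}%" "$MAILTO"
    fi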

    2. Always Check Backup Status Before Deleting Archivelogs

    In this incident, the backup job appeared to be running daily but was only backing up the controlfile (INPUT_BYTES=0). This is a serious gap — verify actual backup content, not just job status.

    3. Investigate the Backup Job

    A proper RMAN backup script for a 19c CDB should include:

    BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
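    As a sketch only (the channel count, destination format, and retention handling below are placeholders, not the script from this incident), a nightly disk-based job could look like:

    RUN {
      # placeholder destination; adjust the format string and channel count to your environment
      ALLOCATE CHANNEL c1 DEVICE TYPE DISK FORMAT '/backup/%d_%T_%U';
      # datafiles plus archivelogs; DELETE INPUT purges the logs once they are backed up
      BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
      BACKUP CURRENT CONTROLFILE;
      # remove whatever the configured retention policy marks as obsolete
      DELETE NOPROMPT OBSOLETE;
      RELEASE CHANNEL c1;
    }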

    4. Consider an Archivelog Deletion Policy

    If archivelog backups are not being taken, configure an RMAN retention policy to prevent FRA accumulation:

    CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;

    5. FRA Sizing Review

    Even at 4095 GB, the FRA filled up due to months of uncleaned archivelogs. Review archivelog generation rate and size FRA accordingly, or implement regular cleanup.
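    To measure the generation rate, a query along these lines against v$archived_log gives the daily archivelog volume (adjust the window as needed):

    SELECT TRUNC(completion_time)                          day,
           COUNT(*)                                        logs,
           ROUND(SUM(blocks*block_size)/1024/1024/1024,2)  gb_generated
    FROM   v$archived_log
    WHERE  completion_time > SYSDATE - 30
    GROUP  BY TRUNC(completion_time)
    ORDER  BY 1;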

    Conclusion

    ORA-00257 is one of those errors that brings an entire EBS environment to its knees instantly. The fix itself is straightforward — free up FRA space — but the investigation matters. Rushing to delete archivelogs without understanding the backup posture can result in an unrecoverable database.

    In this case, careful investigation revealed a deeper issue with the backup job that would have gone unnoticed had we not looked. Always verify, always escalate, and always get approval before deleting recovery-critical files.


    Syed Anwar Ahmed is an Oracle Apps DBA with over 11 years of production experience across Oracle EBS, Database, RAC, GoldenGate, and OEM environments. He writes about real-world Oracle incidents at syedanwarahmedoracle.blog.

  • Applying Oracle 19c Apr 2026 CPU Patch (19.31) on a Single Instance CDB

    In this article, we are going to demonstrate the steps to apply the Oracle April 2026 Critical Patch Update (CPU) on a Single Instance Oracle 19c Container Database (CDB). Here we will apply combo patch 39062931 which contains 39034528 (Database Apr 2026 Release Update 19.31.0.0.260421) and 38906621 (OJVM Component Release Update 19.31.0.0.260421). We also cover an important real-world scenario where a previously applied overlay patch 34672698 (ORA-00800 fix) conflicted with the Apr 2026 RU — confirming that the fix is now natively included in the RU itself.

    Follow the high-level plan to apply patch 39062931 (19.31.0.0.260421).

    • Go through the README documents of patch IDs before proceeding with patch apply.
    • Download the required patches from My Oracle Support.
    • Upgrade OPatch utility to the latest version as required by the patch README.
    • Perform pre-checks at CDB and PDB level — invalid objects, patch history, registry components.
    • Take Oracle Home TAR backup.
    • Blackout in OEM and disable crontab jobs before maintenance.
    • Shutdown database and listener.
    • Apply DB Release Update (39034528).
    • Apply OJVM Release Update (38906621).
    • Validate overlay patch (34672698) — check if still required post RU upgrade.
    • Start database and listener.
    • Execute datapatch on the database and validate.
    • Uncomment crontab and remove OEM blackout.
    • Perform post-checks and compare with pre-check data.

    Environment Details:

    COMPONENTS DETAILS
    Oracle Home /u01/app/oracle/product/19c/db_1
    DB Version 19.30.0.0.0 (Jan 2026 RU)
    OPatch Version (Before) 12.2.0.1.49
    OPatch Version (After) 12.2.0.1.51
    DB Type Single Instance, Multitenant (CDB+PDB)
    CDB Name DEVCDB
    PDBs DEVPDB1, DEVPDB2
    OS Linux x86-64
    Combo Patch 39062931 (19.31.0.0.260421)
    DB RU Patch 39034528
    OJVM Patch 38906621
    Overlay Patch 34672698 (validated — not reapplied)

    1. Pre-Checks

    1.1: Capture invalid objects at CDB level.

    sqlplus / as sysdba
    
    Set lines 230
    set pages 100
    col OBJECT_NAME for a40
    col OBJECT_TYPE for a40
    col OWNER for a20
    select OBJECT_NAME, OBJECT_TYPE, OWNER,
           to_char(LAST_DDL_TIME,'DD-MON-YYYY HH24:MI:SS'), status
    from dba_objects
    where status='INVALID'
    order by owner;
    no rows selected

    1.2: Capture patch history from dba_registry_sqlpatch.

    select patch_id, action, status, action_time, description
    from dba_registry_sqlpatch;
    PATCH_ID ACTION STATUS ACTION_TIME DESCRIPTION
    38523609 APPLY SUCCESS 03-FEB-26 OJVM RELEASE UPDATE: 19.30.0.0.260120
    38632161 APPLY SUCCESS 03-FEB-26 Database Release Update: 19.30.0.0.260120

    44 rows selected.

    1.3: Capture DB registry components.

    SELECT COMP_ID, comp_name, version, status FROM dba_registry;
    COMP_ID COMP_NAME VERSION STATUS
    CATALOG Oracle Database Catalog Views 19.30.0.0.0 VALID
    CATPROC Oracle Database Packages and Types 19.30.0.0.0 VALID
    JAVAVM JServer JAVA Virtual Machine 19.30.0.0.0 VALID
    XML Oracle XDK 19.30.0.0.0 VALID
    XDB Oracle XML Database 19.30.0.0.0 VALID
    ORDIM Oracle Multimedia 19.30.0.0.0 VALID
    CONTEXT Oracle Text 19.30.0.0.0 VALID
    … 15 rows total. All components VALID.

    1.4: Connect to each PDB and capture invalid objects as baseline.

    alter session set container=DEVPDB1;
    select OBJECT_NAME, OBJECT_TYPE, OWNER, status
    from dba_objects where status='INVALID'
    order by owner;
    📌 Note: Document all pre-existing invalid objects at PDB level before patching. These serve as a baseline for post-patch comparison. Do not attempt to fix pre-existing invalid objects before patching.
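    One way to keep that baseline is to spool the result per PDB to a file; the spool path below is just a placeholder:

    alter session set container=DEVPDB1;
    spool /u01/app/oracle/patches/APR_2026/invalid_objects_DEVPDB1_pre.lst
    select owner, object_name, object_type, status
    from   dba_objects
    where  status = 'INVALID'
    order  by owner, object_name;
    spool off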

    1.5: Capture current patch inventory from OS.

    [oracle@dbhost ~]$ opatch lspatches
    PATCH_ID DESCRIPTION
    34672698 ORA-00800 SOFT EXTERNAL ERROR, ARGUMENTS [SET PRIORITY FAILED], [VKTM], DISM(16)
    38523609 OJVM RELEASE UPDATE: 19.30.0.0.260120 (38523609)
    38632161 Database Release Update: 19.30.0.0.260120 (38632161)
    29585399 OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
    OPatch succeeded.

    2. Download Patches

    2.1: Create a dedicated directory for April 2026 patches.

    mkdir -p /u01/app/oracle/patches/APR_2026
    cd /u01/app/oracle/patches/APR_2026

    2.2: Download OPatch (patch 6880880).

    ⚠️ Important: Always download the Linux-x86-64 version of OPatch, not the generic LINUX (32-bit) version. The 32-bit LINUX zip will cause oui/lib/linux directory errors during patch apply.
    wget --user=<MOS_EMAIL> --password='<MOS_PASSWORD>' --no-check-certificate -c \
         "<DOWNLOAD_URL_FROM_MOS>" -O p6880880_190000_Linux-x86-64.zip

    2.3: Download Combo Patch and Overlay Patch.

    wget --user=<MOS_EMAIL> --password='<MOS_PASSWORD>' --no-check-certificate -c \
         "<DOWNLOAD_URL_FROM_MOS>" -O p39062931_190000_Linux-x86-64.zip
    
    wget --user=<MOS_EMAIL> --password='<MOS_PASSWORD>' --no-check-certificate -c \
         "<DOWNLOAD_URL_FROM_MOS>" -O p34672698_1930000DBRU_Linux-x86-64.zip

    2.4: Unzip patches and verify structure.

    unzip p39062931_190000_Linux-x86-64.zip
    unzip p34672698_1930000DBRU_Linux-x86-64.zip
    
    ls -l 39062931/
    drwxr-x---. 4 oracle oinstall  6144 38906621    -- OJVM RU
    drwxr-x---. 4 oracle oinstall  6144 39034528    -- DB RU
    -rw-r--r--. 1 oracle oinstall   118 README.html

    3. Upgrade OPatch

    3.1: Check current OPatch version.

    [oracle@dbhost ~]$ $ORACLE_HOME/OPatch/opatch version
    OPatch Version: 12.2.0.1.49
    OPatch succeeded.

    The April 2026 RU requires OPatch 12.2.0.1.51 or higher. We need to upgrade.

    3.2: Backup and replace OPatch.

    mv $ORACLE_HOME/OPatch $ORACLE_HOME/OPatch_old_12.2.0.1.49
    unzip p6880880_190000_Linux-x86-64.zip -d $ORACLE_HOME

    3.3: Verify new OPatch version.

    [oracle@dbhost ~]$ $ORACLE_HOME/OPatch/opatch version
    OPatch Version: 12.2.0.1.51
    OPatch succeeded.
    💡 Tip: If OPatch fails with Invalid maximum heap size: -Xmx5120m, set the following before retrying:

    export OPATCH_JRE_MEMORY_OPTIONS="-Xmx1024m"

    Add this to ~/.bash_profile to make it permanent.


    4. Pre-Patching Activities

    4.1: Comment out crontab entries to prevent jobs from running during maintenance.

    [oracle@dbhost ~]$ (/usr/bin/crontab -l | /bin/sed 's/^/##APR_2026##&/g' | /usr/bin/crontab)

    4.2: Place OEM blackout if OEM is configured for this target.

    📌 Note: If the environment has application-level database triggers, disable them before shutting down the database. Consult the application team before making any trigger changes. Re-enable them after patching is complete.
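    To identify such triggers, a dictionary query like the one below (run in each PDB) lists enabled database- and schema-level triggers for review with the application team:

    select owner, trigger_name, base_object_type, triggering_event, status
    from   dba_triggers
    where  base_object_type in ('DATABASE','SCHEMA')
    and    status = 'ENABLED'
    order  by owner, trigger_name;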

    5. Shutdown Database and Listener

    5.1: Shutdown the database and listener, and take an Oracle Home TAR backup.

    sqlplus / as sysdba
    SQL> shutdown immediate;
    Database closed.
    Database dismounted.
    ORACLE instance shut down.
    SQL> exit;
    
    lsnrctl stop LISTENER
    
    # Oracle Home TAR backup (run in background)
    nohup tar -zcvf /u01/app/oracle/patches/oracle_home_APR2026.tar.gz $ORACLE_HOME &

    6. Apply DB Release Update (39034528)

    6.1: Set PATH and run the prerequisite conflict check.

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /u01/app/oracle/patches/APR_2026/39062931/39034528
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    ZOP-47: Superset Patch 39034528 has Subset Patch 38632161 which has overlay patches
    [34672698] and these overlay patches conflict with Superset Patch.
    OPatch failed with error code 73
    ⚠️ Action Required: The prereq check identified that overlay patch 34672698 conflicts with the new DB RU. Roll it back before applying the DB RU.

    6.2: Rollback overlay patch 34672698.

    opatch rollback -id 34672698
    Patch 34672698 successfully rolled back.
    OPatch succeeded.

    6.3: Re-run prereq check after rollback.

    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    Prereq "checkConflictAgainstOHWithDetail" passed.
    OPatch succeeded.

    6.4: Apply the DB Release Update.

    opatch apply
    Patch 39034528 successfully applied.
    Sub-set patch [38632161] has become inactive due to the application of a super-set patch [39034528].
    OPatch succeeded.

    6.5: Verify the patch is applied.

    opatch lsinventory | grep 39034528
    39034528  Database Release Update : 19.31.0.0.260421

    7. Apply OJVM Release Update (38906621)

    7.1: Run prereq check and apply the OJVM patch.

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /u01/app/oracle/patches/APR_2026/39062931/38906621
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    Prereq "checkConflictAgainstOHWithDetail" passed.
    OPatch succeeded.
    
    opatch apply
    Patch 38906621 successfully applied.
    Sub-set patch [38523609] has become inactive due to the application of a super-set patch [38906621].
    OPatch succeeded.

    7.2: Verify both patches are listed.

    [oracle@dbhost ~]$ opatch lspatches
    PATCH_ID DESCRIPTION
    39034528 Database Release Update: 19.31.0.0.260421 (39034528)
    38906621 OJVM RELEASE UPDATE: 19.31.0.0.260421 (38906621)
    29585399 OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
    OPatch succeeded.

    8. Overlay Patch (34672698) — Conflict with Apr 2026 RU

    The overlay patch 34672698 was previously applied on top of the Jan 2026 RU to fix ORA-00800: SOFT EXTERNAL ERROR, ARGUMENTS: [SET PRIORITY FAILED], [VKTM], DISM(16). It was rolled back in Step 6. Before reapplying, always run the prereq check.

    📌 Note on oradism permissions: Some environments require a temporary oradism ownership change to oracle before applying the overlay patch. Always check the current permissions first:

    ls -l $ORACLE_HOME/bin/oradism

    If oradism is already oracle-owned (e.g., -rwxr-x---. 1 oracle oinstall), no root permission change is required. Only apply the chown/chmod steps if oradism is root-owned with setuid bit.

    8.1: Run prereq check before reapplying the overlay patch.

    cd /u01/app/oracle/patches/APR_2026/34672698
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    
    ZOP-40: The patch(es) has conflicts with other patches installed in the Oracle Home.
    Following patches have conflicts: 34672698, 39034528
    OPatch failed with error code 73
    ✔ Expected Result: The prereq confirms overlay patch 34672698 conflicts with Apr 2026 DB RU (39034528). Oracle has included the ORA-00800 fix natively in the Apr 2026 RU. The overlay patch is no longer required and must NOT be reapplied.
    ⚠️ Important: This is expected behaviour when upgrading RUs. An overlay patch applied on a previous RU may become redundant when the next RU natively includes the same fix. Always run prereq check before reapplying overlay patches after an RU upgrade.

    8.2: Verify the overlay patch is absent from inventory.

    opatch lsinventory | grep 34672698
    [oracle@dbhost 34672698]$
    ✔ Confirmed: No output from the grep confirms the overlay patch is correctly absent from the Oracle Home.

    9. Start Database and Run Datapatch

    9.1: Start the database and open all PDBs.

    sqlplus / as sysdba
    SQL> startup;
    ORACLE instance started.
    Database mounted.
    Database opened.
    
    SQL> alter pluggable database all open;
    Pluggable database altered.
    
    SQL> exit;
    
    lsnrctl start LISTENER

    9.2: Run datapatch to apply the SQL portion of the patches.

    cd $ORACLE_HOME/OPatch
    ./datapatch -verbose
    SQL Patching tool version 19.31.0.0.0 Production on Tue May 5 13:17:39 2026
    
    Installation queue:
      For CDB$ROOT PDB$SEED DEVPDB1 DEVPDB2:
        Rollback: 38523609 (OJVM 19.30)
        Apply: 39034528 (DB RU 19.31)
        Apply: 38906621 (OJVM 19.31)
    
    Patch 38523609 rollback (CDB$ROOT): SUCCESS
    Patch 39034528 apply (CDB$ROOT): SUCCESS
    Patch 38906621 apply (CDB$ROOT): SUCCESS
    Patch 38523609 rollback (DEVPDB1): SUCCESS
    Patch 39034528 apply (DEVPDB1): SUCCESS
    Patch 38906621 apply (DEVPDB1): SUCCESS
    Patch 38523609 rollback (DEVPDB2): SUCCESS
    Patch 39034528 apply (DEVPDB2): SUCCESS
    Patch 38906621 apply (DEVPDB2): SUCCESS
    
    SQL Patching tool complete on Tue May 5 13:24:18 2026
    📌 Note: Datapatch applies the SQL changes to both CDB and all open PDBs. Wait for it to complete fully before proceeding. Do not interrupt the process.

    10. Post-Patching Activities

    10.1: Uncomment crontab entries.

    [oracle@dbhost ~]$ (/usr/bin/crontab -l | /bin/sed 's/##APR_2026##//g' | /usr/bin/crontab)

    10.2: Remove OEM blackout if applicable.

    📌 Note: If application-level database triggers were disabled before patching, re-enable them now after confirming datapatch completed successfully. Consult the application team before enabling.

    11. Post-Checks

    11.1: Verify patch registry — all entries should show SUCCESS.

    select patch_id, action, status, action_time, description
    from dba_registry_sqlpatch
    order by action_time;
    PATCH_ID ACTION STATUS ACTION_TIME DESCRIPTION
    38523609 ROLLBACK SUCCESS 05-MAY-26 01:20:47 OJVM RELEASE UPDATE: 19.30.0.0.260120
    38906621 APPLY SUCCESS 05-MAY-26 01:21:02 OJVM RELEASE UPDATE: 19.31.0.0.260421
    39034528 APPLY SUCCESS 05-MAY-26 01:21:56 Database Release Update: 19.31.0.0.260421

    11.2: Verify all DB registry components are VALID.

    SELECT COMP_ID, comp_name, version, status FROM dba_registry;
    COMP_ID COMP_NAME VERSION STATUS
    CATALOG Oracle Database Catalog Views 19.31.0.0.0 VALID
    CATPROC Oracle Database Packages and Types 19.31.0.0.0 VALID
    JAVAVM JServer JAVA Virtual Machine 19.31.0.0.0 VALID
    XML Oracle XDK 19.31.0.0.0 VALID
    XDB Oracle XML Database 19.31.0.0.0 VALID
    ORDIM Oracle Multimedia 19.31.0.0.0 VALID
    CONTEXT Oracle Text 19.31.0.0.0 VALID
    … 15 rows total. All components VALID.

    11.3: Confirm final patch inventory.

    [oracle@dbhost ~]$ opatch lspatches
    PATCH_ID DESCRIPTION
    39034528 Database Release Update: 19.31.0.0.260421 (39034528)
    38906621 OJVM RELEASE UPDATE: 19.31.0.0.260421 (38906621)
    29585399 OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
    OPatch succeeded.
    ✔ Confirmed: Overlay patch 34672698 is no longer listed. Its fix is now natively included in the Apr 2026 RU (39034528). This is the expected and correct final state.

    11.4: Compare invalid objects at CDB and PDB level with pre-patch baseline. Compile any new invalid objects introduced by the patch:

    @$ORACLE_HOME/rdbms/admin/utlrp.sql
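    After utlrp completes, a quick count by owner in each container makes the comparison with the pre-patch baseline straightforward:

    select owner, count(*) invalid_count
    from   dba_objects
    where  status = 'INVALID'
    group  by owner
    order  by owner;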

    Key MOS References

    Document Description
    KB869205 Oracle Database 19c Apr 2026 RU Known Issues
    KB137197 OJVM Conditional Rolling Install Details
    244241.1 OPatch Support for RAC Rolling Patches
    293369.1 OPatch Documentation List

    Disclaimer: All server names, hostnames, database names, and environment-specific paths in this post have been anonymized. Steps and outputs are based on real production experience adapted for general use. Always test in a non-production environment before applying patches to production systems.

    If you found this useful, connect with me on LinkedIn or explore more Oracle DBA scripts on my GitHub. More patching walkthroughs at syedanwarahmedoracle.blog.


    💬 Share Your Experience

    Have you applied Oracle 19c CPU patches and hit a similar overlay patch conflict? Or found that a fix you applied months ago is now natively bundled into the next RU? Your real-world experiences help fellow DBAs tackle the same challenges. Drop a comment below — questions, observations, and feedback are always welcome.

  • P1 Incident: /dbbackup Filesystem 100% Full — How We Traced, Fixed, and Recovered Two Failed RMAN Backups in One Night

    🔴 Incident Overview

    Severity P1 — Two production database backups failed
    Environment Oracle 19c (19.30) on Linux x86-64
    Backup Tool RMAN with Recovery Catalog
    Backup Volume /dbbackup — 1TB LVM filesystem
    Databases Affected DBPRO01 (12:30 failure), DBPRO02 (22:30 failure)
    Total Space Recovered ~492G (disk went from 100% → 52%)

    1. The Alerts — Two Failures, Same Root Cause

    It started with an RMAN failure at 22:30. The backup script for DBPRO02 fired on schedule and died within 2 minutes. The RMAN log told the story clearly:

    RMAN-03009: failure of backup command on c4 channel at 22:32:13
    ORA-19502: write error on file "/dbbackup/DBPRC02/rman/DiffInc_DBPRC02_4u4mi8bf"
    ORA-27072: File I/O error
    Additional information: 4
    

    Three more channels followed — c1, c2, c3 — all crashing at exactly 22:32:48. When multiple channels fail simultaneously at the same timestamp, it almost always means one thing: the destination filesystem just hit 100%.

    A quick check confirmed it:

    $ df -hP /dbbackup
    Filesystem                        Size  Used Avail Use%
    /dev/mapper/orabkupvg-orabkuplv1 1023G 1020G  3.3G 100%
    

    1TB volume. 3.3G free. Completely full.

    What we did not know yet — digging into backup history would reveal that DBPRO01 had already failed at 12:30 that same day for the same reason, 10 hours earlier. Two databases unprotected on the same night.


    2. The Investigation — Folder by Folder

    The first step was understanding what was consuming the disk. One command gave us the top-level picture:

    $ du -sh /dbbackup/*
    744G    DBPRC01
    152G    ColdBackup_11April2026
     41G    DBPRC02
     26G    DBPRC03
     15G    DBPRO01
    7.7G    JAN2026_CPU
    7.7G    OCT2025_CPU
    5.3G    infra_arch
    

    744G inside DBPRC01 alone — 73% of the entire disk. That was our primary suspect.

    Drilling into DBPRC01

    $ du -sh /dbbackup/DBPRC01/rman/* | sort -rh | head -10
    7.5G    DiffInc_DBPRC01_fl4lumij
    7.5G    DiffInc_DBPRC01_eg4lrnh5
    7.5G    DiffInc_DBPRC01_b44lc7uh
    ...
    

    Every single file was a DiffInc_ or ArchivelogAll_ backup piece. No variety. No cleanup. Just backup after backup piling up.

    $ ls /dbbackup/DBPRC01/rman/ | wc -l
    1522
    

    1,522 backup pieces. We checked the oldest and newest:

    Oldest file on disk:  2022-05-07
    Newest file on disk:  2026-04-25
    

    Four years of backup files on disk — or so we thought.


    3. The RMAN Investigation — Where Things Got Interesting

    We connected RMAN to the database and ran the retention check:

    RMAN> SHOW RETENTION POLICY;
    CONFIGURE RETENTION POLICY TO REDUNDANCY 30;
    

    REDUNDANCY 30. This tells RMAN to keep the last 30 complete backup copies of every datafile before considering anything obsolete.

    Next logical step — check what RMAN considers obsolete:

    RMAN> REPORT OBSOLETE;
    no obsolete backups found
    

    Nothing? With 1,522 files on disk?

    We ran CROSSCHECK BACKUP — all 1,693 objects came back AVAILABLE. Then we checked the actual date range RMAN was tracking from the database control file:

    SELECT TO_CHAR(MIN(completion_time),'DD-MON-YYYY') oldest,
           TO_CHAR(MAX(completion_time),'DD-MON-YYYY') newest,
           COUNT(*) total_pieces
    FROM v$backup_piece_details
    WHERE status = 'A';
    
    OLDEST          NEWEST          TOTAL_PIECES
    03-DEC-2025     25-APR-2026     1541
    

    The control file only tracks pieces from December 2025 onwards — about 5 months. The 2022/2023 files seen on disk were old directories and scripts, not backup pieces. All 1,541 current pieces were legitimate and RMAN considered every one of them necessary under REDUNDANCY 30.

    This was the key insight: RMAN was not broken. The retention policy itself was the problem.


    4. Root Cause — The Architecture Trap

    The deeper investigation revealed something unexpected. Looking at the actual RMAN backup script:

    connect target rman/password@DBPRO01
    ...
    format '/dbbackup/DBPRC01/rman/DiffInc_%d_%u'
    (database);
    ...
    delete obsolete;
    

    DBPRO01 (the production database) was backing up INTO the DBPRC01 directory. The directory names suggested one database but contained another database’s backups entirely. The naming convention was PRO to PRC — production database backups stored in the production-copy directory.

    This pattern existed for all three database pairs on the server. Each production database backed up into its corresponding copy directory.

    The delete obsolete command was in the script — but with REDUNDANCY 30 and weekly Level 0 backups, obsolete only kicks in after 30 complete Level 0 cycles. That is 30 weeks = 7.5 months of retention. Since the current tracking window was only 5 months, delete obsolete ran every night and found absolutely nothing to delete.

    The math:

    Retention policy REDUNDANCY 30
    Level 0 frequency Weekly (Sundays)
    Effective retention period ~30 weeks / 7.5 months
    Backup tracking since December 2025 (~5 months)
    Result delete obsolete finds nothing — ever
    Daily backup size ~7–7.5G per run
    Total accumulated 744G

    Adding fuel to the fire — the patching activity on April 18 triggered an extra Level 0 backup, followed by the regular Sunday Level 0 on April 19. Two large Level 0 runs (~27G each) within 24 hours wrote the final ~54G that pushed the disk over the edge.


    5. Secondary Findings During Investigation

    OCT2025 CPU Patch Artifacts (7.7G)

    The October 2025 CPU patch files (zip archives + extracted directories) were still sitting in /dbbackup/OCT2025_CPU/. A quick OPatch check confirmed the database had since been patched to 19.30 (January 2026 RU) — the October 2025 patches were fully superseded and rolled back from inventory. Safe to delete immediately.

    $ $ORACLE_HOME/OPatch/opatch lsinventory | grep -E "38291812|38194382"
    # Empty — neither Oct 2025 patch in inventory anymore
    

    5-Year-Old Pre-Migration Export Dumps

    Three directories contained Oracle 11.2.0.4 export dumps from January–March 2021 — taken before the migration to 19c. With the database now running 19.30, these had zero recovery value but occupied ~14G collectively. Flagged for manager approval before deletion.

    Recovery Catalog Version Mismatch

    The original RMAN log flagged this warning:

    PL/SQL package RMAN.DBMS_RCVCAT version 19.11.00.00 in RCVCAT database is not current
    PL/SQL package RMAN.DBMS_RCVMAN version 19.11.00.00 in RCVCAT database is not current
    

    The recovery catalog is running 19.11 packages while the RMAN client is now 19.30. Non-critical tonight but requires UPGRADE CATALOG in the next maintenance window.


    6. The Fix — Emergency Space Recovery

    With management approval obtained, we executed a time-based delete — keeping the last 30 days of backups and removing everything older:

    RMAN> DELETE NOPROMPT BACKUP COMPLETED BEFORE 'SYSDATE-30';
    

    This command does three things atomically:

    1. Queries catalog/controlfile for all pieces completed before the cutoff date
    2. Deletes the physical files from disk
    3. Removes the records from RMAN catalog — no orphaned entries, no catalog drift
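    A non-destructive preview with LIST (same clause, nothing deleted) is a sensible safeguard before running the DELETE, as also noted in the cheat sheet at the end of this post:

    RMAN> LIST BACKUP COMPLETED BEFORE 'SYSDATE-30';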

    The output scrolled for several minutes:

    deleted backup piece
    backup piece handle=/dbbackup/DBPRC01/rman/DiffInc1_DBPRC01_6u4jaaih ...
    deleted backup piece
    backup piece handle=/dbbackup/DBPRC01/rman/ArchivelogAll_DBPRC01_784jab6o ...
    ...
    Deleted 1072 objects
    

    1,072 backup pieces deleted. Catalog updated. Disk checked:

    BEFORE:  Used 1020G  Avail 3.3G  (100%)
    AFTER:   Used  536G  Avail 488G   (53%)
    

    Then the OCT2025_CPU directory was removed:

    $ rm -rf /dbbackup/OCT2025_CPU/
    $ df -hP /dbbackup
    Used 528G  Avail 495G  (52%)
    

    Final result: 495G free. Disk at 52%.

    Both failed backups were re-submitted immediately and ran successfully in parallel:

    $ nohup sh /opt/oracle/scripts/rman/rman_backup_DBPRO01.sh &
    $ nohup sh /opt/oracle/scripts/rman/rman_backup_DBPRO02.sh &
    
    $ jobs -l
    [1] Running   nohup sh ...rman_backup_DBPRO01.sh &
    [2] Running   nohup sh ...rman_backup_DBPRO02.sh &
    

    7. Incident Timeline

    12:30 DBPRO01 Level 0 backup fails — ORA-19502/ORA-27072 (disk full)
    22:30 DBPRO02 Level 0 backup fails — same errors, all 4 channels
    23:08 Investigation begins — df -hP /dbbackup confirms 100% full
    23:15 DBPRC01 directory identified as 744G consumer
    23:25 RMAN connected — REDUNDANCY 30 discovered
    23:35 Architecture confirmed — PRO databases backing up into PRC directories
    23:45 Root cause confirmed — 7.5-month retention, delete obsolete finds nothing
    23:50 DELETE BACKUP COMPLETED BEFORE SYSDATE-30 executed
    23:51 1,072 pieces deleted — disk drops to 53%
    23:55 OCT2025_CPU removed — disk at 52%, 495G free
    00:00 Both backup jobs re-submitted and running successfully

    8. Permanent Fix Recommendations

    Fix 1 — Change Retention Policy to RECOVERY WINDOW

    RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;
    

    REDUNDANCY 30 with weekly Level 0s means 7.5 months of retention — far beyond what any production SLA requires. A 14-day recovery window keeps 2 weeks of backups regardless of backup frequency, and delete obsolete will actually find and remove old pieces going forward.

    Fix 2 — Add Pre-Backup Space Check to Cron Script

    #!/bin/bash
    BACKUP_FS="/dbbackup"
    THRESHOLD=20                         # minimum % free space required before the backup may start
    MAILTO="dba-team@example.com"        # placeholder recipient - adjust to your alert distribution list
    
    # % free = 100 minus the Use% column from df (second output line)
    AVAIL_PCT=$(df -hP $BACKUP_FS | awk 'NR==2 {gsub(/%/,""); print 100-$5}')
    
    if [ "$AVAIL_PCT" -lt "$THRESHOLD" ]; then
      echo "ABORT: $BACKUP_FS is ${AVAIL_PCT}% free — below ${THRESHOLD}% threshold" \
        | mailx -s "BACKUP ABORTED: Low space on $BACKUP_FS" $MAILTO
      exit 1
    fi
    

    A failing backup that writes 3G before dying is worse than a backup that never starts — it wastes the last 3G of free space and leaves partial pieces on disk.

    Fix 3 — Upgrade the Recovery Catalog

    RMAN> CONNECT TARGET /
    RMAN> CONNECT CATALOG rman/password@rmancat
    RMAN> UPGRADE CATALOG;
    RMAN> UPGRADE CATALOG;   -- run twice as prompted
    

    The catalog packages (19.11) are many Release Updates behind the 19.30 RMAN client. Some catalog-dependent operations will start failing if left unaddressed.

    Fix 4 — Filesystem Monitoring Alert

    The FRA check scripts already email on FRA usage above 80%. The same pattern should exist for /dbbackup. A simple cron entry checking disk usage every hour with alert at 80% would have caught this days before the disk hit 100%.
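    A sketch of such a check, assuming mailx and a placeholder recipient (threshold, path, and schedule should be adapted):

    # crontab entry (hourly):  0 * * * * /opt/oracle/scripts/check_dbbackup_space.sh
    #!/bin/bash
    FS="/dbbackup"
    THRESHOLD=80
    MAILTO="dba-team@example.com"        # placeholder distribution list
    USED_PCT=$(df -hP "$FS" | awk 'NR==2 {gsub(/%/,"",$5); print $5}')
    if [ "$USED_PCT" -ge "$THRESHOLD" ]; then
      echo "$FS is ${USED_PCT}% used on $(hostname)" \
        | mailx -s "Disk space alert: $FS at ${USED_PCT}%" "$MAILTO"
    fi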


    9. Key Takeaways for Oracle DBAs

    REDUNDANCY N is not always safer than RECOVERY WINDOW. REDUNDANCY 30 with weekly Level 0 backups means 7.5 months of retention — likely far beyond your RPO requirement and a silent space accumulator.

    • Always verify what delete obsolete actually deletes. If it finds nothing to delete every single night, that is a warning sign — not reassurance.
    • Check backup naming conventions carefully. When a directory named DBPRC01 contains DBPRO01 backups, it is easy to reason about the wrong database: cleanup is controlled by the RMAN configuration of the database that wrote the backups, not by the directory name.
    • Patching days generate oversized backups. A Level 0 taken manually on patch day plus the regular Sunday Level 0 the next day equals 2x the normal space consumption in 24 hours. Ensure extra headroom exists going into patch windows.
    • Use DELETE BACKUP COMPLETED BEFORE SYSDATE-N for emergency cleanup — not OS-level rm. RMAN deletes atomically update both the physical files and the catalog, preventing expired/orphaned piece confusion later.
    • Never use rm on RMAN backup pieces directly unless you follow up with CROSSCHECK BACKUP and DELETE EXPIRED BACKUP to sync the catalog.

    10. Commands Reference — Quick Cheat Sheet

    -- Check retention policy
    RMAN> SHOW RETENTION POLICY;
    
    -- Preview what would be deleted (dry run)
    RMAN> REPORT OBSOLETE;
    RMAN> LIST BACKUP COMPLETED BEFORE 'SYSDATE-30';
    
    -- Emergency cleanup — delete pieces older than 30 days
    RMAN> DELETE NOPROMPT BACKUP COMPLETED BEFORE 'SYSDATE-30';
    
    -- Standard cleanup based on retention policy
    RMAN> DELETE NOPROMPT OBSOLETE;
    
    -- Sync catalog after any OS-level file operations
    RMAN> CROSSCHECK BACKUP;
    RMAN> DELETE NOPROMPT EXPIRED BACKUP;
    
    -- Change to time-based retention (recommended)
    RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;
    
    -- Check backup piece date range in control file
    SELECT TO_CHAR(MIN(completion_time),'DD-MON-YYYY') oldest,
           TO_CHAR(MAX(completion_time),'DD-MON-YYYY') newest,
           COUNT(*) total_pieces
    FROM v$backup_piece_details
    WHERE status = 'A';
    
    -- Check backup history
    SELECT session_key, input_type, status,
           TO_CHAR(start_time,'YYYY-MM-DD HH24:MI:SS') start_time,
           output_bytes_display, time_taken_display
    FROM v$rman_backup_job_details
    ORDER BY start_time DESC;
    

    Conclusion

    What appeared to be a simple disk full incident turned out to involve a multi-database backup architecture, a misconfigured retention policy, and a cleanup mechanism that was technically running correctly but never finding anything to clean. The fix itself — one RMAN command — took under 5 minutes. The real work was the systematic investigation to understand exactly what was safe to delete and why.

    That is Oracle DBA work in a nutshell: the fix is often simple; understanding why it is safe to run is the real job.


    If you found this useful, connect with me on LinkedIn or explore more Oracle DBA scripts on my GitHub. More incident walkthroughs at syedanwarahmedoracle.blog.

  • Troubleshooting OGG-01224 TCP/IP Error 79: GoldenGate EXTRACT Abend Due to Connection Refused


    Overview

    In a GoldenGate 19c environment, one of the EXTRACT processes abended with a TCP/IP error while all other EXTRACT processes were running normally. This post walks through the exact error, the diagnosis process, and how the issue was resolved — along with preventive recommendations to avoid recurrence.


    Environment Details

    Component Detail
    GoldenGate Version Oracle GoldenGate 19c (Classic Architecture)
    Source Database Oracle 12.1.0.2
    Source Server ogg-source01.example.com
    Manager Port 8910
    Abended Process EXTRACT EXTR0001

    The Problem

    During routine monitoring, the GoldenGate process status showed:

    EXTRACT     ABENDED     EXTR0001    CLASSIC
    EXTRACT     RUNNING     EXTR0002    CLASSIC
    EXTRACT     RUNNING     EXTR0003    CLASSIC
    EXTRACT     RUNNING     EXTR0004    CLASSIC
    

    Only EXTR0001 had abended while the other three EXTRACT processes were running normally.


    Reading the Report File

    The first step was to view the EXTRACT report file from GGSCI:

    GGSCI> VIEW REPORT EXTR0001
    

    The relevant portion of the report showed:

    2026-04-14 12:38:41  INFO    OGG-02820  Positioning to (Thread 1) Sequence 230,424,
                                  RBA 34,213,904, SCN 459.4,134,390,557.
    
    2026-04-14 12:38:41  INFO    OGG-02824  Positioned to (Thread 1) Sequence 230,424,
                                  RBA 34,213,904, Apr 14, 2026 10:07:59 AM.
    
    Source Context :
      SourceModule  : [ggnet.tcprtn]
      SourceID      : [../gglib/ggnet/tcprtn.c]
      SourceMethod  : [opt_negotiate(int, errmsg_t &)]
      SourceLine    : [4739]
    
    2026-04-14 12:38:41  ERROR   OGG-01224  TCP/IP error 79 (Connection refused),
                                  endpoint: ogg-target02:8910.
    
    2026-04-14 12:38:41  ERROR   OGG-01668  PROCESS ABENDING.
    

    Two important observations from this output:

    1. The EXTRACT successfully positioned in the source redo logs at the correct SCN — this ruled out any issue with the source database or archived logs.
    2. The abend happened at opt_negotiate — the GoldenGate TCP-level handshake between the EXTRACT and the remote Manager.

    Error Breakdown

    OGG-01224  TCP/IP error 79 (Connection refused)
    endpoint: ogg-target02:8910
    
    Element Meaning
    OGG-01224 GoldenGate TCP/IP communication failure
    Error 79 ECONNREFUSED — socket connect actively rejected
    opt_negotiate Protocol negotiation phase (post-TCP, pre-trail write)
    Port 8910 GoldenGate Manager port on the remote server

    “Connection refused” (errno 79) means the connection attempt reached the remote host but was actively rejected, typically because nothing was accepting connections on the Manager port at that instant. It is a transient error commonly seen in GoldenGate environments.


    Diagnosis Steps

    Step 1 — View the EXTRACT Report

    The first step was to read the report file and identify the exact error:

    GGSCI> VIEW REPORT EXTR0001
    

    The error pointed to OGG-01224 TCP/IP error 79 (Connection refused) at the opt_negotiate stage — meaning the EXTRACT successfully read the source redo logs but failed when attempting to communicate with the remote GoldenGate Manager.

    Step 2 — Check EXTRACT Status

    GGSCI> INFO EXTRACT EXTR0001, DETAIL
    GGSCI> INFO ALL
    

    This confirmed only EXTR0001 had abended while the remaining EXTRACT processes were running normally.


    Root Cause

    OGG-01224 TCP/IP errors are among the most commonly encountered errors in GoldenGate environments. They are often transient in nature — caused by brief network interruptions, a momentary spike in system load, or a short-lived instability in the GoldenGate Manager communication layer. In many cases, no deep investigation is required; the EXTRACT simply needs to be restarted after waiting a short period for the underlying condition to clear.

    The other EXTRACT processes (EXTR0002, EXTR0003, EXTR0004) were unaffected, confirming the issue was isolated and transient.
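    If the error had persisted instead of clearing, a quick reachability check toward the target Manager port would be the next step before escalating for firewall review. The hostname and port below are the anonymized values used in this post, and the commands assume nc and ss are available:

    # From the source server: can we reach the Manager port on the target?
    nc -zv ogg-target02 8910          # alternative: telnet ogg-target02 8910

    # On the target server: is the Manager actually listening on 8910?
    ss -ltn | grep ':8910'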


    Resolution

    After waiting briefly for the transient condition to clear, the abended EXTRACT was restarted:

    GGSCI> START EXTRACT EXTR0001
    GGSCI> INFO EXTRACT EXTR0001, DETAIL
    

    The EXTRACT came up immediately and resumed processing without any data loss or position reset.


    Post-Recovery Verification

    -- Confirm all processes running
    GGSCI> INFO ALL
    
    -- Check lag is reducing
    GGSCI> LAG EXTRACT EXTR0001
    
    -- Confirm records are flowing
    GGSCI> STATS EXTRACT EXTR0001
    

    Key Takeaways

    1. OGG-01224 TCP/IP errors are common and often transient.
    These errors are frequently seen in GoldenGate environments and do not always indicate a serious underlying problem. A brief network hiccup, momentary system load, or a short instability in the Manager communication layer can trigger this error. In many cases, waiting a short time and restarting the EXTRACT is all that is needed.

    2. The opt_negotiate method is the first GoldenGate-level handshake.
    After the TCP connection is established, GoldenGate negotiates capabilities via opt_negotiate. A failure at this stage is typically transient and clears on its own after a retry.

    3. Multiple EXTRACTs can fail independently.
    When one EXTRACT abends while others continue running, the issue is isolated. Do not assume a global outage — check each process individually.

    4. Add AUTORESTART to Manager params.
    To reduce downtime on similar future occurrences, configure Manager to auto-restart key processes:

    -- In mgr.prm
    AUTORESTART EXTRACT *, RETRIES 5, WAITMINUTES 2, RESETMINUTES 60
    

    Diagnostic Checklist for OGG-01224 TCP/IP Errors

    [ ] View EXTRACT report: GGSCI> VIEW REPORT <extract_name>
    [ ] Note the SourceMethod — is it opt_negotiate or during data transfer?
    [ ] Check if other EXTRACTs are running (INFO ALL)
    [ ] If error is transient: wait briefly, then START EXTRACT <name>
    [ ] If error persists: test port reachability (telnet <rmthost> <mgrport>)
    [ ] If error persists beyond a retry: escalate for network/firewall checks
    [ ] Verify RMTHOST/MGRPORT in EXTRACT param file (VIEW PARAMS <name>)
    [ ] Check for encryption mismatch if opt_negotiate consistently fails
    [ ] Add AUTORESTART in mgr.prm to handle future transient failures automatically
    

    References

    • Oracle GoldenGate Documentation — docs.oracle.com
    • MOS Note 966227.1: OGG Troubleshooting TCP/IP Errors In Open Systems (My Oracle Support login required)

    About the Author

    Syed Anwar Ahmed is an Oracle Apps DBA with extensive production experience managing Oracle RAC, E-Business Suite, and GoldenGate replication environments. He shares practical DBA knowledge from real-world incidents on his blog and is an Oracle ACE Apprentice candidate.

    Connect on LinkedIn | Blog | GitHub

  • Fixing Oracle EBS 12.2 Forms Not Opening (Step-by-Step Guide)

    Oracle E-Business Suite (EBS) 12.2 uses Java Web Start (JNLP) to launch forms, unlike older versions like 11i which relied on browser plugins. If your forms are not opening, this guide will help you fix the issue completely.

    Common Symptoms

    • Forms not opening after click
    • .jnlp file downloads but does nothing
    • Application Blocked by Java Security error
    • Nothing happens when clicking forms

    Root Causes

    • Using unsupported Java version (Java 11, 17, or 24)
    • Missing Java Web Start (OpenWebStart)
    • Java security blocking self-signed certificates
    • Old EBS 11i configurations applied to 12.2 environment

    Step-by-Step Solution

    Step 1: Remove Unsupported Java Versions

    Uninstall Java 11, 17, 24 or any non-Java 8 version from Control Panel → Programs → Uninstall a Program.

    Verify removal:

    java -version

    Expected output:

    'java' is not recognized as an internal or external command

    Also clean up residual folders if they exist:

    C:\Program Files\Java\
    C:\Program Files (x86)\Java\
    C:\Users\<username>\AppData\LocalLow\Sun\Java\

    Step 2: Install Java 8

    Download and install Java 8 JRE (1.8.0_xxx) from Oracle’s website. Verify after installation:

    java -version

    Expected output:

    java version "1.8.0_xxx"

    Step 3: Install OpenWebStart

    Oracle EBS 12.2 requires Java Web Start to launch .jnlp files. Since modern Java versions removed the built-in Web Start, install OpenWebStart as a replacement. Download from: https://openwebstart.com

    Step 4: Configure Java Security

    Go to Control Panel → Java → Security → Edit Site List and add your EBS URLs:

    http://your-ebs-url
    http://your-ebs-url:port
    https://your-ebs-url:port

    Step 5: Clear Java Cache

    Go to Java Control Panel → General → Delete Files, select all options, and click OK.

    Step 6: Associate JNLP Files with OpenWebStart

    • Right-click any .jnlp file
    • Select Open with → Choose another app
    • Select OpenWebStart
    • Check Always use this app

    Fix for “Application Blocked by Java Security”

    If you see the Application Blocked by Java Security error, add your EBS URL to the Exception Site List as described in Step 4. This allows self-signed certificates to run without being blocked.

    Expected Behavior After Configuration

    1. Login to EBS 12.2
    2. Click a Form-based responsibility
    3. .jnlp file launches via OpenWebStart
    4. Oracle Form opens successfully

    Important Notes

    • Do not use Java 7 — that is only required for EBS 11i
    • IE Mode in Edge is not required for EBS 12.2
    • Always use the Java 8 + OpenWebStart combination
    • Ensure popups and file downloads are allowed in your browser

    Summary

    Component Requirement
    Java Java 8 (required)
    Launcher OpenWebStart (required)
    Browser Plugin Not required
    IE Mode Not required

    Conclusion

    Most EBS 12.2 Forms issues come down to incorrect Java setup or missing OpenWebStart. If you previously configured your machine for EBS 11i, fully revert those settings before applying the 12.2 configuration — conflicting setups are one of the most common causes of Forms not opening.

    — Syed Anwar Ahmed | Oracle Apps DBA | LinkedIn

  • How I Fixed Oracle AHF/TFA Not Starting on an 11g RAC Cluster (TFA-00002 / AHF-07250)

    Category: Oracle RAC | Troubleshooting | AHF/TFA  |  Level: Intermediate to Advanced


    Background

    We recently had a critical production incident on our two-node Oracle 11g RAC cluster where the Fast Recovery Area (FRA) hit capacity, causing both instances to enter an INTERMEDIATE state due to a Stuck Archiver condition. Oracle Support raised an SR and asked for CRS diagnostic data collected using TFA (Trace File Analyzer).

    That’s when we discovered a second problem — TFA was completely non-functional on both nodes with the infamous TFA-00002 error. This post documents the full journey of diagnosing and fixing TFA, and how we manually collected the CRS logs for the SR in the meantime.


    The SR Request

    Oracle Support requested the following:

    1. CRS alert log from all nodes: <ORACLE_BASE>/diag/crs/*/crs/trace/alert.log
    2. All CRS-related trace files updated during the incident period

    Step 1 — Finding the CRS Alert Log

    The first challenge was locating the CRS logs. This cluster has a separate Grid Infrastructure installation with a different OS user (grid) from the database (oracle).

    [oracle@racnode1 ~]$ echo $ORACLE_BASE
    /u01/oradb/oracle

    Switching to the grid user:

    su - grid
    echo $ORACLE_HOME
    # /u01/oragrid/11.2/grid

    ORACLE_BASE was not set for the grid user, so we used the orabase binary:

    $ORACLE_HOME/bin/orabase
    # /u01/oragrid/oracle

    However, the ADR path (/u01/oragrid/oracle/diag/crs/*/crs/trace/alert.log) didn’t exist. This is because Oracle 11.2 Grid Infrastructure uses a different log location — not the ADR diag tree. The correct format is:

    $GRID_HOME/log/<hostname>/alert<hostname>.log

    The correct path on our cluster:

    ls -lh /u01/oragrid/11.2/grid/log/racnode1/alertracnode1.log
    # -rw-rw-r--. 1 grid oinstall 14M Apr 12 12:53 alertracnode1.log

    ⚠️ Key takeaway: On 11.2 GI, CRS alert logs live under $GRID_HOME/log/<hostname>/ — not in the ADR structure used by 12c and later.


    Step 2 — Manually Collecting CRS Logs

    The log directory structure under $GRID_HOME/log/<hostname>/ includes:

    alertracnode1.log    ← Main CRS alert log
    crsd/                ← CRSD rotating logs (crsd.log, crsd.l01, crsd.l02 ...)
    cssd/                ← CSS daemon logs
    ohasd/               ← Oracle High Availability Services logs
    ctssd/               ← Cluster Time Sync Service logs

    Since TFA was broken, we collected manually:

    On node 1:

    tar cvf /tmp/crstrace.racnode1.$(date +%Y%m%d%H%M%S).tar \
      /u01/oragrid/11.2/grid/log/racnode1/crsd/crsd.log \
      /u01/oragrid/11.2/grid/log/racnode1/crsd/crsd.l01 \
      /u01/oragrid/11.2/grid/log/racnode1/crsd/crsdOUT.log \
      /u01/oragrid/11.2/grid/log/racnode1/alertracnode1.log \
      /u01/oragrid/11.2/grid/log/racnode1/cssd/ \
      /u01/oragrid/11.2/grid/log/racnode1/ohasd/
    
    zip /tmp/crstrace.racnode1.zip /tmp/crstrace.racnode1.*.tar

    On node 2:

    ssh racnode2 "tar cvf /tmp/crstrace.racnode2.$(date +%Y%m%d%H%M%S).tar \
      /u01/oragrid/11.2/grid/log/racnode2/crsd/crsd.log \
      /u01/oragrid/11.2/grid/log/racnode2/crsd/crsd.l01 \
      /u01/oragrid/11.2/grid/log/racnode2/alertracnode2.log \
      /u01/oragrid/11.2/grid/log/racnode2/cssd/ \
      /u01/oragrid/11.2/grid/log/racnode2/ohasd/ && \
      zip /tmp/crstrace.racnode2.zip /tmp/crstrace.racnode2.*.tar"
    
    scp racnode2:/tmp/crstrace.racnode2.zip /tmp/

    Note: The crsd logs use a rotating format (.log, .l01, .l02 …) — not .trc files. The incident-period data was in crsd.l01.


    Step 3 — Diagnosing TFA-00002

    With the SR logs uploaded, we turned to fixing TFA. Here’s what we found:

    tfactl status
    # TFA-00002 Oracle Trace File Analyzer (TFA) is not running
    # TFA-00107 TFA failed to start after multiple attempts of start (retries from init.tfa)

    Checking AHF Installation Layout

    cat /etc/oracle.ahf.loc
    # /opt/oracle.ahf
    
    cat /opt/oracle.ahf/install.properties
    # AHF_HOME=/opt/oracle.ahf
    # BUILD_VERSION=2603000
    # BUILD_DATE=202604061821
    # TFA_HOME=/opt/oracle.ahf/tfa
    # DATA_DIR=/u01/oragrid/oracle/oracle.ahf/data

    The AHF binaries were at /opt/oracle.ahf/ and data at /u01/oragrid/oracle/oracle.ahf/data/ — a non-default split layout.

    The Actual Error — AHF-07250

    Checking the systemd journal revealed the real error:

    journalctl -u oracle-tfa --no-pager | tail -20
    
    init.tfa: AHF-07250: Cannot establish connection with TFA Server.
    init.tfa: Cause: Cannot establish connection with TFA server on 5000.
    init.tfa: Action: Ensure that communication is open on port 5000 and
              that no firewall is blocking port 5000.
    init.tfa: ERROR: TFAMain is spawning too fast, Human intervention required!!!
    init.tfa: Disabling TFA at : ...

    What We Ruled Out

    Check Result
    Port 5000 blocked by iptables Not blocked — policy ACCEPT
    SELinux enforcing Disabled
    Java missing/incompatible Java 11.0.30 — fine
    Disk space 16GB free on /, 410GB on /u01
    portmapping.txt / ssl.properties missing Missing — but not the root cause

    The TFA Java process was crashing before it could bind to port 5000. The AHF upgrade had left TFA in an unrecoverable broken state on both nodes.
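    Checks along these lines (assuming ss is available on the host) help confirm whether the TFA Java process ever comes up and binds to the port:

    # Is anything listening on the TFA port?
    ss -ltn | grep ':5000'

    # Is the TFAMain Java process alive at all?
    ps -ef | grep -i '[T]FAMain'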

    Attempted Fix — tfactl syncnodes

    tfactl syncnodes
    # Generating new TFA Certificates...
    # Successfully generated certificates.
    # ...
    # TFA-00002 Oracle Trace File Analyzer (TFA) is not running

    Certificates were synced successfully but TFA still wouldn’t start. The issue was deeper than certificate mismatches.


    Step 4 — The Fix: Clean AHF Reinstall

    Uninstall on node 1

    ahfctl uninstall -local
    # AHF will be uninstalled on: racnode1
    # Do you want to continue with AHF uninstall ? [Y]|N : Y
    # ...
    # CHA is disabled

    Note: Uninstalling AHF does NOT remove the data/repository directory, so historical collections and diag data are preserved.

    Download AHF Installer

    Download AHF-LINUX_v26.x.x.zip from My Oracle Support and stage it to /tmp/ on node 1.

    🔗 MOS Doc ID 2550798.1 — Autonomous Health Framework (AHF) Download

    Reinstall on both nodes from node 1

    unzip /tmp/AHF-LINUX_v26.3.0.zip -d /tmp/ahf_install
    cd /tmp/ahf_install
    ./ahf_setup -ahf_loc /opt/oracle.ahf -data_dir /u01/oragrid/oracle/oracle.ahf/data

    Answer N to email notification. Answer Y to install on cluster nodes when prompted.

    Node 2 needed a separate local reinstall

    The cluster-wide install didn’t fully fix node 2. We reinstalled locally using the -local flag:

    # From node 1
    scp /tmp/AHF-LINUX_v26.3.0.zip racnode2:/tmp/
    
    ssh racnode2 "ahfctl uninstall -local"
    
    ssh racnode2 "unzip /tmp/AHF-LINUX_v26.3.0.zip -d /tmp/ahf_install && \
      cd /tmp/ahf_install && \
      ./ahf_setup -ahf_loc /opt/oracle.ahf \
      -data_dir /u01/oragrid/oracle/oracle.ahf/data -local"

    The -local flag skips cluster coordination and installs cleanly on the local node only.


    Final Verification

    tfactl print status
    
    | Host     | Status of TFA | PID   | Port | Version    | Inventory Status |
    |----------|---------------|-------|------|------------|------------------|
    | racnode1 | RUNNING       |  6355 | 5000 | 26.3.0.0.0 | COMPLETE         |
    | racnode2 | RUNNING       | 28301 | 5000 | 26.3.0.0.0 | COMPLETE         |

    Both nodes RUNNING with COMPLETE inventory status. ✅


    Summary

    Problem | Root Cause | Fix
    CRS alert log not found at ADR path | 11.2 GI uses $GRID_HOME/log/<hostname>/, not ADR | Collect from $GRID_HOME/log/ directly
    TFA-00002 on both nodes | AHF upgrade left TFA in broken state | Clean uninstall + reinstall of AHF 26.3.0
    TFA not starting after syncnodes | Deeper corruption beyond cert mismatch | Full reinstall with -local flag on each node

    Key Commands Reference

    # Find CRS alert log on 11.2 GI
    ls $GRID_HOME/log/$(hostname)/alert$(hostname).log
    
    # Collect CRS logs manually
    tar cvf /tmp/crstrace.$(hostname).tar \
      $GRID_HOME/log/$(hostname)/crsd/crsd.log \
      $GRID_HOME/log/$(hostname)/crsd/crsd.l01 \
      $GRID_HOME/log/$(hostname)/alert$(hostname).log \
      $GRID_HOME/log/$(hostname)/cssd/ \
      $GRID_HOME/log/$(hostname)/ohasd/
    
    # Check TFA status
    tfactl print status
    
    # Check actual TFA error
    journalctl -u oracle-tfa --no-pager | tail -30
    
    # Uninstall AHF
    ahfctl uninstall -local
    
    # Reinstall AHF (cluster-wide)
    ./ahf_setup -ahf_loc /opt/oracle.ahf -data_dir <data_dir>
    
    # Reinstall AHF (local node only)
    ./ahf_setup -ahf_loc /opt/oracle.ahf -data_dir <data_dir> -local
    
    # Collect TFA diagnostics for Support
    tfactl diagnosetfa

  • Oracle EBS 12.2 — ADOP fs_clone Failure: Failed to Delete FMW_Home (Root Cause & Fix)

    Category: Oracle EBS 12.2  |  Topic: ADOP Patching  |  Difficulty: Intermediate  |  Oracle Support: Search ADOP fs_clone Failed to delete FMW_Home on My Oracle Support

    Introduction

    Oracle EBS 12.2 introduced Online Patching (ADOP), which relies on a dual file system architecture — a Run File System where production runs, and a Patch File System where patches are applied. The two file systems swap roles at each cutover; in this environment fs2 was the Run FS and fs1 the Patch FS. The fs_clone phase synchronises the Patch FS from the Run FS at the start of each patching cycle, making fs1 a fresh copy of the production file system.

    One of the most common issues encountered during fs_clone is a failure while trying to delete the FMW_Home directory on the Patch FS. This blog walks through a real production scenario on a 2-node RAC database with 4 application server nodes — covering the exact error, step-by-step diagnostic process, root cause identification, fix applied, and the final successful run with actual timings. All server-specific details have been anonymised.

    This issue applies to Oracle EBS 12.2.x on all platforms. For related Oracle Support articles, search “ADOP fs_clone Failed to delete FMW_Home” on My Oracle Support.

    Environment

    | Parameter | Value |
    |-----------|-------|
    | Application | Oracle E-Business Suite 12.2 (2-Node RAC DB + 4 Application Server Nodes) |
    | ADOP Version | C.Delta.13 |
    | ADOP Session ID | 129 |
    | Run File System (fs2) | /u01/app/fs2 (Production — active) |
    | Patch File System (fs1) | /u01/app/fs1 (Inactive — patching target) |
    | Shared Storage | NFS-mounted shared volume (1.3T, 447G free) |
    | OS User | applmgr |

    Incident Timeline

    | Event | Timestamp | Duration | Outcome |
    |-------|-----------|----------|---------|
    | 1st run started | Apr 11, 2026 17:07:35 | — | time adop phase=fs_clone executed |
    | 1st run failed | Apr 11, 2026 ~18:41 | ~1h 34m | FATAL ERROR — FMW_Home deletion failed |
    | Diagnosis performed | Apr 11, 2026 18:45–19:44 | ~59m | Root cause identified — root-owned OHS log files |
    | Fix applied (mv) | Apr 11, 2026 ~19:44 | Seconds | FMW_Home renamed to dated backup as applmgr |
    | 2nd run started | Apr 11, 2026 19:45 | — | time adop phase=fs_clone re-executed after fix |
    | 2nd run completed | Apr 12, 2026 00:49:08 | 5h 21m 25s | SUCCESS — all 4 app nodes completed ✅ |
    | Total session elapsed | Apr 11 17:07 → Apr 12 00:49 | 7h 46m 33s | Full session including failed run + fix + retry |

    The Issue

    During time adop phase=fs_clone, the synchronisation process was progressing normally — staging the file system clone, detaching Oracle Homes, removing APPL_TOP and COMM_TOP — until it reached stage 6 (REMOVE-1012-ORACLE-HOME) inside the removeFMWHome() function, where it attempted to delete the FMW_Home directory on the Patch FS and hit a fatal error.

    Error on the Console

    fs_clone/remote_execution_result_level1.xml:
    *******FATAL ERROR*******
    PROGRAM : (.../fs2/EBSapps/appl/ad/12.0.0/patch/115/bin/txkADOPPreparePhaseSynchronize.pl)
    TIME    : Apr 11 18:41:40 2026
    FUNCTION: main::removeDirectory [ Level 1 ]
    ERRORMSG: Failed to delete the directory /u01/app/fs1/FMW_Home.
    [UNEXPECTED]fs_clone has failed

    Key Log File: txkADOPPreparePhaseSynchronize.log

    The primary log file is located at:

    $ADOP_LOG_DIR/<session_id>/<timestamp>/fs_clone/<node>/TXK_SYNC_create/
        txkADOPPreparePhaseSynchronize.log
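
    When you need to jump straight to the failure inside a long log tree, a grep across the TXK_SYNC_create logs gets there quickly (session ID 129 and the path layout are from this incident — adjust for yours):

    # Locate the synchronize logs for the session and pull the error context
    find $ADOP_LOG_DIR/129 -name txkADOPPreparePhaseSynchronize.log 2>/dev/null
    grep -n -B2 -A3 "FATAL ERROR" \
      $ADOP_LOG_DIR/129/*/fs_clone/*/TXK_SYNC_create/txkADOPPreparePhaseSynchronize.log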

    Inside this log, the clone status progression was clearly visible:

    ========================== Inside getCloneStatus()... ==========================
    clone_status             = REMOVE-1012-ORACLE-HOME
    clone_status_from_caller = 7
    clone_status_from_db     = 6
    Removing the directory: /u01/app/fs1/FMW_Home
    Failed to delete the directory /u01/app/fs1/FMW_Home.
    *******FATAL ERROR*******
    FUNCTION: main::removeDirectory [ Level 1 ]
    ERRORMSG: Failed to delete the directory /u01/app/fs1/FMW_Home.

    clone_status_from_db = 6 indicates the process had already completed: fs_clone staging, detach of Oracle Homes, removal of APPL_TOP, COMM_TOP, and 10.1.2 Oracle Home. It failed specifically and only while removing FMW_Home.

    ADOP fs_clone Stage Flow

    | Stage (DB) | Clone Status | Description |
    |------------|--------------|-------------|
    | 1 | STARTED | Session initialised |
    | 2 | FSCLONESTAGE-DONE | File system staging completed |
    | 3 | DEREGISTER-ORACLE-HOMES | Oracle Homes deregistered from inventory |
    | 4 | REMOVE-APPL-TOP | APPL_TOP removed from Patch FS |
    | 5 | REMOVE-COMM-TOP | COMM_TOP removed from Patch FS |
    | 6 | REMOVE-1012-ORACLE-HOME | Removing FMW_Home — ❌ FAILED HERE |
    | 7+ | (clone proceeds…) | Clone fs2 to fs1, re-register homes, config clone |

    Diagnostic Steps

    Step 1 — Confirm You Are on the Correct File System

    Most critical check first. The target must be on Patch FS (fs1), never Run FS (fs2):

    $ echo $FILE_EDITION     # Must show: run
    run
    $ echo $RUN_BASE         # Must show the fs2 path
    /u01/app/fs2

    ⚠️ If FILE_EDITION shows patch, stop immediately — source the Run FS environment before proceeding.

    Step 2 — Check for Open File Handles

    lsof +D /u01/app/fs1/FMW_Home 2>/dev/null
    fuser -cu /u01/app/fs1/FMW_Home 2>&1

    In our case both commands returned empty output — no active process was holding FMW_Home open.

    Step 3 — Identify Root-Owned Files

    find /u01/app/fs1/FMW_Home ! -user applmgr -ls 2>/dev/null

    Output revealed multiple root-owned files under the OHS instance directories:

    drwxr-x---  3 root root 4096 Feb 15 06:48 .../EBS_web_OHS4/auditlogs/OHS
    -rw-------  1 root root    0 Feb 15 06:48 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/sec_audit_log
    -rw-r-----  1 root root 5670 Feb 15 06:49 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/EBS_web.log
    -rw-r-----  1 root root  249 Feb 15 06:49 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/access_log
    ... (same pattern for EBS_web_OHS2 and EBS_web_OHS3)

    All root-owned files were dated February 15 — nearly 2 months stale. This confirmed they were leftovers from OHS being incorrectly started as root during a previous patching cycle. No active process was involved.

    Step 4 — Attempt Manual Delete to Confirm the Error

    rm -rf /u01/app/fs1/FMW_Home 2>&1 | head -5
    rm: cannot remove '.../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/sec_audit_log': Permission denied

    This confirmed the issue was purely a file ownership/permission problem — not filesystem corruption or an NFS issue.

    Step 5 — Check Disk Space

    df -h /u01/app/fs1
    Filesystem      Size  Used Avail Use% Mounted on
    nfs_server:/vol  1.3T  844G  447G  66% /u01

    447GB free — sufficient to retain a backup of FMW_Home by renaming it.

    Root Cause Analysis

    The root cause was OHS (Oracle HTTP Server) being started as root on the Patch File System during a previous patching cycle in February 2026. This created log and audit files owned by root under:

    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS2/auditlogs/OHS/
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS2/diagnostics/logs/OHS/EBS_web/
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS3/  (same structure)
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS4/  (same structure)

    Since fs_clone runs as applmgr, and applmgr cannot delete files owned by root, the removeDirectory() function in txkADOPPreparePhaseSynchronize.pl failed with Permission Denied — surfaced as a fatal error.

    Why did OHS create root-owned files? If OHS start/stop scripts are executed as root or with sudo (instead of using applmgr-owned wrapper scripts), the resulting log and audit files are created with root ownership and persist on the Patch FS across patching cycles.

    Pre-Action Safety Checklist

    | Check | Expected | Result |
    |-------|----------|--------|
    | FILE_EDITION = run | run | ✅ PASS |
    | RUN_BASE points to fs2 | /u01/app/fs2 | ✅ PASS |
    | FMW_Home target is on fs1 (Patch FS only) | fs1 only | ✅ PASS |
    | lsof returns empty (no open handles) | Empty | ✅ PASS |
    | Root-owned files are stale (no active processes) | Stale only | ✅ PASS |
    | Sufficient disk space for backup rename | > 50GB free | ✅ PASS |
    | Production services confirmed running on fs2 | fs2 up | ✅ PASS |

    Solution — Move FMW_Home as Backup

    The safest approach on production is to move (rename) FMW_Home rather than deleting it. This avoids the need for root access entirely, completes in seconds, and preserves a backup.

    Why mv works even with root-owned files: on the same filesystem, mv is an atomic rename of the directory entry — it needs write permission only on the parent directory (/u01/app/fs1) and never opens, modifies, or unlinks anything inside the tree. applmgr can therefore rename FMW_Home even though files and directories inside it are owned by root. This is fundamentally different from rm -rf, which must descend into every subdirectory and unlink each individual file — and fails the moment it hits a root-owned, non-writable directory.
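
    A minimal sketch that demonstrates the difference in a scratch directory (sudo is used only to create the root-owned test objects; nothing here touches EBS):

    # Build a user-owned directory containing a root-owned, non-writable subdirectory
    mkdir -p /tmp/mv_demo
    sudo mkdir -p /tmp/mv_demo/root_dir
    sudo touch /tmp/mv_demo/root_dir/audit.log
    sudo chmod 750 /tmp/mv_demo/root_dir

    # rm -rf fails: unlinking audit.log needs write access inside root_dir, which we do not have
    rm -rf /tmp/mv_demo
    # rm: cannot remove '/tmp/mv_demo/root_dir/audit.log': Permission denied

    # mv succeeds: renaming /tmp/mv_demo only changes the entry in its parent directory
    mv /tmp/mv_demo /tmp/mv_demo_$(date +%d%b%Y)_bkp && echo "MOVE SUCCESSFUL"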

    Step 1 — Move FMW_Home as a Dated Backup

    mv /u01/app/fs1/FMW_Home /u01/app/fs1/FMW_Home_$(date +%d%b%Y)_bkp && echo "MOVE SUCCESSFUL"
    MOVE SUCCESSFUL

    Step 2 — Verify FMW_Home Is Gone

    ls -lrt /u01/app/fs1/

    Step 3 — Confirm You Are applmgr Before Retrying

    whoami
    # Expected output: applmgr

    ⚠️ Never run adop as root. Always confirm whoami shows applmgr before executing any adop command.

    Step 4 — Retry fs_clone

    time adop phase=fs_clone

    Running fs_clone Safely on Production

    time adop phase=fs_clone on a 2-node RAC with 4 application server nodes takes several hours. Never run it in a plain SSH/PuTTY session that could disconnect. Use one of the following:

    • VNC Session (Best): Network drops have zero impact on the running process.
    • nohup: nohup adop phase=fs_clone > /tmp/fsclone_$(date +%Y%m%d_%H%M%S).log 2>&1 &
    • screen: screen -S fsclone then time adop phase=fs_clone. Detach with Ctrl+A D, reattach with screen -r fsclone.
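
    For example, a typical pattern is to launch fs_clone inside screen and watch progress from a second session with adop -status (the screen session name is arbitrary):

    # Terminal 1: run fs_clone in a detachable session
    screen -S fsclone
    time adop phase=fs_clone
    # Detach with Ctrl+A D; reattach later with: screen -r fsclone

    # Terminal 2: check progress without disturbing the run
    adop -status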

    Successful Run — 2nd Attempt

    After applying the fix, time adop phase=fs_clone was re-executed. The adopmon output confirmed all 4 application nodes progressing through validation, port blocking, clone steps, and config clone phases without any errors.

    ADOP (C.Delta.13)
    Session Id: 129
    Command:    status
    Node Name   Node Type  Phase        Status     Started               Finished              Elapsed
    ----------  ---------  -----------  ---------  --------------------  --------------------  -------
    app-node1   master     FS_CLONE     COMPLETED  2026/04/11 17:07:35   2026/04/12 00:49:08   7:46:33
    app-node2   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:01:55   7:47:19
    app-node3   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:01:25   7:47:49
    app-node4   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:02:16   7:47:40
    File System Synchronization Type: Full
    adop exiting with status = 0 (Success)
    Summary report for current adop session:
        Node app-node1:  - Fs_clone status: Completed successfully
        Node app-node2:  - Fs_clone status: Completed successfully
        Node app-node3:  - Fs_clone status: Completed successfully
        Node app-node4:  - Fs_clone status: Completed successfully
    adop exiting with status = 0 (Success)
    real    321m25.733s   (5 hours 21 minutes 25 seconds)
    user     40m1.142s
    sys      70m59.804s

    | Node | Type | Started | Finished | Elapsed |
    |------|------|---------|----------|---------|
    | app-node1 | Master | Apr 11, 2026 17:07:35 | Apr 12, 2026 00:49:08 | 7h 46m 33s |
    | app-node2 | Slave | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:01:55 | 7h 47m 19s |
    | app-node3 | Slave | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:01:25 | 7h 47m 49s |
    | app-node4 | Slave | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:02:16 | 7h 47m 40s |

    The 2nd run completed cleanly in 5 hours 21 minutes 25 seconds across all 4 application nodes. File System Synchronization Type: Full.

    Post-Resolution Cleanup

    After a successful fs_clone and full patching cycle, old FMW_Home backups can be removed. Keep the most recent backup until the next patching cycle completes, then clean up older ones as root (since they may contain root-owned files):

    ls -lrt /u01/app/fs1/FMW_Home*
    du -sh /u01/app/fs1/FMW_Home*
    # Remove old backups as root
    sudo rm -rf /u01/app/fs1/FMW_Home_<old_date>_bkp

    Prevention — Avoiding Recurrence

    • Never start OHS as root. Always use applmgr-owned wrapper scripts. Never use sudo or root to run adohs.sh or adadminsrvctl.sh.
    • Post-patching ownership check. After every adop finalize/cutover, run: find /u01/app/fs1 ! -user applmgr -ls 2>/dev/null | head -20
    • Pre-fs_clone health check. Verify no lingering adop sessions, confirm Run FS services are healthy, check disk space, and verify no root-owned files under fs1/FMW_Home before starting — a sketch of such a check follows below.
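
    A sketch of this pre-flight check, using the same commands applied during the incident (paths and the applmgr user reflect this environment):

    #!/bin/bash
    # Pre-fs_clone health check (sketch) — run as applmgr with the Run FS environment sourced
    echo "FILE_EDITION : $FILE_EDITION"     # must be: run
    echo "RUN_BASE     : $RUN_BASE"         # must point to the Run FS (fs2 here)

    echo "== Lingering adop sessions =="
    adop -status

    echo "== Disk space on the Patch FS =="
    df -h /u01/app/fs1

    echo "== Root-owned files under fs1 FMW_Home (should return nothing) =="
    find /u01/app/fs1/FMW_Home ! -user applmgr -ls 2>/dev/null | head -20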

    Summary

    | Item | Detail |
    |------|--------|
    | Phase | adop phase=fs_clone |
    | Failing Function | main::removeDirectory inside removeFMWHome() |
    | Clone Stage | clone_status_from_db = 6 (REMOVE-1012-ORACLE-HOME) |
    | Root Cause | OHS started as root in a previous cycle — stale root-owned OHS log/audit files blocking applmgr deletion |
    | Production Impact | None — fs1 is Patch FS, production ran on fs2 throughout |
    | Fix Applied | mv FMW_Home to dated backup as applmgr — atomic rename, no root needed, completed in seconds. rm -rf was NOT used. |
    | 1st Run Duration | ~1h 34m before fatal error (Apr 11 17:07 → 18:41) |
    | 2nd Run Duration | 5h 21m 25s — completed successfully (Apr 11 19:45 → Apr 12 00:49) |
    | Total Session Elapsed | 7h 46m 33s (including failed run, diagnosis, fix, and retry) |
    | Final Status | adop exiting with status = 0 (Success) — all 4 app nodes completed ✅ |
    | Prevention | Never start OHS as root; add post-patching ownership check to runbook |
    | Oracle Support | Search “ADOP fs_clone Failed to delete FMW_Home” on My Oracle Support |

    Happy Debugging! All server-specific details have been anonymised. The diagnostic commands and fix are generic and applicable to any Oracle EBS 12.2.x environment. If this helped you, feel free to share with the community.

  • Stuck Archiver on Oracle 11g RAC: How a Full FRA Brought Down Archiving — and How We Fixed It Live

    Introduction

    One of the most disruptive incidents a DBA can face in a production RAC environment is a Stuck Archiver — a condition where Oracle’s archiver process (ARCn) is unable to archive online redo logs, effectively stalling all database activity. Transactions pile up, applications hang, and the business takes a hit.

    This post documents a real-world Stuck Archiver incident on a two-node Oracle 11g RAC cluster, caused by Fast Recovery Area (FRA) exhaustion. I’ll walk through the exact symptoms, diagnosis queries, root cause analysis, and the live fix — along with preventive measures to ensure this doesn’t repeat.

    Environment Overview

    | Component | Details |
    |-----------|---------|
    | Database | Oracle 11g R2 (11.2.0.4.0) |
    | Architecture | RAC — 2 Nodes |
    | Database Role | PRIMARY — READ WRITE, ARCHIVELOG mode |
    | OS | Linux (x86_64) |
    | FRA Location | ASM Diskgroup +ARCHIVELOG |
    | FRA Size (pre-incident) | 199 GB |
    | FRA Size (post-fix) | 400 GB |

    The Incident: What We Saw

    The alert came in the early hours of the morning. Applications were hanging, users were reporting timeouts, and both RAC instances had entered an INTERMEDIATE state. The alert log was flooded with:

    ORA-00257: archiver error. Connect internal only, until freed.
    ARC0: Error 19809 Creating archive log file to '<FRA_PATH>'
    ARCH: Archival stopped, error occurred. Will continue retrying
    ORACLE Instance PRODDB1 - Archival Error

    The archiver was stuck on both instances, and Oracle had locked down the database to internal connections only. The incident was open for approximately 45 minutes before the fix was applied and applications recovered.

    Step 1: Check Redo Log Status

    First priority — verify the state of online redo log groups. If LGWR cannot overwrite ACTIVE logs because they are unarchived, the database will hang regardless of any other fix.

    SELECT group#, status, archived, members FROM v$log ORDER BY group#;
    
    GROUP#   STATUS           ARCHIVED   MEMBERS
    ------   ---------------  ---------  -------
    1        ACTIVE           NO         2
    2        CURRENT          NO         2
    3        INACTIVE         YES        2
    4        INACTIVE         YES        2

    Groups 1 and 2 needed archiving. LGWR cannot overwrite ACTIVE or CURRENT redo logs until they are archived — this confirmed the database hang was directly caused by the stuck archiver.

    Step 2: Confirm the Archiver Status

    -- Check both RAC instances
    SELECT inst_id, instance_name, host_name, version,
           database_role, open_mode, log_mode
    FROM gv$instance ORDER BY inst_id;
    
    INST_ID  INSTANCE_N  VERSION       DATABASE_ROLE  OPEN_MODE   LOG_MODE
    -------  ----------  ------------  -------------  ----------  ----------
          1  PRODDB1     11.2.0.4.0    PRIMARY        READ WRITE  ARCHIVELOG
          2  PRODDB2     11.2.0.4.0    PRIMARY        READ WRITE  ARCHIVELOG
    
    SELECT instance_name, status, archiver FROM v$instance;
    
    INSTANCE_NAME    STATUS       ARCHIVER
    ---------------- ------------ --------
    PRODDB1          OPEN         STOPPED

    ARCHIVER = STOPPED confirmed it. Both RAC instances were in INTERMEDIATE state — the entire cluster was effectively down for application workloads.

    Step 3: Check FRA Usage

    SELECT
        space_limit / (1024*1024*1024)       AS limit_gb,
        space_used  / (1024*1024*1024)       AS used_gb,
        space_reclaimable / (1024*1024*1024) AS reclaimable_gb,
        ROUND((space_used / space_limit) * 100, 2) AS pct_used
    FROM v$recovery_file_dest;
    
    LIMIT_GB   USED_GB   RECLAIMABLE_GB   PCT_USED
    --------   -------   --------------   --------
    199        198.99     0.00            100.00

    100% FRA utilization with zero reclaimable space. The FRA on ASM diskgroup +ARCHIVELOG was completely full with 3,094 archived log files. Before proceeding, the ASM diskgroup was checked — +ARCHIVELOG had 299 GB free, confirming sufficient underlying storage for a size increase.
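
    Checking the diskgroup headroom is quick — either asmcmd as the grid user or a query against v$asm_diskgroup from SQL*Plus (both are standard tools; the diskgroup name below matches this environment):

    # As the grid user: confirm the diskgroup has room before raising the FRA limit
    su - grid -c "asmcmd lsdg ARCHIVELOG"

    # Or from the database as SYSDBA
    echo "SELECT name, ROUND(free_mb/1024) free_gb, ROUND(total_mb/1024) total_gb FROM v\$asm_diskgroup;" | sqlplus -s / as sysdba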

    Step 4: Identify What Was Consuming the FRA

    SELECT u.file_type,
        ROUND(u.percent_space_used/100 * d.space_limit/1024/1024/1024, 2)        AS used_gb,
        ROUND(u.percent_space_reclaimable/100 * d.space_limit/1024/1024/1024, 2) AS reclaimable_gb,
        u.number_of_files
    FROM v$recovery_area_usage u,
         v$recovery_file_dest  d
    ORDER BY 2 DESC;
    
    FILE_TYPE       USED_GB   RECLAIMABLE_GB   NUMBER_OF_FILES
    -----------     -------   --------------   ---------------
    ARCHIVED LOG    38.50     0.05             1,200+
    BACKUP PIECE    10.20     0.07             45
    FLASHBACK LOG    1.30     0.00             12

    Over 1,200 archived log files had piled up because the RMAN ARCH backup job had been silently failing for multiple nights.

    Step 5: Redo Volume Analysis — Identifying the Nightly Batch Spike

    SELECT TO_CHAR(first_time,'YYYY-MM-DD HH24') AS hour,
           COUNT(*) AS logs,
           ROUND(SUM(blocks*block_size)/1024/1024/1024,2) AS gb
    FROM v$archived_log
    WHERE dest_id=1
    GROUP BY TO_CHAR(first_time,'YYYY-MM-DD HH24')
    ORDER BY 1;

    | Hour | Archived Logs | GB Generated |
    |------|---------------|--------------|
    | Daytime (typical) | 2–26 | < 1.3 GB |
    | 01:00 AM (nightly) | 440–446 | ~31 GB ⚠ |
    | 03:00–04:00 AM | 191–411 | 11–28 GB ⚠ |
    | 05:00–06:00 AM | 537–796 | 34–50 GB ⚠ |

    Clear Pattern — Nightly Batch Job Is the Culprit: Every night between 01:00 AM and 06:00 AM, redo generation spikes to 30–50 GB per hour. Daytime activity rarely exceeds 0.5 GB/hour. With the RMAN ARCH backup silently failing for multiple nights, these logs were never backed up or removed — they simply accumulated until the FRA hit 100%.

    The Resolution — Three Actions

    Action 1: Manually Delete Archive Logs from ASM

    Because RMAN could not run against a full FRA and there was no reclaimable space, older archived logs were manually deleted via ASMCMD:

    su - grid
    asmcmd
    
    ASMCMD> ls +ARCHIVELOG/PRODDB/ARCHIVELOG/
    ASMCMD> rm +ARCHIVELOG/PRODDB/ARCHIVELOG/2026_04_04/*
    ASMCMD> rm +ARCHIVELOG/PRODDB/ARCHIVELOG/2026_04_05/*
    # Recent logs retained to preserve recoverability

    After ASMCMD deletion, the RMAN catalog was resynced:

    RMAN> CROSSCHECK ARCHIVELOG ALL;
    RMAN> DELETE EXPIRED ARCHIVELOG ALL;

    Important: ASMCMD deletion removes files from ASM but the control file still holds references. Always follow up with CROSSCHECK and DELETE EXPIRED to leave the catalog in a consistent state.

    Action 2: Increase FRA Size Online

    -- Increase FRA from 199GB to 400GB (online, no restart required)
    ALTER SYSTEM SET db_recovery_file_dest_size = 400G SCOPE=BOTH SID='*';

    SID='*' applies the change to both RAC instances simultaneously. Immediately after, the archiver restarted automatically:

    ARC1: Archival started
    ARC2: Archival started
    ARC3: Archival started
    ORACLE Instance PRODDB1 - Archival Restarted

    Applications reconnected and transactions resumed within seconds.

    Action 3: Verify Database and Cluster Status

    SELECT instance_name, archiver FROM v$instance;
    -- PRODDB1   STARTED
    
    SELECT round(space_used/1024/1024/1024,2) used_gb,
           round(space_limit/1024/1024/1024,2) limit_gb,
           round((space_used/space_limit)*100,2) used_pct
    FROM v$recovery_file_dest;
    -- 202.80 GB / 400 GB / 50.70%
    
    SELECT group#, status, archived FROM v$log ORDER BY group#;
    -- All groups INACTIVE/YES except CURRENT group

    Full CRS cluster check via crsctl status res -t confirmed all resources ONLINE — both instances Open, TAF service live on both nodes, all ASM diskgroups healthy. Note: ora.gsd OFFLINE is expected for Oracle 11g RAC — it is a legacy component not required in this configuration.

    Root Cause Confirmed: Archive Log Backups Were Not Running

    With the database stable, investigation confirmed the RMAN ARCH backup had been silently failing for multiple nights. The cron job fired the script on schedule, but without Oracle environment variables set in the cron context (ORACLE_SID, ORACLE_HOME, PATH all unset), the script exited immediately without connecting to RMAN. The log at /tmp/RMAN_ARCH.log appeared empty — no RMAN error, nothing surfacing to the team.

    Complete Incident Chain:
    ARCH backup silently fails (missing Oracle env vars in cron) → 3,094 archived logs accumulate over multiple nights → Heavy batch redo (30–50 GB/hr, 01:00–06:00 AM) → FRA hits 100% (198.99 GB / 199 GB) → ARCn cannot write → Both instances enter INTERMEDIATE state → ORA-00257 → Applications hang → Archive logs manually deleted via ASMCMD → FRA expanded to 400 GB → Archiver restarts → Cron fix applied → ARCH backup reruns successfully → FRA space recovered → Cluster fully healthy

    Fix: Correct the Cron Environment

    Broken cron entry:

    05 * * * * /home/oracle/scripts/RMAN_Hotbackup.ksh ARCH > /tmp/RMAN_ARCH.log 2>&1

    Fixed cron entry:

    05 * * * * . /home/oracle/.bash_profile && /home/oracle/scripts/RMAN_Hotbackup.ksh ARCH > /tmp/RMAN_ARCH.log 2>&1
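
    An additional hardening step (and the basis of preventive measure #2 below) is to make the backup script independent of the caller's environment and to surface any non-zero RMAN exit code. A sketch — the ORACLE_HOME path, RMAN command file, and mail recipient are placeholders:

    #!/bin/ksh
    # Top of RMAN_Hotbackup.ksh (sketch): set the Oracle environment explicitly
    export ORACLE_SID=PRODDB1
    export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
    export PATH=$ORACLE_HOME/bin:$PATH

    rman target / cmdfile=/home/oracle/scripts/arch_backup.rman log=/tmp/RMAN_ARCH_run.log
    rc=$?
    if [ $rc -ne 0 ]; then
        mailx -s "RMAN ARCH backup FAILED (rc=$rc) on $(hostname)" dba-team@example.com < /tmp/RMAN_ARCH_run.log
    fi
    exit $rc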

    ARCH Backup Reruns and Completes

    After the cron fix, the ARCH backup was immediately re-run manually to clear the remaining backlog and validate the fix end-to-end:

    . /home/oracle/.bash_profile
    /home/oracle/scripts/RMAN_Hotbackup.ksh ARCH

    The backup completed successfully. RMAN backed up accumulated archived logs, marked them as obsolete per retention policy, and removed them from the FRA — freeing additional space beyond what the manual ASMCMD deletion had already recovered.

    Preventive Measures Put in Place

    1. Fix the cron environment — all RMAN jobs now source ~/.bash_profile before executing.
    2. RMAN exit code alerting — script updated to email the DBA team on any non-zero RMAN exit status.
    3. FRA monitoring at 70% threshold — SQL script runs every 30 minutes via cron and alerts when FRA crosses 70%, giving enough runway before the database is impacted (see the sketch after this list).
    4. FRA sized on measured data — 400 GB based on actual peak batch redo (30–50 GB/hr), not an arbitrary estimate.
    5. RAC: always use SID='*' — and always verify ASM diskgroup free space before increasing the FRA limit.
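
    A minimal version of the 70% FRA alert from point 3 might look like this (threshold, script path, and mail recipient are illustrative; the query is the same v$recovery_file_dest check used during the incident):

    #!/bin/bash
    # fra_monitor.sh — alert when FRA usage crosses a threshold (sketch)
    . /home/oracle/.bash_profile
    THRESHOLD=70

    PCT=$(echo "set heading off feedback off pages 0
    SELECT ROUND(((space_used - space_reclaimable) / space_limit) * 100) FROM v\$recovery_file_dest;" | \
          sqlplus -s / as sysdba | xargs)

    if [ "${PCT:-0}" -ge "$THRESHOLD" ]; then
        echo "FRA usage is ${PCT}% on $(hostname) (threshold ${THRESHOLD}%)" | \
          mailx -s "FRA ALERT: ${PCT}% used" dba-team@example.com
    fi

    # Example cron entry (every 30 minutes):
    # */30 * * * * /home/oracle/scripts/fra_monitor.sh > /tmp/fra_monitor.log 2>&1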

    Key Takeaways for Oracle DBAs

    • ORA-00257 always points to the FRA being full — check v$recovery_file_dest and v$recovery_area_usage immediately.
    • When RMAN cannot run against a full FRA, use ASMCMD to manually delete older archive logs — then run CROSSCHECK and DELETE EXPIRED to resync the catalog.
    • db_recovery_file_dest_size is dynamic — increase it online without a restart. Always confirm ASM diskgroup free space first.
    • Use SID='*' in RAC — applies the parameter change to all instances simultaneously.
    • The archiver restarts automatically after FRA space is freed — no manual ARCn intervention needed.
    • Silent cron failures are dangerous — always validate Oracle environment variables in cron context, and capture RMAN exit codes with alerting.
    • Monitor FRA proactively at 70% — this gives enough runway for corrective action before hitting 100%.
    • After a manual ASMCMD deletion, always follow up with a proper RMAN backup run to leave the FRA in a consistent state.
    • Analyse redo generation patterns — v$archived_log reveals batch spikes that should directly inform FRA sizing decisions.

    Conclusion

    A Stuck Archiver on a production RAC database is a high-severity incident, but a structured approach resolves it quickly. The immediate fix required two steps: manually deleting archived logs via ASMCMD to unblock the archiver, and increasing db_recovery_file_dest_size online to prevent immediate recurrence.

    The deeper fix was identifying why the RMAN ARCH backup had been silently failing for multiple nights — a missing Oracle environment in the cron context. Once found, the fix was a single line in crontab. Rerunning the backup manually and watching it complete successfully, with FRA usage dropping as a result, confirmed the incident was fully resolved rather than just patched.

    The combination of a corrected cron environment, RMAN exit code alerting, and proactive FRA monitoring at 70% ensures this class of incident does not repeat.

    Syed Anwar Ahmed | Oracle Apps DBA

  • Step-by-Step Guide: Applying Oracle 19c CPU Patch – January 2026 (Patch 38658587)

    This guide walks through the complete process of applying the Oracle 19c January 2026 Critical Patch Update (CPU) in a CDB/PDB environment. The bundle includes the following patches:

    • Bundle Patch 38658587 — January 2026 Oracle Database 19c Bundle Patch
    • DBSU Patch 38632161 — Database Release Update: 19.30.0.0.260120
    • OJVM Patch 38523609 — OJVM Release Update: 19.30.0.0.260120
    • Overlay Patch 34672698 — Required overlay/conflict resolution patch

    Step 1: Capture Database Details

    echo $ORACLE_SID
    sqlplus / as sysdba
    select name from v$database;
    show pdbs;

    Step 2: Pre-Patch Checks (CDB and PDB Level)

    sqlplus / as sysdba
    
    -- Check invalid objects
    Set lines 230
    set pages 100
    col OBJECT_NAME for a40
    col OBJECT_TYPE for a40
    col OWNER for a20
    select OBJECT_NAME, OBJECT_TYPE, OWNER,
           to_char(LAST_DDL_TIME,'DD-MON-YYYY HH24:MI:SS'), status
    from dba_objects
    where status='INVALID'
    order by owner;
    
    -- Check patch registry
    col ACTION_TIME for a30
    set pages 100
    set lines 300
    col status for a15
    col description for a60
    col ACTION for a15
    select patch_id,action,status,action_time,description
    from dba_registry_sqlpatch;
    
    -- Check component versions
    COL comp_name FOR a44 HEA 'Component'
    COL version FOR a17 HEA 'Version'
    COL status FOR a17 HEA 'Status'
    col COMP_ID for a12
    SELECT COMP_ID,comp_name, version, status FROM dba_registry;

    Step 3: Comment Out Crontab Entries

    (/usr/bin/crontab -l | /bin/sed 's/^/##JAN_2026##&/g' | /usr/bin/crontab)

    Step 4: Place OEM Blackout (If Applicable)

    If Oracle Enterprise Manager (OEM) is monitoring this database, place a blackout to suppress false alerts during the patching window.


    Step 5: Shutdown Database and Take Oracle Home Backup

    Bring down the database and listener, then take a TAR backup of the Oracle Home before applying any patches. This is your rollback point.
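
    The exact commands vary per environment, but the pattern is a clean shutdown followed by a dated tar of the Home (the backup destination is a placeholder):

    # Stop the database and listener (as the oracle software owner)
    echo "shutdown immediate" | sqlplus -s / as sysdba
    lsnrctl stop LISTENER

    # Tar the Oracle Home as the rollback point
    tar -cvf /backup/ORACLE_HOME_pre_JAN2026_$(date +%Y%m%d).tar \
        -C "$(dirname $ORACLE_HOME)" "$(basename $ORACLE_HOME)"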


    Step 6: Apply DBSU Patch 38632161

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /<PATCH_STAGING_DIR>/JAN_2026/38658587/38632161
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    opatch rollback -id 34672698
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    opatch apply
    opatch lsinventory | grep 38632161

    Step 7: Apply OJVM Patch 38523609

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /<PATCH_STAGING_DIR>/JAN_2026/38658587/38523609
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    opatch apply
    opatch lsinventory | grep 38523609
    opatch lspatches

    Step 8: Change oradism File Permissions (Pre-Overlay)

    # Run as root
    chown oracle:oinstall /<ORACLE_HOME>/bin/oradism
    chmod 755 /<ORACLE_HOME>/bin/oradism

    Step 9: Apply Overlay Patch 34672698

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /<PATCH_STAGING_DIR>/JAN_2026/34672698
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    opatch apply
    opatch lsinventory | grep 34672698

    Step 10: Revert oradism File Permissions

    # Run as root
    chown root:oinstall /<ORACLE_HOME>/bin/oradism
    chmod 4750 /<ORACLE_HOME>/bin/oradism

    Step 11: Start Database and Listener

    sqlplus / as sysdba
    startup;
    alter pluggable database all open;
    exit;
    
    echo $TNS_ADMIN
    lsnrctl start LISTENER

    Step 12: Run Datapatch

    cd $ORACLE_HOME/OPatch
    ./datapatch -verbose

    Step 13: Final Checks and Remove OEM Blackout

    Confirm the database and listener are up and running, then remove the OEM blackout if one was placed in Step 4.


    Step 14: Re-enable Crontab Jobs

    (/usr/bin/crontab -l | /bin/sed 's/##JAN_2026##//g' | /usr/bin/crontab)

    Step 15: Post-Patch Validation — Real Output

    After patching, verify the installed patches using OPatch. Below is the actual output from a production patching run:

    [oracle@<HOSTNAME> OPatch]$ opatch lspatches
    34672698;ORA-00800 SOFT EXTERNAL ERROR, ARGUMENTS [SET PRIORITY FAILED], [VKTM], DISM(16)
    38523609;OJVM RELEASE UPDATE: 19.30.0.0.260120 (38523609)
    38632161;Database Release Update : 19.30.0.0.260120 (38632161)
    29585399;OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
    OPatch succeeded.

    Verify patch registry — confirm patches 38523609 and 38632161 show APPLY SUCCESS:

    PATCH_ID   ACTION    STATUS   ACTION_TIME                    DESCRIPTION
    ---------- --------- -------- ------------------------------ -----------------------------------------------------------
      38194382 ROLLBACK  SUCCESS  03-FEB-26 02.39.24.188908 PM   OJVM RELEASE UPDATE: 19.29.0.0.251021 (38194382)
      38523609 APPLY     SUCCESS  03-FEB-26 02.39.39.457321 PM   OJVM RELEASE UPDATE: 19.30.0.0.260120 (38523609)
      38632161 APPLY     SUCCESS  03-FEB-26 02.40.30.102001 PM   Database Release Update : 19.30.0.0.260120 (38632161)

    Verify the database components in dba_registry — all should be VALID (RAC shows OPTION OFF because the RAC option is not in use on this home, which is expected):

    COMP_ID      Component                                    Version           Status
    ------------ -------------------------------------------- ----------------- -----------------
    CATALOG      Oracle Database Catalog Views                19.0.0.0.0        VALID
    CATPROC      Oracle Database Packages and Types           19.0.0.0.0        VALID
    RAC          Oracle Real Application Clusters             19.0.0.0.0        OPTION OFF
    JAVAVM       JServer JAVA Virtual Machine                 19.0.0.0.0        VALID
    XML          Oracle XDK                                   19.0.0.0.0        VALID
    CATJAVA      Oracle Database Java Packages                19.0.0.0.0        VALID
    APS          OLAP Analytic Workspace                      19.0.0.0.0        VALID
    XDB          Oracle XML Database                          19.0.0.0.0        VALID
    OWM          Oracle Workspace Manager                     19.0.0.0.0        VALID
    CONTEXT      Oracle Text                                  19.0.0.0.0        VALID
    ORDIM        Oracle Multimedia                            19.0.0.0.0        VALID
    SDO          Spatial                                      19.0.0.0.0        VALID
    XOQ          Oracle OLAP API                              19.0.0.0.0        VALID
    OLS          Oracle Label Security                        19.0.0.0.0        VALID
    DV           Oracle Database Vault                        19.0.0.0.0        VALID
    15 rows selected.

    Patch History Reference

    The table below shows the complete CPU patching history demonstrating consistent quarterly patching from May 2022 through January 2026:

    | Patch ID | Action | Date | Description |
    |----------|--------|------|-------------|
    | 33561310 / 33515361 | APPLY | 22-MAY-22 | OJVM + DB RU 19.14.0.0.220118 |
    | 34086870 / 34133642 | APPLY | 02-AUG-22 | OJVM + DB RU 19.16.0.0.220719 |
    | 34411846 / 34419443 | APPLY | 06-DEC-22 | OJVM + DB RU 19.17.0.0.221018 |
    | 34786990 / 34765931 | APPLY | 07-MAR-23 | OJVM + DB RU 19.18.0.0.230117 |
    | 35050341 / 35042068 | APPLY | 04-MAY-23 | OJVM + DB RU 19.19.0.0.230418 |
    | 35354406 / 35320081 | APPLY | 01-AUG-23 | OJVM + DB RU 19.20.0.0.230718 |
    | 35648110 / 35643107 | APPLY | 02-JAN-24 | OJVM + DB RU 19.21.0.0.231017 |
    | 36199232 / 36233263 | APPLY | 13-MAY-24 | OJVM + DB RU 19.23.0.0.240416 |
    | 36414915 / 36582781 | APPLY | 06-AUG-24 | OJVM + DB RU 19.24.0.0.240716 |
    | 36878697 / 36912597 | APPLY | 03-DEC-24 | OJVM + DB RU 19.25.0.0.241015 |
    | 37102264 / 37260974 | APPLY | 04-MAR-25 | OJVM + DB RU 19.26.0.0.250121 |
    | 37499406 / 37642901 | APPLY | 06-MAY-25 | OJVM + DB RU 19.27.0.0.250415 |
    | 37847857 / 37960098 | APPLY | 05-AUG-25 | OJVM + DB RU 19.28.0.0.250715 |
    | 38194382 / 38291812 | APPLY | 04-NOV-25 | OJVM + DB RU 19.29.0.0.251021 |
    | 38523609 / 38632161 | APPLY | 03-FEB-26 | OJVM + DB RU 19.30.0.0.260120 |

    Patch Summary

    | Patch ID | Description | Type | Status |
    |----------|-------------|------|--------|
    | 38658587 | Oracle Database 19c Bundle Patch – January 2026 | Bundle | SUCCESS |
    | 38632161 | Database Release Update: 19.30.0.0.260120 | Security | SUCCESS |
    | 38523609 | OJVM Release Update: 19.30.0.0.260120 | Component | SUCCESS |
    | 34672698 | Overlay / Conflict Resolution Patch | Overlay | SUCCESS |

    Key Takeaways

    • Always capture pre and post-patch snapshots for comparison
    • Never skip Datapatch — OPatch alone does not apply SQL-level changes
    • Revert oradism permissions after patching to avoid security issues
    • Verify all patch IDs show SUCCESS in dba_registry_sqlpatch after Datapatch
    • Verify all components in dba_registry show VALID (or their expected status, e.g. OPTION OFF for unused options) after patching
    • Maintain a consistent quarterly patching cadence to keep the database current and secure

    Written by Syed Anwar Ahmed — Oracle Apps DBA with 11 years of production experience.
    Connect: sdanwarahmed@gmail.com  |  LinkedIn

  • Statspack in Oracle 19c: A Real-World Performance Tuning Case Study

    In a production Oracle environment running on 19c, a critical performance degradation impacted business operations during peak hours. Despite being on a modern database version, the absence of Diagnostic Pack licensing meant AWR was not available. This case study demonstrates how Statspack was implemented and leveraged to identify the root cause and deliver a ~40% performance improvement using fundamental performance analysis techniques.


    Environment Overview

    | Component | Details |
    |-----------|---------|
    | Database Version | Oracle 19c |
    | Edition | Standard Edition (No AWR) |
    | Application | Oracle E-Business Suite |
    | Workload Type | OLTP (High concurrent users) |
    | Peak Users | ~1200 concurrent sessions |

    Problem Statement

    During peak business hours, users experienced intermittent slowness with transactions taking ~3x longer than baseline. There was no historical performance repository due to AWR unavailability. Key constraints included:

    • Diagnostic Pack not licensed — AWR disabled
    • No ADDM recommendations available
    • Production-critical issue requiring immediate resolution

    Strategy and Approach

    Even in Oracle Database 19c, we relied on core DBA principles: enable Statspack, capture snapshots at 30-minute intervals, and perform comparative analysis between peak vs non-peak workload and before vs during the degradation window.


    Implementation Steps

    Step 1: Install Statspack

    Connect as SYSDBA and run the Statspack creation script. This creates the PERFSTAT schema and all required objects:

    sqlplus / as sysdba
    @?/rdbms/admin/spcreate.sql

    You will be prompted for the PERFSTAT password, default tablespace, and temporary tablespace. Choose an appropriate tablespace with sufficient space (minimum 100MB recommended for active environments).

    Step 2: Automate Snapshot Collection

    Schedule automatic snapshots every 30 minutes using the built-in Statspack job:

    -- Connect as PERFSTAT user
    sqlplus perfstat/<password>
    
    -- Take a manual snapshot
    EXECUTE statspack.snap;
    
    -- Schedule automatic snapshots (every 30 mins) using spauto.sql
    @?/rdbms/admin/spauto.sql
    
    -- Or manually schedule with DBMS_JOB
    VARIABLE jobno NUMBER;
    BEGIN
      DBMS_JOB.SUBMIT(:jobno,
        'statspack.snap;',
        SYSDATE,
        'SYSDATE + 30/1440');
      COMMIT;
    END;
    /
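
    Note: DBMS_JOB still works in 19c but is deprecated in favour of the Scheduler. An equivalent 30-minute schedule via DBMS_SCHEDULER might look like this — the job name is arbitrary, and PERFSTAT needs the CREATE JOB privilege:

    -- Connect as PERFSTAT and create a Scheduler job instead of a DBMS_JOB entry
    sqlplus perfstat/<password>

    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'STATSPACK_SNAP_JOB',
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN statspack.snap; END;',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=MINUTELY;INTERVAL=30',
        enabled         => TRUE);
    END;
    /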

    Step 3: Verify Snapshots

    -- Check available snapshots
    SELECT snap_id, snap_time, snap_level
    FROM stats$snapshot
    ORDER BY snap_time DESC;
    
    -- Count total snapshots
    SELECT COUNT(*) FROM stats$snapshot;

    Step 4: Generate Statspack Report

    Generate a report between two snapshot IDs covering the peak period:

    sqlplus perfstat/<password>
    @?/rdbms/admin/spreport.sql
    -- Enter begin and end snapshot IDs when prompted

    Step 5: Query Top Wait Events Directly

    SELECT se.event,
           se.total_waits,
           se.time_waited_micro/1000000                                      AS time_waited_secs,
           ROUND(se.time_waited_micro / NULLIF(se.total_waits,0) / 1000, 2)  AS avg_wait_ms
    FROM stats$system_event se,
         stats$snapshot s
    WHERE s.snap_id BETWEEN &begin_snap AND &end_snap
    AND se.snap_id = s.snap_id
    AND se.event NOT LIKE 'SQL*Net%'
    ORDER BY se.time_waited_micro DESC
    FETCH FIRST 10 ROWS ONLY;

    Analysis Findings

    Key finding: log file sync dominated database time — a clear indication of commit-related contention. No high-cost SQL was identified, ruling out query inefficiency as the primary cause. The load profile showed elevated user commits per second and an increased redo generation rate.


    Root Cause

    Frequent commits inside application loops (row-by-row processing) caused high log file sync waits, redo log contention, increased I/O latency, and reduced transaction throughput.


    Resolution

    The application team introduced commit batching to significantly reduce commit frequency — modifying loop logic to commit every N rows rather than every single row. Minor redo log tuning was also performed:

    -- Check redo log switch frequency (should not exceed 4-5 per hour ideally)
    SELECT TO_CHAR(first_time,'YYYY-MM-DD HH24') hour,
           COUNT(*) switches
    FROM v$log_history
    WHERE first_time > SYSDATE - 1
    GROUP BY TO_CHAR(first_time,'YYYY-MM-DD HH24')
    ORDER BY 1;
    
    -- Check redo log sizes
    SELECT group#, members, bytes/1024/1024 size_mb, status
    FROM v$log
    ORDER BY group#;

    Results

    | Metric | Before | After |
    |--------|--------|-------|
    | Transaction Time | High (3x baseline) | Down ~40% |
    | log file sync waits | Very High | Minimal |
    | User Experience | Poor | Stable |

    Purging Old Statspack Data

    Statspack data grows quickly in busy environments. Purge old snapshots regularly:

    -- Purge a range of snapshots
    EXECUTE statspack.purge(
      i_begin_snap  => &oldest_snap_id,
      i_end_snap    => &newest_snap_id,
      i_snap_range  => TRUE
    );
    
    -- Or use the provided script
    @?/rdbms/admin/sppurge.sql

    Key Takeaways

    • Modern DB does not mean automatic performance — even in Oracle 19c, performance issues require structured analysis
    • Wait events reveal the truth — hit ratios appeared normal, but wait events exposed the actual bottleneck
    • Application design matters — commit strategy directly impacts performance; database tuning alone cannot fix design inefficiencies
    • Statspack still matters — highly effective in modern 19c environments, especially without AWR licensing
    • Always capture snapshots during peak workload with intervals of 30 minutes or less

    Why Not AWR?

    | Reason | Explanation |
    |--------|-------------|
    | Licensing | Diagnostic Pack not enabled |
    | Cost Constraints | Avoid additional licensing overhead |
    | Practicality | Statspack provided sufficient insight for this issue |

    This case reflects real-world experience in Oracle E-Business Suite environments, high-concurrency OLTP systems, and performance tuning under licensing constraints. Have questions? Reach out at sdanwarahmed@gmail.com.

Syed Anwar Ahmed – Oracle DBA Blog

Oracle Database and EBS troubleshooting guides based on real production experience.
