Blog

  • ORA-00257: Archiver Error in Oracle 19c CDB — Diagnosis and Resolution

    Background

    This post documents a real production incident where an Oracle E-Business Suite (EBS) environment became completely unresponsive due to a full Fast Recovery Area (FRA). Users were reporting that everything was either slow or hanging — a classic sign that something fundamental had broken at the database layer.

    Symptoms

    • EBS application users reporting sessions hanging or extremely slow response
    • sqlplus apps/<password> failing with:
    ORA-00257: Archiver error. Connect AS SYSDBA only until resolved.
    • Non-SYSDBA connections completely blocked
    • 30 FNDLIBR (Concurrent Manager) processes running on the application server — higher than expected for a QA environment

    Environment

    • Oracle Database 19c (CDB/PDB architecture)
    • PDB: Application database PDB
    • EBS 12.2 on AIX
    • FRA configured on ASM diskgroup (+RECO), size 4095 GB
    • No explicit log_archive_dest — archivelogs defaulting to FRA

    Step 1 — Identify the Root Cause

    Connected to the CDB as SYSDBA and confirmed the error:

    SHOW PARAMETER db_recovery_file_dest;
    NAME                        TYPE        VALUE
    --------------------------- ----------- ------
    db_recovery_file_dest       string      +RECO
    db_recovery_file_dest_size  big integer 4095G

    Checked FRA usage:

    COLUMN name           FORMAT A10
    COLUMN limit_gb       FORMAT 999,999.99 HEADING 'LIMIT GB'
    COLUMN used_gb        FORMAT 999,999.99 HEADING 'USED GB'
    COLUMN reclaimable_gb FORMAT 999,999.99 HEADING 'RECLAIMABLE GB'
    COLUMN number_of_files FORMAT 99999     HEADING 'FILES'
    
    SELECT name,
           ROUND(space_limit/1024/1024/1024,2)       limit_gb,
           ROUND(space_used/1024/1024/1024,2)        used_gb,
           ROUND(space_reclaimable/1024/1024/1024,2) reclaimable_gb,
           number_of_files
    FROM   v$recovery_file_dest;

    Output:

    NAME         LIMIT GB     USED GB  RECLAIMABLE GB  FILES
    ---------- ----------- ----------- -------------- ------
    +RECO         4,095.00    4,075.65            .00   2426

    FRA was at 99.5% — 4,075 GB used out of 4,095 GB, zero reclaimable, 2,426 archivelog files.

    This was the root cause. With the FRA full and nothing reclaimable, the archiver (ARCn) process could not write new archive logs, blocking all database activity.

    Step 2 — Assess Backup Status Before Taking Action

    Before deleting any archivelogs, it is critical to understand the backup posture. Blindly deleting archivelogs without knowing the backup state can leave the database unrecoverable.

    Started an RMAN session and ran a crosscheck first:

    rman target /
    RMAN> crosscheck archivelog all;

    All 2,416 objects returned “validation succeeded” — no expired logs. This meant the FRA was genuinely full of valid, undeleted archivelogs.

    Next, checked backup history:

    RMAN> list backup summary;

    Backups were listed daily going back several weeks — however, on closer inspection:

    SELECT session_key,
           TO_CHAR(start_time,'DD-MON-YY HH24:MI') start_time,
           TO_CHAR(end_time,'DD-MON-YY HH24:MI')   end_time,
           status,
           input_bytes,
           output_bytes
    FROM   v$rman_backup_job_details
    ORDER  BY start_time DESC
    FETCH FIRST 10 ROWS ONLY;

    Output revealed a critical finding:

    SESSION_KEY  START_TIME         END_TIME           STATUS    INPUT_BYTES  OUTPUT_BYTES
    -----------  -----------------  -----------------  --------  -----------  ------------
          19986  30-APR-26 01:00    30-APR-26 01:00    COMPLETED           0         98304
          19904  29-APR-26 01:00    29-APR-26 01:00    COMPLETED           0         98304
    • INPUT_BYTES = 0 — no datafiles were being backed up
    • OUTPUT_BYTES = 98304 (96 KB) — only a controlfile/SPFILE autobackup
    • Jobs completing in under a minute — impossible for a real full DB backup

    Additionally:

    RMAN> list backup of archivelog all;
    -- specification does not match any backup in the repository

    No archivelog backups existed at all. The scheduled backup job was not performing proper datafile or archivelog backups — a separate issue to be addressed post-incident.

    Step 3 — Escalation and Approval

    Given that there were no archivelog backups and the datafile backup was questionable, the decision to delete archivelogs was escalated for Tier 3 approval. Approval was obtained with the instruction to start conservatively: delete older than 15 days first, then 7 days if needed.

    Step 4 — Resolution

    First attempt — delete older than 15 days:

    RMAN> delete noprompt archivelog until time 'SYSDATE-15';

    RMAN returned warnings:

    RMAN-08138: warning: archived log not deleted - must create more backups

    This is because the RMAN retention policy was blocking deletion of logs that had never been backed up. With Tier 3 approval, used the force option to override:

    RMAN> delete noprompt force archivelog until time 'SYSDATE-15';

    This ran successfully, deleting archivelogs from January through late April.

    FRA usage after deletion:

    USED_GB  RECLAIMABLE_GB  NUMBER_OF_FILES
    -------  --------------  ---------------
     246.71               0              149

    Over 3,800 GB freed — FRA dropped from 99.5% to ~6%.

    Step 5 — Verification

    Forced a log switch and verified archiver status:

    ALTER SYSTEM ARCHIVE LOG CURRENT;
    
    SELECT dest_id, status, error
    FROM   v$archive_dest
    WHERE  status != 'INACTIVE';

    Output:

    DEST_ID  STATUS    ERROR
    -------  --------  -----
          1  VALID

    Archiver resumed successfully. Application connectivity was restored and users confirmed sessions were working normally.

    Redo Log Status Check

    As part of the verification, also confirmed redo log health:

    SELECT thread#,
           sequence#,
           TO_CHAR(first_time,'DD-MON-YY HH24:MI:SS') first_time,
           first_change#,
           archived,
           status
    FROM   v$log
    ORDER  BY thread#, sequence#;
    THREAD#  SEQUENCE#  FIRST_TIME                FIRST_CHANGE#  ARC  STATUS
    -------  ---------  ------------------------  -------------  ---  -------
          1       2417  05-MAY-26 01:56:44         1.7867E+10    NO   INACTIVE
          1       2418  05-MAY-26 03:27:20         1.7867E+10    NO   INACTIVE
          1       2419  05-MAY-26 07:57:10         1.7868E+10    NO   CURRENT

    No stuck or unarchived redo logs — database in healthy state.

    Incident Summary

    Item Detail
    Error ORA-00257: Archiver error
    Root Cause FRA (+RECO ASM diskgroup) at 99.5% capacity
    FRA Before 4,075 GB used / 2,426 files
    FRA After 246 GB used / 149 files
    Action Taken delete noprompt force archivelog until time 'SYSDATE-15'
    Space Freed ~3,829 GB
    Resolution Time ~45 minutes from identification to restoration

    Key Lessons Learned

    1. Monitor FRA Proactively

    Set up OEM 13c threshold alerts on FRA usage at 70% and 85%. Do not wait for ORA-00257 to discover the problem.

    SELECT ROUND(space_used/space_limit*100,2) pct_used
    FROM   v$recovery_file_dest;

    Alert when this exceeds 80%.
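    For unattended monitoring, a small cron-driven script can run the same query and mail an alert. The sketch below is illustrative only: the SID, threshold, and recipient are placeholders, and it assumes a local SYSDBA connection and a working mailx.

    #!/bin/bash
    # fra_usage_check.sh - minimal sketch: mail an alert when FRA usage crosses a threshold
    export ORACLE_SID=PRODCDB                          # placeholder SID
    export ORACLE_HOME=/u01/app/oracle/product/19c/db_1
    export PATH=$ORACLE_HOME/bin:$PATH
    THRESHOLD=80
    MAILTO="dba-team@example.com"                      # placeholder recipient

    # pull the FRA usage percentage as a bare number
    PCT=$(printf "set heading off feedback off pages 0\nselect round(space_used/space_limit*100) from v\$recovery_file_dest;\nexit;\n" \
          | sqlplus -s / as sysdba | tr -d '[:space:]')

    if [ "${PCT:-0}" -gt "$THRESHOLD" ]; then
      echo "FRA usage is ${PCT}% on $(hostname)" | mailx -s "FRA usage alert: ${PCT}%" "$MAILTO"
    fi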

    2. Always Check Backup Status Before Deleting Archivelogs

    In this incident, the backup job appeared to be running daily but was only backing up the controlfile (INPUT_BYTES=0). This is a serious gap — verify actual backup content, not just job status.

    3. Investigate the Backup Job

    A proper RMAN backup script for a 19c CDB should include:

    BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
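    As a sketch only (the channel count, destination format, and retention handling below are placeholders, not the script from this incident), a nightly disk-based job could look like:

    RUN {
      # placeholder destination; adjust the format string and channel count to your environment
      ALLOCATE CHANNEL c1 DEVICE TYPE DISK FORMAT '/backup/%d_%T_%U';
      # datafiles plus archivelogs; DELETE INPUT purges the logs once they are backed up
      BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
      BACKUP CURRENT CONTROLFILE;
      # remove whatever the configured retention policy marks as obsolete
      DELETE NOPROMPT OBSOLETE;
      RELEASE CHANNEL c1;
    }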

    4. Consider an Archivelog Deletion Policy

    If archivelog backups are not being taken, configure an RMAN retention policy to prevent FRA accumulation:

    CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;

    5. FRA Sizing Review

    Even at 4095 GB, the FRA filled up due to months of uncleaned archivelogs. Review archivelog generation rate and size FRA accordingly, or implement regular cleanup.
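    To measure the generation rate, a query along these lines against v$archived_log gives the daily archivelog volume (adjust the window as needed):

    SELECT TRUNC(completion_time)                          day,
           COUNT(*)                                        logs,
           ROUND(SUM(blocks*block_size)/1024/1024/1024,2)  gb_generated
    FROM   v$archived_log
    WHERE  completion_time > SYSDATE - 30
    GROUP  BY TRUNC(completion_time)
    ORDER  BY 1;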

    Conclusion

    ORA-00257 is one of those errors that brings an entire EBS environment to its knees instantly. The fix itself is straightforward — free up FRA space — but the investigation matters. Rushing to delete archivelogs without understanding the backup posture can result in an unrecoverable database.

    In this case, careful investigation revealed a deeper issue with the backup job that would have gone unnoticed had we not looked. Always verify, always escalate, and always get approval before deleting recovery-critical files.


    Syed Anwar Ahmed is an Oracle Apps DBA with over 11 years of production experience across Oracle EBS, Database, RAC, GoldenGate, and OEM environments. He writes about real-world Oracle incidents at syedanwarahmedoracle.blog.

  • Applying Oracle 19c Apr 2026 CPU Patch (19.31) on a Single Instance CDB

    In this article, we are going to demonstrate the steps to apply the Oracle April 2026 Critical Patch Update (CPU) on a Single Instance Oracle 19c Container Database (CDB). Here we will apply combo patch 39062931 which contains 39034528 (Database Apr 2026 Release Update 19.31.0.0.260421) and 38906621 (OJVM Component Release Update 19.31.0.0.260421). We also cover an important real-world scenario where a previously applied overlay patch 34672698 (ORA-00800 fix) conflicted with the Apr 2026 RU — confirming that the fix is now natively included in the RU itself.

    Follow the high-level plan to apply patch 39062931 (19.31.0.0.260421).

    • Go through the README documents of patch IDs before proceeding with patch apply.
    • Download the required patches from My Oracle Support.
    • Upgrade OPatch utility to the latest version as required by the patch README.
    • Perform pre-checks at CDB and PDB level — invalid objects, patch history, registry components.
    • Take Oracle Home TAR backup.
    • Blackout in OEM and disable crontab jobs before maintenance.
    • Shutdown database and listener.
    • Apply DB Release Update (39034528).
    • Apply OJVM Release Update (38906621).
    • Validate overlay patch (34672698) — check if still required post RU upgrade.
    • Start database and listener.
    • Execute datapatch on the database and validate.
    • Uncomment crontab and remove OEM blackout.
    • Perform post-checks and compare with pre-check data.

    Environment Details:

    COMPONENTS DETAILS
    Oracle Home /u01/app/oracle/product/19c/db_1
    DB Version 19.30.0.0.0 (Jan 2026 RU)
    OPatch Version (Before) 12.2.0.1.49
    OPatch Version (After) 12.2.0.1.51
    DB Type Single Instance, Multitenant (CDB+PDB)
    CDB Name DEVCDB
    PDBs DEVPDB1, DEVPDB2
    OS Linux x86-64
    Combo Patch 39062931 (19.31.0.0.260421)
    DB RU Patch 39034528
    OJVM Patch 38906621
    Overlay Patch 34672698 (validated — not reapplied)

    1. Pre-Checks

    1.1: Capture invalid objects at CDB level.

    sqlplus / as sysdba
    
    Set lines 230
    set pages 100
    col OBJECT_NAME for a40
    col OBJECT_TYPE for a40
    col OWNER for a20
    select OBJECT_NAME, OBJECT_TYPE, OWNER,
           to_char(LAST_DDL_TIME,'DD-MON-YYYY HH24:MI:SS'), status
    from dba_objects
    where status='INVALID'
    order by owner;
    no rows selected

    1.2: Capture patch history from dba_registry_sqlpatch.

    select patch_id, action, status, action_time, description
    from dba_registry_sqlpatch;
    PATCH_ID ACTION STATUS ACTION_TIME DESCRIPTION
    38523609 APPLY SUCCESS 03-FEB-26 OJVM RELEASE UPDATE: 19.30.0.0.260120
    38632161 APPLY SUCCESS 03-FEB-26 Database Release Update: 19.30.0.0.260120

    44 rows selected.

    1.3: Capture DB registry components.

    SELECT COMP_ID, comp_name, version, status FROM dba_registry;
    COMP_ID COMP_NAME VERSION STATUS
    CATALOG Oracle Database Catalog Views 19.30.0.0.0 VALID
    CATPROC Oracle Database Packages and Types 19.30.0.0.0 VALID
    JAVAVM JServer JAVA Virtual Machine 19.30.0.0.0 VALID
    XML Oracle XDK 19.30.0.0.0 VALID
    XDB Oracle XML Database 19.30.0.0.0 VALID
    ORDIM Oracle Multimedia 19.30.0.0.0 VALID
    CONTEXT Oracle Text 19.30.0.0.0 VALID
    … 15 rows total. All components VALID.

    1.4: Connect to each PDB and capture invalid objects as baseline.

    alter session set container=DEVPDB1;
    select OBJECT_NAME, OBJECT_TYPE, OWNER, status
    from dba_objects where status='INVALID'
    order by owner;
    📌 Note: Document all pre-existing invalid objects at PDB level before patching. These serve as a baseline for post-patch comparison. Do not attempt to fix pre-existing invalid objects before patching.
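    One way to keep that baseline is to spool the result per PDB to a file; the spool path below is just a placeholder:

    alter session set container=DEVPDB1;
    spool /u01/app/oracle/patches/APR_2026/invalid_objects_DEVPDB1_pre.lst
    select owner, object_name, object_type, status
    from   dba_objects
    where  status = 'INVALID'
    order  by owner, object_name;
    spool off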

    1.5: Capture current patch inventory from OS.

    [oracle@dbhost ~]$ opatch lspatches
    PATCH_ID DESCRIPTION
    34672698 ORA-00800 SOFT EXTERNAL ERROR, ARGUMENTS [SET PRIORITY FAILED], [VKTM], DISM(16)
    38523609 OJVM RELEASE UPDATE: 19.30.0.0.260120 (38523609)
    38632161 Database Release Update: 19.30.0.0.260120 (38632161)
    29585399 OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
    OPatch succeeded.

    2. Download Patches

    2.1: Create a dedicated directory for April 2026 patches.

    mkdir -p /u01/app/oracle/patches/APR_2026
    cd /u01/app/oracle/patches/APR_2026

    2.2: Download OPatch (patch 6880880).

    ⚠️ Important: Always download the Linux-x86-64 version of OPatch, not the generic LINUX (32-bit) version. The 32-bit LINUX zip will cause oui/lib/linux directory errors during patch apply.
    wget --user=<MOS_EMAIL> --password='<MOS_PASSWORD>' --no-check-certificate -c \
         "<DOWNLOAD_URL_FROM_MOS>" -O p6880880_190000_Linux-x86-64.zip

    2.3: Download Combo Patch and Overlay Patch.

    wget --user=<MOS_EMAIL> --password='<MOS_PASSWORD>' --no-check-certificate -c \
         "<DOWNLOAD_URL_FROM_MOS>" -O p39062931_190000_Linux-x86-64.zip
    
    wget --user=<MOS_EMAIL> --password='<MOS_PASSWORD>' --no-check-certificate -c \
         "<DOWNLOAD_URL_FROM_MOS>" -O p34672698_1930000DBRU_Linux-x86-64.zip

    2.4: Unzip patches and verify structure.

    unzip p39062931_190000_Linux-x86-64.zip
    unzip p34672698_1930000DBRU_Linux-x86-64.zip
    
    ls -l 39062931/
    drwxr-x---. 4 oracle oinstall  6144 38906621    -- OJVM RU
    drwxr-x---. 4 oracle oinstall  6144 39034528    -- DB RU
    -rw-r--r--. 1 oracle oinstall   118 README.html

    3. Upgrade OPatch

    3.1: Check current OPatch version.

    [oracle@dbhost ~]$ $ORACLE_HOME/OPatch/opatch version
    OPatch Version: 12.2.0.1.49
    OPatch succeeded.

    The April 2026 RU requires OPatch 12.2.0.1.51 or higher. We need to upgrade.

    3.2: Backup and replace OPatch.

    mv $ORACLE_HOME/OPatch $ORACLE_HOME/OPatch_old_12.2.0.1.49
    unzip p6880880_190000_Linux-x86-64.zip -d $ORACLE_HOME

    3.3: Verify new OPatch version.

    [oracle@dbhost ~]$ $ORACLE_HOME/OPatch/opatch version
    OPatch Version: 12.2.0.1.51
    OPatch succeeded.
    💡 Tip: If OPatch fails with Invalid maximum heap size: -Xmx5120m, set the following before retrying:

    export OPATCH_JRE_MEMORY_OPTIONS="-Xmx1024m"

    Add this to ~/.bash_profile to make it permanent.


    4. Pre-Patching Activities

    4.1: Comment out crontab entries to prevent jobs from running during maintenance.

    [oracle@dbhost ~]$ (/usr/bin/crontab -l | /bin/sed 's/^/##APR_2026##&/g' | /usr/bin/crontab)

    4.2: Place OEM blackout if OEM is configured for this target.

    📌 Note: If the environment has application-level database triggers, disable them before shutting down the database. Consult the application team before making any trigger changes. Re-enable them after patching is complete.
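    To identify such triggers, a dictionary query like the one below (run in each PDB) lists enabled database- and schema-level triggers for review with the application team:

    select owner, trigger_name, base_object_type, triggering_event, status
    from   dba_triggers
    where  base_object_type in ('DATABASE','SCHEMA')
    and    status = 'ENABLED'
    order  by owner, trigger_name;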

    5. Shutdown Database and Listener

    5.1: Shutdown the database and listener, and take an Oracle Home TAR backup.

    sqlplus / as sysdba
    SQL> shutdown immediate;
    Database closed.
    Database dismounted.
    ORACLE instance shut down.
    SQL> exit;
    
    lsnrctl stop LISTENER
    
    # Oracle Home TAR backup (run in background)
    nohup tar -zcvf /u01/app/oracle/patches/oracle_home_APR2026.tar.gz $ORACLE_HOME &

    6. Apply DB Release Update (39034528)

    6.1: Set PATH and run the prerequisite conflict check.

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /u01/app/oracle/patches/APR_2026/39062931/39034528
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    ZOP-47: Superset Patch 39034528 has Subset Patch 38632161 which has overlay patches
    [34672698] and these overlay patches conflict with Superset Patch.
    OPatch failed with error code 73
    ⚠️ Action Required: The prereq check identified that overlay patch 34672698 conflicts with the new DB RU. Roll it back before applying the DB RU.

    6.2: Rollback overlay patch 34672698.

    opatch rollback -id 34672698
    Patch 34672698 successfully rolled back.
    OPatch succeeded.

    6.3: Re-run prereq check after rollback.

    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    Prereq "checkConflictAgainstOHWithDetail" passed.
    OPatch succeeded.

    6.4: Apply the DB Release Update.

    opatch apply
    Patch 39034528 successfully applied.
    Sub-set patch [38632161] has become inactive due to the application of a super-set patch [39034528].
    OPatch succeeded.

    6.5: Verify the patch is applied.

    opatch lsinventory | grep 39034528
    39034528  Database Release Update : 19.31.0.0.260421

    7. Apply OJVM Release Update (38906621)

    7.1: Run prereq check and apply the OJVM patch.

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /u01/app/oracle/patches/APR_2026/39062931/38906621
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    Prereq "checkConflictAgainstOHWithDetail" passed.
    OPatch succeeded.
    
    opatch apply
    Patch 38906621 successfully applied.
    Sub-set patch [38523609] has become inactive due to the application of a super-set patch [38906621].
    OPatch succeeded.

    7.2: Verify both patches are listed.

    [oracle@dbhost ~]$ opatch lspatches
    PATCH_ID DESCRIPTION
    39034528 Database Release Update: 19.31.0.0.260421 (39034528)
    38906621 OJVM RELEASE UPDATE: 19.31.0.0.260421 (38906621)
    29585399 OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
    OPatch succeeded.

    8. Overlay Patch (34672698) — Conflict with Apr 2026 RU

    The overlay patch 34672698 was previously applied on top of the Jan 2026 RU to fix ORA-00800: SOFT EXTERNAL ERROR, ARGUMENTS: [SET PRIORITY FAILED], [VKTM], DISM(16). It was rolled back in Step 6. Before reapplying, always run the prereq check.

    📌 Note on oradism permissions: Some environments require a temporary oradism ownership change to oracle before applying the overlay patch. Always check the current permissions first:

    ls -l $ORACLE_HOME/bin/oradism

    If oradism is already oracle-owned (e.g., -rwxr-x---. 1 oracle oinstall), no root permission change is required. Only apply the chown/chmod steps if oradism is root-owned with setuid bit.

    8.1: Run prereq check before reapplying the overlay patch.

    cd /u01/app/oracle/patches/APR_2026/34672698
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    
    ZOP-40: The patch(es) has conflicts with other patches installed in the Oracle Home.
    Following patches have conflicts: 34672698, 39034528
    OPatch failed with error code 73
    ✔ Expected Result: The prereq confirms overlay patch 34672698 conflicts with Apr 2026 DB RU (39034528). Oracle has included the ORA-00800 fix natively in the Apr 2026 RU. The overlay patch is no longer required and must NOT be reapplied.
    ⚠️ Important: This is expected behaviour when upgrading RUs. An overlay patch applied on a previous RU may become redundant when the next RU natively includes the same fix. Always run prereq check before reapplying overlay patches after an RU upgrade.

    8.2: Verify the overlay patch is absent from inventory.

    opatch lsinventory | grep 34672698
    [oracle@dbhost 34672698]$
    ✔ Confirmed: No output from the grep confirms the overlay patch is correctly absent from the Oracle Home.

    9. Start Database and Run Datapatch

    9.1: Start the database and open all PDBs.

    sqlplus / as sysdba
    SQL> startup;
    ORACLE instance started.
    Database mounted.
    Database opened.
    
    SQL> alter pluggable database all open;
    Pluggable database altered.
    
    SQL> exit;
    
    lsnrctl start LISTENER

    9.2: Run datapatch to apply the SQL portion of the patches.

    cd $ORACLE_HOME/OPatch
    ./datapatch -verbose
    SQL Patching tool version 19.31.0.0.0 Production on Tue May 5 13:17:39 2026
    
    Installation queue:
      For CDB$ROOT PDB$SEED DEVPDB1 DEVPDB2:
        Rollback: 38523609 (OJVM 19.30)
        Apply: 39034528 (DB RU 19.31)
        Apply: 38906621 (OJVM 19.31)
    
    Patch 38523609 rollback (CDB$ROOT): SUCCESS
    Patch 39034528 apply (CDB$ROOT): SUCCESS
    Patch 38906621 apply (CDB$ROOT): SUCCESS
    Patch 38523609 rollback (DEVPDB1): SUCCESS
    Patch 39034528 apply (DEVPDB1): SUCCESS
    Patch 38906621 apply (DEVPDB1): SUCCESS
    Patch 38523609 rollback (DEVPDB2): SUCCESS
    Patch 39034528 apply (DEVPDB2): SUCCESS
    Patch 38906621 apply (DEVPDB2): SUCCESS
    
    SQL Patching tool complete on Tue May 5 13:24:18 2026
    📌 Note: Datapatch applies the SQL changes to both CDB and all open PDBs. Wait for it to complete fully before proceeding. Do not interrupt the process.

    10. Post-Patching Activities

    10.1: Uncomment crontab entries.

    [oracle@dbhost ~]$ (/usr/bin/crontab -l | /bin/sed 's/##APR_2026##//g' | /usr/bin/crontab)

    10.2: Remove OEM blackout if applicable.

    📌 Note: If application-level database triggers were disabled before patching, re-enable them now after confirming datapatch completed successfully. Consult the application team before enabling.

    11. Post-Checks

    11.1: Verify patch registry — all entries should show SUCCESS.

    select patch_id, action, status, action_time, description
    from dba_registry_sqlpatch
    order by action_time;
    PATCH_ID ACTION STATUS ACTION_TIME DESCRIPTION
    38523609 ROLLBACK SUCCESS 05-MAY-26 01:20:47 OJVM RELEASE UPDATE: 19.30.0.0.260120
    38906621 APPLY SUCCESS 05-MAY-26 01:21:02 OJVM RELEASE UPDATE: 19.31.0.0.260421
    39034528 APPLY SUCCESS 05-MAY-26 01:21:56 Database Release Update: 19.31.0.0.260421

    11.2: Verify all DB registry components are VALID.

    SELECT COMP_ID, comp_name, version, status FROM dba_registry;
    COMP_ID COMP_NAME VERSION STATUS
    CATALOG Oracle Database Catalog Views 19.31.0.0.0 VALID
    CATPROC Oracle Database Packages and Types 19.31.0.0.0 VALID
    JAVAVM JServer JAVA Virtual Machine 19.31.0.0.0 VALID
    XML Oracle XDK 19.31.0.0.0 VALID
    XDB Oracle XML Database 19.31.0.0.0 VALID
    ORDIM Oracle Multimedia 19.31.0.0.0 VALID
    CONTEXT Oracle Text 19.31.0.0.0 VALID
    … 15 rows total. All components VALID.

    11.3: Confirm final patch inventory.

    [oracle@dbhost ~]$ opatch lspatches
    PATCH_ID DESCRIPTION
    39034528 Database Release Update: 19.31.0.0.260421 (39034528)
    38906621 OJVM RELEASE UPDATE: 19.31.0.0.260421 (38906621)
    29585399 OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
    OPatch succeeded.
    ✔ Confirmed: Overlay patch 34672698 is no longer listed. Its fix is now natively included in the Apr 2026 RU (39034528). This is the expected and correct final state.

    11.4: Compare invalid objects at CDB and PDB level with pre-patch baseline. Compile any new invalid objects introduced by the patch:

    @$ORACLE_HOME/rdbms/admin/utlrp.sql
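    After utlrp completes, a quick count by owner in each container makes the comparison with the pre-patch baseline straightforward:

    select owner, count(*) invalid_count
    from   dba_objects
    where  status = 'INVALID'
    group  by owner
    order  by owner;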

    Key MOS References

    Document Description
    KB869205 Oracle Database 19c Apr 2026 RU Known Issues
    KB137197 OJVM Conditional Rolling Install Details
    244241.1 OPatch Support for RAC Rolling Patches
    293369.1 OPatch Documentation List

    Disclaimer: All server names, hostnames, database names, and environment-specific paths in this post have been anonymized. Steps and outputs are based on real production experience adapted for general use. Always test in a non-production environment before applying patches to production systems.

    If you found this useful, connect with me on LinkedIn or explore more Oracle DBA scripts on my GitHub. More patching walkthroughs at syedanwarahmedoracle.blog.


    💬 Share Your Experience

    Have you applied Oracle 19c CPU patches and hit a similar overlay patch conflict? Or found that a fix you applied months ago is now natively bundled into the next RU? Your real-world experiences help fellow DBAs tackle the same challenges. Drop a comment below — questions, observations, and feedback are always welcome.

  • P1 Incident: /dbbackup Filesystem 100% Full — How We Traced, Fixed, and Recovered Two Failed RMAN Backups in One Night

    🔴 Incident Overview

    Severity P1 — Two production database backups failed
    Environment Oracle 19c (19.30) on Linux x86-64
    Backup Tool RMAN with Recovery Catalog
    Backup Volume /dbbackup — 1TB LVM filesystem
    Databases Affected DBPRO01 (12:30 failure), DBPRO02 (22:30 failure)
    Total Space Recovered ~492G (disk went from 100% → 52%)

    1. The Alerts — Two Failures, Same Root Cause

    It started with an RMAN failure at 22:30. The backup script for DBPRO02 fired on schedule and died within 2 minutes. The RMAN log told the story clearly:

    RMAN-03009: failure of backup command on c4 channel at 22:32:13
    ORA-19502: write error on file "/dbbackup/DBPRC02/rman/DiffInc_DBPRC02_4u4mi8bf"
    ORA-27072: File I/O error
    Additional information: 4
    

    Three more channels followed — c1, c2, c3 — all crashing at exactly 22:32:48. When multiple channels fail simultaneously at the same timestamp, it almost always means one thing: the destination filesystem just hit 100%.

    A quick check confirmed it:

    $ df -hP /dbbackup
    Filesystem                        Size  Used Avail Use%
    /dev/mapper/orabkupvg-orabkuplv1 1023G 1020G  3.3G 100%
    

    1TB volume. 3.3G free. Completely full.

    What we did not know yet — digging into backup history would reveal that DBPRO01 had already failed at 12:30 that same day for the same reason, 10 hours earlier. Two databases unprotected on the same night.


    2. The Investigation — Folder by Folder

    The first step was understanding what was consuming the disk. One command gave us the top-level picture:

    $ du -sh /dbbackup/*
    744G    DBPRC01
    152G    ColdBackup_11April2026
     41G    DBPRC02
     26G    DBPRC03
     15G    DBPRO01
    7.7G    JAN2026_CPU
    7.7G    OCT2025_CPU
    5.3G    infra_arch
    

    744G inside DBPRC01 alone — 73% of the entire disk. That was our primary suspect.

    Drilling into DBPRC01

    $ du -sh /dbbackup/DBPRC01/rman/* | sort -rh | head -10
    7.5G    DiffInc_DBPRC01_fl4lumij
    7.5G    DiffInc_DBPRC01_eg4lrnh5
    7.5G    DiffInc_DBPRC01_b44lc7uh
    ...
    

    Every single file was a DiffInc_ or ArchivelogAll_ backup piece. No variety. No cleanup. Just backup after backup piling up.

    $ ls /dbbackup/DBPRC01/rman/ | wc -l
    1522
    

    1,522 backup pieces. We checked the oldest and newest:

    Oldest file on disk:  2022-05-07
    Newest file on disk:  2026-04-25
    

    Four years of backup files on disk — or so we thought.


    3. The RMAN Investigation — Where Things Got Interesting

    We connected RMAN to the database and ran the retention check:

    RMAN> SHOW RETENTION POLICY;
    CONFIGURE RETENTION POLICY TO REDUNDANCY 30;
    

    REDUNDANCY 30. This tells RMAN to keep the last 30 complete backup copies of every datafile before considering anything obsolete.

    Next logical step — check what RMAN considers obsolete:

    RMAN> REPORT OBSOLETE;
    no obsolete backups found
    

    Nothing? With 1,522 files on disk?

    We ran CROSSCHECK BACKUP — all 1,693 objects came back AVAILABLE. Then we checked the actual date range RMAN was tracking from the database control file:

    SELECT TO_CHAR(MIN(completion_time),'DD-MON-YYYY') oldest,
           TO_CHAR(MAX(completion_time),'DD-MON-YYYY') newest,
           COUNT(*) total_pieces
    FROM v$backup_piece_details
    WHERE status = 'A';
    
    OLDEST          NEWEST          TOTAL_PIECES
    03-DEC-2025     25-APR-2026     1541
    

    The control file only tracks pieces from December 2025 onwards — about 5 months. The 2022/2023 files seen on disk were old directories and scripts, not backup pieces. All 1,541 current pieces were legitimate and RMAN considered every one of them necessary under REDUNDANCY 30.

    This was the key insight: RMAN was not broken. The retention policy itself was the problem.


    4. Root Cause — The Architecture Trap

    The deeper investigation revealed something unexpected. Looking at the actual RMAN backup script:

    connect target rman/password@DBPRO01
    ...
    format '/dbbackup/DBPRC01/rman/DiffInc_%d_%u'
    (database);
    ...
    delete obsolete;
    

    DBPRO01 (the production database) was backing up INTO the DBPRC01 directory. The directory names suggested one database but contained another database’s backups entirely. The naming convention was PRO to PRC — production database backups stored in the production-copy directory.

    This pattern existed for all three database pairs on the server. Each production database backed up into its corresponding copy directory.

    The delete obsolete command was in the script — but with REDUNDANCY 30 and weekly Level 0 backups, obsolete only kicks in after 30 complete Level 0 cycles. That is 30 weeks = 7.5 months of retention. Since the current tracking window was only 5 months, delete obsolete ran every night and found absolutely nothing to delete.

    The math:

    Retention policy REDUNDANCY 30
    Level 0 frequency Weekly (Sundays)
    Effective retention period ~30 weeks / 7.5 months
    Backup tracking since December 2025 (~5 months)
    Result delete obsolete finds nothing — ever
    Daily backup size ~7–7.5G per run
    Total accumulated 744G

    Adding fuel to the fire — the patching activity on April 18 triggered an extra Level 0 backup, followed by the regular Sunday Level 0 on April 19. Two large Level 0 runs (~27G each) within 24 hours wrote the final ~54G that pushed the disk over the edge.


    5. Secondary Findings During Investigation

    OCT2025 CPU Patch Artifacts (7.7G)

    The October 2025 CPU patch files (zip archives + extracted directories) were still sitting in /dbbackup/OCT2025_CPU/. A quick OPatch check confirmed the database had since been patched to 19.30 (January 2026 RU) — the October 2025 patches were fully superseded and rolled back from inventory. Safe to delete immediately.

    $ $ORACLE_HOME/OPatch/opatch lsinventory | grep -E "38291812|38194382"
    # Empty — neither Oct 2025 patch in inventory anymore
    

    5-Year-Old Pre-Migration Export Dumps

    Three directories contained Oracle 11.2.0.4 export dumps from January–March 2021 — taken before the migration to 19c. With the database now running 19.30, these had zero recovery value but occupied ~14G collectively. Flagged for manager approval before deletion.

    Recovery Catalog Version Mismatch

    The original RMAN log flagged this warning:

    PL/SQL package RMAN.DBMS_RCVCAT version 19.11.00.00 in RCVCAT database is not current
    PL/SQL package RMAN.DBMS_RCVMAN version 19.11.00.00 in RCVCAT database is not current
    

    The recovery catalog is running 19.11 packages while the RMAN client is now 19.30. Non-critical tonight but requires UPGRADE CATALOG in the next maintenance window.


    6. The Fix — Emergency Space Recovery

    With management approval obtained, we executed a time-based delete — keeping the last 30 days of backups and removing everything older:

    RMAN> DELETE NOPROMPT BACKUP COMPLETED BEFORE 'SYSDATE-30';
    

    This command does three things atomically:

    1. Queries catalog/controlfile for all pieces completed before the cutoff date
    2. Deletes the physical files from disk
    3. Removes the records from RMAN catalog — no orphaned entries, no catalog drift
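    A non-destructive preview with LIST (same clause, nothing deleted) is a sensible safeguard before running the DELETE, as also noted in the cheat sheet at the end of this post:

    RMAN> LIST BACKUP COMPLETED BEFORE 'SYSDATE-30';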

    The output scrolled for several minutes:

    deleted backup piece
    backup piece handle=/dbbackup/DBPRC01/rman/DiffInc1_DBPRC01_6u4jaaih ...
    deleted backup piece
    backup piece handle=/dbbackup/DBPRC01/rman/ArchivelogAll_DBPRC01_784jab6o ...
    ...
    Deleted 1072 objects
    

    1,072 backup pieces deleted. Catalog updated. Disk checked:

    BEFORE:  Used 1020G  Avail 3.3G  (100%)
    AFTER:   Used  536G  Avail 488G   (53%)
    

    Then the OCT2025_CPU directory was removed:

    $ rm -rf /dbbackup/OCT2025_CPU/
    $ df -hP /dbbackup
    Used 528G  Avail 495G  (52%)
    

    Final result: 495G free. Disk at 52%.

    Both failed backups were re-submitted immediately and ran successfully in parallel:

    $ nohup sh /opt/oracle/scripts/rman/rman_backup_DBPRO01.sh &
    $ nohup sh /opt/oracle/scripts/rman/rman_backup_DBPRO02.sh &
    
    $ jobs -l
    [1] Running   nohup sh ...rman_backup_DBPRO01.sh &
    [2] Running   nohup sh ...rman_backup_DBPRO02.sh &
    

    7. Incident Timeline

    12:30 DBPRO01 Level 0 backup fails — ORA-19502/ORA-27072 (disk full)
    22:30 DBPRO02 Level 0 backup fails — same errors, all 4 channels
    23:08 Investigation begins — df -hP /dbbackup confirms 100% full
    23:15 DBPRC01 directory identified as 744G consumer
    23:25 RMAN connected — REDUNDANCY 30 discovered
    23:35 Architecture confirmed — PRO databases backing up into PRC directories
    23:45 Root cause confirmed — 7.5-month retention, delete obsolete finds nothing
    23:50 DELETE BACKUP COMPLETED BEFORE SYSDATE-30 executed
    23:51 1,072 pieces deleted — disk drops to 53%
    23:55 OCT2025_CPU removed — disk at 52%, 495G free
    00:00 Both backup jobs re-submitted and running successfully

    8. Permanent Fix Recommendations

    Fix 1 — Change Retention Policy to RECOVERY WINDOW

    RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;
    

    REDUNDANCY 30 with weekly Level 0s means 7.5 months of retention — far beyond what any production SLA requires. A 14-day recovery window keeps 2 weeks of backups regardless of backup frequency, and delete obsolete will actually find and remove old pieces going forward.

    Fix 2 — Add Pre-Backup Space Check to Cron Script

    #!/bin/bash
    BACKUP_FS="/dbbackup"
    THRESHOLD=20                         # minimum % free space required before the backup may start
    MAILTO="dba-team@example.com"        # placeholder recipient - adjust to your alert distribution list
    
    # % free = 100 minus the Use% column from df (second output line)
    AVAIL_PCT=$(df -hP $BACKUP_FS | awk 'NR==2 {gsub(/%/,""); print 100-$5}')
    
    if [ "$AVAIL_PCT" -lt "$THRESHOLD" ]; then
      echo "ABORT: $BACKUP_FS is ${AVAIL_PCT}% free — below ${THRESHOLD}% threshold" \
        | mailx -s "BACKUP ABORTED: Low space on $BACKUP_FS" $MAILTO
      exit 1
    fi
    

    A failing backup that writes 3G before dying is worse than a backup that never starts — it wastes the last 3G of free space and leaves partial pieces on disk.

    Fix 3 — Upgrade the Recovery Catalog

    RMAN> CONNECT TARGET /
    RMAN> CONNECT CATALOG rman/password@rmancat
    RMAN> UPGRADE CATALOG;
    RMAN> UPGRADE CATALOG;   -- run twice as prompted
    

    The catalog packages (19.11) are many Release Updates behind the 19.30 RMAN client. Some catalog-dependent operations will start failing if left unaddressed.

    Fix 4 — Filesystem Monitoring Alert

    The FRA check scripts already email on FRA usage above 80%. The same pattern should exist for /dbbackup. A simple cron entry checking disk usage every hour with alert at 80% would have caught this days before the disk hit 100%.
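    A sketch of such a check, assuming mailx and a placeholder recipient (threshold, path, and schedule should be adapted):

    # crontab entry (hourly):  0 * * * * /opt/oracle/scripts/check_dbbackup_space.sh
    #!/bin/bash
    FS="/dbbackup"
    THRESHOLD=80
    MAILTO="dba-team@example.com"        # placeholder distribution list
    USED_PCT=$(df -hP "$FS" | awk 'NR==2 {gsub(/%/,"",$5); print $5}')
    if [ "$USED_PCT" -ge "$THRESHOLD" ]; then
      echo "$FS is ${USED_PCT}% used on $(hostname)" \
        | mailx -s "Disk space alert: $FS at ${USED_PCT}%" "$MAILTO"
    fi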


    9. Key Takeaways for Oracle DBAs

    REDUNDANCY N is not always safer than RECOVERY WINDOW. REDUNDANCY 30 with weekly Level 0 backups means 7.5 months of retention — likely far beyond your RPO requirement and a silent space accumulator.

    • Always verify what delete obsolete actually deletes. If it finds nothing to delete every single night, that is a warning sign — not reassurance.
    • Check backup naming conventions carefully. When a directory named DBPRC01 contains DBPRO01 backups, it is easy to reason about the wrong database: cleanup is controlled by the RMAN configuration of the database that wrote the backups, not by the directory name.
    • Patching days generate oversized backups. A Level 0 taken manually on patch day plus the regular Sunday Level 0 the next day equals 2x the normal space consumption in 24 hours. Ensure extra headroom exists going into patch windows.
    • Use DELETE BACKUP COMPLETED BEFORE SYSDATE-N for emergency cleanup — not OS-level rm. RMAN deletes atomically update both the physical files and the catalog, preventing expired/orphaned piece confusion later.
    • Never use rm on RMAN backup pieces directly unless you follow up with CROSSCHECK BACKUP and DELETE EXPIRED BACKUP to sync the catalog.

    10. Commands Reference — Quick Cheat Sheet

    -- Check retention policy
    RMAN> SHOW RETENTION POLICY;
    
    -- Preview what would be deleted (dry run)
    RMAN> REPORT OBSOLETE;
    RMAN> LIST BACKUP COMPLETED BEFORE 'SYSDATE-30';
    
    -- Emergency cleanup — delete pieces older than 30 days
    RMAN> DELETE NOPROMPT BACKUP COMPLETED BEFORE 'SYSDATE-30';
    
    -- Standard cleanup based on retention policy
    RMAN> DELETE NOPROMPT OBSOLETE;
    
    -- Sync catalog after any OS-level file operations
    RMAN> CROSSCHECK BACKUP;
    RMAN> DELETE NOPROMPT EXPIRED BACKUP;
    
    -- Change to time-based retention (recommended)
    RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 DAYS;
    
    -- Check backup piece date range in control file
    SELECT TO_CHAR(MIN(completion_time),'DD-MON-YYYY') oldest,
           TO_CHAR(MAX(completion_time),'DD-MON-YYYY') newest,
           COUNT(*) total_pieces
    FROM v$backup_piece_details
    WHERE status = 'A';
    
    -- Check backup history
    SELECT session_key, input_type, status,
           TO_CHAR(start_time,'YYYY-MM-DD HH24:MI:SS') start_time,
           output_bytes_display, time_taken_display
    FROM v$rman_backup_job_details
    ORDER BY start_time DESC;
    

    Conclusion

    What appeared to be a simple disk full incident turned out to involve a multi-database backup architecture, a misconfigured retention policy, and a cleanup mechanism that was technically running correctly but never finding anything to clean. The fix itself — one RMAN command — took under 5 minutes. The real work was the systematic investigation to understand exactly what was safe to delete and why.

    That is Oracle DBA work in a nutshell: the fix is often simple; understanding why it is safe to run is the real job.


    If you found this useful, connect with me on LinkedIn or explore more Oracle DBA scripts on my GitHub. More incident walkthroughs at syedanwarahmedoracle.blog.

  • Troubleshooting OGG-01224 TCP/IP Error 79: GoldenGate EXTRACT Abend Due to Connection Refused


    Overview

    In a GoldenGate 19c environment, one of the EXTRACT processes abended with a TCP/IP error while all other EXTRACT processes were running normally. This post walks through the exact error, the diagnosis process, and how the issue was resolved — along with preventive recommendations to avoid recurrence.


    Environment Details

    Component Detail
    GoldenGate Version Oracle GoldenGate 19c (Classic Architecture)
    Source Database Oracle 12.1.0.2
    Source Server ogg-source01.example.com
    Manager Port 8910
    Abended Process EXTRACT EXTR0001

    The Problem

    During routine monitoring, the GoldenGate process status showed:

    EXTRACT     ABENDED     EXTR0001    CLASSIC
    EXTRACT     RUNNING     EXTR0002    CLASSIC
    EXTRACT     RUNNING     EXTR0003    CLASSIC
    EXTRACT     RUNNING     EXTR0004    CLASSIC
    

    Only EXTR0001 had abended while the other three EXTRACT processes were running normally.


    Reading the Report File

    The first step was to view the EXTRACT report file from GGSCI:

    GGSCI> VIEW REPORT EXTR0001
    

    The relevant portion of the report showed:

    2026-04-14 12:38:41  INFO    OGG-02820  Positioning to (Thread 1) Sequence 230,424,
                                  RBA 34,213,904, SCN 459.4,134,390,557.
    
    2026-04-14 12:38:41  INFO    OGG-02824  Positioned to (Thread 1) Sequence 230,424,
                                  RBA 34,213,904, Apr 14, 2026 10:07:59 AM.
    
    Source Context :
      SourceModule  : [ggnet.tcprtn]
      SourceID      : [../gglib/ggnet/tcprtn.c]
      SourceMethod  : [opt_negotiate(int, errmsg_t &)]
      SourceLine    : [4739]
    
    2026-04-14 12:38:41  ERROR   OGG-01224  TCP/IP error 79 (Connection refused),
                                  endpoint: ogg-target02:8910.
    
    2026-04-14 12:38:41  ERROR   OGG-01668  PROCESS ABENDING.
    

    Two important observations from this output:

    1. The EXTRACT successfully positioned in the source redo logs at the correct SCN — this ruled out any issue with the source database or archived logs.
    2. The abend happened at opt_negotiate — the GoldenGate TCP-level handshake between the EXTRACT and the remote Manager.

    Error Breakdown

    OGG-01224  TCP/IP error 79 (Connection refused)
    endpoint: ogg-target02:8910
    
    Element Meaning
    OGG-01224 GoldenGate TCP/IP communication failure
    Error 79 ECONNREFUSED — socket connect actively rejected
    opt_negotiate Protocol negotiation phase (post-TCP, pre-trail write)
    Port 8910 GoldenGate Manager port on the remote server

    “Connection refused” (errno 79) means the connection attempt reached the remote host but was actively rejected, typically because nothing was accepting connections on the Manager port at that instant. It is a transient error commonly seen in GoldenGate environments.


    Diagnosis Steps

    Step 1 — View the EXTRACT Report

    The first step was to read the report file and identify the exact error:

    GGSCI> VIEW REPORT EXTR0001
    

    The error pointed to OGG-01224 TCP/IP error 79 (Connection refused) at the opt_negotiate stage — meaning the EXTRACT successfully read the source redo logs but failed when attempting to communicate with the remote GoldenGate Manager.

    Step 2 — Check EXTRACT Status

    GGSCI> INFO EXTRACT EXTR0001, DETAIL
    GGSCI> INFO ALL
    

    This confirmed only EXTR0001 had abended while the remaining EXTRACT processes were running normally.


    Root Cause

    OGG-01224 TCP/IP errors are among the most commonly encountered errors in GoldenGate environments. They are often transient in nature — caused by brief network interruptions, a momentary spike in system load, or a short-lived instability in the GoldenGate Manager communication layer. In many cases, no deep investigation is required; the EXTRACT simply needs to be restarted after waiting a short period for the underlying condition to clear.

    The other EXTRACT processes (EXTR0002, EXTR0003, EXTR0004) were unaffected, confirming the issue was isolated and transient.
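    If the error had persisted instead of clearing, a quick reachability check toward the target Manager port would be the next step before escalating for firewall review. The hostname and port below are the anonymized values used in this post, and the commands assume nc and ss are available:

    # From the source server: can we reach the Manager port on the target?
    nc -zv ogg-target02 8910          # alternative: telnet ogg-target02 8910

    # On the target server: is the Manager actually listening on 8910?
    ss -ltn | grep ':8910'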


    Resolution

    After waiting briefly for the transient condition to clear, the abended EXTRACT was restarted:

    GGSCI> START EXTRACT EXTR0001
    GGSCI> INFO EXTRACT EXTR0001, DETAIL
    

    The EXTRACT came up immediately and resumed processing without any data loss or position reset.


    Post-Recovery Verification

    -- Confirm all processes running
    GGSCI> INFO ALL
    
    -- Check lag is reducing
    GGSCI> LAG EXTRACT EXTR0001
    
    -- Confirm records are flowing
    GGSCI> STATS EXTRACT EXTR0001
    

    Key Takeaways

    1. OGG-01224 TCP/IP errors are common and often transient.
    These errors are frequently seen in GoldenGate environments and do not always indicate a serious underlying problem. A brief network hiccup, momentary system load, or a short instability in the Manager communication layer can trigger this error. In many cases, waiting a short time and restarting the EXTRACT is all that is needed.

    2. The opt_negotiate method is the first GoldenGate-level handshake.
    After the TCP connection is established, GoldenGate negotiates capabilities via opt_negotiate. A failure at this stage is typically transient and clears on its own after a retry.

    3. Multiple EXTRACTs can fail independently.
    When one EXTRACT abends while others continue running, the issue is isolated. Do not assume a global outage — check each process individually.

    4. Add AUTORESTART to Manager params.
    To reduce downtime on similar future occurrences, configure Manager to auto-restart key processes:

    -- In mgr.prm
    AUTORESTART EXTRACT *, RETRIES 5, WAITMINUTES 2, RESETMINUTES 60
    

    Diagnostic Checklist for OGG-01224 TCP/IP Errors

    [ ] View EXTRACT report: GGSCI> VIEW REPORT <extract_name>
    [ ] Note the SourceMethod — is it opt_negotiate or during data transfer?
    [ ] Check if other EXTRACTs are running (INFO ALL)
    [ ] If error is transient: wait briefly, then START EXTRACT <name>
    [ ] If error persists: test port reachability (telnet <rmthost> <mgrport>)
    [ ] If error persists beyond a retry: escalate for network/firewall checks
    [ ] Verify RMTHOST/MGRPORT in EXTRACT param file (VIEW PARAMS <name>)
    [ ] Check for encryption mismatch if opt_negotiate consistently fails
    [ ] Add AUTORESTART in mgr.prm to handle future transient failures automatically
    

    References

    • Oracle GoldenGate Documentation — docs.oracle.com
    • MOS Note 966227.1: OGG Troubleshooting TCP/IP Errors In Open Systems (My Oracle Support login required)

    About the Author

    Syed Anwar Ahmed is an Oracle Apps DBA with extensive production experience managing Oracle RAC, E-Business Suite, and GoldenGate replication environments. He shares practical DBA knowledge from real-world incidents on his blog and is an Oracle ACE Apprentice candidate.

    Connect on LinkedIn | Blog | GitHub

  • Fixing Oracle EBS 12.2 Forms Not Opening (Step-by-Step Guide)

    Oracle E-Business Suite (EBS) 12.2 uses Java Web Start (JNLP) to launch forms, unlike older versions like 11i which relied on browser plugins. If your forms are not opening, this guide will help you fix the issue completely.

    Common Symptoms

    • Forms not opening after click
    • .jnlp file downloads but does nothing
    • Application Blocked by Java Security error
    • Nothing happens when clicking forms

    Root Causes

    • Using unsupported Java version (Java 11, 17, or 24)
    • Missing Java Web Start (OpenWebStart)
    • Java security blocking self-signed certificates
    • Old EBS 11i configurations applied to 12.2 environment

    Step-by-Step Solution

    Step 1: Remove Unsupported Java Versions

    Uninstall Java 11, 17, 24 or any non-Java 8 version from Control Panel → Programs → Uninstall a Program.

    Verify removal:

    java -version

    Expected output:

    'java' is not recognized as an internal or external command

    Also clean up residual folders if they exist:

    C:\Program Files\Java\
    C:\Program Files (x86)\Java\
    C:\Users\<username>\AppData\LocalLow\Sun\Java\

    Step 2: Install Java 8

    Download and install Java 8 JRE (1.8.0_xxx) from Oracle’s website. Verify after installation:

    java -version

    Expected output:

    java version "1.8.0_xxx"

    Step 3: Install OpenWebStart

    Oracle EBS 12.2 requires Java Web Start to launch .jnlp files. Since modern Java versions removed the built-in Web Start, install OpenWebStart as a replacement. Download from: https://openwebstart.com

    Step 4: Configure Java Security

    Go to Control Panel → Java → Security → Edit Site List and add your EBS URLs:

    http://your-ebs-url
    http://your-ebs-url:port
    https://your-ebs-url:port

    Step 5: Clear Java Cache

    Go to Java Control Panel → General → Delete Files, select all options, and click OK.

    Step 6: Associate JNLP Files with OpenWebStart

    • Right-click any .jnlp file
    • Select Open with → Choose another app
    • Select OpenWebStart
    • Check Always use this app

    Fix for “Application Blocked by Java Security”

    If you see the Application Blocked by Java Security error, add your EBS URL to the Exception Site List as described in Step 4. This allows self-signed certificates to run without being blocked.

    Expected Behavior After Configuration

    1. Login to EBS 12.2
    2. Click a Form-based responsibility
    3. .jnlp file launches via OpenWebStart
    4. Oracle Form opens successfully

    Important Notes

    • Do not use Java 7 — that is only required for EBS 11i
    • IE Mode in Edge is not required for EBS 12.2
    • Always use the Java 8 + OpenWebStart combination
    • Ensure popups and file downloads are allowed in your browser

    Summary

    Component Requirement
    Java Java 8 (required)
    Launcher OpenWebStart (required)
    Browser Plugin Not required
    IE Mode Not required

    Conclusion

    Most EBS 12.2 Forms issues come down to incorrect Java setup or missing OpenWebStart. If you previously configured your machine for EBS 11i, fully revert those settings before applying the 12.2 configuration — conflicting setups are one of the most common causes of Forms not opening.

    — Syed Anwar Ahmed | Oracle Apps DBA | LinkedIn

  • How I Fixed Oracle AHF/TFA Not Starting on an 11g RAC Cluster (TFA-00002 / AHF-07250)

    Category: Oracle RAC | Troubleshooting | AHF/TFA  |  Level: Intermediate to Advanced


    Background

    We recently had a critical production incident on our two-node Oracle 11g RAC cluster where the Fast Recovery Area (FRA) hit capacity, causing both instances to enter an INTERMEDIATE state due to a Stuck Archiver condition. Oracle Support raised an SR and asked for CRS diagnostic data collected using TFA (Trace File Analyzer).

    That’s when we discovered a second problem — TFA was completely non-functional on both nodes with the infamous TFA-00002 error. This post documents the full journey of diagnosing and fixing TFA, and how we manually collected the CRS logs for the SR in the meantime.


    The SR Request

    Oracle Support requested the following:

    1. CRS alert log from all nodes: <ORACLE_BASE>/diag/crs/*/crs/trace/alert.log
    2. All CRS-related trace files updated during the incident period

    Step 1 — Finding the CRS Alert Log

    The first challenge was locating the CRS logs. This cluster has a separate Grid Infrastructure installation with a different OS user (grid) from the database (oracle).

    [oracle@racnode1 ~]$ echo $ORACLE_BASE
    /u01/oradb/oracle

    Switching to the grid user:

    su - grid
    echo $ORACLE_HOME
    # /u01/oragrid/11.2/grid

    ORACLE_BASE was not set for the grid user, so we used the orabase binary:

    $ORACLE_HOME/bin/orabase
    # /u01/oragrid/oracle

    However, the ADR path (/u01/oragrid/oracle/diag/crs/*/crs/trace/alert.log) didn’t exist. This is because Oracle 11.2 Grid Infrastructure uses a different log location — not the ADR diag tree. The correct format is:

    $GRID_HOME/log/<hostname>/alert<hostname>.log

    The correct path on our cluster:

    ls -lh /u01/oragrid/11.2/grid/log/racnode1/alertracnode1.log
    # -rw-rw-r--. 1 grid oinstall 14M Apr 12 12:53 alertracnode1.log

    ⚠️ Key takeaway: On 11.2 GI, CRS alert logs live under $GRID_HOME/log/<hostname>/ — not in the ADR structure used by 12c and later.


    Step 2 — Manually Collecting CRS Logs

    The log directory structure under $GRID_HOME/log/<hostname>/ includes:

    alertracnode1.log    ← Main CRS alert log
    crsd/                ← CRSD rotating logs (crsd.log, crsd.l01, crsd.l02 ...)
    cssd/                ← CSS daemon logs
    ohasd/               ← Oracle High Availability Services logs
    ctssd/               ← Cluster Time Sync Service logs

    Since TFA was broken, we collected manually:

    On node 1:

    tar cvf /tmp/crstrace.racnode1.$(date +%Y%m%d%H%M%S).tar \
      /u01/oragrid/11.2/grid/log/racnode1/crsd/crsd.log \
      /u01/oragrid/11.2/grid/log/racnode1/crsd/crsd.l01 \
      /u01/oragrid/11.2/grid/log/racnode1/crsd/crsdOUT.log \
      /u01/oragrid/11.2/grid/log/racnode1/alertracnode1.log \
      /u01/oragrid/11.2/grid/log/racnode1/cssd/ \
      /u01/oragrid/11.2/grid/log/racnode1/ohasd/
    
    zip /tmp/crstrace.racnode1.zip /tmp/crstrace.racnode1.*.tar

    On node 2:

    ssh racnode2 "tar cvf /tmp/crstrace.racnode2.$(date +%Y%m%d%H%M%S).tar \
      /u01/oragrid/11.2/grid/log/racnode2/crsd/crsd.log \
      /u01/oragrid/11.2/grid/log/racnode2/crsd/crsd.l01 \
      /u01/oragrid/11.2/grid/log/racnode2/alertracnode2.log \
      /u01/oragrid/11.2/grid/log/racnode2/cssd/ \
      /u01/oragrid/11.2/grid/log/racnode2/ohasd/ && \
      zip /tmp/crstrace.racnode2.zip /tmp/crstrace.racnode2.*.tar"
    
    scp racnode2:/tmp/crstrace.racnode2.zip /tmp/

    Note: The crsd logs use a rotating format (.log, .l01, .l02 …) — not .trc files. The incident-period data was in crsd.l01.


    Step 3 — Diagnosing TFA-00002

    With the SR logs uploaded, we turned to fixing TFA. Here’s what we found:

    tfactl status
    # TFA-00002 Oracle Trace File Analyzer (TFA) is not running
    # TFA-00107 TFA failed to start after multiple attempts of start (retries from init.tfa)

    Checking AHF Installation Layout

    cat /etc/oracle.ahf.loc
    # /opt/oracle.ahf
    
    cat /opt/oracle.ahf/install.properties
    # AHF_HOME=/opt/oracle.ahf
    # BUILD_VERSION=2603000
    # BUILD_DATE=202604061821
    # TFA_HOME=/opt/oracle.ahf/tfa
    # DATA_DIR=/u01/oragrid/oracle/oracle.ahf/data

    The AHF binaries were at /opt/oracle.ahf/ and data at /u01/oragrid/oracle/oracle.ahf/data/ — a non-default split layout.

    The Actual Error — AHF-07250

    Checking the systemd journal revealed the real error:

    journalctl -u oracle-tfa --no-pager | tail -20
    
    init.tfa: AHF-07250: Cannot establish connection with TFA Server.
    init.tfa: Cause: Cannot establish connection with TFA server on 5000.
    init.tfa: Action: Ensure that communication is open on port 5000 and
              that no firewall is blocking port 5000.
    init.tfa: ERROR: TFAMain is spawning too fast, Human intervention required!!!
    init.tfa: Disabling TFA at : ...

    What We Ruled Out

    Check Result
    Port 5000 blocked by iptables Not blocked — policy ACCEPT
    SELinux enforcing Disabled
    Java missing/incompatible Java 11.0.30 — fine
    Disk space 16GB free on /, 410GB on /u01
    portmapping.txt / ssl.properties missing Missing — but not the root cause

    The TFA Java process was crashing before it could bind to port 5000. The AHF upgrade had left TFA in an unrecoverable broken state on both nodes.
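    Checks along these lines (assuming ss is available on the host) help confirm whether the TFA Java process ever comes up and binds to the port:

    # Is anything listening on the TFA port?
    ss -ltn | grep ':5000'

    # Is the TFAMain Java process alive at all?
    ps -ef | grep -i '[T]FAMain'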

    Attempted Fix — tfactl syncnodes

    tfactl syncnodes
    # Generating new TFA Certificates...
    # Successfully generated certificates.
    # ...
    # TFA-00002 Oracle Trace File Analyzer (TFA) is not running

    Certificates were synced successfully but TFA still wouldn’t start. The issue was deeper than certificate mismatches.


    Step 4 — The Fix: Clean AHF Reinstall

    Uninstall on node 1

    ahfctl uninstall -local
    # AHF will be uninstalled on: racnode1
    # Do you want to continue with AHF uninstall ? [Y]|N : Y
    # ...
    # CHA is disabled

    Note: Uninstalling AHF does NOT remove the data/repository directory, so historical collections and diag data are preserved.

    Download AHF Installer

    Download AHF-LINUX_v26.x.x.zip from My Oracle Support and stage it to /tmp/ on node 1.

    🔗 MOS Doc ID 2550798.1 — Autonomous Health Framework (AHF) Download

    Reinstall on both nodes from node 1

    unzip /tmp/AHF-LINUX_v26.3.0.zip -d /tmp/ahf_install
    cd /tmp/ahf_install
    ./ahf_setup -ahf_loc /opt/oracle.ahf -data_dir /u01/oragrid/oracle/oracle.ahf/data

    Answer N to email notification. Answer Y to install on cluster nodes when prompted.

    Node 2 needed a separate local reinstall

    The cluster-wide install didn’t fully fix node 2. We reinstalled locally using the -local flag:

    # From node 1
    scp /tmp/AHF-LINUX_v26.3.0.zip racnode2:/tmp/
    
    ssh racnode2 "ahfctl uninstall -local"
    
    ssh racnode2 "unzip /tmp/AHF-LINUX_v26.3.0.zip -d /tmp/ahf_install && \
      cd /tmp/ahf_install && \
      ./ahf_setup -ahf_loc /opt/oracle.ahf \
      -data_dir /u01/oragrid/oracle/oracle.ahf/data -local"

    The -local flag skips cluster coordination and installs cleanly on the local node only.


    Final Verification

    tfactl print status
    
    | Host     | Status of TFA | PID   | Port | Version    | Inventory Status |
    |----------|---------------|-------|------|------------|------------------|
    | racnode1 | RUNNING       |  6355 | 5000 | 26.3.0.0.0 | COMPLETE         |
    | racnode2 | RUNNING       | 28301 | 5000 | 26.3.0.0.0 | COMPLETE         |

    Both nodes RUNNING with COMPLETE inventory status. ✅


    Summary

    Problem | Root Cause | Fix
    CRS alert log not found at ADR path | 11.2 GI uses $GRID_HOME/log/<hostname>/, not ADR | Collect from $GRID_HOME/log/ directly
    TFA-00002 on both nodes | AHF upgrade left TFA in broken state | Clean uninstall + reinstall of AHF 26.3.0
    TFA not starting after syncnodes | Deeper corruption beyond cert mismatch | Full reinstall with -local flag on each node

    Key Commands Reference

    # Find CRS alert log on 11.2 GI
    ls $GRID_HOME/log/$(hostname)/alert$(hostname).log
    
    # Collect CRS logs manually
    tar cvf /tmp/crstrace.$(hostname).tar \
      $GRID_HOME/log/$(hostname)/crsd/crsd.log \
      $GRID_HOME/log/$(hostname)/crsd/crsd.l01 \
      $GRID_HOME/log/$(hostname)/alert$(hostname).log \
      $GRID_HOME/log/$(hostname)/cssd/ \
      $GRID_HOME/log/$(hostname)/ohasd/
    
    # Check TFA status
    tfactl print status
    
    # Check actual TFA error
    journalctl -u oracle-tfa --no-pager | tail -30
    
    # Uninstall AHF
    ahfctl uninstall -local
    
    # Reinstall AHF (cluster-wide)
    ./ahf_setup -ahf_loc /opt/oracle.ahf -data_dir <data_dir>
    
    # Reinstall AHF (local node only)
    ./ahf_setup -ahf_loc /opt/oracle.ahf -data_dir <data_dir> -local
    
    # Collect TFA diagnostics for Support
    tfactl diagnosetfa

  • Oracle EBS 12.2 — ADOP fs_clone Failure: Failed to Delete FMW_Home (Root Cause & Fix)

    Category: Oracle EBS 12.2  |  Topic: ADOP Patching  |  Difficulty: Intermediate  |  Oracle Support: Search ADOP fs_clone Failed to delete FMW_Home on My Oracle Support

    Introduction

    Oracle EBS 12.2 introduced Online Patching (ADOP), which relies on a dual file system architecture — a Run File System where production runs, and a Patch File System where patches are applied. The two file systems swap roles at each cutover; in this environment fs2 was the Run FS and fs1 the Patch FS. The fs_clone phase synchronises the Patch FS from the Run FS at the start of each patching cycle, making fs1 a fresh copy of the production file system.

    One of the most common issues encountered during fs_clone is a failure while trying to delete the FMW_Home directory on the Patch FS. This blog walks through a real production scenario on a 2-node RAC database with 4 application server nodes — covering the exact error, step-by-step diagnostic process, root cause identification, fix applied, and the final successful run with actual timings. All server-specific details have been anonymised.

    This issue applies to Oracle EBS 12.2.x on all platforms. For related Oracle Support articles, search “ADOP fs_clone Failed to delete FMW_Home” on My Oracle Support.

    Environment

    | Parameter | Value |
    |-----------|-------|
    | Application | Oracle E-Business Suite 12.2 (2-Node RAC DB + 4 Application Server Nodes) |
    | ADOP Version | C.Delta.13 |
    | ADOP Session ID | 129 |
    | Run File System (fs2) | /u01/app/fs2 (Production — active) |
    | Patch File System (fs1) | /u01/app/fs1 (Inactive — patching target) |
    | Shared Storage | NFS-mounted shared volume (1.3T, 447G free) |
    | OS User | applmgr |

    Incident Timeline

    | Event | Timestamp | Duration | Outcome |
    |-------|-----------|----------|---------|
    | 1st run started | Apr 11, 2026 17:07:35 | — | time adop phase=fs_clone executed |
    | 1st run failed | Apr 11, 2026 ~18:41 | ~1h 34m | FATAL ERROR — FMW_Home deletion failed |
    | Diagnosis performed | Apr 11, 2026 18:45–19:44 | ~59m | Root cause identified — root-owned OHS log files |
    | Fix applied (mv) | Apr 11, 2026 ~19:44 | Seconds | FMW_Home renamed to dated backup as applmgr |
    | 2nd run started | Apr 11, 2026 19:45 | — | time adop phase=fs_clone re-executed after fix |
    | 2nd run completed | Apr 12, 2026 00:49:08 | 5h 21m 25s | SUCCESS — all 4 app nodes completed ✅ |
    | Total session elapsed | Apr 11 17:07 → Apr 12 00:49 | 7h 46m 33s | Full session including failed run + fix + retry |

    The Issue

    During time adop phase=fs_clone, the synchronisation process was progressing normally — staging the file system clone, detaching Oracle Homes, removing APPL_TOP and COMM_TOP — until it reached stage 6 (REMOVE-1012-ORACLE-HOME) inside the removeFMWHome() function, where it attempted to delete the FMW_Home directory on the Patch FS and hit a fatal error.

    Error on the Console

    fs_clone/remote_execution_result_level1.xml:
    *******FATAL ERROR*******
    PROGRAM : (.../fs2/EBSapps/appl/ad/12.0.0/patch/115/bin/txkADOPPreparePhaseSynchronize.pl)
    TIME    : Apr 11 18:41:40 2026
    FUNCTION: main::removeDirectory [ Level 1 ]
    ERRORMSG: Failed to delete the directory /u01/app/fs1/FMW_Home.
    [UNEXPECTED]fs_clone has failed

    Key Log File: txkADOPPreparePhaseSynchronize.log

    The primary log file is located at:

    $ADOP_LOG_DIR/<session_id>/<timestamp>/fs_clone/<node>/TXK_SYNC_create/
        txkADOPPreparePhaseSynchronize.log
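
    When you need to jump straight to the failure inside a long log tree, a grep across the TXK_SYNC_create logs gets there quickly (session ID 129 and the path layout are from this incident — adjust for yours):

    # Locate the synchronize logs for the session and pull the error context
    find $ADOP_LOG_DIR/129 -name txkADOPPreparePhaseSynchronize.log 2>/dev/null
    grep -n -B2 -A3 "FATAL ERROR" \
      $ADOP_LOG_DIR/129/*/fs_clone/*/TXK_SYNC_create/txkADOPPreparePhaseSynchronize.log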

    Inside this log, the clone status progression was clearly visible:

    ========================== Inside getCloneStatus()... ==========================
    clone_status             = REMOVE-1012-ORACLE-HOME
    clone_status_from_caller = 7
    clone_status_from_db     = 6
    Removing the directory: /u01/app/fs1/FMW_Home
    Failed to delete the directory /u01/app/fs1/FMW_Home.
    *******FATAL ERROR*******
    FUNCTION: main::removeDirectory [ Level 1 ]
    ERRORMSG: Failed to delete the directory /u01/app/fs1/FMW_Home.

    clone_status_from_db = 6 indicates the process had already completed: fs_clone staging, detach of Oracle Homes, removal of APPL_TOP, COMM_TOP, and 10.1.2 Oracle Home. It failed specifically and only while removing FMW_Home.

    ADOP fs_clone Stage Flow

    | Stage (DB) | Clone Status | Description |
    |------------|--------------|-------------|
    | 1 | STARTED | Session initialised |
    | 2 | FSCLONESTAGE-DONE | File system staging completed |
    | 3 | DEREGISTER-ORACLE-HOMES | Oracle Homes deregistered from inventory |
    | 4 | REMOVE-APPL-TOP | APPL_TOP removed from Patch FS |
    | 5 | REMOVE-COMM-TOP | COMM_TOP removed from Patch FS |
    | 6 | REMOVE-1012-ORACLE-HOME | Removing FMW_Home — ❌ FAILED HERE |
    | 7+ | (clone proceeds…) | Clone fs2 to fs1, re-register homes, config clone |

    Diagnostic Steps

    Step 1 — Confirm You Are on the Correct File System

    Most critical check first. The target must be on Patch FS (fs1), never Run FS (fs2):

    $ echo $FILE_EDITION     # Must show: run
    run
    $ echo $RUN_BASE         # Must show the fs2 path
    /u01/app/fs2

    ⚠️ If FILE_EDITION shows patch, stop immediately — source the Run FS environment before proceeding.

    Step 2 — Check for Open File Handles

    lsof +D /u01/app/fs1/FMW_Home 2>/dev/null
    fuser -cu /u01/app/fs1/FMW_Home 2>&1

    In our case both commands returned empty output — no active process was holding FMW_Home open.

    Step 3 — Identify Root-Owned Files

    find /u01/app/fs1/FMW_Home ! -user applmgr -ls 2>/dev/null

    Output revealed multiple root-owned files under the OHS instance directories:

    drwxr-x---  3 root root 4096 Feb 15 06:48 .../EBS_web_OHS4/auditlogs/OHS
    -rw-------  1 root root    0 Feb 15 06:48 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/sec_audit_log
    -rw-r-----  1 root root 5670 Feb 15 06:49 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/EBS_web.log
    -rw-r-----  1 root root  249 Feb 15 06:49 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/access_log
    ... (same pattern for EBS_web_OHS2 and EBS_web_OHS3)

    All root-owned files were dated February 15 — nearly 2 months stale. This confirmed they were leftovers from OHS being incorrectly started as root during a previous patching cycle. No active process was involved.

    Step 4 — Attempt Manual Delete to Confirm the Error

    rm -rf /u01/app/fs1/FMW_Home 2>&1 | head -5
    rm: cannot remove '.../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/sec_audit_log': Permission denied

    This confirmed the issue was purely a file ownership/permission problem — not filesystem corruption or an NFS issue.

    Step 5 — Check Disk Space

    df -h /u01/app/fs1
    Filesystem      Size  Used Avail Use% Mounted on
    nfs_server:/vol  1.3T  844G  447G  66% /u01

    447GB free — sufficient to retain a backup of FMW_Home by renaming it.

    Root Cause Analysis

    The root cause was OHS (Oracle HTTP Server) being started as root on the Patch File System during a previous patching cycle in February 2026. This created log and audit files owned by root under:

    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS2/auditlogs/OHS/
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS2/diagnostics/logs/OHS/EBS_web/
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS3/  (same structure)
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS4/  (same structure)

    Since fs_clone runs as applmgr, and applmgr cannot delete files owned by root, the removeDirectory() function in txkADOPPreparePhaseSynchronize.pl failed with Permission Denied — surfaced as a fatal error.

    Why did OHS create root-owned files? If OHS start/stop scripts are executed as root or with sudo (instead of using applmgr-owned wrapper scripts), the resulting log and audit files are created with root ownership and persist on the Patch FS across patching cycles.

    Pre-Action Safety Checklist

    | Check | Expected | Result |
    |-------|----------|--------|
    | FILE_EDITION = run | run | ✅ PASS |
    | RUN_BASE points to fs2 | /u01/app/fs2 | ✅ PASS |
    | FMW_Home target is on fs1 (Patch FS only) | fs1 only | ✅ PASS |
    | lsof returns empty (no open handles) | Empty | ✅ PASS |
    | Root-owned files are stale (no active processes) | Stale only | ✅ PASS |
    | Sufficient disk space for backup rename | > 50GB free | ✅ PASS |
    | Production services confirmed running on fs2 | fs2 up | ✅ PASS |

    Solution — Move FMW_Home as Backup

    The safest approach on production is to move (rename) FMW_Home rather than deleting it. This avoids the need for root access entirely, completes in seconds, and preserves a backup.

    Why mv works even with root-owned files: on the same filesystem, mv is an atomic rename of the directory entry — it needs write permission only on the parent directory (/u01/app/fs1) and never opens, modifies, or unlinks anything inside the tree. applmgr can therefore rename FMW_Home even though files and directories inside it are owned by root. This is fundamentally different from rm -rf, which must descend into every subdirectory and unlink each individual file — and fails the moment it hits a root-owned, non-writable directory.
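
    A minimal sketch that demonstrates the difference in a scratch directory (sudo is used only to create the root-owned test objects; nothing here touches EBS):

    # Build a user-owned directory containing a root-owned, non-writable subdirectory
    mkdir -p /tmp/mv_demo
    sudo mkdir -p /tmp/mv_demo/root_dir
    sudo touch /tmp/mv_demo/root_dir/audit.log
    sudo chmod 750 /tmp/mv_demo/root_dir

    # rm -rf fails: unlinking audit.log needs write access inside root_dir, which we do not have
    rm -rf /tmp/mv_demo
    # rm: cannot remove '/tmp/mv_demo/root_dir/audit.log': Permission denied

    # mv succeeds: renaming /tmp/mv_demo only changes the entry in its parent directory
    mv /tmp/mv_demo /tmp/mv_demo_$(date +%d%b%Y)_bkp && echo "MOVE SUCCESSFUL"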

    Step 1 — Move FMW_Home as a Dated Backup

    mv /u01/app/fs1/FMW_Home /u01/app/fs1/FMW_Home_$(date +%d%b%Y)_bkp && echo "MOVE SUCCESSFUL"
    MOVE SUCCESSFUL

    Step 2 — Verify FMW_Home Is Gone

    ls -lrt /u01/app/fs1/

    Step 3 — Confirm You Are applmgr Before Retrying

    whoami
    # Expected output: applmgr

    ⚠️ Never run adop as root. Always confirm whoami shows applmgr before executing any adop command.

    Step 4 — Retry fs_clone

    time adop phase=fs_clone

    Running fs_clone Safely on Production

    time adop phase=fs_clone on a 2-node RAC with 4 application server nodes takes several hours. Never run it in a plain SSH/PuTTY session that could disconnect. Use one of the following:

    • VNC Session (Best): Network drops have zero impact on the running process.
    • nohup: nohup adop phase=fs_clone > /tmp/fsclone_$(date +%Y%m%d_%H%M%S).log 2>&1 &
    • screen: screen -S fsclone then time adop phase=fs_clone. Detach with Ctrl+A D, reattach with screen -r fsclone.
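
    For example, a typical pattern is to launch fs_clone inside screen and watch progress from a second session with adop -status (the screen session name is arbitrary):

    # Terminal 1: run fs_clone in a detachable session
    screen -S fsclone
    time adop phase=fs_clone
    # Detach with Ctrl+A D; reattach later with: screen -r fsclone

    # Terminal 2: check progress without disturbing the run
    adop -status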

    Successful Run — 2nd Attempt

    After applying the fix, time adop phase=fs_clone was re-executed. The adopmon output confirmed all 4 application nodes progressing through validation, port blocking, clone steps, and config clone phases without any errors.

    ADOP (C.Delta.13)
    Session Id: 129
    Command:    status
    Node Name   Node Type  Phase        Status     Started               Finished              Elapsed
    ----------  ---------  -----------  ---------  --------------------  --------------------  -------
    app-node1   master     FS_CLONE     COMPLETED  2026/04/11 17:07:35   2026/04/12 00:49:08   7:46:33
    app-node2   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:01:55   7:47:19
    app-node3   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:01:25   7:47:49
    app-node4   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:02:16   7:47:40
    File System Synchronization Type: Full
    adop exiting with status = 0 (Success)
    Summary report for current adop session:
        Node app-node1:  - Fs_clone status: Completed successfully
        Node app-node2:  - Fs_clone status: Completed successfully
        Node app-node3:  - Fs_clone status: Completed successfully
        Node app-node4:  - Fs_clone status: Completed successfully
    adop exiting with status = 0 (Success)
    real    321m25.733s   (5 hours 21 minutes 25 seconds)
    user     40m1.142s
    sys      70m59.804s

    | Node | Type | Started | Finished | Elapsed |
    |------|------|---------|----------|---------|
    | app-node1 | Master | Apr 11, 2026 17:07:35 | Apr 12, 2026 00:49:08 | 7h 46m 33s |
    | app-node2 | Slave | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:01:55 | 7h 47m 19s |
    | app-node3 | Slave | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:01:25 | 7h 47m 49s |
    | app-node4 | Slave | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:02:16 | 7h 47m 40s |

    The 2nd run completed cleanly in 5 hours 21 minutes 25 seconds across all 4 application nodes. File System Synchronization Type: Full.

    Post-Resolution Cleanup

    After a successful fs_clone and full patching cycle, old FMW_Home backups can be removed. Keep the most recent backup until the next patching cycle completes, then clean up older ones as root (since they may contain root-owned files):

    ls -lrt /u01/app/fs1/FMW_Home*
    du -sh /u01/app/fs1/FMW_Home*
    # Remove old backups as root
    sudo rm -rf /u01/app/fs1/FMW_Home_<old_date>_bkp

    Prevention — Avoiding Recurrence

    • Never start OHS as root. Always use applmgr-owned wrapper scripts. Never use sudo or root to run adohs.sh or adadminsrvctl.sh.
    • Post-patching ownership check. After every adop finalize/cutover, run: find /u01/app/fs1 ! -user applmgr -ls 2>/dev/null | head -20
    • Pre-fs_clone health check. Verify no lingering adop sessions, confirm Run FS services are healthy, check disk space, and verify no root-owned files under fs1/FMW_Home before starting — a sketch of such a check follows below.
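
    A sketch of this pre-flight check, using the same commands applied during the incident (paths and the applmgr user reflect this environment):

    #!/bin/bash
    # Pre-fs_clone health check (sketch) — run as applmgr with the Run FS environment sourced
    echo "FILE_EDITION : $FILE_EDITION"     # must be: run
    echo "RUN_BASE     : $RUN_BASE"         # must point to the Run FS (fs2 here)

    echo "== Lingering adop sessions =="
    adop -status

    echo "== Disk space on the Patch FS =="
    df -h /u01/app/fs1

    echo "== Root-owned files under fs1 FMW_Home (should return nothing) =="
    find /u01/app/fs1/FMW_Home ! -user applmgr -ls 2>/dev/null | head -20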

    Summary

    | Item | Detail |
    |------|--------|
    | Phase | adop phase=fs_clone |
    | Failing Function | main::removeDirectory inside removeFMWHome() |
    | Clone Stage | clone_status_from_db = 6 (REMOVE-1012-ORACLE-HOME) |
    | Root Cause | OHS started as root in a previous cycle — stale root-owned OHS log/audit files blocking applmgr deletion |
    | Production Impact | None — fs1 is Patch FS, production ran on fs2 throughout |
    | Fix Applied | mv FMW_Home to dated backup as applmgr — atomic rename, no root needed, completed in seconds. rm -rf was NOT used. |
    | 1st Run Duration | ~1h 34m before fatal error (Apr 11 17:07 → 18:41) |
    | 2nd Run Duration | 5h 21m 25s — completed successfully (Apr 11 19:45 → Apr 12 00:49) |
    | Total Session Elapsed | 7h 46m 33s (including failed run, diagnosis, fix, and retry) |
    | Final Status | adop exiting with status = 0 (Success) — all 4 app nodes completed ✅ |
    | Prevention | Never start OHS as root; add post-patching ownership check to runbook |
    | Oracle Support | Search “ADOP fs_clone Failed to delete FMW_Home” on My Oracle Support |

    Happy Debugging! All server-specific details have been anonymised. The diagnostic commands and fix are generic and applicable to any Oracle EBS 12.2.x environment. If this helped you, feel free to share with the community.

  • Stuck Archiver on Oracle 11g RAC: How a Full FRA Brought Down Archiving — and How We Fixed It Live

    Introduction

    One of the most disruptive incidents a DBA can face in a production RAC environment is a Stuck Archiver — a condition where Oracle’s archiver process (ARCn) is unable to archive online redo logs, effectively stalling all database activity. Transactions pile up, applications hang, and the business takes a hit.

    This post documents a real-world Stuck Archiver incident on a two-node Oracle 11g RAC cluster, caused by Fast Recovery Area (FRA) exhaustion. I’ll walk through the exact symptoms, diagnosis queries, root cause analysis, and the live fix — along with preventive measures to ensure this doesn’t repeat.

    Environment Overview

    | Component | Details |
    |-----------|---------|
    | Database | Oracle 11g R2 (11.2.0.4.0) |
    | Architecture | RAC — 2 Nodes |
    | Database Role | PRIMARY — READ WRITE, ARCHIVELOG mode |
    | OS | Linux (x86_64) |
    | FRA Location | ASM Diskgroup +ARCHIVELOG |
    | FRA Size (pre-incident) | 199 GB |
    | FRA Size (post-fix) | 400 GB |

    The Incident: What We Saw

    The alert came in the early hours of the morning. Applications were hanging, users were reporting timeouts, and both RAC instances had entered an INTERMEDIATE state. The alert log was flooded with:

    ORA-00257: archiver error. Connect internal only, until freed.
    ARC0: Error 19809 Creating archive log file to '<FRA_PATH>'
    ARCH: Archival stopped, error occurred. Will continue retrying
    ORACLE Instance PRODDB1 - Archival Error

    The archiver was stuck on both instances, and Oracle had locked down the database to internal connections only. The incident was open for approximately 45 minutes before the fix was applied and applications recovered.

    Step 1: Check Redo Log Status

    First priority — verify the state of online redo log groups. If LGWR cannot overwrite ACTIVE logs because they are unarchived, the database will hang regardless of any other fix.

    SELECT group#, status, archived, members FROM v$log ORDER BY group#;
    
    GROUP#   STATUS           ARCHIVED   MEMBERS
    ------   ---------------  ---------  -------
    1        ACTIVE           NO         2
    2        CURRENT          NO         2
    3        INACTIVE         YES        2
    4        INACTIVE         YES        2

    Groups 1 and 2 needed archiving. LGWR cannot overwrite ACTIVE or CURRENT redo logs until they are archived — this confirmed the database hang was directly caused by the stuck archiver.

    Step 2: Confirm the Archiver Status

    -- Check both RAC instances
    SELECT inst_id, instance_name, host_name, version,
           database_role, open_mode, log_mode
    FROM gv$instance ORDER BY inst_id;
    
    INST_ID  INSTANCE_N  VERSION       DATABASE_ROLE  OPEN_MODE   LOG_MODE
    -------  ----------  ------------  -------------  ----------  ----------
          1  PRODDB1     11.2.0.4.0    PRIMARY        READ WRITE  ARCHIVELOG
          2  PRODDB2     11.2.0.4.0    PRIMARY        READ WRITE  ARCHIVELOG
    
    SELECT instance_name, status, archiver FROM v$instance;
    
    INSTANCE_NAME    STATUS       ARCHIVER
    ---------------- ------------ --------
    PRODDB1          OPEN         STOPPED

    ARCHIVER = STOPPED confirmed it. Both RAC instances were in INTERMEDIATE state — the entire cluster was effectively down for application workloads.

    Step 3: Check FRA Usage

    SELECT
        space_limit / (1024*1024*1024)       AS limit_gb,
        space_used  / (1024*1024*1024)       AS used_gb,
        space_reclaimable / (1024*1024*1024) AS reclaimable_gb,
        ROUND((space_used / space_limit) * 100, 2) AS pct_used
    FROM v$recovery_file_dest;
    
    LIMIT_GB   USED_GB   RECLAIMABLE_GB   PCT_USED
    --------   -------   --------------   --------
    199        198.99     0.00            100.00

    100% FRA utilization with zero reclaimable space. The FRA on ASM diskgroup +ARCHIVELOG was completely full with 3,094 archived log files. Before proceeding, the ASM diskgroup was checked — +ARCHIVELOG had 299 GB free, confirming sufficient underlying storage for a size increase.
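
    Checking the diskgroup headroom is quick — either asmcmd as the grid user or a query against v$asm_diskgroup from SQL*Plus (both are standard tools; the diskgroup name below matches this environment):

    # As the grid user: confirm the diskgroup has room before raising the FRA limit
    su - grid -c "asmcmd lsdg ARCHIVELOG"

    # Or from the database as SYSDBA
    echo "SELECT name, ROUND(free_mb/1024) free_gb, ROUND(total_mb/1024) total_gb FROM v\$asm_diskgroup;" | sqlplus -s / as sysdba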

    Step 4: Identify What Was Consuming the FRA

    SELECT u.file_type,
        ROUND(u.percent_space_used/100 * d.space_limit/1024/1024/1024, 2)        AS used_gb,
        ROUND(u.percent_space_reclaimable/100 * d.space_limit/1024/1024/1024, 2) AS reclaimable_gb,
        u.number_of_files
    FROM v$recovery_area_usage u,
         v$recovery_file_dest  d
    ORDER BY 2 DESC;
    
    FILE_TYPE       USED_GB   RECLAIMABLE_GB   NUMBER_OF_FILES
    -----------     -------   --------------   ---------------
    ARCHIVED LOG    38.50     0.05             1,200+
    BACKUP PIECE    10.20     0.07             45
    FLASHBACK LOG    1.30     0.00             12

    Over 1,200 archived log files had piled up because the RMAN ARCH backup job had been silently failing for multiple nights.

    Step 5: Redo Volume Analysis — Identifying the Nightly Batch Spike

    SELECT TO_CHAR(first_time,'YYYY-MM-DD HH24') AS hour,
           COUNT(*) AS logs,
           ROUND(SUM(blocks*block_size)/1024/1024/1024,2) AS gb
    FROM v$archived_log
    WHERE dest_id=1
    GROUP BY TO_CHAR(first_time,'YYYY-MM-DD HH24')
    ORDER BY 1;

    | Hour | Archived Logs | GB Generated |
    |------|---------------|--------------|
    | Daytime (typical) | 2–26 | < 1.3 GB |
    | 01:00 AM (nightly) | 440–446 | ~31 GB ⚠ |
    | 03:00–04:00 AM | 191–411 | 11–28 GB ⚠ |
    | 05:00–06:00 AM | 537–796 | 34–50 GB ⚠ |

    Clear Pattern — Nightly Batch Job Is the Culprit: Every night between 01:00 AM and 06:00 AM, redo generation spikes to 30–50 GB per hour. Daytime activity rarely exceeds 0.5 GB/hour. With the RMAN ARCH backup silently failing for multiple nights, these logs were never backed up or removed — they simply accumulated until the FRA hit 100%.

    The Resolution — Three Actions

    Action 1: Manually Delete Archive Logs from ASM

    Because RMAN could not run against a full FRA and there was no reclaimable space, older archived logs were manually deleted via ASMCMD:

    su - grid
    asmcmd
    
    ASMCMD> ls +ARCHIVELOG/PRODDB/ARCHIVELOG/
    ASMCMD> rm +ARCHIVELOG/PRODDB/ARCHIVELOG/2026_04_04/*
    ASMCMD> rm +ARCHIVELOG/PRODDB/ARCHIVELOG/2026_04_05/*
    # Recent logs retained to preserve recoverability

    After ASMCMD deletion, the RMAN catalog was resynced:

    RMAN> CROSSCHECK ARCHIVELOG ALL;
    RMAN> DELETE EXPIRED ARCHIVELOG ALL;

    Important: ASMCMD deletion removes files from ASM but the control file still holds references. Always follow up with CROSSCHECK and DELETE EXPIRED to leave the catalog in a consistent state.

    Action 2: Increase FRA Size Online

    -- Increase FRA from 199GB to 400GB (online, no restart required)
    ALTER SYSTEM SET db_recovery_file_dest_size = 400G SCOPE=BOTH SID='*';

    SID='*' applies the change to both RAC instances simultaneously. Immediately after, the archiver restarted automatically:

    ARC1: Archival started
    ARC2: Archival started
    ARC3: Archival started
    ORACLE Instance PRODDB1 - Archival Restarted

    Applications reconnected and transactions resumed within seconds.

    Action 3: Verify Database and Cluster Status

    SELECT instance_name, archiver FROM v$instance;
    -- PRODDB1   STARTED
    
    SELECT round(space_used/1024/1024/1024,2) used_gb,
           round(space_limit/1024/1024/1024,2) limit_gb,
           round((space_used/space_limit)*100,2) used_pct
    FROM v$recovery_file_dest;
    -- 202.80 GB / 400 GB / 50.70%
    
    SELECT group#, status, archived FROM v$log ORDER BY group#;
    -- All groups INACTIVE/YES except CURRENT group

    Full CRS cluster check via crsctl status res -t confirmed all resources ONLINE — both instances Open, TAF service live on both nodes, all ASM diskgroups healthy. Note: ora.gsd OFFLINE is expected for Oracle 11g RAC — it is a legacy component not required in this configuration.

    Root Cause Confirmed: Archive Log Backups Were Not Running

    With the database stable, investigation confirmed the RMAN ARCH backup had been silently failing for multiple nights. The cron job fired the script on schedule, but without Oracle environment variables set in the cron context (ORACLE_SID, ORACLE_HOME, PATH all unset), the script exited immediately without connecting to RMAN. The log at /tmp/RMAN_ARCH.log appeared empty — no RMAN error, nothing surfacing to the team.

    Complete Incident Chain:
    ARCH backup silently fails (missing Oracle env vars in cron) → 3,094 archived logs accumulate over multiple nights → Heavy batch redo (30–50 GB/hr, 01:00–06:00 AM) → FRA hits 100% (198.99 GB / 199 GB) → ARCn cannot write → Both instances enter INTERMEDIATE state → ORA-00257 → Applications hang → Archive logs manually deleted via ASMCMD → FRA expanded to 400 GB → Archiver restarts → Cron fix applied → ARCH backup reruns successfully → FRA space recovered → Cluster fully healthy

    Fix: Correct the Cron Environment

    Broken cron entry:

    05 * * * * /home/oracle/scripts/RMAN_Hotbackup.ksh ARCH > /tmp/RMAN_ARCH.log 2>&1

    Fixed cron entry:

    05 * * * * . /home/oracle/.bash_profile && /home/oracle/scripts/RMAN_Hotbackup.ksh ARCH > /tmp/RMAN_ARCH.log 2>&1
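
    An additional hardening step (and the basis of preventive measure #2 below) is to make the backup script independent of the caller's environment and to surface any non-zero RMAN exit code. A sketch — the ORACLE_HOME path, RMAN command file, and mail recipient are placeholders:

    #!/bin/ksh
    # Top of RMAN_Hotbackup.ksh (sketch): set the Oracle environment explicitly
    export ORACLE_SID=PRODDB1
    export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
    export PATH=$ORACLE_HOME/bin:$PATH

    rman target / cmdfile=/home/oracle/scripts/arch_backup.rman log=/tmp/RMAN_ARCH_run.log
    rc=$?
    if [ $rc -ne 0 ]; then
        mailx -s "RMAN ARCH backup FAILED (rc=$rc) on $(hostname)" dba-team@example.com < /tmp/RMAN_ARCH_run.log
    fi
    exit $rc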

    ARCH Backup Reruns and Completes

    After the cron fix, the ARCH backup was immediately re-run manually to clear the remaining backlog and validate the fix end-to-end:

    . /home/oracle/.bash_profile
    /home/oracle/scripts/RMAN_Hotbackup.ksh ARCH

    The backup completed successfully. RMAN backed up accumulated archived logs, marked them as obsolete per retention policy, and removed them from the FRA — freeing additional space beyond what the manual ASMCMD deletion had already recovered.

    Preventive Measures Put in Place

    1. Fix the cron environment — all RMAN jobs now source ~/.bash_profile before executing.
    2. RMAN exit code alerting — script updated to email the DBA team on any non-zero RMAN exit status.
    3. FRA monitoring at 70% threshold — SQL script runs every 30 minutes via cron and alerts when FRA crosses 70%, giving enough runway before the database is impacted (see the sketch after this list).
    4. FRA sized on measured data — 400 GB based on actual peak batch redo (30–50 GB/hr), not an arbitrary estimate.
    5. RAC: always use SID='*' — and always verify ASM diskgroup free space before increasing the FRA limit.
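
    A minimal version of the 70% FRA alert from point 3 might look like this (threshold, script path, and mail recipient are illustrative; the query is the same v$recovery_file_dest check used during the incident):

    #!/bin/bash
    # fra_monitor.sh — alert when FRA usage crosses a threshold (sketch)
    . /home/oracle/.bash_profile
    THRESHOLD=70

    PCT=$(echo "set heading off feedback off pages 0
    SELECT ROUND(((space_used - space_reclaimable) / space_limit) * 100) FROM v\$recovery_file_dest;" | \
          sqlplus -s / as sysdba | xargs)

    if [ "${PCT:-0}" -ge "$THRESHOLD" ]; then
        echo "FRA usage is ${PCT}% on $(hostname) (threshold ${THRESHOLD}%)" | \
          mailx -s "FRA ALERT: ${PCT}% used" dba-team@example.com
    fi

    # Example cron entry (every 30 minutes):
    # */30 * * * * /home/oracle/scripts/fra_monitor.sh > /tmp/fra_monitor.log 2>&1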

    Key Takeaways for Oracle DBAs

    • ORA-00257 always points to the FRA being full — check v$recovery_file_dest and v$recovery_area_usage immediately.
    • When RMAN cannot run against a full FRA, use ASMCMD to manually delete older archive logs — then run CROSSCHECK and DELETE EXPIRED to resync the catalog.
    • db_recovery_file_dest_size is dynamic — increase it online without a restart. Always confirm ASM diskgroup free space first.
    • Use SID='*' in RAC — applies the parameter change to all instances simultaneously.
    • The archiver restarts automatically after FRA space is freed — no manual ARCn intervention needed.
    • Silent cron failures are dangerous — always validate Oracle environment variables in cron context, and capture RMAN exit codes with alerting.
    • Monitor FRA proactively at 70% — this gives enough runway for corrective action before hitting 100%.
    • After a manual ASMCMD deletion, always follow up with a proper RMAN backup run to leave the FRA in a consistent state.
    • Analyse redo generation patterns — v$archived_log reveals batch spikes that should directly inform FRA sizing decisions.

    Conclusion

    A Stuck Archiver on a production RAC database is a high-severity incident, but a structured approach resolves it quickly. The immediate fix required two steps: manually deleting archived logs via ASMCMD to unblock the archiver, and increasing db_recovery_file_dest_size online to prevent immediate recurrence.

    The deeper fix was identifying why the RMAN ARCH backup had been silently failing for multiple nights — a missing Oracle environment in the cron context. Once found, the fix was a single line in crontab. Rerunning the backup manually and watching it complete successfully, with FRA usage dropping as a result, confirmed the incident was fully resolved rather than just patched.

    The combination of a corrected cron environment, RMAN exit code alerting, and proactive FRA monitoring at 70% ensures this class of incident does not repeat.

    Syed Anwar Ahmed | Oracle Apps DBA

  • Step-by-Step Guide: Applying Oracle 19c CPU Patch – January 2026 (Patch 38658587)

    This guide walks through the complete process of applying the Oracle 19c January 2026 Critical Patch Update (CPU) in a CDB/PDB environment. The bundle includes the following patches:

    • Bundle Patch 38658587 — January 2026 Oracle Database 19c Bundle Patch
    • DBSU Patch 38632161 — Database Release Update: 19.30.0.0.260120
    • OJVM Patch 38523609 — OJVM Release Update: 19.30.0.0.260120
    • Overlay Patch 34672698 — Required overlay/conflict resolution patch

    Step 1: Capture Database Details

    echo $ORACLE_SID
    sqlplus / as sysdba
    select name from v$database;
    show pdbs;

    Step 2: Pre-Patch Checks (CDB and PDB Level)

    sqlplus / as sysdba
    
    -- Check invalid objects
    Set lines 230
    set pages 100
    col OBJECT_NAME for a40
    col OBJECT_TYPE for a40
    col OWNER for a20
    select OBJECT_NAME, OBJECT_TYPE, OWNER,
           to_char(LAST_DDL_TIME,'DD-MON-YYYY HH24:MI:SS'), status
    from dba_objects
    where status='INVALID'
    order by owner;
    
    -- Check patch registry
    col ACTION_TIME for a30
    set pages 100
    set lines 300
    col status for a15
    col description for a60
    col ACTION for a15
    select patch_id,action,status,action_time,description
    from dba_registry_sqlpatch;
    
    -- Check component versions
    COL comp_name FOR a44 HEA 'Component'
    COL version FOR a17 HEA 'Version'
    COL status FOR a17 HEA 'Status'
    col COMP_ID for a12
    SELECT COMP_ID,comp_name, version, status FROM dba_registry;

    Step 3: Comment Out Crontab Entries

    (/usr/bin/crontab -l | /bin/sed 's/^/##JAN_2026##&/g' | /usr/bin/crontab)

    Step 4: Place OEM Blackout (If Applicable)

    If Oracle Enterprise Manager (OEM) is monitoring this database, place a blackout to suppress false alerts during the patching window.


    Step 5: Shutdown Database and Take Oracle Home Backup

    Bring down the database and listener, then take a TAR backup of the Oracle Home before applying any patches. This is your rollback point.
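
    The exact commands vary per environment, but the pattern is a clean shutdown followed by a dated tar of the Home (the backup destination is a placeholder):

    # Stop the database and listener (as the oracle software owner)
    echo "shutdown immediate" | sqlplus -s / as sysdba
    lsnrctl stop LISTENER

    # Tar the Oracle Home as the rollback point
    tar -cvf /backup/ORACLE_HOME_pre_JAN2026_$(date +%Y%m%d).tar \
        -C "$(dirname $ORACLE_HOME)" "$(basename $ORACLE_HOME)"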


    Step 6: Apply DBSU Patch 38632161

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /<PATCH_STAGING_DIR>/JAN_2026/38658587/38632161
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    opatch rollback -id 34672698
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    opatch apply
    opatch lsinventory | grep 38632161

    Step 7: Apply OJVM Patch 38523609

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /<PATCH_STAGING_DIR>/JAN_2026/38658587/38523609
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    opatch apply
    opatch lsinventory | grep 38523609
    opatch lspatches

    Step 8: Change oradism File Permissions (Pre-Overlay)

    # Run as root
    chown oracle:oinstall /<ORACLE_HOME>/bin/oradism
    chmod 755 /<ORACLE_HOME>/bin/oradism

    Step 9: Apply Overlay Patch 34672698

    export PATH=$ORACLE_HOME/OPatch:$PATH
    cd /<PATCH_STAGING_DIR>/JAN_2026/34672698
    
    opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    opatch apply
    opatch lsinventory | grep 34672698

    Step 10: Revert oradism File Permissions

    # Run as root
    chown root:oinstall /<ORACLE_HOME>/bin/oradism
    chmod 4750 /<ORACLE_HOME>/bin/oradism

    Step 11: Start Database and Listener

    sqlplus / as sysdba
    startup;
    alter pluggable database all open;
    exit;
    
    echo $TNS_ADMIN
    lsnrctl start LISTENER

    Step 12: Run Datapatch

    cd $ORACLE_HOME/OPatch
    ./datapatch -verbose

    Step 13: Final Checks and Remove OEM Blackout

    Confirm the database and listener are up and running, then remove the OEM blackout if one was placed in Step 4.


    Step 14: Re-enable Crontab Jobs

    (/usr/bin/crontab -l | /bin/sed 's/##JAN_2026##//g' | /usr/bin/crontab)

    Step 15: Post-Patch Validation — Real Output

    After patching, verify the installed patches using OPatch. Below is the actual output from a production patching run:

    [oracle@<HOSTNAME> OPatch]$ opatch lspatches
    34672698;ORA-00800 SOFT EXTERNAL ERROR, ARGUMENTS [SET PRIORITY FAILED], [VKTM], DISM(16)
    38523609;OJVM RELEASE UPDATE: 19.30.0.0.260120 (38523609)
    38632161;Database Release Update : 19.30.0.0.260120 (38632161)
    29585399;OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
    OPatch succeeded.

    Verify patch registry — confirm patches 38523609 and 38632161 show APPLY SUCCESS:

    PATCH_ID   ACTION    STATUS   ACTION_TIME                    DESCRIPTION
    ---------- --------- -------- ------------------------------ -----------------------------------------------------------
      38194382 ROLLBACK  SUCCESS  03-FEB-26 02.39.24.188908 PM   OJVM RELEASE UPDATE: 19.29.0.0.251021 (38194382)
      38523609 APPLY     SUCCESS  03-FEB-26 02.39.39.457321 PM   OJVM RELEASE UPDATE: 19.30.0.0.260120 (38523609)
      38632161 APPLY     SUCCESS  03-FEB-26 02.40.30.102001 PM   Database Release Update : 19.30.0.0.260120 (38632161)

    Verify the database components in dba_registry — all should be VALID (RAC shows OPTION OFF because the RAC option is not in use on this home, which is expected):

    COMP_ID      Component                                    Version           Status
    ------------ -------------------------------------------- ----------------- -----------------
    CATALOG      Oracle Database Catalog Views                19.0.0.0.0        VALID
    CATPROC      Oracle Database Packages and Types           19.0.0.0.0        VALID
    RAC          Oracle Real Application Clusters             19.0.0.0.0        OPTION OFF
    JAVAVM       JServer JAVA Virtual Machine                 19.0.0.0.0        VALID
    XML          Oracle XDK                                   19.0.0.0.0        VALID
    CATJAVA      Oracle Database Java Packages                19.0.0.0.0        VALID
    APS          OLAP Analytic Workspace                      19.0.0.0.0        VALID
    XDB          Oracle XML Database                          19.0.0.0.0        VALID
    OWM          Oracle Workspace Manager                     19.0.0.0.0        VALID
    CONTEXT      Oracle Text                                  19.0.0.0.0        VALID
    ORDIM        Oracle Multimedia                            19.0.0.0.0        VALID
    SDO          Spatial                                      19.0.0.0.0        VALID
    XOQ          Oracle OLAP API                              19.0.0.0.0        VALID
    OLS          Oracle Label Security                        19.0.0.0.0        VALID
    DV           Oracle Database Vault                        19.0.0.0.0        VALID
    15 rows selected.

    Patch History Reference

    The table below shows the complete CPU patching history demonstrating consistent quarterly patching from May 2022 through January 2026:

    | Patch ID | Action | Date | Description |
    |----------|--------|------|-------------|
    | 33561310 / 33515361 | APPLY | 22-MAY-22 | OJVM + DB RU 19.14.0.0.220118 |
    | 34086870 / 34133642 | APPLY | 02-AUG-22 | OJVM + DB RU 19.16.0.0.220719 |
    | 34411846 / 34419443 | APPLY | 06-DEC-22 | OJVM + DB RU 19.17.0.0.221018 |
    | 34786990 / 34765931 | APPLY | 07-MAR-23 | OJVM + DB RU 19.18.0.0.230117 |
    | 35050341 / 35042068 | APPLY | 04-MAY-23 | OJVM + DB RU 19.19.0.0.230418 |
    | 35354406 / 35320081 | APPLY | 01-AUG-23 | OJVM + DB RU 19.20.0.0.230718 |
    | 35648110 / 35643107 | APPLY | 02-JAN-24 | OJVM + DB RU 19.21.0.0.231017 |
    | 36199232 / 36233263 | APPLY | 13-MAY-24 | OJVM + DB RU 19.23.0.0.240416 |
    | 36414915 / 36582781 | APPLY | 06-AUG-24 | OJVM + DB RU 19.24.0.0.240716 |
    | 36878697 / 36912597 | APPLY | 03-DEC-24 | OJVM + DB RU 19.25.0.0.241015 |
    | 37102264 / 37260974 | APPLY | 04-MAR-25 | OJVM + DB RU 19.26.0.0.250121 |
    | 37499406 / 37642901 | APPLY | 06-MAY-25 | OJVM + DB RU 19.27.0.0.250415 |
    | 37847857 / 37960098 | APPLY | 05-AUG-25 | OJVM + DB RU 19.28.0.0.250715 |
    | 38194382 / 38291812 | APPLY | 04-NOV-25 | OJVM + DB RU 19.29.0.0.251021 |
    | 38523609 / 38632161 | APPLY | 03-FEB-26 | OJVM + DB RU 19.30.0.0.260120 |

    Patch Summary

    | Patch ID | Description | Type | Status |
    |----------|-------------|------|--------|
    | 38658587 | Oracle Database 19c Bundle Patch – January 2026 | Bundle | SUCCESS |
    | 38632161 | Database Release Update: 19.30.0.0.260120 | Security | SUCCESS |
    | 38523609 | OJVM Release Update: 19.30.0.0.260120 | Component | SUCCESS |
    | 34672698 | Overlay / Conflict Resolution Patch | Overlay | SUCCESS |

    Key Takeaways

    • Always capture pre and post-patch snapshots for comparison
    • Never skip Datapatch — OPatch alone does not apply SQL-level changes
    • Revert oradism permissions after patching to avoid security issues
    • Verify all patch IDs show SUCCESS in dba_registry_sqlpatch after Datapatch
    • Verify all components in dba_registry show VALID (or their expected status, e.g. OPTION OFF for unused options) after patching
    • Maintain a consistent quarterly patching cadence to keep the database current and secure

    Written by Syed Anwar Ahmed — Oracle Apps DBA with 11 years of production experience.
    Connect: sdanwarahmed@gmail.com  |  LinkedIn

  • Statspack in Oracle 19c: A Real-World Performance Tuning Case Study

    In a production Oracle environment running on 19c, a critical performance degradation impacted business operations during peak hours. Despite being on a modern database version, the absence of Diagnostic Pack licensing meant AWR was not available. This case study demonstrates how Statspack was implemented and leveraged to identify the root cause and deliver a ~40% performance improvement using fundamental performance analysis techniques.


    Environment Overview

    | Component | Details |
    |-----------|---------|
    | Database Version | Oracle 19c |
    | Edition | Standard Edition (No AWR) |
    | Application | Oracle E-Business Suite |
    | Workload Type | OLTP (High concurrent users) |
    | Peak Users | ~1200 concurrent sessions |

    Problem Statement

    During peak business hours, users experienced intermittent slowness with transactions taking ~3x longer than baseline. There was no historical performance repository due to AWR unavailability. Key constraints included:

    • Diagnostic Pack not licensed — AWR disabled
    • No ADDM recommendations available
    • Production-critical issue requiring immediate resolution

    Strategy and Approach

    Even in Oracle Database 19c, we relied on core DBA principles: enable Statspack, capture snapshots at 30-minute intervals, and perform comparative analysis between peak vs non-peak workload and before vs during the degradation window.


    Implementation Steps

    Step 1: Install Statspack

    Connect as SYSDBA and run the Statspack creation script. This creates the PERFSTAT schema and all required objects:

    sqlplus / as sysdba
    @?/rdbms/admin/spcreate.sql

    You will be prompted for the PERFSTAT password, default tablespace, and temporary tablespace. Choose an appropriate tablespace with sufficient space (minimum 100MB recommended for active environments).

    Step 2: Automate Snapshot Collection

    Schedule automatic snapshots every 30 minutes using the built-in Statspack job:

    -- Connect as PERFSTAT user
    sqlplus perfstat/<password>
    
    -- Take a manual snapshot
    EXECUTE statspack.snap;
    
    -- Schedule automatic snapshots (every 30 mins) using spauto.sql
    @?/rdbms/admin/spauto.sql
    
    -- Or manually schedule with DBMS_JOB
    VARIABLE jobno NUMBER;
    BEGIN
      DBMS_JOB.SUBMIT(:jobno,
        'statspack.snap;',
        SYSDATE,
        'SYSDATE + 30/1440');
      COMMIT;
    END;
    /
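
    Note: DBMS_JOB still works in 19c but is deprecated in favour of the Scheduler. An equivalent 30-minute schedule via DBMS_SCHEDULER might look like this — the job name is arbitrary, and PERFSTAT needs the CREATE JOB privilege:

    -- Connect as PERFSTAT and create a Scheduler job instead of a DBMS_JOB entry
    sqlplus perfstat/<password>

    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'STATSPACK_SNAP_JOB',
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN statspack.snap; END;',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=MINUTELY;INTERVAL=30',
        enabled         => TRUE);
    END;
    /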

    Step 3: Verify Snapshots

    -- Check available snapshots
    SELECT snap_id, snap_time, snap_level
    FROM stats$snapshot
    ORDER BY snap_time DESC;
    
    -- Count total snapshots
    SELECT COUNT(*) FROM stats$snapshot;

    Step 4: Generate Statspack Report

    Generate a report between two snapshot IDs covering the peak period:

    sqlplus perfstat/<password>
    @?/rdbms/admin/spreport.sql
    -- Enter begin and end snapshot IDs when prompted

    Step 5: Query Top Wait Events Directly

    SELECT se.event,
           se.total_waits,
           se.time_waited_micro/1000000                                      AS time_waited_secs,
           ROUND(se.time_waited_micro / NULLIF(se.total_waits,0) / 1000, 2)  AS avg_wait_ms
    FROM stats$system_event se,
         stats$snapshot s
    WHERE s.snap_id BETWEEN &begin_snap AND &end_snap
    AND se.snap_id = s.snap_id
    AND se.event NOT LIKE 'SQL*Net%'
    ORDER BY se.time_waited_micro DESC
    FETCH FIRST 10 ROWS ONLY;

    Analysis Findings

    Key finding: log file sync dominated database time — a clear indication of commit-related contention. No high-cost SQL was identified, ruling out query inefficiency as the primary cause. The load profile showed elevated user commits per second and an increased redo generation rate.


    Root Cause

    Frequent commits inside application loops (row-by-row processing) caused high log file sync waits, redo log contention, increased I/O latency, and reduced transaction throughput.


    Resolution

    The application team introduced commit batching to significantly reduce commit frequency — modifying loop logic to commit every N rows rather than every single row. Minor redo log tuning was also performed:

    -- Check redo log switch frequency (should not exceed 4-5 per hour ideally)
    SELECT TO_CHAR(first_time,'YYYY-MM-DD HH24') hour,
           COUNT(*) switches
    FROM v$log_history
    WHERE first_time > SYSDATE - 1
    GROUP BY TO_CHAR(first_time,'YYYY-MM-DD HH24')
    ORDER BY 1;
    
    -- Check redo log sizes
    SELECT group#, members, bytes/1024/1024 size_mb, status
    FROM v$log
    ORDER BY group#;

    Results

    | Metric | Before | After |
    |--------|--------|-------|
    | Transaction Time | High (3x baseline) | Down ~40% |
    | log file sync waits | Very High | Minimal |
    | User Experience | Poor | Stable |

    Purging Old Statspack Data

    Statspack data grows quickly in busy environments. Purge old snapshots regularly:

    -- Purge a range of snapshots
    EXECUTE statspack.purge(
      i_begin_snap  => &oldest_snap_id,
      i_end_snap    => &newest_snap_id,
      i_snap_range  => TRUE
    );
    
    -- Or use the provided script
    @?/rdbms/admin/sppurge.sql

    Key Takeaways

    • Modern DB does not mean automatic performance — even in Oracle 19c, performance issues require structured analysis
    • Wait events reveal the truth — hit ratios appeared normal, but wait events exposed the actual bottleneck
    • Application design matters — commit strategy directly impacts performance; database tuning alone cannot fix design inefficiencies
    • Statspack still matters — highly effective in modern 19c environments, especially without AWR licensing
    • Always capture snapshots during peak workload with intervals of 30 minutes or less

    Why Not AWR?

    | Reason | Explanation |
    |--------|-------------|
    | Licensing | Diagnostic Pack not enabled |
    | Cost Constraints | Avoid additional licensing overhead |
    | Practicality | Statspack provided sufficient insight for this issue |

    This case reflects real-world experience in Oracle E-Business Suite environments, high-concurrency OLTP systems, and performance tuning under licensing constraints. Have questions? Reach out at sdanwarahmed@gmail.com.

Syed Anwar Ahmed – Oracle DBA Blog

Oracle Database and EBS troubleshooting guides based on real production experience.
