Tag: ADOP

  • Oracle EBS 12.2 — ADOP fs_clone Failure: Failed to Delete FMW_Home (Root Cause & Fix)

    Category: Oracle EBS 12.2  |  Topic: ADOP Patching  |  Difficulty: Intermediate  |  Oracle Support: Search ADOP fs_clone Failed to delete FMW_Home on My Oracle Support

    Introduction

    Oracle EBS 12.2 introduced Online Patching (ADOP), which relies on a dual file system architecture — a Run File System where production runs, and a Patch File System where patches are applied (fs2 and fs1 respectively in this environment; the roles swap at each cutover). The fs_clone phase synchronises the Patch FS from the Run FS at the start of each patching cycle, making fs1 a fresh copy of the production file system.

    One of the most common issues encountered during fs_clone is a failure while trying to delete the FMW_Home directory on the Patch FS. This blog walks through a real production scenario on a 2-node RAC database with 4 application server nodes — covering the exact error, step-by-step diagnostic process, root cause identification, fix applied, and the final successful run with actual timings. All server-specific details have been anonymised.

    This issue applies to Oracle EBS 12.2.x on all platforms. For related Oracle Support articles, search “ADOP fs_clone Failed to delete FMW_Home” on My Oracle Support.

    Environment

    Parameter               | Value
    ------------------------|---------------------------------------------------------------------------
    Application             | Oracle E-Business Suite 12.2 (2-Node RAC DB + 4 Application Server Nodes)
    ADOP Version            | C.Delta.13
    ADOP Session ID         | 129
    Run File System (fs2)   | /u01/app/fs2 (Production — active)
    Patch File System (fs1) | /u01/app/fs1 (Inactive — patching target)
    Shared Storage          | NFS-mounted shared volume (1.3T, 447G free)
    OS User                 | applmgr

    Incident Timeline

    Event                 | Timestamp                    | Duration   | Outcome
    ----------------------|------------------------------|------------|------------------------------------------------
    1st run started       | Apr 11, 2026 17:07:35        |            | time adop phase=fs_clone executed
    1st run failed        | Apr 11, 2026 ~18:41          | ~1h 34m    | FATAL ERROR — FMW_Home deletion failed
    Diagnosis performed   | Apr 11, 2026 18:45–19:44     | ~59m       | Root cause identified — root-owned OHS log files
    Fix applied (mv)      | Apr 11, 2026 ~19:44          | Seconds    | FMW_Home renamed to dated backup as applmgr
    2nd run started       | Apr 11, 2026 19:45           |            | time adop phase=fs_clone re-executed after fix
    2nd run completed     | Apr 12, 2026 00:49:08        | 5h 21m 25s | SUCCESS — all 4 app nodes completed ✅
    Total session elapsed | Apr 11 17:07 → Apr 12 00:49  | 7h 46m 33s | Full session including failed run + fix + retry

    The Issue

    During time adop phase=fs_clone, the synchronisation process was progressing normally — staging the file system clone, detaching Oracle Homes, removing APPL_TOP and COMM_TOP — until it reached stage 6 (REMOVE-1012-ORACLE-HOME) inside the removeFMWHome() function, where it attempted to delete the FMW_Home directory on the Patch FS and hit a fatal error.

    Error on the Console

    fs_clone/remote_execution_result_level1.xml:
    *******FATAL ERROR*******
    PROGRAM : (.../fs2/EBSapps/appl/ad/12.0.0/patch/115/bin/txkADOPPreparePhaseSynchronize.pl)
    TIME    : Apr 11 18:41:40 2026
    FUNCTION: main::removeDirectory [ Level 1 ]
    ERRORMSG: Failed to delete the directory /u01/app/fs1/FMW_Home.
    [UNEXPECTED]fs_clone has failed

    Key Log File: txkADOPPreparePhaseSynchronize.log

    The primary log file is located at:

    $ADOP_LOG_DIR/<session_id>/<timestamp>/fs_clone/<node>/TXK_SYNC_create/
        txkADOPPreparePhaseSynchronize.log
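
    To jump to the newest synchronize log across sessions and nodes, a small helper like the following can be used (a sketch, assuming $ADOP_LOG_DIR is set in the applmgr environment as in the path above):

    # List the newest synchronize logs (newest last), then search the latest one for the fatal error
    ls -lrt $ADOP_LOG_DIR/*/*/fs_clone/*/TXK_SYNC_create/txkADOPPreparePhaseSynchronize.log 2>/dev/null | tail -3
    LATEST_LOG=$(ls -t $ADOP_LOG_DIR/*/*/fs_clone/*/TXK_SYNC_create/txkADOPPreparePhaseSynchronize.log 2>/dev/null | head -1)
    grep -n "FATAL ERROR" "$LATEST_LOG"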

    Inside this log, the clone status progression was clearly visible:

    ========================== Inside getCloneStatus()... ==========================
    clone_status             = REMOVE-1012-ORACLE-HOME
    clone_status_from_caller = 7
    clone_status_from_db     = 6
    Removing the directory: /u01/app/fs1/FMW_Home
    Failed to delete the directory /u01/app/fs1/FMW_Home.
    *******FATAL ERROR*******
    FUNCTION: main::removeDirectory [ Level 1 ]
    ERRORMSG: Failed to delete the directory /u01/app/fs1/FMW_Home.

    clone_status_from_db = 6 indicates the process had already completed: fs_clone staging, detach of Oracle Homes, removal of APPL_TOP, COMM_TOP, and 10.1.2 Oracle Home. It failed specifically and only while removing FMW_Home.
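
    On a long log, the same progression and the exact failure point can be pulled out with a simple grep against the log named above:

    # Show the clone status checkpoints and the failed deletion in one pass
    grep -nE "clone_status|Removing the directory|Failed to delete|FATAL ERROR" \
        txkADOPPreparePhaseSynchronize.log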

    ADOP fs_clone Stage Flow

    Stage (DB) | Clone Status             | Description
    -----------|--------------------------|---------------------------------------------------
    1          | STARTED                  | Session initialised
    2          | FSCLONESTAGE-DONE        | File system staging completed
    3          | DEREGISTER-ORACLE-HOMES  | Oracle Homes deregistered from inventory
    4          | REMOVE-APPL-TOP          | APPL_TOP removed from Patch FS
    5          | REMOVE-COMM-TOP          | COMM_TOP removed from Patch FS
    6          | REMOVE-1012-ORACLE-HOME  | Removing FMW_Home — ❌ FAILED HERE
    7+         | (clone proceeds…)        | Clone fs2 to fs1, re-register homes, config clone

    Diagnostic Steps

    Step 1 — Confirm You Are on the Correct File System

    Most critical check first. The target must be on Patch FS (fs1), never Run FS (fs2):

    $ echo $FILE_EDITION     # must show: run
    run
    $ echo $RUN_BASE         # must show the path to fs2
    /u01/app/fs2

    ⚠️ If FILE_EDITION shows patch, stop immediately — source the Run FS environment before proceeding.
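
    A minimal way to re-source the Run edition, assuming the standard EBSapps.env sits under the EBS base directory (here /u01/app; adjust to your environment):

    # Re-source the Run edition environment and re-check the edition variable
    . /u01/app/EBSapps.env run
    echo $FILE_EDITION   # should now print: run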

    Step 2 — Check for Open File Handles

    lsof +D /u01/app/fs1/FMW_Home 2>/dev/null
    fuser -cu /u01/app/fs1/FMW_Home 2>&1

    In our case both commands returned empty output — no active process was holding FMW_Home open.

    Step 3 — Identify Root-Owned Files

    find /u01/app/fs1/FMW_Home ! -user applmgr -ls 2>/dev/null

    Output revealed multiple root-owned files under the OHS instance directories:

    drwxr-x---  3 root root 4096 Feb 15 06:48 .../EBS_web_OHS4/auditlogs/OHS
    -rw-------  1 root root    0 Feb 15 06:48 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/sec_audit_log
    -rw-r-----  1 root root 5670 Feb 15 06:49 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/EBS_web.log
    -rw-r-----  1 root root  249 Feb 15 06:49 .../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/access_log
    ... (same pattern for EBS_web_OHS2 and EBS_web_OHS3)

    All root-owned files were dated February 15 — nearly 2 months stale. This confirmed they were leftovers from OHS being incorrectly started as root during a previous patching cycle. No active process was involved.
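
    To gauge the scope quickly, file ownership under FMW_Home can be summarised (a sketch using GNU find; directories unreadable by applmgr are silently skipped):

    # Count files per owning user under FMW_Home on the Patch FS (expect only applmgr)
    find /u01/app/fs1/FMW_Home -printf '%u\n' 2>/dev/null | sort | uniq -c | sort -rn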

    Step 4 — Attempt Manual Delete to Confirm the Error

    rm -rf /u01/app/fs1/FMW_Home 2>&1 | head -5
    rm: cannot remove '.../EBS_web_OHS4/diagnostics/logs/OHS/EBS_web/sec_audit_log': Permission denied

    This confirmed the issue was purely a file ownership/permission problem — not filesystem corruption or an NFS issue.

    Step 5 — Check Disk Space

    df -h /u01/app/fs1
    Filesystem      Size  Used Avail Use% Mounted on
    nfs_server:/vol  1.3T  844G  447G  66% /u01

    447GB free — sufficient to retain a backup of FMW_Home by renaming it.
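
    Optionally record the size of FMW_Home before renaming it; the rename itself consumes no extra space because the directory stays on the same filesystem:

    du -sh /u01/app/fs1/FMW_Home 2>/dev/null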

    Root Cause Analysis

    The root cause was OHS (Oracle HTTP Server) being started as root on the Patch File System during a previous patching cycle in February 2026. This created log and audit files owned by root under:

    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS2/auditlogs/OHS/
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS2/diagnostics/logs/OHS/EBS_web/
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS3/  (same structure)
    /u01/app/fs1/FMW_Home/webtier/instances/EBS_web_OHS4/  (same structure)

    Since fs_clone runs as applmgr, and applmgr has no write access to the root-owned OHS log directories (so it cannot remove them or the root-owned files inside them), the removeDirectory() function in txkADOPPreparePhaseSynchronize.pl failed with Permission denied, which ADOP surfaced as a fatal error.

    Why did OHS create root-owned files? If OHS start/stop scripts are executed as root or with sudo (instead of using applmgr-owned wrapper scripts), the resulting log and audit files are created with root ownership and persist on the Patch FS across patching cycles.
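
    To confirm on the Run FS that OHS is currently running as applmgr and not root, a quick process-owner check helps (the grep pattern is illustrative; adjust it to your OHS process names):

    # Show the owning user of any OHS/httpd processes (expect only applmgr)
    ps -eo user,pid,cmd | grep -E "[h]ttpd|[o]hs" | awk '{print $1}' | sort | uniq -c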

    Pre-Action Safety Checklist

    Check                                            | Expected      | Result
    -------------------------------------------------|---------------|--------
    FILE_EDITION = run                               | run           | ✅ PASS
    RUN_BASE points to fs2                           | /u01/app/fs2  | ✅ PASS
    FMW_Home target is on fs1 (Patch FS only)        | fs1 only      | ✅ PASS
    lsof returns empty (no open handles)             | Empty         | ✅ PASS
    Root-owned files are stale (no active processes) | Stale only    | ✅ PASS
    Sufficient disk space for backup rename          | > 50GB free   | ✅ PASS
    Production services confirmed running on fs2     | fs2 up        | ✅ PASS

    Solution — Move FMW_Home as Backup

    The safest approach on production is to move (rename) FMW_Home rather than deleting it. This avoids the need for root access entirely, completes in seconds, and preserves a backup.

    Why mv works even with root-owned files: on the same filesystem, mv is an atomic rename. It only rewrites the directory entry in the parent directory (/u01/app/fs1, which applmgr owns) and never opens or modifies anything inside FMW_Home, so applmgr can rename FMW_Home even though files and directories inside it are owned by root. This is fundamentally different from rm -rf, which must traverse and remove every individual file and directory.
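
    A minimal illustration of this behaviour, using hypothetical paths under /tmp (sudo is needed only to stage the root-owned subdirectory, mimicking the OHS auditlogs case):

    mkdir -p /tmp/fmw_demo
    sudo mkdir -m 750 /tmp/fmw_demo/root_dir                  # root-owned directory, like auditlogs/OHS
    sudo touch /tmp/fmw_demo/root_dir/audit.log               # root-owned file inside it

    rm -rf /tmp/fmw_demo 2>&1 | head -3                       # fails: Permission denied on the root-owned directory
    mv /tmp/fmw_demo /tmp/fmw_demo_bkp && echo "rename OK"    # succeeds: only the parent directory entry changes
    sudo rm -rf /tmp/fmw_demo_bkp                             # cleanup still needs root, just like the real backup later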

    Step 1 — Move FMW_Home as a Dated Backup

    mv /u01/app/fs1/FMW_Home /u01/app/fs1/FMW_Home_$(date +%d%b%Y)_bkp && echo "MOVE SUCCESSFUL"
    MOVE SUCCESSFUL

    Step 2 — Verify FMW_Home Is Gone

    ls -lrt /u01/app/fs1/

    Step 3 — Confirm You Are applmgr Before Retrying

    whoami
    # Expected output: applmgr

    ⚠️ Never run adop as root. Always confirm whoami shows applmgr before executing any adop command.

    Step 4 — Retry fs_clone

    time adop phase=fs_clone

    Running fs_clone Safely on Production

    time adop phase=fs_clone on a 2-node RAC with 4 application server nodes takes several hours. Never run it in a plain SSH/PuTTY session that could disconnect. Use one of the following:

    • VNC Session (Best): Network drops have zero impact on the running process.
    • nohup: nohup adop phase=fs_clone > /tmp/fsclone_$(date +%Y%m%d_%H%M%S).log 2>&1 &
    • screen: screen -S fsclone then time adop phase=fs_clone. Detach with Ctrl+A D, reattach with screen -r fsclone.

    Successful Run — 2nd Attempt

    After applying the fix, time adop phase=fs_clone was re-executed. The adopmon output confirmed all 4 application nodes progressing through validation, port blocking, clone steps, and config clone phases without any errors.

    ADOP (C.Delta.13)
    Session Id: 129
    Command:    status
    Node Name   Node Type  Phase        Status     Started               Finished              Elapsed
    ----------  ---------  -----------  ---------  --------------------  --------------------  -------
    app-node1   master     FS_CLONE     COMPLETED  2026/04/11 17:07:35   2026/04/12 00:49:08   7:46:33
    app-node2   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:01:55   7:47:19
    app-node3   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:01:25   7:47:49
    app-node4   slave      CONFIG_CLONE COMPLETED  2026/04/11 17:07:36   2026/04/12 01:02:16   7:47:40
    File System Synchronization Type: Full
    adop exiting with status = 0 (Success)
    Summary report for current adop session:
        Node app-node1:  - Fs_clone status: Completed successfully
        Node app-node2:  - Fs_clone status: Completed successfully
        Node app-node3:  - Fs_clone status: Completed successfully
        Node app-node4:  - Fs_clone status: Completed successfully
    adop exiting with status = 0 (Success)
    real    321m25.733s   (5 hours 21 minutes 25 seconds)
    user     40m1.142s
    sys      70m59.804s

    Node      | Type   | Started               | Finished              | Elapsed
    ----------|--------|-----------------------|-----------------------|------------
    app-node1 | Master | Apr 11, 2026 17:07:35 | Apr 12, 2026 00:49:08 | 7h 46m 33s
    app-node2 | Slave  | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:01:55 | 7h 47m 19s
    app-node3 | Slave  | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:01:25 | 7h 47m 49s
    app-node4 | Slave  | Apr 11, 2026 17:07:36 | Apr 12, 2026 01:02:16 | 7h 47m 40s

    The 2nd run completed cleanly in 5 hours 21 minutes 25 seconds across all 4 application nodes. File System Synchronization Type: Full.

    Post-Resolution Cleanup

    After a successful fs_clone and full patching cycle, old FMW_Home backups can be removed. Keep the most recent backup until the next patching cycle completes, then clean up older ones as root (since they may contain root-owned files):

    ls -lrt /u01/app/fs1/FMW_Home*
    du -sh /u01/app/fs1/FMW_Home*
    # Remove old backups as root
    sudo rm -rf /u01/app/fs1/FMW_Home_<old_date>_bkp

    Prevention — Avoiding Recurrence

    • Never start OHS as root. Always use applmgr-owned wrapper scripts. Never use sudo or root to run adohs.sh or adadminsrvctl.sh.
    • Post-patching ownership check. After every adop finalize/cutover, run: find /u01/app/fs1 ! -user applmgr -ls 2>/dev/null | head -20
    • Pre-fs_clone health check. Verify no lingering adop sessions, confirm Run FS services are healthy, check disk space, and verify no root-owned files under fs1/FMW_Home before starting; a combined sketch follows below.
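
    A combined pre-flight sketch covering these checks (paths, user, and mount point are taken from this environment; adjust before use):

    # Pre-fs_clone health check (illustrative)
    echo "FILE_EDITION : $FILE_EDITION"     # expect: run
    echo "RUN_BASE     : $RUN_BASE"         # expect: /u01/app/fs2
    adop -status                            # expect: no incomplete or failed session
    df -h /u01/app/fs1                      # expect: comfortable free space
    find /u01/app/fs1/FMW_Home ! -user applmgr -ls 2>/dev/null | head -20   # expect: no output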

    Summary

    Item                  | Detail
    ----------------------|---------------------------------------------------------------------------
    Phase                 | adop phase=fs_clone
    Failing Function      | main::removeDirectory inside removeFMWHome()
    Clone Stage           | clone_status_from_db = 6 (REMOVE-1012-ORACLE-HOME)
    Root Cause            | OHS started as root in a previous cycle — stale root-owned OHS log/audit files blocking applmgr deletion
    Production Impact     | None — fs1 is Patch FS, production ran on fs2 throughout
    Fix Applied           | mv FMW_Home to dated backup as applmgr — atomic rename, no root needed, completed in seconds. rm -rf was NOT used.
    1st Run Duration      | ~1h 34m before fatal error (Apr 11 17:07 → 18:41)
    2nd Run Duration      | 5h 21m 25s — completed successfully (Apr 11 19:45 → Apr 12 00:49)
    Total Session Elapsed | 7h 46m 33s (including failed run, diagnosis, fix, and retry)
    Final Status          | adop exiting with status = 0 (Success) — all 4 app nodes completed ✅
    Prevention            | Never start OHS as root; add post-patching ownership check to runbook
    Oracle Support        | Search “ADOP fs_clone Failed to delete FMW_Home” on My Oracle Support

    Happy Debugging! All server-specific details have been anonymised. The diagnostic commands and fix are generic and applicable to any Oracle EBS 12.2.x environment. If this helped you, feel free to share with the community.

  • When ADOP Remembers Too Much: Fixing Patch Failures Caused by Stale Metadata in Oracle EBS

    During an Oracle E-Business Suite ADOP patching cycle in a multi-node environment, the apply phase failed on one node while completing successfully on others. Despite retries — including downtime mode — the issue persisted, pointing to a deeper inconsistency within the patching framework.


    Symptoms Observed

    • ADOP session status: FAILED
    • Patch applied successfully on some nodes, failed on admin node
    • Repeated failures even with restart=no, abandon=yes, and downtime mode
    • No immediate actionable error from standard logs

    Timeline of Events

    T0 -- Patch execution initiated (ADOP apply phase)
    T1 -- Failure observed on admin node
    T2 -- Retry using downtime mode -- Failure persists
    T3 -- ADOP session review shows inconsistent state
    T4 -- Internal metadata tables analyzed
    T5 -- Cleanup performed (tables + restart directory)
    T6 -- Patch re-executed -- Success across all nodes

    Investigation

    Step 1: Check ADOP Session State

    Query the ADOP session status to understand the current state across all nodes:

    -- Check current ADOP session status
    SELECT session_id, node_name, phase, status,
           start_date, end_date
    FROM applsys.ad_adop_sessions
    ORDER BY start_date DESC;
    
    -- Check apply phase status per node
    SELECT s.session_id, n.node_name, p.phase_code,
           p.status, p.start_date, p.end_date
    FROM applsys.ad_adop_sessions s,
         applsys.ad_adop_session_phases p,
         applsys.fnd_nodes n
    WHERE s.session_id = p.session_id
    AND p.node_id = n.node_id
    ORDER BY p.start_date DESC;

    The existing session showed status FAILED with the apply phase partially completed — a clear indicator of inconsistent execution state across nodes.

    Step 2: Check adalldefaults.txt

    Reviewed the defaults file for any relevant configuration:

    cat $APPL_TOP/admin/$TWO_TASK/adalldefaults.txt | grep -i missing
    # Key parameter found:
    # MISSING_TRANSLATED_VERSION = No

    Modifying and retrying with this parameter had no impact, confirming the issue was not translation-related.

    Step 3: Check Install Processes Table

    -- Check for stale entries in FND_INSTALL_PROCESSES
    SELECT COUNT(*) FROM applsys.fnd_install_processes;
    
    -- View stale entries in detail
    SELECT process_status, process_name, last_update_date
    FROM applsys.fnd_install_processes
    ORDER BY last_update_date DESC;
    
    -- Check AD_DEFERRED_JOBS
    SELECT COUNT(*) FROM applsys.ad_deferred_jobs;
    SELECT * FROM applsys.ad_deferred_jobs;

    Observation: FND_INSTALL_PROCESSES contained stale entries from the failed session. AD_DEFERRED_JOBS was empty.


    Root Cause

    The failure was caused by stale and inconsistent ADOP metadata tables — specifically APPLSYS.FND_INSTALL_PROCESSES and APPLSYS.AD_DEFERRED_JOBS. ADOP internally relies on these tables to track patch progress checkpoints, deferred job execution, and restart state management. When these tables retain entries from failed or incomplete sessions, ADOP assumes an incorrect execution state, leading to patch reconciliation failure, apply phase breakdown, and node-level inconsistencies.


    Resolution Steps

    Step 1: Backup Critical Tables

    -- Always backup before any cleanup
    CREATE TABLE applsys.fnd_install_processes_bak AS
    SELECT * FROM applsys.fnd_install_processes;
    
    CREATE TABLE applsys.ad_deferred_jobs_bak AS
    SELECT * FROM applsys.ad_deferred_jobs;
    
    -- Verify backups
    SELECT COUNT(*) FROM applsys.fnd_install_processes_bak;
    SELECT COUNT(*) FROM applsys.ad_deferred_jobs_bak;

    Step 2: Drop Stale Metadata Tables

    Dropping these tables forces ADOP to rebuild clean metadata during the next run:

    DROP TABLE applsys.fnd_install_processes;
    DROP TABLE applsys.ad_deferred_jobs;

    Step 3: Reset the Restart Directory

    The restart directory can silently preserve failure states. Back it up and create a fresh one:

    cd $APPL_TOP/admin/$TWO_TASK
    
    # Backup existing restart directory
    mv restart restart_bkp_$(date +%Y%m%d)
    
    # Create fresh restart directory
    mkdir restart
    
    # Verify
    ls -la | grep restart

    Step 4: Re-run the Patch

    adop phase=apply \
         patches=<patch_id> \
         restart=no \
         abandon=yes \
         apply_mode=downtime

    The patch completed successfully across all nodes after the metadata cleanup.
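
    To confirm the clean state after the re-run, the standard status command (optionally together with the Step 1 session queries) can be used:

    # Confirm the latest ADOP session now reports a completed apply phase on all nodes
    adop -status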


    Before vs After

    Component         | Before Fix  | After Fix
    ------------------|-------------|------------
    ADOP Session      | Failed      | Successful
    Node Consistency  | Partial     | Full
    Restart Behavior  | Stuck       | Clean
    Patch Execution   | Incomplete  | Completed

    Key Takeaways

    • ADOP is state-driven — even when logs appear clean, internal metadata drives execution decisions
    • Partial success is a clue — if some nodes succeed and one fails, focus on local metadata, not the patch itself
    • The restart directory matters — it can silently preserve failure states and must be validated before retrying
    • Downtime mode is not a fix-all — even in downtime, ADOP still reads metadata tables; corruption persists unless cleaned
    • Always backup before cleanup — never drop tables without creating a backup first

    When NOT to Use This Approach

    Avoid applying this fix if the issue is caused by missing database patches (ETCC warnings), file system or permission issues, incorrect patch sequencing, or environment misconfiguration. Always validate the root cause before performing any metadata cleanup.


    This scenario highlights a subtle but critical behavior in ADOP — sometimes patch failures are not caused by the patch itself, but by what the system remembers about past attempts. By resetting stale metadata, we allow ADOP to re-evaluate the environment cleanly, leading to successful execution.

    Have questions or faced a similar issue? Reach out at sdanwarahmed@gmail.com.