Oracle EBS OACORE Server in FAILED_NOT_RESTARTABLE State: Real-Time Issue, RCA and Fix

In Oracle E-Business Suite (EBS) environments, application tier stability is critical to a seamless user experience. However, managed servers sometimes behave unexpectedly and require manual intervention. This post walks through a real-world production issue in which an OACORE managed server entered the FAILED_NOT_RESTARTABLE state, covering its impact, the root cause analysis, and how it was resolved.


Environment Details

  • Oracle E-Business Suite: R12.2.x
  • Application Tier: WebLogic Managed Servers
  • Component Impacted: OACORE Server (oacore_server1)
  • Environment Type: Production

Problem Statement

An alert reported that oacore_server1 was in the FAILED_NOT_RESTARTABLE state. Verification confirmed the state: Node Manager had been unable to restart the server automatically.


Key Observations

Despite the OACORE server being in a failed state, the application remained accessible and functional because the other OACORE servers were handling the traffic. This is a consequence of the multi-OACORE architecture, in which Oracle HTTP Server (OHS) on the web tier load-balances requests across the OACORE managed servers. However, this creates a hidden risk: the redistributed load puts more pressure on the remaining servers and can lead to cascading failures if the issue is not addressed promptly.
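To put a number on that extra pressure, here is a small worked sketch. It assumes requests are spread evenly across the healthy OACORE servers, which session affinity on the web tier makes only approximately true in practice. Under that assumption, each survivor's share grows by a factor of N/(N-1) when one of N servers drops out. The function names are hypothetical helpers for illustration.

```shell
#!/bin/bash
# Hypothetical helpers: even-distribution model of OACORE load.

# per_server_load TOTAL SERVERS: average requests per healthy server
per_server_load() {
  local total=$1 servers=$2
  awk -v t="$total" -v n="$servers" 'BEGIN { printf "%.1f\n", t / n }'
}

# load_increase_pct SERVERS: percent extra load on each survivor when
# one of SERVERS healthy servers fails, i.e. (N/(N-1) - 1) * 100
load_increase_pct() {
  local servers=$1
  if (( servers < 2 )); then
    echo "need at least 2 servers" >&2
    return 1
  fi
  awk -v n="$servers" 'BEGIN { printf "%.1f\n", (n / (n - 1) - 1) * 100 }'
}
```

For example, load_increase_pct 4 prints 33.3: with four OACORE servers, each survivor takes on about a third more work when one fails, and with two servers the remaining one doubles its load. For an illustrative 1,200 concurrent requests, per_server_load gives 300.0 with four servers and 400.0 with three.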


Detailed Analysis

  • Managed server restart attempts failed during initialization
  • Bulk concurrent requests were actively running
  • CPU utilization spiked on the application tier
  • JVM resources were under pressure

Understanding FAILED_NOT_RESTARTABLE

In Oracle WebLogic Server, a managed server is marked FAILED_NOT_RESTARTABLE when it keeps failing and Node Manager's automatic restart attempts exceed the configured restart limit. This is a protective mechanism: it stops an unstable restart loop when the server cannot recover on its own.
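The counting behind this can be sketched as a sliding-window check. WebLogic controls it through per-server settings, RestartMax (how many automatic restarts are allowed) and RestartIntervalSeconds (the window they are counted in), with documented defaults of 2 and 3600; confirm the values in your own domain. The function below is a simplified model for reasoning about the behavior, not Node Manager's actual implementation, and its name and parameters are hypothetical.

```shell
#!/bin/bash
# Simplified model of a restart-limit policy (not Node Manager's code).
# Defaults mirror WebLogic's documented RestartMax=2 and
# RestartIntervalSeconds=3600; override via environment if needed.
RESTART_MAX=${RESTART_MAX:-2}
RESTART_INTERVAL=${RESTART_INTERVAL:-3600}

# restart_decision NOW [T1 T2 ...]
#   NOW  current time in epoch seconds
#   T1.. epoch seconds of previous automatic restarts
# Prints RESTART if another automatic restart is allowed,
# otherwise FAILED_NOT_RESTARTABLE.
restart_decision() {
  local now=$1
  shift
  local recent=0 t
  for t in "$@"; do
    # Count only restarts that fall inside the sliding window
    if (( now - t < RESTART_INTERVAL )); then
      recent=$((recent + 1))
    fi
  done
  if (( recent < RESTART_MAX )); then
    echo "RESTART"
  else
    echo "FAILED_NOT_RESTARTABLE"
  fi
}
```

With the defaults, restart_decision 10000 9000 9500 prints FAILED_NOT_RESTARTABLE because two restarts already fall inside the last hour, while restart_decision 10000 5000 9500 prints RESTART because the older restart has aged out of the window.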


Root Cause Analysis

The OACORE managed server entered the FAILED_NOT_RESTARTABLE state after repeated startup failures that followed an unclean or resource-constrained shutdown. High CPU utilization and a heavy concurrent-request workload put the JVM under resource pressure, preventing a clean restart cycle. Residual runtime artifacts (such as incomplete shutdown state or resource locks) then blocked successful reinitialization, so WebLogic marked the server FAILED_NOT_RESTARTABLE.
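One way to check for a leftover JVM before restarting is to look for the server's process on the application tier. The sketch below assumes the managed server's command line carries -Dweblogic.Name=<server_name>, which is how WebLogic normally identifies a server JVM; verify this against ps output on your own tier. find_server_pids is a hypothetical helper that reads a process listing on stdin, which keeps it easy to test.

```shell
#!/bin/bash
# find_server_pids SERVER
# Reads "PID ARGS..." lines (as produced by `ps -eo pid,args`) on stdin
# and prints the PID of every process whose command line contains the
# exact argument -Dweblogic.Name=SERVER. Returns 1 if none is found.
find_server_pids() {
  local server=$1
  awk -v flag="-Dweblogic.Name=$server" '
    {
      # Field 1 is the PID; the remaining fields are the arguments
      for (i = 2; i <= NF; i++) {
        if ($i == flag) { print $1; found = 1; next }
      }
    }
    END { exit (found ? 0 : 1) }'
}
```

Run it as ps -eo pid,args | find_server_pids oacore_server1. Matching the whole argument rather than a substring means oacore_server1 does not also match oacore_server10. If it prints a PID while the server is reported as failed, investigate that process before starting the server again.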


Resolution

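# Source the run edition environment first, e.g. . <EBS_BASE>/EBSapps.env run
cd $ADMIN_SCRIPTS_HOME
# Stop the failed managed server in a controlled way
./admanagedsrvctl.sh stop oacore_server1
# Start it again once the stop has completed
./admanagedsrvctl.sh start oacore_server1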

After the controlled restart, the server returned to RUNNING state with all deployments active and the application stable.
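Rather than checking the server once after start returns, a small retry loop can poll until the server is confirmed up. retry_until is a hypothetical helper, and the check command it runs is yours to supply (for example, an HTTP probe against your own OACORE URL); neither is part of the EBS scripts.

```shell
#!/bin/bash
# retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts are used up.
retry_until() {
  local max=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$@"; then
      echo "succeeded on attempt $attempt"
      return 0
    fi
    # Pause between attempts, but not after the last one
    if (( attempt < max )); then
      sleep "$delay"
    fi
  done
  echo "gave up after $max attempts" >&2
  return 1
}
```

For example, retry_until 20 30 my_oacore_check, where my_oacore_check is a hypothetical function that returns 0 once the server responds, gives the server roughly ten minutes to come up before reporting failure.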


Identify Inactive Forms Sessions

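Long-idle Forms sessions keep their frmweb runtime processes and database sessions alive, holding application-tier memory and adding to the overall resource pressure on the tier. Use this query to identify them safely; do not terminate any session without proper validation and approvals: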

SELECT s.sid,
       s.serial#,
       s.username,
       s.status,
       s.program,
       s.machine,
       ROUND(s.last_call_et/3600,2) AS hours_inactive
FROM v$session s
WHERE s.status = 'INACTIVE'
AND s.username = 'APPS'
AND s.program LIKE 'frmweb%'
AND s.last_call_et > 28800   -- 8 hours
ORDER BY hours_inactive DESC;

Reference only; do NOT execute without validation and approval. Replace SID and SERIAL# with the values returned by the query above:

ALTER SYSTEM KILL SESSION 'SID,SERIAL#' IMMEDIATE;
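To keep termination a reviewed step rather than an ad hoc one, the statements can be generated into a file for approval instead of being typed by hand. make_kill_statements is a hypothetical helper: it only prints SQL and never connects to the database.

```shell
#!/bin/bash
# make_kill_statements
# Reads "SID SERIAL#" pairs on stdin and prints one
# ALTER SYSTEM KILL SESSION statement per valid pair for review.
# Executes nothing against the database.
make_kill_statements() {
  local sid serial
  # The -n test also processes a final line that lacks a newline
  while read -r sid serial || [[ -n $sid ]]; do
    # Skip headers, blank lines and anything that is not two integers
    [[ $sid =~ ^[0-9]+$ && $serial =~ ^[0-9]+$ ]] || continue
    printf "ALTER SYSTEM KILL SESSION '%s,%s' IMMEDIATE;\n" "$sid" "$serial"
  done
}
```

For example, spool the SID and SERIAL# columns from the query above to sessions.txt, run make_kill_statements < sessions.txt > kill_review.sql, and have the file reviewed and approved before anyone runs it.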

Automate Session Monitoring

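The following script counts long-idle Forms sessions. It connects with "/ as sysdba", so it must run on the database tier as the Oracle software owner:

#!/bin/bash
# inactive_sessions.sh: count Forms (frmweb) sessions idle for more than 8 hours
export ORACLE_SID=your_sid
export ORACLE_HOME=/path/to/oracle_home
export PATH=$ORACLE_HOME/bin:$PATH

# Timestamp each run so the log shows when every count was taken
date '+%Y-%m-%d %H:%M:%S'

# The heredoc delimiter is unquoted, so \$ stops the shell expanding $session
sqlplus -s / as sysdba <<EOF
SET LINES 200
SET PAGES 200
SELECT COUNT(*) AS inactive_sessions
FROM v\$session
WHERE status='INACTIVE'
AND username='APPS'
AND program LIKE 'frmweb%'
AND last_call_et > 28800;
EXIT;
EOF

Make the script executable (chmod +x) and schedule it from the crontab, not inside the script itself. This entry runs it every 8 hours and captures errors as well as output:

0 */8 * * * /path/to/inactive_sessions.sh >> /tmp/inactive_sessions.log 2>&1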
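A count in a log only helps if someone reads it. A minimal sketch for turning the monitoring output into a pass/fail signal follows; evaluate_count and any threshold you pass it are hypothetical, so pick a value that fits your user base.

```shell
#!/bin/bash
# evaluate_count THRESHOLD
# Reads sqlplus output on stdin, takes the last line that is a bare
# integer (the COUNT(*) value) and compares it with THRESHOLD.
# Returns 0 for OK, 1 for ALERT, 2 if no count was found.
evaluate_count() {
  local threshold=$1 count
  # Keep only lines that are a bare integer and take the last one
  count=$(grep -E '^[[:space:]]*[0-9]+[[:space:]]*$' | tail -n 1 | tr -d '[:space:]' || true)
  if [[ -z $count ]]; then
    echo "UNKNOWN: no count found in input"
    return 2
  fi
  if (( count > threshold )); then
    echo "ALERT: $count inactive Forms sessions (threshold $threshold)"
    return 1
  fi
  echo "OK: $count inactive Forms sessions"
}
```

For example, /path/to/inactive_sessions.sh | evaluate_count 50 prints OK or ALERT, and the non-zero return code on ALERT can drive whatever notification your team already uses.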

DBA Quick Commands

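# Check system load (shell, on the application tier)
top -b -n 1 | head -20      # one-shot snapshot instead of interactive top
uptime
ps -ef | grep '[o]acore'    # [o] keeps the grep process itself out of the output

-- Check running concurrent requests (SQL*Plus, as APPS)
SELECT request_id,
       phase_code,
       status_code,
       actual_start_date,
       ROUND((SYSDATE - actual_start_date) * 1440) AS running_minutes
FROM fnd_concurrent_requests
WHERE phase_code = 'R'
ORDER BY actual_start_date;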
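The load averages from uptime are easier to judge relative to the number of CPUs. A minimal sketch, assuming a Linux host where /proc/loadavg and nproc are available; load_per_cpu is a hypothetical helper. Note that Linux also counts tasks blocked in uninterruptible I/O in the load average, so a high ratio can reflect I/O stalls as well as CPU saturation.

```shell
#!/bin/bash
# load_per_cpu LOADAVG_LINE CPUS
# LOADAVG_LINE uses the /proc/loadavg format ("1min 5min 15min ...").
# Prints the 1-minute load average divided by the CPU count and flags
# values above 1.0, i.e. more runnable work than CPUs to run it.
load_per_cpu() {
  local line=$1 cpus=$2
  awk -v cpus="$cpus" '{
    r = $1 / cpus
    printf "%.2f %s\n", r, (r > 1.0 ? "HIGH" : "normal")
  }' <<< "$line"
}
```

On a live host, run load_per_cpu "$(cat /proc/loadavg)" "$(nproc)".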

Key Takeaways

  • Because of load balancing, the application may appear healthy even when an OACORE server has failed
  • FAILED_NOT_RESTARTABLE is a protective mechanism, not the root cause itself
  • Resource pressure and restart failures must be analyzed together
  • Controlled and governed actions are critical in production environments
  • Proactive session monitoring via automation helps prevent recurrence

Have questions or faced a similar issue? Reach out at sdanwarahmed@gmail.com.


Discover more from Syed Anwar Ahmed – Oracle DBA Blog

Subscribe to get the latest posts sent to your email.
