In this Document
- Purpose
- File Formats for Data Uploaded to Oracle Support
- Troubleshooting Steps
  - 1. Data Gathering for All Oracle Clusterware Issues
  - 2. Data Gathering for Node Reboot/Eviction
  - 3. Data Gathering for All Real Application Cluster Issues
  - 4. Data Gathering for Real Application Cluster Performance/Hang Issues
  - 5. Data Gathering for Oracle Clusterware Installation Issues
    - 5.1. Failure before executing root script
    - 5.2. Failure while or after executing root script
- Appendix A. RDA
- Appendix B. OS logs
- Appendix C. systemstate and hanganalyze in RAC
- References
Applies to:
- Oracle Database – Enterprise Edition – Version 10.1.0.2 and later
- Oracle Database Exadata Cloud Machine – Version N/A and later
- Oracle Cloud Infrastructure – Database Service – Version N/A and later
- Oracle Database Cloud Exadata Service – Version N/A and later
- Oracle Database Exadata Express Cloud Service – Version N/A and later
Information in this document applies to any platform.
Purpose
This note will be made obsolete in the future. It is strongly recommended to use TFA to prune and collect files from all nodes:
Reference: Note 1513912.1 – TFA Collector – Tool for Enhanced Diagnostic Gathering
TFA Collector is installed in the GI home and ships with GI 11.2.0.4 and higher. For GI 11.2.0.3 or lower, refer to Note 1513912.1 for instructions on downloading and installing TFA Collector.
$GI_HOME/tfa/bin/tfactl diagcollect -from "MMM/dd/yyyy hh:mm:ss" -to "MMM/dd/yyyy hh:mm:ss"
Format example: "Jul/1/2014 21:00:00". Set the "from" time to 4 hours before the error and the "to" time to 4 hours after it.
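The 4-hour window described above can be computed rather than worked out by hand. The following is a minimal sketch, assuming GNU date is available; `tfa_shift` and `tfa_command` are hypothetical helper names, not part of TFA, and the command is only printed so it can be reviewed before being run.

```shell
# Sketch only: tfa_shift and tfa_command are hypothetical helper names.
# Assumes GNU date (the -d option and "@epoch" input).

# Print a timestamp shifted by a signed number of hours,
# formatted as MMM/dd/yyyy hh:mm:ss (for example Jul/01/2014 17:00:00).
tfa_shift() (
  epoch=$(date -d "$1" +%s) || exit 1
  LC_ALL=C date -d "@$(($epoch + $2 * 3600))" +'%b/%d/%Y %H:%M:%S'
)

# Print (but do not run) a diagcollect command covering 4 hours either side
# of the error time, so it can be reviewed before execution.
tfa_command() (
  from=$(tfa_shift "$1" -4) || exit 1
  to=$(tfa_shift "$1" 4) || exit 1
  gi_home=${GI_HOME:-'$GI_HOME'}   # fall back to the literal placeholder
  printf '%s/tfa/bin/tfactl diagcollect -from "%s" -to "%s"\n' \
    "$gi_home" "$from" "$to"
)
```

For example, `tfa_command "2014-07-01 21:00:00"` prints a diagcollect command whose window runs from 17:00 on 1 July to 01:00 on 2 July. The day is zero-padded here to match the "dd" in the documented format.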
This note lists what to collect for different types of Oracle Clusterware and Real Application Cluster issues. Uploading every file is not mandatory when opening an SR, but resolution is faster when all relevant information is uploaded.
File Formats for Data Uploaded to Oracle Support
Oracle Support requests that you upload compressed files grouped together by node and labeled as such in a standard format, such as .tar, .gz, .Z or .zip.
Output from older runs of diagcollection, or any other files gathered days or weeks earlier, may not contain current log information, which can delay resolution.
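One way to produce a per-node, clearly labelled archive is sketched below. `pack_node_files` is a hypothetical helper, and the naming scheme (node name plus timestamp) is an assumption rather than an Oracle requirement.

```shell
# Sketch only: pack_node_files is a hypothetical helper name.
# Bundle the given files into one gzip-compressed tar archive whose name
# carries the node name and the collection time, then print the archive path.
pack_node_files() (
  out_dir=$1
  shift
  node=$(uname -n) || exit 1
  archive="$out_dir/${node}_diag_$(date +%Y%m%d%H%M%S).tar.gz"
  tar -czf "$archive" "$@" || exit 1
  printf '%s\n' "$archive"
)
```

Run it once on each node, for example `pack_node_files /tmp alert_*.log *.trc`, so that every upload can be traced back to the node and the time it was collected.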
Troubleshooting Steps
1. Data Gathering for All Oracle Clusterware Issues
Provide current diagcollection output from all nodes in the cluster.
Note 330358.1 – CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide
Note 272332.1 – CRS 10gR1 Diagnostic Collection Guide
2. Data Gathering for Node Reboot/Eviction
Provide files in Section "Data Gathering for All Oracle Clusterware Issues" and the following:
- Approximate date and time of the reboot, and the hostname of the rebooted node
- OSWatcher archives which cover the reboot time at an interval of 20 seconds with private network monitoring configured.
Note 301137.1 – OS Watcher User Guide
Note 433472.1 – OS Watcher For Windows (OSWFW) User Guide
- For pre-11.2, zip of /var/opt/oracle/oprocd/* or /etc/oracle/oprocd/*
- For pre-11.2, OS logs – refer to Section Appendix B
- For 11gR2+, zip of /etc/oracle/lastgasp/* or /var/opt/oracle/lastgasp/*
- CHM/OS data that covers the reboot time, for platforms where it is available; refer to the section "How do I collect the Cluster Health Monitor data" in Note 1328466.1
- If vendor clusterware is being used, upload the vendor clusterware logs
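Because the oprocd and lastgasp files can live in either of two locations, it can help to check which ones actually exist before zipping. The sketch below uses the directory names from the list above; `existing_dirs` and the two wrapper functions are hypothetical names.

```shell
# Sketch only: function names are hypothetical.
# Print each candidate directory that exists and contains at least one entry.
existing_dirs() {
  for d in "$@"; do
    if [ -d "$d" ] && [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
      printf '%s\n' "$d"
    fi
  done
}

# Pre-11.2: oprocd logs. 11gR2+: lastgasp files.
reboot_dirs_pre112() { existing_dirs /var/opt/oracle/oprocd /etc/oracle/oprocd; }
reboot_dirs_112()    { existing_dirs /etc/oracle/lastgasp /var/opt/oracle/lastgasp; }
```

The printed directories can then be passed to zip or tar on the rebooted node. Reading these directories may require root privileges.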
3. Data Gathering for All Real Application Cluster Issues
From all nodes:
- Provide the instance alert_${ORACLE_SID}.log and the lmon, lmd*, lms*, ckpt, lgwr, lck*, dia* and lmhb (11g only) traces, plus all other traces modified around the incident time. A quick way to identify these traces and tar them up is to search for the incident time, as in the following example:
$ grep "2010-09-02 03" *.trc | awk -F: '{print $1}' | sort -u |xargs tar cvf trace.`hostname`.`date +%Y%m%d%H%M%S`.tar
$ gzip trace*.tar
For pre-11g, execute the command in the bdump and udump directories to identify the list of files.
For 11g+, execute the command in ${ORACLE_BASE}/diag/rdbms/$DBNAME/${ORACLE_SID}/trace to identify the list of files.
- Incident files/packages referenced in the alert log at the time of the incident
- If ASM is involved, provide same set of files for ASM
- OS logs – refer to Appendix B
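The grep example above selects traces by the timestamps written inside them. As a complement, the following sketch selects trace files by modification time instead. It assumes GNU date and GNU find (for `-newermt`), and `traces_near` is a hypothetical helper name.

```shell
# Sketch only: traces_near is a hypothetical helper name.
# Assumes GNU date and GNU find (-maxdepth, -newermt).
# List *.trc files and alert logs in a directory that were last modified
# within N minutes (default 60) either side of the incident time.
traces_near() (
  dir=$1
  incident=$2
  minutes=${3:-60}
  epoch=$(date -d "$incident" +%s) || exit 1
  start=$(date -d "@$(($epoch - $minutes * 60))" +'%Y-%m-%d %H:%M:%S')
  end=$(date -d "@$(($epoch + $minutes * 60))" +'%Y-%m-%d %H:%M:%S')
  find "$dir" -maxdepth 1 -type f \
    \( -name '*.trc' -o -name 'alert_*.log' \) \
    -newermt "$start" ! -newermt "$end" | sort
)
```

For example, `traces_near . "2010-09-02 03:00:00" 30` lists the traces in the current directory modified between 02:30 and 03:30. A long-running trace that was written during the incident but modified again later falls outside the window, so this does not replace the grep approach.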
4. Data Gathering for Real Application Cluster Performance/Hang Issues
Provide files in Section "Data Gathering for All Real Application Cluster Issues" and the following:
- systemstate and hanganalyze – refer to Appendix C
- AWR, ADDM and ASH reports, each covering a period of no more than 60 minutes
- OSWatcher archives which cover the hang time
Note 301137.1 – OS Watcher User Guide
Note 433472.1 – OS Watcher For Windows (OSWFW) User Guide
- CHM/OS data that covers the hang time, for platforms where it is available; refer to the section "How do I collect the Cluster Health Monitor data" in Note 1328466.1
5. Data Gathering for Oracle Clusterware Installation Issues
5.1. Failure before executing root script:
For 11gR2: note 1056322.1 – Troubleshoot 11gR2 Grid Infrastructure/RAC Database runInstaller Issues
For pre-11.2: note 406231.1 – Diagnosing RAC/RDBMS Installation Problems
5.2. Failure while or after executing root script
Provide files in Section "Data Gathering for All Oracle Clusterware Issues" and the following:
- root script (root.sh or rootupgrade.sh) screen output
- For 11gR2: provide a zip of <$ORACLE_BASE>/cfgtoollogs and <$ORACLE_BASE>/diag for the grid user.
- For pre-11.2: Note 240001.1 – Troubleshooting 10g or 11.1 Oracle Clusterware Root.sh Problems
Appendix A. RDA
It is recommended to provide the latest RDA output from all nodes in the cluster for all issues.
Note 314422.1 – Remote Diagnostics Agent (RDA)
Appendix B. OS logs
OS logs are found in the following locations, depending on platform:
Linux: /var/log/messages
AIX: /bin/errpt -a (redirect this to a file called messages.out)
Solaris: /var/adm/messages
HP-UX: /var/adm/syslog/syslog.log
Tru64: /var/adm/messages
Windows: save Application Log and System Log as .TXT files using Event Viewer
Note: From 11gR2, OS logs are part of diagcollection on Linux, Solaris, HP-UX.
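The platform list above can be expressed as a small lookup for use in a collection script. This is a sketch: `os_log_source` is a hypothetical name, and the mapping from `uname -s` output to platform (SunOS for Solaris, OSF1 for Tru64) is an assumption about how each system identifies itself.

```shell
# Sketch only: os_log_source is a hypothetical helper name.
# Map a platform name, as printed by `uname -s`, to the OS log source
# listed in this appendix. With no argument, the local platform is used.
os_log_source() {
  case "${1:-$(uname -s)}" in
    Linux)  echo "/var/log/messages" ;;
    AIX)    echo "/bin/errpt -a > messages.out" ;;
    SunOS)  echo "/var/adm/messages" ;;
    HP-UX)  echo "/var/adm/syslog/syslog.log" ;;
    OSF1)   echo "/var/adm/messages" ;;
    *)      echo "unknown platform: ${1:-$(uname -s)}" >&2; return 1 ;;
  esac
}
```

On AIX the result is a command to run rather than a file to copy. Windows is not covered because its logs are exported through Event Viewer.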
Appendix C. systemstate and hanganalyze in RAC
To collect hanganalyze and systemstate in RAC, execute the following on one instance to generate cluster wide dumps:
a – Connect to sqlplus as sysdba: "sqlplus / as sysdba"; if this does not work, use "sqlplus -prelim / as sysdba"
b – Execute the following commands:
- For 11g+
SQL> oradebug setospid <ospid of diag process>
SQL> oradebug unlimit
SQL> oradebug -g all hanganalyze 3
..Wait about 2 minutes
SQL> oradebug -g all hanganalyze 3
SQL> oradebug -g all dump systemstate 258
If possible, take another dump at level 266 instead of 258.
If the SGA is large, or the fix for bug 11800959 (fixed in 11.2.0.2 DB PSU5, 11.2.0.3 and above) is not applied, level 266 can take a very long time, generate a huge trace file, and may not finish for hours.
- For 10g
SQL> oradebug setospid <ospid of diag process>
SQL> oradebug unlimit
SQL> oradebug -g all dump systemstate 266
..Wait about 2 minutes
SQL> oradebug -g all dump systemstate 266
Please upload the *diag* trace from either the bdump or the trace directory.
- If the diag trace is huge or the "oradebug -g all …" command hangs, collect a systemstate dump from each instance individually at around the same time:
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug hanganalyze 3
..Wait about 2 minutes
SQL> oradebug hanganalyze 3
SQL> oradebug dump systemstate 258
SQL> oradebug tracefile_name
Please upload the trace file listed above.
- If "sqlplus -prelim / as sysdba" does not work, refer to note 121779.1
If ASM is involved, collect hanganalyze and systemstate from ASM using the instructions above.
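To avoid typing the 11g+ cluster-wide sequence by hand during a hang, the commands can be written to a script file first. The sketch below only prints the script text. `oradebug_script` is a hypothetical helper, and the use of the SQL*Plus HOST command to wait 2 minutes is an assumption that should be checked in your environment, particularly with a -prelim connection.

```shell
# Sketch only: oradebug_script is a hypothetical helper name.
# Print the 11g+ cluster-wide hanganalyze/systemstate sequence from this
# appendix as a SQL*Plus script.
#   $1 = OS pid of the diag process
#   $2 = systemstate level (258 by default; 266 if appropriate)
oradebug_script() {
  [ -n "$1" ] || { echo "usage: oradebug_script <diag ospid> [level]" >&2; return 1; }
  cat <<EOF
oradebug setospid $1
oradebug unlimit
oradebug -g all hanganalyze 3
host sleep 120
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate ${2:-258}
exit
EOF
}
```

For example, `oradebug_script 12345 > hang.sql` writes the script, which can be reviewed and then run from a session connected as sysdba with `@hang.sql`.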
References
NOTE:736752.1 – Introducing Cluster Health Monitor (IPD/OS)
NOTE:314422.1 – Remote Diagnostic Agent (RDA) – Getting Started
NOTE:330358.1 – Oracle Clusterware 10gR2/ 11gR1/ 11gR2/ 12.1.0.1 Diagnostic Collection Guide
NOTE:406231.1 – Diagnosing RAC/RDBMS Installation Problems
NOTE:272332.1 – CRS 10g Diagnostic Collection Guide
NOTE:433472.1 – OS Watcher For Windows (OSWFW) User Guide
NOTE:1328466.1 – Cluster Health Monitor (CHM) FAQ
NOTE:240001.1 – Troubleshooting 10g or 11.1 Oracle Clusterware Root.sh Problems
NOTE:942166.1 – How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation
NOTE:969254.1 – How to Proceed from Failed Upgrade to 11gR2 Grid Infrastructure on Linux/Unix
NOTE:301137.1 – OSWatcher (Includes: [Video])
NOTE:1056322.1 – Troubleshoot Grid Infrastructure/RAC Database installer/runInstaller Issues