Validate Network and Name Resolution Setup for the Clusterware and RAC

2025-10-09 | Linux/AIX / Oracle / RAC
Purpose
Scope
Details
A. Requirement
B. Example of what we expect
C. Syntax reference
D. Multicast
E. Runtime network issues
F. Symptoms of network issues
G. Basics of Subnet
References

Applies to:

Oracle Database – Enterprise Edition – Version 10.1.0.2 and later

Oracle Database – Standard Edition – Version 12.1.0.2 and later

Generic UNIX

Generic Linux

Purpose


Cluster Verification Utility (CVU, invoked as runcluvfy.sh or cluvfy) performs thorough checks of the network and name resolution setup, but it may not catch every issue. If the network and name resolution are not set up properly before installation, the installation is likely to fail; if the network or name resolution malfunctions, the clusterware and/or RAC are likely to have issues. The goal of this note is to provide a list of things to verify regarding the network and name resolution setup for Grid Infrastructure (clusterware) and RAC.

Scope

This document is intended for Oracle Clusterware/RAC Database Administrators and Oracle Support engineers.

Details

A. Requirement

o A ping with a packet size of the network adapter (NIC) MTU should work on all public and private networks, and the ping time should be small (sub-second).

o IP address 127.0.0.1 should only map to localhost and/or localhost.localdomain, not anything else.

o 127.*.*.* should not be used by any network interface.

o Public NIC name must be the same on all nodes.

o Private NIC name should be the same on all nodes in 11gR2, and must be the same on all nodes for pre-11gR2.

o Public and private networks must not be in the link-local subnet (169.254.*.*); they should be in separate, unrelated subnets.

o MTU should be the same for the corresponding network on all nodes.

o Network size (netmask) should be the same for the corresponding network on all nodes.

o As the private network needs to be directly attached, traceroute with a packet size of the NIC MTU should complete in 1 hop on all private networks, without fragmentation and without going through the routing table.

o Firewall needs to be turned off on the private network.

o For 10.1 to 11.1, name resolution should work for the public, private and virtual names.

o For 11.2 without Grid Naming Service (GNS), name resolution should work for all public, virtual, and SCAN names; if SCAN is configured in DNS, it should not be in the local hosts file.

o For 11.2.0.2 and above, multicast group 230.0.1.0 should work on the private network; with patch 9974223, both groups 230.0.1.0 and 224.0.0.251 are supported. With patch 10411721 (fixed in 11.2.0.3), broadcast is also supported. See section D, Multicast, to verify.

o For 11.2.0.1 to 11.2.0.3, the installer may report a warning if reverse lookup is not set up correctly for the public IP, node VIP, and SCAN VIP; with the fix for bug 9574976 in 11.2.0.4, the warning should no longer appear.

o OS-level bonding is recommended for the private network before 11.2.0.2. Depending on the platform, you may implement bonding, teaming, EtherChannel, IPMP, MultiPrivNIC, etc.; consult your OS vendor for details. Starting with 11.2.0.2, Redundant Interconnect and HAIP were introduced to provide native support for multiple private networks; refer to note 1210883.1 for details.

o The commands below also verify jumbo frames if they are configured. To learn more about jumbo frames, refer to note 341788.1.
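Two of the requirements above, 127.0.0.1 mapping only to localhost names and no addresses in the link-local range, are easy to check mechanically. The following is a minimal bash sketch, not an Oracle tool. The function names `check_loopback_hosts` and `is_link_local` are hypothetical. Some distributions add aliases such as localhost4 to the 127.0.0.1 line. This sketch reports those too, following the strict wording of the requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for two requirements from the list above.

# Print each name other than localhost/localhost.localdomain that the given
# hosts file (default /etc/hosts) maps to 127.0.0.1. No output means clean.
check_loopback_hosts() {
  local hosts_file=${1:-/etc/hosts}
  # Drop comments, then inspect only lines whose address is exactly 127.0.0.1.
  sed 's/#.*//' "$hosts_file" | awk '
    $1 == "127.0.0.1" {
      for (i = 2; i <= NF; i++)
        if ($i != "localhost" && $i != "localhost.localdomain") print $i
    }'
}

# Succeed (exit status 0) if an IPv4 address is link-local (169.254.*.*).
is_link_local() {
  case "$1" in
    169.254.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `check_loopback_hosts /etc/hosts` should print nothing on a correctly configured node, and `is_link_local 10.0.0.1 || echo "not link-local"` confirms that a private IP is outside 169.254.*.*.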

B. Example of what we expect

The example below shows what we expect to see while validating the network and name resolution setup. Because the network setup differs slightly between 11gR2 and 11gR1 or earlier, the example covers both cases. For 11gR1 and earlier, we need a public name, a VIP name, and a private hostname, and the private name is used to find the private IP for cluster communication. For 11gR2, the private name is no longer used; instead, the private network is selected based on the GPnP profile when the clusterware comes up. Assume a 3-node cluster with the following node information:

11gR1 or below cluster:

| Nodename | Public IP (NIC/MTU) | VIP name | VIP | Private name | Private IP1 (NIC/MTU) | Private IP2 |
|---|---|---|---|---|---|---|
| rac1 | 120.0.0.1 (eth0/1500) | rac1v | 120.0.0.11 | rac1p | 10.0.0.1 (eth1/1500) | |
| rac2 | 120.0.0.2 (eth0/1500) | rac2v | 120.0.0.12 | rac2p | 10.0.0.2 (eth1/1500) | |
| rac3 | 120.0.0.3 (eth0/1500) | rac3v | 120.0.0.13 | rac3p | 10.0.0.3 (eth1/1500) | |

11gR2 cluster:

| Nodename | Public IP (NIC/MTU) | VIP name | VIP | Private IP1 (NIC/MTU) |
|---|---|---|---|---|
| rac1 | 120.0.0.1 (eth0/1500) | rac1v | 120.0.0.11 | 10.0.0.1 (eth1/1500) |
| rac2 | 120.0.0.2 (eth0/1500) | rac2v | 120.0.0.12 | 10.0.0.2 (eth1/1500) |
| rac3 | 120.0.0.3 (eth0/1500) | rac3v | 120.0.0.13 | 10.0.0.3 (eth1/1500) |

| SCAN name | SCAN IP1 | SCAN IP2 | SCAN IP3 |
|---|---|---|---|
| scancl1 | 120.0.0.21 | 120.0.0.22 | 120.0.0.23 |

The following needs to be verified on each node. Note that the examples are from a Linux platform:

  1. To find out the MTU

/bin/netstat -in

Iface MTU

eth0 1500

In the above example, the MTU is set to 1500 for eth0.
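To compare MTUs across nodes, the MTU column can be pulled out of the `netstat -in` output on each node. Here is a minimal bash sketch. It assumes the MTU is the second column, as in the Linux output above. `mtu_of` and `all_equal` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Print the MTU of one interface from `netstat -in` output read on stdin.
# Assumes column 1 is the interface name and column 2 is the MTU.
mtu_of() {
  awk -v ifc="$1" '$1 == ifc { print $2; exit }'
}

# Succeed only if every line on stdin holds the same value, e.g. the MTU
# collected from each node for the same network.
all_equal() {
  [ "$(sort -u | wc -l)" -le 1 ]
}
```

On each node, `/bin/netstat -in | mtu_of eth1` prints the private NIC MTU. Feeding the collected values to `all_equal` (for example `printf '%s\n' 1500 1500 1500 | all_equal && echo "MTU consistent"`) checks the "same MTU on all nodes" requirement.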

  2. To find out the IP address and subnet, compare the broadcast address and netmask on all nodes:

/sbin/ifconfig

eth0

inet addr:120.0.0.1 Bcast:120.0.0.127 Mask:255.255.255.128

UP RUNNING MTU:1500

In the above example, the IP address for eth0 is 120.0.0.1, the broadcast address is 120.0.0.127, and the netmask is 255.255.255.128, which places eth0 in subnet 120.0.0.0 with a maximum of 126 usable host addresses. Refer to section G, "Basics of Subnet", for more details.

Note: An active NIC must have both the "UP" and "RUNNING" flags; on Solaris, the "PHYSRUNNING" flag indicates whether the physical interface is running.
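The subnet arithmetic in step 2 can be reproduced by ANDing the address with the netmask. The following bash sketch prints the network address, the broadcast address and the number of usable host addresses. The function names are hypothetical. It handles IPv4 only and is not meant for /31 or /32 masks.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo "$(( (a << 24) | (b << 16) | (c << 8) | d ))"
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print "network broadcast usable_hosts" for an IP address and netmask.
subnet_info() {
  local ip mask net bcast
  ip=$(ip_to_int "$1")
  mask=$(ip_to_int "$2")
  net=$(( ip & mask ))                       # host bits cleared
  bcast=$(( net | (~mask & 0xFFFFFFFF) ))    # host bits all set
  echo "$(int_to_ip "$net") $(int_to_ip "$bcast") $(( bcast - net - 1 ))"
}
```

With the example values, `subnet_info 120.0.0.1 255.255.255.128` prints `120.0.0.0 120.0.0.127 126`, matching the broadcast address reported by ifconfig above. Running it on every node for the same network should print the same network and broadcast addresses.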

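  3. Run all ping commands twice to make sure the results are consistent.

Below is an example ping output summary from the node1 public IP to the node2 public hostname:

0% packet loss, time 1000ms

Pay attention to the packet loss and the time. Any packet loss other than 0%, or round-trip times that are not sub-second, indicates a problem in the network; engage the network administrator to investigate further. On Linux, the time on the summary line is the total elapsed time of the run; the per-packet round-trip time is shown as time=... on each reply line, and that is the value that should be sub-second.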
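Checking the packet loss and the reply times by eye is error-prone when there are many ping runs. Here is a minimal bash sketch. It assumes Linux iputils ping output, which has a "time=... ms" field on each reply and a "... packet loss" summary line. `ping_ok` is a hypothetical helper name. Its 1000 ms default simply encodes the sub-second rule.

```bash
#!/usr/bin/env bash
# Read Linux ping output on stdin. Succeed only if the summary reports
# 0% packet loss and every reply's round-trip time is below the limit
# given in milliseconds (default 1000, i.e. sub-second).
ping_ok() {
  local limit_ms=${1:-1000}
  awk -v limit="$limit_ms" '
    # Summary line: anything other than exactly "0%" loss is a failure.
    /packet loss/ { seen = 1; if ($0 !~ /[^0-9.]0% packet loss/) bad = 1 }
    # Reply lines: the number right after "time=" is the round-trip time.
    /time=/ { split($0, a, "time="); if (a[2] + 0 >= limit) bad = 1 }
    END { exit !(seen && !bad) }'
}
```

For example, `/bin/ping -s 1500 -c 2 -I 120.0.0.1 rac2 | ping_ok && echo OK` runs the check from node1's public IP to node2.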

3.1 Ping all public nodenames from the local public IP with a packet size of MTU

3.2.1 Ping all private IP(s) from all local private IP(s) with a packet size of MTU

      applies to the 11gR2 example; the private name is optional

3.2.2 Ping all private nodenames from the local private IP with a packet size of MTU

      applies to the 11gR1 and earlier example
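Steps 3.1 and 3.2 ask for every combination of local source IP and remote target. The bash sketch below only prints the Linux commands from section C for review, as a dry run. The variable and function names are hypothetical. The values are taken from the example cluster in section B.

```bash
#!/usr/bin/env bash
# Example values from section B; replace them with your own cluster's.
MTU=1500
PUBLIC_NAMES="rac1 rac2 rac3"
PRIVATE_IPS="10.0.0.1 10.0.0.2 10.0.0.3"

# Print one ping command (section C Linux syntax) per target for a source IP.
print_ping_commands() {
  local src=$1 targets=$2 t
  for t in $targets; do          # unquoted on purpose: split the target list
    echo "/bin/ping -s $MTU -c 2 -I $src $t"
  done
}
```

On rac1, `print_ping_commands 120.0.0.1 "$PUBLIC_NAMES"` covers step 3.1. `print_ping_commands 10.0.0.1 "$PRIVATE_IPS"` covers step 3.2.1. The output can be reviewed first and then piped into `sh` to run the pings.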
  4. Traceroute the private network

The example below shows a traceroute from the node1 private IP to the node2 private hostname.

Use a packet size of MTU. On Linux, the packet length needs to be MTU - 28 bytes, otherwise the error "send: Message too long" is reported. For example, with an MTU of 1500 we would use 1472:

/bin/traceroute -s 10.0.0.1 -r -F rac2p 1472

1 rac2p (10.0.0.2) 0.626 ms 0.567 ms 0.529 ms

The MTU-size packet traceroute completes in 1 hop without going through the routing table. Any other output indicates an issue, for example when "*" or "!H" is present.
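The one-hop rule can be checked on the captured output. Here is a minimal bash sketch. It assumes Linux traceroute output in which each hop line starts with its hop number. `traceroute_one_hop` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Read traceroute output on stdin. Succeed only if exactly one hop line is
# present and no hop line contains "*" (no reply) or "!H" (host unreachable).
traceroute_one_hop() {
  awk '
    /^ *[0-9]+ / { hops++; if ($0 ~ /\*|!H/) bad = 1 }
    END { exit !(hops == 1 && !bad) }'
}
```

For example, `/bin/traceroute -s 10.0.0.1 -r -F rac2p 1472 | traceroute_one_hop && echo "1 hop, OK"` runs the check. The "traceroute to ..." header line is ignored because it does not start with a hop number.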

Note: the traceroute option "-F" may not work on RHEL3, RHEL4 and OEL4 due to an OS bug; refer to note 752844.1 for details.

4.1 Traceroute all private IP(s) from all local private IP(s) with a packet size of MTU

   applies to 11gR2 onwards

If the "-F" option does not work, run traceroute without the "-F" parameter but with a packet that is triple the MTU size (4500 bytes for an MTU of 1500).

4.2 Traceroute all private nodenames from the local private IP with a packet size of MTU

   applies to the 11gR1 and earlier example

If the "-F" option does not work, run traceroute without the "-F" parameter but with a packet that is triple the MTU size (4500 bytes for an MTU of 1500).
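The packet sizes used in step 4 follow two rules from this note. With "-F" on Linux, the packet length is MTU - 28. When "-F" cannot be used, the packet is triple the MTU. Below is a bash sketch that only prints the command. `traceroute_cmd` and its `nodf` mode argument are hypothetical.

```bash
#!/usr/bin/env bash
# Print the Linux traceroute command for one private-network target.
#   mode "df" (default): use -F and a packet length of MTU - 28
#   mode "nodf" (hypothetical name): omit -F and use three times the MTU
traceroute_cmd() {
  local src=$1 target=$2 mtu=$3 mode=${4:-df}
  if [ "$mode" = df ]; then
    echo "/bin/traceroute -s $src -r -F $target $(( mtu - 28 ))"
  else
    echo "/bin/traceroute -s $src -r $target $(( mtu * 3 ))"
  fi
}
```

For example, `traceroute_cmd 10.0.0.1 rac2p 1500` prints the 1472-byte command shown earlier in this step. `traceroute_cmd 10.0.0.1 10.0.0.2 1500 nodf` prints the 4500-byte fallback for systems affected by note 752844.1.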

  5. Ping VIP hostnames

Pinging each VIP nodename should resolve to the correct IP.

Before the clusterware is installed, ping should be able to resolve the VIP nodename but should fail to get a reply, because the VIP is managed by the clusterware.

After the clusterware is up and running, ping should succeed.

  6. Ping the SCAN name

applies to 11gR2

Pinging the SCAN name should resolve to the correct IP.

Before the clusterware is installed, ping should be able to resolve the SCAN name but should fail to get a reply, because the SCAN VIP is managed by the clusterware.

After the clusterware is up and running, ping should succeed.

  7. Nslookup VIP hostnames and the SCAN name

applies to 11gR2

This checks whether the VIP nodenames and the SCAN name are set up properly in DNS.
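To check that the SCAN name returns all three SCAN IPs and each VIP name returns its VIP, the answer addresses can be extracted from the nslookup output. Here is a minimal bash sketch. It assumes the BIND-style nslookup layout, in which the DNS server's own "Address:" line comes before the first "Name:" line. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Print the answer addresses from nslookup output read on stdin, one per line.
# Lines before the first "Name:" line (the DNS server header) are skipped.
nslookup_addresses() {
  awk '/^Name:/ { answer = 1; next } answer && /^Address:/ { print $2 }'
}

# Succeed if stdin holds exactly the expected number of distinct lines.
expect_address_count() {
  [ "$(sort -u | wc -l)" -eq "$1" ]
}
```

For the example cluster, `/usr/bin/nslookup scancl1 | nslookup_addresses | expect_address_count 3 && echo "SCAN OK"` verifies the three SCAN IPs. `/usr/bin/nslookup rac1v | nslookup_addresses` should print 120.0.0.11.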

  8. To check the name resolution order
/etc/nsswitch.conf on Linux, Solaris and HP-UX; /etc/netsvc.conf on AIX
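A quick way to see the configured order is to print the hosts entry. The bash sketch below uses a hypothetical helper name. It accepts both the nsswitch.conf "hosts:" form and an AIX netsvc.conf form such as "hosts = local, bind". It assumes the entry sits on a single line.

```bash
#!/usr/bin/env bash
# Print the host-name lookup sources from the given file, in order.
# Accepts "hosts: files dns" (nsswitch.conf) and "hosts = local, bind" (netsvc.conf).
hosts_lookup_order() {
  local conf=${1:-/etc/nsswitch.conf}
  sed 's/#.*//' "$conf" | awk '
    $1 ~ /^hosts/ {
      sub(/^[ \t]*hosts[ \t]*[:=][ \t]*/, "")   # drop the "hosts:" / "hosts =" key
      gsub(/,/, " ")                           # netsvc.conf separates with commas
      $1 = $1                                  # squeeze runs of blanks
      print
      exit
    }'
}
```

On a Linux node, `hosts_lookup_order` might print `files dns`, meaning /etc/hosts is consulted before DNS. On AIX, run `hosts_lookup_order /etc/netsvc.conf`.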
  9. To check the local hosts file

If local files are listed in the name service switch setting (nsswitch.conf), grep for all nodenames and IPs to make sure the hosts file has no typos or misconfiguration.

127.0.0.1 should not map to the SCAN name or to any public, private, or VIP hostname.

Public and node VIP:

/bin/grep rac1 /etc/hosts

/bin/grep rac2 /etc/hosts

/bin/grep rac3 /etc/hosts

/bin/grep rac1v /etc/hosts

/bin/grep rac2v /etc/hosts

/bin/grep rac3v /etc/hosts

/bin/grep 120.0.0.1 /etc/hosts

/bin/grep 120.0.0.2 /etc/hosts

/bin/grep 120.0.0.3 /etc/hosts

/bin/grep 120.0.0.11 /etc/hosts

/bin/grep 120.0.0.12 /etc/hosts

/bin/grep 120.0.0.13 /etc/hosts

pre-11gR2 private example

/bin/grep rac1p /etc/hosts

/bin/grep rac2p /etc/hosts

/bin/grep rac3p /etc/hosts

/bin/grep 10.0.0.1 /etc/hosts

/bin/grep 10.0.0.2 /etc/hosts

/bin/grep 10.0.0.3 /etc/hosts

11gR2 private example

/bin/grep 10.0.0.1 /etc/hosts

/bin/grep 10.0.0.2 /etc/hosts

/bin/grep 10.0.0.3 /etc/hosts

SCAN example

If the SCAN name is set up in DNS, it should not be in the local hosts file.

/bin/grep scancl1 /etc/hosts

/bin/grep 120.0.0.21 /etc/hosts

/bin/grep 120.0.0.22 /etc/hosts

/bin/grep 120.0.0.23 /etc/hosts
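The grep commands in step 9 can be run in one loop. The bash sketch below uses a hypothetical helper name and GNU grep. Unlike the plain grep commands above, it uses `grep -w -F`, so 120.0.0.1 does not also match 120.0.0.11 and "rac1" does not match "rac1v".

```bash
#!/usr/bin/env bash
# For each name or IP given after the hosts file path, print a header line
# and every matching line (whole-word, fixed-string match), or "(no entry)".
grep_hosts() {
  local hosts_file=$1 item
  shift
  for item in "$@"; do
    echo "== $item"
    grep -w -F -- "$item" "$hosts_file" || echo "   (no entry)"
  done
}
```

For the example cluster, run `grep_hosts /etc/hosts rac1 rac2 rac3 rac1v rac2v rac3v 120.0.0.1 120.0.0.2 120.0.0.3 120.0.0.11 120.0.0.12 120.0.0.13`. When SCAN is configured in DNS, `grep_hosts /etc/hosts scancl1 120.0.0.21 120.0.0.22 120.0.0.23` should report "(no entry)" for every item.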

C. Syntax reference

Refer to the following for the command syntax on each platform:

Linux:

/bin/netstat -in
/sbin/ifconfig
/bin/ping -s <MTU> -c 2 -I source_IP nodename
/bin/traceroute -s source_IP -r -F  nodename-priv <MTU-28>
/usr/bin/nslookup

Solaris:

/bin/netstat -in
/usr/sbin/ifconfig -a
/usr/sbin/ping -i source_IP -s nodename <MTU> 2 
/usr/sbin/traceroute -s source_IP -r -F nodename-priv <MTU>
/usr/sbin/nslookup

HP-UX:

/usr/bin/netstat -in
/usr/sbin/ifconfig NIC
/usr/sbin/ping -i source_IP nodename <MTU> -n 2
/usr/contrib/bin/traceroute -s source_IP -r -F nodename-priv <MTU>
/bin/nslookup

AIX:

/bin/netstat -in 
/usr/sbin/ifconfig -a
/usr/sbin/ping -S source_IP -s <MTU> -c 2 nodename
/bin/traceroute -s source_IP -r nodename-priv <MTU>
/bin/nslookup 

Windows:

MTU:  
  Windows XP:          netsh interface ip show interface
  Windows Vista/7:    netsh interface ipv4 show subinterfaces
ipconfig /all 
ping -n 2 -l <MTU-28> -f nodename
tracert
nslookup

D. Multicast

Starting with 11.2.0.2, multicast group 230.0.1.0 should work on the private network for bootstrapping. Patch 9974223 introduces support for another group, 224.0.0.251.

Please refer to note 1212703.1 to verify whether multicast is working properly.

Because the fix for bug 10411721 is included in 11.2.0.3, broadcast is supported for bootstrapping in addition to multicast. When 11.2.0.3 Grid Infrastructure starts up, it tries broadcast and multicast groups 230.0.1.0 and 224.0.0.251 simultaneously; if any of them succeeds, it is able to start.

On HP-UX, if 10 Gigabit Ethernet is used as the private network adapter, multicast may not work without driver revision B.11.31.1011 or later of the 10GigEthr-02 software bundle. Run the "swlist 10GigEthr-02" command to identify the current version on your HP server.

E. Runtime network issues

OSWatcher or Cluster Health Monitor (IPD/OS) can be deployed to capture runtime network issues.

F. Symptoms of network issues

o ping doesn't work, ping shows packet loss, or ping time is too long (not sub-second)

o traceroute doesn't work

o name resolution doesn't work

o traceroute output shows "*" or "!H"

o gipcd.log shows network-related errors

o ocssd.log shows network-related errors

o crsd.log shows network-related errors

o octssd.log shows network-related errors

G. Basics of Subnet

Refer to note 1386709.1 for details

References

NOTE:1212703.1 – Grid Infrastructure Startup During Patching, Install or Upgrade May Fail Due to Multicasting Requirement

NOTE:301137.1 – OSWatcher (Includes: [Video])

NOTE:1210883.1 – Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip

NOTE:341788.1 – Recommendation for the Real Application Cluster Interconnect and Jumbo Frames

NOTE:1386709.1 – The Basics of IPv4 Subnet and Oracle Clusterware

NOTE:1507482.1 – Oracle Clusterware Cannot Start on all Nodes: Network communication with node missing for 90% of timeout interval

NOTE:752844.1 – RHEL3, RHEL4, OEL4: traceroute Fails with -F (do not fragment bit) Argument

NOTE:1056322.1 – Troubleshoot Grid Infrastructure/RAC Database installer/runInstaller Issues