# What happens when the temporary files under /var/tmp/.oracle/ are deleted in a RAC environment, and how to recover

Posted on January 25, 2016 by Lunar (contact: QQ 5163721) · Lunar的oracle实验室

2025-07-03 · ASM / Linux/AIX / Oracle / RAC

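Author: Lunar © All rights reserved. [This article may be reposted, but only with a link back to the source address; otherwise legal action will be pursued.]

**Purpose of the test:** simulate someone in a RAC environment mistakenly deleting the Oracle temporary files (the Network Socket Files) under /var/tmp/.oracle/*.

**Test procedure:** observe what the consequences are and how to recover.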

**Test environment:** OEL 6.6, Oracle 11.2.0.4 standalone (a single instance using ASM).

For RAC the conclusions should be broadly the same, since the mechanism is similar.

    [root@lunarlib rootwork]# cat /etc/oracle-release
    Oracle Linux Server release 6.6
    [root@lunarlib rootwork]#
    [root@lunarlib rootwork]# uname -a
    Linux lunarlib 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
    [root@lunarlib rootwork]#

On Linux, RAC or HAS (a single instance using ASM, i.e. standalone, also known as Oracle Restart) keeps its Network Socket Files under /var/tmp/.oracle/.

(On other platforms such as AIX or HP-UX, the Network Socket Files may instead be under /tmp/.oracle or /usr/tmp/.oracle; check with, for example, `ls -lrt /tmp/.oracle/*`.)

    [root@lunarlib etc]# ls -lrt /var/tmp/.oracle/*
    prw-r--r-- 1 grid oinstall 0 Oct 11 01:30 /var/tmp/.oracle/npohasd
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sprocr_local_conn_0_PROL
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/slunarlibDBG_OHASD
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sOHASD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sCRSD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/slunarlibDBG_EVMD
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.2
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.1
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sAevm
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sCevm
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/slunarlibDBG_CSSD
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
    [root@lunarlib etc]#
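Because the directory differs by platform, a small helper can probe the candidate locations in order. The following is a minimal sketch; the function name `find_oracle_socket_dir` is hypothetical, and the default candidate list is simply the three paths named in this article.

```shell
# Print the first existing Oracle socket directory.
# Candidates default to the locations named in this article; pass your own
# list as arguments to override them.
find_oracle_socket_dir() {
    if [ "$#" -eq 0 ]; then
        set -- /var/tmp/.oracle /tmp/.oracle /usr/tmp/.oracle
    fi
    for candidate in "$@"; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: list the socket files if a directory was found.
if sock_dir=$(find_oracle_socket_dir); then
    ls -lrt "$sock_dir"
else
    echo "no Oracle socket directory found" >&2
fi
```

On a host without Grid Infrastructure the example simply reports that nothing was found.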

After stopping HAS with `crsctl stop has -f`, the Network Socket Files under /var/tmp/.oracle/* can be deleted directly:

    [root@lunarlib rootwork]# crsctl stop has -f
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'lunarlib'
    CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'lunarlib'
    CRS-2673: Attempting to stop 'ora.CRSDG.dg' on 'lunarlib'
    CRS-2673: Attempting to stop 'ora.lunardb.db' on 'lunarlib'
    CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'lunarlib' succeeded
    CRS-2677: Stop of 'ora.lunardb.db' on 'lunarlib' succeeded
    CRS-2673: Attempting to stop 'ora.DATADG1.dg' on 'lunarlib'
    CRS-2673: Attempting to stop 'ora.DATADG2.dg' on 'lunarlib'
    CRS-2677: Stop of 'ora.DATADG1.dg' on 'lunarlib' succeeded
    CRS-2677: Stop of 'ora.DATADG2.dg' on 'lunarlib' succeeded
    CRS-2677: Stop of 'ora.CRSDG.dg' on 'lunarlib' succeeded
    CRS-2679: Attempting to clean 'ora.CRSDG.dg' on 'lunarlib'
    CRS-2681: Clean of 'ora.CRSDG.dg' on 'lunarlib' succeeded
    CRS-2673: Attempting to stop 'ora.asm' on 'lunarlib'
    CRS-2677: Stop of 'ora.asm' on 'lunarlib' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 'lunarlib'
    CRS-2677: Stop of 'ora.cssd' on 'lunarlib' succeeded
    CRS-2673: Attempting to stop 'ora.evmd' on 'lunarlib'
    CRS-2677: Stop of 'ora.evmd' on 'lunarlib' succeeded
    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'lunarlib' has completed
    CRS-4133: Oracle High Availability Services has been stopped.
    [root@lunarlib rootwork]# ls -lrt /var/tmp/.oracle/*
    prw-r--r-- 1 grid oinstall 0 Oct 11 01:30 /var/tmp/.oracle/npohasd
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.2
    srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.1
    -rw-r--r-- 1 grid oinstall 0 Jan 11 11:01 /var/tmp/.oracle/sprocr_local_conn_0_PROL_lock
    -rw-r--r-- 1 grid oinstall 0 Jan 11 11:01 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/s#5185.2
    srwxrwxrwx 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/s#5185.1
    -rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOCSSD_LL_lunarlib__lock
    -rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1_lock
    -rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 11:33 /var/tmp/.oracle/s#5516.2
    srwxrwxrwx 1 grid oinstall 0 Jan 11 11:33 /var/tmp/.oracle/s#5516.1
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/slunarlibDBG_OHASD
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sOHASD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sCRSD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/slunarlibDBG_EVMD
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/slunarlibDBG_CSSD
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sAevm
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sCevm
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
    srwxrwxrwx 1 grid oinstall 0 Jan 11 17:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
    [root@lunarlib rootwork]#
    [root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
    [root@lunarlib rootwork]# ll /var/tmp/.oracle
    total 0
    [root@lunarlib rootwork]# crsctl start has
    CRS-4123: Oracle High Availability Services has been started.
    [root@lunarlib rootwork]#
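Since the deletion is only safe while the stack is down, one way to avoid the accident this article is about is to wrap the `rm` in a guard. The sketch below is an illustration, not an Oracle-supplied tool: the function names are hypothetical, the daemon names are the ones visible in the `ps` listings of this test, and the process check reads `/proc`, so it is Linux-only.

```shell
# Return 0 if any running process has one of the given command names.
# Reads /proc/<pid>/comm directly, so this part works on Linux only.
daemon_running() {
    for comm_file in /proc/[0-9]*/comm; do
        [ -r "$comm_file" ] || continue
        # A process may exit between the glob and the read; skip it quietly.
        { read -r name < "$comm_file"; } 2>/dev/null || continue
        for wanted in "$@"; do
            if [ "$name" = "$wanted" ]; then
                return 0
            fi
        done
    done
    return 1
}

# Remove the entries of the socket directory, but only when none of the
# Clusterware daemons seen in this article is running.
clean_socket_dir() {
    dir=$1
    if daemon_running ohasd.bin ocssd.bin evmd.bin; then
        echo "Clusterware daemons are still running; stop HAS/CRS first" >&2
        return 1
    fi
    [ -d "$dir" ] || return 0
    # ${dir:?} aborts instead of expanding to "/*" if dir is empty.
    rm -rf "${dir:?}"/*
}

# Usage (as root, after "crsctl stop has -f" has completed):
#   clean_socket_dir /var/tmp/.oracle
```

The guard deliberately refuses rather than trying to stop anything itself, so the decision to shut the stack down stays with the operator.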

If the /var/tmp/.oracle directory itself no longer exists, it can be recreated by hand:

    [root@lunarlib rootwork]# mkdir /var/tmp/.oracle
    [root@lunarlib rootwork]# ll /var/tmp/.oracle
    total 0
    [root@lunarlib rootwork]# crsctl start has
    CRS-4123: Oracle High Availability Services has been started.
    [root@lunarlib rootwork]#
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    grid 5177 1 1 18:12 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    grid 5306 1 1 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
    grid 5311 1 1 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
    grid 5339 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
    grid 5341 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
    grid 5356 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
    grid 5387 5339 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
    root 5400 5264 0 18:14 pts/1 00:00:00 grep d.bin
    [root@lunarlib rootwork]# ls -lrt /var/tmp/.oracle/*
    prw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/npohasd
    -rw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/slunarlibDBG_OHASD
    -rw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sCRSD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/s#5341.2
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/s#5341.1
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/slunarlibDBG_EVMD
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/slunarlibDBG_CSSD
    -rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib__lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sAevm
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sCevm
    -rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
    -rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
    [root@lunarlib rootwork]#

If the Oracle temporary files above are deleted while HAS is running normally, the database remains usable but can no longer be shut down cleanly:

    [root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
    [root@lunarlib rootwork]# ll /var/tmp/.oracle/*
    ls: cannot access /var/tmp/.oracle/*: No such file or directory
    [root@lunarlib rootwork]# ll /var/tmp/.oracle/
    total 0
    [root@lunarlib rootwork]#
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 2877 1 0 17:12 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    grid 5177 1 0 18:12 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    root 5653 5264 0 18:21 pts/1 00:00:00 grep ohasd
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 2877 1 0 17:12 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    grid 5177 1 0 18:12 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    root 5660 5264 0 18:23 pts/1 00:00:00 grep ohasd
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    grid 5177 1 0 18:12 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    grid 5306 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
    grid 5311 1 0 18:14 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/oraagent.bin
    grid 5339 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
    grid 5341 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
    grid 5356 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
    grid 5387 5339 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
    root 5662 5264 0 18:23 pts/1 00:00:00 grep d.bin
    [root@lunarlib rootwork]# crsctl status res -t
    CRS-4639: Could not contact Oracle High Availability Services
    CRS-4000: Command Status failed, or completed with errors.
    [root@lunarlib rootwork]#

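As you can see, communication with CRS is now broken: the daemons are still running, but `crsctl` can no longer reach them.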
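The broken state is easy to miss because `ps` still looks healthy. A quick check is to count the sockets and named pipes left in the directory; a count of zero while `ohasd.bin` is still listed by `ps` points at the situation above. This is a rough sketch: `count_ipc_files` and the `SOCK_DIR` variable are hypothetical names, and the default path is the Linux location used in this test.

```shell
# Count the UNIX-domain sockets and named pipes directly under a directory.
# A missing directory is reported as 0.
count_ipc_files() {
    find "$1" -maxdepth 1 \( -type s -o -type p \) 2>/dev/null | wc -l | tr -d ' '
}

# Report whether any IPC files are present in the socket directory.
sock_dir=${SOCK_DIR:-/var/tmp/.oracle}
if [ "$(count_ipc_files "$sock_dir")" -eq 0 ]; then
    echo "no socket or pipe files under $sock_dir"
else
    echo "IPC files present under $sock_dir"
fi
```

Lock files such as `sOHASD_IPC_SOCKET_11_lock` are regular files and are intentionally not counted.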

Now let's look at the database (`ss` appears to be a local alias that starts SQL*Plus):

    [oracle@lunarlib work]$ ss
    SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:26:17 2016
    Copyright (c) 1982, 2013, Oracle. All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning and Automatic Storage Management options
    SYS@lunardb>alter system switch logfile;
    System altered.
    Elapsed: 00:00:00.14
    SYS@lunardb>alter system checkpoint;
    System altered.
    Elapsed: 00:00:00.06
    SYS@lunardb>shutdown immediate
    ORA-29701: unable to connect to Cluster Synchronization Service
    SYS@lunardb>exit
    Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning and Automatic Storage Management options
    [oracle@lunarlib work]$

The database is fully usable, but it cannot be shut down: the shutdown fails because the instance cannot communicate with the CSS process. New connections still succeed afterwards:

    [oracle@lunarlib work]$ ss
    SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:26:46 2016
    Copyright (c) 1982, 2013, Oracle. All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning and Automatic Storage Management options
    SYS@lunardb>

The database alert log shows the following, and the trace file of the session gives the underlying CSS error:

    Mon Jan 11 18:26:37 2016
    Shutting down instance (immediate)
    Stopping background process SMCO
    Shutting down instance: further logons disabled
    [oracle@lunarlib trace]$ cat lunardb_ora_22027.trc
    Trace file /u01/app/oracle/diag/rdbms/lunardb/lunardb/trace/lunardb_ora_22027.trc
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning and Automatic Storage Management options
    ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
    System name: Linux
    Node name: lunarlib
    Release: 3.8.13-44.1.1.el6uek.x86_64
    Version: #2 SMP Wed Sep 10 06:10:25 PDT 2014
    Machine: x86_64
    Instance name: lunardb
    Redo thread mounted by this instance: 1
    Oracle process number: 23
    Unix process pid: 22027, image: oracle@lunarlib (TNS V1-V3)
    *** 2016-01-11 18:26:37.174
    *** SESSION ID:(135.10871) 2016-01-11 18:26:37.174
    *** CLIENT ID:() 2016-01-11 18:26:37.174
    *** SERVICE NAME:(SYS$USERS) 2016-01-11 18:26:37.174
    *** MODULE NAME:(sqlplus@lunarlib (TNS V1-V3)) 2016-01-11 18:26:37.174
    *** ACTION NAME:() 2016-01-11 18:26:37.174
    Stopping background process SMCO
    *** 2016-01-11 18:26:38.176
    kgxgncin: CLSS init failed with status 3
    kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
    NOTE: kfmsInit: ASM failed to initialize group services
    [oracle@lunarlib trace]$

Check the Oracle processes; all background processes are still running:

    [oracle@lunarlib trace]$ ps -ef|grep ora_
    oracle 5495 1 0 18:14 ? 00:00:00 ora_pmon_lunardb
    oracle 5497 1 0 18:14 ? 00:00:00 ora_psp0_lunardb
    oracle 5504 1 4 18:14 ? 00:00:36 ora_vktm_lunardb
    oracle 5508 1 0 18:14 ? 00:00:00 ora_gen0_lunardb
    oracle 5510 1 0 18:14 ? 00:00:00 ora_diag_lunardb
    oracle 5512 1 0 18:14 ? 00:00:00 ora_dbrm_lunardb
    oracle 5514 1 0 18:14 ? 00:00:00 ora_dia0_lunardb
    oracle 5516 1 0 18:14 ? 00:00:00 ora_mman_lunardb
    oracle 5518 1 0 18:14 ? 00:00:00 ora_dbw0_lunardb
    oracle 5520 1 0 18:14 ? 00:00:00 ora_lgwr_lunardb
    oracle 5522 1 0 18:14 ? 00:00:00 ora_ckpt_lunardb
    oracle 5524 1 0 18:14 ? 00:00:00 ora_smon_lunardb
    oracle 5526 1 0 18:14 ? 00:00:00 ora_reco_lunardb
    oracle 5528 1 0 18:14 ? 00:00:00 ora_rbal_lunardb
    oracle 5530 1 0 18:14 ? 00:00:00 ora_asmb_lunardb
    oracle 5532 1 0 18:14 ? 00:00:00 ora_mmon_lunardb
    oracle 5536 1 0 18:14 ? 00:00:00 ora_mmnl_lunardb
    oracle 5540 1 0 18:14 ? 00:00:00 ora_mark_lunardb
    oracle 5568 1 0 18:14 ? 00:00:00 ora_arc0_lunardb
    oracle 5570 1 0 18:14 ? 00:00:00 ora_arc1_lunardb
    oracle 5572 1 0 18:14 ? 00:00:00 ora_arc2_lunardb
    oracle 5574 1 0 18:14 ? 00:00:00 ora_arc3_lunardb
    oracle 5583 1 0 18:14 ? 00:00:00 ora_qmnc_lunardb
    oracle 5611 1 0 18:14 ? 00:00:00 ora_q000_lunardb
    oracle 5613 1 0 18:14 ? 00:00:00 ora_q001_lunardb
    oracle 6691 6657 0 18:29 pts/4 00:00:00 grep ora_
    oracle 22988 1 0 18:26 ? 00:00:00 ora_o000_lunardb
    oracle 23012 1 0 18:26 ? 00:00:00 ora_o001_lunardb
    [oracle@lunarlib trace]$

Shut the database down with `shutdown abort`:

    SYS@lunardb>shutdown abort
    ORACLE instance shut down.
    SYS@lunardb>exit
    Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning and Automatic Storage Management options
    [oracle@lunarlib work]$
    [oracle@lunarlib trace]$ ps -ef|grep ora_
    oracle 6709 6657 0 18:31 pts/4 00:00:00 grep ora_
    [oracle@lunarlib trace]$

The alert log shows:

    Mon Jan 11 18:30:38 2016
    Shutting down instance (abort)
    License high water mark = 5
    USER (ospid: 26332): terminating the instance
    Instance terminated by USER, pid = 26332
    Mon Jan 11 18:30:38 2016
    Instance shutdown complete

At this point, starting the database again fails, because the spfile lives in ASM and ASM cannot be reached without CSS:

    [oracle@lunarlib work]$ ss
    SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:31:50 2016
    Copyright (c) 1982, 2013, Oracle. All rights reserved.
    Connected to an idle instance.
    SYS@lunardb>startup
    ORA-01078: failure in processing system parameters
    ORA-01565: error in identifying file '+DATADG1/lunardb/spfilelunardb.ora'
    ORA-17503: ksfdopn:2 Failed to open file +DATADG1/lunardb/spfilelunardb.ora
    ORA-29701: unable to connect to Cluster Synchronization Service
    SYS@lunardb>
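Both the failed `shutdown immediate` and the failed `startup` end in ORA-29701, so that code is a useful marker when triaging captured SQL*Plus output. A small sketch, assuming the output was saved to a file; the function names and the example path are hypothetical.

```shell
# Print each distinct ORA- error code found in a captured output file.
ora_codes() {
    grep -oE 'ORA-[0-9]{5}' "$1" | sort -u
}

# Succeed when the file contains the CSS connection failure seen above.
css_unreachable() {
    ora_codes "$1" | grep -qx 'ORA-29701'
}

# Example (the path is only an illustration):
#   css_unreachable /tmp/startup.out && echo "check HAS/CSS before retrying startup"
```

When the check succeeds, the next thing to look at is the Clusterware stack rather than the database itself.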

Meanwhile the other HAS processes still exist; only the network socket files under /var/tmp/.oracle/* are gone. Note that oraagent.bin (PID 5311) is no longer in the process list:

    [root@lunarlib rootwork]# ll /var/tmp/.oracle/*
    ls: cannot access /var/tmp/.oracle/*: No such file or directory
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 2877 1 0 17:12 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    grid 5177 1 0 18:12 ? 00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    root 6723 4677 0 18:33 pts/0 00:00:00 grep ohasd
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    grid 5177 1 0 18:12 ? 00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    grid 5306 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
    grid 5339 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
    grid 5341 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
    grid 5356 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
    grid 5387 5339 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
    root 6725 4677 0 18:33 pts/0 00:00:00 grep d.bin
    [root@lunarlib rootwork]#

The ohasd.log under /u01/app/11.2.0.4/grid/log/lunarlib/ohasd explains why: the agent disconnected, and every attempt by ohasd to restart it fails.

    2016-01-11 18:28:09.091: [ CRSCOMM][406906624] IpcL: connection to member 9 has been removed
    2016-01-11 18:28:09.091: [CLSFRAME][406906624] Removing IPC Member:{Relative|Node:0|Process:9|Type:3}
    2016-01-11 18:28:09.091: [CLSFRAME][406906624] Disconnected from AGENT process: {Relative|Node:0|Process:9|Type:3}
    2016-01-11 18:28:09.092: [ AGFW][333440768]{0:0:132} Agfw Proxy Server received process disconnected notification, count=1
    2016-01-11 18:28:09.092: [ AGFW][333440768]{0:0:132} /u01/app/11.2.0.4/grid/bin/oraagent_grid disconnected.
    2016-01-11 18:28:09.092: [ AGFW][333440768]{0:0:132} Agent /u01/app/11.2.0.4/grid/bin/oraagent_grid[5311] stopped!
    2016-01-11 18:28:09.092: [ CRSCOMM][333440768]{0:0:132} IpcL: removeConnection: Member 9 does not exist in pending connections.
    2016-01-11 18:28:09.093: [ AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
    2016-01-11 18:28:09.093: [ AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:3
    2016-01-11 18:28:09.095: [ CRSPE][322934528]{0:0:133} Disconnected from server:
    2016-01-11 18:28:09.098: [ AGFW][333440768]{0:0:132} Starting the HB [Interval = 30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
    2016-01-11 18:31:39.112: [ INIT][333440768]{0:0:132} {0:0:132} Created alert : (:CRSAGF00130:) : Failed to start the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
    2016-01-11 18:31:39.112: [ AGFW][333440768]{0:0:132} Can not stop the agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid because pid is not initialized
    2016-01-11 18:31:39.112: [ AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
    2016-01-11 18:31:39.112: [ AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:5
    2016-01-11 18:31:39.119: [ AGFW][333440768]{0:0:132} Starting the HB [Interval = 30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
    2016-01-11 18:35:09.131: [ INIT][333440768]{0:0:132} {0:0:132} Created alert : (:CRSAGF00130:) : Failed to start the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
    2016-01-11 18:35:09.131: [ AGFW][333440768]{0:0:132} Can not stop the agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid because pid is not initialized
    2016-01-11 18:35:09.131: [ AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
    2016-01-11 18:35:09.131: [ AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:7
    2016-01-11 18:35:09.137: [ AGFW][333440768]{0:0:132} Starting the HB [Interval = 30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
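The repeating `CRSAGF00130` alert is the clearest sign in ohasd.log that the agent restart loop is stuck. As a rough sketch, the timestamps of those failures can be pulled out with `awk`; the function name is hypothetical, and the pattern relies only on the log lines quoted above.

```shell
# Print the timestamp of every failed agent start (alert CRSAGF00130)
# found in the given ohasd.log.
agent_start_failures() {
    awk '/CRSAGF00130/ { sub(/:$/, "", $2); print $1, $2 }' "$1"
}

# Example, using the log location from this article's environment:
#   agent_start_failures /u01/app/11.2.0.4/grid/log/lunarlib/ohasd/ohasd.log
```

In the excerpt above the failures are 3.5 minutes apart, which gives a feel for how long ohasd waits on each attempt before giving up and retrying.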

At this point even `crsctl stop has -f` cannot stop the HAS service:

    [root@lunarlib rootwork]# crsctl stop has -f
    CRS-4544: Unable to connect to OHAS
    CRS-4000: Command Stop failed, or completed with errors.
    [root@lunarlib rootwork]#

A reboot is the better choice. But what if rebooting the host is not convenient?

If the host cannot be rebooted, the problem can be handled by hand. First, manually clear all the network socket temporary files used by the HAS processes:

    [root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
    [root@lunarlib rootwork]# ll /var/tmp/.oracle/
    total 0
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    grid 4332 1 0 18:40 ? 00:00:09 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    grid 4560 1 0 18:42 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/cssdagent
    grid 4566 1 0 18:42 ? 00:00:11 /u01/app/11.2.0.4/grid/bin/oraagent.bin
    grid 4591 1 0 18:42 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
    grid 4594 1 0 18:42 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
    grid 4603 1 0 18:42 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
    grid 4639 4591 0 18:42 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
    root 4994 4305 0 19:02 pts/1 00:00:00 grep d.bin
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 2882 1 0 18:40 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    grid 4332 1 0 18:40 ? 00:00:09 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    root 4996 4305 0 19:02 pts/1 00:00:00 grep ohasd
    [root@lunarlib rootwork]# crsctl status res -t
    CRS-4639: Could not contact Oracle High Availability Services
    CRS-4000: Command Status failed, or completed with errors.
    [root@lunarlib rootwork]#
    [root@lunarlib rootwork]# crsctl stop has -f
    CRS-4544: Unable to connect to OHAS
    CRS-4000: Command Stop failed, or completed with errors.
    [root@lunarlib rootwork]#

None of the normal commands for stopping HAS work any more, because the socket files used for inter-process communication have been deleted.

But we can kill the processes. Note in the output that the `init.ohasd run` wrapper is respawned each time it is killed, and that neither `/etc/init.d/init.ohasd stop -f` nor `/etc/init.d/init.ohasd stop` removes it:

    [root@lunarlib rootwork]# kill -9 4332 4560 4566 4591 4594 4603 4639 2882 4332
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    root 15575 4305 0 19:04 pts/1 00:00:00 grep d.bin
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 15548 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    root 15580 4305 0 19:04 pts/1 00:00:00 grep ohasd
    [root@lunarlib rootwork]# kill -9 15548
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 15581 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    root 15608 4305 0 19:04 pts/1 00:00:00 grep ohasd
    [root@lunarlib rootwork]#
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    root 15623 4305 0 19:04 pts/1 00:00:00 grep d.bin
    [root@lunarlib rootwork]#
    [root@lunarlib rootwork]# /etc/init.d/init.ohasd stop -f
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 15581 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    root 15650 4305 0 19:05 pts/1 00:00:00 grep ohasd
    [root@lunarlib rootwork]# /etc/init.d/init.ohasd stop
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 15581 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    root 15672 4305 0 19:05 pts/1 00:00:00 grep ohasd
    [root@lunarlib rootwork]#
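Typing the PID list by hand is error-prone (the command above even names 4332 twice). A sketch of collecting the PIDs from `ps` instead, filtered by the Grid home's `bin` directory; `grid_pids` is a hypothetical helper, the Grid home path is the one from this test, and the function only prints PIDs so the list can be reviewed before anything is killed. It does not match the `/bin/sh /etc/init.d/init.ohasd run` wrapper, which is respawned anyway.

```shell
# Read "PID COMMAND ARGS..." lines on stdin and print the PIDs whose command
# path starts with the bin directory of the given Grid home.
grid_pids() {
    awk -v prefix="$1/bin/" 'index($2, prefix) == 1 { print $1 }'
}

# Usage (review the output first; killing is left to the operator):
#   ps -e -o pid= -o args= | grid_pids /u01/app/11.2.0.4/grid
#   kill -9 <the PIDs printed above>
```

Keeping the selection and the `kill` as separate steps leaves room to double-check, which matters on RAC where killing ocssd.bin has heavier consequences.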

In my test under HAS, killing all the processes at once did not reboot the host (in a RAC environment, killing ocssd.bin may reboot the host). Afterwards, check for leftover IPC resources; here no shared memory segments remain:

    [root@lunarlib rootwork]# ipcs -ma
    ------ Shared Memory Segments --------
    key shmid owner perms bytes nattch status
    ------ Semaphore Arrays --------
    key semid owner perms nsems
    0x00000000 0 root 600 1
    0x00000000 65537 root 600 1
    ------ Message Queues --------
    key msqid owner perms used-bytes messages
    [root@lunarlib rootwork]#
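If segments owned by the grid or oracle user had been left behind, they would need to be removed before the restart. The following sketch lists the candidate segment IDs from Linux `ipcs -m` output; the function name is hypothetical, and it assumes the column layout shown above (key, shmid, owner, ...), which can differ on other platforms.

```shell
# Read Linux `ipcs -m` output on stdin and print the shmid of every segment
# owned by the given user. Data rows are recognised by their hex key column.
shm_ids_for_owner() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Review only; nothing is removed here:
#   ipcs -m | shm_ids_for_owner grid
# A leftover segment could then be removed with: ipcrm -m <shmid>
```

Feed it `ipcs -m` rather than `ipcs -ma`, since semaphore rows also start with a hex key.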

Then restart HAS manually:

    [root@lunarlib rootwork]# crsctl start has
    CRS-4123: Oracle High Availability Services has been started.
    [root@lunarlib rootwork]#
    [root@lunarlib rootwork]# ps -ef|grep ohasd
    root 15581 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
    grid 15811 1 1 19:09 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    root 15817 4520 0 19:09 pts/0 00:00:00 tail -f ohasd.log
    root 15935 15908 0 19:10 pts/2 00:00:00 grep ohasd
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    root 15806 4305 0 19:09 pts/1 00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin start has
    grid 15811 1 1 19:09 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    grid 15851 1 0 19:09 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
    root 15937 15908 0 19:10 pts/2 00:00:00 grep d.bin
    [root@lunarlib rootwork]#

As HAS starts up, it creates new network socket files by itself. The OHASD sockets appear first (19:09), followed about two minutes later by those of CSSD, EVMD and the listener (19:11):

    [root@lunarlib rootwork]# ll /var/tmp/.oracle
    total 0
    prw-r--r-- 1 grid oinstall 0 Jan 11 19:04 npohasd
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sCRSD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 slunarlibDBG_OHASD
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11
    -rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL
    -rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL_lock
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    root 15806 4305 0 19:09 pts/1 00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin start has
    grid 15811 1 1 19:09 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    grid 15851 1 0 19:09 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
    root 15940 15908 0 19:11 pts/2 00:00:00 grep d.bin
    [root@lunarlib rootwork]# ps -ef|grep d.bin
    grid 15811 1 1 19:09 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
    grid 15947 1 0 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
    grid 15952 1 1 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
    grid 15977 1 0 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
    grid 15980 1 1 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
    grid 15994 1 1 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
    grid 16026 15980 0 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
    root 16040 15908 0 19:11 pts/2 00:00:00 grep d.bin
    [root@lunarlib rootwork]# ll /var/tmp/.oracle
    total 0
    prw-r--r-- 1 grid oinstall 0 Jan 11 19:04 npohasd
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 s#15977.1
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 s#15977.2
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sAevm
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sCevm
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sCRSD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 slunarlibDBG_CSSD
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 slunarlibDBG_EVMD
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 slunarlibDBG_OHASD
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_localhost
    -rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_localhost_lock
    -rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib__lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11
    -rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_UI_SOCKET
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOracle_CSS_LclLstnr_localhost_1
    -rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOracle_CSS_LclLstnr_localhost_1_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL
    -rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL_lock
    srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sSYSTEM.evm.acceptor.auth
    [root@lunarlib rootwork]#
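Because the sockets come back in stages, a script that continues right after `crsctl start has` returns may still find the stack unusable. A minimal polling sketch: `wait_for_path` is a hypothetical helper, and the socket name in the usage comment is one of the files from the listing above that appears only in the later stage.

```shell
# Poll once per second until the path exists or the timeout (in seconds,
# default 60) expires. Returns 0 if the path appeared, 1 otherwise.
wait_for_path() {
    path=$1
    remaining=${2:-60}
    while [ ! -e "$path" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage: wait up to five minutes for the local CSS listener socket.
#   wait_for_path /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1 300 \
#       || echo "CSS socket did not appear" >&2
```

The presence of the socket file only shows that the daemon has bound it; `crsctl status res -t` remains the real confirmation that the resources are online.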

HAS is now fully up again:

    [root@lunarlib rootwork]# crsctl status res -t
    --------------------------------------------------------------------------------
    NAME           TARGET  STATE        SERVER                   STATE_DETAILS
    --------------------------------------------------------------------------------
    Local Resources
    --------------------------------------------------------------------------------
    ora.CRSDG.dg
                   ONLINE  ONLINE       lunarlib
    ora.DATADG1.dg
                   ONLINE  ONLINE       lunarlib
    ora.DATADG2.dg
                   ONLINE  ONLINE       lunarlib
    ora.LISTENER.lsnr
                   ONLINE  ONLINE       lunarlib
    ora.asm
                   ONLINE  ONLINE       lunarlib                 Started
    ora.ons
                   OFFLINE OFFLINE      lunarlib
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    ora.cssd
          1        ONLINE  ONLINE       lunarlib
    ora.diskmon
          1        OFFLINE OFFLINE
    ora.evmd
          1        ONLINE  ONLINE       lunarlib
    ora.lunardb.db
          1        ONLINE  ONLINE       lunarlib                 Open
    [root@lunarlib rootwork]#

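Summary, for RAC or HAS:

1. On Linux, the Network Socket Files are under /var/tmp/.oracle/. On other platforms the directory may be /tmp/.oracle or /usr/tmp/.oracle.

2. If CRS or HAS is not running, deleting the Oracle temporary files (Network Socket Files) has no ill effect; they are recreated automatically when CRS restarts.

3. If CRS or HAS is up and running normally, deleting the Oracle temporary files does not affect the running database, but the database can no longer be shut down normally (`shutdown abort` works, but the instance cannot then be started).

4. Once situation 3 has occurred, CRS cannot be stopped (not even with the -f option); the only way out is to clean up shared memory segments by hand and kill the processes. Under HAS, killing the ocssd.bin process does not reboot the host, but in a RAC environment killing ocssd.bin can reboot the host.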

5. After completing step 4, simply restart CRS or HAS.
