解决一则enq: TX – row lock contention的性能故障

Asher/ 十一月 26, 2012/ Experience, INDEX, Oracle, Performance Tuning, RAC/ 3 comments

        上周二早上,收到项目组的一封邮件:

早上联代以下时间点用户有反馈EDI导入“假死”,我们跟踪了EDI导入服务,服务是正常在跑,可能是处理的慢所以用户感觉是“假死”了,请帮忙从数据库中检查跟踪以下时间点是否有“异常”操作,多谢!

>

解决一则row cache lock引起的性能故障1

Snap Id | Snap Time | Sessions | Cursors/Session

Begin Snap: | 15142 | 20-11月-12 09:00:05 | 62 | 5.8

End Snap: | 15143 | 20-11月-12 10:00:56 | 74 | 8.3

Elapsed: | | 60.85 (mins) | |

DB Time: | | 492.88 (mins) | |

2enq: TX – row lock contentionTop 5 Timed Events

Event | Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class


CPU time | | 21,215 | | 71.7 |

enq: TX – row lock contention | 12,232 | 6,013 | 492 | 20.3 | Application

gc cr multi block request | 14,696,067 | 1,675 | 0 | 5.7 | Cluster

gc buffer busy | 441,472 | 719 | 2 | 2.4 | Cluster

db file sequential read | 4,191 | 25 | 6 | .1 | User I/O

  • **不同的session更新或删除同一条记录;

**

  • **唯一索引有重复索引;

**

  • **位图索引同时被更新或同时并发的向位图索引字段上插入相同字段值;

**

  • **并发的对同一个数据块上的数据进行update操作;

**

  • **等待索引块完成分裂;

Segments by Row Lock Waits**

  • % of Capture shows % of row lock waits for each top segment compared
  • with total row lock waits for all segments captured by the Snapshot

Owner | Tablespace Name | Object Name | Subobject Name | Obj. Type | Row Lock Waits | % of Capture


SUNISCO | SUNISCO_DATA1 | BIND_PROCESS_LOG_REFNO | | INDEX | 159 | 67.66

SUNISCO | FDN_EDI_I01 | IDX_EDI_WORK_QUEUE_1 | | INDEX | 29 | 12.34

SUNISCO | SUNISCO_DATA1 | IND_EDI_CUSTOMER_TYPE_CODE | | INDEX | 15 | 6.38

SUNISCO | SUNISCO_DATA1 | IDX_EDI_MESSAGE_1 | | INDEX | 14 | 5.96

SUNISCO | FDN_BASE_T01 | BSE_NUM_LIST | | TABLE | 6 | 2.55

看到row lock waits发生在一个索引上。

3 那么,究竟是什么操作导致了这个enq: TX – row lock contention等待事件呢? 查看系统中,当前有哪些会话产生了enq: TX – row lock contention等待事件?

SQL> selectevent,sid,p1,p2,p3 fromv$session_wait whereevent='enq: TX - row lock contention';EVENT SID P1 P2 P3---------------------------------------------------------------- ---------- ---------- ---------- ----------enq: TX - row lock contention 224 1415053316 1441815 144197enq: TX - row lock contention 238 1415053316 1441815 144197enq: TX - row lock contention 247 1415053316 1441815 144197enq: TX - row lock contention 248 1415053316 1441815 144197enq: TX - row lock contention 253 1415053316 1441815 144197SQL>

看到SID为224,238,247,248,253的会话产生enq: TX – row lock contention等待事件。

4 查看系统中的当前会话,是在哪个对象上产生了产生了enq: TX – row lock contention等待事件?

SQL> selectROW_WAIT_OBJ#,ROW_WAIT_FILE#,ROW_WAIT_BLOCK#,ROW_WAIT_ROW# fromv$session whereevent='enq: TX - row lock contention';ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW#------------- -------------- --------------- -------------369195 0 0 0369195 0 0 0369195 0 0 0369195 0 0 0369195 0 0 0369195 0 0 06 rowsselectedSQL>

5 那么这个数据库对象为369195的对象究竟是什么呢?

SQL> selectobject_name,object_id fromdba_objects whereobject_id=369195;OBJECT_NAME OBJECT_ID----------------------------------- ----------BIND_PROCESS_LOG_REFNO 369195SQL> selectOWNER,OBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID, OBJECT_TYPE fromdba_objects whereobject_name='BIND_PROCESS_LOG_REFNO';OWNER OBJECT_NAME OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE------------------------------ ----------------------------- ---------- -------------- -------------------SUNISCO BIND_PROCESS_LOG_REFNO 369195 369195 INDEXSQL>

可以看到,定位到的结果同上述AWR报告中段统计信息吻合,是SUNISCO这个用户下的一个索引。

6 接下来,继续看看SID为224,238,247,248,253的会话到底在执行哪些操作导致enq: TX – row lock contention等待事件?

SQL> selectsid,sql_text fromv$session a,v$sql b wheresid in(224,238,247,248,253) and(b.sql_id=a.sql_id orb.sql_id=a.prev_sql_id);SID SQL_TEXT---------- --------------------------------------------------------------------------------224 selectcount(1) fromEDI_MESSAGE_PROCESS_LOG where(LOG_ID = :P_0_0 )224 INSERTINTOEDI_MESSAGE_PROCESS_LOG(LOG_ID, SERVICE_TYPE, SERVICE_STATUS, INFO_C238 selectcount(1) fromEDI_MESSAGE_PROCESS_LOG where(LOG_ID = :P_0_0 )238 INSERTINTOEDI_MESSAGE_PROCESS_LOG(LOG_ID, SERVICE_TYPE, SERVICE_STATUS, INFO_C247 INSERTINTOEDI_MESSAGE_PROCESS_LOG(LOG_ID, SERVICE_TYPE, SERVICE_STATUS, REFNO,247 INSERTINTOEDI_MESSAGE_PROCESS_LOG(LOG_ID, SERVICE_TYPE, SERVICE_STATUS, REFNO,248 INSERTINTOEDI_MESSAGE_PROCESS_LOG (LOG_ID, SERVICE_TYPE, SERVICE_STATUS, REFNO248 INSERTINTOEDI_MESSAGE_PROCESS_LOG (LOG_ID, SERVICE_TYPE, SERVICE_STATUS, REFNO248 SELECTSEQ_NEWID.NEXTVAL FROMDUAL253 SELECTSEQ_NEWID.NEXTVAL FROMDUAL253 INSERTINTOEDI_MESSAGE_PROCESS_LOG (LOG_ID, SERVICE_TYPE, SERVICE_STATUS, REFNO11 rowsselectedSQL>

看到有SQL_ID不同的SQL在同时向EDI_MESSAGE_PROCESS_LOG这张表执行INSERT操作。

7 接下去看看EDI_MESSAGE_PROCESS_LOG这张表和索引BIND_PROCESS_LOG_REFNO之间有没有什么关系?

SQL> selectindex_name,table_name,index_type fromuser_indexes wheretable_name='EDI_MESSAGE_PROCESS_LOG';INDEX_NAME TABLE_NAME INDEX_TYPE------------------------------ ------------------------------ ----------PK_EDI_MESSAGE_PROCESS_LOG EDI_MESSAGE_PROCESS_LOG NORMALID_EDI_LOG_INPUT_DATE EDI_MESSAGE_PROCESS_LOG NORMALBIND_PROCESS_LOG_REFNO EDI_MESSAGE_PROCESS_LOG BITMAPSQL> selectindex_name,table_name,column_name fromuser_ind_columns wheretable_name='EDI_MESSAGE_PROCESS_LOG';INDEX_NAME TABLE_NAME COLUMN_NAM------------------------------ ------------------------------ ----------PK_EDI_MESSAGE_PROCESS_LOG EDI_MESSAGE_PROCESS_LOG LOG_IDID_EDI_LOG_INPUT_DATE EDI_MESSAGE_PROCESS_LOG INPUT_DATEBIND_PROCESS_LOG_REFNO EDI_MESSAGE_PROCESS_LOG REFNOSQL> selectobject_name,object_id,object_type,created fromuser_objects whereobject_name='BIND_PROCESS_LOG_REFNO';OBJECT_NAME OBJECT_ID OBJECT_TYPE CREATED------------------------------ ---------- --------------- -------------------BIND_PROCESS_LOG_REFNO 369195 INDEX2012/11/05 10:18:28SQL> selectindex_name,index_type fromuser_indexes whereindex_name='BIND_PROCESS_LOG_REFNO';INDEX_NAME INDEX_TYPE------------------------------- -----------BIND_PROCESS_LOG_REFNO BITMAPSQL>

发现,这个索引BIND_PROCESS_LOG_REFNO是位于EDI_MESSAGE_PROCESS_LOG这张表的REFNO字段上的一个位图索引,而且是2012/11/05 10:18:28创建的,也就是说是近期才创建的1个位图索引。

问题定位到这一步基本比较清晰了,产生enq: TX – row lock contention事件的原因就是上述的第2个可能原因:位图索引同时被更新或同时并发的向位图索引字段上插入相同字段值

8 那么,解决的办法也比较简单了,就是干掉这个位图索引,因为这个位图索引在这种应用场景下确实不太适合。事后,经过同客户方沟通确认,该索引是他们的一个DBA当初看到系统比较慢,而加上去的一个位图索引。

9 补充,从当时的ADDM报告中,也可以看到数据库给我们的建议:

FINDING 4: 20% impact (6013 seconds)------------------------------------发现 SQL 语句正处于行锁定等待。RECOMMENDATION 1: Application Analysis, 17% benefit (5131 seconds)ACTION: 在 INDEX"SUNISCO.BIND_PROCESS_LOG_REFNO"(对象 ID 为 369195)中检测到了严重的行争用。使用指定的阻塞 SQL 语句在应用程序逻辑中跟踪行争用的起因。RELEVANT OBJECT: databaseobject withid 369195RATIONALE: SQL_ID 为 "dr4uxu769tmmb"的 SQL 语句在行锁上被阻塞。RELEVANT OBJECT: SQL statement withSQL_ID dr4uxu769tmmbINSERTINTOEDI_MESSAGE_PROCESS_LOG(LOG_ID, SERVICE_TYPE,SERVICE_STATUS, LOG_DATETIME, REFNO, REF_TYPE, MSG_ID, BL_NO, BL_ID,VOYAGE_ID, VESSEL_NAME, IMO_NO, VOYAGE_NO, FUNCTION_TYPE, INPUT_DATE,IN_STATUS, SYSTEM_TYPE, ERROR_LOG, FILE_NAME) VALUES( :B1 , :B2 ,:B3 , :B4 , :B5 , :B6 , :B7 , :B8 , :B9 , :B10 , :B11 , :B12 , :B13 ,:B14 , :B15 , :B16 , :B17 , :B18 , :B19 )RATIONALE: SQL_ID 为 "dxsbgubsb6r4n"的 SQL 语句在行锁上被阻塞。RELEVANT OBJECT: SQL statement withSQL_ID dxsbgubsb6r4nINSERTINTOEDI_MESSAGE_PROCESS_LOG(LOG_ID, SERVICE_TYPE,SERVICE_STATUS, INFO_CODE, INFORMATION, INFO_LEVEL, LOG_DATETIME,REFNO, REF_TYPE, MSG_ID, BL_NO, VOYAGE_ID, VESSEL_NAME, IMO_NO,VOYAGE_NO, FUNCTION_TYPE, INPUT_DATE, IN_STATUS, SYSTEM_TYPE,ERROR_LOG, FILE_NAME) VALUES( :B1 , :B2 , :B3 , :B4 , :B5 , :B6 ,:B7 , :B8 , :B9 , :B10 , :B11 , :B12 , :B13 , :B14 , :B15 , :B16 ,:B17 , :B18 , :B19 , :B20 , :B21 )RATIONALE: SQL_ID 为 "b38qhyzvn5bdd"的 SQL 语句在行锁上被阻塞。RELEVANT OBJECT: SQL statement withSQL_ID b38qhyzvn5bddINSERTINTOEDI_MESSAGE_PROCESS_LOG(LOG_ID, SERVICE_TYPE,SERVICE_STATUS, LOG_DATETIME, REFNO, REF_TYPE, MSG_ID, BL_NO,VOYAGE_ID, VESSEL_NAME, IMO_NO, VOYAGE_NO, FUNCTION_TYPE, INPUT_DATE,IN_STATUS, SYSTEM_TYPE, ERROR_LOG, FILE_NAME) VALUES( :B1 , :B2 ,:B3 , :B4 , :B5 , :B6 , :B7 , :B8 , :B9 , :B10 , :B11 , :B12 , :B13 ,:B14 , :B15 , :B16 , :B17 , :B18 )RATIONALE: SQL_ID 为 "36k2xpx3c6wr5"的 SQL 语句在行锁上被阻塞。RELEVANT OBJECT: SQL statement withSQL_ID 36k2xpx3c6wr5INSERTINTOEDI_MESSAGE_PROCESS_LOG(LOG_ID, SERVICE_TYPE,SERVICE_STATUS, REFNO, REF_TYPE, MSG_ID, BL_NO, VOYAGE_ID,VESSEL_NAME, IMO_NO, VOYAGE_NO, FUNCTION_TYPE, INPUT_DATE, IN_STATUS,SYSTEM_TYPE, ERROR_LOG, FILE_NAME) VALUES( :B1 , :B2 , :B3 , :B4 ,:B5 , :B6 , :B7 , :B8 , :B9 , :B10 , :B11 , :B12 , :B13 , :B14 , :B15, :B16 , :B17 )RATIONALE: 具有 ID "268", 用户 ID "31", 程序 "FC.EdiService.Import.exe"模块"FC.EdiService.Import.exe"的会话是构成此建议案中的优化建议的 51% 的阻塞会话。RATIONALE: 具有 ID "307", 用户 ID "31", 程序 "FC.EdiService.Import.exe"模块"FC.EdiService.Import.exe"的会话是构成此建议案中的优化建议的 11% 的阻塞会话。RATIONALE: 具有 ID "227", 用户 ID "31", 程序 "FC.EdiService.Import.exe"模块"FC.EdiService.Import.exe"的会话是构成此建议案中的优化建议的 11% 的阻塞会话。RATIONALE: 具有 ID "273", 用户 ID "31", 程序 "FC.EdiService.Import.exe"模块"FC.EdiService.Import.exe"的会话是构成此建议案中的优化建议的 9% 的阻塞会话。

10 最后,从本案例中,可以看到在日常的数据库维护中,添加或修改一些对象信息时,务必要经过严格的测试,尤其是在生产系统上做调整更应如此。同样,可以看出,数据库的一些理论基础知识对于DBA还是蛮重要的。 下载本文的PDF版本, 方便离线阅读转载### Post navigation ← Oracle索引组织表学习系列三(完)如何正确手工启动Windows下的Oracle RAC数据库? →