

Oracle Database ORA-04031 Fault Analysis


Problem Description

Database transactions began timing out, and the database showed wait events such as enq: TX - index contention.
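For reference, the sessions stuck on this wait at the time of the incident can be listed with a query like the following (a minimal sketch; the columns chosen are illustrative):

-- sessions currently waiting on the index contention event (illustrative)
SELECT inst_id, sid, sql_id, blocking_session, event
FROM   gv$session
WHERE  event = 'enq: TX - index contention';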


Problem Analysis

2.1 The database reports enq: TX - index contention waits
Reviewing the ASH data from before the incident shows that, prior to the anomaly, the database's dominant wait event was SGA: allocation forcing component growth.
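A minimal sketch of the kind of ASH query used for this check (the time window below is an assumption; adjust it to the actual incident period):

-- top wait events sampled by ASH shortly before the incident (window is illustrative)
SELECT event, COUNT(*) AS samples
FROM   gv$active_session_history
WHERE  sample_time BETWEEN TIMESTAMP '2022-10-02 23:00:00'
                       AND TIMESTAMP '2022-10-03 00:30:00'
AND    event IS NOT NULL
GROUP  BY event
ORDER  BY samples DESC;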
Alert.log:
2022-10-03T00:16:41.029338+08:00
Errors in file /u01/app/oracle/diag/rdbms/*/trace/db2_ora_6519.trc
ORA-04031: unable to allocate 63496 bytes of shared memory ("shared pool","DBMS_BACKUP_RESTORE","PLMCD^a5f238a0","BAMIMA: Bam Buffer")
Incident details in: /u01/app/oracle/diag/rdbms/*/incident/incdir_218692/db2_ora_6519_i218692.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-10-03T00:16:41.321084+08:00
ORA-04031 heap dump being written to trace file /u01/app/oracle/diag/rdbms/*/incident/incdir_218692/db2_ora_6519_i218692.trc
2022-10-03T00:19:38.948488+08:00
Errors in file /u01/app/oracle/diag/rdbms/*/trace/b2_m000_8030.trc (incident=218693):
ORA-04031: unable to allocate 57768 bytes of shared memory ("shared pool","DBMS_STATS_INTERNAL","PLMCD^60dfd26c","BAMIMA: Bam Buffer")
Incident details in: /u01/app/oracle/diag/rdbms/*/incident/incdir_218693/b2_m000_8030_i218693.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-10-03T00:19:42.233916+08:00
Errors in file /u01/app/oracle/diag/rdbms/*/trace/db2_ora_8166.trc (incident=218700):
ORA-04031: unable to allocate 63496 bytes of shared memory ("shared pool","DBMS_BACKUP_RESTORE","PLMCD^a5f238a0","BAMIMA: Bam Buffer")
Incident details in: /u01/app/oracle/diag/rdbms/*/incident/incdir_218700/db2_ora_8166_i218700.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

2022-10-03T04:23:42.937853+08:00
Errors in file /u01/app/oracle/diag/rdbms/*/trace/db2_lmd0_7330.trc (incident=215852):
ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-10-03T04:23:49.345977+08:00
Errors in file /u01/app/oracle/diag/rdbms/*/trace/db2_lmd0_7330.trc (incident=215853):
ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

2022-10-03T08:58:07.793554+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2022-10-03T09:08:09.957747+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2022-10-03T09:18:12.005452+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2022-10-03T09:28:12.835118+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)

2022-10-03T09:38:13.206769+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2022-10-03T09:44:00.615729+08:00
Thread 2 advanced to log sequence 29428 (LGWR switch)
Current log# 8 seq# 29428 mem# 0: +DG_DATA_SSD_1/N*/ONLINELOG/group_8.276.1008547021
Current log# 8 seq# 29428 mem# 1: +DG_DATA_SSD_1/N*/ONLINELOG/group_8.277.1008547025
2022-10-03T09:44:01.334950+08:00
TT03: Standby redo logfile selected for thread 2 sequence 29428 for destination LOG_ARCHIVE_DEST_2
2022-10-03T09:44:03.799630+08:00
Archived Log entry 119834 added for T-2.S-29427 ID 0x79a6de54 LAD:1
2022-10-03T09:48:17.418439+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2022-10-03T09:58:19.513479+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2022-10-03T10:08:21.022510+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2022-10-03T10:14:50.702688+08:00
LCK1 (ospid: 7342) waits for event libcache interrupt action by LCK for 73 secs.

2022-10-03T10:14:50.702867+08:00
LCK1 (ospid: 7342) is hung in an acceptable location (libcache 0x41.02).
2022-10-03T10:17:11.454316+08:00
LCK1 (ospid: 7342) waits for event libcache interrupt action by LCK for 214 secs.
2022-10-03T10:17:11.454494+08:00
LCK1 (ospid: 7342) is hung in an acceptable location (libcache 0x41.02).
2022-10-03T10:18:23.645094+08:00
DDE: Problem Key ORA 4031 was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2022-10-03T10:21:42.845553+08:00
LCK1 (ospid: 7342) waits for event libcache interrupt action by LCK for 79 secs.
2022-10-03T10:21:42.845731+08:00
LCK1 (ospid: 7342) is hung in an acceptable location (libcache 0x41.02).
2022-10-03T10:24:20.692124+08:00
LCK1 (ospid: 7342) waits for event libcache interrupt action by LCK for 80 secs.
2022-10-03T10:24:20.692681+08:00
LCK1 (ospid: 7342) is hung in an acceptable location (libcache 0x41.02).
2022-10-03T10:27:57.458418+08:00
CKPT (ospid: 7350) waits for event enq: XR - database force logging for 78 secs.
2022-10-03T10:28:07.628285+08:00
CKPT (ospid: 7350) waits for event enq: XR - database force logging for 88 secs.
2022-10-03T10:28:17.668429+08:00
CKPT (ospid: 7350) waits for event enq: XR - database force logging for 98 secs.
2.2 Checking database memory usage
The ASH trend and the alert.log show that, before the enq: TX - index contention waits appeared, the database's main waits were SGA: allocation forcing component growth and library cache lock, and the alert.log began reporting ORA-04031 at midnight and kept doing so through the incident window.
The buffer cache had shrunk to only 128 MB, while the shared pool had grown to 32 GB.
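The component sizes can be confirmed from the dynamic SGA view (a minimal sketch):

-- current sizes of the dynamically resizable SGA components
SELECT component, ROUND(current_size/1024/1024) AS current_mb
FROM   v$sga_dynamic_components
ORDER  BY current_size DESC;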
Within the shared pool, ges resource dynamic and ges enqueues consumed 12 GB and 8.9 GB respectively, which is abnormal. These two components consuming excessive memory caused the ORA-04031 errors and triggered SGA resize operations, which in turn degraded overall database performance.
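The per-allocation breakdown of the shared pool can be checked in v$sgastat (a minimal sketch; the 1 GB threshold is illustrative):

-- largest shared pool consumers; here "ges resource dynamic" and "ges enqueues" stand out
SELECT name, ROUND(bytes/1024/1024/1024, 1) AS gb
FROM   v$sgastat
WHERE  pool = 'shared pool'
AND    bytes > 1024*1024*1024
ORDER  BY bytes DESC;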
Analysis of the LMHB trace (lmhb.trc):
grep "library cache pin wait check" n*2_lmhb_7340.trc
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed

grep "check lck heartbeat" n*2_lmhb_7340.trc
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname N*, inst 2, node 2
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname N*, inst 2, node 2
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname N*, inst 2, node 2
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname N*, inst 2, node 2
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname N*, inst 2, node 2
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname N*, inst 2, node 2
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname N*, inst 2, node 2
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname N*, inst 2, node 2
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
Reference document: Error ORA-04031 in the Shared Pool with Huge Allocation in Memory Type "ges resource dynamic" or "ges enqueues" memory (Doc ID 2631592.1)
For Oracle Versions >= 12.2 but BELOW 19.1
Bug:26405036 - VERY HIGH "GES ENQUEUES" ON THE SHARED POOL
If a problem fulfills all of the conditions below, it is a duplicate of this problem.
1. The LMHB process keeps reporting "memory load check" failure.
[Example]
============================================================
kjgcr_StatCheckMEM: memory is low, free memory 18%, average 19%
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric memory load check, id 10 failed
============================================================
2. The LMHB process reported "library cache pin wait check" failure.
[Example]
============================================================
kjgcr_ServiceGCR: KJGCR_METRICS: Global metric library cache pin wait check, id 11 failed
============================================================
3. The heap dump of the shared pool contains many objects of description "ges resource dynamic".
Bug:27824540 - ORA-04031 ("SHARED POOL","UNKNOWN OBJECT","SGA HEAP(1,0)","GES RESOURCE DYNAMIC"
If a problem fulfills all of the conditions below, it is a duplicate of this problem.
1. Many objects "ges resource dynamic" are allocated in the shared pool.
2. The trace file of the LMHB process shows that Action 11 (kjgcr_GrowResourceCache) was executed and was not reset.
============================================================
kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
============================================================
3. The trace file of the LMHB process keeps recording failures of metric 7 (check lck heartbeat).
The symptoms match Bug:26405036 - VERY HIGH "GES ENQUEUES" ON THE SHARED POOL and Bug:27824540 - ORA-04031 ("SHARED POOL","UNKNOWN OBJECT","SGA HEAP(1,0)","GES RESOURCE DYNAMIC").
Bug:26405036 is fixed in 12.2.0.1.200114, and Bug:27824540 is fixed in 12.2.0.1.190716.
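Whether either fix is already present in the installed patch level can be checked from the patch inventory (a minimal sketch; the exact output depends on the environment):

-- patches registered in the database
SELECT patch_id, status, description
FROM   dba_registry_sqlpatch
ORDER  BY action_time;
-- or at the OS level (assuming lsinventory lists the fixed bug numbers):
-- $ORACLE_HOME/OPatch/opatch lsinventory | grep -E "26405036|27824540"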
2.3 Corresponding solutions

For Oracle Versions >= 12.2 but BELOW 19.1
Apply fix for Bug:26405036; see NOTE:26405036.8
Workaround: on 12.2 or above, starting a pseudo reconfiguration with the commands below is a workaround.
SQL> oradebug setmypid
SQL> oradebug lkdebug -m reconfig lkdebug
Apply fix for Bug:27824540; see NOTE:27824540.8

Workaround: There are two possible workarounds.
1) Disable action 11.
SQL> oradebug dyn_gcr -a 11 -disable

Note: this oradebug command is available on 12.2 and later.
2) Disable the GES resource cache by setting the initialization parameter "_ges_direct_free" to TRUE (a sketch follows below).
Note: completely disabling the GES resource cache may lead to other side effects, such as contention on the TM lock for insert statements, so use this workaround with caution.
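A minimal sketch of applying the second workaround (the parameter name comes from the note above; the scope and restart handling below are assumptions, and hidden parameters should only be changed under Oracle Support guidance):

SQL> -- assumption: set in the spfile for all RAC instances, takes effect after an instance restart
SQL> ALTER SYSTEM SET "_ges_direct_free"=TRUE SCOPE=SPFILE SID='*';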


Problem Summary

Because ges resource dynamic and ges enqueues consumed too much memory in the shared pool, the shared pool ran short of free memory, causing ORA-04031 errors and SGA resize operations, which in turn degraded overall database performance.
After the instance was restarted, the ges resource dynamic memory was released, but ges enqueues still held about 8 GB; for now the problem is mitigated. The final recommendation is to resolve it with a patch.

Recommendations:

Oracle 12.2 is past its support window, and a merge patch of 26405036 and 27824540 for Oracle 12.2.0.1.180417 on the HP-UX platform does not exist and may not be obtainable.

  • Recommendation 1

    Install only patch 26405036 - VERY HIGH "GES ENQUEUES" ON THE SHARED POOL and see whether the problem is resolved.

  • Recommendation 2

    Upgrade the RU to 12.2.0.1.200114 or later, which already contains the fixes for both bugs.

  • Recommendation 3
    As a temporary measure, apply the workaround from the bug notes or restart the instance:
SQL> oradebug setmypid
SQL> oradebug lkdebug -m reconfig lkdebug
Recommendation 1 is preferred: the one-off patch needs only a short maintenance window and can be applied in a rolling fashion.


Author: 張芳 (上海新炬中北團隊)

Source: the "IT那活兒" WeChat official account

