成人国产在线小视频_日韩寡妇人妻调教在线播放_色成人www永久在线观看_2018国产精品久久_亚洲欧美高清在线30p_亚洲少妇综合一区_黄色在线播放国产_亚洲另类技巧小说校园_国产主播xx日韩_a级毛片在线免费

資訊專欄INFORMATION COLUMN

雙節(jié)點(diǎn)RAC實(shí)例2 HANG 故障分析一例

IT那活兒 / 1911人閱讀
雙節(jié)點(diǎn)RAC實(shí)例2 HANG 故障分析一例

一、環(huán)境:

操作系統(tǒng)為:AIX 5.3

數(shù)據(jù)庫版本:Oracle 10.2.0.5


二、第一現(xiàn)場:

1.主機(jī)能夠登入,系統(tǒng)負(fù)載很低

2.數(shù)據(jù)庫實(shí)例2登錄不進(jìn)去,實(shí)例1能正常提供服務(wù)

3.業(yè)務(wù)不受到影響



三、故障處理:


1、基本狀態(tài)查看

首先查看下主機(jī)情況:

$ uptime

 06:06PM  up 232 days,  19:31,  6 users, load average: 6.15, 6.71, 6.69

 節(jié)點(diǎn)2已經(jīng)運(yùn)行232天,平均負(fù)載6點(diǎn)多,相對(duì)比較輕,最近時(shí)刻的負(fù)載也沒有超過7,因此主機(jī)狀態(tài)基本正常。接下來查看是否有大量換頁出現(xiàn):


$ vmstat 3 10

 

System configuration: lcpu=16mem=79360MB

 

kthr   memory              page              faults        cpu   

----- ----------------------------------- ------------ -----------

 r b   avm   fre re  pi  po fr   sr  cy in   sy  cs us sy id wa

 6  04932557 13865294   0   0   0   0   0   0 246 273723 268060 17 1172  0

 5  04932540 13865311   0   0   0   0   0   0 279 265072 263893 17 1172  0

 5  04932503 13865349   0   0   0   0   0   0 311 265326 264567 17 1172  0

 5  04932511 13865340   0   0   0   0   0   0 597 340525 264203 18 1270  0

 8  04932523 13865328   0   0   0   0   0   0 327 265519 260997 17 1172  0

…….



我們同樣發(fā)現(xiàn),CPU內(nèi)存都相對(duì)空閑,沒有換頁發(fā)生。

$ ps -ef|grep ora|wc -l

 

    367


數(shù)據(jù)庫進(jìn)程300多,對(duì)于16CPU,80G內(nèi)存的主機(jī)系統(tǒng)來說,一切似乎都很正常。

 

 

 

2、信息搜集

鑒于常規(guī)的數(shù)據(jù)庫登錄手段無法進(jìn)入實(shí)例2,無法得知數(shù)據(jù)庫狀態(tài),因此采用Oracle10g開始提供的“后門“,進(jìn)入數(shù)據(jù)庫進(jìn)行信息搜集:

$ sqlplus -prelim / as sysdba

 

SQL*Plus: Release 10.2.0.5.0 -Production on Thu Jun 14 18:01:13 2012

 

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

 

SQL>

SQL> oradebug hanganalyze 3

Hang Analysis in /oracle/admin/bxxx/udump/bxxx2_ora_1011948.trc

SQL> oradebug setmypid

Statement processed.

SQL> oradebug dump systemstate 266

Statement processed.

SQL> set time on

18:02:57 SQL> set timing on

18:03:01 SQL>  oradebug dump systemstate 266

Statement processed.

18:04:13 SQL> oradebug hanganalyze5

Hang Analysis in /oracle/admin/bxxx/udump/bxxx2_ora_1011948.trc

18:04:24 SQL>

18:04:51 SQL>  oradebug dump systemstate 266

Statement processed.

18:07:12 SQL> exit

Disconnected from ORACLE


由于其余數(shù)據(jù)庫實(shí)例正常,因此信息搜集我們僅限于本實(shí)例,以免影響到其他實(shí)例的正常運(yùn)行。



3、實(shí)例重啟
信息搜集完畢,在征得應(yīng)用部門及DBA確認(rèn)后,重新啟動(dòng)實(shí)例2。

先強(qiáng)制關(guān)閉:

$ sqlplus -prelim / as sysdba

 

SQL*Plus: Release 10.2.0.5.0 - Productionon Thu Jun 14 18:10:11 2012

 

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

 

SQL> shutdown abort

ORACLE instance shut down.

SQL> exit

Disconnected from ORACLE

 

再正常啟動(dòng):

$ sqlplus / as sysdba

 

SQL*Plus: Release 10.2.0.5.0 - Productionon Thu Jun 14 18:10:34 2012

 

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

 

Connected to an idle instance.

 

SQL> startup

ORACLE instance started.

 

Total System Global Area 1610612736bytes

Fixed Size                  2096736 bytes

Variable Size            1308623264 bytes

Database Buffers          285212672 bytes

Redo Buffers               14680064 bytes

Database mounted.

Database opened.

實(shí)例重啟后,所有應(yīng)用能正常連接到實(shí)例2。

接下來進(jìn)行原因的深入分析,以防同樣的故障再次出現(xiàn)。

4、故障分析

 

首先我們來看hanganalyze的信息輸出:

*** 2012-06-14 18:01:25.716

==============

HANG ANALYSIS:

==============

Found 177 objects waitingfor

<1/1014/60449/0x6fbfedb8/925832/SGA:allocation forcing componen>

Found 33 objects waiting for

<1/1093/1/0x6fbe9868/295728/NoWait>

Open chains found:

Chain 1 : :

<1/1093/1/0x6fbe9868/295728/NoWait>

 -- <1/1014/60449/0x6fbfedb8/925832/SGA:allocation forcing componen>

 -- <1/674/1309/0x6fc258d8/689108/librarycache load lock>

本次的故障原因從這個(gè)輸出就一目了然,Oracle進(jìn)行內(nèi)部內(nèi)存自動(dòng)分配時(shí),遲遲沒有結(jié)束,導(dǎo)致177個(gè)對(duì)象等待著新的內(nèi)存空間。

進(jìn)一步的從systemdump輸出:

 

…….
waiting for SGA: allocation forcing component growthwait_time=0, seconds since wait started=3
….

SO: 700000050f2a528, type: 50, owner:70000006d500910, flag: INIT/-/-/0x00

     row cache enqueue: count=1 session=70000006d45d6e0object=700000058c461b8, mode=S

     savepoint=0x7a5dd06

     row cache parent object: address=700000058c461b8cid=8(dc_objects)

     hash=875b7c55 typ=11 transaction=0 flags=00008000

     own=700000058c46288[700000050f2a558,700000050f2a558]

……

可以看出,由于Oracle在進(jìn)行內(nèi)存分配時(shí),將一些對(duì)象從內(nèi)存中踢出去了,新的SQL解析時(shí),需要裝載新的對(duì)象,然而沒有可用內(nèi)存空間,因此數(shù)據(jù)庫實(shí)例hang住。

 

 

根據(jù)以往經(jīng)驗(yàn),發(fā)生這種等待,通常是采用了Oracle的ASMM(自動(dòng)共享內(nèi)存管理),即SGA自動(dòng)管理引起。

從實(shí)例2的告警日志alert_bxxx2.log,我們發(fā)現(xiàn):

Thu Jun 14 18:10:38 BEIST 2012

Starting ORACLE instance (normal)

sskgpgetexecname failed to get name

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Interface type 1 en1 172.16.1.0configured from OCR for use as a cluster interconnect

Interface type 1 en8 10.153.246.128configured from OCR for use as  a publicinterface

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_1 parameterdefault value as /oracle/product/10.2.0/db/dbs/arch

LICENSE_MAX_USERS = 0

SYS auditing is disabled

ksdpec: called for event 13740 priorto event group initialization

Starting up ORACLE RDBMS Version:10.2.0.5.0.

System parameters with non-defaultvalues:

 processes                = 1000

…….

 nls_territory            = CHINA

  sga_target               = 1610612736

 control_files            =/oradata1/control01.ctl, /oradata2/control02.ctl, /oradata3/control03.ctl

 db_block_size            = 8192

 __db_cache_size          =285212672

 compatible               =10.2.0.5.0

 db_files                 = 2000

 db_file_multiblock_read_count= 16

 cluster_database         = TRUE

 cluster_database_instances= 2

 thread                   = 2

…….

db_name                  = bxxx

 open_cursors             = 300

 pga_aggregate_target     =8311013376

 

確實(shí)采用了SGA自動(dòng)管理(sga_target不為0)。

且總共80G的物理內(nèi)存,只分配給Oracle SGA 2G不到,而本機(jī)唯一的任務(wù)就是確保Oracle系統(tǒng)正常。

5故障分析建議

 

結(jié)合本次故障發(fā)生的原因及新炬在移動(dòng)行業(yè)維護(hù)經(jīng)驗(yàn),我們給出如下三點(diǎn)配置修改建議:

1.將SGA管理改為手動(dòng)。

2.數(shù)據(jù)庫會(huì)話在400個(gè)左右事,將SGA從2G增加到40G,其中:

shared_pool_size=1G

db_cache_size=35G

SGA_MAX_SIZE=40G

SGA_TARGET=0

3.關(guān)閉DRM特性:

_gc_affinity_time=0                                  # Only if DBversion is 10.1 or 10.2

_gc_undo_affinity=FALSE                       # Only if Db version is10.2

 

注:上述建議都需要停止數(shù)據(jù)庫,因此需安排計(jì)劃性停機(jī)。

 

同時(shí),我們開發(fā)了數(shù)據(jù)庫實(shí)例hang住時(shí)的自動(dòng)采集腳本,供數(shù)據(jù)庫發(fā)生hang住的情況下,快速搜集相關(guān)信息(存成文件放到Oracle用戶下的目錄即可):

#

#auto_hang_analyze.sh

#created by shsnc @20120614

#

#any question please sendemailto:[email protected]

#

#!/usr/bin/ksh

 

. ~/.profile

 

#自動(dòng)終止上次運(yùn)行的進(jìn)程

 

for line in `ps -ef|grepauto_hang_analyze.sh|grep -v grep|awk {print $2}`

 

do

 

      for line1 in `ps -ef|grep $line|grep sqlplus|awk {print $2}`

 

      do

 

         for line2 in `ps -ef|grep $line1|grepLOCAL=YES|awk {print $2}`

 

         do

 

           echo $line2

 

           kill -9 $line2

 

         done

 

      done

 

done

 

 

 

count=`ps -ef |grep $0 |grep -v grep|wc -l`

 

if [[ count -gt 2 ]]; then

 

       echo $0 already running!

 

       exit

 

fi

 

echo `date`

 

# 判斷是否需要進(jìn)行自動(dòng)執(zhí)行hang analyze

 

sqlplus -prelim  / as sysdba <

 

set feedback off

 

set termout off;

 

ttitle off;

 

btitle off;

 

set heading off

 

set timing off;

 

set verify off;

 

set echo off;

 

spool session_event_cnt.out

 

select get_event_cnt from dual;

 

spool off;

 

exit

 

EOF

 

event_cnt=`grep -v SQLsession_event_cnt.out|awk {print $1}`

 

if [[ event_cnt -lt 30 ]]; then

 

       echo no need hang analyze!

 

       exit

 

fi

 

#進(jìn)行hang analyze

 

 

 

sqlplus -prelim / as sysdba<

 

set feedback off

 

set termout off;

 

ttitle off;

 

btitle off;

 

set heading off

 

set timing off;

 

set verify off;

 

set echo off;

 

spool hang_analyze.out

 

oradebug setmypid

 

oradebug unlimit;

 

oradebug hanganalyze 3;

oradebug dump systemstate  1;

 

 

 

spool off;

 

exit

 

EOF

 

#取hang analyze 文件名

 

hanganalyze_file=`grep -i -E HangAnalysis hang_analyze.out|awk {print $4}`

 

if test -z "$hanganalyze_file";then

 

 exit

 

fi

 

echo $hanganalyze_file

 

#生成自動(dòng)kill腳本

 

awk{if(index($0,"Found")>0) {printf"%s ",$0} else {print$0}} $hanganalyze_file|grep Found|awk {if($2>20) {{split($7,A,"/")}{ print "ps -ef|grep "A[5]"|grepLOCAL=NO|awk 47{print "

 

-9 "$2}47|xargs kill"}}}>kill_hang_process.sh

 

 

 

#執(zhí)行自動(dòng)終止腳本并備份腳本

 

if [ `cat kill_hang_process.sh|wc -l`-gt 0 ]

 

then

 

sh kill_hang_process.sh

 

echo $hanganalyze_file>>kill_hang_process.sh

 

cp kill_hang_process.sh ./ak_log/kill_hang_process.sh.`date+%b_%d_%H_%M_%S`

 

fi

 

echo `date`



文章版權(quán)歸作者所有,未經(jīng)允許請(qǐng)勿轉(zhuǎn)載,若此文章存在違規(guī)行為,您可以聯(lián)系管理員刪除。

轉(zhuǎn)載請(qǐng)注明本文地址:http://systransis.cn/yun/130246.html

相關(guān)文章

發(fā)表評(píng)論

0條評(píng)論

最新活動(dòng)
閱讀需要支付1元查看
<