A Case Study: Recovering from a Two-Node RAC Crash

IT那活兒

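Both nodes of a customer's database crashed and raised alerts. Operations staff restarted the database as an emergency measure, and the application recovered. After startup, however, the alert log reported ORA-16191 and ORA-01031, caused by inconsistent password files between the Data Guard primary and standby. Rebuilding the password file resolved the fault.

Analysis of the alert log showed that at 16:32 node 1 found a corrupt block while reading the control file, at 16:33 the instance could no longer read the control file and crashed, and at 16:35 instance 2 was shut down. A check of the control files found no corrupt blocks, so the preliminary conclusion was a bug triggered by a brief failure to read the control file.

An SR was opened. SSC staff and the SR backend experts jointly confirmed the issue as bug 11698676, which is a duplicate of bug 9549042 and is fixed in patch 9549042.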

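2. Fault Analysis and Handling

2.1 Fault Handling

At 16:34 on April 5, both nodes of the ssyy database crashed in succession. After emergency access we confirmed that both instances were completely shut down while the listeners were still running. We issued startup to bring both instances up, and the application reconnected to the production database.

After the restart, the node 1 alert log showed: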

Check that the primary and standby are using a password file

and remote_login_passwordfile is set to SHARED or EXCLUSIVE, 

and that the SYS password is same in the password files.

returning error ORA-16191

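The message indicates that the SYS password files on the primary and standby are inconsistent. We therefore decided to rebuild the password file on the primary and copy the newly generated file to the standby nodes (backing up the original password files first, and changing the SYS password on the primary).

Run the password file creation command on each of the two primary-rac nodes: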

orapwd file=/oracle/db/oracle/product/11.1.0/db/dbs/ssyydb1 entries=5 force=y  password=*********

orapwd file=/oracle/db/oracle/product/11.1.0/db/dbs/ssyydb2 entries=5 force=y  password=*********

Copy ssyydb1 and ssyydb2 to standby-rac node 1 and node 2 respectively.

The alert log on primary-rac1 still kept reporting errors:

Errors in file /oracle/db/diag/rdbms/ssyy/ssyy1/trace/ssyy1_arc2_4134.trc:

ORA-01031: insufficient privileges

PING[ARC2]: Heartbeat failed to connect to standby drdb. Error is 1031.

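At this point primary node 1 could not ship archive logs to standby node 1. According to MOS, ORA-01031 is likewise caused by inconsistent password files between the primary and standby. We suspected that the archiver processes on the primary were still using a cached copy of the password file. Because archiver processes are non-critical and restart automatically after kill -9, killing them has no impact on the running database.

We killed all archiver processes on primary node 1 and node 2 in turn, but node 1 still kept reporting ORA-01031.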
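As an illustration of the kill step above, here is a minimal sketch for picking out the archiver processes of one instance. It assumes the usual Oracle background process naming ora_arcN_<SID> (consistent with the trace file ssyy1_arc2_4134.trc) and `ps -ef` output with the PID in column 2. The function name arc_pids is hypothetical.

```shell
# List PIDs of archiver (ARCn) background processes for one instance.
# Reads `ps -ef` style output on stdin: column 2 is the PID and the
# last column is the process name, e.g. ora_arc2_ssyy1 (assumed naming).
arc_pids() {
    awk -v sid="$1" '
        $NF ~ ("^ora_arc[0-9a-z]_" sid "$") { print $2 }
    '
}

# Example (review the list before killing anything):
#   ps -ef | arc_pids ssyy1
#   ps -ef | arc_pids ssyy1 | xargs kill -9
```

Printing the PIDs first and killing in a separate step makes it harder to hit a critical background process such as LMON by mistake.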

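A sqlplus connection confirmed that the SYS password had been changed on both the primary and the standby.

Check whether the newly generated password file is in use: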

-- Primary node

SQL> select * from  v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS

------------------------------ ----- ----- -----

SYS                            TRUE  TRUE  FALSE

-- Standby node

SQL> select * from  v$pwfile_users;

no rows selected

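Clearly, the password file is in use on the primary but not on the standby.

On closer inspection, the standby password file names did not follow the orapw<$ORACLE_SID> naming rule: the files kept the names used on the primary, but the standby instance names differ from the primary instance names.

Rename the standby password files: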

mv $ORACLE_HOME/dbs/ssyydb1 $ORACLE_HOME/dbs/orapwdrdb1 

mv $ORACLE_HOME/dbs/ssyydb2 $ORACLE_HOME/dbs/orapwdrdb2
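A quick way to catch this kind of naming mistake is to derive the expected file name from the instance's SID and test for it. The sketch below assumes the Unix default location $ORACLE_HOME/dbs and the orapw<ORACLE_SID> rule quoted above. The function names pwfile_path and check_pwfile are hypothetical.

```shell
# Print the password file path an instance is expected to use:
# <ORACLE_HOME>/dbs/orapw<ORACLE_SID> (the name is case sensitive).
pwfile_path() {
    printf '%s/dbs/orapw%s\n' "$1" "$2"
}

# Succeed only if that file exists and is non-empty.
check_pwfile() {
    f=$(pwfile_path "$1" "$2")
    if [ -s "$f" ]; then
        echo "OK: $f"
    else
        echo "MISSING: $f"
        return 1
    fi
}

# Example, run as the oracle user on each standby node:
#   check_pwfile "$ORACLE_HOME" "$ORACLE_SID"
```

This only checks the file name and location. Whether the instance has actually loaded the file still has to be confirmed with v$pwfile_users, as done earlier.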

After several minutes of observation, the ORA-01031 error was still not resolved.

We consulted MOS and followed the solution in ORA-1031 for Remote Archive Destination on Primary (Doc ID 733793.1):

1. Make sure parameter REMOTE_LOGIN_PASSWORDFILE is set to EXCLUSIVE or SHARED in both databases.  


2. Copy the password file again from primary : 


a. Defer the log_archive_dest_2 on primary: 

SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = DEFER; 


b. Copy/ftp the password file from primary to standby and rename it accordingly on the standby database. Creating the password file on standby with orapwd-utility is not supported for 11g anymore.

Make sure that name of password file on both primary and standby is : orapw<SID>. Name of the password file is case sensitive. If SID of database on standby is prod then name of the password file should be orapwprod, orapwPROD will not work. 


c. Enable the log_archive_dest_2 on primary: 

SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = ENABLE; 


d. Switch 2-3 log files on primary : 

SQL> ALTER SYSTEM SWITCH LOGFILE; 


e. Check the status of log_archive_dest_2 on primary. 

SQL> SELECT STATUS,ERROR FROM V$ARCHIVE_DEST WHERE DEST_ID =2; 

STATUS    ERROR 

--------- ----------------------------------------------------------------- 

VALID 

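We kept watching the primary alert logs. After another 3 to 5 minutes of ORA-01031 errors, both primary nodes were shipping archive logs to the standby nodes normally, and the standby instances were applying them normally. ORA-01031 and ORA-16191 did not reappear in the alert logs of primary node 1 or node 2.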
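Watching for the errors to stop can be partly automated. The following is a rough sketch that counts recent ORA-01031 and ORA-16191 lines in an alert log. The function name recent_dg_errors and the default window of 200 lines are assumptions made for illustration.

```shell
# Count lines mentioning ORA-01031 or ORA-16191 in the last N lines
# (default 200, an arbitrary choice) of the given alert log.
recent_dg_errors() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    tail -n "${2:-200}" "$1" | grep -c -E 'ORA-(01031|16191)' || true
}

# Example: run repeatedly after re-enabling the destination, e.g.
#   recent_dg_errors /path/to/alert_<SID>.log 100
```

A count that stays flat as the log grows suggests the errors have stopped. The V$ARCHIVE_DEST query from the MOS note remains the authoritative check.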

At this point the fault was fully resolved.

2.2 Crash Analysis

First, we checked the syslog on both nodes and found nothing abnormal, which rules out host-level causes.

Instance 1 alert log:

Fri Apr 05 15:58:52 2013

Archived Log entry 34220 added for thread 1 sequence 12072 ID 0x9441c6d1 dest 1:

Fri Apr 05 16:32:39 2013

Read from controlfile member /dev/oravg/rlv_cntl1 has found a corrupted block (blk# 4, cf seq# 0)

Hex dump of (file 0, block 4) in trace file /oracle/db/diag/rdbms/ssyy/ssyy1/trace/ssyy1_lmon_22418.trc

Corrupt block relative dba: 0x00000004 (file 0, block 4)

Bad check value found during control file block read

Data in bad block:

type: 21 format: 2 rdba: 0x00000004

last change scn: 0x0000.00000000 seq: 0x1 flg: 0x04

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0x00001501

check value in block header: 0x8f5d

computed block checksum: 0x2

Re-read from controlfile member /dev/oravg/rlv_cntl1 returned valid block 4

Hex dump of (file 0, block 4) in trace file /oracle/db/diag/rdbms/ssyy/ssyy1/trace/ssyy1_lmon_22418.trc

Errors in file /oracle/db/diag/rdbms/ssyy/ssyy1/trace/ssyy1_lmon_22418.trc:

ORA-00202: control file: /dev/oravg/rlv_cntl1

Errors in file /oracle/db/diag/rdbms/ssyy/ssyy1/trace/ssyy1_lmon_22418.trc  (incident=888259):

ORA-00227: corrupt block detected in control file: (block 4, # blocks 1)

ORA-00202: control file: /dev/oravg/rlv_cntl1

Incident details in: /oracle/db/diag/rdbms/ssyy/ssyy1/incident/incdir_888259/ssyy1_lmon_22418_i888259.trc

Fri Apr 05 16:33:24 2013

Errors in file /oracle/db/diag/rdbms/ssyy/ssyy1/trace/ssyy1_lmon_22418.trc:

ORA-00227: corrupt block detected in control file: (block 4, # blocks 1)

ORA-00202: control file: /dev/oravg/rlv_cntl1

LMON (ospid: 22418): terminating the instance due to error 227

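At 16:32:39, instance 1 hit an error while reading control file /dev/oravg/rlv_cntl1 and detected a corrupt block.

At 16:33:24, instance 1 crashed because it could not read the control file properly.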
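A timeline like the one above can be pulled out of a long alert log mechanically. The sketch below assumes timestamp lines in the format shown in this log ("Fri Apr 05 16:32:39 2013") with no trailing whitespace. The function name alert_timeline is hypothetical.

```shell
# Print "<timestamp> | <message>" for every line that starts with an
# ORA- error or reports an instance termination. The most recent
# timestamp line seen is attached to each match.
alert_timeline() {
    awk '
        /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] [0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9]$/ { ts = $0; next }
        /^ORA-[0-9]+/ || /terminating the instance/ { print ts " | " $0 }
    ' "$1"
}

# Example:
#   alert_timeline /path/to/alert_ssyy1.log
```

Lines such as "Errors in file ..." are skipped on purpose, so the output stays close to the error codes and the termination message.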

We checked all three control files and found no corrupt blocks.

ssyy1: dbv file=/dev/datavg02/rlv_cntl1 blocksize=16384

ssyy1: dbv file=/dev/datavg02/rlv_cntl2 blocksize=16384

ssyy1: dbv file=/dev/datavg02/rlv_cntl3 blocksize=16384
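The three dbv runs above can be wrapped in a small loop so that a failure on any file is reported through the exit status. This is only a sketch: the function name verify_files and the DBV override variable are hypothetical, and the block size of 16384 is simply the value used in the commands above.

```shell
# Run dbv against each file given as an argument and return non-zero
# if any run fails. DBV can be overridden (for example with "echo")
# to preview the commands without executing them.
verify_files() {
    rc=0
    for f in "$@"; do
        echo "== $f"
        "${DBV:-dbv}" file="$f" blocksize=16384 || rc=1
    done
    return "$rc"
}

# Example:
#   verify_files /dev/datavg02/rlv_cntl1 /dev/datavg02/rlv_cntl2 /dev/datavg02/rlv_cntl3
```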

     

The crsd.log on node 2 shows that at 16:35:23 CRS stopped instance 2 because the database had gone offline abnormally.

2013-04-05 16:32:42.179: [  CRSRES][6345673] Resource recovery not purged:ora.ssyy.ssyy2.inst

2013-04-05 16:32:42.205: [  CRSRES][6345673] ora.ssyy.ssyy2.inst target set to OFFLINE before stop action

2013-04-05 16:32:42.206: [  CRSRES][6345673] StopResource: setting CLI values

2013-04-05 16:32:42.252: [  CRSRES][6345673] Attempting to stop `ora.ssyy.ssyy2.inst` on member `ssyy2`

2013-04-05 16:33:40.826: [    CRSD][54] SM: rE2Ec: 4

2013-04-05 16:33:40.896: [  CRSRES][6345681] ora.ssyy.db target set to OFFLINE before stop action

2013-04-05 16:33:40.896: [  CRSRES][6345681] StopResource: setting CLI values

2013-04-05 16:33:42.288: [    CRSD][6345681] SM:dE2Ec: all E2E cmds done. 0

2013-04-05 16:35:23.123: [  CRSRES][6345695] Resource recovery not purged:ora.ssyy.db

2013-04-05 16:35:23.124: [  CRSRES][6345695] `ora.ssyy.db` is already OFFLINE.

2013-04-05 16:35:23.173: [  CRSRES][6345673] Stop of `ora.ssyy.ssyy2.inst` on member `ssyy2` succeeded.

     

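We initially suspected a bug and opened an SR. SSC staff and the SR backend experts jointly confirmed that the crash matches bug 11698676.

This bug is a duplicate of bug 9549042, and patch 9549042 is already available for the current HP-UX Itanium 64-bit platform.

2.3 Solution

Oracle's official recommendation is to apply patch 9549042 as soon as possible to prevent this crash from recurring.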

