A Simple Reinforcement Learning Implementation Example: A Link Prediction Model Based on Learning Automata

malakashi, published 2019-07-31 10:13 / 1448 views

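Abstract: Reinforcement learning (RL) is an area of machine learning concerned with how to act in an environment so as to maximize expected reward. Within reinforcement learning, learning automata are characterized in terms of Markov decision processes.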

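Reinforcement Learning

Reinforcement learning (RL) is an area of machine learning concerned with how an agent should act in an environment so as to maximize its expected reward. It is inspired by behaviorist theory in psychology: how an organism, stimulated by rewards or punishments from its environment, gradually forms expectations about those stimuli and develops habitual behavior that yields the greatest benefit. Because the approach is so general, it is studied in many other fields, including game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In operations research and control theory, reinforcement learning is called "approximate dynamic programming" (ADP). The problem is also studied in optimal control theory, although most of that work concerns the existence and properties of optimal solutions rather than learning or approximation. In economics and game theory, reinforcement learning is used to explain how equilibrium can arise under bounded rationality.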

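In machine learning, the environment is usually formulated as a Markov decision process (MDP), so many reinforcement learning algorithms use dynamic programming techniques in this setting. The main difference between classical techniques and reinforcement learning algorithms is that the latter do not require knowledge of the MDP, and they target large MDPs for which exact methods are infeasible.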

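Reinforcement learning differs from standard supervised learning in that it does not need correct input/output pairs to be presented, nor does it need suboptimal actions to be explicitly corrected. It focuses instead on online planning, which requires a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration-exploitation trade-off has been studied most thoroughly in the multi-armed bandit problem and in finite MDPs.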
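The exploration-exploitation trade-off can be illustrated with a small epsilon-greedy multi-armed bandit. This is only a sketch under stated assumptions: the arm reward probabilities, the value of `epsilon`, and the function name `run_epsilon_greedy` are hypothetical and do not come from the original article.

```python
import numpy as np

def run_epsilon_greedy(true_probs, epsilon=0.1, steps=5000, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    Returns the estimated mean reward of each arm and its pull count.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_probs)
    estimates = np.zeros(n_arms)            # running mean reward per arm
    counts = np.zeros(n_arms, dtype=int)    # number of pulls per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))     # explore: random arm
        else:
            arm = int(np.argmax(estimates))     # exploit: best arm so far
        reward = float(rng.random() < true_probs[arm])
        counts[arm] += 1
        # Incremental update of the sample mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Assumed reward probabilities for three arms
estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
print(estimates, counts)
```

With a small `epsilon` the policy mostly exploits its current estimates, while the occasional random pull keeps the estimates of the other arms from going stale.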

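Learning Automata

A learning automaton is an adaptive decision-making unit situated in a random environment that learns the optimal action through repeated interaction with that environment. Actions are chosen according to a specific probability distribution, and the system updates that distribution based on the environment's response to the action taken.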
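The loop described above (sample an action from a probability distribution, observe the environment's response, update the distribution) can be sketched with the classical linear reward-inaction (L_R-I) scheme. The environment's reward probabilities, the learning rate `a`, and the function name `run_lri_automaton` are assumptions chosen for illustration.

```python
import numpy as np

def run_lri_automaton(reward_probs, a=0.05, steps=2000, seed=0):
    """Linear reward-inaction automaton in a stationary random environment."""
    rng = np.random.default_rng(seed)
    r = len(reward_probs)
    p = np.full(r, 1.0 / r)     # start from a uniform action distribution

    for _ in range(steps):
        action = rng.choice(r, p=p)     # sample an action from p
        rewarded = rng.random() < reward_probs[action]
        if rewarded:
            # Shrink all probabilities, then give the freed mass to the
            # rewarded action: p_i <- p_i + a * (1 - p_i)
            p = (1.0 - a) * p
            p[action] += a
        # On a penalty, L_R-I leaves the distribution unchanged
    return p

# Assumed environment: action 0 is rewarded 30% of the time, action 1 80%
p = run_lri_automaton([0.3, 0.8])
print(p)
```

The update keeps `p` a valid probability distribution at every step, because the mass removed from all actions, `a`, is exactly the mass added to the rewarded one.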

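In reinforcement learning, learning automata are described within the Markov decision process setting and act as policy iterators: unlike most other reinforcement learning algorithms, a policy iterator manipulates the policy π directly. Evolutionary algorithms are another example of policy iterators.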

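Link Prediction

Link prediction in networks is the problem of using known information, such as the network's nodes and structure, to estimate the likelihood of a link between two nodes that are not yet connected. It covers both the prediction of unknown links (links that exist but have not been observed) and the prediction of future links. Research on this problem is significant and valuable in both theory and applications.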
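As a minimal illustration of the task, the sketch below scores every unconnected node pair of a toy graph by its number of common neighbors and ranks the pairs. The edge list is hypothetical and only serves to make the idea concrete.

```python
import numpy as np

# Hypothetical undirected toy graph with 5 nodes
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
A = np.zeros((5, 5))
for u, v in edges:
    A[u, v] = A[v, u] = 1

# Entry (x, y) of A @ A counts the neighbors shared by x and y
scores = A @ A

# Rank only the pairs that are not connected yet (x < y avoids duplicates)
candidates = [(x, y, scores[x, y])
              for x in range(5) for y in range(x + 1, 5)
              if A[x, y] == 0]
candidates.sort(key=lambda t: t[2], reverse=True)
print(candidates)
```

The pair at the top of the list is the one the common-neighbors index considers most likely to be linked.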

Implementing the Link Prediction Model Based on Learning Automata

import numpy as np
import time
from random import choice
import pandas as pd
import os

# Define the baseline similarity indices, starting with the common neighbors (CN) index

"""
def Cn(MatrixAdjacency):
    # Entry (x, y) counts the neighbors shared by nodes x and y
    Matrix_similarity = np.dot(MatrixAdjacency, MatrixAdjacency)
    return Matrix_similarity
"""

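# Compute the Jaccard similarity index

def Jaccavrd(MatrixAdjacency_Train):
    # Common-neighbor counts: entry (x, y) is |N(x) ∩ N(y)|
    Matrix_similarity = np.dot(MatrixAdjacency_Train, MatrixAdjacency_Train)

    # Node degrees as a column vector
    deg_row = np.sum(MatrixAdjacency_Train, axis=0)
    deg_row.shape = (deg_row.shape[0], 1)
    deg_row_T = deg_row.T
    # |N(x) ∪ N(y)| = k_x + k_y - |N(x) ∩ N(y)|
    tempdeg = deg_row + deg_row_T
    temp = tempdeg - Matrix_similarity

    # Pairs with an empty union give 0/0; map the resulting NaN to 0
    with np.errstate(divide="ignore", invalid="ignore"):
        Matrix_similarity = np.nan_to_num(Matrix_similarity / temp)
    return Matrix_similarity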
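The matrix form of the Jaccard index used above can be cross-checked against its set definition, |N(x) ∩ N(y)| / |N(x) ∪ N(y)|. The sketch below does this on a small hypothetical graph; `jaccard_matrix` is an illustrative name, not a function from the article.

```python
import numpy as np

def jaccard_matrix(A):
    """Jaccard index for every node pair of an undirected 0/1 adjacency matrix."""
    common = A @ A                                    # |N(x) ∩ N(y)|
    deg = A.sum(axis=0)
    union = deg[:, None] + deg[None, :] - common      # |N(x) ∪ N(y)|
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.nan_to_num(common / union)          # 0/0 becomes 0

# Hypothetical 4-node graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
S = jaccard_matrix(A)

# Set-based definition for the pair (0, 1)
neighbors = [set(np.flatnonzero(row)) for row in A]
x, y = 0, 1
expected = len(neighbors[x] & neighbors[y]) / len(neighbors[x] | neighbors[y])
print(S[x, y], expected)
```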

# Define the Salton index

def Salton_Cal(MatrixAdjacency_Train):
    # Common-neighbor counts
    similarity = np.dot(MatrixAdjacency_Train, MatrixAdjacency_Train)

    # Node degrees as a column vector
    deg_row = np.sum(MatrixAdjacency_Train, axis=0)
    deg_row.shape = (deg_row.shape[0], 1)
    deg_row_T = deg_row.T
    # sqrt(k_x * k_y) for every node pair
    tempdeg = np.dot(deg_row, deg_row_T)
    temp = np.sqrt(tempdeg)

    # Isolated nodes give 0/0; suppress the warning and map NaN to 0
    with np.errstate(divide="ignore", invalid="ignore"):
        Matrix_similarity = np.nan_to_num(similarity / temp)

    return Matrix_similarity

# Define the Katz index

def Katz_Cal(MatrixAdjacency):
    # Value of the attenuation factor α
    Parameter = 0.01
    Matrix_EYE = np.eye(MatrixAdjacency.shape[0])
    Temp = Matrix_EYE - MatrixAdjacency * Parameter

    # Closed form of the Katz series: (I - αA)^-1 - I
    Matrix_similarity = np.linalg.inv(Temp)

    Matrix_similarity = Matrix_similarity - Matrix_EYE
    return Matrix_similarity
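The closed form (I - αA)^-1 - I equals the Katz series αA + α²A² + ... only when α is smaller than the reciprocal of the largest eigenvalue of A. The sketch below adds that check and compares the closed form with a truncated series on a hypothetical path graph; `katz_similarity` is an illustrative name.

```python
import numpy as np

def katz_similarity(A, alpha):
    """Closed-form Katz index; requires alpha < 1 / lambda_max(A)."""
    lambda_max = np.max(np.abs(np.linalg.eigvals(A)))
    if alpha * lambda_max >= 1:
        raise ValueError("alpha must be smaller than 1 / lambda_max")
    identity = np.eye(A.shape[0])
    return np.linalg.inv(identity - alpha * A) - identity

# Hypothetical 4-node path graph 0-1-2-3
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1

alpha = 0.01
S = katz_similarity(A, alpha)

# Truncated power series: sum of alpha^l * A^l for l = 1..10
series = sum(alpha ** l * np.linalg.matrix_power(A, l) for l in range(1, 11))
print(np.allclose(S, series))
```

For a 50 x 50 0/1 adjacency matrix the largest eigenvalue cannot exceed 49, so a value such as α = 0.01 stays within the convergence region.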

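# Define the local path (LP) similarity index

"""
def LP_Cal(MatrixAdjacency):
    # Paths of length 2
    Matrix_similarity = np.dot(MatrixAdjacency, MatrixAdjacency)

    # Paths of length 3, weighted by the parameter ε
    Parameter = 0.05
    Matrix_LP = np.dot(np.dot(MatrixAdjacency, MatrixAdjacency), MatrixAdjacency) * Parameter

    # LP index: A^2 + ε A^3 (a sum of the two terms, not a matrix product)
    Matrix_similarity = Matrix_similarity + Matrix_LP
    return Matrix_similarity
"""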

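# Compute the resource allocation (RA) similarity index

def RA(MatrixAdjacency_Train):
    # Node degrees as a column vector
    RA_Train = np.sum(MatrixAdjacency_Train, axis=0)
    RA_Train.shape = (RA_Train.shape[0], 1)
    # Scale row z by 1 / k_z; isolated nodes give 0/0, which is mapped to 0
    with np.errstate(divide="ignore", invalid="ignore"):
        MatrixAdjacency_Train_Log = MatrixAdjacency_Train / RA_Train
    MatrixAdjacency_Train_Log = np.nan_to_num(MatrixAdjacency_Train_Log)

    # Entry (x, y) sums 1 / k_z over the common neighbors z of x and y
    Matrix_similarity = np.dot(MatrixAdjacency_Train, MatrixAdjacency_Train_Log)
    return Matrix_similarity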
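A small worked example may help to see what the RA index rewards: each common neighbor z contributes 1 / k_z, so low-degree common neighbors count more than hubs. The graph and the name `resource_allocation` below are hypothetical.

```python
import numpy as np

def resource_allocation(A):
    """RA index: sum of 1 / k_z over the common neighbors z of each pair."""
    deg = A.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        weighted = np.nan_to_num(A / deg[:, None])   # row z scaled by 1 / k_z
    return A @ weighted

# Hypothetical graph: nodes 0 and 1 share neighbor 2 (degree 2)
# and neighbor 3 (degree 4, because it also links to 4 and 5)
A = np.zeros((6, 6))
for u, v in [(0, 2), (1, 2), (0, 3), (1, 3), (3, 4), (3, 5)]:
    A[u, v] = A[v, u] = 1

R = resource_allocation(A)
# Pair (0, 1): 1/2 from node 2 plus 1/4 from node 3
print(R[0, 1])
```

Both common neighbors would count equally under the plain common-neighbors index, whereas RA gives the degree-2 node twice the weight of the degree-4 node.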

# Random environment 1: for active node pairs

def RandomEnviromentForActive(MatrixAdjacency, i, j):
    # Pick one of the four similarity indices uniformly at random (1 to 4)
    Index = np.random.randint(1, 5)
    print(Index)
    global IndexName
    if Index == 1:
        IndexName = "Similarity index: Jaccard Index"
        print(IndexName)
        similarity_matrix = Jaccavrd(MatrixAdjacency)
        similarity = similarity_matrix[i, j]
    elif Index == 2:
        IndexName = "Similarity index: Salton Index"
        print(IndexName)
        similarity_matrix = Salton_Cal(MatrixAdjacency)
        similarity = similarity_matrix[i, j]
    elif Index == 3:
        IndexName = "Similarity index: Katz Index"
        print(IndexName)
        similarity_matrix = Katz_Cal(MatrixAdjacency)
        similarity = similarity_matrix[i, j]
    else:  # Index == 4
        IndexName = "Similarity index: RA Index"
        print(IndexName)
        similarity_matrix = RA(MatrixAdjacency)
        similarity = similarity_matrix[i, j]
    return similarity

# Random environment 2: mainly for inactive node pairs

def RandomEnviromentForNonActive(MatrixAdjacency, i, j):
    # NOTE: ID3_Cal, Cart_Cal, and C4_Cal are not defined in this article;
    # they must be supplied before this function can be called.
    Action = np.random.randint(1, 4)
    if Action == 1:
        ActionName = "ID3"
        similarity_matrix = ID3_Cal(MatrixAdjacency)
    elif Action == 2:
        ActionName = "CART"
        similarity_matrix = Cart_Cal(MatrixAdjacency)
    elif Action == 3:
        ActionName = "C4.5"
        similarity_matrix = C4_Cal(MatrixAdjacency)
    similarity = similarity_matrix[i, j]
    return similarity

# Build the learning automaton (construct the agent)

def ContructionAgent(filepath, n1, n2):
    # Load the 50 x 50 adjacency matrix, one space-separated row per line
    with open(filepath) as f:
        lines = f.readlines()
    A = np.zeros((50, 50), dtype=float)
    for A_row, line in enumerate(lines[:50]):
        row = line.strip("\n").split(" ")
        A[A_row, :] = [float(x) for x in row[0:50]]

    # Initialize the reward rate a, the penalty rate b, and p1 and p2
    a = 0.05
    b = 0.01
    p1 = 0.5
    p2 = 0.5
    Action = 1
    # The number 1 stands for the action "Yes" and 2 for the action "No"
    for i in range(1):
        if p1 >= p2:
            Action = 1
        else:
            Action = 2
        print("Selected action: " + str(Action))
        # Similarity threshold
        threshhold_value = 0.3
        similarity = RandomEnviromentForActive(A, n1, n2)
        # p1 is the probability of choosing action 1 ("Yes"),
        # p2 the probability of choosing action 2 ("No").
        # The updates follow the two-action linear reward-penalty scheme.
        # The previous action was "Yes" and it was rewarded
        if (similarity > threshhold_value) and (Action == 1):
            p1 = p1 + a * (1 - p1)
            p2 = 1 - p1
        # The previous action was "No" and it was rewarded
        elif (similarity < threshhold_value) and (Action == 2):
            p1 = (1 - a) * p1
            p2 = 1 - p1
        # The previous action was "Yes" but it was penalized
        elif (similarity < threshhold_value) and (Action == 1):
            p1 = (1 - b) * p1
            p2 = 1 - p1
        # The previous action was "No" but it was penalized
        elif (similarity > threshhold_value) and (Action == 2):
            p1 = b + (1 - b) * p1
            p2 = 1 - p1

    if p1 >= p2:
        print("Action selected at the next step: Yes")
    else:
        print("Action selected at the next step: No")
    return p1, p2
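The reward rate `a` and penalty rate `b` in the agent suggest a linear reward-penalty (L_R-P) automaton with two actions. As a sketch of that textbook scheme, and not necessarily the author's exact intent, the four cases can be collapsed into one update of p1, with p2 always equal to 1 - p1. The function name `lrp_update` is hypothetical.

```python
def lrp_update(p1, action, rewarded, a=0.05, b=0.01):
    """One linear reward-penalty step for a two-action automaton.

    p1 is the probability of action 1 ("Yes"); action is 1 or 2.
    Returns the updated pair (p1, p2).
    """
    if action == 1:
        if rewarded:
            p1 = p1 + a * (1 - p1)      # reward "Yes": raise p1
        else:
            p1 = (1 - b) * p1           # penalize "Yes": lower p1
    else:
        if rewarded:
            p1 = (1 - a) * p1           # reward "No": raise p2 = 1 - p1
        else:
            p1 = b + (1 - b) * p1       # penalize "No": lower p2
    return p1, 1 - p1

# Starting from p1 = p2 = 0.5, reward the action "Yes" once
print(lrp_update(0.5, action=1, rewarded=True))
```

Because `b` is much smaller than `a`, a penalty moves the probabilities far less than a reward does, which makes the automaton slow to abandon an action that has been rewarded before.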



import os

import pandas as pd
import numpy as np

path = r"../Data/itcmatrixs/36000/"
result = np.zeros((50, 50))
for i in os.walk(path):
    # i is a (dirpath, dirnames, filenames) tuple; i[2] lists the data files
    for m in range(50):
        for n in range(50):
            r = None
            for j in range(26):
                datapath = path + i[2][j]
                p1, p2 = ContructionAgent(datapath, m, n)
                # 1 if "Yes" is at least as likely as "No", otherwise 0
                r = int(p1 >= p2)
            result[m, n] = r

# Save the 50 x 50 prediction matrix
np.save("result.npy", result)

Notes

If you need the source code and dataset, please send a message to [email protected]. Thank you for your interest.


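Copyright of this article belongs to the author; please do not repost it without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reposting, please cite this article's address: http://systransis.cn/yun/43450.html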
