(1) Linear Recurrent Neural Network (RNN)

Posted by zilu on 2019-07-30 15:18

Summary: In this part of the linear recurrent neural network tutorial, we design a simple model whose input is a binary stream and whose task is to count how many 1s the stream contains.

Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/160...

This tutorial is a translation of Peter Roelants' tutorial on recurrent neural networks. It is published with the author's permission (see the original article).

This tutorial shows how to implement a recurrent neural network (RNN). It has two parts, and the full series is available at the following links:

(1) Linear recurrent neural network (RNN)

(2) Non-linear recurrent neural network (RNN)

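The code in this tutorial was produced in a Python 2 IPython Notebook, and a link to the full code is given at the end of the tutorial. The network's matrix operations are implemented with NumPy, and the plots are drawn with Matplotlib. If you have not installed these packages yet, I strongly recommend Anaconda Python. It includes every package needed to run this tutorial and is very convenient to use.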

Recurrent neural networks

This tutorial has three main parts:

A very simple recurrent neural network (RNN)

Backpropagation through time (BPTT)

The resilient optimization algorithm (Rprop)

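A recurrent neural network is a model that can process sequential data. For a time series, the recurrence relation can be written as:

$$S_k = f\left(S_{k-1} \cdot W_{rec} + X_k \cdot W_x\right)$$

Here S_k is the state at time step k and X_k is the input at time step k. W_rec and W_x are the network's connection weights, and f is an activation function, which is the identity in the linear model of this part. Put simply, an RNN can be viewed as a state model with a feedback loop. The recurrence relation feeds the state back with a delay of one time step, so the state evolves over time. This delayed feedback gives the model memory, because the state carries information from previous time steps.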

The network's final output Y_k at time step k is typically computed from one or more of the states S_{k-i}, ..., S_{k+j}.

Given the input X_k and the previous state S_{k-1}, we can therefore compute the current state S_k. Applying the same rule with X_{k+1} and S_k gives the next state S_{k+1}.

As this tutorial will show, an RNN is not very different from an ordinary feedforward neural network, although it is trained somewhat differently.

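Linear recurrent neural network

In this part we design a simple RNN whose input is a binary stream. Its task is to count how many 1s the stream contains.

In this tutorial, the state of our RNN is one-dimensional and so is the input at each time step. The output is the final state of the sequence, y = S_n, where n is the number of time steps. Unrolling the RNN over time gives the model shown in the figure below. Note that the unrolled model can be viewed as an (n+1)-layer neural network in which every layer shares the same weights W_rec and W_x.

Implementing and training this model is an interesting exercise, but it is easy to see that the optimal solution is W_rec = W_x = 1. Each step then simply adds the current input to the running total, so the final state equals the number of 1s.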
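Unrolling the linear recurrence (f is the identity, and S_0 = 0) makes this explicit. The short derivation below uses the notation defined above:

$$S_n = S_{n-1} W_{rec} + X_n W_x = \left(S_{n-2} W_{rec} + X_{n-1} W_x\right) W_{rec} + X_n W_x = \cdots = W_x \sum_{k=1}^{n} X_k\, W_{rec}^{\,n-k}$$

With W_x = W_rec = 1, every factor equals 1, so S_n is the sum of the X_k, which is exactly the number of 1s in the sequence. With any other choice, the inputs are either scaled or weighted unequally by their position.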

First, import the packages used in this tutorial:

import numpy as np 
import matplotlib
import matplotlib.pyplot as plt 
from matplotlib import cm
from matplotlib.colors import LogNorm

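Define the dataset

The input dataset X consists of 20 sequences (nb_of_samples), each of length 10 (sequence_len), so every sequence has 10 time steps. Each input value is 0 or 1 with equal probability: the code rounds samples drawn uniformly from [0, 1).

The target for each sequence is the number of 1s it contains, which is the sum of all its elements.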

# Create dataset
nb_of_samples = 20
sequence_len = 10
# Create the sequences
X = np.zeros((nb_of_samples, sequence_len))
for row_idx in range(nb_of_samples):
    X[row_idx,:] = np.around(np.random.rand(sequence_len)).astype(int)
# Create the targets for each sequence
t = np.sum(X, axis=1)

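Training with backpropagation through time (BPTT)

The standard algorithm for training RNNs is backpropagation through time (BPTT). As the name suggests, it is based on backpropagation.

If you understand regular backpropagation, BPTT is not very different. The only difference is that the RNN must first be unrolled over all time steps, as described at the beginning of this tutorial. Once unrolled, the model closely resembles a regular neural network, with two differences. First, each layer has multiple inputs: the previous state and the current input. Second, the weights W_rec and W_x are shared by every layer.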

Forward pass: computing the RNN output

In the forward pass, we unroll the RNN so that it can be processed like a regular neural network. The final output of the RNN is then used to compute the cost function that trains the network. All of this works exactly as in a regular multilayer network.

After unrolling, the recurrence relation is the same at every time step. We implement it in the update_state function.

The forward_states function applies update_state to each time step in a for loop. The time steps must be computed one after another. Within each step, however, the computation is vectorized over all sequences in the batch, so the samples are processed in parallel. Before the loop starts the state must be initialized, and here the initial state S_0 is set to 0.

Finally, the cost ξ is the mean squared error (MSE) between the targets t and the outputs y, averaged over all m samples: ξ = (1/m) Σ_i (t_i - y_i)^2. It is implemented in the cost function.

# Define the forward step functions
def update_state(xk, sk, wx, wRec):
    """
    Compute state k from the previous state (sk) and current input (xk),
    by use of the input weights (wx) and recursive weights (wRec).
    """
    return xk * wx + sk * wRec

def forward_states(X, wx, wRec):
    """
    Unfold the network and compute all state activations given the input X,
    and input weights (wx) and recursive weights (wRec).
    Return the state activations in a matrix, the last column S[:,-1] contains the
    final activations.
    """
    # Initialise the matrix that holds all states for all input sequences.
    # The initial state s0 is set to 0.
    S = np.zeros((X.shape[0], X.shape[1]+1))
    # Use the recurrence relation defined by update_state to update the
    #  states through time.
    for k in range(0, X.shape[1]):
        # S[k] = S[k-1] * wRec + X[k] * wx
        S[:,k+1] = update_state(X[:,k], S[:,k], wx, wRec)
    return S

def cost(y, t): 
    """
    Return the MSE between the targets t and the outputs y.
    """
    return ((t - y)**2).sum() / nb_of_samples
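Because the model is linear, the loop in forward_states can be cross-checked against a closed form. With S_0 = 0, the final state of each sequence is W_x times the dot product of its inputs with the powers W_rec^(n-1), ..., W_rec^0. The whole batch therefore reduces to one matrix-vector product. The sketch below copies forward_states so that it runs on its own. The helper forward_final_closed_form is hypothetical, and the random data only stands in for the tutorial's dataset.

```python
import numpy as np

def forward_states(X, wx, wRec):
    """Copy of the tutorial's loop: S[:, k+1] = S[:, k] * wRec + X[:, k] * wx, with S[:, 0] = 0."""
    S = np.zeros((X.shape[0], X.shape[1] + 1))
    for k in range(X.shape[1]):
        S[:, k + 1] = X[:, k] * wx + S[:, k] * wRec
    return S

def forward_final_closed_form(X, wx, wRec):
    """Hypothetical helper: final states of all sequences at once (linear model only)."""
    n = X.shape[1]
    powers = wRec ** np.arange(n - 1, -1, -1)  # W_rec^(n-k) for input positions k = 1..n
    return wx * (X @ powers)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 10)).astype(float)  # 20 random binary sequences of length 10
loop_final = forward_states(X, 1.2, 0.9)[:, -1]
closed_final = forward_final_closed_form(X, 1.2, 0.9)
print(np.allclose(loop_final, closed_final))  # True
```

This shortcut exists only because the recurrence is linear. For the non-linear RNN of part (2), the step-by-step loop is still needed.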

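Computing gradients with backpropagation

Before backpropagating, we first compute the gradient of the cost with respect to the output, ∂ξ/∂y, in the output_gradient function. The backward_gradient function then propagates this gradient backward through the network, layer by layer. The derivation is as follows.

The gradient at the output, and its propagation from one layer to the previous one, are (m is the number of samples):

$$\frac{\partial \xi}{\partial y} = \frac{\partial \xi}{\partial S_n} = \frac{2}{m}\left(y - t\right), \qquad \frac{\partial \xi}{\partial S_{k-1}} = \frac{\partial \xi}{\partial S_k}\frac{\partial S_k}{\partial S_{k-1}} = \frac{\partial \xi}{\partial S_k} W_{rec}$$

Here n is the number of time steps over which the RNN is unrolled. Note that it is the parameter W_rec that carries the error backward through time.

The gradients of the cost with respect to the weights are obtained by summing the contributions of all layers (and, in the code, of all samples):

$$\frac{\partial \xi}{\partial W_x} = \sum_{k=1}^{n} \frac{\partial \xi}{\partial S_k} X_k, \qquad \frac{\partial \xi}{\partial W_{rec}} = \sum_{k=1}^{n} \frac{\partial \xi}{\partial S_k} S_{k-1}$$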

def output_gradient(y, t):
    """
    Compute the gradient of the MSE cost function with respect to the output y.
    """
    return 2.0 * (y - t) / nb_of_samples

def backward_gradient(X, S, grad_out, wRec):
    """
    Backpropagate the gradient computed at the output (grad_out) through the network.
    Accumulate the parameter gradients for wx and wRec over all layers by addition.
    Return the parameter gradients as a tuple, and the gradients at the output of each layer.
    """
    # Initialise the array that stores the gradients of the cost with respect to the states.
    grad_over_time = np.zeros((X.shape[0], X.shape[1]+1))
    grad_over_time[:,-1] = grad_out
    # Set the gradient accumulations to 0
    wx_grad = 0
    wRec_grad = 0
    for k in range(X.shape[1], 0, -1):
        # Compute the parameter gradients and accumulate the results.
        wx_grad += np.sum(grad_over_time[:,k] * X[:,k-1])
        wRec_grad += np.sum(grad_over_time[:,k] * S[:,k-1])
        # Compute the gradient at the output of the previous layer
        grad_over_time[:,k-1] = grad_over_time[:,k] * wRec
    return (wx_grad, wRec_grad), grad_over_time
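For this linear model, the BPTT sums above can be checked against the derivatives of the unrolled closed form S_n = W_x Σ_k X_k W_rec^(n-k). Those derivatives are ∂S_n/∂W_x = Σ_k X_k W_rec^(n-k) and ∂S_n/∂W_rec = W_x Σ_k (n-k) X_k W_rec^(n-k-1). The sketch below runs this check on a single sequence. It assumes ∂ξ/∂S_n = 1, so only the propagation through time is tested; the tutorial's backward_gradient also sums over samples. The names bptt_grads and closed_form_grads are hypothetical.

```python
import numpy as np

def bptt_grads(x, wx, wrec):
    """Single-sequence BPTT with dxi/dS_n set to 1; returns (dxi/dW_x, dxi/dW_rec)."""
    n = len(x)
    s = np.zeros(n + 1)                    # s[0] = S_0 = 0
    for k in range(n):
        s[k + 1] = s[k] * wrec + x[k] * wx
    grad_s = 1.0                           # dxi/dS_n (assumed 1)
    gx = grec = 0.0
    for k in range(n, 0, -1):
        gx += grad_s * x[k - 1]            # layer k contributes dxi/dS_k * X_k
        grec += grad_s * s[k - 1]          # layer k contributes dxi/dS_k * S_{k-1}
        grad_s *= wrec                     # dxi/dS_{k-1} = dxi/dS_k * W_rec
    return gx, grec

def closed_form_grads(x, wx, wrec):
    """Derivatives of S_n = wx * sum_k x_k * wrec**(n-k) with respect to wx and wrec."""
    x = np.asarray(x, dtype=float)
    e = np.arange(len(x) - 1, -1, -1)      # exponents n-k for k = 1..n
    gx = np.sum(x * wrec ** e)
    # d/dwrec of wrec**e is e * wrec**(e-1); the e = 0 term vanishes, so clamp its exponent.
    grec = wx * np.sum(e * x * wrec ** np.maximum(e - 1, 0))
    return gx, grec

x = [1, 0, 1, 1, 0, 1]
for wx, wrec in [(1.0, 1.0), (1.2, 0.7), (0.5, -1.5)]:
    print(np.allclose(bptt_grads(x, wx, wrec), closed_form_grads(x, wx, wrec)))  # True
```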

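Gradient checking

As with regular multilayer networks, the RNN's backpropagated gradients should be verified by gradient checking. If backpropagation is implemented correctly, each backpropagated gradient should be (nearly) equal to the numerical gradient given by the central difference (ξ(w + ε) - ξ(w - ε)) / (2ε).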

# Perform gradient checking
# Set the weight parameters used during gradient checking
params = [1.2, 1.2]  # [wx, wRec]
# Set the small change to compute the numerical gradient
eps = 1e-7
# Compute the backprop gradients
S = forward_states(X, params[0], params[1])
grad_out = output_gradient(S[:,-1], t)
backprop_grads, grad_over_time = backward_gradient(X, S, grad_out, params[1])
# Compute the numerical gradient for each parameter in the layer
for p_idx, _ in enumerate(params):
    grad_backprop = backprop_grads[p_idx]
    # + eps
    params[p_idx] += eps
    plus_cost = cost(forward_states(X, params[0], params[1])[:,-1], t)
    # - eps
    params[p_idx] -= 2 * eps
    min_cost = cost(forward_states(X, params[0], params[1])[:,-1], t)
    # reset param value
    params[p_idx] += eps
    # calculate numerical gradient
    grad_num = (plus_cost - min_cost) / (2*eps)
    # Raise error if the numerical grade is not close to the backprop gradient
    if not np.isclose(grad_num, grad_backprop):
        raise ValueError("Numerical gradient of {:.6f} is not close to the backpropagation gradient of {:.6f}!".format(float(grad_num), float(grad_backprop)))
print("No gradient errors found")

No gradient errors found

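Parameter updates

RNNs are notoriously hard to train because their gradients can be very unstable. As a result, common gradient-based optimization methods such as gradient descent can struggle to find a good local minimum.

The two figures below illustrate this instability. The first figure shows the cost surface as a function of w_x and w_rec, and the colored markers are the parameter settings used in our experiments. The cost is lowest, close to 0, around w_x = w_rec = 1. Once |w_rec| > 1, however, it increases very rapidly.

The second figure shows how the gradient changes as it is propagated back through time for these parameter settings. The gradient behaves like a geometric sequence. The derivative of state S_k with respect to the state m steps earlier, S_{k-m}, is

$$\frac{\partial S_k}{\partial S_{k-m}} = \frac{\partial S_k}{\partial S_{k-1}} \cdots \frac{\partial S_{k-m+1}}{\partial S_{k-m}} = W_{rec}^{\,m}$$

and therefore ∂ξ/∂S_{k-m} = ∂ξ/∂S_k · W_rec^m.

In our simple linear model, the gradient therefore grows exponentially (explodes) if |w_rec| > 1 and shrinks exponentially (vanishes) if |w_rec| < 1.

Exploding gradients: in the second figure, the gradient grows exponentially for w_x = 1, w_rec = 2. For w_x = 1, w_rec = -2 it also grows exponentially, but alternates in sign because w_rec is negative. This explosion shows that training is extremely sensitive to w_rec.

Vanishing gradients: for w_x = 1, w_rec = 0.5 and w_x = 1, w_rec = -0.5, the gradient decays exponentially until it effectively vanishes. Vanishing gradients mean the model cannot learn over long time spans, because the gradient from distant time steps disappears.

If w_rec = 0, the gradient becomes 0 immediately. If w_rec = 1, the gradient stays constant over time.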
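The factor W_rec^m can be tabulated directly to reproduce the trends described above. This is a minimal sketch. It assumes the output gradient ∂ξ/∂S_n equals 1 and uses the w_rec values of the annotated points. The function backprop_gradient is a hypothetical helper and is not part of the tutorial code.

```python
def backprop_gradient(w_rec, m, grad_out=1.0):
    """Gradient at S_{n-m} in the linear RNN: grad_out * w_rec**m."""
    return grad_out * w_rec ** m

for w_rec in [2.0, -2.0, 1.0, 0.5, -0.5, 0.0]:
    # Gradient after m = 1..10 backward steps
    trace = [backprop_gradient(w_rec, m) for m in range(1, 11)]
    print(f"w_rec={w_rec:+.1f}: " + " ".join(f"{g:+.3g}" for g in trace))
```

For w_rec = -2 the printed values grow while alternating in sign, for ±0.5 they shrink toward 0, for 0 they are 0 from the first step, and for 1 they stay at 1.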

In the next section we show how to optimize such an unstable cost function.

# Define plotting functions

# Define points to annotate (wx, wRec, color)
points = [(2,1,"r"), (1,2,"b"), (1,-2,"g"), (1,0,"c"), (1,0.5,"m"), (1,-0.5,"y")]

def get_cost_surface(w1_low, w1_high, w2_low, w2_high, nb_of_ws, cost_func):
    """Define a vector of weights for which we want to plot the cost."""
    w1 = np.linspace(w1_low, w1_high, num=nb_of_ws)  # Weight 1
    w2 = np.linspace(w2_low, w2_high, num=nb_of_ws)  # Weight 2
    ws1, ws2 = np.meshgrid(w1, w2)  # Generate grid
    cost_ws = np.zeros((nb_of_ws, nb_of_ws))  # Initialize cost matrix
    # Fill the cost matrix for each combination of weights
    for i in range(nb_of_ws):
        for j in range(nb_of_ws):
            cost_ws[i,j] = cost_func(ws1[i,j], ws2[i,j])
    return ws1, ws2, cost_ws

def plot_surface(ax, ws1, ws2, cost_ws):
    """Plot the cost in function of the weights."""
    surf = ax.contourf(ws1, ws2, cost_ws, levels=np.logspace(-0.2, 8, 30), cmap=cm.pink, norm=LogNorm())
    ax.set_xlabel("$w_{in}$", fontsize=15)
    ax.set_ylabel("$w_{rec}$", fontsize=15)
    return surf

def plot_points(ax, points):
    """Plot the annotation points on the given axis."""
    for wx, wRec, c in points:
        ax.plot(wx, wRec, c+"o", linewidth=2)

def get_cost_surface_figure(cost_func, points):
    """Plot the cost surfaces together with the annotated points."""
    # Plot figures
    fig = plt.figure(figsize=(10, 4))   
    # Plot overview of cost function
    ax_1 = fig.add_subplot(1,2,1)
    ws1_1, ws2_1, cost_ws_1 = get_cost_surface(-3, 3, -3, 3, 100, cost_func)
    surf_1 = plot_surface(ax_1, ws1_1, ws2_1, cost_ws_1 + 1)
    plot_points(ax_1, points)
    ax_1.set_xlim(-3, 3)
    ax_1.set_ylim(-3, 3)
    # Plot zoom of cost function
    ax_2 = fig.add_subplot(1,2,2)
    ws1_2, ws2_2, cost_ws_2 = get_cost_surface(0, 2, 0, 2, 100, cost_func)
    surf_2 = plot_surface(ax_2, ws1_2, ws2_2, cost_ws_2 + 1)
    plot_points(ax_2, points)
    ax_2.set_xlim(0, 2)
    ax_2.set_ylim(0, 2)
    # Show the colorbar
    fig.subplots_adjust(right=0.8)
    cax = fig.add_axes([0.85, 0.12, 0.03, 0.78])
    cbar = fig.colorbar(surf_1, ticks=np.logspace(0, 8, 9), cax=cax)
    cbar.ax.set_ylabel(r"$\xi$", fontsize=15, rotation=0, labelpad=20)
    cbar.set_ticklabels(["{:.0e}".format(i) for i in np.logspace(0, 8, 9)])
    fig.suptitle("Cost surface", fontsize=15)
    return fig

def plot_gradient_over_time(points, get_grad_over_time):
    """Plot the gradients of the annotated points and how they evolve over time."""
    fig = plt.figure(figsize=(6.5, 4))  
    ax = plt.subplot(111)
    # Plot points
    for wx, wRec, c in points:
        grad_over_time = get_grad_over_time(wx, wRec)
        x = np.arange(-grad_over_time.shape[1]+1, 1, 1)
        plt.plot(x, np.sum(grad_over_time, axis=0), c+"-", label="({0}, {1})".format(wx, wRec), linewidth=1, markersize=8)
    plt.xlim(0, -grad_over_time.shape[1]+1)
    # Set up plot axis
    plt.xticks(x)
    plt.yscale("symlog")
    plt.yticks([10**8, 10**6, 10**4, 10**2, 0, -10**2, -10**4, -10**6, -10**8])
    plt.xlabel("timestep k", fontsize=12)
    plt.ylabel(r"$\frac{\partial \xi}{\partial S_{k}}$", fontsize=20, rotation=0)
    plt.grid()
    plt.title("Instability of gradient in backward propagation.\n(backpropagate from left to right)")
    # Set legend
    leg = plt.legend(loc="center left", bbox_to_anchor=(1, 0.5), frameon=False, numpoints=1)
    leg.set_title("$(w_x, w_{rec})$", prop={"size":15})
    
def get_grad_over_time(wx, wRec):
    """Helper func to only get the gradient over time from wx and wRec."""
    S = forward_states(X, wx, wRec)
    grad_out = output_gradient(S[:,-1], t).sum()
    _, grad_over_time = backward_gradient(X, S, grad_out, wRec)
    return grad_over_time

# Plot cost surface and gradients

# Get and plot the cost surface figure with markers
fig = get_cost_surface_figure(lambda w1, w2: cost(forward_states(X, w1, w2)[:,-1] , t), points)

# Get the plots of the gradients changing by backpropagating.
plot_gradient_over_time(points, get_grad_over_time)
# Show figures
plt.show()

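Resilient optimization (Rprop)

As shown above, RNN gradients are very unstable, so gradient-based updates can make large jumps on the cost surface. An optimizer may therefore throw the parameters far away from the optimum, as the figure below shows.

Recall from our basic neural network tutorials that gradient descent updates the parameters as:

$$W(i+1) = W(i) - \mu \frac{\partial \xi}{\partial W(i)}$$

where W(i) is the value of W at iteration i and μ is the learning rate.

During training, the gradient at the blue point on the cost surface (w_x = 1, w_rec = 2) is about 10^7. Even with a very small learning rate such as 0.000001 (1e-6), a single update would move W about 10 units away from its current position (10^7 × 10^-6 = 10). For our model this is catastrophic. We could lower the learning rate further, but then the parameters would barely move wherever the gradient is small.

Researchers have developed many methods for dealing with unstable gradients, such as gradient clipping, Hessian-free optimization, and momentum.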
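Of the remedies listed above, gradient clipping is the simplest to sketch. When the gradient's norm exceeds a threshold, the gradient is rescaled to that norm before the gradient-descent update. This keeps its direction but bounds the step size. The helper names, the threshold, and the gradient values below are illustrative assumptions, chosen to mimic a gradient of order 10^7 at (w_x, w_rec) = (1, 2). The tutorial itself uses Rprop rather than clipping.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm; the direction is preserved."""
    grad = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def gd_step(w, grad, lr, max_norm=None):
    """One update W(i+1) = W(i) - lr * grad, optionally clipping grad first."""
    if max_norm is not None:
        grad = clip_by_norm(grad, max_norm)
    return np.asarray(w, dtype=float) - lr * np.asarray(grad, dtype=float)

w = np.array([1.0, 2.0])          # (w_x, w_rec): the blue point
big_grad = np.array([3e6, 1e7])   # made-up gradient of order 1e7
print(gd_step(w, big_grad, lr=1e-6))                # lands near (-2, -8): a jump of about 10 units
print(gd_step(w, big_grad, lr=0.05, max_norm=1.0))  # step length at most 0.05
```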

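We can instead use an optimization algorithm that is less sensitive to the magnitude of the gradient. One such technique is resilient backpropagation (Rprop). Rprop is similar to the momentum method from a previous tutorial, but it uses only the sign of the gradient to update the parameters. The Rprop update rule, as implemented below, is:

$$\Delta(i) = \begin{cases} \eta^{+}\,\Delta(i-1) & \text{if } \operatorname{sign}\left(\frac{\partial \xi}{\partial W(i)}\right) = \operatorname{sign}\left(\frac{\partial \xi}{\partial W(i-1)}\right) \\ \eta^{-}\,\Delta(i-1) & \text{otherwise} \end{cases}$$

$$W(i+1) = W(i) - \operatorname{sign}\left(\frac{\partial \xi}{\partial W(i)}\right) \Delta(i)$$

The hyperparameters are usually set to η^+ = 1.2 and η^- = 0.5. Compared with momentum, Rprop increases the update value Δ by 20% when the sign of the gradient does not change, and reduces Δ by 50% when the sign changes. The update value Δ plays a role similar to the velocity in momentum. The difference is that Δ holds only the magnitude of the step, not its direction, which is given by the sign of the current gradient.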
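The sketch below shows that only the sign of the gradient matters in Rprop. It applies the two-case rule above to a hypothetical one-dimensional cost ξ(w) = 10^7 (w - 1)^2, whose gradient is of order 10^7 like the one at the blue point. For comparison, it also runs plain gradient descent with μ = 1e-6. The cost functions, starting point, initial Δ, and iteration counts are illustrative assumptions, not the tutorial's settings.

```python
import numpy as np

def rprop_minimize(grad_fn, w0, delta0=0.001, eta_p=1.2, eta_n=0.5, iters=200):
    """1-D Rprop: grow Delta while the gradient sign repeats, shrink it otherwise."""
    w, delta, prev_sign = float(w0), delta0, 0.0
    for _ in range(iters):
        sign = np.sign(grad_fn(w))
        delta *= eta_p if sign == prev_sign else eta_n
        w -= sign * delta          # the step size never depends on |gradient|
        prev_sign = sign
    return w

def gd_minimize(grad_fn, w0, lr, iters=5):
    """Plain gradient descent: W(i+1) = W(i) - lr * gradient."""
    w = float(w0)
    for _ in range(iters):
        w -= lr * grad_fn(w)
    return w

steep_grad = lambda w: 2e7 * (w - 1.0)   # gradient of xi(w) = 1e7 * (w - 1)**2
gentle_grad = lambda w: 2.0 * (w - 1.0)  # gradient of xi(w) = (w - 1)**2

print(rprop_minimize(steep_grad, w0=2.0))        # close to the minimum at w = 1
print(gd_minimize(steep_grad, w0=2.0, lr=1e-6))  # diverges: each step multiplies (w - 1) by -19
```

Because Rprop looks only at signs, rprop_minimize(gentle_grad, w0=2.0) follows exactly the same trajectory as the steep case, even though the two gradients differ by a factor of 10^7.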

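In this tutorial we run Rprop for 500 iterations. The blue dots in the figure below show the successive weights on the cost surface. The weights start at a point with a high cost and a large gradient, yet by the end of the iterations Rprop has converged to around (1, 1).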

# Define Rprop optimisation function
def update_rprop(X, t, W, W_prev_sign, W_delta, eta_p, eta_n):
    """
    Update Rprop values in one iteration.
    X: input data.
    t: targets.
    W: Current weight parameters.
    W_prev_sign: Previous sign of the W gradient.
    W_delta: Rprop update values (Delta).
    eta_p, eta_n: Rprop hyperparameters.
    """
    # Perform forward and backward pass to get the gradients
    S = forward_states(X, W[0], W[1])
    grad_out = output_gradient(S[:,-1], t)
    W_grads, _ = backward_gradient(X, S, grad_out, W[1])
    W_sign = np.sign(W_grads)  # Sign of new gradient
    # Update the Delta (update value) for each weight parameter separately
    for i, _ in enumerate(W):
        if W_sign[i] == W_prev_sign[i]:
            W_delta[i] *= eta_p
        else:
            W_delta[i] *= eta_n
    return W_delta, W_sign

# Perform Rprop optimisation

# Set hyperparameters
eta_p = 1.2
eta_n = 0.5

# Set initial parameters
W = [-1.5, 2]  # [wx, wRec]
W_delta = [0.001, 0.001]  # Update values (Delta) for W
W_sign = [0, 0]  # Previous sign of W

ls_of_ws = [(W[0], W[1])]  # List of weights to plot
# Iterate over 500 iterations
for i in range(500):
    # Get the update values and sign of the last gradient
    W_delta, W_sign = update_rprop(X, t, W, W_sign, W_delta, eta_p, eta_n)
    # Update each weight parameter separately (j avoids shadowing the outer loop variable)
    for j, _ in enumerate(W):
        W[j] -= W_sign[j] * W_delta[j]
    ls_of_ws.append((W[0], W[1]))  # Add weights to list to plot

print("Final weights are: wx = {0},  wRec = {1}".format(W[0], W[1]))

Final weights are: wx = 1.00135554721, wRec = 0.999674473785

# Plot the cost surface with the weights over the iterations.

# Define plot function
def plot_optimisation(ls_of_ws, cost_func):
    """Plot the optimisation iterations on the cost surface."""
    ws1, ws2 = zip(*ls_of_ws)
    # Plot figures
    fig = plt.figure(figsize=(10, 4))
    # Plot overview of cost function
    ax_1 = fig.add_subplot(1,2,1)
    ws1_1, ws2_1, cost_ws_1 = get_cost_surface(-3, 3, -3, 3, 100, cost_func)
    surf_1 = plot_surface(ax_1, ws1_1, ws2_1, cost_ws_1 + 1)
    ax_1.plot(ws1, ws2, "b.")
    ax_1.set_xlim([-3,3])
    ax_1.set_ylim([-3,3])
    # Plot zoom of cost function
    ax_2 = fig.add_subplot(1,2,2)
    ws1_2, ws2_2, cost_ws_2 = get_cost_surface(0, 2, 0, 2, 100, cost_func)
    surf_2 = plot_surface(ax_2, ws1_2, ws2_2, cost_ws_2 + 1)
    ax_2.set_xlim([0,2])
    ax_2.set_ylim([0,2])
    ax_2.plot(ws1, ws2, "b.")
    # Show the colorbar
    fig.subplots_adjust(right=0.8)
    cax = fig.add_axes([0.85, 0.12, 0.03, 0.78])
    cbar = fig.colorbar(surf_1, ticks=np.logspace(0, 8, 9), cax=cax)
    cbar.ax.set_ylabel(r"$\xi$", fontsize=15)
    cbar.set_ticklabels(["{:.0e}".format(i) for i in np.logspace(0, 8, 9)])
    plt.suptitle("Cost surface", fontsize=15)
    plt.show()
    
# Plot the optimisation
plot_optimisation(ls_of_ws, lambda w1, w2: cost(forward_states(X, w1, w2)[:,-1] , t))
plt.show()

Testing the model

Finally, we test the model. Running the code below shows that the model output is very close to the target. Rounding the model output to the nearest integer should give exactly the target count.

# Use a plain 2-D array (np.matrix is discouraged) and print() so this runs on Python 2 and 3
test_inpt = np.array([[0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1]])
print(test_inpt)
test_outpt = forward_states(test_inpt, W[0], W[1])[:,-1]
print("Target output: {:d} vs Model output: {:.2f}".format(int(test_inpt.sum()), test_outpt[0]))

Target output: 5 vs Model output: 4.99
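As noted above, rounding the model output to the nearest integer turns the near-exact prediction into an exact count. This sketch assumes the final weights printed by the Rprop run above (wx = 1.00135554721, wRec = 0.999674473785). A new run on different random data will give slightly different values. It re-declares a compact forward pass so that it runs on its own. The names forward_final and predict_count are hypothetical.

```python
import numpy as np

def forward_final(X, wx, wRec):
    """Final state S_n of the linear RNN for each row of X (initial state 0)."""
    s = np.zeros(X.shape[0])
    for k in range(X.shape[1]):
        s = s * wRec + X[:, k] * wx
    return s

def predict_count(X, wx, wRec):
    """Round the model output to the nearest integer to obtain the count of ones."""
    return np.rint(forward_final(X, wx, wRec)).astype(int)

wx, wRec = 1.00135554721, 0.999674473785   # final weights from the run shown above
test_inpt = np.array([[0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1]])
print(forward_final(test_inpt, wx, wRec), predict_count(test_inpt, wx, wRec))
```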

Full code: click here


The copyright of this article belongs to the author. Please do not repost it without permission, and cite this article's address when reposting: http://systransis.cn/yun/41167.html
