

(5) Getting Started with Neural Networks: Building a Multilayer Network

figofuture


Author: chen_h
WeChat & QQ: 862251340
WeChat Official Account: coderpai
Jianshu: https://www.jianshu.com/p/cb6...


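This tutorial is a translation, published with the author's permission, of the neural network tutorial written by Peter Roelants; see the original article.

The tutorial introduces neural networks in five parts. You can find the complete series at the links below.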

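(1) Getting Started with Neural Networks: Linear Regression

Logistic Classification Function

(2) Getting Started with Neural Networks: Logistic Regression (Classification)

(3) Getting Started with Neural Networks: Hidden Layer Design

Softmax Classification Function

(4) Getting Started with Neural Networks: Vectorization

(5) Getting Started with Neural Networks: Building a Multilayer Network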

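Generalization to multilayer networks

This part of the tutorial covers two topics:

Generalizing the network to an arbitrary number of layers

Stochastic gradient descent with minibatches

In this tutorial we generalize the feedforward neural network to an arbitrary number of hidden layers. All concepts are described systematically in terms of matrix multiplications and nonlinear transformations. We illustrate the generalization by building a small network with two hidden layers that recognizes handwritten digits. The network is trained with stochastic gradient descent.

We first import the packages used in this tutorial.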

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, metrics
# `sklearn.cross_validation` was removed in scikit-learn 0.20;
#  train_test_split now lives in sklearn.model_selection.
from sklearn.model_selection import train_test_split
from matplotlib.colors import colorConverter, ListedColormap
import itertools
import collections

Handwritten digits dataset

In this tutorial we use the handwritten digits dataset that ships with scikit-learn. It contains 1797 images of 8×8 pixels. For processing, the pixels of each image are flattened into a 64-dimensional vector. The figure below shows one image of each digit. Note that this dataset is not the MNIST handwritten digits dataset: MNIST is a large dataset, whereas this one is small.

We first preprocess the data by splitting it into three parts:

A training set, used to train the model (inputs: X_train, targets: T_train).

A validation set, used to evaluate the model during training, so that training can be stopped once the model starts to overfit the training data (inputs: X_validation, targets: T_validation).

A test set, used for the final evaluation of the model (inputs: X_test, targets: T_test).

# Load the data from scikit-learn.
digits = datasets.load_digits()

# Load the targets.
# Note that the targets are stored as digits; these need to be
#  converted to one-hot encoding for the output softmax layer.
T = np.zeros((digits.target.shape[0], 10))
T[np.arange(len(T)), digits.target] += 1

# Divide the data into a train set (60%) and a test set (40%).
X_train, X_test, T_train, T_test = train_test_split(
    digits.data, T, test_size=0.4)
# Divide the test set into a validation set and a final test set (20% each).
X_validation, X_test, T_validation, T_test = train_test_split(
    X_test, T_test, test_size=0.5)
# Plot an example of each image.
fig = plt.figure(figsize=(10, 1), dpi=100)
for i in range(10):
    ax = fig.add_subplot(1,10,i+1)
    ax.matshow(digits.images[i], cmap="binary") 
    ax.axis("off")
plt.show()
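The one-hot encoding and the roughly 60/20/20 split above can also be reproduced without scikit-learn. The sketch below assumes only NumPy; the helper names `one_hot` and `split_60_20_20` are hypothetical, and the split uses a single random permutation instead of two calls to `train_test_split`, so it is an illustration of the idea rather than the tutorial's exact procedure.

```python
import numpy as np

def one_hot(labels, nb_classes=10):
    """Convert integer class labels to one-hot rows (same indexing trick as above)."""
    T = np.zeros((labels.shape[0], nb_classes))
    T[np.arange(len(T)), labels] += 1
    return T

def split_60_20_20(X, T, seed=0):
    """Hypothetical helper: shuffle once, then cut into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.6 * len(X))
    n_validation = (len(X) - n_train) // 2
    train, validation, test = np.split(idx, [n_train, n_train + n_validation])
    return (X[train], T[train]), (X[validation], T[validation]), (X[test], T[test])

labels = np.array([3, 0, 9])
print(one_hot(labels))  # each row has a single 1 at the column of its digit
```

With 1797 samples this helper produces 1078 training, 359 validation and 360 test samples.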

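Generalization of the layers

In part 4, the network we designed consisted of a linear transformation, implemented as a matrix multiplication, followed by a nonlinear transformation.

The nonlinear function is applied to each neuron individually (element-wise). Treating it as a separate layer, as the code below does with LogisticLayer, makes the computations easier to understand and to implement.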
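Because the logistic function is applied element-wise, its derivative can be computed from the layer's output alone: σ'(z) = σ(z)(1 − σ(z)). This is why logistic_deriv takes the output y rather than the input z. The standalone sketch below (not part of the tutorial's code) checks this against a central finite difference on a few made-up inputs.

```python
import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

def logistic_deriv(y):
    # Derivative expressed in terms of the output y = logistic(z)
    return np.multiply(y, (1 - y))

z = np.linspace(-4, 4, 9)  # inputs of 9 "neurons", processed element-wise
y = logistic(z)
eps = 1e-5
numerical = (logistic(z + eps) - logistic(z - eps)) / (2 * eps)
print(np.allclose(logistic_deriv(y), numerical))  # True
```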

We implement three kinds of layers as Python classes:

A linear transformation layer, LinearLayer

A logistic function layer, LogisticLayer

A softmax output layer, SoftmaxOutputLayer
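For reference, these are the forward and backward rules that the three layers implement, written for a batch of N samples (the rows of X), with ξ the cost and δ denoting a gradient of ξ (δ_Y is the output_grad argument in the code):

```latex
% LinearLayer: forward step, parameter gradients, input gradient
Y = XW + b, \qquad
\frac{\partial \xi}{\partial W} = X^\top \delta_Y, \qquad
\frac{\partial \xi}{\partial b} = \sum_{i=1}^{N} \delta_{Y,i}, \qquad
\delta_X = \delta_Y W^\top

% LogisticLayer: element-wise, gradient expressed with the output Y
Y = \sigma(X) = \frac{1}{1 + e^{-X}}, \qquad
\delta_X = Y \odot (1 - Y) \odot \delta_Y

% SoftmaxOutputLayer with mean cross-entropy cost and one-hot targets T
y_{ic} = \frac{e^{x_{ic}}}{\sum_{d} e^{x_{id}}}, \qquad
\xi = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c} t_{ic} \log y_{ic}, \qquad
\delta_X = \frac{Y - T}{N}
```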

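During the forward pass, each layer computes its output with get_output, and this output is used as the input of the next layer. During the backward pass, each layer computes the gradient at its input with get_input_grad. For the last (output) layer, this gradient is computed from the targets. For any other layer, it is computed from the gradient at the layer's output, which is the input gradient returned by the next layer. Layers that have parameters return an iterator over them with get_params_iter, and get_params_grad returns the gradients of those parameters in the same order.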

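Note that the softmax output layer divides both the gradient and the cost by the number of input samples. This makes them independent of the number of samples, so that changing the minibatch size does not affect the other parameters (such as the learning rate).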
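A quick way to see the effect of this normalization: duplicating every sample in a batch doubles N but leaves the mean cost unchanged, and the bias gradient (the column sum of the gradient, as computed by Jb in LinearLayer) is unchanged too. The sketch below is a standalone NumPy re-implementation of the softmax cost and input gradient; the random inputs and targets are made up for illustration.

```python
import numpy as np

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True)

def get_cost(Y, T):
    # Mean cross-entropy over the N = Y.shape[0] samples
    return -np.multiply(T, np.log(Y)).sum() / Y.shape[0]

def get_input_grad(Y, T):
    # Gradient of the mean cost with respect to the softmax inputs
    return (Y - T) / Y.shape[0]

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 10))   # made-up softmax inputs for 4 samples
T = np.eye(10)[[1, 4, 4, 7]]   # one-hot targets
Y = softmax(Z)

Z2, T2 = np.vstack([Z, Z]), np.vstack([T, T])  # same samples, N doubled
Y2 = softmax(Z2)
same_cost = np.isclose(get_cost(Y, T), get_cost(Y2, T2))
# The bias gradient (column sum, like Jb in LinearLayer) is also unchanged
same_grad = np.allclose(get_input_grad(Y, T).sum(axis=0),
                        get_input_grad(Y2, T2).sum(axis=0))
print(same_cost, same_grad)  # True True
```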

# Define the non-linear functions used
def logistic(z): 
    return 1 / (1 + np.exp(-z))

def logistic_deriv(y):  # Derivative of logistic function
    return np.multiply(y, (1 - y))
    
def softmax(z): 
    return np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True)
# Define the layers used in this model
class Layer(object):
    """Base class for the different layers.
    Defines base methods and documentation of methods."""
    
    def get_params_iter(self):
        """Return an iterator over the parameters (if any).
        The iterator has the same order as get_params_grad.
        The elements returned by the iterator are editable in-place."""
        return []
    
    def get_params_grad(self, X, output_grad):
        """Return a list of gradients over the parameters.
        The list has the same order as the get_params_iter iterator.
        X is the input.
        output_grad is the gradient at the output of this layer.
        """
        return []
    
    def get_output(self, X):
        """Perform the forward step linear transformation.
        X is the input."""
        pass
    
    def get_input_grad(self, Y, output_grad=None, T=None):
        """Return the gradient at the inputs of this layer.
        Y is the pre-computed output of this layer (not needed in this case).
        output_grad is the gradient at the output of this layer 
         (gradient at input of next layer).
        Output layer uses targets T to compute the gradient based on the 
         output error instead of output_grad"""
        pass
class LinearLayer(Layer):
    """The linear layer performs a linear transformation to its input."""
    
    def __init__(self, n_in, n_out):
        """Initialize hidden layer parameters.
        n_in is the number of input variables.
        n_out is the number of output variables."""
        self.W = np.random.randn(n_in, n_out) * 0.1
        self.b = np.zeros(n_out)
        
    def get_params_iter(self):
        """Return an iterator over the parameters."""
        return itertools.chain(np.nditer(self.W, op_flags=["readwrite"]),
                               np.nditer(self.b, op_flags=["readwrite"]))
    
    def get_output(self, X):
        """Perform the forward step linear transformation."""
        return X.dot(self.W) + self.b
        
    def get_params_grad(self, X, output_grad):
        """Return a list of gradients over the parameters."""
        JW = X.T.dot(output_grad)
        Jb = np.sum(output_grad, axis=0)
        return [g for g in itertools.chain(np.nditer(JW), np.nditer(Jb))]
    
    def get_input_grad(self, Y, output_grad):
        """Return the gradient at the inputs of this layer."""
        return output_grad.dot(self.W.T)
class LogisticLayer(Layer):
    """The logistic layer applies the logistic function to its inputs."""
    
    def get_output(self, X):
        """Perform the forward step transformation."""
        return logistic(X)
    
    def get_input_grad(self, Y, output_grad):
        """Return the gradient at the inputs of this layer."""
        return np.multiply(logistic_deriv(Y), output_grad)
class SoftmaxOutputLayer(Layer):
    """The softmax output layer computes the classification probabilities at the output."""
    
    def get_output(self, X):
        """Perform the forward step transformation."""
        return softmax(X)
    
    def get_input_grad(self, Y, T):
        """Return the gradient at the inputs of this layer."""
        return (Y - T) / Y.shape[0]
    
    def get_cost(self, Y, T):
        """Return the cost at the output of this output layer."""
        return - np.multiply(T, np.log(Y)).sum() / Y.shape[0]
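The softmax function above computes np.exp(z) directly, which overflows for large inputs. A common variation, not used in the tutorial's code, subtracts the row maximum before exponentiating; because softmax is invariant to adding a constant to each row, the probabilities are unchanged. A minimal sketch:

```python
import numpy as np

def softmax(z):
    """Softmax as defined in the tutorial."""
    return np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True)

def stable_softmax(z):
    """Same result, but each row is shifted by its maximum to avoid overflow in np.exp."""
    z_shift = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z_shift)
    return e / np.sum(e, axis=1, keepdims=True)

z = np.array([[1.0, 2.0, 3.0]])
print(np.allclose(softmax(z), stable_softmax(z)))  # True
big = np.array([[1000.0, 1001.0, 1002.0]])
print(stable_softmax(big))  # finite probabilities; np.exp(1000.0) alone would overflow
```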
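Sample model

In the following, we assemble the layers designed above into a model: linear transformations between the layers and nonlinear activations of the neurons.

The sample model used in this tutorial has two hidden layers with the logistic function as activation, followed by a softmax output layer for classification. The first hidden layer reduces the 64-dimensional input to 20 dimensions. The second hidden layer maps these 20 dimensions to another 20 dimensions. The output layer produces a 10-dimensional output, one value per digit class. The figure below shows this architecture:

The network is represented as a sequential model: the input of each layer is the output of the previous layer, and the output of each layer becomes the input of the next layer. The first layer has index 0 in the sequence and the last layer has the last index.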
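As a sanity check on this architecture, the weight shapes and the number of trainable parameters follow directly from the layer sizes 64 → 20 → 20 → 10. A minimal sketch; the helper `linear_layer_shapes` is hypothetical:

```python
def linear_layer_shapes(sizes):
    """Return (W shape, b shape) for each LinearLayer in a chain of layer sizes."""
    return [((n_in, n_out), (n_out,)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

sizes = [64, 20, 20, 10]  # input, hidden layer 1, hidden layer 2, output
shapes = linear_layer_shapes(sizes)
nb_params = sum(w[0] * w[1] + b[0] for w, b in shapes)
print(shapes)     # [((64, 20), (20,)), ((20, 20), (20,)), ((20, 10), (10,))]
print(nb_params)  # 1930
```

These 1930 values are the parameters that get_params_iter iterates over, so the gradient check further on runs two forward passes for each of them.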

# Define a sample model to be trained on the data
hidden_neurons_1 = 20  # Number of neurons in the first hidden-layer
hidden_neurons_2 = 20  # Number of neurons in the second hidden-layer
# Create the model
layers = [] # Define a list of layers
# Add first hidden layer
layers.append(LinearLayer(X_train.shape[1], hidden_neurons_1))
layers.append(LogisticLayer())
# Add second hidden layer
layers.append(LinearLayer(hidden_neurons_1, hidden_neurons_2))
layers.append(LogisticLayer())
# Add output layer
layers.append(LinearLayer(hidden_neurons_2, T_train.shape[1]))
layers.append(SoftmaxOutputLayer())
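Backpropagation

The details of the forward and backward passes of backpropagation were explained in part 4; if anything is still unclear, we recommend reviewing that part first. Here we only implement backpropagation for the multilayer network.

Forward pass

In the code below, the forward_step function implements the forward pass. The output of each layer is computed with get_output, and these activations are stored in the activations list.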

# Define the forward propagation step as a method.
def forward_step(input_samples, layers):
    """
    Compute and return the forward activation of each layer in layers.
    Input:
        input_samples: A matrix of input samples (each row is an input vector)
        layers: A list of Layers
    Output:
        A list of activations where the activation at each index i+1 corresponds to
        the activation of layer i in layers. activations[0] contains the input samples.  
    """
    activations = [input_samples] # List of layer activations
    # Compute the forward activations for each layer starting from the first
    X = input_samples
    for layer in layers:
        Y = layer.get_output(X)  # Get the output of the current layer
        activations.append(Y)  # Store the output for future processing
        X = activations[-1]  # Set the current input as the activations of the previous layer
    return activations  # Return the activations of each layer
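Backward pass

The backward_step function implements the backward pass, which starts at the last layer. First, the output layer's get_input_grad computes the initial gradient from the targets. Then, moving backwards through the layers, get_params_grad computes the gradient of the cost with respect to each layer's parameters, and these gradients are collected in a list.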

# Define the backward propagation step as a method
def backward_step(activations, targets, layers):
    """
    Perform the backpropagation step over all the layers and return the parameter gradients.
    Input:
        activations: A list of forward step activations where the activation at 
            each index i+1 corresponds to the activation of layer i in layers. 
            activations[0] contains the input samples. 
        targets: The output targets of the output layer.
        layers: A list of Layers that generated the outputs in activations.
    Output:
        A list of parameter gradients where the gradients at each index corresponds to
        the parameters gradients of the layer at the same index in layers. 
    """
    param_grads = collections.deque()  # List of parameter gradients for each layer
    output_grad = None  # The error gradient at the output of the current layer
    # Propagate the error backwards through all the layers.
    #  Use reversed to iterate backwards over the list of layers.
    for layer in reversed(layers):   
        Y = activations.pop()  # Get the activations of the last layer on the stack
        # Compute the error at the output layer.
        # The output layer error is calculated differently than the hidden layer error.
        if output_grad is None:
            input_grad = layer.get_input_grad(Y, targets)
        else:  # output_grad is not None (layer is not output layer)
            input_grad = layer.get_input_grad(Y, output_grad)
        # Get the input of this layer (activations of the previous layer)
        X = activations[-1]
        # Compute the layer parameter gradients used to update the parameters
        grads = layer.get_params_grad(X, output_grad)
        param_grads.appendleft(grads)
        # Compute gradient at output of previous layer (input of current layer):
        output_grad = input_grad
    return list(param_grads)  # Return the parameter gradients
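Gradient checking

As in part 4, we verify the gradients by comparing the numerical gradients with the gradients computed by backpropagation.

In the code, get_params_iter returns an iterator over all parameters of a layer, and get_params_grad returns the gradient of each of those parameters as computed by backpropagation.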
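The numerical gradient used in the check is the central difference, with ε corresponding to eps = 0.0001 in the code:

```latex
\frac{\partial \xi}{\partial w} \approx \frac{\xi(w + \epsilon) - \xi(w - \epsilon)}{2\epsilon}
```

Here ξ(w ± ε) is the cost on the same samples with only the single parameter w shifted by ±ε. The truncation error of the central difference shrinks as O(ε²), which is why the check compares the two gradients with np.isclose rather than testing for exact equality.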

# Perform gradient checking
nb_samples_gradientcheck = 10 # Test the gradients on a subset of the data
X_temp = X_train[0:nb_samples_gradientcheck,:]
T_temp = T_train[0:nb_samples_gradientcheck,:]
# Get the parameter gradients with backpropagation
activations = forward_step(X_temp, layers)
param_grads = backward_step(activations, T_temp, layers)

# Set the small change to compute the numerical gradient
eps = 0.0001
# Compute the numerical gradients of the parameters in all layers.
for idx in range(len(layers)):
    layer = layers[idx]
    layer_backprop_grads = param_grads[idx]
    # Compute the numerical gradient for each parameter in the layer
    for p_idx, param in enumerate(layer.get_params_iter()):
        grad_backprop = layer_backprop_grads[p_idx]
        # + eps
        param += eps
        plus_cost = layers[-1].get_cost(forward_step(X_temp, layers)[-1], T_temp)
        # - eps
        param -= 2 * eps
        min_cost = layers[-1].get_cost(forward_step(X_temp, layers)[-1], T_temp)
        # reset param value
        param += eps
        # calculate numerical gradient
        grad_num = (plus_cost - min_cost)/(2*eps)
        # Raise error if the numerical gradient is not close to the backprop gradient
        if not np.isclose(grad_num, grad_backprop):
            raise ValueError("Numerical gradient of {:.6f} is not close to the backpropagation gradient of {:.6f}!".format(float(grad_num), float(grad_backprop)))
print("No gradient errors found")

No gradient errors found

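Stochastic gradient descent for backpropagation

In this tutorial we optimize the cost function with a variant of gradient descent called stochastic gradient descent (SGD). Instead of computing each update on the whole training set, SGD updates the parameters in the direction of the negative gradient computed on a subset of the training set. This has several benefits: first, on a large training set it saves time and memory, because each update needs far fewer matrix operations; second, it increases the diversity of the training samples.

The cost function needs to be independent of the number of input samples, because the size of the subset enters every step of SGD. This is why we use the mean of the cost over the samples (see get_cost in SoftmaxOutputLayer) rather than its sum.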
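The structure of minibatch SGD does not depend on the network. The sketch below applies it to a hypothetical least-squares problem (synthetic noise-free data, a single weight vector, and a mean squared error so that the gradient scale is independent of the batch size), using the same plain update rule as the tutorial. It is an illustration of the loop, not the tutorial's model.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
w_true = np.array([2.0, -1.0, 0.5])
t = X.dot(w_true)  # noise-free synthetic targets

def mse_and_grad(w, X, t):
    """Mean squared error over the batch and its gradient with respect to w."""
    err = X.dot(w) - t
    return np.mean(err ** 2), 2 * X.T.dot(err) / X.shape[0]

w = np.zeros(3)
learning_rate = 0.1
batch_size = 25
nb_of_batches = len(X) // batch_size
for iteration in range(20):
    # One pass over the training set, one minibatch at a time
    for X_b, t_b in zip(np.array_split(X, nb_of_batches),
                        np.array_split(t, nb_of_batches)):
        _, grad = mse_and_grad(w, X_b, t_b)
        w -= learning_rate * grad  # step against the minibatch gradient
full_cost, _ = mse_and_grad(w, X, t)
print(w, full_cost)  # w ends up very close to w_true
```

Like the tutorial, this sketch keeps a fixed minibatch order; reshuffling the training set before every pass is a common variation.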

Minibatches

The subsets of training samples are often called minibatches. In the code below, we set the minibatch size to 25 and pair the input data with the target data as (X, T) tuples that are fed into the network.

# Create the minibatches
batch_size = 25  # Approximately 25 samples per batch
nb_of_batches = X_train.shape[0] // batch_size  # Number of batches (integer division)
# Create batches (X,T) from the training set.
# list() materializes the pairs so they can be reused in every training iteration
#  (in Python 3, zip returns a one-shot iterator).
XT_batches = list(zip(
    np.array_split(X_train, nb_of_batches, axis=0),  # X samples
    np.array_split(T_train, nb_of_batches, axis=0)))  # T targets
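Two details of this batching code can be checked in isolation: np.array_split handles a sample count that does not divide evenly by giving the first few batches one extra sample, and in Python 3 a bare zip object can be iterated only once, so it must be materialized (for example with list()) before it is reused in every training iteration. A standalone sketch, assuming 1078 training samples (the size of a 60% split of the 1797 digits) filled with dummy values:

```python
import numpy as np

X_train = np.zeros((1078, 64))  # assumed training-set size, dummy values
batch_size = 25
nb_of_batches = X_train.shape[0] // batch_size  # 43 batches
sizes = [len(b) for b in np.array_split(X_train, nb_of_batches, axis=0)]
print(nb_of_batches, sizes[:4], sizes[-1])  # 43 [26, 26, 26, 25] 25

# A zip object is a one-shot iterator in Python 3
pairs = zip([1, 2], ["a", "b"])
first_pass = list(pairs)
second_pass = list(pairs)
print(first_pass, second_pass)  # [(1, 'a'), (2, 'b')] []
```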
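Stochastic gradient descent updates

In the code, the update_params function updates every parameter. In each iteration we use the simplest gradient descent update rule:

$$w(k+1) = w(k) - \mu \frac{\partial \xi}{\partial w(k)}$$

where w is any parameter of the network, ξ is the cost, and μ is the learning rate.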
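update_params relies on np.nditer with op_flags=["readwrite"]: each element the iterator yields is a zero-dimensional view into the parameter array, so an in-place operation such as param -= learning_rate * grad writes straight back into the layer's W and b. The standalone sketch below performs one such update on made-up parameter and gradient values.

```python
import itertools
import numpy as np

W = np.array([[0.5, -0.2], [0.1, 0.3]])
b = np.array([0.0, 0.1])
grads = [1.0, 2.0, -1.0, 0.0, 0.5, -0.5]  # one gradient per parameter, same order
learning_rate = 0.1

# Same pattern as LinearLayer.get_params_iter
params = itertools.chain(np.nditer(W, op_flags=["readwrite"]),
                         np.nditer(b, op_flags=["readwrite"]))
for param, grad in zip(params, grads):
    param -= learning_rate * grad  # in place: modifies W and b themselves

print(W)  # approximately [[0.4, -0.4], [0.2, 0.3]]
print(b)  # approximately [-0.05, 0.15]
```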

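The training loop below repeats these updates for several iterations over the whole training set (nb_of_iterations records how many were executed), each iteration going through all minibatches. After each full iteration, the model is evaluated on the validation set. If the validation cost does not decrease across three consecutive iterations (the last three values are non-decreasing), we assume the model has started to overfit and stop training. Training also stops after a maximum of 300 iterations. All costs are stored for later analysis.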
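The stopping rule used in the training loop can be isolated as a small function to see exactly when it fires. A sketch with a hypothetical name, should_stop, applied to made-up cost sequences; like the loop, it needs more than three recorded validation costs and then compares the last three.

```python
def should_stop(validation_costs):
    """Same rule as the training loop: stop once the last three
    validation costs are non-decreasing (needs more than three costs)."""
    if len(validation_costs) > 3:
        return validation_costs[-1] >= validation_costs[-2] >= validation_costs[-3]
    return False

print(should_stop([1.0, 0.8, 0.6, 0.5]))       # False: still decreasing
print(should_stop([1.0, 0.8, 0.6, 0.6, 0.7]))  # True: 0.6 <= 0.6 <= 0.7
```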

# Define a method to update the parameters
def update_params(layers, param_grads, learning_rate):
    """
    Function to update the parameters of the given layers with the given gradients
    by gradient descent with the given learning rate.
    """
    for layer, layer_backprop_grads in zip(layers, param_grads):
        for param, grad in zip(layer.get_params_iter(), layer_backprop_grads):
            # The parameter returned by the iterator point to the memory space of
            #  the original layer and can thus be modified inplace.
            param -= learning_rate * grad  # Update each parameter
# Perform backpropagation
# Initialize some lists to store the cost for future analysis
minibatch_costs = []
training_costs = []
validation_costs = []

max_nb_of_iterations = 300  # Train for a maximum of 300 iterations
learning_rate = 0.1  # Gradient descent learning rate

# Train for the maximum number of iterations
for iteration in range(max_nb_of_iterations):
    for X, T in XT_batches:  # For each minibatch sub-iteration
        activations = forward_step(X, layers)  # Get the activations
        minibatch_cost = layers[-1].get_cost(activations[-1], T)  # Get cost
        minibatch_costs.append(minibatch_cost)
        param_grads = backward_step(activations, T, layers)  # Get the gradients
        update_params(layers, param_grads, learning_rate)  # Update the parameters
    # Get full training cost for future analysis (plots)
    activations = forward_step(X_train, layers)
    train_cost = layers[-1].get_cost(activations[-1], T_train)
    training_costs.append(train_cost)
    # Get full validation cost
    activations = forward_step(X_validation, layers)
    validation_cost = layers[-1].get_cost(activations[-1], T_validation)
    validation_costs.append(validation_cost)
    if len(validation_costs) > 3:
        # Stop training if the cost on the validation set doesn't decrease
        #  for 3 iterations
        if validation_costs[-1] >= validation_costs[-2] >= validation_costs[-3]:
            break
    
nb_of_iterations = iteration + 1  # The number of iterations that have been executed
# One x-position per recorded minibatch cost (num must be an integer)
minibatch_x_inds = np.linspace(0, nb_of_iterations, num=len(minibatch_costs))
iteration_x_inds = np.linspace(1, nb_of_iterations, num=nb_of_iterations)
# Plot the cost over the iterations
plt.plot(minibatch_x_inds, minibatch_costs, "k-", linewidth=0.5, label="cost minibatches")
plt.plot(iteration_x_inds, training_costs, "r-", linewidth=2, label="cost full training set")
plt.plot(iteration_x_inds, validation_costs, "b-", linewidth=3, label="cost validation set")
# Add labels to the plot
plt.xlabel("iteration")
plt.ylabel(r"$\xi$", fontsize=15)
plt.title("Decrease of cost over backprop iteration")
plt.legend()
x1,x2,y1,y2 = plt.axis()
plt.axis((0,nb_of_iterations,0,2.5))
plt.grid()
plt.show()

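Model performance on the test set

Finally, we evaluate the model on the test set. The final model reaches an accuracy of 96% on the test set.

The results can be analyzed in more detail with a confusion table, which shows, for each true digit, how many samples were classified as each digit. The figure below was generated with scikit-learn's confusion_matrix function.

For example, the digit 8 was misclassified five times: twice as 2, twice as 5, and once as 9.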
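The accuracy and the confusion table can both be computed with plain NumPy, which makes the row/column convention explicit: rows are true labels and columns are predicted labels, the same convention as scikit-learn's metrics.confusion_matrix. A standalone sketch with a hypothetical helper named confusion_matrix and made-up labels (not the tutorial's results):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, nb_classes=10):
    """Rows: true label, columns: predicted label."""
    C = np.zeros((nb_classes, nb_classes), dtype=int)
    np.add.at(C, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return C

y_true = np.array([8, 8, 8, 2, 5, 8])
y_pred = np.array([8, 2, 5, 2, 5, 9])
C = confusion_matrix(y_true, y_pred)
accuracy = np.trace(C) / C.sum()  # correct predictions lie on the diagonal
print(C[8])      # how the true 8s were classified: once each as 2, 5, 8 and 9
print(accuracy)  # 0.5
```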

# Get results of test data
y_true = np.argmax(T_test, axis=1)  # Get the target outputs
activations = forward_step(X_test, layers)  # Get activation of test samples
y_pred = np.argmax(activations[-1], axis=1)  # Get the predictions made by the network
test_accuracy = metrics.accuracy_score(y_true, y_pred)  # Test set accuracy
print("The accuracy on the test set is {:.2f}".format(test_accuracy))

The accuracy on the test set is 0.96

# Show confusion table
conf_matrix = metrics.confusion_matrix(y_true, y_pred, labels=None)  # Get confusion matrix
# Plot the confusion table
class_names = ["${:d}$".format(x) for x in range(0, 10)]  # Digit class names
fig = plt.figure()
ax = fig.add_subplot(111)
# Show class labels on each axis
ax.xaxis.tick_top()
major_ticks = range(0,10)
minor_ticks = [x + 0.5 for x in range(0, 10)]
ax.xaxis.set_ticks(major_ticks, minor=False)
ax.yaxis.set_ticks(major_ticks, minor=False)
ax.xaxis.set_ticks(minor_ticks, minor=True)
ax.yaxis.set_ticks(minor_ticks, minor=True)
ax.xaxis.set_ticklabels(class_names, minor=False, fontsize=15)
ax.yaxis.set_ticklabels(class_names, minor=False, fontsize=15)
# Set plot labels
ax.yaxis.set_label_position("right")
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
fig.suptitle("Confusion table", y=1.03, fontsize=15)
# Show a grid to separate digits
ax.grid(True, which="minor")
# Color each grid cell according to the number classes predicted
ax.imshow(conf_matrix, interpolation="nearest", cmap="binary")
# Show the number of samples in each cell
for x in range(conf_matrix.shape[0]):
    for y in range(conf_matrix.shape[1]):
        color = "w" if x == y else "k"
        ax.text(x, y, conf_matrix[y,x], ha="center", va="center", color=color)
plt.show()

Complete code: see the link in the original article.

