Deep learning: 42 (A Simple Understanding of the Denoise Autoencoder)

Posted by gotham, 2019-04-25 17:57


Preface:

When the weights of a deep network are pre-trained layer by layer with an unsupervised method, one way to learn more robust features is to inject random noise at the network's visible layer (i.e., the input layer). This approach is called the Denoise Autoencoder (dAE for short). It was proposed in 2008 by Bengio's group (Vincent et al.) in the paper "Extracting and composing robust features with denoising autoencoders". A dAE is trained to reconstruct the original (uncorrupted) data from a corrupted input, so the features it learns are more robust. This post briefly introduces the dAE based on that paper, then uses 2 simple experiments to show how to apply it in practice. Both experiments are lightly modified versions of existing tools: one is the MATLAB Deep Learning Toolbox (https://github.com/rasmusbergpalm/DeepLearnToolbox), the other is the Python library Theano, following http://deeplearning.net/tutorial/dA.html.

Background:

First, consider the dAE schematic from Bengio's paper:

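As the figure shows, a sample x is corrupted with random noise drawn from the distribution q_D, giving x̃. As described in the paper, this noise is not Gaussian: instead, each input node is set to 0 with some probability. This is very similar to the dropout introduced in the previous post (Deep learning: 41 (A Simple Understanding of Dropout)), except that dropout acts on the hidden layers. The data fed to the visible layer is therefore x̃, the hidden layer outputs y, and y is then used to compute the reconstruction z. Note that z is a reconstruction of the original x, not of x̃.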
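The corruption and reconstruction steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's code: it assumes masking noise, sigmoid units, tied decoder weights W' = Wᵀ (as in the Theano dA class of Experiment 2) and a cross-entropy loss; the function names are hypothetical and random numbers stand in for MNIST pixels.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def corrupt(x, corruption_level, rng):
    # Masking noise: keep each entry with probability 1 - corruption_level, otherwise set it to 0
    keep = rng.binomial(n=1, p=1.0 - corruption_level, size=x.shape)
    return x * keep

def dae_loss(x, W, b, b_prime, corruption_level, rng):
    x_tilde = corrupt(x, corruption_level, rng)   # x -> x~
    y = sigmoid(x_tilde @ W + b)                  # hidden representation y
    z = sigmoid(y @ W.T + b_prime)                # reconstruction z (tied decoder weights W' = W^T)
    eps = 1e-12                                   # guards against log(0)
    # Cross-entropy between the CLEAN input x and z (x~ is only the encoder's input)
    per_sample = -np.sum(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps), axis=1)
    return per_sample.mean(), x_tilde

rng = np.random.default_rng(0)
x = rng.random((20, 784))                         # stand-in for a mini-batch of pixels in [0, 1]
W = rng.uniform(-0.1, 0.1, size=(784, 500))
b, b_prime = np.zeros(500), np.zeros(784)
loss, x_tilde = dae_loss(x, W, b, b_prime, corruption_level=0.3, rng=rng)
print(loss, np.mean(x_tilde == 0))                # second value is close to 0.3
```

The key design point is in the last line of dae_loss: the encoder sees x̃, but the loss compares z against x, which is what forces the network to "undo" the corruption instead of copying its input.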

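Bengio's intuitive explanations of the dAE are: 1. A dAE resembles the human perceptual system: when part of an object is occluded, a person can still recognize it. 2. When multimodal information (sound, images, etc.) reaches us, losing some of the modalities often matters little. 3. An ordinary autoencoder essentially learns an identity function, i.e., its reconstructed output equals its input. Such an identity representation performs poorly when the test samples do not follow the same distribution as the training samples, i.e., when they differ substantially; the dAE clearly improves on this.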

The authors also give several mathematical interpretations.

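1. The manifold-learning view. High-dimensional data generally lie on (or near) a lower-dimensional manifold. Corruption tends to push samples off this manifold, and the dAE learns to map them back onto it, so the features it learns essentially lie on this surface, as illustrated by a figure in the paper. An ordinary autoencoder, even with a sparsity constraint, does not necessarily extract features that lie on this low-dimensional surface (although it can still capture the main information in the original data).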
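The manifold view can be illustrated with a toy experiment. This is only an analogy under loud assumptions: it uses additive Gaussian noise instead of masking noise, and a linear denoiser fitted by least squares instead of a trained dAE. The clean data lie on a line (a 1-D manifold) in 2-D, and the learned denoising map turns out to be close to the projection back onto that line.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 20000, 0.3
u = np.array([1.0, 1.0]) / np.sqrt(2.0)                 # direction of the 1-D "manifold" (a line through 0)
t = rng.normal(0.0, 3.0, size=(n, 1))
X_clean = t * u                                          # clean points lie exactly on the line
X_noisy = X_clean + rng.normal(0.0, sigma, size=(n, 2))  # corruption pushes points off the line

# Best linear denoiser: W minimizes ||X_noisy @ W - X_clean||^2 (ordinary least squares)
W, *_ = np.linalg.lstsq(X_noisy, X_clean, rcond=None)

P = np.outer(u, u)                                       # orthogonal projector onto the line
print(np.round(W, 3))                                    # close to P
off_manifold = np.array([1.0, -1.0]) / np.sqrt(2.0)
print(np.linalg.norm(off_manifold @ W))                  # the off-manifold direction is almost removed
```

For this setup the least-squares solution is approximately (9 / (9 + σ²)) u uᵀ, because the data variance along the line is 9 while the direction orthogonal to it contains only noise: the denoiser keeps the on-manifold component and discards the off-manifold one.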

2. The top-down generative-model view. 3. The information-theoretic view. 4. The stochastic-operator view. These interpretations involve more mathematical formulas; see the paper for the details.

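When a deep network is trained with unsupervised pre-training of its weights, Dropout and the Denoise Autoencoder are usually applied at different stages: Dropout takes no part in the layer-wise pre-training and is introduced only during the subsequent fine-tuning, whereas the Denoise Autoencoder's corruption is applied to the input of each layer during pre-training and is not used during fine-tuning. In addition, the reconstruction error can generally be measured as the mean squared error, but when the elements of the input and output vectors are binary variables, cross-entropy is usually used to measure the difference between them.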
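A small sketch of the two reconstruction errors on a single toy sample. The normalization choices are assumptions: the MSE is averaged over all elements, and the cross-entropy is summed over dimensions and averaged over samples, which has the same form as the cost L in the Theano dA code of Experiment 2.

```python
import numpy as np

def mse(x, z):
    # Mean squared error over all elements
    return np.mean((x - z) ** 2)

def cross_entropy(x, z, eps=1e-12):
    # Bernoulli cross-entropy summed over dimensions, averaged over samples;
    # x in [0, 1], z in (0, 1); clipping only guards against log(0)
    z = np.clip(z, eps, 1 - eps)
    return np.mean(-np.sum(x * np.log(z) + (1 - x) * np.log(1 - z), axis=1))

x = np.array([[1.0, 0.0, 1.0]])    # a binary target
z = np.array([[0.9, 0.2, 0.6]])    # a reconstruction
print(mse(x, z))                   # (0.01 + 0.04 + 0.16) / 3, about 0.07
print(cross_entropy(x, z))         # -(ln 0.9 + ln 0.8 + ln 0.6), about 0.839
```

With binary targets the cross-entropy grows without bound as an output approaches the wrong extreme, while the squared error is bounded by 1 per element, which is one reason cross-entropy is preferred for bit-valued data.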

Experiments:

Experiment 1:

Again, the MNIST handwritten-digit database is used, with 60,000 training samples and 10,000 test samples, together with the MATLAB Deep Learning Toolbox (https://github.com/rasmusbergpalm/DeepLearnToolbox). The network has 2 hidden layers with 100 nodes each, so the overall architecture is 784-100-100-10. The experiment compares the test error rate with and without denoising, as well as the features learned in the two cases. The results are as follows:
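The layer-wise pre-training used in this experiment (saetrain, then copying sae.ae{i}.W{1} into nn.W{i}) can be sketched in NumPy. This is a minimal illustration, not the toolbox's code: the untied weights, cross-entropy loss, uniform ±0.1 initialization, learning rate 0.1, helper names and random stand-in data are all assumptions, and the toolbox stores each bias as an extra weight column rather than as a separate vector.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_dae_layer(X, n_hidden, corruption, lr, epochs, batch, rng):
    """One sigmoid dAE layer (untied weights, cross-entropy loss) trained with mini-batch SGD."""
    n_vis = X.shape[1]
    W1 = rng.uniform(-0.1, 0.1, (n_vis, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.1, 0.1, (n_hidden, n_vis))   # decoder weights
    b2 = np.zeros(n_vis)
    for _ in range(epochs):
        for i in range(0, len(X), batch):
            x = X[i:i + batch]
            x_t = x * (rng.random(x.shape) > corruption)  # masking noise on this layer's input
            h = sigmoid(x_t @ W1 + b1)
            z = sigmoid(h @ W2 + b2)
            dz = (z - x) / len(x)               # d(mean cross-entropy vs clean x) / d(decoder pre-activation)
            dh = (dz @ W2.T) * h * (1 - h)      # back-propagate through the sigmoid hidden layer
            W2 -= lr * (h.T @ dz)
            b2 -= lr * dz.sum(axis=0)
            W1 -= lr * (x_t.T @ dh)
            b1 -= lr * dh.sum(axis=0)
    return W1, b1                               # only the encoder is kept

def pretrain_stack(X, hidden_sizes, corruption, rng, lr=0.1, epochs=1, batch=100):
    """Greedy layer-wise pre-training: each layer is a dAE trained on the previous layer's clean output."""
    stack, inp = [], X
    for n_hidden in hidden_sizes:
        W, b = train_dae_layer(inp, n_hidden, corruption, lr, epochs, batch, rng)
        stack.append((W, b))
        inp = sigmoid(inp @ W + b)              # uncorrupted hidden output becomes the next layer's input
    return stack

rng = np.random.default_rng(0)
X = rng.random((300, 784))                      # stand-in for MNIST images scaled to [0, 1]
stack = pretrain_stack(X, [100, 100], corruption=0.5, rng=rng)

# Use the pre-trained layers to initialize a 784-100-100-10 network (output layer starts random)
W_out, b_out = rng.uniform(-0.1, 0.1, (100, 10)), np.zeros(10)
h = X
for W, b in stack:
    h = sigmoid(h @ W + b)
scores = sigmoid(h @ W_out + b_out)             # forward pass before supervised fine-tuning
print([W.shape for W, _ in stack], scores.shape)  # [(784, 100), (100, 100)] (300, 10)
```

Corruption appears only inside train_dae_layer; the fine-tuning forward pass at the end uses clean inputs, matching the point above that denoising is used during pre-training but not during fine-tuning.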

Features learned by the autoencoder without denoising:

Test error rate: 9.33%

Features learned by the denoise autoencoder:

Test error rate: 8.26%

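The results show that the autoencoder trained with added noise learns slightly better features: the test error rate drops from 9.33% to 8.26%. (No parameter tuning was done; with well-tuned parameters the results could be better still.)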

Main parts of the Experiment 1 code, with comments:

　　Test.m:

%% // Load the data
load mnist_uint8;
train_x = double(train_x)/255;
test_x  = double(test_x)/255;
train_y = double(train_y);
test_y  = double(test_y);

%% // Case 1: pre-train with a plain autoencoder (no denoising)
rng(0); % // same random seed for both cases
sae = saesetup([784 100 100]); % // at this point the W of each layer's nn has already been randomly initialized
sae.ae{1}.activation_function       = 'sigm';
sae.ae{1}.learningRate              = 1;
sae.ae{1}.inputZeroMaskedFraction   = 0.; % // 0 means no input corruption
sae.ae{2}.activation_function       = 'sigm';
sae.ae{2}.learningRate              = 1;
sae.ae{2}.inputZeroMaskedFraction   = 0.; % // for layer 2, corrupting its input acts like dropout on hidden layer 1, but only during layer-wise pre-training
opts.numepochs =   1;
opts.batchsize = 100;
sae = saetrain(sae, train_x, opts); % // unsupervised: no labels are passed; the learned weights are stored in sae.
                                    % // Pre-training is layer-wise: each step trains a single hidden layer,
                                    % // whose input is the previous layer's hidden output (corrupted during
                                    % // training when inputZeroMaskedFraction > 0). The caller's train_x is unchanged.
visualize(sae.ae{1}.W{1}(:,2:end)') % // drop the bias column and transpose: one 28x28 filter per column
% Use the SDAE to initialize a FFNN
nn = nnsetup([784 100 100 10]);
nn.activation_function              = 'sigm';
nn.learningRate                     = 1;
% add pretrained weights
nn.W{1} = sae.ae{1}.W{1}; % // overwrite the random initialization with the pre-trained weights
nn.W{2} = sae.ae{2}.W{1};
% Train the FFNN
opts.numepochs =   1;
opts.batchsize = 100;
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
str = sprintf('testing error rate is: %f', er);
disp(str)


%% // Case 2: pre-train with a denoising autoencoder
rng(0); % // same random seed for both cases
sae = saesetup([784 100 100]); % // at this point the W of each layer's nn has already been randomly initialized
sae.ae{1}.activation_function       = 'sigm';
sae.ae{1}.learningRate              = 1;
sae.ae{1}.inputZeroMaskedFraction   = 0.5; % // zero each input pixel with probability 0.5
sae.ae{2}.activation_function       = 'sigm';
sae.ae{2}.learningRate              = 1;
sae.ae{2}.inputZeroMaskedFraction   = 0.5; % // for layer 2 this acts like dropout on hidden layer 1, but only during layer-wise pre-training
opts.numepochs =   1;
opts.batchsize = 100;
sae = saetrain(sae, train_x, opts); % // unsupervised: no labels are passed; the learned weights are stored in sae.
                                    % // Pre-training is layer-wise: each step trains a single hidden layer,
                                    % // whose input is the previous layer's hidden output and is corrupted
                                    % // during training. The caller's train_x is unchanged.
figure, visualize(sae.ae{1}.W{1}(:,2:end)')
% Use the SDAE to initialize a FFNN
nn = nnsetup([784 100 100 10]);
nn.activation_function              = 'sigm';
nn.learningRate                     = 1;
% add pretrained weights
nn.W{1} = sae.ae{1}.W{1}; % // overwrite the random initialization with the pre-trained weights
nn.W{2} = sae.ae{2}.W{1};
% Train the FFNN
opts.numepochs =   1;
opts.batchsize = 100;
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
str = sprintf('testing error rate is: %f', er);
disp(str)

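As in the previous post, where we traced the Dropout code, we can trace the dAE code here. The statement that adds 50% noise to the input layer (i.e., zeroes each input with probability 0.5) when using sae is: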

sae.ae{1}.inputZeroMaskedFraction = 0.5;

Following sae's training process further: it also uses the nntrain() function, which contains the following code:

if(nn.inputZeroMaskedFraction ~= 0)
    batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction); % // add noise to the input data; rand() is uniform on (0,1)
end

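The code is self-explanatory: an input element is kept only where a uniform random number exceeds inputZeroMaskedFraction, so each element is zeroed with probability inputZeroMaskedFraction.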

Experiment 2:

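This experiment largely follows the tutorial at http://deeplearning.net/tutorial/dA.html; see the tutorial for details, as it explains things quite thoroughly. Since its dAE implementation uses the Theano library, you first need to install Theano and a series of related libraries. On Ubuntu, for example, you can follow the pages Installing Theano and Easy Installation of an optimized Theano on Ubuntu; installation succeeds easily (some minor, unimportant test failures can be ignored). The versions I used when installing Theano were: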

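Ubuntu 13.04, the Linux operating system.

Python 2.7.4, the programming language.

python-numpy 1.7.1, Python's numerical package, including matrix operations.

python-scipy 0.11, useful for sparse matrix operations.

python-pip 1.1, the Python package manager.

python-nose 1.1.2, used for testing Theano.

libopenblas-dev 0.2.6, the development package (headers and libraries) of the optimized OpenBLAS linear-algebra library.

git 1.8.1, for downloading source code.

gcc 4.7.3, the C compiler.

theano 0.6.0rc3, a Python library for defining and optimizing operations on multidimensional arrays, with GPU support.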

This experiment also uses the MNIST database, but with only one hidden layer of 500 nodes. Its only purpose is to compare the features learned by the autoencoder with and without denoising.
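For a sense of scale, the parameter count of this 784-500 autoencoder can be worked out directly. The dA class below ties the decoder weights (W_prime = W.T), so the decoder adds only a bias vector; the untied figure is a hypothetical comparison.

```python
n_visible, n_hidden = 28 * 28, 500
tied = n_visible * n_hidden + n_hidden + n_visible          # W, b, b_prime; decoder reuses W.T
untied = 2 * n_visible * n_hidden + n_hidden + n_visible    # hypothetical separate decoder matrix
print(tied, untied)                                         # 393284 785284
```

Tying the weights roughly halves the parameter count and is the convention used throughout the Theano tutorial code.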

Features without denoising:

Features with denoising:

As the figures show, the features learned with denoising are more representative.

Main parts of the Experiment 2 code, with comments:

　　dA.py:

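# -*- coding: utf-8 -*-
import cPickle
import gzip
import os
import sys
import time
import numpy
import theano
import theano.tensor as T  # common symbolic operations live in the theano.tensor submodule
from theano.tensor.shared_randomstreams import RandomStreams
from logistic_sgd import load_data  # helper module from the deeplearning.net tutorial code
from utils import tile_raster_images  # helper module from the deeplearning.net tutorial code
import PIL.Image  # used to save the filter images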

class dA(object):
    def __init__(self, numpy_rng, theano_rng=None, input=None,
                 n_visible=784, n_hidden=500,
                 W=None, bhid=None, bvis=None):
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        if not W:
            initial_W = numpy.asarray(numpy_rng.uniform(
                      low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                      high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                      size=(n_visible, n_hidden)), dtype=theano.config.floatX)
            W = theano.shared(value=initial_W, name="W", borrow=True)  # W, bvis and bhid are all shared variables
        if not bvis:
            bvis = theano.shared(value=numpy.zeros(n_visible, dtype=theano.config.floatX), borrow=True)
        if not bhid:
            bhid = theano.shared(value=numpy.zeros(n_hidden, dtype=theano.config.floatX), name="b", borrow=True)
        self.W = W
        self.b = bhid
        self.b_prime = bvis
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        if input is None:
            self.x = T.dmatrix(name="input")
        else:
            self.x = input  # symbolic input data
        self.params = [self.W, self.b, self.b_prime]

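    def get_corrupted_input(self, input, corruption_level):
        # binomial(n=1) draws 0/1 values, each equal to 1 with probability p = 1 - corruption_level;
        # multiplying by this mask zeroes each input element with probability corruption_level
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level,
                                        dtype=theano.config.floatX) * input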

    def get_hidden_values(self, input):
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        return  T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)

    def get_cost_updates(self, corruption_level, learning_rate):
        # Builds, symbolically, the reconstruction cost of one corrupted forward pass
        # and the plain SGD updates for W, b and b_prime
        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # cross-entropy against the clean x (not tilde_x), summed over the pixels of each sample
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        cost = T.mean(L)
        gparams = T.grad(cost, self.params)
        updates = []
        for param, gparam in zip(self.params, gparams):
            updates.append((param, param - learning_rate * gparam))  # each entry is a (shared variable, new value) pair
        return (cost, updates)

# Test function: trains a dA on MNIST without and with corruption and saves the learned filters
def test_dA(learning_rate=0.1, training_epochs=15,
            dataset="data/mnist.pkl.gz",
            batch_size=20, output_folder="dA_plots"):
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]  # each row of train_set_x is one sample
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size  # number of mini-batches
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix("x")  # the data is presented as rasterized images
    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)
    os.chdir(output_folder)

    # Without denoising (corruption_level = 0)
    rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))
    da = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
            n_visible=28 * 28, n_hidden=500)  # creating the dA only sets up the network structure; no data is needed yet
    cost, updates = da.get_cost_updates(corruption_level=0.,
                                        learning_rate=learning_rate)
    # theano.function() compiles a symbolic function: its input is the mini-batch index,
    # its output is the cost, and each call also applies the parameter updates
    train_da = theano.function([index], cost, updates=updates,
         givens={x: train_set_x[index * batch_size: (index + 1) * batch_size]})
    start_time = time.clock()
    for epoch in xrange(training_epochs):
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))
        print "Training epoch %d, cost " % epoch, numpy.mean(c)
    end_time = time.clock()
    training_time = (end_time - start_time)
    print >> sys.stderr, ("The no corruption code for file " +
                          os.path.split(__file__)[1] +
                          " ran for %.2fm" % ((training_time) / 60.))
    image = PIL.Image.fromarray(
        tile_raster_images(X=da.W.get_value(borrow=True).T,
                           img_shape=(28, 28), tile_shape=(10, 10),
                           tile_spacing=(1, 1)))
    image.save("filters_corruption_0.png")

    # With denoising (30% corruption)
    rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))
    da = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
            n_visible=28 * 28, n_hidden=500)
    cost, updates = da.get_cost_updates(corruption_level=0.3,
                                        learning_rate=learning_rate)  # each input pixel is zeroed with probability 0.3
    train_da = theano.function([index], cost, updates=updates,
         givens={x: train_set_x[index * batch_size:
                                  (index + 1) * batch_size]})
    start_time = time.clock()
    for epoch in xrange(training_epochs):
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))
        print "Training epoch %d, cost " % epoch, numpy.mean(c)
    end_time = time.clock()
    training_time = (end_time - start_time)
    print >> sys.stderr, ("The 30% corruption code for file " +
                          os.path.split(__file__)[1] +
                          " ran for %.2fm" % (training_time / 60.))
    image = PIL.Image.fromarray(tile_raster_images(
        X=da.W.get_value(borrow=True).T,
        img_shape=(28, 28), tile_shape=(10, 10),
        tile_spacing=(1, 1)))
    image.save("filters_corruption_30.png")
    os.chdir("../")

if __name__ == "__main__":
    test_dA()

The dAE-specific part of the code is:

def get_corrupted_input(self, input, corruption_level):
    # binomial() draws 0/1 values; each is 1 with probability p = 1 - corruption_level
    return self.theano_rng.binomial(size=input.shape, n=1, p=1 - corruption_level,
                                    dtype=theano.config.floatX) * input

References:

Vincent, P., et al. (2008). Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning, ACM.

https://github.com/rasmusbergpalm/DeepLearnToolbox

http://deeplearning.net/tutorial/dA.html

Deep learning: 41 (A Simple Understanding of Dropout)

Installing Theano

Easy Installation of an optimized Theano on Ubuntu

Author: tornadomeet. Source: http://www.cnblogs.com/tornadomeet. Reposting and sharing are welcome, but please credit the source. (Sina Weibo: tornadomeet; feel free to get in touch!)


Please cite this article's address when reposting: http://systransis.cn/yun/4277.html
