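Image captioning is the task of generating a caption for an image. Given an image like this:
Image source; license: public domain
Our goal is to describe the image in one sentence, such as "a surfer riding a wave". This tutorial uses an attention-based model, which lets us see which parts of the image the model focuses on as it generates each word.
The model architecture is similar to the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (https://arxiv.org/abs/1502.03044).
The code in this tutorial uses tf.keras (https://www.tensorflow.org/guide/keras) and eager execution (https://www.tensorflow.org/programmers_guide/eager); both links cover these tools in detail.
This notebook is an end-to-end example. When run, it downloads the MS-COCO (http://cocodataset.org/#home) dataset, uses the Inception V3 model to extract image features, trains an encoder-decoder model, and then uses that model to caption new images.
The code can be run in Colab (https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/generative_examples/image_captioning_with_attention.ipynb), but it requires TensorFlow version >= 1.9.
In this experiment, the data is shuffled and the first 30,000 captions are used for training, corresponding to about 20,000 images (one image can have several captions). Because the training set is relatively small, a single P100 GPU is enough, and training takes about two hours.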
# Import TensorFlow and enable eager execution
# This code requires TensorFlow version >=1.9
import tensorflow as tf
tf.enable_eager_execution()
# We'll generate plots of attention in order to see which parts of an image
# our model focuses on during captioning
import matplotlib.pyplot as plt
# Scikit-learn includes many helpful utilities
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
import re
import numpy as np
import os
import time
import json
from glob import glob
from PIL import Image
import pickle
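Download the MS-COCO dataset
The MS-COCO (http://cocodataset.org/#home) dataset contains more than 82,000 images, each annotated with at least 5 different captions. The code below downloads and extracts the data automatically.
Note: this is a large download. The data file is about 13GB.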
annotation_zip = tf.keras.utils.get_file("captions.zip",
                                         cache_subdir=os.path.abspath("."),
                                         origin="http://images.cocodataset.org/annotations/annotations_trainval2014.zip",
                                         extract=True)
annotation_file = os.path.dirname(annotation_zip) + "/annotations/captions_train2014.json"
name_of_zip = "train2014.zip"
# download the images only if the archive is not already present
if not os.path.exists(os.path.abspath(".") + "/" + name_of_zip):
    image_zip = tf.keras.utils.get_file(name_of_zip,
                                        cache_subdir=os.path.abspath("."),
                                        origin="http://images.cocodataset.org/zips/train2014.zip",
                                        extract=True)
    PATH = os.path.dirname(image_zip) + "/train2014/"
else:
    PATH = os.path.abspath(".") + "/train2014/"
This tutorial trains the model on 30,000 captions and their corresponding images; using more data would usually improve the quality of the results.
# read the json file
with open(annotation_file, "r") as f:
    annotations = json.load(f)
# storing the captions and the image name in vectors
all_captions = []
all_img_name_vector = []
for annot in annotations["annotations"]:
    # wrap each caption in start and end tokens for the decoder
    caption = "<start> " + annot["caption"] + " <end>"
    image_id = annot["image_id"]
    full_coco_image_path = PATH + "COCO_train2014_" + "%012d.jpg" % (image_id)

    all_img_name_vector.append(full_coco_image_path)
    all_captions.append(caption)
# shuffling the captions and image_names together
# setting a random state
train_captions, img_name_vector = shuffle(all_captions,
                                          all_img_name_vector,
                                          random_state=1)
# selecting the first 30000 captions from the shuffled set
num_examples = 30000
train_captions = train_captions[:num_examples]
img_name_vector = img_name_vector[:num_examples]
len(train_captions), len(all_captions)
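The key property of the shuffle step is that captions and image paths are permuted together, so each caption still points at its own image after the first N pairs are selected. The following is a minimal sketch of that idea using only the standard library; `shuffle_together` and the toy annotations are hypothetical stand-ins, not part of the tutorial code, and the permutation will differ from the one `sklearn.utils.shuffle` produces.

```python
import random

def shuffle_together(captions, image_paths, seed=1):
    """Shuffle two parallel lists with one permutation so pairs stay aligned."""
    if len(captions) != len(image_paths):
        raise ValueError("captions and image_paths must have the same length")
    order = list(range(len(captions)))
    random.Random(seed).shuffle(order)
    return [captions[i] for i in order], [image_paths[i] for i in order]

# Toy annotations as (image_id, caption); image 42 has two captions.
toy = [(42, "a surfer riding a wave"), (42, "a person on a surfboard"),
       (7, "a dog on a beach"), (9, "a plate of food")]
captions = [caption for _, caption in toy]
paths = ["COCO_train2014_%012d.jpg" % image_id for image_id, _ in toy]

caps, imgs = shuffle_together(captions, paths)

# Keep the first few shuffled pairs, then count the distinct images they cover.
num_examples = 3
caps, imgs = caps[:num_examples], imgs[:num_examples]
unique_images = sorted(set(imgs))
```

Because several captions can share one image, `len(unique_images)` can be smaller than `num_examples`, which is why 30,000 captions map to fewer images.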
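Preprocess the images using InceptionV3
In this step we use InceptionV3 (pretrained on ImageNet) to classify each image, and we extract features from its last convolutional layer.
First, we need to convert the images into the format InceptionV3 expects:
Resize the image to (299, 299).
Use the preprocess_input (https://www.tensorflow.org/api_docs/python/tf/keras/applications/inception_v3/preprocess_input) function to scale pixel values to the range -1 to 1, which matches the input format InceptionV3 expects.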
def load_image(image_path):
    img = tf.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize_images(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img, image_path
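To make the pixel scaling concrete, here is a small NumPy sketch of the arithmetic. It assumes the usual mapping x / 127.5 - 1 for inputs in [0, 255]; `scale_to_unit_range` is a hypothetical helper for illustration, not the library function itself.

```python
import numpy as np

def scale_to_unit_range(pixels):
    """Map pixel values in [0, 255] to floats in [-1, 1]."""
    pixels = np.asarray(pixels, dtype=np.float32)
    return pixels / 127.5 - 1.0

# A 1x1 "image" with three channels: darkest, mid-grey, brightest.
image = np.array([[[0.0, 127.5, 255.0]]])
scaled = scale_to_unit_range(image)
```

Black maps to -1, mid-grey to 0, and white to 1, so the network sees inputs centred on zero.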
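Initialize InceptionV3 and load the pretrained ImageNet weights
To use the last convolutional layer of InceptionV3 as the output, we create a tf.keras model.
Each preprocessed image is passed through the network, and the vector produced by the last layer is stored in a dictionary (image_name --> feature_vector).
After all the images have been passed through the network, the dictionary is pickled and saved to disk.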
image_model = tf.keras.applications.InceptionV3(include_top=False,
                                                weights="imagenet")
new_input = image_model.input
hidden_layer = image_model.layers[-1].output
image_features_extract_model = tf.keras.Model(new_input, hidden_layer)
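Cache the features extracted from InceptionV3
After running each image through InceptionV3, we save the output to disk. Caching the output in RAM would be faster but memory-intensive, since each image needs 8 * 8 * 2048 floats. At the time of writing, this would exceed the memory limits of Colab (these may change, but an instance currently appears to have about 12GB of memory).
A more sophisticated caching strategy could improve performance (for example, sharding the data to reduce random-access disk I/O), at the cost of more complex code.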
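The memory figure above can be checked with a little arithmetic. This is a rough sketch that assumes float32 features (4 bytes each) and the roughly 20,000 images mentioned in the introduction; `feature_cache_bytes` is a hypothetical helper, and the true number of unique images in a given run may differ.

```python
def feature_cache_bytes(num_images, grid=(8, 8), channels=2048, bytes_per_float=4):
    """Estimate the RAM needed to hold one float32 feature map per image."""
    per_image = grid[0] * grid[1] * channels * bytes_per_float
    return per_image, per_image * num_images

per_image, total = feature_cache_bytes(num_images=20000)
print(per_image / 2**20, "MiB per image")
print(round(total / 2**30, 2), "GiB in total")
```

Under these assumptions each image costs half a mebibyte, so the full cache lands in the same range as the memory of a single Colab instance, before counting the model and the rest of the program.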
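Running this step in Colab with a GPU takes about 10 minutes. If you would like to see a progress bar, install tqdm (!pip install tqdm), import it (from tqdm import tqdm), and change this line:
for img, path in image_dataset: to: for img, path in tqdm(image_dataset):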
# getting the unique images
encode_train = sorted(set(img_name_vector))
# feel free to change the batch_size according to your system configuration
image_dataset = tf.data.Dataset.from_tensor_slices(
    encode_train).map(load_image).batch(16)
for img, path in image_dataset:
    batch_features = image_features_extract_model(img)
    batch_features = tf.reshape(batch_features,
                                (batch_features.shape[0], -1, batch_features.shape[3]))
    for bf, p in zip(batch_features, path):
        path_of_feature = p.numpy().decode("utf-8")
        np.save(path_of_feature, bf.numpy())
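Two details of the caching loop are easy to miss: the 8 x 8 spatial grid is flattened into 64 locations, and `np.save` appends `.npy` to a file name that does not already end with it, which is why features can later be loaded from the image path plus `.npy`. The sketch below illustrates both with a random array standing in for a real feature map; the file name is hypothetical.

```python
import os
import tempfile
import numpy as np

# Random stand-in for one InceptionV3 feature map (shape assumed from the text).
feature_map = np.random.rand(8, 8, 2048).astype(np.float32)

# Flatten the 8x8 spatial grid into 64 locations, keeping the 2048 channels.
flat = feature_map.reshape(-1, feature_map.shape[-1])

with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, "COCO_train2014_000000000042.jpg")  # hypothetical
    np.save(image_path, flat)              # np.save appends ".npy" to the name
    saved_path = image_path + ".npy"
    file_exists = os.path.exists(saved_path)
    restored = np.load(saved_path)
```

The reshape is row-major, so location `i * 8 + j` in the flattened array corresponds to grid cell `(i, j)`.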
Next, we keep only the top 5,000 words in the vocabulary and replace every other word with the token "UNK" (unknown).
# This will find the maximum length of any caption in our dataset
def calc_max_length(tensor):
    return max(len(t) for t in tensor)
# The steps above are a general process for handling text
# choosing the top 5000 words from the vocabulary
top_k = 5000
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=top_k,
                                                  oov_token="<unk>",
                                                  filters='!"#$%&()*+.,-/:;=?@[]^_`{|}~ ')
# build the vocabulary before converting captions to sequences
tokenizer.fit_on_texts(train_captions)
train_seqs = tokenizer.texts_to_sequences(train_captions)
tokenizer.word_index = {key:value for key, value in tokenizer.word_index.items() if value <= top_k}
# putting the "<unk>" token in the word index
tokenizer.word_index[tokenizer.oov_token] = top_k + 1
# creating the tokenized vectors
train_seqs = tokenizer.texts_to_sequences(train_captions)
# creating a reverse mapping (index -> word)
index_word = {value:key for key, value in tokenizer.word_index.items()}
# padding each vector to the max_length of the captions
# if the max_length parameter is not provided, pad_sequences calculates that automatically
cap_vector = tf.keras.preprocessing.sequence.pad_sequences(train_seqs, padding="post")
# calculating the max_length
# used to store the attention weights
max_length = calc_max_length(train_seqs)
# Create training and validation sets using 80-20 split
img_name_train, img_name_val, cap_train, cap_val = train_test_split(img_name_vector,
                                                                    cap_vector,
                                                                    test_size=0.2,
                                                                    random_state=0)
len(img_name_train), len(cap_train), len(img_name_val), len(cap_val)
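The text pipeline above (limit the vocabulary to the most frequent words, map everything else to an unknown token, then pad to a common length) can be sketched in plain Python. This is an illustration of the idea, not the Keras implementation: `build_vocab`, `encode`, and `pad_post` are hypothetical helpers, and details such as filtering and tie-breaking differ from `Tokenizer`.

```python
from collections import Counter

def build_vocab(captions, top_k, unk="<unk>"):
    """Map the top_k most frequent words to ids 1..top_k; unk gets top_k + 1."""
    counts = Counter(word for c in captions for word in c.lower().split())
    word_index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common(top_k))}
    word_index[unk] = top_k + 1
    return word_index

def encode(caption, word_index, unk="<unk>"):
    """Replace each word with its id, falling back to the unk id."""
    return [word_index.get(w, word_index[unk]) for w in caption.lower().split()]

def pad_post(sequences, pad_id=0):
    """Right-pad every sequence with pad_id up to the longest length."""
    max_length = max(len(s) for s in sequences)
    return [s + [pad_id] * (max_length - len(s)) for s in sequences]

captions = ["a surfer rides a wave", "a dog runs", "a surfer falls"]
word_index = build_vocab(captions, top_k=2)
padded = pad_post([encode(c, word_index) for c in captions])
```

Id 0 is reserved for padding, which is what lets the loss function later mask padded positions by testing for zero.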
With the images and captions ready, we can build a tf.data dataset to train the model.
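# feel free to change these parameters according to your system's configuration
BATCH_SIZE = 64     # assumed value; used by dataset.batch below
BUFFER_SIZE = 1000  # assumed value; used by dataset.shuffle below
embedding_dim = 256
units = 512
vocab_size = len(tokenizer.word_index)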
# shape of the vector extracted from InceptionV3 is (64, 2048)
# these two variables represent that
features_shape = 2048
attention_features_shape = 64
# loading the numpy files
def map_func(img_name, cap):
    img_tensor = np.load(img_name.decode("utf-8") + ".npy")
    return img_tensor, cap
dataset = tf.data.Dataset.from_tensor_slices((img_name_train, cap_train))
# using map to load the numpy files in parallel
# NOTE: Be sure to set num_parallel_calls to the number of CPU cores you have
# https://www.tensorflow.org/api_docs/python/tf/py_func
dataset = dataset.map(lambda item1, item2: tf.py_func(
    map_func, [item1, item2], [tf.float32, tf.int32]), num_parallel_calls=8)
# shuffling and batching
dataset = dataset.shuffle(BUFFER_SIZE)
# https://www.tensorflow.org/api_docs/python/tf/contrib/data/batch_and_drop_remainder
dataset = dataset.batch(BATCH_SIZE)
dataset = dataset.prefetch(1)
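Interestingly, the decoder in this experiment is identical to the one in the Neural Machine Translation with Attention (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb) example.
The model architecture is inspired by the Show, Attend and Tell (https://arxiv.org/pdf/1502.03044.pdf) paper.
In this tutorial, we extract features from the lower convolutional layer of InceptionV3, which gives a feature map of shape (8, 8, 2048).
We flatten that to a shape of (64, 2048).
This vector is passed through the CNN encoder (which consists of a single fully connected layer).
The RNN (here a GRU, an improved RNN variant) then predicts the word sequence.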
def gru(units):
  # If you have a GPU, we recommend using the CuDNNGRU layer (it provides a
  # significant speedup).
  if tf.test.is_gpu_available():
    return tf.keras.layers.CuDNNGRU(units,
                                    return_sequences=True,
                                    return_state=True,
                                    recurrent_initializer="glorot_uniform")
  else:
    return tf.keras.layers.GRU(units,
                               return_sequences=True,
                               return_state=True,
                               recurrent_activation="sigmoid",
                               recurrent_initializer="glorot_uniform")
class BahdanauAttention(tf.keras.Model):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)
  def call(self, features, hidden):
    # features(CNN_encoder output) shape == (batch_size, 64, embedding_dim)

    # hidden shape == (batch_size, hidden_size)
    # hidden_with_time_axis shape == (batch_size, 1, hidden_size)
    hidden_with_time_axis = tf.expand_dims(hidden, 1)

    # score shape == (batch_size, 64, hidden_size)
    score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))

    # attention_weights shape == (batch_size, 64, 1)
    # we get 1 at the last axis because we are applying score to self.V
    attention_weights = tf.nn.softmax(self.V(score), axis=1)

    # context_vector shape after sum == (batch_size, embedding_dim)
    context_vector = attention_weights * features
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights
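The shape bookkeeping in the attention layer is easier to follow in plain NumPy. The sketch below mirrors the same additive (Bahdanau-style) computation with random matrices standing in for the trained Dense layers; biases are omitted for brevity, and `additive_attention` and `softmax` are hypothetical helpers written for this illustration.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(features, hidden, W1, W2, v):
    """features: (batch, 64, d); hidden: (batch, h). Returns context and weights."""
    hidden_with_time_axis = hidden[:, np.newaxis, :]              # (batch, 1, h)
    score = np.tanh(features @ W1 + hidden_with_time_axis @ W2)   # (batch, 64, units)
    weights = softmax(score @ v, axis=1)                          # (batch, 64, 1)
    context = (weights * features).sum(axis=1)                    # (batch, d)
    return context, weights

rng = np.random.default_rng(0)
batch, locations, d, h, units = 2, 64, 256, 512, 512
features = rng.normal(size=(batch, locations, d))
hidden = rng.normal(size=(batch, h))
W1 = rng.normal(size=(d, units)) * 0.01   # random stand-ins for trained weights
W2 = rng.normal(size=(h, units)) * 0.01
v = rng.normal(size=(units, 1)) * 0.01
context, weights = additive_attention(features, hidden, W1, W2, v)
```

The softmax runs over the 64 image locations, so each example's weights sum to 1 and the context vector is a weighted average of the location features.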
class CNN_Encoder(tf.keras.Model):
    # Since we have already extracted the features and dumped it using pickle
    # This encoder passes those features through a Fully connected layer
    def __init__(self, embedding_dim):
        super(CNN_Encoder, self).__init__()
        # shape after fc == (batch_size, 64, embedding_dim)
        self.fc = tf.keras.layers.Dense(embedding_dim)

    def call(self, x):
        x = self.fc(x)
        x = tf.nn.relu(x)
        return x
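As a shape walk-through of the encoder path, the sketch below flattens an (8, 8, 2048) feature map to (64, 2048) and applies one dense layer with ReLU at every location. The weights are random stand-ins and `encode_features` is a hypothetical helper, so only the shapes are meaningful here.

```python
import numpy as np

def encode_features(x, W, b):
    """Apply one dense layer followed by ReLU to every spatial location."""
    return np.maximum(x @ W + b, 0.0)

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(1, 8, 8, 2048))               # (batch, 8, 8, 2048)
flat = feature_map.reshape(feature_map.shape[0], -1, 2048)   # (batch, 64, 2048)

embedding_dim = 256
W = rng.normal(size=(2048, embedding_dim)) * 0.01            # random stand-in weights
b = np.zeros(embedding_dim)
encoded = encode_features(flat, W, b)                        # (batch, 64, 256)
```

The same weight matrix is shared across all 64 locations, so the layer changes the channel dimension but leaves the spatial layout untouched.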
class RNN_Decoder(tf.keras.Model):
  def __init__(self, embedding_dim, units, vocab_size):
    super(RNN_Decoder, self).__init__()
    self.units = units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = gru(self.units)
    self.fc1 = tf.keras.layers.Dense(self.units)
    self.fc2 = tf.keras.layers.Dense(vocab_size)

    self.attention = BahdanauAttention(self.units)

  def call(self, x, features, hidden):
    # defining attention as a separate model
    context_vector, attention_weights = self.attention(features, hidden)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + embedding_dim)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

    # shape == (batch_size, max_length, hidden_size)
    x = self.fc1(output)

    # x shape == (batch_size * max_length, hidden_size)
    x = tf.reshape(x, (-1, x.shape[2]))

    # output shape == (batch_size * max_length, vocab)
    x = self.fc2(x)
    return x, state, attention_weights
  def reset_state(self, batch_size):
    return tf.zeros((batch_size, self.units))
encoder = CNN_Encoder(embedding_dim)
decoder = RNN_Decoder(embedding_dim, units, vocab_size)
optimizer = tf.train.AdamOptimizer()
# We are masking the loss calculated for padding
def loss_function(real, pred):
    mask = 1 - np.equal(real, 0)
    loss_ = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=real, logits=pred) * mask
    return tf.reduce_mean(loss_)
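To see what the masking does, here is a NumPy sketch of the same idea: compute the per-example cross-entropy from logits, zero it out wherever the label is the padding id, and average. `masked_sparse_cross_entropy` is a hypothetical helper for illustration, and the example values are made up.

```python
import numpy as np

def masked_sparse_cross_entropy(labels, logits, pad_id=0):
    """Batch mean of per-example cross-entropy, with padded positions zeroed out."""
    labels = np.asarray(labels)
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)   # stabilise the softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -log_probs[np.arange(len(labels)), labels]
    mask = (labels != pad_id).astype(np.float64)
    return (per_example * mask).mean()

labels = [2, 0]                  # the second position is padding
logits = [[0.0, 0.0, 0.0],       # uniform over 3 classes
          [5.0, 1.0, 1.0]]
loss = masked_sparse_cross_entropy(labels, logits)
```

Note that the mean still divides by the full batch size, padded positions included, which matches taking a plain mean over the masked losses.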
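We extract the features stored in the corresponding .npy files and pass them through the encoder.
The encoder output, the hidden state (initialized to 0), and the decoder input (the token indices of the caption) are passed to the decoder.
Teacher forcing decides the next input to the decoder.
Teacher forcing is the technique of passing the target word, rather than the model's own prediction, as the next input to the decoder.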
# adding this in a separate cell because if you run the training cell?
# many times, the loss_plot array will be reset
loss_plot = []
EPOCHS = 20  # assumed value (not given in the text); adjust to your time budget
for epoch in range(EPOCHS):
    start = time.time()
    total_loss = 0

    for (batch, (img_tensor, target)) in enumerate(dataset):
        loss = 0

        # initializing the hidden state for each batch
        # because the captions are not related from image to image
        hidden = decoder.reset_state(batch_size=target.shape[0])
        # every caption starts with the "<start>" token
        dec_input = tf.expand_dims([tokenizer.word_index["<start>"]] * int(target.shape[0]), 1)

        with tf.GradientTape() as tape:
            features = encoder(img_tensor)

            for i in range(1, target.shape[1]):
                # passing the features through the decoder
                predictions, hidden, _ = decoder(dec_input, features, hidden)
                loss += loss_function(target[:, i], predictions)

                # using teacher forcing
                dec_input = tf.expand_dims(target[:, i], 1)

        total_loss += (loss / int(target.shape[1]))

        variables = encoder.variables + decoder.variables

        gradients = tape.gradient(loss, variables)

        optimizer.apply_gradients(zip(gradients, variables), tf.train.get_or_create_global_step())

        if batch % 100 == 0:
            print ("Epoch {} Batch {} Loss {:.4f}".format(epoch + 1,
                                                          batch,
                                                          loss.numpy() / int(target.shape[1])))
    # storing the epoch end loss value to plot later
    loss_plot.append(total_loss / len(cap_vector))

    print ("Epoch {} Loss {:.6f}".format(epoch + 1,
                                         total_loss/len(cap_vector)))
    print ("Time taken for 1 epoch {} sec ".format(time.time() - start))
plt.plot(loss_plot)
plt.title("Loss Plot")
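The difference between teacher forcing and feeding back the model's own predictions can be shown with a toy decoder. In this sketch `run_decoder` and `toy_step` are hypothetical stand-ins: `toy_step` plays the role of one decoder call and deliberately makes a mistake on one token.

```python
def run_decoder(step_fn, target, start_id, teacher_forcing):
    """Feed a decoder one step at a time and return its predictions.

    step_fn(prev_token) -> predicted_token is a hypothetical stand-in
    for one call to the real decoder.
    """
    prev, predictions = start_id, []
    for true_token in target:
        predicted = step_fn(prev)
        predictions.append(predicted)
        # Teacher forcing feeds the ground-truth token; otherwise feed the prediction.
        prev = true_token if teacher_forcing else predicted
    return predictions

# Toy "decoder" that maps each token id to the next one, but errs after token 2.
def toy_step(prev_token):
    return 9 if prev_token == 2 else prev_token + 1

target = [1, 2, 3, 4]
forced = run_decoder(toy_step, target, start_id=0, teacher_forcing=True)
free = run_decoder(toy_step, target, start_id=0, teacher_forcing=False)
```

With teacher forcing the mistake at the third step does not carry over to the fourth, because the next input is taken from the target. Without it, the wrong token is fed back in and the error compounds.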
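The evaluate function is similar to the training loop, except that it does not use teacher forcing. At each time step the decoder receives its own previous prediction, together with the hidden state and the encoder output.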
def evaluate(image):
    attention_plot = np.zeros((max_length, attention_features_shape))
    hidden = decoder.reset_state(batch_size=1)
    temp_input = tf.expand_dims(load_image(image)[0], 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))
    features = encoder(img_tensor_val)
    # decoding starts from the "<start>" token
    dec_input = tf.expand_dims([tokenizer.word_index["<start>"]], 0)
    result = []
    for i in range(max_length):
        predictions, hidden, attention_weights = decoder(dec_input, features, hidden)
        attention_plot[i] = tf.reshape(attention_weights, (-1, )).numpy()
        predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].numpy()
        result.append(index_word[predicted_id])
        # stop as soon as the model emits the "<end>" token
        if index_word[predicted_id] == "<end>":
            return result, attention_plot
        dec_input = tf.expand_dims([predicted_id], 0)
    attention_plot = attention_plot[:len(result), :]
    return result, attention_plot
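The control flow of the evaluation loop (sample a token, feed it back in, stop at the end token or at the maximum length) can be sketched independently of TensorFlow. Here `sample_token`, `generate`, and `toy_step` are hypothetical helpers; `toy_step` stands in for the decoder and puts almost all probability mass on a single next token so the behaviour is easy to follow.

```python
import numpy as np

def sample_token(logits, rng):
    """Draw one token id from the softmax distribution over logits."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def generate(step_fn, start_id, end_id, max_length, rng):
    """Feed each sampled token back in until end_id appears or max_length is hit."""
    token, result = start_id, []
    for _ in range(max_length):
        token = sample_token(step_fn(token), rng)
        result.append(token)
        if token == end_id:
            break
    return result

# Hypothetical decoder step: strongly prefers token (prev + 1), capped at the last id.
def toy_step(prev_token, vocab_size=4):
    logits = np.full(vocab_size, -50.0)
    logits[min(prev_token + 1, vocab_size - 1)] = 50.0
    return logits

rng = np.random.default_rng(0)
caption_ids = generate(toy_step, start_id=0, end_id=3, max_length=10, rng=rng)
```

Sampling rather than taking the argmax means a real model can produce different captions for the same image on different runs.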
def plot_attention(image, result, attention_plot):
    temp_image = np.array(Image.open(image))
    fig = plt.figure(figsize=(10, 10))

    len_result = len(result)
    for l in range(len_result):
        temp_att = np.resize(attention_plot[l], (8, 8))
        ax = fig.add_subplot(len_result//2, len_result//2, l+1)
        ax.set_title(result[l])
        img = ax.imshow(temp_image)
        ax.imshow(temp_att, cmap="gray", alpha=0.6, extent=img.get_extent())
    plt.tight_layout()
    plt.show()
# captions on the validation set
rid = np.random.randint(0, len(img_name_val))
image = img_name_val[rid]
real_caption = " ".join([index_word[i] for i in cap_val[rid] if i not in [0]])
result, attention_plot = evaluate(image)
print ("Real Caption:", real_caption)
print ("Prediction Caption:", " ".join(result))
plot_attention(image, result, attention_plot)
# opening the image
Image.open(image)
image_url = "https://tensorflow.org/images/surf.jpg"
image_extension = image_url[-4:]
image_path = tf.keras.utils.get_file("image" + image_extension,
                                     origin=image_url)
result, attention_plot = evaluate(image_path)
print ("Prediction Caption:", " ".join(result))
plot_attention(image_path, result, attention_plot)
# opening the image
Image.open(image_path)
Congratulations! You have trained an image captioning model with attention, and you can now try it on other image datasets. If you are interested, take a look at this example: Neural Machine Translation with Attention (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb). That translation model uses a similar architecture and translates between Spanish and English.
