Edited by taketea2018 on 2026-03-29 at 15:19:55

First version

Title changed

Introduction to Data Science, Installment 9: Let's Write a Novel with LSTM, Part 2

Tags changed

Data Science

Python

LSTM

Novel writing

Deep learning

電子工作マガジン

電波新聞社

Article type changed

Setup and usage

Body changed

# Introduction to Data Science: Learning Data Science through AI Programming

## Installment 9: Let's Write a Novel with LSTM, Part 2: Train an LSTM on Text and Save the Model

Have you grasped the characteristics of LSTM? Now let's use an LSTM to try writing a novel. This series uses Python, a language well suited to working with AI, together with the AI library Keras. The Python version is 3.10.12 and the Keras version is 3.5.0.

Before starting the Python programming, I will explain the method and the steps for having an AI write a novel. When handling Japanese with an LSTM, it is probably better to train on word units obtained through morphological analysis, but that makes things complicated, so we will first try character units.

To train an LSTM on text, we treat a string of characters as time-series data. The LSTM learns which character comes after a string of a few characters; we then generate new text by having the LSTM output the character that follows an unseen string.

## 〇 Watch the introduction video at the URL below

https://youtu.be/mb6X2IXPBeg

## 〇 Slide-format PDF guide

https://drive.google.com/file/d/1mun_MOj769r5ZZuoIGVP-oAR5djJLH9q/view?usp=drive_link

## 〇 Source text for training

sakuhin_all.txt
https://drive.google.com/file/d/1KgJG9D1SMNSTzg49vcv7KlWgCuK4KG9I/view?usp=drive_link

## 〇 Sample program

```
# MeCab/UniDic are for word-level (morphological) analysis; the character-level program below does not use them
!pip install mecab-python3
!pip install unidic
!python -m unidic download
!apt-get -q -y install mecab libmecab-dev file
!git clone --depth 1 https://github.com/neologd/mecab-unidic-neologd.git
!echo yes | mecab-unidic-neologd/bin/install-mecab-unidic-neologd -n
```

```
import io
import random
import sys

import matplotlib.pyplot as plt
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from google.colab import drive

drive.mount('/content/drive')

# Cut the text into semi-redundant sequences of maxlen characters
maxlen = 3    # number of input characters per sample
step = 1      # stride between consecutive windows
gakushu = 3   # number of training epochs
sentences = []

# The training file linked above is named sakuhin_all.txt; make this path match the file you uploaded
neta_path = "/content/drive/MyDrive/data_science/sakuhinn_all.txt"
# Keras 3 model.save() accepts only .keras or .h5/.hdf5 paths, so the ".hdf" extension is changed to ".h5"
hdf_path = "/content/drive/MyDrive/data_science/sakuhinn_all.h5"

with io.open(neta_path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

chars = sorted(list(set(text)))
print('unique characters:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
print("chars:", chars)

next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))
print("sentence:", sentences[:10])  # show only the first few; the full list can be huge

print('Vectorizing...')
# np.bool was removed in NumPy 1.24 (it returned in 2.0); the built-in bool works in every version
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

# Build the model: a single LSTM layer followed by a softmax over all characters
print('Defining the LSTM model...')
model = keras.Sequential()
model.add(keras.Input(shape=(maxlen, len(chars))))
model.add(layers.LSTM(128))
model.add(layers.Dense(len(chars)))
model.add(layers.Activation('softmax'))

optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=optimizer)
model.summary()

history = model.fit(x, y, batch_size=128, epochs=gakushu)
model.save(hdf_path)
print('***** LSTM training complete! The result has been saved. *****')
print("Saved to:", hdf_path)

# print(history.history.keys())  # check the history labels; only 'loss' here, since no metrics or validation data are used

%matplotlib inline
plt.plot(range(1, gakushu + 1), history.history['loss'], label='loss')
plt.title('model loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid()
plt.legend()
plt.savefig("loss.png")  # save before show(); saving after show() can produce a blank image
plt.show()
plt.close()
```

Upload this notebook to Google Colaboratory to check that it runs right away. It comes with sample execution results.

https://drive.google.com/file/d/14geNl2d8ASRXohV25DX6KfUWeS8j56bA/view?usp=drive_link

## 〇 Note

The published video and explanatory PDF were produced with Google NotebookLM as a summary of the series of the same title serialized in 電子工作マガジン (Denshi Kosaku Magazine), published by 電波新聞社 (Dempa Shimbunsha).
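## 〇 Sketch: building the training pairs

The sample program learns which character follows a few preceding characters by sliding a window over the text. The following is a minimal sketch of that windowing loop with `maxlen = 3` and `step = 1`, run on a short example string (the opening of 吾輩は猫である, not the actual sakuhin_all.txt corpus); the function name `make_windows` is hypothetical.

```python
def make_windows(text, maxlen=3, step=1):
    """Split text into (input string, next character) pairs, as the sample program's loop does."""
    sentences = []
    next_chars = []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i : i + maxlen])  # maxlen characters of context
        next_chars.append(text[i + maxlen])     # the character that follows them
    return sentences, next_chars


sentences, next_chars = make_windows("吾輩は猫である")
for s, c in zip(sentences, next_chars):
    print(s, "->", c)
```

With `step = 1`, a text of length L yields L - maxlen pairs, so the 7-character string gives 4 pairs, and consecutive windows overlap by maxlen - 1 characters; this overlap is what the comment in the sample program calls "semi-redundant sequences".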
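## 〇 Sketch: one-hot vectorization

The vectorization step in the sample program turns each window into a boolean array of shape (number of windows, maxlen, number of distinct characters), with one True per time step, and each target character into an array of shape (number of windows, number of distinct characters). A minimal NumPy sketch on a toy string, assuming the same sorted-character indexing as the sample program; the helper name `vectorize` is hypothetical.

```python
import numpy as np


def vectorize(sentences, next_chars, chars):
    """One-hot encode input windows (x) and their next characters (y)."""
    char_indices = {c: i for i, c in enumerate(chars)}
    maxlen = len(sentences[0])
    # The built-in bool is used as the dtype; np.bool is missing in NumPy 1.24 to 1.26
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = True   # character at time step t
        y[i, char_indices[next_chars[i]]] = True  # character to predict
    return x, y


text = "abcabd"
chars = sorted(set(text))  # ['a', 'b', 'c', 'd']
maxlen = 3
sentences = [text[i : i + maxlen] for i in range(len(text) - maxlen)]
next_chars = [text[i + maxlen] for i in range(len(text) - maxlen)]
x, y = vectorize(sentences, next_chars, chars)
print(x.shape, y.shape)  # prints: (3, 3, 4) (3, 4)
```

Each vector along the last axis of `x` has exactly one True, so the LSTM sees `maxlen` one-hot vectors per sample, and each row of `y` has the same length as the softmax output of the final Dense layer.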
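## 〇 Sketch: generating text one character at a time

Generating new text means repeatedly feeding the last `maxlen` characters to the trained model and picking the next character from its softmax output. The sketch below assumes a temperature-based sampling helper in the style of the Keras character-level text generation example; this installment does not show the author's own generation code, so treat it as an illustration only. To stay runnable without a trained model, it uses a hypothetical stand-in, `toy_predict`, in place of `model.predict`; the names `sample` and `generate` are also illustrative.

```python
import numpy as np


def sample(preds, temperature=1.0, rng=None):
    """Draw an index from a probability vector after rescaling it by temperature."""
    if rng is None:
        rng = np.random.default_rng()
    preds = np.asarray(preds, dtype=np.float64)
    logits = np.log(preds + 1e-12) / temperature  # epsilon avoids log(0)
    probs = np.exp(logits - logits.max())         # subtract the max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


def generate(seed, predict_fn, chars, maxlen, length, temperature=1.0, rng=None):
    """Extend `seed` by `length` characters, predicting one character at a time.

    `seed` should contain at least `maxlen` characters, all present in `chars`.
    """
    char_indices = {c: i for i, c in enumerate(chars)}
    generated = seed
    for _ in range(length):
        window = generated[-maxlen:]  # the last maxlen characters are the context
        x = np.zeros((1, maxlen, len(chars)), dtype=bool)
        for t, char in enumerate(window):
            x[0, t, char_indices[char]] = True
        preds = predict_fn(x)[0]      # one probability per character
        generated += chars[sample(preds, temperature, rng)]
    return generated


# Hypothetical stand-in for model.predict so the sketch runs without a trained model:
# it always predicts the character that follows the last input character in `chars`.
chars = ["a", "b", "c"]


def toy_predict(x):
    last = int(x[0, -1].argmax())
    probs = np.zeros((1, len(chars)))
    probs[0, (last + 1) % len(chars)] = 1.0
    return probs


print(generate("abc", toy_predict, chars, maxlen=3, length=5,
               temperature=0.5, rng=np.random.default_rng(0)))  # prints: abcabcab
```

With the model trained in the sample program, `predict_fn` could be `lambda x: model.predict(x, verbose=0)` and `chars` the sorted character list built there. Temperatures below 1 favor the most likely characters and tend to give repetitive text; values above 1 give more varied but less coherent text.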