Machine Learning – code-lab.net

Stable Diffusion+Windows 10+AMD GPUの環境で動かす

Stable DiffusionをWindows 10とAMD GPU上で動作させている記事、Running Stable Diffusion on Windows with an AMD GPUを見つけたので、実践。以下は作成してみたサンプル。

事前準備

まずは実行環境を確認。
・メモリ6GB以上のAMD GPU
・Pythonの3.7、3.8、3.9、3.10がインストールされている
・Gitがインストールされている。
・Hugging Face（Stable Diffusionの学習済みモデルを公開している）のアカウント
・6GBの学習モデルをダウンロード＆変換する勇気

と言うわけでPythonのインストール。PythonはMicrosoft StoreからPython 3.10をインストールしてしまうのが手がかからない。

続いてGitのインストール。私はVisual Studioと一緒にインストールされてしまっているが、単独でインストールするならGit for Windowsをインストールするのが良いかと思う。

さらにHugging Faceにユーザー登録する。ユーザー登録自体は無料。

インストール作業

ますはインストール先となるフォルダを作成する。私はD:\stable-diffusionにした。インストールには15GB程度の空き容量が必要になるはずなので、十分に空きのあるドライブを選ぶこと。PowerShellからコマンドプロンプトを開いて、インストール先フォルダに移動します。

Microsoft’s DirectMLに対応したOnnx runtimeをhttps://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightlyからonnxruntime-directmlをダウンロードする。ダウンロードするランタイムはPythonのバージョンによって異なる。Python 3.10.xをインストールしているなら、cp310(onnxruntime_directml-1.12.0-cp310-cp310-win_amd64.whl)をダウンロードする。

コマンドプロンプトまたはPowerShellのプロンプトを開いて以下のコマンドを実行。

python -m venv ./virtualenv
./virtualenv/Scripts/Activate.ps1 または virtualenv\Scripts\activate.bat
pip install diffusers==0.3.0
pip install transformers
pip install onnxruntime
pip install protobuf<3.20.x
pip install onnx
pip install pathToYourDownloadedFile/ort_nightly_whatever_version_you_got.whl --force-reinstall

Stable Diffusionの学習済みモデルをダウンロードするには、ライセンス条項に同意する必要があります。ライセンス条項を確認した上で、Hugging Faceのhttps://huggingface.co/CompVis/stable-diffusion-v1-4のページにアクセスして、「 have read the License and agree with its terms」にチェックをして、Access Repositoryをクリック。

Hugging FaceのAccess Tokenを発行する必要があります。Hugging FaceのWEBページ右上のユーザーアイコンから「Settings→Access Tokens」に進み、Access Tokenを発行します。

プロンプトから以下のコマンドを実行します。

huggingface-cli.exe login

Token:とプロンプトが表示されるので、先ほど発行したAccess Tokenを入力します。「Your token has been saved to ～」と表示されれば大丈夫です。

Stable DiffusionをOnnixに変換するためのスクリプトを「https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_stable_diffusion_checkpoint_to_onnx.py」からダウンロードします。

以下のコマンドを実行します。

python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"

学習モデルのダウンロードと変換に数時間かかります。気長に待ってください。

試しに画像を生成してみます。以下の内容をtext2img.pyとして保存して、プロンプトから実行してみてください。output.pngが無事に生成されればインストール成功です。

from diffusers import StableDiffusionOnnxPipeline
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")

prompt = "A happy celebrating robot on a mountaintop, happy, landscape, dramatic lighting, art by artgerm greg rutkowski alphonse mucha, 4k uhd'"

image = pipe(prompt).images[0] 
image.save("output.png")

生成する画像サイズや、反復計算する回数などのパラメータを指定する事もできます。それらを指定する場合には以下のコードを参考に書き換えてください。

from diffusers import StableDiffusionOnnxPipeline
import numpy as np

def get_latents_from_seed(seed: int, width: int, height:int) -> np.ndarray:
    # 1 is batch size
    latents_shape = (1, 4, height // 8, width // 8)
    # Gotta use numpy instead of torch, because torch's randn() doesn't support DML
    rng = np.random.default_rng(seed)
    image_latents = rng.standard_normal(latents_shape).astype(np.float32)
    return image_latents

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
"""
prompt: Union[str, List[str]],
height: Optional[int] = 512,
width: Optional[int] = 512,
num_inference_steps: Optional[int] = 50,
guidance_scale: Optional[float] = 7.5, # This is also sometimes called the CFG value
eta: Optional[float] = 0.0,
latents: Optional[np.ndarray] = None,
output_type: Optional[str] = "pil",
"""

seed = 50033
# Generate our own latents so that we can provide a seed.
latents = get_latents_from_seed(seed, 512, 512)
prompt = "A happy celebrating robot on a mountaintop, happy, landscape, dramatic lighting, art by artgerm greg rutkowski alphonse mucha, 4k uhd"
image = pipe(prompt, num_inference_steps=25, guidance_scale=13, latents=latents).images[0]
image.save("output.png")

制限

この時点ではいくつかの制限があります。

生成画像解像度の制約

512×512以上の解像度の画像を生成することが出来ません。これはOnnixの制約です。より高解像度の画像が必要な場合には、waifu2xなどの超解像AIを併用するなどの工夫が必要です。

NSFW（職場閲覧注意）による制約

真っ黒な画像を生成することが頻繁にあります。標準で組み込まれているNSFW（職場閲覧注意）フィルタにブロックされていることが原因です。以下の様に1行追加する事でNSFWを無効に出来るようですが、Stable Diffusionの利用規約上NSFWを無効にして生成した画像の公開は慎重に行いましょう。NSFWを無効にすると相当に高速になるという副次的効果もあるようです。

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", device_map="auto", provider="DmlExecutionProvider", max_memory=max_memory_mapping)
pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))

その他、細かなHackは続編のStable Diffusion Updatesに・・・・

UbuntuへのCNTKインストールで「E: Package ‘libpng12-0’ has no installation candidate」

Ubuntu 16.xへのCNTKをインストールする場合、install-cntk.shを実行すると次のエラーが発生します。
E: Package ‘libpng12-0’ has no installation candidate

これはlibpng12が標準のUbuntuパッケージには含まれていないためです。次のURL（http://packages.ubuntu.com/xenial/amd64/libpng12-0/download）の指示に従って、/etc/apt/sources.listにdeb http://cz.archive.ubuntu.com/ubuntu xenial main を追記してから、install-cntk.shを実行します。

the Computational Network Toolkit by Microsoft Research（CNTK）のインストール

久しぶりにCNTKを触ってみたら、以前の記事とインストール手順が大分変わっていたのでメモ。以前は必要なランタイムのインストールは手動で行うしかありませんでしたが、インストールスクリプトで自動的に行ってくれるようになっています。

対象OSとなるWindowsは64bit版のみです。CUDAに対応していますが、CUDAに対応したGPGPUを搭載していなくても動作します。

CNTKを動かすにはCUDA7.0が必要になるので、NVIDIAのホームページからCUDA 7.0 （https://developer.nvidia.com/cuda-toolkit-70）をダウンロードしてインストールします。

CNTKのバイナリをhttps://github.com/Microsoft/CNTK/releasesからダウンロードします。GPU版とCPU版が用意されていますが、私が試したときにはGPU版しか置かれていませんでした。GPGPUが無い環境でもCUDA7.０をインストールしていれば、GPU版を動作させることができます。

ダウンロードしたバイナリをC:\local\cntkに展開します。

Powershellを管理者権限で起動して以下のコマンドを実行します。これによりAnaconda3のインストール他、必要な環境の設定をおこなってくれます。

cd c:\local\cntk\Scripts\install\windows
.\install.ps1 -execute

最初に何度かセキュリティ警告が表示されますが、全て「[R]一度だけ実行する」を選択してください。

途中で以下のように選択が表示されるので１を入力してください。

1 - I agree and want to continue
Q - Quit the installation process

最後に以下のように確認が表示されるのでyを入力してください。

Do you want to continue? (y/n)

以上でインストール完了します。

サンプルを実行するにはコマンドプロンプトを開いて、最初に以下のコマンドを実行します。

c:\local\cntk\scripts\cntkpy34.bat

Microsoft Cognitive Services API公開

Microsoft Cognitive Services APIが公開されました。Windows 10で追加された画像認識や顔認証、モーション認識、Cortanaの音声認識、自然言語認識、音声合成などのもとになっている各種強力な機能を含む多くのWEB APIが新たに公開されました。ちょっと前まで機械学習に関する知識を身に着け、膨大なサンプルデータと計算資源を自ら用意して計算させなくてはできなかった事が、Web API呼び出しだけで実装できるようになったのは、破壊的に大きな変化ですよ。

しかも試しにちょっと使うだけなら、無料って言う・・・

機械学習でtotoの予測をしてみた（結果）

機械学習でtotoの予測をしてみた

13試合中5試合的中。的中率38%とランダムよりはマシだけど、ランダムと大差ない結果になりました。

ランダムと大差ないとかちょっと悔しいので、また学習モデルを検討してみようかな。

機械学習でtotoの予測をしてみた

せっかく学習したのだし多少は実益に・・・・ということで、機械学習（Deep Learning）でtotoの予測を立ててみた。
過去の対戦成績のデータはJ.Leagu Data Siteから頂いた。これだけではつまらないので、気象庁の過去の気象データ・ダウンロードから対戦時の気象情報をダウンロードした。これを元に学習をさせてみる。

チームの構成人員や、それらのパフォーマンスなど入力データを増やすのは疲れるのでやらない。むやみにデータを増やせばよいというものではないし、それらのデータは対戦結果として既にフィードバックされているので問題ないと判断した。

予測結果は次のようになったが、果たして・・・

広島 x Ｇ大阪 -> 広島勝
磐田 x 名古屋 -> 名古屋勝
広島 x 川崎Ｆ -> 川崎Ｆ勝
鳥栖 x 福岡 -> 鳥栖勝
柏 x 浦和 -> 浦和勝
湘南 x 新潟 -> 新潟勝
神戸 x 甲府 -> 神戸勝
横浜FM x 仙台 -> 引分
FC東京 x 大宮 -> FC東京勝
Ｇ大阪 x 鹿島 -> Ｇ大阪勝
東京Ｖ x 札幌 -> 札幌勝
金沢 x 長崎 -> 金沢勝
清水 x 愛媛 -> 引分
山口 x 岡山 -> 引分
熊本 x 松本 -> 引分
群馬 x 岐阜 -> 群馬勝
町田 x Ｃ大阪 -> 引分
横浜FC x 讃岐 -> 横浜FC勝
千葉 x 徳島 -> 徳島勝
京都 x 水戸 -> 水戸勝
北九州 x 山形 -> 北九州勝

the Computational Network Toolkit by Microsoft Research（CNTK）のインストール

WindowsでCNTKを動かしてみたいと思います。

対象OSとなるWindowsは64bit版のみです。CUDAに対応していますが、CUDAに対応したGPGPUを搭載していなくても動作します。

CNTKを動かすにはVisual Studio 2013のランタイムDLLが必要になるので、MicrosoftのホームページからVisual C++ Redistributable Packages for Visual Studio 2013（http://www.microsoft.com/en-us/download/details.aspx?id=40784）をダウンロードしてインストールします。

CNTKを複数台のPCを使った分散環境で動作させるにはMS-MPI SDK（https://msdn.microsoft.com/en-us/library/bb524831(v=vs.85).aspx）が必要になります。なくともスタンドアロンでは動くので、取りあえずスルーします。

とりあえずサンプルを動作させてみます。ダウンロードしたzipファイルを適当なフォルダに解凍します。cntk\cntkとcntk\exampleと言うフォルダが出来ます。\cntk\Examples\Other\Simple2dに移動して次のコマンドを実行します。

[解凍先]\cntk\cntk\cntk.exe configFile=Config\Simple.cntk

何というか、拍子抜けするぐらいお手軽です。Simple.cntkのコンフィグファイルでニューラルネットワークの構成や学習に使用するデータを指定します。設定ファイルの構文は独特ですが、それほど難いわけではありません。データを空白区切りのテキストファイルとして用意すれば、いつでもお手軽に使えそうですね。

参考：Setup CNTK on your machine

BackPropagationLearningをSIMDに対応してみた

Accord.NET(AForge.NET)のBackPropagationLearningをSIMDを使用するように修正してみた。
速度的には以前に作ったマルチスレッド対応版のBackPropagationLearningの1.2倍程度の早さになってます。
簡略版なので並列数は4に固定。したがってAVX非対応のCPUだと動きません。
並列数2だと、おそらく速度的に変わらないので作るつもりもありません。
DeepBeliefNetworkなど関連するクラスも合わせて修正すればもっと早くなるのだろうけど、とりあえずBackPropagationLearningのみの対応。流石に全部はボリューム多すぎるし・・・orz

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.Numerics;
using System.Diagnostics;

using AForge.Neuro;
using AForge.Neuro.Learning;

namespace LernDetectImage
{
    class SimdBackPropagationLearning : BackPropagationLearning
    {
        // network to teach
        private ActivationNetwork network;

        // learning rate
        private double learningRate = 0.1;

        // momentum
        private double momentum = 0.0;

        // neuron's errors
        private Vector<double>[][] neuronErrors = null;

        // weight's updates
        private Vector<double>[][][] weightsUpdates = null;

        // threshold's updates
        private Vector<double>[][] thresholdsUpdates = null;

        public new double LearningRate
        {
            get { return learningRate; }
            set
            {
                learningRate = Math.Max(0.0, Math.Min(1.0, value));
            }
        }

        public new double Momentum
        {
            get { return momentum; }
            set
            {
                momentum = Math.Max(0.0, Math.Min(1.0, value));
            }
        }

        public SimdBackPropagationLearning(ActivationNetwork network) : base(network)
        {
            this.network = network;

            // create error and deltas arrays
            neuronErrors = new Vector<double>[network.Layers.Length][];
            weightsUpdates = new Vector<double>[network.Layers.Length][][];
            thresholdsUpdates = new Vector<double>[network.Layers.Length][];

            // initialize errors and deltas arrays for each layer
            for (int i = 0; i < network.Layers.Length; i++)
            {
                Layer layer = network.Layers[i];

                neuronErrors[i] = new Vector<double>[layer.Neurons.Length / 4];
                weightsUpdates[i] = new Vector<double>[layer.Neurons.Length][];
                thresholdsUpdates[i] = new Vector<double>[layer.Neurons.Length / 4];

                // for each neuron
                for (int j = 0; j < weightsUpdates[i].Length; j++)
                {
                    weightsUpdates[i][j] = new Vector<double>[layer.InputsCount / 4];
                }
            }
        }

        public new double Run(double[] input, double[] output)
        {
            // compute the network's output
            network.Compute(input);

            // calculate network error
            double error = CalculateError(output);

            // calculate weights updates
            CalculateUpdates(input);

            // update the network
            UpdateNetwork();

            return error;

        }

        public new double RunEpoch(double[][] input, double[][] output)
        {
            double error = 0.0;

            // run learning procedure for all samples
            for (int i = 0; i < input.Length; i++)
            {
                error += Run(input[i], output[i]);
            }

            // return summary error
            return error;
        }

        private double CalculateError(double[] desiredOutput)
        {
            // current and the next layers
            Layer layer, layerNext;
            // current and the next errors arrays
            Vector<double>[] errors, errorsNext;
            // error values
            double error = 0;
            // layers count
            int layersCount = network.Layers.Length;

            // vecrorize output
            Vector<double>[] desiredOutputVector = new Vector<double>[desiredOutput.Length / 4];
            for (int i = 0; i < desiredOutputVector.Length; i++)
            {
                desiredOutputVector[i] = new Vector<double>(desiredOutput, i * 4);
            }

            // assume, that all neurons of the network have the same activation function
            IActivationFunction function = (network.Layers[0].Neurons[0] as ActivationNeuron).ActivationFunction;

            // calculate error values for the last layer first
            layer = network.Layers[layersCount - 1];
            errors = neuronErrors[layersCount - 1];
            int outputLoopCnt = layer.Neurons.Length / 4;
            Vector<double>[] errorWork = new Vector<double>[outputLoopCnt];
            Parallel.For(0, outputLoopCnt, new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
            {
                // neuron's output value
                double[] vectorInitTemp = new double[4];
                vectorInitTemp[0] = layer.Neurons[i * 4 + 0].Output;
                vectorInitTemp[1] = layer.Neurons[i * 4 + 1].Output;
                vectorInitTemp[2] = layer.Neurons[i * 4 + 2].Output;
                vectorInitTemp[3] = layer.Neurons[i * 4 + 3].Output;
                Vector<double> output = new Vector<double>(vectorInitTemp);

                // error of the neuron
                Vector<double> e = desiredOutputVector[i] - output;

                // error multiplied with activation function's derivative
                vectorInitTemp[0] = function.Derivative2(output[0]);
                vectorInitTemp[1] = function.Derivative2(output[1]);
                vectorInitTemp[2] = function.Derivative2(output[2]);
                vectorInitTemp[3] = function.Derivative2(output[3]);
                Vector<double> derivative = new Vector<double>(vectorInitTemp);
                errors[i] = e * derivative;

                // squre the error and sum it
                errorWork[i] = (e * e);
            });

            // エラー積算値の算出
            Vector<double> errorTemp = Vector<double>.Zero;
            for (int i = 0;i < outputLoopCnt;i++)
            {
                errorTemp += errorWork[i];
            }
            error = errorTemp[0] + errorTemp[1] + errorTemp[2] + errorTemp[3];

            // calculate error values for other layers
            for (int j = layersCount - 2; j >= 0; j--)
            {
                layer = network.Layers[j];
                layerNext = network.Layers[j + 1];
                errors = neuronErrors[j];
                errorsNext = neuronErrors[j + 1];

                // for all neurons of the layer
                int nextNyuronsLengthTemp = layerNext.Neurons.Length / 4;
                Parallel.For(0, (layer.Neurons.Length / 4), new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
                {
                    double[] vectorInitTemp = new double[4];
                    Vector<double> sum = Vector<double>.Zero;

                    // for all neurons of the next layer
                    for (int k = 0; k < nextNyuronsLengthTemp; k++)
                    {
                        for (int l = 0; l < 4; l++)
                        {
                            vectorInitTemp[0] = layerNext.Neurons[k * 4 + l].Weights[i * 4 + 0];
                            vectorInitTemp[1] = layerNext.Neurons[k * 4 + l].Weights[i * 4 + 1];
                            vectorInitTemp[2] = layerNext.Neurons[k * 4 + l].Weights[i * 4 + 2];
                            vectorInitTemp[3] = layerNext.Neurons[k * 4 + l].Weights[i * 4 + 3];
                            Vector<double> weightsTemp = new Vector<double>(vectorInitTemp);
                            sum += errorsNext[k] * weightsTemp;
                        }
                    }

                    vectorInitTemp[0] = function.Derivative2(layer.Neurons[i * 4 + 0].Output);
                    vectorInitTemp[1] = function.Derivative2(layer.Neurons[i * 4 + 1].Output);
                    vectorInitTemp[2] = function.Derivative2(layer.Neurons[i * 4 + 2].Output);
                    vectorInitTemp[3] = function.Derivative2(layer.Neurons[i * 4 + 3].Output);
                    Vector<double> derivative = new Vector<double>(vectorInitTemp);

                    errors[i] = sum * derivative;
                });
            }

            // return squared error of the last layer divided by 2
            return error / 2.0;
        }

        private void CalculateUpdates(double[] input)
        {
            // current and previous layers
            Layer layer, layerPrev;

            // layer's weights updates
            Vector<double>[][] layerWeightsUpdates;

            // layer's thresholds updates
            Vector<double>[] layerThresholdUpdates;

            // layer's error
            Vector<double>[] errors;

            // vecrorize input
            Vector<double>[] inputVector = new Vector<double>[input.Length / 4];
            Parallel.For(0, inputVector.Length, new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
            {
                inputVector[i] = new Vector<double>(input, i * 4);
            });

            // 1 - calculate updates for the first layer
            layer = network.Layers[0];
            errors = neuronErrors[0];
            layerWeightsUpdates = weightsUpdates[0];
            layerThresholdUpdates = thresholdsUpdates[0];

            // cache for frequently used values
            //double cachedMomentum = learningRate * momentum;
            //double cached1mMomentum = learningRate * (1 - momentum);
            Vector<double> cachedMomentum = Vector.Multiply(Vector<double>.One, learningRate * momentum);
            Vector<double> cached1mMomentum = Vector.Multiply(Vector<double>.One, learningRate * (1 - momentum));

            // for each neuron of the layer
            Parallel.For(0, (layer.Neurons.Length / 4), new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
            {
                Vector<double> cachedError = Vector.Multiply(cached1mMomentum, errors[i]);
                Vector<double>[][] neuronWeightUpdates = new Vector<double>[4][];
                neuronWeightUpdates[0] = layerWeightsUpdates[i * 4 + 0];
                neuronWeightUpdates[1] = layerWeightsUpdates[i * 4 + 1];
                neuronWeightUpdates[2] = layerWeightsUpdates[i * 4 + 2];
                neuronWeightUpdates[3] = layerWeightsUpdates[i * 4 + 3];

                // for each weight of the neuron
                int neuronWeightUpdatesTemp = neuronWeightUpdates[0].Length;
                for (int j = 0; j < neuronWeightUpdatesTemp; j++)
                {
                    // calculate weight update
                    for (int k = 0;k < 4;k++)
                    {
                        neuronWeightUpdates[k][j] = Vector.Multiply(cachedMomentum, neuronWeightUpdates[k][j]) + Vector.Multiply(cachedError[k], inputVector[j]);
                    }
                }

                // calculate treshold update
                layerThresholdUpdates[i] = Vector.Multiply(cachedMomentum, layerThresholdUpdates[i]) + cachedError;
            });


            // 2 - for all other layers
            int layersLengthTemp = network.Layers.Length;
            for (int k = 1; k < layersLengthTemp; k++)
            {
                layerPrev = network.Layers[k - 1];
                layer = network.Layers[k];
                errors = neuronErrors[k];
                layerWeightsUpdates = weightsUpdates[k];
                layerThresholdUpdates = thresholdsUpdates[k];

                // for each neuron of the layer
                int neuronWeightUpdatesTemp = layerWeightsUpdates[0].Length;
                Parallel.For(0, (layer.Neurons.Length / 4), new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
                {
                    double[] vectorInitTemp = new double[4];
                    Vector<double> cachedError = Vector.Multiply(cached1mMomentum, errors[i]);
                    Vector<double>[][] neuronWeightUpdates = new Vector<double>[4][];
                    neuronWeightUpdates[0] = layerWeightsUpdates[i * 4 + 0];
                    neuronWeightUpdates[1] = layerWeightsUpdates[i * 4 + 1];
                    neuronWeightUpdates[2] = layerWeightsUpdates[i * 4 + 2];
                    neuronWeightUpdates[3] = layerWeightsUpdates[i * 4 + 3];

                    // for each synapse of the neuron
                    for (int j = 0; j < neuronWeightUpdatesTemp; j++)
                    {
                        // calculate weight update
                        vectorInitTemp[0] = layerPrev.Neurons[j * 4 + 0].Output;
                        vectorInitTemp[1] = layerPrev.Neurons[j * 4 + 1].Output;
                        vectorInitTemp[2] = layerPrev.Neurons[j * 4 + 2].Output;
                        vectorInitTemp[3] = layerPrev.Neurons[j * 4 + 3].Output;
                        Vector<double> neuronsOutput = new Vector<double>(vectorInitTemp);
                        for (int l = 0; l < 4; l++)
                        {
                            neuronWeightUpdates[l][j] = Vector.Multiply(cachedMomentum, neuronWeightUpdates[l][j]) + Vector.Multiply(cachedError[l], neuronsOutput);
                        }
                    }

                    // calculate treshold update
                    layerThresholdUpdates[i] = Vector.Multiply(cachedMomentum, layerThresholdUpdates[i]) + cachedError;
                });
            }
        }

        private void UpdateNetwork()
        {
            // current layer
            Layer layer;
            // layer's weights updates
            Vector<double>[][] layerWeightsUpdates;
            // layer's thresholds updates
            Vector<double>[] layerThresholdUpdates;

            // for each layer of the network
            int layersLengthTemp = network.Layers.Length;
            for (int i = 0; i < layersLengthTemp; i++)
            {
                layer = network.Layers[i];
                layerWeightsUpdates = weightsUpdates[i];
                layerThresholdUpdates = thresholdsUpdates[i];

                // 誘導変数の使用
                int weightsLengthTemp = layer.Neurons[0].Weights.Length / 4;

                // for each neuron of the layer
                Parallel.For(0, (layer.Neurons.Length / 4), j =>
                {
                    ActivationNeuron[] neuron = new ActivationNeuron[4];
                    neuron[0] = layer.Neurons[j * 4 + 0] as ActivationNeuron;
                    neuron[1] = layer.Neurons[j * 4 + 1] as ActivationNeuron;
                    neuron[2] = layer.Neurons[j * 4 + 2] as ActivationNeuron;
                    neuron[3] = layer.Neurons[j * 4 + 3] as ActivationNeuron;

                    Vector<double>[][] neuronWeightUpdates = new Vector<double>[4][];
                    neuronWeightUpdates[0] = layerWeightsUpdates[j * 4 + 0];
                    neuronWeightUpdates[1] = layerWeightsUpdates[j * 4 + 1];
                    neuronWeightUpdates[2] = layerWeightsUpdates[j * 4 + 2];
                    neuronWeightUpdates[3] = layerWeightsUpdates[j * 4 + 3];

                    // for each weight of the neuron
                    for (int k = 0; k < weightsLengthTemp; k++)
                    {
                        for (int l = 0; l < 4; l++)
                        {
                            // update weight
                            neuron[l].Weights[k * 4 + 0] += neuronWeightUpdates[l][k][0];
                            neuron[l].Weights[k * 4 + 1] += neuronWeightUpdates[l][k][1];
                            neuron[l].Weights[k * 4 + 2] += neuronWeightUpdates[l][k][2];
                            neuron[l].Weights[k * 4 + 3] += neuronWeightUpdates[l][k][3];
                        }
                    }

                    // update treshold
                    neuron[0].Threshold += layerThresholdUpdates[j][0];
                    neuron[1].Threshold += layerThresholdUpdates[j][1];
                    neuron[2].Threshold += layerThresholdUpdates[j][2];
                    neuron[3].Threshold += layerThresholdUpdates[j][3];
                });
            }
        }
    }
}

Download:SimdBackPropagationLearning

ニューラルネットワークを使って回帰分析モデルを作る

地図上の座標に対して大体の緯度経度を求めたかったので、Accord.NETのニューラルネットワークを使った機械学習で回帰分析モデルを作って対処してみた。

本当はきっちり計算式を求めて算出すれば良いのだろうけれど、平面を球面に投射する変換式とか、測地系の変換とか私の数学センスでは手が出ない。そもそも元の地図がどんな図法かもしらないし・・・。と言うわけで、思考を放棄して機械に頼ることにした。

1. 学習元データを作成する

地図上の特徴的な地形の場所の座標と、緯度経度を対にした学習データを作成する。ニューラルネットワークでは学習データのない入力に対して、どのような出力が発生するか予測困難という特徴がある。変換対象内に満遍なく学習サンプルをとるのは当然として、実用上変換したい地図上の座標の上下左右の隅の辺りの学習サンプルも加えるのが重要。
ニューロンからの出力は０～１の範囲になるので、学習元データの出力がこの範囲に収まるように適当に剰余算して丸めておきます。

2.学習アルゴリズムの選択

BackPropagationLearning（誤差逆伝搬法）による学習だと誤差の収束に時間がかかるので、LevenbergMarquardtLearning（レーベンバーグ・マーカート法）を使います。

3.最適な中間層の数を推定する

学習サンプルを使ってニューラルネットワークに学習させた後、学習済ネットワークに学習サンプルを与えて計算をさせ出力の差を積分することで、学習誤差を求める。これを中間層の数を増減させながら繰り返し、学習誤差が大きくならない、最小の中間層の数を求めます。
中間層が多いほどより複雑なデータを学習することがでますが、学習サンプル間を補間するときに発振してしまう可能性が高くなります。可能な限り少ない中間層の数で学習させると、なだらかな曲線で補間される可能性が高くなります。

4.最適な学習済ニューロンを作る

最適と思われる中間層の数を確定したら、学習データを元に学習済ニューラルネットワークを複数作ります。ニューラルネットワークは初期値として与える乱数データによって、学習結果がどのような状態に収束するか変わります。したがって良好な学習結果を得るためには初期状態を変えながら複数の学習結果を得て、その結果同士を比較する必要があります。
今回私は学習誤差の小さい物を最適と判断したました。用途によっては単純に誤差の小さい物では無く、判定方法を変える必要があると思います。

5.使う

これで一応は学習モデルが出来たので、実際に使って異常な変換とか発生しないか確かめます。使いたい範囲内で変な発振してないので大丈夫っぽい。

AForge.NETのBackPropagationLearningをマルチスレッドに書き換える

AForge.NETで遊んでいる。Back Propagation Learning事態は20年以上前からあるのだが、当時は少ないメモリをやり繰りしながら一から実装するほかなかった。お手軽にライブラリでできるって素晴らしいよね。ただ残念なことにAForge.NETのBack Propagation LearningはMulti Threadに対応しないので、Parallel.Forを使って修正してみた。
マルチスレッドに対応させるための書き換えも.Net Framework 4.xではとてもお手軽にできる。今時マルチスレッドに動作しないのってすごく罪だよね。

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

using AForge.Neuro;
using AForge.Neuro.Learning;

namespace RemoveRule
{
    class MultiThreadBackPropagationLearning : BackPropagationLearning
    {
        // network to teach
        private ActivationNetwork network;

        // learning rate
        private double learningRate = 0.1;

        // momentum
        private double momentum = 0.0;

        // neuron's errors
        private double[][] neuronErrors = null;

        // weight's updates
        private double[][][] weightsUpdates = null;

        // threshold's updates
        private double[][] thresholdsUpdates = null;

        public new double  LearningRate
        {
            get { return learningRate; }
            set
            {
                learningRate = Math.Max(0.0, Math.Min(1.0, value));
            }
        }

        public new double Momentum
        {
            get { return momentum; }
            set
            {
                momentum = Math.Max(0.0, Math.Min(1.0, value));
            }
        }

        public MultiThreadBackPropagationLearning(ActivationNetwork network) : base(network)
        {
            this.network = network;

            // create error and deltas arrays
            neuronErrors = new double[network.Layers.Length][];
            weightsUpdates = new double[network.Layers.Length][][];
            thresholdsUpdates = new double[network.Layers.Length][];

            // initialize errors and deltas arrays for each layer
            for (int i = 0; i < network.Layers.Length; i++)
            {
                Layer layer = network.Layers[i];

                neuronErrors[i] = new double[layer.Neurons.Length];
                weightsUpdates[i] = new double[layer.Neurons.Length][];
                thresholdsUpdates[i] = new double[layer.Neurons.Length];

                // for each neuron
                for (int j = 0; j < weightsUpdates[i].Length; j++)
                {
                    weightsUpdates[i][j] = new double[layer.InputsCount];
                }
            }
        }

        public new double Run(double[] input, double[] output)
        {
            // compute the network's output
            network.Compute(input);

            // calculate network error
            double error = CalculateError(output);

            // calculate weights updates
            CalculateUpdates(input);

            // update the network
            UpdateNetwork();

            return error;

        }

        public new double RunEpoch(double[][] input, double[][] output)
        {
            double error = 0.0;

            // run learning procedure for all samples
            for (int i = 0; i < input.Length; i++)
            {
                error += Run(input[i], output[i]);
            }

            // return summary error
            return error;
        }

        private double CalculateError(double[] desiredOutput)
        {
            // current and the next layers
            Layer layer, layerNext;
            // current and the next errors arrays
            double[] errors, errorsNext;
            // error values
            double error = 0;
            // neuron's output value
            double output;
            // layers count
            int layersCount = network.Layers.Length;

            // assume, that all neurons of the network have the same activation function
            IActivationFunction function = (network.Layers[0].Neurons[0] as ActivationNeuron).ActivationFunction;

            // calculate error values for the last layer first
            layer = network.Layers[layersCount - 1];
            errors = neuronErrors[layersCount - 1];
            Parallel.For(0, layer.Neurons.Length, i =>
            {
                output = layer.Neurons[i].Output;

                // error of the neuron
                double e = desiredOutput[i] - output;

                // error multiplied with activation function's derivative
                double derivative;
                lock (function)
                {
                    derivative = function.Derivative2(output);
                }
                errors[i] = e * derivative;

                // squre the error and sum it
                Monitor.Enter(this);
                error += (e * e);
                Monitor.Exit(this);
            });

            // calculate error values for other layers
            for (int j = layersCount - 2; j >= 0; j--)
            {
                layer = network.Layers[j];
                layerNext = network.Layers[j + 1];
                errors = neuronErrors[j];
                errorsNext = neuronErrors[j + 1];

                // for all neurons of the layer
                Parallel.For(0, layer.Neurons.Length, i =>
                {
                    double sum = 0.0;

                    // for all neurons of the next layer
                    for (int k = 0; k < layerNext.Neurons.Length; k++)
                    {
                        sum += errorsNext[k] * layerNext.Neurons[k].Weights[i];
                    }

                    double derivative;
                    lock (function)
                    {
                        derivative = function.Derivative2(layer.Neurons[i].Output);
                    }
                    errors[i] = sum * derivative;
                });
            }

            // return squared error of the last layer divided by 2
            return error / 2.0;
        }

        private void CalculateUpdates(double[] input)
        {
            // current neuron
            Neuron neuron;

            // current and previous layers
            Layer layer, layerPrev;

            // layer's weights updates
            double[][] layerWeightsUpdates;

            // layer's thresholds updates
            double[] layerThresholdUpdates;

            // layer's error
            double[] errors;

            // error value
            // double           error;

            // 1 - calculate updates for the first layer
            layer = network.Layers[0];
            errors = neuronErrors[0];
            layerWeightsUpdates = weightsUpdates[0];
            layerThresholdUpdates = thresholdsUpdates[0];

            // cache for frequently used values
            double cachedMomentum = learningRate * momentum;
            double cached1mMomentum = learningRate * (1 - momentum);

            // for each neuron of the layer
            Parallel.For(0, layer.Neurons.Length, i =>
            {
                neuron = layer.Neurons[i];
                double cachedError = errors[i] * cached1mMomentum;
                double[] neuronWeightUpdates = layerWeightsUpdates[i];

                // for each weight of the neuron
                for (int j = 0; j < neuronWeightUpdates.Length; j++)
                {
                    // calculate weight update
                    neuronWeightUpdates[j] = cachedMomentum * neuronWeightUpdates[j] + cachedError * input[j];
                }

                // calculate treshold update
                layerThresholdUpdates[i] = cachedMomentum * layerThresholdUpdates[i] + cachedError;
            });


            // 2 - for all other layers
            for (int k = 1; k < network.Layers.Length; k++)
            {
                layerPrev = network.Layers[k - 1];
                layer = network.Layers[k];
                errors = neuronErrors[k];
                layerWeightsUpdates = weightsUpdates[k];
                layerThresholdUpdates = thresholdsUpdates[k];

                // for each neuron of the layer
                Parallel.For(0, layer.Neurons.Length, i =>
                {
                    neuron = layer.Neurons[i];
                    double cachedError = errors[i] * cached1mMomentum;
                    double[] neuronWeightUpdates = layerWeightsUpdates[i];

                    // for each synapse of the neuron
                    for (int j = 0; j < neuronWeightUpdates.Length; j++)
                    {
                        // calculate weight update
                        neuronWeightUpdates[j] = cachedMomentum * neuronWeightUpdates[j] + cachedError * layerPrev.Neurons[j].Output;
                    }

                    // calculate treshold update
                    layerThresholdUpdates[i] = cachedMomentum * layerThresholdUpdates[i] + cachedError;
                });
            }
        }

        private void UpdateNetwork()
        {
            // current layer
            Layer layer;
            // layer's weights updates
            double[][] layerWeightsUpdates;
            // layer's thresholds updates
            double[] layerThresholdUpdates;

            // for each layer of the network
            for (int i = 0; i < network.Layers.Length; i++)
            {
                layer = network.Layers[i];
                layerWeightsUpdates = weightsUpdates[i];
                layerThresholdUpdates = thresholdsUpdates[i];

                // for each neuron of the layer
                Parallel.For(0, layer.Neurons.Length, j =>
                {
                    ActivationNeuron neuron = layer.Neurons[j] as ActivationNeuron;
                    double[]  neuronWeightUpdates = layerWeightsUpdates[j];

                    // for each weight of the neuron
                    for (int k = 0; k < neuron.Weights.Length; k++)
                    {
                        // update weight
                        neuron.Weights[k] += neuronWeightUpdates[k];
                    }

                    // update treshold
                    neuron.Threshold += layerThresholdUpdates[j];
                });
            }
        }
    }
}

元々のBackPropagationLearningクラスでメンバ変数がprivate宣言されていたのがかなり残念。派生したクラスで挙動をオーバーライト出来ないので、丸ごと同じクラスを実装する他に無いのだ。こうなると、そもそもBackPropagationLearningを継承する必要すらなかったかもしれない。

Download MultiThreadBackPropagationLearning.zip