Running Stable Diffusion on Windows 10 with an AMD GPU

I found an article, Running Stable Diffusion on Windows with an AMD GPU, that gets Stable Diffusion working on Windows 10 with an AMD GPU, so I tried it myself. Below is a sample I generated.



・An AMD GPU with at least 6 GB of memory
・A Hugging Face account (Hugging Face hosts the pretrained Stable Diffusion models)

So, first install Python. The least effort is to install Python 3.10 from the Microsoft Store.

Next, install Git. In my case it was already installed together with Visual Studio, but if you install it on its own, Git for Windows is a good choice.

Then register as a user on Hugging Face. Registration itself is free.



Download onnxruntime-directml, the ONNX Runtime build that supports Microsoft's DirectML. Which wheel to download depends on your Python version. If you installed Python 3.10.x, download the cp310 wheel (onnxruntime_directml-1.12.0-cp310-cp310-win_amd64.whl).
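To double-check which wheel matches the installed interpreter, the CPython tag can be derived from the running Python. This is only a small sketch; the helper name cpython_tag is made up for illustration.

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Return the CPython wheel tag (for example 'cp310') for a version tuple."""
    major, minor = version_info[0], version_info[1]
    return f"cp{major}{minor}"


if __name__ == "__main__":
    # Prints the tag of the interpreter that runs this script.
    print(cpython_tag())
```

Run it with the same python that will be used for the virtual environment, then pick the wheel whose file name contains that tag.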


python -m venv ./virtualenv
./virtualenv/Scripts/Activate.ps1 or virtualenv\Scripts\activate.bat
pip install diffusers==0.3.0
pip install transformers
pip install onnxruntime
pip install "protobuf<3.20"
pip install onnx
pip install pathToYourDownloadedFile/ort_nightly_whatever_version_you_got.whl --force-reinstall
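After the packages are installed, it may be worth confirming that ONNX Runtime actually exposes the DirectML provider before converting the model. The sketch below uses onnxruntime.get_available_providers(); the helper pick_provider and its fallback to the CPU provider are my own assumptions, not part of the original article.

```python
def pick_provider(available):
    """Prefer DirectML and fall back to the CPU provider (hypothetical helper)."""
    for name in ("DmlExecutionProvider", "CPUExecutionProvider"):
        if name in available:
            return name
    raise RuntimeError("no usable ONNX Runtime execution provider found")


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        # get_available_providers() lists the providers of the installed build.
        print(pick_provider(ort.get_available_providers()))
```

If this prints CPUExecutionProvider, the DirectML wheel was probably not the last onnxruntime package installed.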

To download the pretrained Stable Diffusion model, you must agree to the license terms. After reviewing the license, go to the model page on Hugging Face, check "I have read the License and agree with its terms", and click Access Repository.

You also need to issue a Hugging Face access token. From the user icon at the top right of the Hugging Face website, go to Settings → Access Tokens and create a token.


huggingface-cli.exe login

A Token: prompt appears, so enter the access token you just created. If "Your token has been saved to ~" is displayed, you are all set.

Download the script that converts Stable Diffusion to ONNX.


python --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"



from diffusers import StableDiffusionOnnxPipeline
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")

prompt = "A happy celebrating robot on a mountaintop, happy, landscape, dramatic lighting, art by artgerm greg rutkowski alphonse mucha, 4k uhd"

image = pipe(prompt).images[0]
image.save("output.png")


from diffusers import StableDiffusionOnnxPipeline
import numpy as np

def get_latents_from_seed(seed: int, width: int, height:int) -> np.ndarray:
    # 1 is batch size
    latents_shape = (1, 4, height // 8, width // 8)
    # Gotta use numpy instead of torch, because torch's randn() doesn't support DML
    rng = np.random.default_rng(seed)
    image_latents = rng.standard_normal(latents_shape).astype(np.float32)
    return image_latents

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
# Parameters accepted by pipe(...) and their defaults:
#   prompt: Union[str, List[str]],
#   height: Optional[int] = 512,
#   width: Optional[int] = 512,
#   num_inference_steps: Optional[int] = 50,
#   guidance_scale: Optional[float] = 7.5,  # This is also sometimes called the CFG value
#   eta: Optional[float] = 0.0,
#   latents: Optional[np.ndarray] = None,
#   output_type: Optional[str] = "pil",

seed = 50033
# Generate our own latents so that we can provide a seed.
latents = get_latents_from_seed(seed, 512, 512)
prompt = "A happy celebrating robot on a mountaintop, happy, landscape, dramatic lighting, art by artgerm greg rutkowski alphonse mucha, 4k uhd"
image = pipe(prompt, num_inference_steps=25, guidance_scale=13, latents=latents).images[0]
image.save("output.png")
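The latent tensor in the script has one eighth of the image resolution in each dimension and four channels. The sketch below only computes that shape in plain Python so the relationship is easy to check. The function name latents_shape and the multiple-of-8 validation are my additions for illustration.

```python
def latents_shape(width, height, batch_size=1):
    """Shape of the latent tensor for a given output image size."""
    if width % 8 or height % 8:
        # Assumption: sizes that are not multiples of 8 are rejected up front.
        raise ValueError("width and height must be multiples of 8")
    # Same layout as in the script: (batch, channels, height // 8, width // 8)
    return (batch_size, 4, height // 8, width // 8)


print(latents_shape(512, 512))
```

When changing height and width in the pipe(...) call, the latents passed in need the matching shape, so both should be derived from the same width and height values.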






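The pipeline frequently produces completely black images. The cause is that the output is being blocked by the built-in NSFW (not safe for work) filter. Adding one line as shown below apparently disables the filter, but given the Stable Diffusion terms of use, be careful about publishing images generated with it disabled. Disabling the NSFW filter also seems to have the side effect of making generation considerably faster.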

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", device_map="auto", provider="DmlExecutionProvider", max_memory=max_memory_mapping)
pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))
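To make it clearer what the replacement checker does, here is the same lambda written as a named function and called on placeholder data. The strings stand in for image arrays, and clip_input is only an example of a keyword argument that gets swallowed by **kwargs. This is a sketch, not the pipeline's own code.

```python
def disabled_safety_checker(images, **kwargs):
    """Return the images unchanged and flag none of them as NSFW."""
    return images, [False] * len(images)


# Placeholder "images"; the real pipeline passes image arrays.
images = ["img0", "img1", "img2"]
out, flags = disabled_safety_checker(images, clip_input=None)
print(out is images, flags)
```

The function returns a pair: the untouched images and one False flag per image, which is why no image gets replaced by a black one.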

Other small hacks are covered in the follow-up article, Stable Diffusion Updates.

"E: Package ‘libpng12-0’ has no installation candidate" when installing CNTK on Ubuntu

When installing CNTK on Ubuntu 16.x, running install-cntk.sh fails with the following error.
E: Package ‘libpng12-0’ has no installation candidate

This happens because libpng12 is not included in the standard Ubuntu packages. Following the instructions on the referenced page, append deb xenial main to /etc/apt/sources.list and then run install-cntk.sh.

Installing the Computational Network Toolkit by Microsoft Research (CNTK)



CNTK requires CUDA 7.0, so download CUDA 7.0 from the NVIDIA website and install it.




cd c:\local\cntk\Scripts\install\windows
.\install.ps1 -execute



1 - I agree and want to continue
Q - Quit the installation process


Do you want to continue? (y/n)





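Microsoft Cognitive Services APIs released

The Microsoft Cognitive Services APIs have been released. Many new web APIs are now available, including the powerful features behind the image recognition, face authentication, motion recognition, Cortana speech recognition, natural language understanding, and speech synthesis added in Windows 10. Until recently, this kind of thing required learning machine learning and preparing huge amounts of sample data and computing resources yourself. Being able to implement it with nothing more than Web API calls is a disruptively big change.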







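Since I went to the trouble of learning this, I wanted at least a little practical benefit, so I tried predicting toto results with machine learning (deep learning).
The past match results come from the J.League Data Site. That alone would be boring, so I also downloaded the weather at the time of each match from the Japan Meteorological Agency's past weather data download service. The model is trained on this data.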
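The post does not say how the match and weather data were turned into network inputs. One common approach, shown here purely as an assumption, is to one-hot encode the two teams and append scaled weather values. The team list, the scaling constants, and the function names are all hypothetical.

```python
# Hypothetical, shortened team list; a real one would cover every club.
TEAMS = ["Hiroshima", "G Osaka", "Iwata", "Nagoya"]


def one_hot(index, size):
    """Vector of zeros with a single 1.0 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_match(home, away, temperature_c, rainfall_mm):
    """Build one input vector: home team, away team, scaled weather."""
    return (one_hot(TEAMS.index(home), len(TEAMS))
            + one_hot(TEAMS.index(away), len(TEAMS))
            # Assumed scaling so the weather values stay roughly within 0..1.
            + [temperature_c / 40.0, rainfall_mm / 50.0])


features = encode_match("Hiroshima", "G Osaka", 20.0, 0.0)
print(features)
```

The target for each row would then be one of three classes (home win, draw, away win), matching the form of the predictions listed in this post.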



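Hiroshima x G Osaka -> Hiroshima win
Iwata x Nagoya -> Nagoya win
Hiroshima x Kawasaki F -> Kawasaki F win
Tosu x Fukuoka -> Tosu win
Kashiwa x Urawa -> Urawa win
Shonan x Niigata -> Niigata win
Kobe x Kofu -> Kobe win
Yokohama FM x Sendai -> Draw
FC Tokyo x Omiya -> FC Tokyo win
G Osaka x Kashima -> G Osaka win
Tokyo V x Sapporo -> Sapporo win
Kanazawa x Nagasaki -> Kanazawa win
Shimizu x Ehime -> Draw
Yamaguchi x Okayama -> Draw
Kumamoto x Matsumoto -> Draw
Gunma x Gifu -> Gunma win
Machida x C Osaka -> Draw
Yokohama FC x Sanuki -> Yokohama FC win
Chiba x Tokushima -> Tokushima win
Kyoto x Mito -> Mito win
Kitakyushu x Yamagata -> Kitakyushu win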


Installing the Computational Network Toolkit by Microsoft Research (CNTK)



CNTK requires CUDA 7.0, so download CUDA 7.0 from the NVIDIA website and install it.

CNTK requires the Visual Studio 2013 runtime DLLs, so download the Visual C++ Redistributable Packages for Visual Studio 2013 from the Microsoft website and install them.

To run CNTK in a distributed environment across multiple PCs, the MS-MPI SDK is required. CNTK runs standalone without it, so I am skipping it for now.



[extraction folder]\cntk\cntk\cntk.exe configFile=Config\Simple.cntk


Reference: Setup CNTK on your machine



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.Numerics;
using System.Diagnostics;

using AForge.Neuro;
using AForge.Neuro.Learning;

namespace LernDetectImage
{
    // Note: this class assumes Vector<double>.Count == 4 (for example AVX2) and that
    // every layer size and the input length are multiples of 4.
    class SimdBackPropagationLearning : BackPropagationLearning
    {
        // network to teach
        private ActivationNetwork network;

        // learning rate
        private double learningRate = 0.1;

        // momentum
        private double momentum = 0.0;

        // neuron's errors
        private Vector<double>[][] neuronErrors = null;

        // weight's updates
        private Vector<double>[][][] weightsUpdates = null;

        // threshold's updates
        private Vector<double>[][] thresholdsUpdates = null;

        public new double LearningRate
        {
            get { return learningRate; }
            set { learningRate = Math.Max(0.0, Math.Min(1.0, value)); }
        }

        public new double Momentum
        {
            get { return momentum; }
            set { momentum = Math.Max(0.0, Math.Min(1.0, value)); }
        }

        public SimdBackPropagationLearning(ActivationNetwork network) : base(network)
        {
            this.network = network;

            // create error and deltas arrays
            neuronErrors = new Vector<double>[network.Layers.Length][];
            weightsUpdates = new Vector<double>[network.Layers.Length][][];
            thresholdsUpdates = new Vector<double>[network.Layers.Length][];

            // initialize errors and deltas arrays for each layer
            for (int i = 0; i < network.Layers.Length; i++)
            {
                Layer layer = network.Layers[i];

                neuronErrors[i] = new Vector<double>[layer.Neurons.Length / 4];
                weightsUpdates[i] = new Vector<double>[layer.Neurons.Length][];
                thresholdsUpdates[i] = new Vector<double>[layer.Neurons.Length / 4];

                // for each neuron
                for (int j = 0; j < weightsUpdates[i].Length; j++)
                {
                    weightsUpdates[i][j] = new Vector<double>[layer.InputsCount / 4];
                }
            }
        }

        public new double Run(double[] input, double[] output)
        {
            // compute the network's output
            network.Compute(input);

            // calculate network error
            double error = CalculateError(output);

            // calculate weights updates
            CalculateUpdates(input);

            // update the network
            UpdateNetwork();

            return error;
        }

        public new double RunEpoch(double[][] input, double[][] output)
        {
            double error = 0.0;

            // run learning procedure for all samples
            for (int i = 0; i < input.Length; i++)
            {
                error += Run(input[i], output[i]);
            }

            // return summary error
            return error;
        }

        private double CalculateError(double[] desiredOutput)
        {
            // current and the next layers
            Layer layer, layerNext;
            // current and the next errors arrays
            Vector<double>[] errors, errorsNext;
            // error values
            double error = 0;
            // layers count
            int layersCount = network.Layers.Length;

            // vectorize output
            Vector<double>[] desiredOutputVector = new Vector<double>[desiredOutput.Length / 4];
            for (int i = 0; i < desiredOutputVector.Length; i++)
            {
                desiredOutputVector[i] = new Vector<double>(desiredOutput, i * 4);
            }

            // assume, that all neurons of the network have the same activation function
            IActivationFunction function = (network.Layers[0].Neurons[0] as ActivationNeuron).ActivationFunction;

            // calculate error values for the last layer first
            layer = network.Layers[layersCount - 1];
            errors = neuronErrors[layersCount - 1];
            int outputLoopCnt = layer.Neurons.Length / 4;
            Vector<double>[] errorWork = new Vector<double>[outputLoopCnt];
            Parallel.For(0, outputLoopCnt, new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
            {
                // neuron's output value
                double[] vectorInitTemp = new double[4];
                vectorInitTemp[0] = layer.Neurons[i * 4 + 0].Output;
                vectorInitTemp[1] = layer.Neurons[i * 4 + 1].Output;
                vectorInitTemp[2] = layer.Neurons[i * 4 + 2].Output;
                vectorInitTemp[3] = layer.Neurons[i * 4 + 3].Output;
                Vector<double> output = new Vector<double>(vectorInitTemp);

                // error of the neuron
                Vector<double> e = desiredOutputVector[i] - output;

                // error multiplied with activation function's derivative
                vectorInitTemp[0] = function.Derivative2(output[0]);
                vectorInitTemp[1] = function.Derivative2(output[1]);
                vectorInitTemp[2] = function.Derivative2(output[2]);
                vectorInitTemp[3] = function.Derivative2(output[3]);
                Vector<double> derivative = new Vector<double>(vectorInitTemp);
                errors[i] = e * derivative;

                // square the error; each iteration writes only its own slot
                errorWork[i] = (e * e);
            });

            // compute the accumulated error
            Vector<double> errorTemp = Vector<double>.Zero;
            for (int i = 0; i < outputLoopCnt; i++)
            {
                errorTemp += errorWork[i];
            }
            error = errorTemp[0] + errorTemp[1] + errorTemp[2] + errorTemp[3];

            // calculate error values for other layers
            for (int j = layersCount - 2; j >= 0; j--)
            {
                layer = network.Layers[j];
                layerNext = network.Layers[j + 1];
                errors = neuronErrors[j];
                errorsNext = neuronErrors[j + 1];

                // for all neurons of the layer
                int nextNyuronsLengthTemp = layerNext.Neurons.Length / 4;
                Parallel.For(0, (layer.Neurons.Length / 4), new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
                {
                    double[] vectorInitTemp = new double[4];
                    Vector<double> sum = Vector<double>.Zero;

                    // for all neurons of the next layer
                    for (int k = 0; k < nextNyuronsLengthTemp; k++)
                    {
                        for (int l = 0; l < 4; l++)
                        {
                            vectorInitTemp[0] = layerNext.Neurons[k * 4 + l].Weights[i * 4 + 0];
                            vectorInitTemp[1] = layerNext.Neurons[k * 4 + l].Weights[i * 4 + 1];
                            vectorInitTemp[2] = layerNext.Neurons[k * 4 + l].Weights[i * 4 + 2];
                            vectorInitTemp[3] = layerNext.Neurons[k * 4 + l].Weights[i * 4 + 3];
                            Vector<double> weightsTemp = new Vector<double>(vectorInitTemp);
                            // scalar error of next-layer neuron (k * 4 + l) times its four weights
                            sum += Vector.Multiply(errorsNext[k][l], weightsTemp);
                        }
                    }

                    vectorInitTemp[0] = function.Derivative2(layer.Neurons[i * 4 + 0].Output);
                    vectorInitTemp[1] = function.Derivative2(layer.Neurons[i * 4 + 1].Output);
                    vectorInitTemp[2] = function.Derivative2(layer.Neurons[i * 4 + 2].Output);
                    vectorInitTemp[3] = function.Derivative2(layer.Neurons[i * 4 + 3].Output);
                    Vector<double> derivative = new Vector<double>(vectorInitTemp);

                    errors[i] = sum * derivative;
                });
            }

            // return squared error of the last layer divided by 2
            return error / 2.0;
        }

        private void CalculateUpdates(double[] input)
        {
            // current and previous layers
            Layer layer, layerPrev;

            // layer's weights updates
            Vector<double>[][] layerWeightsUpdates;

            // layer's thresholds updates
            Vector<double>[] layerThresholdUpdates;

            // layer's error
            Vector<double>[] errors;

            // vectorize input
            Vector<double>[] inputVector = new Vector<double>[input.Length / 4];
            Parallel.For(0, inputVector.Length, new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
            {
                inputVector[i] = new Vector<double>(input, i * 4);
            });

            // 1 - calculate updates for the first layer
            layer = network.Layers[0];
            errors = neuronErrors[0];
            layerWeightsUpdates = weightsUpdates[0];
            layerThresholdUpdates = thresholdsUpdates[0];

            // cache for frequently used values
            //double cachedMomentum = learningRate * momentum;
            //double cached1mMomentum = learningRate * (1 - momentum);
            Vector<double> cachedMomentum = Vector.Multiply(Vector<double>.One, learningRate * momentum);
            Vector<double> cached1mMomentum = Vector.Multiply(Vector<double>.One, learningRate * (1 - momentum));

            // for each neuron of the layer
            Parallel.For(0, (layer.Neurons.Length / 4), new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
            {
                Vector<double> cachedError = Vector.Multiply(cached1mMomentum, errors[i]);
                Vector<double>[][] neuronWeightUpdates = new Vector<double>[4][];
                neuronWeightUpdates[0] = layerWeightsUpdates[i * 4 + 0];
                neuronWeightUpdates[1] = layerWeightsUpdates[i * 4 + 1];
                neuronWeightUpdates[2] = layerWeightsUpdates[i * 4 + 2];
                neuronWeightUpdates[3] = layerWeightsUpdates[i * 4 + 3];

                // for each weight of the neuron
                int neuronWeightUpdatesTemp = neuronWeightUpdates[0].Length;
                for (int j = 0; j < neuronWeightUpdatesTemp; j++)
                {
                    // calculate weight update
                    for (int k = 0; k < 4; k++)
                    {
                        neuronWeightUpdates[k][j] = Vector.Multiply(cachedMomentum, neuronWeightUpdates[k][j]) + Vector.Multiply(cachedError[k], inputVector[j]);
                    }
                }

                // calculate threshold update
                layerThresholdUpdates[i] = Vector.Multiply(cachedMomentum, layerThresholdUpdates[i]) + cachedError;
            });

            // 2 - for all other layers
            int layersLengthTemp = network.Layers.Length;
            for (int k = 1; k < layersLengthTemp; k++)
            {
                layerPrev = network.Layers[k - 1];
                layer = network.Layers[k];
                errors = neuronErrors[k];
                layerWeightsUpdates = weightsUpdates[k];
                layerThresholdUpdates = thresholdsUpdates[k];

                // for each neuron of the layer
                int neuronWeightUpdatesTemp = layerWeightsUpdates[0].Length;
                Parallel.For(0, (layer.Neurons.Length / 4), new ParallelOptions { MaxDegreeOfParallelism = 16 }, i =>
                {
                    double[] vectorInitTemp = new double[4];
                    Vector<double> cachedError = Vector.Multiply(cached1mMomentum, errors[i]);
                    Vector<double>[][] neuronWeightUpdates = new Vector<double>[4][];
                    neuronWeightUpdates[0] = layerWeightsUpdates[i * 4 + 0];
                    neuronWeightUpdates[1] = layerWeightsUpdates[i * 4 + 1];
                    neuronWeightUpdates[2] = layerWeightsUpdates[i * 4 + 2];
                    neuronWeightUpdates[3] = layerWeightsUpdates[i * 4 + 3];

                    // for each synapse of the neuron
                    for (int j = 0; j < neuronWeightUpdatesTemp; j++)
                    {
                        // calculate weight update
                        vectorInitTemp[0] = layerPrev.Neurons[j * 4 + 0].Output;
                        vectorInitTemp[1] = layerPrev.Neurons[j * 4 + 1].Output;
                        vectorInitTemp[2] = layerPrev.Neurons[j * 4 + 2].Output;
                        vectorInitTemp[3] = layerPrev.Neurons[j * 4 + 3].Output;
                        Vector<double> neuronsOutput = new Vector<double>(vectorInitTemp);
                        for (int l = 0; l < 4; l++)
                        {
                            neuronWeightUpdates[l][j] = Vector.Multiply(cachedMomentum, neuronWeightUpdates[l][j]) + Vector.Multiply(cachedError[l], neuronsOutput);
                        }
                    }

                    // calculate threshold update
                    layerThresholdUpdates[i] = Vector.Multiply(cachedMomentum, layerThresholdUpdates[i]) + cachedError;
                });
            }
        }

        private void UpdateNetwork()
        {
            // current layer
            Layer layer;
            // layer's weights updates
            Vector<double>[][] layerWeightsUpdates;
            // layer's thresholds updates
            Vector<double>[] layerThresholdUpdates;

            // for each layer of the network
            int layersLengthTemp = network.Layers.Length;
            for (int i = 0; i < layersLengthTemp; i++)
            {
                layer = network.Layers[i];
                layerWeightsUpdates = weightsUpdates[i];
                layerThresholdUpdates = thresholdsUpdates[i];

                // hoist the loop bound into a local variable
                int weightsLengthTemp = layer.Neurons[0].Weights.Length / 4;

                // for each neuron of the layer
                Parallel.For(0, (layer.Neurons.Length / 4), j =>
                {
                    ActivationNeuron[] neuron = new ActivationNeuron[4];
                    neuron[0] = layer.Neurons[j * 4 + 0] as ActivationNeuron;
                    neuron[1] = layer.Neurons[j * 4 + 1] as ActivationNeuron;
                    neuron[2] = layer.Neurons[j * 4 + 2] as ActivationNeuron;
                    neuron[3] = layer.Neurons[j * 4 + 3] as ActivationNeuron;

                    Vector<double>[][] neuronWeightUpdates = new Vector<double>[4][];
                    neuronWeightUpdates[0] = layerWeightsUpdates[j * 4 + 0];
                    neuronWeightUpdates[1] = layerWeightsUpdates[j * 4 + 1];
                    neuronWeightUpdates[2] = layerWeightsUpdates[j * 4 + 2];
                    neuronWeightUpdates[3] = layerWeightsUpdates[j * 4 + 3];

                    // for each weight of the neuron
                    for (int k = 0; k < weightsLengthTemp; k++)
                    {
                        for (int l = 0; l < 4; l++)
                        {
                            // update weight
                            neuron[l].Weights[k * 4 + 0] += neuronWeightUpdates[l][k][0];
                            neuron[l].Weights[k * 4 + 1] += neuronWeightUpdates[l][k][1];
                            neuron[l].Weights[k * 4 + 2] += neuronWeightUpdates[l][k][2];
                            neuron[l].Weights[k * 4 + 3] += neuronWeightUpdates[l][k][3];
                        }
                    }

                    // update threshold
                    neuron[0].Threshold += layerThresholdUpdates[j][0];
                    neuron[1].Threshold += layerThresholdUpdates[j][1];
                    neuron[2].Threshold += layerThresholdUpdates[j][2];
                    neuron[3].Threshold += layerThresholdUpdates[j][3];
                });
            }
        }
    }
}
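The arithmetic that the listing vectorizes can be hard to see among the lane handling, so here is the scalar form of the two core formulas as a small Python sketch. It assumes a plain logistic sigmoid, whose derivative in terms of the output y is y * (1 - y). AForge's Derivative2 may apply an additional slope factor, and the function names below are mine.

```python
def derivative_from_output(y):
    """Derivative of the logistic sigmoid, expressed through its output y."""
    return y * (1.0 - y)


def output_delta(desired, actual):
    """Error term of an output neuron: (desired - actual) * f'(actual)."""
    return (desired - actual) * derivative_from_output(actual)


def momentum_update(prev_update, delta, x, learning_rate=0.1, momentum=0.0):
    """Weight update with momentum, same form as in CalculateUpdates."""
    return (learning_rate * momentum * prev_update
            + learning_rate * (1.0 - momentum) * delta * x)


delta = output_delta(1.0, 0.5)            # (1.0 - 0.5) * 0.25
update = momentum_update(0.0, delta, 2.0)
print(delta, update)
```

The SIMD version computes the same quantities, four neurons or four weights at a time.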





1. Create the training source data











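I have been playing with AForge.NET. Back-propagation learning itself has been around for more than 20 years, but back then the only option was to implement it from scratch while juggling scarce memory. Being able to do it easily with a library is wonderful. Unfortunately, AForge.NET's BackPropagationLearning is not multithreaded, so I modified it to use Parallel.For.
Rewriting code for multithreading is also very easy in .NET Framework 4.x. These days, code that does not run multithreaded feels like a real sin.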

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

using AForge.Neuro;
using AForge.Neuro.Learning;

namespace RemoveRule
{
    class MultiThreadBackPropagationLearning : BackPropagationLearning
    {
        // network to teach
        private ActivationNetwork network;

        // learning rate
        private double learningRate = 0.1;

        // momentum
        private double momentum = 0.0;

        // neuron's errors
        private double[][] neuronErrors = null;

        // weight's updates
        private double[][][] weightsUpdates = null;

        // threshold's updates
        private double[][] thresholdsUpdates = null;

        public new double LearningRate
        {
            get { return learningRate; }
            set { learningRate = Math.Max(0.0, Math.Min(1.0, value)); }
        }

        public new double Momentum
        {
            get { return momentum; }
            set { momentum = Math.Max(0.0, Math.Min(1.0, value)); }
        }

        public MultiThreadBackPropagationLearning(ActivationNetwork network) : base(network)
        {
            this.network = network;

            // create error and deltas arrays
            neuronErrors = new double[network.Layers.Length][];
            weightsUpdates = new double[network.Layers.Length][][];
            thresholdsUpdates = new double[network.Layers.Length][];

            // initialize errors and deltas arrays for each layer
            for (int i = 0; i < network.Layers.Length; i++)
            {
                Layer layer = network.Layers[i];

                neuronErrors[i] = new double[layer.Neurons.Length];
                weightsUpdates[i] = new double[layer.Neurons.Length][];
                thresholdsUpdates[i] = new double[layer.Neurons.Length];

                // for each neuron
                for (int j = 0; j < weightsUpdates[i].Length; j++)
                {
                    weightsUpdates[i][j] = new double[layer.InputsCount];
                }
            }
        }

        public new double Run(double[] input, double[] output)
        {
            // compute the network's output
            network.Compute(input);

            // calculate network error
            double error = CalculateError(output);

            // calculate weights updates
            CalculateUpdates(input);

            // update the network
            UpdateNetwork();

            return error;
        }

        public new double RunEpoch(double[][] input, double[][] output)
        {
            double error = 0.0;

            // run learning procedure for all samples
            for (int i = 0; i < input.Length; i++)
            {
                error += Run(input[i], output[i]);
            }

            // return summary error
            return error;
        }

        private double CalculateError(double[] desiredOutput)
        {
            // current and the next layers
            Layer layer, layerNext;
            // current and the next errors arrays
            double[] errors, errorsNext;
            // error values
            double error = 0;
            // layers count
            int layersCount = network.Layers.Length;

            // assume, that all neurons of the network have the same activation function
            IActivationFunction function = (network.Layers[0].Neurons[0] as ActivationNeuron).ActivationFunction;

            // calculate error values for the last layer first
            layer = network.Layers[layersCount - 1];
            errors = neuronErrors[layersCount - 1];
            // per-neuron squared errors; summed after the parallel loop to avoid a data race
            double[] squaredErrors = new double[layer.Neurons.Length];
            Parallel.For(0, layer.Neurons.Length, i =>
            {
                // neuron's output value (local to the iteration, not shared between threads)
                double output = layer.Neurons[i].Output;

                // error of the neuron
                double e = desiredOutput[i] - output;

                // error multiplied with activation function's derivative
                double derivative;
                lock (function)
                {
                    derivative = function.Derivative2(output);
                }
                errors[i] = e * derivative;

                // square the error
                squaredErrors[i] = e * e;
            });

            // sum the squared errors
            for (int i = 0; i < squaredErrors.Length; i++)
            {
                error += squaredErrors[i];
            }

            // calculate error values for other layers
            for (int j = layersCount - 2; j >= 0; j--)
            {
                layer = network.Layers[j];
                layerNext = network.Layers[j + 1];
                errors = neuronErrors[j];
                errorsNext = neuronErrors[j + 1];

                // for all neurons of the layer
                Parallel.For(0, layer.Neurons.Length, i =>
                {
                    double sum = 0.0;

                    // for all neurons of the next layer
                    for (int k = 0; k < layerNext.Neurons.Length; k++)
                    {
                        sum += errorsNext[k] * layerNext.Neurons[k].Weights[i];
                    }

                    double derivative;
                    lock (function)
                    {
                        derivative = function.Derivative2(layer.Neurons[i].Output);
                    }
                    errors[i] = sum * derivative;
                });
            }

            // return squared error of the last layer divided by 2
            return error / 2.0;
        }

        private void CalculateUpdates(double[] input)
        {
            // current and previous layers
            Layer layer, layerPrev;

            // layer's weights updates
            double[][] layerWeightsUpdates;

            // layer's thresholds updates
            double[] layerThresholdUpdates;

            // layer's error
            double[] errors;

            // 1 - calculate updates for the first layer
            layer = network.Layers[0];
            errors = neuronErrors[0];
            layerWeightsUpdates = weightsUpdates[0];
            layerThresholdUpdates = thresholdsUpdates[0];

            // cache for frequently used values
            double cachedMomentum = learningRate * momentum;
            double cached1mMomentum = learningRate * (1 - momentum);

            // for each neuron of the layer
            Parallel.For(0, layer.Neurons.Length, i =>
            {
                double cachedError = errors[i] * cached1mMomentum;
                double[] neuronWeightUpdates = layerWeightsUpdates[i];

                // for each weight of the neuron
                for (int j = 0; j < neuronWeightUpdates.Length; j++)
                {
                    // calculate weight update
                    neuronWeightUpdates[j] = cachedMomentum * neuronWeightUpdates[j] + cachedError * input[j];
                }

                // calculate threshold update
                layerThresholdUpdates[i] = cachedMomentum * layerThresholdUpdates[i] + cachedError;
            });

            // 2 - for all other layers
            for (int k = 1; k < network.Layers.Length; k++)
            {
                layerPrev = network.Layers[k - 1];
                layer = network.Layers[k];
                errors = neuronErrors[k];
                layerWeightsUpdates = weightsUpdates[k];
                layerThresholdUpdates = thresholdsUpdates[k];

                // for each neuron of the layer
                Parallel.For(0, layer.Neurons.Length, i =>
                {
                    double cachedError = errors[i] * cached1mMomentum;
                    double[] neuronWeightUpdates = layerWeightsUpdates[i];

                    // for each synapse of the neuron
                    for (int j = 0; j < neuronWeightUpdates.Length; j++)
                    {
                        // calculate weight update
                        neuronWeightUpdates[j] = cachedMomentum * neuronWeightUpdates[j] + cachedError * layerPrev.Neurons[j].Output;
                    }

                    // calculate threshold update
                    layerThresholdUpdates[i] = cachedMomentum * layerThresholdUpdates[i] + cachedError;
                });
            }
        }

        private void UpdateNetwork()
        {
            // current layer
            Layer layer;
            // layer's weights updates
            double[][] layerWeightsUpdates;
            // layer's thresholds updates
            double[] layerThresholdUpdates;

            // for each layer of the network
            for (int i = 0; i < network.Layers.Length; i++)
            {
                layer = network.Layers[i];
                layerWeightsUpdates = weightsUpdates[i];
                layerThresholdUpdates = thresholdsUpdates[i];

                // for each neuron of the layer
                Parallel.For(0, layer.Neurons.Length, j =>
                {
                    ActivationNeuron neuron = layer.Neurons[j] as ActivationNeuron;
                    double[] neuronWeightUpdates = layerWeightsUpdates[j];

                    // for each weight of the neuron
                    for (int k = 0; k < neuron.Weights.Length; k++)
                    {
                        // update weight
                        neuron.Weights[k] += neuronWeightUpdates[k];
                    }

                    // update threshold
                    neuron.Threshold += layerThresholdUpdates[j];
                });
            }
        }
    }
}
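One thing to watch when parallelizing a per-neuron loop is the running error total: adding to a single shared variable from several workers is a data race. A common pattern, sketched here in Python with a thread pool rather than Parallel.For, is to let each worker return its own squared error and sum the results afterwards. The function names are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_error(pair):
    """Squared difference for one (desired, actual) pair."""
    desired, actual = pair
    e = desired - actual
    return e * e


def half_sum_squared_error(desired, actual, workers=4):
    """Compute sum((d - a)^2) / 2 with per-item work done in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker produces its own value; nothing shared is mutated.
        partial = list(pool.map(squared_error, zip(desired, actual)))
    # The reduction happens on one thread, after all workers have finished.
    return sum(partial) / 2.0


print(half_sum_squared_error([1.0, 0.0], [0.5, 0.5]))
```

The same idea carries over to Parallel.For: write each iteration's result to its own array slot and add the slots up once the loop has finished.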

