Summary of work done during GSoC (2017-08-29, Enrico Bertino)<br />
<br />
<span style="font-family: "verdana" , sans-serif;">GSoC17 is at an end, and I want to thank my mentors and the Octave community for giving me the opportunity to take part in this unique experience.</span><br />
<br />
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">During this Google Summer of Code, my goal was to implement from scratch the Convolutional Neural Networks package for GNU Octave. It will be integrated with the already existing <i>nnet</i> package. </span></span><br />
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><br /></span></span>
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"> This was a very interesting project and a </span><span style="font-family: "verdana" , sans-serif;"><span class="" id="result_box" lang="en" tabindex="-1"><span class="">stimulating experience for both the implemented code and the theoretical base behind the algorithms treated. A part has been implemented in Octave and an other part in Python using the Tensorflow API. </span></span></span></span><br />
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><br /></span></span>
<br />
<h3>
<span style="font-family: "verdana" , sans-serif;"><span style="font-size: large;"><span style="font-family: "verdana" , sans-serif;">Code repository</span></span></span></h3>
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">All the code implemented during these months can be found in </span><span style="font-family: "verdana" , sans-serif;">my public repository: </span></span><br />
<br />
<b><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"> <a href="https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/all">https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/all</a> </span></span></b><br />
<br />
<span style="font-family: "verdana" , sans-serif;">(my username: <i>citti berto</i>, bookmark <i>enrico</i>)</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Since I implemented a completely new part of the package, I pushed the entire project in three commits and I wait for the community approving for preparing a PR for the official package [1].</span></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<h3>
<span style="font-family: "verdana" , sans-serif;"><span style="font-size: large;"><span style="font-family: "verdana" , sans-serif;">Summary</span></span></span></h3>
<span style="font-family: "verdana" , sans-serif;"><b>The first commit</b> <b>(</b></span><span style="font-family: "verdana" , sans-serif;"><b><span class="changeset-hash">ade115a, [2])</span></b> contains the layers. There is a class for each layer, with a corresponding function which calls the constructor. All the layers inherit from a Layer class which lets the user create a layers concatenation, that is the network architecture. Layers have several parameters, <span class="" id="result_box" lang="en" tabindex="-1">for which I have guaranteed the compatibility with Matlab [3].</span></span><br />
<br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;"><b>The second commit (</b></span><span style="font-family: "verdana" , sans-serif;"><b><span class="changeset-hash">479ecc5 </span>[4]) </b>is about the Python part, including an init file checking the Tensorflow installation. I implemented a Python module, <i>TFintefrace</i>, which includes:</span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<ul>
<li><span style="font-family: "verdana" , sans-serif;"><i>layer.py:</i> an abstract class for layers inheritance</span></li>
<li><span style="font-family: "verdana" , sans-serif;"><i>layers/layers.py:</i> the layer classes that are used to add the right layer to the TF graph</span></li>
<li><span style="font-family: "verdana" , sans-serif;"><i>dataset.py:</i> a class for managing the datasets input</span></li>
<li><span style="font-family: "verdana" , sans-serif;"><i>trainNetwork.py:</i> the core class, which initiates the graph and the session, performs the training and the predictions</span></li>
<li><span style="font-family: "verdana" , sans-serif;"><i>deepdream.py:</i> a version of [5] for deepdream implementation (it has to be completed)</span></li>
</ul>
<br />
<br />
<span style="font-family: "verdana" , sans-serif;"><b>The third commit (</b></span><span style="font-family: "verdana" , sans-serif;"><b><span class="changeset-hash">e7201d8 [6])</span> </b>includes:</span><br />
<ul>
<li><span style="font-family: "verdana" , sans-serif;"><i>trainingOptions</i>: All the options for the training. Up to now, the only optimizer available is the stochastic gradient descent with momentum (sgdm) implemented in the class TrainingOptionsSGDM.</span></li>
<li><span style="font-family: "verdana" , sans-serif;"><i>trainNetwork</i>: passing the data, the architecture and the options, this function performs the training and returns a SeriesNetwork object</span></li>
<li><span style="font-family: "verdana" , sans-serif;"><i>SeriesNetwork</i>: class that contains the trained network, including the Tensorflow graph and session. This has three methods</span>
<ul>
<li><span style="font-family: "verdana" , sans-serif;"><i><span style="font-family: "verdana" , sans-serif;"><i>predict: </i></span></i><span style="font-family: "verdana" , sans-serif;">predicting </span><span style="font-family: "verdana" , sans-serif;">scores for regression problems</span><i><span style="font-family: "verdana" , sans-serif;"><i></i></span></i></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><i>classify: </i>predicting labels for classification problems<i><br /></i></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><i>activations:</i></span> <span style="font-family: Verdana,sans-serif;">getting the output of a specific layer of the architecture</span></li>
</ul>
<span style="font-family: "verdana" , sans-serif;"><i> </i></span></li>
</ul>
<h4>
<span style="font-family: "verdana" , sans-serif; font-size: large;"><span style="font-size: medium;"><span style="font-family: "verdana" , sans-serif;">Goals not met</span></span></span></h4>
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">I did not manage to implement some features because of the lack of time due to the bug fixing in the last period. </span><span class="short_text" id="result_box" lang="en" tabindex="-1">The problem was the </span><span class="short_text" id="result_box" lang="en" tabindex="-1"><span class="short_text" id="result_box" lang="en" tabindex="-1">conspicuous time</span> spent testing the algorithms (because of the different random generators between Matlab, Octave and Python/Tensorflow). I will work in the next weeks to implement the missing features and I plan to continue to contribute to maintaining this package to keep it up to date with both Tensorflow new versions and Matlab new feature. </span></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<br />
<table>
<tbody>
<tr>
<th><span style="font-family: "verdana" , sans-serif;">Function</span></th>
<th><span style="font-family: "verdana" , sans-serif;">Missing features</span></th>
</tr>
<tr>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>activations</i></span></span></span></td>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-size: small;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>OutputAs</i> (for changing output format)</span></span></span></span></td>
</tr>
<tr>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>imageInputLayer</i></span></span></span></td>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-size: small;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>DataAugmentation</i> and <i>Normalization</i></span></span></span></span></td>
</tr>
<tr>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>trainNetwork</i></span></span></span></td>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-size: small;">Accepted inputs: <i>imds</i> or <i>tbl</i></span></span></td>
</tr>
<tr>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>trainNetwork</i></span></span></span></td>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-size: small;"><i>.mat Checkpoints</i></span></span></td>
</tr>
<tr>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>trainNetwork</i></span></span></span></td>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-size: small;"><i>ExecutionEnvironment: 'multi-gpu' </i>and<i> 'parallel'</i></span></span></td>
</tr>
<tr>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>ClassificationOutputLayer </i></span></span></span></td>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-size: small;"><i><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">classnames</span></span></i></span></span></td>
</tr>
<tr>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>TrainingOptions</i></span></span></span></td>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-size: small;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><i>WorkerLoad</i> and <i>OutputFcn</i></span></span></span></span></td>
</tr>
<tr>
<td><span style="font-family: "verdana" , sans-serif;">DeepDreamImages</span></td>
<td><span style="font-family: "verdana" , sans-serif;"><span style="font-size: small;">Generalization to any network and AlexNet example</span></span></td>
</tr>
</tbody></table>
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<h3>
<span style="font-family: "verdana" , sans-serif;"><span style="font-size: large;"><span style="font-family: "verdana" , sans-serif;">Tutorial for testing the package</span></span></span></h3>
<ol>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Install Python Tensorflow API (as explained in [4]) </span></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Install Pytave (following these instructions [5])</span></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Install nnet package (In Octave: install [6] and load [7])</span></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Check the package with <i>make check PYTAVE="pytave/dir/"</i> </span></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Open Octave, add the Pytave dir the the paths and run your first network:</span></span></li>
</ol>
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "courier new" , "courier" , monospace;"><br />### TRAINING ###<br /># Load the training set<br />[XTrain,TTrain] = digitTrain4DArrayData();<br /> </span></span><br />
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "courier new" , "courier" , monospace;"># Define the layers<br />layers = [imageInputLayer([28 28 1]);<br /> convolution2dLayer(5,20);<br /> reluLayer();<br /> maxPooling2dLayer(2,'Stride',2);<br /> fullyConnectedLayer(10);<br /> softmaxLayer();<br /> classificationLayer()];<br /><br /># Define the training options<br />options = trainingOptions('sgdm', 'MaxEpochs', 15, 'InitialLearnRate', 0.04);<br /><br /># Train the network<br />net = trainNetwork(XTrain,TTrain,layers,options);<br /><br /><br />### TESTING ###<br /># Load the testing set<br />[XTest,TTest]= digitTest4DArrayData();<br /><br /># Predict the new labels<br />YTestPred = classify(net,XTest);</span></span></span><br />
<br />
<h4>
<span style="font-family: "verdana" , sans-serif;"><span style="font-size: large;"><span style="font-family: "verdana" , sans-serif;">Future improvements</span></span></span></h4>
<ul>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Manage the session saving</span></span></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Save the checkpoints as .mat files and not as TF checkpoints </span></span></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Optimize array passage via Pytave </span></span></li>
<li><span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">Categorical variables for classification problems </span></span></li>
</ul>
<br />
<h4>
<span style="font-size: large;"><span style="font-family: "verdana" , sans-serif;">Links</span></span></h4>
<span style="font-family: "verdana" , sans-serif;"></span><br />
<span style="font-family: "verdana" , sans-serif;">Repo link: <a href="https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/all">https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/all</a></span><br />
<span style="font-family: "verdana" , sans-serif;"><br /></span>
<span style="font-family: "verdana" , sans-serif;">[1] <a href="https://sourceforge.net/p/octave/nnet/ci/default/tree/">https://sourceforge.net/p/octave/nnet/ci/default/tree/</a></span><br />
<span style="font-family: "verdana" , sans-serif;">[2] <a href="https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/ade115a0ce0c80eb2f617622d32bfe3b2a729b13">https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/ade115a0ce0c80eb2f617622d32bfe3b2a729b13</a></span><br />
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">[3] <a href="https://it.mathworks.com/help/nnet/classeslist.html?s_cid=doc_ftr">https://it.mathworks.com/help/nnet/classeslist.html?s_cid=doc_ftr</a></span> </span><br />
<span style="font-family: "verdana" , sans-serif;">[4] <a href="https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/479ecc5c1b81dd44c626cc5276ebff5e9f509e84">https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/479ecc5c1b81dd44c626cc5276ebff5e9f509e84</a> </span><br />
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">[5] <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/tutorials/deepdream">https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/tutorials/deepdream</a></span> </span><br />
<span style="font-family: "verdana" , sans-serif;">[6] <a href="https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/e7201d8081ca3c39f335067ca2a117e7971b5087">https://bitbucket.org/cittiberto/gsoc-octave-nnet/commits/e7201d8081ca3c39f335067ca2a117e7971b5087</a> </span><br />
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">[7] <a href="https://www.tensorflow.org/install/">https://www.tensorflow.org/install/</a></span></span><br />
<span style="font-family: "verdana" , sans-serif;"><span style="font-family: "verdana" , sans-serif;">[8] <a href="https://bitbucket.org/mtmiller/pytave">https://bitbucket.org/mtmiller/pytave</a></span></span><br />
<span style="font-family: "verdana" , sans-serif;">[9] <a href="https://www.gnu.org/software/octave/doc/interpreter/Installing-and-Removing-Packages.html">https://www.gnu.org/software/octave/doc/interpreter/Installing-and-Removing-Packages.html</a></span><br />
<span style="font-family: "verdana" , sans-serif;">[10] <a href="https://www.gnu.org/software/octave/doc/v4.0.1/Using-Packages.html">https://www.gnu.org/software/octave/doc/v4.0.1/Using-Packages.html</a></span>Enrico Bertinohttp://www.blogger.com/profile/03936797921360245682noreply@blogger.com3tag:blogger.com,1999:blog-3202743535685562583.post-42785794561545122712017-07-27T20:34:00.000+02:002017-07-27T20:34:59.705+02:00Deep learning functions<span style="font-family: "times" , "times new roman" , serif;">Hi there,</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">the second part of the project is finishing. This period was quite interesting because I had to dive into the theory behind Neural Networks In particular [1], [2], [3] and [4] were very useful and I will sum up some concepts here below. On the other hand,</span><span style="font-family: times, "times new roman", serif;"> coding became more challenging and t</span><span style="font-family: times, "times new roman", serif;">he focus was on the python layer and in particular the way to structure the class in order to make everything scalable and generalizable. Summarizing the situation, in the first period I implemented all the Octave classes for the user interface. Those are Matlab compatible and they call some Python function in a seamless way. On the Python side, the TensorFlow API is used to build the graph of the Neural Network and perform training, evaluation and prediction.</span><br />
<br />
<span style="font-family: "times" , "times new roman" , serif;">I implemented the three core functions: trainNetwork, SeriesNetwork and trainingOptions. To do this, I used a Python class in which I initialize an object with the graph of the network and I store this object as attribute of SeriesNetwork. Doing that, I call the methods of this class from trainNetwork to perform the training and from predict/classify to perform the predictions. Since it was quite hard to have a clear vision of the situation, I used a Python wrapper (Keras) that allowed me to focus on the integration, "unpack" the problem and go forth "module" by "module". Now I am removing the dependency on the Keras library using directly the Tensorflow API. The code in my repo [5].</span><br />
<br />
<span style="font-family: "times" , "times new roman" , serif;">Since I have already explained in last posts how I structured the package, in this post I would like to focus on the theoretical basis of the deep learning functions used in the package. In particular I will present the available layers and the parameters that are available for the training. </span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<br />
<h2>
<span style="font-family: "times" , "times new roman" , serif; font-size: x-large;">Theoretical dive</span></h2>
<h3>
<span style="font-family: "times" , "times new roman" , serif; font-size: large;">I. Fundamentals</span></h3>
<div>
<span style="font-family: "times" , "times new roman" , serif;">I want to start with a brief explanation about the perceptron and the back propagation, two key concepts in the artificial neural networks world. </span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<b><span style="font-family: "times" , "times new roman" , serif;">Neurons</span></b><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">Let's start from the perceptron, that is the starting point for understanding neural networks and its components. A perceptron is simply a "node" that takes several binary inputs, <span style="text-align: center;"> $ x_1, x_2, ... $, </span>and produces a single binary output:</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">The neuron's output, 0 or 1, is determined by whether the linear combination of the inputs $ \omega \cdot x = \sum_j \omega_j x_j $ is less than or greater than some threshold value. That is a simple mathematical model but is very versatile and powerful because we can combine many perceptrons and varying the weights and the threshold we can get different models. Moving the threshold to the other side of the inequality and replacing it by what's known as the perceptron's bias, b = −threshold, we can rewrite it as</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ out = \bigg \{ \begin{array}{rl} 0 & \omega \cdot x + b \leq 0 \\ 1 & \omega \cdot x + b > 0 \\ \end{array} $ </span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "times" , "times new roman" , serif;">Using the perceptrons like artificial neurons of a network, it turns out that we can devise learning algorithms which can automatically tune the weights and biases. This tuning happens in response to external stimuli, without direct intervention by a programmer and this enables us to have an "automatic" learning.</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">Speaking about learning algorithms, the proceedings are simple: we suppose we make a small change in some weight or bias and what see the corresponding change in the output from the network. If a small change in a weight or bias causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. The problem is that this isn't what happens when our network contains perceptrons since a small change of any single perceptron can sometimes cause the output of that perceptron to completely flip, say from 0 to 1. We can overcome this problem by introducing an activation function. Instead of the binary output we use a function depending on weights and bias. The most common is the sigmoid function:</span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ \sigma (\omega \cdot x + b ) = \dfrac{1}{1 + e^{-(\omega \cdot x + b ) } } $ </span></div>
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiqzTOtN1msQtPdEeh6-yZPFvnxqd0AirE_PiBqpnKalMaS_rN2HfqazSbFLmlqGoI57P6v-o2SoVYVtQf2WJkZuUzyE2ikfF2RRgXVBpFdf1JGhUKI5dXcCdOz4olpIYW4UjJwyshzzV5/s1600/68747470733a2f2f75706c6f61642e77696b696d656469612e6f72672f77696b6970656469612f636f6d6d6f6e732f382f38632f50657263657074726f6e5f6d6f6a2e706e67.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="308" data-original-width="817" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiqzTOtN1msQtPdEeh6-yZPFvnxqd0AirE_PiBqpnKalMaS_rN2HfqazSbFLmlqGoI57P6v-o2SoVYVtQf2WJkZuUzyE2ikfF2RRgXVBpFdf1JGhUKI5dXcCdOz4olpIYW4UjJwyshzzV5/s400/68747470733a2f2f75706c6f61642e77696b696d656469612e6f72672f77696b6970656469612f636f6d6d6f6e732f382f38632f50657263657074726f6e5f6d6f6a2e706e67.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1. Single neuron </td></tr>
</tbody></table>
<br /></div>
<span style="font-family: "times" , "times new roman" , serif;">With the smoothness of the activation function $ \sigma $ we are able to analytically measure the output changes since $ \Delta out $ is a linear function of the changes $ \Delta \omega $ and $ \Delta b$ :</span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ \Delta out \approx \sum_j \dfrac{\partial out}{\partial \omega_j} \Delta \omega_j + \dfrac{\partial out}{\partial b} \Delta b $ </span></div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<b><span style="font-family: "times" , "times new roman" , serif;">Loss function</span></b><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">Let x be a training input and y(x) the desired output. What we'd like is an algorithm which lets us find weights and biases so that the output from the network approximates y(x) for all x. Most used loss function is mean squared error (MSE) :</span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ L( \omega, b) = \dfrac{1}{2n} \sum_x || Y(x) - out ||^2 $ ,</span></div>
<span style="font-family: "times" , "times new roman" , serif;">where n is the total number of training inputs, <i>out</i> is the vector of outputs from the network when x is input. </span><br />
<span style="font-family: "times" , "times new roman" , serif;">To minimize the loss function, there are many optimizing algorithms. The one we will use is the gradient descend, of which every iteration of an epoch is defined as:</span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ \omega_k \rightarrow \omega_k' = \omega_k - \dfrac{\eta}{m} \sum_j \dfrac{\partial L_{X_j}}{\partial \omega_k} $ </span></div>
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ b_k \rightarrow b_k' = b_k - \dfrac{\eta}{m} \sum_j \dfrac{\partial L_{X_j}}{\partial b_k} $ </span></div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">where <i>m</i> is the size of the batch of inputs with which we feed the network and $ \eta $ is the learning rate.</span><br />
<b><span style="font-family: "times" , "times new roman" , serif;"><br /></span></b>
<b><span style="font-family: "times" , "times new roman" , serif;">Backpropagation</span></b><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">The last concept that I would like to emphasize is the backpropagation. Its goal is to compute the partial derivatives$ \partial L / \partial \omega $ and $ \partial L / \partial b} $ of the loss function L with respect to any weight or bias in the network. The reason is that those partial derivatives are computationally heavy and the network training would be excessively slow.</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">Let be $ z^l $ the <i>weighted input</i> to the neurons in layer <i>l</i>, that can be viewed as a linear function of the activations of the previous layer: $ z^l = \omega^l a^{l-1} + b^l $ .</span><br />
<span style="font-family: "times" , "times new roman" , serif;">In the fundamental steps of backpropagation we compute:</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">1) the final error:</span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ \delta ^L = \Delta_a L \odot \sigma' (z^L) $ </span></div>
<div style="text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;">The first term measures how fast the loss is changing as a function of every output activation and the second term measures how fast the activation function is changing at $ z_L $ </span></div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">2) the error of every layer <i>l:</i></span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ \delta^l = ((\omega^{l+1})^T \delta^{l+1} ) \odot \sigma' (z^l) $ </span></div>
<br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "times" , "times new roman" , serif;">3) the partial derivative of the loss function with respect to any bias in the net</span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ \dfrac{\partial L}{\partial b^l_j} = \delta^l_j $ </span></div>
<br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "times" , "times new roman" , serif;">4) the partial derivative of the loss function with respect to any weight in the net</span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ \dfrac{\partial L}{\partial \omega^l_{jk}} = a_k^{l-1} \delta^l_j $ </span></div>
<br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "times" , "times new roman" , serif;">We can therefore update the weights and the biases with the gradient descent and train the network. Since inputs can be too numerous, we can use only a random sample of the inputs. Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples. In particular, we will use the SGD with momentum, that is a method that helps accelerate SGD in the relevant direction and damping oscillations. It does this by adding a fraction γ of the update vector of the past time step to the current update vector.</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<br />
<h2>
<span style="font-family: "times" , "times new roman" , serif; font-size: large;">II. Layers</span></h2>
<span style="font-family: "times" , "times new roman" , serif;">Here is a brief explanation of the layers that I am considering for the trainNetwork class:</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<b><span style="font-family: "times" , "times new roman" , serif;">Convolution2DLayer</span></b><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;">The convolution layer is the core building block of a convolutional neural network (CNN) and it does most of the computational heavy lifting. It derives its name from the “convolution” operator. The primary purpose of convolution is to extract features from the input image, preserving the spatial relationship between pixels by learning image features using small squares of input data.</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXpnUGtMZ_mTqVCjX-R45PzjjEIlhHYhsYZsSyDJ2AY-2r8ldumllYW3VQiIq5x9vEhySKwZCqlcWUr1M7jeyCn8LG-eeIfPiqdOW-C7S9JquyjN9pw-cjxLh3rmWJErf2XQdbJUHqkTa6/s1600/convolution_schematic.gif" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="196" data-original-width="268" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXpnUGtMZ_mTqVCjX-R45PzjjEIlhHYhsYZsSyDJ2AY-2r8ldumllYW3VQiIq5x9vEhySKwZCqlcWUr1M7jeyCn8LG-eeIfPiqdOW-C7S9JquyjN9pw-cjxLh3rmWJErf2XQdbJUHqkTa6/s1600/convolution_schematic.gif" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 2. Feature extraction with convolution (image taken form https://goo.gl/h7pXxf) </td></tr>
</tbody></table>
<span style="font-family: "times" , "times new roman" , serif;">In the example in Fig. 2, the 3×3 matrix is called a 'filter' or 'kernel', and the matrix formed by sliding the filter over the image and computing the dot product is called the 'Convolved Feature' or 'Activation Map' (or the 'Feature Map'). In practice, a CNN learns the values of these filters on its own during the training process (although we still need to specify parameters such as the number of filters, the filter size and the architecture of the network before training). The more filters we have, the more image features get extracted and the better our network becomes at recognizing patterns in unseen images.</span><br />
<span style="font-family: "times" , "times new roman" , serif;">The size of the Feature Map depends on three parameters: the depth (that corresponds to the number of filters we use for the convolution operation), the stride (that is the number of pixels by which we slide our filter matrix over the input matrix) and the padding (that consists in padding the input matrix with zeros around the border).</span><br />
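The sliding-window computation of Fig. 2, with stride and zero-padding, can be sketched in a few lines of NumPy (a minimal single-channel sketch, not the package implementation):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # "convolution" as used in CNNs (really a cross-correlation):
    # slide the kernel over the image and take dot products
    if padding:
        image = np.pad(image, padding)          # zeros around the border
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1    # output height
    ow = (image.shape[1] - kw) // stride + 1    # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.eye(3)                  # an illustrative 3x3 diagonal filter
fmap = conv2d(image, kernel)        # feature map of size (5-3)/1 + 1 = 3
```

Depth, stride and padding (discussed just above) correspond to stacking several such feature maps, the `stride` argument, and the `padding` argument respectively.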
<span style="font-family: "times" , "times new roman" , serif;"><b><br class="Apple-interchange-newline" />ReLULayer</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span><span style="font-family: "times" , "times new roman" , serif;">ReLU stands for Rectified Linear Unit and is a non-linear operation: $ f(x)=max(0,x) $. Usually this is applied element-wise to the output of some other function, such as a matrix-vector product. It replaces all negative pixel values in the feature map by zero, with the purpose of introducing non-linearity into our network, since most of the real-world data we would want to learn is non-linear.</span><br />
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "times" , "times new roman" , serif;"></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>FullyConnectedLayer</b></span><br />
<br />
<span style="font-family: "times" , "times new roman" , serif;">Neurons in a fully connected layer have full connections to all activations in the previous layer, as in regular neural networks. Hence their activations can be computed with a matrix multiplication followed by a bias offset. In our case, the purpose of the fully connected layer is to use these features to classify the input image into various classes based on the training dataset. Apart from classification, adding a fully connected layer is also a cheap way of learning non-linear combinations of the features.</span><br />
<br />
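Concretely, the "matrix multiplication followed by a bias offset" is just the following (the shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a_prev = rng.standard_normal(64)     # flattened activations of the previous layer
W = rng.standard_normal((10, 64))    # one row of weights per output neuron
b = np.zeros(10)

z = W @ a_prev + b                   # matrix multiplication plus bias offset
```

Every entry of `z` depends on every entry of `a_prev`, which is exactly the "full connection" to all activations of the previous layer.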
<span style="font-family: "times" , "times new roman" , serif;"><b>Pooling</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><b><br /></b></span>
<span style="font-family: "times" , "times new roman" , serif;">It is common to periodically insert a pooling layer in-between successive convolution layers. </span><span style="font-family: "times" , "times new roman" , serif;">Spatial Pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map but retains the most important information. </span><span style="font-family: "times" , "times new roman" , serif;">In particular, pooling</span><br />
<span style="font-family: "times" , "times new roman" , serif;">- </span><span style="font-family: "times" , "times new roman" , serif;">makes the input representations (feature dimension) smaller and more manageable</span><br />
<span style="font-family: "times" , "times new roman" , serif;">- </span><span style="font-family: "times" , "times new roman" , serif;">reduces the number of parameters and computations in the network, thus controlling overfitting</span><br />
<span style="font-family: "times" , "times new roman" , serif;">- </span><span style="font-family: "times" , "times new roman" , serif;">makes the network invariant to small transformations, distortions and translations in the input image</span><br />
<span style="font-family: "times" , "times new roman" , serif;">- </span><span style="font-family: "times" , "times new roman" , serif;">helps us arrive at an almost scale invariant representation of our image</span><span style="font-family: "times" , "times new roman" , serif;"> </span><br />
<span style="font-family: "times" , "times new roman" , serif;">Spatial Pooling can be of different types: Max, Average, Sum etc.</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>MaxPooling2DLayer</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">In case of Max Pooling, we define a spatial neighborhood (for example, a 2×2 window) and take the largest element from the rectified feature map within that window. </span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>AveragePooling2DLayer</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">Instead of taking the largest element we could also take the average.</span><br />
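Both variants can be sketched with non-overlapping windows (a minimal NumPy sketch; the `pool2d` name is illustrative):

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    # non-overlapping pooling: reduce each size x size window to one value
    h, w = fmap.shape
    fmap = fmap[:h - h % size, :w - w % size]        # drop ragged edges
    blocks = fmap.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))               # max pooling
    return blocks.mean(axis=(1, 3))                  # average pooling

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 1.],
              [0., 4., 1., 1.]])
mx = pool2d(x, 2, "max")     # [[4., 8.], [4., 1.]]
av = pool2d(x, 2, "mean")    # [[2.5, 6.5], [1., 1.]]
```

Either way, a 4x4 map shrinks to 2x2 while the dominant responses survive.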
<br />
<span style="font-family: "times" , "times new roman" , serif;"><b>DropoutLayer</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span><span style="font-family: "times" , "times new roman" , serif;">Dropout in deep learning works as follows: one or more neural network nodes are switched off once in a while so that they do not interact with the rest of the network. With dropout, the learned weights of the nodes become somewhat less sensitive to the weights of the other nodes, and each node learns to decide somewhat more on its own. In general, dropout helps the network generalize better and increases accuracy, since the influence of a single node is decreased.</span><br />
<div>
<br /></div>
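A minimal sketch of (inverted) dropout at training time, where the surviving activations are rescaled so the expected output is unchanged:

```python
import numpy as np

def dropout(a, rate, rng):
    # switch off each node with probability `rate`, and rescale the
    # survivors by 1/(1-rate) so the expected activation is unchanged
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones(10000)
out = dropout(a, 0.5, rng)
# about half of the entries are now zero, while the mean stays close to 1
```

At prediction time the layer is simply the identity, since the rescaling already keeps the expectations consistent.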
<span style="font-family: "times" , "times new roman" , serif;"><b>SoftmaxLayer</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><b><br /></b></span>
<span style="font-family: "times" , "times new roman" , serif;">The purpose of the softmax classification layer is simply to transform all the net activations in the final output layer into a series of values that can be interpreted as probabilities. To do this, the softmax function is applied to the net inputs.</span><br />
<div style="text-align: center;">
<span style="font-family: "times" , "times new roman" , serif;"> $ \phi_{softmax} (z^i_j) = \dfrac{e^{z^i_j}}{\sum_{k=1}^{K} e^{z^i_k}} $ </span></div>
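In code, the transformation looks like this (with the usual max-subtraction trick for numerical stability, which does not change the result):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtracting the max avoids overflow
    return e / e.sum()        # positive values that sum to 1

p = softmax(np.array([2.0, 1.0, 0.1]))
# p sums to 1 and the largest activation gets the largest probability
```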
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>CrossChannelNormalizationLayer</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><b><br /></b></span>
<span style="font-family: "times" , "times new roman" , serif;">The Local Response Normalization (LRN) layer implements lateral inhibition, which in neurobiology refers to the capacity of an excited neuron to subdue its neighbors. This layer is useful when we are dealing with ReLU neurons, because they have unbounded activations and we need LRN to normalize them. We want to detect high frequency features with a large response. If we normalize around the local neighborhood of the excited neuron, it becomes even more sensitive compared to its neighbors. At the same time, LRN dampens the responses that are uniformly large in any given local neighborhood: if all the values are large, then normalizing them will diminish all of them. So basically we want to encourage some kind of inhibition and boost the neurons with relatively larger activations.</span><br />
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
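A sketch of cross-channel LRN with AlexNet-style hyperparameters (the values of k, α, β and the window size n here are illustrative defaults, not values fixed by the package):

```python
import numpy as np

def lrn(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # a has shape (channels, height, width); each activation is divided by
    # a term that grows with the squared activations of its n neighboring
    # channels at the same spatial position
    c = a.shape[0]
    out = np.empty_like(a)
    for i in range(c):
        lo, hi = max(0, i - n // 2), min(c, i + n // 2 + 1)
        denom = (k + alpha * np.sum(a[lo:hi] ** 2, axis=0)) ** beta
        out[i] = a[i] / denom
    return out

a = np.ones((8, 4, 4))   # 8 channels of 4x4 activations
b = lrn(a)               # every activation is damped, more so where neighbors are large
```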
<h2>
III. Training options</h2>
<div>
<span style="font-family: "times" , "times new roman" , serif;">The training function takes as input a trainingOptions object that contains the parameters for the training. A brief explanation:</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>SolverName</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">The optimizer chosen to minimize the loss function. To guarantee Matlab compatibility, only Stochastic Gradient Descent with Momentum ('sgdm') is allowed</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>Momentum</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">Parameter for the <i>sgdm</i>: it corresponds to the contribution of the previous step to the current iteration</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>InitialLearnRate</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">Initial learning rate <b style="font-style: italic;">η </b>for the optimizer</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>LearnRateScheduleSettings</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">These are the settings for regulating the learning rate. It is a struct containing three values:</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><b>RateSchedule</b>: if it is set to '<i>piecewise</i>', the learning rate will drop by a factor <b>RateDropFactor</b> every <b>RateDropPeriod</b> epochs.</span><br />
<br />
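For example, the 'piecewise' schedule amounts to the following (a sketch of the rule, with illustrative default values):

```python
def piecewise_lr(initial_rate, epoch, drop_factor=0.1, drop_period=10):
    # drop the learning rate by drop_factor every drop_period epochs
    return initial_rate * drop_factor ** (epoch // drop_period)

# with initial rate 0.01: epochs 0-9 use 0.01, epochs 10-19 use 0.001, ...
```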
<span style="font-family: "times" , "times new roman" , serif;"><b>L2Regularization</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">Regularizers allow you to apply penalties to layer parameters or layer activity during optimization. This is the factor of the L2 regularization.</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>MaxEpochs</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">Number of epochs for training</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>Verbose</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">Display training information every </span><span style="font-family: "times" , "times new roman" , serif;"><b>VerboseFrequency </b>iterations</span><br />
<br />
<span style="font-family: "times" , "times new roman" , serif;"><b>Shuffle</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">Random shuffle of the data before training if set to <i>'once'</i></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><b>CheckpointPath</b></span><br />
Path for saving the checkpoints<br />
<br />
<span style="font-family: "times" , "times new roman" , serif;"><b>ExecutionEnvironment</b></span><br />
<span style="font-family: "times" , "times new roman" , serif;">Choice of the hardware for the training: 'cpu', 'gpu', 'multi-gpu' or 'parallel'. The load is divided between GPU or CPU workers according to the relative division set by <b>WorkerLoad</b></span><br />
<br />
<span style="font-family: "times" , "times new roman" , serif;"><b>OutputFcn</b></span><br />
Custom output functions to call during training after each iteration, passing a struct containing:<br />
the current epoch number, the current iteration number, TimeSinceStart, TrainingLoss, BaseLearnRate, TrainingAccuracy (or TrainingRMSE for regression) and State<br />
<br />
Peace,<br />
Enrico<br />
<br />
[1] [Ian Goodfellow, Yoshua Bengio, Aaron Courville] Deep Learning, MIT Press, http://www.deeplearningbook.org, 2016<br />
[2] http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/<br />
[3] http://web.stanford.edu/class/cs20si/syllabus.html<br />
[4] http://neuralnetworksanddeeplearning.com/<br />
[5] https://bitbucket.org/cittiberto/octave-nnet/src</div>
<h3>
End of the first work period (2017-06-30)</h3>
Hi all,<br />
<br />
this is the end of the first period of GSoC! It was a challenging but very interesting period and I am very excited about the next two months. It is the first time that I have the opportunity to make something that has a real value, albeit small, for someone else's work. Even when I have to do small tasks, like structuring the package or learning the Tensorflow APIs, I do them with enthusiasm, because I have the ultimate goal, and the value that my efforts would bring, very clear in mind. It may seem trivial, but for a student this is not the daily bread :) I really have fun coding for this project and I hope this will last until the end!<br />
<br />
Speaking about the project, I spent some time wondering which was the best way to test the correct installation of the Tensorflow Python APIs on the machine. The final solution was to put the test in a function __nnet_init__ and call it in the PKG_ADD (code in <i>inst/__nnet_init__ </i>in my repo [1]).<br />
<br />
Regarding the code, in these last days I tried to connect the dots, calling a Tensorflow network from Octave in a "Matlab compatible" way. In particular, I used the classes that I made two weeks ago in order to implement a basic version of trainNetwork, which is the core function of this package. As I explained in my post of June 12, trainNetwork takes as input the data and two objects: the layers and the options. I had some difficulty during the implementation of the Layer class due to the inheritance and the overloading. Eventually, I decided to store the layers in a cell array as an attribute of the Layer class. By overloading subsref, I let the user call a specific layer with the '()' access, like a classic array. With this kind of overloading I managed to solve the main problem of this structure, namely the possibility of getting a property of a layer by doing, for example, <span style="font-family: "courier new" , "courier" , monospace;">layers(1).Name</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">classdef Layer < handle</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> properties (Access = private)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> layers = {};</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> endproperties</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">  methods (Hidden, Access = {?Layer})</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> function this = Layer (varargin)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> nargin = numel(varargin);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> this.layers = cell(1, nargin);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> for i = 1:nargin</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> this.layers{i} = varargin{i};</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> end</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> endfunction</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> endmethods</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> methods (Hidden)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> function obj = subsref(this, idx)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> switch idx(1).type</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> case '()'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> idx(1).type = '{}';</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> obj = builtin('subsref',this.layers,idx);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> case '{}'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> error('{} indexing not supported');</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> case '.'</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> obj = builtin('subsref',this,idx);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> endswitch</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> endfunction</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> function obj = numel(this)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> obj = builtin('numel',this.layers);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> endfunction</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> function obj = size(this)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> obj = builtin('size',this.layers);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> endfunction</span><br />
<div>
<span style="font-family: "courier new" , "courier" , monospace;">endmethods</span></div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">endclassdef</span></div>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Therefore, I implemented the same example as in my last post in a proper way. In <i>tests/script </i>you can find the function <i>cnn_linear_model</i>, which simply consists of:</div>
<div>
<br /></div>
1. Loading the datasets<br />
<span style="font-family: "courier new" , "courier" , monospace;"> load('tests/examples/cnn_linear_model/train_70.mat')<br /> load('tests/examples/cnn_linear_model/test_70.mat')</span><br />
<br />
<div>
2. Defining layers and options<br />
<span style="font-family: "courier new" , "courier" , monospace;"> layers = [ ...<br /> imageInputLayer([28 28 1])<br /> convolution2dLayer(12,25)<br /> reluLayer<br /> fullyConnectedLayer(1)<br /> regressionLayer];<br /> options = trainingOptions('sgdm', 'MaxEpochs', 1);</span><br />
<br />
3. Training<br />
<span style="font-family: "courier new" , "courier" , monospace;">net = trainNetwork(trainImages, trainAngles, layers, options);</span><br />
4. Prediction <br />
<span style="font-family: "courier new" , "courier" , monospace;">acc = net.predict(testImages, testAngles) </span><br />
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<div>
<i>TrainNetwork</i> is a draft and I have not yet implemented the class <i>seriesNetwork</i>, but I think it's a good start :) In the next weeks I will focus on the Tensorflow backend of the above mentioned functions, with the goal of having a working version at the end of the second period!</div>
<div>
<br /></div>
<div>
Peace,</div>
<div>
Enrico</div>
<div>
<br /></div>
<div>
[1] https://bitbucket.org/cittiberto/octave-nnet/src</div>
</div>
</div>
<h3>
Train a Convolutional Neural Network for Regression (2017-06-24)</h3>
Hello,<br />
<br />
I spent the last period working mostly on Tensorflow, studying the APIs and writing some examples in order to explore the possible implementations of neural networks. For this goal, I chose an interesting example proposed in the Matlab examples at [1]. The dataset is composed of 5000 images, rotated by an angle α, each with a corresponding integer label (the rotation angle α). The goal is to make a regression that predicts the angle of a rotated image, so that it can be straightened up.<br />
All files can be found in <i>tests/examples/cnn_linear_model</i> in my repo [2].<br />
<div>
<br /></div>
I have kept the same structure as in the Matlab example, but I generated a new dataset starting from LeCun's MNIST digits (datasets at [3]). Each image was rotated by a random angle between 0° and 70°, in order to keep the right orientation of the digits (code in <i>dataset_generation.m</i>). Fig. 1 shows some rotated digits with the corresponding original digits.<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLUmGwQe51cgvbrDfE67UBCdQwBmI5QcRF8Ot4ire3MEl4Al9PLO1gtjA-dgfy5E8N_XiNrjQSQToI1pbrbqjlz7ELCDFgzAaSIqb5-wNDtwYvpLh8m7Ve-1boWK0IALnxlU2h0muHaMRo/s1600/test.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="748" data-original-width="794" height="301" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLUmGwQe51cgvbrDfE67UBCdQwBmI5QcRF8Ot4ire3MEl4Al9PLO1gtjA-dgfy5E8N_XiNrjQSQToI1pbrbqjlz7ELCDFgzAaSIqb5-wNDtwYvpLh8m7Ve-1boWK0IALnxlU2h0muHaMRo/s320/test.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8px;">Figure 1. rotated images in columns 1,3,5 and originals in columns 2,4,6</td></tr>
</tbody></table>
</td></tr>
</tbody></table>
<div style="text-align: left;">
<br />
The implemented linear model is:</div>
<div style="text-align: center;">
$ \hat{Y} = \omega X + b $,</div>
<div>
where the weights $\omega$ and the bias $b$ will be optimized during the training by minimizing a loss function. As loss function, I used the mean squared error (MSE):</div>
<div>
<div style="text-align: center;">
$ \dfrac{1}{n} \sum_{i=1}^n (\hat{Y_i} - Y_i)^2 $,</div>
<div style="text-align: left;">
where the $Y_i$ are the training labels. </div>
</div>
<div>
<br /></div>
<div>
In order to show the effective improvement given by a neural network, I started by making a simple regression, feeding the X variable of the model directly with the 28x28 images. Even if a closed form exists for the MSE minimization, I implemented an iterative method in order to discover some Tensorflow features (code in <i>regression.py</i>). To evaluate the accuracy of the regression, I consider a regression correct if the difference between the angles is less than 20°. After 20 epochs, convergence was almost reached, giving an accuracy of $0.6146$.</div>
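The same iterative MSE minimization can be sketched in NumPy on synthetic data (this illustrates the method only; it is not the Tensorflow code in <i>regression.py</i>, and the data here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))      # 200 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 4.0                   # labels from a known linear model

w = np.zeros(3)
b = 0.0
lr = 0.1
for _ in range(500):
    err = X @ w + b - y                    # residuals Y_hat - Y
    w -= lr * (2 / len(y)) * (X.T @ err)   # gradient of the MSE w.r.t. w
    b -= lr * (2 / len(y)) * err.sum()     # gradient of the MSE w.r.t. b
# w converges to true_w and b to 4.0
```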
<div>
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiQNrba7SaYblTeld0XjYR60z1JcfpmhK3wo8skGFB-axSS_DmwHAT_x8ceYGDfO7HiDS-iHUhSdl4Kq5acQbXNp8MIPvnd00L7nmR8wH0ZHcUt0BqRKshetwhqlVAWDIjEGxRdkwJdt6T/s1600/regr.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="764" data-original-width="787" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiQNrba7SaYblTeld0XjYR60z1JcfpmhK3wo8skGFB-axSS_DmwHAT_x8ceYGDfO7HiDS-iHUhSdl4Kq5acQbXNp8MIPvnd00L7nmR8wH0ZHcUt0BqRKshetwhqlVAWDIjEGxRdkwJdt6T/s320/regr.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 2. rotated images in columns 1,3,5 and after the regression in columns 2,4,6</td></tr>
</tbody></table>
<br />
I want to analyze now the improvement given by a feature extraction performed with a convolutional neural network (CNN). As in the Matlab example, I used a basic CNN since the input images are quite simple (only numbers with monochromatic background) and consequently the features to extract are few.</div>
<div>
</div>
<div>
<ul>
<li>INPUT [28x28x1] will hold the raw pixel values of the image, in this case an image of width 28, height 28</li>
<li>CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and the small region they are connected to in the input volume. With 25 filters of size 12x12 (stride 1, no padding), this results in a volume of size [17x17x25].</li>
<li>RELU layer will apply an element-wise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ([17x17x25]).</li>
<li>FC (i.e. fully-connected) layer will compute the class scores, resulting in volume of size [1x1x1], which corresponds to the rotation angle. As with ordinary Neural Networks, each neuron in this layer will be connected to all the numbers in the previous volume.</li>
</ul>
</div>
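The spatial sizes in this architecture follow the standard formula (W − F + 2P)/S + 1 for the output width of a convolution:

```python
def conv_output_size(w, f, stride=1, padding=0):
    # output width of a convolution: (W - F + 2P) / S + 1
    return (w - f + 2 * padding) // stride + 1

conv_output_size(28, 12)   # 28x28 input, 12x12 filters -> a 17x17 feature map
```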
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAgX-P69vuf8IGQU45X6Qu3wUbnuyLQ8bJbWYnXc1446QFy1GvcKVlqyg_xsAO1DaDjvDmalUfyT_cODx2tM1_VD-puMhnAzpId_jEZerue4ZpPHnnifsjHM-FVF3U6AQOEmvC36zno90P/s1600/regression_scheme.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="783" data-original-width="1557" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAgX-P69vuf8IGQU45X6Qu3wUbnuyLQ8bJbWYnXc1446QFy1GvcKVlqyg_xsAO1DaDjvDmalUfyT_cODx2tM1_VD-puMhnAzpId_jEZerue4ZpPHnnifsjHM-FVF3U6AQOEmvC36zno90P/s640/regression_scheme.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 3. CNN linear model architecture</td></tr>
</tbody></table>
<div>
<br />
We can visualize the architecture with Tensorboard where the graph of the model is represented.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMHJFCTBYHqf_wSjVuz3IXuB81AA7ZlxCHNSBiw4-nbOB8t2wWGZAL1FI0elYf0ymqfPMTEIc1ui0wE_tg_nklLRD2MvTcEA-El00xJ5jlD-u7cCbyqg52uYMm0kVW75nRqPOKgM1OCHRt/s1600/tensorboard.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="718" data-original-width="900" height="510" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMHJFCTBYHqf_wSjVuz3IXuB81AA7ZlxCHNSBiw4-nbOB8t2wWGZAL1FI0elYf0ymqfPMTEIc1ui0wE_tg_nklLRD2MvTcEA-El00xJ5jlD-u7cCbyqg52uYMm0kVW75nRqPOKgM1OCHRt/s640/tensorboard.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 4. Model graph generated with Tensorboard</td></tr>
</tbody></table>
<br /></div>
<div>
With the implementation in <i>regression_w_CNN.py</i>, the results are quite satisfying: after 15 epochs, it reached an accuracy of $0.75$ (205 seconds overall). One can see in Fig. 5 the marked improvement of the regression.</div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4ltUBTjTi8HB9ZwlUhn3ayyuTV3aMhEhyI1wlPeNfFymGcD85HsD3XckUo6kwbp0cp1kYW36RR8De6MZekKyTR4O1eBXZo2AWwg9f4O7Tuwp71shGyyhSZF4Eo50txLpMiIxmoUn0kNFs/s1600/conv.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="764" data-original-width="818" height="298" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4ltUBTjTi8HB9ZwlUhn3ayyuTV3aMhEhyI1wlPeNfFymGcD85HsD3XckUo6kwbp0cp1kYW36RR8De6MZekKyTR4O1eBXZo2AWwg9f4O7Tuwp71shGyyhSZF4Eo50txLpMiIxmoUn0kNFs/s320/conv.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8px;">Figure 5. <span style="font-size: 12.8px;">Rotated images in columns 1, 3, 5 and after the CNN regression in columns 2, 4, 6</span><br />
<div>
<span style="font-size: 12.8px;"><br /></span></div>
</td></tr>
</tbody></table>
</td></tr>
</tbody></table>
With the same parameters, Matlab reached an accuracy of $0.76$ in 370 seconds (code in <i>regression_Matlab_nnet.m</i>), so the performance is quite promising.<br />
<br />
In the next post (in a few days), I will integrate the work done so far, calling the Python class from within Octave through a function that mimics the Matlab behavior. Leveraging the layer classes I made two weeks ago, I will implement a draft of the <i>trainNetwork</i> and <i>predict</i> functions, making the Matlab script callable from Octave as well.<br />
<br />
I will also take care of the package dependencies: I will add the dependency on Pytave to the package description and write a PKG_ADD test that verifies the TensorFlow version during the installation of the package.<br />
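The version check could take roughly this shape, sketched here in Python (the helper names and the minimum version are illustrative assumptions, not the final PKG_ADD code):

```python
def parse_version(v):
    """Turn a version string like '1.2.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split(".")[:3])

def tensorflow_ok(installed, minimum="1.0.0"):
    """Return True if the installed TensorFlow version meets the minimum.
    The '1.0.0' floor is an illustrative assumption."""
    return parse_version(installed) >= parse_version(minimum)

# At install time one would query the real version, e.g.:
#   import tensorflow as tf; tensorflow_ok(tf.__version__)
print(tensorflow_ok("1.2.1"))   # True
print(tensorflow_ok("0.12.0"))  # False
```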
<br />
Peace,<br />
Enrico</div>
<div style="text-align: left;">
<br />
[1] <a href="https://it.mathworks.com/help/nnet/examples/train-a-convolutional-neural-network-for-regression.html" target="_blank">https://it.mathworks.com/help/nnet/examples/train-a-convolutional-neural-network-for-regression.html</a><br />
[2] <a href="https://bitbucket.org/cittiberto/octave-nnet/src/35e4df2a6f887582216fc266fc75976a816f04fc/tests/examples/cnn_linear_model/?at=default" target="_blank">https://bitbucket.org/cittiberto/octave-nnet/src/35e4df2a6f887582216fc266fc75976a816f04fc/tests/examples/cnn_linear_model/?at=default</a></div>
[3] <a href="http://yann.lecun.com/exdb/mnist/" target="_blank">http://yann.lecun.com/exdb/mnist/</a><br />
<br />Enrico Bertinohttp://www.blogger.com/profile/03936797921360245682noreply@blogger.com0tag:blogger.com,1999:blog-3202743535685562583.post-57143538901010072032017-06-12T23:49:00.000+02:002017-06-24T19:55:31.498+02:00Package structureHello!
During the first period of GSoC, I have worked mostly on analyzing the structure of the Matlab neural network package in order to guarantee compatibility throughout the whole project. The focus of the project is on convolutional neural networks, about which I will write in the next post.<br />
<br />
Regarding the package structure, the core will be composed of three parts:
<br />
<br />
<ol>
<li><b>Layers</b>: there are 11 types of layers that I defined as Octave classes, using <i>classdef</i>. These layers can be concatenated to create a Layer object defining the architecture of the network; this will be the input for the training function. </li>
<li><b>Training</b>: the core of the project is the training function, which takes as input the data, the layers and some options and returns the network as output. </li>
<li><b>Network</b>: the network object has three methods (<i>activations</i>, <i>classify</i> and <i>predict</i>) that let the user compute the final classification and prediction. </li>
</ol>
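To make the flow concrete, here is a deliberately simplified Python analogue of the three parts (the class and function names mirror the Octave design described above; the training step is a placeholder rather than a real optimizer):

```python
import numpy as np

class Network:
    """Result of training: wraps weights and exposes predict/classify."""
    def __init__(self, weights):
        self.weights = weights

    def predict(self, x):
        # Scores per class; a single linear layer stands in for the whole net.
        return np.asarray(x) @ self.weights

    def classify(self, x):
        # Index of the highest-scoring class for each input row.
        return np.argmax(self.predict(x), axis=1)

def train_network(x, y, layers, **options):
    """Stand-in for trainNetwork: would fit `layers` to (x, y);
    here it just builds fixed weights of the right shape."""
    n_features = np.asarray(x).shape[1]
    n_classes = int(np.max(y)) + 1
    weights = np.eye(n_features, n_classes)  # placeholder, not learned
    return Network(weights)

x = np.array([[1.0, 0.0], [0.0, 1.0]])
net = train_network(x, np.array([0, 1]), layers=[])
print(net.classify(x))  # [0 1]
```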
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjOIHLXEAnTw_cPJrjGJTzqJFjdXX92Reag4feP6-IV5iIw4n3e_dg4zQFw-vNCobJgFxuaP2cPV29Q16pcLr8JYWe77tHuSLWofRsmUIsYhMzJI6Ai-k_O3vJ9DTLqEXWMRtfKZPQgHX4/s1600/structure.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjOIHLXEAnTw_cPJrjGJTzqJFjdXX92Reag4feP6-IV5iIw4n3e_dg4zQFw-vNCobJgFxuaP2cPV29Q16pcLr8JYWe77tHuSLWofRsmUIsYhMzJI6Ai-k_O3vJ9DTLqEXWMRtfKZPQgHX4/s400/structure.png" width="380" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1: conv nnet flowchart </td></tr>
</tbody></table>
<div style="text-align: justify;">
<br /></div>
<div>
<br /></div>
<div>
I have already implemented a draft of the first point, the layer classes [1]. Every layer type inherits some attributes and methods from the parent class <i>Layers</i>. This is useful for creating the <i>Layer</i> object: the concatenation of different layers is always a Layer object that will be used as input for the training function. For this purpose, I overloaded the <i>cat</i>, <i>horzcat</i> and <i>vertcat</i> operators for <i>Layers</i> and <i>subsref</i> for <i>Layer</i>. I still need to finalize some details of the <i>disp</i> methods of these classes.</div>
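For readers more familiar with Python, the concatenation-and-indexing behavior obtained by overloading cat/horzcat/vertcat and subsref can be sketched like this (an illustrative analogue, not the package code):

```python
class Layers:
    """Parent of all layer types: carries the shared attributes."""
    def __init__(self, name):
        self.name = name

class DropoutLayer(Layers):
    def __init__(self, probability=0.5):
        super().__init__("dropout")
        self.probability = probability

class Layer:
    """Concatenation of layers; indexing returns the individual layer."""
    def __init__(self, *layers):
        self._layers = list(layers)

    def __add__(self, other):   # analogue of the overloaded horzcat/vertcat
        return Layer(*self._layers, *other._layers)

    def __getitem__(self, i):   # analogue of the overloaded subsref
        return self._layers[i]

    def __len__(self):
        return len(self._layers)

stack = Layer(Layers("input"), Layers("conv")) + Layer(DropoutLayer())
print(len(stack))            # 3
print(stack[2].probability)  # 0.5
```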
<div>
<br /></div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZuekZ_GSgqo0UslGXxYwz_KZcKbUkIXTTGUL8ne7rqz4iyAQKNo1sjgsVSjBFeVuAK0aM5_kzkYCtKAbHRQvbEYniFkK1DJXksdQoHvooEeEpH32GartGCqyDvFFm-C4VjdmXqCjG3TOI/s1600/Schermata+2017-06-12+alle+23.16.39.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZuekZ_GSgqo0UslGXxYwz_KZcKbUkIXTTGUL8ne7rqz4iyAQKNo1sjgsVSjBFeVuAK0aM5_kzkYCtKAbHRQvbEYniFkK1DJXksdQoHvooEeEpH32GartGCqyDvFFm-C4VjdmXqCjG3TOI/s640/Schermata+2017-06-12+alle+23.16.39.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8px;">Figure 2: Layers classes definitions </td></tr>
</tbody></table>
</td></tr>
</tbody></table>
<br /><i>inputParser</i> is used in every class for parameter management and attribute setting.</div>
<div>
<br /></div>
<div>
The objects of these classes can be instantiated with a corresponding function, implemented in the directory <i>inst/</i>. Here is an example of creating a Layer object: </div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span></div>
<span style="font-family: "courier new" , "courier" , monospace;">> # TEST LAYERS<br />> a = imageInputLayer([2,2,3]); # first layer<br />> b = convolution2dLayer(1,1); # second layer<br />> c = dropoutLayer(1); # third layer<br />> layers = [a b c]; # Layer object from layers concat <br />> drop = layers(3); # Layer element access<br />> drop.Probability # Access layer attribute<br />ans = 0.50000</span><br />
<div>
<br />
<div>
All functions can be tested with the package's <i>make check</i>.<br />
<br />
<br />
The next step is to focus on the TensorFlow integration via Pytave, writing a complete test for a regression of image rotation angles and comparing the accuracy and the computational time with Matlab.<br />
<br />
[1] <a href="https://bitbucket.org/cittiberto/octave-nnet/src/35e4df2a6f887582216fc266fc75976a816f04fc/inst/classes/?at=default">https://bitbucket.org/cittiberto/octave-nnet/src/35e4df2a6f887582216fc266fc75976a816f04fc/inst/classes/?at=default</a></div>
<div>
<br /></div>
<div>
<br /></div>
</div>
Enrico Bertinohttp://www.blogger.com/profile/03936797921360245682noreply@blogger.com0tag:blogger.com,1999:blog-3202743535685562583.post-52453671386705108932017-05-16T12:53:00.000+02:002017-05-16T12:53:28.332+02:00Introductory PostHello, I'm Enrico Bertino, and this is the blog I'll be using to track my work for Google Summer of Code 2017. The project will consist of rewriting the neural network package, with a focus on convolutional neural networks. <a href="https://storage.googleapis.com/summerofcode-prod.appspot.com/gsoc/core_project/doc/5236143273541632_1491215106_ProposalGSoCEnricoBertino.pdf?Expires=1495004970&GoogleAccessId=summerofcode-prod%40appspot.gserviceaccount.com&Signature=q2zbDVZeBBnR54MmoV8D%2BQ2t7Pwixk%2BdR4UoCZmcIJPZL5NYcFS41sDVUCV%2BPQq3toJBdKg%2FncHbVbrPdDby2Wg2DHWW6LMMuzYxI0G8g3KwkFd5igMcE4T1MwBOYIOtNVhpUhymXHblLG956d7arQq1CZW0fHEJun6FQ7YhIG5tTYvQH0Z4FVTHRqLkw926%2Fmf5bfEJIvwQrQ3nsHPgfXsam9ZtldnXgk3M2u65G3G6GghtHYqcyrZV4%2B8mKHOUW%2Fk7oxccE2UYXFlHuuLbpDhfREa1G1yv%2FC62I0BWfBSJfUfqJw9clxGwdTfLz8rGaXwNsqiHuX33lgDJX6ey3Q%3D%3D">Here is my proposal.</a><br />
<br />
<br />
Something about myself: I’m a Mathematical Engineering student and I live in Milan. After a BSc at Politecnico di Milano, I did a double-degree master between Ecole Centrale de Nantes, France, and Politecnico di Milano. During the two years in France I attended engineering classes with a specialization in virtual reality, and in Italy I am enrolled in the last year of the MSc, majoring in Statistics and Big Data.<br />
<br />
During my project, I will leverage the Pytave project in order to use Python code in the implementation of the package. This is almost necessary because of the strong trend toward open research in the deep learning field, where even the biggest companies release their frameworks and open their code. The quality of these resources is very high, especially for Python frameworks and APIs such as TensorFlow. This integration will let us keep up with the continuous improvements in neural networks while concentrating on the Octave interface. In this sense, letting Octave coexist with TensorFlow, we can exploit their synergy and reach consistent goals quickly.<br />
<br />
So far, I have tested the integration of TensorFlow via Pytave. I have come across some variable-initialization problems on Ubuntu 16.04, whereas everything works fine on macOS Sierra. I will perform further tests to be sure it works before starting the implementation of the package. <a href="https://bitbucket.org/cittiberto/octave-nnet">Here is my repository.</a>Enrico Bertinohttp://www.blogger.com/profile/03936797921360245682noreply@blogger.com0