The GTX 745 and the tensorflow-gpu installation on Windows

Author: Eleonora Bernasconi

 

NVIDIA GeForce GTX 745 Graphics Card specifications

Specifications: https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-745-oem/specifications

CUDA Cores: 384

Base Clock (MHz): 1033

Memory Clock: 1.8 Gbps

Standard Memory Config: 4 GB

Memory Interface: DDR3

Memory Bandwidth (GB/sec): 28.8
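As a sanity check on the figures above, the 28.8 GB/sec bandwidth follows from the 1.8 Gbps memory clock once a 128-bit bus is assumed (the bus width is not listed in the spec; 128 bits is the value that reproduces the official figure):

```python
# Cross-check the quoted memory bandwidth.
# Assumption: a 128-bit memory bus (not listed in the spec above, but it
# is the width that reproduces the official 28.8 GB/sec figure).
memory_clock_gbps = 1.8      # effective DDR3 data rate per pin
bus_width_bits = 128         # assumed bus width
bandwidth_gb_s = memory_clock_gbps * bus_width_bits / 8
print(bandwidth_gb_s)        # 28.8
```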

 

Figure 01 – nvidia-smi for GPU monitoring

Open the command prompt and insert:

cd C:\Program Files\NVIDIA Corporation\NVSMI

nvidia-smi

N.B. The GPU utilization reported by nvidia-smi ranges between 92% and 94%, while the Windows Task Manager reports it at around 70%.
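If you prefer to read the utilization programmatically rather than watching the console, nvidia-smi can emit CSV output that is easy to parse. A small sketch (the query flags are standard nvidia-smi options; the parsing helper is our own and accepts a captured string so it can be tried without a GPU):

```python
import subprocess

def gpu_utilization(smi_output=None):
    """Return GPU utilization percentages, one entry per GPU.

    If smi_output is None, nvidia-smi is invoked; otherwise the given
    string is parsed (useful for testing on a machine without a GPU).
    """
    if smi_output is None:
        smi_output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"], text=True)
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]

# Example with captured output (one line per GPU):
print(gpu_utilization("93\n"))  # [93]
```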

Installing TensorFlow with GPU on Windows 10

Requirements

Python 3.5

NVIDIA CUDA GPU: make sure you have a CUDA-capable NVIDIA GPU on your system.

Setting up the Nvidia GPU card

Install CUDA Toolkit 8.0 and cuDNN v5.1.

Download and install CUDA Toolkit

CUDA Toolkit 8.0: https://developer.nvidia.com/cuda-downloads

Example installation directory: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0

Download and install cuDNN

Install cuDNN version 5.1 for Windows 10: https://developer.nvidia.com/cudnn

Extract the cuDNN files and copy them into the Toolkit directory.

Environment variables

After installing the CUDA Toolkit, make sure that CUDA_HOME is set in the environment variables; otherwise add it manually.

Figure 02 – Environment variables CUDA_HOME part 01

 

Figure 03 – Environment variables CUDA_HOME part 02

Install Anaconda

Download: https://www.anaconda.com/download/

Create a new environment named tensorflow-gpu with Python version 3.5.2:

conda create -n tensorflow-gpu python=3.5.2

N.B. If you run into incompatible versions, run this command to resolve the problem:

conda install -c conda-forge tensorflow-gpu

Anaconda will automatically install the required versions of CUDA, cuDNN and the other packages.

Figure 04 – conda install -c conda-forge tensorflow-gpu

activate tensorflow-gpu

Figure 05 – activate tensorflow-gpu

 

Install TensorFlow

pip install tensorflow-gpu

Figure 06 – pip install tensorflow-gpu

Now you are done and have successfully installed TensorFlow with GPU support!

Remember to run activate tensorflow-gpu to enter the GPU-enabled environment!

Test GPU

python

import tensorflow as tf

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

 

Figure 07 – test GPU

 

Test on CIFAR-10 with 10 epochs

Average training time per epoch: 150 sec

Total time: 25 min

Figure 08 – Test on CIFAR-10 with 10 epochs

Spark and Machine Learning (MLlib)

Author: Antonio Berti

Translator: Sabrina Sala

 

In this tutorial we describe the use of the Apache Software Foundation's machine learning library for Spark: MLlib.

MLlib is one of Spark's APIs and is interoperable with Python's NumPy as well as with R libraries. Since it runs on Spark, it can consume any type of Hadoop data source, e.g., HDFS, HBase, data sources coming from relational databases, or local data sources such as text files.

Spark excels at iterative computation, which enables MLlib to run fast and makes it practical for companies to use in their business activities.

MLlib provides many types of algorithms, along with utility functions. It includes classification, regression, decision tree, recommendation and clustering algorithms. Among the most popular utilities are transformation, standardization and normalization functions, as well as statistical and linear algebra primitives.

With the following code we would like to show how to develop a simple logistic regression model using MLlib.

 

First of all we load the dataset.

import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}

import org.apache.spark.mllib.evaluation.MulticlassMetrics

import org.apache.spark.mllib.regression.LabeledPoint

import org.apache.spark.mllib.util.MLUtils

val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

The dataset is then divided into two parts, one used for training the model (60%) and the other for testing (40%).

val splits = data.randomSplit(Array(0.6, 0.4), seed=11L)

val training = splits(0).cache()

val test = splits(1)
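Spark's randomSplit assigns each record independently according to the given weights, so a seed makes the split reproducible. A plain-Python sketch of the same idea (a hypothetical stand-in, not the Spark API) makes this concrete:

```python
import random

def random_split(data, weights=(0.6, 0.4), seed=11):
    """Stand-in for Spark's randomSplit: each record is assigned to a
    part with probability proportional to that part's weight."""
    rng = random.Random(seed)
    total = sum(weights)
    parts = [[] for _ in weights]
    for record in data:
        r = rng.random() * total
        acc = 0.0
        idx = len(weights) - 1     # fall back to the last part
        for i, w in enumerate(weights):
            acc += w
            if r < acc:
                idx = i
                break
        parts[idx].append(record)
    return parts

training, test = random_split(range(100))
print(len(training), len(test))  # roughly 60/40 with this seed
```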

Next, we train the algorithm and build the model.

val model = new LogisticRegressionWithLBFGS()

   .setNumClasses(10)

   .run(training)

The model is run on the test dataset.

val predictionAndLabels = test.map { case LabeledPoint(label, features) =>

  val prediction = model.predict(features)

  (prediction, label)

}

We can then compute the model metrics and the predictive accuracy.

val metrics = new MulticlassMetrics(predictionAndLabels)

val accuracy = metrics.accuracy

println(s"Accuracy = $accuracy")
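The accuracy reported by MulticlassMetrics is simply the fraction of (prediction, label) pairs that match. A plain-Python equivalent of that computation:

```python
def multiclass_accuracy(prediction_and_labels):
    """Fraction of (prediction, label) pairs that match, i.e. the same
    quantity as MulticlassMetrics.accuracy."""
    pairs = list(prediction_and_labels)
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return correct / len(pairs)

# Three of these four predictions match their label:
print(multiclass_accuracy([(1.0, 1.0), (0.0, 1.0), (2.0, 2.0), (1.0, 1.0)]))  # 0.75
```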

It is now possible to store the model after the training phase is over, so that we can reload it whenever required.

model.save(sc, "target/tmp/scalaLogisticRegressionWithLBFGSModel")

val sameModel = LogisticRegressionModel.load(sc,

               "target/tmp/scalaLogisticRegressionWithLBFGSModel")

 

References:

Spark – http://spark.apache.org

MLlib – http://spark.apache.org/docs/latest/ml-guide.html

Data examples – https://github.com/apache/spark/tree/master/examples/src/main/scala/org/apache/spark/examples

Sample dataset of MLlib – https://github.com/apache/spark/tree/master/data/mllib

 

Tensor Building in PyTorch

Author: Andrea Mercuri

The fundamental type of PyTorch, as in other deep learning frameworks, is the tensor. Expressing the operations of a neural network with tensors serves a two-pronged purpose: tensor calculus provides a very compact formalism, and it makes parallelizing the computation on the GPU very easy.
Tensors are generally allocated either in the computer's main RAM and processed by the CPU, or in the graphics card's RAM and processed by the GPU; the latter are the CUDA tensor types. Below is a list of all the tensor types supported by PyTorch.

 

Type                       CPU tensor           GPU tensor
32-bit floating point      torch.FloatTensor    torch.cuda.FloatTensor
64-bit floating point      torch.DoubleTensor   torch.cuda.DoubleTensor
16-bit floating point      torch.HalfTensor     torch.cuda.HalfTensor
8-bit integer (unsigned)   torch.ByteTensor     torch.cuda.ByteTensor
8-bit integer (signed)     torch.CharTensor     torch.cuda.CharTensor
16-bit integer (signed)    torch.ShortTensor    torch.cuda.ShortTensor
32-bit integer (signed)    torch.IntTensor      torch.cuda.IntTensor
64-bit integer (signed)    torch.LongTensor     torch.cuda.LongTensor

 

In order to use them, firstly we import PyTorch:

import torch

We can create an empty tensor by means of the constructor provided for each of the types listed above:

x = torch.FloatTensor(3, 2)
print(x)
1.00000e-25 *
  9.9872  0.0000
  9.9872  0.0000
  0.0000  0.0000
[torch.FloatTensor of size 3x2]

This tensor is created in the main RAM. If we wish to create a tensor within the GPU, we need to use a CUDA type:

x = torch.cuda.FloatTensor(3, 2)
print(x)
nan nan
nan nan
nan nan
[torch.cuda.FloatTensor of size 3x2 (GPU 0)]

In this case, the tensor is created on the first available GPU. The GPUs on the computer are numbered with integers starting from 0.
We can generate tensors from Python lists:

torch.FloatTensor([[1,2,3],[4,5,6]])
1  2  3
4  5  6
[torch.FloatTensor of size 2x3]

or from numpy arrays:

import numpy as np

x_np = np.array([1,2,3,4], dtype=np.float32)
x = torch.FloatTensor(x_np)

We get the same result using from_numpy:

x = torch.from_numpy(x_np)

It's important to realize that the numpy array and the PyTorch tensor share the same underlying data, so if we modify one of them, the other changes as well:

x[0] = 0
print(x_np)
[ 0.,  2.,  3.,  4.]
print(x)

0
2
3
4
[torch.FloatTensor of size 4]
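The same zero-copy behaviour can be illustrated with Python's own buffer protocol: a memoryview over an array shares its storage rather than copying it, just as a tensor built with from_numpy shares storage with the numpy array:

```python
from array import array

# A memoryview over an array does not copy the data, so writing through
# the view is visible through the original (analogous to from_numpy).
a = array("f", [1, 2, 3, 4])
view = memoryview(a)

view[0] = 0.0
print(a.tolist())  # [0.0, 2.0, 3.0, 4.0]
```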
We can create tensors from other tensors:

y = torch.FloatTensor(x)
print(y)
0
2
3
4
[torch.FloatTensor of size 4]

Again, the new tensor shares the data with the original tensor.
We are able to create tensors of zeros:

torch.zeros(3,2)
0  0
0  0
0  0
[torch.FloatTensor of size 3x2]

Furthermore, we can build tensors made of pseudo-random numbers from a certain statistical distribution, for example, a uniform distribution on the [0,1] interval:

torch.rand(2, 3)
0.1256  0.0406  0.2072
0.2479  0.0515  0.093
[torch.FloatTensor of size 2x3]

Each tensor lives either in main memory or in the memory of a given video card, depending on how we allocate it. Two tensors can be operands of the same operation only if they reside in the same memory, and the resulting tensor lives in that same memory space. Conversely, if we try to combine (for example, by summing them) a tensor in main RAM with a tensor on a video card (or two tensors on two different video cards), we get an exception:

xcpu = torch.FloatTensor(3,2)
xgpu = torch.cuda.FloatTensor(3,2)
xcpu + xgpu
TypeError                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 xcpu + xgpu
…

If we want to copy a tensor x onto the first GPU, we use the cuda method:

y = x.cuda(device=0)

If the tensor is already on the first GPU, we get the original tensor back.
Conversely, we can use the cpu method to get a copy of tensor x in main RAM:

y = x.cpu()

We can convert a tensor to another type by passing that type as a parameter to the type method:

y = x.type(torch.ByteTensor)

We get the same result by calling a conversion method:

y = x.byte()

If we want to change the type and copy the tensor onto the GPU simultaneously, we have to pass a CUDA type to the type method:

y = x.type(torch.cuda.ByteTensor)

or we can write:

y = x.byte().cuda()

To make the second GPU the current one, we use set_device:

torch.cuda.set_device(1)

Hence, if we write:

torch.cuda.current_device()

we get 1 back, which means that the second GPU (no longer the first one) is now the current one. If we then call the cuda method on a tensor, the system returns a copy of that tensor on the second GPU rather than the first. By exploiting a context manager we can also temporarily change the current GPU.
For instance, if we write:

with torch.cuda.device(1):
    x1 = torch.cuda.FloatTensor(2,3)
x2 = torch.cuda.FloatTensor(2,3)

Assuming the initial current GPU is the one with index 0, x1 is created on the second GPU (index 1), while x2 is created on the first one (index 0).
All the tensor-creation functionality mentioned here comes from the torch and torch.cuda packages and from the torch.Tensor class.

In the next tutorials, we will continue with the tensor exploration.


Installation of Keras/Tensorflow – Theano on Windows

Authors: Francesco Pugliese & Matteo Testi

 

In this post, we tackle the installation, on Windows, of the popular Deep Learning framework Keras and of its whole backend stack, TensorFlow / Theano.

Installation starts with downloading a Python 3 distribution. Let us choose Miniconda and download it from the following link: https://conda.io/miniconda.html which shows the following screen:

 

 

Select Python 3.6 and the operating system version: Windows 64-bit or 32-bit. Click on the downloaded package and install it with the default settings. At the end of the installation, accept the system reboot.

Once the PC has rebooted, type cmd.exe in the Windows search box and run the prompt. Then run the script c:\Users\-user-\Miniconda3\Scripts\activate.bat which launches the Anaconda prompt (replace -user- with the current account name).

Then type: conda install numpy scipy mkl-service libpython m2w64-toolchain in order to install:

  1. "numpy", a Python library which is very useful for matrix and array management.
  2. "scipy", a scientific computing library for Python.
  3. "mkl-service", an optimization library with vectorized math routines to speed up mathematical functions and applications.
  4. "libpython", a Python 3 library for Machine Learning and effective code development.
  5. "m2w64-toolchain", which provides a GCC-compatible compiler and is therefore strongly recommended.

Other optional libraries are:

  1. "nose", a library for testing Python programs.
  2. "nose-parameterized", for parametric testing.
  3. "sphinx", a library for building stylish program documentation in diverse formats (HTML, PDF, ePub, etc.).
  4. "pydot-ng", an interface to Graphviz's DOT graph-description language.

Once the environment setup is finished, you can install the Cuda drivers from the following link:

https://developer.nvidia.com/cuda-downloads

This will open the following view with the different operating system options:

 

 

Download the local version (recommended) of the installation file and proceed with the Cuda drivers installation. CUDA is NVIDIA's parallel programming platform for the GPU (Graphics Processing Unit) on the video card. It might be necessary to install the card drivers as well, in case they are outdated or not working properly.

When the Cuda driver installation (and that of any graphics card drivers) finally ends, run the Theano installation, together with the additional supporting library "libgpuarray", which is required to handle tensors on the GPU, with the command:

conda install theano pygpu

Theano NOTE 1: In order to install Theano we suggest always using a Cuda version at least one point release behind the current one. This is due to the slow maintenance of Theano, which is not updated quickly and therefore produces compilation errors after installation with the current version of Cuda. For instance, at this time the most stable version of Theano is 0.9.0, for which we suggest Cuda 8.0 instead of Cuda 9.0. There may be tricks online to make Cuda 9 work with Theano 0.9, but they turn out to be rather fiddly and time-consuming, and in the end are not worth the risk. Our hint is to stick to a stable Cuda-Theano configuration such as the one recommended here.

Now you need to install Visual Studio, which provides Theano with the C++ compiler for Windows (the previously installed GCC covers only C). To do so, download Visual Studio Community from the link: https://www.visualstudio.com/it/downloads/ and follow all the required steps, installing only the basic components for C++.

Theano NOTE 2: Seemingly, after the next release, Theano will be discontinued; Bengio himself explains it at this link. There are multiple reasons for this choice, essentially, we believe, the recent massive competition from other Deep Learning frameworks (mxnet, tensorflow, deeplearning4j, gluon, to name a few) which are better maintained. As we just noted, Theano constantly suffers updating problems from the MILA team. However, we believe Theano is still a milestone of Deep Learning: the first framework to introduce automatic differentiation and clear, effective parallelization of matrix operations on the GPU, which enabled the spread of deep neural networks on GPUs. Hence we consider it right to give this brilliant framework its due prestige; after all, it still offers advantages in terms of versatility and speed when used as a Keras backend.

Visual Studio NOTE: Visual Studio, too, has compatibility problems with Theano. Visual Studio 2017 returns an exception during the import of Theano with both Cuda 9 and Cuda 8. We therefore suggest installing a stable preceding version such as Visual Studio 2013.

Once you have installed Visual Studio you need to fill in .theanorc, the Theano configuration file, which you can find within Miniconda3 at the path: c:\Users\-user-\.theanorc

Fill in .theanorc as follows, given that you have decided to install Cuda 8 and Visual Studio 2013:

[global]
device = gpu
floatX = float32

[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0

[dnn]
enabled=False

[nvcc]
compiler_bindir = C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin

[lib]
cnmem = 0

Let us pay attention to these parameters: "device" in the [global] section, which defines whether to use the CPU or the GPU; "root" in the [cuda] section, which sets the path of the Cuda libraries; and "compiler_bindir" in [nvcc], which defines the path of Visual Studio's C++ compiler and is critical for compiling Theano programs. CNMeM is a library (built into Theano) that lets you set, by means of a value between 0 and 1, how the Deep Learning framework handles the GPU memory, speeding up neural network computation on Theano. For video cards shared with the monitor we suggest a value around 0.8, whereas stand-alone graphics cards can work with cnmem equal to 1.

Another very important parameter for speeding up the computation, especially convolutions, is the "enabled" setting of the [dnn] section, which enables or disables NVIDIA's cuDNN libraries. cuDNN is a library supplying optimized primitives for deep neural networks, speeding up the training and testing stages and saving energy.

In order to install cuDNN go to this link: https://developer.nvidia.com/cudnn and click on the download button. Proceed with the download (NVIDIA membership registration might be necessary); the following screen should pop up:

 

 

cuDNN NOTE: in this case too, as previously stated, we advise not to download the latest version of cuDNN but one of the two preceding versions, as the latest may not be "seen" by either Cuda 8 or Theano 0.9; here we recommend cuDNN version 6.0. A warning may still arise with Theano 0.9 indicating that the cuDNN version is too new and could cause problems. We noticed incompatibility problems between cuDNN and TensorFlow as well.

Extracting the downloaded file, you obtain 3 folders: bin, lib and include. All you need to do is copy the content of these folders into the folders of the same name within the Cuda directory, namely inside: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0

Hence copy cudnn64_6.dll into Cuda's bin folder, cudnn.h into include, and cudnn.lib into lib.

Once you have installed cuDNN, proceed with the installation of Keras by means of pip:

pip install keras

The instruction installs all the dependencies and the latest version (currently Keras 2.0.9). To set Theano as the Keras backend, go into the folder c:\users\-user-\.keras and edit the file keras.json as follows, setting "theano" as the "backend" item:

{
"floatx": "float32",
"epsilon": 1e-07,
"backend": "theano",
"image_data_format": "channels_last"
}
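If you switch backends often, the edit can be scripted. A small sketch using only the json module (the demo works on a temporary copy; the real file lives under c:\users\-user-\.keras):

```python
import json
import os
import tempfile

def set_keras_backend(config_path, backend):
    """Rewrite the 'backend' entry of a keras.json file in place."""
    with open(config_path) as f:
        config = json.load(f)
    config["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

# Demo on a temporary copy of the configuration file:
path = os.path.join(tempfile.mkdtemp(), "keras.json")
with open(path, "w") as f:
    json.dump({"floatx": "float32", "epsilon": 1e-07,
               "backend": "theano", "image_data_format": "channels_last"}, f)

set_keras_backend(path, "tensorflow")
with open(path) as f:
    print(json.load(f)["backend"])  # tensorflow
```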

To check that everything is working, launch the Anaconda prompt and then launch python. From the Python prompt type: import keras. If everything went well the following screen will appear:

 

 

Notice the warning we mentioned earlier about cuDNN, shown by Theano itself: if you run into the listed problems, downgrade cuDNN to version 5.1 as advised by the team itself. When a stable version of Theano 0.10 comes out, it will probably solve these compatibility problems.

In any case, we know that the environment configured with Keras and Theano in this way works perfectly on several models that we previously trained and tested. We decided to use Theano as backend because it very often turns out faster than TensorFlow on some Computer Vision trainings.

If you want to use TensorFlow as backend instead, you need to install it. To install TensorFlow for GPU, run the following command:

pip install --upgrade tensorflow-gpu

This instruction installs the latest version (1.4.0) of tensorflow-gpu. To try it with Keras, change "theano" to "tensorflow" within the file keras.json, restart the Anaconda prompt and type import keras again.

TensorFlow NOTE: TensorFlow is not supported on 32-bit platforms; the installer downloads only the wheel for the 64-bit framework. Furthermore, to install the CPU version you just need to run the following command (without gpu): pip install --upgrade tensorflow.

If everything went fine, you will see TensorFlow appearing as keras backend this time:

 

Other useful packages to work with Keras are:

  1. scikit-image: a library very useful for image processing in Python, which allows us to save matrices and tensors as jpeg pictures or in many other supported formats. Installable with: conda install scikit-image.
  2. gensim: the word-embedding library implementing, among other things, the word2vec algorithm. Installable with: conda install gensim.
  3. h5py: the Pythonic interface to the HDF5 format, necessary in Keras to save trained models to disk. Installable with: pip install h5py.

At this point, the Keras/TF-Theano environment on Windows is ready: you can test your code and your models natively harnessing the GPU.

Enjoy!

See you at the next tutorial.

Greetings from Deep Learning Italia.

 

For any information or clarification here you have our emails:

Francesco Pugliese – f.pugliese@deeplearningitalia.com

Matteo Testi – m.testi@deeplearningitalia.com