Tuesday, July 17, 2018

TensorFlow: first book (continued)

Before moving on to the next book, first a posting on an example given in the TensorFlow book by Ramsundar and Zadeh.

The chapter on convolutional neural networks discusses training a TensorFlow architecture to recognize handwritten digits from the MNIST dataset. The given Python code automatically downloads the dataset from the Web and partitions the labeled data into training, validation, and test sets (as explained in the book, these are used to train the network, validate the performance of the model, and test the final model, respectively).
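As a rough sketch of such a partition (using plain NumPy rather than the book's download helper, and with illustrative split sizes), the labeled data could be divided as follows:

```python
import numpy as np

# Toy stand-in for the MNIST images (28x28 pixels) and their labels.
num_examples = 70000
images = np.zeros((num_examples, 28, 28))
labels = np.zeros(num_examples, dtype=np.int64)

# Shuffle once so all three sets are drawn from the same distribution.
rng = np.random.default_rng(seed=0)
perm = rng.permutation(num_examples)
images, labels = images[perm], labels[perm]

# Partition into training, validation, and test sets (sizes are illustrative).
train_images, train_labels = images[:55000], labels[:55000]
valid_images, valid_labels = images[55000:60000], labels[55000:60000]
test_images, test_labels = images[60000:], labels[60000:]
```

The validation set is held out during training to tune the model, and the test set is touched only once, to measure the final model.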

The ultimate objective of the algorithm is, given a tensor of handwritten digit images, to find the corresponding tensor of labels, such as the one shown below.

[7 2 1 0 4 1 4 9
 5 9 0 6 9 0 1 5
 9 7 3 4 9 6 6 5
 4 0 7 4 0 1 3 1
 3 4 7 2 7 1 2 1
 1 7 4 2 3 5 1 2
 4 4 6 3 5 5 6 0
 4 1 9 5 7 8 9 3]


Clearly a fun example, since recognizing digits is an intuitive yet non-trivial problem. I highly recommend running and modifying the code to become more familiar with such a network (as I did to generate the example above), and seeing how the guesses improve with each iteration of the algorithm.

Sunday, July 8, 2018

TensorFlow: first book

Some first impressions after finishing the book "TensorFlow for Deep Learning - From Linear Regression to Reinforcement Learning" (by Ramsundar and Zadeh). 

The book introduces the concept of tensors, primitives and architectures for deep learning, and the basics of regression, various neural networks, hyperparameter optimization, and reinforcement learning. The artwork in the figures is beautiful (something that convinced me to buy the book). The TensorFlow code examples can be downloaded from the book's website, making it easy to follow along with the discussion in the book.

The book falls a bit short on detailed explanation, however. I found that many times, just when the discussion was about to get interesting, the book referred to other work for details instead. Several architectures were merely "explained" with a figure, with no accompanying details in the text.

In addition, although I realize how hard it is to avoid errors in a book, the given linear regression example has a very unfortunate bug. The TensorFlow code given in the book fits some toy data with the linear regression shown to the left (with a discussion on how gradient descent algorithms are sometimes trapped in a local minimum). However, with a minor fix that avoids the wrong shape in the loss function, the much better linear regression shown to the right is computed instead.

[Figures: linear regression computed by the original code (left) and after the bug fix (right)]
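This kind of shape pitfall is easy to reproduce outside TensorFlow. In the NumPy sketch below (my own reconstruction, not the book's exact code), subtracting a vector of shape (N,) from a column of shape (N, 1) silently broadcasts to an (N, N) matrix, so the "loss" sums N² terms instead of N:

```python
import numpy as np

N = 100
y = np.arange(N, dtype=np.float64).reshape(N, 1)   # labels, shape (N, 1)
y_pred = np.arange(N, dtype=np.float64)            # predictions, shape (N,)

# Buggy loss: (N, 1) - (N,) broadcasts to an (N, N) matrix of differences.
buggy_loss = np.sum((y - y_pred) ** 2)

# Fixed loss: reshape so both operands have the same shape.
fixed_loss = np.sum((y.reshape(N) - y_pred) ** 2)

print((y - y_pred).shape)  # (100, 100) -- the silent broadcast
print(fixed_loss)          # 0.0 -- predictions match labels exactly
```

Even though the predictions match the labels perfectly, the buggy loss is large, so gradient descent ends up minimizing the wrong objective.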

Finally, although I of course understand the generality of gradient descent algorithms, on first reading I was a bit surprised that the TensorFlow code needs 8000 iterations to derive an approximation that a simple least-squares regression could have found in no time.
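For comparison, a closed-form least-squares fit on some toy data (my own example data, not the book's) needs no iterations at all:

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 1.0, 100)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(100)

# A degree-1 polynomial fit solves the least-squares problem directly.
slope, intercept = np.polyfit(x, y, deg=1)
```

Of course, the appeal of gradient descent is that it generalizes to models for which no such closed form exists.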

Please let me know your thoughts, and stay tuned for my impressions of the other books.

Saturday, July 7, 2018

TensorFlow for Deep Learning

As a CS student, a long time ago in a country far away, I was very interested in AI (Artificial Intelligence), and not just for chess playing programs. In fact, if it weren't for my professor convincing me to continue with compilers and high-performance computing, I may have ended up specializing in the field of AI. Perhaps lucky for me, since AI has gone through many rounds of boom-and-bust.

Nowadays, however, machine learning in general, and deep learning in particular, really seem to have taken AI in a very promising new direction. Since I feel machine learning will become an important, if not mandatory, skill for computer scientists, I decided to buy a few books on TensorFlow and familiarize myself with the new paradigm.

For starters, I bought the three O'Reilly books below (other recommendations are welcome) and plan to do a few follow-up posts on this topic.