Thursday, November 19, 2009

Another visualization of RNN evolution



I evolved a recurrent neural network (Elman network with 5 hidden sine nodes) on the 89-state maze task (described here) using CMA-ES (thanks to Nikolaus Hansen for making his source code publicly available!), and was able to get good results (approx. 52 time-steps to solve, averaged over 1000 trials.) Each time an improved solution was found, the parameters were written to a text file. The above graphic visualizes the progression of the parameters through time.

Visualization done with Processing 1.0.

Thursday, August 20, 2009

77neurons: Cart-pole visualizations


This is the first video of my project with Jad Nohra; our team is called 77neurons.

This is the non-Markov version of the cart-pole problem (no velocity information given.) The controller is a recurrent neural network using 2 sinusoidal nodes in the hidden layer (as opposed to more standard sigmoids.) It is currently being trained with a simplistic stochastic local search procedure, but we intend to implement various forms of reinforcement learning in the future.

Saturday, June 06, 2009

Delayed N-Armed Bandit

I have been trying to more thoroughly understand concepts related to reinforcement learning; specifically temporal difference learning. I reread portions of the book by Barto and Sutton. One of the examples given in the book as a toy problem is that of the n-armed bandit. An interesting variant of this problem occurred to me – What if the rewards are not dispensed immediately, but rather after some fixed (or perhaps random) delay?

slot machines

Imagine walking into a casino and being given one hour to play a large group of slot machines. However, unlike regular slot machines, these ones will produce a fixed amount of coins per pull. The trick is to figure out which slot machine gives you the most reward, and then exploit it. Sounds simple, right? There are two caveats: First, the machines will only give you the coins after some unknown delay. Second, when you receive the coins, you will not know the identity of the machine that produced the reward.

I wrote a fairly simple program in java to do this experiment by making use of eligibility traces, and found that it worked far better than I had expected was possible. Given a large number of possible actions (say n = 30), the system was able to properly assign credit to actions such that for all actions in A, if the real reward R(An) < R(Am) then the expectation of reward E(An) < E(Am) as well - so the actions would be ranked correctly in terms of value. This ranking could be learned perfectly, even when the delay between action and reward was 100 time-steps or more. Granted, a large number of observations are required (in the millions), but it is still impresses me that such a thing can be done at all.

In order to understand how and why this worked, I attempted to characterize the result mathematically, and was able to do so in a way that very closely matched the empirical results:

Delayed N-Armed Bandit Eligibility Equation

Where:

r := eligibility trace decay rate (such that 0 < r < 1)
d := (fixed) delay between causal action and reward
E[Vall] := Expectation of reward averaged over all actions
E[Vcause] := Expectation of reward due to causal action

This rather simple equation appears to nearly perfectly describe the observed results in the limit of a great number of observations.

The main technique used is treating the “volume” of eligibility as the sum of an infinite geometric series, which is where the 1 / (1 – r) terms come from. Basically, we are assuming a randomized action policy, and therefore, an average over all possible rewards weighted by their relative likelihoods. These rewards will be (incorrectly) applied to our “causal” action, weighted by the eligibility. The one difference is the time-step at which the “actual” reward is emitted; this will be weighted by the eligibility at that delay, which is represented by the rd terms.


Additionally, I tried some variations on the basic experiment.

Successful circumstances:

  • Various delays, uniform across bandits (up to about 500 time-steps)

  • Online randomized delays (varies from observation to observation)

  • Heterogeneous delays across bandits

  • Skewed action frequencies (some actions taken far more frequently than others)

  • Online Gaussian noise added to rewards (up to variance of 10)

  • Various numbers of bandits (up to about 30)

  • Non-random sequences of action (e.g. 1, 2, …, n–1, n, 1, 2, …)


Unsuccessful circumstances:

  • Replacing traces

Wednesday, December 24, 2008

Evolving Mona Lisa

Basic idea: I evolved a feed-forward neural network that takes (x, y) pixel coordinates (each dimension normalized to fall within [0, 1]) as inputs, and outputs a scalar value in the range of [0, 1], representing the grayscale value of that pixel. For example, the image size used during evolution of "Mona" was 30 by 34 pixels, meaning that the neural network needed to be iterated 1020 times to generate a single image; each iteration of the neural network generates only a single pixel. Of course, the neural network will smoothly generalize over any input coordinates in the range [0, 1], therefore it is straightforward to have it generate larger images (as depicted below) simply by partitioning the 2D plane using finer granularity. It is evolved on the smaller images for sake of efficiency.

The results are by no means perfect, but still fun to watch.

Original Version MonaEvolved Version Mona
Original Version StripesEvolved Version Stripes

Tuesday, July 22, 2008

Visualizing high-dimensional fitness landscapes

I've recently been exploring various ways of improving my intuitive understanding of fitness landscapes. This is a difficult problem, because the parameter spaces I want to be able to think about are often composed of dozens or even hundreds of dimensions. I have found that most authors describing fitness landscapes do so using overly simplified diagrams that somewhat resemble hills or mountain ranges.

In some recent experiments, I've found some nice ways of visualizing the geometry of actual fitness landscapes. I produced the images below by evolving recurrent neural networks to optimize performance on various tasks (e.g. wandering through a virtual maze), and then examining the resulting fitness values of a large number of very nearby points to those evolved solutions. Of course it is impossible to visualize the entire space, because of the high dimensionality; so instead what we are looking at are randomly selected 2D slices through that space. The shading is representative of numeric fitness (dark = low-fitness, light = high-fitness).

Here are a few examples of the results:
















Tuesday, March 27, 2007

Really sensitive learning algorithm

HILLCLIMBING

I just finished rereading a book called Stochastic Local Search, and was wondering if I could apply some of the techniques described therein to the analysis of fitness landscapes produced by neuro-evolution. As a first experiment, I decided to implement a very simple hill-climber. I defined the "neighborhood" as the set of vectors produced by transforming the current solution by a fixed amount along a single dimension at a time.

Therefore, a solution of {0.3, -2.4, 2.1} would have the following neighborhood (assuming a step-size of 0.1):

x += 0.1 -> {0.4, -2.4, 2.1}
x -= 0.1 -> {0.2, -2.4, 2.1}
y += 0.1 -> {0.3, -2.3, 2.1}
y -= 0.1 -> {0.3, -2.5, 2.1}
z += 0.1 -> {0.3, -2.4, 2.2}
z -= 0.1 -> {0.3, -2.4, 2.0}

For N parameters, the size of the neighborhood would be 2N.

So, as a first test, I wanted to see how well simple hill-climbing could train a neural network attempting to solve n-bit parity, a standard machine-learning benchmark. I tried 5 bits (this has 32 training exemplars), and it solved it very rapidly. This was a surprise as I was expecting it to get stuck; I merely wanted to see how far it would get before getting stuck.

I eventually got it up to about 94% correct on 12-bit parity. It should be noted that this is an MUCH larger version of the problem (4096 training exemplars), and this is also MUCH larger than what is standardly attempted. Most are 6 or fewer bits. (One interesting larger example - 21 bits - can be found here).

I compared this new method against Evolutionary Programming, and it proved far superior - EP never made it much past 65% accuracy.

NOISE IS BAD

I decided to see how this algorithm would fair in the presence of some noise applied to the fitness function. Incredibly, even the slightest amount of noise totally disrupts it! I have tried applying a gaussian noise with a standard deviation more than 5000 times less than the maximal fitness of the function, and strangely enough just this tiny "whisper" of noise is enough to cripple its performance.

I'm very intrigued, as my intuition has been severely violated twice by these experiments. Clearly more are in order.

Monday, January 22, 2007

miscellaneous stuff

I found a very interesting podcast on iTunes, called Talking Robots talking about both Robotics and AI.

Just read that a Genetic Algorithm has been used to solve a billion-variable problem! The particular algorithm used is something called a Compact Genetic Algorithm (cGA), which uses a statistical model to generate new candidate solutions rather than more standard operators like crossover and mutation. This algorithm falls in the class of Estimation of Distribution Algorithms.

I've been experimenting with an java based open source data mining tool called Weka. I've been using it to try to find patterns in neural network fitness landscapes. As a first experiment, I randomly generated several thousand neural networks and tested them against Exclusive-OR. The value of their parameters and the resulting fitness was used to generate an ARFF file recognizable by Weka. I then analyzed the data using quite a few different tools, including trees, bayesian methods, linear regression, multilayer perceptrons, etc... My conclusion? Looking for patterns in the fitness landscape of a neural net is HARD. None of the dozen or so classifiers I used were able to do any better than chance (most did worse) after performing 10-fold crossvalidation. And this is when dealing with a space of only 9 dimensions, much smaller than anything practical.

I've been reading about the Hierarchical Bayesian Optimization Algorithm (hBOA). The idea behind this is to infer bayesian directed acyclic graphs (DAGs) based on a selected set of successful candidates and use that to generate new candidates (this is also an Estimation of Distribution Algorithm, as described above). By using bayesian DAGs, it is able to sample conditional dependencies, rather than being limited to a univariate model (like the cGA). This allows it to discover "building-blocks" of progressively greater complexity, and it is able to solve some very difficult problems, including Hierarchical If-And-Only-If (HIFF), which is impossible for other evolutionary algorithms. While this is indeed impressive, I am not personally convinced that many real-world problems possess such a structure (however, one example they have found so far is in finding ground-states for Spin Glasses). My strong feeling is that such a technique would not translate well into Neuro-evolution, because small groups of link-weights in a neural network cannot impart fitness without reference to the whole - in other words, I don't think that, in general, building blocks in artificial neural networks even exist, certainly not recurrent ones. [One possible counter-example to this intuition would be Enforced SubPopulations]. I may be proven wrong later, but that's how it looks to me right now.