So, my understanding is that it generates a neural network architecture, trains a network on that architecture, then evaluates how to improve the model (with what mechanism?), applies the changes, and runs the result through the process again.
Not a genetic algorithm per se, but whether it resembles one depends on the mechanism employed to evaluate and improve the child neural network.
What I think is happening is that networks are generated and then trained by normal means (standard backpropagation). The "controller" NN has nothing to do with this part.
Then some metrics/benchmarks are collected, capturing how well the tested network performed during training. This is just standard data collection, e.g. looking at how quickly the network converged and how accurate its results were.
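To make that concrete, here's a toy sketch of those two steps: train a candidate by plain backprop, then record its accuracy and a crude convergence metric. The encoding (a list of hidden-layer widths) and the toy task are my own assumptions for illustration, not anything from the article:

    import torch
    import torch.nn as nn

    def train_and_benchmark(arch, X, y, epochs=50, lr=0.05):
        # Build a child net from a list of hidden widths (toy encoding).
        layers, in_dim = [], X.shape[1]
        for width in arch:
            layers += [nn.Linear(in_dim, width), nn.ReLU()]
            in_dim = width
        layers.append(nn.Linear(in_dim, 2))        # toy binary task
        model = nn.Sequential(*layers)

        # Standard backprop loop -- the controller plays no role here.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        losses = []
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
            losses.append(loss.item())

        # The "benchmarks": final accuracy plus a crude convergence-speed metric.
        acc = (model(X).argmax(1) == y).float().mean().item()
        epochs_to_half = next((i for i, l in enumerate(losses)
                               if l < losses[0] / 2), epochs)
        return {"accuracy": acc, "epochs_to_half_loss": epochs_to_half}

    X = torch.randn(256, 8)
    y = (X[:, 0] > 0).long()
    for arch in ([16], [32, 16]):
        print(arch, train_and_benchmark(arch, X, y))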
The controller network then learns the mapping from network architecture to benchmarks (standard backpropagation again).
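Something like this, where the controller is just a regressor from an architecture encoding to an observed score. Again a minimal sketch under my own assumptions (fixed-length encodings, made-up numbers):

    import torch
    import torch.nn as nn

    # Architectures encoded as fixed-length vectors (here: two padded layer
    # widths) paired with the benchmark scores measured for them.
    archs  = torch.tensor([[16., 0.], [32., 16.], [64., 0.], [8., 8.]])
    scores = torch.tensor([[0.71], [0.83], [0.78], [0.60]])

    controller = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(controller.parameters(), lr=1e-2)

    # Standard supervised backprop: architecture in, predicted benchmark out.
    for _ in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(controller(archs), scores)
        loss.backward()
        opt.step()

    # The controller can now guesstimate how an unseen topology would do:
    print(controller(torch.tensor([[48., 8.]])).item())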
After this, the controller needs to generate new networks to try out. Two possible ways to do this. One would be to tweak "good" networks slightly, i.e. explore the search space in a guided way, concentrating on regions that look promising. They hint at this in the article: the controller NN is basically a heuristic that can guesstimate how good novel topologies are going to be at the task.
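A sketch of that first option might look like this, reusing the `controller` from the sketch above: jitter the best-known encoding, let the controller score the candidates cheaply, and only train the top few for real. Purely illustrative:

    import torch

    def propose(best_arch, controller, n=50, noise=4.0, keep=5):
        # Jitter the best-known encoding to explore its neighbourhood.
        cands = best_arch + noise * torch.randn(n, best_arch.shape[-1])
        cands = cands.clamp(min=0.0)        # layer widths can't be negative
        with torch.no_grad():
            preds = controller(cands).squeeze(1)
        # Cheaply rank by predicted benchmark; only `keep` get real training.
        return cands[preds.argsort(descending=True)[:keep]]

    print(propose(torch.tensor([[32., 16.]]), controller))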
The other, which I have a hunch might be what they're actually doing, is feeding the maximum possible score into the controller NN's output end and backpropagating it all the way to the inputs. That way you would in fact get a single "ideal" network out of the controller, which you can then test; that would also challenge the controller NN, since any error between the "ideal" network's predicted and actual benchmark would be used to retrain the controller. I have a feeling this way of doing things might be prone to getting stuck in local maxima / ruts, however.
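For what it's worth, that inversion trick is easy to sketch: freeze the trained controller, clamp the target at the maximum score, and do gradient descent on the input encoding instead of the weights. This again reuses `controller` from above, and again it's just my guess at the mechanism, not anything the article confirms:

    import torch

    arch = torch.tensor([[20., 20.]], requires_grad=True)  # starting guess
    opt = torch.optim.Adam([arch], lr=0.5)
    target = torch.tensor([[1.0]])        # the maximum possible score

    for _ in range(200):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(controller(arch), target)
        loss.backward()      # gradients flow back to the encoding...
        opt.step()           # ...and only the encoding is updated;
                             # the controller's weights stay frozen

    print(arch.detach())     # the controller's current "ideal" network;
                             # train it for real, and the gap between its
                             # predicted and measured score retrains the controller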