Numerical optimization can be likened to a kangaroo searching for the top of Mt. Everest. Everest is the global optimum, but the top of any other really high mountain such as K2 would be nearly as good.
When training a neural network, initial weights are usually chosen randomly, which means that the kangaroo may start out anywhere in Asia. If you know something about the scales of the inputs, you may be able to get the kangaroo to start near the Himalayas. However, if you make a really stupid choice of distributions for the random initial weights, or if you have really bad luck, the kangaroo may start in South America.
In standard backprop or stochastic approximation, the kangaroo is blind and has to feel around on the ground to make a guess about which way is up. He may be fooled by rough terrain unless you use batch training. If the kangaroo ever gets near the peak, he may jump back and forth across the peak without ever landing on the peak. If you use a decaying step size, the kangaroo gets tired and makes smaller and smaller hops, so if he ever gets near the peak he has a better chance of actually landing on it before the Himalayas erode away. In backprop with momentum, the kangaroo has poor traction and can't make sharp turns.
With Newton-type (2nd order) algorithms, the Himalayas are covered with a dense fog, and the kangaroo can only see a little way around his location. Judging from the local terrain, the kangaroo make a guess about where the top of the mountain is, and tries to jump all the way there. In a stabilized Newton algorithm, the kangaroo has an altimeter, and if the jump takes him to a lower point, he backs up to where he was and takes a shorter jump. If the algorithm isn't stabilized, the kangaroo may mistakenly jump to Shanghai and get served for dinner in a Chinese restaurant. (I never claimed this analogy was realistic.) In steepest ascent with line search, the fog is very dense, and the kangaroo can only tell which direction leads up. The kangaroo hops in this direction until the terrain starts going down again, then chooses another direction.
Notice that in all the methods discussed so far, the kangaroo can hope at best to find the top of a mountain close to where he starts. There's no guarantee that this mountain will be Everest, or even a very high mountain.
In simulated annealing, the kangaroo is drunk and hops around randomly for a long time. However, he gradually sobers up and tends to hop up hill.
In genetic algorithms, there are lots of kangaroos that are parachuted into the Himalayas (if the pilot didn't get lost) at random places. These kangaroos do not know that they are supposed to be looking for the top of Mt. Everest. However, every few years, you shoot the kangaroos at low altitudes and hope the ones that are left will be fruitful and multiply.
From all this we can conclude that the best methods to find the Mount Everest are (in order):
1. to know where it is
2. to have a map on which you can find it
3. to know someone who knows where it is or who has a map
4. to send (a) kangaroo(s) to search for it
and even if you have to send a kangaroo, it is useful if you know at least:
1. where the mountain range is in which the Mount Everest may be and
2. how to bring your kangaroo to that mountain range.
Page was generated in 0.0131 seconds