https://www.youtube.com/watch?v=gQddtTdmG_8

Thought I'd share this. It's about making language vectors with neural networks, and how a very simple model leads to some surprisingly cool features.
You start with a vector, say 10,000 units long, where each input represents a unique word. You feed it through a network with a hidden layer of 300 units, and the output is also 10,000 units long, representing the probability of any other word appearing near the input word. You then train it long enough, and on enough language data, that it accurately models the language (in a statistical sense).
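If it helps to see the shape of it, here's a minimal sketch of that architecture in PyTorch (my own illustration, not the video's code; the sizes are just the ones from the post, and real word2vec training uses tricks like negative sampling that this leaves out):

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000   # one input/output unit per unique word
HIDDEN_SIZE = 300     # the narrow "funnel" layer

class TinyWord2Vec(nn.Module):
    def __init__(self):
        super().__init__()
        # one-hot word -> 300-dim hidden vector (no bias, no nonlinearity)
        self.hidden = nn.Linear(VOCAB_SIZE, HIDDEN_SIZE, bias=False)
        # 300-dim hidden vector -> scores over all 10,000 words
        self.output = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE, bias=False)

    def forward(self, one_hot_word):
        h = self.hidden(one_hot_word)                   # the 300 values discussed below
        return torch.softmax(self.output(h), dim=-1)    # P(nearby word | input word)
```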
The real trick here is that your 10,000 words are being funneled through a layer only 300 units wide, so those 300 values (a 300-unit-long vector) must be compressing all of that information down. You can then extract the 300 hidden-layer values for any specific word and do "concept calculus" with them.
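In the sketch above, that extraction is just reading one column out of the first weight matrix (again, just an illustration of the idea, with a made-up word index):

```python
def word_vector(model: TinyWord2Vec, word_index: int) -> torch.Tensor:
    # The hidden layer's weight matrix is (300, 10_000); column i is the
    # 300-dimensional vector the network learned for word i.
    return model.hidden.weight[:, word_index].detach()

model = TinyWord2Vec()
vec = word_vector(model, word_index=42)  # 42 is a hypothetical index for some word
print(vec.shape)                         # torch.Size([300])
```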
And remember, all of this comes from unsupervised learning on raw language data:
For example, if you take the vector for "king", subtract the vector for "man", and add the vector for "woman", you get the vector for "queen". If you take the vector for "London", subtract "England", and add "Japan", you get "Tokyo". Take "bark", subtract "dog", and add "cat", and you get "miaowing". Also from the data set: take "shirt", subtract "man", and add "woman", and you get "blouse". So the hidden-layer cells are clearly encoding a whole bunch of meaningful relationships from the real world, at least as far as our encoding of the world in language goes.
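If you want to poke at this yourself without training anything, gensim's pretrained Google News word2vec vectors expose exactly this kind of arithmetic (my suggestion, not something from the video; the model is a sizeable download the first time):

```python
import gensim.downloader as api

# Pretrained 300-dimensional word2vec vectors trained on Google News text.
wv = api.load("word2vec-google-news-300")

# king - man + woman ~= queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# typically something like [('queen', 0.71)] -- exact score depends on the vectors

# London - England + Japan
print(wv.most_similar(positive=["London", "Japan"], negative=["England"], topn=1))
```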