Neal Wu 0d9a3abdca Remove all references to 'tensorflow.models' which is no longer correct 9 years ago
BUILD 86ecc9730d Moving example models from github.com/tensorflow/tensorflow to github.com/tensorflow/models 9 years ago
README.md bb5798c7a0 Made several fixes to the embedding README 9 years ago
__init__.py 0d9a3abdca Remove all references to 'tensorflow.models' which is no longer correct 9 years ago
word2vec.py 0d9a3abdca Remove all references to 'tensorflow.models' which is no longer correct 9 years ago
word2vec_kernels.cc 86ecc9730d Moving example models from github.com/tensorflow/tensorflow to github.com/tensorflow/models 9 years ago
word2vec_ops.cc 86ecc9730d Moving example models from github.com/tensorflow/tensorflow to github.com/tensorflow/models 9 years ago
word2vec_optimized.py 0d9a3abdca Remove all references to 'tensorflow.models' which is no longer correct 9 years ago
word2vec_optimized_test.py 0d9a3abdca Remove all references to 'tensorflow.models' which is no longer correct 9 years ago
word2vec_test.py 0d9a3abdca Remove all references to 'tensorflow.models' which is no longer correct 9 years ago

README.md

This directory contains models for unsupervised training of word embeddings, based on the approach described in:

(Mikolov et al.) Efficient Estimation of Word Representations in Vector Space, ICLR 2013.

Detailed instructions for getting started with these models are available in the tutorials. Brief instructions follow.

To download the example text and evaluation data:

wget http://mattmahoney.net/dc/text8.zip -O text8.zip
unzip text8.zip
wget https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/word2vec/source-archive.zip
unzip -p source-archive.zip word2vec/trunk/questions-words.txt > questions-words.txt
rm source-archive.zip
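
As a quick sanity check on the downloaded evaluation data, the sketch below groups the analogy questions by section. It assumes the usual layout of questions-words.txt: section headers begin with `:` and every question line holds four whitespace-separated words. The helper names `parse_analogies` and `load_analogies` are hypothetical and not part of this directory, and lowercasing the words is an assumption made here so that they match the all-lowercase text8 corpus.

```python
from collections import OrderedDict


def parse_analogies(lines):
    """Group four-word analogy questions under their ': section' header.

    Hypothetical helper for inspecting the data; not part of this directory.
    """
    sections = OrderedDict()
    current = None  # questions seen before any header are filed under None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            current = line[1:].strip()
            sections[current] = []
            continue
        # Assumption: lowercase so the words match the text8 vocabulary.
        words = line.lower().split()
        if len(words) != 4:
            continue  # skip anything that is not a four-word question
        sections.setdefault(current, []).append(tuple(words))
    return sections


def load_analogies(path="questions-words.txt"):
    """Read the analogy file from `path` and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_analogies(f)
```

Calling `load_analogies()` in the directory that holds questions-words.txt returns a mapping from section name to a list of four-word tuples, which makes it easy to count the questions per section before training.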

Assuming you have cloned the git repository, change into this directory and run the model (the relative paths below expect text8 and questions-words.txt to be in the working directory):

cd models/tutorials/embedding
python word2vec_optimized.py \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/

To run the code from sources using bazel:

bazel run -c opt models/tutorials/embedding/word2vec_optimized -- \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/

Here is a short overview of what is in this directory.

| File | What's in it? |
| --- | --- |
| word2vec.py | A version of word2vec implemented using TensorFlow ops and minibatching. |
| word2vec_test.py | Integration test for word2vec. |
| word2vec_optimized.py | A version of word2vec implemented using custom C++ ops that does no minibatching. |
| word2vec_optimized_test.py | Integration test for word2vec_optimized. |
| word2vec_kernels.cc | Kernels for the custom input and training ops. |
| word2vec_ops.cc | The declarations of the custom ops. |
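
To make the minibatching distinction above concrete, here is a minimal pure-Python sketch of how skip-gram training pairs can be generated and grouped into batches. It is an illustration only, not the code in this directory, where input generation is handled by the custom ops. The function names `skipgram_pairs` and `minibatches`, the fixed `window`, and the `batch_size` parameter are all assumptions introduced for this example.

```python
def skipgram_pairs(token_ids, window=2):
    """Yield (center, context) pairs for tokens within `window` positions.

    Simplification: the window size is fixed rather than sampled per token.
    """
    n = len(token_ids)
    for i, center in enumerate(token_ids):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, token_ids[j]


def minibatches(pairs, batch_size):
    """Group an iterable of pairs into lists of at most `batch_size` items."""
    batch = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


if __name__ == "__main__":
    ids = [0, 1, 2, 3]  # toy corpus of word ids
    for batch in minibatches(skipgram_pairs(ids, window=1), batch_size=4):
        print(batch)
```

A minibatched trainer would apply one parameter update per batch, whereas a trainer without minibatching would update after each individual pair.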