README.md

This directory contains models for unsupervised training of word embeddings using the model described in:

(Mikolov et al.) Efficient Estimation of Word Representations in Vector Space, ICLR 2013.

Detailed instructions for getting started with and using these models are available in the tutorials. Brief instructions are below.

Assuming you have cloned the git repository, navigate into this directory. To download the example text and evaluation data:

curl http://mattmahoney.net/dc/text8.zip > text8.zip
unzip text8.zip
curl https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/word2vec/source-archive.zip > source-archive.zip
unzip -p source-archive.zip  word2vec/trunk/questions-words.txt > questions-words.txt
rm text8.zip source-archive.zip
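To check the downloaded evaluation data, the following minimal sketch reads the analogy questions into memory. It assumes the usual layout of questions-words.txt: a line starting with `:` names a section, and every other line holds four space-separated words. The function name `parse_analogy_questions` is hypothetical and is not part of this directory's code.

```python
def parse_analogy_questions(text):
    """Group analogy questions by section.

    Assumed layout: a line starting with ':' names a section, and every
    other non-empty line holds four words "a b c d".
    """
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            current = line[1:].strip()
            sections[current] = []
            continue
        # text8 is all lowercase, so lowercase the questions to match it.
        words = line.lower().split()
        if len(words) != 4:
            continue  # skip malformed lines
        sections.setdefault(current, []).append(tuple(words))
    return sections


# A tiny in-memory sample in the assumed format.
sample = """: capital-common-countries
Athens Greece Baghdad Iraq
Athens Greece Bangkok Thailand
: family
boy girl brother sister
"""
questions = parse_analogy_questions(sample)
for name, items in questions.items():
    print(name, len(items))
```

For the real file, pass the contents of `open("questions-words.txt").read()` instead of `sample`.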

You will need to compile the ops as follows:

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -shared word2vec_ops.cc word2vec_kernels.cc -o word2vec_ops.so -fPIC -I $TF_INC -O2 -D_GLIBCXX_USE_CXX11_ABI=0

On Mac, add -undefined dynamic_lookup to the g++ command.
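The compile command and its Mac variant can also be assembled programmatically. The sketch below is illustrative only: `build_compile_command` and its parameters are hypothetical names, and the include path is a placeholder for the value of `$TF_INC`. The `old_abi` switch is there because this README says the `-D_GLIBCXX_USE_CXX11_ABI=0` flag may need to be dropped in some setups.

```python
import sys


def build_compile_command(tf_include, platform=sys.platform, old_abi=True):
    """Return the g++ argument list from this README for the given platform."""
    cmd = [
        "g++", "-std=c++11", "-shared",
        "word2vec_ops.cc", "word2vec_kernels.cc",
        "-o", "word2vec_ops.so",
        "-fPIC", "-I", tf_include, "-O2",
    ]
    if old_abi:
        # Drop this flag if your TensorFlow build requires it (see the README note).
        cmd.append("-D_GLIBCXX_USE_CXX11_ABI=0")
    if platform == "darwin":
        # macOS needs undefined symbols resolved at load time.
        cmd += ["-undefined", "dynamic_lookup"]
    return cmd


# "/path/to/tensorflow/include" is a placeholder for the output of
# tf.sysconfig.get_include().
print(" ".join(build_compile_command("/path/to/tensorflow/include", platform="linux")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from within this directory to perform the build.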

(For an explanation of what the g++ command does, see the tutorial on Adding a New Op to TensorFlow. The flag -D_GLIBCXX_USE_CXX11_ABI=0 selects the older libstdc++ ABI, so that ops compiled with gcc 5 or later remain compatible with TensorFlow binaries built against the older ABI. However, if you compiled TensorFlow from source using gcc 5 or later, you may need to omit the flag.)

Then run using:

python word2vec_optimized.py \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/
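The evaluation data consists of analogy questions of the form "a is to b as c is to d". As a rough illustration of how trained embeddings can be scored against such questions, the sketch below uses the vector-offset approach: it picks the word whose vector is closest, by cosine similarity, to b - a + c. This is a pure-Python sketch on made-up toy vectors, and `predict_fourth_word` is a hypothetical helper; it is not necessarily how word2vec_optimized.py computes its evaluation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def predict_fourth_word(embeddings, a, b, c):
    """Return the word closest to b - a + c, excluding the three question words."""
    target = [
        bi - ai + ci
        for ai, bi, ci in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# Toy vectors invented for illustration; real embeddings come from training.
toy = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, -1.0],
}
print(predict_fourth_word(toy, "man", "woman", "king"))  # prints "queen"
```

Accuracy over a question file is then the fraction of questions for which the predicted word equals the fourth word.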

Here is a short overview of what is in this directory.

| File | What's in it? |
| --- | --- |
| word2vec.py | A version of word2vec implemented using TensorFlow ops and minibatching. |
| word2vec_test.py | Integration test for word2vec. |
| word2vec_optimized.py | A version of word2vec implemented using C ops that does no minibatching. |
| word2vec_optimized_test.py | Integration test for word2vec_optimized. |
| word2vec_kernels.cc | Kernels for the custom input and training ops. |
| word2vec_ops.cc | The declarations of the custom ops. |
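To make the minibatching distinction concrete, here is a small pure-Python sketch of what "minibatching" means for word2vec training data. It assumes a skip-gram style setup, where each training example is a (center word, context word) pair, and a fixed context window. Both helper names are hypothetical, and the code is a simplified illustration rather than an excerpt from word2vec.py.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within a fixed window around each token."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def minibatches(pairs, batch_size):
    """Group an iterable of pairs into lists of at most batch_size items."""
    batch = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


tokens = "the quick brown fox".split()
batches = list(minibatches(skipgram_pairs(tokens, window=1), batch_size=4))
for batch in batches:
    print(batch)
```

A minibatched trainer applies one parameter update per batch, whereas a version without minibatching would update after each individual pair.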