
Building word2vec

aborkar-ibm edited this page Apr 27, 2018 · 14 revisions


The instructions provided below specify the steps to build word2vec version 0.1c on Linux on IBM Z for the following distributions:

  • RHEL (6.9, 7.3, 7.4, 7.5)
  • SLES (11 SP4, 12 SP2, 12 SP3)
  • Ubuntu (16.04, 17.10, 18.04)

General notes:

  • When following the steps below, please use a user with standard permissions unless otherwise specified.

  • A directory /<source_root>/ will be referred to in these instructions. This is a temporary, writable directory that you can place anywhere you like.

Building word2vec

  1. Install standard utilities, packages and platform-specific dependencies
  • RHEL (6.9, 7.3, 7.4, 7.5)

     sudo yum install -y git gcc make wget tar unzip
  • SLES (11 SP4, 12 SP2, 12 SP3)

     sudo zypper install -y git gcc make wget tar unzip
  • Ubuntu (16.04, 17.10, 18.04)

     sudo apt-get update
     sudo apt-get install -y git gcc make wget tar unzip
  2. Create a working directory and download the word2vec source code

     mkdir /<source_root>/
     cd /<source_root>/
     wget https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/word2vec/source-archive.zip
     unzip source-archive.zip

  3. Build word2vec

     cd word2vec/trunk
     make CFLAGS="-lm -pthread -O3 -Wall -funroll-loops"

  4. Set environment variables

     export PATH=$PATH:/<source_root>/word2vec/trunk

  5. Test word2vec using the demo scripts

     ./demo-word.sh
     ./demo-phrases.sh


    Note: Each demo script downloads a training corpus (demo-word.sh uses text8), trains word vectors, and then starts the interactive distance tool. At its prompt, enter a word such as france to list the closest words in the vector space. Enter EXIT to quit.

  6. Run the word2vec binary

     word2vec
    

    Note: The word2vec tool takes a text corpus as input and produces word vectors as output. When run without arguments, it prints a usage summary listing the supported options.
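Step 1 installs the same packages with a different package manager on each distribution. The following minimal sketch picks whichever of yum, zypper, or apt-get is found on PATH and installs those packages. It is not part of word2vec or the original instructions. The function names detect_pkg_manager and install_word2vec_deps are hypothetical, and the sketch assumes a Bash-compatible shell and sudo access.

```shell
# Hypothetical helper (not part of word2vec): print the first supported
# package manager found on PATH, or fail if none is found.
detect_pkg_manager() {
    local pm
    for pm in yum zypper apt-get; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: install the step 1 dependencies with the detected manager.
install_word2vec_deps() {
    local pm
    pm=$(detect_pkg_manager) || { echo "no supported package manager found" >&2; return 1; }
    if [ "$pm" = apt-get ]; then
        # Ubuntu: refresh the package index first, as step 1 does.
        sudo apt-get update || return 1
    fi
    sudo "$pm" install -y git gcc make wget tar unzip
}
```

Calling install_word2vec_deps once replaces the per-distribution commands. The lookup order only matters on a system that has more than one of these tools installed.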
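Re-running the download step fetches and unpacks the archive again. The sketch below skips work that is already done and checks the unpacked layout before building. The names fetch_word2vec_source and word2vec_source_ok are hypothetical. The check assumes the archive unpacks to word2vec/trunk containing word2vec.c and a makefile, which is the layout the build step relies on.

```shell
# Hypothetical helper: download and unpack the source archive in the
# current directory, skipping whatever is already present.
fetch_word2vec_source() {
    local url=https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/word2vec/source-archive.zip
    [ -f source-archive.zip ] || wget "$url" || return 1
    [ -d word2vec ] || unzip -q source-archive.zip
}

# Hypothetical check: succeed only if DIR (default word2vec/trunk) contains
# word2vec.c and a makefile that make can use.
word2vec_source_ok() {
    local dir=${1:-word2vec/trunk}
    [ -f "$dir/word2vec.c" ] || return 1
    [ -f "$dir/makefile" ] || [ -f "$dir/Makefile" ]
}
```

Run both from /<source_root>/, for example `fetch_word2vec_source && word2vec_source_ok && cd word2vec/trunk`.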
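After the Build word2vec step, the makefile in word2vec/trunk should have produced five programs: word2vec, word2phrase, distance, word-analogy, and compute-accuracy. The following sketch reports any of them that are missing or not executable. The name check_word2vec_build is hypothetical.

```shell
# Hypothetical check: verify that the programs built by the word2vec
# makefile exist and are executable in DIR (default: current directory).
check_word2vec_build() {
    local dir=${1:-.} bin missing=0
    for bin in word2vec word2phrase distance word-analogy compute-accuracy; do
        if [ ! -x "$dir/$bin" ]; then
            echo "missing or not executable: $dir/$bin" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_word2vec_build` inside word2vec/trunk right after make returns success only when all five programs are present.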
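The export in the Set environment variables step affects only the current shell, and running it repeatedly appends the same directory again. The sketch below shows a common idiom that appends a directory only when it is not already on PATH. The name path_append is hypothetical. To make the setting persistent, you could call it, or the original export line, from a shell startup file such as ~/.bashrc; this assumes Bash is your login shell.

```shell
# Hypothetical helper: append DIR to PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                     # already present: leave PATH unchanged
        *) PATH="${PATH:+$PATH:}$1" ;;   # avoid a leading ':' when PATH is empty
    esac
    export PATH
}
```

For example, `path_append /<source_root>/word2vec/trunk` followed by `command -v word2vec` confirms that the shell now finds the binary.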
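The distance tool started by the demo scripts reads one query per line from standard input and stops when it reads EXIT. Because of this, you can query a trained model non-interactively, for example the vectors.bin file produced by demo-word.sh. The helper query_distance and the DISTANCE_BIN override are hypothetical. The sketch assumes it is run from word2vec/trunk, where the distance binary was built.

```shell
# Hypothetical helper: send each query (one per line) to the distance tool,
# then EXIT so that the tool terminates instead of waiting for more input.
query_distance() {
    local vectors=$1 q
    shift
    {
        for q in "$@"; do
            printf '%s\n' "$q"
        done
        printf 'EXIT\n'
    } | "${DISTANCE_BIN:-./distance}" "$vectors"
}
```

For instance, `query_distance vectors.bin france` prints the words closest to france. Pass a multi-word query as a single quoted argument.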
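The demo scripts download large corpora. As a quicker check, you can train a tiny model on a toy corpus. The sketch below uses word2vec options that appear in the tool's usage output: -train, -output, -size, -window, -min-count, -cbow, -hs, -negative, -iter, -threads, and -binary. The corpus, the parameter values, and the names w2v_smoke_test and WORD2VEC_BIN are illustrative assumptions. Here, -hs 1 -negative 0 selects hierarchical softmax instead of negative sampling. With -binary 0, the output is plain text whose first line holds the vocabulary size and the vector size, and the check relies on that line.

```shell
# Hypothetical smoke test: train a tiny model and confirm that the text
# output starts with "<vocab_size> <vector_size>" and that vector_size is 10.
w2v_smoke_test() {
    local workdir size i status=1
    workdir=$(mktemp -d) || return 1
    # Toy corpus: the same sentence on 10 lines (an illustrative assumption).
    for i in 1 2 3 4 5 6 7 8 9 10; do
        echo "the quick brown fox jumps over the lazy dog"
    done > "$workdir/corpus.txt"
    if "${WORD2VEC_BIN:-word2vec}" -train "$workdir/corpus.txt" \
        -output "$workdir/vectors.txt" -size 10 -window 2 -min-count 1 \
        -cbow 0 -hs 1 -negative 0 -iter 5 -threads 1 -binary 0 >/dev/null; then
        size=$(head -n 1 "$workdir/vectors.txt" | cut -d ' ' -f 2)
        [ "$size" = 10 ] && status=0
    fi
    rm -rf "$workdir"
    return "$status"
}
```

With word2vec on PATH, as set up in the environment-variables step, `w2v_smoke_test && echo ok` should print ok.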

References:

https://code.google.com/archive/p/word2vec/
