Building TensorFlow Transform

aborkar-ibm edited this page Dec 21, 2021 · 23 revisions

ATTENTION: This package uses Log4j. Please see details here for updates on security vulnerabilities.

The instructions provided below specify the steps to build TensorFlow Transform version 1.0.0 on Linux on IBM Z for the following distributions:

  • Ubuntu (18.04, 20.04, 21.04)

General Notes:

  • When following the steps below, please use a user with standard permissions unless otherwise specified.
  • These instructions refer to a directory /<source_root>/. This is a temporary writable directory that you can place anywhere you like.
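
One way to prepare the /<source_root>/ directory described above is sketched below. The mktemp-based default is an assumption made for illustration and is not part of the original instructions; any writable directory works.

```shell
# Reuse SOURCE_ROOT if it is already exported; otherwise create a fresh
# temporary directory (an illustrative default, not a requirement).
export SOURCE_ROOT="${SOURCE_ROOT:-$(mktemp -d)}"
mkdir -p "$SOURCE_ROOT"

# Warn if the current user cannot write to the directory.
if [ ! -w "$SOURCE_ROOT" ]; then
    echo "SOURCE_ROOT is not writable: $SOURCE_ROOT" >&2
fi
echo "Using SOURCE_ROOT=$SOURCE_ROOT"
```

Directories under /tmp are often cleared on reboot, so a path under the home directory may be a better choice for a build that spans several sessions.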

Step 1: Build and Install TensorFlow Transform v1.0.0

1.1) Build using script

If you want to build TensorFlow Transform using manual steps, go to STEP 1.2.

Use the following commands to build TensorFlow Transform using the build script. Please make sure you have wget installed.

wget -q https://raw.githubusercontent.com/linux-on-ibm-z/scripts/master/TensorflowTransform/1.0.0/build_tensorflow_transform.sh

# Build TensorFlow Transform (add the -t option to run the build with tests)
bash build_tensorflow_transform.sh

If the build completes successfully, go to STEP 2. In case of error, check logs for more details or go to STEP 1.2 to follow manual build steps.

1.2) Build and Install TensorFlow 2.5.0

  • Instructions for building TensorFlow 2.5.0 can be found here.

    Note: numpy version 1.19.5 is needed to build TensorFlow v2.5.0. If you are following the Build Instructions, please run sudo pip3 install numpy==1.19.5 in step 1.2 Install the dependencies.

1.3) Install the dependencies

export SOURCE_ROOT=/<source_root>/
  • Ubuntu (18.04, 20.04, 21.04)
sudo apt-get update
sudo apt-get install -y libboost-dev libboost-filesystem-dev libboost-system-dev libboost-regex-dev automake autoconf libtool curl

1.4) Build and Install Apache Arrow 2.0.0

  • Build CMake 3.20.4
cd $SOURCE_ROOT
wget https://github.com/Kitware/CMake/releases/download/v3.20.4/cmake-3.20.4.tar.gz
tar -xzf cmake-3.20.4.tar.gz
cd cmake-3.20.4
./bootstrap --prefix=/usr
make
sudo make install
  • Download source code
 cd $SOURCE_ROOT
 git clone https://github.com/apache/arrow.git
 cd arrow
 git checkout apache-arrow-2.0.0
  • Build and install Arrow C++ library
 cd $SOURCE_ROOT/arrow/cpp
 mkdir release
 cd release
 export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
 cmake -DCMAKE_INSTALL_PREFIX=/usr/local \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DARROW_PARQUET=ON \
    -DARROW_PYTHON=ON \
    -DCMAKE_BUILD_TYPE=Release \
    ..
 make -j4
 sudo make install
  • Install/update Python package versions
 sudo apt remove --purge cython3 -y
 sudo pip3 install 'Cython>=0.29'
  • Build and install pyarrow library
 cd $SOURCE_ROOT/arrow/python
 export ARROW_BUILD_TYPE='release' && export PYARROW_WITH_PARQUET=1
 SETUPTOOLS_SCM_PRETEND_VERSION=2.0.0 python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --bundle-arrow-cpp bdist_wheel
 sudo pip3 install dist/*.whl

1.5) Build and Install Protobuf 3.14.0

  • Build and install Protobuf
 sudo pip3 install pip --upgrade
 sudo pip3 uninstall protobuf -y
 cd $SOURCE_ROOT
 git clone https://github.com/protocolbuffers/protobuf.git
 cd protobuf/
 git checkout v3.14.0
 git submodule update --init --recursive
 ./autogen.sh
 CXXFLAGS="-fPIC -g -O2" ./configure --prefix=/usr
 make && sudo make install
 sudo ldconfig
 cd python/
 sudo python3 setup.py bdist_wheel --cpp_implementation --compile_static_extension
 sudo pip3 install dist/*.whl
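
Because the wheel above is built with --cpp_implementation, it can be worth confirming which protobuf backend Python actually loads. A minimal sketch, assuming python3 is on the PATH; the protobuf_impl helper name is hypothetical, while api_implementation.Type() is part of the protobuf Python package.

```shell
# Hypothetical helper: print the protobuf backend that Python loads
# (for example "cpp" or "python"). Returns non-zero if the protobuf
# package cannot be imported.
protobuf_impl() {
    python3 -c 'from google.protobuf.internal import api_implementation as impl; print(impl.Type())' 2>/dev/null
}

# Fall back to a placeholder when the import fails.
impl="$(protobuf_impl)" || impl="unavailable"
echo "protobuf implementation: $impl"
```

Run the check from outside the protobuf source tree (for example from $SOURCE_ROOT), since Python may otherwise import the local google/ directory instead of the installed wheel.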

1.6) Build and Install Apache Beam 2.29.0

  • Build and install Apache Beam
 cd $SOURCE_ROOT
 git clone https://github.com/apache/beam.git
 cd beam
 git checkout v2.29.0
 cd sdks/python/
 sudo pip3 install -r build-requirements.txt
 sudo pip3 install -e .

1.7) Build and Install tfx-bsl 1.0.0

  • Download source code
 cd $SOURCE_ROOT
 git clone https://github.com/tensorflow/tfx-bsl.git
 cd tfx-bsl
 git checkout v1.0.0
  • Build and install
 export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-s390x
 export PATH=$JAVA_HOME/bin:$PATH:$SOURCE_ROOT/bazel-s390x/output
 python3 setup.py bdist_wheel
 sudo pip3 install dist/*.whl

1.8) Install TensorFlow Transform from binary

  sudo pip3 install tensorflow-transform==1.0.0

1.9) Install TensorFlow Transform from source (optional)

It is also possible to build and install TensorFlow Transform manually. This step is required if you intend to run the test cases as in Step 3.

  • Download source code
 cd $SOURCE_ROOT
 git clone https://github.com/tensorflow/transform.git
 cd transform
 git checkout v1.0.0
  • Apply the following patch
sed -i '217d' tensorflow_transform/coders/example_proto_coder_test.py
  • Build and install
 sudo python3 setup.py install

Note: If a specific version of any other Python package is required during installation, run sudo pip3 install '<package-name>==<version>' to install it.

Step 2: Verify TensorFlow Transform (Optional)

  • Run TensorFlow Transform from the command line

     $ cd $SOURCE_ROOT
     $ /usr/bin/python3
      >>> import tensorflow as tf
      >>> import tensorflow_transform as tft
      >>> tft.version.__version__
      '1.0.0'
      >>>
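
The interactive check above confirms only the TensorFlow Transform version. The sketch below also compares the other versions named on this page against what pip reports. The check_pkg helper is hypothetical, and the distribution names are assumed to match how each component was installed.

```shell
# Hypothetical helper: compare the installed version of a pip package
# with the version this page expects.
check_pkg() {
    pkg="$1"; expected="$2"
    # "pip3 show" prints a "Version: x.y.z" line; the result is empty if
    # pip3 or the package is missing.
    actual="$(pip3 show "$pkg" 2>/dev/null | awk '/^Version:/ {print $2}')"
    if [ "$actual" = "$expected" ]; then
        echo "OK        $pkg $actual"
    else
        echo "MISMATCH  $pkg: found '${actual:-none}', expected $expected"
        return 1
    fi
}

# Versions named in the steps above, written as name=version pairs.
for spec in tensorflow=2.5.0 numpy=1.19.5 pyarrow=2.0.0 protobuf=3.14.0 \
            apache-beam=2.29.0 tfx-bsl=1.0.0 tensorflow-transform=1.0.0; do
    check_pkg "${spec%%=*}" "${spec##*=}" || true
done
```

A mismatch is not necessarily an error, but it is a useful first thing to look at if an import or a test fails unexpectedly.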

Step 3: Execute Test Suite (Optional)

  • Run the complete test suite

    python3 -m unittest discover -v -p '*_test.py'
  • The following four test cases fail on s390x because of a floating-point tolerance issue:

  1. testTukeyHHAnalyzersWithDenseInputstukey_float32in_reduce(tensorflow_transform.beam.tukey_hh_params_integration_test.TukeyHHParamsIntegrationTest)
  2. testTukeyHHAnalyzersWithDenseInputstukey_float32in_reduce(tensorflow_transform.beam.tukey_hh_params_integration_v2_test.TukeyHHParamsIntegrationV2Test)
  3. testTukeyHHAnalyzersWithSparseInputs_float32in_reduce(tensorflow_transform.beam.tukey_hh_params_integration_test.TukeyHHParamsIntegrationTest)
  4. testTukeyHHAnalyzersWithSparseInputs_float32in_reduce(tensorflow_transform.beam.tukey_hh_params_integration_v2_test.TukeyHHParamsIntegrationV2Test)
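
To compare an actual run against the four expected failures listed above, the failing entries can be pulled out of a saved test log. A minimal sketch; the list_failures helper and the test.log file name are hypothetical, and the pattern relies on unittest starting each summary entry with "FAIL: " or "ERROR: ".

```shell
# Hypothetical helper: list the failing or erroring tests recorded in a
# unittest log, one entry per line, without duplicates.
list_failures() {
    grep -E '^(FAIL|ERROR): ' "$1" | sort -u
}

# Example usage (unittest writes its report to stderr, hence 2>&1):
#   python3 -m unittest discover -v -p '*_test.py' > test.log 2>&1
#   list_failures test.log
```
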
  • Tolerance can be adjusted by applying the following patch:
diff --git a/tensorflow_transform/test_case.py b/tensorflow_transform/test_case.py
index 936481a..a2afddb 100644
--- a/tensorflow_transform/test_case.py
+++ b/tensorflow_transform/test_case.py
@@ -291,7 +291,7 @@ class TransformTestCase(parameterized.TestCase, tf.test.TestCase):
         isinstance(a_value, np.ndarray) and a_value.dtype == np.object):
       self.assertAllEqual(a_value, b_value)
     else:
-      self.assertAllClose(a_value, b_value)
+      self.assertAllClose(a_value, b_value, rtol=7.8e-06, atol=7.8e-06)

   def AssertVocabularyContents(self, vocab_file_path, file_contents):
     if vocab_file_path.endswith('.tfrecord.gz'):
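
To see why widening the tolerance lets these comparisons pass, the closeness rule can be worked through by hand. The sketch below assumes that assertAllClose follows the NumPy-style rule |a - b| <= atol + rtol * |b| and that its defaults are rtol = atol = 1e-06. The is_close helper and the sample values are hypothetical and are not taken from the failing tests.

```shell
# Hypothetical helper mirroring the NumPy-style closeness rule:
#   |a - b| <= atol + rtol * |b|
# Arguments: a b rtol atol. Exit status 0 means "close enough".
is_close() {
    awk -v a="$1" -v b="$2" -v rtol="$3" -v atol="$4" 'BEGIN {
        diff = a - b; if (diff < 0) diff = -diff
        mag = b;      if (mag < 0)  mag = -mag
        exit !(diff <= atol + rtol * mag)
    }'
}

# Illustrative values: a difference of about 5e-06 on a value near 1.0.
if is_close 1.000005 1.0 1e-06 1e-06; then
    echo "default tolerance: pass"
else
    echo "default tolerance: fail"
fi
if is_close 1.000005 1.0 7.8e-06 7.8e-06; then
    echo "patched tolerance: pass"
else
    echo "patched tolerance: fail"
fi
```

With these sample values the allowed difference is 2e-06 under the assumed defaults and 1.56e-05 with the patched values, so the first comparison fails and the second passes.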
