Building TensorFlow Transform

aborkar-ibm edited this page Jul 31, 2023 · 23 revisions

The instructions provided below specify the steps to build TensorFlow Transform version 1.12.0 on Linux on IBM Z for the following distributions:

  • Ubuntu (20.04, 22.04)

General Notes:

  • When following the steps below, use a user account with standard permissions unless otherwise specified.
  • These instructions refer to a directory /<source_root>/. This is a temporary writable directory that you can place anywhere you like.

Step 1: Build and Install TensorFlow Transform v1.12.0

1.1) Build using script

If you want to build TensorFlow Transform using manual steps, go to STEP 1.2.

Use the following commands to build TensorFlow Transform using the build script. Please make sure you have wget installed.

wget -q https://raw.githubusercontent.com/linux-on-ibm-z/scripts/master/TensorflowTransform/1.12.0/build_tensorflow_transform.sh

# Build TensorFlow Transform
# Options: -t executes the build with tests; -p chooses the Python version from {3.7, 3.8, 3.9, 3.10}.
# If -p is not specified, the script uses the distro-provided Python version
# (i.e., Python 3.8 on Ubuntu 20.04 and Python 3.10 on Ubuntu 22.04).
bash build_tensorflow_transform.sh

If the build completes successfully, go to STEP 2. In case of error, check logs for more details or go to STEP 1.2 to follow manual build steps.
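When the build script is driven from automation, the options described above can be assembled programmatically. The following is a minimal sketch and not part of the build script: the helper name `build_command` is hypothetical, and it assumes only the `-t` and `-p` options named above.

```python
# Versions the build script accepts for -p, as listed above.
SUPPORTED_PYTHON_VERSIONS = ("3.7", "3.8", "3.9", "3.10")


def build_command(run_tests=False, python_version=None):
    """Return the argv list for build_tensorflow_transform.sh (hypothetical helper)."""
    cmd = ["bash", "build_tensorflow_transform.sh"]
    if run_tests:
        cmd.append("-t")  # execute the build with tests
    if python_version is not None:
        if python_version not in SUPPORTED_PYTHON_VERSIONS:
            raise ValueError(f"unsupported Python version: {python_version}")
        cmd += ["-p", python_version]  # without -p the distro Python is used
    return cmd


print(" ".join(build_command(run_tests=True, python_version="3.9")))
# prints: bash build_tensorflow_transform.sh -t -p 3.9
```

The returned list can be passed to `subprocess.run` as is. Rejecting unknown versions before the script starts avoids finding out about a typo partway through a long build.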

1.2) Install the dependencies

export SOURCE_ROOT=/<source_root>/
  • Ubuntu 20.04

    sudo apt-get update
    sudo apt-get install -y build-essential cargo curl
  • Ubuntu 22.04

    sudo apt-get update
    sudo apt-get install -y build-essential cargo curl cmake
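Before continuing, it can help to confirm that the installed tools are actually on the PATH. The sketch below is one way to do that. The list of command names is an assumption about what the packages above provide, and on Ubuntu 20.04 `cmake` is expected to be missing at this point because it is built from source in a later step.

```python
import shutil

# Commands expected after installing build-essential, cargo, curl and cmake.
# The mapping from package names to command names is an assumption of this sketch.
REQUIRED_TOOLS = ("gcc", "g++", "make", "cargo", "curl", "cmake")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found.")
```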

1.3) Build and Install TensorFlow 2.11.0

  • The instructions for building TensorFlow 2.11.0 can be found here.

  • For Ubuntu 20.04, use the following commands to build TensorFlow 2.11.0 with the distribution-provided Python version (3.8 at the time of writing):

    cd $SOURCE_ROOT
    wget -O build_tensorflow.sh https://raw.githubusercontent.com/linux-on-ibm-z/scripts/master/Tensorflow/2.11.0/build_tensorflow.sh
    bash build_tensorflow.sh -y
  • For Ubuntu 22.04, use the following commands to build TensorFlow 2.11.0 with Python 3.9, which is the highest Python version supported by TensorFlow Transform:

    cd $SOURCE_ROOT
    wget -O build_tensorflow.sh https://raw.githubusercontent.com/linux-on-ibm-z/scripts/master/Tensorflow/2.11.0/build_tensorflow.sh
    bash build_tensorflow.sh -y -p 3.9

1.4) Build and Install Apache Arrow 6.0.1

  • Build CMake 3.21.2 (only on Ubuntu 20.04)

    cd $SOURCE_ROOT
    wget https://github.com/Kitware/CMake/releases/download/v3.21.2/cmake-3.21.2.tar.gz
    tar -xzf cmake-3.21.2.tar.gz
    cd cmake-3.21.2
    ./bootstrap --prefix=/usr
    make
    sudo make install
  • Download source code

    cd $SOURCE_ROOT
    git clone https://github.com/apache/arrow.git
    cd arrow
    git checkout apache-arrow-6.0.1
  • Build and install Arrow C++ library

    cd $SOURCE_ROOT/arrow/cpp
    mkdir release
    cd release
    cmake -DCMAKE_INSTALL_PREFIX=/usr/local \
       -DARROW_PARQUET=ON \
       -DARROW_PYTHON=ON \
       -DCMAKE_BUILD_TYPE=Release \
       ..
    make -j4
    sudo make install
    export LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
  • Build and install pyarrow library

    cd $SOURCE_ROOT/arrow/python
    sed -i "s/cython >= 0.29/cython >= 0.29, < 3/g" pyproject.toml
    sed -i "s/cython>=0.29/cython>=0.29,<3/g" requirements-build.txt
    sudo pip3 install -r requirements-build.txt
    sudo python3 setup.py bdist_wheel
    sudo pip3 install dist/*.whl
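The `export LD_LIBRARY_PATH=...` line in the Arrow C++ step uses the shell expansion `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}`, which appends a colon and the old value only when the variable is already set and non-empty. The following sketch mirrors that logic in Python to make the behavior explicit. The function name `prepend_path` is hypothetical.

```python
import os


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: append ':' + current only if current is non-empty."""
    return f"{new_dir}:{current}" if current else new_dir


print(prepend_path("/usr/local/lib", ""))          # /usr/local/lib
print(prepend_path("/usr/local/lib", "/opt/lib"))  # /usr/local/lib:/opt/lib

# Value the export above would produce in the current environment.
print(prepend_path("/usr/local/lib", os.environ.get("LD_LIBRARY_PATH")))
```

The point of the idiom is that an unset or empty variable yields `/usr/local/lib` with no trailing colon.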

1.5) Build and Install Protobuf 3.19.4

  • Build and install Protobuf

    sudo pip3 uninstall protobuf -y
    cd $SOURCE_ROOT
    git clone https://github.com/protocolbuffers/protobuf.git
    cd protobuf/
    git checkout v3.19.4
    git submodule update --init --recursive
    ./autogen.sh
    CXXFLAGS="-fPIC -g -O2" ./configure --prefix=/usr
    make && sudo make install
    sudo ldconfig
    cd python/
    sudo python3 setup.py bdist_wheel --cpp_implementation --compile_static_extension
    sudo pip3 install dist/*.whl

1.6) Build and Install Apache Beam 2.41.0

  • Build and install Apache Beam

    cd $SOURCE_ROOT
    sudo pip3 install maturin                    # Only on Ubuntu 22.04
    sudo GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True pip3 install testresources protobuf==3.19.4 'apache-beam[gcp]'==2.41.0

1.7) Build and Install tfx-bsl 1.12.0

  • Download source code

    cd $SOURCE_ROOT
    git clone https://github.com/tensorflow/tfx-bsl.git
    cd tfx-bsl
    git checkout v1.12.0
  • Build and install tfx-bsl

    cd $SOURCE_ROOT
    curl -o bazel-5.3.0-update.patch https://raw.githubusercontent.com/linux-on-ibm-z/scripts/master/TensorflowTransform/1.12.0/patch/bazel-5.3.0-update.patch
    curl -o tfx-bsl.diff https://raw.githubusercontent.com/linux-on-ibm-z/scripts/master/TensorflowTransform/1.12.0/patch/tfx-bsl.diff
    cd tfx-bsl
    patch -p1 < ../bazel-5.3.0-update.patch
    patch -p1 < ../tfx-bsl.diff
    
    sudo touch /usr/local/include/immintrin.h
    
    python3 setup.py bdist_wheel
    sudo pip3 install dist/*.whl

1.8) Install TensorFlow Transform from binary

cd $SOURCE_ROOT
sudo pip3 install tensorflow-transform==1.12.0

1.9) Install TensorFlow Transform from source (optional)

It is also possible to build and install TensorFlow Transform manually. This step is required if you intend to run the test cases as in Step 3.

  • Download source code

    cd $SOURCE_ROOT
    git clone https://github.com/tensorflow/transform.git
    cd transform
    git checkout v1.12.0
  • Apply the following patch to fix the environment setup issue mentioned here.

    sed -i '336,339d' tensorflow_transform/coders/example_proto_coder_test.py
  • Build and install

    sudo python3 setup.py install

Note: If a different version of a Python package is required during installation, run sudo pip3 install '<package-name>==<version>' to install it.
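Because several components are pinned to specific versions in the steps above, a quick comparison of installed versions against those pins can catch a mismatch early. The sketch below is a minimal check. The distribution names are assumptions and may differ for locally built wheels, a local build may report a slightly different version string, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata  # available in Python 3.8+

# Versions named in the steps above; distribution names are assumptions.
EXPECTED_VERSIONS = {
    "tensorflow": "2.11.0",
    "pyarrow": "6.0.1",
    "protobuf": "3.19.4",
    "apache-beam": "2.41.0",
    "tfx-bsl": "1.12.0",
    "tensorflow-transform": "1.12.0",
}


def check_versions(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in check_versions().items():
    print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the expected version.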

Step 2: Verify TensorFlow Transform (Optional)

  • Run TensorFlow Transform from the command line

    $ cd $SOURCE_ROOT
    $ python3
     >>> import tensorflow as tf
     >>> import tensorflow_transform as tft
     >>> tft.version.__version__
     '1.12.0'
     >>>
  • Follow instructions in this tutorial to use TensorFlow Transform to preprocess data.
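As a slightly deeper check than printing the version, the sketch below runs a tiny preprocessing pipeline end to end. It is a minimal example under stated assumptions and does not come from the tutorial: the feature name `x`, the sample values, and the choice of `tft.scale_to_0_1` are illustrative. Imports are deferred so that the file still loads on a machine where TensorFlow Transform is not yet installed.

```python
import importlib.util
import tempfile


def preprocessing_fn(inputs):
    """Scale the numeric feature 'x' to the range [0, 1]."""
    import tensorflow_transform as tft  # deferred import, see note above
    return {"x_scaled": tft.scale_to_0_1(inputs["x"])}


def run_example():
    """Analyze and transform a three-row in-memory dataset."""
    import tensorflow as tf
    import tensorflow_transform.beam as tft_beam
    from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

    raw_data = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]  # illustrative data
    raw_metadata = dataset_metadata.DatasetMetadata(
        schema_utils.schema_from_feature_spec(
            {"x": tf.io.FixedLenFeature([], tf.float32)}))

    # The Beam context needs a scratch directory for intermediate results.
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        transformed_dataset, _transform_fn = (
            (raw_data, raw_metadata)
            | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
    transformed_data, _transformed_metadata = transformed_dataset
    return transformed_data


if importlib.util.find_spec("tensorflow_transform") is not None:
    for row in run_example():
        print(row)
else:
    print("tensorflow_transform is not installed; skipping the example.")
```

If the pipeline completes and prints one row per input, TensorFlow, Apache Beam, tfx-bsl and TensorFlow Transform are all working together.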

Step 3: Execute Test Suite (Optional)

  • Run the complete test suite

    cd $SOURCE_ROOT/transform
    python3 -m unittest discover -v -p '*_test.py'
  • Run a single test case (for example BeamImplTest.testHandleBatchError)

    cd $SOURCE_ROOT/transform
    python3 -m unittest -v tensorflow_transform/beam/impl_test.py -k BeamImplTest.testHandleBatchError

Note: Test case BeamImplTest.testNumericAnalyzersWithCompositeInputssparse_elementwise_tf.float64 fails intermittently on both s390x and Intel, but passes when rerun individually.
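Since the intermittent failure noted above clears on a rerun, a small retry wrapper can save manual effort. The sketch below is one possible approach. The helper name `run_with_retries` and the default of three attempts are assumptions, and the `-k` pattern is a guess at a substring that selects the affected test, so adjust it if it matches too much or too little.

```python
import subprocess
import sys


def run_with_retries(cmd, attempts=3, cwd=None):
    """Run `cmd` until it exits with status 0, at most `attempts` times.

    Returns the 1-based number of the attempt that succeeded, or None if
    every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if subprocess.run(cmd, cwd=cwd).returncode == 0:
            return attempt
    return None


# Same shape as the single-test command above; the -k pattern is an assumption.
FLAKY_TEST_CMD = [
    sys.executable, "-m", "unittest", "-v",
    "tensorflow_transform/beam/impl_test.py",
    "-k", "BeamImplTest.testNumericAnalyzersWithCompositeInputs",
]

# Demonstration with a command that always succeeds.
print(run_with_retries([sys.executable, "-c", "pass"]))  # prints: 1
```

To rerun the flaky test, call `run_with_retries(FLAKY_TEST_CMD, cwd=...)` with `cwd` set to the `transform` checkout under your source root.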
