Commit 6e2e7d4

zheng-xq authored and lly-zero-one committed

A skeleton implementation for the expression, IR, and visitor dispatchers. (pytorch#33)
To run the test: `cmake . && make cpptest && ./expr_test`

* Refactor the RefHandle class. (pytorch#34)
* Add convenience operator for Expr.
* clang-format change (pytorch#35)
* Adding Var, Let and eval_context support. (pytorch#36)
* Add LLVM JIT class for online codegen
* Refactor llvm codegen
* fix caps of LlvmJit
* Generate code for integer arithmetic
* Test all arithmetic ops with LLVM
* Fix rtti
* Compat with llvm 7 and 8
* Add support for tensor expressions. (pytorch#38)
* Add Casting support so mixed dtypes are supported.
* Add basic dtype and logging support. This should be merged with PyTorch during integration.
* clang-format fix (pytorch#39)
* Extend dtypes to support vector types (pytorch#40)
* Support LLVM 9 too
* Disambiguate dependent type name with template keyword
* Remove empty scalar.h
* Add basic support for statements. (pytorch#41) Add support for For, Ramp, Block, Load, Store and Broadcast. Add support for Buffer.
* Adding Stmt evaluation support. (pytorch#42)
* Use third_party/googletest from pytorch
* Remove nnc/tests/googletest submodule
* Move nnc tld to torch/csrc/jit/compiler
* Add a README (probably temporary) for jit/compiler
* Move from namespace nnc to torch::jit::compiler
* Refactor JIT class to isolate no-rtti pieces
* Adding comparison operator to Var. (pytorch#43)
* Fix typo in README.md
* Use absolute imports and pragma once
* Use absolute includes in new llvm_jit.h
* Build non-LLVM compiler stuff with libtorch
* Minimal asmjit codegen from the tensor IR
* fix pessimizing moves
* IR printer
* fix printer bug
* Add printer to build system.
* Add data structure for schedule support and Split.
* clang-format using the new template
* Add IRMutator and basic support to substitute Var in Expr and Stmts.
* Change the default count of RefCounted as zero.
* Merge Expr(node) and Expr::make(node).
* Add basic lowering to the tensor expression trees.
* fix the schedule_test
* fixed lowering
* LLVM code generation for simple loops
* bugfixes
* refcount fixing self-assignment
* Make LOG(FATAL) non-returning
* Enable Werror
* Adding statement conversion for SplitWithTail
* Add reference tests for Split
* clang-format
* A functional reference check for schedule tests.
* clang-format
* Add support for Float immediates.
* Get absolute path for ASMJIT_DIR (pytorch#24)
* Silence deprecation warnings from LLVM
* Include legacy PassManager for debug printing
* Set code model to medium to avoid indirect jumps in generated asm
* Fix argument type of input float buffers
* Add support for Casts in LLVM codegen.
* Add a complete tensor+lower+llvm test
* Enable the failing test
* Enable export of compile_commands.json.
* Floating point arithmetic
* Test fp32 mul using compute expr
* Broadcast add test using compute expr
* Update to LLVM 9
* Implementation of Broadcast for LLVM.
* Add Buffer operator() overload, and some other minor features
* Cleanup use of ConstantInt API.
* fix accidental experimental changes
* Change the Compute interface to bring the dim sizes and names together
* clang-format
* refactor Buffer into its own files
* Add support for vector casts in LLVM CodeGen
* Implement masked loads and stores.
* Implement vector masked loads and stores.
* Add a PaddedBuffer test util
* Improve the user interface for SimpleIREvaluator
* Add a test for Block codegen.
* Fix gtest include path
* clang-format
* Add expressions and support for Max and Min. (pytorch#5)
* Rename compiler to tensorexpr and move files around to be more similar to other pytorch parts. (pytorch#6)
  Summary:
  1. Move compiler to tensorexpr folder
  2. Move files from src and include to the same folder (and remove src and include folders)
  3. Rename .cc to .cpp
* Add missing include <math.h> (pytorch#7)
* Change isnan to std::isnan. It breaks my clang builds. (pytorch#8)
* Change the SimpleIREvaluator frontend (pytorch#9)
* Add RefHandle for subclass
* Make LLVM dependency optional. (pytorch#10)
* [wip] Basic fuser pass to select texpr subgraphs
* Revert "[wip] Basic fuser pass to select texpr subgraphs". This reverts commit a9d9919.
* Revert changes to the main pytorch CMakeLists.txt (for now).
* Add a test for aten::_cast_Float lowering. (pytorch#12)
* Hook tensorexpr up to the main build, and switch to c10 logging
* More ATen op tests. (pytorch#16)
* Fix some missing returns
* Include tests back to the 'all' target. (pytorch#14)
* Even more ATen op tests. (pytorch#18)
* Test for relu ATen op. (pytorch#19)
* Add intrinsics function support. (pytorch#20)
* Remove fmax/fmin, as they are already covered by the Max/Min operators (pytorch#21)
* refactor CallNode and BaseCallNode, so we can have a common concrete base class for visitors. (pytorch#22) This is the first step to add other call types.
* Add FunctionCall to use existing tensors (pytorch#23)
* Add the ability to use an existing tensor expression in other compute functions. (pytorch#24)
* fixing broken compilation on mac/clang
* adding IRnode for Compare-Select Ops and their LLVM Codegen
* Fix Werror. (pytorch#26)
* Add tests for some transcendental ops. (pytorch#27)
* Add Allocate and Free support. (pytorch#29) Add Eval and test basic alloc support. Add Lowering support for buffer allocation for intermediate tensors.
* Tensor expr fuser pass for extremely simple expressions
* Make fusion work for arbitrary buffer/tensor combinations of inputs (pytorch#30)
* fix Let02 test
* Access inputs and intermediates uniformly through Tensors (pytorch#31)
* fix Let02 test (pytorch#32)
* adding LLVM Codegen for Let
* modifying CMakeLists.txt to enable ninja test && minor update for LLVM Codegen for Let (handling XQ's comment)
* Adding ComputeInline support. (pytorch#35)
* Fix broken tests (pytorch#36)
* Make tx fuser work with arbitrary ranks
* [fuser] Broadcast args
* Improve naming of arg broadcasting function
* Test cases for tensorexpr fusion (pytorch#37)
* CompareSelect Op: Addressing XQ and Owen's comments
* modifying CMakeLists.txt to enable ninja test && minor update for LLVM Codegen for Let (handling XQ's comment)
* CompareSelect Op: Addressing XQ and Owen's comments
* Sketch sufficient support for constants to get constant alpha working. (pytorch#40)
  * Refactor to use a switch statement over Node kinds.
  * Sketch sufficient support for constants to get constant alpha working.
* Fix indices when inlining non-leaf calls (pytorch#39)
* Fixing the inline ordering issue (pytorch#43)
* Solve more problems with the inliner
* Avoid creating redundant and/or improperly ordered Constants in fused subgraphs. (pytorch#42)
* Move fuser-styled tests to schedule_test (pytorch#44)
* Add aten::sub to the new fuser. (pytorch#46)
* Refactor CodeGen from SimpleIREval (pytorch#47)
* Inline all the things (pytorch#45)
* clang-format for aten_test.cpp
* Eliminate a ton of warnings for my own sanity. (pytorch#48)
* Add support for type promotion/demotion. (pytorch#50)
* Flesh out new fuser coverage to several more ops. (pytorch#51)
* Adding the first basic CudaCodeGen. (pytorch#52)
* aten tests for eq, ge, gt, le, lt
* support for aten ops: eq
* support for more aten ops: ge, gt, le, lt, ne
* Minimal CMake change to link LLVM to libtorch
* Fix issues causing assertion failures in llvm debug builds
* Fatal on unimplemented llvm codegen ops (Allocate, etc.)
* Optionally compile tx fuser kernels with llvm
* Test for 2D broadcasted with large dims to show vectorization
* Updated isSupported for increased op coverage. (pytorch#54)
* Refactor LLVMCodeGen to compile kernel in constructor
* Cmake integration to PT codebase (pytorch#28). With this change our code blends with the usual PyTorch code and is built the usual way. I added a cmake option to specify where to look for LLVM; if it's not specified, LLVM is not used. An example of invocation (from the root of the pytorch repo):
  ```
  USE_LLVM=/path/to/llvm9/install python setup.py develop
  ```
  This command will build libtorch.{a,so} and other libraries, and the tensorexpr code will be a part of it. The tests will be built in build/bin/test_tensorexpr (I've ported only one test so far). So, invocation of the tests will be:
  ```
  build/bin/test_tensorexpr
  ```
* Remove old padded_buffer.{cpp,h}. (pytorch#56)
* Add support for code generation of Log10 intrinsics with LLVM. (pytorch#57)
* Remove tests/test_utils.h: inline what's still used and nuke what's unused. (pytorch#58)
* Move Fuser tests (tests/tests.py) to test/test_tensorexpr.py. (pytorch#59)
* Remove old CMakeLists and README.txt
* Add support for vectorized and unmasked loads and stores with LLVM. (pytorch#62)
* Enable CodeGen-level optimizations in LLVM. (pytorch#63)
* Add Bind/GPUBlock/GPUThread support. (pytorch#64)
* Bind/run interface to CodeGen (pytorch#60)
  * Bind/run interface to CodeGen
  * Make LLVMCodeGen implement CodeGen interface
  * Allow bind/run to be unimplemented for the moment (CUDA)
  * Cache compilation result
  * Two nasty bugs: forgot virtual dtor, forgot to clear bindings after run()
* Fix ambiguity in CreateExtractElementCall (0ull can be a Value*, I guess?) (pytorch#65)
* Allow constants as lhs/rhs args (not just alpha) (pytorch#66)
* Use correct tensor type for fuser output (pytorch#67)
* clang-format
* Rename 'compiler' namespace to 'tensorexpr'.
* Include all built llvm targets (pytorch#68)
* Switch back to linking only the native LLVM target. (pytorch#69)
* Virtual dtors for IRVisitor/IRMutator (pytorch#70)
* Add semicolon to make nvcc compile (pytorch#71)
* Enable NVRTC for the GPU backend. (pytorch#74)
* Fix non-CUDA testing. (pytorch#75)
* Getting fused (a)Sin(h), (a)Cos(h), (a)Tan(h), abs working with the interpreter (pytorch#73)
  * Getting fused (a)Sin(h), (a)Cos(h), (a)Tan(h), abs working with the interpreter
  * take the interpreter path only when ENABLE_LLVM is not set
* remove the leak tests, as we will get rid of refcounting (pytorch#76)
* Implement aten::min, max, and clamp (pytorch#72)
  * Implement aten::min, max, and clamp
  * Propagate NaNs like std::max/min
  * Change NaN propagation in interpreter too
* clang-format tensorexpr/tests.h (pytorch#77)
* Refactor UniqueNameManager into its own files. (pytorch#79)
* refactor cuda_codegen (pytorch#80)
* simplify nvrtc major, minor versions (pytorch#81)
* Allow CodeGen to take Var args (interpreter support only) (pytorch#78)
  * Test demonstrating dynamic shape
  * Allow binding of Vars to args in interpreter
  * Pass BufferArgs to LLVMCodeGen
  * clang-format-diff
* [LLVMCodeGen] Refactor kernel constructor to be less sprawling (pytorch#82)
  * Member TM to TM_ in LLVMCodeGen
  * [LLVMCodeGen] Add helper for getContext
  * [LLVMCodeGen] Refactor type support
  * [LLVMCodeGen] Refactor kernel emission
1 parent 61a2b34 commit 6e2e7d4


56 files changed: +11599 −1 lines

caffe2/CMakeLists.txt

Lines changed: 41 additions & 1 deletion
```diff
@@ -418,6 +418,7 @@ if (NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE)
     ${TORCH_SRC_DIR}/csrc/jit/passes/requires_grad_analysis.cpp
     ${TORCH_SRC_DIR}/csrc/jit/passes/specialize_autogradzero.cpp
     ${TORCH_SRC_DIR}/csrc/jit/passes/subgraph_rewrite.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/passes/tensorexpr_fuser.cpp
     ${TORCH_SRC_DIR}/csrc/jit/passes/python_print.cpp
     ${TORCH_SRC_DIR}/csrc/jit/passes/utils/subgraph_utils.cpp
     ${TORCH_SRC_DIR}/csrc/jit/passes/utils/check_alias_annotation.cpp
@@ -461,8 +462,38 @@ if (NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE)
     ${TORCH_SRC_DIR}/csrc/jit/fuser/fallback.cpp
     ${TORCH_SRC_DIR}/csrc/jit/function.cpp
     ${TORCH_SRC_DIR}/csrc/jit/vararg_functions.cpp
+
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/expr.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/function.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/ir.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/ir_visitor.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/asmjit_codegen.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/llvm_codegen.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/llvm_jit.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/types.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/ir_printer.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/ir_mutator.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/schedule.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/tensor.cpp
+    ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/unique_name_manager.cpp
   )

+  if (USE_LLVM)
+    message(STATUS "Looking for LLVM in ${USE_LLVM}")
+    find_package(LLVM QUIET PATHS ${USE_LLVM} NO_DEFAULT_PATH)
+
+    if (LLVM_FOUND)
+      message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
+      message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")
+
+      include_directories(${LLVM_INCLUDE_DIRS})
+      add_definitions(-DENABLE_LLVM ${LLVM_DEFINITIONS})
+    endif (LLVM_FOUND)
+  endif (USE_LLVM)
+
+  set_source_files_properties(${TORCH_SRC_DIR}/csrc/jit/tensorexpr/llvm_jit.cpp PROPERTIES COMPILE_FLAGS -fno-rtti)
+
+
   if (NOT INTERN_BUILD_MOBILE)
     set (MOBILE_SRCS
       ${TORCH_SRC_DIR}/csrc/jit/mobile/function.cpp
@@ -525,10 +556,11 @@ if (NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE)

   if (USE_CUDA)
     list(APPEND Caffe2_GPU_SRCS
-      ${TORCH_SRC_DIR}/csrc/jit/fuser/cuda/fused_kernel.cpp
       ${TORCH_SRC_DIR}/csrc/autograd/profiler_cuda.cpp
       ${TORCH_SRC_DIR}/csrc/autograd/functions/comm.cpp
       ${TORCH_SRC_DIR}/csrc/cuda/comm.cpp
+      ${TORCH_SRC_DIR}/csrc/jit/fuser/cuda/fused_kernel.cpp
+      ${TORCH_SRC_DIR}/csrc/jit/tensorexpr/cuda_codegen.cpp
     )
     add_library(caffe2_nvrtc SHARED ${ATen_NVRTC_STUB_SRCS})
     target_link_libraries(caffe2_nvrtc ${CUDA_NVRTC} ${CUDA_CUDA_LIB} ${CUDA_NVRTC_LIB})
@@ -626,6 +658,13 @@ endif()
 add_library(torch_cpu ${Caffe2_CPU_SRCS})
 torch_compile_options(torch_cpu) # see cmake/public/utils.cmake

+if (LLVM_FOUND)
+  llvm_map_components_to_libnames(LLVM_LINK_LIBS
+    support core analysis executionengine instcombine
+    scalaropts transformutils native orcjit)
+  target_link_libraries(torch_cpu PRIVATE ${LLVM_LINK_LIBS})
+endif (LLVM_FOUND)
+
 # This is required for older versions of CMake, which don't allow
 # specifying add_library() without a list of source files
 set(DUMMY_EMPTY_FILE ${CMAKE_BINARY_DIR}/empty.cpp)
@@ -759,6 +798,7 @@ ENDIF()

 if (BUILD_TEST AND NOT MSVC AND NOT USE_ROCM)
   add_subdirectory(${TORCH_ROOT}/test/cpp/jit ${CMAKE_BINARY_DIR}/test_jit)
+  add_subdirectory(${TORCH_ROOT}/test/cpp/tensorexpr ${CMAKE_BINARY_DIR}/test_tensorexpr)
   if (USE_DISTRIBUTED)
     add_subdirectory(${TORCH_ROOT}/test/cpp/rpc ${CMAKE_BINARY_DIR}/test_cpp_rpc)
   endif()
```
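Since `ENABLE_LLVM` is only defined when CMake actually finds LLVM, any LLVM-specific code paths have to be guarded on that define. A minimal sketch of such a guard (the header name and the fallback comment are illustrative assumptions, not part of this diff):

```cpp
// Sketch: guard LLVM-only code on the ENABLE_LLVM define added above.
// The header path is an assumption based on llvm_codegen.cpp in this diff.
#ifdef ENABLE_LLVM
#include <torch/csrc/jit/tensorexpr/llvm_codegen.h>
#endif

void codegenSketch() {
#ifdef ENABLE_LLVM
  // LLVM-backed code generation path.
#else
  // Interpreter (SimpleIREvaluator) fallback path.
#endif
}
```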

test/cpp/tensorexpr/CMakeLists.txt

Lines changed: 40 additions & 0 deletions
```cmake
set(TENSOREXPR_TEST_ROOT ${TORCH_ROOT}/test/cpp/tensorexpr)

file(GLOB TENSOREXPR_TEST_SRCS ${TENSOREXPR_TEST_ROOT}/test_*.cpp)
set(TENSOREXPR_TEST_SRCS ${TENSOREXPR_TEST_SRCS} PARENT_SCOPE)

add_executable(test_tensorexpr
  ${TORCH_ROOT}/test/cpp/common/main.cpp
  ${TENSOREXPR_TEST_ROOT}/gtest.cpp
  ${TENSOREXPR_TEST_ROOT}/padded_buffer.cpp
  ${TENSOREXPR_TEST_SRCS})

target_link_libraries(test_tensorexpr PRIVATE torch gtest asmjit)
target_include_directories(test_tensorexpr PRIVATE ${ATen_CPU_INCLUDE})

if (USE_CUDA)
  target_link_libraries(test_tensorexpr PRIVATE
    ${CUDA_LIBRARIES}
    ${CUDA_NVRTC_LIB}
    ${CUDA_CUDA_LIB}
    ${TORCH_CUDA_LIBRARIES})

  target_compile_definitions(test_tensorexpr PRIVATE USE_CUDA)
elseif (USE_ROCM)
  target_link_libraries(test_tensorexpr PRIVATE
    ${ROCM_HIPRTC_LIB}
    ${PYTORCH_HIP_HCC_LIBRARIES}
    ${TORCH_CUDA_LIBRARIES})

  target_link_libraries(test_tensorexpr PRIVATE caffe2_gpu)

  target_compile_definitions(test_tensorexpr PRIVATE USE_ROCM)
endif()

if (INSTALL_TEST)
  install(TARGETS test_tensorexpr DESTINATION bin)
  # Install PDB files for MSVC builds
  if (MSVC AND BUILD_SHARED_LIBS)
    install(FILES $<TARGET_PDB_FILE:test_tensorexpr> DESTINATION bin OPTIONAL)
  endif()
endif()
```

test/cpp/tensorexpr/README.md

Lines changed: 69 additions & 0 deletions
# JIT C++ Tests

## How to add a new test
First, create a new test file. Test files should be placed in this
directory, with a name that starts with `test_`, like `test_foo.cpp`.

Here is an example test file you can copy-paste.
```cpp
#include <test/cpp/jit/test_base.h>

// Tests go in torch::jit
namespace torch {
namespace jit {

// 1. Test cases are void() functions.
// 2. They start with the prefix `test`
void testCaseOne() {
  // ...
}

void testCaseTwo() {
  // ...
}
}
}
```

Then, register your test in `tests.h`:
```cpp
// Add to TH_FORALL_TESTS_CUDA instead for CUDA-requiring tests
#define TH_FORALL_TESTS(_) \
  _(ADFormulas)            \
  _(Attributes)            \
  ...
  _(CaseOne)  // note that the `test` prefix is omitted.
  _(CaseTwo)
```

We glob all the test files together in `CMakeLists.txt` so that you don't
have to edit it every time you add a test. Unfortunately, this means that in
order to get the build to pick up your new test file, you need to re-run
cmake:
```
python setup.py build --cmake
```

## Why do we have two different test runners?
We have two different ways of running our cpp tests:
1. With `gtest`, from a standalone binary.
2. With Python, from `TestJit.test_cpp` and `TestJit.test_cpp_cuda` (in
   `test/test_jit.py`)

We want both because we need to test things from a pure-C++ environment and
with all our various Python patch-points enabled.

## How do I run the tests?
The following commands assume you are in PyTorch root.

1. With `gtest`:
   ```bash
   # (re)build the test binary
   ninja build/bin/test_jit
   # run
   build/bin/test_jit --gtest_filter='glob_style_filter*'
   ```
2. With Python:
   ```
   python test/test_jit.py TestJit.test_cpp TestJit.test_cpp_cuda
   ```

test/cpp/tensorexpr/__init__.py

Whitespace-only changes.

test/cpp/tensorexpr/gtest.cpp

Lines changed: 23 additions & 0 deletions
```cpp
#include <test/cpp/tensorexpr/tests.h>

#include <gtest/gtest.h>

namespace torch {
namespace jit {

#define TENSOREXPR_GTEST(name) \
  TEST(TensorExprTest, name) { \
    test##name();              \
  }
TH_FORALL_TESTS(TENSOREXPR_GTEST)
#undef TENSOREXPR_GTEST

#define TENSOREXPR_GTEST_CUDA(name)   \
  TEST(TensorExprTest, name##_CUDA) { \
    test##name();                     \
  }
TH_FORALL_TESTS_CUDA(TENSOREXPR_GTEST_CUDA)
#undef TENSOREXPR_GTEST_CUDA

} // namespace jit
} // namespace torch
```
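Each `_(Name)` entry in `TH_FORALL_TESTS` is expanded by `TENSOREXPR_GTEST` into an ordinary gtest case that forwards to the corresponding `test`-prefixed free function. For the hypothetical `CaseOne` entry from the README above, the expansion is:

```cpp
// Expansion of TENSOREXPR_GTEST(CaseOne): a plain gtest case that
// forwards to the free function testCaseOne() declared in tests.h.
TEST(TensorExprTest, CaseOne) {
  testCaseOne();
}
```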
test/cpp/tensorexpr/padded_buffer.cpp

Lines changed: 110 additions & 0 deletions
```cpp
#include "test/cpp/tensorexpr/padded_buffer.h"

#include <sstream>

#include <gtest/gtest.h>

#include <c10/util/Logging.h>

namespace torch {
namespace jit {
namespace tensorexpr {

int PaddedBufferBase::Index(const std::vector<int>& indices) const {
  DCHECK_EQ(dims_.size(), indices.size());
  int total_index = 0;
  for (int i = 0; i < dims_.size(); i++) {
    total_index += indices[i] * strides_[i];
  }
  return total_index;
}

PaddedBufferBase::PaddedBufferBase(
    const std::vector<int>& dims,
    const std::string& name)
    : dims_(dims), name_(name), strides_(dims.size()) {
  for (int i = dims.size() - 1; i >= 0; --i) {
    if (i == dims.size() - 1) {
      strides_[i] = 1;
    } else {
      strides_[i] = strides_[i + 1] * dims[i + 1];
    }
  }
  total_size_ = strides_[0] * dims[0];
}

template <typename T>
std::string CompareErrorMsg(
    const PaddedBuffer<T>& v1,
    const PaddedBuffer<T>& v2,
    int index) {
  std::ostringstream oss;
  oss << "index: " << index << ", names: " << v1.name() << ", " << v2.name();
  return oss.str();
}

template <typename T>
void PaddedBuffer<T>::ValidateWatermark() const {
  for (int i = 0; i < kPaddingSize; i++) {
    EXPECT_EQ(data_[i], kPaddingValue)
        << "left-side watermark broken: "
        << "index: " << i << ", name: " << name();
    EXPECT_EQ(data_[i + total_size_ + kPaddingSize], kPaddingValue)
        << "right-side watermark broken: "
        << "index: " << i << ", name: " << name();
  }
}

template <typename T>
void PaddedBuffer<T>::CheckBackup() const {
  ValidateWatermark();
  DCHECK(backup_data_.size() == data_.size())
      << "Please make sure you have called Backup() before calling CheckBackup()";
  for (int i = 0; i < total_size_; i++) {
    EXPECT_EQ(data_[i + kPaddingSize], backup_data_[i + kPaddingSize])
        << "mismatch against backup, "
        << "index: " << i << ", name: " << name();
  }
}

template <typename T>
void ExpectAllEqual(const PaddedBuffer<T>& f1, const PaddedBuffer<T>& f2) {
  const std::vector<T>& v1 = f1.data_;
  const std::vector<T>& v2 = f2.data_;
  const int kPaddingSize = f1.kPaddingSize;
  const int total_size = f1.total_size_;
  ASSERT_EQ(v1.size(), v2.size());
  f1.ValidateWatermark();
  f2.ValidateWatermark();
  for (int i = 0; i < total_size; i++) {
    EXPECT_EQ(v1[kPaddingSize + i], v2[kPaddingSize + i])
        << CompareErrorMsg(f1, f2, i);
  }
}

void ExpectAllNear(
    const PaddedBuffer<float>& f1,
    const PaddedBuffer<float>& f2,
    float abs_error) {
  const std::vector<float>& v1 = f1.data_;
  const std::vector<float>& v2 = f2.data_;
  const int kPaddingSize = f1.kPaddingSize;
  const int total_size = f1.total_size_;
  ASSERT_EQ(v1.size(), v2.size());
  f1.ValidateWatermark();
  f2.ValidateWatermark();
  for (int i = 0; i < total_size; i++) {
    EXPECT_NEAR(v1[kPaddingSize + i], v2[kPaddingSize + i], abs_error)
        << CompareErrorMsg(f1, f2, i);
  }
}

template class PaddedBuffer<int>;
template class PaddedBuffer<float>;
template void ExpectAllEqual(
    const PaddedBuffer<int>& f1,
    const PaddedBuffer<int>& f2);

} // namespace tensorexpr
} // namespace jit
} // namespace torch
```
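The `PaddedBufferBase` constructor above computes contiguous row-major strides (innermost stride 1), and `Index` folds a multi-dimensional index into a flat offset. A standalone sketch of that arithmetic, with made-up dimensions for illustration:

```cpp
#include <cassert>
#include <vector>

// Standalone illustration of PaddedBufferBase's stride/index math;
// the dims and indices here are made up for the example.
int main() {
  std::vector<int> dims = {3, 4, 5};
  std::vector<int> strides(dims.size());
  // Innermost dimension has stride 1; each outer stride is the product
  // of all inner dimensions.
  for (int i = static_cast<int>(dims.size()) - 1; i >= 0; --i) {
    strides[i] = (i == static_cast<int>(dims.size()) - 1)
        ? 1
        : strides[i + 1] * dims[i + 1];
  }
  // Row-major strides: {20, 5, 1}; total size: strides[0] * dims[0] == 60.
  std::vector<int> indices = {1, 2, 3};
  int total_index = 0;
  for (size_t i = 0; i < dims.size(); i++) {
    total_index += indices[i] * strides[i];
  }
  assert(total_index == 1 * 20 + 2 * 5 + 3 * 1); // flat offset 33
  return 0;
}
```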
