@xmyqsh commented Feb 14, 2020

  1. support other floating-point types for CPU
  2. change function parameter to const reference
  3. rename nearest_neighbors_points_cpu.cpp to nearest_neighbor_points_cpu.cpp
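Items 1 and 2 above can be illustrated with a minimal, standalone sketch. This is not the code from this PR or PyTorch3D's API: the function name `nearest_neighbor_idx`, the flat row-major `std::vector` layout, and the `dim` parameter are all assumptions made for illustration. It only shows the general pattern of templating a brute-force CPU nearest-neighbor search on the floating-point type and taking the inputs by const reference.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not PyTorch3D's API): for each point in p1, return the
// index of its nearest neighbor in p2 under squared Euclidean distance.
// Assumed layout: points are stored row-major, so point i occupies
// [i * dim, (i + 1) * dim). Assumes dim > 0.
template <typename scalar_t>
std::vector<std::size_t> nearest_neighbor_idx(
    const std::vector<scalar_t>& p1, // const reference: no copy, no mutation
    const std::vector<scalar_t>& p2,
    std::size_t dim) {
  const std::size_t n1 = p1.size() / dim;
  const std::size_t n2 = p2.size() / dim;
  std::vector<std::size_t> idx(n1, 0);
  for (std::size_t i = 0; i < n1; ++i) {
    scalar_t best = std::numeric_limits<scalar_t>::max();
    for (std::size_t j = 0; j < n2; ++j) {
      // Accumulate the squared distance between p1[i] and p2[j].
      scalar_t dist = 0;
      for (std::size_t d = 0; d < dim; ++d) {
        const scalar_t diff = p1[i * dim + d] - p2[j * dim + d];
        dist += diff * diff;
      }
      if (dist < best) {
        best = dist;
        idx[i] = j;
      }
    }
  }
  return idx;
}
```

Because the scalar type is a template parameter, the same function body serves both `nearest_neighbor_idx<float>` and `nearest_neighbor_idx<double>`.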

Running benchmarks for bm_nearest_neighbor_points/bm_nn_points...

Benchmark                    Avg Time(μs)      Peak Time(μs) Iterations
--------------------------------------------------------------------------------
NN_PYTHON_1_3_1_32                  17              50          28602
NN_PYTHON_1_3_1_128                 20              54          24513
NN_PYTHON_1_3_128_32               104             149           4822
NN_PYTHON_1_3_128_128             1988            3808            253
NN_PYTHON_1_4_1_32                  17              36          28698
NN_PYTHON_1_4_1_128                 21              44          24384
NN_PYTHON_1_4_128_32               109             132           4608
NN_PYTHON_1_4_128_128             2058            8013            243
NN_PYTHON_4_3_1_32                  22              53          22869
NN_PYTHON_4_3_1_128                 27              61          18325
NN_PYTHON_4_3_128_32              2041            7398            245
NN_PYTHON_4_3_128_128             2055            6362            245
NN_PYTHON_4_4_1_32                  26              60          19201
NN_PYTHON_4_4_1_128                 32              69          15453
NN_PYTHON_4_4_128_32              2084           11528            240
NN_PYTHON_4_4_128_128             2330           10419            215
NN_PYTHON_32_3_1_32                 41              85          12251
NN_PYTHON_32_3_1_128                83             138           6022
NN_PYTHON_32_3_128_32             2965            7612            169
NN_PYTHON_32_3_128_128            2588            4914            194
NN_PYTHON_32_4_1_32                 44              70          11431
NN_PYTHON_32_4_1_128                88             132           5667
NN_PYTHON_32_4_128_32             2317           13315            216
NN_PYTHON_32_4_128_128            3088            7298            162
--------------------------------------------------------------------------------

Benchmark                 Avg Time(μs)      Peak Time(μs) Iterations
--------------------------------------------------------------------------------
NN_CPU_1_3_1_32                   2              34         276333
NN_CPU_1_3_1_128                  2              19         231216
NN_CPU_1_3_128_32                20              53          25534
NN_CPU_1_3_128_128               63              98           7963
NN_CPU_1_4_1_32                   2              18         273783
NN_CPU_1_4_1_128                  2              21         219138
NN_CPU_1_4_128_32                23              45          21636
NN_CPU_1_4_128_128               74             109           6714
NN_CPU_4_3_1_32                   2              19         229715
NN_CPU_4_3_1_128                  4              20         140817
NN_CPU_4_3_128_32                78             157           6373
NN_CPU_4_3_128_128              246             296           2030
NN_CPU_4_4_1_32                   2              19         215370
NN_CPU_4_4_1_128                  4              21         126249
NN_CPU_4_4_128_32                90             136           5578
NN_CPU_4_4_128_128              299             375           1675
NN_CPU_32_3_1_32                  6              32          80616
NN_CPU_32_3_1_128                17              50          29214
NN_CPU_32_3_128_32              607             681            824
NN_CPU_32_3_128_128            1959            2084            256
NN_CPU_32_4_1_32                  9              26          58329
NN_CPU_32_4_1_128                29              61          17353
NN_CPU_32_4_128_32              713            1204            702
NN_CPU_32_4_128_128            2340            2601            214
--------------------------------------------------------------------------------

Benchmark                  Avg Time(μs)      Peak Time(μs) Iterations
--------------------------------------------------------------------------------
NN_CUDA_1_3_1_32                  12              92          41091
NN_CUDA_1_3_1_128                 12              45          40999
NN_CUDA_1_3_128_32                13              93          39355
NN_CUDA_1_3_128_128               13              44          38039
NN_CUDA_1_4_1_32                  13              87          39239
NN_CUDA_1_4_1_128                 13              47          39106
NN_CUDA_1_4_128_32                13              86          37276
NN_CUDA_1_4_128_128               14              36          35656
NN_CUDA_4_3_1_32                  13              72          39926
NN_CUDA_4_3_1_128                 12              32          40978
NN_CUDA_4_3_128_32                14              88          35536
NN_CUDA_4_3_128_128               15              45          33263
NN_CUDA_4_4_1_32                  13             125          39217
NN_CUDA_4_4_1_128                 13              59          39125
NN_CUDA_4_4_128_32                15              92          32591
NN_CUDA_4_4_128_128               16              57          30676
NN_CUDA_32_3_1_32                 12              44          40666
NN_CUDA_32_3_1_128                12              92          40300
NN_CUDA_32_3_128_32               24              94          21261
NN_CUDA_32_3_128_128              24              91          20478
NN_CUDA_32_4_1_32                 12              45          40785
NN_CUDA_32_4_1_128                13              74          39771
NN_CUDA_32_4_128_32               28              46          17927
NN_CUDA_32_4_128_128              30              58          16466
--------------------------------------------------------------------------------

@facebook-github-bot added the CLA Signed label Feb 14, 2020
@xmyqsh force-pushed the nn_points_idx_cpu_type_temp branch 2 times, most recently from 89b4533 to 829b0e3 Feb 14, 2020 11:15
@xmyqsh force-pushed the nn_points_idx_cpu_type_temp branch from 829b0e3 to b353a77 Feb 15, 2020 03:36
@nikhilaravi (Contributor)

@xmyqsh we are making several changes to this file and function. Your changes would cause conflicts, so we will not be able to merge this PR.

Also note that in the future, when running benchmarks, you can paste a screenshot of the generated table, as the comment is currently not readable. We also need to know the type of machine the benchmarks were run on. Thank you!
