Skip to content

Publishing large data is 30x-100x slower than for rclcpp #763

@karl-schulz

Description

@karl-schulz

TL;DR: Publishing a 10MB Pointcloud takes only 2.8 ms in rclypp but 92 ms in rclpy.

We observed significant latency problems when publishing large data like images or pointclouds via rclpy. Investigating the problem, we found that publishing large data of any kind is 30-100x slower in Python than in C++.

We found the Python Publisher.publish(msg) call takes especially long compared to rclcpp. Publishing a 640x480 Pointcloud of size ~10MB with 15 FPS is not possible with rclpy even on our quite powerful machine (Ryzen 3900X), let alone our robot (Jetson Xavier NX). The time taken for both seems to be about linear with the data size.

Reproduction: https://github.com/karl-schulz/ros2_latency

Bug report

  • Operating System:
    • Ubuntu 20.04 and Ubuntu 18.04
  • Installation type:
    • Both binaries and built from source
  • Version or commit hash:
    • (like in current binaries / rclpy foxy branch)
  • DDS implementation:
    • Both Fast-RTPS and CycloneDDS

Steps to reproduce issue

I created a package for easy reproduction:
https://github.com/karl-schulz/ros2_latency

Basically, it consists of:

  • A node source publishing a random pointcloud of X MB size, X is controllable over a parameter
  • Two repeaters for C++ and Python, respectively, which
    • Copy the pointcloud
    • Update the timestamp
    • Re-publish it on different topics
  • A node measure which subscribes to both topics and compares the timestamps with the current ROS time

Expected behavior

  • rclpy and rclcpp being close in performance, with slightly higher latency for rclpy
  • Both rclpy and rclcpp being able topublish a Pointcloud of a 640x480 RGB-D image with at least 30 FPS on a fast machine

Actual behavior

  • Even on a very fast desktop machine, rclpy takes ~100ms for publishing 10MB of pointcloud data
  • For comparison, rclcpp take only between 1/30 and 1/100 of the time
  • Using rclpy with high FPS is not possible

Additional information

Tested on:

  • A desktop PC with a Ryzen 3800X and Ubuntu 20
  • Another machine with a Rytzen 3700X and Ubuntu 18 with ROS2 Foxy built from source
  • A NVIDIA Jetson Xavier NX L4T (Ubuntu 18) and ROS2 Foxy built from source

Tried Solutions

We were pretty desperate and already tried:

  • Switching the RWM from FastRTPS to CycloneDDS
    (however the good performance on rclcpp indicates that it is not a middleware problem)
  • Tuning CycloneDDS with a custom config XML and a higher system package size, like described here:
    https://index.ros.org/doc/ros2/Troubleshooting/DDS-tuning/
    (but the same rationale applies)
  • Using PYTHONOPTIMIZE=0
  • Different QOS settings: "best effort"/"reliable", depth=1/depth=10
  • Publishing different message types, e.g. Image messages instead of Pointloud2

This problem is actually really critical for our application. In the worst case: Is this expected behavior? This would make the ROS2 Python API useless for high-framerate or low-latency image pipelines.

Help would be really appreciated!

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions