
[VPP-1996] Bulk ip route add del API #3458

@vvalderrv

Description


In some applications, a VPP instance holds many VRF tables and relies on external router redundancy, e.g. active-standby. When the active router goes down, a route must be deleted from every VRF table and a new route added to every VRF table.

In one application with 800 VRFs and a busy VPP (high traffic), it takes 3 minutes to delete one route from each VRF. By comparison, using a bulk API (multiple routes in a single call; in the test, 100 routes per call, called 8 times, 800 route deletions in total) takes ~2 seconds under the same traffic conditions.

A bulk API example:


define bulk_ip_route_add_del
{
  u32 client_index;
  u32 context;
  u8 is_add;
  u8 is_multipath;
  u32 n_routes;
  vl_api_ip_route_t route[n_routes];
};
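To illustrate how a client would use such a message, here is a minimal Python sketch of batching routes into n_routes-sized chunks before issuing the calls. The route tuples and the batch size of 100 are illustrative values taken from the test described above; the real message would carry vl_api_ip_route_t entries, which are not modeled here.

```python
def batch_routes(routes, batch_size=100):
    """Split a flat route list into bulk-API-sized batches."""
    return [routes[i:i + batch_size] for i in range(0, len(routes), batch_size)]

# 800 VRFs, one route to delete in each -> 8 bulk calls of 100 routes each
routes = [("10.0.0.0/24", vrf) for vrf in range(800)]  # hypothetical prefixes
batches = batch_routes(routes)
print(len(batches), len(batches[0]))  # -> 8 100
```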

Without this API, users need to implement it themselves in a VPP plugin and call fib_api_route_add_del after decoding each route in the array, and they need to copy ip.api and fib_types.api into their plugin workspace.

This would be a desirable new API for applications that need many VRFs.

Assignee

Unassigned

Reporter

Baorong Liu

Comments

  • JIRAUSER14713 (Mon, 20 Sep 2021 17:47:04 +0000):

    #1: Correcting my description of main/worker for the dataplane. As you pointed out, since only one core is assigned to this VPP application, there is only the main thread (no separate worker):

vpp-dp# show threads
ID  Name      Type  LWP  Sched Policy (Priority)  lcore  Core  Socket State
0   vpp_main        18   other (0)                7      0     0
vpp-dp# quit

Thanks for the suggestion to pin all the main threads to a common CPU.

#2, regarding copying ip.api and fib_types.api: it is probably because of our build process. We use a ligato-vpp base image (we do not refer directly to the VPP source), and I have not figured out how to import it yet. If I import it as the link does, it complains:

FileNotFoundError: [Errno 2] No such file or directory: './vnet/fib/fib_types.api'

Will try it later.

  • vrpolak (Mon, 20 Sep 2021 10:24:56 +0000): > in this project, this is only one CPU core for both vpp worker and main thread

Oh. Does that mean there are two software threads used by the VPP application (main and worker), but they are pinned to the same (logical) hardware thread? That means the Linux kernel needs to keep context switching between them, causing both delays in main-thread command processing and probably packet drops (when the CPU is handling the main thread or a context switch instead of dataplane work).

Is it possible for the VPP application to use only one software thread (only main, no worker)? This setup would be slow, but not as slow as with Linux context switches.

In case there are multiple VPP applications running, it is a common practice to pin all their main threads to one common CPU (but each worker thread to a different CPU). That would give you good speed, but I am not sure your project is allowed to use this setup (e.g. if the VPP application is inside a VM with only one vCPU available).

I think VPP is not well optimized for the main thread being busy with both dataplane and API work at once. Making it more optimized may be worth it, but it would not be easy.

Introducing bulk_ip_route_add_del would help only with routes; other API calls would still be slow in your setup. But it may still be the next best thing.

> user need to copy ip.api and fib_types.api to their plugin workspace

Are you sure? I see other plugins importing [1] without any issues.

[1]

import "vnet/fib/fib_types.api";

  • JIRAUSER14713 (Sat, 18 Sep 2021 18:38:28 +0000):

    This happened in a real project (Cisco). It is used to implement a VPN, with each VRF handling one VPN's forwarding. In this project, there is only one CPU core for both the VPP worker and main thread, and we hit this issue.

But I am not sure whether any other project has this requirement.

As I said in the description, without this we have to use a workaround: create a new API in the user plugin that carries all the routes to be added or deleted, and call fib_api_route_add_del from that API handler. The drawback is that we need to copy ip.api, fib_types.api, and other things from the VPP code into the user plugin.
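The workaround described above (one user-defined message carrying an array of routes, decoded once in the handler, which then loops over the entries) can be sketched in plain Python with the struct module. The wire layout below is invented for illustration and is not VPP's actual encoding; in real VPP the handler would call fib_api_route_add_del per entry instead of collecting tuples.

```python
import struct

# Illustrative wire format (NOT VPP's real encoding):
# header = is_add (u8) + n_routes (u32, big-endian), then n_routes u32 table ids
HDR = ">BI"
ROUTE = ">I"

def encode_bulk(is_add, table_ids):
    """Pack one bulk message holding several per-VRF route operations."""
    msg = struct.pack(HDR, int(is_add), len(table_ids))
    for t in table_ids:
        msg += struct.pack(ROUTE, t)
    return msg

def handle_bulk(msg):
    """Decode the header once, then loop over the route array -- the
    per-route work (fib_api_route_add_del in real VPP) happens in the loop."""
    is_add, n = struct.unpack_from(HDR, msg, 0)
    off = struct.calcsize(HDR)
    ops = []
    for _ in range(n):
        (table_id,) = struct.unpack_from(ROUTE, msg, off)
        off += struct.calcsize(ROUTE)
        ops.append(("add" if is_add else "del", table_id))
    return ops

# delete one route from each of three VRF tables in a single message
print(handle_bulk(encode_bulk(False, [10, 11, 12])))
# -> [('del', 10), ('del', 11), ('del', 12)]
```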

  • vrpolak (Fri, 17 Sep 2021 15:05:17 +0000): Bulk commands are fine, but before VPP commits to maintaining them, we should investigate whether they are really needed.

800 calls in 3 minutes and 8 calls in 2 seconds are roughly the same rate (about 4 calls per second) regardless of payload size, which is suspicious.

There are some known issues affecting API execution rate (example: [0], but shared memory transport for PAPI has been deprecated since then), but 4 calls per second is way lower than anything I have heard about before.

Also, a VPP that is expected to get busy usually uses a few worker threads, and the main thread is not used for dataplane work (so it is free to process API calls). I believe route addition/deletion does not require the worker barrier, so the fact that worker threads are busy with traffic should not slow down the main thread much.

How are you executing the API calls? Depending on your client, you could send 100 commands before you start reading replies, thus hiding latency and increasing the number of commands executed per second.
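The latency-hiding idea can be simulated without VPP. The toy client/server below runs over in-process queues (the 5 ms one-way delay and the message shapes are invented for illustration) and compares lockstep request-reply against sending all requests before reading any replies:

```python
import queue
import threading
import time

RTT_HALF = 0.005  # simulated one-way transport delay (illustrative value)

def send_with_delay(q, msg):
    """Deliver msg to queue q after a simulated one-way network delay."""
    threading.Timer(RTT_HALF, q.put, args=(msg,)).start()

def serve(requests, replies, n):
    """Echo server: answers each of n requests as soon as it arrives."""
    for _ in range(n):
        msg = requests.get()
        send_with_delay(replies, msg)

def lockstep(n):
    """Classic request-reply: wait for each reply before the next request."""
    requests, replies = queue.Queue(), queue.Queue()
    threading.Thread(target=serve, args=(requests, replies, n)).start()
    t0 = time.monotonic()
    for i in range(n):
        send_with_delay(requests, i)
        replies.get()                  # blocks one full round trip per call
    return time.monotonic() - t0

def pipelined(n):
    """Send every request first, then drain the replies."""
    requests, replies = queue.Queue(), queue.Queue()
    threading.Thread(target=serve, args=(requests, replies, n)).start()
    t0 = time.monotonic()
    for i in range(n):
        send_with_delay(requests, i)   # all requests are in flight at once
    for _ in range(n):
        replies.get()                  # round trips overlap instead of adding up
    return time.monotonic() - t0
```

With 50 calls, lockstep pays ~50 round trips while the pipelined client pays roughly one, so the pipelined run finishes many times faster; the same effect would raise the calls-per-second number of an API client that reads replies asynchronously.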

[0] https://lists.fd.io/g/vpp-dev/topic/80903834#18875

Original issue: https://jira.fd.io/browse/VPP-1996
