Description
In some applications there are many VRF tables in VPP together with external router redundancy, e.g. active-standby. When the active router goes down, a route needs to be deleted from each VRF table and a new route added to each.
In one such application there are 800 VRFs. With a busy VPP (high traffic), it takes 3 minutes to delete one route from each VRF. As a comparison, a bulk API (multiple routes in a single API call; in the test, 100 routes per call, called 8 times, 800 route deletions in total) takes ~2 seconds under the same traffic conditions.
A bulk API example:
define bulk_ip_route_add_del
{
  u32 client_index;
  u32 context;
  u8 is_add;
  u8 is_multipath;
  u32 n_routes;
  vl_api_ip_route_t route[n_routes];
};
Without this API, users have to implement it themselves in a VPP plugin and call fib_api_route_add_del after decoding each route in the array; they also have to copy ip.api and fib_types.api into their plugin workspace.
This is a desirable new API for applications that need many VRFs.
Assignee
Unassigned
Reporter
Baorong Liu
Comments
- JIRAUSER14713 (Mon, 20 Sep 2021 17:47:04 +0000):
Fixed: ip6_fib_dump
#1 Correcting my description of main/worker for the DP: as you pointed out, since only one core is assigned to this VPP application, there is only the main thread (no separate worker):
vpp-dp# show threads
ID Name Type LWP Sched Policy (Priority) lcore Core Socket State
0 vpp_main 18 other (0) 7 0 0
vpp-dp# quit
Thanks for the suggestion to pin all the main threads to a common CPU.
#2, on copying ip.api and fib_types.api: this is probably because of our build process. We use the ligato-vpp base image (we do not build against the VPP source directly), and I have not figured out how to import it yet. If I import it as the link does, it complains:
FileNotFoundError: [Errno 2] No such file or directory: './vnet/fib/fib_types.api'
I will try it later.
- vrpolak (Mon, 20 Sep 2021 10:24:56 +0000): > in this project, this is only one CPU core for both vpp worker and main thread
Oh. Does that mean there are two software threads used by the VPP application (main and worker), but they are pinned to the same (logical) hardware thread? In that case the Linux kernel needs to keep context switching between them, causing delays in the main thread's command processing and probably packet drops (when the CPU is handling the main thread or a context switch instead of dataplane work).
Is it possible for the VPP application to use only one software thread (only main, no worker)? This setup would be slow, but not as slow as with Linux context switches.
In case there are multiple VPP applications running, it is common practice to pin all their main threads to one common CPU (but each worker thread to its own CPU). That would give you good speed, but I am not sure your project allows this setup (e.g. if the VPP application is inside a VM with only one vCPU available).
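For reference, thread placement in VPP is normally configured in the cpu stanza of startup.conf; a minimal sketch of the layout described above (the core numbers are illustrative, not taken from this issue):

```
cpu {
  # pin the main thread to a core that can be shared with
  # the main threads of other VPP instances
  main-core 0
  # give each worker thread a dedicated core of its own
  corelist-workers 1-2
}
```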
I think VPP is not well optimized for main thread being busy with both dataplane and API work at once. Making it more optimized may be worth it, but it would not be an easy work.
Introducing bulk_ip_route_add_del would help only with routes, other API calls will still be slow in your setup. But it may still be the next best thing.
user need to copy ip.api and fib_types.api to their plugin workspace
Are you sure? I see other plugins importing [1] without any issues.
Line 24 in e3cf4d0:
import "vnet/fib/fib_types.api";
- JIRAUSER14713 (Sat, 18 Sep 2021 18:38:28 +0000):
This happened in a real project (Cisco). It is used to implement VPNs, with one VRF per VPN's forwarding. In this project there is only one CPU core for both the VPP worker and the main thread, and that is where we hit this issue.
But I am not sure whether any other project has this requirement.
As I said in the description, without this we have to use a workaround: create a new API in the user plugin that carries all the routes to be added or deleted, and call fib_api_route_add_del from that API handler. The drawback is that ip.api, fib_types.api, and other things have to be copied from the VPP code into the user plugin.
- vrpolak (Fri, 17 Sep 2021 15:05:17 +0000): Bulk commands are fine, but before VPP commits to maintaining them, we should investigate whether they are really needed.
800 calls in 3 minutes and 8 calls in 2 seconds is roughly the same rate (4 calls per second) regardless of payload size, which is suspicious.
There are some known issues affecting API execution rate (example: [0], but shared memory transport for PAPI has been deprecated since then), but 4 calls per second is way lower than anything I have heard about before.
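The suspicion above can be checked with back-of-the-envelope arithmetic on the numbers reported in this issue (plain Python, no VPP involved):

```python
# Observed timings from the issue: 800 single-route calls took 3 minutes,
# while 8 bulk calls (100 routes each) took ~2 seconds.
single = 800 / 180   # single-route API rate, calls per second
bulk = 8 / 2         # bulk API rate, calls per second

print(f"single-route API: {single:.1f} calls/s")
print(f"bulk API:         {bulk:.1f} calls/s")

# Per-route throughput differs by ~90x, yet the per-call rate is nearly
# identical, suggesting a fixed overhead of roughly 225-250 ms per API
# round trip rather than per-route processing cost.
per_call_ms = 1000 / single
print(f"implied per-call overhead: ~{per_call_ms:.0f} ms")
```

The near-equal per-call rates are what make a fixed per-call cost (rather than route-processing cost) the likely bottleneck.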
Also, a VPP instance that is expected to get busy usually uses a few worker threads, and the main thread is not used for dataplane work (so it stays ready to process API calls). I believe route addition/deletion does not require the worker barrier, so worker threads being busy with traffic should not slow down the main thread much.
How are you executing the API calls? Depending on your client, you could send 100 commands before you start reading replies, thus hiding latency and increasing the number of commands executed per second.
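The effect of sending many commands before reading replies can be illustrated with a simple latency model (this is not real PAPI code; the window size and round-trip time are illustrative, the RTT being chosen to match the ~4 calls/s observed in this issue):

```python
from math import ceil

def serial_time(n_calls: int, rtt_s: float) -> float:
    """Strictly serial client: wait for each reply before the next call,
    so the total cost is one full round trip per call."""
    return n_calls * rtt_s

def pipelined_time(n_calls: int, window: int, rtt_s: float) -> float:
    """Pipelined client: send `window` requests before reading replies,
    so roughly one round trip is paid per window of calls."""
    return ceil(n_calls / window) * rtt_s

RTT = 0.225  # ~225 ms per round trip, implied by the rates in this issue

print(f"serial, 800 calls:        {serial_time(800, RTT):6.1f} s")
print(f"window of 100, 800 calls: {pipelined_time(800, 100, RTT):6.1f} s")
```

Under this model the 3-minute serial run collapses to a couple of seconds with a window of 100, which is the same order of improvement the bulk API showed; that is why measuring the client's request/reply pattern is worth doing before committing to a new message.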
[0] https://lists.fd.io/g/vpp-dev/topic/80903834#18875
Original issue: https://jira.fd.io/browse/VPP-1996