Commit cb3086c

Merge branch 'bridge-mdb-limit'
Petr Machata says:

====================
bridge: Limit number of MDB entries per port, port-vlan

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
similar limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN number of MDB entries that the port is member in.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

Per-port number of entries keeps track of the total number of MDB entries
configured on a given port. The per-port-VLAN value then keeps track of the
subset of MDB entries configured specifically for the given VLAN, on that
port. The number is adjusted as port_groups are created and deleted, and
therefore under multicast lock.

A maximum value, if non-zero, then places a limit on the number of entries
that can be configured in a given context. Attempts to add entries above
the maximum are rejected.

Rejection reason of netlink-based requests to add MDB entries is
communicated through extack. This channel is unavailable for rejections
triggered from the control path. To address this lack of visibility, the
patchset adds a tracepoint, bridge:br_mdb_full:

    # perf record -e bridge:br_mdb_full &
    # [...]
    # perf script | cut -d: -f4-
     dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0
     dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0
     dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10
     dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10
     dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10

Another option to consume the tracepoint is e.g. through the bpftrace tool:

    # bpftrace -e ' tracepoint:bridge:br_mdb_full /args->af != 0/ {
                        printf("dev %s src %s grp %s vid %u\n",
                               str(args->dev), ntop(args->src),
                               ntop(args->grp), args->vid);
                    }
                    tracepoint:bridge:br_mdb_full /args->af == 0/ {
                        printf("dev %s grp %s vid %u\n",
                               str(args->dev),
                               macaddr(args->grpmac), args->vid);
                    }'

This tracepoint is triggered for mcast_hash_max exhaustions as well.

The following is an example of how the feature is used. A more extensive
example is available in patch #8:

    # bridge vlan set dev v1 vid 1 mcast_max_groups 1
    # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
    # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
    Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1.

The patchset progresses as follows:

- In patch #1, set strict_start_type at two bridge-related policies. The
  reason is we are adding a new attribute to one of these, and want the new
  attribute to be parsed strictly. The other was adjusted for completeness'
  sake.

- In patches #2 to #5, br_mdb and br_multicast code is adjusted to make the
  following additions smoother.

- In patch #6, add the tracepoint.

- In patch #7, the code to maintain number of MDB entries is added as
  struct net_bridge_mcast_port::mdb_n_entries. The maximum is added, too,
  as struct net_bridge_mcast_port::mdb_max_entries, however at this point
  there is no way to set the value yet, and since 0 is treated as "no
  limit", the functionality doesn't change at this point. Note however,
  that mcast_hash_max violations already do trigger at this point.

- In patch #8, netlink plumbing is added: reading of number of entries, and
  reading and writing of maximum.

  The per-port values are passed through RTM_NEWLINK / RTM_GETLINK messages
  in IFLA_BRPORT_MCAST_N_GROUPS and _MAX_GROUPS, inside IFLA_PROTINFO nest.

  The per-port-vlan values are passed through RTM_GETVLAN / RTM_NEWVLAN
  messages in BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS, _MAX_GROUPS, inside
  BRIDGE_VLANDB_ENTRY.

The following patches deal with the selftest:

- Patches #9 and #10 clean up and move around some selftest code.

- Patches #11 to #14 add helpers and generalize the existing IGMP / MLD
  support to allow generating packets with configurable group addresses and
  varying source lists for (S,G) memberships.

- Patch #15 adds code to generate IGMP leave and MLD done packets.

- Patch #16 finally adds the selftest itself.

v3:
- Patch #7:
    - Access mdb_max_/_n_entries through READ_/WRITE_ONCE
    - Move extack setting to br_multicast_port_ngroups_inc_one().
      Since we use NL_SET_ERR_MSG_FMT_MOD, the correct context
      (port / port-vlan) can be passed through an argument.
      This also removes the need for more READ/WRITE_ONCE's
      at the extack-setting site.
- Patch #8:
    - Move the br_multicast_port_ctx_vlan_disabled() check
      out to the _vlan_ helpers callers. Thus these helpers
      cannot fail, which makes them very similar to the
      _port_ helpers. Have them take the MC context directly
      and unify them.

v2:
- Cover letter:
    - Add an example of a bpftrace-based probe script
- Patch #6:
    - Report IPv4 as an IPv6-mapped address through the IPv6 buffer
      as well, to save ring buffer space.
- Patch #7:
    - In br_multicast_port_ngroups_inc_one(), bounce
      if n>=max, not if n==max
    - Adjust extack messages to mention ngroups, now that
      the bounces appear when n>=max, not n==max
    - In __br_multicast_enable_port_ctx(), do not reset
      max to 0. Also do not count number of entries by
      going through _inc, as that would end up incorrectly
      bouncing the entries.
- Patch #8:
    - Drop locks around accesses in
      br_multicast_{port,vlan}_ngroups_{get,set_max}(),
    - Drop bounces due to max<n in
      br_multicast_{port,vlan}_ngroups_set_max().
- Patch #12:
    - In the comment at payload_template_calc_checksum(),
      s/%#02x/%02x/, that's the mausezahn payload format.
- Patch #16:
    - Adjust the tests that check setting max below n and
      reset of max on VLAN snooping enablement
    - Make test naming uniform
    - Enable testing of control path (IGMP/MLD) in
      mcast_vlan_snooping bridge
    - Reorganize the code so that test instances (per bridge
      type and configuration type) always come right after
      the test, in order of {d,q,qvs}{4,6}{cfg,ctl}.
      Then groups of selftests are at the end of the file.
      Similarly adjust invocation order of the tests.
====================

Signed-off-by: David S. Miller <[email protected]>
2 parents 8b7018f + 3446dcd commit cb3086c
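
The counting rules from the cover letter can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct mdb_ctx`, `mdb_ctx_inc()`, `mdb_ctx_dec()` and `mdb_account_add()` are hypothetical names, there is no locking, and the real kernel code differs. It shows the two ideas that matter: a maximum of 0 means "no limit", and a rejection at the port-VLAN level rolls back the increment already made at the port level.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one counting context (a port, or a
 * port-VLAN). This is an illustration, not kernel code. */
struct mdb_ctx {
	uint32_t n_entries;   /* entries currently counted in this context */
	uint32_t max_entries; /* 0 means "no limit" */
};

/* Count one more entry, or refuse with -E2BIG once the limit is reached. */
static int mdb_ctx_inc(struct mdb_ctx *ctx)
{
	if (ctx->max_entries && ctx->n_entries >= ctx->max_entries)
		return -E2BIG;
	ctx->n_entries++;
	return 0;
}

/* Drop one entry from the count, never going below zero. */
static void mdb_ctx_dec(struct mdb_ctx *ctx)
{
	if (ctx->n_entries)
		ctx->n_entries--;
}

/* Account a new entry on the port and, if a VLAN context applies, on the
 * port-VLAN as well. vlan may be NULL (no VID, or VLAN snooping disabled). */
static int mdb_account_add(struct mdb_ctx *port, struct mdb_ctx *vlan)
{
	int err = mdb_ctx_inc(port);

	if (err)
		return err;
	if (!vlan)
		return 0;

	err = mdb_ctx_inc(vlan);
	if (err)
		mdb_ctx_dec(port); /* roll back so both counters stay consistent */
	return err;
}
```

With a port-VLAN limit of 1 and no port limit, the first add succeeds and the second is rejected, which matches the `mcast_max_groups 1` example in the cover letter.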

16 files changed, 1867 insertions(+), 79 deletions(-)
include/trace/events/bridge.h

Lines changed: 58 additions & 0 deletions
@@ -122,6 +122,64 @@ TRACE_EVENT(br_fdb_update,
 		  __entry->flags)
 );

+TRACE_EVENT(br_mdb_full,
+
+	TP_PROTO(const struct net_device *dev,
+		 const struct br_ip *group),
+
+	TP_ARGS(dev, group),
+
+	TP_STRUCT__entry(
+		__string(dev, dev->name)
+		__field(int, af)
+		__field(u16, vid)
+		__array(__u8, src, 16)
+		__array(__u8, grp, 16)
+		__array(__u8, grpmac, ETH_ALEN) /* For af == 0. */
+	),
+
+	TP_fast_assign(
+		struct in6_addr *in6;
+
+		__assign_str(dev, dev->name);
+		__entry->vid = group->vid;
+
+		if (!group->proto) {
+			__entry->af = 0;
+
+			memset(__entry->src, 0, sizeof(__entry->src));
+			memset(__entry->grp, 0, sizeof(__entry->grp));
+			memcpy(__entry->grpmac, group->dst.mac_addr, ETH_ALEN);
+		} else if (group->proto == htons(ETH_P_IP)) {
+			__entry->af = AF_INET;
+
+			in6 = (struct in6_addr *)__entry->src;
+			ipv6_addr_set_v4mapped(group->src.ip4, in6);
+
+			in6 = (struct in6_addr *)__entry->grp;
+			ipv6_addr_set_v4mapped(group->dst.ip4, in6);
+
+			memset(__entry->grpmac, 0, ETH_ALEN);
+
+#if IS_ENABLED(CONFIG_IPV6)
+		} else {
+			__entry->af = AF_INET6;
+
+			in6 = (struct in6_addr *)__entry->src;
+			*in6 = group->src.ip6;
+
+			in6 = (struct in6_addr *)__entry->grp;
+			*in6 = group->dst.ip6;
+
+			memset(__entry->grpmac, 0, ETH_ALEN);
+#endif
+		}
+	),
+
+	TP_printk("dev %s af %u src %pI6c grp %pI6c/%pM vid %u",
+		  __get_str(dev), __entry->af, __entry->src, __entry->grp,
+		  __entry->grpmac, __entry->vid)
+);

 #endif /* _TRACE_BRIDGE_H */
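
The tracepoint stores IPv4 addresses in the 16-byte `src` and `grp` buffers as IPv4-mapped IPv6 addresses, which is why the sample output shows values such as `::ffff:239.1.1.112`. As a rough userspace illustration of that byte layout, assuming a POSIX system, the sketch below builds the mapped form by hand. `v4_to_mapped()` is a hypothetical helper written for this example; it is not the kernel's `ipv6_addr_set_v4mapped()`.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical userspace helper: write the IPv4-mapped IPv6 form
 * (::ffff:a.b.c.d) of v4 into out. */
static void v4_to_mapped(struct in_addr v4, struct in6_addr *out)
{
	memset(out, 0, sizeof(*out));             /* bytes 0..9 stay zero */
	out->s6_addr[10] = 0xff;                  /* bytes 10..11 are 0xffff */
	out->s6_addr[11] = 0xff;
	memcpy(&out->s6_addr[12], &v4.s_addr, 4); /* bytes 12..15: IPv4 address,
						   * already in network order */
}
```

A consumer that wants the plain IPv4 address back can check the `af` field (or use `IN6_IS_ADDR_V4MAPPED()` in userspace) and read the last four bytes.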

include/uapi/linux/if_bridge.h

Lines changed: 2 additions & 0 deletions
@@ -523,6 +523,8 @@ enum {
 	BRIDGE_VLANDB_ENTRY_TUNNEL_INFO,
 	BRIDGE_VLANDB_ENTRY_STATS,
 	BRIDGE_VLANDB_ENTRY_MCAST_ROUTER,
+	BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS,
+	BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS,
 	__BRIDGE_VLANDB_ENTRY_MAX,
 };
 #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)

include/uapi/linux/if_link.h

Lines changed: 2 additions & 0 deletions
@@ -567,6 +567,8 @@ enum {
 	IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
 	IFLA_BRPORT_LOCKED,
 	IFLA_BRPORT_MAB,
+	IFLA_BRPORT_MCAST_N_GROUPS,
+	IFLA_BRPORT_MCAST_MAX_GROUPS,
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
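
Both uapi hunks append the new attributes just before the `__..._MAX` sentinel, so existing attribute numbers keep their values and the derived `..._MAX` macro follows automatically. The sketch below illustrates that idiom with a made-up enum. All `DEMO_*` and `demo_*` names are hypothetical and chosen only for this example.

```c
/* Hypothetical attribute enum using the same sentinel idiom as the uapi
 * headers. New attributes are appended just before the sentinel, so the
 * numeric values of the older ones never change. */
enum {
	DEMO_ATTR_UNSPEC,
	DEMO_ATTR_ROUTER,
	DEMO_ATTR_N_GROUPS,   /* newly appended */
	DEMO_ATTR_MAX_GROUPS, /* newly appended */
	__DEMO_ATTR_MAX,      /* sentinel: one past the last attribute */
};
#define DEMO_ATTR_MAX (__DEMO_ATTR_MAX - 1)

/* Tables sized from the sentinel grow together with the enum. */
static const char *const demo_attr_names[DEMO_ATTR_MAX + 1] = {
	[DEMO_ATTR_UNSPEC]     = "unspec",
	[DEMO_ATTR_ROUTER]     = "router",
	[DEMO_ATTR_N_GROUPS]   = "n_groups",
	[DEMO_ATTR_MAX_GROUPS] = "max_groups",
};

/* Map an attribute type to a printable name, rejecting out-of-range types. */
static const char *demo_attr_name(int type)
{
	if (type < 0 || type > DEMO_ATTR_MAX)
		return "unknown";
	return demo_attr_names[type];
}
```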

net/bridge/br_mdb.c

Lines changed: 7 additions & 10 deletions
@@ -849,11 +849,10 @@ static int br_mdb_add_group_sg(const struct br_mdb_config *cfg,
 	}

 	p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL,
-					MCAST_INCLUDE, cfg->rt_protocol);
-	if (unlikely(!p)) {
-		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (S, G) port group");
+					MCAST_INCLUDE, cfg->rt_protocol, extack);
+	if (unlikely(!p))
 		return -ENOMEM;
-	}
+
 	rcu_assign_pointer(*pp, p);
 	if (!(flags & MDB_PG_FLAGS_PERMANENT) && !cfg->src_entry)
 		mod_timer(&p->timer,
@@ -1075,11 +1074,10 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg,
 	}

 	p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL,
-					cfg->filter_mode, cfg->rt_protocol);
-	if (unlikely(!p)) {
-		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (*, G) port group");
+					cfg->filter_mode, cfg->rt_protocol,
+					extack);
+	if (unlikely(!p))
 		return -ENOMEM;
-	}

 	err = br_mdb_add_group_srcs(cfg, p, brmctx, extack);
 	if (err)
@@ -1101,8 +1099,7 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg,
 	return 0;

 err_del_port_group:
-	hlist_del_init(&p->mglist);
-	kfree(p);
+	br_multicast_del_port_group(p);
 	return err;
 }
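
In the br_mdb.c hunks above, the callers stop composing their own error messages and instead hand `extack` down to the constructor, which knows the actual failure reason. A minimal userspace sketch of that shape follows. `struct demo_extack`, `demo_set_err()` and `demo_new_group()` are hypothetical stand-ins, not the kernel's netlink extack API, and the message strings are invented for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extended-ack object: just a message slot. */
struct demo_extack {
	const char *msg;
};

/* Record a message only when the caller supplied somewhere to put it.
 * A NULL extack models a caller with no reporting channel. */
static void demo_set_err(struct demo_extack *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

/* Hypothetical constructor: returns NULL on failure and explains the reason
 * through extack, so callers no longer need a message of their own. */
static int *demo_new_group(unsigned int n, unsigned int max,
			   struct demo_extack *extack)
{
	int *p;

	if (max && n >= max) {
		demo_set_err(extack, "limit reached");
		return NULL;
	}

	p = calloc(1, sizeof(*p));
	if (!p)
		demo_set_err(extack, "allocation failed");
	return p;
}
```

Passing `NULL` for the ack object corresponds to the control-path call in br_multicast.c, which passes `NULL` for `extack`; the cover letter notes that this path relies on the tracepoint for visibility instead.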

net/bridge/br_multicast.c

Lines changed: 173 additions & 6 deletions
@@ -31,6 +31,7 @@
 #include <net/ip6_checksum.h>
 #include <net/addrconf.h>
 #endif
+#include <trace/events/bridge.h>

 #include "br_private.h"
 #include "br_private_mcast_eht.h"
@@ -234,6 +235,29 @@ br_multicast_pg_to_port_ctx(const struct net_bridge_port_group *pg)
 	return pmctx;
 }

+static struct net_bridge_mcast_port *
+br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid)
+{
+	struct net_bridge_mcast_port *pmctx = NULL;
+	struct net_bridge_vlan *vlan;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED))
+		return NULL;
+
+	/* Take RCU to access the vlan. */
+	rcu_read_lock();
+
+	vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid);
+	if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx))
+		pmctx = &vlan->port_mcast_ctx;
+
+	rcu_read_unlock();
+
+	return pmctx;
+}
+
 /* when snooping we need to check if the contexts should be used
  * in the following order:
  * - if pmctx is non-NULL (port), check if it should be used
@@ -668,6 +692,101 @@ void br_multicast_del_group_src(struct net_bridge_group_src *src,
 	__br_multicast_del_group_src(src);
 }

+static int
+br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx,
+				  struct netlink_ext_ack *extack,
+				  const char *what)
+{
+	u32 max = READ_ONCE(pmctx->mdb_max_entries);
+	u32 n = READ_ONCE(pmctx->mdb_n_entries);
+
+	if (max && n >= max) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "%s is already in %u groups, and mcast_max_groups=%u",
+				       what, n, max);
+		return -E2BIG;
+	}
+
+	WRITE_ONCE(pmctx->mdb_n_entries, n + 1);
+	return 0;
+}
+
+static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx)
+{
+	u32 n = READ_ONCE(pmctx->mdb_n_entries);
+
+	WARN_ON_ONCE(n == 0);
+	WRITE_ONCE(pmctx->mdb_n_entries, n - 1);
+}
+
+static int br_multicast_port_ngroups_inc(struct net_bridge_port *port,
+					 const struct br_ip *group,
+					 struct netlink_ext_ack *extack)
+{
+	struct net_bridge_mcast_port *pmctx;
+	int err;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	/* Always count on the port context. */
+	err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack,
+						"Port");
+	if (err) {
+		trace_br_mdb_full(port->dev, group);
+		return err;
+	}
+
+	/* Only count on the VLAN context if VID is given, and if snooping on
+	 * that VLAN is enabled.
+	 */
+	if (!group->vid)
+		return 0;
+
+	pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid);
+	if (!pmctx)
+		return 0;
+
+	err = br_multicast_port_ngroups_inc_one(pmctx, extack, "Port-VLAN");
+	if (err) {
+		trace_br_mdb_full(port->dev, group);
+		goto dec_one_out;
+	}
+
+	return 0;
+
+dec_one_out:
+	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
+	return err;
+}
+
+static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
+{
+	struct net_bridge_mcast_port *pmctx;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	if (vid) {
+		pmctx = br_multicast_port_vid_to_port_ctx(port, vid);
+		if (pmctx)
+			br_multicast_port_ngroups_dec_one(pmctx);
+	}
+	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
+}
+
+u32 br_multicast_ngroups_get(const struct net_bridge_mcast_port *pmctx)
+{
+	return READ_ONCE(pmctx->mdb_n_entries);
+}
+
+void br_multicast_ngroups_set_max(struct net_bridge_mcast_port *pmctx, u32 max)
+{
+	WRITE_ONCE(pmctx->mdb_max_entries, max);
+}
+
+u32 br_multicast_ngroups_get_max(const struct net_bridge_mcast_port *pmctx)
+{
+	return READ_ONCE(pmctx->mdb_max_entries);
+}
+
 static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
 {
 	struct net_bridge_port_group *pg;
@@ -702,6 +821,7 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
 	} else {
 		br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE);
 	}
+	br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid);
 	hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list);
 	queue_work(system_long_wq, &br->mcast_gc_work);

@@ -1165,6 +1285,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 		return mp;

 	if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
+		trace_br_mdb_full(br->dev, group);
 		br_mc_disabled_update(br->dev, false, NULL);
 		br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
 		return ERR_PTR(-E2BIG);
@@ -1284,14 +1405,22 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 			unsigned char flags,
 			const unsigned char *src,
 			u8 filter_mode,
-			u8 rt_protocol)
+			u8 rt_protocol,
+			struct netlink_ext_ack *extack)
 {
 	struct net_bridge_port_group *p;
+	int err;

-	p = kzalloc(sizeof(*p), GFP_ATOMIC);
-	if (unlikely(!p))
+	err = br_multicast_port_ngroups_inc(port, group, extack);
+	if (err)
 		return NULL;

+	p = kzalloc(sizeof(*p), GFP_ATOMIC);
+	if (unlikely(!p)) {
+		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
+		goto dec_out;
+	}
+
 	p->key.addr = *group;
 	p->key.port = port;
 	p->flags = flags;
@@ -1305,8 +1434,8 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 	if (!br_multicast_is_star_g(group) &&
 	    rhashtable_lookup_insert_fast(&port->br->sg_port_tbl, &p->rhnode,
 					  br_sg_port_rht_params)) {
-		kfree(p);
-		return NULL;
+		NL_SET_ERR_MSG_MOD(extack, "Couldn't insert new port group");
+		goto free_out;
 	}

 	rcu_assign_pointer(p->next, next);
@@ -1320,6 +1449,25 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 		eth_broadcast_addr(p->eth_addr);

 	return p;
+
+free_out:
+	kfree(p);
+dec_out:
+	br_multicast_port_ngroups_dec(port, group->vid);
+	return NULL;
+}
+
+void br_multicast_del_port_group(struct net_bridge_port_group *p)
+{
+	struct net_bridge_port *port = p->key.port;
+	__u16 vid = p->key.addr.vid;
+
+	hlist_del_init(&p->mglist);
+	if (!br_multicast_is_star_g(&p->key.addr))
+		rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode,
+				       br_sg_port_rht_params);
+	kfree(p);
+	br_multicast_port_ngroups_dec(port, vid);
 }

 void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
@@ -1387,7 +1535,7 @@ __br_multicast_add_group(struct net_bridge_mcast *brmctx,
 	}

 	p = br_multicast_new_port_group(pmctx->port, group, *pp, 0, src,
-					filter_mode, RTPROT_KERNEL);
+					filter_mode, RTPROT_KERNEL, NULL);
 	if (unlikely(!p)) {
 		p = ERR_PTR(-ENOMEM);
 		goto out;
@@ -1933,6 +2081,25 @@ static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx)
 		br_ip4_multicast_add_router(brmctx, pmctx);
 		br_ip6_multicast_add_router(brmctx, pmctx);
 	}
+
+	if (br_multicast_port_ctx_is_vlan(pmctx)) {
+		struct net_bridge_port_group *pg;
+		u32 n = 0;
+
+		/* The mcast_n_groups counter might be wrong. First,
+		 * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries
+		 * are flushed, thus mcast_n_groups after the toggle does not
+		 * reflect the true values. And second, permanent entries added
+		 * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected
+		 * either. Thus we have to refresh the counter.
+		 */
+
+		hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) {
+			if (pg->key.addr.vid == pmctx->vlan->vid)
+				n++;
+		}
+		WRITE_ONCE(pmctx->mdb_n_entries, n);
+	}
 }

 void br_multicast_enable_port(struct net_bridge_port *port)
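
The last hunk refreshes the per-port-VLAN counter by walking the port's group list and counting the entries whose VID matches, rather than trusting a value that may have drifted while VLAN snooping was off. The same recount can be sketched in plain C with a singly linked list. `struct pg_node` and `pg_count_for_vid()` are hypothetical simplifications for this example and carry none of the kernel's locking or RCU rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified port-group node: only the VID and a next link. */
struct pg_node {
	uint16_t vid;
	struct pg_node *next;
};

/* Recount the entries on a port that belong to one VLAN by walking the
 * port's list, instead of relying on a possibly stale running counter. */
static uint32_t pg_count_for_vid(const struct pg_node *head, uint16_t vid)
{
	uint32_t n = 0;

	for (const struct pg_node *pg = head; pg; pg = pg->next) {
		if (pg->vid == vid)
			n++;
	}
	return n;
}
```

The recount writes the counter directly instead of going through the increment helper, which the v2 changelog says would incorrectly bounce entries against the configured maximum.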
