[PATCH 00/18] xfrm: Add compat layer

Discussion:

Dmitry Safonov

2018-07-26 02:31:26 UTC

Due to some historical mistake, xfrm User ABI differ between native and
compatible applications. The difference is in structures paddings and in
the result in the size of netlink messages.
As it's already visible ABI, it cannot be adjusted by packing structures.

Possibility for compatible application to manage xfrm tunnels was
disabled by: the commmit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
userspace socket policies on 64 bit systems") and the commit 74005991b78a
("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").

By some wonderful reasons and brilliant architecture decisions for
creating userspace, on Arista switches we still use 32-bit userspace
with 64-bit kernel. There is slow movement to full 64-bit build, but
it's not yet here. As the switches need support for ipsec tunnels, the
local kernel has reverted mentioned patches that disable xfrm for
compat apps. On the top of that there is a bunch of disgraceful hacks
in userspace to work around the size check for netlink messages
and all that jazz.

It looks like, we're not the only desirable users of compatible xfrm,
there were a couple of attempts to make it work:
https://lkml.org/lkml/2017/1/20/733
https://patchwork.ozlabs.org/patch/44600/
http://netdev.vger.kernel.narkive.com/2Gesykj6/patch-net-next-xfrm-correctly-parse-netlink-msg-from-32bits-ip-command-on-64bits-host

All the discussions end in the conclusion that xfrm should have a full
compatible layer to correctly work with 32-bit applications on 64-bit
kernels:
https://lkml.org/lkml/2017/1/23/413
https://patchwork.ozlabs.org/patch/433279/

In some recent lkml discussion, Linus said that it's worth to fix this
problem and not giving people an excuse to stay on 32-bit kernel:
https://lkml.org/lkml/2018/2/13/752

So, here I add a compatible layer to xfrm.
As xfrm uses netlink notifications, kernel should send them in ABI
format that an application will parse. The proposed solution is
to save the ABI of bind() syscall. The realization detail is
to create kernel-hidden, non visible to userspace netlink groups
for compat applications.

The first two patches simplify ifdeffery, and while I've already submitted
them a while ago, I'm resending them for completeness:
https://lore.kernel.org/lkml/20180717005004.25984-1-***@arista.com/T/#u

There is also an exhaustive selftest for ipsec tunnels and to check
that kernel parses correctly the structures those differ in size.
It doesn't depend on any library and compat version can be easy
build with: make CFLAGS=-m32 net/ipsec

Cc: "David S. Miller" <***@davemloft.net>
Cc: Herbert Xu <***@gondor.apana.org.au>
Cc: Steffen Klassert <***@secunet.com>
Cc: Dmitry Safonov <***@gmail.com>
Cc: ***@vger.kernel.org

Dmitry Safonov (18):
x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
compat: Cleanup in_compat_syscall() callers
selftest/net/xfrm: Add test for ipsec tunnel
net/xfrm: Add _packed types for compat users
net/xfrm: Parse userspi_info{,_packed} depending on syscall
netlink: Do not subscribe to non-existent groups
netlink: Pass groups pointer to .bind()
xfrm: Add in-kernel groups for compat notifications
xfrm: Dump usersa_info in compat/native formats
xfrm: Send state notifications in compat format too
xfrm: Add compat support for xfrm_user_expire messages
xfrm: Add compat support for xfrm_userpolicy_info messages
xfrm: Add compat support for xfrm_user_acquire messages
xfrm: Add compat support for xfrm_user_polexpire messages
xfrm: Check compat acquire listeners in xfrm_is_alive()
xfrm: Notify compat listeners about policy flush
xfrm: Notify compat listeners about state flush
xfrm: Enable compat syscalls

MAINTAINERS | 1 +
arch/x86/include/asm/compat.h | 9 +-
arch/x86/include/asm/ftrace.h | 4 +-
arch/x86/kernel/process_64.c | 4 +-
arch/x86/kernel/sys_x86_64.c | 11 +-
arch/x86/mm/hugetlbpage.c | 4 +-
arch/x86/mm/mmap.c | 2 +-
drivers/firmware/efi/efivars.c | 16 +-
include/linux/compat.h | 4 +-
include/linux/netlink.h | 2 +-
include/net/xfrm.h | 14 -
kernel/audit.c | 2 +-
kernel/time/time.c | 2 +-
net/core/rtnetlink.c | 14 +-
net/core/sock_diag.c | 25 +-
net/netfilter/nfnetlink.c | 24 +-
net/netlink/af_netlink.c | 28 +-
net/netlink/af_netlink.h | 4 +-
net/netlink/genetlink.c | 26 +-
net/xfrm/xfrm_state.c | 5 -
net/xfrm/xfrm_user.c | 690 ++++++++---
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ipsec.c | 1987 ++++++++++++++++++++++++++++++++
24 files changed, 2612 insertions(+), 268 deletions(-)
create mode 100644 tools/testing/selftests/net/ipsec.c

--
2.13.6

Florian Westphal

2018-07-26 08:49:59 UTC

Permalink

Post by Dmitry Safonov
So, here I add a compatible layer to xfrm.
As xfrm uses netlink notifications, kernel should send them in ABI
format that an application will parse. The proposed solution is
to save the ABI of bind() syscall. The realization detail is
to create kernel-hidden, non visible to userspace netlink groups
for compat applications.

Why not use exisiting netlink support?
Just add the 32bit skb to skb64->frag_list and let
netlink find if tasks needs 64 or 32 one.

It only needs this small fix to properly signal the end of a dump:
https://marc.info/?l=linux-netdev&m=126625240303351&w=2

I had started a second attempt to make xfrm compat work,
but its still in early stage.

One link that might still have some value:
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
(compat structure definitions with BUILD_BUG_ON checking)

My plan was to make xfrm compat work strictly as shrinker (64->32)
and expander (32->64), i.e. no/little changes to exisiting code and
pass all "expanded" skbs through existing xfrm rcv functions.

Example to illustrate idea:
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=c622f067849b02170127b69471cb3481e4bc9e49

... its supposed to take 64bit skb and create a 32bit one from it.

Just for reference; I currently don't plan to work on this again.

Steffen Klassert

2018-07-27 07:37:47 UTC

Permalink

Post by Florian Westphal

Why not use exisiting netlink support?
Just add the 32bit skb to skb64->frag_list and let
netlink find if tasks needs 64 or 32 one.
https://marc.info/?l=linux-netdev&m=126625240303351&w=2
I had started a second attempt to make xfrm compat work,
but its still in early stage.
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
(compat structure definitions with BUILD_BUG_ON checking)
My plan was to make xfrm compat work strictly as shrinker (64->32)
and expander (32->64), i.e. no/little changes to exisiting code and
pass all "expanded" skbs through existing xfrm rcv functions.

I agree here with Florian. The code behind this ABI
is already complicated. Please stay away from generic
code a much as possible. Generic and compat code should
be clearly separated.

Dmitry Safonov

2018-07-27 14:02:53 UTC

Permalink

Post by Steffen Klassert

Post by Florian Westphal

Why not use exisiting netlink support?
Just add the 32bit skb to skb64->frag_list and let
netlink find if tasks needs 64 or 32 one.
https://marc.info/?l=linux-netdev&m=126625240303351&w=2
I had started a second attempt to make xfrm compat work,
but its still in early stage.
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_confi
g_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
(compat structure definitions with BUILD_BUG_ON checking)
My plan was to make xfrm compat work strictly as shrinker (64->32)
and expander (32->64), i.e. no/little changes to exisiting code and
pass all "expanded" skbs through existing xfrm rcv functions.

I agree here with Florian. The code behind this ABI
is already complicated. Please stay away from generic
code a much as possible. Generic and compat code should
be clearly separated.

Yeah, I tend to agree that it would be better to separate it.
But:
1. It will double copy netlink messages, making it O(n) instead of
O(1), where n - is number of bind()s.. Probably we don't care much.
2. The patches not-yet-done on the link have +500 added lines - as much
as my working patches set, so probably it'll add more code.

Probably, we don't care that much about amount of code added and
additional copies than about separating compat layer from the main
code. Will look into that.

--
Thanks,
Dmitry

Florian Westphal

2018-07-27 14:19:36 UTC

Permalink

Post by Dmitry Safonov
1. It will double copy netlink messages, making it O(n) instead of
O(1), where n - is number of bind()s.. Probably we don't care much.

About those bind() patches, I don't understand why they are needed.

Why can't you just add the compat skb to the native skb when doing
the multicast call?

skb_shinfo(skb)->frag_list = compat_skb;
xfrm_nlmsg_multicast(net, skb, 0, ...

Dmitry Safonov

2018-07-27 14:51:51 UTC

Permalink

Post by Florian Westphal

Post by Dmitry Safonov
1. It will double copy netlink messages, making it O(n) instead of
O(1), where n - is number of bind()s.. Probably we don't care much.

About those bind() patches, I don't understand why they are needed.
Why can't you just add the compat skb to the native skb when doing
the multicast call?
skb_shinfo(skb)->frag_list = compat_skb;
xfrm_nlmsg_multicast(net, skb, 0, ...

Oh yeah, sorry, I think I misread the patch - will try to add compat
skb in the multicast call.

--
Thanks,
Dmitry

Andy Lutomirski

2018-07-27 17:09:50 UTC

Permalink

We (Android) are very interested in removing the restriction for 32-bit userspace processes accessing xfrm netlink on 64-bit kernels. IPsec support is required to pass Android conformance tests, and any manufacturer wishing to ship 32-bit userspace with a recent kernel needs out-of-tree changes (removing the compat_task check) to do so.
https://android.googlesource.com/platform/system/netd/+/refs/heads/master/server/XfrmController.h#257
We could also employ a (relatively simple) solution such as the one above in the uapi XFRM header itself, though it would require a caller to declare the target kernel ABI at compile time. Maybe thatâs not unthinkable for an uncommon case?

Could there just be an XFRM2 that is entirely identical to XFRM for 64-bit userspace but makes the 32-bit structures match? If there are a grand total of two or so userspace implementations, that should cover most use cases. L

-Nathan

Post by Dmitry Safonov

Post by Florian Westphal

Post by Dmitry Safonov
1. It will double copy netlink messages, making it O(n) instead of
O(1), where n - is number of bind()s.. Probably we don't care much.

About those bind() patches, I don't understand why they are needed.
Why can't you just add the compat skb to the native skb when doing
the multicast call?
skb_shinfo(skb)->frag_list = compat_skb;
xfrm_nlmsg_multicast(net, skb, 0, ...

Oh yeah, sorry, I think I misread the patch - will try to add compat
skb in the multicast call.
--
Thanks,
Dmitry

Dmitry Safonov

2018-07-28 16:26:55 UTC

Permalink

Glad to hear - that justify my attempts more :)

That said, it’s not difficult to work around alignment issues
directly in userspace, so maybe we could just remove the check and
make this the caller's responsibility? Here’s an example of the
https://android.googlesource.com/platform/system/netd/+/refs/heads/ma
ster/server/XfrmController.h#257

We've kinda same workarounds in our userspace..
But I don't think reverting the check makes much sense - it'll make
broken compat ABI in stone.
If you're fine with disgraceful hacks and just want to get rid of
additional non-mainstream patch - you can make 64-bit syscalls from 32-
bit task (hint: examples in x86 selftests).

We could also employ a (relatively simple) solution such as the one
above in the uapi XFRM header itself, though it would require a
caller to declare the target kernel ABI at compile time. Maybe that’s
not unthinkable for an uncommon case?

Well, I think, I'll rework my patches set according to critics and
separate compat xfrm layer. I've already a selftest to check that 32/64
bit xfrm works - so the most time-taking part is done.
So, if you'll wait a week or two - you may help me to justify acception
of mainstreaming those patches.

Post by Florian Westphal

Post by Dmitry Safonov
1. It will double copy netlink messages, making it O(n) instead

Post by Florian Westphal

Post by Dmitry Safonov
O(1), where n - is number of bind()s.. Probably we don't care

much.

Post by Florian Westphal
About those bind() patches, I don't understand why they are

needed.

Post by Florian Westphal
Why can't you just add the compat skb to the native skb when

doing

Post by Florian Westphal
the multicast call?
skb_shinfo(skb)->frag_list = compat_skb;
xfrm_nlmsg_multicast(net, skb, 0, ...

Oh yeah, sorry, I think I misread the patch - will try to add compat
skb in the multicast call.

--
Thanks,
Dmitry

David Miller

2018-07-28 21:18:12 UTC

Permalink

From: Dmitry Safonov <***@arista.com>
Date: Sat, 28 Jul 2018 17:26:55 +0100

Post by Dmitry Safonov
Well, I think, I'll rework my patches set according to critics and
separate compat xfrm layer. I've already a selftest to check that 32/64
bit xfrm works - so the most time-taking part is done.

The way you've done the compat structures using __packed is only going
to work on x86, just FYI.

The "32-bit alignment for 64-bit objects" thing x86 has is very much
not universal amongst ABIs having 32-bit and 64-bit variants.

Dmitry Safonov

2018-07-30 17:39:20 UTC

Permalink

Post by David Miller
Date: Sat, 28 Jul 2018 17:26:55 +0100

Post by Dmitry Safonov
Well, I think, I'll rework my patches set according to critics and
separate compat xfrm layer. I've already a selftest to check that

32/64

Post by Dmitry Safonov
bit xfrm works - so the most time-taking part is done.

The way you've done the compat structures using __packed is only going
to work on x86, just FYI.

Thanks for pointing, so I'll probably cover it under something like
HAS_COMPAT_XFRM.
(if there isn't any better idea).

Post by David Miller
The "32-bit alignment for 64-bit objects" thing x86 has is very much
not universal amongst ABIs having 32-bit and 64-bit variants.

--
Thanks,
Dmitry

Florian Westphal

2018-07-30 19:43:14 UTC

Permalink

Post by Dmitry Safonov

Post by David Miller
Date: Sat, 28 Jul 2018 17:26:55 +0100

Post by Dmitry Safonov
Well, I think, I'll rework my patches set according to critics and
separate compat xfrm layer. I've already a selftest to check that

32/64

Post by Dmitry Safonov
bit xfrm works - so the most time-taking part is done.

The way you've done the compat structures using __packed is only going
to work on x86, just FYI.

Thanks for pointing, so I'll probably cover it under something like
HAS_COMPAT_XFRM.
(if there isn't any better idea).

You can do that, I suspect you can use
CONFIG_COMPAT_FOR_U64_ALIGNMENT
as AFAICR the only reason for the compat problem is different alignment
requirements of 64bit integer types in the structs, not e.g. due to
"long" size differences.

Instead of __packed, you can use the "compat" data types, e.g.
compat_u64 instead of u64:

struct compat_xfrm_lifetime_cur {
compat_u64 bytes, packets, add_time, use_time;
}; /* same size on i386, but only 4 byte alignment required even on x86_64*/

You might be able to reuse
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869

in your patch set.

I can try to submit the first few patches (which are not related to
compat, they just add const qualifiers) for inclusion later this week.

Dmitry Safonov

2018-07-26 02:31:33 UTC

Permalink

Netlink messages sent by xfrm differ in size between 64-bit native and
32-bit compatible applications. To know which UABI to use to send the
message from kernel, I'll use the type of bind() syscall.
Xfrm will have hidden from userspace kernel-only groups for compatible
applications.
So, add pointer to groups to netlink_bind().
With later patches xfrm will set a proper compat group for netlink
socket during bind().

Cc: "David S. Miller" <***@davemloft.net>
Cc: Eric Paris <***@redhat.com>
Cc: Florian Westphal <***@strlen.de>
Cc: Herbert Xu <***@gondor.apana.org.au>
Cc: Jozsef Kadlecsik <***@blackhole.kfki.hu>
Cc: Pablo Neira Ayuso <***@netfilter.org>
Cc: Paul Moore <***@paul-moore.com>
Cc: Steffen Klassert <***@secunet.com>
Cc: ***@netfilter.org
Cc: linux-***@redhat.com
Cc: ***@vger.kernel.org
Cc: netfilter-***@vger.kernel.org
Signed-off-by: Dmitry Safonov <***@arista.com>
---
include/linux/netlink.h | 2 +-
kernel/audit.c | 2 +-
net/core/rtnetlink.c | 14 ++++++--------
net/core/sock_diag.c | 25 ++++++++++++-------------
net/netfilter/nfnetlink.c | 24 ++++++++++++++----------
net/netlink/af_netlink.c | 27 ++++++++++-----------------
net/netlink/af_netlink.h | 4 ++--
net/netlink/genetlink.c | 26 ++++++++++++++++++--------
8 files changed, 64 insertions(+), 60 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index f3075d6c7e82..19202648e04a 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -46,7 +46,7 @@ struct netlink_kernel_cfg {
unsigned int flags;
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
- int (*bind)(struct net *net, int group);
+ int (*bind)(struct net *net, unsigned long *groups);
void (*unbind)(struct net *net, int group);
bool (*compare)(struct net *net, struct sock *sk);
};
diff --git a/kernel/audit.c b/kernel/audit.c
index e7478cb58079..87ca0214bcf2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1523,7 +1523,7 @@ static void audit_receive(struct sk_buff *skb)
}

/* Run custom bind function on netlink socket group connect or bind requests. */
-static int audit_bind(struct net *net, int group)
+static int audit_bind(struct net *net, unsigned long *groups)
{
if (!capable(CAP_AUDIT_READ))
return -EPERM;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e3f743c141b3..0465e692ae32 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4683,15 +4683,13 @@ static void rtnetlink_rcv(struct sk_buff *skb)
netlink_rcv_skb(skb, &rtnetlink_rcv_msg);
}

-static int rtnetlink_bind(struct net *net, int group)
+static int rtnetlink_bind(struct net *net, unsigned long *groups)
{
- switch (group) {
- case RTNLGRP_IPV4_MROUTE_R:
- case RTNLGRP_IPV6_MROUTE_R:
- if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
- return -EPERM;
- break;
- }
+ unsigned long mroute_r;
+
+ mroute_r = 1UL << RTNLGRP_IPV4_MROUTE_R | 1UL << RTNLGRP_IPV6_MROUTE_R;
+ if ((*groups & mroute_r) && !ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return -EPERM;
return 0;
}

diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index c37b5be7c5e4..befa6759f2ad 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -273,20 +273,19 @@ static void sock_diag_rcv(struct sk_buff *skb)
mutex_unlock(&sock_diag_mutex);
}

-static int sock_diag_bind(struct net *net, int group)
+static int sock_diag_bind(struct net *net, unsigned long *groups)
{
- switch (group) {
- case SKNLGRP_INET_TCP_DESTROY:
- case SKNLGRP_INET_UDP_DESTROY:
- if (!sock_diag_handlers[AF_INET])
- sock_load_diag_module(AF_INET, 0);
- break;
- case SKNLGRP_INET6_TCP_DESTROY:
- case SKNLGRP_INET6_UDP_DESTROY:
- if (!sock_diag_handlers[AF_INET6])
- sock_load_diag_module(AF_INET6, 0);
- break;
- }
+ unsigned long inet_mask, inet6_mask;
+
+ inet_mask = 1UL << SKNLGRP_INET_TCP_DESTROY;
+ inet_mask |= 1UL << SKNLGRP_INET_UDP_DESTROY;
+ inet6_mask = 1UL << SKNLGRP_INET6_TCP_DESTROY;
+ inet6_mask |= 1UL << SKNLGRP_INET6_UDP_DESTROY;
+
+ if ((*groups & inet_mask) && !sock_diag_handlers[AF_INET])
+ sock_load_diag_module(AF_INET, 0);
+ if ((*groups & inet6_mask) && !sock_diag_handlers[AF_INET6])
+ sock_load_diag_module(AF_INET6, 0);
return 0;
}

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index e1b6be29848d..6a8893df5285 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -556,21 +556,25 @@ static void nfnetlink_rcv(struct sk_buff *skb)
}

#ifdef CONFIG_MODULES
-static int nfnetlink_bind(struct net *net, int group)
+static int nfnetlink_bind(struct net *net, unsigned long *groups)
{
const struct nfnetlink_subsystem *ss;
- int type;
+ unsigned long _groups = *groups;
+ int type, group_bit, group = -1;

- if (group <= NFNLGRP_NONE || group > NFNLGRP_MAX)
- return 0;
+ while ((group_bit = __builtin_ffsl(_groups))) {
+ group += group_bit;

- type = nfnl_group2type[group];
+ type = nfnl_group2type[group];
+ rcu_read_lock();
+ ss = nfnetlink_get_subsys(type << 8);
+ rcu_read_unlock();
+ if (!ss)
+ request_module("nfnetlink-subsys-%d", type);
+
+ _groups >>= group_bit;
+ }

- rcu_read_lock();
- ss = nfnetlink_get_subsys(type << 8);
- rcu_read_unlock();
- if (!ss)
- request_module("nfnetlink-subsys-%d", type);
return 0;
}
#endif
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ac805caed2e2..1e11e706c683 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -668,7 +668,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
struct module *module = NULL;
struct mutex *cb_mutex;
struct netlink_sock *nlk;
- int (*bind)(struct net *net, int group);
+ int (*bind)(struct net *net, unsigned long *groups);
void (*unbind)(struct net *net, int group);
int err = 0;

@@ -969,8 +969,7 @@ static int netlink_realloc_groups(struct sock *sk)
return err;
}

-static void netlink_undo_bind(int group, long unsigned int groups,
- struct sock *sk)
+static void netlink_undo_bind(unsigned long groups, struct sock *sk)
{
struct netlink_sock *nlk = nlk_sk(sk);
int undo;
@@ -978,7 +977,7 @@ static void netlink_undo_bind(int group, long unsigned int groups,
if (!nlk->netlink_unbind)
return;

- for (undo = 0; undo < group; undo++)
+ for (undo = 0; undo < nlk->ngroups; undo++)
if (test_bit(undo, &groups))
nlk->netlink_unbind(sock_net(sk), undo + 1);
}
@@ -991,7 +990,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
struct netlink_sock *nlk = nlk_sk(sk);
struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;
int err = 0;
- long unsigned int groups = nladdr->nl_groups;
+ unsigned long groups = nladdr->nl_groups;
bool bound;

if (addr_len < sizeof(struct sockaddr_nl))
@@ -1021,17 +1020,9 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,

netlink_lock_table();
if (nlk->netlink_bind && groups) {
- int group;
-
- for (group = 0; group < nlk->ngroups; group++) {
- if (!test_bit(group, &groups))
- continue;
- err = nlk->netlink_bind(net, group + 1);
- if (!err)
- continue;
- netlink_undo_bind(group, groups, sk);
+ err = nlk->netlink_bind(net, &groups);
+ if (err)
goto unlock;
- }
}

/* No need for barriers here as we return to user-space without
@@ -1042,7 +1033,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
netlink_insert(sk, nladdr->nl_pid) :
netlink_autobind(sock);
if (err) {
- netlink_undo_bind(nlk->ngroups, groups, sk);
+ netlink_undo_bind(groups, sk);
goto unlock;
}
}
@@ -1652,7 +1643,9 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
if (!val || val - 1 >= nlk->ngroups)
return -EINVAL;
if (optname == NETLINK_ADD_MEMBERSHIP && nlk->netlink_bind) {
- err = nlk->netlink_bind(sock_net(sk), val);
+ unsigned long groups = 1UL << val;
+
+ err = nlk->netlink_bind(sock_net(sk), &groups);
if (err)
return err;
}
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 962de7b3c023..e765172abbb7 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -39,7 +39,7 @@ struct netlink_sock {
struct mutex *cb_mutex;
struct mutex cb_def_mutex;
void (*netlink_rcv)(struct sk_buff *skb);
- int (*netlink_bind)(struct net *net, int group);
+ int (*netlink_bind)(struct net *net, unsigned long *groups);
void (*netlink_unbind)(struct net *net, int group);
struct module *module;

@@ -61,7 +61,7 @@ struct netlink_table {
unsigned int groups;
struct mutex *cb_mutex;
struct module *module;
- int (*bind)(struct net *net, int group);
+ int (*bind)(struct net *net, unsigned long *groups);
void (*unbind)(struct net *net, int group);
bool (*compare)(struct net *net, struct sock *sock);
int registered;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 25eeb6d2a75a..a86b105730cf 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -960,28 +960,38 @@ static struct genl_family genl_ctrl __ro_after_init = {
.netnsok = true,
};

-static int genl_bind(struct net *net, int group)
+static int genl_bind(struct net *net, unsigned long *groups)
{
+ unsigned long mcgrps;
struct genl_family *f;
- int err = -ENOENT;
+ int err = 0;
unsigned int id;

down_read(&cb_lock);

idr_for_each_entry(&genl_fam_idr, f, id) {
- if (group >= f->mcgrp_offset &&
- group < f->mcgrp_offset + f->n_mcgrps) {
- int fam_grp = group - f->mcgrp_offset;
+ int fam_grp_bit, fam_grp = -1;
+
+ mcgrps = (1UL << f->n_mcgrps) - 1;
+ mcgrps <<= f->mcgrp_offset;
+ mcgrps &= *groups;
+
+ if (!mcgrps)
+ continue;
+
+ while ((fam_grp_bit = __builtin_ffsl(mcgrps))) {
+ fam_grp += fam_grp_bit;

if (!f->netnsok && net != &init_net)
err = -ENOENT;
else if (f->mcast_bind)
err = f->mcast_bind(net, fam_grp);
- else
- err = 0;
- break;
+
+ if (err)
+ goto out;
}
}
+out:
up_read(&cb_lock);

return err;

--
2.13.6