A memory corruption bug in UDP fragmentation offload (UFO) code inside the Linux kernel can lead to local privilege escalation. In this post we will examine this vulnerability and its accompanying exploit. Although this bug affects both the IPv4 and IPv6 code paths, we analyzed only the IPv4 code on the vulnerable 4.8.0 kernel shipped with Ubuntu Xenial. The bug was fixed in commit 85f1bd9.
Andrey Konovalov recently disclosed local privilege escalation exploits for vulnerabilities he found inside the Linux network subsystem while “fuzzing” with the tool syzkaller. In an oss-sec mailing list thread, Konovalov wrote “When building a UFO packet with MSG_MORE __ip_append_data() calls ip_ufo_append_data() to append. However in between two send() calls, the append path can be switched from UFO to non-UFO one, which leads to a memory corruption.”
NIC Offloads and UFOs
Network interface card (NIC) offload allows the protocol stack to transmit packets that are larger than the Ethernet maximum transmission unit (MTU), which by default is 1,500 bytes. When NIC offload is enabled, the kernel will assemble multiple packets into a single large packet and pass it to the hardware, which handles IP fragmentation and segmentation into MTU-sized packets. This offload is often used with high-speed network interfaces for increased throughput because UFO can send large UDP packets.
The Linux kernel can take advantage of the segmentation-offload capabilities of various NICs.
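To make the size relationship concrete, here is a minimal sketch of how many MTU-sized IPv4 fragments a single large UDP datagram turns into. It assumes a 20-byte IPv4 header without options and an 8-byte UDP header; the function name and parameters are hypothetical and are not taken from the kernel.

```python
import math

def ipv4_fragment_count(udp_payload_len, mtu=1500, ip_header_len=20, udp_header_len=8):
    """Estimate how many IPv4 fragments one UDP datagram needs (simplified)."""
    # The UDP header travels inside the IP payload of the first fragment.
    ip_payload = udp_header_len + udp_payload_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header_len) & ~7
    return math.ceil(ip_payload / per_fragment)
```

With the default 1,500-byte MTU, a 3,000-byte UDP payload needs three fragments, which is the work that the offload hands to the hardware instead of the protocol stack.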
Triggering a POC
The following is a simple proof of concept that will trigger a kernel panic:
To build UFO packets inside the kernel we can use one of two methods:
- Use the UDP_CORK socket option, which tells the kernel to accumulate all data on this socket into a single datagram to be transmitted when the option is disabled.
- Use the MSG_MORE flag when calling send/sendto/sendmsg, which tells the kernel to accumulate all data on this socket into a single datagram to be transmitted when a call is performed that does not specify this flag. This method triggers the vulnerability.
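The corking behavior of MSG_MORE can be observed harmlessly from user space. The sketch below is not the proof of concept; it only sends a few small chunks over loopback and reports the datagrams the receiver saw. The helper name send_corked is hypothetical, and the fallback value 0x8000 for MSG_MORE is the Linux constant, used here as an assumption in case the Python build does not expose it.

```python
import socket

# MSG_MORE is Linux-specific; 0x8000 is assumed as its Linux value
# when the socket module does not define the constant.
MSG_MORE = getattr(socket, "MSG_MORE", 0x8000)

def send_corked(chunks, timeout=0.5):
    """Send chunks over loopback UDP, flagging all but the last with MSG_MORE.

    Returns the list of datagrams the receiver actually saw.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))        # let the kernel pick a free port
        rx.settimeout(timeout)
        tx.connect(rx.getsockname())
        for chunk in chunks[:-1]:
            tx.send(chunk, MSG_MORE)     # data is queued on the socket
        tx.send(chunks[-1])              # no flag: pending data is pushed out
        datagrams = []
        try:
            while True:
                datagrams.append(rx.recv(65535))
        except socket.timeout:
            pass                         # nothing more arrived in time
        return datagrams
    finally:
        rx.close()
        tx.close()
```

On a Linux kernel that honors MSG_MORE for UDP, send_corked([b"AAAA", b"BBBB"]) should return one datagram holding both chunks rather than two separate datagrams.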
Inside the kernel, the udp_sendmsg function is responsible for constructing UDP packets and sending them to the next layer. The following code shows a stripped-down implementation of the UDP cork functionality that a user program enables with the UDP_CORK socket option or the MSG_MORE flag when calling send/sendto/sendmsg. When UDP corking is enabled, the ip_append_data function is called to accumulate multiple packets into a single large packet.
The function ip_append_data is a wrapper around __ip_append_data, which is responsible for socket buffer management by allocating a new socket buffer to store the data passed to it or by appending the data to existing data when the socket is corked. One important task performed by this function is the handling of UFO. Socket buffers are managed in the socket’s send queue. In the case of corked sockets, the queue has an entry in which additional data can be appended. The data sits on the send queue until udp_sendmsg determines it is time to call udp_push_pending_frames, which finalizes the socket buffer and calls udp_send_skb.
The Linux kernel stores packets in the structure sk_buff (socket buffer), which is used by all network layers to store their headers, information about user data (payload), and other internal information.
The socket buffer inside the kernel.
In the preceding diagram, the head, data, tail, and end members of sk_buff point to the boundaries of the memory region in which protocol headers and the user payload are stored. The head and end members point to the beginning and end of the space allocated to the buffer. The data and tail members point to the beginning and end of user data within that space. Immediately following the end boundary, the structure skb_shared_info holds important information for IP fragmentation.
When the first call to “send” is made with the MSG_MORE flag, as shown in the earlier PoC, __ip_append_data creates a new socket buffer by calling ip_ufo_append_data, as we see in the following code:
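The four boundaries are easier to reason about as byte offsets. The following toy model is only a sketch of the layout described above: the class SkbModel and its methods are hypothetical, they track offsets rather than real memory, and the bounds check in put is an illustration rather than the kernel's own helper.

```python
from dataclasses import dataclass

@dataclass
class SkbModel:
    """Toy model of an sk_buff data area, using byte offsets from head."""
    end: int          # size of the allocated data area
    head: int = 0     # start of the allocated data area
    data: int = 0     # start of the bytes currently in use
    tail: int = 0     # end of the bytes currently in use

    def reserve(self, n):
        """Leave n bytes of headroom for headers that are added later."""
        self.data += n
        self.tail += n

    def put(self, n):
        """Claim n more bytes at the tail, refusing to cross `end`."""
        if self.tail + n > self.end:
            raise OverflowError("write would run past end into skb_shared_info")
        start = self.tail
        self.tail += n
        return start

    @property
    def headroom(self):
        return self.data - self.head

    @property
    def tailroom(self):
        return self.end - self.tail

    @property
    def shared_info_offset(self):
        # skb_shared_info sits immediately after the data area.
        return self.end
```

Any write that starts inside the data area and runs longer than the remaining tailroom lands in skb_shared_info, which is why the position of that structure matters for this bug.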
When this call is finished, and the new socket buffer is created, user data is copied to the fragment and the shared info structure is updated with fragment information, as shown in the next image. The newly created sk_buff is then placed in the queue.
In the next step, the PoC tells the socket not to calculate a checksum on UDP packets by setting the SO_NO_CHECK option; this sets the sk->sk_no_check_tx member of the socket structure. Inside __ip_append_data this variable is checked as one of the conditions prior to calling ip_ufo_append_data.
During the PoC’s second call to “send,” __ip_append_data takes the non-UFO path and proceeds to the fragment length calculation loop. During the first iteration of the loop, the value of copy becomes negative, which triggers allocation of a new socket buffer. In addition, the fraggap calculation yields a value that exceeds the MTU, which triggers fragmentation. As a result, the user payload is copied from the sk_buff created by the first send call to the newly allocated sk_buff using the skb_copy_and_csum_bits function, which copies a specified number of bytes from the source buffer to the destination sk_buff and computes a checksum. Because skb_copy_and_csum_bits is called with a length that extends past the end boundary of the newly created sk_buff, it writes beyond the socket buffer and corrupts the skb_shared_info structure that immediately follows it.
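The path switch can be sketched as a small predicate. This is a simplified paraphrase of the pre-fix check as we understand it, not the kernel's exact code: the function and parameter names are hypothetical, and the real condition also tests the protocol, the socket type, and the route.

```python
def takes_ufo_path(length, mtu, pending_is_gso, dev_has_ufo, no_check_tx):
    """Simplified model of the decision to call ip_ufo_append_data."""
    # UFO is considered for oversized data or when a UFO buffer is already queued.
    wants_ufo = length > mtu or pending_is_gso
    # The device must advertise UFO and checksumming must not be disabled.
    return wants_ufo and dev_has_ufo and not no_check_tx

# First send: large payload, checksums enabled -> UFO path.
first = takes_ufo_path(3000, 1500, pending_is_gso=False,
                       dev_has_ufo=True, no_check_tx=False)
# Second send on the same corked socket after SO_NO_CHECK -> non-UFO path.
second = takes_ufo_path(1, 1500, pending_is_gso=True,
                        dev_has_ufo=True, no_check_tx=True)
```

The two calls model the two sends: the queued buffer was built by the UFO path, yet the second append no longer qualifies for it.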
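The length arithmetic of that first loop iteration can be reproduced with a few lines. The sketch below follows the variable names used above (copy, fraggap) and adds maxfraglen, datalen, and fraglen as assumptions based on our reading of __ip_append_data. It ignores link-layer headroom and allocator rounding, and the sample numbers in the usage note are hypothetical.

```python
def non_ufo_first_iteration(prev_len, length, mtu=1500, fragheaderlen=20):
    """Model the length math of the first non-UFO loop pass (simplified)."""
    # Largest fragment whose payload is a multiple of 8 bytes.
    maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen

    # Room left in the buffer that is already queued.
    copy = mtu - prev_len
    if copy < length:
        copy = maxfraglen - prev_len

    result = {"maxfraglen": maxfraglen, "copy": copy, "fraggap": 0, "fraglen": 0}
    if copy <= 0:
        # A new buffer is needed; bytes beyond maxfraglen in the old
        # buffer are carried over into it ("fraggap").
        fraggap = prev_len - maxfraglen
        datalen = length + fraggap
        if datalen > mtu - fragheaderlen:
            datalen = maxfraglen - fragheaderlen
        result["fraggap"] = fraggap
        result["fraglen"] = datalen + fragheaderlen
    return result
```

For a queued buffer of 3,028 bytes and a second send of 1 byte, copy is -1528, fraggap is 1528, and fraglen is 1500. The new buffer is sized around fraglen, yet fraggap bytes are copied in after the fragment header, so the copy is longer than the space reserved for it.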
The corrupted skb_shared_info structure follows. The memory at address 0xffff88003a4ca900 is the newly created sk_buff with end=1728, where the fragmentation is triggered.
This bug can be exploited by an unprivileged user when unprivileged user namespaces are allowed, as they are on most default Ubuntu desktop systems. The attacker must be able to do two things:
- Set up an interface with UFO enabled (possible from the user namespace) or use that interface if it is already present. (The “lo” interface enables UFO by default.)
- Disable the NETIF_F_UFO interface feature or set the SO_NO_CHECK socket option.
Code execution can be diverted to user-mode shellcode by crafting a fake skb_shared_info structure at the end of a large buffer and setting the callback member to the shellcode address. The second “send” triggers an out-of-bounds write on the socket buffer, overwriting skb_shared_info->destructor_arg with the user-mode shellcode address, which is invoked before sk_buff is released from kernel memory.
The Linux kernel offers a big attack surface when exposed to unprivileged users. All users should keep their systems patched with the latest updates.
Stay up to date on this vulnerability and more by following @McAfee_Labs.