Linux kernel CVE-2016-5195 "Dirty COW" mitigated by Sandstorm

By Kenton Varda - 25 Oct 2016

Last week, a Linux kernel bug, CVE-2016-5195, was described as “the most serious Linux local privilege escalation ever”. The bug – which potentially allowed any code running on a Linux machine to escalate its privileges to root – was already being actively exploited in the wild before it was fixed, and had existed in the kernel for many years.

Since Sandstorm allows any user of a server to upload their own apps, you might wonder if this bug could allow a Sandstorm user to compromise the server.

We’re happy to report that the answer appears to be “no”. As is often the case with Linux kernel bugs, our sandbox blocked the exploit.

Of course, we still recommend updating your kernel in case the bug can be exploited in ways that have not been discovered yet.

Technical Details

The bug in question was a race condition in the handling of memory pages mapped copy-on-write. A process can ask that a read-only file be mapped into its memory space in such a way that it is allowed to modify the mapped memory. When the process writes to the memory, the kernel makes a private copy of the affected page, so that the process only modifies its copy, not the original. The process can also request, later on, that the modifications it made be discarded, returning the page to its original state. In certain circumstances, by both writing to a page and requesting this discard at the same time (in separate threads), the process could end up writing to the original page, which is shared with other processes on the system, instead of its own private copy. Hence, the process could modify any file on the system that it was able to read. By modifying, say, the sudo utility, it could give itself a backdoor which allows it to gain root privileges trivially.
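To make the normal (non-buggy) behavior concrete, here is a minimal Python sketch of a private mapping. It does not attempt to trigger the race. It assumes Linux and Python 3.8 or later, and it assumes the "discard" request described above corresponds to madvise() with MADV_DONTNEED. The function name private_mapping_demo is ours, purely for illustration.

```python
import mmap
import os
import tempfile


def private_mapping_demo(original: bytes = b"original contents"):
    """Return (view_after_write, file_after_write, view_after_discard)."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(original)
        f.flush()
        # Open read-only: this descriptor cannot be used to change the file.
        fd = os.open(f.name, os.O_RDONLY)
        try:
            # ACCESS_COPY is Python's name for a private, copy-on-write mapping.
            m = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)
            try:
                # The write lands in a private copy of the page.
                m[0:8] = b"MODIFIED"
                view_after_write = m[:]
                with open(f.name, "rb") as check:
                    file_after_write = check.read()
                # Assumption: this is the "discard my modifications" request.
                # On Linux it drops the private copy of the page.
                m.madvise(mmap.MADV_DONTNEED)
                view_after_discard = m[:]
            finally:
                m.close()
        finally:
            os.close(fd)
    return view_after_write, file_after_write, view_after_discard


if __name__ == "__main__":
    for label, value in zip(
        ("mapping after write", "file after write", "mapping after discard"),
        private_mapping_demo(),
    ):
        print(label, value)
```

On a correctly behaving kernel, the write is visible only through the mapping, the file itself never changes, and the mapping shows the original bytes again after the discard.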

However, not just any old write worked here. In order to trigger the race condition, the process had to write in a way that calls the kernel’s get_user_pages() function with the force parameter set to 1. The force parameter says: “If this page is mapped copy-on-write, then let me write to it (making a private copy) even if the page’s protection mode is read-only.” As it turns out, it is possible for a memory mapping to be both read-only and copy-on-write, and in fact this is the mode that is usually used when mapping in a program’s main binary and shared libraries. Normally, no copy is ever performed, because the writes that would trigger one are not allowed. However, there is a special case where this combination of flags matters: if you are running a program in a debugger, and you ask the debugger to insert a breakpoint, it does so by overwriting the instruction at the given address with a break instruction. That is, it modifies the mapped executable. The force flag exists for exactly this purpose: so that the debugger can inject breakpoints into the program being executed by the process being debugged (without affecting any other processes that happen to be running the same program).
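Mappings that are both read-only and copy-on-write are easy to spot on a running system. The sketch below assumes the Linux /proc/<pid>/maps text format, in which the second column holds four permission characters and a trailing "p" marks a private (copy-on-write) mapping. The helper name readonly_private_mappings is hypothetical.

```python
def readonly_private_mappings(maps_text: str):
    """Return (address_range, perms, path) for private, non-writable mappings."""
    found = []
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        addr, perms = fields[0], fields[1]
        path = fields[5].strip() if len(fields) > 5 else ""
        # perms looks like "r-xp": read, write, execute, then p(rivate) or s(hared).
        if len(perms) == 4 and perms[3] == "p" and perms[1] == "-":
            found.append((addr, perms, path))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            text = f.read()
    except OSError:
        text = ""  # for example, /proc is not mounted
    for addr, perms, path in readonly_private_mappings(text):
        print(addr, perms, path)
```

Run against an ordinary process, this typically lists the code segments of the main binary and its shared libraries, which is the case the force flag was designed around.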

Because the force flag is only useful in very specific circumstances, only certain code paths can trigger the vulnerability. Kernel security engineer and Sandstorm contributor Andrew Lutomirski tells us the only entry points appear to be:

The ptrace() system call’s PTRACE_POKEDATA operation, which is explicitly meant to be used by debuggers, often for the purpose of setting breakpoints.
Writes to /proc/<pid>/mem. It’s unclear why this code uses force – possibly it was a mistake.
Various drivers, which are also probably using the flag by mistake.

As it turns out, none of these code paths can be exploited by Sandstorm apps:

Sandstorm uses seccomp to block the app from invoking ptrace().
Sandstorm does not mount /proc inside app sandboxes.
Sandstorm does not expose driver interfaces from inside app sandboxes. For example, /dev contains only null, zero, and urandom.

So, as far as we can tell, Sandstorm has never been vulnerable to this bug.
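As a rough illustration, two of the three conditions above can be checked from inside any sandbox with a few lines of Python. This is only a sketch under stated assumptions, not Sandstorm's own tooling: the function name exposed_entry_points and the root parameter are ours, and it deliberately does not probe ptrace(), since calling it would change the state of the calling process.

```python
import os

# Device nodes this post says are present in an app sandbox's /dev.
EXPECTED_DEV_NODES = {"null", "zero", "urandom"}


def exposed_entry_points(root: str = "/"):
    """Report whether /proc/self/mem exists and which extra /dev nodes are visible."""
    proc_mem = os.path.join(root, "proc", "self", "mem")
    dev_dir = os.path.join(root, "dev")
    try:
        dev_nodes = set(os.listdir(dev_dir))
    except OSError:
        dev_nodes = set()  # no /dev at all
    return {
        "proc_mem_present": os.path.exists(proc_mem),
        "unexpected_dev_nodes": sorted(dev_nodes - EXPECTED_DEV_NODES),
    }


if __name__ == "__main__":
    print(exposed_entry_points())
```

On a typical Linux host this reports that /proc/self/mem is present and lists many device nodes. Inside a sandbox configured as described above, we would expect it to report neither.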

Defense in depth

Even if Sandstorm were vulnerable, the exploit would have far less impact inside Sandstorm than in a typical Linux environment, because:

Within a Sandstorm app’s sandbox, the visible filesystem consists of the contents of its own package. It cannot see the host system’s files or files belonging to other apps, so it would not be able to memory-map them in order to modify them using this bug.
App packages cannot contain setuid binaries and, even if they could, apps would gain no extra privileges by executing them, because Sandstorm sets the NO_NEW_PRIVS prctl() flag inside the sandbox.
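For readers unfamiliar with NO_NEW_PRIVS, here is a minimal sketch of setting and reading the flag from Python via ctypes. It assumes Linux 3.5 or later and glibc. The constants are copied from <linux/prctl.h>, and the wrapper names are ours. This is not how Sandstorm itself sets the flag, only an illustration of the same prctl() call.

```python
import ctypes
import os

PR_SET_NO_NEW_PRIVS = 38  # values from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39

_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option: int, arg2: int = 0) -> int:
    # The unused trailing arguments must be zero for these two options.
    result = _libc.prctl(
        option,
        ctypes.c_ulong(arg2),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def set_no_new_privs() -> None:
    """Irreversibly forbid this process and its descendants from gaining privileges via execve()."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


def get_no_new_privs() -> bool:
    return _prctl(PR_GET_NO_NEW_PRIVS) == 1


if __name__ == "__main__":
    print("before:", get_no_new_privs())
    set_no_new_privs()
    print("after:", get_no_new_privs())
```

Once the flag is set it cannot be cleared, and it is inherited across fork() and execve(), so a setuid bit on any binary the process later runs has no effect.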

When running on Sandstorm, a user’s data in an app like Etherpad is containerized separately from another user’s data. In fact, we go one step further and containerize each document separately. Had Sandstorm not mitigated the bug outright, it appears the impact would have been that an app could break Sandstorm’s per-document isolation and read or write documents belonging to any number of users, so long as those users all use the same version of the same app on the same server. The app still would not have been able to interfere with other apps. That outcome would merely match the status quo in a typical Linux environment, where an app keeps all users’ data in a single database without per-user isolation. Overall, this is much less significant than a privilege escalation to root. Thankfully, our seccomp mitigation prevented even this.

Sandstorm’s Security Record

This is not the first Linux security bug mitigated by Sandstorm. In fact, we’ve kept a long list. Moreover, in addition to mitigating Linux kernel problems, Sandstorm mitigates most vulnerabilities in the apps that run on top of it. Check out the whole list of mitigated vulnerabilities that we’ve compiled: Sandstorm Security Non-Events
