Detecting a mount point
Today, I just want to share the results of a tiny bit of research I did. May it be of some use to you :)
How to scan a directory tree
In the old days, the way to scan a directory tree was to readdir the names in the directory, lstat them, and then recursively do this for any directories you found. The reason you would use lstat instead of stat is that symlinks can point up in the tree, potentially creating loops. Directories cannot point up in the tree, because you can't hard link them: the original prohibition on hard linking directories was introduced because hard-linked directories trick programs into endless recursion.
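A minimal sketch of that classic walk in C, assuming POSIX opendir()/readdir()/lstat(). The function name count_entries is hypothetical, and counting entries simply stands in for whatever work a real walker would do. The point is the lstat() call: a symlink to a directory is reported as a symlink, so it is never descended into.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Recursively count all entries below `path` the classic way:
   readdir() the names, lstat() each one, recurse into directories.
   Returns 0 for a path that cannot be opened as a directory. */
static long count_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return 0;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* skip the self and parent entries, or we would loop forever */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        char child[PATH_MAX];
        int n = snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue; /* path too long: this sketch simply skips it */

        struct stat st;
        if (lstat(child, &st) != 0)
            continue; /* entry vanished or is inaccessible */

        count++;
        /* lstat() reports symlinks as S_IFLNK, so this is never true
           for a symlink, even one pointing at a directory */
        if (S_ISDIR(st.st_mode))
            count += count_entries(child);
    }
    closedir(dir);
    return count;
}
```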
That doesn't keep bash from doing it the wrong way (with shopt -s globstar, echo ** is a sure way to invoke the out-of-memory killer on my system, if I had the patience to wait for it, that is), but most programs working on directory trees (such as tar) do get this right.
The two main cases where this is not enough are -x options and Linux. First, many programs have an -x or --one-file-system option that instructs them not to cross filesystem boundaries. That way, you can use tar --one-file-system to get a tar file of your root filesystem without also adding /proc, for example.
The way this was traditionally implemented is to stat the directory you want to recurse into and its parent directory, and see whether the st_dev member matches. If it doesn't match, the subdirectory is on another filesystem, i.e. a mount point.
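The traditional check might look like this in C. This is a sketch; the name st_dev_differs_from_parent is made up for illustration. It stats the directory and its ".." entry and compares st_dev. Note that a bind mount of the same filesystem keeps the same st_dev, so this test cannot see it at all.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

/* Traditional -x / --one-file-system test: treat `path` as a mount
   point if its st_dev differs from that of `path/..`.
   Returns 1 if they differ, 0 if they match, -1 on error.
   Unreliable on Linux: st_dev may change between calls (e.g. NFS),
   and bind mounts of the same filesystem share the same st_dev. */
static int st_dev_differs_from_parent(const char *path)
{
    char parent[PATH_MAX];
    int n = snprintf(parent, sizeof parent, "%s/..", path);
    if (n < 0 || (size_t)n >= sizeof parent)
        return -1;

    struct stat self, up;
    if (stat(path, &self) != 0 || stat(parent, &up) != 0)
        return -1;

    return self.st_dev != up.st_dev;
}
```

A walker implementing -x would call this before descending into a directory and skip it when the result is 1.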
Unfortunately, this fails badly on Linux, for two reasons. The first is a bug: st_dev values are not stable (most notably with NFS). That is, you might stat a path, get an st_dev value, and later stat the same path again and get a different st_dev value, even though nothing has changed. This can cause your tree walker to detect a filesystem crossing when there really isn't one.
The other reason is bind mounts. You can create a loop by bind mounting some directory into a subdirectory of itself:
mkdir subdir
mount --bind . subdir
ls subdir/subdir/subdir/subdir/...
This successfully confuses a lot of programs that otherwise get recursion right (or, to put it differently, Linux bind mounts broke a lot of programs that had worked properly for decades).
So how to avoid it?
The horrible way
One horrible way is to parse /etc/mtab or /proc/mounts, the output of mount, or some other source of mount points, and compare the paths with the directory you want to enter to see if it is a mount point.
This is horrible for so many reasons: comparing paths is hard (some of the components might be symlinks), and there are many race conditions you can't check for. Somebody could mount a new filesystem, an existing directory could be renamed, and so on.
I'd outright refuse to use this method, even if there were no other solution.
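Purely to make those weaknesses concrete, here is roughly what the mount-table approach looks like, using glibc's setmntent()/getmntent()/endmntent() (not POSIX; musl provides them too). The name listed_in_mount_table is hypothetical. It compares paths as plain strings, which is exactly where it breaks: "/proc/" or a path through a symlink will not match an entry for "/proc", and the table may change right after it has been read.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Scan a mount table file (e.g. /proc/mounts) for an entry whose mount
   directory equals `path` as a plain string.
   Returns 1 if listed, 0 if not, -1 if the table cannot be opened.
   No symlink, "..", or slash normalisation is done, and the answer
   can be stale by the time the caller acts on it. */
static int listed_in_mount_table(const char *table, const char *path)
{
    FILE *f = setmntent(table, "r");
    if (f == NULL)
        return -1;

    int found = 0;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_dir, path) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(f);
    return found;
}
```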
The tricky^Wtrivial way
Fortunately, there is a super-simple way of checking whether a directory is a mount point that is mostly race-free (and can be used for completely race-free scanning with some care). It exploits what some people consider a bug, but what the Linux kernel developers, fortunately, consider a security feature: while a bind mount might refer to the same filesystem as its parent, it is still a security boundary for rename and similar syscalls.
That means that you can't rename or link across mount point boundaries, even when source and destination are on the same filesystem!
A practical way to exploit this is to attempt some illegal rename and then check the error code (errno).
rename ("somepath/.", "somepath/subdir/.");
If this succeeds, you are in big trouble of course. So it will fail, but how exactly does it fail?
If it fails with EXDEV, then somepath/subdir is a mount point. Otherwise (e.g. EBUSY), it isn't. If you chdir into the subdir before checking (or use the xxxat functions, such as renameat) and use relative paths, then it is even race-free, i.e. this will reliably tell you whether the current directory is a mount point:
rename ("../.", ".");
Even if somebody mounts something over that directory shortly afterwards, your program can still continue to scan the underlying paths on the same filesystem.
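A minimal C sketch of the variants described above, assuming the Linux behaviour described here (EXDEV for a mount point, typically EBUSY otherwise). All function names are hypothetical. dir_is_mount_point uses renameat(), one of the xxxat functions, so it works on an open directory descriptor instead of the current directory.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Turn the result of a deliberately illegal rename into an answer.
   EXDEV: the two parent directories are on different mounts -> 1.
   Any other error (typically EBUSY): same mount -> 0.
   Success would mean part of the tree was actually moved, so this
   sketch aborts instead of continuing. */
static int exdev_means_mount_point(int rename_result)
{
    if (rename_result == 0)
        abort();
    return errno == EXDEV;
}

/* Path-based check, rename("parent/.", "parent/name/."): returns 1 if
   parent/name is a mount point, 0 if not, -1 if a path is too long.
   Racy: either path can change between calls. */
static int subdir_is_mount_point(const char *parent, const char *name)
{
    char from[PATH_MAX], to[PATH_MAX];
    if ((size_t)snprintf(from, sizeof from, "%s/.", parent) >= sizeof from ||
        (size_t)snprintf(to, sizeof to, "%s/%s/.", parent, name) >= sizeof to)
        return -1;
    return exdev_means_mount_point(rename(from, to));
}

/* Race-free check of the current directory: rename("../.", "."). */
static int cwd_is_mount_point(void)
{
    return exdev_means_mount_point(rename("../.", "."));
}

/* The same check on an open directory descriptor via renameat().
   Edge case (Linux): at the process root, ".." is the root itself,
   so "/" reports 0 although it is a mount point. */
static int dir_is_mount_point(int dirfd)
{
    return exdev_means_mount_point(renameat(dirfd, "../.", dirfd, "."));
}
```

A one-file-system walker could open each subdirectory with openat(), call dir_is_mount_point() on the new descriptor, and skip the directory when the result is 1.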
Brilliant, isn't it? So what are the trade-offs? Everything has trade-offs, right?
Well, for one, this isn't exactly POSIX-specified behaviour. Bind mounts and other constructs where this problem happens aren't governed by POSIX anyway, and some OS might someday implement a similar construct and allow cross-mount-point renames. I don't know of any that do at the moment, and it would be dumb to do so, but then...
The bigger issue is that POSIX doesn't actually specify the order in which error conditions are checked. The above rename can fail for a number of reasons, and there is no guarantee that EXDEV will be returned even for mount points: the kernel could simply refuse to use . as a source path, regardless of destination. But currently it works everywhere, and that is not likely to change.
In any case, it's more portable and better than any other alternative I can think of.
Appendix: security boundary?
The EXDEV error behaviour is not an accident but by design, because it is supposedly usable as a security boundary. As explained by Al Viro, this:
mount --bind /tmp /tmp
gives you a /tmp directory that keeps people from linking from inside /tmp to the outside, and vice versa, which prevents some common exploits.
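To observe such a boundary from a program, a hypothetical probe like the following creates a scratch file in one directory, tries to hard link it into another, and reports whether link() failed with EXDEV. EXDEV also results when the two directories are on genuinely different filesystems, so a result of 1 only says that some mount boundary lies between them. The scratch file names are arbitrary choices.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical probe: create dir_a/.link-probe-src, try to hard link it
   to dir_b/.link-probe-dst, then remove whatever was created.
   Returns 1 if link() failed with EXDEV, 0 if the link succeeded,
   and -1 on any other error (including leftover scratch files). */
static int link_blocked_between(const char *dir_a, const char *dir_b)
{
    char src[PATH_MAX], dst[PATH_MAX];
    if ((size_t)snprintf(src, sizeof src, "%s/.link-probe-src", dir_a) >= sizeof src ||
        (size_t)snprintf(dst, sizeof dst, "%s/.link-probe-dst", dir_b) >= sizeof dst)
        return -1;

    int fd = open(src, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    int result;
    if (link(src, dst) == 0) {
        result = 0;
        unlink(dst);
    } else {
        /* read errno before unlink() below can overwrite it */
        result = (errno == EXDEV) ? 1 : -1;
    }
    unlink(src);
    return result;
}
```

After mount --bind /tmp /tmp, calling this with /tmp and a directory outside /tmp on the same underlying filesystem would be expected to return 1, while two directories inside /tmp should still return 0.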
So this is a supported feature. Of course, there are bugs everywhere, so it might not always work, but that isn't news, is it?