For those of you who don’t use Linux or some variant of UNIX in your day to day lives, allow me to explain what
You have a filesystem, and you want to search it for files that may match a particular name, or a pattern. For that, you can use the command
find. Problem was, back in the late 70’s and early 80’s and even through to now on a lot of systems,
find is essentially an unindexed search. It will trawl your entire filesystem looking for what you have commanded it to search for.
This can result in long waits and heavily impacted I/O, meaning reduced performance for the rest of the system. And as filesystem sizes got bigger, it got worse.
And so, early in 1983,
locate was proposed. This system would essentially run
find routinely, and build up a database. You could then perform simple searches against that database – in other words: an indexed search. The only main downside is that it would not be a real-time index.
locate has got better as time has gone on, with other descendant versions released, such as secure
slocate), which prevents you from getting search results for files you don’t have access to, and
rlocate (I assume the ‘r’ stands for ‘real-time’), which takes
slocate and adds a kernel module that updates the database in real-time. Add a new file? The
locate database is immediately updated. Problem with
rlocate is it seems to have begun and ended with the 2.6 branch of the Linux kernel.
And then you have the incumbent, merging
mlocate) which does the same as
slocate as well as throwing in a more efficient way to update its database. Essentially it looks at its own database and checks the modified timestamp of every directory. If a directory has a modified timestamp that differs from the database, then
mlocate will re-index it. Or at least that’s my understanding of how it works.
These days, some versions of
find keep their own database, and so it may be faster in some cases to just use
But why did they need a database for
locate in the first place? Couldn’t they just have a plaintext file and use
grep? (For the uninitiated,
grep is a very powerful tool that lets you search a file’s contents).
Well, back in 1983, they asked the same questions. And it has no doubt been asked hundreds of times since. Back then, hard drive space wasn’t so available, and
grep on a PDP-11 wasn’t exactly fast. So the database solution was chosen in order to save space and to improve search performance.
So what does all of this have to do with me? Well, it’s 2016. We have hard drive space galore, and
grep is fast these days thanks to modern hardware. And I have an itch to scratch. Long story short, this is something I needed to do in order to enable some further work.
Imagine you have a host that doesn’t have
locate or some variant of it. Basically the moment you step out of Linux and into Solaris, HP-UX, AIX and others, you find that
locate is sadly not as ubiquitous as you would expect. But because hardware isn’t a limitation, we can easily create a simple version using basic shell scripting. So that’s what I did. You can get the code here:
So how does it perform? Let’s start with
find, and note that this is on an SSD:
# time find / -type d -name "bin" ... real 0m16.184s user 0m2.224s sys 0m5.888s
Next, the incumbent,
# time locate -r '.*/bin$' ... real 0m9.778s user 0m9.656s sys 0m0.032s
Now, this uses the regex capability of
locate to give us a more precise list. If we just use a raw check, which will return anything and everything with the characters ‘/bin’ in it:
# time locate /bin ... real 0m1.394s user 0m0.724s sys 0m0.024s
But this does return a bunch of useless crud, which you would then waste time filtering out with
And finally, my version, which is essentially an over-glorified
egrep and a plain-text index:
# time ./smlocate /bin$ ... real 0m0.074s user 0m0.028s sys 0m0.028s
My version also attempts to replicate
slocate‘s separation of root-readable and user-readable indexes. It might be interesting to see how far I can stretch this, or whether it’s good enough just as it is.
One interesting lesson from testing it, however: On FreeBSD, it kept throwing an error from one of the
chown: root: Invalid argument
This was caused by the command
chown root:root. Of course, it took a few minutes for me to remember that there is no group in FreeBSD named
root. The fun thing is, instead of having a solution like:
if [ "`uname`" = FreeBSD ]; then chown root:wheel blah else chown root:root blah fi
It’s a lot simpler than that, as
chown can accept UID’s and GID’s:
chown 0:0 blah
So, if you find my hackish version of
locate useful, let me know.