Return to the Kernel page

LWN.net Weekly Edition for May 23, 2013

LWN.net Weekly Edition for May 16, 2013

Weekly edition	Kernel	Security	Distributions	Contact Us	Search
Archives	Calendar	Subscribe	Write for LWN	LWN.net FAQ	Sponsors

Generic semaphores

ByJonathan Corbet
March 17, 2008

Most kernel patches delete some code, replacing it with newer and (presumably) better code. Much of the time, it seems, the new code is more voluminous than what came before. Occasionally, though, a patch comes along which deletes over 7600 lines of code - replacing it with a mere 314 lines - while claiming to maintain the same functionality. Matthew Wilcox's generic semaphore patch is one of those changes.

In essence, a semaphore is a counter with a wait queue attached to it. When kernel code wants to access the resource protected by the semaphore, it makes a call to:

    void down(struct semaphore *sem);

This call will check the counter associated with sem; if it is greater than zero, the counter will be decremented and control returns to the caller. Otherwise the caller will be put to sleep until sometime in the future when the counter has been increased again. Increasing the counter - when the the protected resource is no longer needed - is done with a call to up(). Semaphores can be used in any situation where there is a need to put an upper limit on the number of processes which can be within a given critical section at any time. In practice, that upper limit is almost always set to one, resulting in semaphores which are used as a straightforward mutual exclusion primitive.

In current kernels, semaphores are implemented with highly-optimized, architecture-specific code. There are, in fact, more than twenty independent semaphore implementations in the kernel code base. Matthew's patch rips all of that out and replaces it with a single, generic implementation which works on all architectures. After the patch is applied, a semaphore looks like this:

    struct semaphore {
 spinlock_t  lock;
 int   count;
 struct list_head wait_list;
    };

The implementation follows from this definition in a straightforward way: the spinlock is used to protect manipulations of count, while wait_list is used to put processes to sleep when they must wait for count to increase. The actual code, of course, is somewhat complicated by performance and interrupt-safety considerations, but it remains relatively short and simple.

One might ask: why weren't semaphores done this way in the first place? The answer is that, once upon a time (prior to 2.6.16), semaphores were one of the primary mutual exclusion mechanisms in the kernel. The 2.6.16 cycle brought in mutexes from the realtime tree, and most semaphore users were converted over. So semaphores, which were once a performance-critical primitive, are now much less so. As a result, any need there may have been for carefully hand-tuned, architecture-specific code is gone. So the code might as well go too.

The other question which comes up is: why are semaphores still being used at all? The number of semaphore users has dropped considerably since 2.6.16, but there are still a number of them in the kernel. Some of those could certainly be converted to mutexes, but doing so requires a careful audit of the code to be sure that the semaphore's counting feature is not being used. Once that work is done, it may turn out that, in some places, a semaphore is truly the right data structure. So semaphores are likely to remain - but they'll require rather less code than before.

(Log in to post comments)

Generic semaphores

Posted Mar 20, 2008 4:27 UTC (Thu) by daniel (subscriber, #3181) [Link]

Hi Jon,

"The other question which comes up is: why are semaphores still being used at all?"

Stay away from my counting semaphores, please :-)

Thankyou.

Daniel

Mutex is not a semaphore

Posted Mar 20, 2008 5:22 UTC (Thu) by ikm (subscriber, #493) [Link]

Somehow I feel that the "mutex vs semaphore" distinction would have been nice to have from the
start  this way the question "why are semaphores still being used at all?" would never arise
at all. I'd like to notice that one use of a semaphore as of a "fifo for token-like objects",
that is, the one where the initial value is usually 0 and where one side initiating pushes
causes another side to initiate the same amount of pops is not mentioned in this article at
all, while e.g. for me it was always the primary use for semaphores (when I had dedicated
mutex primitives for handling mutual exclusions, that is).

That is, semaphore is not a mutex

Posted Mar 20, 2008 5:49 UTC (Thu) by ikm (subscriber, #493) [Link]

Subject should've been the other way around, sorry :) Point is, stop using them
interchangeably!

Mutex is not a semaphore

Posted Mar 22, 2008 3:22 UTC (Sat) by willy (subscriber, #9762) [Link]

> I'd like to notice that one use of a semaphore is of a "fifo for
> token-like objects", that is, the one where the initial value is
> usually 0

The semaphore implementations (both current and new) do support this, but we have few if any
users of it currently in the kernel.  We have the 'completion' API which sort-of does what you
want (it's specialised for 'I am exiting now', but doesn't have to be used that way).

My motivation was really not to correct mutex vs semaphore usage but to add new features to
semaphores -- down_killable() and down_timeout() are the two I've added so far, each taking
very few additional lines of code.

I now suspect, having read over the completion code, that completions could be rewritten in
terms of my new semaphore implementation.  I'll look into it later.

Matthew Wilcox

Generic semaphores

Posted Mar 21, 2008 20:02 UTC (Fri) by Alan.Stern (subscriber, #12437) [Link]

One reason for keeping semaphores around is that they aren't subject to lockdep checking,
whereas mutexes are.  While this checking is a very good thing and I'm all in favor of it, the
fact remains that there are some usage patterns lockdep cannot handle.  A typical example is a
tree of data structures (like the device tree), where locks must be acquired from the top
down, or from the bottom up.

Generic semaphores

Posted Mar 22, 2008 3:31 UTC (Sat) by willy (subscriber, #9762) [Link]

> A typical example is a tree of data structures (like the device tree),
> where locks must be acquired from the top down, or from the bottom up.

lockdep would be right to moan about that -- if you have two tasks trying to do that, they can
deadlock against each other.

The real problem with lockdep is that mutexes have an owner and semaphores simply don't.  If
they're counting (>1) then they can have multiple simultaneous owners.  And, in some
circumstances, they can be acquired from interrupt context.  Horrible, but there you go.

Generic semaphores

Posted Mar 22, 2008 13:47 UTC (Sat) by Alan.Stern (subscriber, #12437) [Link]

> lockdep would be right to moan about that -- if you have two tasks
> trying to do that, they can deadlock against each other.

There you go -- you are suffering the same weakness as lockdep.  It's not possible for two
tasks to deadlock when acquiring locks in a tree structure provided that the locks are always
acquired going down a branch.  (Or if the locks are always acquired going up a branch.)  This
sort of constraint could potentially be represented within lockdep, but right now there's no
way to do it.

> The real problem with lockdep is that mutexes have an owner and
> semaphores simply don't.

That isn't a problem with lockdep; it's a problem with trying to use lockdep to track general
semaphore usage.  Lack of explicit parents for semaphores hasn't prevented many semaphores
from being converted to mutexes, complete with lockdep monitoring.

My point was that even though a semaphore may be used for simple mutual exclusion in a way
that should be compatible with replacement by a mutex, the replacement can't be carried out if
the semaphore is being used to lock members of a tree.

Generic semaphores

Posted Mar 23, 2008 13:37 UTC (Sun) by willy (subscriber, #9762) [Link]

> It's not possible for two tasks to deadlock when acquiring locks in a
> tree structure provided that the locks are always acquired going down
> a branch.  (Or if the locks are always acquired going up a branch.)

That's not what you said before!  It sounded like you were describing a situation where task A
acquires locks walking down the tree and simultaneously task B acquires locks walking up the
tree.  That can't work.

lockdep does manage to handle situations like the VFS where we have i_mutex protecting each
inode and some rather byzantine rules regarding which have to be locked first (it's
particularly hairy when doing cross-directory renames -- you have to lock the renamed object,
its parent and the destination directory.  While not deadlocking against any other operation
happening at the same time).

So your assertion is demonstratedly untrue, but you still can't give semaphores an owner.

Generic semaphores

Posted Mar 23, 2008 14:15 UTC (Sun) by Alan.Stern (subscriber, #12437) [Link]

> That's not what you said before!  It sounded like you were describing a
> situation where task A acquires locks walking down the tree and
> simultaneously task B acquires locks walking up the
> tree.  That can't work.

Okay, what I wrote before wasn't sufficiently unambiguous and you misinterpreted it.

> So your assertion is demonstratedly untrue, but you still can't give
> semaphores an owner.

Lockdep works in the special-case example of VFS.  It still can't be made to handle general
tree-usage patterns.

It's true that many semaphores can't be given an owner.  However there are some which can, but
which nevertheless can't be converted to a mutex.

Generic semaphores

Posted Mar 23, 2008 16:45 UTC (Sun) by willy (subscriber, #9762) [Link]

> Lockdep works in the special-case example of VFS.  It still can't be
> made to handle general tree-usage patterns.

I don't think that's true.  Take a look at Documentation/lockdep-design.txt.  The
mutex_lock_nested() interface needs to be specialised for each tree, but that doesn't look
hard -- the one in linux/fs.h (inode_i_mutex_lock_class) is more complex than most.  It looks
like a parent/child locking rule would be sufficient for most trees.

Generic semaphores

Posted Mar 23, 2008 23:33 UTC (Sun) by Alan.Stern (subscriber, #12437) [Link]

> It looks like a parent/child locking rule would be sufficient for most
> trees.

Only if you never need to hold more than two locks at a time.  In general that isn't good
enough.  (It certainly isn't good enough for the device tree!)  Furthermore lockdep has a
limit of 8 subclasses; thus you won't be able to acquire the locks for all the nodes along a
branch if the branch is too long.

In addition, trees can have other, more complicated, access patterns which lockdep can't even
come close to handling.  For instance, the rule about always acquiring locks going down a
branch can be generalized as follows:

    Whenever you hold a lock on a node A, you must not acquire
    a lock on node B unless you already hold the lock for the
    closest common ancestor of A and B.

This rule allows you to acquire locks down a branch, but it allows other patterns as well.  If
all threads follow the rule then deadlock can never occur (exercize!).  Clearly this is far
beyond lockdep's ability to express.

Apr	MAY	Jun
	24
2012	2013	2014