Kernel development
The current 2.6 prepatch is 2.6.24-rc1, released by Linus on
October 23. As Linus noted
when tagging the release:
The patch is big. Really big. You just won't believe how vastly
hugely mindbogglingly big it is. I mean you may think it's a long
way down the road to the chemist, but that's just peanuts to how
big the patch from 2.6.23 is.
See the article below for this week's update on what was merged before the
window closed. There's also the short-form
changelog for a list of patches or the
long changelog for all the details.
As of this writing, no patches have been merged into the mainline git
repository since the -rc1 release. There have also been no -mm releases
over the last week.
For older kernels: 2.6.20.21 was released on
October 17 with a few dozen patches including a couple of security
fixes. 2.6.16.56-rc1 came
out on October 20; it has a dozen or so patches, again with a couple
of security fixes.
Kernel development news
mb() is the new lock_kernel(). Sigh.
-- Andrew Morton
So I *really* don't want to throw any stones in a glass house
here. Quite the reverse. I'd like to get rid of some of the glass,
and replace it with padding. Because you all know we'd all fit
better in a padded room than a glass house..
-- Linus Torvalds
I was in bugfix-only mode from a week prior to 2.6.23 release and
during the merge window. Partly caused by the already-idiotic
amount of stuff we had queued for 2.6.24, partly because we needed
to concentrate on stabilising the 2.6.25 patchpile rather than
writing new stuff.
And partly to send the signal that rather than beavering away on
new features all the time, we should also be spending some (more)
time testing, reviewing and bugfixing the current and
soon-to-be-current code.
-- Andrew Morton
I don't currently know of any common piece of hardware in use today
that is not supported on Linux. And since these vendors do not
know, and I don't, I'm asking the world to help out.
-- Greg Kroah-Hartman looks for driver projects
At the kernel summit in
September, Andi Kleen, the maintainer of the i386 and x86_64 architecture
code, stated that he would not maintain that code if it was merged into the
unified x86 architecture. He
appears to have not changed his mind on that score; a patch
merged for 2.6.24 states that the x86 maintainers will be Thomas Gleixner,
Ingo Molnar, and H. Peter Anvin. The x86 code is clearly in good hands,
but it is sad to see Andi bow out; we owe him a lot of thanks for
maintaining the architectures that most of us use for so long.
By Jonathan Corbet October 24, 2007
The 2.6.24 merge window has now closed; more than 7000 changesets were
merged before 2.6.24-rc1 was released.
The bulk of the new features for 2.6.24 were described last week. Here's a
summary of patches merged since then, starting with user-visible changes:
-
There are new drivers for Marvell Libertas 8385/6 wireless chips,
Freescale 3.0Gbps SATA controllers, Fujitsu laptops (LCD brightness in
particular), and TI AR7 watchdog devices.
-
Another set of old Open Sound System drivers has been removed from the
kernel.
-
The "uninitialized block groups" feature has been merged into the
ext4 filesystem. UBG helps to speed filesystem checks by keeping
track of which parts of a disk partition have never been used, and,
thus, do not require checking.
-
As was discussed back in
August, the binary sysctl() interface has been marked
deprecated, and the code for many of the sysctl targets (much of which
appears to not have worked for some time) has been removed. There is
a new checker which looks for problematic sysctl definitions;
according to Eric Biederman, "As best as I can determine all of
the several hundred errors spewed on boot up now are
legitimate."
-
The semantics of the CAP_SETPCAP capability have been
changed. In previous kernels, this capability gave a process the
ability to bestow new capabilities upon another process; now, instead,
it allows a process to set capabilities within its own "inheritable"
mask.
-
Process CPU time accounting (via taskstats) has been augmented with
information allowing CPU usage time to be scaled by CPU frequency.
-
The Control Groups (formerly process containers) patch
set has been merged. Control groups will allow the CFS group
scheduling feature to be used; it will also be the control mechanism
used for containers in general.
-
Process ID namespaces have been added; this feature lets container
implementations create a different view of the list of processes on
the system for every container.
-
The kernel markers patch
set has been merged.
-
The CIFS filesystem now has access control list (ACL) support.
-
The old, unmaintained Fibre Channel support code has been removed.
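With the binary sysctl() interface deprecated, the text files under /proc/sys remain the supported way to read and write these values from userspace. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper name read_sysctl_text() is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one sysctl value from its /proc/sys text file.
 * Example: kernel.ostype, which the binary interface named
 * {CTL_KERN, KERN_OSTYPE}, lives at /proc/sys/kernel/ostype.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_sysctl_text(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(buf != NULL && len > 0);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```

On a Linux system, read_sysctl_text("/proc/sys/kernel/ostype", buf, sizeof(buf)) should leave the string Linux in buf. Writing a value works the same way, by opening the file for writing with sufficient privilege.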
Changes visible to kernel developers include:
-
The process of merging the i386 and x86_64 architectures continues,
with many files having been merged by the time the window closed.
This job is far from complete, though. For the curious, this message from Ingo Molnar talks a bit
about what is going on there. "The x86 architecture is the most
common Linux architecture after all - and users care much more about
having a working kernel than they care about cleanups and
unifications....
This cannot be realistically finished in v2.6.24, without upsetting
the codebase."
-
The paravirt_ops structure has been split into several smaller, more
specialized operations vectors. These include pv_init_ops
(boot-time operations), pv_time_ops (for time-related
operations), pv_cpu_ops (privileged instructions),
pv_irq_ops (interrupt handling), pv_mmu_ops (page
table management), and a few others.
-
There are some new bit operations which have been added:
int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
void clear_bit_unlock(unsigned long nr, unsigned long *addr);
void __clear_bit_unlock(unsigned long nr, unsigned long *addr);
These operations are intended to be used in the creation of single-bit
locks; they work without the need for any additional memory barriers.
-
There is a new KERN_CONT priority level for
printk(). It is, in fact, empty; it is meant to serve as a
marker for printk() calls which continue a previous (not
terminated with a newline) printed line.
-
The watchdog device drivers have been moved to a new home at
drivers/watchdog.
-
A notifier mechanism for console events has been added; this feature
is aimed at accessibility tools (like Speakup) which need to know when
something has changed on the console display.
-
The filesystem export operations, used to make filesystems available
over protocols like NFS, have been reworked. Two new methods
(fh_to_dentry() and fh_to_parent()) replace the old
get_dentry() interface. There is a new structure (struct
fid) used to describe file handles. This work is aimed at making
the export interface easier to use and (eventually) supporting 64-bit
inode numbers.
-
The virtio patches -
providing an infrastructure for I/O into and out of virtualized guests
- have been merged.
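The new bit-lock operations work without extra barriers because the kernel gives the lock operation acquire semantics and the unlock operation release semantics. The following userspace analogue is a sketch only: it uses C11 atomics rather than the kernel primitives, handles a single word (nr must be smaller than the number of bits in an unsigned long), and every function name in it is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Analogue of test_and_set_bit_lock(): atomically set bit nr with acquire
 * ordering.  Returns true if the bit was already set (lock was busy). */
static bool demo_test_and_set_bit_lock(unsigned int nr, atomic_ulong *addr)
{
    unsigned long mask;

    assert(nr < DEMO_BITS_PER_LONG);
    mask = 1UL << nr;
    return (atomic_fetch_or_explicit(addr, mask, memory_order_acquire) & mask) != 0;
}

/* Analogue of clear_bit_unlock(): atomically clear bit nr with release
 * ordering, so stores made while holding the lock become visible first. */
static void demo_clear_bit_unlock(unsigned int nr, atomic_ulong *addr)
{
    assert(nr < DEMO_BITS_PER_LONG);
    atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_release);
}

/* Analogue of __clear_bit_unlock(): a non-atomic read-modify-write with
 * release ordering; safe only if nobody else can change other bits in the
 * same word while the lock is held. */
static void demo_clear_bit_unlock_nonatomic(unsigned int nr, atomic_ulong *addr)
{
    unsigned long old;

    assert(nr < DEMO_BITS_PER_LONG);
    old = atomic_load_explicit(addr, memory_order_relaxed);
    atomic_store_explicit(addr, old & ~(1UL << nr), memory_order_release);
}

/* A single-bit spinlock built from the primitives above. */
static void demo_bit_spin_lock(unsigned int nr, atomic_ulong *addr)
{
    while (demo_test_and_set_bit_lock(nr, addr))
        ;   /* busy-wait; kernel code would call cpu_relax() here */
}
```

The other bits in the word are untouched by the lock and unlock paths, which is what makes a single-bit lock cheap to embed in an existing flags word.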
Now the stabilization period begins.
By Jonathan Corbet October 24, 2007
An interrupt handler is the portion of a device driver which is charged
with responding to interrupts from the hardware; at a minimum it should
shut the hardware up and initiate any processing which needs to be
performed.
When your editor worked on the second edition of Linux Device
Drivers, the prototype for interrupt handlers looked like this:
void handler(int irq, void *dev_id, struct pt_regs *regs);
The kernel development process is not particularly kind to book authors
who, as a rule, prefer to see the ink dry on their creations before the
text becomes obsolete. True to form, the handler prototype has changed a
couple of times since LDD2, with the result that the 2.6.23 version looks
like:
irqreturn_t handler(int irq, void *dev_id);
Along the way, interrupt handlers gained a return type (used to tell the
kernel whether an interrupt was actually processed or not) and lost the
processor registers argument. One would think that this interface (along
with those who attempt to document it) had suffered enough, but, it seems,
there will be no rest in the near future.
In particular, Jeff Garzik has proposed that the irq
argument be removed from the interrupt handler prototype. There are
very few interrupt handlers which actually use that argument currently.
And, as it turns out, most of the remaining handlers do not actually need
it; they are often using the interrupt number to identify the interrupting
device, but the dev_id pointer already exists for just that
purpose. Still, getting this patch into the kernel would require a
significant amount of work, since every in-tree interrupt handler would have
to be audited and fixed up.
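As a rough userspace illustration of the dev_id approach, the sketch below mocks the 2.6.23-era definitions (irqreturn_t was then a plain int, with IRQ_NONE and IRQ_HANDLED defined as 0 and 1) and uses a hypothetical struct demo_dev. It keeps the current two-argument prototype, but the handler identifies its device purely through dev_id and returns IRQ_NONE when its device did not raise the interrupt, which is what allows several handlers to share one line.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mock of the 2.6.23-era definitions from <linux/irqreturn.h>. */
typedef int irqreturn_t;
#define IRQ_NONE    (0)
#define IRQ_HANDLED (1)

/* Hypothetical per-device state, registered as dev_id with request_irq(). */
struct demo_dev {
    const char *name;
    bool irq_pending;      /* stands in for a hardware status register */
    unsigned int events;   /* interrupts this device has serviced */
};

/* The handler finds its device through dev_id; irq is never consulted. */
static irqreturn_t demo_handler(int irq, void *dev_id)
{
    struct demo_dev *dev = dev_id;

    (void)irq;
    if (!dev->irq_pending)
        return IRQ_NONE;       /* not our device; the line may be shared */
    dev->irq_pending = false;  /* "shut the hardware up" */
    dev->events++;
    return IRQ_HANDLED;
}
```

A handler written this way would survive the removal of the irq argument with nothing more than a prototype change.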
So Jeff is taking it slowly; this is not a patch set which is aimed at
being merged for 2.6.24. Before it goes in, there is room for a lot of
useful work cleaning up the current use of the irq argument in
drivers, all of which would ease the eventual transition to the new call.
Handlers which really need the IRQ number can call the new
get_irqfunc_irq() function. But, says
Jeff, "I am finding a ton of bugs in each get_irqfunc_irq()
driver, so I would rather patiently sift through them, and push fixes and
cleanups upstream." Quite a few interrupt handler fixes resulting
from this work have already been posted.
Eric Biederman worries that converting all of the drivers could be a
challenge; he has posted a proposal which
would create two different interrupt registration and handler interfaces,
allowing drivers which really need the IRQ number to continue to
receive it. Jeff is confident that the extra structure will not be
necessary, though. Thomas Gleixner, instead, would like to see the patches merged
immediately, but it is almost certain that this patch set will be given
one more development cycle to mature before going into the mainline.
Alexey Dobriyan, meanwhile, would like to fix up the interrupt-safe
spinlock interface. Most code which requires a spinlock in the presence of
interrupts calls:
void spin_lock_irqsave(spinlock_t *lock, unsigned long flags);
The flags variable is used by the (architecture-specific) code to
save any interrupt state which may be needed when
spin_unlock_irqrestore() is called. The problem with this
interface is that it is not particularly type-safe. Developers have been
known to use an int type instead of unsigned long; that
usage will generate no errors and it will work fine on the x86
architecture. It will, however, fail in ugly ways on some other
architectures.
So Alexey would like to turn flags into a new type
(irq_flags_t). This type would initially be defined to be
unsigned long, so the change would not break compilation. It
would be annotated, though, so that the sparse utility could
point out all of the places where spin_lock_irqsave() is called
with an incorrect type. In the more distant future, when the changeover is
complete, architecture maintainers would be able to redefine the type to
whatever works best on their systems, be it a structure or a single byte.
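One way to picture that end state is an opaque type that the compiler itself checks. The sketch below is hypothetical throughout: it wraps the saved state in a struct (an assumption; the proposal's initial definition was a plain unsigned long) and simulates the CPU's interrupt flag with a global variable. With this definition, an attempt to store the saved state in an int no longer compiles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opaque flags type.  The proposal began with a plain
 * unsigned long typedef; wrapping the value in a struct (an assumption
 * made here) means the compiler rejects "int flags" outright. */
typedef struct {
    unsigned long state;
} irq_flags_t;

static bool cpu_irqs_enabled = true;   /* simulated CPU interrupt flag */

/* Hypothetical stand-in for the save half of spin_lock_irqsave(). */
static irq_flags_t demo_irq_save(void)
{
    irq_flags_t flags = { .state = cpu_irqs_enabled };

    cpu_irqs_enabled = false;
    return flags;
}

/* Hypothetical stand-in for the restore half of spin_unlock_irqrestore(). */
static void demo_irq_restore(irq_flags_t flags)
{
    cpu_irqs_enabled = (flags.state != 0);
}

/* With this definition, the following is a compile-time error:
 *     int flags = demo_irq_save();
 */
```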
Andrew Morton had a mixed response to the
patch:
Yes, it's always been ugly that we use unsigned long for this
rather than abstracting it properly.
However I'd prefer that we have some really good reason for
introducing irq_flags_t now. Simply so that I don't needlessly
spend the next two years wrestling with literally thousands of
convert-to-irq_flags_t patches and having to type "please use
irq_flags_t here" in hundreds of patch reviews.
As an alternative, it was suggested that most calls of
spin_lock_irqsave() should be changed to spin_lock_irq()
instead. The latter version disables interrupts without saving the
previous state; the accompanying spin_unlock_irq() call will then
unconditionally re-enable interrupts. Those functions can be made to work,
but only if it is known that interrupts will not have already been disabled
when spin_lock_irq() is called. Otherwise the
spin_unlock_irq() call risks enabling interrupts when some other
part of the kernel expects them to still be disabled. The resulting random
behavior is generally seen as undesirable by most computer users.
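The hazard is easy to reproduce in a simulation. In this sketch a global boolean stands in for the CPU's interrupt-enable flag and the spinlock itself is left out; the demo_* and inner_* functions are hypothetical stand-ins for the irq and irqsave variants, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag; only interrupt state is modeled. */
static bool irqs_on = true;

/* spin_lock_irq()/spin_unlock_irq() analogues: disable, then
 * unconditionally re-enable. */
static void demo_lock_irq(void)   { irqs_on = false; }
static void demo_unlock_irq(void) { irqs_on = true; }

/* spin_lock_irqsave()/spin_unlock_irqrestore() analogues: remember the
 * previous state and put exactly that state back. */
static unsigned long demo_lock_irqsave(void)
{
    unsigned long flags = irqs_on;

    irqs_on = false;
    return flags;
}

static void demo_unlock_irqrestore(unsigned long flags)
{
    irqs_on = (flags != 0);
}

/* Helpers that may be called with interrupts already disabled. */
static void inner_irq(void)
{
    demo_lock_irq();
    /* ... critical section ... */
    demo_unlock_irq();
}

static void inner_irqsave(void)
{
    unsigned long flags = demo_lock_irqsave();
    /* ... critical section ... */
    demo_unlock_irqrestore(flags);
}
```

Calling inner_irq() inside an outer demo_lock_irqsave() section leaves irqs_on true before the outer section has finished, which is exactly the premature re-enable described above; inner_irqsave() leaves it false until the outer section restores it.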
So, in other words, spin_lock_irqsave() is a safer interface,
which is why there is not a great deal of support for removing it. The
prospect of well-intentioned kernel janitors changing code to
spin_lock_irq() without really understanding the broader context
is just too scary.
Finally, there was a discussion involving synchronize_irq() which
illustrates just how hard it can be to get a handle on race conditions on
multiprocessor systems. This function:
void synchronize_irq(unsigned int irq);
is intended to help coordinate actions between a driver's interrupt and
non-interrupt code. At its core, it is a simple loop:
while (desc->status & IRQ_INPROGRESS)
	cpu_relax();	/* busy-wait until no handler is running */
In other words, synchronize_irq() will busy-wait until it is known
that no handlers are running for the given interrupt. The idea is that any
interrupt handler which might have been running before the call to
synchronize_irq() will have completed when that function returns.
The typical usage pattern is something like this:
some_important_flag = a_new_value;
synchronize_irq(irq);
/* Code which depends on IRQ handler seeing a_new_value here */
With code like this, after the synchronize_irq() call, any
interrupt handler will be guaranteed to see a_new_value - or so
people think.
The problem is that contemporary processors will happily reorder memory
operations to avoid pipeline stalls and improve performance; the "What every programmer should know
about memory" series currently being serialized by LWN describes these
issues in detail. What is relevant here is that the change to
some_important_flag might be reordered (delayed) such that it does
not become visible to other processors
on the system until sometime after synchronize_irq() returns.
During the window when the change is not visible, the promise of
synchronize_irq() is not kept - an interrupt handler could run and
see the old value, possibly creating mayhem as a result. That is the sort
of obscure, one-in-a-billion race
condition which keeps kernel hackers up at night.
Actually, kernel hacking and coffee keep kernel hackers up at night, but
your editor's point should be clear.
Benjamin Herrenschmidt, upon finding this race, attempted to fix it with a memory barrier.
After some discussion, though, it became clear that the memory barrier was
not sufficient. Barriers can affect the order in which operations become
visible, but they cannot, in the absence of corresponding
barriers on another processor, guarantee that a specific change becomes visible
to that processor at any given time. That sort of guarantee requires the
use of a locked operation which forces synchronization between
processors - the sort of operation which is typically used to implement
spinlocks.
So the real solution appears to be this
patch by Linus Torvalds and Herbert Xu. The while loop shown
above persists in the new version, and it continues to run with no locks
held - holding the interrupt descriptor lock when the interrupt subsystem
may want it could lead to deadlocks. But, once it appears that no handlers
are running, the descriptor lock is acquired and the status is checked one
more time. If no handlers are running, the synchronize operation is
complete; otherwise the code goes back to busy-waiting. The acquisition of
the descriptor lock guarantees that memory barriers will have been executed
on both sides of any potential race condition; that, in turn, will force
the ordering of the memory operations. So, with this change in place,
synchronize_irq() will truly synchronize with IRQ handlers and one
more difficult race condition will have been eliminated.
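The structure of the fix can be modeled in userspace. In this sketch, which is an illustration and not the kernel's code, a pthread mutex stands in for the descriptor lock and an atomic boolean for IRQ_INPROGRESS; all names are hypothetical. The unlocked spin uses relaxed loads, so any ordering guarantee comes from the mutex, mirroring the argument above: a handler that starts after demo_synchronize_irq() releases the lock must first acquire that same lock, and therefore sees the new value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an interrupt descriptor: the mutex plays the role
 * of desc->lock and in_progress the role of IRQ_INPROGRESS. */
struct demo_irq_desc {
    pthread_mutex_t lock;
    atomic_bool in_progress;
};

/* Shared value the handler reads (some_important_flag in the text). */
static atomic_int important_flag;

/* Handler path: set the flag under the lock, run the handler body without
 * it, then clear the flag under the lock again. */
static void demo_handle_irq(struct demo_irq_desc *desc, int *seen)
{
    pthread_mutex_lock(&desc->lock);
    atomic_store_explicit(&desc->in_progress, true, memory_order_relaxed);
    pthread_mutex_unlock(&desc->lock);

    *seen = atomic_load_explicit(&important_flag, memory_order_relaxed);

    pthread_mutex_lock(&desc->lock);
    atomic_store_explicit(&desc->in_progress, false, memory_order_relaxed);
    pthread_mutex_unlock(&desc->lock);
}

/* Spin without the lock, then double-check under it; go back to spinning
 * if a handler turned out to be running after all. */
static void demo_synchronize_irq(struct demo_irq_desc *desc)
{
    bool busy;

    do {
        while (atomic_load_explicit(&desc->in_progress, memory_order_relaxed))
            ;   /* cpu_relax() in the kernel */

        pthread_mutex_lock(&desc->lock);
        busy = atomic_load_explicit(&desc->in_progress, memory_order_relaxed);
        pthread_mutex_unlock(&desc->lock);
    } while (busy);
}

/* The usage pattern from the text: publish the value, then synchronize. */
static void demo_update_flag(struct demo_irq_desc *desc, int value)
{
    atomic_store_explicit(&important_flag, value, memory_order_relaxed);
    demo_synchronize_irq(desc);
    /* Any handler that starts from here on sees value. */
}
```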
By Jake Edge October 24, 2007
The ever-contentious Linux Security Modules (LSM) API is being debated once
again on linux-kernel. The question this time is not its removal, which
Linus Torvalds came down firmly against, but whether it should allow security
modules to be loaded dynamically. As part of 2.6.24, Torvalds merged a patch
to convert LSM into a static interface, but has indicated a willingness to
revert it. The key sticking point is whether there are real security modules
that require the ability to be loaded at runtime.
A complaint by Thomas
Fricaccia about the change caused Torvalds to put out a call for folks
using module loading with their LSM code. The patch could be reverted if
there are "real-world" uses for that ability. Torvalds again questions the sanity of security
developers, but is clearly looking for someone to step up:
I'd like to note that I asked people who were actually affected, and had
examples of their real-world use to step forward and explain their use,
and that I explicitly mentioned that this is something we can easily
re-visit.
Jan Engelhardt responded with information about his MultiAdmin module, which
allows multiple root users on a system, each with their own UID. This
allows separate tracking of file ownership, resource usage and the like for
each administrator. MultiAdmin also
allows for the creation of sub-administrators who can perform some root activities for
processes and files owned by a subset of users. The use case he cites is
for professors being allowed to administer their students' accounts without
getting full root privileges.
James Morris, who proposed the static LSM change, responded that
MultiAdmin seemed to qualify as a real-world use under Torvalds's criteria.
Though it is not clear that MultiAdmin requires a loadable
interface, it does use it. The venerable root_plug security
module – which only allows root processes to start if a
particular USB device is plugged in – also implements loading and
unloading. In both cases, configuration could be done via
sysfs parameters with an enable flag to turn them on or off.
To some extent, for the examples offered so far, loading is a
convenience for administrators, but the major users of unloading are
developers. Crispin Cowan sums it up:
Why would you
want to dynamically unload a module: because it is convenient for
debugging. Ok, so it is unsafe, and sometimes wedges your kernel, which
sometimes forces you to reboot. With this patch in place, it forces you to
*always* reboot when you want to try a hack to the module.
Other justifications for leaving the LSM loadable interface in the kernel
have been less compelling. It is hard to imagine that the US
Sarbanes-Oxley regulation would
allow loading security modules into a running kernel, but not allow the
kernel to be rebuilt, as Fricaccia suggested. Inserting proprietary security
modules that are provided by the vendor in binary-only form seems foolhardy
(this kind of potential abuse is exactly the hole Morris's patch was meant
to close), but it could be seen as a reason to allow LSM loading.
A compromise may have been found in a patch
posted by Arjan van de Ven, which converts LSM to be either static or
loadable depending on a compile-time kernel option. A consensus seems to
be building that this is a reasonable approach, allowing distributions and
users to decide for themselves whether they will allow security modules to
be loaded. As of this writing, Torvalds has not weighed back in with a
decision and the newly released 2.6.24-rc1 kernel has the static patch.
Dynamic loading of security modules is a potential source of problems
– what better place for a rootkit to hide? – but there are
valid reasons that someone might want to use it. Linux strives to be open
to many uses, including some that the kernel hackers might find
distasteful; dynamic security modules would seem to be one of those uses.
Page editor: Jonathan Corbet