On Sat, 05 Jan 2008 20:12:12 -0500
Rafal Boni <rafal%pobox.com@localhost> wrote:

> Rafal Boni wrote:
> > Rafal Boni wrote:
> >> I just rebooted my trusty Netra T1 with a shiny new 4.99.48 kernel and
> >> thought I'd kick off a userland build. Things seemed to go swimmingly
> >> for a few minutes, then the machine ground to an un-usable state --
> >> userland seems to be mostly non-responsive, though the machine is
> >> pingable, answers a ^T at a tty (well, it seems to be wedged harder
> >> now.. it did for a while after the apparent lockup), and the disk sounds
> >> like progress is being made on the build.
> >>
> >> But, I can't get any echo from a tty anymore, and god forbid I should
> >> want to log in ;)
> >>
> >> Anyone seeing anything similar? Should I go back to the last-known-good
> >> kernel for a while? ;)
> >>
> >> Machine is a Netra T1 200 -- UltraSPARC-IIe @ 500 MHz with 512MB RAM.
>
> So I thought I'd give it one more try, and I saw the same thing happen
> this time with a kernel build (thought I'd see if I maybe there was
> something else in the latest CVS that would help).
>
> The machine locked up ~ 18:01; it's now 2+ hours later and the disk is
> still chugging along. Here's the last thing 'top' on the console said
> before the hang:
>
> load averages: 4.95, 4.71, 3.82 up 0 days, 13:48
> 18:01:34
> 29 processes: 1 runnable, 27 sleeping, 1 on processor
> CPU states: 0.0% user, 0.0% nice, 8.1% system, 3.4% interrupt, 88.5%
> idle
> Memory: 184K Act, 336K Inact, 6096K Wired, 128K Exec, 328K File, 304K Free
> Swap: 2050M Total, 36M Used, 2014M Free
>
> Unless top's reporting is just way off (it didn't seem to be at the
> start), there's a sucking memory leak somewhere -- where'd the other 500
> MB of memory go?
>
> DDB's ps/l (as well as backtrace) also shows an interesting fact -- the
> active LWP is the system idle loop every time I'd ended up in DDB due to
> this hang.
>
> --rafal

I can reproduce this, with a LOCKDEBUG kernel.
I don't have any swap enabled, so instead of death by VM thrashing I get killed processes. I guess this could be a problem with the sparc64 atomic ops, since the uvmexp accounting doesn't add up.