LWN .net News from the source

LWN

●Content ●Weekly Edition ●Archives ●Search ●Kernel ●Security ●Events calendar ●Unread comments

●LWN FAQ ●Write for us

|

|

Log in / Subscribe / Register

Latency and throughput tasks in heterogenous systems

Latency and throughput tasks in heterogenous systems

Posted Jan 14, 2026 14:05 UTC (Wed) by farnz (subscriber, #17727) Parent article: A high-level quality-of-service interface This also looks like it will help with decision-making in big.LITTLE systems; if a thread is a background thread, and there's both an efficient core and a fast core idling, energy will push you into scheduling that thread on the efficient core, while if it's a UI thread, latency will push you into scheduling that thread on the fast core. You could further extend the interface to tell the scheduler whether it's worth spending extra power to get a thread's work done faster; it's always going to be a hint, because the scheduler might prefer to schedule you onto the SMT sibling of a low-latency thread, to avoid waking up another efficiency core, but it would let the scheduler know that, if all efficiency cores are in use, this thread would prefer to avoid starting up a power-hungry core just for this thread, when it could wait for another thread to leave the efficiency cores.

Posted Jan 25, 2026 11:50 UTC (Sun) by anton (subscriber, #25547) [Link] (9 responses)

if a thread is a background thread, and there's both an efficient core and a fast core idling, energy will push you into scheduling that thread on the efficient core, while if it's a UI thread, latency will push you into scheduling that thread on the fast core.

I don't think that either of these decisions would be correct in all or even most cases. The UI thread may have (and often has) so little CPU demand that it can well run on an efficient core, and the background thread may be something that the user is waiting for and that deserves a performance core. So we need a more detailed way of informing the kernel of the requirements. The nice level and QoS are ways to communicate such things; not sure if they will be enough.

On a related note, about 20 years ago on my Athlon 64, where clock rate was determined by the ondemand governor (IIRC), one could tell the governor to ignore nice processes when raising the clock rate of the CPU. I used this for the encoding part of CD ripping: Instead of clocking up the CPU (and spinning up the fan) after reading a song from the CD, the CPU stayed slow, the fan quiet, and the encoding was still fast enough (it was usually done by the time the next song had been read from the CD). Unfortunately, Linux lost that capability later (IIRC I no longer had it when I had the Core 2 Duo starting in 2008). Maybe the QoS work will bring it back.

Latency and throughput tasks in heterogenous systems

Posted Jan 25, 2026 13:12 UTC (Sun) by farnz (subscriber, #17727) [Link] (8 responses)

The UI thread may well benefit (on average) from a lower time to completion of work, even if it has very low CPU demand - if there's 500 µs between the UI thread being ready to run, and the next frame deadline, and the performance core can complete the UI work all the way to "result will be visible in the next frame" in 400 µs, while an efficiency core will miss the next frame by taking 600 µs and thus appear less responsive, even if the UI thread is only ready to run once per frame, and thus (on a 240 Hz top-end monitor) only needs 25% of an efficiency core, less on a more typical screen.

Similarly, the background thread may benefit more from having 100% of an efficiency core (and no context switches) than having 50% of a faster performance core, due to having a hot cache, rather than

That's what makes scheduling a hard problem - you're predicting the future all the time. The UI thread I've just described is a good example - because if there was 450 µs left until the next frame deadline or if this time, the compute will take 550 µs not 400 µs, you'd be better off putting it on the efficiency core, not the performance core.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 8:52 UTC (Fri) by daenzer (subscriber, #7050) [Link] (7 responses)

> he UI thread may well benefit (on average) from a lower time to completion of work, even if it has very low CPU demand - if there's 500 µs between the UI thread being ready to run, and the next frame deadline, and the performance core can complete the UI work all the way to "result will be visible in the next frame" in 400 µs, while an efficiency core will miss the next frame by taking 600 µs and thus appear less responsive, even if the UI thread is only ready to run once per frame,

Which BTW isn't a hypothetical scenario, but a quite realistic one e.g. for the mutter KMS thread.

Given current KMS UAPI, the consequence of missing the deadline is that the same contents will be displayed (at least) one display refresh cycle later, there's no mechanism for replacing them with newer contents yet.

As a side note, this kind of latency-minimizing workload keeps track of how long it takes to complete, and sets a timer for the next run such that it will complete as shortly before the deadline as (reasonably) possible. Migration between performance & efficiency cores could be problematic for this.

> and thus (on a 240 Hz top-end monitor) only needs 25% of an efficiency core, less on a more typical screen.

Ahem, there's a number of 360 & 540 Hz monitors now, the first 1000+ Hz ones are popping up.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 12:24 UTC (Fri) by farnz (subscriber, #17727) [Link] (6 responses)

Things like the mutter KMS thread, a browser's compositing thread, and similar were the exact sort of thing I was thinking of - not a huge amount of work to do (since it's focused on making the GPU do as much work as possible), but very latency critical. And these are exactly the sorts of low utilization threads that "look" to heuristics like they should run on an efficiency core (since they don't need the compute for very long), but in fact really gain from the performance core - since it's a burst of compute to determine what GPU commands to send and when to wake up for the next frame, then submit work to the GPU and go idle until next frame.

I hadn't realised that 240 Hz was no longer the realm of top-end monitors - my point was more that given the numbers I'd chosen, this thread would "fit" on an efficiency core (and therefore heuristics might well schedule it on an efficiency core) with anything up to a 960 Hz monitor, and you need the extra guidance to tell the scheduler that no, even though my primary monitor is 60 Hz, it still ought to run on the performance core, and get the undivided attention of the performance core from when it's ready to run to when it submits and goes idle, so that its latency goals are met, even though it doesn't have a need for a huge amount of compute.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 13:48 UTC (Fri) by paulj (subscriber, #341) [Link] (5 responses)

Aside: What's the use case for 200+ Hz monitors?

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 14:00 UTC (Fri) by daenzer (subscriber, #7050) [Link] (2 responses)

Largely motion clarity / avoiding motion blur due to the hold characteristics of flat panel displays (as opposed to the scanning characteristics of CRTs). The current state of research is that the former at around 1000 Hz has motion clarity comparable to the latter at 50/60 Hz.

Another way to attack the same problem is "Black Frame Insertion", i.e. displaying solid black instead of the actual contents for some time between frames. There has been some interesting development in this area lately, which allows for much better motion clarity even at double-digit frame rates.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 14:18 UTC (Fri) by paulj (subscriber, #341) [Link] (1 responses)

Aha! Ok, interesting. So 1000 Hz on a *LCD == 50/60 Hz on the CRT? Is that cause the phosphor on the CRT keeps glowing by itself for a while - there's a hysteresis to the energy absorption of electrons and emission of light by the phosphor? And the *LCD pixels are turning off? (But if that were the issue, a bit of capacitance could fix that - but maybe that's hard to add in without affecting PPI and quality). ??

Interesting, learn something new every day. Tangents on LWN can be useful sometimes. ;)

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 14:33 UTC (Fri) by daenzer (subscriber, #7050) [Link]

> Is that cause the phosphor on the CRT keeps glowing by itself for a while - there's a hysteresis to the energy absorption of electrons and emission of light by the phosphor? And the *LCD pixels are turning off?

Pretty much the opposite. :) CRT pixels only glow for a short time, most of the time they're black[0]. Conversely, LCDs are on most of the time. Since our eyes automatically follow moving objects, we perceive LCD pixels smeared along the direction of motion.

https://testufo.com/ nicely demonstrates this effect. On a 240 Hz monitor, I see a big difference between the 60 Hz and 120 Hz lines, and a smaller but still noticeable difference between the latter and the 240 Hz line.

[0]: See https://www.youtube.com/watch?v=3BJU2drrtCM for how this looks in ultra slow motion.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 14:10 UTC (Fri) by farnz (subscriber, #17727) [Link]

NB: this is going from memory - it'll take me weeks to find the notes I made 25 years ago that covered this.

There's two fundamental frequencies of interest in human vision:

The "flicker fusion threshold" (around 10 Hz), where the human visual system can be convinced that something is steady, not flickering.
The stroboscopic threshold (can't remember the correct name off the top of my head, over 1 kHz, under 10 kHz), where the human visual system can detect something caused by changes at this speed.

Below the flicker fusion threshold, you're relying on humans being willing to suspend disbelief - they can see that the change is slow, but they might be willing to accept that it's "real" to join in.

Between the two frequencies, the human brain sees frames as in smooth motion as long as you don't hit temporal aliasing effects. The higher the frequency, the less likely you are to hit a temporal aliasing effect - since they occur at harmonics of the refresh rate.

Above the higher frequency, there's no risk of hitting a temporal aliasing effect. The human vision system can no longer react fast enough to change to detect that it's not a moving picture, but instead a sequence of static frames.

This leads to the use case for high refresh rates: it's to allow for faster motion without the human being able to detect that it's "not smooth". How they experience this varies - it might be that they experience it as latency between input and reaction, or that increasing the refresh rate increases their ability to notice "oddities" (since the brain is no longer dismissing things as the effect of flicker fusion), or as something else.

And somewhere (I wish I could find it again), there's a lovely paper from the US military, showing that carrier-deployed pilots are capable of identifying military aircraft that they only see for 1 millisecond - not just "allied" or "hostile", but MiG-15 versus MiG-17, Polish versus USSR markings, under-wing missiles mounted/not present etc. The paper itself is interesting because it's mostly not about the capabilities of the pilots - rather, it's several pages about all the tricks they had to pull to be completely confident that the pilots were not getting to see the plane in their field of view for more than 1 ms when it was "flashed" onto a sky-blue background.

Now, very few people can do that sort of rapid identification trick - I suspect the pilots only could because it was literally life-and-death for them if their training didn't let them identify a potential hostile that quickly - but we've all got roughly the same hardware, and I wouldn't be surprised if in gaming, time-based data visualizations and similar fields, there's a real effect to having a higher refresh rate.

Along, of course, with a bunch of people buying HFR monitors because "number goes up", even though they don't gain from it.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 15:08 UTC (Fri) by excors (subscriber, #95769) [Link]

I think there's not much benefit to very high frequencies, but there's not that much cost either, so monitor manufacturers figure they may as well support it. And bigger numbers are always better for marketing.

There are clear benefits going 60->120Hz. Motion is noticeably smoother, even when simply dragging a window around your desktop. Or in an FPS game when you rotate the camera quickly, the scene might jump by hundreds of pixels per frame, and doubling the framerate will significantly reduce that jerkiness (without the downsides of simulated motion blur). Input latency is reduced by ~8ms, which isn't much but will sometimes be the difference between you or your enemy shooting first. You can run vsynced at new framerates (e.g. 40fps) without the downsides of VRR.

120->240Hz has all the same benefits, just with diminishing returns. Recent NVIDIA GPUs can interpolate between rendered frames to increase framerate by 4x, so you get 240fps basically for free (albeit losing the latency benefit). And you can buy very cheap 240Hz monitors, so why not. Then 240->480Hz is similar reasoning: even more diminished returns, and much greater costs, but the costs will come down over time, and if you can afford it then you might as well take those minor returns.

Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds