A high-level quality-of-service interface




By Daroc Alden
January 13, 2026

LPC

Quality-of-service (QoS) mechanisms attempt to prioritize some processes (or network traffic, disk I/O, etc.) over others in order to meet a system's performance goals. This is a difficult topic to handle in the world of Linux, where workloads, hardware, and user expectations vary wildly. Qais Yousef spoke at the 2025 Linux Plumbers Conference, alongside his collaborators John Stultz, Steven Rostedt, and Vincent Guittot, about their plans for introducing a high-level QoS API for Linux in a way that leaves end users in control of its configuration. The talk focused specifically on a QoS mechanism for the scheduler, to prioritize access to CPU resources differently for different kinds of processes. (slides; video)

Historically, the server market has been a big factor in optimizing the performance of the Linux scheduler, Yousef said. That has changed somewhat over time, but at least initially the scheduler was extremely throughput-focused. In recent years, there has been more concern given to interactivity, but there are still a lot of stale assumptions about how to wring the best performance out of a Linux system. POSIX hasn't evolved to cover new developments, applications still often spawn one thread per CPU core as though the system has no other workloads running, and so on.

The current default scheduler in the kernel, EEVDF, has a number of configurable parameters that can be adjusted for the system as a whole or per-process; Yousef thought that the best way to implement a QoS API for Linux was to give the scheduler enough information about processes to set reasonable default values for the existing configuration options. In the past, people have focused on the kernel interface used to communicate with the scheduler, he said, but that isn't the problem that matters. What matters is providing a high-level API for applications that doesn't require detailed knowledge of the scheduler's configuration to use.
[Qais Yousef]
iOS (and related platforms such as macOS and watchOS) already have an interface like that. It provides four QoS levels for a program to choose between: "user interactive" (for tasks needed to update a program's user interface), "user initiated" (for things that the user is actively doing), "utility" (for tasks that should happen promptly but that don't directly impact the user), and "background" (for tasks that have no particular latency requirements). The QoS level can be set independently per thread in a program.
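
For reference, here is a minimal sketch of what opting into those classes looks like on Apple platforms, using the pthread QoS extension; the API calls are Apple's published interface, but the surrounding program is purely illustrative.

    /* Minimal sketch of Apple's per-thread QoS classes (Apple platforms only). */
    #include <pthread.h>
    #include <pthread/qos.h>

    static void *background_worker(void *arg)
    {
        /* This thread does deferrable work with no latency requirements. */
        pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);
        /* ... cleanup, prefetching, indexing ... */
        return NULL;
    }

    int main(void)
    {
        pthread_t worker;

        /* The thread driving the user interface asks for the top class. */
        pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0);

        pthread_create(&worker, NULL, background_worker, NULL);
        pthread_join(worker, NULL);
        return 0;
    }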

Yousef proposed stealing that design for use on Linux, and mapping each of those classes to the time slice, policy, ramp-up multiplier, and uclamp settings of the scheduler. Threads would default to the utility class, which would match the scheduler's current default values. Threads in the user interactive or user initiated classes would be given shorter time slices, which tell the scheduler to prioritize latency over throughput. Threads in the background class would be given longer time slices, so that they can run for longer periods without interruption when the system is idle, but would be interrupted if a higher-priority thread became runnable.
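
Linux has no such high-level interface today, but some of the underlying knobs do exist. The sketch below is not the interface proposed in the talk; it is a hypothetical approximation of a "background" class using current facilities - the SCHED_BATCH policy plus a utilization clamp set with sched_setattr() (which needs CONFIG_UCLAMP_TASK). The per-class time slice and ramp-up multiplier have no equivalent user-space knob shown here.

    /* Hypothetical sketch: approximating a "background" QoS class with
     * existing Linux knobs (SCHED_BATCH plus a uclamp maximum). This is an
     * illustration, not the high-level API discussed in the talk. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Mirrors the kernel's struct sched_attr; named differently to avoid
     * clashing with C libraries that already declare it. */
    struct qos_sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
        uint32_t sched_util_min;
        uint32_t sched_util_max;
    };

    #ifndef SCHED_FLAG_UTIL_CLAMP_MAX
    #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
    #endif

    int main(void)
    {
        struct qos_sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_policy = SCHED_BATCH;          /* throughput-oriented policy */
        attr.sched_flags = SCHED_FLAG_UTIL_CLAMP_MAX;
        attr.sched_util_max = 256;                /* cap utilization (0-1024 scale) */

        if (syscall(SYS_sched_setattr, 0, &attr, 0))
            perror("sched_setattr");
        /* ... background work runs here ... */
        return 0;
    }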

One audience member objected that Linux does already have a way to control how much time is given to different threads: the deadline scheduler. Rostedt piped up to ask whether they "really want to run Chrome under the deadline scheduler?" He clarified that using the deadline scheduler required root privileges, but Yousef's proposal was all about allowing normal, unprivileged applications to provide performance hints. This led to an extended argument in the audience about deadline scheduling versus performance hinting until Yousef pulled things back on topic.

Adopting QoS classes in the scheduler would require some changes to the code that handles placing tasks on different CPUs, as well, he said. Right now, the kernel decides which CPU should run a task based on one of four criteria: CPU load, CPU load and energy usage, CPU load and NUMA domain, or core affinity. That should really be extended to consider aspects of the task being placed, Guittot explained. When the load balancer is placing a task with a short time slice, it should first consider idle CPUs (where the task could run right away), but if there are none, it should prefer putting the task on a CPU working on something with a long time slice (so that it can be preempted).

The required code changes in the kernel aren't the main problem, though, Yousef claimed. The problem is how to encourage application developers to adopt a new API; it can take years for people to actually use new kernel APIs. "I think we can do better", he said. Rather than using an interface based on calling functions in code, which requires application developers to update their programs to take them into account, why not use a configuration-file-based approach? That way applications can ship default configuration files if they care to, but users and distribution maintainers can add configuration files for any applications they want to see support the new API. It also leaves the user of the system in ultimate control: if the application ships an obnoxious default configuration, they can override it.

One person asked why an application developer wouldn't just ship a configuration setting all of their threads to the highest priority. Another audience member pointed out that the proposed QoS API was really most useful for prioritizing threads within an individual application; a video game would want to make the user interface part of the user interactive class, but giving a background cleanup task the same class would just cause jitter or lag, while slowing down the throughput of background processing.

Yousef also said that the kernel should not be bound to blindly follow an application's configuration no matter what. There should be appropriate access control. If an application requests the user interactive QoS class, but the system knows that the application isn't currently in the foreground (possibly due to hints from the desktop environment, through the use of control groups, or from some other process-management mechanism), it could restrict the application to the utility class instead. On the other hand, any application should probably be allowed to mark a thread as part of the background class in order to get the throughput benefits, since those threads won't impact latency.

The important thing is to give the scheduler more information with which to make reasonable decisions for different workloads, Yousef summarized. For servers, which mostly care about throughput, it makes sense to just leave everything at the default QoS level and let the existing code (which has been mostly optimized for servers already) handle it. For laptops, phones, and other systems where latency is more important than raw throughput, indicating to the scheduler which few threads matter most lets it make better decisions.

There is other ongoing work in the kernel that will also help with this, he added. There was a discussion at the conference in the toolchains track (video) about how to enable high-priority tasks to "donate" CPU time (or potentially other resources, such as QoS class or other scheduler settings) to lower-priority tasks that are currently holding a lock that the high-priority task needs. That way, a high-priority task isn't stuck waiting for a low-priority task to be scheduled on the CPU so it can make progress (a situation known as priority inversion). Such work has historically been called priority inheritance, but in the toolchains track talk it was called performance inheritance, to indicate that it can involve more than just priority. Regardless of what it is called, Yousef said that work would also contribute to improving user-visible latency.
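
Priority inheritance for locks is already available to user space: a POSIX mutex can be created with the priority-inheritance protocol, so that a low-priority holder is temporarily boosted while a higher-priority task waits for the lock. Below is a minimal sketch of that existing mechanism; the "performance inheritance" work would extend the idea to other scheduler attributes.

    /* Sketch: a mutex using the existing priority-inheritance protocol,
     * implemented with PI futexes on Linux. */
    #include <pthread.h>
    #include <stdio.h>

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutex_t lock;
        int err;

        pthread_mutexattr_init(&attr);
        err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        if (err)
            fprintf(stderr, "setprotocol: %d\n", err);
        pthread_mutex_init(&lock, &attr);
        pthread_mutexattr_destroy(&attr);

        pthread_mutex_lock(&lock);
        /* While a higher-priority thread blocks on this lock, the kernel
         * lends its priority to the current holder. */
        pthread_mutex_unlock(&lock);

        pthread_mutex_destroy(&lock);
        return 0;
    }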

Whether Linux will end up adopting a QoS API, and whether it will mimic Apple's API so closely, remains to be seen. It seems clear that there has been a shift in recent years, putting increased focus on concerns other than throughput in the kernel's scheduling subsystem. If that push continues, which seems likely, users may look forward to more responsive user interfaces in the future.

[ Thanks to the Linux Foundation for sponsoring my travel to the Linux Plumbers Conference. ]



Why be fair to batch jobs?

Posted Jan 13, 2026 21:53 UTC (Tue) by TheJH (subscriber, #101155) [Link] (8 responses)

This proposal seems to try to be fair to all tasks, no matter what QoS level they're at - for example, by making interactive threads more responsive in exchange for reducing how long they can run uninterrupted? That's unintuitive to me; I would have thought it would be desirable to aggressively prioritize foreground UI work at the expense of background work.

For example, if the system is creating a backup in the background while the user is trying to watch a video, I think the user would want the video player and display manager to have near-absolute priority over the background backup process?

Why be fair to batch jobs?

Posted Jan 13, 2026 22:20 UTC (Tue) by acarno (subscriber, #123476) [Link] (1 responses)

I took a different takeaway - this proposal aims to provide a standard interface that applications can use to provide additional information to the scheduler. It's not about fairness; it's attempting to remedy the fact that the current scheduler operates in a mostly reactive manner. If you can prime the scheduler with information about an application's behavior, it can hopefully make better choices (where better is defined as "more responsive to the system owner's desires, whatever those may be").

Why be fair to batch jobs?

Posted Jan 13, 2026 23:36 UTC (Tue) by daroc (editor, #160859) [Link]

I think that's an excellent summary of the main point of the talk.

Why be fair to batch jobs?

Posted Jan 13, 2026 23:35 UTC (Tue) by daroc (editor, #160859) [Link]

This is actually an artifact of the way that the EEVDF scheduler works. EEVDF stands for "earliest eligible virtual deadline first"; each task is given a "virtual deadline" which is based on a few things, but primarily based on when that task's time-slice on the CPU will be up. Then the scheduler just runs whichever task has the next virtual deadline (among tasks that are eligible to run at all).

So, if the video player in question were given the "user interactive" priority, it would have a smaller time slice, which means its virtual deadline would almost always come sooner than a background task with a longer time slice. Therefore the video player would typically preempt the background task as soon as it became runnable (perhaps because new video data arrived). The background task would get assigned CPU time in bigger chunks, yes, but it would only get to take advantage of them if not preempted by other work.
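
As a rough illustration of that explanation (and only that - the kernel's real implementation also tracks weights, lag, and eligibility), a toy version of the pick decision could look like this, with the requested slice directly determining how soon a task's virtual deadline arrives:

    /* Toy model of the EEVDF pick: among runnable tasks, choose the one
     * whose virtual deadline (virtual runtime plus its slice) is earliest.
     * The real scheduler also handles weights, lag, and eligibility. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct toy_task {
        const char *name;
        uint64_t vruntime;   /* virtual time already consumed */
        uint64_t slice;      /* requested slice: short for interactive work */
        bool runnable;
    };

    static struct toy_task *pick_next(struct toy_task *tasks, size_t n)
    {
        struct toy_task *best = NULL;

        for (size_t i = 0; i < n; i++) {
            if (!tasks[i].runnable)
                continue;
            /* A shorter slice means an earlier virtual deadline, so an
             * interactive task usually preempts a batch task. */
            uint64_t deadline = tasks[i].vruntime + tasks[i].slice;
            if (!best || deadline < best->vruntime + best->slice)
                best = &tasks[i];
        }
        return best;
    }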

Why be fair to batch jobs?

Posted Jan 14, 2026 2:44 UTC (Wed) by iabervon (subscriber, #722) [Link] (1 responses)

If the batch job isn't low priority, a better example might be watching a video while waiting for a test cycle to complete. In this situation, the user would like to give the video player only enough CPU to avoid dropping frames. If the foreground task is browsing the web or something similar, the user is likely to want to avoid giving the browser CPU beyond what's needed to keep it responsive, even if the pages have scripts that would like to do more.

Why be fair to batch jobs?

Posted Jan 25, 2026 11:33 UTC (Sun) by anton (subscriber, #25547) [Link]

> a better example might be watching a video while waiting for a test cycle to complete. In this situation, the user would like to give the video player only enough CPU to avoid dropping frames.

Actually, the video player needs a certain amount of CPU per time unit. If you give it less, it drops frames. If you give it more, it will not need it (it's a kind of I/O-bound process). Given that you don't want to drop frames, you will want to give it CPU when it needs it, and give the test cycle CPU when the video player does not need it. I.e., the video player should have priority.

40 years ago I learned about the Unix scheduler, and that it is designed exactly for these kinds of situations, even without nice: it lowers the priority of a process while it has the CPU, until another process has higher priority, at which point the other process gets the CPU (threads and multi-core were not things at the time). The video player would yield the CPU regularly and would therefore have a higher priority than the CPU hog when it became ready again, so it would get the next time slice after becoming ready. I don't know how the default Linux scheduler and the alternatives behave in this situation.

Concerning time slice length and such, at least the decoding component of a video player is not particularly demanding: It can decode quite a bit in advance, so if it gets the CPU only a while after it becomes ready, it would just need to use a higher low-water mark in its output buffer, so that the display engine does not run out of frames to display in the delay between the video player becoming ready due to the low-water mark being reached and the time when the decoder actually gets the CPU. The UI thread of the video player is more demanding, though.

Why be fair to batch jobs?

Posted Jan 14, 2026 11:38 UTC (Wed) by farnz (subscriber, #17727) [Link]

I think the bit you're missing is that there are already ways for background work to deprioritise itself so that foreground work gets more CPU. You can reduce the thread's priority (nice level) so that the scheduler knows to prefer other threads, and/or use SCHED_BATCH or SCHED_IDLE to tell the scheduler that this is background work and that other work should be considered more important. These existing mechanisms give you a way to tilt the thread's share of the CPU so that foreground tasks are prioritized at the expense of background tasks.
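
For reference, a minimal sketch of those existing knobs - raising the nice level and switching to SCHED_IDLE - both of which are available to unprivileged processes:

    /* Sketch: deprioritizing background work with existing interfaces. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct sched_param param = { .sched_priority = 0 };

        /* Raise the nice value: run behind other, more important threads. */
        if (setpriority(PRIO_PROCESS, 0, 10))
            perror("setpriority");

        /* Or go further: SCHED_IDLE means this work only matters when
         * nothing else wants the CPU. */
        if (sched_setscheduler(0, SCHED_IDLE, &param))
            perror("sched_setscheduler");

        /* ... background work ... */
        return 0;
    }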

The QoS levels add another dimension to the system; does a given task care more about scheduling latency, or about throughput? That is, if I'm allowed 10% of the CPU, would I prefer to run for 100 ms then stall for 900 ms waiting for CPU, or would I prefer to run for 1 ms then stall for 9 ms waiting for CPU?

Add the two together, and you've got quite a powerful setup; background work can run at low priority and indicate that it would prefer long timeslots less frequently (which means it gets more work done per second, thanks to things like CPU caches staying hot), while foreground work can indicate that it wants to be scheduled as soon as it's runnable and will accept shorter timeslots under heavy load as the tradeoff: it makes less progress per unit of time, but runs more frequently.

Why be fair to batch jobs?

Posted Jan 26, 2026 13:21 UTC (Mon) by roblucid (guest, #48964) [Link] (1 responses)

Interactivity with a user is about low-latency responses to input; such programs spend most of their time blocked but want a fast response displayed to (rare) input. Some guarantee that they will be picked first, in exchange for giving up long processing time slices, is reasonable. If your processing isn't completed quickly, batching it with a longer timeslice to gain from warm caches makes sense.

Watching a video is NOT an interactive task; it's the user passively consuming a stream of I/O that's rendered on the display. You actually want not just CPU time but also I/O to some deadline to avoid delayed frames, and especially delayed audio; it's like the old arguments about guaranteed bandwidth on ATM networks, which mostly would shovel larger blocks of data about but needed time-critical small packets to meet their deadlines. Rare pushes of play/pause buttons and volume adjustment simply don't need <20ms response times.

User-space programs providing hints to the kernel has long been desirable, but trade-offs are required: latency and throughput cannot both be maximized, or every application will claim it needs both, starving other programs. I've never met an application developer who was comfortable with their code being low priority or throttled, because they ONLY think about their own program (which is the most important thing ever), not the whole system. Really, users need to be in control, but whatever is presented to them needs to offer understandable goals that are meaningful to them. Sometimes that can be saving battery or reducing noise.

There are actually people who want to run things like games while waiting for their background work, which remains the priority, to complete, accepting degraded service for their interactive task.

Why be fair to batch jobs?

Posted Jan 26, 2026 15:44 UTC (Mon) by Wol (subscriber, #4433) [Link]

> Watching a video is NOT an interactive task, it's the user passively consuming a stream of i/o that's rendered on the display.

It's a real-time task - it needs guaranteed response times, but doesn't need all the power of the computer (probably only needs a little).

And as for background tasks not needing priority, many moons ago I got fed up with little jobs taking ages because some resource hog was running. So I set up a batch queue with the following characteristics: "priority=max, wallclock=30secs". I almost never used it, but it was nice knowing it was there if I needed it, and other users would hardly notice because the runtimes were - enforcedly - so short.

Cheers,
Wol

Limit visible resources to available resources

Posted Jan 14, 2026 6:46 UTC (Wed) by donald.buczek (subscriber, #112892) [Link] (7 responses)

> POSIX hasn't evolved to cover new developments, applications still often spawn one thread per CPU core as though the system has no other workloads running, and so on.

These applications also ignore restrictions imposed by sched_setaffinity or cgroups. I wish /proc/cpuinfo could be made not to show unavailable CPUs.

Similar problem for virtual memory (the Java VM).

Limit visible resources to available resources

Posted Jan 14, 2026 11:58 UTC (Wed) by epa (subscriber, #39769) [Link] (6 responses)

Surely one thread per core is the right approach? If you have fewer threads than cores, you will never be able to get full performance in a CPU-bound job. (If the job has a mixture of I/O and CPU then it might benefit from more threads than cores, but one-to-one seems like a good minimum.)

Perhaps the application should be "nice", look at the system load, and decide to spawn fewer threads than cores in order not to take more than its fair share or make the system unresponsive. But surely this kind of decision should be made by the scheduler. Userspace shouldn't have to second-guess how much CPU time might be available in the future. If the app spawns many threads then the scheduler has flexibility to decide how many run at the same time; if the app holds off from making threads because it's worried about load, there is no way for the scheduler to later give the application more resources if the system becomes less loaded.

It's true that if there is a hard limit on how many CPUs your process may use, then it may be pointless spawning more threads. So some kind of "count the available CPUs" system call could help. It might also help to tell the scheduler "I have spawned lots of threads but they are all worker processes and don't need to make progress equally; I am happy to starve some of them for a bit while the others keep running". This kind of fairness preference could be set as part of the QoS interface.

Limit visible resources to available resources

Posted Jan 14, 2026 13:20 UTC (Wed) by hkario (subscriber, #94864) [Link]

Yes, this would also allow easier decision-making about which threads should run on fast cores and which can be delegated to slow cores.

Limit visible resources to available resources

Posted Jan 14, 2026 13:57 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

At least in theory, if your job has a mixture of I/O and CPU demand, then it should be better off with one thread per core (well, hardware thread - in an SMT system, you want one OS thread per SMT thread, which can be 2x, 4x or even 8x the number of cores depending on the processor and its setup) and asynchronous I/O of some form, where the thread does other work while it waits for I/O to complete.

That does not excuse how hard it is to find out how many threads you need to spawn to get the maximum performance for a job - you need to check cgroupfs (and find the right mount point for it, at least in theory), your CPU affinity mask (since that's inherited by new threads) and the total number of hardware threads on the system, and choose the smallest value from those three - otherwise you're going to spawn more threads than you need for peak CPU performance. It would be easier on application developers if there was a syscall that told you how many threads you could spawn before you were guaranteed to have threads contending with each other for CPU time, taking into account all of the factors that limit which CPUs you can run on.
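
A hedged sketch of the easy parts of that calculation - the affinity mask and the online-CPU count - which deliberately ignores cgroup CPU-bandwidth limits, the piece that is genuinely awkward to discover and that a dedicated interface would cover:

    /* Sketch: how many CPUs can this process run on right now? Covers the
     * affinity mask and online CPUs, but not cgroup bandwidth limits
     * (cpu.max), which is part of the problem described above. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t set;
        long online = sysconf(_SC_NPROCESSORS_ONLN);
        long usable = online;

        if (sched_getaffinity(0, sizeof(set), &set) == 0)
            usable = CPU_COUNT(&set);   /* CPUs the affinity mask allows */

        printf("online: %ld, usable by this process: %ld\n", online, usable);
        return 0;
    }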

Limit visible resources to available resources

Posted Jan 14, 2026 15:19 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

Or even a special scheduling class which means "let me run only when there are significant idle CPU resources". The application could have a thread whose only job is to start new worker threads. It will normally be suspended, but when the scheduler notices lots of free CPU it wakes up this thread, which kicks off a new worker and immediately goes back to sleep.

Significant idle CPU resources

Posted Jan 14, 2026 15:53 UTC (Wed) by farnz (subscriber, #17727) [Link]

The challenge for such a scheduling class is distinguishing "there are significant idle CPU resources because another workload has finished" from "there are significant idle CPU resources because your workload cannot saturate the CPU".

If saturating the CPU needs 16 threads, and you only started 8 because another workload was consuming half the total CPU resource, then you want to start the other 8 when that workload terminates. But if you have 16 threads running, but this is an edge case where all 16 of your threads can only wait for network I/O and nothing else, you do not want to start more threads, even though there could be significant idle CPU resources for some time.

And, of course, you'll not normally think about this edge case, precisely because it's an edge case that "should not happen" in normal operation. The "simple" fix is to determine the maximum number of usable threads, and limit yourself to spawning that many - but at that point, why not spawn all the threads at start-up, and save the complexity of having a worker-spawning thread that exits after enough workers are spawned?

Limit visible resources to available resources

Posted Jan 14, 2026 19:44 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> Surely one thread per core is the right approach?

There are even more complications! Some tasks might prefer to always run on efficiency cores to prevent computer fans from spinning up. Unless the laptop is in the high-power mode already, and they might as well make use of it.

Limit visible resources to available resources

Posted Jan 15, 2026 10:56 UTC (Thu) by donald.buczek (subscriber, #112892) [Link]

> Surely one thread per core is the right approach?

No. But my use case is users on multi-user machines (yes, these still exist in scientific computing), either interactive or via a cluster job scheduler, who don't investigate "right approaches" but just use "make -j $(nproc)" or some prebuilt software which stupidly starts one worker thread per core, even when the job scheduler allocates a subset of CPUs for individual jobs with sched_setaffinity.

My expectations are simple

Posted Jan 14, 2026 7:59 UTC (Wed) by rrolls (subscriber, #151126) [Link] (3 responses)

No matter how much the (userspace) program I'm running wants to steal all the resources, the system should reserve enough CPU and memory that critical system functionality (moving the mouse, spawning a terminal or process monitor/task manager, interacting with it) still works and I'm able to kill the errant program rather than having to hard-reboot the PC.

Windows usually manages this. Linux almost never does. This has remained the same over 20ish years of experience.

My expectations are simple

Posted Jan 15, 2026 8:35 UTC (Thu) by taladar (subscriber, #68407) [Link] (1 responses)

Linux manages this just fine as long as you disable swap or catch it before it goes deep into swap. Of course if you get back to a computer where all your interactive stuff is swapped out it gets harder.

My expectations are simple

Posted Jan 30, 2026 8:57 UTC (Fri) by daenzer (subscriber, #7050) [Link]

Disabling swap can itself cause such symptoms though.

When the system runs out of memory, if there's no swap, the only thing the kernel can do is evict file-backed pages, which will typically be largely executable code. As memory pressure goes up, the kernel spends increasingly more time faulting back in pages holding the user-space code it's trying to execute.

My expectations are simple

Posted Jan 16, 2026 13:37 UTC (Fri) by farnz (subscriber, #17727) [Link]

IME, systemd-oomd has fixed this (certainly since systemd 248) by killing processes in cgroups that have high memory pressure. When I have a runaway process, it kills that process before swapping out my working set, and that's all that you need to do to keep critical system functionality working in the face of a process trying to steal all the resources.

Latency and throughput tasks in heterogenous systems

Posted Jan 14, 2026 14:05 UTC (Wed) by farnz (subscriber, #17727) [Link] (10 responses)

This also looks like it will help with decision-making in big.LITTLE systems; if a thread is a background thread, and there's both an efficient core and a fast core idling, energy will push you into scheduling that thread on the efficient core, while if it's a UI thread, latency will push you into scheduling that thread on the fast core.

You could further extend the interface to tell the scheduler whether it's worth spending extra power to get a thread's work done faster; it's always going to be a hint, because the scheduler might prefer to schedule you onto the SMT sibling of a low-latency thread, to avoid waking up another efficiency core, but it would let the scheduler know that, if all efficiency cores are in use, this thread would prefer to avoid starting up a power-hungry core just for this thread, when it could wait for another thread to leave the efficiency cores.

Latency and throughput tasks in heterogenous systems

Posted Jan 25, 2026 11:50 UTC (Sun) by anton (subscriber, #25547) [Link] (9 responses)

> if a thread is a background thread, and there's both an efficient core and a fast core idling, energy will push you into scheduling that thread on the efficient core, while if it's a UI thread, latency will push you into scheduling that thread on the fast core.

I don't think that either of these decisions would be correct in all or even most cases. The UI thread may have (and often has) so little CPU demand that it can well run on an efficient core, and the background thread may be something that the user is waiting for and that deserves a performance core. So we need a more detailed way of informing the kernel of the requirements. The nice level and QoS are ways to communicate such things; not sure if they will be enough.

On a related note, about 20 years ago on my Athlon 64, where clock rate was determined by the ondemand governor (IIRC), one could tell the governor to ignore nice processes when raising the clock rate of the CPU. I used this for the encoding part of CD ripping: Instead of clocking up the CPU (and spinning up the fan) after reading a song from the CD, the CPU stayed slow, the fan quiet, and the encoding was still fast enough (it was usually done by the time the next song had been read from the CD). Unfortunately, Linux lost that capability later (IIRC I no longer had it when I had the Core 2 Duo starting in 2008). Maybe the QoS work will bring it back.

Latency and throughput tasks in heterogenous systems

Posted Jan 25, 2026 13:12 UTC (Sun) by farnz (subscriber, #17727) [Link] (8 responses)

The UI thread may well benefit (on average) from a lower time to completion of work, even if it has very low CPU demand - if there's 500 µs between the UI thread being ready to run, and the next frame deadline, and the performance core can complete the UI work all the way to "result will be visible in the next frame" in 400 µs, while an efficiency core will miss the next frame by taking 600 µs and thus appear less responsive, even if the UI thread is only ready to run once per frame, and thus (on a 240 Hz top-end monitor) only needs 25% of an efficiency core, less on a more typical screen.

Similarly, the background thread may benefit more from having 100% of an efficiency core (and no context switches) than having 50% of a faster performance core, due to having a hot cache rather than repeatedly losing its cache contents to context switches.

That's what makes scheduling a hard problem - you're predicting the future all the time. The UI thread I've just described is a good example - because if there was 450 µs left until the next frame deadline, or if, this time, the compute will take 550 µs rather than 400 µs, you'd be better off putting it on the efficiency core, not the performance core.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 8:52 UTC (Fri) by daenzer (subscriber, #7050) [Link] (7 responses)

> The UI thread may well benefit (on average) from a lower time to completion of work, even if it has very low CPU demand - if there's 500 µs between the UI thread being ready to run, and the next frame deadline, and the performance core can complete the UI work all the way to "result will be visible in the next frame" in 400 µs, while an efficiency core will miss the next frame by taking 600 µs and thus appear less responsive, even if the UI thread is only ready to run once per frame,

Which BTW isn't a hypothetical scenario, but a quite realistic one e.g. for the mutter KMS thread.

Given current KMS UAPI, the consequence of missing the deadline is that the same contents will be displayed (at least) one display refresh cycle later, there's no mechanism for replacing them with newer contents yet.

As a side note, this kind of latency-minimizing workload keeps track of how long it takes to complete, and sets a timer for the next run such that it will complete as shortly before the deadline as (reasonably) possible. Migration between performance & efficiency cores could be problematic for this.

> and thus (on a 240 Hz top-end monitor) only needs 25% of an efficiency core, less on a more typical screen.

Ahem, there's a number of 360 & 540 Hz monitors now, the first 1000+ Hz ones are popping up.
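
A minimal sketch of the deadline-driven wakeup scheme described above, with a hypothetical do_frame_work() and a fixed time budget standing in for the running estimate (and safety margin) that a real compositor maintains:

    /* Sketch: wake up as late as reasonably possible before each frame
     * deadline. do_frame_work() and the numbers are hypothetical; a real
     * compositor measures its own completion times and adds a margin. */
    #include <stdint.h>
    #include <time.h>

    #define NSEC_PER_SEC 1000000000LL

    static void do_frame_work(void)
    {
        /* Prepare and submit the next frame. */
    }

    int main(void)
    {
        const int64_t frame_ns  = NSEC_PER_SEC / 60;  /* 60 Hz refresh */
        const int64_t budget_ns = 500 * 1000;         /* work time plus margin */
        struct timespec now, wake;
        int64_t deadline;

        clock_gettime(CLOCK_MONOTONIC, &now);
        deadline = now.tv_sec * NSEC_PER_SEC + now.tv_nsec + frame_ns;

        for (int frame = 0; frame < 3; frame++) {
            int64_t wake_ns = deadline - budget_ns;

            wake.tv_sec = wake_ns / NSEC_PER_SEC;
            wake.tv_nsec = wake_ns % NSEC_PER_SEC;
            /* Sleep until just before the deadline, then do the work. */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wake, NULL);
            do_frame_work();
            deadline += frame_ns;
        }
        return 0;
    }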

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 12:24 UTC (Fri) by farnz (subscriber, #17727) [Link] (6 responses)

Things like the mutter KMS thread, a browser's compositing thread, and similar were the exact sort of thing I was thinking of - not a huge amount of work to do (since it's focused on making the GPU do as much work as possible), but very latency critical. And these are exactly the sorts of low utilization threads that "look" to heuristics like they should run on an efficiency core (since they don't need the compute for very long), but in fact really gain from the performance core - since it's a burst of compute to determine what GPU commands to send and when to wake up for the next frame, then submit work to the GPU and go idle until next frame.

I hadn't realised that 240 Hz was no longer the realm of top-end monitors - my point was more that given the numbers I'd chosen, this thread would "fit" on an efficiency core (and therefore heuristics might well schedule it on an efficiency core) with anything up to a 960 Hz monitor, and you need the extra guidance to tell the scheduler that no, even though my primary monitor is 60 Hz, it still ought to run on the performance core, and get the undivided attention of the performance core from when it's ready to run to when it submits and goes idle, so that its latency goals are met, even though it doesn't have a need for a huge amount of compute.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 13:48 UTC (Fri) by paulj (subscriber, #341) [Link] (5 responses)

Aside: What's the use case for 200+ Hz monitors?

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 14:00 UTC (Fri) by daenzer (subscriber, #7050) [Link] (2 responses)

Largely motion clarity / avoiding motion blur due to the hold characteristics of flat panel displays (as opposed to the scanning characteristics of CRTs). The current state of research is that the former at around 1000 Hz has motion clarity comparable to the latter at 50/60 Hz.

Another way to attack the same problem is "Black Frame Insertion", i.e. displaying solid black instead of the actual contents for some time between frames. There has been some interesting development in this area lately, which allows for much better motion clarity even at double-digit frame rates.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 14:18 UTC (Fri) by paulj (subscriber, #341) [Link] (1 responses)

Aha! Ok, interesting. So 1000 Hz on a *LCD == 50/60 Hz on the CRT? Is that cause the phosphor on the CRT keeps glowing by itself for a while - there's a hysteresis to the energy absorption of electrons and emission of light by the phosphor? And the *LCD pixels are turning off? (But if that were the issue, a bit of capacitance could fix that - but maybe that's hard to add in without affecting PPI and quality). ??

Interesting, learn something new every day. Tangents on LWN can be useful sometimes. ;)

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 14:33 UTC (Fri) by daenzer (subscriber, #7050) [Link]

> Is that cause the phosphor on the CRT keeps glowing by itself for a while - there's a hysteresis to the energy absorption of electrons and emission of light by the phosphor? And the *LCD pixels are turning off?

Pretty much the opposite. :) CRT pixels only glow for a short time, most of the time they're black[0]. Conversely, LCDs are on most of the time. Since our eyes automatically follow moving objects, we perceive LCD pixels smeared along the direction of motion.

https://testufo.com/ nicely demonstrates this effect. On a 240 Hz monitor, I see a big difference between the 60 Hz and 120 Hz lines, and a smaller but still noticeable difference between the latter and the 240 Hz line.

[0]: See https://www.youtube.com/watch?v=3BJU2drrtCM for how this looks in ultra slow motion.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 14:10 UTC (Fri) by farnz (subscriber, #17727) [Link]

NB: this is going from memory - it'll take me weeks to find the notes I made 25 years ago that covered this.

There's two fundamental frequencies of interest in human vision:

  1. The "flicker fusion threshold" (around 10 Hz), where the human visual system can be convinced that something is steady, not flickering.
  2. The stroboscopic threshold (can't remember the correct name off the top of my head, over 1 kHz, under 10 kHz), where the human visual system can detect something caused by changes at this speed.

Below the flicker fusion threshold, you're relying on humans being willing to suspend disbelief - they can see that the change is slow, but they might be willing to accept that it's "real" to join in.

Between the two frequencies, the human brain sees frames as in smooth motion as long as you don't hit temporal aliasing effects. The higher the frequency, the less likely you are to hit a temporal aliasing effect - since they occur at harmonics of the refresh rate.

Above the higher frequency, there's no risk of hitting a temporal aliasing effect. The human vision system can no longer react fast enough to change to detect that it's not a moving picture, but instead a sequence of static frames.

This leads to the use case for high refresh rates: it's to allow for faster motion without the human being able to detect that it's "not smooth". How they experience this varies - it might be that they experience it as latency between input and reaction, or that increasing the refresh rate increases their ability to notice "oddities" (since the brain is no longer dismissing things as the effect of flicker fusion), or as something else.

And somewhere (I wish I could find it again), there's a lovely paper from the US military, showing that carrier-deployed pilots are capable of identifying military aircraft that they only see for 1 millisecond - not just "allied" or "hostile", but MiG-15 versus MiG-17, Polish versus USSR markings, under-wing missiles mounted/not present etc. The paper itself is interesting because it's mostly not about the capabilities of the pilots - rather, it's several pages about all the tricks they had to pull to be completely confident that the pilots were not getting to see the plane in their field of view for more than 1 ms when it was "flashed" onto a sky-blue background.

Now, very few people can do that sort of rapid identification trick - I suspect the pilots only could because it was literally life-and-death for them if their training didn't let them identify a potential hostile that quickly - but we've all got roughly the same hardware, and I wouldn't be surprised if in gaming, time-based data visualizations and similar fields, there's a real effect to having a higher refresh rate.

Along, of course, with a bunch of people buying HFR monitors because "number goes up", even though they don't gain from it.

Latency and throughput tasks in heterogenous systems

Posted Jan 30, 2026 15:08 UTC (Fri) by excors (subscriber, #95769) [Link]

I think there's not much benefit to very high frequencies, but there's not that much cost either, so monitor manufacturers figure they may as well support it. And bigger numbers are always better for marketing.

There are clear benefits going 60->120Hz. Motion is noticeably smoother, even when simply dragging a window around your desktop. Or in an FPS game when you rotate the camera quickly, the scene might jump by hundreds of pixels per frame, and doubling the framerate will significantly reduce that jerkiness (without the downsides of simulated motion blur). Input latency is reduced by ~8ms, which isn't much but will sometimes be the difference between you or your enemy shooting first. You can run vsynced at new framerates (e.g. 40fps) without the downsides of VRR.

120->240Hz has all the same benefits, just with diminishing returns. Recent NVIDIA GPUs can interpolate between rendered frames to increase framerate by 4x, so you get 240fps basically for free (albeit losing the latency benefit). And you can buy very cheap 240Hz monitors, so why not. Then 240->480Hz is similar reasoning: even more diminished returns, and much greater costs, but the costs will come down over time, and if you can afford it then you might as well take those minor returns.

Third QoS framework

Posted Jan 14, 2026 21:07 UTC (Wed) by zdzichu (subscriber, #17118) [Link] (2 responses)

We already have two: https://www.kernel.org/doc/html/v6.9/power/pm_qos_interfa...

Plus the "tc" subsystem is often talked about as a (network) QoS mechanism.
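
The user-space side of the CPU-latency PM QoS framework mentioned there is the /dev/cpu_dma_latency device: a process writes its latency tolerance and the request stays active for as long as the file descriptor is held open. A minimal sketch (this normally requires root):

    /* Sketch of the existing CPU-latency PM QoS interface: write the
     * tolerated exit latency (in microseconds) to /dev/cpu_dma_latency
     * and hold the fd open; closing it drops the request. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int32_t max_latency_us = 20;  /* keep CPU wakeup latency low */
        int fd = open("/dev/cpu_dma_latency", O_WRONLY);

        if (fd < 0 || write(fd, &max_latency_us, sizeof(max_latency_us)) < 0) {
            perror("cpu_dma_latency");
            return 1;
        }
        /* ... latency-sensitive work runs while the fd stays open ... */
        pause();
        close(fd);
        return 0;
    }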

Third QoS framework

Posted Jan 15, 2026 15:14 UTC (Thu) by chrisr (subscriber, #83108) [Link] (1 responses)

fwiw, I agree with Qais that where we are in Linux is quite far from where other OSes have got to. We're not exploiting our hardware fully - vendors implementing battery-powered devices using Android are replacing large amounts of default scheduler and power-management functionality to get device attributes to an acceptable place vs. Apple devices. I think this is bad for Linux; you might not agree.

The existing frameworks do provide what the docs say; however, they are conceptually about the whole system and designed for an omnipotent administrator. The concept is sufficiently different from what Qais is talking about here that it isn't really possible to reuse it for this application. Also, changing CPU latency takes all the CPUs out of idle to apply the latency settings, so while it is correct, it is not something you want to change often on a battery-powered device.

Other OSes have settled around having a few separate groups of QoS requirements. They combine latency, peak performance, and CPU selection in heterogeneous systems, and likely a number of other controllable parameters such as media-device bandwidth.

We don't have anything in the kernel apart from the controls (which are not always discoverable) and we have a multitude of application environments which are not coalescing around anything by themselves. Our SW architecture is making solving this in a suitably open way hard - I appreciate the argument that this is not something the kernel should be providing, but a common aggregator could help something to turn up in the various application environments and move us all forward. Adoption by one of the foundational distros plus Android is likely enough to ensure it spreads everywhere.

Another item which makes this harder than it should be is that controls provided through sysfs are not always easily accessible from a module. Sometimes you need to actually use the file interface rather than having visibility of the API directly. This makes implementing a module to tie a set of configurations to a group of tasks more painful and miles away from atomic.

Linux does have the opportunity to be really good at this if we can agree.

Third QoS framework

Posted Jan 15, 2026 15:44 UTC (Thu) by taladar (subscriber, #68407) [Link]

It feels a little bit as if Linux might be reaching the end of the line of what is possible with its basic architectural idea of never communicating details and always using guesses and heuristics in both directions.

On the one hand the applications don't provide enough information to the kernel, on the other hand the kernel does not provide enough information to mechanisms inside the application so both sides use heuristics.

We can see similar effects in other areas where countless hours are wasted because someone thinks "No such file or directory", "Permission denied" or similar generic messages are enough information to debug a problem.

The whole imperfect filtering of system calls by security tools is another symptom of an unsuitably ill-defined kernel/user space communication interface.

Maybe the entire way kernel and user space communicate is due for a clean overhaul?

