On Thu, Jan 10, 2008 at 02:21:24AM +0200, Mindaugas R. wrote:

> here is the implementation of processor-sets and CPU affinity calls. Also,
> this implements POSIX real-time extensions like: pthread_getschedparam(),
> pthread_setschedparam(), pthread_setschedprio(), sched_setscheduler(),
> sched_getscheduler(), etc. A userland utility schedctl(1) is provided too.
>
> http://www.netbsd.org/~rmind/psets_affinity_rt.tar.bz2
>
> The processor-sets are compatible with Solaris API:
>
>	int pset_assign(psetid_t, cpuid_t, psetid_t *);
>	int pset_bind(psetid_t, idtype_t, id_t, psetid_t *);
>	int pset_create(psetid_t *);
>	int pset_destroy(psetid_t);
>
> Two non-portable pthread affinity calls (compatible with Linux):
>
>	int pthread_getaffinity_np(pthread_t, size_t, cpuset_t *);
>	int pthread_setaffinity_np(pthread_t, size_t, cpuset_t *);
>
> Unless nobody objects, I would like to start merging these sources.
> Few things to mention about the implementation:
>
> 1. Instead of providing separate system calls for setting the priority and
> scheduling class (policy, as defined by POSIX), there is an internal system
> call _sched_setparam() to pass all parameters in a structure:
>
>	struct sched_param {
>		int sched_class;
>		int sched_priority;
>	};
>
> Is there anything potentially wrong with this?
>
> 2. There is another internal system call:
>
>	int _pset_bind(idtype_t idtype, id_t first_id, id_t second_id,
>	    psetid_t psid, psetid_t opsid);
>
> It is reasonable to provide a possibility for administrator to bind any
> threads via userland utility. At this point, there is a need of two IDs (eg.
> one for PID, and other for LID). I do not like such design i.e. to use two ID
> arguments in syscall, however I am not sure about a better way. Thoughts?
>
> 3. There is a kernel function lwp_migrate(), which might be used for generic
> migration of thread from one CPU to another. The problematic case when thread
> is on LSONPROC state. In such case lwp::l_target_cpu is set, and migration
> is performed in mi_switch(). However, this increases the complexity of
> mi_switch(). One of the alternatives would be migration queues, but it has
> few disadvantages:
>   1) this is needed only for SCHED_M2 scheduler because of per-CPU locks;
>      migration with SCHED_4BSD is trivial;
>   2) there is no easy way to abstract and close this in the scheduler.
> Comments?

Thanks for doing this, these are some pretty cool features to finally have!
I have some comments, which we discussed in private:

o 'nice' doesn't work with SCHED_M2, which is a regression. Asking people
  to use schedctl doesn't really wash, because nice is specified by POSIX,
  works on every other Unix type system and has been around for over 20
  years.

o SCHED_4BSD doesn't provide the new features and I'm strongly of the
  opinion that's a bug. By providing pluggable schedulers we gave people
  options. By fragmenting the feature set by scheduler we are taking those
  options away again.

o I mentioned that I don't like how we deal with on-processor migration in
  mi_switch() because it's complicated - and mi_switch() is already too
  complicated. I'll try to think of a better way to handle it.

Thanks,
Andrew