This guide explores an experimental Linux scheduler, that adopts a naive, non-preemptive Round-Robin approach to task management. Developed as part of a teaching exercise, the scheduler implements a simplified workflow where tasks are cycled through a runqueue without the typical preemptive interruptions found in production systems. In this setup, a task only yields the CPU upon the expiration of its allocated timeslice or through voluntary yielding, and the scheduler then reorders the tasks accordingly. The guide offers a detailed, event-driven perspective on how these operations are handled, making it a valuable resource for understanding core scheduling concepts in operating systems.
When a Task’s Timeslice Expires
- The timer interrupt code calls
scheduler_tick()
, with HZ frequency. - This function subsequently calls the current task’s
task_tick()
function bycurr->sched_class->task_tick()
. - In the implementation for the scheduler, this function decrements the task’s timeslice (using jiffies as the unit).
- When the timeslice reaches zero, the following steps occur while the task remains on the CPU:
- The task is repositioned to the end of the runqueue.
- The
task_tick()
function callsresched_curr()
, setting theTIF_NEED_RESCHED
flag meaning the task needs to be rescheduled.
- On subsequent interrupts or when returning to user space, the kernel checks for the
TIF_NEED_RESCHED
flag and, if set, invokesschedule()
, which underlying invokes__schedule()
. - This huge
if
block is NOT executed.- In particular, recall that the
task_struct
has a__state
field. When a task isRUNNABLE
, the state field is zero. When a task is notRUNNABLE
, the state field is non-zero. - In this case, the task is still
RUNNABLE
, i.e.prev_state
is zero. So theif
block is not executed.
- In particular, recall that the
__schedule()
eventually callspick_next_task()
here.- Our implementation of
pick_next_task()
retrieves the task at the front of the runqueue. - Finally, the kernel performs a context switch via
context_switch()
here, transferring control to the newly selected task.
When a Task Voluntarily Yields While Remaining RUNNABLE
In scenarios where a task voluntarily yields the CPU but intends to remain in the runqueue, it signals its willingness to be moved to the back of the queue without fully relinquishing its eligibility for immediate rescheduling. This process unfolds as follows:
- The task calls the
sched_yield
system call, which leads to the invocation ofdo_sched_yield()
. do_sched_yield()
then calls the task’syield_task()
function atcurrent->sched_class->yield_task()
.- Within
yield_task()
, the task’s timeslice is set to zero.
On the next timer interrupt, the scheduler treats this as if the task’s timeslice has expired, preempting it accordingly and repositioning it in the runqueue.
When a Task Voluntarily Yields and Is Not RUNNABLE
When a task decides to yield the CPU and no longer wishes to remain in the runqueue—such as when entering a sleep state or upon completion of its execution—the procedure is slightly different:
- The task alters its state to indicate that it is not RUNNABLE and calls
__schedule()
with a non-preemptive flag.- In the case of task termination, the
exit()
system call is made, which leads to the invocation ofdo_exit()
and subsequentlydo_task_dead()
. The task state is updated accordingly before calling__schedule()
. - For tasks entering a sleep state (e.g., via
nanosleep()
or similar functions), the task state is similarly updated here, and then invokes `schedule().
- In the case of task termination, the
- In both cases, the updated task state (non-zero, indicating non-RUNNABLE) triggers a branch in
__schedule()
that callsdeactivate_task()
. This function, in turn, invokesdequeue_task()
implementation of the current scheduling class, which should remove the task from the runqueue. - Recall that we’re still within an earlier invocation of
__schedule()
. This invocation callspick_next_task()
andcontext_switch()
as mentioned above. - With the task removed, the scheduler selects the next task from the runqueue to execute.
The code snippets and links in this post correspond to Linux v6.8.
I warmly welcome any bug reports or suggestions for improvement regarding this scheduler implementation. Your feedback is invaluable in refining the code and clarifying the underlying concepts. Additionally, I invite future operating systems course teaching teams to reference this page as a resource when preparing their own course materials. Collaboration and shared insights help us all enhance the educational experience.