sched: Fix the do_set_cpus_allowed() locking fix

Commit abfc01077d ("sched: Fix do_set_cpus_allowed() locking") overlooked that __balance_push_cpu_stop() calls select_fallback_rq() with rq->lock held. This makes that set_cpus_allowed_force() will recursively take rq->lock and the machine locks up. Run select_fallback_rq() earlier, without holding rq->lock. This opens up a race window where a task could get migrated out from under us, but that is harmless, we want the task migrated. select_fallback_rq() itself will not be subject to concurrency as it will be fully serialized by p->pi_lock, so there is no chance of set_cpus_allowed_force() getting called with different arguments and selecting different fallback CPUs for one task. Fixes: abfc01077d ("sched: Fix do_set_cpus_allowed() locking") Reported-by: Jan Polensky <japo@linux.ibm.com> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Jan Polensky <japo@linux.ibm.com> Closes: https://lore.kernel.org/oe-lkp/202510271206.24495a68-lkp@intel.com Link: https://patch.msgid.link/20251027110133.GI3245006@noisy.programming.kicks-ass.net
2025-12-07 20:06:24 +00:00 · 2025-10-27 12:01:33 +01:00
parent 73cbcfe255
commit af13e5e437
1 changed files with 7 additions and 10 deletions
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8044,18 +8044,15 @@ static int __balance_push_cpu_stop(void *arg)
 	struct rq_flags rf;
 	int cpu;

-	raw_spin_lock_irq(&p->pi_lock);
-	rq_lock(rq, &rf);
-
-	update_rq_clock(rq);
-
-	if (task_rq(p) == rq && task_on_rq_queued(p)) {
+	scoped_guard (raw_spinlock_irq, &p->pi_lock) {
 		cpu = select_fallback_rq(rq->cpu, p);
-		rq = __migrate_task(rq, &rf, p, cpu);
-	}

-	rq_unlock(rq, &rf);
-	raw_spin_unlock_irq(&p->pi_lock);
+		rq_lock(rq, &rf);
+		update_rq_clock(rq);
+		if (task_rq(p) == rq && task_on_rq_queued(p))
+			rq = __migrate_task(rq, &rf, p, cpu);
+		rq_unlock(rq, &rf);
+	}

 	put_task_struct(p);