Skip to content

Conversation

@oeasy1412
Copy link
Contributor

本次PR主要修复了与 信号处理、进程ID获取 和 fork 相关的四个关键问题:

  1. 修正 getpid 系统调用:将 getpid() 的实现改为返回 task_tgid_vnr(current),以确保在PID命名空间环境中能返回当前命名空间下的正确进程组ID(TGID),符合 Linux 标准语义,修复了 tgkill 等依赖 TGID 的操作在多线程或容器环境下的归属判断错误。
  2. 修复信号阻塞掩码处理:解决了 new_blocked.remove(awaited) 的逻辑错误,确保在等待信号期间能正确管理信号的阻塞状态,避免了本该被丢弃的忽略信号被错误入队。。
  3. 实现精确的信号忽略判断:新增了 sig_task_ignoredsig_ignored 函数,为核心信号处理逻辑添加了完整的信号忽略判断机制。该机制能正确处理:
    • init进程(rawpid为1)对 SIGKILLSIGSTOP 信号的特殊忽略规则。
    • 被标记为 UNKILLABLE 的进程的信号处理。
    • 检查信号的默认处理行为是否为忽略(SIG_KERNEL_IGNORE_MASK)。
    • 综合考虑进程的信号阻塞掩码(sigmasksaved_sigmask),确保即使信号的处置为忽略,但如果该信号被阻塞,它也会被正确放入队列而不被立即丢弃。
  4. 调整 copy_process 流程顺序与并发安全性修复
    • 严格对齐执行顺序:参照 Linux 6.6.21 源码,调整了 sched_fork、资源拷贝(copy_fs 提前)、PID 分配等步骤的执行顺序。
    • 修复并发可见性竞态:将 ProcessManager::add_pcb(全局可见)推迟到所有 PID 初始化(init_task_pid)和哈希表挂载(attach_pid)完成之后执行。这消除了 PCB 对外可见但内部 ID 数据尚未初始化的竞态窗口。

@github-actions github-actions bot added the Bug fix A bug is fixed in this pull request label Dec 4, 2025
Signed-off-by: aLinChe <[email protected]>
tgid.data()
);
return Ok(1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个问题需要解决一下

@fslongjin
Copy link
Member

@claude review,审查功能、性能、安全、linux一致性的问题

@dragonos-community-ai-pr-reviewer
Copy link

dragonos-community-ai-pr-reviewer bot commented Dec 5, 2025

Claude finished @fslongjin's task —— View job


PR Review: #1434 - fix(signal): correct pid retrieval and signal ignoring logic

I've completed a thorough review of this PR for functionality, performance, security, and Linux compatibility. Here are my findings:

Overall Assessment

The PR addresses four important issues related to signal handling and process management. Most changes are correct and improve Linux compatibility, but there is one critical bug in rt_sigtimedwait that needs fixing.

1. getpid System Call Changes (kernel/src/process/syscall/sys_getpid.rs)

Change: getpid() now returns task_tgid_vnr(current) instead of task_pid_vnr(current), aligning with Linux semantics where getpid() returns the thread group ID (TGID).

Review:Correct. This fixes tgkill and other operations that depend on TGID in multi-threaded or container environments.

Issue: The special case for init process (PID=1 && TGID≠1) logs an error and returns 1. @fslongjin's comment on line 25 indicates this needs resolution. This inconsistency suggests a deeper problem with PID namespace initialization.

Recommendation: Instead of papering over the inconsistency, investigate why tgid might not be 1 for init and fix the root cause. The current workaround may hide bugs in PID namespace setup.

2. Signal Ignoring Logic Implementation (kernel/src/ipc/signal.rs, kernel/src/ipc/signal_types.rs)

Changes:

  • Added sig_task_ignored() and sig_ignored() functions.
  • Proper handling of init process (raw_pid == 1) ignoring SIGKILL/SIGSTOP.
  • UNKILLABLE processes (including container init) respect force-send semantics.
  • Signals blocked by sigmask or saved_sigmask are not ignored (remain queued).
  • Added missing constants (SIG_KERNEL_ONLY_MASK, SIG_KERNEL_IGNORE_MASK).

Review:Mostly correct. The logic aligns with Linux signal handling semantics. However, note:

  • The force parameter in sig_task_ignored() correctly allows SIGKILL/SIGSTOP to be forced through UNKILLABLE processes.
  • Blocked signals are correctly kept pending (return false from sig_ignored).
  • The RESTORE_SIG_MASK check for saved_sigmask is appropriate for rt_sigtimedwait/pselect contexts.

Potential Edge Cases: Ensure that ptrace interception (TODO) is eventually implemented for completeness.

3. Signal Blocking Mask Fix in rt_sigtimedwait (kernel/src/ipc/syscall/sys_rt_sigtimedwait.rs)

Change: Line 85 changed from new_blocked.insert(awaited) to new_blocked.remove(awaited).

Review:Incorrect. Both the original and the changed code are wrong according to Linux semantics.

Linux Specification: rt_sigtimedwait() replaces the caller's signal mask with the set pointed to by uthese (after removing SIGKILL/SIGSTOP). It does not add to or remove from the existing mask.

Current Behavior:

  • Before PR: new_blocked.insert(awaited) → adds awaited to existing blocked set (wrong).
  • After PR: new_blocked.remove(awaited) → removes awaited from existing blocked set (wrong).

Correct Implementation:

let mut new_blocked = awaited; // after SIGKILL/SIGSTOP removal already done
set_user_sigmask(&mut new_blocked);

Impact: The bug could cause signals to be incorrectly delivered during the wait, leading to missed signals or spurious EINTR.

Recommendation: Revert the change and implement the correct replacement semantics.

4. copy_process Order Adjustments (kernel/src/process/fork.rs)

Changes:

  • Reordered operations to match Linux 6.6.21 (sched_fork → copy_fs → copy_files → copy_mm → PID allocation).
  • Moved ProcessManager::add_pcb() after init_task_pid() and attach_pid().
  • Moved CLONE_CHILD_SETTID and CLONE_PIDFD operations later.

Review:Correct. These changes eliminate a race condition where the PCB becomes globally visible before its PID is fully initialized. The ordering aligns with Linux and improves concurrency safety.

Note: Ensure that the moved CLONE_PIDFD code still works correctly with the new PID assignment order (it uses pcb.raw_pid() which should now be set earlier).

5. Test Changes (user/apps/tests/syscall/gvisor/blocklists/)

Changes:

  • Deleted sigtimedwait_test (presumably because the fix allows tests to pass).
  • Removed two lines from itimer_test (likely now passing).

Review:Positive. Removing tests from blocklists indicates improved functionality.

Summary of Required Actions

  1. Fix rt_sigtimedwait signal mask handling – implement correct replacement semantics.
  2. Address init PID/TGID inconsistency – investigate root cause rather than masking with a workaround.
  3. Verify CLONE_PIDFD works with new fork order (should be fine).

Security & Performance

  • Security: The signal ignoring logic properly respects SIGKILL/SIGSTOP semantics for init and UNKILLABLE processes. No new vulnerabilities introduced.
  • Performance: Added signal checks are negligible overhead; fork ordering changes have no performance impact.

Linux Compatibility

Overall improvements. Once the rt_sigtimedwait bug is fixed, the PR significantly enhances signal handling compliance.


Review completed. Please address the critical issue in rt_sigtimedwait and consider the init PID/TGID inconsistency.

@oeasy1412
Copy link
Contributor Author

@claude review 根据Linux6.6.21的源码重新分析你说的 3. Signal Blocking Mask Fix

static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
		    const struct timespec64 *ts)
{
	ktime_t *to = NULL, timeout = KTIME_MAX;
	struct task_struct *tsk = current;
	sigset_t mask = *which;
	enum pid_type type;
	int sig, ret = 0;

	if (ts) {
		if (!timespec64_valid(ts))
			return -EINVAL;
		timeout = timespec64_to_ktime(*ts);
		to = &timeout;
	}

	/*
	 * Invert the set of allowed signals to get those we want to block.
	 */
	sigdelsetmask(&mask, sigmask(SIGKILL) | sigmask(SIGSTOP));
	signotset(&mask);

	spin_lock_irq(&tsk->sighand->siglock);
	sig = dequeue_signal(tsk, &mask, info, &type);
	if (!sig && timeout) {
		/*
		 * None ready, temporarily unblock those we're interested
		 * while we are sleeping in so that we'll be awakened when
		 * they arrive. Unblocking is always fine, we can avoid
		 * set_current_blocked().
		 */
		tsk->real_blocked = tsk->blocked;
        // 这里的操作等价于 new_blocked.remove(awaited)
        // 数学原理: Blocked & (~Awaited) === Blocked - Awaited
		sigandsets(&tsk->blocked, &tsk->blocked, &mask);
		recalc_sigpending();
		spin_unlock_irq(&tsk->sighand->siglock);

		__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
		ret = schedule_hrtimeout_range(to, tsk->timer_slack_ns,
					       HRTIMER_MODE_REL);
		spin_lock_irq(&tsk->sighand->siglock);
		__set_task_blocked(tsk, &tsk->real_blocked);
		sigemptyset(&tsk->real_blocked);
		sig = dequeue_signal(tsk, &mask, info, &type);
	}
	spin_unlock_irq(&tsk->sighand->siglock);

	if (sig)
		return sig;
	return ret ? -EINTR : -EAGAIN;
}

https://code.dragonos.org.cn/xref/linux-6.6.21/kernel/signal.c#3604

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug fix A bug is fixed in this pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants