[fix] fix ipc signal name conflict & remained ipc signal file #3430

Draft
wants to merge 3 commits into
base: develop

liyonghua0910 (Collaborator) commented Aug 15, 2025

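Problem description

FD currently relies heavily on shared memory for inter-process communication. Each shared memory segment creates a temporary handle under /dev/shm whose name includes the server pid. When a process exits unexpectedly or is force-killed, these handles are left behind on the system. On CI pipeline machines, multiple containers are started to run tasks, and each one mounts the host's /dev directory. Inside each container, pids are allocated from scratch. As a result, processes with the same pid in different containers may try to attach to the same shared memory handle in /dev/shm, or a leftover handle from an earlier process with the same pid may not have been cleaned up in time. Either case causes a conflict.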
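The behaviour described above can be reproduced in a few lines. This is only an illustrative sketch using Python's standard `multiprocessing.shared_memory` module, not FD's own IPC wrapper; the name `demo_signal.<pid>` is a hypothetical stand-in for the real handle names.

```python
import os
from multiprocessing import shared_memory

# Hypothetical name following the "<signal>.<pid>" pattern from the description.
name = f"demo_signal.{os.getpid()}"

# The owner creates the named segment (backed by /dev/shm on Linux).
owner = shared_memory.SharedMemory(name=name, create=True, size=8)
owner.buf[0] = 1

# Anything that sees the same /dev/shm and knows the name can attach to it.
peer = shared_memory.SharedMemory(name=name)
peer_saw = peer.buf[0]
peer.close()

# close() only drops this process's mapping; the named segment stays
# behind until someone calls unlink(). A killed process never gets here.
owner.close()
owner.unlink()

try:
    shared_memory.SharedMemory(name=name)
    still_exists = True
except FileNotFoundError:
    still_exists = False
```

Because the name is the only key, two containers that share /dev/shm and reuse the same pid end up with the same name, and a segment whose owner was killed before `unlink()` stays visible to the next process that picks that name.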

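Solution

This PR addresses two problems:

The first is naming conflicts: processes with pid=10001 in two containers access the handle xxx_signal.10001 at the same time. The fix is to append a millisecond-resolution timestamp suffix to every shared memory name, for example .20250815_085416_847, so there is no naming conflict as long as the processes in the two containers do not start at exactly the same moment. The suffix can also be controlled through the environment variable FD_IPC_APPEND_SUFFIX; the resulting handle name is xxx_signal.10001.${FD_IPC_APPEND_SUFFIX}.

The second is leftover handles.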
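For the naming-conflict fix in the first item, the suffix scheme could look roughly like the sketch below. The helper names `ipc_suffix` and `ipc_signal_name` are hypothetical, and falling back to a timestamp when FD_IPC_APPEND_SUFFIX is unset is an assumption; the PR diff itself may be organised differently.

```python
import os
from datetime import datetime


def ipc_suffix(now=None):
    """Return the suffix appended to a shared memory name (hypothetical helper)."""
    # Assumption: an explicit FD_IPC_APPEND_SUFFIX takes precedence.
    override = os.environ.get("FD_IPC_APPEND_SUFFIX")
    if override:
        return override
    now = now or datetime.now()
    # Millisecond-resolution timestamp, e.g. 20250815_085416_847.
    return now.strftime("%Y%m%d_%H%M%S_") + f"{now.microsecond // 1000:03d}"


def ipc_signal_name(base, pid, now=None):
    """Build '<base>.<pid>.<suffix>' (hypothetical helper)."""
    return f"{base}.{pid}.{ipc_suffix(now)}"
```

With the environment variable unset, `ipc_signal_name("xxx_signal", 10001)` yields a name of the form `xxx_signal.10001.<timestamp>`. Setting FD_IPC_APPEND_SUFFIX to a per-container or per-job value gives a deterministic suffix instead, which avoids relying on start times differing.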

Main changes

paddle-bot bot commented Aug 15, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Aug 15, 2025