[fix] fix ipc signal name conflict & remained ipc signal file #3430
+134
−8
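Problem description
FD currently relies heavily on shared memory for inter-process communication. Each shared-memory segment creates a temporary handle under /dev/shm whose name includes the server pid. When a process exits unexpectedly or is force-killed, these handles are left behind on the system. On pipeline machines, multiple containers are started to run tasks, and each one mounts the host's /dev directory. Inside each container, pids are allocated from the beginning again. As a result, processes with the same pid in different containers may try to attach to the same shared-memory handle in /dev/shm, or a stale handle left by an earlier process with the same pid may not have been cleaned up in time. Either case causes a conflict.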
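Solution
This PR mainly addresses two problems:
The first is naming conflicts: processes with pid=10001 in two containers access the handle xxx_signal.10001 at the same time. The fix appends a millisecond-resolution timestamp suffix to every shared-memory name, for example .20250815_085416_847, so no naming conflict occurs unless the processes in the two containers start within the same millisecond. The suffix can also be controlled through the environment variable FD_IPC_APPEND_SUFFIX, in which case the resulting handle name is xxx_signal.10001.${FD_IPC_APPEND_SUFFIX}. The second is leftover handles.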
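The naming scheme described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper names make_suffix and make_ipc_name are hypothetical, and the sketch assumes that a non-empty FD_IPC_APPEND_SUFFIX replaces the generated timestamp.

```python
import os
from datetime import datetime


def make_suffix(now=None):
    """Return a name suffix (hypothetical helper, not part of FD).

    Assumption: a non-empty FD_IPC_APPEND_SUFFIX takes precedence;
    otherwise a millisecond-resolution timestamp is generated.
    """
    override = os.environ.get("FD_IPC_APPEND_SUFFIX")
    if override:
        return override
    now = now or datetime.now()
    # strftime has no millisecond directive, so derive it from microseconds.
    return now.strftime("%Y%m%d_%H%M%S_") + f"{now.microsecond // 1000:03d}"


def make_ipc_name(base, pid, suffix=None):
    """Build a handle name of the form <base>.<pid>.<suffix>."""
    if suffix is None:
        suffix = make_suffix()
    return f"{base}.{pid}.{suffix}"
```

With this shape, two containers that both run a server with pid 10001 only produce the same name if they start within the same millisecond or share the same FD_IPC_APPEND_SUFFIX value.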
Main changes