Closed
Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x3)
==== backtrace (tid: 212877) ====
0 0x0000000000042520 __sigaction() ???:0
1 0x0000000000049b8a ncclMemoryPoolAlloc<ncclProxyOp>() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/include/utils.h:280
2 0x0000000000049b8a addProxyOpIfNeeded() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:180
3 0x0000000000049b8a addProxyOpIfNeeded() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:176
4 0x000000000004c496 addCBDCollToPlan() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:481
5 0x000000000004f5bd ncclLaunchPrepare() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:844
6 0x000000000004f5bd ncclLaunchPrepare() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:1260
7 0x0000000000053d4b groupLaunch() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/group.cc:129
8 0x0000000000053d4b groupLaunch() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/group.cc:339
9 0x0000000000054f88 ncclGroupEndInternal() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/group.cc:418
10 0x0000000000054f88 ncclGroupEndInternal() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/group.cc:368
11 0x000000000004d74f ncclEnqueueCheck() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2032
12 0x0000000000044b36 ncclAllGather() /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/collectives.cc:26
13 0x00000000011fd1f3 c10d::ProcessGroupNCCL::_allgather_base() ???:0
14 0x0000000005f8e9b8 c10d::ops::(anonymous namespace)::_allgather_base_CUDA() Ops.cpp:0
15 0x0000000005f985cc c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > > (*)(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long), std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > >, c10::guts::typelist::typelist<at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long> >, false>::call() :0
16 0x00000000055b224b c10::OperatorHandle::redispatchBoxed() :0
17 0x00000000055afad9 torch::autograd::basicAutogradNotImplementedFallbackImpl() autograd_not_implemented_fallback.cpp:0
18 0x0000000001a8c3f8 c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::autograd_fallback>() VariableFallbackKernel.cpp:0
19 0x0000000005f9fc2e c10::impl::BoxedKernelWrapper<std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > > (at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long), void>::call() :0
20 0x0000000005fabfe8 c10d::ProcessGroup::_allgather_base() :0
21 0x0000000000df6c7e pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}&&, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (*)(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN() :0
22 0x00000000004cb474 pybind11::cpp_function::dispatcher() :0
23 0x000000000015a10e PyObject_CallFunctionObjArgs() ???:0
24 0x0000000000150a7b _PyObject_MakeTpCall() ???:0
25 0x0000000000168acb PyMethod_New() ???:0
26 0x0000000000148cfa _PyEval_EvalFrameDefault() ???:0
27 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
28 0x0000000000169492 PyObject_Call() ???:0
29 0x00000000001455d7 _PyEval_EvalFrameDefault() ???:0
30 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
31 0x000000000014453c _PyEval_EvalFrameDefault() ???:0
32 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
33 0x000000000014345c _PyEval_EvalFrameDefault() ???:0
34 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
35 0x000000000014326d _PyEval_EvalFrameDefault() ???:0
36 0x000000000016893e PyMethod_New() ???:0
37 0x00000000001455d7 _PyEval_EvalFrameDefault() ???:0
38 0x000000000016893e PyMethod_New() ???:0
39 0x00000000001455d7 _PyEval_EvalFrameDefault() ???:0
40 0x000000000014fc14 _PyObject_FastCallDictTstate() ???:0
41 0x000000000016586c _PyObject_Call_Prepend() ???:0
42 0x0000000000280700 PyInit__datetime() ???:0
43 0x0000000000150a7b _PyObject_MakeTpCall() ???:0
44 0x0000000000149629 _PyEval_EvalFrameDefault() ???:0
45 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
46 0x00000000001455d7 _PyEval_EvalFrameDefault() ???:0
47 0x00000000001687f1 PyMethod_New() ???:0
48 0x0000000000148cfa _PyEval_EvalFrameDefault() ???:0
49 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
50 0x000000000014345c _PyEval_EvalFrameDefault() ???:0
51 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
52 0x000000000014345c _PyEval_EvalFrameDefault() ???:0
53 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
54 0x000000000014345c _PyEval_EvalFrameDefault() ???:0
55 0x000000000015a9fc _PyFunction_Vectorcall() ???:0
56 0x00000000001455d7 _PyEval_EvalFrameDefault() ???:0
=================================
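For anyone triaging a dump like the one above, the following is a minimal sketch (not part of the original report) that parses the frames and picks out those whose source path lies under an NCCL tree. It assumes one frame per line in the form `index address symbol location`; the names `Frame`, `parse_backtrace`, and `frames_from_path` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    index: int
    address: str
    symbol: str
    location: str


# index, hex address, symbol (may contain spaces in C++ templates), final token = location
_FRAME_RE = re.compile(r"^\s*(\d+)\s+(0x[0-9a-fA-F]+)\s+(.*?)\s+(\S+)\s*$")


def parse_backtrace(text: str) -> List[Frame]:
    """Return every line that looks like a backtrace frame; other lines are skipped."""
    frames = []
    for line in text.splitlines():
        m = _FRAME_RE.match(line)
        if m:
            frames.append(Frame(int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return frames


def frames_from_path(frames: List[Frame], fragment: str = "/nccl/") -> List[Frame]:
    """Keep frames whose source location contains the given path fragment."""
    return [f for f in frames if fragment in f.location]


if __name__ == "__main__":
    sample = (
        "0 0x0000000000042520 __sigaction() ???:0\n"
        "1 0x0000000000049b8a ncclMemoryPoolAlloc<ncclProxyOp>() "
        "/dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/include/utils.h:280\n"
    )
    for frame in frames_from_path(parse_backtrace(sample)):
        print(frame.index, frame.symbol, frame.location)
```

Applied to the full dump, this kind of filter isolates frames 1 through 12, i.e. the path from `ncclAllGather()` down to `ncclMemoryPoolAlloc<ncclProxyOp>()` where the fault is reported.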
[2025-01-08 11:17:51 TP7] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 1578, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 410, in event_loop_overlap
recv_reqs = self.recv_requests()
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 459, in recv_requests
recv_reqs = broadcast_pyobj(recv_reqs, self.tp_rank, self.tp_cpu_group)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 731, in broadcast_pyobj
dist.broadcast(tensor_size, src=0, group=dist_group)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 2425, in broadcast
work.wait()
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer [29.127.64.100]:26496
[2025-01-08 11:17:51 TP1] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 1578, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 410, in event_loop_overlap
recv_reqs = self.recv_requests()
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 459, in recv_requests
recv_reqs = broadcast_pyobj(recv_reqs, self.tp_rank, self.tp_cpu_group)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 731, in broadcast_pyobj
dist.broadcast(tensor_size, src=0, group=dist_group)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 2425, in broadcast
work.wait()
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer [29.127.64.100]:2711
Killed
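The two Python tracebacks above (TP7 and TP1) both end in a Gloo `Connection closed by peer` error raised from `broadcast_pyobj`, which is what a surviving rank would typically see once a peer process has gone away; whether the segfault is that cause is an inference, not something the log proves. A minimal sketch for summarizing such a log per rank, assuming the `[date time TPn] Scheduler hit an exception` header format shown above (the function name `final_errors_by_rank` is hypothetical):

```python
import re
from typing import Dict

# Header such as: [2025-01-08 11:17:51 TP7] Scheduler hit an exception: ...
_HEADER_RE = re.compile(r"^\[\S+ \S+ (TP\d+)\] Scheduler hit an exception")
# Final traceback line such as: RuntimeError: ...
_ERROR_RE = re.compile(r"^\w+(?:Error|Exception): .*$")


def final_errors_by_rank(log_text: str) -> Dict[str, str]:
    """Map each rank tag (e.g. 'TP7') to the last line of its traceback."""
    errors: Dict[str, str] = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = _HEADER_RE.match(line)
        if header:
            current = header.group(1)
        elif current is not None and _ERROR_RE.match(line):
            errors[current] = line
            current = None
    return errors


if __name__ == "__main__":
    import sys

    for rank, error in final_errors_by_rank(sys.stdin.read()).items():
        print(rank, "->", error)
```

Ranks that appear here with only a connection error are probably secondary failures; the rank that produced the native backtrace is the one worth inspecting first.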
Reproduction
node 1
python -m sglang.launch_server --model-path DeepSeek-V3 --tp 16 --nccl-init 29.127.64.100:5000 --nnodes 2 --node-rank 0 --trust-remote-code --port 80 --host 0.0.0.0 --schedule-conservativeness 0.3 --context-length 32768
node 2
python -m sglang.launch_server --model-path DeepSeek-V3 --tp 16 --nccl-init 29.127.64.100:5000 --nnodes 2 --node-rank 1 --trust-remote-code --port 80 --host 0.0.0.0 --schedule-conservativeness 0.3 --context-length 32768
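The two commands above are identical except for `--node-rank`. A small sketch that generates both from one shared argument list can help rule out accidental drift between nodes when reproducing; it only reuses the flags and values from this report, places `--node-rank` last for convenience, and the names `SHARED_ARGS` and `launch_command` are hypothetical.

```python
import shlex

# Arguments common to both nodes, copied from the reproduction commands.
SHARED_ARGS = [
    "--model-path", "DeepSeek-V3",
    "--tp", "16",
    "--nccl-init", "29.127.64.100:5000",
    "--nnodes", "2",
    "--trust-remote-code",
    "--port", "80",
    "--host", "0.0.0.0",
    "--schedule-conservativeness", "0.3",
    "--context-length", "32768",
]


def launch_command(node_rank: int) -> str:
    """Build the launch command for one node; only --node-rank varies."""
    argv = ["python", "-m", "sglang.launch_server", *SHARED_ARGS,
            "--node-rank", str(node_rank)]
    return shlex.join(argv)


if __name__ == "__main__":
    for rank in (0, 1):
        print(f"node rank {rank}:")
        print(launch_command(rank))
```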
Environment
/usr/local/lib/python3.10/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Warning: Your installation of OpenCV appears to be broken: module 'cv2.dnn' has no attribute 'DictValue'.Please follow the instructions at https://github.com/opencv/opencv-python/issues/884 to correct your environment. The import of cv2 has been skipped.
/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_config.py:341: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
warnings.warn(message, UserWarning)
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 535.161.08
PyTorch: 2.5.1+cu124
sglang: 0.4.1.post3
flashinfer: 0.1.6+cu124torch2.4
triton: 3.1.0
transformers: 4.47.1
torchao: 0.7.0
numpy: 1.26.4
aiohttp: 3.9.5
fastapi: 0.114.1
hf_transfer: 0.1.8
huggingface_hub: 0.24.7
interegular: 0.3.3
modelscope: 1.21.1
orjson: 3.10.13
packaging: 24.0
psutil: 5.9.8
pydantic: 2.9.1
multipart: 0.0.20
zmq: 26.0.3
uvicorn: 0.30.6
uvloop: 0.20.0
vllm: 0.6.4.post1
openai: 1.58.1
anthropic: 0.42.0
decord: 0.6.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 NIC10 NIC11 NIC12 NIC13 NIC14 NIC15 NIC16 NIC17 NIC18 NIC19 NIC20 NIC21 NIC22 NIC23 NIC24 NIC25 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS PIX NODE NODE NODE SYS SYS SYS SYS 0-95,192-287 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE PHB PIX SYS SYS SYS SYS 0-95,192-287 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE PIX PHB SYS SYS SYS SYS 0-95,192-287 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE PIX NODE NODE SYS SYS SYS SYS 0-95,192-287 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE PIX NODE 96-191,288-383 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS NODE PIX NODE NODE 96-191,288-383 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS PHB NODE NODE PIX 96-191,288-383 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS PIX NODE NODE PHB 96-191,288-383 1 N/A
NIC0 SYS SYS SYS SYS NODE NODE NODE NODE X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC1 SYS SYS SYS SYS NODE NODE NODE NODE PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC2 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC3 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC4 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC5 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC6 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC7 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC8 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC9 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC10 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC11 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC12 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC13 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC14 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC15 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC16 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC17 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X SYS SYS SYS SYS NODE NODE NODE NODE
NIC18 PIX NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS X NODE NODE NODE SYS SYS SYS SYS
NIC19 NODE NODE NODE PIX SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE X NODE NODE SYS SYS SYS SYS
NIC20 NODE PHB PIX NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE X PHB SYS SYS SYS SYS
NIC21 NODE PIX PHB NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE PHB X SYS SYS SYS SYS
NIC22 SYS SYS SYS SYS NODE NODE PHB PIX NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS X NODE NODE PHB
NIC23 SYS SYS SYS SYS NODE PIX NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS NODE X NODE NODE
NIC24 SYS SYS SYS SYS PIX NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE X NODE
NIC25 SYS SYS SYS SYS NODE NODE PIX PHB NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS PHB NODE NODE X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NIC9: mlx5_9
NIC10: mlx5_10
NIC11: mlx5_11
NIC12: mlx5_12
NIC13: mlx5_13
NIC14: mlx5_14
NIC15: mlx5_16
NIC16: mlx5_17
NIC17: mlx5_18
NIC18: mlx5_bond_1
NIC19: mlx5_bond_2
NIC20: mlx5_bond_3
NIC21: mlx5_bond_4
NIC22: mlx5_bond_5
NIC23: mlx5_bond_6
NIC24: mlx5_bond_7
NIC25: mlx5_bond_8
ulimit soft: 1024
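The environment dump reports a soft open-file limit of 1024. Nothing in this report ties that value to the crash, but a 16-rank, two-node job opens many sockets, so it is a cheap variable to rule out. A minimal sketch using the standard `resource` module to inspect and, if permitted, raise the soft limit for the current process before launching; the helper name `ensure_nofile_soft_limit` and the 65536 target are assumptions chosen for illustration.

```python
import resource
from typing import Tuple


def ensure_nofile_soft_limit(target: int = 65536) -> Tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < target:
        # The hard limit caps what an unprivileged process may request.
        new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = ensure_nofile_soft_limit()
    print(f"RLIMIT_NOFILE soft={soft} hard={hard}")
```

The limit applies to the calling process and its children, so this would need to run in the process that spawns the server (or be replaced by `ulimit -n` in the launching shell).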