NotImplementedError: c10d::broadcast_: at #75

@yzhang123

Hi, I'm running with the following versions:
CUDA 12.0
torch 2.6.0+cu124
Python 3.10.14
transformers 4.50.1

When I run
python train.py \
--model_name meta-llama/Llama-3.2-1B \
--gradient_accumulation_steps 2 \
--batch_size 8 \
--context_length 512 \
--num_epochs 1 \
--train_type qlora \
--use_gradient_checkpointing False \
--use_cpu_offload False \
--log_to wandb \
--dataset alpaca \
--verbose false \
--save_model true \
--output_dir ~/models/qlora_alpaca

I get

  File "$HOME/anaconda3/envs/py10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 509, in __init__
    _init_param_handle_from_module(
  File "$HOME/anaconda3/envs/py10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 629, in _init_param_handle_from_module
    _sync_module_params_and_buffers(
  File "$HOME/anaconda3/envs/py10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 1126, in _sync_module_params_and_buffers
    _sync_params_and_buffers(
  File "$HOME/anaconda3/envs/py10/lib/python3.10/site-packages/torch/distributed/utils.py", line 334, in _sync_params_and_buffers
    dist._broadcast_coalesced(
NotImplementedError: c10d::broadcast_: attempted to run this operator with Meta tensors, but there was no fake impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add a fake impl. Please see the following for next steps: https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html

/$HOME/anaconda3/envs/py10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
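
For anyone triaging this: the traceback shows the failure inside `_sync_module_params_and_buffers`, and the message says the broadcast was attempted on Meta tensors. That suggests at least one parameter or buffer was still on the meta device when FSDP tried to sync it. Below is a minimal sketch of a check that could be run just before the model is wrapped. The helper name `find_meta_tensors` is hypothetical and not part of train.py or PyTorch; it relies only on `named_parameters()`, `named_buffers()`, and the `is_meta` attribute. It only reports where the meta tensors are and does not fix anything by itself.

```python
def find_meta_tensors(module):
    """Return labelled names of parameters and buffers still on the meta device.

    Hypothetical diagnostic helper. It only uses named_parameters(),
    named_buffers() and the tensor attribute is_meta, so any object that
    provides those (such as a torch.nn.Module) can be passed in.
    """
    meta_names = []
    sources = (
        ("param", module.named_parameters()),
        ("buffer", module.named_buffers()),
    )
    for kind, named_tensors in sources:
        for name, tensor in named_tensors:
            # is_meta is True for tensors that carry shape/dtype but no data.
            if getattr(tensor, "is_meta", False):
                meta_names.append(f"{kind}:{name}")
    return meta_names


# Assumed usage, just before the FSDP wrap (the variable `model` is illustrative):
#   leftover = find_meta_tensors(model)
#   if leftover:
#       print(f"{len(leftover)} tensors still on meta, e.g. {leftover[:5]}")
```

If the returned list is non-empty on any rank, the names should narrow down which part of the model was never materialized before the sync step.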
