
training problem #40

@vision-heng

Hello, Professor! I get the following error when running the code on Windows 11 (my GPU has 8 GB of memory). Could you explain what it means and how to fix it? Thank you very much!

python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100
Namespace(K=2, base_lr1=0.1, base_lr2=0.1, baseline='lucir', branch_1='ss', branch_2='free', branch_mode='dual', ckpt_dir_fg='-', ckpt_label='exp01', custom_momentum=0.9, custom_weight_decay=0.0005, data_dir='data/seed_1993_subset_100_imagenet/data', dataset='cifar100', disable_gpu_occupancy=True, dist=0.5, dynamic_budget=False, epochs=160, eval_batch_size=128, fusion_lr=1e-08, gpu='0', icarl_T=2, icarl_beta=0.25, imgnet_backbone='resnet18', lr_factor=0.1, lw_mr=1, nb_cl=10, nb_cl_fg=50, nb_protos=20, num_classes=100, num_workers=1, random_seed=1993, resume=False, resume_fg=False, test_batch_size=100, the_lambda=5, train_batch_size=128)
Using gpu: 0
Total memory: 8192, used memory: 829
Occupy GPU memory in advance.
Files already downloaded and verified
Files already downloaded and verified
Order name:./logs/cifar100_nfg50_ncls10_nproto20_lucir_dual_b1ss_b2free_fixed_exp01\seed_1993_cifar100_order.pkl
Loading the saved class order
[68, 56, 78, 8, 23, 84, 90, 65, 74, 76, 40, 89, 3, 92, 55, 9, 26, 80, 43, 38, 58, 70, 77, 1, 85, 19, 17, 50, 28, 53, 13, 81, 45, 82, 6, 59, 83, 16, 15, 44, 91, 41, 72, 60, 79, 52, 20, 10, 31, 54, 37, 95, 14, 71, 96, 98, 97, 2, 64, 66, 42, 22, 35, 86, 24, 34, 87, 21, 99, 0, 88, 27, 18, 94, 11, 12, 47, 25, 30, 46, 62, 69, 36, 61, 7, 63, 75, 5, 32, 4, 51, 48, 73, 93, 39, 67, 29, 49, 57, 33]
Feature: 64 Class: 50
Setting the dataloaders ...
Check point name: ./logs/cifar100_nfg50_ncls10_nproto20_lucir_dual_b1ss_b2free_fixed_exp01\iter_4_b1.pth

Epoch: 0, learning rate: 0.1
Traceback (most recent call last):
  File "main.py", line 88, in <module>
    trainer.train()
  File "E:\AlgSpace\pycharm\AANets\trainer\trainer.py", line 171, in train
    cur_lambda, self.args.dist, self.args.K, self.args.lw_mr)
  File "E:\AlgSpace\pycharm\AANets\trainer\zeroth_phase.py", line 63, in incremental_train_and_eval_zeroth_phase
    outputs = b1_model(inputs)
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\AlgSpace\pycharm\AANets\models\modified_resnet_cifar.py", line 109, in forward
    x = self.fc(x)
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\AlgSpace\pycharm\AANets\models\modified_linear.py", line 37, in forward
    F.normalize(self.weight, p=2, dim=1))
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\functional.py", line 1371, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
