
Evaluation results of Stage 1 are lower than reported in the paper #27

@Eku127


Thanks for sharing the training dataset and code. I followed your guide to generate the training data and used the provided training code to successfully train a model on RxR + R2R + EnvDrop data. However, when evaluating on R2R_v1, I found that my results are significantly lower than those reported in Table 1 of your paper (NE = 5.43, OS = 62.5, SR = 52.8, SPL = 47.2).
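For reference when comparing the numbers in this issue, here is a minimal sketch of how NE, OS, SR, and SPL are commonly computed in R2R-style evaluation. It is not the StreamVLN evaluation code: the `Episode` record, the `vln_metrics` helper, and the 3 m success radius (the usual R2R / VLN-CE convention) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: the usual R2R / VLN-CE success radius of 3 meters.
SUCCESS_DISTANCE_M = 3.0


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative."""
    final_dist_to_goal: float    # distance from the STOP position to the goal (m)
    min_dist_to_goal: float      # closest the agent ever got to the goal (m)
    path_length: float           # length of the path the agent actually walked (m)
    shortest_path_length: float  # shortest-path length from start to goal (m)


def vln_metrics(episodes: List[Episode]) -> Dict[str, float]:
    """Return NE (meters) and OS / SR / SPL (percent) averaged over episodes.

    Assumes every episode ended with an explicit STOP action.
    """
    n = len(episodes)
    success = [e.final_dist_to_goal < SUCCESS_DISTANCE_M for e in episodes]
    oracle = [e.min_dist_to_goal < SUCCESS_DISTANCE_M for e in episodes]
    # SPL weights each success by shortest / max(actual, shortest).
    spl = sum(
        s * e.shortest_path_length / max(e.path_length, e.shortest_path_length)
        for s, e in zip(success, episodes)
    )
    return {
        "NE": sum(e.final_dist_to_goal for e in episodes) / n,
        "OS": 100.0 * sum(oracle) / n,
        "SR": 100.0 * sum(success) / n,
        "SPL": 100.0 * spl / n,
    }
```

Lower NE is better; higher OS, SR, and SPL are better. OS is always at least SR, because any episode that stops within the radius has also passed within it.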


I have also evaluated on R2R_v1-3, and the results are almost the same.


I did not modify the data generation code, and I kept all training parameters unchanged, except that I set per_device_train_batch_size=1 and gradient_accumulation_steps=4.
To help with troubleshooting, I've attached my training and evaluation scripts, as well as the training log.
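Changing per_device_train_batch_size and gradient_accumulation_steps also changes the global batch size unless the GPU count compensates. The sketch below shows that arithmetic for plain data-parallel training; the helper names are hypothetical, and the GPU counts used in the checks are assumptions, since the number of GPUs is not stated in this issue.

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    """Samples contributing to one optimizer step under data parallelism."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus


def accumulation_steps_for(target_batch_size: int,
                           per_device_train_batch_size: int,
                           num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach a target global batch size."""
    per_step = per_device_train_batch_size * num_gpus
    if target_batch_size % per_step != 0:
        raise ValueError(
            f"target batch {target_batch_size} is not divisible by {per_step}"
        )
    return target_batch_size // per_step
```

For example, with per_device_train_batch_size=1 and gradient_accumulation_steps=4, a global batch of 64 corresponds to 16 GPUs and a global batch of 128 to 32 GPUs. If the reference configuration was tuned for a different global batch, the learning-rate schedule may no longer match. Even at the same global batch size, accumulation is not always numerically identical to a larger per-device batch (for instance when losses are averaged over different token counts per micro-batch).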

Could you please advise whether you followed any additional steps not mentioned in the guide, or whether there are any known issues that might cause this performance drop? I would also like to know the results you obtained when you re-ran the experiments internally.

I would appreciate any suggestions to ensure the reproduction of the same results.

Attachments:

  1. Training Scripts
    streamvln_train.zip

  2. Evaluation Scripts
    streamvln_eval.zip

  3. Final Results (Updated)
    results.zip

  4. Training Log
    streamvln-12670.zip

  5. Trained ckpt (Updated)
    OneDrive

Update (250811)

  1. Results after the bug fix in the dataset code. Author's reply results: Evaluation results of Stage 1 are lower than reported in the paper #27 (comment)

| Source | Version / Note | NE | OS | SR | SPL |
|---|---|---|---|---|---|
| Paper | Table 3 Row 1 | 6.05 | 53.8 | 45.5 | 41.6 |
| Paper | Table 1 Row 20 | 5.43 | 62.5 | 52.8 | 47.2 |
| Author's Reply | v1 | 5.88 | 49.8 | 45.3 | 42.3 |
| Eku127's Results | v1, bs 64 | 6.36 | 48.2 | 42.1 | 38.3 |
| Eku127's Results | v1, bs 128 | 6.29 | 50.0 | 43.6 | 39.9 |
| Author's Reply | v1-3 | 5.90 | 51.4 | 46.7 | 43.3 |
| Eku127's Results | v1-3, bs 64 | 5.93 | 51.4 | 46.0 | 42.3 |
| Eku127's Results | v1-3, bs 128 | 6.34 | 50.4 | 43.9 | 40.0 |
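To make the remaining gap explicit, the snippet below subtracts the author's re-run from each of Eku127's runs on the same split, using only the numbers in the table above. The `RESULTS` dictionary and the `gap` helper are ad hoc names for this illustration.

```python
# Numbers copied from the table above, as (NE, OS, SR, SPL).
# Lower NE is better; higher OS / SR / SPL is better.
METRICS = ("NE", "OS", "SR", "SPL")
RESULTS = {
    ("Author's Reply", "v1"): (5.88, 49.8, 45.3, 42.3),
    ("Eku127's Results", "v1 bs 64"): (6.36, 48.2, 42.1, 38.3),
    ("Eku127's Results", "v1 bs 128"): (6.29, 50.0, 43.6, 39.9),
    ("Author's Reply", "v1-3"): (5.90, 51.4, 46.7, 43.3),
    ("Eku127's Results", "v1-3 bs 64"): (5.93, 51.4, 46.0, 42.3),
    ("Eku127's Results", "v1-3 bs 128"): (6.34, 50.4, 43.9, 40.0),
}


def gap(run, reference):
    """Per-metric difference (run - reference), rounded to 2 decimals."""
    return {
        name: round(value - ref, 2)
        for name, value, ref in zip(METRICS, RESULTS[run], RESULTS[reference])
    }


if __name__ == "__main__":
    for version in ("v1", "v1-3"):
        reference = ("Author's Reply", version)
        for bs in ("bs 64", "bs 128"):
            run = ("Eku127's Results", f"{version} {bs}")
            print(version, bs, gap(run, reference))
```

On SR this gives -3.2 (bs 64) and -1.7 (bs 128) on v1, and -0.7 (bs 64) and -2.8 (bs 128) on v1-3.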
