It seems there is a problem with the "make_std_mask" code.


import torch
tgt = torch.tensor([[1, 2, 3, 0, 0]])
pad = 0

padding_mask = (tgt != pad).unsqueeze(-2)  # shape: [1, 1, 5]
sub_mask = torch.triu(torch.ones((1, 5, 5)), diagonal=1).bool()
sub_mask = ~sub_mask

final_mask = padding_mask & sub_mask  # shape: [1, 5, 5]
print(final_mask[0].int())  

**Assume the input is [1, 2, 3, 0, 0]. The output is:**

tensor([[1, 0, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0]], dtype=torch.int32)

**After changing the line to padding_mask = (tgt != pad).unsqueeze(-1), I obtain the correct output:**

tensor([[1, 0, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]], dtype=torch.int32)
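
For anyone comparing the two variants, here is a minimal sketch of why they differ. It only shows how broadcasting places the padding mask on either the columns or the rows of the 5x5 attention mask. The names key_mask, query_mask and causal are mine for illustration and are not taken from make_std_mask, and torch.tril is used here as an equivalent of inverting the triu mask above.

```python
import torch

tgt = torch.tensor([[1, 2, 3, 0, 0]])
pad = 0
not_pad = tgt != pad  # shape: [1, 5]

# Lower-triangular causal mask: row i may attend to columns <= i.
causal = torch.tril(torch.ones(1, 5, 5)).bool()

# unsqueeze(-2) -> [1, 1, 5]: broadcast down the rows, so padded COLUMNS are hidden.
key_mask = not_pad.unsqueeze(-2)
# unsqueeze(-1) -> [1, 5, 1]: broadcast across the columns, so padded ROWS are hidden.
query_mask = not_pad.unsqueeze(-1)

mask_keys = key_mask & causal      # reproduces the first output above
mask_queries = query_mask & causal  # reproduces the second output above

print(key_mask.shape, query_mask.shape)
print(mask_keys[0].int())
print(mask_queries[0].int())
```

So the two versions mask different axes: unsqueeze(-2) stops every position from attending to padded positions, while unsqueeze(-1) blanks the rows that belong to padded positions. Which of the two is wanted depends on how the downstream attention and loss treat padded rows, and that part is not shown in this issue.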

It seems there is a problem with the "make_std_mask" code. #137
