Replies: 2 comments 1 reply
-
Dear Xavier,
Thank you for your message and question.
I'll check this and get back to you in a few days (I'm traveling right
now).
In the meantime, you can continue because it doesn't change the reasoning
behind this small example.
Best regards,
Denis
-
Dear Xavier,
I managed to access the code.
The code is OK. However, you're right: the code shows that the k_d divisor was simplified to 1 for the example.
I'll update the comment in the next few days.
Best regards,
Denis
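For reference, here is a minimal sketch of the scaling step under discussion. It assumes 3 toy query/key vectors of dimension d_k = 3 (placeholder values, not the notebook's actual matrices) and shows that dividing by the exact factor sqrt(d_k) instead of the simplified 1 only rescales the scores by a constant, so the reasoning of the example is unchanged.
import numpy as np

# Toy Q and K: 3 query/key vectors of dimension d_k = 3
# (placeholder values for illustration, not the notebook's actual matrices)
Q = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 1.0]])

d_k = K.shape[1]  # key dimension, 3 in this toy example

simplified_scores = (Q @ K.transpose()) / 1         # the notebook's simplification
scaled_scores = (Q @ K.transpose()) / np.sqrt(d_k)  # exact scaling by sqrt(d_k)

print(simplified_scores)
print(scaled_scores)  # same matrix, just divided by sqrt(3) ≈ 1.73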
-
Hi,
Thanks for providing notebooks alongside your books. I just bought it a few days ago and am loving it.
One question about the Multi_Head_attention notebook in CH02:
print("Step 4: Scaled Attention Scores")
k_d=1 #square root of k_d=3 rounded down to 1 for this example
attention_scores = (Q @ K.transpose())/k_d
print(attention_scores)
In the comment on that line, shouldn't it be k_d = 4, with 3 being the number of inputs in x and 4 being the number of dimensions?
My question is: why is k_d different from d_model?