Replies: 2 comments 1 reply
-
Dear Xavier,
Thank you for your message and question.
I'll check this and get back to you in a few days (I'm traveling right
now).
In the meantime, you can continue because it doesn't change the reasoning
behind this small example.
Best regards,
Denis
-
Dear Xavier,
I managed to access the code.
The code is OK. However, you're right: the code shows that the k_d divisor was simplified to 1 for the example.
I'll update the comment in the next few days.
Best regards,
Denis
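For reference, here is a minimal sketch of the scaling step under discussion. It assumes 3 toy query/key vectors of dimension d_k = 3 (placeholder values, not the notebook's actual matrices) and shows that dividing by the exact factor sqrt(d_k) instead of the simplified 1 only rescales the scores by a constant, so the reasoning of the example is unchanged.
import numpy as np

# Toy Q and K: 3 query/key vectors of dimension d_k = 3
# (placeholder values for illustration, not the notebook's actual matrices)
Q = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 1.0]])

d_k = K.shape[1]  # key dimension, 3 in this toy example

simplified_scores = (Q @ K.transpose()) / 1         # the notebook's simplification
scaled_scores = (Q @ K.transpose()) / np.sqrt(d_k)  # exact scaling by sqrt(d_k)

print(simplified_scores)
print(scaled_scores)  # same matrix, just divided by sqrt(3) ≈ 1.73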
-
Hi,
Thanks for providing notebooks alongside your books. I just bought it a few days ago and am loving it.
One question about the Multi_Head_attention notebook in CH02:
print("Step 4: Scaled Attention Scores")
k_d=1 #square root of k_d=3 rounded down to 1 for this example
attention_scores = (Q @ K.transpose())/k_d
print(attention_scores)
In the comment on that line, shouldn't it be k_d = 4, with 3 being the number of inputs in x and 4 being the number of dimensions?
My question is: why is k_d different from d_model?