#concept #todo : don’t fully grasp this intuitively yet…
QKV
stands for >> query, key, value
Query
in QKV >> vectors that represent input elements
Key
in QKV >> vectors that represent input elements, but for the future prediction
Value
in QKV >> actual content from input that is aggregated into the output
References
Notes
For unsupervised language model training like GPT, 𝑄,𝐾,𝑉 are usually from the same source, so such operation is also called self-attention.
For the machine translation task in the second paper, it first ≠ self-attention separately to source and target sequences, then on top of that it applies another attention where 𝑄Q is from the target sequence and 𝐾,𝑉K,V are from the source sequence.
For recommendation systems, 𝑄Q can be from the target items, 𝐾,𝑉K,V can be from the user profile and history.