Multi-query attention
Suggested in Shazeer2019fast.
See also Grouped query attention
The idea is to share keys and values across attention heads in the Transformer model's attention mechanism, so each head has its own queries but all heads read from a single shared key/value projection.
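A minimal sketch of this in PyTorch, assuming a single shared key/value projection per layer (names, shapes, and the function itself are illustrative, not the reference implementation from Shazeer2019fast):

```python
import torch
import torch.nn.functional as F

def multi_query_attention(x, w_q, w_k, w_v, n_heads):
    """x: (batch, seq, d_model); w_q: (d_model, n_heads * d_head);
    w_k, w_v: (d_model, d_head) -- one shared key/value projection for all heads."""
    batch, seq, d_model = x.shape
    d_head = w_k.shape[1]

    # Per-head queries: (batch, n_heads, seq, d_head)
    q = (x @ w_q).view(batch, seq, n_heads, d_head).transpose(1, 2)
    # Shared keys/values: (batch, 1, seq, d_head), broadcast over all heads
    k = (x @ w_k).unsqueeze(1)
    v = (x @ w_v).unsqueeze(1)

    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (batch, n_heads, seq, seq)
    weights = F.softmax(scores, dim=-1)
    out = weights @ v                                    # (batch, n_heads, seq, d_head)
    return out.transpose(1, 2).reshape(batch, seq, n_heads * d_head)
```

The practical benefit is a much smaller key/value cache during incremental decoding, since only one key and one value vector per position are stored instead of one per head.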