Grouped-query attention
Attention mechanism used in LLaMA.
An improvement on multi-query attention, proposed in Ainslie2023gqa.
It interpolates between multi-head and multi-query attention: the query heads are divided into groups, and each group shares a single key/value head.
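A minimal NumPy sketch of the idea (function and variable names are illustrative, not from the paper): each KV head is repeated to serve its group of query heads, so `n_kv_heads = n_heads` recovers multi-head attention and `n_kv_heads = 1` recovers multi-query attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, n_kv_heads):
    # q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d).
    # Each group of n_heads // n_kv_heads query heads shares one KV head.
    n_heads, seq, d = q.shape
    group_size = n_heads // n_kv_heads
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group_size, axis=0)
    v = np.repeat(v, group_size, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (n_heads, seq, seq)
    return softmax(scores) @ v                      # (n_heads, seq, d)
```

The KV cache shrinks by a factor of `n_heads / n_kv_heads` relative to multi-head attention, which is the main inference-time benefit.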