Multi-query attention

Suggested in Shazeer2019fast ("Fast Transformer Decoding: One Write-Head Is All You Need").

See also Grouped query attention

The idea is to share a single set of keys and values across all attention heads in the Transformer model's Attention mechanism; each head keeps its own queries.
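
A minimal sketch of the idea, assuming PyTorch and illustrative shapes/names (not from the source): queries get one projection per head, while one shared key and value projection is broadcast over all heads.

```python
import torch
import torch.nn.functional as F

def multi_query_attention(x, w_q, w_k, w_v, n_heads):
    """x: (batch, seq, d_model); w_q: (d_model, n_heads * d_head);
    w_k, w_v: (d_model, d_head) -- one key/value projection shared by all heads."""
    batch, seq, d_model = x.shape
    d_head = w_k.shape[1]

    # Per-head queries: (batch, n_heads, seq, d_head)
    q = (x @ w_q).view(batch, seq, n_heads, d_head).transpose(1, 2)
    # Single shared key/value head: (batch, 1, seq, d_head), broadcast over query heads
    k = (x @ w_k).unsqueeze(1)
    v = (x @ w_v).unsqueeze(1)

    scores = q @ k.transpose(-2, -1) / d_head ** 0.5    # (batch, n_heads, seq, seq)
    out = F.softmax(scores, dim=-1) @ v                  # (batch, n_heads, seq, d_head)
    return out.transpose(1, 2).reshape(batch, seq, n_heads * d_head)

# Usage: 8 query heads, one shared key/value head
x = torch.randn(2, 16, 64)
y = multi_query_attention(
    x,
    w_q=torch.randn(64, 8 * 8),
    w_k=torch.randn(64, 8),
    w_v=torch.randn(64, 8),
    n_heads=8,
)  # (2, 16, 64)
```

The payoff is in decoding: the KV cache holds one key/value head instead of n_heads, cutting its memory and bandwidth by roughly that factor.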