Multi-query attention

Suggested in Shazeer2019fast ("Fast Transformer Decoding: One Write-Head Is All You Need").

See also Grouped query attention

The idea is to share a single set of keys and values across all attention heads in the Transformer model's Attention mechanism; each head keeps its own queries.
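
A minimal sketch of the idea, assuming PyTorch and illustrative shapes/names (not from the source): queries get one projection per head, while one shared key and value projection is broadcast over all heads.

```python
import torch
import torch.nn.functional as F

def multi_query_attention(x, w_q, w_k, w_v, n_heads):
    """x: (batch, seq, d_model); w_q: (d_model, n_heads * d_head);
    w_k, w_v: (d_model, d_head) -- one key/value projection shared by all heads."""
    batch, seq, d_model = x.shape
    d_head = w_k.shape[1]

    # Per-head queries: (batch, n_heads, seq, d_head)
    q = (x @ w_q).view(batch, seq, n_heads, d_head).transpose(1, 2)
    # Single shared key/value head: (batch, 1, seq, d_head), broadcast over query heads
    k = (x @ w_k).unsqueeze(1)
    v = (x @ w_v).unsqueeze(1)

    scores = q @ k.transpose(-2, -1) / d_head ** 0.5    # (batch, n_heads, seq, seq)
    out = F.softmax(scores, dim=-1) @ v                  # (batch, n_heads, seq, d_head)
    return out.transpose(1, 2).reshape(batch, seq, n_heads * d_head)

# Usage: 8 query heads, one shared key/value head
x = torch.randn(2, 16, 64)
y = multi_query_attention(
    x,
    w_q=torch.randn(64, 8 * 8),
    w_k=torch.randn(64, 8),
    w_v=torch.randn(64, 8),
    n_heads=8,
)  # (2, 16, 64)
```

The payoff is in decoding: the KV cache holds one key/value head instead of n_heads, cutting its memory and bandwidth by roughly that factor.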