Baichuan Intelligence releases new strategy for optimizing Transformer efficiency

Wang Bingning, pre-training director at Baichuan Intelligence, presented the company's latest research on Transformer efficiency optimization at the "2024 Global Machine Learning Technology Conference". He proposed that two optimization strategies, GQA (grouped-query attention) and MQA (multi-query attention), can effectively address the I/O bottleneck that Transformers face in the decoding stage, thereby improving inference efficiency.
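
To make the I/O argument concrete, here is a rough sketch of how the key/value cache that must be read at every decoding step shrinks when query heads share key/value heads. The model dimensions below (32 layers, 32 query heads, head size 128, 4096 cached tokens, 2-byte elements) are illustrative assumptions, not figures from the talk, and the helper `kv_cache_bytes` is a hypothetical name introduced only for this example.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Assumed (hypothetical) model shape, chosen only for illustration.
cfg = dict(num_layers=32, head_dim=128, seq_len=4096)
num_query_heads = 32

# Standard multi-head attention: one KV head per query head.
mha = kv_cache_bytes(num_kv_heads=num_query_heads, **cfg)
# GQA: query heads are split into groups that share one KV head (8 groups assumed).
gqa = kv_cache_bytes(num_kv_heads=8, **cfg)
# MQA: all query heads share a single KV head.
mqa = kv_cache_bytes(num_kv_heads=1, **cfg)

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**20:.0f} MiB per sequence")
```

Under these assumptions the cache that each decoding step has to stream from memory is 4 times smaller with GQA and 32 times smaller with MQA than with standard multi-head attention, which is where the reduction in I/O comes from.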