Microsoft has open-sourced enhanced transformer inference optimizations in ONNX Runtime and extended them to run on both GPU and CPU.
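
As a minimal sketch of what using these optimizations looks like in practice, the snippet below loads an exported transformer model with ONNX Runtime in Python and selects GPU or CPU execution through the providers list. The model path, input names, and shapes here are illustrative assumptions, not part of the announcement.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical exported transformer model (path and input names are assumptions).
MODEL_PATH = "bert_model.onnx"

# ONNX Runtime picks the first available provider, so this runs on GPU
# when CUDA is available and falls back to CPU otherwise.
session = ort.InferenceSession(
    MODEL_PATH,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Dummy BERT-style inputs: batch of 1, sequence length 128 (shapes are illustrative).
batch, seq_len = 1, 128
inputs = {
    "input_ids": np.random.randint(0, 30522, (batch, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((batch, seq_len), dtype=np.int64),
}

# Run inference; the output names and shapes depend on how the model was exported.
outputs = session.run(None, inputs)
print([o.shape for o in outputs])
```

The same session code works unchanged on CPU-only machines, since provider selection happens at session creation rather than in the inference call.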