Transformer-based models have become the norm in computer vision and natural language processing research. Since their introduction in 2017, transformer architectures have delivered exceptional performance gains, driving significant advances in deep learning and artificial intelligence. However, the heavy computational and memory demands of their self-attention mechanism, which is needed to capture diverse syntactic and semantic representations from long input sequences, have limited their broader real-world applicability. In a recent research paper titled 'Transformers with Multiresolution Attention Heads,' researchers propose MrsFormer, a novel transformer architecture that employs Multiresolution-head Attention (MrsHA) to approximate output sequences. Compared to