Abstract: Owing to the substantial storage and computation demands of large language models (LLMs), research interest in their hardware acceleration has surged. As a technique ...