Fused-ExpMul

In this work we implement FlashAttention-based hardware accelerators in C++ with our proposed ExpMul operator, which fuses the floating-point exponential function and the subsequent multiplication into simple add and shift operations in fixed-point arithmetic. No extra conversion back to the floating-point domain is needed, since the result is produced directly as a floating-point number. To evaluate power metrics we run inference with Google's FLAN-T5 LLM: we execute the PyTorch model from Hugging Face and extract inter-layer results for the different tasks of the GLUE benchmark, which are then used as inputs in main.cc.

Most of the floating-point functionality relies on the Fast-Float4HLS library, which is publicly available on GitHub.
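
To give a flavor of the fusion, the following stand-alone sketch approximates exp(x) * v with a Schraudolph/Mitchell-style log-domain trick on ordinary 32-bit floats. It is only an illustration of why the operation reduces to fixed-point adds and shifts; it is not the ExpMul datapath in fused_operators.h, which targets bfloat16 operands and HLS synthesis.

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical single-precision sketch of an exp-times-value fusion.
// exp(x) * v = 2^(x*log2(e)) * v, and because the bit pattern of a positive
// IEEE-754 float is approximately linear in its base-2 logarithm, multiplying
// v by 2^t is roughly an integer addition of t in Q23 fixed point: the
// integer part of t lands in v's exponent bits and the fractional part
// linearly perturbs its mantissa.
static inline float expmul_approx(float x, float v) {
    const float log2e_q23 = 12102203.0f;          // 2^23 / ln(2)
    int32_t bits;
    std::memcpy(&bits, &v, sizeof bits);          // reinterpret v as an integer
    bits += static_cast<int32_t>(x * log2e_q23);  // in hardware: a fixed-point add/shift
    float y;
    std::memcpy(&y, &bits, sizeof y);             // the result is already a float
    return y;
}

int main() {
    float x = -1.5f, v = 0.75f;
    std::printf("approx %f vs exact %f\n", expmul_approx(x, v), std::exp(x) * v);
    return 0;
}

For x = -1.5 and v = 0.75 the sketch returns roughly 0.167 against an exact 0.16735, i.e. the usual few-percent error of linear 2^f interpolation.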

Repository Hierarchy

This repository is organized as follows:

.
├── src
│   ├── attention.h
│   ├── bf16_arithm.h
│   ├── defines.h
│   ├── file_io.h
│   ├── fused_operators.h
│   ├── logging.h
│   ├── main.cc
│   ├── math_ops.h
│   └── reduction.h
│
├── utils
│   ├── gen_pwl_coeff.py
│   └── pack.py
│
├── LICENSE
├── README.md
└── setup.sh
  • ./src/ This directory contains the C++ implementation of the FlashAttention-based accelerators with the ExpMul operator.
    • attention.h implements the FlashAttention accelerators (a schematic view of the recurrence they compute is sketched after this list).
    • fused_operators.h implements the ExpMul operator.
  • ./utils/ This directory contains Python utility scripts.
  • ./setup.sh A bash script that fetches all required dependencies.
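
The snippet below is a schematic, scalar C++ view of where such a fused operator slots into the FlashAttention online-softmax recurrence: every exp(.)*value product in the accumulator update is a candidate for ExpMul. The function name expmul, the plain float types, and the untiled loops are illustrative assumptions; they do not mirror the blocked, HLS-oriented datapath in attention.h.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Placeholder for the fused operator; the hardware unit replaces this
// exp-then-multiply pair with a single fixed-point add/shift datapath.
static float expmul(float x, float v) { return std::exp(x) * v; }

// One query row of online-softmax attention. scores[j] = q . k_j (already
// scaled by 1/sqrt(d)); V is the N x d matrix of value rows.
std::vector<float> attention_row(const std::vector<float>& scores,
                                 const std::vector<std::vector<float>>& V) {
    const std::size_t d = V[0].size();
    std::vector<float> acc(d, 0.0f);                    // unnormalized output
    float m = -std::numeric_limits<float>::infinity();  // running max
    float l = 0.0f;                                     // running exp-sum

    for (std::size_t j = 0; j < scores.size(); ++j) {
        const float m_new = std::max(m, scores[j]);
        const float scale = std::exp(m - m_new);        // rescales old partial sums
        l = l * scale + std::exp(scores[j] - m_new);
        for (std::size_t k = 0; k < d; ++k)
            acc[k] = acc[k] * scale + expmul(scores[j] - m_new, V[j][k]); // fused exp*mul
        m = m_new;
    }
    for (float& a : acc) a /= l;                        // final softmax normalization
    return acc;
}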

Pending Features

  • Python scripts for automatically loading FLAN-T5 and extracting its inter-layer inputs on GLUE.
  • Fix dependency issues between the HLS math library and Fast-Float4HLS.

Reference

TODO

Contributors

Currently active: Kosmas Alexandridis and Giorgos Dimitrakopoulos

License

Fused-ExpMul is licensed under the MIT License. You are free to redistribute work derived from Fused-ExpMul.
