In this work we implement in C++ hardware accelerators for attention inference following FlashAttention-2 and our proposed Flash-D approach, which computes the output vector as a weighted sum of its previous value and an incoming value vector. The weight is computed incrementally, and its formulation eliminates the max computation otherwise needed for numerical stability and hides the final vector division inside a sigmoid calculation. To evaluate power metrics, we run inference using Microsoft's PromptBench framework. More specifically, we run PyTorch models from Hugging Face and extract inter-layer results for different prompts to use as inputs in main.cc.
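As a rough illustration of the recurrence described above, the sketch below updates a single output row in plain C++ floats. This is not the HLS code in flash_atten.h, and the function and variable names are illustrative only; it assumes the weight takes the form w_j = sigmoid(s_j - log(l_{j-1})), with l the running softmax denominator, which follows from exp(s_j) / (l_{j-1} + exp(s_j)) = sigmoid(s_j - log(l_{j-1})) and is why no running max and no final division are needed.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Division-free online attention update for one output row (plain-float reference,
// not the HLS implementation; names are illustrative). For scores of moderate
// magnitude this matches standard softmax attention up to rounding.
void online_attention_row(const std::vector<float>& scores,          // s_1..s_N (scaled q.k_j)
                          const std::vector<std::vector<float>>& V,  // N value vectors of dim d
                          std::vector<float>& out) {                 // attention output, dim d
  const std::size_t d = V.front().size();
  out = V.front();          // o_1 = v_1, since the first weight is exp(s_1)/exp(s_1) = 1
  float log_l = scores[0];  // log of the running softmax denominator, l_1 = exp(s_1)
  for (std::size_t j = 1; j < scores.size(); ++j) {
    // w_j = exp(s_j) / (l_{j-1} + exp(s_j)) = sigmoid(s_j - log(l_{j-1}))
    const float w = 1.0f / (1.0f + std::exp(log_l - scores[j]));
    for (std::size_t k = 0; k < d; ++k)
      out[k] = (1.0f - w) * out[k] + w * V[j][k];  // o_j = (1 - w_j) * o_{j-1} + w_j * v_j
    // log(l_j) = log(l_{j-1} + exp(s_j)) = s_j - log(w_j): no max tracking or division
    log_l = scores[j] - std::log(w);
  }
}

int main() {
  // Tiny example: 3 key/value pairs, head dimension 2.
  std::vector<float> s = {0.5f, -1.0f, 2.0f};
  std::vector<std::vector<float>> V = {{1.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f}};
  std::vector<float> o;
  online_attention_row(s, V, o);
  std::printf("o = [%f, %f]\n", o[0], o[1]);
  return 0;
}
```

Feeding one query row's scaled Q.K scores through this loop yields the same result as standard softmax attention up to floating-point rounding, provided the scores stay in a range where the float sigmoid does not underflow.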
Most of the floating-point functionality utilizes the Fast-Float4HLS library, publicly available on GitHub.
This repository is organized as follows:
.
├── src
│ ├── defines.h
│ ├── flash_atten.h
│ ├── file_io.h
│ ├── fp_arithm.h
│ ├── logging.h
│ ├── main.cc
│ ├── math_ops.h
│ └── reduction.h
│
├── utils
│ ├── gen_pwl_coeff.py
│ └── pack.py
│
├── LICENSE
├── README.md
└── setup.sh
./src/
This directory contains the C++ source files. The flash_atten.h file contains the implementation of the FlashAttention-2 and Flash-D accelerators.
./utils/
This directory contains Python utility scripts.
./setup.sh
A bash script to fetch all required dependencies.
TODO
- Update design files to their optimized versions.
- Add Python script flows for running PromptBench outputs on the C++ designs.
- Provide extension files for 8-bit reduced-precision Fast-Float4HLS datatypes.
Contributors
Currently active: Kosmas Alexandridis and Giorgos Dimitrakopoulos

License
Flash-D is licensed under the MIT License. You are free to redistribute your work derived from Flash-D.