
[Roadmap] FlashAttention3 Support as SGLang Attention Backend #4709

@hebiao064

Description


Functionality

Documentation and Benchmark

Performance Optimization and Accuracy Problems

Success Criteria

  • Latency should be on par with vLLM's FlashAttention3 backend and SGLang's FlashInfer backend
  • Accuracy should be on par with vLLM's FlashAttention3 backend and SGLang's FlashInfer backend
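As a hedged sketch of how the parity criteria above might be checked: the commands below assume SGLang exposes an `--attention-backend fa3` server flag and ships a `sglang.bench_serving` script (both are assumptions about the SGLang CLI of this era; verify flag names against your installed version, and substitute your own model path).

```shell
# Launch a server with the FlashAttention3 backend under test.
# Assumption: "fa3" is the accepted value; the default backend
# (e.g. flashinfer) can be launched the same way for comparison.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --attention-backend fa3 \
  --port 30000

# In a second shell: measure serving latency/throughput, then repeat
# against a server launched with the baseline backend and compare.
python -m sglang.bench_serving \
  --backend sglang \
  --port 30000 \
  --num-prompts 200
```

Running the same benchmark against a vLLM FlashAttention3 deployment gives the cross-framework latency comparison; accuracy parity would additionally need an evaluation benchmark (e.g. GSM8K) run against each backend.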

Other issues we surfaced but did not scope into this task
