Request for more experimental results to compare with other architectures

Hi!
This work is pretty interesting, but I think there should be more results, like those in "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight", where the authors replace local self-attention with depth-wise convolution in Swin Transformer. Since you go a step further with a simpler architecture than Swin Transformer, **I wonder whether ConvMixer can achieve similar performance on object detection and semantic segmentation**.
