Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue #47