Handling with long answers #42

👋 @orannahum-qualifire,

Thanks for creating this discussion! Building on what @dskarbrevik said above, we are actively working on two scorers designed specifically for long-form responses: the graph-based scorers proposed by Jiang et al., 2024 (#19) and LUQ, proposed by Zhang et al., 2024 (#46). The former decompose responses into claims, while the latter averages across sentences.
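To make the "averages across sentences" idea concrete, here is a rough sketch of how a sentence-level score could be rolled up into one response-level score. It is not the LUQ implementation from #46 or any UQLM API: `split_sentences`, `response_score`, and `toy_scorer` are hypothetical names, the regex splitter is deliberately naive, and the toy scorer only stands in for a real per-sentence confidence measure.

```python
import re
from statistics import mean
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def response_score(response: str, sentence_scorer: Callable[[str], float]) -> float:
    """Average a per-sentence confidence score over the whole response."""
    sentences = split_sentences(response)
    if not sentences:
        raise ValueError("response contains no sentences")
    return mean(sentence_scorer(s) for s in sentences)


def toy_scorer(sentence: str) -> float:
    """Hypothetical stand-in for a real sentence-level confidence scorer."""
    return 0.2 if "maybe" in sentence.lower() else 0.9


answer = "Paris is in France. Maybe it has 40 million residents."
score = response_score(answer, toy_scorer)
print(round(score, 2))
```

With a plain average, one low-confidence sentence pulls the response score down in proportion to the length of the answer, so a long response with a single weak sentence can still score fairly high.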

Thanks for sharing the RAGTruth benchmark. I took a look, and from what I understand, each row in this benchmark dataset contains a) a prompt containing a question and context, b) a generated response, and c) an indicator of whether the generated response contains a hallucination. To evaluate the effectiveness of UQLM s…

Answer selected by dylanbouchard
Category
Q&A
3 participants