Jiangjie Chen
DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?
We introduce DetectBench, a benchmark that tests whether LLMs can detect implicit evidence in long contexts. Existing LLMs lag behind human performance, but the proposed Detective Reasoning Prompt and Finetuning methods can significantly improve their evidence detection and reasoning capabilities.
Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao