Resources

DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

We introduce DetectBench, a benchmark for testing LLMs’ ability to detect evidence in long contexts, and demonstrate that while existing LLMs lag behind human performance, the proposed Detective Reasoning Prompt and Finetuning methods can significantly improve their evidence detection and reasoning capabilities.

Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao
