GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick

One significant limitation of the GM watermark is that it produces identical responses to the same query.

Abstract

Large language models (LLMs) generate remarkably human-like text, but they also raise concerns about misuse in fake news and academic dishonesty. Decoding-based watermarks, particularly the GumbelMax-trick-based watermark (GM watermark), are a standout solution for safeguarding machine-generated text because of their notable detectability. However, the GM watermark faces a major challenge in generation diversity: it always yields identical outputs for the same prompt, which harms both diversity and user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) demonstrates superior performance in high-diversity settings, with its AUROC score outperforming those of the two alternative variants by 0.1 to 0.3 and surpassing other decoding-based watermarking methods by a minimum of 0.1.
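To make the diversity problem concrete, the following is a minimal sketch of the generic Gumbel-max trick with keyed noise. It is not the authors' implementation: the function names, the SHA-256 seeding scheme, and the toy logits are all assumptions made for illustration. It only shows why a watermark that derives its Gumbel noise from a fixed key and the preceding tokens selects the same token every time for the same input.

```python
import hashlib

import numpy as np


def keyed_gumbel_noise(key: str, context: tuple, vocab_size: int) -> np.ndarray:
    """Derive Gumbel(0, 1) noise deterministically from a key and the preceding tokens.

    The hashing scheme here is a hypothetical choice for illustration only.
    """
    digest = hashlib.sha256(f"{key}|{context}".encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = np.random.default_rng(seed)
    # Clip away from 0 and 1 so both logarithms stay finite.
    u = np.clip(rng.random(vocab_size), 1e-12, 1.0 - 1e-12)
    return -np.log(-np.log(u))


def gumbel_max_token(logits: np.ndarray, key: str, context: tuple) -> int:
    """Pick the next token as argmax(logits + keyed Gumbel noise)."""
    noise = keyed_gumbel_noise(key, context, logits.shape[0])
    return int(np.argmax(logits + noise))


# Toy next-token logits over a 4-token vocabulary (made-up values).
logits = np.array([2.0, 1.0, 0.5, 0.1])

first = gumbel_max_token(logits, key="secret", context=(5, 7))
second = gumbel_max_token(logits, key="secret", context=(5, 7))
print(first == second)  # the same key and context always select the same token
```

With truly random Gumbel noise, the argmax is a sample from the softmax distribution over the logits. Once the noise is fixed by the key and context, the sampling step becomes deterministic, which is the source of the identical responses described above.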

Publication
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)
Jiangjie Chen
Ph.D. Candidate

His research interests mainly include natural language reasoning and large language models.