Generating Question-Answer Pairs from Document with Sequence to Sequence Modeling

Author: Pei-Ju Tang

Publish Year: 2018-07

Updated: March 27, 2025

Abstract

In recent years, many studies in question answering and question generation have been applied in reading comprehension with satisfactory performance. Question answering aims to answer questions by reading documents, and question generation attempts to generate diverse questions from given documents and as reading test questions. However, some questions generated in question generation studies are beyond the scope of the given documents themselves, so that the machine cannot find an appropriate answer. In this thesis, we propose an approach to resolve the above question-answer unpaired situations. Given a document and the question type, the system can output a question-answer pair with quality. The question and answer generated must focus on the input document with fluency, relevance, and correctness. In this thesis, we use the attention-based sequence-to-sequence model and add hierarchical input to the model encoder. In order to solve the problem that the model generating question and answer from huge vocabulary cannot converge, the output question and answer adopts a dynamic vocabulary, which includes not only commonly used words, but also words dynamically changing with the document. Such approach makes the training to converge and improves model capabilities. The trained model can generate question-answer pairs as well as do question answering and question generation. The resulting performance is better than the retrieval-based model.