Keyphrase generation aims to produce a set of phrases summarizing the
essentials of a given document. Conventional methods typically apply an
encoder-decoder architecture to generate keyphrases for an input document.
Because they attend only to the current document, they inevitably omit
crucial corpus-level information carried by other similar documents, namely
cross-document dependencies and latent topics. In this
paper, we propose CDKGen, a Transformer-based keyphrase generator that
extends the Transformer to global attention through cross-document attention
networks. These networks incorporate available documents as references, so
that keyphrase generation is guided by topic information. On top of the
proposed Transformer + cross-document attention architecture, we also adopt a
copy mechanism that selects appropriate words from documents, allowing the
model to handle out-of-vocabulary words in keyphrases. Experimental results
on five benchmark datasets demonstrate the validity and effectiveness of
our model, which achieves state-of-the-art performance on all datasets.
Further analyses confirm that the proposed model is able to generate keyphrases
consistent with references while keeping sufficient diversity. The code of
CDKGen is available at https://github.com/SVAIGBA/CDKGen.
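
To illustrate how a copy mechanism can let a generator emit out-of-vocabulary words, the following is a minimal sketch of a generic pointer-generator style mixture. It is not taken from the CDKGen code base: the function name `copy_augmented_distribution`, the mixing weight `p_gen`, and the toy inputs are all assumptions made for illustration, and the actual model may compute and combine these quantities differently.

```python
def copy_augmented_distribution(vocab, vocab_probs, source_tokens, attention, p_gen):
    """Mix a generation distribution with a copy distribution.

    Hypothetical helper (not from CDKGen). `vocab_probs` is a distribution over
    the fixed vocabulary, `attention` is a distribution over source positions,
    and `p_gen` in [0, 1] is the probability of generating rather than copying.
    """
    # Extend the fixed vocabulary with source words it does not contain.
    extended = list(vocab)
    index = {word: i for i, word in enumerate(extended)}
    for token in source_tokens:
        if token not in index:
            index[token] = len(extended)
            extended.append(token)

    # Generation mass goes to in-vocabulary words only.
    final = [0.0] * len(extended)
    for i, prob in enumerate(vocab_probs):
        final[i] = p_gen * prob

    # Copy mass is routed from each source position to the word found there,
    # which is how out-of-vocabulary words receive non-zero probability.
    for token, weight in zip(source_tokens, attention):
        final[index[token]] += (1.0 - p_gen) * weight

    return extended, final


# Toy example: "cdkgen" is absent from the fixed vocabulary.
vocab = ["<unk>", "keyphrase", "generation"]
vocab_probs = [0.5, 0.3, 0.2]
source_tokens = ["cdkgen", "keyphrase"]
attention = [0.75, 0.25]

extended, final = copy_augmented_distribution(
    vocab, vocab_probs, source_tokens, attention, p_gen=0.5
)
for word, prob in zip(extended, final):
    print(word, prob)
```

In this toy setting the out-of-vocabulary source word receives probability only through the copy term, while a word that appears both in the vocabulary and in the source accumulates mass from both terms.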