Published on Tue Aug 10 2021
Megadiff: A Dataset of 600k Java Source Code Changes Categorized by Diff Size
This paper presents Megadiff, a dataset of source code diffs. It focuses on
Java, with strict inclusion criteria based on commit message and diff size.
Megadiff contains 663 029 Java diffs that can be used for research on commit
comprehension, fault localization, automated program repair, and machine
learning on code changes.
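
As an illustration of what categorizing by diff size can involve, the sketch below counts the added and removed lines in a unified diff. It is a minimal example under stated assumptions, not the procedure used to build Megadiff: the abstract does not give the exact size definition or thresholds, and the function name `diff_size` and the sample diff are hypothetical.

```python
import re

# Hunk header of a unified diff, e.g. "@@ -12,7 +12,8 @@".
# A line count that is omitted defaults to 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diff_size(unified_diff: str) -> int:
    """Count added plus removed lines in a unified diff (assumed size measure)."""
    size = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in unified_diff.splitlines():
        if old_left == 0 and new_left == 0:
            # Outside a hunk: skip file headers until the next hunk header.
            m = HUNK.match(line)
            if m:
                old_left = int(m.group(1) or 1)
                new_left = int(m.group(2) or 1)
            continue
        if line.startswith("+"):
            new_left -= 1
            size += 1
        elif line.startswith("-"):
            old_left -= 1
            size += 1
        elif line.startswith("\\"):
            pass  # "\ No newline at end of file" marker
        else:
            # Context line, present in both old and new versions.
            old_left -= 1
            new_left -= 1
    return size


sample = """--- a/Foo.java
+++ b/Foo.java
@@ -1,3 +1,3 @@
 class Foo {
-    int x = 0;
+    int x = 1;
 }
"""
print(diff_size(sample))  # 2
```

Tracking the hunk line counts, rather than only testing for a leading `+` or `-`, keeps the `---` and `+++` file headers out of the count while still counting a removed Java line such as `--i;`, which appears in the diff as `---i;`.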