Published on Wed Mar 17 2021

Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs

Yanlin Wang, Hui Li
Abstract

Code completion has become an essential component of integrated development environments. Contemporary code completion methods rely on the abstract syntax tree (AST) to generate syntactically correct code. However, they cannot fully capture the sequential and repetitive patterns of writing code or the structural information of the AST. To alleviate these problems, we propose a new code completion approach named CCAG, which models the flattened sequence of a partial AST as an AST graph. CCAG uses our proposed AST Graph Attention Block to capture different dependencies in the AST graph for representation learning in code completion. The sub-tasks of code completion are optimized via multi-task learning in CCAG, and task balance is achieved automatically using uncertainty, without the need to tune task weights. Experimental results show that CCAG outperforms state-of-the-art approaches and is able to provide intelligent code completion.
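The claim that task balance is "achieved automatically using uncertainty" matches the well-known homoscedastic-uncertainty weighting of Kendall et al. (2018). The sketch below shows that general technique in PyTorch; the class name UncertaintyWeightedLoss and the two example sub-tasks (node-type and node-value prediction, a common split in AST-based code completion) are illustrative assumptions, not CCAG's actual code.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine per-task losses with learned homoscedastic uncertainty.

    A minimal sketch of uncertainty-based task balancing (Kendall et al.,
    2018); the exact formulation in the CCAG paper may differ.
    """

    def __init__(self, num_tasks: int):
        super().__init__()
        # log(sigma^2) per task, learned jointly with the model parameters.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses: list[torch.Tensor]) -> torch.Tensor:
        total = torch.zeros((), device=self.log_vars.device)
        for loss, log_var in zip(task_losses, self.log_vars):
            # 1/(2*sigma^2) * loss + log(sigma): noisier tasks are
            # down-weighted, and the log term keeps sigma from growing
            # without bound.
            precision = torch.exp(-log_var)
            total = total + 0.5 * precision * loss + 0.5 * log_var
        return total

# Usage with placeholder losses for two hypothetical sub-tasks
# (next-node type prediction and next-node value prediction):
criterion = UncertaintyWeightedLoss(num_tasks=2)
type_loss = torch.tensor(1.3, requires_grad=True)
value_loss = torch.tensor(2.7, requires_grad=True)
combined = criterion([type_loss, value_loss])
combined.backward()
```

Because the log-variances are ordinary parameters, they can simply be added to the optimizer alongside the model weights, which is what removes the need for manual task-weight tuning.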

Related Papers

Thu Sep 17 2020
NLP
GraphCodeBERT: Pre-training Code Representations with Data Flow
GraphCodeBERT is a pre-trained model for programming languages that considers the inherent structure of code. The model uses a graph-guided masked attention function to incorporate code structure (data flow). It can be used for code search, clone detection, code translation, and code refinement.
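As a rough illustration of what graph-guided masked attention can mean, the sketch below restricts scaled dot-product attention with a boolean mask derived from graph edges. The function name and the toy data-flow edge are assumptions for illustration, not GraphCodeBERT's actual implementation.

```python
import torch

def graph_guided_attention(q, k, v, edge_mask):
    """Scaled dot-product attention restricted by a graph mask.

    edge_mask[i, j] is True where position i may attend to position j
    (e.g., an edge in the data-flow graph). Positions without an edge
    receive -inf before the softmax, so they get zero attention weight.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    scores = scores.masked_fill(~edge_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy example: 4 nodes, each attends to itself plus its graph neighbors.
q = k = v = torch.randn(4, 8)
mask = torch.eye(4, dtype=torch.bool)
mask[0, 1] = mask[1, 0] = True  # hypothetical data-flow edge between 0 and 1
out = graph_guided_attention(q, k, v, mask)
```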
Thu Sep 02 2021
NLP
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Pre-trained models for natural languages (NL) like BERT and GPT have been shown to transfer well to programming languages (PL). We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed by developer-assigned identifiers.
Fri Nov 22 2019
Machine Learning
TreeGen: A Tree-Based Transformer Architecture for Code Generation
State-of-the-art approaches rely on neural networks for code generation. TreeGen uses the attention mechanism of Transformers to alleviate the long-dependency problem. It also introduces a novel AST reader (encoder).
Wed May 26 2021
Machine Learning
TreeBERT: A Tree-Based Pre-Trained Model for Programming Language
TreeBERT is a tree-based pre-trained model for improving programming-language-oriented generation tasks. The model is trained with tree masked language modeling (TMLM) and node order prediction (NOP). With NOP, TreeBERT captures the syntactical structure of code.
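A hedged sketch of what a node order prediction objective might look like: a binary head that predicts whether the node order of an encoded tree was shuffled. The pooling, corruption scheme, and head shape here are assumptions; TreeBERT's exact formulation may differ.

```python
import torch
import torch.nn as nn

class NodeOrderPredictionHead(nn.Module):
    """Binary head for an NOP-style objective.

    Given a pooled encoding of an AST whose node order may have been
    perturbed, predict whether the order is original or shuffled.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 2)  # {original, shuffled}

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.classifier(pooled)

# Usage with hypothetical encoder outputs and labels:
head = NodeOrderPredictionHead(hidden_size=256)
pooled = torch.randn(8, 256)        # batch of pooled tree encodings
labels = torch.randint(0, 2, (8,))  # 1 if the node order was shuffled
loss = nn.functional.cross_entropy(head(pooled), labels)
```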
Tue May 18 2021
Artificial Intelligence
CoTexT: Multi-task Learning with Code-Text Transformer
Mon Sep 16 2019
Artificial Intelligence
A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning
Code completion can accelerate software development by suggesting libraries, APIs, and method names in real time. Existing statistical language models can improve the performance of code completion tools by learning from large-scale software repositories, but these models suffer from three major drawbacks.