Published on Wed Feb 15 2012

An efficient high-quality hierarchical clustering algorithm for automatic inference of software architecture from the source code of a software system

Sarge Rogatch

Abstract

This work presents a high-quality algorithm for hierarchical clustering of large software source code. It makes it possible to break down the complexity of tens of millions of lines of source code, so that a human software engineer can comprehend a software system at a high level by looking at an architectural diagram reconstructed automatically from the system's source code. The architectural diagram shows a tree of subsystems with OOP classes at its leaves (in other words, a nested software decomposition). The tool reconstructs missing (inconsistent, incomplete, or nonexistent) architectural documentation for a software system from its source code. This facilitates software maintenance: change requests can be carried out substantially faster. Simply put, this unique tool raises the comprehensible grain of object-oriented software systems from the OOP class level to the subsystem level. It is estimated that a commercial tool developed on the basis of this work would reduce software maintenance expenses tenfold relative to current needs, and would make it possible to implement next-generation software systems that are currently too complex to be within the range of human comprehension and therefore cannot yet be designed or implemented. A prototype implementation is available as open source: http://sourceforge.net/p/insoar/code-0/1/tree/

Tue Jul 26 2016
Artificial Intelligence
OntoCat: Automatically categorizing knowledge in API Documentation
Most application development happens in the context of complex APIs. There is a growing need to develop well-organized ways to access the knowledge latent in the documentation. Our system, OntoCat, introduces a total of nine different features and their semantic and statistical combinations to classify the knowledge types.
Mon Oct 10 2011
Artificial Intelligence
Open Source Software: How Can Design Metrics Facilitate Architecture Recovery?
Reuse can be facilitated by architectural knowledge of the software. The effort required to comprehend the system's source code and discover its architecture is a major drawback in reuse.
Sun May 06 2018
Machine Learning
Automatic Classification of Object Code Using Machine Learning
Machine learning techniques can be applied to whole files or file fragments to classify them for analysis. We show that simple byte-value histograms retain enough information to achieve high accuracy.
Thu Jun 02 2016
Artificial Intelligence
Mining Software Components from Object-Oriented APIs
Object-oriented Application Programming Interfaces (APIs) support software reuse by providing pre-implemented functionalities. We propose an approach for reengineering object-oriented APIs into component-based ones. We mine components as a group of classes based on the frequency they are used
Thu Jun 18 2020
NLP
Learning to Format Coq Code Using Language Models
Coq code tends to be written in distinct manners by different people and teams. While coding conventions are important for comprehension and maintenance, they are costly to document and enforce. Rule-based formatters, such as Coq's beautifier, have limited flexibility and only capture small fractions of desired conventions. We believe that application of language models - a
Sun May 17 2020
Artificial Intelligence
Quantifying the Impact on Software Complexity of Composable Inductive Programming using Zoea
Zoea is a simple declarative approach to software development. Zoea programs are approximately 50% of the complexity of equivalent programs in a conventional language and on average equal in size.