Thesis

📚 Table of Contents

🔖 Abstract

Towards an Automated Source Code Formatting System

Software now pervades nearly every aspect of daily life, offering solutions to a wide variety of problems. The need to produce reliable, functional software within short timeframes, and to adapt it to changing requirements, is constantly increasing. In recent years, there has been significant research activity on optimizing the software development process, while the vast number of open-source projects in repositories such as GitHub makes large volumes of code data easy to access. Leveraging this information can serve as a catalyst for creating tools that accelerate software development while improving communication and collaboration among development teams.

This thesis presents an integrated system for automated source code formatting using machine learning techniques. The primary goal of the system is to detect and correct formatting errors, that is, deviations from the standards set by the development team, so that the code stays readable and easier to maintain. The system combines LSTM deep neural network models with N-gram statistical language models to detect formatting errors, and a dedicated mechanism for correcting these errors is proposed. Additionally, an evaluation mechanism for code formatting is proposed, aiming to quantify this abstract concept.
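To make the detection idea more concrete, here is a minimal sketch of how an N-gram language model can flag unlikely formatting tokens. It is an illustration under stated assumptions, not the implementation used in this repository: the token names (`IF`, `SP`, `LPAREN`, ...), the function names, and the threshold are all hypothetical, and a bigram model is used for brevity, whereas the `10_Gram_Model` directory suggests a higher order. The LSTM component is not shown.

```python
from collections import Counter

def train_ngram(sequences, n=2):
    """Count n-grams and their (n-1)-token contexts over token sequences."""
    ngrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] * (n - 1) + list(seq)
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            ngrams[ctx + (padded[i],)] += 1
            contexts[ctx] += 1
    return ngrams, contexts

def token_probability(ngrams, contexts, ctx, token):
    """Maximum-likelihood estimate of P(token | ctx); 0.0 for unseen contexts."""
    seen = contexts.get(ctx, 0)
    return ngrams.get(ctx + (token,), 0) / seen if seen else 0.0

def flag_suspicious(seq, ngrams, contexts, n=2, threshold=0.1):
    """Return indices of tokens whose probability given their context is below threshold."""
    padded = ["<s>"] * (n - 1) + list(seq)
    flagged = []
    for i in range(n - 1, len(padded)):
        ctx = tuple(padded[i - n + 1:i])
        if token_probability(ngrams, contexts, ctx, padded[i]) < threshold:
            flagged.append(i - (n - 1))  # index in the unpadded sequence
    return flagged

# Hypothetical abstracted token stream for `if (x) {` followed by a newline.
well_formatted = ["IF", "SP", "LPAREN", "ID", "RPAREN", "SP", "LBRACE", "NL"]
ngrams, contexts = train_ngram([well_formatted] * 5, n=2)

# Same statement with the space after `if` missing.
missing_space = ["IF", "LPAREN", "ID", "RPAREN", "SP", "LBRACE", "NL"]
suspicious = flag_suspicious(missing_space, ngrams, contexts, n=2)
```

In this toy setup, `suspicious` contains the position of `LPAREN`, because the model has never seen an opening parenthesis directly after `IF`. A real system would need smoothing for unseen contexts and a much larger training corpus.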

The system designed within the scope of this thesis is evaluated on 8,000 Java code files obtained from the CodRep 2019 competition. Based on the final results, we conclude that the system performs effectively in both detecting and correcting formatting errors.

Thomas Kanoutas
Electrical & Computer Engineering
Intelligent Systems & Software Engineering Labgroup (ISSEL)
Aristotle University of Thessaloniki, Greece
July 2023

🛠️ System's Architecture

Usage

  • Case 1: Fixing a formatting error by deleting a token

  • Case 2: Fixing a formatting error by replacing a token

  • Case 3: Fixing a formatting error by appending a new token
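The three cases above can be sketched as simple operations on a token list. This is only an illustration, assuming a flagged position is already known: the function name `apply_fix`, the action labels, and the token names are hypothetical and are not taken from the repository's scripts. Case 3 is modeled here as inserting the new token at the flagged position.

```python
def apply_fix(tokens, index, action, new_token=None):
    """Return a copy of `tokens` with one formatting fix applied at `index`.

    action: "delete"  -> remove the token at `index`            (Case 1)
            "replace" -> overwrite the token at `index`         (Case 2)
            "insert"  -> add `new_token` before `index`         (Case 3)
    """
    if action in ("replace", "insert") and new_token is None:
        raise ValueError(f"action '{action}' requires new_token")

    fixed = list(tokens)  # never mutate the caller's list
    if action == "delete":
        del fixed[index]
    elif action == "replace":
        fixed[index] = new_token
    elif action == "insert":
        fixed.insert(index, new_token)
    else:
        raise ValueError(f"unknown action: {action}")
    return fixed

# Case 1: a doubled space is reduced to one.
case1 = apply_fix(["ID", "SP", "SP", "SEMI"], 2, "delete")

# Case 2: a space after an opening brace becomes a newline.
case2 = apply_fix(["LBRACE", "SP", "ID"], 1, "replace", "NL")

# Case 3: a missing space is added between `if` and `(`.
case3 = apply_fix(["IF", "LPAREN"], 1, "insert", "SP")
```

Returning a new list rather than editing in place keeps each candidate fix independent, which makes it straightforward to score several alternatives against the same original sequence.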

✅ Prerequisites

pip install numpy
pip install pandas
pip install tensorflow
pip install nltk
pip install javac-parser
pip install seaborn
pip install matplotlib

📁 Directory Structure


├── ..
│   ├── 10_Gram_Model: Contains the trained N-gram model.
│   ├── LSTM_Model: Contains the trained LSTM model.
│   ├── LSTM_Model_Training: Python scripts for LSTM training.
│   ├── Scripts: Basic Python scripts that implement the project's functionality.
│   ├── System_Evaluation: Scripts for evaluating the final system.
│   └── utils: Tools and utilities.

🚀 Technology Stack

⚠️ License

Distributed under the MIT License. See LICENSE.txt for more information.

🤝 Contact

Let's connect:

  • LinkedIn Badge