Meet LocAgent: Graph-Based AI Agents Transforming Code Localization for Scalable Software Maintenance



Software maintenance is an integral part of the software development lifecycle, where developers frequently revisit existing codebases to fix bugs, implement new features, and optimize performance. A critical task in this phase is code localization: pinpointing the specific locations in a codebase that must be modified. This task has become more important as modern software projects grow in scale and complexity. The growing reliance on automation and AI-driven tools has brought large language models (LLMs) into supporting tasks like bug detection, code search, and code suggestion. However, despite the advancement of LLMs in language tasks, enabling these models to understand the semantics and structure of complex codebases remains an open technical challenge.

One of the most persistent problems in software maintenance is accurately identifying the parts of a codebase that need to change in response to user-reported issues or feature requests. Issue descriptions written in natural language often mention symptoms but not the actual root cause in the code. This disconnect makes it difficult for developers and automated tools to link a description to the exact code elements that need updating. Furthermore, traditional methods struggle with complex code dependencies, especially when the relevant code spans multiple files or requires hierarchical reasoning. Poor code localization contributes to inefficient bug resolution, incomplete patches, and longer development cycles.

Prior methods for code localization mostly depend on dense retrieval models or agent-based approaches. Dense retrieval requires embedding the entire codebase into a searchable vector space, which is difficult to maintain and update for large repositories. These systems often perform poorly when issue descriptions lack direct references to relevant code. On the other hand, some recent approaches use agent-based models that simulate a human-like exploration of the codebase. However, they often rely on directory traversal and lack an understanding of deeper semantic links like inheritance or function invocation. This limits their ability to handle complex relationships between code elements not explicitly linked.

A team of researchers from Yale University, University of Southern California, Stanford University, and All Hands AI developed LocAgent, a graph-guided agent framework for code localization. Rather than depending on lexical matching or static embeddings, LocAgent converts entire codebases into directed heterogeneous graphs. These graphs include nodes for directories, files, classes, and functions, and edges that capture relationships like function invocation, file imports, and class inheritance. This structure allows the agent to reason across multiple levels of code abstraction. The system then exposes tools like SearchEntity, TraverseGraph, and RetrieveEntity so that LLMs can explore the codebase step by step. Sparse hierarchical indexing gives rapid access to entities, and the graph design supports multi-hop traversal, which is essential for finding connections across distant parts of the codebase.
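To make the graph idea concrete, here is a minimal sketch of a directed heterogeneous code graph with a hop-limited traversal. It is an illustration only, not LocAgent's implementation: the `CodeGraph` class, the edge label names (`contains`, `imports`, `invokes`, `inherits`), the node ID format, and the toy repository are all assumptions made for this example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Illustrative labels only; LocAgent's actual schema may differ.
NODE_TYPES = {"directory", "file", "class", "function"}
EDGE_TYPES = {"contains", "imports", "invokes", "inherits"}


@dataclass
class CodeGraph:
    # node id -> node type
    node_type: dict = field(default_factory=dict)
    # source id -> list of (edge type, target id)
    out_edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, kind):
        if kind not in NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.node_type[node_id] = kind

    def add_edge(self, src, edge_type, dst):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.out_edges[src].append((edge_type, dst))

    def traverse(self, start, max_hops, edge_types=None):
        """Breadth-first walk along outgoing edges.

        Returns {node id: hop distance} for every node within max_hops.
        """
        allowed = EDGE_TYPES if edge_types is None else set(edge_types)
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue  # do not expand past the hop limit
            for etype, nxt in self.out_edges.get(node, []):
                if etype in allowed and nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist


# A hypothetical two-file repository.
graph = CodeGraph()
for node_id, kind in [
    ("src", "directory"),
    ("src/api.py", "file"),
    ("src/db.py", "file"),
    ("src/api.py:Handler", "class"),
    ("src/api.py:Handler.save", "function"),
    ("src/db.py:BaseStore", "class"),
    ("src/db.py:write_row", "function"),
]:
    graph.add_node(node_id, kind)

for src, etype, dst in [
    ("src", "contains", "src/api.py"),
    ("src", "contains", "src/db.py"),
    ("src/api.py", "contains", "src/api.py:Handler"),
    ("src/api.py", "imports", "src/db.py"),
    ("src/api.py:Handler", "contains", "src/api.py:Handler.save"),
    ("src/api.py:Handler", "inherits", "src/db.py:BaseStore"),
    ("src/api.py:Handler.save", "invokes", "src/db.py:write_row"),
    ("src/db.py", "contains", "src/db.py:BaseStore"),
    ("src/db.py", "contains", "src/db.py:write_row"),
]:
    graph.add_edge(src, etype, dst)

one_hop = graph.traverse("src/api.py:Handler", max_hops=1)
two_hops = graph.traverse("src/api.py:Handler", max_hops=2)
print(sorted(set(two_hops) - set(one_hop)))  # ['src/db.py:write_row']
```

In this toy repository, `write_row` lives in a different file from `Handler` and only becomes reachable on the second hop, through the `invokes` edge. That is the kind of cross-file connection a directory listing alone would not reveal.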


LocAgent performs indexing within seconds and supports real-time usage, making it practical for developers and organizations. The researchers fine-tuned two open-source models, Qwen2.5-7B and Qwen2.5-32B, on a curated set of successful localization trajectories. These models performed strongly on standard benchmarks. For instance, on the SWE-Bench-Lite dataset, LocAgent achieved 92.7% file-level accuracy using Qwen2.5-32B, compared to 86.13% with Claude-3.5 and lower scores from other models. On the newly introduced Loc-Bench dataset, which contains 660 examples across bug reports (282), feature requests (203), security issues (31), and performance problems (144), LocAgent again showed competitive results, achieving 84.59% Acc@5 and 87.06% Acc@10 at the file level. Even the smaller Qwen2.5-7B model delivered performance close to high-cost proprietary models while costing only $0.05 per example, a stark contrast to the $0.66 cost of Claude-3.5.
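For readers unfamiliar with Acc@k, the sketch below shows one common way to compute it. It assumes an example counts as a hit only when every ground-truth location appears among the top k ranked predictions; that definition, the `acc_at_k` helper, and the file names are assumptions for illustration, not taken from the benchmark code.

```python
def acc_at_k(predictions, gold, k):
    """Fraction of examples whose gold locations ALL appear in the top-k predictions.

    predictions: list of ranked lists of locations, best first.
    gold: list of sets of ground-truth locations, aligned with predictions.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    hits = sum(
        1
        for ranked, targets in zip(predictions, gold)
        if set(targets) <= set(ranked[:k])
    )
    return hits / len(gold)


# Hypothetical file-level predictions for three issues.
predictions = [
    ["a.py", "b.py", "c.py"],
    ["x.py", "y.py"],
    ["m.py", "n.py", "o.py"],
]
gold = [{"b.py"}, {"y.py", "z.py"}, {"m.py", "o.py"}]

for k in (1, 2, 3):
    print(k, acc_at_k(predictions, gold, k))
```

Under this strict definition, the second issue never counts as a hit because `z.py` is missing from its ranking, however large k gets. The same function works at class or function level if the locations are qualified names instead of file paths.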

The core mechanism relies on a detailed graph-based indexing process. Each node, whether representing a class or function, is uniquely identified by a fully qualified name and indexed using BM25 for flexible keyword search. The framework lets agents follow a reasoning chain that begins with extracting issue-relevant keywords, proceeds through graph traversals, and concludes with code retrieval for specific nodes. The resulting predictions are scored using a confidence estimation approach based on prediction consistency over multiple iterations. Notably, when the researchers disabled tools like TraverseGraph or SearchEntity, performance dropped by up to 18%, highlighting their importance. Multi-hop reasoning was also critical: restricting traversal to a single hop reduced function-level accuracy from 71.53% to 66.79%.
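As a rough illustration of keyword search over fully qualified names, the sketch below implements the standard Okapi BM25 scoring formula from scratch. It is not LocAgent's indexing code: the `BM25Index` class, the tokenizer that splits on punctuation and camelCase, the `k1` and `b` defaults, and the entity names are all assumptions chosen for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split on non-alphanumerics, then on camelCase boundaries; lowercase all tokens."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", text):
        pieces = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
        tokens.extend(p.lower() for p in pieces)
    return tokens


class BM25Index:
    def __init__(self, names, k1=1.5, b=0.75):
        self.names = list(names)
        if not self.names:
            raise ValueError("need at least one entity name")
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(n)) for n in self.names]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.lengths) / len(self.lengths)
        # Number of entity names that contain each term.
        self.doc_freq = Counter(term for d in self.docs for term in d)

    def score(self, query_tokens, i):
        doc, length = self.docs[i], self.lengths[i]
        n_docs = len(self.docs)
        total = 0.0
        for term in query_tokens:
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            df = self.doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += idf * tf * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=5):
        q = tokenize(query)
        scored = [(self.score(q, i), name) for i, name in enumerate(self.names)]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by name
        return [name for _, name in scored[:top_k]]


# Hypothetical fully qualified entity names.
names = [
    "src/config/loader.py:ConfigLoader.load_yaml",
    "src/config/loader.py:ConfigLoader.validate",
    "src/http/server.py:HTTPServer.handle_request",
    "src/http/client.py:retry_request",
]
index = BM25Index(names)
results = index.search("yaml config fails to load")
print(results)
```

Splitting identifiers into sub-tokens is what lets a plain-language query such as "yaml config fails to load" match `ConfigLoader.load_yaml` even though the issue text never spells out the identifier. In an agent loop, hits like these would serve as starting nodes for graph traversal.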

When applied to downstream tasks like GitHub issue resolution, LocAgent increased the issue pass rate (Pass@10) from 33.58% in baseline Agentless systems to 37.59% with the fine-tuned Qwen2.5-32B model. The framework’s modularity and open-source nature make it a compelling solution for organizations looking for in-house alternatives to commercial LLMs. The introduction of Loc-Bench, with its broader representation of maintenance tasks, ensures fair evaluation without contamination from pre-training data.

Key takeaways from the research on LocAgent:

LocAgent transforms codebases into heterogeneous graphs for multi-level code reasoning.

It achieved up to 92.7% file-level accuracy on SWE-Bench-Lite with Qwen2.5-32B.

It reduced code localization cost by approximately 86% compared to proprietary models.

The researchers introduced the Loc-Bench dataset with 660 examples: 282 bugs, 203 features, 31 security, 144 performance.

Fine-tuned models (Qwen2.5-7B, Qwen2.5-32B) performed comparably to Claude-3.5.

Tools like TraverseGraph and SearchEntity proved essential, with accuracy drops when disabled.

It demonstrated real-world utility by improving GitHub issue resolution rates.

It offers a scalable, cost-efficient, and effective alternative to proprietary LLM solutions.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.


