By Sara Verdi
May 17, 2023
The first encounter between GitHub engineers and OpenAI’s large language models (LLMs) was extraordinary. Alireza Goudarzi, a senior machine learning researcher at GitHub, vividly recalls the moment: “As a theoretical AI researcher, my job has been to dissect deep learning models and understand their learning process. But it was the first time a model truly astonished me.” The remarkable capabilities of the LLM sparked the creation of GitHub Copilot, revolutionizing code generation and reshaping the future of development.
In this article, we delve into the behind-the-scenes journey of GitHub Copilot, exploring the experiences of the developers who collaborated with OpenAI’s LLMs and the significant impact those models had on shaping the development of Copilot as we know it today—and beyond.
A Brief History of GitHub Copilot
The inception of GitHub Copilot can be traced back to June 2020, when OpenAI released GPT-3, a groundbreaking LLM that captivated developer communities worldwide. At GitHub, this release sparked a wave of possibilities and ignited discussions about general-purpose code generation.
“Should we think about general-purpose code generation?” was a recurring question in GitHub meetings, but the prevailing answer was always, “No, it’s too difficult; the current models can’t handle it,” explains Albert Ziegler, a principal machine learning engineer and member of the GitHub Next research and development team.
However, the landscape changed dramatically with the introduction of GPT-3. GitHub gained access to the OpenAI API, allowing them to experiment and evaluate the model’s performance on coding tasks. The GitHub Next team organized crowdsourced evaluations of self-contained coding problems, and the results were astonishing. Initially, the model solved around 50% of the problems but quickly surpassed the 90% mark.
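The crowdsourced evaluations described above boil down to a simple harness: generate a candidate solution for each self-contained problem, execute it against hidden test cases, and report the fraction solved. The sketch below is purely illustrative; the problem set, the stand-in "model," and the test cases are all hypothetical, not GitHub's actual benchmark.

```python
# Hypothetical pass/fail evaluation over self-contained coding problems.
# `fake_model` stands in for an LLM call and returns canned solutions.

def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned solution per prompt."""
    canned = {
        "add": "def add(a, b):\n    return a + b",
        "rev": "def rev(s):\n    return s[::-1]",
    }
    return canned.get(prompt, "")

def solves(solution_src: str, func_name: str, cases) -> bool:
    """Execute generated code and check it against hidden test cases."""
    namespace = {}
    try:
        exec(solution_src, namespace)
        fn = namespace[func_name]
        return all(fn(*args) == expected for args, expected in cases)
    except Exception:
        return False

problems = [
    ("add", "add", [((1, 2), 3), ((0, 0), 0)]),
    ("rev", "rev", [(("abc",), "cba")]),
    ("sort", "sort", [(([2, 1],), [1, 2])]),  # the fake model fails this one
]

solved = sum(
    solves(fake_model(prompt), name, cases) for prompt, name, cases in problems
)
solve_rate = solved / len(problems)
```

A rising solve rate on a fixed problem set is exactly the kind of signal that moved the team's estimate from roughly 50% to above 90%.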
Inspired by these promising outcomes, the team envisioned an AI-powered chatbot that could provide developers with immediate, runnable code snippets in response to their coding questions. The concept evolved further, leading them to integrate this technology directly into the IDE (Integrated Development Environment). Thus, the foundation for GitHub Copilot was laid.
Exploring Model Improvements
In 2021, OpenAI and GitHub partnered to develop Codex, a multilingual offshoot of GPT-3. Unlike its predecessor, Codex was additionally trained on billions of lines of public code, pairing GPT-3’s natural language capabilities with the ability to suggest working code snippets.
While the Codex model was a significant breakthrough for GitHub Copilot, internal model improvements were still needed to optimize accuracy for end users. As GitHub prepared to launch GitHub Copilot as a technical preview, it established dedicated teams, including the Model Improvements team, to monitor and enhance the quality of GitHub Copilot. This involved working closely with the underlying LLM and focusing on improving the rate at which users accepted completions.
Prompt Crafting: The Art of Maximizing Completions
Prompt crafting, as an art form, plays a crucial role in maximizing completion rates with LLMs. John Berryman, a senior machine learning researcher on the Model Improvements team, sheds light on the process. He explains, “At its core, the LLM is a document completion model. During training, it learns to complete partial documents one token at a time. Prompt crafting involves creating a ‘pseudo-document’ that guides the model towards generating completions that benefit the customer.”
GitHub’s Model Improvements team recognized that LLMs trained on partial document completion are well-suited for code completion when provided with code as the partial document, which is precisely what GitHub Copilot aims to achieve.
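Berryman's "pseudo-document" idea can be sketched as a small assembly step: gather whatever context is available, lay it out so that the document's natural continuation is the code the user wants, and hand that to the completion model. The field names and ordering below are illustrative assumptions, not GitHub Copilot's actual prompt format.

```python
# A minimal sketch of prompt crafting as pseudo-document assembly.
# The layout is an illustrative assumption, not Copilot's real format.

def build_prompt(path: str, similar_snippet: str, prefix: str) -> str:
    """Assemble a pseudo-document whose natural completion is useful code."""
    parts = []
    parts.append(f"# Path: {path}")  # hints at language and intent
    if similar_snippet:
        parts.append(f"# Similar code:\n{similar_snippet}")
    parts.append(prefix)  # the code before the user's cursor
    return "\n".join(parts)

prompt = build_prompt(
    path="utils/slugify.py",
    similar_snippet="def normalize(s):\n    return s.strip().lower()",
    prefix="def slugify(title):\n    ",
)
```

The model never "knows" it is completing code in an editor; it simply continues a document that has been framed so its most plausible continuation is the completion the developer needs.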
The team explored different approaches to better understand how the model could be applied to code completion. They would present the model with a file and evaluate the code completions it generated. Results varied from acceptable to exceptional, and occasionally seemed almost magical. Berryman explains that the key lies in leveraging additional context from the IDE to guide the model toward better completions.
“One of our favorite tricks was pulling similar texts from the user’s neighboring editor tabs,” Berryman reveals. This approach significantly increased acceptance rates and improved the relevance of suggested code snippets. By considering additional contextual information, GitHub Copilot mimics how developers switch between tabs to reference code, but with even greater efficiency and accuracy.
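One simple way to pull "similar texts" from neighboring tabs is lexical overlap scoring, for example Jaccard similarity over word sets; the exact measure and windowing Copilot uses are not detailed here, so treat this as a hedged sketch of the general idea.

```python
# Hedged sketch of retrieving similar text from other open editor tabs.
# Jaccard similarity over word sets is one simple heuristic; Copilot's
# actual similarity measure and snippet windowing are not specified here.

def jaccard(a: set, b: set) -> float:
    """Set-overlap score in [0, 1]: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def most_similar_snippet(cursor_context: str, open_tabs: list) -> str:
    """Return the open-tab snippet whose words best overlap the cursor context."""
    target = set(cursor_context.split())
    return max(open_tabs, key=lambda snippet: jaccard(target, set(snippet.split())))

tabs = [
    "class User:\n    def __init__(self, name):",
    "def connect_db(url):\n    return Engine(url)",
]
best = most_similar_snippet("def open_db_connection(url):", tabs)
```

Whichever snippet scores highest can then be spliced into the prompt, approximating the tab-switching a developer would do by hand.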
The team continually explores various techniques and adjustments to optimize prompt crafting. GitHub Copilot aims to understand the user’s thought process and seamlessly integrate it into the algorithm. With the ability to anticipate and retrieve relevant code snippets, Copilot empowers developers to work more efficiently and productively.
Fine-Tuning for Enhanced Performance
Fine-tuning is a crucial technique in the AI domain to adapt pre-trained models for specific tasks or domains. It involves training a large-scale pre-trained model on a smaller, more specific dataset that aligns with the desired use case. This process enables the model to learn and adapt to the nuances of the new data, leading to improved performance on the target task.
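The effect of fine-tuning can be illustrated with a deliberately tiny analogy: start from "pretrained" statistics learned on generic text, then continue training on a small domain-specific corpus, and watch the model's predictions shift toward the new domain. Real fine-tuning updates billions of neural-network weights; the bigram counts below are only a toy stand-in.

```python
# Toy illustration of fine-tuning: pretrain bigram counts on generic text,
# then continue training on a domain corpus, shifting next-token predictions.
# This is an analogy only; real fine-tuning updates neural-network weights.

from collections import Counter, defaultdict

def train(bigrams, corpus):
    """Accumulate next-token counts for each preceding token."""
    for line in corpus:
        tokens = line.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[prev][nxt] += 1
    return bigrams

def predict(bigrams, prev):
    """Most likely next token after `prev`."""
    return bigrams[prev].most_common(1)[0][0]

# "Pretraining" on generic code.
model = train(defaultdict(Counter), ["import os", "import sys", "import json"])

# "Fine-tuning" on a user's codebase that favors a different completion.
model = train(model, ["import torch"] * 5)

completion = predict(model, "import")
```

After the domain pass, the model's top completion for `import` reflects the user's codebase rather than the generic pretraining mix, which is the behavioral shift fine-tuning is meant to produce.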
While larger LLMs like Codex offer immense potential, their outputs may not always align perfectly with the desired outcomes. Defining what constitutes a “good” response statistically can be challenging, especially considering the complexity of training a model with billions of parameters.
Goudarzi emphasizes the importance of fine-tuning to enhance GitHub Copilot’s accuracy and relevance. The Model Improvements team aims to provide more focused and customized code completions by training the underlying Codex model on a user’s specific codebase. However, understanding why a user accepts or rejects a suggestion remains a significant challenge. Goudarzi explains, “We need to identify the contextual information that influenced the model’s output and caused it to generate helpful or unhelpful suggestions. Troubleshooting in traditional engineering may not be feasible, but we can refine our approach and ask the right questions to elicit the desired output.”
Through a combination of prompt crafting and fine-tuning, GitHub Copilot continues to evolve, adapting to the needs and preferences of developers. The goal is to make developers more productive by seamlessly integrating their coding style and thought processes with the power of LLMs.
GitHub Copilot: Then and Now
As OpenAI’s LLMs grew more powerful, GitHub Copilot experienced significant advancements, with the impact felt by end users. Johan Rosenkilde, a staff researcher on the GitHub Next team, vividly recalls the moment when the latest model iteration was integrated into GitHub Copilot. He shares an anecdote from a programming competition he participated in: “We were working on a project using F#, and initially, we had the old model for GitHub Copilot. However, after the new model was deployed, the difference was palpable. It was like magic.”
In the early stages, GitHub Copilot faced challenges, such as suggesting lines of code in a different programming language than the one used. To address this, the team implemented a headline in the prompt that listed the language the developer was working in. This simple addition provided clarity and improved the overall developer experience. However, challenges persisted when the file was ambiguous at the top, and the early models defaulted to suggesting code in the most popular languages.
To overcome this, the team devised a clever solution. They started including the file path at the top of the document, leveraging the end of the file name to determine the language. For example, if the file name were “connectiondatabase.py,” the model would understand that it was a Python file related to databases or connections. This resolved the language problem and allowed Copilot to suggest relevant boilerplate code based on the file name, enhancing the quality and user experience.
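The file-path headline described above is a small transformation of the prompt: prepend the path as a comment in the file's own language so the model can infer both language and topic from the name. The comment-marker table below is an illustrative assumption.

```python
# Sketch of the "file path headline" trick: prefix the prompt with the
# file's path as a comment. The comment-marker table is illustrative.

EXT_TO_COMMENT = {".py": "#", ".js": "//", ".go": "//", ".rb": "#"}

def with_path_headline(path: str, source: str) -> str:
    """Prefix the prompt with the file path using that language's comment marker."""
    ext = path[path.rfind("."):]
    marker = EXT_TO_COMMENT.get(ext, "//")
    return f"{marker} {path}\n{source}"

prompt = with_path_headline("connectiondatabase.py", "def connect(")
```

A single headline line like `# connectiondatabase.py` disambiguates the language and nudges the model toward database-connection boilerplate, matching the behavior the team observed.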
Months of iterative work led to another significant breakthrough: the ability to lift code from other files. This feature had been a long-standing conversation but remained abstract until Ziegler built a component that scanned the other files open in the IDE. This new capability provided crucial additional information and substantially increased code acceptance rates. GitHub Copilot could now tap into knowledge stored across multiple files, making suggestions that aligned with the developer’s context.
The Future of GitHub Copilot
After three years of working with generative AI models and LLMs, GitHub has witnessed their transformative potential firsthand. As the industry continues exploring generative AI possibilities, GitHub is committed to building new and innovative developer experiences. In March 2023, GitHub announced the future of Copilot with GitHub Copilot X—an AI-powered developer experience that extends beyond the IDE.
GitHub Copilot X aims to integrate AI capabilities into various components of the GitHub platform, including documentation and pull requests. The power of LLMs, coupled with dedicated training techniques, opens up a world of possibilities for developers. GitHub Copilot X is just one example of how these models can reshape how we interact with technology and revolutionize our work processes.
In conclusion, the collaboration between GitHub and OpenAI’s LLMs has unleashed the potential of GitHub Copilot, transforming code generation and enhancing developer productivity. Through continuous advancements in prompt crafting and fine-tuning, GitHub Copilot has evolved into a powerful tool that understands developers’ needs and delivers customized coding experiences.
As GitHub continues to push the boundaries of generative AI, the future of Copilot looks promising. With GitHub Copilot X on the horizon, developers can anticipate even more AI-powered features and improvements that will redefine their development workflows. The journey of GitHub Copilot is a testament to the incredible capabilities of LLMs and their ability to shape the future of coding.