Code Generation

AI code generation uses machine learning models to write software automatically: a model interprets a natural-language description and produces the corresponding code, streamlining development and making programming more accessible to non-developers.

Code Language Models

  • Code Llama, Meta's AI coding tool, is a specialized LLM that enhances coding workflows. It generates, discusses, and assists with code, making it efficient for developers and accessible for learners. Key features include code generation, completion, debugging, and instruction following; a minimal completion sketch appears after this list. (blog)

  • Codex (2021) by OpenAI is based on GPT-3 and translates natural language into code for various programming tasks. Proficient in multiple languages, it aids in code transpilation, explanation, and refactoring. Codex is integral to GitHub Copilot, bridging the gap between English instructions and popular coding languages. (paper)

  • Competitive programming with AlphaCode (2022): AlphaCode, developed by DeepMind, is a natural-language-to-code system built for competitive programming. It achieved an estimated rank within the top 54% of participants in contests on Codeforces, roughly the level of the median human competitor. AlphaCode pairs natural-language understanding with large-scale sampling and filtering of candidate programs, offering valuable assistance to human programmers; a sketch of that strategy appears under Advancements below. (article)

  • CodeGen by Salesforce AI is a family of autoregressive language models for program synthesis that generates code from developer prompts. Operating as a natural-language-to-code system, it shows that scaling model and dataset sizes yields reliably executable samples, and it adds conversational, multi-turn program synthesis, where a user specifies a program step by step across several prompts. CodeGen aids programmers at all levels, streamlining the programming process and democratizing software development. (paper)

  • StarCoder is a code LLM trained on source code and natural language text. Its training data incorporates more than 80 programming languages as well as text extracted from GitHub issues, commits, and notebooks. Trained with a fill-in-the-middle objective, it can infill code as well as complete it; see the infilling sketch after this list. (blog) (paper) (model)

  • CodeGeeX2, an AI coding assistant, suggests code in real time. Pre-trained on a vast code corpus covering 100+ languages, it is a left-to-right autoregressive transformer decoder supporting sequence lengths up to 2,048 tokens. It assists across many programming languages and can be turned into a custom assistant through few-shot prompting, generating code in the style of provided examples; see the few-shot sketch after this list. (code) (model)

  • DeepSeek Coder, a series of code LLMs, is trained from scratch on a corpus of 87% code and 13% English and Chinese natural language and outperforms existing open-source code LLMs. Pre-trained on 2T tokens, it is available in sizes from 1B to 33B parameters. Base models such as DeepSeek-Coder-33B-Base are trained on a repository-level code corpus, while instruction-tuned versions such as DeepSeek-Coder-33B-Instruct are produced by fine-tuning the base model on 2B tokens of instruction data; a chat-style usage sketch appears after this list.
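
The sketches below make the interaction patterns above concrete. First, plain left-to-right completion with Code Llama: a minimal sketch assuming the Hugging Face transformers API and the codellama/CodeLlama-7b-hf checkpoint name; the prompt and generation settings are illustrative.

```python
# Left-to-right code completion with a causal LM.
# Checkpoint name and settings are assumptions, not prescriptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```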
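
StarCoder's fill-in-the-middle training means it can complete a gap between a given prefix and suffix rather than only continuing from the left. A sketch, assuming the bigcode/starcoder checkpoint and its documented FIM control tokens:

```python
# Fill-in-the-middle: the model generates the code between prefix and suffix.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder"  # assumed (gated) checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prefix = "def average(numbers):\n    "
suffix = "\n    return total / len(numbers)\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```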
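
The few-shot customization mentioned for CodeGeeX2 amounts to prompt construction: prepend worked examples and let the model continue in the same style. The tasks, solutions, and comment format below are invented for illustration and are model-agnostic:

```python
# Few-shot prompt assembly: worked examples steer the model's continuation.
# All tasks and solutions here are invented purely for illustration.
examples = [
    ("convert snake_case to camelCase",
     'def to_camel(s):\n    head, *rest = s.split("_")\n'
     '    return head + "".join(w.title() for w in rest)'),
    ("reverse a list in place",
     "def reverse_in_place(xs):\n    xs.reverse()\n    return xs"),
]

def build_prompt(task: str) -> str:
    parts = [f"# Task: {desc}\n{code}\n" for desc, code in examples]
    parts.append(f"# Task: {task}\n")  # the model continues from here
    return "\n".join(parts)

print(build_prompt("flatten a nested list"))
```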
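
For instruction-tuned variants such as DeepSeek-Coder-Instruct, requests are phrased as chat messages rather than raw prefixes. A sketch assuming the deepseek-ai/deepseek-coder-6.7b-instruct checkpoint and that its tokenizer ships a chat template:

```python
# Chat-style prompting for an instruction-tuned code model.
# Checkpoint name and chat-template support are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

messages = [{
    "role": "user",
    "content": "Write a Python function that checks whether a string is a palindrome.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```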

Advancements

  • AlphaDev: DeepMind's AlphaDev discovered faster sorting algorithms that have been integrated into LLVM's C++ standard library, boosting speed by up to 70% for shorter sequences and about 1.7% for sequences exceeding 250,000 elements. These algorithms can affect a wide range of computing tasks, from ranking online search results to processing data on devices; a sorting-network sketch of the underlying idea appears after this list.

  • AlphaCode by DeepMind, a code LLM, excels at writing competitive programming solutions, ranking in the top 54% of participants in real-world programming competitions. This underscores the potential of deep learning models for tasks that demand critical thinking and problem-solving; a sketch of its sample-and-filter strategy also follows this list. (model) (dataset) (paper)
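
AlphaDev's gains on short sequences come from highly optimized fixed-size routines. The sketch below shows the classic compare-exchange (sorting-network) idea for exactly three elements; it illustrates the concept only and is not AlphaDev's discovered assembly:

```python
# A three-element sorting network: a fixed sequence of compare-exchange
# operations, independent of the input values.
def sort3(a, b, c):
    # Compare-exchange (a, b)
    if b < a:
        a, b = b, a
    # Compare-exchange (b, c)
    if c < b:
        b, c = c, b
    # Compare-exchange (a, b) again to settle the minimum
    if b < a:
        a, b = b, a
    return a, b, c

assert sort3(3, 1, 2) == (1, 2, 3)
```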
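
AlphaCode's headline technique is drawing very many candidate programs from the model, then filtering them against the problem's example tests. A simplified sketch: the sampling step is elided, candidates are assumed to define a solve(input_str) function, and a real system would sandbox the exec step:

```python
# Sample-and-filter: keep only the candidate programs that pass the examples.
def passes_examples(program: str, examples: list[tuple[str, str]]) -> bool:
    for given, expected in examples:
        try:
            namespace: dict = {}
            exec(program, namespace)  # UNSAFE outside a sandbox
            if namespace["solve"](given) != expected:
                return False
        except Exception:
            return False
    return True

def filter_candidates(candidates: list[str], examples) -> list[str]:
    return [p for p in candidates if passes_examples(p, examples)]

candidates = [
    "def solve(s):\n    return s.upper()",
    "def solve(s):\n    return s[::-1]",
]
print(filter_candidates(candidates, [("ab", "AB")]))  # keeps only the first
```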

Papers

  • Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation (2023): STOP is a code-generation method that iteratively enhances itself. It begins with a seed "improver" program that uses a language model to improve solutions to downstream tasks; STOP then applies that improver to its own source, producing measurably better improvers. While the system is not fully recursively self-improving, since the underlying language model stays fixed, it demonstrates that modern language models can drive efficient self-improvement in code generation. A conceptual sketch of the loop follows.
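
A conceptual sketch of the STOP loop, assuming a hypothetical lm_propose call that asks a language model for candidate rewrites; in the paper the improver is a real program whose source code is fed back into itself and scored by a meta-utility over downstream tasks:

```python
# Conceptual STOP loop. lm_propose is a HYPOTHETICAL stand-in for a
# language-model call, not an API from the paper's codebase.
def lm_propose(program: str) -> list[str]:
    raise NotImplementedError  # e.g. prompt an LLM: "Improve this program: ..."

def improve(program: str, utility) -> str:
    """Seed improver: keep the highest-utility candidate the LM proposes."""
    candidates = [program] + lm_propose(program)
    return max(candidates, key=utility)

def stop_loop(seed_improver_source: str, meta_utility, rounds: int = 3) -> str:
    """Repeatedly apply the improver to its own source code."""
    improver_source = seed_improver_source
    for _ in range(rounds):
        improver_source = improve(improver_source, meta_utility)
    return improver_source
```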