What is CodeT5?

CodeT5 is a pre-trained, identifier-aware encoder-decoder model for code understanding and generation. It is designed as a unified framework that supports a range of code intelligence tasks. The repository is the official PyTorch implementation of the paper that Salesforce Research presented at EMNLP 2021, and it contains the resources needed to reproduce the CodeT5 experiments.

CodeT5 was pre-trained on 8.35 million functions in eight programming languages: Python, Java, JavaScript, PHP, Ruby, Go, C, and C#. It reported state-of-the-art results on 14 sub-tasks of the CodeXGLUE code intelligence benchmark, and it can generate code directly from natural language input.

One notable variant, CodeT5-large-ntp-py, is optimized for Python code generation. It serves as the foundation model for Salesforce's CodeRL approach, which reported strong results on APPS, a competition-level Python program synthesis benchmark.
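As a rough illustration of how a pre-trained CodeT5 checkpoint might be used, the sketch below masks part of a Python snippet and asks the model to fill it in through the Hugging Face transformers library. The checkpoint name Salesforce/codet5-base and the helper names mask_span and predict_span are assumptions for this example and do not appear in the description above. It also assumes that transformers and torch are installed and that the checkpoint can be downloaded.

```python
# Minimal sketch of masked-span prediction with a CodeT5 checkpoint.
# Assumptions (not from the listing): the "Salesforce/codet5-base" checkpoint,
# the helper names below, and an environment with `transformers` and `torch`.

MASK_TOKEN = "<extra_id_0>"  # T5-style sentinel token marking the span to predict


def mask_span(code: str, span: str) -> str:
    """Replace the first occurrence of `span` in `code` with the sentinel token."""
    if span not in code:
        raise ValueError(f"span {span!r} not found in code")
    return code.replace(span, MASK_TOKEN, 1)


def predict_span(masked_code: str,
                 checkpoint: str = "Salesforce/codet5-base",
                 max_length: int = 8) -> str:
    """Ask the model to generate text for the masked span (hypothetical helper)."""
    # Imported lazily so mask_span works without the heavy dependencies.
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)

    input_ids = tokenizer(masked_code, return_tensors="pt").input_ids
    generated_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    snippet = "def greet(user): print(f'hello {user}!')"
    masked = mask_span(snippet, "{user}")
    print(masked)                # the snippet with the sentinel in place of {user}
    print(predict_span(masked))  # the model's guess for the masked span
```

The model and tokenizer are loaded inside predict_span only to keep the example short. In practice they would likely be loaded once and reused across calls.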

Company Facts

Company Name:
Salesforce
Company Website:
github.com/salesforce/CodeT5

Product Details

Deployment
SaaS
Training Options
Documentation Hub
Support
Web-Based Support

Target Company Sizes
Individual
1-10
11-50
51-200
201-500
501-1000
1001-5000
5001-10000
10001+
Target Organization Types
Mid Size Business
Small Business
Enterprise
Freelance
Nonprofit
Government
Startup
Supported Languages
English

CodeT5 Categories and Features