CodeGeeX is a large-scale multilingual code generation model with 13 billion parameters, developed to assist programmers with various coding tasks.
Multilingual code generation: CodeGeeX can generate executable programs in multiple mainstream programming languages, including Python, C++, Java, JavaScript, and Go.
Cross-lingual code translation: The model supports translating code snippets between different programming languages with high accuracy.
Customizable programming assistant: CodeGeeX is available as a free VS Code extension, offering features like code completion, explanation, and summarization to enhance the coding experience.
Open-source and cross-platform: The model's code and weights are publicly available for research purposes. It supports both Ascend and NVIDIA platforms and can run on a single Ascend 910, NVIDIA V100, or NVIDIA A100 device.
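To make the single-device claim concrete, here is a rough back-of-the-envelope sketch of how much memory the weights of a 13-billion-parameter model occupy at different numeric precisions. The precisions listed and the helper name `weight_memory_gb` are assumptions made for illustration; the description above does not say which precision CodeGeeX uses, and the estimate covers weights only, not activations or other runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# 13 billion parameters, as stated in the model description.
NUM_PARAMS = 13e9

# Hypothetical precisions chosen for illustration; not taken from the source.
PRECISIONS = [("FP32", 4), ("FP16", 2), ("INT8", 1)]

for label, nbytes in PRECISIONS:
    print(f"{label}: about {weight_memory_gb(NUM_PARAMS, nbytes):.0f} GB for weights")
```

Comparing these figures with the memory of the target accelerator gives a first indication of whether the model can fit on one device; actual requirements will be higher once runtime buffers are included.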
Pre-training: As of June 22, 2022, CodeGeeX had been trained on more than 850 billion tokens from a large code corpus covering more than 20 programming languages.
Benchmark: The developers also introduced HumanEval-X, a multilingual benchmark containing 820 human-crafted coding problems (164 in each of 5 programming languages) to standardize the evaluation of multilingual code generation and translation.
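The description above does not name the metric used to score the benchmark. A common choice for HumanEval-style benchmarks is pass@k, the probability that at least one of k sampled solutions passes all unit tests. The following is a minimal sketch of the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass. The function name and the sample counts are hypothetical, chosen only to illustrate the calculation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: number of samples the user is assumed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, giving pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical (n, c) results for three problems; not real benchmark data.
results = [(10, 3), (10, 0), (10, 10)]

# The benchmark-level score is the mean of the per-problem estimates.
mean_pass_at_1 = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"mean pass@1 over {len(results)} problems: {mean_pass_at_1:.3f}")
```

Averaging per-problem estimates separately for each language keeps scores comparable across languages, since every language in the benchmark has the same number of problems.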
Together, these capabilities are intended to improve coding efficiency by supporting programmers in code generation, translation, and explanation across multiple programming languages.