Kimi K2-Instruct-0905 is the newest and most capable release in the Kimi K2 family. It is a state-of-the-art mixture-of-experts (MoE) language model that activates 32 billion of its 1 trillion total parameters for each token.
Key Highlights
- Stronger agentic coding intelligence: This version delivers notable gains across public benchmarks and practical coding-agent tasks.
- Enhanced frontend development: Kimi K2-Instruct-0905 brings improvements to both usability and design in frontend programming scenarios.
- Extended context capacity: The model’s context window has been expanded from 128k to 256k tokens, enabling more reliable performance on long-horizon tasks.
Model Summary
| Field | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
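To make the routing figures in the table concrete (384 routed experts, 8 selected per token, plus 1 shared expert), here is a minimal top-k routing sketch in NumPy. The gating details are simplified illustrative assumptions, not Kimi K2's actual implementation:

```python
import numpy as np

def route_tokens(hidden, gate_w, k=8):
    """Select the top-k routed experts per token in a MoE layer.

    hidden: (n_tokens, d_model) token activations
    gate_w: (d_model, n_experts) router weights
    Returns expert indices (n_tokens, k) and gate weights normalized
    over the selected experts only.
    """
    logits = hidden @ gate_w                    # (n_tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k largest logits
    topk_logits = np.take_along_axis(logits, topk, axis=-1)
    # softmax restricted to the selected experts
    w = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return topk, w

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 64))        # 4 tokens, toy hidden size
gw = rng.standard_normal((64, 384))     # router over 384 routed experts
idx, weights = route_tokens(h, gw)
# Each token's output combines its 8 routed experts with the 1 shared
# expert, which is applied to every token unconditionally.
```

The shared expert is not routed at all, so only the 384 routed experts compete in the gate.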
Evaluation Results
| Benchmark | Metric | K2-Instruct-0905 | K2-Instruct-0711 | Qwen3-Coder-480B-A35B-Instruct | GLM-4.5 | DeepSeek-V3.1 | Claude-Sonnet-4 | Claude-Opus-4 |
|---|---|---|---|---|---|---|---|---|
| SWE-Bench Verified | ACC | 69.2 ± 0.63 | 65.8 | 69.6* | 64.2* | 66.0* | 72.7* | 72.5* |
| SWE-Bench Multilingual | ACC | 55.9 ± 0.72 | 47.3 | 54.7* | 52.7 | 54.5* | 53.3* | – |
| Multi-SWE-Bench | ACC | 33.5 ± 0.28 | 31.3 | 32.7 | 31.7 | 29.0 | 35.7 | – |
| Terminal-Bench | ACC | 44.5 ± 2.03 | 37.5 | 37.5* | 39.9* | 31.3* | 36.4* | 43.2* |
| SWE-Dev | ACC | 66.6 ± 0.72 | 61.9 | 64.7 | 63.2 | 53.3 | 67.1 | – |
All results for K2-Instruct-0905 are reported as mean ± standard deviation across five independent, full test-set runs. Prior to each run, we prune the repository to remove any Git objects unreachable from the target commit. This ensures that the agent has access only to code that would legitimately exist at that point in history.
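Pruning a repository to a single commit can be done with standard Git plumbing. The sketch below is our own illustration of the idea (not the harness's actual code): reset one branch to the target commit, delete every other ref, expire the reflogs, and garbage-collect so unreachable objects disappear.

```python
import subprocess

def prune_to_commit(repo_dir, commit):
    """Leave only objects reachable from `commit` in the repository."""
    def git(*args):
        subprocess.run(["git", "-C", repo_dir, *args],
                       check=True, capture_output=True)

    # Create/reset a single branch at the target commit and check it out.
    git("checkout", "-B", "main", commit)
    # Delete every other ref (branches, tags, remotes).
    refs = subprocess.run(
        ["git", "-C", repo_dir, "for-each-ref", "--format=%(refname)", "refs/"],
        check=True, capture_output=True, text=True).stdout.split()
    for ref in refs:
        if ref != "refs/heads/main":
            git("update-ref", "-d", ref)
    # Drop reflog entries, then prune everything now unreachable.
    git("reflog", "expire", "--expire=now", "--all")
    git("gc", "--prune=now", "--aggressive")
```

After this runs, `git cat-file -e <later-commit>` fails for any commit made after the target, so the agent cannot peek at future history.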
With the exception of Terminal-Bench (Terminus-2), every benchmark result was obtained using our in-house evaluation harness. This harness is adapted from SWE-agent, but we apply two key modifications:
- context windows for the Bash and Edit tools are clamped, and
- the system prompt is rewritten to align with the task semantics.
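Clamping of the kind described above can be sketched as a truncation wrapper around tool output; the character budget and marker text here are illustrative assumptions, not the harness's actual values:

```python
def clamp_tool_output(text: str, max_chars: int = 8000) -> str:
    """Truncate long tool output so it fits the agent's context budget,
    keeping the head and tail, which usually carry the most signal."""
    if len(text) <= max_chars:
        return text
    keep = max_chars // 2
    omitted = len(text) - 2 * keep
    return (text[:keep]
            + f"\n... [{omitted} characters clamped] ...\n"
            + text[-keep:])
```

A wrapper like this sits between the Bash/Edit tool and the model, so a single verbose command cannot flood the context window.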
Baseline figures marked with an asterisk (*) are taken directly from their official reports or public leaderboards. All other metrics were re-evaluated by us under the same conditions used for K2-Instruct-0905.
For SWE-Dev, we take an additional precaution: we overwrite the original repository files and remove any test file that explicitly exercises the functions the agent is tasked with generating. This eliminates the possibility of indirect hints about the target implementation.
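A test-removal pass like the one described might look like the following sketch, which deletes any test file whose source mentions one of the target function names. The file-name heuristics and substring matching are simplifications of whatever matching the actual pipeline uses:

```python
import os
import re

def remove_revealing_tests(repo_dir, target_functions):
    """Delete test files that reference any function the agent must write,
    so they cannot leak hints about the expected implementation."""
    pattern = re.compile("|".join(re.escape(f) for f in target_functions))
    removed = []
    for root, _dirs, files in os.walk(repo_dir):
        for name in files:
            # Heuristic: common Python test-file naming conventions.
            if not (name.startswith("test_") or name.endswith("_test.py")):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                if pattern.search(fh.read()):
                    os.remove(path)
                    removed.append(path)
    return removed
```

Only tests that explicitly exercise the target functions are removed; unrelated tests and non-test sources are left in place.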
You can access Kimi K2's API at https://platform.moonshot.ai, where we provide OpenAI- and Anthropic-compatible endpoints. The model is also available on Groq.
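An OpenAI-compatible endpoint accepts a standard chat-completions payload. The sketch below builds such a request with only the standard library; the base URL and model identifier are assumptions, so check the platform documentation for the exact values:

```python
import json
import urllib.request

API_BASE = "https://api.moonshot.ai/v1"  # assumed OpenAI-compatible base URL
MODEL = "kimi-k2-0905-preview"           # assumed model identifier

def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style POST request to /chat/completions."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# To actually send the request:
#   resp = urllib.request.urlopen(build_chat_request(api_key, "Hello"))
#   print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format matches OpenAI's, the official `openai` client can also be pointed at the same base URL instead of building requests by hand.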