Kimi K2-Instruct-0905 is the newest and most capable release in the Kimi K2 family. It is a state-of-the-art mixture-of-experts (MoE) language model that activates 32 billion of its 1 trillion total parameters for each token.
Key Highlights
- Stronger agentic coding intelligence: This version delivers notable gains across public benchmarks and practical coding-agent tasks.
- Enhanced frontend development: Kimi K2-Instruct-0905 brings improvements to both usability and design in frontend programming scenarios.
- Extended context capacity: The model’s context window has been expanded from 128k to 256k tokens, enabling more reliable performance on long-horizon tasks.
Model Summary
| Field | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
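To make the routing figures in the table concrete (384 routed experts, 8 selected per token, plus 1 shared expert), here is a minimal top-k routing sketch in NumPy. The gating details are simplified illustrative assumptions, not Kimi K2's actual implementation:

```python
import numpy as np

def route_tokens(hidden, gate_w, k=8):
    """Select the top-k routed experts per token in a MoE layer.

    hidden: (n_tokens, d_model) token activations
    gate_w: (d_model, n_experts) router weights
    Returns expert indices (n_tokens, k) and gate weights normalized
    over the selected experts only.
    """
    logits = hidden @ gate_w                    # (n_tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k largest logits
    topk_logits = np.take_along_axis(logits, topk, axis=-1)
    # softmax restricted to the selected experts
    w = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return topk, w

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 64))        # 4 tokens, toy hidden size
gw = rng.standard_normal((64, 384))     # router over 384 routed experts
idx, weights = route_tokens(h, gw)
# Each token's output combines its 8 routed experts with the 1 shared
# expert, which is applied to every token unconditionally.
```

The shared expert is not routed at all, so only the 384 routed experts compete in the gate.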
Evaluation Results
| Benchmark | Metric | K2-Instruct-0905 | K2-Instruct-0711 | Qwen3-Coder-480B-A35B-Instruct | GLM-4.5 | DeepSeek-V3.1 | Claude-Sonnet-4 | Claude-Opus-4 |
|---|---|---|---|---|---|---|---|---|
| SWE-Bench Verified | ACC | 69.2 ± 0.63 | 65.8 | 69.6* | 64.2* | 66.0* | 72.7* | 72.5* |
| SWE-Bench Multilingual | ACC | 55.9 ± 0.72 | 47.3 | 54.7* | 52.7 | 54.5* | 53.3* | – |
| Multi-SWE-Bench | ACC | 33.5 ± 0.28 | 31.3 | 32.7 | 31.7 | 29.0 | 35.7 | – |
| Terminal-Bench | ACC | 44.5 ± 2.03 | 37.5 | 37.5* | 39.9* | 31.3* | 36.4* | 43.2* |
| SWE-Dev | ACC | 66.6 ± 0.72 | 61.9 | 64.7 | 63.2 | 53.3 | 67.1 | – |
All results for K2-Instruct-0905 are reported as mean ± standard deviation across five independent, full test-set runs. Prior to each run, we prune the repository to remove any Git objects unreachable from the target commit. This ensures that the agent has access only to code that would legitimately exist at that point in history.
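Pruning a repository to a single commit can be done with standard Git plumbing. The sketch below is our own illustration of the idea (not the harness's actual code): reset one branch to the target commit, delete every other ref, expire the reflogs, and garbage-collect so unreachable objects disappear.

```python
import subprocess

def prune_to_commit(repo_dir, commit):
    """Leave only objects reachable from `commit` in the repository."""
    def git(*args):
        subprocess.run(["git", "-C", repo_dir, *args],
                       check=True, capture_output=True)

    # Create/reset a single branch at the target commit and check it out.
    git("checkout", "-B", "main", commit)
    # Delete every other ref (branches, tags, remotes).
    refs = subprocess.run(
        ["git", "-C", repo_dir, "for-each-ref", "--format=%(refname)", "refs/"],
        check=True, capture_output=True, text=True).stdout.split()
    for ref in refs:
        if ref != "refs/heads/main":
            git("update-ref", "-d", ref)
    # Drop reflog entries, then prune everything now unreachable.
    git("reflog", "expire", "--expire=now", "--all")
    git("gc", "--prune=now", "--aggressive")
```

After this runs, `git cat-file -e <later-commit>` fails for any commit made after the target, so the agent cannot peek at future history.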
With the exception of Terminal-Bench (Terminus-2), every benchmark result was obtained using our in-house evaluation harness. This harness is adapted from SWE-agent, but we apply two key modifications:
- context windows for the Bash and Edit tools are clamped, and
- the system prompt is rewritten to align with the task semantics.
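Clamping of the kind described above can be sketched as a truncation wrapper around tool output; the character budget and marker text here are illustrative assumptions, not the harness's actual values:

```python
def clamp_tool_output(text: str, max_chars: int = 8000) -> str:
    """Truncate long tool output so it fits the agent's context budget,
    keeping the head and tail, which usually carry the most signal."""
    if len(text) <= max_chars:
        return text
    keep = max_chars // 2
    omitted = len(text) - 2 * keep
    return (text[:keep]
            + f"\n... [{omitted} characters clamped] ...\n"
            + text[-keep:])
```

A wrapper like this sits between the Bash/Edit tool and the model, so a single verbose command cannot flood the context window.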
Baseline figures marked with an asterisk (*) are taken directly from their official reports or public leaderboards. All other metrics were re-evaluated by us under the same conditions used for K2-Instruct-0905.
For SWE-Dev, we take an additional precaution: we overwrite the original repository files and remove any test file that explicitly exercises the functions the agent is tasked with generating. This eliminates the possibility of indirect hints about the target implementation.
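A test-removal pass like the one described might look like the following sketch, which deletes any test file whose source mentions one of the target function names. The file-name heuristics and substring matching are simplifications of whatever matching the actual pipeline uses:

```python
import os
import re

def remove_revealing_tests(repo_dir, target_functions):
    """Delete test files that reference any function the agent must write,
    so they cannot leak hints about the expected implementation."""
    pattern = re.compile("|".join(re.escape(f) for f in target_functions))
    removed = []
    for root, _dirs, files in os.walk(repo_dir):
        for name in files:
            # Heuristic: common Python test-file naming conventions.
            if not (name.startswith("test_") or name.endswith("_test.py")):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                if pattern.search(fh.read()):
                    os.remove(path)
                    removed.append(path)
    return removed
```

Only tests that explicitly exercise the target functions are removed; unrelated tests and non-test sources are left in place.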
You can access Kimi K2's API at https://platform.moonshot.ai, where we provide OpenAI- and Anthropic-compatible endpoints. The model is also available on Groq.
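An OpenAI-compatible endpoint accepts a standard chat-completions payload. The sketch below builds such a request with only the standard library; the base URL and model identifier are assumptions, so check the platform documentation for the exact values:

```python
import json
import urllib.request

API_BASE = "https://api.moonshot.ai/v1"  # assumed OpenAI-compatible base URL
MODEL = "kimi-k2-0905-preview"           # assumed model identifier

def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style POST request to /chat/completions."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# To actually send the request:
#   resp = urllib.request.urlopen(build_chat_request(api_key, "Hello"))
#   print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format matches OpenAI's, the official `openai` client can also be pointed at the same base URL instead of building requests by hand.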