-
1
Hermes 4
Nous Research
Experience dynamic, human-like interactions with innovative reasoning power.
Hermes 4 marks a significant leap forward in Nous Research's lineup of neutrally aligned, steerable foundation models: advanced hybrid reasoners that shift seamlessly between creative, expressive output and succinct, efficient answers tailored to user needs. The model is designed to prioritize user and system instructions over blanket corporate alignment policies, yielding a conversational, engaging style that avoids sounding overly authoritative or ingratiating while leaving room for imaginative roleplay. A dedicated reasoning tag in the prompt unlocks a deeper, more compute-intensive reasoning mode, so users can tackle complex problems without paying that cost on simpler queries. Trained on a dataset roughly 50 times larger than that of Hermes 3, much of it synthetically generated through Atropos, Hermes 4 shows significant performance improvements that both raise accuracy and broaden the range of applications the model can serve effectively.
-
2
K2 Think
Institute of Foundation Models
Revolutionary reasoning model: compact, powerful, and open-source.
K2 Think is an innovative open-source advanced reasoning model developed collaboratively by the Institute of Foundation Models at MBZUAI and G42. Despite a relatively modest size of 32 billion parameters, it delivers performance that competes with top-tier models of much larger parameter counts. Its primary strength is mathematical reasoning, where it ranks highly on demanding benchmarks including AIME ’24/’25, HMMT ’25, and OMNI-Math-HARD. The model is part of a broader UAE initiative to develop open models, which also encompasses Jais (for Arabic), NANDA (for Hindi), and SHERKALA (for Kazakh), and it builds on K2-65B, a fully reproducible open-source foundation model introduced in 2024. K2 Think is designed to be open, efficient, and versatile, featuring a web app interface that encourages user interaction and exploration. Its parameter efficiency marks a notable step toward compact architectures for high-level AI reasoning, and its development underscores a commitment to broadening access to advanced AI technologies across languages and sectors.
-
3
Ray3
Luma AI
Transform your storytelling with stunning, pro-level video creation.
Ray3, created by Luma Labs, is a state-of-the-art video generation platform that equips creators to produce visually stunning narratives at a professional level. The model generates native 16-bit High Dynamic Range (HDR) video, yielding more vibrant colors, deeper contrast, and a workflow comparable to those used in premium studios. It applies sophisticated physics to keep key aspects like motion, lighting, and reflections consistent, and gives users visual controls to refine their projects. A draft mode allows quick concept exploration that can then be polished into 4K HDR output. Ray3 interprets prompts with nuance, understands creative intent, and performs initial self-assessments of drafts to refine scene and motion accuracy. It also offers keyframe support, looping and extending, upscaling, and single-frame export, making it easy to slot into professional creative workflows and amplify storytelling with captivating visual experiences.
-
4
DeepSeek-V3.1-Terminus
DeepSeek
DeepSeek has introduced DeepSeek-V3.1-Terminus, an enhanced version of the V3.1 architecture that incorporates user feedback to improve output reliability, consistency, and agent performance. The upgrade notably reduces mixed Chinese-and-English text and other unintended anomalies, producing a more polished and cohesive language-generation experience, and it overhauls both the code-agent and search-agent subsystems for stronger, more consistent results across a range of benchmarks. DeepSeek-V3.1-Terminus is released as an open-source model, with its weights available on Hugging Face for easy community access. The architecture remains consistent with DeepSeek-V3, preserving compatibility with existing deployment strategies, and updated inference demonstrations are provided for users to explore its capabilities. The model operates at a scale of 685 billion parameters and accommodates multiple tensor formats, including FP8, BF16, and F32, letting developers select the format best suited to their requirements and resource constraints.
-
5
Qwen3-Max
Alibaba
Unleash limitless potential with advanced multi-modal reasoning capabilities.
Qwen3-Max is Alibaba's state-of-the-art large language model, with on the order of a trillion parameters aimed at tasks demanding agency, coding, reasoning, and long-context management. As a progression of the Qwen3 series, it uses improved architecture, training techniques, and inference methods; it offers both thinking and non-thinking modes, introduces a distinctive “thinking budget” mechanism, and can switch modes according to task complexity. It processes extremely long inputs spanning hundreds of thousands of tokens, supports tool invocation, and posts strong results across benchmarks covering coding, multi-step reasoning, and agent evaluations such as Tau2-Bench. Although the initial release focuses on instruction following in non-thinking mode, Alibaba plans to roll out reasoning features enabling autonomous agent functionality. With robust multilingual support and training on trillions of tokens, Qwen3-Max is available through OpenAI-compatible API interfaces, making it broadly applicable for developers and researchers alike.
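As a quick illustration of that OpenAI-style integration, here is a minimal sketch using the standard openai Python client; the DashScope compatible-mode base URL and the model id qwen3-max are assumptions to verify against Alibaba Cloud's current documentation.

```python
# Minimal sketch: calling Qwen3-Max through an OpenAI-compatible endpoint.
# base_url and model id follow DashScope's compatible-mode convention but
# should be checked against the current Alibaba Cloud docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",  # assumption: a DashScope API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3-max",  # assumed model id
    messages=[{"role": "user", "content": "Summarize the Qwen3 series in two sentences."}],
)
print(response.choices[0].message.content)
```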
-
6
DeepSeek-V3.2-Exp
DeepSeek
Experience lightning-fast efficiency with cutting-edge AI technology!
We are excited to present DeepSeek-V3.2-Exp, our latest experimental model, which evolves from V3.1-Terminus and incorporates DeepSeek Sparse Attention (DSA), a technology designed to significantly improve training and inference speed on long contexts. DSA enables fine-grained sparse attention while preserving output quality, yielding better long-context performance at lower computational cost. Benchmark evaluations show V3.2-Exp performing on par with V3.1-Terminus while benefiting from these efficiency gains. The model is fully functional across app, web, and API, and to promote wider accessibility we have reduced DeepSeek API pricing by more than 50% starting now. During the transition, users can still reach V3.1-Terminus through a temporary API endpoint until October 15, 2025. DeepSeek invites feedback on DSA through a dedicated portal, and V3.2-Exp is available as open source, with model weights and key technologies, including essential GPU kernels in TileLang and CUDA, published on Hugging Face; we are eager to see how the community leverages this advancement.
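For orientation, a minimal sketch of calling the model from Python: DeepSeek's API follows the OpenAI-compatible convention, and the model id deepseek-chat (assumed here to serve V3.2-Exp during the experiment window) should be checked against the official docs.

```python
# Hedged sketch: DeepSeek's API is OpenAI-compatible, so the standard
# openai client can target it directly.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")
resp = client.chat.completions.create(
    model="deepseek-chat",  # assumed to route to V3.2-Exp at the time of writing
    messages=[{"role": "user", "content": "Explain sparse attention in one paragraph."}],
)
print(resp.choices[0].message.content)
```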
-
7
gpt-4o-mini Realtime
OpenAI
Real-time voice and text interactions, effortlessly seamless communication.
The gpt-4o-mini-realtime-preview model is an efficient, cost-effective version of GPT-4o designed explicitly for real-time speech and text communication with minimal latency. It processes audio and text inputs and outputs, enabling seamless dialogue over a persistent WebSocket or WebRTC connection. Unlike its larger GPT-4o relatives, it does not support image inputs or structured output formats, focusing solely on immediate voice and text applications. Developers start a real-time session via the /realtime/sessions endpoint to obtain a temporary key, then stream user audio or text and receive instant responses over the same connection. The model belongs to the early preview family (version 2024-12-17) and is intended mainly for testing and feedback rather than large-scale production; rate limits apply, and behavior may change during the preview phase. Its audio-and-text focus opens avenues for technologies such as conversational voice assistants across a wide range of environments.
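A hedged sketch of the session-minting step described above, assuming OpenAI's documented response shape for the ephemeral key; field names should be verified against the current API reference.

```python
# Sketch: mint a short-lived key for a browser client via /realtime/sessions.
# Response fields (client_secret.value / expires_at) follow OpenAI's
# documented pattern but should be confirmed against current docs.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/realtime/sessions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={"model": "gpt-4o-mini-realtime-preview", "voice": "alloy"},
)
resp.raise_for_status()
session = resp.json()
ephemeral_key = session["client_secret"]["value"]  # handed to the WebRTC client
print("ephemeral key expires at:", session["client_secret"]["expires_at"])
```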
-
8
Hunyuan-Vision-1.5
Tencent
Revolutionizing vision-language tasks with deep multimodal reasoning.
HunyuanVision, a cutting-edge vision-language model from Tencent's Hunyuan team, uses a mamba-transformer hybrid architecture that boosts performance while keeping inference efficient across multimodal reasoning tasks. The most recent version, Hunyuan-Vision-1.5, emphasizes "thinking on images": it reasons about the interplay between visual and textual elements and can crop, zoom, point, draw boxes, and annotate images to improve comprehension. The model covers a wide range of vision tasks, including image and video recognition, optical character recognition (OCR), and diagram analysis, while also supporting visual reasoning and 3D spatial understanding within a unified multilingual framework. HunyuanVision is slated to be open-sourced, with access to checkpoints, a detailed technical report, and inference support to encourage community involvement and experimentation across diverse applications.
-
9
Gemini Enterprise
Google
Empower your workforce with seamless AI-driven productivity.
Gemini Enterprise is a comprehensive AI solution from Google Cloud that brings Google's advanced AI models, agent-creation tools, and enterprise-grade data access into everyday operations. The platform includes a unified chat interface through which employees can work with internal documents, applications, multiple data sources, and customized AI agents. It is built on six core components: the Gemini suite of large multimodal models; an agent orchestration workbench formerly known as Google Agentspace; pre-built starter agents; data-integration connectors for business systems; security and governance measures; and a partner ecosystem for tailored integrations. Designed to scale across departments and organizations, it lets users create no-code or low-code agents that automate tasks such as research synthesis, customer-service interactions, code support, and contract evaluation while remaining compliant with corporate policy, aiming to boost productivity and make advanced AI practical across the business.
-
10
Claude Haiku 4.5
Anthropic
Elevate efficiency with cutting-edge performance at reduced costs!
Anthropic has launched Claude Haiku 4.5, a new small language model that delivers near-frontier capability at significantly lower cost. It shares the coding and reasoning strengths of the mid-tier Sonnet 4 while operating at about one-third of the cost and more than twice the speed. Benchmarks provided by Anthropic indicate that Haiku 4.5 matches or exceeds Sonnet 4 in key areas such as code generation and complex “computer use” workflows. It is tuned for real-time, low-latency use cases, making it a strong fit for chatbots, customer service, and collaborative programming. The model is accessible via the Claude API under the label “claude-haiku-4-5” and is aimed at large-scale deployments where cost efficiency, quick responses, and sophisticated intelligence all matter. Now available in Claude Code and a variety of applications, Haiku 4.5 marks a significant step in offering businesses affordable yet capable AI.
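A minimal sketch using Anthropic's Python SDK with the model label quoted above; the max_tokens value is arbitrary for illustration.

```python
# Minimal sketch: one low-latency request to Claude Haiku 4.5.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,  # arbitrary cap for this example
    messages=[{"role": "user", "content": "Draft a friendly order-status reply."}],
)
print(message.content[0].text)
```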
-
11
MiniMax M2
MiniMax
Revolutionize coding workflows with unbeatable performance and cost.
MiniMax M2 is an open-source foundation model designed for agent-driven applications and coding, striking a strong balance among capability, speed, and cost. It performs well across full development workflows, handling programming assignments, tool use, and complex multi-step operations while integrating smoothly with Python, with inference speeds estimated around 100 tokens per second and API pricing at roughly 8% of comparable proprietary models. The model offers a "Lightning Mode" for rapid, efficient agent actions and a "Pro Mode" for in-depth full-stack development, report generation, and management of web-based tools; its fully open-source weights enable local deployment through vLLM or SGLang. What sets MiniMax M2 apart is production readiness: agents built on it can independently carry out data analysis, software development, tool integration, and complex multi-step logic in real-world organizational settings.
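As a sketch of the local-deployment path, the snippet below uses vLLM's offline API; the Hugging Face repo id MiniMaxAI/MiniMax-M2 is an assumption, and hardware requirements should be taken from the official model card.

```python
# Hedged sketch: local deployment of MiniMax M2 with vLLM.
# The repo id below is an assumption; verify against the model card.
from vllm import LLM, SamplingParams

llm = LLM(model="MiniMaxAI/MiniMax-M2", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Write a Python function that deduplicates a list."], params)
print(outputs[0].outputs[0].text)
```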
-
12
Kimi K2 Thinking
Moonshot AI
Unleash powerful reasoning for complex, autonomous workflows.
Kimi K2 Thinking is an advanced open-source reasoning model from Moonshot AI designed for complex, multi-step workflows, merging chain-of-thought reasoning with tool use across sequential tasks. It uses a mixture-of-experts architecture totaling 1 trillion parameters, of which only about 32 billion are active per inference step, boosting efficiency while retaining substantial capability. The model supports a context window of up to 256,000 tokens, letting it handle extraordinarily long inputs and reasoning sequences without losing coherence, and it ships with native INT4 quantization, which sharply reduces inference latency and memory use while maintaining high performance. Tailored for agentic workflows, Kimi K2 Thinking can autonomously trigger external tools, sustaining consistent reasoning across chains of roughly 200-300 sequential tool calls, which makes it well suited to intricate reasoning challenges demanding both depth and efficiency.
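A hedged sketch of such an agentic loop over an OpenAI-compatible endpoint; the base URL, the model id kimi-k2-thinking, and the web_search tool are all assumptions for illustration.

```python
# Illustrative agentic loop: the model may request tools repeatedly; each
# result is appended and the loop continues until no tool call is made.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY",
                base_url="https://api.moonshot.ai/v1")  # assumed endpoint

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool for illustration
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
    },
}]
messages = [{"role": "user", "content": "Find and summarize recent MoE papers."}]

while True:
    reply = client.chat.completions.create(
        model="kimi-k2-thinking", messages=messages, tools=tools
    ).choices[0].message
    messages.append(reply)
    if not reply.tool_calls:
        break
    for call in reply.tool_calls:
        args = json.loads(call.function.arguments)
        result = f"stub results for {args.get('query')}"  # replace with a real search
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

print(reply.content)
```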
-
13
GPT-5.1-Codex
OpenAI
Elevate coding efficiency with intelligent, adaptive software solutions.
GPT-5.1-Codex is a specialized evolution of the GPT-5.1 framework, tailored for coding and software-development tasks that require a degree of autonomy. The model shines in interactive programming as well as sustained execution of complex engineering work: building applications from scratch, adding features, debugging, large-scale refactoring, and code review. It uses tools fluently, integrates into development environments, and modulates its reasoning effort to task complexity, resolving straightforward issues quickly while allocating more compute to harder problems. Users report that GPT-5.1-Codex produces cleaner, higher-quality code than general-purpose alternatives, with better alignment to developer intent and a significant reduction in errors. Access is provided via the Responses API rather than the typical chat API, and the family includes a “mini” version for budget-sensitive use and a “max” variant offering the highest level of performance.
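A minimal sketch against the Responses API; the model id gpt-5.1-codex is inferred from the naming above and should be confirmed against OpenAI's model list.

```python
# Sketch: a single code-refactoring request via the Responses API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.responses.create(
    model="gpt-5.1-codex",  # assumed model id
    input=(
        "Refactor this function to avoid the nested loops:\n\n"
        "def pairs(xs):\n"
        "    out = []\n"
        "    for a in xs:\n"
        "        for b in xs:\n"
        "            if a < b:\n"
        "                out.append((a, b))\n"
        "    return out"
    ),
)
print(response.output_text)
```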
-
14
SAM 3D
Meta
Transforming images into stunning 3D models effortlessly.
SAM 3D comprises two advanced foundation models that convert standard RGB images into detailed 3D representations of objects or human figures. SAM 3D Objects reconstructs the full 3D geometry, textures, and spatial arrangement of real-world items, handling clutter, occlusion, and variable lighting. SAM 3D Body produces articulated human mesh models that capture complex poses and shapes, using the "Meta Momentum Human Rig" (MHR) format for added detail. The system works on images captured in natural environments with no additional training or fine-tuning: users upload an image, select the object or person of interest, and receive a downloadable asset (such as .OBJ, .GLB, or MHR) ready for use in 3D applications. The models also offer open-vocabulary reconstruction across object categories, multi-view consistency, and occlusion reasoning, supported by a diverse dataset of over one million annotated real-world images, and their open-source release encourages collaborative refinement within the development community.
-
15
Olmo 3
Ai2
Unlock limitless potential with groundbreaking open-model technology.
Olmo 3 is an extensive series of open models at 7 billion and 32 billion parameters, spanning base, reasoning, instruction, and reinforcement-learning variants, with transparency throughout the development process: raw training datasets, intermediate checkpoints, training scripts, extended context support (a 65,536-token window), and provenance tools are all released. The models are built on the Dolma 3 dataset of roughly 9 trillion tokens, a deliberate mixture of web content, scientific research, programming code, and long documents. A staged regimen of pre-training, mid-training, and long-context training yields base models that are then refined through supervised fine-tuning, preference optimization, and reinforcement learning with verifiable rewards to produce the Think and Instruct versions. Notably, the 32-billion-parameter Think model is described as the strongest fully open reasoning model released so far, performing close to proprietary models on mathematics, programming, and complex reasoning tasks, a considerable step forward for open-model development and a sign that open models can rival closed systems across sophisticated applications.
-
16
DeepSeek-V3.2
DeepSeek
Revolutionize reasoning with advanced, efficient, next-gen AI.
DeepSeek-V3.2 represents one of the most advanced open-source LLMs available, delivering exceptional reasoning accuracy, long-context performance, and agent-oriented design. The model introduces DeepSeek Sparse Attention (DSA), a breakthrough attention mechanism that maintains high-quality output while significantly lowering compute requirements—particularly valuable for long-input workloads. DeepSeek-V3.2 was trained with a large-scale reinforcement learning framework capable of scaling post-training compute to the level required to rival frontier proprietary systems. Its Speciale variant surpasses GPT-5 on reasoning benchmarks and achieves performance comparable to Gemini-3.0-Pro, including gold-medal scores in the IMO and IOI 2025 competitions. The model also features a fully redesigned agentic training pipeline that synthesizes tool-use tasks and multi-step reasoning data at scale. A new chat template architecture introduces explicit thinking blocks, robust tool-interaction formatting, and a specialized developer role designed exclusively for search-powered agents. To support developers, the repository includes encoding utilities that translate OpenAI-style prompts into DeepSeek-formatted input strings and parse model output safely. DeepSeek-V3.2 supports inference using safetensors and fp8/bf16 precision, with recommendations for ideal sampling settings when deployed locally. The model is released under the MIT license, ensuring maximal openness for commercial and research applications. Together, these innovations make DeepSeek-V3.2 a powerful choice for building next-generation reasoning applications, agentic systems, research assistants, and AI infrastructures.
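To make the encoding idea concrete, here is a deliberately toy sketch, not the repository's actual utilities, showing how OpenAI-style messages might be flattened into a single template string with an explicit thinking block; every marker token below is invented for illustration.

```python
# Illustrative only: a toy encoder conveying the *idea* of translating
# OpenAI-style messages into one templated prompt string. The real
# utilities and token markers live in the DeepSeek repository; these
# <|...|> markers are invented for the sketch.
def encode(messages, add_thinking=True):
    parts = [f"<|{m['role']}|>{m['content']}<|end|>" for m in messages]
    prompt = "".join(parts) + "<|assistant|>"
    if add_thinking:
        prompt += "<think>"  # the model fills the thinking block before answering
    return prompt

print(encode([{"role": "user", "content": "What is 17 * 24?"}]))
```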
-
17
DeepSeek-V3.2-Speciale
DeepSeek
DeepSeek-V3.2-Speciale represents the pinnacle of DeepSeek’s open-source reasoning models, engineered to deliver elite performance on complex analytical tasks. It introduces DeepSeek Sparse Attention (DSA), a highly efficient long-context attention design that reduces the computational burden while maintaining deep comprehension and logical consistency. The model is trained with an expanded reinforcement learning framework capable of leveraging massive post-training compute, enabling performance not only comparable to GPT-5 but demonstrably surpassing it in internal tests. Its reasoning capabilities have been validated through gold-winning solutions across major global competitions, including IMO 2025 and IOI 2025, with official submissions released for transparency and peer assessment. DeepSeek-V3.2-Speciale is intentionally designed without tool-calling features, focusing every parameter on pure reasoning, multi-step logic, and structured problem solving. It introduces a reworked chat template featuring explicit thought-delimited sections and a structured message format optimized for agentic-style reasoning workflows. The repository includes Python-based utilities for encoding and parsing messages, illustrating how to format prompts correctly for the model. Supporting multiple tensor types (BF16, FP32, FP8_E4M3), it is built for both research experimentation and high-performance local deployment. Users are encouraged to use temperature = 1.0 and top_p = 0.95 for best results when running the model locally. With its open MIT license and transparent development process, DeepSeek-V3.2-Speciale stands as a breakthrough option for anyone requiring industry-leading reasoning capacity in an open LLM.
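A hedged local-inference sketch applying those recommended sampling settings via vLLM; the repo id deepseek-ai/DeepSeek-V3.2-Speciale is an assumption, and a model of this scale realistically requires a multi-GPU node.

```python
# Hedged sketch: local inference with the sampling settings recommended
# above (temperature 1.0, top_p 0.95). The repo id is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-V3.2-Speciale", trust_remote_code=True)
params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=2048)
out = llm.generate(["Prove that the sum of two even integers is even."], params)
print(out[0].outputs[0].text)
```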
-
18
Marengo
TwelveLabs
Revolutionizing multimedia search with powerful unified embeddings.
Marengo is a cutting-edge multimodal model engineered to transform video, audio, images, and text into unified embeddings, enabling flexible "any-to-any" search, retrieval, classification, and analysis across large video and multimedia collections. By integrating visual frames spanning spatial and temporal dimensions with audio elements (speech, background noise, music) and textual components (subtitles, metadata), Marengo builds a comprehensive, multidimensional representation of each media item. This embedding architecture supports a wide array of tasks: cross-modal search (such as text-to-video and video-to-audio), semantic content exploration, anomaly detection, hybrid search, clustering, and similarity-based recommendation. Recent updates introduce multi-vector embeddings that separate appearance, motion, and audio/text features, yielding significant gains in accuracy and contextual comprehension, especially for complex or long-form content, and broadening the model's applicability across multimedia sectors.
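Setting the SDK aside, the retrieval idea itself is simple: once every asset lives in one embedding space, any-to-any search reduces to nearest-neighbor lookup. The sketch below uses random stand-in vectors rather than real Marengo embeddings.

```python
# Conceptual sketch of any-to-any retrieval over unified embeddings.
# Vectors here are random placeholders standing in for real embeddings.
import numpy as np

rng = np.random.default_rng(0)
video_embeddings = rng.normal(size=(1000, 1024))  # one vector per clip
text_query = rng.normal(size=1024)                # embedding of a text query

def cosine_top_k(query, corpus, k=5):
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    return np.argsort(corpus_n @ query_n)[::-1][:k]

print("closest clips:", cosine_top_k(text_query, video_embeddings))
```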
-
19
Lux
OpenAGI Foundation
Revolutionizing AI: Empowering agents to operate like humans.
Lux marks a major leap in AI capability by giving models the ability to operate real software environments: moving a cursor, pressing buttons, filling forms, navigating dashboards, and performing full computer workflows autonomously. It combines three execution modes: Tasker for strict step-by-step reliability, Actor for rapid-response actions, and Thinker for extended reasoning across complex tasks that may take minutes or hours. These modes support use cases such as Amazon marketplace data extraction, automated QA test execution in developer environments, and instant retrieval of public insider-trading disclosure data from Nasdaq. Developers can begin building production-grade agents in under 20 minutes using Lux’s SDKs, frameworks, and ready-made UX templates. Unlike traditional AI models that only generate outputs, Lux operates inside real interfaces, enabling automation for businesses that rely on human-facing applications. The system handles both simple instructions and vague requests, planning its actions and executing long chains of behavior with high stability, which unlocks automation possibilities from enterprise workflows to gaming, analytics, and back-office operations. Lux represents a broader paradigm shift in AI, from information generation to direct action, aiming to put computer-use agents, once limited to the world’s largest AI labs, in the hands of developers everywhere.
-
20
Ministral 3
Mistral AI
"Unleash advanced AI efficiency for every device."
Mistral 3 marks the latest generation of open-weight AI models from Mistral AI, spanning small, edge-optimized variants up to a large-scale multimodal flagship. The lineup includes three streamlined “Ministral 3” models at 3 billion, 8 billion, and 14 billion parameters, designed for resource-constrained hardware such as laptops, drones, and other edge devices, alongside the powerful “Mistral Large 3,” a sparse mixture-of-experts model with 675 billion total parameters, 41 billion of them active. The models handle multimodal and multilingual tasks, excelling at text analysis and image understanding, and have shown strong capability in general question answering, multilingual conversation, and multimodal inputs. Both base and instruction-tuned variants ship under the Apache 2.0 license, permitting extensive customization and integration into enterprise and open-source projects alike.
-
21
Qwen3-VL
Alibaba
Revolutionizing multimodal understanding with cutting-edge vision-language integration.
Qwen3-VL is the newest member of Alibaba Cloud's Qwen family, merging advanced text processing with strong image and video analysis in a unified multimodal system. The model accepts text, images, and video, and excels at long, complex contexts of up to 256K tokens, with further extensions planned. Alongside notable improvements in spatial reasoning, visual comprehension, and multimodal reasoning, the architecture introduces Interleaved-MRoPE for consistent spatio-temporal positional encoding and DeepStack, which taps multi-level features from the Vision Transformer backbone to sharpen image-text correlation. Text-timestamp alignment further enables precise reasoning about video content and when events occur. Together these innovations let Qwen3-VL analyze complex scenes, follow dynamic video narratives, and decode visual layouts in exceptional detail, a substantial advance for multimodal AI with broad real-world applicability.
-
22
Devstral 2
Mistral AI
Revolutionizing software engineering with intelligent, context-aware code solutions.
Devstral 2 is an innovative, open-source AI model tailored for software engineering that goes beyond code suggestions to understand and manipulate entire codebases, executing tasks such as multi-file edits, bug fixes, refactoring, dependency management, and context-aware code generation. The suite pairs a powerful 123-billion-parameter model with a streamlined 24-billion-parameter variant, “Devstral Small 2”: the larger model excels at intricate coding tasks demanding deep contextual understanding, while the smaller is optimized for less powerful hardware. With a context window of up to 256K tokens, Devstral 2 can analyze extensive repositories, track project history, and hold large files in view, a major advantage on real-world software projects. A command-line interface (CLI) rounds out the offering, feeding project metadata, Git status, and directory structure into the model's context to make “vibe-coding” even more effective.
-
23
Devstral Small 2
Mistral AI
Empower coding efficiency with a compact, powerful AI.
Devstral Small 2 is a condensed, 24-billion-parameter variant of Mistral AI's coding-focused models, released under the permissive Apache 2.0 license for both local use and API access. Alongside its larger sibling, Devstral 2, it offers "agentic coding" capabilities tailored to low-compute environments, with a substantial 256K-token context window that lets it understand and modify entire codebases. Scoring close to 68.0% on the widely recognized SWE-Bench Verified code-generation benchmark, Devstral Small 2 holds its own against much larger open-weight models. Its compact, efficient design runs effectively on a single GPU or even in CPU-only setups, making it an excellent option for developers, small teams, and hobbyists without data-center resources. Despite its size, it retains key capabilities of its larger counterparts, such as reasoning across multiple files and managing dependencies, so users still get substantial coding support without significant barriers.
-
24
DeepCoder
Agentica Project
Unleash coding potential with advanced open-source reasoning model.
DeepCoder, a fully open-source initiative for code reasoning and generation, was created through a collaboration between the Agentica Project and Together AI. Built on DeepSeek-R1-Distilled-Qwen-14B and fine-tuned with distributed reinforcement learning, it reaches 60.6% accuracy on LiveCodeBench, an 8% improvement over its base model, placing it in competition with proprietary models such as o3-mini (2025-01-31, low) and o1 while running with just 14 billion parameters. Training took 2.5 weeks on a fleet of 32 H100 GPUs over a meticulously curated dataset of roughly 24,000 coding problems drawn from TACO-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench submissions; every problem was required to include a valid solution paired with at least five unit tests so that rewards during the reinforcement learning phase were reliable. DeepCoder also employs iterative context lengthening and overlong filtering to handle long-range contextual dependencies, letting it tackle complex coding tasks with proficiency.
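The unit-test requirement doubles as the reward signal during RL; a minimal sketch of that verification idea, with the file layout and pytest invocation as assumptions, is shown below.

```python
# Illustrative sketch of a sparse, test-based RL reward: a candidate
# solution earns 1.0 only if every unit test passes. The solution.py /
# test_solution.py layout and pytest call are assumptions for the sketch.
import subprocess
import tempfile
from pathlib import Path

def unit_test_reward(solution_code: str, test_code: str) -> float:
    """Return 1.0 if all tests pass, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(solution_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        proc = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_solution.py"],
            cwd=tmp, capture_output=True, timeout=60,  # cap runaway solutions
        )
        return 1.0 if proc.returncode == 0 else 0.0

print(unit_test_reward(
    "def add(a, b):\n    return a + b\n",
    "from solution import add\n\ndef test_add():\n    assert add(2, 3) == 5\n",
))
```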
-
25
DeepSWE
Agentica Project
Revolutionizing coding with intelligent, adaptive, open-source solutions.
DeepSWE is a groundbreaking open-source coding agent built on the Qwen3-32B foundation model and trained exclusively through reinforcement learning (RL), without supervised fine-tuning or proprietary model distillation. Developed with rLLM, Agentica's open-source RL framework for language-driven agents, DeepSWE operates inside a simulated development environment provided by the R2E-Gym framework. That setup equips it with a file editor, search functions, shell execution, and submission tools, letting the agent navigate large codebases, modify multiple files, compile code, run tests, and iteratively produce patches or complete intricate engineering tasks. Beyond code generation, DeepSWE exhibits sophisticated emergent behaviors: when confronted with bugs or feature requests, it reasons about edge cases, searches for existing tests in the codebase, proposes patches, writes additional tests to prevent regressions, and adapts its strategy to the specific challenge, making it a formidable asset for tackling complex software projects.