List of the Top 9 Large Language Models for Together AI in 2026

Reviews and comparisons of the top Large Language Models with a Together AI integration


Below is a list of Large Language Models that integrates with Together AI. Use the filters above to refine your search for Large Language Models that is compatible with Together AI. The list below displays Large Language Models products that have a native integration with Together AI.
  • 1
    Kimi K2.5 Reviews & Ratings

    Kimi K2.5

    Moonshot AI

    Revolutionize your projects with advanced reasoning and comprehension.
    Kimi K2.5 is an advanced multimodal AI model engineered for high-performance reasoning, coding, and visual intelligence tasks. It natively supports both text and visual inputs, allowing applications to analyze images and videos alongside natural language prompts. The model achieves open-source state-of-the-art results across agent workflows, software engineering, and general-purpose intelligence tasks. With a massive 256K token context window, Kimi K2.5 can process large documents, extended conversations, and complex codebases in a single request. Its long-thinking capabilities enable multi-step reasoning, tool usage, and precise problem solving for advanced use cases. Kimi K2.5 integrates smoothly with existing systems thanks to full compatibility with the OpenAI API and SDKs. Developers can leverage features like streaming responses, partial mode, JSON output, and file-based Q&A. The platform supports image and video understanding with clear best practices for resolution, formats, and token usage. Flexible deployment options allow developers to choose between thinking and non-thinking modes based on performance needs. Transparent pricing and detailed token estimation tools help teams manage costs effectively. Kimi K2.5 is designed for building intelligent agents, developer tools, and multimodal applications at scale. Overall, it represents a major step forward in practical, production-ready multimodal AI.
  • 2
    Kimi K2.6 Reviews & Ratings

    Kimi K2.6

    Moonshot AI

    Unleash advanced reasoning and seamless execution capabilities today!
    Kimi K2.6 is a cutting-edge agentic AI model developed by Moonshot AI, designed to improve practical application, programming efficiency, and complex reasoning abilities beyond its forerunners, K2 and K2.5. Utilizing a Mixture-of-Experts framework, this model embodies the multimodal, agent-centric principles of the Kimi series, seamlessly combining language understanding, coding skills, and tool application into a unified system capable of planning and executing sophisticated workflows. It boasts advanced reasoning capabilities and superior agent planning, allowing it to break down tasks, coordinate multiple tools, and address challenges involving numerous files or steps with heightened accuracy and efficiency. Furthermore, it excels in tool-calling functions, ensuring a reliable connection with external platforms like web searches or APIs, while incorporating built-in validation systems to confirm the correctness of execution formats. Significantly, Kimi K2.6 marks a transformative advancement in the AI landscape, establishing new benchmarks for the intricacy and dependability of automated processes, and paving the way for future innovations in the field.
  • 3
    DeepSeek-V4 Reviews & Ratings

    DeepSeek-V4

    DeepSeek

    Unlock limitless potential with advanced reasoning and coding!
    DeepSeek-V4 is a cutting-edge open-source AI model built to deliver exceptional performance in reasoning, coding, and large-scale data processing. It supports an industry-leading one million token context window, allowing it to manage long documents and complex tasks efficiently. The model includes two variants: DeepSeek-V4-Pro, which offers 1.6 trillion parameters with 49 billion active for top-tier performance, and DeepSeek-V4-Flash, which provides a faster and more cost-effective alternative. DeepSeek-V4 introduces structural innovations such as token-wise compression and sparse attention, significantly reducing computational overhead while maintaining accuracy. It is designed with strong agentic capabilities, enabling seamless integration with AI agents and multi-step workflows. The model excels in domains such as mathematics, coding, and scientific reasoning, outperforming many open-source alternatives. It also supports flexible reasoning modes, allowing users to optimize for speed or depth depending on the task. DeepSeek-V4 is compatible with popular APIs, making it easy to integrate into existing systems. Its open-source nature allows developers to customize and scale it according to their needs. The model is already being used in advanced coding agents and automation workflows. It delivers a strong balance of performance, efficiency, and scalability for real-world applications. Overall, DeepSeek-V4 represents a major advancement in accessible, high-performance AI technology.
  • 4
    Qwen3.5 Reviews & Ratings

    Qwen3.5

    Alibaba

    Empowering intelligent multimodal workflows with advanced language capabilities.
    Qwen3.5 is an advanced open-weight multimodal AI system built to serve as the foundation for native digital agents capable of reasoning across text, images, and video. The primary release, Qwen3.5-397B-A17B, introduces a hybrid architecture that combines Gated DeltaNet linear attention with a sparse mixture-of-experts design, activating just 17 billion parameters per inference pass while maintaining a total parameter count of 397 billion. This selective activation dramatically improves decoding throughput and cost efficiency without sacrificing benchmark-level performance. Qwen3.5 demonstrates strong results across knowledge, multilingual reasoning, coding, STEM tasks, search agents, visual question answering, document understanding, and spatial intelligence benchmarks. The hosted Qwen3.5-Plus variant offers a default one-million-token context window and integrated tool usage such as web search and code interpretation for adaptive problem-solving. Expanded multilingual support now covers 201 languages and dialects, backed by a 250k vocabulary that enhances encoding and decoding efficiency across global use cases. The model is natively multimodal, using early fusion techniques and large-scale visual-text pretraining to outperform prior Qwen-VL systems in scientific reasoning and video analysis. Infrastructure innovations such as heterogeneous parallel training, FP8 precision pipelines, and disaggregated reinforcement learning frameworks enable near-text baseline throughput even with mixed multimodal inputs. Extensive reinforcement learning across diverse and generalized environments improves long-horizon planning, multi-turn interactions, and tool-augmented workflows. Designed for developers, researchers, and enterprises, Qwen3.5 supports scalable deployment through Alibaba Cloud Model Studio while paving the way toward persistent, economically aware, autonomous AI agents.
  • 5
    MiniMax M2.7 Reviews & Ratings

    MiniMax M2.7

    MiniMax

    Revolutionize productivity with advanced AI for seamless workflows.
    MiniMax M2.7 is a cutting-edge AI model engineered to deliver high-performance productivity across coding, search, and professional office workflows. It is trained using reinforcement learning across extensive real-world environments, allowing it to handle complex, multi-step tasks with accuracy and adaptability. The model excels at structured problem-solving, breaking down challenges into logical steps before generating solutions across a wide range of programming languages. It offers high-speed processing with rapid token generation, enabling faster execution of tasks and improved workflow efficiency. Its optimized reasoning reduces unnecessary token usage, improving both performance and cost efficiency compared to earlier models. M2.7 achieves state-of-the-art results in software engineering benchmarks, demonstrating strong capabilities in debugging, development, and incident resolution. It also significantly reduces intervention time during system issues, improving operational reliability. The model is equipped with advanced agentic capabilities, enabling it to collaborate with tools and execute complex workflows with high precision. It supports multi-agent environments and maintains strong adherence to complex task requirements. Additionally, it excels in professional knowledge tasks, including high-quality office document editing and multi-turn interactions. Its ability to handle structured business workflows makes it suitable for enterprise use cases. With its balance of speed, intelligence, and affordability, it stands out among frontier AI models. Overall, MiniMax M2.7 provides a scalable and efficient solution for modern AI-driven productivity and automation.
  • 6
    DeepSeek-V4-Pro Reviews & Ratings

    DeepSeek-V4-Pro

    DeepSeek

    Unleash powerful reasoning with advanced long-context efficiency.
    DeepSeek-V4-Pro is a next-generation Mixture-of-Experts language model designed to deliver high performance across reasoning, coding, and long-context AI tasks. It features a massive architecture with 1.6 trillion total parameters and 49 billion activated parameters, enabling efficient computation while maintaining strong capabilities. The model supports an industry-leading context window of up to one million tokens, allowing it to process extremely large datasets, documents, and workflows. Its hybrid attention mechanism combines advanced techniques to optimize long-context efficiency and reduce computational requirements. DeepSeek-V4-Pro is trained on over 32 trillion tokens, enhancing its knowledge base and reasoning abilities. It incorporates advanced optimization methods to improve training stability and convergence. The model supports multiple reasoning modes, including fast responses and deep analytical thinking for complex problem solving. It performs strongly across benchmarks in coding, mathematics, and knowledge-based tasks. The architecture is designed for agentic workflows, enabling it to handle multi-step tasks and tool-based interactions. As an open-source model, it offers flexibility for customization and deployment across various environments. It also supports efficient memory usage and reduced inference costs compared to previous versions. The model’s capabilities make it suitable for both research and enterprise applications. Overall, DeepSeek-V4-Pro represents a significant advancement in scalable, high-performance AI with long-context intelligence.
  • 7
    DeepSeek-V4-Flash Reviews & Ratings

    DeepSeek-V4-Flash

    DeepSeek

    Unmatched efficiency and scalability for advanced text generation.
    DeepSeek-V4-Flash is a next-generation Mixture-of-Experts language model engineered for high efficiency, scalability, and long-context intelligence. It consists of 284 billion total parameters with 13 billion activated parameters, enabling optimized performance with reduced computational overhead. The model supports an industry-leading context window of up to one million tokens, allowing it to process extensive datasets and complex workflows seamlessly. Its hybrid attention architecture combines advanced techniques to improve long-context efficiency and reduce memory usage. DeepSeek-V4-Flash is trained on over 32 trillion tokens, enhancing its capabilities in reasoning, coding, and knowledge-based tasks. It incorporates advanced optimization methods for stable training and faster convergence. The model supports multiple reasoning modes, including fast responses and deeper analytical processing for complex problems. While slightly less powerful than its Pro counterpart, it achieves comparable reasoning performance when given more computation budget. It is designed for agentic workflows, enabling multi-step reasoning and tool-based interactions. The model is well-suited for scalable deployments where performance and cost efficiency are both important. As an open-source solution, it offers flexibility for customization across various environments. It also reduces inference cost and resource usage compared to larger models. Overall, DeepSeek-V4-Flash delivers a strong balance of speed, efficiency, and capability for real-world AI use cases.
  • 8
    LFM2 Reviews & Ratings

    LFM2

    Liquid AI

    Experience lightning-fast, on-device AI for every endpoint.
    LFM2 is a cutting-edge series of on-device foundation models specifically engineered to deliver an exceptionally fast generative-AI experience across a wide range of devices. It employs an innovative hybrid architecture that enables decoding and pre-filling speeds up to twice as fast as competing models, while also improving training efficiency by as much as threefold compared to earlier versions. Striking a perfect balance between quality, latency, and memory use, these models are ideally suited for embedded system applications, allowing for real-time, on-device AI capabilities in smartphones, laptops, vehicles, wearables, and many other platforms. This results in millisecond-level inference, enhanced device longevity, and complete data sovereignty for users. Available in three configurations with 0.35 billion, 0.7 billion, and 1.2 billion parameters, LFM2 demonstrates superior benchmark results compared to similarly sized models, excelling in knowledge recall, mathematical problem-solving, adherence to multilingual instructions, and conversational dialogue evaluations. With such impressive capabilities, LFM2 not only elevates the user experience but also establishes a new benchmark for on-device AI performance, paving the way for future advancements in the field.
  • 9
    Nemotron 3 Super Reviews & Ratings

    Nemotron 3 Super

    NVIDIA

    Unleash advanced AI reasoning with unparalleled efficiency and scale.
    The Nemotron-3 Super stands out as a groundbreaking addition to NVIDIA's Nemotron 3 series of open models, designed specifically to support advanced agentic AI systems capable of reasoning, planning, and executing complex multi-step workflows in challenging settings. It incorporates a distinctive hybrid Mamba-Transformer Mixture-of-Experts architecture that combines the streamlined capabilities of Mamba layers with the contextual richness offered by transformer attention mechanisms, enabling it to effectively handle long sequences and complicated reasoning tasks with notable precision and efficiency. By activating only a selected subset of its parameters for each token, this design greatly improves computational efficiency while ensuring strong reasoning skills, making it particularly suitable for scalable inference in demanding situations. With an impressive configuration of around 120 billion parameters, of which approximately 12 billion are engaged during inference, the Nemotron-3 Super significantly enhances its capacity for managing multi-step reasoning and facilitating collaborative interactions among agents in broad contexts. This combination of features not only empowers it to address a wide array of challenges in the AI landscape but also positions it as a key player in the evolution of intelligent systems. Overall, the model exemplifies the potential for future innovations in AI technology.
  • Previous
  • You're on page 1
  • Next