QWQ AI

Supported Models

Cohere: North Mini Code (free)

cohere/north-mini-code:free

North Mini Code is Cohere's first agentic coding model and the debut of its North family. A sparse mixture-of-experts model with 30B total parameters and 3B active, it is optimized...

Input:

text

Output:

text

Google: Gemma 4 26B A4B (free)

google/gemma-4-26b-a4b-it:free

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Input:

image, text, video

Output:

text

Google: Gemma 4 31B (free)

google/gemma-4-31b-it:free

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Input:

image, text, video

Output:

text

LiquidAI: LFM2.5-1.2B-Instruct (free)

liquid/lfm-2.5-1.2b-instruct:free

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.

Input:

text

Output:

text

LiquidAI: LFM2.5-1.2B-Thinking (free)

liquid/lfm-2.5-1.2b-thinking:free

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...

Input:

text

Output:

text

Meta: Llama 3.2 3B Instruct (free)

meta-llama/llama-3.2-3b-instruct:free

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...

Input:

text

Output:

text

Meta: Llama 3.3 70B Instruct (free)

meta-llama/llama-3.3-70b-instruct:free

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in, text out). The Llama 3.3 instruction-tuned text-only model...

Input:

text

Output:

text

Nous: Hermes 3 405B Instruct (free)

nousresearch/hermes-3-llama-3.1-405b:free

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Input:

text

Output:

text

NVIDIA: Nemotron 3 Nano 30B A3B (free)

nvidia/nemotron-3-nano-30b-a3b:free

NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model built for high compute efficiency and accuracy, aimed at developers building specialized agentic AI systems. The model is fully...

Input:

text

Output:

text

NVIDIA: Nemotron 3 Nano Omni (free)

nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

Input:

text, audio, image, video

Output:

text

NVIDIA: Nemotron 3 Super (free)

nvidia/nemotron-3-super-120b-a12b:free

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Input:

text

Output:

text

NVIDIA: Nemotron 3 Ultra (free)

nvidia/nemotron-3-ultra-550b-a55b:free

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Input:

text

Output:

text

NVIDIA: Nemotron 3.5 Content Safety (free)

nvidia/nemotron-3.5-content-safety:free

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting...

Input:

text, image

Output:

text

NVIDIA: Nemotron Nano 12B 2 VL (free)

nvidia/nemotron-nano-12b-v2-vl:free

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...

Input:

image, text, video

Output:

text

NVIDIA: Nemotron Nano 9B V2 (free)

nvidia/nemotron-nano-9b-v2:free

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...

Input:

text

Output:

text

OpenAI: gpt-oss-120b (free)

openai/gpt-oss-120b:free

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

Input:

text

Output:

text

OpenAI: gpt-oss-20b (free)

openai/gpt-oss-20b:free

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Input:

text

Output:

text

Poolside: Laguna M.1 (free)

poolside/laguna-m.1:free

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai/), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 256K...

Input:

text

Output:

text

Poolside: Laguna XS 2.1 (free)

poolside/laguna-xs-2.1:free

Laguna XS 2.1 is the latest coding agent model in the 33B-A3B category from [Poolside](https://poolside.ai/) and a step forward from their Laguna XS.2 model (released in April 2026). It combines...

Input:

text

Output:

text

Qwen: Qwen3 Coder 480B A35B (free)

qwen/qwen3-coder:free

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

Input:

text

Output:

text

Qwen: Qwen3 Next 80B A3B Instruct (free)

qwen/qwen3-next-80b-a3b-instruct:free

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...

Input:

text

Output:

text

Tencent: Hy3 (free)

tencent/hy3:free

Hy3 is a 295B-parameter Mixture-of-Experts model from Tencent (21B active, 192 experts with top-8 routing) built for reasoning, agentic workflows, and real-world production use. It supports a configurable reasoning effort:...

Input:

text

Output:

text

Venice: Uncensored (free)

cognitivecomputations/dolphin-mistral-24b-venice-edition:free

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving...

Input:

text

Output:

text

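
Several descriptions in this list quote both a total and an active parameter count for Mixture-of-Experts models. As a rough way to compare them, the sketch below computes the share of parameters active per token. The figures are the rounded values quoted in the descriptions above, keyed by the listed model ID; the names `MOE_MODELS`, `active_fraction`, and `summarize` are illustrative only and not part of any API.

```python
# (total, active) parameter counts in billions, copied from the rounded
# figures quoted in the model descriptions in this list.
MOE_MODELS = {
    "cohere/north-mini-code:free": (30.0, 3.0),
    "google/gemma-4-26b-a4b-it:free": (25.2, 3.8),
    "nvidia/nemotron-3-super-120b-a12b:free": (120.0, 12.0),
    "nvidia/nemotron-3-ultra-550b-a55b:free": (550.0, 55.0),
    "openai/gpt-oss-120b:free": (117.0, 5.1),
    "openai/gpt-oss-20b:free": (21.0, 3.6),
    "tencent/hy3:free": (295.0, 21.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters active per token (between 0 and 1)."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


def summarize(models: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (model ID, active fraction) pairs, smallest fraction first."""
    rows = [(slug, active_fraction(total, active))
            for slug, (total, active) in models.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for slug, frac in summarize(MOE_MODELS):
        print(f"{slug:45s} {frac:6.1%}")
```

On these quoted figures, gpt-oss-120b activates the smallest share (about 4.4%) and gpt-oss-20b the largest (about 17.1%). The ratio only indicates per-token compute relative to model size and says nothing about output quality.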