
Model Library
You can leverage our models to solve a variety of problems.
Chat
Qwen/Qwen3-235B-A22B
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and Mixture-of-Experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
In/Out MTokens – $0.2/$0.8
Context – 128K
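Prices above are quoted per million tokens (MTokens), billed separately for input and output. As a quick sketch, here is how a single request's cost works out at the Qwen3-235B-A22B rates listed above:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Cost of one request given per-million-token input/output prices."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000

# Qwen/Qwen3-235B-A22B: $0.2 input / $0.8 output per MTokens
cost = request_cost_usd(10_000, 2_000, 0.2, 0.8)
print(f"${cost:.4f}")  # 10k input + 2k output tokens -> $0.0036
```

The same formula applies to every chat model on this page; only the two per-MToken rates change.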
meta-llama/Llama-4-Maverick-17B-128E-Instruct
Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series: Llama 4 Scout, a 17-billion-active-parameter model with 16 experts, and Llama 4 Maverick, a 17-billion-active-parameter model with 128 experts.
In/Out MTokens – $0.17/$0.85
Context – 512K
Qwen/Qwen3-32B
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and Mixture-of-Experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
In/Out MTokens – $0.1/$0.45
Context – 128K
meta-llama/Llama-4-Scout-17B-16E-Instruct
Llama 4 models mark the beginning of a new era for the Llama ecosystem. We are launching two efficient models in the Llama 4 series: Llama 4 Scout, a 17-billion-active-parameter model with 16 experts, and Llama 4 Maverick, a 17-billion-active-parameter model with 128 experts.
In/Out MTokens – $0.1/$0.5
Context – 320K
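The chat models above are typically served behind an OpenAI-style chat-completions API; the endpoint below is a placeholder assumption, not a documented address. A minimal sketch of building such a request, using one of the listed model names:

```python
import json

# Placeholder endpoint -- substitute the provider's documented base URL.
BASE_URL = "https://api.example.com/v1/chat/completions"

def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a listed chat model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Qwen/Qwen3-235B-A22B",
                             "Summarize Mixture-of-Experts in one sentence.")
print(json.dumps(payload, indent=2))
```

The `model` field takes the identifier exactly as listed (e.g. `meta-llama/Llama-4-Scout-17B-16E-Instruct`); requests must also stay within each model's context window, listed above.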
Embedding
BAAI/bge-m3
It is distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.
In/Out MTokens – $0.1/$0
Stella_en_1.5B_v5
The model is trained based on Alibaba-NLP/gte-large-en-v1.5 and Alibaba-NLP/gte-Qwen2-1.5B-instruct.
In/Out MTokens – $0.17/$0.85
NV-Embed-v2
NV-Embed-v2 presents several new designs, including having the LLM attend to latent vectors for better pooled embedding output, and demonstrating a two-staged instruction tuning method to enhance the accuracy of both retrieval and non-retrieval tasks. Additionally, NV-Embed-v2 incorporates a novel hard-negative mining method that takes into account the positive relevance score for better false-negative removal.
In/Out MTokens – $0.1/$0
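The embedding models above return dense vectors that are usually compared by cosine similarity. A self-contained sketch of that comparison (the vectors here are toy values standing in for real model outputs):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors; real embeddings have hundreds of dimensions.
v1 = [0.1, 0.3, 0.5, 0.2]
v2 = [0.2, 0.25, 0.45, 0.15]
print(round(cosine_similarity(v1, v2), 3))
```

Higher scores mean more semantically similar texts; a vector compared with itself scores 1.0.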
Image
Flux.1-Depth-dev
FLUX.1 Depth [dev] is a 12 billion parameter rectified flow transformer capable of generating an image based on a text description while following the structure of a given input image.
Flux.1-Redux-dev
FLUX.1 Redux [dev] is an adapter for the FLUX.1 base models that generates variations of a given input image, allowing an existing image to be reproduced with slight variation or refined.
Flux.1-Canny-dev
FLUX.1 Canny [dev] is a 12 billion parameter rectified flow transformer capable of generating an image based on a text description while following the structure of a given input image.
Flux.1-Fill-dev
FLUX.1 Fill [dev] is a 12 billion parameter rectified flow transformer capable of filling areas in existing images based on a text description.
Video
Hunyuan Video
HunyuanVideo is an open-source video foundation model that generates high-quality video from text prompts, with reported performance competitive with leading closed-source video generation models.
NetMindVideo1.0
The new NetMind Video 1.0 model produces jaw-dropping outputs, with an upgrade to 4K resolution and faster generation times coming soon. (Allow us to geek out for a bit.) Our perceptual Diffusion Transformer (DiT) model and advanced neural architectures are tailored specifically for video generation, delivering unparalleled performance in resolution, dynamism, and speed. The new model also improves temporal coherence, meaning movements are smoother and more lifelike, through a blend of techniques and models.
MMAudio
MMAudio generates synchronized audio given video and/or text inputs. Our key innovation is multimodal joint training which allows training on a wide range of audio-visual and audio-text datasets. Moreover, a synchronization module aligns the generated audio with the video frames.