Run Inference

Model Library

You can leverage our models to solve a variety of problems.

Chat

Qwen/Qwen3-235B-A22B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and Mixture-of-Experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

In/Out MTokens – $0.2/$0.8

Context – 128K

meta-llama/Llama-4-Maverick-17B-128E-Instruct

Llama 4 models mark the beginning of a new era for the Llama ecosystem. The series launches with two efficient models: Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts.

In/Out MTokens – $0.17/$0.85

Context – 512K

Qwen/Qwen3-32B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and Mixture-of-Experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

In/Out MTokens – $0.1/$0.45

Context – 128K

meta-llama/Llama-4-Scout-17B-16E-Instruct

Llama 4 models mark the beginning of a new era for the Llama ecosystem. The series launches with two efficient models: Llama 4 Scout, a 17 billion parameter model with 16 experts, and Llama 4 Maverick, a 17 billion parameter model with 128 experts.

In/Out MTokens – $0.1/$0.5

Context – 320K
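
The "In/Out MTokens" figures above can be turned into a rough per-request cost estimate. The sketch below assumes the two numbers are US dollars per one million input tokens and per one million output tokens, respectively; that reading, the `Pricing` class, and the `estimate_cost` helper are illustrative assumptions, not part of any official SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Assumed meaning: USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float


# Prices copied from the Chat entries listed above.
CHAT_PRICING = {
    "Qwen/Qwen3-235B-A22B": Pricing(0.20, 0.80),
    "meta-llama/Llama-4-Maverick-17B-128E-Instruct": Pricing(0.17, 0.85),
    "Qwen/Qwen3-32B": Pricing(0.10, 0.45),
    "meta-llama/Llama-4-Scout-17B-16E-Instruct": Pricing(0.10, 0.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated USD cost of a single request."""
    price = CHAT_PRICING[model]
    total = (input_tokens * price.input_per_mtok
             + output_tokens * price.output_per_mtok)
    return total / 1_000_000


if __name__ == "__main__":
    # 10,000 prompt tokens and 2,000 completion tokens on Qwen3-235B-A22B.
    cost = estimate_cost("Qwen/Qwen3-235B-A22B", 10_000, 2_000)
    print(f"${cost:.4f}")  # prints $0.0036
```

Under the same assumption, an embedding model listed at $0.1/$0 would only be charged for input tokens.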

Embedding

BAAI/bge-m3

BGE-M3 is distinguished by its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.

In/Out MTokens – $0.1/$0

Stella_en_1.5B_v5

The models are trained based on Alibaba-NLP/gte-large-en-v1.5 and Alibaba-NLP/gte-Qwen2-1.5B-instruct.

In/Out MTokens – $0.17/$0.85

NV-Embed-v2

NV-Embed-v2 presents several new designs, including having the LLM attend to latent vectors for better pooled embedding output, and demonstrating a two-staged instruction tuning method to enhance the accuracy of both retrieval and non-retrieval tasks. Additionally, NV-Embed-v2 incorporates a novel hard-negative mining method that takes into account the positive relevance score for better false-negative removal.

In/Out MTokens – $0.1/$0

Image

Flux.1-Depth-dev

FLUX.1 Depth [dev] is a 12 billion parameter rectified flow transformer capable of generating an image based on a text description while following the structure of a given input image.

Flux.1-Redux-dev

FLUX.1 Redux [dev] is an adapter for the FLUX.1 base models for image variation generation. Given an input image, it can reproduce the image with slight variations.

Flux.1-Canny-dev

FLUX.1 Canny [dev] is a 12 billion parameter rectified flow transformer capable of generating an image based on a text description while following the structure of a given input image.

Flux.1-Fill-dev

FLUX.1 Fill [dev] is a 12 billion parameter rectified flow transformer capable of filling areas in existing images based on a text description.

Video

Hunyuan Video

HunyuanVideo is an open-source video generation foundation model from Tencent that generates videos from text prompts.

NetMindVideo1.0

NetMind Video 1.0 is built on a perceptual Diffusion Transformer (DiT) model and neural architectures tailored specifically for video generation, targeting improvements in resolution, dynamism, and speed. An upgrade to 4K resolution and faster generation times is planned. The new model will also increase temporal coherence, meaning movements will be smoother and more lifelike, through a blend of techniques and models.

MMAudio

MMAudio generates synchronized audio given video and/or text inputs. Our key innovation is multimodal joint training which allows training on a wide range of audio-visual and audio-text datasets. Moreover, a synchronization module aligns the generated audio with the video frames.