Running Open-Source Generative AI Models on Fireworks AI
Run open-source AI models on Fireworks AI — text, image, and code generation via a simple Streamlit app with Kimi K2.5, DeepSeek V3.2, GLM-5, and more.

Since Meta released Llama, open-source generative AI models have gone from research curiosity to a genuine alternative to closed APIs. The ecosystem has grown fast: text, image, code, audio, and music generation are all covered by publicly hosted models you can call with a few lines of Python.

Fireworks AI has carved out a clear niche: fast inference for open-source models via a cloud API, with no infrastructure to manage. The company says its FireAttention engine delivers higher throughput than standard serving stacks, and it tends to offer day-0 support for major model releases. Sign up for an account and grab an API key; you'll get $1 in credits to start. The models below are all maintained by Fireworks, and pricing is as listed at the time of writing.

Generate Text using Kimi K2.5
Kimi K2.5 is Moonshot AI's flagship agentic model: a natively multimodal Mixture-of-Experts (MoE) model with 1 trillion total parameters (32B active per forward pass), pretrained on ~15 trillion mixed visual and text tokens. It unifies vision and text, thinking and non-thinking modes, and single- and multi-agent execution in one model. Fireworks offers day-0 support and claims the fastest endpoint for the Kimi K2 series. Currently priced at $0.60/1M input tokens and $2.50/1M output tokens.
Generate Text using MiniMax M2.5
MiniMax M2.5 is a Mixture-of-Experts model built for real-world productivity tasks: coding, agentic tool use, document work, and multi-agent coordination. It was trained with reinforcement learning across hundreds of thousands of real-world digital environments and has a 204K token context window. Currently priced at $0.30/1M input tokens and $1.20/1M output tokens.
Generate Text using OpenAI gpt-oss 20B
gpt-oss-20b is part of OpenAI's first open-weight language model release since GPT-2 in 2019, available under an Apache 2.0 licence. It's a 21.5B parameter MoE model (3.6B active per token), designed for lower latency and resource-constrained deployments: it runs in 16GB of memory. Despite its small size, OpenAI reports that it matches or exceeds o3-mini on competition mathematics and health benchmarks. It supports configurable reasoning depth (low/medium/high) and a 131K token context window on Fireworks.
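To make the configurable reasoning depth concrete, here is a minimal sketch of a request body that selects a depth. It assumes the chat completions endpoint accepts a `reasoning_effort` field taking `low`, `medium`, or `high`; that field name and the helper `build_request` are assumptions rather than anything confirmed in this article, so check the Fireworks API reference before relying on them. The model ID is the one used in the app further down.

```python
import json

# Depth values named in the article; the request field name is an assumption.
ALLOWED_EFFORTS = ("low", "medium", "high")


def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat completion payload for gpt-oss-20b (hypothetical helper)."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    return {
        "model": "accounts/fireworks/models/gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        # Assumed parameter name; verify against the provider's documentation.
        "reasoning_effort": effort,
    }


if __name__ == "__main__":
    print(json.dumps(build_request("What is 17 * 23?", effort="high"), indent=2))
```

The function only builds the payload and makes no network call, so it can be checked locally before being passed to whichever client you use.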
Generate Text using Mixtral MoE 8x22B Instruct
Mixtral MoE 8x22B Instruct is a Sparse Mixture of Experts model from Mistral, fine-tuned for instruction following. Each layer routes a token to 2 of its 8 experts, so only about 39B of the model's 141B total parameters are active per token, which gives strong reasoning and multilingual capability at a competitive cost. Currently priced at $0.90/1M input and output tokens.
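As a rough illustration of what "sparse" means here, the toy router below picks the two highest-scoring experts for a token and renormalises their scores with a softmax. Top-2 routing follows Mistral's published description of Mixtral, but the function name, the example logits, and the pure-Python implementation are illustrative assumptions, not Mistral's code.

```python
import math


def top2_route(router_logits):
    """Return [(expert_index, weight), (expert_index, weight)] for one token.

    The two experts with the highest router logits are selected, and a
    softmax over just those two logits gives their mixing weights.
    """
    if len(router_logits) < 2:
        raise ValueError("need at least two experts")
    # Rank expert indices by logit, highest first, and keep the top two.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:2]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2]  # one score per expert
    for expert, weight in top2_route(logits):
        print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts would run their feed-forward computation for that token, which is why the active parameter count is much lower than the total.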
Generate Text using DeepSeek V3.2
DeepSeek V3.2 is the latest iteration of DeepSeek's flagship MoE model, with 685B total parameters and a 160K context window. It combines strong general reasoning with tool calling support, and remains one of the best value-for-performance models available via API. Currently priced at $0.56/1M input tokens and $1.68/1M output tokens on Fireworks.
Generate Text using GLM-5
GLM-5 is the flagship open-weight model from Z.ai (formerly Zhipu AI): a 744B parameter MoE (40B active per token) that uses DeepSeek Sparse Attention and was trained on 28.5 trillion tokens on Huawei Ascend hardware. It targets complex systems engineering and long-horizon agentic tasks, with a 200K token context window and a reported 77.8% on SWE-bench. Currently priced at $1.00/1M input tokens and $3.20/1M output tokens on Fireworks.
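To compare the per-token prices quoted above, a small estimator helps. This is a minimal sketch: the prices are copied from this article (gpt-oss 20B is left out because no price is listed for it), and the dictionary keys and the function name `estimate_cost` are hypothetical, chosen only for illustration.

```python
# USD per 1M tokens as (input, output), copied from the article at time of writing.
PRICES_PER_MILLION = {
    "kimi-k2p5": (0.60, 2.50),
    "minimax-m2p5": (0.30, 1.20),
    "mixtral-8x22b-instruct": (0.90, 0.90),
    "deepseek-v3p2": (0.56, 1.68),
    "glm-5": (1.00, 3.20),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: a 2,000-token prompt with a 500-token reply.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 2_000, 500):.6f}")
```

Output tokens cost several times more than input tokens for most of these models, so the length of the reply tends to dominate the bill for short prompts.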
Generate Image using Stable Diffusion XL Model
Stable Diffusion XL is a diffusion-based text-to-image model. You can optionally apply a watermark and enable a safety check for generated images. The model is priced by the number of inference steps (denoising iterations): each step costs $0.00013, so a 30-step image costs $0.0039.
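The step-based pricing is simple enough to check in a few lines. The sketch below uses the per-step price quoted above; the function name `image_cost` and the `images` parameter are hypothetical, added only to show how the cost scales across a batch.

```python
PRICE_PER_STEP_USD = 0.00013  # per denoising step, as quoted in the article


def image_cost(steps: int, images: int = 1) -> float:
    """Return the USD cost of generating `images` images at `steps` steps each."""
    if steps <= 0 or images <= 0:
        raise ValueError("steps and images must be positive")
    return steps * images * PRICE_PER_STEP_USD


if __name__ == "__main__":
    print(f"${image_cost(30):.4f}")      # one 30-step image
    print(f"${image_cost(30, 100):.2f}")  # a batch of 100 such images
```

Halving the step count halves the cost, so it can be worth testing whether fewer steps still give acceptable images for your prompts.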
The Streamlit App
Install the dependencies and run locally with:
pip install fireworks-ai streamlit
streamlit run streamlit_app.py
Alternatively, if you'd rather not run it locally, you can deploy this app to Railway: push the code to a GitHub repo and connect it from the Railway dashboard.
Here's the full source:
import os
import streamlit as st
import fireworks.client
from fireworks.client.image import ImageInference, Answer

# Streamlit app config
st.set_page_config(page_title="Fireworks Studio", page_icon="🔥", layout="centered")

# Sidebar: API key input and model picker
with st.sidebar:
    st.title("🔥 Fireworks Studio")
    with st.expander("⚙️ Settings", expanded=True):
        fireworks_api_key = st.text_input("Fireworks API key", type="password", help="Get your key [here](https://fireworks.ai/settings/users/api-keys)")
        option = st.radio("Serverless model", [
            "📝 Kimi K2.5",
            "📝 MiniMax-M2.5",
            "📝 OpenAI gpt-oss 20B",
            "📝 Mixtral MoE 8x22B Instruct",
            "📝 Deepseek v3.2",
            "📝 GLM-5",
            "📷 Stable Diffusion XL",
        ])

# Main area: prompt box and submit button side by side
col1, col2 = st.columns([4, 1])
prompt = col1.text_input("Prompt", label_visibility="collapsed")
submit = col2.button("Submit")

# If Submit button is clicked
if submit:
    if not fireworks_api_key.strip():
        st.warning("⚠️ Please enter your Fireworks API key in the sidebar.")
    elif not prompt.strip():
        st.warning("⚠️ Please enter a prompt.")
    else:
        try:
            with st.spinner("Please wait..."):
                # Make the key available to both the chat and image clients
                fireworks.client.api_key = fireworks_api_key
                os.environ["FIREWORKS_API_KEY"] = fireworks_api_key
                if option == "📝 Kimi K2.5":
                    # Run kimi-k2p5 model on Fireworks AI
                    response = fireworks.client.ChatCompletion.create(
                        model="accounts/fireworks/models/kimi-k2p5",
                        messages=[{
                            "role": "user",
                            "content": prompt,
                        }],
                    )
                    st.success(response.choices[0].message.content)
                elif option == "📝 MiniMax-M2.5":
                    # Run minimax-m2p5 model on Fireworks AI
                    response = fireworks.client.ChatCompletion.create(
                        model="accounts/fireworks/models/minimax-m2p5",
                        messages=[{
                            "role": "user",
                            "content": prompt,
                        }],
                    )
                    st.success(response.choices[0].message.content)
                elif option == "📝 OpenAI gpt-oss 20B":
                    # Run gpt-oss-20b model on Fireworks AI
                    response = fireworks.client.ChatCompletion.create(
                        model="accounts/fireworks/models/gpt-oss-20b",
                        messages=[{
                            "role": "user",
                            "content": prompt,
                        }],
                    )
                    st.success(response.choices[0].message.content)
                elif option == "📝 Mixtral MoE 8x22B Instruct":
                    # Run mixtral-8x22b-instruct model on Fireworks AI
                    response = fireworks.client.ChatCompletion.create(
                        model="accounts/fireworks/models/mixtral-8x22b-instruct",
                        messages=[{
                            "role": "user",
                            "content": prompt,
                        }],
                    )
                    st.success(response.choices[0].message.content)
                elif option == "📝 Deepseek v3.2":
                    # Run deepseek-v3p2 model on Fireworks AI
                    response = fireworks.client.ChatCompletion.create(
                        model="accounts/fireworks/models/deepseek-v3p2",
                        messages=[{
                            "role": "user",
                            "content": prompt,
                        }],
                    )
                    st.success(response.choices[0].message.content)
                elif option == "📝 GLM-5":
                    # Run glm-5 model on Fireworks AI
                    response = fireworks.client.ChatCompletion.create(
                        model="accounts/fireworks/models/glm-5",
                        messages=[{
                            "role": "user",
                            "content": prompt,
                        }],
                    )
                    st.success(response.choices[0].message.content)
                elif option == "📷 Stable Diffusion XL":
                    # Run stable-diffusion-xl-1024-v1-0 model on Fireworks AI
                    client = ImageInference(model="stable-diffusion-xl-1024-v1-0")
                    answer: Answer = client.text_to_image(
                        prompt=prompt,
                        cfg_scale=7,
                        height=1024,
                        width=1024,
                        sampler=None,
                        steps=30,
                        seed=0,
                        safety_check=False,
                        output_image_format="PNG",
                    )
                    st.image(answer.image)
        except Exception as e:
            # st.exception expects the exception object, not a formatted string
            st.exception(e)
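The six text branches above differ only in the model ID, so one option is to look the ID up in a dictionary and share a single call. The sketch below shows just the lookup; `TEXT_MODELS` and `resolve_text_model` are hypothetical names, not part of the Fireworks client, and the labels and IDs are copied from the app above.

```python
from typing import Optional

# Radio-button label -> Fireworks model ID, copied from the app source.
TEXT_MODELS = {
    "📝 Kimi K2.5": "accounts/fireworks/models/kimi-k2p5",
    "📝 MiniMax-M2.5": "accounts/fireworks/models/minimax-m2p5",
    "📝 OpenAI gpt-oss 20B": "accounts/fireworks/models/gpt-oss-20b",
    "📝 Mixtral MoE 8x22B Instruct": "accounts/fireworks/models/mixtral-8x22b-instruct",
    "📝 Deepseek v3.2": "accounts/fireworks/models/deepseek-v3p2",
    "📝 GLM-5": "accounts/fireworks/models/glm-5",
}


def resolve_text_model(option: str) -> Optional[str]:
    """Return the model ID for a text option, or None for any other option."""
    return TEXT_MODELS.get(option)


if __name__ == "__main__":
    print(resolve_text_model("📝 GLM-5"))
    print(resolve_text_model("📷 Stable Diffusion XL"))
```

In the app, the chain of `elif` branches could then shrink to one `ChatCompletion.create` call that runs whenever `resolve_text_model(option)` returns an ID, with the Stable Diffusion XL branch left as it is. Adding a new text model would then mean adding one dictionary entry and one radio label.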