By alphasec in AI/ML — 13 Jul 2024

Running Open-Source Generative AI Models on Fireworks AI

Run open-source AI models on Fireworks AI — text, image, and code generation via a simple Streamlit app with Kimi K2.5, DeepSeek V3.2, GLM-5, and more.

Image generated using Stable Diffusion XL on Fireworks AI

Since Meta released Llama, open-source generative AI models have gone from research curiosity to a genuine alternative to closed APIs. The ecosystem has grown fast: text, image, code, audio, and music generation are all covered by publicly hosted models you can call with a few lines of Python. Fireworks AI has carved out a clear niche here. It offers fast inference for open-source models through a cloud API, with no infrastructure to manage. Fireworks reports that its FireAttention serving engine delivers higher throughput than standard serving stacks, and it tends to offer day-0 support for major model releases. To follow along, sign up for an account and create an API key; new accounts get $1 in free credits. The models below are all maintained by Fireworks, and prices are as listed at the time of writing.

Open-source models available on Fireworks AI

Generate Text using Kimi K2.5

Kimi K2.5 is Moonshot AI's flagship agentic model — a native multimodal MoE built on 1 trillion total parameters (32B active per forward pass), pretrained on ~15 trillion mixed visual and text tokens. It unifies vision and text, thinking and non-thinking modes, and single/multi-agent execution in one model. Fireworks offers day-0 support and claims the fastest endpoint for the Kimi K2 series. Currently priced at $0.60/1M input tokens and $2.50/1M output tokens.
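Because Kimi K2.5 accepts images as well as text, a single user message can carry both. The sketch below builds such a message in the OpenAI-style content-part format (a `text` part plus an `image_url` part holding a base64 data URL). It assumes Fireworks accepts this format for `accounts/fireworks/models/kimi-k2p5`, so check the model page before relying on it; the helper name `build_vision_message` is hypothetical.

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one user message that pairs a text prompt with an inline image.

    Assumption: the endpoint accepts OpenAI-style content parts, with the
    image passed as a base64-encoded data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# Usage: read a real image file instead of these placeholder bytes
message = build_vision_message("What is in this image?", b"\x89PNG placeholder bytes")
print(message["content"][1]["image_url"]["url"][:22])
```

The resulting dict would go into the `messages` list passed to `ChatCompletion.create`, in place of the plain-text message the app sends.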

Generate Text using MiniMax M2.5

MiniMax M2.5 is a Mixture-of-Experts model built for real-world productivity tasks: coding, agentic tool use, document work, and multi-agent coordination. It was trained with reinforcement learning across hundreds of thousands of real-world digital environments and has a 204K-token context window. Currently priced at $0.30/1M input tokens and $1.20/1M output tokens.

Generate Text using OpenAI gpt-oss 20B

gpt-oss-20b is OpenAI's first open-weight language model since GPT-2 in 2019, released under the Apache 2.0 licence. It's a 21.5B-parameter MoE model (3.6B active per token) designed for low latency and resource-constrained deployments; it runs in 16GB of memory. Despite its small size, it matches or exceeds OpenAI o3-mini on competition mathematics and health benchmarks. It supports configurable reasoning depth (low, medium, or high) and has a 131K-token context window on Fireworks.
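The reasoning depth is chosen per request. The sketch below only builds the JSON body for a chat completion and validates the effort level. It assumes Fireworks exposes this setting as an OpenAI-style `reasoning_effort` field, which you should confirm in the Fireworks API reference; the helper `build_gpt_oss_request` is hypothetical.

```python
import json

# Effort levels described for gpt-oss
REASONING_LEVELS = ("low", "medium", "high")


def build_gpt_oss_request(prompt: str, effort: str = "medium") -> dict:
    """Return a chat-completion request body for gpt-oss-20b.

    Assumption: the reasoning depth is passed as `reasoning_effort`.
    """
    if effort not in REASONING_LEVELS:
        raise ValueError(f"effort must be one of {REASONING_LEVELS}, got {effort!r}")
    return {
        "model": "accounts/fireworks/models/gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }


# A simple arithmetic prompt rarely needs deep reasoning, so use "low"
print(json.dumps(build_gpt_oss_request("What is 17 * 23?", effort="low"), indent=2))
```

Validating the level locally catches typos before they turn into an API error, and keeping the model ID inside the helper means callers only choose the prompt and effort.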

Generate Text using Mixtral MoE 8x22B Instruct

Mixtral MoE 8x22B Instruct is a Sparse Mixture of Experts model from Mistral AI, fine-tuned for instruction following. It has 8 experts of roughly 22B parameters each, but because the experts share the attention layers, the model totals 141B parameters, of which about 39B are active per token. It offers strong reasoning and multilingual capability at competitive cost. Currently priced at $0.90/1M tokens for both input and output.
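The "sparse" in Sparse Mixture of Experts means each token passes through only a subset of the experts. In Mixtral, a router scores all 8 experts for every token, keeps the top 2, and mixes their outputs using a softmax over those two scores. The toy sketch below shows that top-k routing step in pure Python, with made-up router scores and experts reduced to scalar functions; it illustrates the idea and is not Mistral's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax their scores into gate weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    max_logit = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Run only the selected experts and sum their outputs, weighted by the gates."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(logits, k))


# 8 toy "experts": expert j multiplies its input by (j + 1)
experts = [lambda x, j=j: x * (j + 1) for j in range(8)]
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]  # made-up scores for one token

print(top_k_route(router_logits))  # experts 1 and 4 are selected
print(moe_forward(1.0, router_logits, experts))
```

The other six experts are never evaluated for this token, which is where the compute savings come from. In the real model, the router is a learned linear layer applied to every token in every MoE layer, rather than hard-coded scores.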

Generate Text using DeepSeek V3.2

DeepSeek V3.2 is the latest iteration of DeepSeek's flagship MoE model, with 685B total parameters and a 160K context window. It combines strong general reasoning with tool calling support, and remains one of the best value-for-performance models available via API. Currently priced at $0.56/1M input tokens and $1.68/1M output tokens on Fireworks.
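With tool calling, the model can reply with a structured request to call a function you have described, rather than with plain text. The sketch below assumes Fireworks accepts the OpenAI-compatible `tools` schema for DeepSeek V3.2 and returns calls in the OpenAI `tool_calls` shape. The `get_weather` tool and the sample response are hypothetical and written by hand, so no API call is made.

```python
import json

# Hypothetical tool described in the OpenAI-compatible JSON Schema format
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body that offers the tool to the model
request_body = {
    "model": "accounts/fireworks/models/deepseek-v3p2",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [WEATHER_TOOL],
}


def extract_tool_calls(response: dict) -> list:
    """Return (function name, parsed arguments) pairs from a chat-completion response dict."""
    message = response["choices"][0]["message"]
    return [
        # Arguments arrive as a JSON-encoded string, so decode them
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in message.get("tool_calls") or []
    ]


# Hand-written sample in the OpenAI response shape (not real model output)
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
            }],
        }
    }]
}

print(extract_tool_calls(sample_response))
```

In the OpenAI-compatible flow, your code would then run the function and send its result back in a `"role": "tool"` message so the model can write the final answer.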

Generate Text using GLM-5

GLM-5 is Z.ai's (formerly Zhipu AI) flagship open-weight model — a 744B parameter MoE (40B active per token) trained on 28.5 trillion tokens using DeepSeek Sparse Attention on Huawei Ascend hardware. It targets complex systems engineering and long-horizon agentic tasks, with a 200K token context window and strong performance on SWE-bench (77.8%). Currently priced at $1.00/1M input tokens and $3.20/1M output tokens on Fireworks.
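The text models are billed separately for input and output tokens, so the cost of a request is input_tokens / 1M × input price + output_tokens / 1M × output price. The sketch below hard-codes the per-million-token prices quoted above. gpt-oss 20B is left out because no price is listed for it, and because prices change, treat the table as a snapshot. It uses `Decimal` to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), as quoted in this article at time of writing
PRICES_PER_MILLION = {
    "kimi-k2p5": (Decimal("0.60"), Decimal("2.50")),
    "minimax-m2p5": (Decimal("0.30"), Decimal("1.20")),
    "mixtral-8x22b-instruct": (Decimal("0.90"), Decimal("0.90")),
    "deepseek-v3p2": (Decimal("0.56"), Decimal("1.68")),
    "glm-5": (Decimal("1.00"), Decimal("3.20")),
}
MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / MILLION


# Example: a 2,000-token prompt with an 800-token answer on each model
for model in PRICES_PER_MILLION:
    print(f"{model:<24} ${request_cost(model, 2_000, 800):.6f}")
```

Output tokens cost two to four times as much as input tokens on most of these models, so long answers dominate the bill more than long prompts do.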

Generate Image using Stable Diffusion XL Model

Stable Diffusion XL is a diffusion-based text-to-image model that generates high-resolution images from a text prompt. You can optionally apply a watermark and enable a safety check on generated images. Pricing is per inference step (denoising iteration) at $0.00013 per step, so a 30-step image costs 30 × $0.00013 = $0.0039.
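Because billing is per step, the step count is the main cost lever. The small sketch below reproduces the arithmetic above with `Decimal` (to avoid float rounding), using the quoted $0.00013 per step. Fewer steps cost less but may reduce image quality, so it is worth comparing a few values.

```python
from decimal import Decimal

PRICE_PER_STEP = Decimal("0.00013")  # USD per inference step, as quoted above


def image_cost(steps: int, images: int = 1) -> Decimal:
    """Cost in USD of generating `images` images with `steps` denoising steps each."""
    if steps <= 0 or images <= 0:
        raise ValueError("steps and images must be positive")
    return PRICE_PER_STEP * steps * images


for steps in (20, 30, 50):
    print(f"{steps} steps: ${image_cost(steps)}")
```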

The Streamlit App

Save the source below as streamlit_app.py, then install the dependencies and run it locally:

pip install fireworks-ai streamlit
streamlit run streamlit_app.py

Alternatively, if you'd rather not run it locally, you can deploy the app to Railway: push the code, along with a requirements.txt listing fireworks-ai and streamlit, to a GitHub repo and connect the repo from the Railway dashboard.

Here's the full source:

import os
import streamlit as st
import fireworks.client
from fireworks.client.image import ImageInference, Answer

# Serverless chat models: sidebar label -> Fireworks model ID
CHAT_MODELS = {
  "📝 Kimi K2.5": "accounts/fireworks/models/kimi-k2p5",
  "📝 MiniMax M2.5": "accounts/fireworks/models/minimax-m2p5",
  "📝 OpenAI gpt-oss 20B": "accounts/fireworks/models/gpt-oss-20b",
  "📝 Mixtral MoE 8x22B Instruct": "accounts/fireworks/models/mixtral-8x22b-instruct",
  "📝 DeepSeek V3.2": "accounts/fireworks/models/deepseek-v3p2",
  "📝 GLM-5": "accounts/fireworks/models/glm-5",
}
IMAGE_MODEL_LABEL = "📷 Stable Diffusion XL"

# Streamlit app config
st.set_page_config(page_title="Fireworks Studio", page_icon="🔥", layout="centered")

with st.sidebar:
  st.title("🔥 Fireworks Studio")
  with st.expander("⚙️ Settings", expanded=True):
    fireworks_api_key = st.text_input("Fireworks API key", type="password", help="Get your key [here](https://fireworks.ai/settings/users/api-keys)")
    # Chat models first (dict keys keep insertion order), then the image model
    option = st.radio("Serverless model", [*CHAT_MODELS, IMAGE_MODEL_LABEL])

col1, col2 = st.columns([4, 1])
prompt = col1.text_input("Prompt", label_visibility="collapsed")
submit = col2.button("Submit")

# If the Submit button is clicked
if submit:
  api_key = fireworks_api_key.strip()
  if not api_key:
    st.warning("⚠️ Please enter your Fireworks API key in the sidebar.")
  elif not prompt.strip():
    st.warning("⚠️ Please enter a prompt.")
  else:
    try:
      with st.spinner("Please wait..."):
        # Set the key on the client module and in the environment
        fireworks.client.api_key = api_key
        os.environ["FIREWORKS_API_KEY"] = api_key

        if option in CHAT_MODELS:
          # Run the selected chat model on Fireworks AI
          response = fireworks.client.ChatCompletion.create(
              model=CHAT_MODELS[option],
              messages=[{"role": "user", "content": prompt}],
          )
          st.success(response.choices[0].message.content)
        else:
          # Run stable-diffusion-xl-1024-v1-0 on Fireworks AI
          client = ImageInference(model="stable-diffusion-xl-1024-v1-0")
          answer: Answer = client.text_to_image(
              prompt=prompt,
              cfg_scale=7,
              height=1024,
              width=1024,
              sampler=None,
              steps=30,  # billed per step: 30 x $0.00013 = $0.0039
              seed=0,
              safety_check=False,  # set to True to enable the safety check
              output_image_format="PNG",
          )
          st.image(answer.image)
    except Exception as e:
      # st.exception expects the exception object itself, not a formatted string
      st.exception(e)
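The SDK calls in the app wrap an HTTP API. If you'd rather avoid the `fireworks-ai` dependency, you can reproduce the chat branch with the standard library against Fireworks' OpenAI-compatible endpoint. The sketch below assumes the endpoint is `https://api.fireworks.ai/inference/v1/chat/completions` with bearer-token auth. `build_chat_request` only builds the request, and `send_chat_request` performs the network call. Both helper names are hypothetical.

```python
import json
import urllib.request

# Assumed OpenAI-compatible chat endpoint
FIREWORKS_CHAT_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one chat completion."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        FIREWORKS_CHAT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text (needs network access)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request("YOUR_API_KEY", "accounts/fireworks/models/glm-5", "Hello!")
    print(req.get_method(), req.full_url)
    # print(send_chat_request(req))  # uncomment with a real key to make the call
```

Splitting request construction from sending keeps the payload easy to inspect and test without spending credits.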