Running Open-Source Generative AI Models on Replicate

Run open-source AI models on Replicate — text, image, code, audio, and music generation via a Streamlit app with Llama, Flux Schnell, MusicGen, and more.

Image generated using Stable Diffusion XL on Replicate

Since Meta released Llama, open-source generative AI models have gone from research curiosity to a genuine alternative to closed APIs. The ecosystem has grown fast — text, image, code, audio, and music generation are all covered by publicly hosted models you can call with a few lines of Python. Replicate makes this straightforward: you get a cloud API for running open-source models without managing any infrastructure, and you only pay for active processing time. Sign up, grab an API key, and you're ready to go.
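To give a sense of what "a few lines of Python" can look like, here is a minimal sketch. It assumes the `replicate` package is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper name `generate_text` and its injectable `run` parameter are illustrative choices, not part of the Replicate client.

```python
def generate_text(prompt, model="meta/meta-llama-3-70b-instruct", run=None):
    """Send a prompt to a hosted text model and return the reply as one string."""
    if run is None:
        # Imported lazily so the helper can be exercised without the package.
        import replicate
        run = replicate.run
    output = run(model, input={"prompt": prompt})
    # Text models return the reply in chunks, so join them into one string.
    return "".join(output)
```

With the token set, calling `generate_text("Explain what a context window is.")` should return the model's reply as a single string. Passing a stand-in for `run` makes the helper easy to try without network access.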

Select a model to run on Replicate

Generate Text using Meta Llama 3 70B Instruct

Meta Llama 3 70B Instruct is a 70 billion parameter language model from Meta, pre-trained and fine-tuned for chat completions. It has a context window of 8,192 tokens (8K), double that of Llama 2. You can modify the system prompt to guide model responses. The model is currently priced at $0.65 / 1M input tokens and $2.75 / 1M output tokens.
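Token-based pricing is easier to reason about with a quick calculation. The sketch below uses the prices quoted above as defaults; the function name `estimate_token_cost` is hypothetical, and actual billing may differ from this estimate.

```python
# Prices quoted above for Meta Llama 3 70B Instruct, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.65
OUTPUT_PRICE_PER_M = 2.75


def estimate_token_cost(input_tokens, output_tokens,
                        input_price=INPUT_PRICE_PER_M,
                        output_price=OUTPUT_PRICE_PER_M):
    """Return the estimated cost in USD for one request."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, a 2,000-token prompt with a 500-token reply works out to roughly $0.0027 at these rates.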

Reason using Meta Llama 3.1 405B Instruct

Meta Llama 3.1 405B Instruct is a 405 billion parameter multilingual language model from Meta, pre-trained on more than 15T tokens and fine-tuned for chat completions. It supports 8 languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The model itself supports a context window of 128K tokens, and you can modify the system prompt to guide model responses. The model is currently priced at $9.50 / 1M input or output tokens.
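To show how the system prompt can be swapped out, the sketch below builds an input dictionary for this model. The parameter names (`prompt`, `system_prompt`, `max_tokens`, `temperature`) mirror those used in the app source further down. The helper `build_llama_input` and its default system prompt are hypothetical.

```python
def build_llama_input(prompt,
                      system_prompt="You are a concise technical assistant.",
                      max_tokens=1024,
                      temperature=0.6):
    """Build the input payload for a chat-tuned Llama model."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
```

The resulting dictionary can be passed as the `input` argument of `replicate.run`. Changing `system_prompt` (for instance to "Answer in French.") is usually enough to steer tone or language without touching the user prompt.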

Generate Text using Google Gemma 7B Instruct

Google Gemma 7B Instruct is a 7 billion parameter instruction-tuned language model from Google, built from the same research and technology used to create the larger, more powerful Gemini models. The model runs on NVIDIA A40 (Large) GPU hardware; each prediction typically completes within 6 seconds and costs $0.000725/sec ($2.61/hr).
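Models billed by runtime can be estimated the same way. This is a small sketch of the arithmetic using the figures quoted above; the function names are hypothetical, and real run times vary with the input.

```python
def estimate_runtime_cost(seconds, price_per_second):
    """Return the estimated cost in USD for a prediction billed by runtime."""
    return seconds * price_per_second


def hourly_rate(price_per_second):
    """Convert a per-second price to the equivalent hourly rate."""
    return price_per_second * 3600
```

At $0.000725/sec, a typical 6-second Gemma prediction comes to about $0.00435, and the per-second price is consistent with the quoted $2.61/hr.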

Generate Text using Mistral 7B

Mistral 7B is a compact but capable 7 billion parameter base model from Mistral AI. It punches above its weight for its size, making it a solid choice when you want low-latency text generation without the cost of larger models. It is currently priced at $0.05 / 1M input tokens and $0.25 / 1M output tokens.

Generate Image using Stable Diffusion 3

Stable Diffusion 3 is a diffusion-based text-to-image model that excels at photorealism, typography, and prompt following. You can apply a watermark and enable a safety check for generated images. The model is priced per image: each generated image costs $0.035.

Image generation using Stable Diffusion XL model

Generate Image using Black Forest Labs Flux Schnell

Flux Schnell is a fast text-to-image model from Black Forest Labs, optimised for speed without sacrificing output quality. It's a strong alternative to Stable Diffusion 3 when turnaround time matters more than fine-grained control. Like Stable Diffusion 3, it is priced per image generated.

Generate Code using Meta Code Llama 70B Instruct

Meta Code Llama 70B Instruct is a 70 billion parameter language model from Meta, pre-trained and fine-tuned for coding and conversation. The model runs on NVIDIA A100 (80GB) GPU hardware; each prediction typically completes within 22 seconds and costs $0.0014/sec ($5.04/hr).

Convert Text to Audio using Suno AI Bark

Suno AI Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model runs on NVIDIA T4 GPU hardware; each prediction typically completes within 42 seconds, and the cost varies with the inputs.

Generate Music using Meta MusicGen

Meta MusicGen is a generative AI model from Meta, pre-trained to generate music from a prompt or melody. The model runs on NVIDIA A100 (40GB) GPU hardware; each prediction typically completes within 53 seconds and costs $0.00115/sec ($4.14/hr).

Music generation using Meta MusicGen model

The Streamlit App

Install the dependencies and run locally with:

pip install replicate streamlit
streamlit run streamlit_app.py

Alternatively, if you'd rather not run it locally, you can deploy this app to Railway — just push the code to a GitHub repo and connect it from the Railway dashboard.
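On a hosted deployment it may be more convenient to supply the token through an environment variable than to type it into the sidebar each time. The app below writes whatever is in the sidebar field into `REPLICATE_API_TOKEN`, so a blank field would overwrite a token already set in the environment. One possible workaround, sketched here with a hypothetical helper `resolve_token`, is to prefer the typed value and fall back to the environment.

```python
import os


def resolve_token(sidebar_value, env=None):
    """Prefer the token typed into the sidebar; fall back to the environment."""
    env = os.environ if env is None else env
    typed = (sidebar_value or "").strip()
    return typed or env.get("REPLICATE_API_TOKEN", "").strip()
```

In the app, the result of `resolve_token(replicate_api_token)` could be assigned to `os.environ["REPLICATE_API_TOKEN"]` only when it is non-empty, and the same value could be used in the missing-fields check.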

Here's the full source:

import os
import replicate
import streamlit as st

# Streamlit app config
st.set_page_config(page_title="Replicate Studio", page_icon="🚀", layout="centered")

with st.sidebar:
  st.title("🚀 Replicate Studio")
  with st.expander("⚙️ Settings", expanded=True):
    replicate_api_token = st.text_input("Replicate API token", type="password", help="Get your token [here](https://replicate.com)")
    option = st.radio("Serverless model", [
      "📝 Meta Llama 3 70B Instruct",
      "📝 Meta Llama 3.1 405B Instruct",
      "📝 Google Gemma 7B Instruct",
      "📝 Mistral 7B",
      "📷 Stable Diffusion 3",
      "📷 Black Forest Labs Flux Schnell",
      "💻 Meta Code Llama 70B Instruct",
      "🎙️ Suno AI Bark",
      "🎵 Meta MusicGen"
      ]
    )

os.environ["REPLICATE_API_TOKEN"] = replicate_api_token

col1, col2 = st.columns([4, 1])
prompt = col1.text_input("Prompt", label_visibility="collapsed")
submit = col2.button("Submit")

# If the Submit button is clicked
if submit:
  if not replicate_api_token.strip() or not prompt.strip():
    st.error("Please provide the missing fields.")
  else:
    try:
      with st.spinner("Please wait..."):
        if option == "📝 Meta Llama 3 70B Instruct":
          # Run meta/meta-llama-3-70b-instruct model on Replicate
          output = replicate.run(
              "meta/meta-llama-3-70b-instruct",
              input={
                  "debug": False,
                  "top_k": 0,
                  "top_p": 0.9,
                  "prompt": prompt,
                  "temperature": 0.6,
                  "system_prompt": "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.",
                  "max_new_tokens": 500,
                  "min_new_tokens": -1
              },
          )
          st.success(''.join(output))
        elif option == "📝 Meta Llama 3.1 405B Instruct":
          # Run meta/meta-llama-3.1-405b-instruct model on Replicate
          output = replicate.run(
              "meta/meta-llama-3.1-405b-instruct",
              input={
                  "debug": False,
                  "top_k": 50,
                  "top_p": 0.9,
                  "prompt": prompt,
                  "temperature": 0.6,
                  "system_prompt": "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.",
                  "max_tokens": 1024,
                  "min_tokens": 0
              },
          )
          st.success(''.join(output))
        elif option == "📝 Google Gemma 7B Instruct":
          # Run google-deepmind/gemma-7b-it model on Replicate
          output = replicate.run(
              "google-deepmind/gemma-7b-it:2790a695e5dcae15506138cc4718d1106d0d475e6dca4b1d43f42414647993d5",
              input={
                  "top_k": 50,
                  "top_p": 0.95,
                  "prompt": prompt,
                  "temperature": 0.7,
                  "max_new_tokens": 512,
                  "min_new_tokens": -1,
                  "repetition_penalty": 1
              },
          )
          st.success(''.join(output))
        elif option == "📝 Mistral 7B":
          # Run mistralai/mistral-7b-v0.1 model on Replicate
          output = replicate.run(
              "mistralai/mistral-7b-v0.1",
              input={
                  "top_k": 50,
                  "top_p": 0.9,
                  "prompt": prompt,
                  "temperature": 0.6,
                  "max_new_tokens": 1024,
                  "prompt_template": "<s>[INST] {prompt} [/INST] ",
                  "presence_penalty": 0,
                  "frequency_penalty": 0
              },
          )
          st.success(''.join(output))
        elif option == "📷 Stable Diffusion 3":
          # Run stability-ai/stable-diffusion-3 image model on Replicate
          output = replicate.run(
            "stability-ai/stable-diffusion-3", 
            input={
              "prompt": prompt,
              "aspect_ratio": "3:2"
            }
          )
          st.image(output)
        elif option == "📷 Black Forest Labs Flux Schnell":
          # Run black-forest-labs/flux-schnell image model on Replicate
          output = replicate.run(
            "black-forest-labs/flux-schnell", 
            input={
              "prompt": prompt,
              "aspect_ratio": "3:2"
            }
          )
          st.image(output)
        elif option == "💻 Meta Code Llama 70B Instruct":
          # Run meta/codellama-70b-instruct model on Replicate
          output = replicate.run(
              "meta/codellama-70b-instruct:a279116fe47a0f65701a8817188601e2fe8f4b9e04a518789655ea7b995851bf",
              input={
                  "top_k": 10,
                  "top_p": 0.95,
                  "prompt": prompt,
                  "max_tokens": 500,
                  "temperature": 0.8,
                  "system_prompt": "",
                  "repeat_penalty": 1.1,
                  "presence_penalty": 0,
                  "frequency_penalty": 0
              }
          )
          st.success(''.join(output))
        elif option == "🎙️ Suno AI Bark":
          # Run suno-ai/bark model on Replicate
          output = replicate.run(
              "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
              input={
                  "prompt": prompt,
                  "text_temp": 0.7,
                  "output_full": False,
                  "waveform_temp": 0.7,
                  "history_prompt": "announcer"
              }
          )
          st.audio(output.get('audio_out'), format="audio/wav")
        elif option == "🎵 Meta MusicGen":
          # Run meta/musicgen model on Replicate
          output = replicate.run(
              "meta/musicgen:671ac645ce5e552cc63a54a2bbff63fcf798043055d2dac5fc9e36a837eedcfb",
              input={
                  "top_k": 250,
                  "top_p": 0,
                  "prompt": prompt,
                  "duration": 33,
                  "temperature": 1,
                  "continuation": False,
                  "model_version": "stereo-large",
                  "output_format": "mp3",
                  "continuation_start": 0,
                  "multi_band_diffusion": False,
                  "normalization_strategy": "peak",
                  "classifier_free_guidance": 3
              }
          )
          st.audio(output, format="audio/mp3")
    except Exception as e:
      st.exception(e)
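The if/elif chain above repeats the same call pattern for every model. As one possible alternative (not how the app above is written), a lookup table can map each option label to a model reference and an output kind. The sketch below is deliberately simplified: it covers only the models that are called without a version hash, passes just the prompt, and relies on each model's default parameters. The names `MODELS` and `run_model` are hypothetical.

```python
# Option label -> (model reference, output kind), copied from the app above.
MODELS = {
    "📝 Meta Llama 3 70B Instruct": ("meta/meta-llama-3-70b-instruct", "text"),
    "📝 Meta Llama 3.1 405B Instruct": ("meta/meta-llama-3.1-405b-instruct", "text"),
    "📝 Mistral 7B": ("mistralai/mistral-7b-v0.1", "text"),
    "📷 Stable Diffusion 3": ("stability-ai/stable-diffusion-3", "image"),
    "📷 Black Forest Labs Flux Schnell": ("black-forest-labs/flux-schnell", "image"),
}


def run_model(option, prompt, run):
    """Look up the selected model, run it, and return (kind, output)."""
    model_ref, kind = MODELS[option]
    output = run(model_ref, input={"prompt": prompt})
    if kind == "text":
        # Text models return chunks, so join them into one string.
        output = "".join(output)
    return kind, output
```

In the app, `run` would be `replicate.run`, and the caller would pick `st.success` or `st.image` based on the returned kind. Per-model parameters such as `temperature` could be added as a third element of each table entry.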
