Blinkist for URLs with LangChain and OpenAI
In my previous post, I used LangChain and the Serper API to retrieve and summarize Google News search results. Today, I'll reuse some of those components, namely the LangChain UnstructuredURLLoader module and OpenAI, to demonstrate how you can summarize the contents of any web URL. In other words, a Blinkist for URLs!
Build a Streamlit App with LangChain and OpenAI
LangChain is an open-source framework created to aid the development of applications leveraging the power of large language models (LLMs). Add Streamlit, an open-source Python library that allows you to create and share interactive web apps and data visualisations, and you can rapidly prototype LLM web apps.
I've covered document loaders at length in Part 4 of the LangChain Decoded blog series. This app loads URL content using the UnstructuredURLLoader (or the YoutubeLoader for YouTube links), then runs a summarize chain built with load_summarize_chain and the gpt-3.5-turbo chat model to generate a summary. Here's an excerpt from the streamlit_app.py file; you can find the complete source code on GitHub. Note that this is proof-of-concept code; it has not been developed or optimized for production use.
```python
import validators
import streamlit as st
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import YoutubeLoader, UnstructuredURLLoader

# Streamlit app
st.subheader("Summarize URL")

# Get OpenAI API key (sidebar) and URL to be summarized (main area)
with st.sidebar:
    openai_api_key = st.text_input("OpenAI API key", value="", type="password")
url = st.text_input("URL", label_visibility="collapsed")

# If 'Summarize' button is clicked
if st.button("Summarize"):
    # Validate inputs
    if not openai_api_key.strip() or not url.strip():
        st.error("Please provide the missing fields.")
    elif not validators.url(url):
        st.error("Please enter a valid URL.")
    else:
        try:
            with st.spinner("Please wait..."):
                # Load URL data: YouTube links go to the YouTube loader, everything else to Unstructured
                if "youtube.com" in url:
                    loader = YoutubeLoader.from_youtube_url(url, add_video_info=True)
                else:
                    # Browser-like User-Agent because some sites reject unidentified clients;
                    # ssl_verify=False skips TLS certificate checks (proof of concept only)
                    loader = UnstructuredURLLoader(
                        urls=[url],
                        ssl_verify=False,
                        headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"},
                    )
                data = loader.load()

                # Initialize the ChatOpenAI module, load and run the summarize chain
                llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo", openai_api_key=openai_api_key)
                prompt_template = """Write a summary of the following in 250-300 words:
{text}
"""
                prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
                # "stuff" places all loaded documents into a single prompt
                chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
                summary = chain.invoke({"input_documents": data})["output_text"]
                st.success(summary)
        except Exception as e:
            # st.exception expects the exception object itself, not a string
            st.exception(e)
```
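With chain_type="stuff", the summarize chain takes the simplest route: it concatenates every loaded document into the single {text} slot of the prompt and makes one LLM call. The sketch below mimics that step in plain Python so you can see roughly what the model receives. The Doc class, the blank-line separator, and the four-characters-per-token estimate are illustrative assumptions, not LangChain's own code.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for a LangChain Document: page text plus metadata
    page_content: str
    metadata: dict = field(default_factory=dict)


# Same prompt text as the app
PROMPT_TEMPLATE = """Write a summary of the following in 250-300 words:
{text}
"""


def stuff_prompt(docs, template=PROMPT_TEMPLATE, separator="\n\n"):
    # "Stuff" step: join all documents, then fill the single {text} variable
    text = separator.join(doc.page_content for doc in docs)
    return template.format(text=text)


def rough_token_count(s):
    # Assumption: roughly 4 characters per token for English text
    return len(s) // 4 + 1


def fits_context(prompt, context_tokens=4096, reserved_for_output=500):
    # context_tokens is a placeholder; check the limit of the model you actually use.
    # reserved_for_output leaves room for the 250-300 word summary.
    return rough_token_count(prompt) + reserved_for_output <= context_tokens


docs = [Doc("First part of the page."), Doc("Second part of the page.")]
prompt = stuff_prompt(docs)
print(prompt)
print("Fits in one call:", fits_context(prompt))
```

Because everything goes into one call, a long page can overflow the model's context window. In that case, load_summarize_chain also accepts chain_type="map_reduce" or "refine", which summarize the content in pieces rather than all at once.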
Deploy the Streamlit App on Railway
Railway is a modern app hosting platform that makes it easy to deploy production-ready apps quickly. Sign up for an account using GitHub, and click Authorize Railway App when redirected. Review and agree to Railway's Terms of Service and Fair Use Policy if prompted. Launch the LangChain Apps one-click starter template to deploy the app instantly on Railway.
This template deploys several services: search, text summary, document summary, URL summary (this one), and more. For each, you'll be given the opportunity to change the default repository name and make it private, if you'd like. Since you are deploying from a monorepo, configuring the first app should suffice. Accept the defaults and click Deploy; the deployment will kick off immediately.
Once the deployment completes, the Streamlit apps will be available at default xxx.up.railway.app domains; open each URL to access the respective app. If you are interested in setting up a custom domain, I covered it at length in the final section of a previous post.
Provide the OpenAI API key and the URL to be summarized, then click Summarize. If you don't have an OpenAI API key, you can create one on the OpenAI platform.
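When you click Summarize, the app checks that both fields are filled, validates the URL, and then picks a loader based on the URL. The following stdlib-only sketch reproduces that decision logic so it can be tested without Streamlit or LangChain. It swaps validators.url for a simpler urllib.parse check, and, unlike the app's "youtube.com" substring test, it also routes youtu.be short links to the YouTube loader; both are assumptions, and validate_inputs and pick_loader are hypothetical helper names.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: hosts treated as YouTube (the app itself only checks for "youtube.com")
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}


def is_probably_valid_url(url: str) -> bool:
    # Simplified stand-in for validators.url(): needs an http(s) scheme and a host
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def pick_loader(url: str) -> str:
    # Hypothetical helper: names the loader the app would use for this URL
    host = urlparse(url.strip()).netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return "YoutubeLoader" if host in YOUTUBE_HOSTS else "UnstructuredURLLoader"


def validate_inputs(api_key: str, url: str) -> Optional[str]:
    # Hypothetical helper: mirrors the app's checks, returning an error message or None
    if not api_key.strip() or not url.strip():
        return "Please provide the missing fields."
    if not is_probably_valid_url(url):
        return "Please enter a valid URL."
    return None


for candidate in ["https://www.youtube.com/watch?v=abc123", "https://example.com/article"]:
    print(candidate, "->", validate_inputs("sk-test", candidate) or pick_loader(candidate))
```

Keeping the checks in small pure functions like these makes the branching easy to unit-test separately from the Streamlit UI.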
That's it: a super easy implementation! Fork the GitHub repo or follow along to explore other use cases like text/document summarization, generative question-answering, news search, and more.