Blinkist for URLs with LangChain and OpenAI

A brief guide to AI-generated web URL summaries with LangChain and OpenAI.

In my previous post, I used LangChain and Serper API to retrieve and summarize Google news search results. Today, I'll reuse some of the components, namely the LangChain UnstructuredURLLoader module and OpenAI, to demonstrate how you can summarize the contents of any web URL. In other words, a Blinkist for URLs!

Build a Streamlit App with LangChain and OpenAI

LangChain is an open-source framework created to aid the development of applications leveraging the power of large language models (LLMs). Add Streamlit, an open-source Python library that allows you to create and share interactive web apps and data visualisations, and you can rapidly prototype LLM web apps.

I've covered document loaders at length in Part 4 of the LangChain Decoded blog series. This app loads URL content using the UnstructuredURLLoader (or the YoutubeLoader for YouTube links), then runs load_summarize_chain with the gpt-3.5-turbo chat model to generate a summary. Here's an excerpt from the streamlit_app.py file; you can find the complete source code on GitHub. Note that this is just proof-of-concept code and has not been developed or optimized for production usage.

import validators, streamlit as st
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import YoutubeLoader, UnstructuredURLLoader

# Streamlit app
st.subheader('Summarize URL')

# Get OpenAI API key and URL to be summarized
with st.sidebar:
    openai_api_key = st.text_input("OpenAI API key", value="", type="password")
url = st.text_input("URL", label_visibility="collapsed")

# If 'Summarize' button is clicked
if st.button("Summarize"):
    # Validate inputs
    if not openai_api_key.strip() or not url.strip():
        st.error("Please provide the missing fields.")
    elif not validators.url(url):
        st.error("Please enter a valid URL.")
    else:
        try:
            with st.spinner("Please wait..."):
                # Load URL data
                if "youtube.com" in url:
                    loader = YoutubeLoader.from_youtube_url(url, add_video_info=True)
                else:
                    loader = UnstructuredURLLoader(urls=[url], ssl_verify=False, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"})
                data = loader.load()
                
                # Initialize the ChatOpenAI module, load and run the summarize chain
                llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo", openai_api_key=openai_api_key)
                prompt_template = """Write a summary of the following in 250-300 words:
                    
                    {text}

                """
                prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
                chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
                summary = chain.run(data)

                st.success(summary)
        except Exception as e:
            # st.exception expects the exception object itself, not a formatted string
            st.exception(e)
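
For readers wondering what chain_type="stuff" does: it places the text of every loaded document into a single prompt and makes one model call. The sketch below imitates that step in plain Python, so it runs without LangChain or an API key. The Doc class and build_stuff_prompt function are hypothetical stand-ins, and the blank-line separator between documents is an assumption about the chain's default rather than something taken from the app.

```python
from dataclasses import dataclass

# Same wording as the prompt in the excerpt, without the extra indentation.
PROMPT_TEMPLATE = "Write a summary of the following in 250-300 words:\n\n{text}\n"


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document; only page_content is modeled."""
    page_content: str


def build_stuff_prompt(docs, template=PROMPT_TEMPLATE, separator="\n\n"):
    """Join all document texts and substitute them for {text} in the template."""
    combined = separator.join(doc.page_content for doc in docs)
    return template.format(text=combined)


# Two fake documents, as a loader might return for a page split into sections.
docs = [Doc("First section of the page."), Doc("Second section of the page.")]
prompt = build_stuff_prompt(docs)
print(prompt)
```

Because everything goes into one prompt, a very long page can exceed the model's context window. load_summarize_chain also accepts the map_reduce and refine chain types, which may be a better fit in that case.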

Deploy the Streamlit App on Railway

Railway is a modern app hosting platform that makes it easy to deploy production-ready apps quickly. Sign up for an account using GitHub, and click Authorize Railway App when redirected. Review and agree to Railway's Terms of Service and Fair Use Policy if prompted. Launch the LangChain Apps one-click starter template (or click the button below) to deploy the app instantly on Railway.

Deploy on Railway

This template deploys several services - search, text summary, document summary, URL summary (this one), and more. For each, you'll be given an opportunity to change the default repository name and set it private, if you'd like. Since you are deploying from a monorepo, configuring the first app should suffice. Accept the defaults and click Deploy; the deployment will kick off immediately.

LangChain Apps one-click template on Railway

Once the deployment completes, the Streamlit apps will be available at default xxx.up.railway.app domains - launch each URL to access the respective app. If you are interested in setting up a custom domain, I covered it at length in a previous post - see the final section here.

Summarize URLs with LangChain and OpenAI

Provide the OpenAI API key and the URL to be summarized, then click Summarize. If you don't have an OpenAI API key, you can create one from your OpenAI account.
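
The checks that run when you click Summarize can be sketched as a small pure function. This is only an illustration: classify_request is a hypothetical helper, it uses the standard library's urllib.parse in place of the validators package, and it matches YouTube links by hostname rather than by the substring test in the excerpt. The list of hostnames is likewise an assumption.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube in this sketch (an assumption, not taken from the app).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_request(api_key: str, url: str) -> str:
    """Return 'missing', 'invalid', 'youtube', or 'web' for the given inputs."""
    if not api_key.strip() or not url.strip():
        return "missing"
    try:
        parsed = urlparse(url.strip())
        host = parsed.hostname
    except ValueError:
        # urlparse can raise on malformed input such as a broken IPv6 literal.
        return "invalid"
    # Require an http(s) scheme and a hostname; a simpler check than validators.url().
    if parsed.scheme not in ("http", "https") or not host:
        return "invalid"
    if host.lower() in YOUTUBE_HOSTS:
        return "youtube"
    return "web"


print(classify_request("sk-test", "https://www.youtube.com/watch?v=abc123"))
print(classify_request("sk-test", "https://example.com/article"))
print(classify_request("", "https://example.com/article"))
print(classify_request("sk-test", "not a url"))
```

Each return value maps onto a branch in the app: the first two correspond to the st.error messages, and the last two decide between YoutubeLoader and UnstructuredURLLoader.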

Sample URL summary

That's it - a super easy implementation! Fork the GitHub repo or follow along to explore other use cases like text/document summarization, generative question-answering, news search, and more.
