By alphasec in AI/ML — 20 Apr 2023

Summarize Text with LangChain and OpenAI

A brief guide to summarizing text inputs with LangChain and OpenAI.

LangChain is an open-source framework created to aid the development of applications leveraging the power of large language models (LLMs). It can be used for chatbots, text summarisation, data generation, code understanding, question answering, evaluation, and more. Streamlit, on the other hand, is an open-source Python library that allows you to create and share interactive web apps and data visualisations with ease. Together, LangChain and Streamlit are a simple yet powerful combination for getting started with LLM web applications. In this post, we'll create a simple Streamlit application that summarizes text input from the user with LangChain and OpenAI.

Build a Streamlit App with LangChain for Summarization

To summarize text, we'll use the LangChain Chains module, which lets us combine multiple components (e.g. prompts, LLMs, and even other chains) into a single application. Summarizing a large body of text or multiple documents generally runs into context window limits, i.e. you can only send a limited amount of text (measured in tokens) per request. While this is acceptable for question-answering or chatbot use cases, where only the relevant passages need to be sent, a summary naturally requires access to the entire input.
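To get a feel for the limit, here is a minimal sketch that estimates whether a piece of text fits in a context window. It assumes the common rule of thumb of roughly 4 characters per token for English text, and the 4,096-token window and the `reserved_for_output` budget are illustrative values, not figures from this app. A real tokenizer will give different counts.

```python
# Rough heuristic only (an assumption): about 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, context_window: int, reserved_for_output: int = 256) -> bool:
    """Check whether the text, plus room for the model's reply, fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window


short_text = "LangChain makes it easy to chain prompts and models."
long_text = short_text * 2000  # simulate a long document

print(fits_in_context(short_text, context_window=4096))  # True
print(fits_in_context(long_text, context_window=4096))   # False
```

The second call fails the check, which is exactly the situation where the input has to be split up before it can be summarized.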

To deal with this, we'll use "chunking": splitting the input into smaller pieces that each fit within the context window. Since the input text in this tutorial is relatively small, we do not need additional vector stores or databases to store and retrieve the input. In a subsequent post, I'll discuss how vector stores like Chroma or Pinecone can be used to deal with large documents.
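As a rough illustration of chunking, the sketch below splits text on blank lines and packs the paragraphs into chunks of a bounded size. The `chunk_text` helper and its parameters are hypothetical and written for this post; it is a simplified stand-in, not LangChain's actual splitter implementation.

```python
def chunk_text(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list:
    """Split text on `separator`, then pack the pieces into chunks of at most
    `chunk_size` characters. A single piece longer than `chunk_size` is kept
    whole rather than being cut mid-paragraph."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or it is the first piece of a new chunk
            current = candidate
        else:
            # Adding the piece would overflow: close the chunk and start a new one
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Three 40-character paragraphs with a 100-character limit
paragraphs = ["a" * 40, "b" * 40, "c" * 40]
chunks = chunk_text("\n\n".join(paragraphs), chunk_size=100)
print([len(c) for c in chunks])  # [82, 40]
```

The first two paragraphs share a chunk (40 + 2 + 40 = 82 characters), while the third would push it past 100 and so starts a chunk of its own.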

Alright, so first, we split the text input into smaller chunks ("documents"), and then call the load_summarize_chain function to perform text summarization over the input. This function supports three types of chains - map_reduce, stuff, and refine - with map_reduce being an easy one to get started with when the input spans several chunks: it summarizes each chunk separately and then combines the partial summaries. Depending on your needs, you can also use prompt templates to augment the response.

Here's an excerpt from the streamlit_app.py file - you can find the complete source code on GitHub. Shoutout to the official LangChain docs - much of the code is borrowed from or influenced by them.
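To make the difference between the three chain types concrete, here is a toy sketch of the order in which each one calls the model. The `trace_llm` function is a hypothetical stand-in for an LLM call that just wraps its input in `S(...)`, and the three helpers are simplified illustrations written for this post. LangChain's real chains add prompt templates and other handling on top of this basic shape.

```python
from typing import Callable, List


def trace_llm(text: str) -> str:
    """Hypothetical stand-in for an LLM summarize call; marks its input with S(...)."""
    return f"S({text})"


def stuff(docs: List[str], summarize: Callable[[str], str]) -> str:
    """Put every document into one prompt and summarize once."""
    return summarize(" + ".join(docs))


def map_reduce(docs: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each document on its own, then summarize the partial summaries."""
    partial_summaries = [summarize(doc) for doc in docs]
    return summarize(" + ".join(partial_summaries))


def refine(docs: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize the first document, then update the summary with each next one."""
    summary = summarize(docs[0])
    for doc in docs[1:]:
        summary = summarize(summary + " + " + doc)
    return summary


docs = ["A", "B", "C"]
print(stuff(docs, trace_llm))       # S(A + B + C)
print(map_reduce(docs, trace_llm))  # S(S(A) + S(B) + S(C))
print(refine(docs, trace_llm))      # S(S(S(A) + B) + C)
```

The traces show the trade-off: stuff makes a single call but needs everything to fit in one request, map_reduce makes one call per chunk plus a combining call, and refine processes the chunks strictly in sequence.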

import streamlit as st
from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document
from langchain.llms.openai import OpenAI
from langchain.chains.summarize import load_summarize_chain

# Streamlit app
st.subheader('LangChain Text Summary')

# Get OpenAI API key and source text input
openai_api_key = st.text_input("OpenAI API Key", type="password")
source_text = st.text_area("Source Text", height=200)

# If the 'Summarize' button is clicked
if st.button("Summarize"):
    # Validate inputs
    if not openai_api_key.strip() or not source_text.strip():
        st.error("Please provide the missing fields.")
    else:
        try:
            with st.spinner('Please wait...'):
                # Split the source text into chunks
                text_splitter = CharacterTextSplitter()
                texts = text_splitter.split_text(source_text)

                # Wrap the chunks in Document objects (only the first 3 chunks are used)
                docs = [Document(page_content=t) for t in texts[:3]]

                # Initialize the OpenAI LLM, then load and run the summarize chain
                llm = OpenAI(temperature=0, openai_api_key=openai_api_key)
                chain = load_summarize_chain(llm, chain_type="map_reduce")
                summary = chain.run(docs)

                st.success(summary)
        except Exception as e:
            # st.exception expects the exception object, not a formatted string
            st.exception(e)

Deploy the Streamlit App on Railway

Railway is a modern app hosting platform that makes it easy to deploy production-ready apps quickly. Sign up for an account using GitHub, and click Authorize Railway App when redirected. Review and agree to Railway's Terms of Service and Fair Use Policy if prompted. Launch the LangChain Apps one-click starter template to deploy the app instantly on Railway.

This template deploys several services - search, text summary (this one), document summary, news summary, and more. For each, you'll be given an opportunity to change the default repository name and set it private, if you'd like. Since you are deploying from a monorepo, configuring the first app should suffice. Accept the defaults and click Deploy; the deployment will kick off immediately.

LangChain Apps one-click template on Railway

Once the deployment completes, the Streamlit apps will be available at their default xxx.up.railway.app domains - launch each URL to access the respective app. If you are interested in setting up a custom domain, I covered it at length in the final section of a previous post.

LangChain text summarizer deployed on Railway

Provide the OpenAI API key and the source text to be summarized, and click Summarize. Assuming your key is valid, the summary will be displayed in just a few seconds. If you don't have an OpenAI API key, you can create one from your OpenAI account.

Run the Python Notebook with Google Colab

Google Colaboratory (Colab for short) is a cloud-based platform for data analysis and machine learning. It provides a free Jupyter notebook environment that allows users to write and execute Python code in their web browser, with no setup or configuration required. Fork the GitHub repository, or launch the notebook directly in Google Colab. Click the play button next to each cell to execute the code within it. Once all the cells execute successfully, the Streamlit app will be available on a ***.loca.lt URL - click it to launch the app and play with it.
