Safe Browsing with Google Cloud Web Risk

A brief guide to Google Safe Browsing, a free URL risk assessment service, and its commercial counterpart, Google Cloud Web Risk.

In this post, we'll talk about one of the most widely used free security services, Google Safe Browsing, and its commercial counterpart, Google Cloud Web Risk, and the role they play in keeping over five billion devices safe every day. We'll also create a simple Streamlit application to showcase how developers can integrate with the Web Risk API programmatically in their own applications.

What is Google Safe Browsing?

Google Safe Browsing is a free service from Google that warns users when they try to visit a dangerous website or download a malicious file. For website owners who use Google's webmaster tools, Safe Browsing also sends a notification when their sites are compromised, and helps them diagnose and clean up the problem. Safe Browsing was launched in 2005 and has since become an integral part of the web browsing experience for users worldwide. It works by constantly analyzing billions of URLs and web pages to identify suspicious activity or content. When a user attempts to access a potentially harmful website, Safe Browsing immediately issues a warning, giving the user the option to proceed at their own risk or return to safety.

Safe Browsing warning for a site that contains malware

Safe Browsing has been adopted by all major web browsers today (except Edge), as well as several security products and services, and has been instrumental in the fight against online threats. Safe Browsing offers two APIs - Lookup and Update. While the Lookup API is extremely fast and easy to use, it has one privacy drawback - the URLs aren't hashed, so the server knows which URLs have been looked up. To address this, Google offers the Update API, which instead compares 32-bit hash prefixes of the URL against a locally stored list, so the full URL doesn't have to be shared. Users can manually report phishing pages using this form too.
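To make the hash-prefix idea concrete, here's a minimal sketch of a client-side check in the spirit of the Update API. It assumes the URL has already been canonicalized into a lookup expression (the real protocol defines specific canonicalization and expression rules that are skipped here), and the names `hash_prefix`, `might_be_unsafe`, and `local_prefixes` are made up for illustration.

```python
import hashlib

PREFIX_LEN = 4  # bytes, i.e. a 32-bit prefix


def hash_prefix(expression: str, length: int = PREFIX_LEN) -> bytes:
    """Return the leading bytes of the SHA-256 digest of a URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:length]


def might_be_unsafe(expression: str, local_prefixes: set) -> bool:
    """True if the prefix is in the local list; a hit still needs confirmation."""
    return hash_prefix(expression) in local_prefixes


# Hypothetical local database holding one known-bad prefix
local_prefixes = {hash_prefix("malware.example/bad.html")}

print(might_be_unsafe("malware.example/bad.html", local_prefixes))  # True
```

A prefix hit only means the URL might be unsafe; a client would then typically confirm against full hashes before warning the user, since many different URLs can share the same 32-bit prefix.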

Supported threat types (source: safebrowsing.google.com)

Malware, social engineering, and unwanted software are the major threat types supported today. Users who need or want more advanced capabilities while browsing the web can enable Enhanced Safe Browsing; when enabled, this option shares additional information with Google servers in return for deeper file scans and protection against previously unknown attacks. The Safe Browsing API is meant for non-commercial use only; for revenue-generating use cases, users are encouraged to use the Web Risk API.

What is Google Cloud Web Risk?

Google Cloud Web Risk is effectively the "Enterprise" edition of Safe Browsing. It supports higher API volumes than Safe Browsing, adds enterprise features such as risk scoring, confidence levels, and file/attachment reputation coverage, and integrates with Google Cloud security and analytics tools. More importantly, it offers an SLA, which is critical for enterprise adoption.

Web Risk offers four methods to check a URL reputation and risk profile. Note that the information returned by Web Risk cannot be redistributed further. In the next section, we'll explore the Lookup API more extensively.

  • Lookup API: Lets client applications send URLs to the server as HTTP requests, and receive a verdict and type in response. Free for up to 100,000 requests per month.
  • Update API: Lets client applications download and periodically update hashed versions of Safe Browsing lists to a local database for client-side verdict checks. Free for local database checks, but live URL checks are chargeable.
  • Submission API: Lets client applications submit suspicious URLs to Safe Browsing for analysis, and subsequent protection. Free for up to 100 submissions per month.
  • Evaluate API: In preview, returns a confidence score (instead of a binary result) that indicates the maliciousness of a URL based on blocklists, machine learning models, and heuristic rules.
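As a rough illustration of the "URLs as HTTP requests" model behind the Lookup API, the sketch below only builds the request URL and doesn't send anything. The endpoint and the `threatTypes`, `uri`, and `key` query parameters reflect my understanding of the public REST reference, so treat them as assumptions and verify them against the current documentation. `build_lookup_url` is a hypothetical helper, not part of any Google library.

```python
from urllib.parse import urlencode

# Assumed REST endpoint for the Web Risk Lookup API; check the official docs.
WEB_RISK_LOOKUP_ENDPOINT = "https://webrisk.googleapis.com/v1/uris:search"


def build_lookup_url(uri: str, threat_types: list, api_key: str) -> str:
    """Build a Lookup API request URL with one threatTypes parameter per type."""
    params = [("threatTypes", threat_type) for threat_type in threat_types]
    params.append(("uri", uri))
    params.append(("key", api_key))
    # urlencode percent-encodes the URI so it is safe inside a query string
    return f"{WEB_RISK_LOOKUP_ENDPOINT}?{urlencode(params)}"


request_url = build_lookup_url(
    "http://example.com/",
    ["MALWARE", "SOCIAL_ENGINEERING"],
    "YOUR_API_KEY",  # placeholder, not a real key
)
print(request_url)
```

The Streamlit app in the next section uses the Python client library with a service account instead of a raw HTTP call, which saves you from handling the encoding and authentication yourself.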

Build a Streamlit App for Web Risk Lookup API

For this tutorial, you need a Google Cloud account, and a project with the Web Risk API enabled. If you don't already have an account, sign up here - new customers get a generous $300 credit for 90 days, with several always-free products once the trial period expires. From the Google Cloud Console, enable billing on your account, create a new project, and do the following:

  • Navigate to APIs & Services > Enabled APIs & services, and select Web Risk API from the list. From the Credentials tab, click Create Credentials > Service account, and create a service account.
  • Add a new key for this service account, and download it as a JSON credential file.

The app uses Streamlit to render the interface. It requires the user to upload the service account credential file, after which the user can look up either a URL or a URL hash prefix using the WebRiskServiceClient class. Here's an excerpt from the streamlit_app.py file - you can find the complete source code on GitHub. Note that this is just proof-of-concept code; it has not been developed or optimized for production usage.

import os

import streamlit as st
from google.cloud import webrisk_v1

# Streamlit app
st.subheader('Google Cloud Web Risk API')

# Create a file upload widget for the credentials JSON file
creds_file = st.file_uploader("Upload Google Cloud credentials file", type="json")

# If the user has uploaded a file, read its contents and set the GOOGLE_APPLICATION_CREDENTIALS environment variable
if creds_file is not None:
    creds_contents = creds_file.read().decode("utf-8")
    with open("temp_credentials.json", "w") as f:
        f.write(creds_contents)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "temp_credentials.json"

    # Get the URL or hash prefix inputs for lookup
    col1, col2 = st.columns(2)
    url_input = col1.text_input('Enter URL', value='')
    url_lookup = col1.button(key='LookupURL', label='Lookup')
    hash_input = col2.text_input('Enter hash prefix', value='')
    hash_lookup = col2.button(key='LookupHash', label='Lookup')
    
    # Initialize the WebRiskServiceClient and define the threat types
    client = webrisk_v1.WebRiskServiceClient()
    threat_types = [
        webrisk_v1.ThreatType.MALWARE,
        webrisk_v1.ThreatType.SOCIAL_ENGINEERING,
        webrisk_v1.ThreatType.SOCIAL_ENGINEERING_EXTENDED_COVERAGE,
        webrisk_v1.ThreatType.UNWANTED_SOFTWARE,
    ]

    if url_lookup:
        if not url_input.strip():
            col1.write('Provide the URL for lookup.')
        else:
            try:
                # Call the Lookup API for submitted URL
                response = client.search_uris(uri=url_input, threat_types=threat_types)
        
                if response.threat:
                    st.error(f'The URL `{url_input}` is associated with:\n - {response.threat.threat_types[0].name}')
                else:
                    st.success(f'The URL `{url_input}` appears safe.')
            except Exception as e:
                st.error(f"An error occurred: {e}")

    if hash_lookup:
        if not hash_input.strip():
            col2.write('Provide the hash prefix for lookup.')
        else:
            try:
                # Call the Lookup API for submitted hash prefix
                response = client.search_hashes(hash_prefix=hash_input, threat_types=threat_types)

                if len(response.threats) > 0:
                    st.error(f'The hash prefix {hash_input} matched the following hashes:\n')
                    for threat_hash in response.threats:
                        st.write(threat_hash.hash)
                else:
                    st.success(f'The hash prefix {hash_input} appears safe.')
            except Exception as e:
                st.error(f"An error occurred: {e}")

Disclaimer: This is not an officially supported Google or Google Cloud project; it is a personal project created for educational and experimental purposes.

Deploy the Streamlit App on Railway

Railway is a modern app hosting platform that makes it easy to deploy production-ready apps quickly. Sign up for an account using GitHub, and click Authorize Railway App when redirected. Review and agree to Railway's Terms of Service and Fair Use Policy if prompted. Launch the Google Cloud Apps one-click starter template to deploy the app instantly on Railway.

This template deploys several services - PaLM API, Web Risk API (this one), and more. For each, you'll be given an opportunity to change the default repository name and set it private, if you'd like. Since you are deploying from a monorepo, configuring the first app should suffice. Accept the defaults and click Deploy; the deployment will kick off immediately.

Google Cloud Apps one-click template on Railway

Once the deployment completes, the Streamlit apps will be available at default xxx.up.railway.app domains - launch each URL to access the respective app. If you are interested in setting up a custom domain, I covered it at length in a previous post - see the final section here.

Lookup URLs using Google Cloud Web Risk API

Upload the JSON credential file, and specify the URL to be verified. If you want test URLs that trigger positive Lookup API results, Google provides them at testsafebrowsing.appspot.com.

Source: testsafebrowsing.appspot.com

Here are sample results for URLs that test positive for MALWARE and SOCIAL_ENGINEERING threat types respectively. Note that the app only displays the first threat type; feel free to modify the code to show all matches.
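If you'd like the app to list every matched threat type rather than just the first, one possible approach is sketched below. `threat_names` is a hypothetical helper, and `FakeThreatType` and the `SimpleNamespace` objects are stand-ins so the snippet runs without Google Cloud credentials; it assumes only that the response exposes `threat.threat_types` as the excerpt above does.

```python
from enum import Enum
from types import SimpleNamespace


def threat_names(response) -> list:
    """Return the name of every threat type on a search_uris-style response."""
    threat = getattr(response, "threat", None)
    if not threat:
        return []
    return [threat_type.name for threat_type in threat.threat_types]


# Stand-ins for illustration only; the real app would pass the API response.
class FakeThreatType(Enum):
    MALWARE = 1
    SOCIAL_ENGINEERING = 2


flagged = SimpleNamespace(
    threat=SimpleNamespace(
        threat_types=[FakeThreatType.MALWARE, FakeThreatType.SOCIAL_ENGINEERING]
    )
)
clean = SimpleNamespace(threat=None)

print(threat_names(flagged))  # ['MALWARE', 'SOCIAL_ENGINEERING']
print(threat_names(clean))    # []
```

In the app, the returned names could be joined into a bulleted string and passed to `st.error` in place of the single `threat_types[0].name` entry.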

Sample lookup results for a malicious URL
Sample lookup results for a phishing URL

And here's an example of a hash prefix that tests negative with the Lookup API. The hash prefix consists of the most significant 4-32 bytes of a SHA-256 hash. For JSON requests, this field is base64-encoded. As discussed earlier, hashes are useful when you don't want to submit the actual URLs for privacy reasons. Of course, since you are submitting a partial hash (prefix), there is a likelihood of a false positive. But, depending on the criticality of your use case, that's a tradeoff you can choose to make.
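If you want to generate a prefix to experiment with, here's a small sketch that hashes a URL expression with SHA-256, keeps the leading bytes, and base64-encodes the result. It assumes the expression is already in canonical form, which the real protocol requires and this snippet doesn't attempt. `encoded_hash_prefix` is a made-up helper name.

```python
import base64
import hashlib


def encoded_hash_prefix(expression: str, length: int = 4) -> str:
    """Return the base64-encoded leading bytes of a SHA-256 digest."""
    if not 4 <= length <= 32:
        raise ValueError("prefix length must be between 4 and 32 bytes")
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return base64.b64encode(digest[:length]).decode("ascii")


# Hypothetical expression; a shorter prefix reveals less but collides more often
print(encoded_hash_prefix("example.com/", length=4))
print(encoded_hash_prefix("example.com/", length=32))
```

Shorter prefixes reveal less about the URL but match more unrelated hashes, which is the false-positive tradeoff mentioned above.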

Sample lookup results for a URL hash prefix

Obviously, this web application is meant for demonstration purposes only. In reality, you would integrate the API calls programmatically in your application, typically after you ingest the URL but before you act upon it further.
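Here's one way that "check before you act" step might look, as a minimal sketch. `screen_url` and the injected `lookup` callable are hypothetical; in a real integration, `lookup` would wrap a Web Risk call such as `client.search_uris` and return the matched threat-type names. The sketch fails closed, treating a lookup error as a reason to block, which is a design choice you may or may not want.

```python
from typing import Callable, Iterable, List, Tuple


def screen_url(
    url: str, lookup: Callable[[str], Iterable[str]]
) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons) for a URL before the app acts on it."""
    try:
        threats = list(lookup(url))
    except Exception:
        # Fail closed: if the verdict is unknown, do not process the URL
        return False, ["LOOKUP_FAILED"]
    return (not threats), threats


# Fake lookups for illustration only; replace with a real Web Risk call
def fake_lookup(url: str) -> list:
    return ["MALWARE"] if "bad" in url else []


def broken_lookup(url: str) -> list:
    raise RuntimeError("service unavailable")


print(screen_url("http://good.example/", fake_lookup))  # (True, [])
print(screen_url("http://bad.example/", fake_lookup))   # (False, ['MALWARE'])
```

Passing the lookup in as a function keeps the gate easy to test and lets you swap between the Lookup API and a local Update API database without touching the calling code.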
