By alphasec in AI/ML — 19 Aug 2024

Protect Jupyter Notebooks with NB Defense Extension

A step-by-step guide on protecting Jupyter notebooks with NB Defense, an open-source extension by Protect AI.

In a previous post, we talked about Jupyter notebooks as an indispensable tool for data science professionals, and also explored self-hosting options. However, these notebooks come with several security risks that most users are either unaware of or don't know how to deal with. This is not an exhaustive list by any means, but to name a few issues:

Arbitrary code execution: Jupyter notebooks can execute any code, including malicious scripts, and potentially compromise the systems where they are deployed.
Lack of access controls: By default, Jupyter notebooks lack strong authentication and authorisation mechanisms, often leading to unauthorised access of sensitive data if compromised.
Sensitive data exposure: Jupyter notebooks may inadvertently store and display sensitive or personally identifiable data, as well as secrets such as API keys, access tokens, and database credentials.
Insecure network communications: Without proper network configuration, Jupyter notebooks may communicate over unencrypted channels and risk man-in-the-middle attacks.
Kernel vulnerabilities and injection attacks: Vulnerabilities in the Jupyter server components, or in the notebook itself, may lead to system exploits and, subsequently, unauthorised access.

Given the wide range of potential security risks, it is imperative that we consider adequate security mechanisms for the deployment and use of Jupyter notebooks. Hardening a Jupyter server environment is out of scope for this blog post; instead, let's look at an open-source option for securing notebook usage.

What is NB Defense?

NB Defense, by Protect AI, is an open-source Jupyter notebook scanning tool. It is available as a CLI tool, as well as an SDK and JupyterLab Extension (JLE). It allows you to look for secrets, personally identifiable information (PII), dependency vulnerabilities, and non-permissive licenses in ML OSS frameworks, libraries, and packages.
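To make the idea of a secrets scan concrete, here is a minimal sketch of the kind of check involved. It is not NB Defense's implementation: the two regular expressions, the `find_secrets` helper, and the in-memory sample notebook are all assumptions made for illustration, and a real scanner uses far larger rule sets and also inspects cell outputs.

```python
import re

# Illustrative patterns only (assumed for this sketch, not NB Defense's rules).
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(notebook: dict) -> list[tuple[int, str]]:
    """Return (cell_index, pattern_name) for each suspicious match in cell sources."""
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                findings.append((index, name))
    return findings


# Hypothetical notebook content; the key is AWS's documented example value.
sample_notebook = {
    "cells": [
        {"cell_type": "code", "source": ["import pandas as pd\n"]},
        {"cell_type": "code", "source": ["aws_key = 'AKIAIOSFODNN7EXAMPLE'\n"]},
    ]
}

findings = find_secrets(sample_notebook)
for cell_index, name in findings:
    print(f"cell {cell_index}: possible {name}")
```

To try this against a real notebook, you could load the .ipynb file with `json.load` and pass the resulting dictionary to `find_secrets`.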

With the JupyterLab Extension, you get contextual help to identify issues within your Jupyter notebook, shifting security left and making notebook users strong participants in the overall security program. The CLI tool, on the other hand, is intended for automated checks, such as a pre-commit hook or a step in a Continuous Integration (CI) pipeline, so that notebooks are scanned before they are merged.

Protect Jupyter Notebooks with Built-in NB Defense

Instead of deploying a Jupyter notebook environment first, and then installing the NB Defense extension, I wanted to deploy them together - secure by default. Since this mechanism did not exist, I created a Dockerfile to achieve the same net result. For this exercise, I'm using the base-notebook Jupyter image for deployment. Note that base-notebook is the minimal image in the Jupyter Docker Stacks; if you want commonly used data science libraries such as NumPy, Pandas, Matplotlib, and SciPy preinstalled, the scipy-notebook image is likely a better starting point. Here's the Dockerfile for your reference.

# Use the official JupyterLab base-notebook image as the base image
FROM jupyter/base-notebook:python-3.10

# Switch to root to install system packages
USER root

# Install curl (apt places it in /usr/bin, which is already on PATH,
# so no PATH changes are needed)
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Install JupyterLab 3.x and the nbdefense_jupyter extension
RUN pip install jupyterlab==3.* nbdefense_jupyter

# Install spaCy and its en_core_web_trf model (only needed for the PII scan module)
RUN pip install spacy && python -m spacy download en_core_web_trf

# Switch back to the jovyan user
USER $NB_UID

# Enable the nbdefense_jupyter extension
RUN jupyter server extension enable nbdefense_jupyter

# Expose the port JupyterLab will run on
EXPOSE 8888

# Start JupyterLab by default
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]

Deploy a Jupyter Notebook with NB Defense on Railway

💡 Note: The Railway one-click starter template has been deprecated, but you can still deploy the code to Railway or other platforms.

Let's self-host a Jupyter Notebook with the NB Defense extension on Railway, a modern app hosting platform that makes it easy to deploy production-ready apps quickly. If you don't already have an account, sign up using GitHub, and click Authorize Railway App when redirected. Railway no longer offers an always-free plan, but the free trial is good enough to try this. Launch the one-click starter template to deploy it instantly on Railway.

Review the default settings and click Deploy; the deployment will kick off immediately.

Deploy Jupyter Notebook + NB Defense using one-click starter on Railway

Once the deployment completes, a Jupyter Notebook with the pre-installed NB Defense extension will be available at a default xxx.up.railway.app domain - launch this URL to access the web interface. If you are interested in setting up a custom domain, I covered it at length in a previous post - see the final section of that post.

💡 Once you deploy the service, look for the authentication token under the Deploy Logs section of the deployed service - you'll need this to access the Jupyter notebook for the first time.
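If you have copied the deploy logs to your machine and would rather not hunt for the token by eye, a small script can pull it out of the login URL that Jupyter prints at startup. This is only a sketch: the `extract_token` helper is a name I made up, and the sample log lines and token value below are hypothetical stand-ins for whatever your own logs contain.

```python
import re
from typing import Optional

# Matches the token query parameter in a Jupyter login URL.
TOKEN_PATTERN = re.compile(r"[?&]token=([^&\s]+)")


def extract_token(log_text: str) -> Optional[str]:
    """Return the first token found in the log text, or None if there is none."""
    match = TOKEN_PATTERN.search(log_text)
    return match.group(1) if match else None


# Hypothetical log excerpt; real output will differ in format and token value.
sample_logs = (
    "[I ServerApp] Jupyter Server is running at:\n"
    "[I ServerApp] http://127.0.0.1:8888/lab"
    "?token=0123456789abcdef0123456789abcdef0123456789abcdef\n"
)

token = extract_token(sample_logs)
print(token)
```

Treat the token like a password: anyone who has it can log in to the notebook server, so avoid pasting logs that contain it into tickets or chat.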

Look for the authentication token under Deploy Logs

Once you provide the authentication token and click Log in, you'll see the default dashboard for the Jupyter Notebook, along with a tab for NB Defense.

Default dashboard for a self-hosted Jupyter Notebook

Create a new notebook, or import an existing one, and click Scan with NB Defense from the menu bar to check for potential issues.

The first scan may take a bit longer as the notebook downloads and initialises the necessary files, but subsequent scans should be faster. If your notebook has no obvious issues, NB Defense will give you the green light.

Jupyter notebook after NB Defense scan - no issues found

If your notebook contains secrets, PII, dependency vulnerabilities, or non-permissive licenses, NB Defense will highlight them for you, along with a severity rating for each identified issue.

Jupyter notebook after NB Defense scan - some issues found
