By alphasec in Security — Feb 16, 2024

Magika: Enhancing File Content Type Detection through Deep Learning

Enhancing file content type detection through deep learning with Magika, an open-source tool by Google.

In their latest blog post, Google announced that they are open-sourcing Magika, an AI-powered file content type detection tool. Magika uses a lightweight, highly optimized deep learning model that identifies file types within milliseconds, with over 99% accuracy. File type detection is not a new technology, but the accuracy, performance, and maintainability of earlier libraries like libmagic have been a tricky affair. With AI being increasingly used by bad actors to bypass detection mechanisms, it's good to see an increase in the positive uses of AI too. Over time, this can help tilt the scales in favour of defenders by making attacks economically harder for adversaries.
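To make the contrast with older tools concrete, here is a minimal sketch of signature-based ("magic bytes") detection, the general idea behind rule-driven detectors. This is not libmagic's actual implementation; the `SIGNATURES` table and the `sniff` function are hypothetical names I made up for illustration, and the table only covers a handful of well-known file headers.

```python
# Hypothetical, hand-picked signature table; real rule databases are far larger
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x1f\x8b", "gzip"),
    (b"\x7fELF", "elf"),
]


def sniff(data: bytes) -> str:
    """Return a label for the first signature that matches the start of data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


print(sniff(b"%PDF-1.7\n"))         # pdf
print(sniff(b"print('hello')\n"))   # unknown
```

Fixed headers like these are easy to match, but textual formats such as source code have no such header, and container formats share one (a DOCX file is a ZIP archive, so this sketch would label it `zip`). Every such case needs another hand-written rule, which is where the maintenance burden comes from.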

Google has been using Magika internally to improve user safety across a whole host of web properties like Gmail, Drive, and Safe Browsing, and will now integrate it with their crowdsourced malware intelligence platform, VirusTotal, to augment its Code Insight capabilities.

Magika is available as a standalone CLI tool, or as a Python/JavaScript package, and supports over 100 content types. You can test Magika using Google's web demo, or integrate with the Magika libraries in your own application; let's play with the Python library here.

Explore Magika Using a Streamlit App on Railway

In this post, we'll test Magika using a simple Streamlit app on Railway, a modern app hosting platform that makes it easy to deploy production-ready apps quickly. If you don't already have an account, sign up using GitHub, and click Authorize Railway App when redirected. Review and agree to Railway's Terms of Service and Fair Use Policy if prompted. Railway does not offer an always-free plan anymore, but the free trial is good enough to try this. Launch the Magika one-click starter template (or click the button below) to deploy it instantly on Railway.

You'll be given an opportunity to change the default repository name and set it private, if you'd like. Review the settings and click Deploy; the deployment will kick off immediately.

Deploy the Streamlit app using one-click starter on Railway

Once the deployment completes, the Streamlit app will be available at a default xxx.up.railway.app domain - launch this URL to access the web interface. If you are interested in setting up a custom domain, I covered it at length in a previous post - see the final section here. Drag-and-drop or upload different file types to test the detection speed and accuracy of Magika.
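If you don't have a varied set of files handy, a small script can generate a few for you. The sketch below uses only the Python standard library; the `write_samples` function and the file names are my own hypothetical choices, not part of the starter template. One of the files is a ZIP archive deliberately saved with a `.txt` extension, which is a simple way to check that detection is based on content rather than the file name.

```python
import csv
import json
import tempfile
import zipfile
from pathlib import Path


def write_samples(out_dir: Path) -> list[Path]:
    """Create a few small files of different types and return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / "sample.json"
    json_path.write_text(
        json.dumps({"tool": "magika", "tags": ["ai", "security"], "count": 3}, indent=2)
    )

    csv_path = out_dir / "sample.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "type", "size_kb"])
        writer.writerows([["report", "pdf", 120], ["photo", "png", 2048], ["notes", "txt", 4]])

    html_path = out_dir / "sample.html"
    html_path.write_text(
        "<!DOCTYPE html>\n<html><head><title>Sample</title></head>"
        "<body><p>Hello</p></body></html>\n"
    )

    # Misleading extension on purpose: a ZIP archive saved under a .txt name
    zip_path = out_dir / "archive_disguised.txt"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("hello.txt", "hello from inside a zip archive\n")

    return [json_path, csv_path, html_path, zip_path]


# Write the samples to a fresh temporary directory and list them
for path in write_samples(Path(tempfile.mkdtemp(prefix="magika_samples_"))):
    print(path, path.stat().st_size, "bytes")
```

Upload each generated file to the app and compare the reported type with what you know the file to be.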

Test Magika content-type scanner using different file types

Here's the code for my Streamlit app - you can also find it on GitHub.

import tempfile
from pathlib import Path

import streamlit as st
from magika import Magika

# Load the Magika model once at startup
magika = Magika()

# Streamlit app
st.subheader("Magika content-type scanner")
source_file = st.file_uploader("Source File", label_visibility="collapsed")

if not source_file:
    st.error("Please upload the file to be classified.")
else:
    tmp_path = None
    try:
        # Save the uploaded file temporarily to disk and pass its path to Magika
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            tmp_file.write(source_file.read())
            tmp_path = Path(tmp_file.name)
        result = magika.identify_path(tmp_path)
        confidence = "{:.2f}%".format(result.output.score * 100)
        st.success(f"File type: {result.output.ct_label}\n\nDescription: {result.output.magic}\n\nConfidence: {confidence}")
    except Exception as e:
        # st.exception expects the exception object itself, not a formatted string
        st.exception(e)
    finally:
        # Delete the temp file even if identification fails
        if tmp_path is not None:
            tmp_path.unlink(missing_ok=True)

As you can see, the app is pretty basic, and simply tests the default capabilities of the Magika library. For more details, have a look at the Magika GitHub repo.
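One possible simplification is to skip the temporary file altogether. The Magika Python library also exposes `identify_bytes`, which classifies content held in memory. The sketch below is not the code running in my template; `classify_bytes` is a hypothetical helper, and it assumes the same result fields that the app above reads (`result.output.ct_label`, `result.output.magic`, `result.output.score`), which may differ in other Magika releases.

```python
def classify_bytes(identifier, data: bytes) -> dict:
    """Classify in-memory content with any object exposing identify_bytes().

    Assumption: the result has the same output fields the app above reads
    (ct_label, magic, score).
    """
    result = identifier.identify_bytes(data)
    return {
        "label": result.output.ct_label,
        "description": result.output.magic,
        "confidence": f"{result.output.score * 100:.2f}%",
    }


# In the Streamlit app, the temp-file section could then become (untested sketch):
#   info = classify_bytes(magika, source_file.getvalue())
#   st.success(f"File type: {info['label']}\n\nConfidence: {info['confidence']}")
```

Streamlit's uploaded file object is a `BytesIO` subclass, so `getvalue()` returns the full content as bytes. Passing the identifier in as an argument also means the helper can be exercised with a stand-in object in tests, without loading the model.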
