Detect File Types from Content (Not Just Extensions)

File extensions lie. A .jpg might actually be a PHP shell, and a .pdf might be a ZIP archive. The File Type service inspects the first bytes of a file — its magic number — to determine the true content type.

What You'll Learn

  • How to detect image types by sending base64-encoded content
  • How PDF detection works with magic byte signatures
  • Why content-based detection is more secure than trusting file extensions

Prerequisites

Before you start: You need an API key. Follow the Platform Quick Start to get one.

  • curl (or any HTTP client)
  • A valid API key in the X-API-KEY header
  • A sample file to test (or use the examples below)

Step 1: Detect a PNG Image

Send at least the first 64 bytes of the file, base64-encoded, in the content field (the examples below send 256). Every PNG starts with the bytes 89 50 4E 47.

bash
# Encode the first 256 bytes of a local image (tr strips line wraps, so this works with GNU and BSD base64)
SAMPLE=$(head -c 256 photo.png | base64 | tr -d '\n')

curl -X POST /api/file-type/detect \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your-api-key" \
  -d "{\"content\": \"$SAMPLE\"}"
python
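import base64

import requests

API_BASE = "https://api.example.com"  # placeholder: replace with your API host

# Read only the leading bytes; the signature lives at the start of the file
with open("photo.png", "rb") as f:
    sample = base64.b64encode(f.read(256)).decode("ascii")

resp = requests.post(
    f"{API_BASE}/api/file-type/detect",
    headers={"X-API-KEY": "your-api-key"},
    json={"content": sample},  # json= also sets Content-Type: application/json
    timeout=10,
)
resp.raise_for_status()
print(resp.json())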
Response — 200 OK
{
  "status": "OK",
  "data": {
    "detectedType": "image/png",
    "extension": "png",
    "confidence": 1.0,
    "magicBytes": "89504E47",
    "description": "PNG image"
  }
}

A confidence of 1.0 means the magic bytes are an exact match for a known signature.
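To see what the service is comparing, you can reproduce the magicBytes value locally. This is a sketch that assumes magicBytes is the uppercase hex of the first four bytes, as in the sample response above; magic_hex and build_payload are illustrative helpers, not part of the API.

```python
import base64

PNG_SIGNATURE = bytes.fromhex("89504E47")  # first 4 bytes of every PNG file


def magic_hex(head: bytes, length: int = 4) -> str:
    """Uppercase hex of the leading bytes, formatted like the magicBytes field."""
    return head[:length].hex().upper()


def build_payload(head: bytes, limit: int = 256) -> dict:
    """Base64-encode at most `limit` leading bytes into the request body shape."""
    return {"content": base64.b64encode(head[:limit]).decode("ascii")}


# Full 8-byte PNG signature followed by filler; a real file would come from
# open("photo.png", "rb").read(256)
sample = bytes.fromhex("89504E470D0A1A0A") + b"\x00" * 300
print(magic_hex(sample))                                         # 89504E47
print(sample.startswith(PNG_SIGNATURE))                          # True
print(len(base64.b64decode(build_payload(sample)["content"])))   # 256
```

Only the leading bytes matter for the match, which is why the payload can be truncated without losing accuracy for signatures like these.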

Step 2: Detect a PDF Document

PDF files start with %PDF (hex 25504446). Even if someone renames a PDF to .docx, the magic bytes reveal the truth.

bash
SAMPLE=$(head -c 256 report.pdf | base64 | tr -d '\n')

curl -X POST /api/file-type/detect \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your-api-key" \
  -d "{\"content\": \"$SAMPLE\"}"
Response — 200 OK
{
  "status": "OK",
  "data": {
    "detectedType": "application/pdf",
    "extension": "pdf",
    "confidence": 1.0,
    "magicBytes": "25504446",
    "description": "PDF document"
  }
}
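Step 2 shows only the curl call. If you want the same request in Python without third-party packages, it can be built with the standard library's urllib.request. This is a sketch under stated assumptions: API_BASE is a placeholder host (the guide gives only the path), build_detect_request and detect are hypothetical helper names, and the result is assumed to sit under data as in the responses above.

```python
import base64
import json
import urllib.request

# Placeholder host: the guide shows only the path /api/file-type/detect
API_BASE = "https://api.example.com"


def build_detect_request(head: bytes, api_key: str, base: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST to the detect endpoint with the first 256 bytes."""
    body = json.dumps({"content": base64.b64encode(head[:256]).decode("ascii")})
    return urllib.request.Request(
        base + "/api/file-type/detect",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


def detect(head: bytes, api_key: str) -> dict:
    """Send the request and return the `data` object (urlopen raises HTTPError on 4xx/5xx)."""
    req = build_detect_request(head, api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["data"]
```

To use it, read the first 256 bytes of report.pdf in binary mode and pass them to detect along with your key; for a real PDF the returned detectedType should be application/pdf, regardless of what the file is named.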

Step 3: Why Magic Numbers Beat Extensions

Relying on file extensions creates real security holes:

Scenario                     Extension says              Magic bytes say
Renamed PHP webshell         .jpg                        text/x-php
ZIP disguised as document    .pdf                        application/zip
EXE with double extension    .pdf.exe (shown as .pdf)    application/x-dosexec

Security rule: Never trust user-supplied file extensions for access control or rendering decisions. Always verify with content-based detection.
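A minimal local illustration of the table above: a prefix check against a few well-known signatures (the PNG signature, %PDF, ZIP's PK\x03\x04, and MZ for Windows executables) flags the renamed files. The signature list is an illustrative assumption, not the service's database, and treating "<?php" as a PHP signature is a heuristic, since PHP source has no true magic number.

```python
# Illustrative signature table (assumption: not the service's actual database)
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png", "png"),
    (b"%PDF", "application/pdf", "pdf"),
    (b"PK\x03\x04", "application/zip", "zip"),
    (b"MZ", "application/x-dosexec", "exe"),
    (b"<?php", "text/x-php", "php"),  # heuristic: PHP has no real magic number
]


def sniff(head: bytes):
    """Return (mime, ext) for the first signature the content starts with, or None."""
    for magic, mime, ext in SIGNATURES:
        if head.startswith(magic):
            return mime, ext
    return None


def extension_lies(filename: str, head: bytes) -> bool:
    """True when the file's last extension disagrees with the content-based type."""
    detected = sniff(head)
    claimed = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return detected is not None and detected[1] != claimed
```

A prefix check has limits. Office formats such as .docx are ZIP containers, so a legitimate .docx sniffs as ZIP and needs an allowlist of equivalent extensions. For invoice.pdf.exe it returns False, because the final extension is honest; the risk there is the name the user sees. Code appended after a valid image header also passes a magic-byte check.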

Integration Tips

  • Upload validation: Check files before saving them to storage, and reject any file whose extension does not match the detected type to block disguised malware.
  • Minimal payload: The first 256 bytes are enough; there is no need to upload the entire file.
  • Content-Type headers: When serving files back to users, set the Content-Type header from the detected MIME type, not from the uploaded file's extension.
  • Combine with virus scanning: Detect type first, then route to the appropriate scanner based on MIME type.
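The upload-validation and scanner-routing tips can be combined into one decision function. This sketch works on the data object from the detect response shown in Steps 1 and 2; ALLOWED_TYPES, SCANNER_BY_MIME, the scanner names, and the jpeg/jpg alias are hypothetical choices to adapt, not part of the API.

```python
# Hypothetical policy: adapt the allowlist and scanner names to your own services
ALLOWED_TYPES = {"image/png", "image/jpeg", "application/pdf", "application/zip"}
SCANNER_BY_MIME = {
    "application/pdf": "pdf-scanner",
    "application/zip": "archive-scanner",
}
EXT_ALIASES = {"jpeg": "jpg"}  # treat equivalent spellings as the same extension


def normalize(ext: str) -> str:
    """Lowercase an extension and map known aliases to one spelling."""
    ext = ext.lower()
    return EXT_ALIASES.get(ext, ext)


def check_upload(filename: str, data: dict) -> tuple:
    """Return ("reject", reason) or ("accept", scanner) for an upload.

    `data` is the object under "data" in the /api/file-type/detect response.
    """
    claimed = normalize(filename.rsplit(".", 1)[-1]) if "." in filename else ""
    if data["detectedType"] not in ALLOWED_TYPES:
        return "reject", "type not allowed: " + data["detectedType"]
    if normalize(data["extension"]) != claimed:
        return "reject", f"extension .{claimed} does not match .{data['extension']}"
    # Route by detected MIME type, never by the user-supplied name
    return "accept", SCANNER_BY_MIME.get(data["detectedType"], "generic-scanner")
```

Checking the allowlist before the extension means a disguised PHP file is rejected as a disallowed type even if its name happens to match.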

Next Steps