Find Duplicate Content with SHA-256 Hashing

The Deduplication service computes a SHA-256 hash of your content and checks it against previously seen hashes. Use it to prevent duplicate form submissions, detect re-posted articles, or deduplicate file uploads.

What You'll Learn

  • How to submit content and get a uniqueness verdict
  • What UNIQUE vs EXACT_DUPLICATE responses look like
  • How to use namespaces to isolate deduplication scopes

Prerequisites

Before you start: You need an API key. Follow the Platform Quick Start to get one.

  • curl (or any HTTP client)
  • A valid API key in the X-API-KEY header

Step 1: First Submission — UNIQUE

Submit content for the first time. The service hashes it with SHA-256, stores the hash, and returns UNIQUE.

bash
curl -X POST /api/deduplication/check \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your-api-key" \
  -d '{
    "content": "This is an original blog post about distributed caching strategies for high-traffic applications.",
    "namespace": "blog-posts"
  }'
python
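import requests

# Placeholder: replace with the base URL of your deployment.
BASE_URL = "https://your-api-host"

resp = requests.post(
    f"{BASE_URL}/api/deduplication/check",
    headers={"X-API-KEY": "your-api-key"},
    # json= serializes the body and sets Content-Type: application/json
    json={
        "content": "This is an original blog post about distributed caching strategies for high-traffic applications.",
        "namespace": "blog-posts"
    },
    timeout=10,
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(resp.json())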
Response — 200 OK
{
  "status": "OK",
  "data": {
    "result": "UNIQUE",
    "hash": "a1b2c3d4e5f6789012345678abcdef0123456789abcdef0123456789abcdef01",
    "namespace": "blog-posts",
    "firstSeen": "2026-03-26T14:30:00Z",
    "occurrences": 1
  }
}

The namespace field lets you keep separate deduplication pools: hashes are only compared within the same namespace, so blog posts and support tickets won't collide.

Step 2: Submit the Same Content Again — EXACT_DUPLICATE

Send the identical text again. The SHA-256 hash matches, so the service returns EXACT_DUPLICATE.

bash
# Same payload as Step 1
curl -X POST /api/deduplication/check \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your-api-key" \
  -d '{
    "content": "This is an original blog post about distributed caching strategies for high-traffic applications.",
    "namespace": "blog-posts"
  }'
Response — 200 OK
{
  "status": "OK",
  "data": {
    "result": "EXACT_DUPLICATE",
    "hash": "a1b2c3d4e5f6789012345678abcdef0123456789abcdef0123456789abcdef01",
    "namespace": "blog-posts",
    "firstSeen": "2026-03-26T14:30:00Z",
    "occurrences": 2
  }
}

Note: Even a single character change (including whitespace) produces a completely different SHA-256 hash. This is exact deduplication, not fuzzy matching.
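
To see why exact matching is so strict, you can reproduce the effect locally with Python's standard hashlib module. This is only a sketch: it assumes the text is hashed as UTF-8 bytes with no normalization, which this guide does not confirm for the service, and sha256_hex is a helper name made up for the example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as 64 hex characters.

    Assumption: the text is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = "distributed caching strategies"
with_trailing_space = original + " "  # one extra whitespace character

digest_a = sha256_hex(original)
digest_b = sha256_hex(with_trailing_space)

print(digest_a)
print(digest_b)
print(digest_a == digest_b)  # False: the two digests differ
```

If you want whitespace or case differences to count as duplicates, normalize the content yourself (for example, strip and lowercase it) before submitting it.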

Step 3: Different Content — UNIQUE Again

Submit different text. Its hash matches nothing stored in the namespace, so the service returns UNIQUE again.

bash
curl -X POST /api/deduplication/check \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your-api-key" \
  -d '{
    "content": "A completely different article about event-driven architecture patterns.",
    "namespace": "blog-posts"
  }'
Response — 200 OK
{
  "status": "OK",
  "data": {
    "result": "UNIQUE",
    "hash": "f9e8d7c6b5a4321098765432fedcba9876543210fedcba9876543210fedcba98",
    "namespace": "blog-posts",
    "firstSeen": "2026-03-26T14:31:12Z",
    "occurrences": 1
  }
}
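
The behavior in Steps 1 to 3 can be modeled with a small in-memory sketch. LocalDeduplicator is a hypothetical stand-in written for illustration, not the service's actual implementation; it assumes UTF-8 hashing and mirrors only the response fields shown above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Record:
    first_seen: datetime
    occurrences: int


class LocalDeduplicator:
    """Toy in-memory model of the check endpoint (hypothetical)."""

    def __init__(self) -> None:
        # Keyed by (namespace, hash) so each namespace is its own pool.
        self._seen = {}

    def check(self, content: str, namespace: str) -> dict:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        key = (namespace, digest)
        record = self._seen.get(key)
        if record is None:
            # First sighting in this namespace: store it and report UNIQUE.
            record = Record(first_seen=datetime.now(timezone.utc), occurrences=1)
            self._seen[key] = record
            result = "UNIQUE"
        else:
            # Same hash already stored in this namespace: count the repeat.
            record.occurrences += 1
            result = "EXACT_DUPLICATE"
        return {
            "result": result,
            "hash": digest,
            "namespace": namespace,
            "firstSeen": record.first_seen.isoformat(),
            "occurrences": record.occurrences,
        }


dedup = LocalDeduplicator()
post = "An article about distributed caching."

first = dedup.check(post, "blog-posts")
second = dedup.check(post, "blog-posts")
other = dedup.check(post, "support-tickets")  # same text, different pool

print(first["result"], second["result"], other["result"])
# UNIQUE EXACT_DUPLICATE UNIQUE
```

Keying the store on the (namespace, hash) pair is what lets the same text be UNIQUE in one namespace and a duplicate in another.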

Integration Tips

  • Idempotent form submissions: Check before processing a payment or order to prevent double-charges.
  • Content moderation pipelines: Deduplicate before spending resources on AI classification.
  • Namespace strategy: Use user-{id} namespaces to detect per-user duplicates while allowing the same content from different users.
  • TTL support: Set "ttlHours": 24 to auto-expire hashes — useful for time-windowed deduplication.
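
The namespace and TTL tips above could be combined in a small client-side helper like the one below. This is a sketch under stated assumptions: build_check_payload and should_process are hypothetical names, and placing ttlHours as a top-level field of the request body is an assumption, since this guide does not show where the field belongs.

```python
from typing import Optional


def build_check_payload(content: str, user_id: str,
                        ttl_hours: Optional[int] = None) -> dict:
    """Build a request body for POST /api/deduplication/check."""
    payload = {
        "content": content,
        # Per-user pool, following the user-{id} naming from the tip above.
        "namespace": f"user-{user_id}",
    }
    if ttl_hours is not None:
        # Assumption: ttlHours is a top-level field of the request body.
        payload["ttlHours"] = ttl_hours
    return payload


def should_process(response_body: dict) -> bool:
    """Return True only when the service reported the content as UNIQUE."""
    return response_body.get("data", {}).get("result") == "UNIQUE"


payload = build_check_payload("order: 2 widgets", user_id="42", ttl_hours=24)
print(payload)

# A response shaped like the EXACT_DUPLICATE example in Step 2.
duplicate_response = {
    "status": "OK",
    "data": {"result": "EXACT_DUPLICATE", "occurrences": 2},
}
print(should_process(duplicate_response))  # False
```

Treating anything other than an explicit UNIQUE as "do not process" is a deliberately conservative choice for payment-style flows; a missing or malformed response body then blocks the action instead of letting a possible duplicate through.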

Next Steps