Extract Keywords from Any Text Using TF-IDF

The Keyword Extractor service uses TF-IDF (Term Frequency–Inverse Document Frequency) to pull the most relevant keywords from any text. Feed it blog posts, support tickets, or technical docs and get back ranked keywords in milliseconds.

What You'll Learn

  • How to extract keywords from a block of text with a single API call
  • The difference between GENERAL and TECHNICAL extraction profiles
  • How to submit multiple documents in a batch request

Prerequisites

Before you start: You need an API key. If you don't have one yet, follow the Platform Quick Start to register and create a key.

  • curl (or any HTTP client)
  • A valid API key passed in the X-API-KEY header

Step 1: Extract Keywords from Article Text

Send a POST request with your text in the text field. The service returns ranked keywords with their TF-IDF scores.

bash
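# BASE_URL is a placeholder: set it to your platform's base URL (the host is not given on this page).
curl -X POST "$BASE_URL/api/keyword-extractor/extract" \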
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your-api-key" \
  -d '{
    "text": "Kubernetes has become the de facto standard for container orchestration. Companies adopting microservices architecture rely on Kubernetes to manage deployment, scaling, and networking of containerized applications across hybrid cloud environments.",
    "maxKeywords": 5
  }'
python
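import requests

# Placeholder: replace with your platform's base URL (the host is not given on this page).
BASE_URL = "https://your-platform-host"

resp = requests.post(
    f"{BASE_URL}/api/keyword-extractor/extract",
    headers={"X-API-KEY": "your-api-key"},
    json={
        "text": "Kubernetes has become the de facto standard for container orchestration...",
        "maxKeywords": 5
    },
    timeout=10,  # avoid hanging indefinitely on a stalled connection
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(resp.json())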
Response — 200 OK
{
  "status": "OK",
  "data": {
    "keywords": [
      { "term": "kubernetes",    "score": 0.412 },
      { "term": "container",     "score": 0.287 },
      { "term": "orchestration", "score": 0.231 },
      { "term": "microservices", "score": 0.198 },
      { "term": "cloud",          "score": 0.154 }
    ],
    "profile": "GENERAL",
    "wordCount": 34
  }
}

Keywords are returned ranked from highest to lowest score. Each score is a TF-IDF weight between 0 and 1; a higher score means the term is more relevant to the input text.
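
To build intuition for what a TF-IDF score measures, here is a minimal sketch in plain Python. It is not the service's implementation: the tokenizer, the reference corpus, and the smoothed IDF formula are all assumptions chosen for illustration, and the raw scores it produces are not normalized to the 0 to 1 range the API returns.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters; a deliberately naive tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(target, corpus):
    """Rank the terms of `target` against `corpus`, a list of reference texts.

    tf  = occurrences of the term / total terms in the target
    idf = ln((1 + N) / (1 + df)) + 1, a common smoothed variant
          (an assumption of this sketch, not the service's documented formula)
    """
    tokens = tokenize(target)
    counts = Counter(tokens)
    reference_docs = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(reference_docs)

    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)
        df = sum(1 for doc in reference_docs if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = tf * idf

    # Highest-scoring terms first, mirroring the ranked API response.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `tf_idf("the kubernetes cluster runs kubernetes pods", ["the cat sat on the mat", "the dog ate the bone"])` ranks `kubernetes` first, because it is frequent in the target and absent from the reference texts, and ranks `the` last, because it appears in every reference text.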

Step 2: GENERAL vs TECHNICAL Profiles

The profile parameter controls the stopword list and stemming rules. Use GENERAL for everyday text and TECHNICAL for code-heavy or engineering content. When profile is omitted, as in Step 1, the response reports GENERAL.

bash
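# BASE_URL is a placeholder: set it to your platform's base URL (the host is not given on this page).
curl -X POST "$BASE_URL/api/keyword-extractor/extract" \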
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your-api-key" \
  -d '{
    "text": "The NullPointerException was thrown from the HashMap.get() method during garbage collection. Heap dump analysis revealed a race condition in the ConcurrentHashMap resize path.",
    "maxKeywords": 5,
    "profile": "TECHNICAL"
  }'
Response — 200 OK
{
  "status": "OK",
  "data": {
    "keywords": [
      { "term": "NullPointerException",  "score": 0.389 },
      { "term": "ConcurrentHashMap",    "score": 0.321 },
      { "term": "garbage collection",   "score": 0.276 },
      { "term": "race condition",       "score": 0.241 },
      { "term": "heap dump",            "score": 0.198 }
    ],
    "profile": "TECHNICAL",
    "wordCount": 29
  }
}

Notice how TECHNICAL preserves identifiers such as ConcurrentHashMap and multi-word terms such as garbage collection, which GENERAL would split or discard.
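
If you switch profiles from code, it can help to validate the value before sending the request. The sketch below uses only the Python standard library so it has no third-party dependency. The helper names and the `base_url` argument are hypothetical, and it assumes the endpoint accepts an explicit "profile": "GENERAL" as well as "TECHNICAL".

```python
import json
import urllib.request

# The two profile names described in this guide.
VALID_PROFILES = {"GENERAL", "TECHNICAL"}


def build_extract_request(base_url, api_key, text, max_keywords=5, profile="GENERAL"):
    """Build (but do not send) a POST request for the extract endpoint.

    `base_url` is a placeholder for your platform's host.
    """
    if profile not in VALID_PROFILES:
        raise ValueError(f"profile must be one of {sorted(VALID_PROFILES)}, got {profile!r}")
    body = {"text": text, "maxKeywords": max_keywords, "profile": profile}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/keyword-extractor/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


def extract_terms(request):
    """Send the request and return only the keyword terms, in ranked order."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        payload = json.load(resp)
    return [keyword["term"] for keyword in payload["data"]["keywords"]]
```

To compare profiles on the same text, build one request per profile and pass each to `extract_terms`. Separating construction from sending also makes the payload easy to inspect or unit-test without network access.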

Step 3: Batch Extraction

Submit multiple documents at once using the /extract/batch endpoint. Each document gets its own keyword list, returned under the id you supplied.

bash
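# BASE_URL is a placeholder: set it to your platform's base URL (the host is not given on this page).
curl -X POST "$BASE_URL/api/keyword-extractor/extract/batch" \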
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your-api-key" \
  -d '{
    "documents": [
      { "id": "doc-1", "text": "React hooks simplify state management in functional components." },
      { "id": "doc-2", "text": "PostgreSQL JSONB columns support GIN indexes for fast queries." }
    ],
    "maxKeywords": 3
  }'
Response — 200 OK
{
  "status": "OK",
  "data": {
    "results": [
      {
        "id": "doc-1",
        "keywords": [
          { "term": "React hooks", "score": 0.445 },
          { "term": "state management", "score": 0.312 },
          { "term": "functional components", "score": 0.278 }
        ]
      },
      {
        "id": "doc-2",
        "keywords": [
          { "term": "PostgreSQL", "score": 0.401 },
          { "term": "JSONB", "score": 0.356 },
          { "term": "GIN indexes", "score": 0.289 }
        ]
      }
    ]
  }
}
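
When the documents live in your own data store, you will usually build the batch body programmatically and then look results up by id. The following is a minimal sketch with hypothetical helper names; it assumes only the request and response shapes shown above.

```python
def build_batch_body(texts_by_id, max_keywords=3):
    """Turn a {doc_id: text} mapping into the batch request body."""
    return {
        "documents": [
            {"id": doc_id, "text": text} for doc_id, text in texts_by_id.items()
        ],
        "maxKeywords": max_keywords,
    }


def keywords_by_id(response_json):
    """Index a batch response as {doc_id: [term, ...]}, keeping ranked order."""
    return {
        item["id"]: [keyword["term"] for keyword in item["keywords"]]
        for item in response_json["data"]["results"]
    }
```

Pass the result of `build_batch_body` as the JSON body of the POST request, then feed the parsed response to `keywords_by_id`. Looking results up by id avoids relying on the response preserving the order in which documents were submitted.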

Integration Tips

  • Auto-tagging: Pipe blog posts through the extractor at publish time to generate SEO tags automatically.
  • Search enrichment: Store extracted keywords alongside documents to improve full-text search relevance.
  • Support triage: Extract keywords from incoming tickets to auto-assign categories and route to the right team.
  • Tune maxKeywords: Start with 5–10 for tagging, 3–5 for summaries. Scores drop off quickly past the top terms.
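
As one way to apply the auto-tagging tip, the sketch below turns a keyword list from the response into URL-friendly tags. The function name, the 0.2 score cutoff, and the slug format are illustrative assumptions; tune them against your own content.

```python
import re


def to_tags(keywords, min_score=0.2, limit=5):
    """Convert [{"term": ..., "score": ...}, ...] into at most `limit` tag slugs.

    Keywords scoring below `min_score` are dropped, since scores fall off
    quickly past the top terms.
    """
    ranked = sorted(keywords, key=lambda keyword: keyword["score"], reverse=True)
    tags = []
    for keyword in ranked:
        if len(tags) >= limit or keyword["score"] < min_score:
            break
        # Lowercase and collapse anything non-alphanumeric into single hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", keyword["term"].lower()).strip("-")
        if slug and slug not in tags:
            tags.append(slug)
    return tags
```

With the Step 1 response and the default cutoff, this keeps kubernetes, container, and orchestration and drops the two lower-scoring terms. A multi-word term such as "React hooks" becomes `react-hooks`.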

Next Steps