Extract Keywords from Any Text Using TF-IDF
The Keyword Extractor service uses TF-IDF (Term Frequency–Inverse Document Frequency) to pull the most relevant keywords from any text. Feed it blog posts, support tickets, or technical docs and get back ranked keywords in milliseconds.
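This tutorial does not describe the service's tokenizer, reference corpus, or score normalization. To illustrate the underlying idea, here is a minimal local TF-IDF sketch. It assumes a naive lowercase tokenizer and a smoothed IDF of log((1 + N) / (1 + df)) + 1. The point it shows is that a term scores high when it is frequent in one document but rare across the corpus. It is not the service's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_scores(doc: str, corpus: list[str]) -> dict[str, float]:
    """Score every term in `doc` against `corpus`, a list of reference documents."""
    tokens = tokenize(doc)
    if not tokens:
        return {}
    counts = Counter(tokens)
    doc_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)                     # share of this doc's tokens
        df = sum(1 for s in doc_sets if term in s)   # number of docs containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF: rarer terms weigh more
        scores[term] = tf * idf
    return scores


corpus = [
    "kubernetes manages containers kubernetes",
    "containers run apps",
    "apps need cloud",
]
scores = tfidf_scores(corpus[0], corpus)
print(sorted(scores, key=scores.get, reverse=True))  # ['kubernetes', 'manages', 'containers']
```

"containers" appears as often as "manages" in the first document. It still ranks lower because it also appears in a second document, which lowers its IDF.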
What You'll Learn
- How to extract keywords from a block of text with a single API call
- The difference between GENERAL and TECHNICAL extraction profiles
- How to submit multiple documents in a batch request
Prerequisites
Before you start: You need an API key. If you don't have one yet, follow the Platform Quick Start to register and create a key.
- curl (or any HTTP client)
- A valid API key passed in the X-API-KEY header
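Hard-coding "your-api-key" is fine for a quick test. In real scripts, a common pattern is to read the key from the environment. The sketch below assumes a hypothetical variable name, KEYWORD_EXTRACTOR_API_KEY; the platform does not require this name, and auth_headers is an illustrative helper.

```python
import os


def auth_headers(env_var: str = "KEYWORD_EXTRACTOR_API_KEY") -> dict[str, str]:
    """Build request headers from an API key kept in an environment variable.

    The variable name is a hypothetical convention, not a platform requirement.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your API key before calling the service")
    return {"X-API-KEY": api_key, "Content-Type": "application/json"}
```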
Step 1: Extract Keywords from Article Text
Send a POST request with your text in the text field, and set maxKeywords to cap how many keywords come back. The service returns ranked keywords with their TF-IDF scores.
curl -X POST /api/keyword-extractor/extract \
-H "Content-Type: application/json" \
-H "X-API-KEY: your-api-key" \
-d '{
"text": "Kubernetes has become the de facto standard for container orchestration. Companies adopting microservices architecture rely on Kubernetes to manage deployment, scaling, and networking of containerized applications across hybrid cloud environments.",
"maxKeywords": 5
}'
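import requests

# The tutorial shows paths only; replace this placeholder host with your deployment's base URL.
BASE_URL = "https://your-host.example"

resp = requests.post(
    f"{BASE_URL}/api/keyword-extractor/extract",
    headers={"X-API-KEY": "your-api-key"},
    # json= serializes the body and sets Content-Type: application/json
    json={
        "text": "Kubernetes has become the de facto standard for container orchestration...",
        "maxKeywords": 5,
    },
)
resp.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(resp.json())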
{
"status": "OK",
"data": {
"keywords": [
{ "term": "kubernetes", "score": 0.412 },
{ "term": "container", "score": 0.287 },
{ "term": "orchestration", "score": 0.231 },
{ "term": "microservices", "score": 0.198 },
{ "term": "cloud", "score": 0.154 }
],
"profile": "GENERAL",
"wordCount": 34
}
}
Each keyword comes with a TF-IDF score between 0 and 1, and keywords are listed from highest to lowest score. Higher scores mean greater relevance to the input text.
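If you only want strong keywords, you can filter the parsed response by score. This is a minimal sketch that assumes the response shape shown above. The 0.2 threshold is an arbitrary example. This tutorial only shows the "OK" status, so treating any other status as an error is an assumption.

```python
def top_terms(response: dict, min_score: float = 0.2) -> list[str]:
    """Return terms from an /extract response whose score is at least min_score."""
    # Only "OK" appears in this tutorial; treating anything else as an error is an assumption.
    if response.get("status") != "OK":
        raise ValueError(f"extraction failed: status={response.get('status')!r}")
    return [kw["term"] for kw in response["data"]["keywords"] if kw["score"] >= min_score]


sample = {
    "status": "OK",
    "data": {
        "keywords": [
            {"term": "kubernetes", "score": 0.412},
            {"term": "container", "score": 0.287},
            {"term": "orchestration", "score": 0.231},
            {"term": "microservices", "score": 0.198},
            {"term": "cloud", "score": 0.154},
        ],
        "profile": "GENERAL",
        "wordCount": 34,
    },
}
print(top_terms(sample))  # ['kubernetes', 'container', 'orchestration']
```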
Step 2: GENERAL vs TECHNICAL Profiles
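The profile parameter controls the stopword list and stemming rules. Use GENERAL for everyday text and TECHNICAL for code-heavy or engineering content. The Step 1 request did not set profile, and the response reported GENERAL, so GENERAL appears to be the default.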
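To catch typos in the profile name before a request goes out, you can validate it on the client. This minimal sketch accepts only the two profile names used in this tutorial. Whether the service accepts other values is not documented here. build_extract_payload is a hypothetical helper name.

```python
from typing import Optional

# Profile names taken from this tutorial; other values are not documented.
VALID_PROFILES = {"GENERAL", "TECHNICAL"}


def build_extract_payload(text: str, max_keywords: int = 5, profile: Optional[str] = None) -> dict:
    """Assemble the JSON body for POST /api/keyword-extractor/extract."""
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": text, "maxKeywords": max_keywords}
    if profile is not None:
        if profile not in VALID_PROFILES:
            raise ValueError(f"profile must be one of {sorted(VALID_PROFILES)}")
        payload["profile"] = profile
    # With profile omitted, the server chooses the profile itself.
    return payload
```

Leaving profile out of the body, rather than always sending GENERAL, keeps the request identical to the Step 1 example.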
curl -X POST /api/keyword-extractor/extract \
-H "Content-Type: application/json" \
-H "X-API-KEY: your-api-key" \
-d '{
"text": "The NullPointerException was thrown from the HashMap.get() method during garbage collection. Heap dump analysis revealed a race condition in the ConcurrentHashMap resize path.",
"maxKeywords": 5,
"profile": "TECHNICAL"
}'
{
"status": "OK",
"data": {
"keywords": [
{ "term": "NullPointerException", "score": 0.389 },
{ "term": "ConcurrentHashMap", "score": 0.321 },
{ "term": "garbage collection", "score": 0.276 },
{ "term": "race condition", "score": 0.241 },
{ "term": "heap dump", "score": 0.198 }
],
"profile": "TECHNICAL",
"wordCount": 29
}
}
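Notice how TECHNICAL keeps identifiers such as NullPointerException and ConcurrentHashMap in their original casing and preserves multi-word terms such as garbage collection and race condition, which GENERAL mode would split or discard.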
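To check the difference on your own text, you can send the same text once with each profile and compare the two keyword lists. This is a minimal sketch with a hypothetical helper, profile_diff. It lowercases terms before comparing, because the Step 1 GENERAL output is lowercase while TECHNICAL keeps the original casing.

```python
def profile_diff(general: dict, technical: dict) -> dict[str, list[str]]:
    """Compare two /extract responses for the same text run under different profiles."""
    # Lowercase so "NullPointerException" and "nullpointerexception" count as the same term.
    g = {kw["term"].lower() for kw in general["data"]["keywords"]}
    t = {kw["term"].lower() for kw in technical["data"]["keywords"]}
    return {
        "technical_only": sorted(t - g),
        "general_only": sorted(g - t),
        "shared": sorted(g & t),
    }
```

Pass in the two parsed responses. technical_only then lists the terms that only the TECHNICAL profile surfaced.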
Step 3: Batch Extraction
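Submit multiple documents in one request using the /extract/batch endpoint. Give each document an id. The response returns a separate keyword list for each id, and the top-level maxKeywords applies to every document.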
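If your documents already live in a dictionary keyed by id, you can build the batch body directly from it. This is a minimal sketch; build_batch_payload is a hypothetical helper. Whether the batch endpoint also accepts profile is not shown in this tutorial, so the sketch leaves it out.

```python
from typing import Mapping


def build_batch_payload(docs: Mapping[str, str], max_keywords: int = 3) -> dict:
    """Turn {id: text} into the JSON body for POST /api/keyword-extractor/extract/batch."""
    # Dict keys are unique, so every document gets a distinct id to match results back on.
    documents = [{"id": doc_id, "text": text} for doc_id, text in docs.items()]
    return {"documents": documents, "maxKeywords": max_keywords}


payload = build_batch_payload({
    "doc-1": "React hooks simplify state management in functional components.",
    "doc-2": "PostgreSQL JSONB columns support GIN indexes for fast queries.",
})
```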
curl -X POST /api/keyword-extractor/extract/batch \
-H "Content-Type: application/json" \
-H "X-API-KEY: your-api-key" \
-d '{
"documents": [
{ "id": "doc-1", "text": "React hooks simplify state management in functional components." },
{ "id": "doc-2", "text": "PostgreSQL JSONB columns support GIN indexes for fast queries." }
],
"maxKeywords": 3
}'
{
"status": "OK",
"data": {
"results": [
{
"id": "doc-1",
"keywords": [
{ "term": "React hooks", "score": 0.445 },
{ "term": "state management", "score": 0.312 },
{ "term": "functional components", "score": 0.278 }
]
},
{
"id": "doc-2",
"keywords": [
{ "term": "PostgreSQL", "score": 0.401 },
{ "term": "JSONB", "score": 0.356 },
{ "term": "GIN indexes", "score": 0.289 }
]
}
]
}
}
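Each result carries the id you sent, so you can turn the batch response into a lookup table. This is a minimal sketch that assumes the response shape shown above; keywords_by_id is a hypothetical helper name.

```python
def keywords_by_id(batch_response: dict) -> dict[str, list[str]]:
    """Map each document id to its ranked list of terms from an /extract/batch response."""
    return {
        result["id"]: [kw["term"] for kw in result["keywords"]]
        for result in batch_response["data"]["results"]
    }


batch = {
    "status": "OK",
    "data": {
        "results": [
            {"id": "doc-1", "keywords": [
                {"term": "React hooks", "score": 0.445},
                {"term": "state management", "score": 0.312},
                {"term": "functional components", "score": 0.278},
            ]},
            {"id": "doc-2", "keywords": [
                {"term": "PostgreSQL", "score": 0.401},
                {"term": "JSONB", "score": 0.356},
                {"term": "GIN indexes", "score": 0.289},
            ]},
        ]
    },
}
print(keywords_by_id(batch)["doc-2"])  # ['PostgreSQL', 'JSONB', 'GIN indexes']
```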
Integration Tips
- Auto-tagging: Pipe blog posts through the extractor at publish time to generate SEO tags automatically.
- Search enrichment: Store extracted keywords alongside documents to improve full-text search relevance.
- Support triage: Extract keywords from incoming tickets to auto-assign categories and route to the right team.
- Tune maxKeywords: start with 5–10 for tagging and 3–5 for summaries. Scores drop off quickly past the top terms.
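For the support-triage tip, one approach is to map extracted terms to owning teams. The ticket then goes to the team whose terms have the highest combined score. The routing table, team names, and route_ticket function below are hypothetical. The sketch only assumes the {"term", "score"} keyword shape the extractor returns.

```python
# Hypothetical routing table: lowercase keyword -> owning team.
ROUTES = {
    "postgresql": "database-team",
    "jsonb": "database-team",
    "kubernetes": "platform-team",
    "react hooks": "frontend-team",
}


def route_ticket(keywords: list[dict], default: str = "triage-queue") -> str:
    """Pick the team whose matched keywords have the highest total score."""
    totals: dict[str, float] = {}
    for kw in keywords:
        team = ROUTES.get(kw["term"].lower())
        if team:
            totals[team] = totals.get(team, 0.0) + kw["score"]
    # Fall back to a manual queue when no keyword matches the table.
    return max(totals, key=totals.get) if totals else default
```

Summing scores, rather than taking the first match, lets several moderately relevant terms outweigh one stray mention.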
Next Steps
- Full API Reference — all parameters, error codes, and rate limits
- Readability Tutorial — pair keyword extraction with readability scoring
- Deduplication Tutorial — detect duplicate content before indexing
- Try It Live — test the keyword extractor in your browser