Elasticsearch
What is Elasticsearch?
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It is designed for horizontal scalability, reliability, and near real-time search: newly indexed documents become searchable after a short refresh delay rather than instantly.
- Type: Distributed search and analytics engine
- Written in: Java
- License: Apache 2.0 through 7.10; Elastic License 2.0 / SSPL from 7.11, with AGPLv3 added as an option in 2024 (the OpenSearch fork remains Apache 2.0)
- Protocol: REST API over HTTP/HTTPS
- Default Port: 9200 (HTTP), 9300 (Transport)
- Part of: Elastic Stack (ELK: Elasticsearch, Logstash, Kibana)
Core Concepts
Terminology
| Concept | Description | Analogy (RDBMS) |
|---|---|---|
| Index | Collection of documents | Database |
| Document | JSON object (unit of data) | Row |
| Field | Key-value in document | Column |
| Mapping | Schema definition | Table Schema |
| Shard | Horizontal partition of index | Partition |
| Replica | Copy of a shard | Replica |
| Node | Single ES server instance | Server |
| Cluster | Group of nodes | Cluster |
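To make the Shard row concrete, here is a rough sketch of how a document ID can be mapped to one primary shard. Elasticsearch hashes the routing value (the document ID by default) with Murmur3; `zlib.crc32` below is only a stand-in so the example runs with the standard library, and `pick_shard` is a hypothetical helper, not an Elasticsearch API.

```python
import zlib


def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Map a document ID to a primary shard number (illustrative only)."""
    # Stand-in hash: Elasticsearch itself uses Murmur3 on the routing value.
    routing_hash = zlib.crc32(doc_id.encode("utf-8"))
    return routing_hash % number_of_primary_shards


# The same ID always lands on the same shard for a fixed shard count.
placements = {doc_id: pick_shard(doc_id, 3) for doc_id in ["1", "2", "3", "42"]}
```

Because the result depends on the number of primary shards, that number is fixed when the index is created; changing it later means splitting, shrinking, or reindexing.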
Architecture

Node Types
| Type | Role |
|---|---|
| Master | Cluster management, index creation |
| Data | Stores data, executes searches |
| Ingest | Pre-processing documents |
| Coordinating | Routes requests, aggregates results |
| ML | Machine learning jobs |
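The coordinating role in the table above can be pictured as scatter-gather: each shard returns its own top hits and the coordinating node merges them into one global result. The following is a minimal sketch under that assumption; `merge_shard_results` and the `(score, id)` tuples are made up for illustration and leave out the fetch phase and everything else a real node does.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def merge_shard_results(per_shard_hits: Dict[int, List[Hit]], size: int) -> List[Hit]:
    """Combine per-shard top hits into one global top-`size` list."""
    all_hits = (hit for hits in per_shard_hits.values() for hit in hits)
    # Highest score first; ties are broken by document id for a stable order.
    return heapq.nsmallest(size, all_hits, key=lambda hit: (-hit[0], hit[1]))


shard_results = {
    0: [(2.5, "a"), (1.0, "b")],
    1: [(3.1, "c")],
    2: [(2.5, "d")],
}
top_hits = merge_shard_results(shard_results, size=3)
```

Each shard must return at least `from + size` candidates for the merge to be correct, which is one reason deep pagination gets expensive.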
Core Features

Common Use Cases
1. Full-Text Search
// Index a document
PUT /products/_doc/1
{
  "name": "Apple iPhone 15 Pro",
  "description": "Latest iPhone with A17 Pro chip",
  "price": 999,
  "category": "electronics",
  "tags": ["smartphone", "apple", "ios"]
}
// Search
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "iphone pro",
      "fields": ["name^3", "description", "tags"],
      "fuzziness": "AUTO"
    }
  }
}
2. Log Analytics (ELK Stack)
// Log document structure
{
  "@timestamp": "2024-01-15T10:30:00Z",
  "level": "ERROR",
  "service": "payment-service",
  "message": "Payment failed for user 123",
  "trace_id": "abc123",
  "user_id": "user_123",
  "error_code": "INSUFFICIENT_FUNDS"
}
// Query logs
GET /logs-*/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "level": "ERROR" } },
        { "match": { "service": "payment-service" } }
      ],
      "filter": [
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  },
  "aggs": {
    "errors_by_code": {
      "terms": { "field": "error_code.keyword" }
    }
  }
}
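To show what the `errors_by_code` terms aggregation in the query above computes, here is a small in-memory sketch. It assumes the default ordering (document count descending, then key ascending); `terms_aggregation` is a hypothetical helper, and real aggregations run per shard and can be approximate.

```python
from collections import Counter
from typing import Dict, List


def terms_aggregation(docs: List[dict], field: str, size: int = 10) -> List[Dict[str, object]]:
    """Count documents per distinct value of `field`, like a terms aggregation."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Order by descending count, then by key so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


logs = [
    {"level": "ERROR", "error_code": "INSUFFICIENT_FUNDS"},
    {"level": "ERROR", "error_code": "TIMEOUT"},
    {"level": "ERROR", "error_code": "INSUFFICIENT_FUNDS"},
    {"level": "ERROR"},  # no error_code: skipped, as missing values are by default
]
buckets = terms_aggregation(logs, "error_code")
```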
3. E-commerce Search
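import java.io.IOException;

import org.springframework.stereotype.Service;

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.aggregations.AggregationRange;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.json.JsonData;

// Product, ProductSearchRequest, SearchResult and mapResponse(...) are
// application classes that are not shown here.
@Service
public class ProductSearchService {

    private final ElasticsearchClient client;

    public ProductSearchService(ElasticsearchClient client) {
        this.client = client;
    }

    public SearchResult<Product> search(ProductSearchRequest request) throws IOException {
        SearchResponse<Product> response = client.search(s -> s
                .index("products")
                .query(q -> q
                    .bool(b -> {
                        // Full-text search (scored)
                        if (request.getQuery() != null) {
                            b.must(m -> m
                                .multiMatch(mm -> mm
                                    .query(request.getQuery())
                                    .fields("name^3", "description", "brand^2")
                                    .fuzziness("AUTO")
                                )
                            );
                        }
                        // Filters (not scored). The keyword sub-field is used so the
                        // exact value matches, consistent with the aggregation below.
                        if (request.getCategory() != null) {
                            b.filter(f -> f
                                .term(t -> t.field("category.keyword").value(request.getCategory()))
                            );
                        }
                        if (request.getMinPrice() != null || request.getMaxPrice() != null) {
                            // Set only the bounds that were supplied; a null bound must not be sent.
                            // Written against the older untyped range builder; newer client
                            // versions model range queries as typed variants (for example
                            // r.number(...)), so this part may need adapting.
                            b.filter(f -> f
                                .range(r -> {
                                    r.field("price");
                                    if (request.getMinPrice() != null) {
                                        r.gte(JsonData.of(request.getMinPrice()));
                                    }
                                    if (request.getMaxPrice() != null) {
                                        r.lte(JsonData.of(request.getMaxPrice()));
                                    }
                                    return r;
                                })
                            );
                        }
                        return b;
                    })
                )
                .aggregations("categories", a -> a
                    .terms(t -> t.field("category.keyword"))
                )
                .aggregations("price_ranges", a -> a
                    .range(r -> r
                        .field("price")
                        // AggregationRange is assumed to be the client's range type;
                        // the from/to argument types vary between client versions.
                        .ranges(
                            AggregationRange.of(rr -> rr.to(50.0)),
                            AggregationRange.of(rr -> rr.from(50.0).to(100.0)),
                            AggregationRange.of(rr -> rr.from(100.0))
                        )
                    )
                )
                .highlight(h -> h
                    .fields("name", f -> f)
                    .fields("description", f -> f)
                )
                .from(request.getOffset())
                .size(request.getLimit()),
            Product.class
        );
        return mapResponse(response);
    }
}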
4. Autocomplete / Suggestions
// Mapping with completion suggester
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "suggest": {
        "type": "completion",
        "contexts": [
          { "name": "category", "type": "category" }
        ]
      }
    }
  }
}
// Index with suggestions
PUT /products/_doc/1
{
  "name": "Apple iPhone 15",
  "suggest": {
    "input": ["iphone", "iphone 15", "apple iphone"],
    "contexts": { "category": "electronics" }
  }
}
// Autocomplete query
GET /products/_search
{
  "suggest": {
    "product-suggest": {
      "prefix": "iph",
      "completion": {
        "field": "suggest",
        "size": 5,
        "contexts": {
          "category": "electronics"
        },
        "fuzzy": { "fuzziness": 1 }
      }
    }
  }
}
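The prefix lookup behind the autocomplete query above can be approximated with a sorted list and binary search. This is only a sketch of the idea: the real completion suggester uses an in-memory finite state transducer, ranks by weight, and supports the fuzzy and context options shown above, none of which are modelled here. `complete` is a hypothetical helper.

```python
import bisect
from typing import List


def complete(prefix: str, inputs: List[str], size: int = 5) -> List[str]:
    """Return up to `size` inputs that start with `prefix` (case-insensitive)."""
    ordered = sorted({text.lower() for text in inputs})
    prefix = prefix.lower()
    # Binary search for the first candidate that could start with the prefix.
    start = bisect.bisect_left(ordered, prefix)
    matches: List[str] = []
    for candidate in ordered[start:]:
        if not candidate.startswith(prefix):
            break  # sorted order: no later entry can match either
        matches.append(candidate)
        if len(matches) == size:
            break
    return matches


suggestions = complete("iph", ["iphone", "iphone 15", "apple iphone"])
```

Note that "apple iphone" is not returned for the prefix "iph": a completion field matches from the start of each input, which is why the indexed document lists several inputs.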
5. Geo-Spatial Search
// Mapping
PUT /stores
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "location": { "type": "geo_point" }
    }
  }
}
// Index store
PUT /stores/_doc/1
{
  "name": "Downtown Store",
  "location": { "lat": 40.7128, "lon": -74.0060 }
}
// Find stores within radius
GET /stores/_search
{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "location": { "lat": 40.73, "lon": -73.99 }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": { "lat": 40.73, "lon": -73.99 },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}
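As a rough picture of what the `geo_distance` filter and `_geo_distance` sort above compute, the sketch below filters and sorts stores by great-circle distance using the haversine formula. The spherical Earth radius and the "Boston Store" entry are assumptions added for illustration; Elasticsearch's own distance calculation and index structures differ in detail.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stores_within(stores: List[dict], lat: float, lon: float, radius_km: float) -> List[Tuple[float, str]]:
    """Return (distance_km, name) for stores inside the radius, nearest first."""
    hits = []
    for store in stores:
        loc = store["location"]
        distance = haversine_km(lat, lon, loc["lat"], loc["lon"])
        if distance <= radius_km:
            hits.append((distance, store["name"]))
    return sorted(hits)


stores = [
    {"name": "Downtown Store", "location": {"lat": 40.7128, "lon": -74.0060}},
    {"name": "Boston Store", "location": {"lat": 42.3601, "lon": -71.0589}},  # hypothetical
]
nearby = stores_within(stores, 40.73, -73.99, radius_km=10)
```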
6. Aggregations / Analytics
// Sales analytics
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "total_sales": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } }
      }
    },
    "top_categories": {
      "terms": { "field": "category.keyword", "size": 10 },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } }
      }
    },
    "revenue_percentiles": {
      "percentiles": { "field": "amount" }
    }
  }
}
Query Types
Full-Text Queries
// Match (analyzed)
{ "match": { "message": "quick brown fox" } }
// Match Phrase
{ "match_phrase": { "message": "quick brown fox" } }
// Multi-Match
{ "multi_match": { "query": "search text", "fields": ["title^2", "body"] } }
// Query String (Lucene syntax)
{ "query_string": { "query": "title:elasticsearch AND status:published" } }
Term-Level Queries
// Term (exact match, not analyzed)
{ "term": { "status": "published" } }
// Terms (multiple values)
{ "terms": { "status": ["published", "draft"] } }
// Range
{ "range": { "price": { "gte": 10, "lte": 100 } } }
// Exists
{ "exists": { "field": "email" } }
// Prefix
{ "prefix": { "username": "joh" } }
// Wildcard
{ "wildcard": { "email": "*@gmail.com" } }
Compound Queries
// Bool Query
{
  "bool": {
    "must":     [ ... ],  // AND, affects score
    "should":   [ ... ],  // OR, affects score
    "must_not": [ ... ],  // NOT, no score
    "filter":   [ ... ]   // AND, no score (cacheable)
  }
}
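The four bool clauses above combine as sketched below. This is a toy model, not Elasticsearch's scoring: the "score" is simply the number of matching scoring clauses, and clauses are plain Python predicates. It does reflect one documented rule: when a bool query has no `must` or `filter` clause, at least one `should` clause has to match.

```python
from typing import Callable, Optional, Sequence

Clause = Callable[[dict], bool]


def bool_query(
    doc: dict,
    must: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    must_not: Sequence[Clause] = (),
    filter_: Sequence[Clause] = (),
) -> Optional[int]:
    """Return None if `doc` does not match, else a toy score."""
    if not all(clause(doc) for clause in must):
        return None
    if not all(clause(doc) for clause in filter_):
        return None
    if any(clause(doc) for clause in must_not):
        return None
    should_hits = sum(1 for clause in should if clause(doc))
    # Without must/filter clauses, at least one should clause is required.
    if should and not must and not filter_ and should_hits == 0:
        return None
    # Only must and should contribute to the score; filter and must_not do not.
    return len(must) + should_hits


is_published = lambda d: d.get("status") == "published"
is_cheap = lambda d: d.get("price", 0) <= 100
on_sale = lambda d: "sale" in d.get("tags", [])
is_deleted = lambda d: d.get("deleted", False)

doc = {"status": "published", "price": 50, "tags": ["sale"]}
score = bool_query(doc, must=[is_published], should=[on_sale], must_not=[is_deleted], filter_=[is_cheap])
```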
Mapping & Analyzers
Field Types
| Type | Description |
|---|---|
| text | Analyzed full-text |
| keyword | Exact value (not analyzed) |
| long, integer, short, byte | Numeric |
| double, float | Floating point |
| boolean | true/false |
| date | Date/datetime |
| geo_point | Lat/lon |
| geo_shape | Polygons, etc. |
| nested | Array of objects |
| object | JSON object |
Custom Analyzer
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
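The analyzer above is a pipeline: a tokenizer followed by token filters applied in order. The sketch below mimics that shape in plain Python. The regex tokenizer and the suffix-stripping `crude_stem` are deliberately simplified stand-ins (assumptions for illustration), not the behaviour of the real `standard` tokenizer or English stemmer.

```python
import re
import unicodedata
from typing import List


def ascii_fold(token: str) -> str:
    """Strip diacritics, roughly like the asciifolding filter."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def crude_stem(token: str) -> str:
    """Very rough suffix stripping; NOT the real English stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> List[str]:
    """tokenize -> lowercase -> asciifolding -> stemmer, in that order."""
    tokens = re.findall(r"\w+", text)  # rough stand-in for the standard tokenizer
    return [crude_stem(ascii_fold(token.lower())) for token in tokens]


terms = analyze("Searching Cafés")
```

The same pipeline runs at index time and, by default, at search time, which is why a `match` query for "cafes" can find a document containing "Cafés". The `GET /_analyze` endpoint shows the tokens a real analyzer produces.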
Index Management
Index Lifecycle

Index Templates
PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" }
      }
    }
  }
}
Reindexing
POST /_reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}
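After a reindex, clients still have to be pointed at the new index. One common approach is to have them query an alias and switch it atomically with `POST /_aliases`. The sketch below only builds that request body; the alias name `my_alias` and the helper `alias_swap_body` are hypothetical.

```python
import json


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that moves `alias` to `new_index`."""
    return {
        "actions": [
            # Both actions are applied atomically by the aliases API.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap_body("my_alias", "old_index", "new_index")
payload = json.dumps(body)
```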
Indexing
// Bulk indexing
POST /_bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "name": "Product 1", "price": 10 }
{ "index": { "_index": "products", "_id": "2" } }
{ "name": "Product 2", "price": 20 }
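The bulk body above is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. A minimal sketch of generating it is shown below; `bulk_body` and the assumption that each input dict carries its ID under an `id` key are illustrative choices.

```python
import json
from typing import Iterable


def bulk_body(index: str, docs: Iterable[dict], id_field: str = "id") -> str:
    """Serialize documents into the NDJSON format expected by POST /_bulk."""
    lines = []
    for doc in docs:
        source = dict(doc)  # copy so the caller's dict is not modified
        doc_id = source.pop(id_field)
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


ndjson = bulk_body("products", [
    {"id": 1, "name": "Product 1", "price": 10},
    {"id": 2, "name": "Product 2", "price": 20},
])
```

When sending this body, the request's `Content-Type` should be `application/x-ndjson`.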
Search
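// Use filter context for non-scoring queries
{
  "query": {
    "bool": {
      "must": { "match": { "title": "search" } },
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "date": { "gte": "2024-01-01" } } }
      ]
    }
  }
}
// Limit fields returned
{
  "_source": ["title", "date"],
  "query": { ... }
}
// Use search_after for deep pagination: pass the "sort" values of the last hit
// from the previous page. The final sort field must be a unique tiebreaker.
// Elasticsearch's docs advise against sorting on _id; a keyword copy of the ID
// or a point-in-time search (implicit _shard_doc tiebreaker) is the usual alternative.
{
  "search_after": [1234, "doc_id"],
  "sort": [{ "date": "desc" }, { "_id": "asc" }]
}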
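The `search_after` pattern can be sketched as a loop that feeds the last hit's sort values into the next request. The example below simulates this over an in-memory list instead of a cluster; `search_page` stands in for one search request, and the integer `date` values and `id` tiebreaker field are assumptions for illustration.

```python
from typing import List, Optional, Tuple

SortValues = Tuple[int, str]  # (date, id) of the last hit on a page


def sort_key(doc: dict) -> Tuple[int, str]:
    # date descending, then id ascending as the unique tiebreaker
    return (-doc["date"], doc["id"])


def search_page(docs: List[dict], size: int, search_after: Optional[SortValues] = None) -> List[dict]:
    """Return one page of hits that sort strictly after `search_after`."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        after_key = (-search_after[0], search_after[1])
        ordered = [doc for doc in ordered if sort_key(doc) > after_key]
    return ordered[:size]


def fetch_all(docs: List[dict], size: int) -> List[dict]:
    """Page through every hit by passing the last sort values forward."""
    results: List[dict] = []
    after: Optional[SortValues] = None
    while True:
        page = search_page(docs, size, after)
        if not page:
            return results
        results.extend(page)
        after = (page[-1]["date"], page[-1]["id"])


docs = [
    {"id": "a", "date": 3},
    {"id": "b", "date": 3},
    {"id": "c", "date": 1},
    {"id": "d", "date": 2},
]
all_ids = [doc["id"] for doc in fetch_all(docs, size=2)]
```

Unlike `from`/`size`, each page costs about the same no matter how deep it is, because no earlier hits have to be collected and discarded.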
Trade-offs
| Pros | Cons |
|---|---|
| Powerful full-text search | Complex to operate at scale |
| Near real-time | Eventually consistent |
| Horizontal scalability | Memory intensive |
| Rich query DSL | Not suited as a primary data store |
| Great for analytics | Expensive for high-cardinality aggregations |
| Schema-free | Mapping changes can be painful |
| REST API | No transactions |
| Aggregations | No relational joins (denormalize) |

| Metric | Typical Value |
|---|---|
| Index latency | ~1 second until searchable (default refresh interval) |
| Search latency | 10-100 ms |
| Throughput | 10,000+ docs/sec/node |
| Shard size | 10-50 GB recommended |
| Shards per index | 1-5 (avoid over-sharding) |
When to Use Elasticsearch
Good For:
- Full-text search
- Log/event analytics
- Product search
- Autocomplete
- Geo-spatial search
- Metrics and monitoring
- Security analytics (SIEM)
Not Good For:
- Primary database
- Transactional data
- Strong consistency needs
- Frequent updates to same document
- Complex relational queries
Elasticsearch vs Alternatives
| Feature | Elasticsearch | Solr | OpenSearch | Algolia |
|---|---|---|---|---|
| Full-text search | Excellent | Excellent | Excellent | Excellent |
| Analytics | Excellent | Good | Excellent | Limited |
| Ease of use | Good | Moderate | Good | Excellent |
| Managed options | Yes | Limited | Yes | Yes (SaaS) |
| License | Elastic License 2.0 / SSPL / AGPLv3 | Apache 2.0 | Apache 2.0 | Proprietary |
| Real-time | Yes | Yes | Yes | Yes |
Best Practices
- Right-size shards - aim for roughly 10-50 GB per shard
- Don't over-shard - more shards ≠ better performance; every shard adds overhead
- Use aliases - for zero-downtime reindexing
- Use the bulk API for indexing - avoid single-document requests at scale
- Use filters - for non-scoring clauses (results can be cached)
- Denormalize data - there are no relational joins, so embed related data
- Set explicit mappings - don't rely on dynamic mapping in production
- Use index templates - for consistent settings across indices
- Separate hot/warm/cold tiers - manage them with index lifecycle management
- Monitor cluster health - Yellow = some replica shards unassigned, Red = at least one primary shard unassigned
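Two of the rules above reduce to simple arithmetic. The sketch below estimates a primary shard count from an expected index size and maps unassigned shard counts to a health colour. The 30 GB default target is an assumed midpoint of the 10-50 GB range, and both helpers are hypothetical; real sizing also depends on query patterns and hardware.

```python
import math


def suggest_primary_shards(expected_index_size_gb: float, target_shard_size_gb: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard near the target size."""
    if expected_index_size_gb <= 0 or target_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(expected_index_size_gb / target_shard_size_gb))


def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map unassigned shard counts to the green/yellow/red health colours."""
    if unassigned_primaries > 0:
        return "red"  # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"  # all data available, but redundancy is reduced
    return "green"


shards_for_120_gb = suggest_primary_shards(120)
```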
Common API Endpoints
# Cluster
GET /_cluster/health
GET /_cluster/stats
GET /_cat/nodes?v
GET /_cat/indices?v
GET /_cat/shards?v
# Index
PUT /my_index
DELETE /my_index
GET /my_index/_mapping
GET /my_index/_settings
# Document
PUT /my_index/_doc/1 { ... }
GET /my_index/_doc/1
DELETE /my_index/_doc/1
POST /my_index/_update/1 { "doc": { ... } }
# Search
GET /my_index/_search { "query": { ... } }
POST /my_index/_search { "query": { ... } }
# Bulk
POST /_bulk { ... }
# Analyze
GET /_analyze { "analyzer": "standard", "text": "Hello World" }
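The endpoints above are plain HTTP requests, so any HTTP client can call them. Below is a minimal sketch that builds such requests with Python's standard library. The base URL `http://localhost:9200` and the helper `es_request` are assumptions; a cluster with security enabled would additionally need HTTPS and credentials, which are not shown.

```python
import json
import urllib.request
from typing import Optional


def es_request(
    method: str,
    path: str,
    body: Optional[dict] = None,
    base_url: str = "http://localhost:9200",  # assumed local, unsecured node
) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for an Elasticsearch endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


health_request = es_request("GET", "/_cluster/health")
search_request = es_request("POST", "/my_index/_search", {"query": {"match_all": {}}})

# Sending requires a reachable cluster, for example:
#   with urllib.request.urlopen(health_request) as response:
#       print(json.load(response)["status"])
```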