Elasticsearch Cheat Sheet Cheatsheet

Core Concepts & API Basics

Key Concepts

Index	A collection of documents with similar characteristics. Think of it as a database.
Document	A JSON document containing fields and their values. It’s the basic unit of information.
Field	A key-value pair within a document. The key is the field name and the value is the data.
Mapping	Defines how a document and its fields are stored and indexed. Like a schema.
Shard	Indexes are divided into shards. Each shard is a fully-functional and independent “index” that can be hosted on any node in an Elasticsearch cluster.
Replica	A copy of a shard. Replicas provide redundancy and increase search capacity.

Basic API Endpoints

PUT /<index_name> - Create an index.

GET /<index_name> - Retrieve index information.

DELETE /<index_name> - Delete an index.

POST /<index_name>/_doc - Index a document. Elasticsearch will assign an ID.

PUT /<index_name>/_doc/<_id> - Index or update a document with a specific ID.

GET /<index_name>/_doc/<_id> - Retrieve a document by ID.

POST /<index_name>/_search - Search documents within an index.

Common HTTP Methods

GET	Retrieve information.
POST	Create a new resource or perform an action (e.g., search).
PUT	Create or update a resource at a specific ID. Replaces the entire document.
DELETE	Delete a resource.

Query DSL (Domain Specific Language)

Basic Query Structure

The Query DSL is based on JSON. The basic structure is:

{
  "query": {
    "<query_type>": {
      "<field_name>": {
        "<parameter>": "<value>"
      }
    }
  }
}

Match Query

`match`	Analyzes the query and constructs a boolean query. Good for full-text search. `{ "query": { "match": { "title": "quick brown fox" } } }`
`match_phrase`	Matches exact phrases. The terms must be in the specified order. `{ "query": { "match_phrase": { "message": "this is a test" } } }`
`match_all`	Matches all documents. Useful for retrieving all documents in an index. `{ "query": { "match_all": {} } }`

Term Query

`term`	Finds documents that contain the exact term specified. Not analyzed. `{ "query": { "term": { "user.id": "kimchy" } } }`
`terms`	Finds documents that contain one or more of the exact terms specified. `{ "query": { "terms": { "user.id": ["kimchy", "jordan"] } } }`

Boolean Query

`bool`	A query that matches documents matching boolean combinations of other queries. Uses `must`, `should`, `must_not`, and `filter` clauses. `{ "query": { "bool": { "must": [ { "match": { "title": "brown" } } ], "filter": [ { "term": { "tags": "search" } } ], "must_not": [ { "range": { "date": { "gte": "2024-01-01" } } } ], "should": [ { "term": { "license": "pro" } } ], "minimum_should_match": 1 } } }`
`must`	The clause (query) must appear in matching documents and will contribute to the score.
`should`	The clause (query) should appear in the matching document. If the `bool` query contains no `must` or `filter` clauses, then at least one `should` clause must match. Contributes to the score.
`must_not`	The clause (query) must not appear in the matching documents. Is executed in filter context meaning that scoring is ignored and the clause is considered for caching.
`filter`	The clause (query) must appear in matching documents. However unlike `must` the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and the clause is considered for caching.

Aggregations

Aggregation Basics

Aggregations allow you to compute statistics and analytics over your data. They are similar to SQL GROUP BY.

{
  "aggs": {
    "<aggregation_name>": {
      "<aggregation_type>": {
        "field": "<field_name>"
      }
    }
  }
}

You can nest aggregations.

Bucket Aggregations

terms

Creates buckets based on unique terms in a field.

{
  "aggs": {
    "popular_tags": {
      "terms": {
        "field": "tags.keyword",
        "size": 10
      }
    }
  }
}

date_histogram

Creates buckets based on date intervals.

{
  "aggs": {
    "articles_per_month": {
      "date_histogram": {
        "field": "publish_date",
        "calendar_interval": "month",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

range

Creates buckets based on numeric or date ranges.

{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50 },
          { "from": 50, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}

Metric Aggregations

`avg`	Calculates the average of a numeric field. `{ "aggs": { "avg_price": { "avg": { "field": "price" } } } }`
`sum`	Calculates the sum of a numeric field. `{ "aggs": { "total_sales": { "sum": { "field": "sales" } } } }`
`min`	Calculates the minimum value of a numeric field. `{ "aggs": { "min_price": { "min": { "field": "price" } } } }`
`max`	Calculates the maximum value of a numeric field. `{ "aggs": { "max_price": { "max": { "field": "price" } } } }`
`cardinality`	Calculates the approximate number of unique values in a field. Useful for counting distinct users. `{ "aggs": { "distinct_users": { "cardinality": { "field": "user_id" } } } }`

Mappings & Settings

Mapping Types

`text`	Used for full-text search. Analyzed into individual terms.
`keyword`	Used for exact-value matching, filtering, and sorting. Not analyzed.
`date`	Stores dates. Can be formatted. `"format": "yyyy-MM-dd HH:mm:ss\|\|yyyy-MM-dd\|\|epoch_millis"`
`integer`, `long`, `float`, `double`	Numeric types.
`boolean`	Stores boolean values (true/false).
`object`	Used for nested JSON objects.
`nested`	Used for arrays of JSON objects. Allows querying each object in the array independently.

Explicit Mapping

You can define the mapping explicitly when creating an index.

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "publish_date": { "type": "date", "format": "yyyy-MM-dd" },
      "author_id": { "type": "keyword" }
    }
  }
}

If no mapping is defined, Elasticsearch will attempt to infer the mapping dynamically (Dynamic Mapping).

Index Settings

`number_of_shards`	The number of primary shards an index should have. Defaults to 1 in newer versions. Can only be set at index creation.
`number_of_replicas`	The number of replica shards each primary shard should have. Defaults to 1. Can be changed dynamically after index creation. `PUT /my_index/_settings { "number_of_replicas": 2 }`
`analysis`	Configures analyzers, tokenizers, token filters, and character filters for text analysis. Allows for customizing how text is indexed and searched.