Introduction to the Elasticsearch DSL

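The real strength of Elasticsearch is not just that it can store and search data, but that it offers a very flexible query language: the Query DSL.

Many people find Elasticsearch quite different from SQL when they first meet it. SQL is a query language oriented toward tables, whereas the Elasticsearch DSL is closer to a retrieval description language oriented toward documents and inverted indexes.

This article aims to answer a few of the most common questions:

What the Elasticsearch DSL is
How to use match, term, and bool
How querying differs from filtering
How to write sorting, pagination, and aggregations
Which pitfalls come up most often in practice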

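I. What Is a DSL

DSL stands for Domain Specific Language.

In Elasticsearch, a DSL request is essentially a piece of JSON that describes:

Which documents to look for
What the conditions are
How to sort
How to paginate
Whether to compute aggregations

For example, the simplest possible query: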

GET /product/_search
{
  "query": {
    "match": {
      "title": "iphone"
    }
  }
}

This DSL means:

Search in the product index
Find documents whose title field is relevant to iphone

So the DSL can be thought of as Elasticsearch's query description language.
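
Because a DSL request is just JSON, client code usually builds it as an ordinary data structure and serializes it. The following is a minimal sketch using only the Python standard library; the helper name `match_query` is hypothetical and not part of any Elasticsearch client.

```python
import json


def match_query(field: str, text: str) -> dict:
    """Build the body of a simple `match` search request as a plain dict."""
    return {"query": {"match": {field: text}}}


body = match_query("title", "iphone")

# Serialize to the JSON text that would be sent to GET /product/_search.
print(json.dumps(body, indent=2))
```

Keeping the request as a dict until the last moment makes it easy to add sort, pagination, or aggregations as further keys.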

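II. What a Complete Search Request Usually Contains

A typical Elasticsearch search request is usually made up of the following parts:

query: the search conditions
sort: the sort order
from / size: pagination
_source: which fields to return
aggs: aggregations

For example: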

GET /product/_search
{
  "_source": ["title", "price", "brand"],
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "iphone" } }
      ],
      "filter": [
        { "term": { "brand.keyword": "Apple" } },
        { "range": { "price": { "gte": 5000, "lte": 10000 } } }
      ]
    }
  },
  "sort": [
    { "price": "asc" }
  ],
  "from": 0,
  "size": 10
}

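This request is already close to what a real application would send:

Keyword search for iphone
Only the Apple brand
Only products priced between 5000 and 10000
Sorted by price, ascending
Return the first 10 results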
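
In application code, a request like this is typically assembled from user input. The sketch below shows one way to do that; the function name `build_product_search` and its parameters are assumptions made for illustration, and the field names simply mirror the example request.

```python
def build_product_search(keyword, brand=None, min_price=None, max_price=None,
                         page=1, page_size=10):
    """Assemble a search body: full-text keyword plus optional exact filters."""
    filters = []
    if brand is not None:
        filters.append({"term": {"brand.keyword": brand}})

    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})

    bool_query = {"must": [{"match": {"title": keyword}}]}
    if filters:
        # Only emit the filter clause when there is something to filter on.
        bool_query["filter"] = filters

    return {
        "_source": ["title", "price", "brand"],
        "query": {"bool": bool_query},
        "sort": [{"price": "asc"}],
        "from": (page - 1) * page_size,  # page numbers are 1-based here
        "size": page_size,
    }


body = build_product_search("iphone", brand="Apple",
                            min_price=5000, max_price=10000)
```

Called with these arguments, the function produces the same body as the request above; omitting the optional arguments leaves out the corresponding filters.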

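III. The Difference Between Query and Filter

This is the part of the Elasticsearch DSL most worth understanding first.

Both appear to "select documents", but their semantics are quite different.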

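1. Query

A query is about relevance.

It normally computes _score, the relevance score, which reflects things like:

Whether the keywords matched
How many terms matched
How frequent the terms are
How close the document is to the intent of the query

So queries suit full-text search scenarios, such as:

Searching article titles
Searching product names
Searching review content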

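2. Filter

A filter is about yes-or-no conditions.

It only cares whether a document satisfies the condition or not.

It does not contribute to relevance scoring, is usually more efficient, and in many cases its results can be cached.

So filters are better suited to structured conditions, such as:

Whether the brand equals Apple
Whether the price is greater than 5000
Whether the status is "on sale"
Whether a timestamp falls within some interval

In one sentence:

If relevance scoring is needed, use a query
If it is only exact selection, use a filter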
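
The distinction can be illustrated with a toy in-memory search. This is only a sketch of the semantics: the scoring function simply counts matching words and is not Elasticsearch's real relevance algorithm, and the sample documents are made up.

```python
docs = [
    {"title": "iphone 15 pro max", "brand": "Apple", "price": 9999},
    {"title": "iphone case", "brand": "Acme", "price": 99},
    {"title": "iphone 15", "brand": "Apple", "price": 5999},
    {"title": "galaxy phone", "brand": "Samsung", "price": 6999},
]


def passes_filter(doc):
    # Filter context: a plain yes/no test, no score involved.
    return doc["brand"] == "Apple" and doc["price"] >= 5000


def toy_score(doc, query):
    # Query context: a graded answer. Here: how many query words the title has.
    title_words = doc["title"].split()
    return sum(1 for word in query.split() if word in title_words)


def search(query):
    # Filters narrow the candidate set; the query ranks what is left.
    hits = [(toy_score(d, query), d) for d in docs if passes_filter(d)]
    hits = [(score, d) for score, d in hits if score > 0]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


for score, doc in search("iphone pro"):
    print(score, doc["title"])
```

The filter decides which documents are eligible at all, while the query decides the order in which the eligible ones are returned.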

IV. The Most Common Query Types

1. `match`

match is the most common full-text query.

GET /article/_search
{
  "query": {
    "match": {
      "title": "deep learning recommender system"
    }
  }
}

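It analyzes (tokenizes) the query text and then searches the inverted index with the resulting terms. By default, a document matches if it contains any one of those terms.

Suitable field type:

text

Not a good fit:

keyword
Numeric fields
Date fields

On these field types match brings no analysis benefit; term or range states the intent more clearly.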

2. `term`

term matches an exact value; the query value is not analyzed.

GET /product/_search
{
  "query": {
    "term": {
      "brand.keyword": "Apple"
    }
  }
}

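Good for:

IDs
Enum values
Status fields
keyword fields

This is one of the most common pitfalls:

Full-text search on a text field should not use term
Filtering on an exact value usually should not use match

The reason is that a text field is analyzed at index time (the standard analyzer, for example, splits the text into words and lowercases them), so a term query for the original string can match nothing.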
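
A toy inverted index makes this pitfall concrete. In the sketch below, `analyze` is a rough stand-in for an analyzer (lowercase, split on non-alphanumerics) and not the real standard analyzer; the documents and helper names are made up for illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    # Rough stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


docs = {1: "Apple iPhone 15 Pro", 2: "Apple Watch", 3: "Samsung Galaxy"}

# Index for a text field: analyzed token -> set of document ids.
text_index = defaultdict(set)
# Index for a keyword field: the whole value, unchanged -> set of document ids.
keyword_index = defaultdict(set)
for doc_id, title in docs.items():
    keyword_index[title].add(doc_id)
    for token in analyze(title):
        text_index[token].add(doc_id)


def term_lookup(index, value):
    # term: look the value up exactly as given, with no analysis.
    return set(index.get(value, set()))


def match_lookup(index, text):
    # match: analyze the query text first, then OR the per-token results.
    result = set()
    for token in analyze(text):
        result |= index.get(token, set())
    return result


print(term_lookup(text_index, "Apple"))           # the index holds "apple"
print(match_lookup(text_index, "Apple"))
print(term_lookup(keyword_index, "Apple Watch"))
```

The term lookup on the text index comes back empty because only the lowercased token was indexed, while the same lookup against the keyword index succeeds on the untouched value.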

3. `terms`

terms is the equivalent of IN in SQL.

GET /product/_search
{
  "query": {
    "terms": {
      "category_id": [1, 2, 3]
    }
  }
}

4. `range`

range is used for range queries.

GET /product/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 5000,
        "lte": 10000
      }
    }
  }
}

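Common operators:

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to

5. `match_phrase`

match_phrase matches a phrase: the analyzed terms must appear in the same order and, by default, next to each other.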

GET /article/_search
{
  "query": {
    "match_phrase": {
      "title": "recommender system architecture"
    }
  }
}

It suits scenarios that need a contiguous phrase hit.
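
The difference between matching any term and matching a phrase can be sketched with token positions. This is a simplified illustration that tokenizes on whitespace only; the function names are hypothetical.

```python
def tokenize(text):
    # Simplified analysis: lowercase and split on whitespace.
    return text.lower().split()


def any_term_match(doc_text, query):
    # Like a default match: one shared token is enough.
    doc_tokens = set(tokenize(doc_text))
    return any(token in doc_tokens for token in tokenize(query))


def phrase_match(doc_text, phrase):
    # Like match_phrase: tokens must be adjacent and in the same order.
    doc_tokens = tokenize(doc_text)
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


title_a = "recommender system architecture in practice"
title_b = "architecture of a recommender system"

print(phrase_match(title_a, "system architecture"))
print(phrase_match(title_b, "system architecture"))
print(any_term_match(title_b, "system architecture"))
```

Both titles contain the two words, but only the first has them side by side in the requested order.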

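V. The `bool` Query

In real applications there is rarely just one condition. The most common way to combine conditions in Elasticsearch is bool.

bool has four clause types:

must
should
filter
must_not

1. `must`

The clause must match, and it contributes to the score.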

"must": [
  { "match": { "title": "iphone" } }
]

2. `filter`

The clause must match, but it does not contribute to the score.

"filter": [
  { "term": { "brand.keyword": "Apple" } }
]

3. `should`

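Nice to have: a matching should clause acts as a bonus that raises the score. Note that when a bool query has no must or filter clause, at least one should clause has to match.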

"should": [
  { "match": { "title": "pro" } },
  { "match": { "title": "max" } }
]

4. `must_not`

The clause must not match. It runs in filter context, so it does not affect the score.

"must_not": [
  { "term": { "status.keyword": "offline" } }
]

5. A complete example

GET /product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "iphone" } }
      ],
      "filter": [
        { "term": { "brand.keyword": "Apple" } },
        { "range": { "price": { "gte": 5000 } } }
      ],
      "must_not": [
        { "term": { "status.keyword": "offline" } }
      ],
      "should": [
        { "match": { "title": "pro" } }
      ]
    }
  }
}
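
How the four clauses interact can be sketched as a small evaluator over in-memory documents. This is a toy model under stated assumptions: each clause is a Python predicate, every matching must or should clause is worth one point (real scores are computed differently), and the bool query is assumed to have at least one must or filter clause. All names are hypothetical.

```python
def evaluate_bool(doc, must=(), filter_=(), should=(), must_not=()):
    """Return a toy score for `doc`, or None if the document is excluded."""
    if not all(clause(doc) for clause in must):
        return None                      # must: required
    if not all(clause(doc) for clause in filter_):
        return None                      # filter: required, never scored
    if any(clause(doc) for clause in must_not):
        return None                      # must_not: excludes, never scored
    # Only must and should clauses add to the score.
    return len(must) + sum(1 for clause in should if clause(doc))


def title_has(word):
    return lambda doc: word in doc["title"].split()


clauses = dict(
    must=[title_has("iphone")],
    filter_=[lambda d: d["brand"] == "Apple", lambda d: d["price"] >= 5000],
    must_not=[lambda d: d["status"] == "offline"],
    should=[title_has("pro")],
)

products = [
    {"title": "iphone 15 pro", "brand": "Apple", "price": 8999, "status": "online"},
    {"title": "iphone 15", "brand": "Apple", "price": 5999, "status": "online"},
    {"title": "iphone 14 pro", "brand": "Apple", "price": 6999, "status": "offline"},
    {"title": "iphone se", "brand": "Apple", "price": 3499, "status": "online"},
]

scores = [evaluate_bool(p, **clauses) for p in products]
print(scores)
```

The first product gets the should bonus, the second matches without it, the third is removed by must_not, and the fourth fails the price filter.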

VI. Sorting and Pagination

1. Sorting

GET /product/_search
{
  "query": {
    "match": {
      "title": "iphone"
    }
  },
  "sort": [
    { "_score": "desc" },
    { "price": "asc" }
  ]
}

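Common sort keys:

_score
Numeric fields
Date fields
keyword fields

Sorting directly on a text field fails by default because fielddata is disabled for text fields; sort on the corresponding .keyword sub-field instead.

2. Pagination

The most common way to paginate is: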

GET /product/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {}
  }
}

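This means:

Start from hit 0
Return 10 hits

Note, however, that performance degrades noticeably when from + size becomes large (deep pagination). By default, requests with from + size above 10,000 are rejected (the index.max_result_window setting).

For deep pagination, the usual recommendations are:

search_after
scroll

Of these:

scroll is better suited to offline export
search_after is better suited to online paging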
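
With search_after, each page is requested by passing the sort values of the last hit from the previous page. The sketch below only builds request bodies and does not call Elasticsearch; `next_page_body` is a hypothetical helper, and `product_id` is an assumed unique field used as a tiebreaker so that the sort order is unambiguous.

```python
import copy


def next_page_body(base_body, last_hit):
    """Return the request body for the page that follows `last_hit`."""
    if not base_body.get("sort"):
        raise ValueError("search_after needs an explicit sort")
    body = copy.deepcopy(base_body)      # leave the caller's body untouched
    body.pop("from", None)               # offsets are replaced by search_after
    body["search_after"] = last_hit["sort"]
    return body


first_page = {
    "size": 10,
    "query": {"match_all": {}},
    # Assumed tiebreaker: product_id is unique per document.
    "sort": [{"price": "asc"}, {"product_id": "asc"}],
}

# Hand-written example of the last hit on page one; when a request specifies
# a sort, each hit in the response carries its sort values in "sort".
last_hit = {"_id": "42", "_source": {"title": "iphone 15"},
            "sort": [5999, "p-0042"]}

second_page = next_page_body(first_page, last_hit)
print(second_page["search_after"])
```

Because each request starts from a known position in the sort order, the cost of a page does not grow with how far the user has already paged.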

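VII. Aggregations with `aggs`

Elasticsearch is not only a search engine; it is also good at statistical analysis. This capability is provided mainly through aggs.

1. `terms` aggregation

Count how many documents each brand has: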

GET /product/_search
{
  "size": 0,
  "aggs": {
    "brand_count": {
      "terms": {
        "field": "brand.keyword"
      }
    }
  }
}

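Here the top-level size: 0 means:

Do not return individual documents
Return only the aggregation results

The terms aggregation has its own size parameter and returns the 10 largest buckets by default.

2. `avg` aggregation

Compute the average price: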

GET /product/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

3. Grouping, then computing a metric

Group by brand first, then compute the average price for each brand:

GET /product/_search
{
  "size": 0,
  "aggs": {
    "brand_group": {
      "terms": {
        "field": "brand.keyword"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

This pattern is very common in reporting and analytics.
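
On the client side, the nested result is usually flattened into rows for a report. The sketch below assumes the standard response layout for a terms aggregation with a nested avg (a `buckets` list whose entries carry `key`, `doc_count`, and the sub-aggregation under its own name); the sample response is hand-written and trimmed, not real output.

```python
def brand_avg_prices(response):
    """Flatten the brand_group / avg_price aggregation into report rows."""
    buckets = response["aggregations"]["brand_group"]["buckets"]
    return [
        {
            "brand": bucket["key"],
            "count": bucket["doc_count"],
            "avg_price": bucket["avg_price"]["value"],
        }
        for bucket in buckets
    ]


# Hand-written sample shaped like a search response, with made-up numbers.
sample_response = {
    "aggregations": {
        "brand_group": {
            "buckets": [
                {"key": "Apple", "doc_count": 3, "avg_price": {"value": 7999.0}},
                {"key": "Samsung", "doc_count": 2, "avg_price": {"value": 6499.5}},
            ]
        }
    }
}

for row in brand_avg_prices(sample_response):
    print(row["brand"], row["count"], row["avg_price"])
```

The aggregation names used in the request (brand_group and avg_price) reappear as keys in the response, which is why choosing descriptive names pays off.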

VIII. A Few Common Business Scenarios

1. Full-text search + filtering

GET /article/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Elasticsearch" } }
      ],
      "filter": [
        { "term": { "status.keyword": "published" } },
        { "range": { "publish_time": { "gte": "2026-01-01" } } }
      ]
    }
  }
}

2. Exact filtering

GET /order/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "user_id": 1001 } },
        { "term": { "status.keyword": "paid" } }
      ]
    }
  }
}

3. Search + aggregation

GET /product/_search
{
  "query": {
    "match": {
      "title": "phone"
    }
  },
  "aggs": {
    "brand_group": {
      "terms": {
        "field": "brand.keyword"
      }
    }
  }
}

This pattern is especially common for building the filter panel (facets) on a search page.

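IX. The Most Common Pitfalls

1. Mixing up `match` and `term`

This is the most common problem.

To search a text field, use match in most cases
To look up IDs, statuses, and enum values, use term in most cases

2. Aggregating or sorting on a `text` field

Many people find that:

Sorting raises an error
Aggregation results are not what they expected

The cause is usually that the field is of type text, which is analyzed into tokens.

In that case, use the keyword sub-field instead:

field.keyword

The .keyword sub-field exists when the field is mapped with a keyword multi-field, which the default dynamic mapping creates for strings.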
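
Whether a .keyword sub-field is available depends on the index mapping. The sketch below shows a mapping fragment in the shape that dynamic mapping produces for strings, plus a hypothetical helper, `exact_value_field`, that picks the field name to use for sorting or aggregating. The field names are assumptions for illustration.

```python
mapping = {
    "properties": {
        # text for full-text search, with a keyword sub-field for exact use
        "brand": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        },
        # text only: no exact-value sub-field available
        "description": {"type": "text"},
        "price": {"type": "long"},
    }
}


def exact_value_field(mapping: dict, field: str) -> str:
    """Return the field name suitable for sort / aggs / term on `field`."""
    spec = mapping["properties"][field]
    if spec["type"] != "text":
        return field                     # already an exact-value type
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    raise ValueError(f"text field {field!r} has no keyword sub-field")


print(exact_value_field(mapping, "brand"))
print(exact_value_field(mapping, "price"))
```

A field mapped only as text, like description here, has no exact-value form, so sorting or aggregating on it would require changing the mapping first.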

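3. Poor performance with deep pagination

The larger from gets, the slower Elasticsearch becomes, because every shard has to collect from + size hits before the results can be merged.

So not every kind of pagination should keep using from + size.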
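
A little arithmetic shows why the cost grows with depth. The sketch below is a rough model, assuming every shard returns its own top from + size candidates for the coordinating node to merge; the function name and the shard count are illustrative.

```python
def hits_to_merge(from_: int, size: int, num_shards: int) -> int:
    """Rough upper bound on candidates merged to serve a single page."""
    per_shard = from_ + size             # each shard returns its own top hits
    return per_shard * num_shards


for from_ in (0, 1_000, 9_990):
    print(from_, hits_to_merge(from_, size=10, num_shards=5))
```

In this model, a page of 10 results at the very end of the default 10,000-hit window requires merging a thousand times more candidates than the first page does.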

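4. Putting filter conditions in `must`

If a condition is only there to select documents and not to influence the relevance score, it belongs in filter.

X. A Comprehensive Example

Here is a fairly complete search DSL request: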

GET /product/_search
{
  "_source": ["title", "brand", "price", "category"],
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "iphone" } }
      ],
      "filter": [
        { "term": { "brand.keyword": "Apple" } },
        { "range": { "price": { "gte": 5000, "lte": 10000 } } }
      ],
      "must_not": [
        { "term": { "status.keyword": "offline" } }
      ],
      "should": [
        { "match": { "title": "pro" } }
      ]
    }
  },
  "sort": [
    { "_score": "desc" },
    { "price": "asc" }
  ],
  "aggs": {
    "category_group": {
      "terms": {
        "field": "category.keyword"
      }
    }
  }
}

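In plain language:

Search for products whose title is relevant to iphone
The brand must be Apple
The price is between 5000 and 10000
Exclude products that are offline
Results that also match pro rank higher
Sort by relevance, then by price
Also compute the distribution across categories

XI. Summary

The most important thing about the Elasticsearch DSL is not memorizing syntax but building the right habits:

For full-text search, reach for match first
For exact filtering, reach for term / range / filter first
Use bool to organize complex logic
Use aggs for statistical analysis

If you think of the Elasticsearch DSL as "a JSON query syntax tree over an inverted index", many constructs that look complicated start to feel natural.

Once you have written enough queries, you will find that the ones you use most are just these:

match
term
bool
range
sort
aggs

Master these and you can cover the vast majority of everyday Elasticsearch query needs.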
