Working with Mappings and Analyzers
Mappings:
Mapping is the process of defining how a document should be stored and indexed in Elasticsearch, i.e. how its data should be interpreted.
Each type in Elasticsearch has a mapping.
To retrieve the existing mapping for a given type, you can use the GET mapping API.
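For example, assuming an index named books containing a type Book (illustrative names), the mapping can be retrieved with:

```
GET /books/_mapping/Book
```

The response lists each field of the type together with the data type Elasticsearch assigned to it.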
Elasticsearch has the following data types:
Numeric Types => byte, short, integer & long
Floating-Point Types => float, double
Date
Boolean
Objects
String
There is also an implicit _all field, which is the concatenation of all the fields inside a document.
Different mappings for the same type can produce different search results.
Custom mappings are possible.
For Defining Custom Mapping For Type:
Note: Custom mappings can only be specified during index creation. Once created, a mapping cannot be modified, as there may already be data in the index belonging to that mapping. Kamal has recently written a chapter about re-indexing (worth a read).
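A sketch of defining a custom mapping at index-creation time; the index name, type, and field names here are illustrative:

```
PUT /books
{
  "mappings": {
    "Book": {
      "properties": {
        "name":        { "type": "string", "index": "not_analyzed" },
        "description": { "type": "string" },
        "published":   { "type": "date" }
      }
    }
  }
}
```

Marking a string field not_analyzed keeps it as a single exact value, while an analyzed string field is broken into tokens for full-text search.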
There are two types of searches in Elasticsearch:
Exact Value Match
Full-Text Search
1. Exact Value Match:
In this search, fields are searched for an exact value, e.g. finding all records where the book name == "John Doe".
2. Full-Text Search:
In this search, ES looks for partial matches based on the specified keywords, e.g. finding all books whose description contains the words "Hunger", "adventure" and "Fantasy".
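The two search styles map onto different query types: a term query performs an exact value match, while a match query performs a full-text search. A sketch, with illustrative index and field names:

```
# Exact value match on a not_analyzed field
GET /books/_search
{
  "query": { "term": { "name": "John Doe" } }
}

# Full-text search over an analyzed field
GET /books/_search
{
  "query": { "match": { "description": "Hunger adventure Fantasy" } }
}
```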
Analysis:
To utilize these searches to their fullest, analysis needs to be performed. Analysis can be summarized as specifying how to handle:
Abbreviations
Stemming
Typo Handling
We will be looking at each of them now.
1. Abbreviations:
Using analyzers, we can tell Elasticsearch how to treat abbreviations in our data, e.g. dr = Doctor. So whenever we search for the keyword doctor in our index, Elasticsearch will also return results that have dr mentioned in them.
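Abbreviation handling like this is typically implemented with a synonym token filter inside a custom analyzer; a sketch (index, filter, and analyzer names are illustrative):

```
PUT /medical_records
{
  "settings": {
    "analysis": {
      "filter": {
        "abbrev_synonyms": {
          "type": "synonym",
          "synonyms": [ "dr, doctor" ]
        }
      },
      "analyzer": {
        "abbrev_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "abbrev_synonyms" ]
        }
      }
    }
  }
}
```

With this analyzer applied to a field, documents containing dr and documents containing doctor produce the same tokens, so either search term matches both.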
2. Stemming:
Using stemming in analyzers allows us to match the base form of modified words, e.g. treating learning and learned as their stem learn.
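Stemming can be enabled by adding a stemmer token filter to a custom analyzer; a sketch with illustrative names:

```
PUT /books_stemmed
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_stem_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "stemmer" ]
        }
      }
    }
  }
}
```

Fields indexed with this analyzer store the stemmed tokens, so a query for one inflected form can match the others.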
3. Typo Handling:
Typo handling is also available: if we query for a particular word, say resurrection, Elasticsearch can also return results in which typos are present, i.e. it will treat variants like resurection and ressurection as the same word and return them in the results.
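Typo tolerance of this kind is usually achieved at query time with fuzzy matching rather than in the analyzer itself; a sketch (index and field names are illustrative):

```
GET /books/_search
{
  "query": {
    "match": {
      "description": {
        "query": "resurection",
        "fuzziness": "AUTO"
      }
    }
  }
}
```

Here fuzziness allows a small number of character edits, so the misspelled query still matches documents containing resurrection.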
Analysis Phases:
Tidy Up The Body
Analysis starts by removing irrelevant content through character filters, e.g. stripping HTML tags.
Tokenize the Body
Tokenizers:
Tokenizers split the text, e.g. on whitespace, to generate individual tokens.
Normalization of Tokens
After token generation, we need to normalize these tokens, e.g. lowercase all tokens.
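These phases can be observed with the _analyze API, which reports the tokens an analyzer produces for a piece of text (in older versions the analyzer and text are passed as query-string parameters instead of a JSON body):

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "The QUICK Brown Foxes"
}
```

The response lists each emitted token along with its position and character offsets, which makes it easy to see the effect of tokenization and normalization.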
Analyzers in Elasticsearch:
There are a few built-in analyzers in Elasticsearch:
Standard
Simple
Whitespace
Stop
Keyword
Pattern
Language
Snowball
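The differences between these analyzers are easiest to see by running the same text through _analyze with each one; for example:

```
GET /_analyze
{ "analyzer": "whitespace", "text": "The Hunger-Games" }
# whitespace only splits on spaces: "The", "Hunger-Games"

GET /_analyze
{ "analyzer": "standard", "text": "The Hunger-Games" }
# standard also splits on punctuation and lowercases: "the", "hunger", "games"
```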
We can also configure our own custom analyzers; we will be configuring a path analyzer in this tutorial.
In the following query, I create an index named 'elastic_course' with a type 'Book', where analyzers and custom mappings are defined.
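A sketch of such a request, assuming the path_hierarchy tokenizer for the path analyzer (the field names are illustrative, since the original query is not shown here):

```
PUT /elastic_course
{
  "settings": {
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "tokenizer": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "Book": {
      "properties": {
        "name":     { "type": "string", "index": "not_analyzed" },
        "category": { "type": "string", "analyzer": "path_analyzer" }
      }
    }
  }
}
```

The path_hierarchy tokenizer splits a value like /fiction/fantasy into the tokens /fiction and /fiction/fantasy, so a search on any parent path also matches its children.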