NGram with Elasticsearch
Set up a sandbox
Note: Even without the video, you can learn all the crucial details from the steps documented below.
Log in to your cloud-box over ssh.
Create a directory for running an elasticsearch sandbox:
mkdir -p ~/dev/elasticsearch-sandbox
Step into the working directory:
cd ~/dev/elasticsearch-sandbox
Create a docker-compose.yml file to install and run elasticsearch:

    ## Version Selection for compose file
    # https://docs.docker.com/compose/compose-file/#/versioning
    version: '2'
    services:
      es_v2:
        image: elasticsearch:2
        ports:
          - "9202:9200"
        volumes:
          - ./docker-entrypoint-es2-plugins.sh:/apps/docker-entrypoint-es2-plugins.sh
        entrypoint: /apps/docker-entrypoint-es2-plugins.sh
Create an entrypoint file named docker-entrypoint-es2-plugins.sh to install useful plugins:

    #!/bin/bash

    # setting up prerequisites
    # re-runs will give an error that is harmless:
    # > ERROR: plugin directory /usr/share/elasticsearch/plugins/delete-by-query already exists.
    # > To update the plugin, uninstall it first using 'remove delete-by-query' command
    #plugin install delete-by-query

    # https://github.com/mobz/elasticsearch-head/#running-as-a-plugin-of-elasticsearch
    plugin install mobz/elasticsearch-head

    # access it at /_plugin/elasticsearch-inquisitor/
    plugin install polyfractal/elasticsearch-inquisitor

    #exec /docker-entrypoint.sh elasticsearch
    exec elasticsearch -Des.insecure.allow.root=true
Make the sh files executable:
chmod 744 *.sh
Start the service:
docker-compose up
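Because docker-compose up stays attached in the foreground, a quick check from a second shell can confirm that the node is answering before opening the plugins. The following is a minimal sketch: the function name wait_for_es is hypothetical, localhost is an assumption (substitute the address of your cloud-box), and 9202 is the host port published in the compose file.

```shell
#!/bin/bash
# Hypothetical helper: poll an Elasticsearch URL until it answers or the
# attempts run out. The default URL assumes the sandbox runs on localhost.
wait_for_es() {
  local url="${1:-http://localhost:9202}" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # -s silences progress output, -f turns HTTP errors into a non-zero exit
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "Elasticsearch is answering at $url"
      return 0
    fi
    sleep 1
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

Running wait_for_es with no arguments polls the default URL about once per second, up to 30 times.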
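Open a browser to view the two plugins running on ES (the compose file publishes it on host port 9202):
/_plugin/head/
/_plugin/elasticsearch-inquisitor/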
Play Around
Use the Any Request tab in /_plugin/head/ to create an index with a custom analyzer for a 3 by 3 ngram:

    PUT testing_ngram_3_by_3
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "ngram_3_by_3": {
              "tokenizer": "ngram_3_by_3"
            }
          },
          "tokenizer": {
            "ngram_3_by_3": {
              "type": "ngram",
              "min_gram": 3,
              "max_gram": 3,
              "token_chars": [ "letter", "digit" ]
            }
          }
        }
      }
    }
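If you prefer the command line to the head plugin, the same request can likely be sent with curl. This is a sketch under assumptions: ES_URL, NGRAM_SETTINGS and create_ngram_index are names made up for illustration, localhost is a guess at where the sandbox runs, and 9202 is the host port from the compose file.

```shell
#!/bin/bash
# Assumption: the sandbox is reachable on localhost; 9202 is the host port
# published in docker-compose.yml. Override ES_URL for a remote cloud-box.
ES_URL="${ES_URL:-http://localhost:9202}"

# Same index settings as the request above, kept in a variable for reuse.
NGRAM_SETTINGS='{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_3_by_3": { "tokenizer": "ngram_3_by_3" }
      },
      "tokenizer": {
        "ngram_3_by_3": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  }
}'

# Hypothetical helper: create the index by sending the settings with PUT.
create_ngram_index() {
  curl -s -XPUT "$ES_URL/testing_ngram_3_by_3" \
    -H 'Content-Type: application/json' \
    -d "$NGRAM_SETTINGS"
}
```

Call create_ngram_index once the container is up; the node should answer with a short JSON acknowledgement if the index was created.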
Go to /_plugin/elasticsearch-inquisitor/#/analyzers to see the ngram_3_by_3 analyzer at the bottom of the page. Click the checkbox, then use the topmost input field to see how the analyzer breaks the input down into tokens.
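To get a feel for what to expect in the token list, the behaviour of a 3 by 3 ngram with token_chars of letter and digit can be approximated locally. The sketch below is not the Lucene implementation: the function name ngrams_3_by_3 is hypothetical, and it relies on the shell's notion of alphanumeric characters, so treat it as an illustration for plain ASCII input only.

```shell
#!/bin/bash
# Rough approximation of an ngram tokenizer with min_gram = max_gram = 3
# and token_chars = [letter, digit]. Illustration only, not Lucene's code.
ngrams_3_by_3() {
  local input="$1" word i
  # Any character that is not a letter or digit acts as a separator.
  for word in ${input//[^[:alnum:]]/ }; do
    # Slide a 3-character window across each run of letters and digits;
    # runs shorter than 3 characters produce no tokens at all.
    for ((i = 0; i + 3 <= ${#word}; i++)); do
      printf '%s\n' "${word:i:3}"
    done
  done
}

ngrams_3_by_3 "quick fox-42"
# prints: qui, uic, ick, fox (one per line)
```

Note that "42" yields nothing because it is shorter than min_gram, and the hyphen splits the input instead of appearing inside a token.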