NGram with Elasticsearch
Set up a sandbox
Note: Even without the video, you can learn all the crucial details from the steps that are documented below.
Log in to your cloud-box over `ssh`.

Create a directory for running an elasticsearch sandbox:

```bash
mkdir -p ~/dev/elasticsearch-sandbox
```

Step into the working directory:

```bash
cd ~/dev/elasticsearch-sandbox
```

Create a `docker-compose.yml` file to install and run elasticsearch:
```yaml
## Version Selection for compose file
# https://docs.docker.com/compose/compose-file/#/versioning
version: '2'
services:
  es_v2:
    image: elasticsearch:2
    ports:
      - "9202:9200"
    volumes:
      - ./docker-entrypoint-es2-plugins.sh:/apps/docker-entrypoint-es2-plugins.sh
    entrypoint: /apps/docker-entrypoint-es2-plugins.sh
```
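YAML is indentation-sensitive, so it can help to validate the file before going further. As a quick check (run from the sandbox directory created above), `docker-compose config` parses the compose file and prints the resolved configuration, reporting any syntax errors instead:

```bash
# Parse the compose file and print the resolved configuration;
# a malformed file produces an error message instead
docker-compose config
```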
Create an entrypoint file named `docker-entrypoint-es2-plugins.sh` to install useful plugins:

```bash
#!/bin/bash
# setting up prerequisites

# re-runs will give an error that is harmless:
# > ERROR: plugin directory /usr/share/elasticsearch/plugins/delete-by-query already exists.
# > To update the plugin, uninstall it first using 'remove delete-by-query' command
#plugin install delete-by-query

# https://github.com/mobz/elasticsearch-head/#running-as-a-plugin-of-elasticsearch
plugin install mobz/elasticsearch-head

# access it at /_plugin/elasticsearch-inquisitor/
plugin install polyfractal/elasticsearch-inquisitor

#exec /docker-entrypoint.sh elasticsearch
exec elasticsearch -Des.insecure.allow.root=true
```

Make sure to change the permissions so the `sh` files are executable:
```bash
chmod 744 *.sh
```

Start the service:
```bash
docker-compose up
```

Open a browser to view the two plugins running on ES, served under `/_plugin/head/` and `/_plugin/elasticsearch-inquisitor/`.
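Before (or instead of) opening the browser, you can confirm from the shell that Elasticsearch is reachable. This is a minimal check, assuming you run it on the same box and kept the `9202:9200` port mapping from the compose file above:

```bash
# Should print the cluster info JSON (with the "You Know, for Search" tagline)
curl http://localhost:9202/

# The two plugin UIs are then served at:
#   http://localhost:9202/_plugin/head/
#   http://localhost:9202/_plugin/elasticsearch-inquisitor/
```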
Play Around
Use the `Any Request` tab in `/_plugin/head/` to create an index with a custom analyzer for a 3 by 3 ngram:

```
PUT testing_ngram_3_by_3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_3_by_3": {
          "tokenizer": "ngram_3_by_3"
        }
      },
      "tokenizer": {
        "ngram_3_by_3": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}
```
Go to `/_plugin/elasticsearch-inquisitor/#/analyzers` to see the `ngram_3_by_3` analyzer at the bottom of the page. Click its checkbox and then use the topmost input field to see how the analyzer breaks down the input into tokens.
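The same tokenization can also be checked from the command line with Elasticsearch's `_analyze` API. This is a minimal sketch, assuming the index created above and the `9202` port mapping; the sample text `elastic` is only an illustration:

```bash
# Run the index's custom analyzer against a sample string
curl "http://localhost:9202/testing_ngram_3_by_3/_analyze?analyzer=ngram_3_by_3&text=elastic"

# With min_gram = max_gram = 3, the response lists the 3-character tokens:
#   ela, las, ast, sti, tic
```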
