NGram with Elasticsearch

Set up a sandbox

Note: Even without the video, the steps documented below cover all the crucial details.

  1. Log in to your cloud-box over SSH

  2. Create a directory for running an Elasticsearch sandbox:

    mkdir -p ~/dev/elasticsearch-sandbox
  3. Step into the working directory:

    cd ~/dev/elasticsearch-sandbox
  4. Create a docker-compose.yml file to install and run Elasticsearch:

    ## Version Selection for compose file
    # https://docs.docker.com/compose/compose-file/#/versioning
    version: '2'
    services:
      es_v2:
        image: elasticsearch:2
        ports:
          - "9202:9200"
        volumes:
          - ./docker-entrypoint-es2-plugins.sh:/apps/docker-entrypoint-es2-plugins.sh
        entrypoint: /apps/docker-entrypoint-es2-plugins.sh
  5. Create an entrypoint file named docker-entrypoint-es2-plugins.sh to install useful plugins:

    #!/bin/bash
    # setting up prerequisites
    
    # re-runs will give an error that is harmless:
    #   > ERROR: plugin directory /usr/share/elasticsearch/plugins/delete-by-query already exists.
    #   > To update the plugin, uninstall it first using 'remove delete-by-query' command
    #plugin install delete-by-query
    
    # https://github.com/mobz/elasticsearch-head/#running-as-a-plugin-of-elasticsearch
    plugin install mobz/elasticsearch-head
    
    # access it at /_plugin/elasticsearch-inquisitor/
    plugin install polyfractal/elasticsearch-inquisitor
    
    #exec /docker-entrypoint.sh elasticsearch
    exec elasticsearch -Des.insecure.allow.root=true
  6. Make the shell scripts executable so that the container can run the entrypoint:

    chmod 744 *.sh
  7. Start the service:

    docker-compose up
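
Once the container is starting, it can help to confirm that the node actually answers before opening the plugins. The following is a minimal sketch, not part of the original setup: the function name wait_for_es and the ES_URL variable are made up for illustration, and the default URL assumes you run it on the cloud-box itself, where the compose file maps host port 9202 to the container.

```shell
# Sketch: poll the sandbox node until it answers over HTTP.
# ES_URL is an assumption: localhost plus the host port 9202 from docker-compose.yml.
ES_URL="${ES_URL:-http://localhost:9202}"

# wait_for_es [attempts] [delay_seconds]  (hypothetical helper)
# Returns 0 as soon as the root endpoint responds, 1 if it never does.
wait_for_es() {
  local attempts="${1:-30}" delay="${2:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP error codes, -sS keeps it quiet unless it errors
    if curl -fsS "$ES_URL/" > /dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

From a second terminal, something like `wait_for_es && curl -s "$ES_URL/"` should then print the node's banner JSON. If you are connecting from another machine, replace localhost with the address of the cloud-box.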

Play Around

  1. Use the Any Request tab in /_plugin/head/ to create an index with a custom analyzer that emits ngrams of exactly 3 characters (min_gram and max_gram are both 3):

    PUT testing_ngram_3_by_3
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "ngram_3_by_3": {
              "tokenizer": "ngram_3_by_3"
            }
          },
          "tokenizer": {
            "ngram_3_by_3": {
              "type": "ngram",
              "min_gram": 3,
              "max_gram": 3,
              "token_chars": [
                "letter",
                "digit"
              ]
            }
          }
        }
      }
    }
  2. Go to /_plugin/elasticsearch-inquisitor/#/analyzers to see the ngram_3_by_3 analyzer at the bottom of the page.

    • Click its checkbox, then type into the topmost input field to see how the analyzer breaks the input into tokens
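
To get a feel for what the inquisitor should show, here is a rough local approximation of the tokenizer settings above. It is only a sketch under stated assumptions: the function name ngram_3_by_3_sketch is made up, it handles ASCII letters and digits only (Elasticsearch's letter and digit classes are broader), and it does not call Elasticsearch at all. It splits the input on anything that is not a letter or digit and then prints every 3-character window of each remaining word.

```shell
# Sketch: approximate the ngram_3_by_3 tokenizer for ASCII input.
# ngram_3_by_3_sketch is a hypothetical helper, not an Elasticsearch command.
ngram_3_by_3_sketch() {
  # 1. Turn every character that is not a letter or digit into a space,
  #    mirroring token_chars: ["letter", "digit"].
  # 2. For each resulting word, print all substrings of length 3
  #    (min_gram = max_gram = 3). Words shorter than 3 produce nothing.
  printf '%s' "$1" | tr -c '[:alnum:]' ' ' | awk '{
    for (f = 1; f <= NF; f++)
      for (i = 1; i + 2 <= length($f); i++)
        print substr($f, i, 3)
  }'
}
```

For example, `ngram_3_by_3_sketch 'Quick fox 42'` prints Qui, uic, ick and fox, one per line. "42" yields no token because it is shorter than 3 characters, and case is preserved because the analyzer defines no lowercase filter. Treat the inquisitor output as the reference if the two ever disagree.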

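If you prefer the command line to the plugin pages, the two steps above can probably be reproduced with curl. This is a hedged sketch: the function names create_ngram_index and analyze_text and the ES_URL variable are illustrative, the default URL assumes the 9202 port mapping from docker-compose.yml, and the query-parameter form of _analyze is the one I understand Elasticsearch 2.x to accept, so check it against the documentation for your version.

```shell
# Sketch: the same two steps over plain HTTP instead of the plugin UIs.
# ES_URL is an assumption (localhost + host port 9202 from docker-compose.yml).
ES_URL="${ES_URL:-http://localhost:9202}"
INDEX="testing_ngram_3_by_3"

# Create the index with the ngram_3_by_3 analyzer (same body as in step 1).
create_ngram_index() {
  curl -sS -XPUT "$ES_URL/$INDEX" -d '{
    "settings": {
      "analysis": {
        "analyzer": {
          "ngram_3_by_3": { "tokenizer": "ngram_3_by_3" }
        },
        "tokenizer": {
          "ngram_3_by_3": {
            "type": "ngram",
            "min_gram": 3,
            "max_gram": 3,
            "token_chars": ["letter", "digit"]
          }
        }
      }
    }
  }'
}

# Ask the index to analyze the text given as $1 with the custom analyzer.
# -G turns the url-encoded data into query parameters of a GET request.
analyze_text() {
  curl -sS -G "$ES_URL/$INDEX/_analyze" \
    --data-urlencode "analyzer=ngram_3_by_3" \
    --data-urlencode "text=$1"
}
```

Run `create_ngram_index` once, then `analyze_text 'Quick fox'`. The response should list the tokens along with their offsets. Creating the index a second time returns an error because the index already exists.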