Better Search with NGram

Better Search? What does it mean, better how?

Let's say a user wants to search their notes for gift ideas but accidentally types gist ideas. One of the notes that should match has a title containing super cool gifts list ... what now?

It means the search engine needs to be configured further, and one way to do that is by using nGram.

  1. A well-configured nGram would break a word such as gift (from gifts in the original title) down into various combinations such as:

    gi | if | ft
    gif | ift
    gift

    and store them for matching when a search occurs.

  2. So even if users accidentally typed gist when searching, their query would be broken down into:

    gi | is | st
    gis | ist
    gist

    and at least a partial match would exist between the two sets of tokens: gi, which is 1 of the 6 query tokens.

    • This will allow for the note titled super cool gifts list to show up as a search result.

    • It will be a low-ranked search result, but that is better than missing it completely.

    • There are other meaningful improvements, like an auto-suggester that states: showing search results for "gist ideas" ... did you mean "gift ideas"? But that is a different topic entirely.

  3. The same concept of using n-grams applies to full-text search (FTS) anywhere: websites, blogs, eCommerce or personal notes.
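The breakdown in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not how a search engine implements it: the helper name ngrams and the sizes 2 to 4 are assumptions chosen to reproduce the tokens listed above.

```python
def ngrams(word, min_gram, max_gram):
    """Return every substring of word with a length from min_gram to max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(word) - size + 1):
            grams.append(word[start:start + size])
    return grams


# Tokens stored for the word in the note title (hypothetical sizes 2..4).
indexed = ngrams("gift", 2, 4)
# Tokens produced from the mistyped query word.
queried = ngrams("gist", 2, 4)
# Tokens that appear on both sides: the partial match.
shared = [gram for gram in queried if gram in indexed]

print(indexed)  # ['gi', 'if', 'ft', 'gif', 'ift', 'gift']
print(queried)  # ['gi', 'is', 'st', 'gis', 'ist', 'gist']
print(shared)   # ['gi']
```

Only one token is shared, which is why the note would rank low rather than not appear at all.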

Configuring NGram

NGram vs. Edge NGram: the NGram token filter generates all n-grams of the configured sizes for each token. For example, with the default settings (min_gram=1 and max_gram=2), "brown" is tokenized into:

[b] [r] [o] [w] [n] [br] [ro] [ow] [wn]

The Edge NGram token filter only generates n-grams from the beginning of the word:

[b] [br]

If you use edge n-grams, you will probably want to increase max_gram so you generate a few more terms. Setting it to 5 would yield:

[b] [br] [bro] [brow] [brown]
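To make the difference concrete, here is a rough Python sketch that mimics the two filters on a single token. The function names ngram and edge_ngram are hypothetical stand-ins, not Elasticsearch APIs, and the terms are grouped by length to match the listings above; a real analyzer may emit them in a different order.

```python
def ngram(token, min_gram=1, max_gram=2):
    """All substrings of token with a length from min_gram to max_gram."""
    return [
        token[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(token) - size + 1)
    ]


def edge_ngram(token, min_gram=1, max_gram=2):
    """Only the substrings anchored at the beginning of token."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


brown_ngrams = ngram("brown")
brown_edges = edge_ngram("brown")
brown_edges_5 = edge_ngram("brown", max_gram=5)

print(brown_ngrams)   # ['b', 'r', 'o', 'w', 'n', 'br', 'ro', 'ow', 'wn']
print(brown_edges)    # ['b', 'br']
print(brown_edges_5)  # ['b', 'br', 'bro', 'brow', 'brown']
```

Edge n-grams produce far fewer terms, which suits prefix-style matching such as search-as-you-type, while plain n-grams also cover the middle and end of a word.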

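Thanks to n-grams, well-tuned queries can be matched directly. In the examples below, the document word jump was indexed with min_gram=2 and max_gram=4, and angle brackets mark the terms that match. The one-character query j matches nothing because the shortest indexed term has two characters: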

Query: j
Analyzed Query Terms: [j]
Document Terms: [ju] [um] [mp] [jum] [ump] [jump]

Query: ju
Analyzed Query Terms: <ju>
Document Terms: <ju> [um] [mp] [jum] [ump] [jump]

Query: jum
Analyzed Query Terms: <jum>
Document Terms: [ju] [um] [mp] <jum> [ump] [jump]

Query: jump
Analyzed Query Terms: <jump>
Document Terms: [ju] [um] [mp] [jum] [ump] <jump>
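The four lookups above can be reproduced with a small Python sketch. It assumes, as the listings suggest, that n-grams are generated only when the document is indexed and that the query is left as a single term; the names ngram, document_terms and results are illustrative only.

```python
def ngram(token, min_gram, max_gram):
    """All substrings of token with a length from min_gram to max_gram."""
    return [
        token[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(token) - size + 1)
    ]


# Terms stored for the document word "jump" (sizes 2..4, as in the listings).
document_terms = ngram("jump", 2, 4)

# Each query is kept as one term and looked up among the stored terms.
results = {query: query in document_terms for query in ["j", "ju", "jum", "jump"]}

print(document_terms)  # ['ju', 'um', 'mp', 'jum', 'ump', 'jump']
print(results)         # {'j': False, 'ju': True, 'jum': True, 'jump': True}
```

Lowering min_gram to 1 would let the query j match as well, at the cost of storing more terms per word.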

Credit where credit is due - References

    • A significant portion of this page comes from Jon Tai's blog. The blog was not heavily anchored, so it was difficult to send readers directly to the relevant content with a link. Therefore, some content is repeated here to keep creative control over the reader's learning experience.


Last updated 7 years ago

Blogs sometimes go down or disappear. It felt downright unethical to clone an article as HTML or PDF, so instead here is a scrolling screen capture via SnagIt, in the hope that it falls in the "OK as a backup for readers" non-infringing zone.

Control+R is Jon Tai's technical blog, powered by WordPress.