npm i my_word
this is for complete goofballs:
myCompleteMemoirs.indexOf('johnny carson') === -1
because it loops through every character in the text.
even worse!
- it will match
fjohnny carsonb
- it will miss
johnny-carson
- it will over-scan on
jjohnny cartoon
etc.
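The failure modes listed above are easy to reproduce. Here is a small sketch, assuming the phrase being searched for is `johnny carson` (inferred from the examples, not stated outright) and that the lookup is a plain substring search:

```javascript
// Assumption: the phrase being searched for is 'johnny carson'.
const phrase = 'johnny carson'

// Plain substring search, which has no notion of word boundaries.
const found = (text) => text.indexOf(phrase) !== -1

const falseHit = found('fjohnny carsonb') // true: matched inside longer "words"
const miss = found('johnny-carson') // false: the hyphen breaks the substring
const realHit = found('i met johnny carson once') // true

console.log({ falseHit, miss, realHit })
```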
All your word are belong:
let index = … // takes a few milliseconds..
// 💥 fast 💥
index… // true
by using my_word, not only are lookups faster, but file size and memory use are much smaller.
in comparison to other prefix/suffix tries (like efrt!), my_word indexes by word and sentence instead of by character. This means matches will not extend over sentence boundaries, and it won't get tripped up by punctuation, whitespace, or prefix matches.
How-the?
the Aho-Corasick algorithm is a fancy ~pants~ way to look-up a string efficiently in text.
If you have a bag of words, and want to know whether they're found in a text, you could loop through and do a str.match(/\bword\b/) for each one, but that scans the whole text again for every word (O(n) each time).
...or you could put all the words in an object, but lord-help-you when you want to look up a multi-word input.
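For comparison, the two naive approaches might look something like this. The word list and text are made up for illustration:

```javascript
// Hypothetical inputs, for illustration only.
const words = ['carson', 'letterman', 'leno']
const text = 'that night, carson handed the desk to leno.'

// Approach 1: one regex pass over the whole text per word.
const byRegex = words.filter((w) => new RegExp('\\b' + w + '\\b').test(text))

// Approach 2: put the words in an object and check each token of the text.
const lookup = { carson: true, letterman: true, leno: true }
const tokens = text.split(/\W+/).filter(Boolean)
const byObject = tokens.filter((t) => lookup[t] === true)

// A multi-word key can never match a single token, since no token contains a space.
const phrases = { 'johnny carson': true }
const multiWordHit = 'johnny carson was on tv'
  .split(/\W+/)
  .some((t) => phrases[t] === true)

console.log(byRegex, byObject, multiWordHit)
```

The object lookup is fast per token, but it only ever sees one word at a time, which is the gap a word-level graph is meant to fill.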
Faster would be to store the sequences of words in a stick-and-arrow diagram kinda-way.
This way, a sequence of any length can be looked up one word at a time, each hop being an immediate O(1) lookup, and no shared sequence is stored twice.
This algorithm makes a graph of words instead of characters. It makes certain assumptions about language, namely that you are looking for full words in natural-language text.
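The graph-of-words idea can be sketched roughly as below. This is not my_word's code or API: `tokenize`, `buildIndex`, `has`, and `END` are names made up for the illustration, and it is a plain word-level trie walk rather than full Aho-Corasick (there are no failure links). It also splits sentences naively on `.`, `?`, and `!`, so abbreviations like "Mr." would be cut in two.

```javascript
// Marker key for "a stored phrase ends at this node". A Symbol cannot
// collide with any word, since words are always strings.
const END = Symbol('end')

// Split text into sentences, then each sentence into lowercase words.
// Hyphens and other punctuation are treated as word separators.
const tokenize = (text) =>
  text
    .toLowerCase()
    .split(/[.?!]+/)
    .map((sentence) => sentence.split(/[^a-z0-9']+/).filter(Boolean))
    .filter((words) => words.length > 0)

// Build a trie whose edges are whole words, not characters.
const buildIndex = (phrases) => {
  const root = new Map()
  for (const phrase of phrases) {
    let node = root
    for (const words of tokenize(phrase)) {
      for (const word of words) {
        if (!node.has(word)) node.set(word, new Map())
        node = node.get(word)
      }
    }
    if (node !== root) node.set(END, true) // ignore empty phrases
  }
  return root
}

// Walk the trie from every word position, never crossing a sentence boundary.
const has = (index, text) => {
  for (const words of tokenize(text)) {
    for (let start = 0; start < words.length; start++) {
      let node = index
      for (let i = start; i < words.length; i++) {
        node = node.get(words[i])
        if (!node) break
        if (node.has(END)) return true
      }
    }
  }
  return false
}

const index = buildIndex(['johnny carson'])
console.log(has(index, 'i met johnny carson once')) // true
console.log(has(index, 'johnny-carson')) // true: the hyphen is a separator
console.log(has(index, 'fjohnny carsonb')) // false: different words
console.log(has(index, 'I saw Johnny. Carson was there.')) // false: sentence boundary
```

Because the edges are whole words, two phrases that begin with the same words share those nodes, and a partial-word overlap like `fjohnny` never enters the graph at all.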
It is forked/lifted from tombooth's async, substring implementation
Usage
$ npm install my_word
var myWord = …
var index = …
console… // true
console… // true
console… // false
console… // true
MIT