Large Text Analyzer
This ES6 module runs a user-defined function on the async stream of words and returns a map of aggregated results, in this example a word frequency count
IMPORTANT - This uses ES6 Module Loader - You will need a very recent version of node and below v13 run it with --node-experimental-featuresExample of usage
import lta from 'large-text-analyzer'
// OPTIONALLY override the word delimiters - this RegEx is the default one:
lta.delimiters = /\s|[^a-zA-Z]|[0-9]/
// OPTIONALLY override the processWord(word) function
// to be executed for each word.
// NOTE "this.map" - this is the result map,
// that gets returned at the end of the stream.
// It can be used to store all kinds of results
// this here is the default - counts unique
// word occurences
lta.processWord = function (word){
word = word.toLowerCase()
let count = this.map.get(word)
count = count? ++count : 1
this.map.set(word,count)
}
// Define async block:
async function processFile (fileName) {
let vocabulary = await lta.processWords(fileName);
console.log(Vocabulary consists of ${vocabulary.size} words
)
console.log(vocabulary);
}
// And execute it.
processFile('./test/data/testdata.txt')
That will produce the map of words and frequency of their use in the input text file
SAMPLE OUTPUT:
Vocabulary consists of 2 words
Map {
'a' => 2,
'after' => 1
etc... }