subtlex-word-frequencies

2.0.0 • Public • Published


List of 74,286 words sorted by frequency of use in spoken English.

The word counts are derived from SUBTLEXus, a corpus of American English subtitles of movies.

Install

npm:

npm install subtlex-word-frequencies

Use

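var words = require('subtlex-word-frequencies')

// Total number of entries
console.log(words.length)

// The three most frequent words
console.log(words.slice(0, 3))

// First five entries whose word contains "chick"
console.log(words.filter(d => /chick/.test(d.word)).slice(0, 5))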

Yields:

74286
[
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908}
]
[
  {word: 'chicken', count: 3148},
  {word: 'chick', count: 1334},
  {word: 'chicks', count: 742},
  {word: 'chickens', count: 520},
  {word: 'chickenshit', count: 85}
]

API

subtlexWordFrequencies

Array.<Entry> — List of all entries in SUBTLEXus. Each entry has the following properties:

  • word (string) — Unique word (example: git)
  • count (number) — Number of times the word appears in the corpus (example: 101)

word starts with a capital when the word more often starts with an uppercase letter than with a lowercase letter (example: I).
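Because the list is a plain array sorted by frequency, looking up a single word means scanning it. The sketch below shows one way to build a lookup table once and reuse it. It assumes entries carry the `word` and `count` properties seen in the example output; `buildIndex` is a hypothetical helper, not part of the package, and the sample data stands in for the real list.

```javascript
// Hypothetical helper (not part of the package): index entries by word.
function buildIndex(entries) {
  const index = new Map()
  entries.forEach((entry, i) => {
    // The list is sorted by frequency, so position + 1 is the rank.
    index.set(entry.word, {count: entry.count, rank: i + 1})
  })
  return index
}

// Sample data copied from the example output in this README; in real use,
// pass require('subtlex-word-frequencies') instead.
const sample = [
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908}
]

const index = buildIndex(sample)
console.log(index.get('the')) // { count: 1501908, rank: 3 }
console.log(index.has('The')) // false: lookups are case-sensitive
```

Since some words are stored capitalized (such as I), a case-insensitive lookup would need its own normalization step, which this sketch leaves out.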

The entire original corpus consists of 51 million words.
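Raw counts can be turned into a rough frequency per million words using that corpus size. The following is a minimal sketch, assuming the rounded figure of 51 million is close enough for the purpose; `perMillion` and `CORPUS_SIZE` are hypothetical names, not part of the package.

```javascript
// Assumption: the corpus size is the rounded 51 million figure quoted above.
const CORPUS_SIZE = 51e6

// Hypothetical helper: occurrences per million words of the corpus.
function perMillion(count, corpusSize = CORPUS_SIZE) {
  return (count / corpusSize) * 1e6
}

// Count for 'you' taken from the example output in this README.
console.log(perMillion(2134713).toFixed(1)) // "41857.1"
```

The result is approximate, since the exact corpus size is not given here.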

License

ISC © Zeke Sikelianos
