subtlex-word-frequencies

2.0.0 • Public • Published

List of 74,286 words sorted by frequency of use in spoken English.

The word counts are derived from SUBTLEXus, a corpus of American English subtitles of movies.

Install

npm:

npm install subtlex-word-frequencies

Use

var subtlex = require('subtlex-word-frequencies')

console.log(subtlex.length)

console.log(subtlex.slice(0, 3))

console.log(subtlex.filter(d => d.word.match(/chick/)).slice(0, 5))

Yields:

74286
[
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908}
]
[
  {word: 'chicken', count: 3148},
  {word: 'chick', count: 1334},
  {word: 'chicks', count: 742},
  {word: 'chickens', count: 520},
  {word: 'chickenshit', count: 85}
]
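
Scanning the array with filter is fine for one-off queries, but repeated lookups are cheaper with an index. The sketch below assumes the entry shape shown in the output above; buildIndex is a hypothetical helper and not part of the package, and the sample array stands in for the real export so the snippet runs on its own.

```javascript
// Sample entries copied from the output above. In real use, pass the
// array exported by the package instead of this stand-in.
const sample = [
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908},
  {word: 'chicken', count: 3148}
]

// Hypothetical helper (not part of the package): map each word to its count.
function buildIndex(entries) {
  return new Map(entries.map((entry) => [entry.word, entry.count]))
}

const index = buildIndex(sample)

console.log(index.get('chicken')) // 3148
console.log(index.get('zebra')) // undefined (not in the sample)
```

Building the Map costs one pass over the list; each lookup afterwards is a single get call.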

API

subtlexWordFrequencies

Array.<Entry> — List of all entries in SUBTLEXus. Each entry has the following properties:

  • word (string) — Unique word (example: git)
  • count (number) — Number of times the word appears in the corpus (example: 101)

word is capitalized when, in the corpus, the word starts with an uppercase letter more often than with a lowercase letter (example: I).
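
Because some entries are stored capitalized, an exact string comparison against user input can miss them. A minimal sketch of a case-insensitive lookup follows; findEntry is a hypothetical helper and not part of the package, and the two sample entries stand in for the real list.

```javascript
// Stand-in entries using the shape described above.
const capitalizedSample = [
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529}
]

// Hypothetical helper (not part of the package): return the first entry
// whose word matches the query when case is ignored, or undefined.
function findEntry(entries, query) {
  const needle = query.toLowerCase()
  return entries.find((entry) => entry.word.toLowerCase() === needle)
}

console.log(findEntry(capitalizedSample, 'i')) // the entry stored as 'I'
console.log(findEntry(capitalizedSample, 'YOU')) // the entry stored as 'you'
```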

The entire original corpus consists of 51 million words.
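
Raw counts are easier to compare with other corpora when expressed per million words. The sketch below assumes the rounded corpus size of 51 million stated above, so the results are approximate; perMillion and CORPUS_SIZE are hypothetical names, not part of the package.

```javascript
// Assumption: the rounded corpus size quoted above (51 million words).
const CORPUS_SIZE = 51e6

// Hypothetical helper: convert a raw count into occurrences per million words.
function perMillion(count) {
  return (count / CORPUS_SIZE) * 1e6
}

// Counts taken from the example output in the Use section.
console.log(perMillion(2134713).toFixed(2)) // 'you'
console.log(perMillion(3148).toFixed(2)) // 'chicken'
```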

License

ISC © Zeke Sikelianos
