compromise-dates

3.5.0 • Public • Published
date-parsing plugin for compromise
npm install compromise-dates

This library is an earnest attempt to get date information out of text, in a clear way, including all informal text formats and folksy shorthands.

import nlp from 'compromise'
import datePlugin from 'compromise-dates'
nlp.plugin(datePlugin)

let doc = nlp('the second monday of february')
doc.dates().get()[0]
/*
  { start: '2021-02-08T00:00:00.000Z', end: '2021-02-08T23:59:59.999Z'}
*/
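
To make the result above easier to check by hand, here is a minimal sketch of the 'nth weekday of a month' arithmetic in plain JavaScript. The helper `nthWeekdayOfMonth` is hypothetical and is not part of compromise-dates; it only shows why the second Monday of February 2021 lands on the 8th.

```javascript
// Hypothetical helper for illustration, not the library's implementation.
// weekday uses Date#getUTCDay numbering: 0 = Sunday ... 6 = Saturday.
function nthWeekdayOfMonth(year, monthIndex, weekday, n) {
  const first = new Date(Date.UTC(year, monthIndex, 1))
  // days from the 1st of the month to the first matching weekday
  const offset = (weekday - first.getUTCDay() + 7) % 7
  const day = 1 + offset + (n - 1) * 7
  return new Date(Date.UTC(year, monthIndex, day))
}

// second Monday (1) of February (month index 1) 2021
const secondMonday = nthWeekdayOfMonth(2021, 1, 1, 2)
console.log(secondMonday.toISOString()) // 2021-02-08T00:00:00.000Z
```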

  • Tokenization and disambiguation with compromise
  • Timezone and DST reckoning with spacetime
  • Number-parsing with compromise-numbers
  • Timezone reconciliation with spacetime-informal

Things it does well:

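| explicit-dates | description | Start | End |
| --- | --- | --- | --- |
| march 2nd | | March 2, 12:00am | March 2, 11:59pm |
| 2 march | | '' | '' |
| tues march 2 | | '' | '' |
| march the second | natural-language number | '' | '' |
| on the 2nd | implicit months | '' | '' |
| tuesday the 2nd | date-reckoning | '' | '' |

| numeric-dates | description | Start | End |
| --- | --- | --- | --- |
| 2020/03/02 | iso formats | '' | '' |
| 2020-03-02 | | '' | '' |
| 03-02-2020 | british formats | '' | '' |
| 03/02 | | '' | '' |
| 2020.08.13 | alt-ISO | '' | '' |

| named-dates | description | Start | End |
| --- | --- | --- | --- |
| today | | - | - |
| tomorrow | | '' | '' |
| christmas eve | calendar-holidays | Dec 24, 12:00am | Dec 24, 11:59pm |
| easter | astronomical holidays | -depends- | - |
| q1 | | Jan 1, 12:00am | Mar 31, 11:59pm |

| times | description | Start | End |
| --- | --- | --- | --- |
| 2pm | | '' | '' |
| 2:12pm | | '' | '' |
| 2:12 | | '' | '' |
| 02:12:00 | weird iso-times | '' | '' |
| two oclock | written formats | '' | '' |
| before 1 | | '' | '' |
| noon | | '' | '' |
| at night | informal daytimes | '' | '' |
| in the morning | | '' | '' |
| tomorrow evening | | '' | '' |

| timezones | description | Start | End |
| --- | --- | --- | --- |
| eastern time | informal zone support | '' | '' |
| est | TZ shorthands | '' | '' |
| peru time | | '' | '' |
| ..in beirut | by location | '' | '' |
| GMT+9 | by UTC/GMT offset | '' | '' |
| -4h | '' | '' | '' |
| Canada/Eastern | IANA codes | '' | '' |

| relative durations | description | Start | End |
| --- | --- | --- | --- |
| this march | | '' | '' |
| this week | | '' | '' |
| this sunday | | '' | '' |
| next april | | '' | '' |
| this past year | | '' | '' |
| second week of march | | '' | '' |
| last weekend of march | | '' | '' |
| last spring | | '' | '' |
| the saturday after next | | '' | '' |

| punted dates | description | Start | End |
| --- | --- | --- | --- |
| in seven weeks | now+duration | '' | '' |
| two days after june 6th | date+duration | '' | '' |
| 2 weeks from now | | '' | '' |
| 2 weeks after june | | '' | '' |
| 2 years, 4 months, and 5 days ago | complex durations | '' | '' |
| a week and a half before | written-out numbers | '' | '' |
| a week friday | idiom format | '' | '' |

| start/end | description | Start | End |
| --- | --- | --- | --- |
| end of the week | up-against the ending | '' | '' |
| start of next year | lean-toward starting | '' | '' |
| middle of q2 last year | rough-center calculation | '' | '' |

| date-ranges | description | Start | End |
| --- | --- | --- | --- |
| between june and july | explicit ranges | '' | '' |
| from today to next halloween | | '' | '' |
| aug 1 - aug 31 | dash-ranges | '' | '' |
| 22-23 February | | '' | '' |
| today to next friday | | '' | '' |
| during june | | '' | '' |
| aug to june 1999 | shared range info | '' | '' |
| before [2019] | up-to a date | '' | '' |
| by march | | '' | '' |
| after february | date-to-infinity | '' | '' |

| repeating-intervals | description | Start | End |
| --- | --- | --- | --- |
| any wednesday | n-repeating dates | | |
| any day in June | repeating-date in range | June 1 ... | .. June 30 |
| any wednesday this week | | '' | '' |
| weekends in July | more-complex interval | '' | '' |
| every weekday until February | interval until date | '' | '' |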

Things it does awkwardly:

| hmmm, | description | Start | End |
| --- | --- | --- | --- |
| middle of 2019/June | tries to find the sorta-center | June 15 | '' |
| good friday 2025 | tries to reckon astronomically-set holidays | '' | '' |
| Oct 22 1975 2am in PST | historical DST changes (assumes current dates) | '' | '' |

Things it doesn't do:

| 😓, | description | Start | End |
| --- | --- | --- | --- |
| not this Saturday, but the Saturday after | self-reference logic | '' | '' |
| 3 years ago tomorrow | folksy short-hand | '' | '' |
| 2100 | military time formats | '' | '' |
| may 97 | 'bare' 2-digit years | '' | '' |

API

Configuration:

.dates() accepts an optional object that sets the context for date parsing.

const context = {
  timezone: 'Canada/Eastern', //the default timezone is 'ETC/UTC'
  today: '2020-02-20', //the implicit, or reference day/year
  punt: { weeks: 2 }, // the implied duration to use for 'after june 2nd'
  dayStart: '8:00am',
  dayEnd: '5:30pm',
}

nlp('in two days').dates(context).get()
/*
  [{ start: '2020-02-22T08:00:00.000-05:00', end: '2020-02-22T17:30:00.000-05:00' }]
*/
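
As a rough illustration of what dayStart and dayEnd do to a result, the sketch below turns the two clock strings into the start and end of one calendar day. The helpers `to24Hour` and `dayBounds` are hypothetical names invented for this example, and the sketch ignores timezones entirely; the library itself leans on spacetime for that.

```javascript
// Hypothetical helper: turn '8:00am' / '5:30pm' into a 24-hour 'HH:mm' string.
function to24Hour(clock) {
  const m = /^(\d{1,2}):(\d{2})(am|pm)$/.exec(clock.trim().toLowerCase())
  if (!m) throw new Error(`unrecognised time: ${clock}`)
  let hour = Number(m[1]) % 12 // '12:xxam' becomes hour 0
  if (m[3] === 'pm') hour += 12
  return `${String(hour).padStart(2, '0')}:${m[2]}`
}

// Build wall-clock start/end strings for one calendar day (no timezone math).
function dayBounds(isoDay, context) {
  return {
    start: `${isoDay}T${to24Hour(context.dayStart)}`,
    end: `${isoDay}T${to24Hour(context.dayEnd)}`,
  }
}

console.log(dayBounds('2020-02-22', { dayStart: '8:00am', dayEnd: '5:30pm' }))
// { start: '2020-02-22T08:00', end: '2020-02-22T17:30' }
```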

Opinions:

Start of week:

By default, weeks start on a Monday, and 'next week' will run from Monday morning to Sunday night. This can be configured in spacetime, but right now we are not passing this config through.

Implied durations:

'after October' returns a range starting Nov 1st and ending 2 weeks later, by default. This can be configured by setting the punt parameter in the context object:

doc.dates({ punt: { month: 1 } })
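
The idea of a punted range can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the library's code: `afterMonth` is a hypothetical helper, it only understands `weeks` and `days` in the punt object, and it works in UTC.

```javascript
// Hypothetical sketch of "after <month>" with an implied (punted) duration.
const DAY_MS = 24 * 60 * 60 * 1000

function afterMonth(year, monthIndex, punt = { weeks: 2 }) {
  // the range starts on the 1st of the following month
  // (Date.UTC rolls month 12 over into January of the next year)
  const start = new Date(Date.UTC(year, monthIndex + 1, 1))
  // assumption: only 'weeks' and 'days' are supported in this sketch
  const days = (punt.weeks || 0) * 7 + (punt.days || 0)
  const end = new Date(start.getTime() + days * DAY_MS)
  return { start, end }
}

const range = afterMonth(2021, 9) // 'after October', 2021
console.log(range.start.toISOString().slice(0, 10)) // 2021-11-01
console.log(range.end.toISOString().slice(0, 10)) // 2021-11-15
```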

Future bias:

'May 7th' will prefer a May 7th in the future.

The parser will still return a past date, though, when it falls within the current month:

// from march 2nd
nlp('feb 30th').dates({ today: '2021-02-01' }).get()
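
One way to read the future-bias rule is sketched below, assuming the bias works at the month level: a month that has already passed moves to next year, while any day in the current month stays in the current year, even if it is already behind us. `resolveYear` is a hypothetical helper and not the library's actual logic.

```javascript
// Hypothetical sketch of month-level future bias (UTC dates, monthIndex 0-11).
function resolveYear(monthIndex, day, today) {
  const year = today.getUTCFullYear()
  // an earlier month has already passed this year, so lean toward next year
  const useNextYear = monthIndex < today.getUTCMonth()
  return new Date(Date.UTC(useNextYear ? year + 1 : year, monthIndex, day))
}

const refDay = new Date(Date.UTC(2021, 2, 2)) // reference day: March 2nd, 2021
const iso = (d) => d.toISOString().slice(0, 10)

console.log(iso(resolveYear(4, 7, refDay))) // 2021-05-07, still ahead
console.log(iso(resolveYear(2, 1, refDay))) // 2021-03-01, past but in the current month
console.log(iso(resolveYear(1, 10, refDay))) // 2022-02-10, February has passed
```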

This/Next/Last:

Named weeks or months, e.g. 'this/next/last week', are mostly straightforward.

This monday

A bare 'monday' will always refer to itself, or the upcoming monday.

  • Saying 'this monday' on a monday is itself.
  • Saying 'this monday' on a tuesday is next week.

Likewise, 'this june' in June, is itself. 'this june' in any other month, is the nearest June in the future.
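
The 'itself, or the upcoming one' rule for weekdays is a small piece of modular arithmetic. The sketch below uses a hypothetical `thisWeekday` helper working in UTC; it is only meant to show the rule, not how the library implements it.

```javascript
// Hypothetical sketch: 'this <weekday>' is today if it matches, else the next one.
// weekday uses Date#getUTCDay numbering: 0 = Sunday ... 6 = Saturday.
const ONE_DAY = 24 * 60 * 60 * 1000

function thisWeekday(today, weekday) {
  const offset = (weekday - today.getUTCDay() + 7) % 7 // 0 when today matches
  return new Date(today.getTime() + offset * ONE_DAY)
}

const monday = new Date(Date.UTC(2021, 2, 1)) // Monday, March 1st 2021
const tuesday = new Date(Date.UTC(2021, 2, 2)) // Tuesday, March 2nd 2021

console.log(thisWeekday(monday, 1).toISOString().slice(0, 10)) // 2021-03-01
console.log(thisWeekday(tuesday, 1).toISOString().slice(0, 10)) // 2021-03-08
```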

Future versions of this library could look at sentence-tense to help disambiguate these dates - 'i paid on monday' vs 'i will pay on monday'.

Last monday

If it's Tuesday, 'last monday' will not mean yesterday.

  • Saying 'last monday' on a tuesday will be -1 week.
  • Saying 'a week ago monday' will also work.
  • Saying 'this past monday' will return yesterday.

For reference, the Wit.ai and chronic libraries both return yesterday. Natty and SugarJs return -1 week, like we do.

'last X' can be less than 7 days backward, if it crosses a week starting-point:

  • Saying 'last friday' on a monday will be only a few days back.

Next Friday

If it's Tuesday, 'next wednesday' will not be tomorrow. It will be a week after tomorrow.

  • Saying 'next wednesday' on a tuesday, will be +1 week.
  • Saying 'a week wednesday' will also be +1 week.
  • Saying 'this coming wednesday' will be tomorrow.

For reference, Wit.ai, chronic, and Natty libraries all return tomorrow. SugarJs returns +1 week, like we do.
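
Both the 'last' and 'next' behaviours above are consistent with one reading: find the Monday that starts the current week, step one whole week back or forward, then pick the weekday inside that week. The sketch below works under that assumption, with a hypothetical `weekdayInWeek` helper and Monday-start weeks; it is not taken from the library's source.

```javascript
// Hypothetical sketch: pick a weekday from the previous (-1) or following (+1) week.
// weekday uses Date#getUTCDay numbering: 0 = Sunday ... 6 = Saturday.
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Monday-based position in the week: Monday = 0 ... Sunday = 6
const mondayIndex = (dayNumber) => (dayNumber + 6) % 7

function weekdayInWeek(today, weekday, weekOffset) {
  // midnight of the Monday that starts today's week
  const monday = today.getTime() - mondayIndex(today.getUTCDay()) * MS_PER_DAY
  const days = weekOffset * 7 + mondayIndex(weekday)
  return new Date(monday + days * MS_PER_DAY)
}

const tue = new Date(Date.UTC(2021, 2, 2)) // Tuesday, March 2nd 2021
const mon = new Date(Date.UTC(2021, 2, 1)) // Monday, March 1st 2021
const short = (d) => d.toISOString().slice(0, 10)

console.log(short(weekdayInWeek(tue, 1, -1))) // 'last monday' -> 2021-02-22
console.log(short(weekdayInWeek(mon, 5, -1))) // 'last friday' -> 2021-02-26
console.log(short(weekdayInWeek(tue, 3, 1))) // 'next wednesday' -> 2021-03-10
```

Note how 'last friday' said on a Monday is only three days back, because the week boundary sits between the two.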

Nth Week:

The first week of a month, or a year, is the first week with a Thursday in it. This is a weird, but widely-held standard. I believe it's a military formalism. It cannot be (easily) configured. This means that the start-date for the first week of January may be a Monday in December, etc.

As expected, first monday of January will always be in January.
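
The first-Thursday rule (the same one ISO 8601 uses to number the weeks of a year) is easy to check with a few lines of date arithmetic. `firstWeekStart` below is a hypothetical helper that assumes Monday-start weeks and UTC dates; it is a sketch of the rule, not the library's code.

```javascript
// Hypothetical sketch: the Monday that starts the 'first week' of a month,
// where the first week is the one containing the month's first Thursday.
function firstWeekStart(year, monthIndex) {
  const first = new Date(Date.UTC(year, monthIndex, 1))
  // days from the 1st to the first Thursday (getUTCDay: Thursday = 4)
  const toThursday = (4 - first.getUTCDay() + 7) % 7
  // the Monday of that week is three days before the Thursday;
  // Date.UTC accepts zero or negative days and rolls into the previous month
  return new Date(Date.UTC(year, monthIndex, 1 + toThursday - 3))
}

console.log(firstWeekStart(2021, 0).toISOString().slice(0, 10)) // 2021-01-04
console.log(firstWeekStart(2020, 0).toISOString().slice(0, 10)) // 2019-12-30
```

January 1st 2020 was a Wednesday, so its first Thursday falls on the 2nd and the week holding it begins on Monday, December 30th 2019.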

British/American ambiguity:

By default, we use the same interpretation of dates as javascript does: we assume 01/02/2020 is Jan 2nd (US-version), but allow 13/01/2020 to be Jan 13th (UK-version). This should be possible to configure in the near future.
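
A minimal sketch of that month-first-with-fallback rule follows. `parseSlashDate` is a hypothetical function written for this example; it only handles the `NN/NN/YYYY` shape and does no further validation.

```javascript
// Hypothetical sketch: month-first (US) reading, falling back to day-first (UK)
// only when the first number cannot be a month.
function parseSlashDate(text) {
  const m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(text)
  if (!m) return null
  const a = Number(m[1])
  const b = Number(m[2])
  const year = Number(m[3])
  if (a > 12 && b <= 12) return { year, month: b, day: a } // UK-version
  return { year, month: a, day: b } // US-version
}

console.log(parseSlashDate('01/02/2020')) // { year: 2020, month: 1, day: 2 }
console.log(parseSlashDate('13/01/2020')) // { year: 2020, month: 1, day: 13 }
```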

Seasons:

By default, 'this summer' will return June 1 - Sept 1, which is northern hemisphere ISO. Configuring the default hemisphere should be possible in the future.

Day times:

There are some hardcoded times for 'lunch time' and others, but mainly, a day begins at 12:00am and ends at 11:59pm - the last millisecond of the day.

Invalid dates:

compromise will tag anything that looks like a date, but not validate the dates until they are parsed.

  • 'january 34th 2020' will return Jan 31 2020.
  • 'tomorrow at 2:62pm' will just return 'tomorrow'.
  • '6th week of february' will return the 2nd week of march.
  • Setting an hour that's skipped, or repeated by a DST change will return the closest valid time to the DST change.
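
The first bullet, where an impossible day is pulled back to the end of its month, can be sketched like this. `clampDay` is a hypothetical helper, shown only to illustrate the clamping idea.

```javascript
// Hypothetical sketch: clamp a day-of-month into the valid range for that month.
function clampDay(year, monthIndex, day) {
  // day 0 of the following month is the last day of this month
  const lastDay = new Date(Date.UTC(year, monthIndex + 1, 0)).getUTCDate()
  return Math.min(Math.max(day, 1), lastDay)
}

console.log(clampDay(2020, 0, 34)) // 31, as in 'january 34th 2020'
console.log(clampDay(2020, 1, 30)) // 29, February in a leap year
console.log(clampDay(2021, 1, 30)) // 28
```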

Inclusive/exclusive ranges:

'between january and march' will include all of march. This kind of phrase is usually pretty ambiguous.
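
In code terms, an inclusive month range ends on the last millisecond of the closing month. The sketch below uses a hypothetical `betweenMonths` helper in UTC to show the shape of such a range.

```javascript
// Hypothetical sketch: an inclusive range covering whole months (monthIndex 0-11).
function betweenMonths(year, startMonth, endMonth) {
  const start = new Date(Date.UTC(year, startMonth, 1))
  // first instant of the month after endMonth, minus one millisecond
  const end = new Date(Date.UTC(year, endMonth + 1, 1) - 1)
  return { start, end }
}

const janToMarch = betweenMonths(2021, 0, 2)
console.log(janToMarch.start.toISOString()) // 2021-01-01T00:00:00.000Z
console.log(janToMarch.end.toISOString()) // 2021-03-31T23:59:59.999Z
```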

Date greediness:

This library makes no assumptions about the input text, and is careful to avoid false-positive dates. If you know your text is a date, you can crank-up the date-tagger with a compromise-plugin, like so:

nlp.extend(function (Doc, world) {
  // ambiguous words
  world.addWords({
    weds: 'WeekDay',
    wed: 'WeekDay',
    sat: 'WeekDay',
    sun: 'WeekDay',
  })
  world.postProcess(doc => {
    // tag '2nd quarter' as a date
    doc.match('#Ordinal quarter').tag('#Date')
    // tag '2/2' as a date (not a fraction)
    doc.match('/[0-9]{1,2}/[0-9]{1,2}/').tag('#Date')
  })
})

Misc:

  • 'thursday the 16th' - will set to the 16th, even if it's not thursday
  • 'in a few hours/years' - in 2 hours/years
  • 'jan 5th 2008 to Jan 6th the following year' - date-range explicit references
  • assume 'half past 5' is 5pm

About:

1 - Regular-expressions are too-brittle to parse dates.
2 - Neural-nets are too-wonky to parse dates.
3 - A corporation, or startup is the wrong place to build a universal date-parser.

Parsing dates, times, durations, and intervals from natural language can be a solved-problem.

A rule-based, community open-source library - one based on simple NLP - is the best way to build a natural language date parser - commercial, or otherwise - for the frontend, or the backend.

The match-syntax is effective and easy, javascript is prevailing, and the more people who contribute, the better.

MIT licenced

Collaborators

  • spencermountain