@yiminghe/regexp

0.1.9 • Public • Published

regexp in js

https://github.com/yiminghe/kison

Match regular expressions synchronously or asynchronously (stream).

usage

sync match

import * as regexp from '@yiminghe/regexp';

// parse the pattern into an AST
console.log(regexp.parse('(a|b)*z'));

// compile the pattern, then pull matches one at a time
const options = {
  multiline: false,
  caseInsensitive: false,
  dotMatchesLineSeparators: false,
};
const patternInstance = regexp.compile('(a|b)*z', options);
const matcher = patternInstance.matcher('abzaaz');
let m;
// the loop ends when match() returns a falsy value
while ((m = matcher.match())) {
  console.log(m);
}
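
The loop above can be wrapped in a small helper that gathers every result into an array. This is only a sketch: collectMatches and makeStubMatcher are hypothetical names, not part of the package, and the stub stands in for a real matcher so the snippet runs without the package installed. The only assumption about the real matcher is the one the loop already makes, namely that match() returns a falsy value when nothing is left.

```javascript
// Hypothetical helper: drain a matcher-like object into an array.
// Relies only on match() returning a falsy value when no match remains.
function collectMatches(matcher) {
  const results = [];
  let m;
  while ((m = matcher.match())) {
    results.push(m);
  }
  return results;
}

// Hypothetical stand-in for a real matcher, used only to make the sketch runnable.
function makeStubMatcher(items) {
  let index = 0;
  return {
    match() {
      return index < items.length ? { match: items[index++] } : null;
    },
  };
}

const collected = collectMatches(makeStubMatcher(['abz', 'aaz']));
console.log(collected.map((r) => r.match));
```

With the package installed, the stub would presumably be replaced by patternInstance.matcher('abzaaz') from the example above.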

async match

import * as regexp from '@yiminghe/regexp';
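// expect and toMatchInlineSnapshot come from Jest, so run this inside a Jest test
(async function () {
  // pending input, delivered to the matcher one chunk at a time
  let buffer = ["c", "a", "b", "x", "a", "a", "b"];
  const patternInstance = regexp.compile("a+b", { async: true });
  // the callback returns a promise that resolves to the next chunk of input
  const matcher = patternInstance.matcherAsync(() => {
    return new Promise(resolve => {
      setTimeout(() => {
        if (buffer.length) {
          // resolve one character at a time, or everything at once:
          // resolve(buffer); buffer = [];
          resolve([buffer.shift()]);
        }
        // note: once the buffer is empty, this promise never settles
      }, 100);
    });
  });
  let ret = await matcher.match();
  expect(ret).toMatchInlineSnapshot(`
    Object {
      "match": "ab",
    }
  `);
  ret = await matcher.match();
  expect(ret).toMatchInlineSnapshot(`
    Object {
      "match": "aab",
    }
  `);
})();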

Features

Character Classes

A character class matches any one of a set of characters.

  • [character_group] – matches any single character in character_group, e.g. [ae]
  • [^character_group] – negation, matches any single character that is not in character_group, e.g. [^ae]
  • [first-last] – character range, matches any single character in the given range from first to last, e.g. [a-z]
  • . – wildcard, matches any single character except \n (unless the dotMatchesLineSeparators option is enabled)
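  • \w – matches any word character (negation: \W)
  • \s – matches any whitespace character (negation: \S)
  • \d – matches any decimal digit (negation: \D)
  • \z – matches the end of the string. This is an anchor rather than a character class, and \Z is a related anchor, not its negation (more info in "Anchors")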
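
As a rough illustration of the classes listed above, the sketch below uses JavaScript's built-in RegExp as a stand-in, not the @yiminghe/regexp API. It assumes the package follows the same semantics for these constructs; \z is left out because the built-in engine does not support it.

```javascript
// Built-in RegExp used as a stand-in; behaviour of the package is assumed to match.
const inGroup = /[ae]/.test('e');            // 'e' is in the group
const notInGroup = /[^ae]/.test('e');        // 'e' is excluded by the negation
const inRange = /[a-z]/.test('Q');           // uppercase 'Q' is outside a-z
const dotOnNewline = /./.test('\n');         // . does not match \n by default
const wordChars = 'a_1 -'.match(/\w/g);      // letters, digits and underscore
const digitChars = 'v10.2'.match(/\d/g);     // decimal digits only
const masked = 'a b'.replace(/\S/g, '#');    // every non-whitespace character replaced

console.log({ inGroup, notInGroup, inRange, dotOnNewline, wordChars, digitChars, masked });
```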

Character Escapes

The backslash (\) either indicates that the character that follows is a special character or that the keyword should be interpreted literally.

  • \keyword – interprets the keyword literally, e.g. \{ matches the opening curly brace
  • \special_character – interprets the special character, e.g. \b matches word boundary (more info in "Anchors")
  • \uhexadecimal_number – interprets the hexadecimal number as a character code, e.g. \u0061 matches character 'a'
  • \xhexadecimal_number – interprets the hexadecimal number as a character code, e.g. \x61 matches character 'a'
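
The escapes above can be tried out with JavaScript's built-in RegExp, which is used here only as a stand-in on the assumption that the package treats these escapes the same way.

```javascript
// Built-in RegExp used as a stand-in; not the @yiminghe/regexp API.
const literalBrace = /\{/.test('{');               // \{ is a literal brace, not a quantifier
const unicodeEscape = /\u0061/.test('a');          // \u0061 is the character 'a'
const hexEscape = /\x61/.test('a');                // \x61 is the character 'a'
const atBoundary = 'cat concat'.match(/\bcat/g);   // \b is a word boundary, not a literal 'b'

console.log({ literalBrace, unicodeEscape, hexEscape, atBoundary });
```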

Anchors

Anchors specify a position in the string where a match must occur.

  • ^ – matches the beginning of the string (or the beginning of the line when the multiline option is enabled)
  • $ – matches the end of the string or \n at the end of the string (the end of the line when the multiline option is enabled)
  • \A – matches the beginning of the string (ignores the multiline option)
  • \Z – matches the end of the string or \n at the end of the string (ignores the multiline option)
  • \z – matches the end of the string (ignores the multiline option)
  • \G – match must occur at the point where the previous match ended
  • \b – match must occur on a boundary between a word character and a non-word character (negation: \B)
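
A minimal sketch of ^, $ and \b follows, again using JavaScript's built-in RegExp as a stand-in (the m flag plays the role of the multiline option here, which is an assumption about the mapping). \A, \Z, \z and \G are not shown because the built-in engine has no equivalents, and the input has no trailing \n, since the built-in $ does not match before a final newline the way the list above describes.

```javascript
// Built-in RegExp used as a stand-in; the m flag is assumed to mirror the multiline option.
const lines = 'one\ntwo';

const startDefault = lines.match(/^\w+/g);     // ^ only at the start of the string
const startMultiline = lines.match(/^\w+/gm);  // ^ at the start of every line
const endDefault = lines.match(/\w+$/g);       // $ only at the end of the string
const endMultiline = lines.match(/\w+$/gm);    // $ at the end of every line

const onBoundary = 'cat concat'.replace(/\bcat\b/g, 'X');  // whole word only
const offBoundary = 'cat concat'.replace(/\Bcat/g, 'X');   // only inside a word

console.log({ startDefault, startMultiline, endDefault, endMultiline, onBoundary, offBoundary });
```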

Assertions

  • x(?=y) – lookahead assertion: matches "x" only if "x" is followed by "y". For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat". /Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" is part of the match results.

  • x(?!y) – negative lookahead assertion: matches "x" only if "x" is not followed by "y". For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec('3.141') matches "141" but not "3".

  • (?<=y)x – lookbehind assertion: matches "x" only if "x" is preceded by "y". For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is preceded by "Jack". /(?<=Jack|Tom)Sprat/ matches "Sprat" only if it is preceded by "Jack" or "Tom". However, neither "Jack" nor "Tom" is part of the match results.

  • (?<!y)x – negative lookbehind assertion: matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') finds no match because the number is preceded by the minus sign.
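
The four assertions can be checked with JavaScript's built-in RegExp, which the examples in the list already use in literal form. This sketch does not call the package itself; it assumes the package gives the same results for the same patterns.

```javascript
// Built-in RegExp used as a stand-in; not the @yiminghe/regexp API.
const lookahead = /Jack(?=Sprat|Frost)/.exec('JackFrost')[0];  // "Frost" is checked but not consumed
const negLookahead = /\d+(?!\.)/.exec('3.141')[0];             // "3" is rejected because "." follows
const lookbehind = /(?<=Jack|Tom)Sprat/.exec('TomSprat')[0];   // "Tom" is checked but not consumed
const negLookbehindHit = /(?<!-)\d+/.exec('3')[0];             // no minus sign before the number
const negLookbehindMiss = /(?<!-)\d+/.exec('-3');              // minus sign blocks the match

console.log({ lookahead, negLookahead, lookbehind, negLookbehindHit, negLookbehindMiss });
```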

Grouping Constructs

Grouping constructs delineate the subexpressions of a regular expression and capture the substrings of an input string.

  • (subexpression) – captures a subexpression in a group
  • (?<name>subexpression) – captures a subexpression in a named group
  • (?:subexpression) – non-capturing group
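
To show how the three group types differ, here is a short sketch with JavaScript's built-in RegExp as a stand-in. How the package exposes captured groups on its result object is not documented above, so the sketch reads them from the built-in exec() result instead; the sample strings are made up for illustration.

```javascript
// Built-in RegExp used as a stand-in; the package's result shape may differ.
const dateMatch = /(?<year>\d{4})-(\d{2})/.exec('2024-05');
const namedYear = dateMatch.groups.year;   // named group, read by name
const numberedMonth = dateMatch[2];        // plain group, read by position

// (?:ab) groups for the + quantifier without creating a capture,
// so (c) is the first and only captured group.
const nonCapturing = /(?:ab)+(c)/.exec('ababc');
const captureCount = nonCapturing.length - 1;

console.log({ namedYear, numberedMonth, first: nonCapturing[1], captureCount });
```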

Backreferences

Backreferences provide a convenient way to identify a repeated character or substring within a string.

  • \number – matches the content of the capture group at the given ordinal position, e.g. \4 matches the content of the fourth group

  • \k<name> – matches the content of the capture group with the given name, e.g. \k<c> matches the content of the named group c
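
Both backreference forms are sketched below with JavaScript's built-in RegExp as a stand-in, assuming the package resolves \number and \k<name> the same way. The sample strings are made up for illustration.

```javascript
// Built-in RegExp used as a stand-in; not the @yiminghe/regexp API.

// \1 must repeat whatever the first group captured, so this finds a doubled character.
const doubled = /(\w)\1/.exec('hello')[0];

// \k<q> must repeat the quote captured by the named group q,
// so the closing quote has to be the same kind as the opening one.
const quoted = /(?<q>['"]).*?\k<q>/.exec('say "hi" now')[0];
const mismatched = /(?<q>['"]).*?\k<q>/.exec('say "hi\' now');

console.log({ doubled, quoted, mismatched });
```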

Quantifiers

Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found.

  • * – match zero or more times
  • + – match one or more times
  • ? – match zero or one time
  • {n} – match exactly n times
  • {n,} – match at least n times
  • {n,m} – match from n to m times, closed range, e.g. a{3,4}

All quantifiers are greedy by default: they try to match as many occurrences of the pattern as possible. Append the ? character to a quantifier to make it lazy, so that it matches as few occurrences as possible, e.g. a+?.

Warning: lazy quantifiers can be used to control which groups and matches are captured, but they shouldn't be used to optimize matcher performance, because the matcher already uses an algorithm that can handle even nested greedy quantifiers.
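
The difference between greedy and lazy matching, and the counted forms, can be seen in this small sketch. It uses JavaScript's built-in RegExp as a stand-in and assumes the package captures the same text for the same patterns; it says nothing about the package's performance.

```javascript
// Built-in RegExp used as a stand-in; not the @yiminghe/regexp API.
const markup = '<b>x</b>';
const greedy = markup.match(/<.+>/)[0];        // runs to the last '>'
const lazy = markup.match(/<.+?>/)[0];         // stops at the first '>'

const exactly = /^a{3}$/.test('aaa');          // exactly three
const atLeast = /^a{2,}$/.test('a');           // needs two or more
const ranged = 'aaaaa'.match(/a{3,4}/)[0];     // greedy within the closed range 3..4
const rangedLazy = 'aaaaa'.match(/a{3,4}?/)[0]; // lazy within the same range

console.log({ greedy, lazy, exactly, atLeast, ranged, rangedLazy });
```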

Alternation

  • | – match either left side or right side
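
One point worth illustrating is how far each side of | extends. The sketch below uses JavaScript's built-in RegExp as a stand-in and assumes the package gives | the same low precedence, so that a group is needed to limit its scope.

```javascript
// Built-in RegExp used as a stand-in; not the @yiminghe/regexp API.
const either = 'cat dog bird'.match(/cat|bird/g);   // either alternative may match

// With a group, only the single letter alternates.
const grouped = /^gr(a|e)y$/.test('grey');

// Without a group, the pattern reads as (^gra) or (ey$),
// so a string that merely starts with "gra" also matches.
const ungrouped = /^gra|ey$/.test('grab');

console.log({ either, grouped, ungrouped });
```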

Options

  • caseInsensitive – match letters in the pattern independent of case.
  • multiline – controls the behavior of the ^ and $ anchors. By default, these match at the start and end of the input text. If this flag is set, they match at the start and end of each line within the input text.
  • dotMatchesLineSeparators – allow . to match any character, including line separators.
  • sticky – the match must be anchored to the last matched index.
  • unicode – switches to Unicode mode.
  • bfs – whether to match using a breadth-first search strategy. Can only be used when checking whether the input matches.
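
For readers who know the built-in RegExp flags, the sketch below shows the closest built-in analogues of some of these options. The pairing (caseInsensitive with i, dotMatchesLineSeparators with s, sticky with y, unicode with u) is an assumption made for illustration, not something the package documents, and bfs has no built-in counterpart.

```javascript
// Built-in RegExp flags shown as analogues only; the option-to-flag pairing is assumed.
const ignoreCase = /abc/i.test('ABC');        // like caseInsensitive
const dotAllOn = /a.b/s.test('a\nb');         // like dotMatchesLineSeparators
const dotAllOff = /a.b/.test('a\nb');         // default: . stops at \n

// Like sticky: the match must start exactly at lastIndex.
const stickyRe = /b/y;
const stickyAtZero = stickyRe.test('ab');     // 'b' is not at index 0
stickyRe.lastIndex = 1;
const stickyAtOne = stickyRe.test('ab');      // 'b' is at index 1

// Like unicode: . consumes a whole code point instead of one UTF-16 unit.
const astral = '\u{1F600}';
const unicodeOn = /^.$/u.test(astral);
const unicodeOff = /^.$/.test(astral);

console.log({ ignoreCase, dotAllOn, dotAllOff, stickyAtZero, stickyAtOne, unicodeOn, unicodeOff });
```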

Install

npm i @yiminghe/regexp

License

MIT
