wordsoap

0.2.0 • Public • Published

Clean up dirty HTML output from Microsoft Word

Usage

command line

$ npm install -g wordsoap
$ cat msword_garbage.html | wordsoap

module

$ npm install --save wordsoap

var wordsoap = require('wordsoap')

var dirty = "<p class=MsoNormal style='font-size:12pt'>Text</p>"
var clean = wordsoap(dirty) // <p>Text</p>

// access individual regex strings
wordsoap.regexes.msoAttributes // <(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>

// access individual regexes compiled with 'gi' flags
wordsoap.regexesCompiled.msoAttributes // <(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>
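
To show what the msoAttributes pattern matches, here is a minimal standalone sketch that does not load wordsoap at all. It copies the pattern string from the comment above and compiles it with the 'gi' flags. The stripAttributes helper and the '<$1>' replacement are assumptions made for illustration; the README does not say how wordsoap itself applies the regex.

```javascript
// Pattern string copied verbatim from the README comment above.
// String.raw keeps the backslashes intact without double escaping.
const msoAttributes = String.raw`<(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>`;

// Compile with the 'gi' flags, mirroring what the README says about regexesCompiled.
const compiled = new RegExp(msoAttributes, 'gi');

// ASSUMPTION: stripAttributes is a hypothetical helper, not part of wordsoap.
// It replaces each matched opening tag with the bare tag name captured in group 1.
function stripAttributes(html) {
  return html.replace(compiled, '<$1>');
}

const dirty = "<p class=MsoNormal style='font-size:12pt'>Text</p>";
const cleaned = stripAttributes(dirty);
console.log(cleaned); // <p>Text</p>
```

For this input the sketch yields the same result the README shows for wordsoap(dirty), but wordsoap may apply further regexes that this example does not cover.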

License

ISC © Raine Lourie
