scraperscript

1.0.2 • Public • Published

ScraperScript

ScraperScript is a query language for web scraping.

Installation

The module is available through the npm registry and can be installed with the npm or yarn command-line tools.

# NPM 
npm install scraperscript --global
# Or Using Yarn 
yarn global add scraperscript

Documentation

Run the scraperscript command with either a query file (myfile) or server as its argument.

Example file:

@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string

This returns the following JSON:

{
    "error": false,
    "errorsMsg": [],
    "names": [
        {
            "number": 0,
            "text": "Tiago"
        },
        {
            "number": 0,
            "text": "James"
        }
    ],
    "hasTitle": true,
    "title": "my string"
}

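A minimal sketch of how a Node.js program might consume this result. The object is copied from the example output; the listNames helper is hypothetical and not part of ScraperScript, and it assumes errorsMsg holds plain strings when a scrape fails.

```javascript
// Result object copied from the example output above.
const result = {
  error: false,
  errorsMsg: [],
  names: [
    { number: 0, text: 'Tiago' },
    { number: 0, text: 'James' }
  ],
  hasTitle: true,
  title: 'my string'
};

// Hypothetical helper (not part of ScraperScript): fail loudly if the
// scrape reported errors, otherwise return the text of each "names" entry.
// Assumption: errorsMsg is an array of strings.
function listNames(result) {
  if (result.error) {
    throw new Error('Scrape failed: ' + result.errorsMsg.join('; '));
  }
  return result.names.map(entry => entry.text);
}

console.log(listNames(result));
```

Checking the error flag before reading the other keys keeps a failed scrape from being mistaken for an empty result.
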
Syntax

Place the URL on the first line: @http://myurl.com

Other lines: - key: query :type

PS: Spaces are significant, so keep the spacing shown in these formats.
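
The line formats above can be illustrated with a small JavaScript sketch that classifies one line of a ScraperScript file. It is not the package's actual parser: parseLine and the shape of the objects it returns are assumptions made for illustration, and the regular expression simply encodes the single spaces shown in the - key: query :type format.

```javascript
// The five return types listed in this README.
const TYPES = ['array', 'object', 'boolean', 'string', 'number'];

// Hypothetical classifier (not the package's parser) for a single line.
function parseLine(line) {
  // URL line: "@http://myurl.com"
  if (line.startsWith('@')) {
    return { kind: 'url', url: line.slice(1) };
  }
  // Comment line: "!! my comment"
  if (line.startsWith('!!')) {
    return { kind: 'comment', text: line.slice(2).trim() };
  }
  // Field line: "- key: query :type", with the spaces exactly as shown.
  const match = /^- (\w+): (.+) :(\w+)$/.exec(line);
  if (match && TYPES.includes(match[3])) {
    return { kind: 'field', key: match[1], query: match[2], type: match[3] };
  }
  // Anything else does not fit the formats described here.
  return null;
}

console.log(parseLine('- title: html >> head >> title :string'));
```

A line such as -title:html:string returns null in this sketch because the required spaces are missing.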

Key

Name

Rules:

  • Use at the beginning of the line
  • Format - key:

Example: - name:

Type

Return type

Rules:

  • Use at the end of the line
  • Format :type

Types:

  • array
  • object
  • boolean
  • string
  • number

Example: :string

Query

String

" my string "

NOTE: "my string" is invalid because it has no space after the opening quote and before the closing quote.
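
As a rough illustration of the padding rule, the JavaScript sketch below accepts only literals with a space inside each quote. The readString helper is hypothetical; returning the inner text without the padding is an assumption based on the example output, where the title is reported as "my string".

```javascript
// Hypothetical helper (not part of ScraperScript): return the inner text
// of a padded string literal, or null if the padding spaces are missing.
function readString(token) {
  // Opening quote + space, any content, space + closing quote.
  const match = /^" (.*) "$/.exec(token);
  return match ? match[1] : null;
}

console.log(readString('" my string "')); // padded literal
console.log(readString('"my string"'));   // unpadded literal
```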

Comment

!! my comment in ScraperScript

Elements

nameOfHtmlElementOne >> nameOfHtmlElementTwo
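
One way to picture an element chain is as a list of steps that could be turned into a CSS selector. The JavaScript sketch below is only an illustration: pathSteps and toSelector are hypothetical names, and treating >> as a descendant relationship is an assumption, since this README does not say whether >> matches descendants or direct children.

```javascript
// Hypothetical helper: split an element chain into its element names.
function pathSteps(query) {
  return query.split(' >> ').map(step => step.trim());
}

// Assumption: each ">>" is treated as a CSS descendant combinator.
// If ">>" means "direct child", joining with ' > ' would be the analogue.
function toSelector(query) {
  return pathSteps(query).join(' ');
}

console.log(toSelector('html >> body >> div >> h2'));
```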

Map elements [String]

nameOfHtmlElementOne @> nameOfSubHtmlElement

Map elements [Array]

nameOfHtmlElementOne @> [nameOfSubHtmlElement]

Map elements [Object]

nameOfHtmlElementOne @> {nameOfIndex, nameOfData, nameOfSubHtmlElement}

Addition

nameOfHtmlElementOne ++ nameOfHtmlElementTwo

Replace

nameOfHtmlElementOne -- nameOfHtmlElementTwo

Equal or not-equal comparison

nameOfHtmlElementOne == nameOfHtmlElementTwo

nameOfHtmlElementOne ~= nameOfHtmlElementTwo

OR

nameOfHtmlElementOne || nameOfHtmlElementTwo

Tests

To run the test suite, first install the dependencies, then run the tests:

# NPM 
npm test
# Or Using Yarn 
yarn test

Dependencies

  • axios: Promise based HTTP client for the browser and node.js
  • cheerio: Tiny, fast, and elegant implementation of core jQuery designed specifically for the server

Dev Dependencies

  • body-parser: Node.js body parsing middleware
  • express: Fast, unopinionated, minimalist web framework
  • mocha: simple, flexible, fun test framework
  • xo: JavaScript happiness style linter ❤️

Contributors

Pull requests and stars are always welcome. For bugs and feature requests, please create an issue. List of all contributors.

License

MIT © Tiago Danin
