npm bundle size

Schema Analyzer

An Open Source joint by Dan Levy ✨

Analyze column type & size summary from any input JSON array

Schema Analyzer is the core library behind Dan's Schema Generator.

Features

The primary goal is to support any input JSON/CSV and infer as much as possible. More data will generally yield better results.

Getting Started

npm install schema-analyzer

import { schemaBuilder } from 'schema-builder'
 
schemaBuilder(schemaName: String, data: Array<Object>): TypeSummary

Preview Analysis Results

What does this library's analysis look like?

It consists of 3 key top-level properties:

totalRows - # of rows analyzed.
fields: FieldTypeSummary - a map of field names with all detected types (includes meta-data for each type detected, with possible overlaps. e.g. an Email is also a String, "42" is a String and Number)

Review the raw results below

Details about each field can be found below.

{
  "totalRows": 5,
  "fields": {
    "id": {
      "types": {
        "Number": {
          "rank": 8,
          "count": 5,
          "value": { "min": 1, "mean": 3, "max": 5, "p25": 2, "p33": 2, "p50": 3, "p66": 4, "p75": 4, "p99": 5 }
        },
        "String": {
          "rank": 12,
          "count": 5,
          "length": { "min": 1, "mean": 1, "max": 1, "p25": 1, "p33": 1, "p50": 1, "p66": 1, "p75": 1, "p99": 1 }
        }
      }
    },
    "name": {
      "types": {
        "String": {
          "rank": 12,
          "count": 5,
          "length": { "min": 3, "mean": 7.2, "max": 15, "p25": 3, "p33": 3, "p50": 5, "p66": 10, "p75": 10, "p99": 15 }
        }
      }
    },
    "role": {
      "types": {
        "String": {
          "rank": 12,
          "count": 5,
          "length": { "min": 4, "mean": 5.4, "max": 9, "p25": 4, "p33": 4, "p50": 5, "p66": 5, "p75": 5, "p99": 9 }
        }
      }
    },
    "email": {
      "types": {
        "Email": {
          "rank": 11,
          "count": 5,
          "length": { "min": 15, "mean": 19.4, "max": 26, "p25": 15, "p33": 15, "p50": 18, "p66": 23, "p75": 23, "p99": 26 }
        }
      }
    },
    "createdAt": {
      "types": {
        "Date": {
          "rank": 4,
          "count": 4,
          "value": { "min": "2001-01-01T00:00:00.000Z", "mean": "2015-04-14T18:00:00.000Z", "max": "2020-02-02T00:00:00.000Z", "p25": "2020-02-02T00:00:00.000Z", "p33": "2020-02-02T00:00:00.000Z", "p50": "2019-12-31T00:00:00.000Z", "p66": "2019-12-31T00:00:00.000Z", "p75": "2001-01-01T00:00:00.000Z", "p99": "2001-01-01T00:00:00.000Z" }
        },
        "String": {
          "rank": 12,
          "count": 1,
          "length": { "min": 6, "mean": 6, "max": 6, "p25": 6, "p33": 6, "p50": 6, "p66": 6, "p75": 6, "p99": 6 }
        }
      }
    },
    "accountConfirmed": {
      "types": {
        "Unknown": {
          "rank": -1,
          "count": 1
        },
        "String": {
          "rank": 12,
          "count": 1,
          "length": { "min": 9, "mean": 9, "max": 9, "p25": 9, "p33": 9, "p50": 9, "p66": 9, "p75": 9, "p99": 9 }
        },
        "Boolean": {
          "rank": 3,
          "count": 4
        }
      }
    }
  }
}

Sample input dataset for the example results above

id	name	role	email	createdAt	accountConfirmed
1	Eve	poweruser	`eve@example.com`	01/20/2020	undefined
2	Alice	user	`ali@example.com`	02/02/2020	true
3	Bob	user	`robert@example.com`	12/31/2019	true
4	Elliot Alderson	admin	`falkensmaze@protonmail.com`	01/01/2001	false
5	Sam Sepiol	admin	`falkensmaze@hotmail.com`	9/9/99	true

`AggregateSummary`

Numeric and String types include a summary of the observed field sizes:

Number & String Range Object Details

Properties

min the minimum number or string length
max the maximum number or string length
mean the average number or string length
percentiles[25th, 33th, 50th, 66th, 75th, 99th] values from the Nth percentile number or string length

Percentile is based on input data, as-is with out sorting.

Length Range Data

Range data for the length of a String field type:

{
  "rank": 11,
  "count": 5,
  "length": { "min": 15, "mean": 19.4, "max": 26, "p25": 15, "p33": 15, "p50": 18, "p66": 23, "p75": 23, "p99": 26 }
}

This is useful for defining strict length limits or minimums, for example as SQL servers often require..

Range data for a Date fields value:

{
  "rank": 4,
  "count": 4,
  "value": { "min": "2001-01-01T00:00:00.000Z", "mean": "2015-04-14T18:00:00.000Z", "max": "2020-02-02T00:00:00.000Z", "p25": "2020-02-02T00:00:00.000Z", "p33": "2020-02-02T00:00:00.000Z", "p50": "2019-12-31T00:00:00.000Z", "p66": "2019-12-31T00:00:00.000Z", "p75": "2001-01-01T00:00:00.000Z", "p99": "2001-01-01T00:00:00.000Z" }
}

Notes

We recommend you provide at least 100+ rows. Accuracy increases greatly with 1,000 rows.

The following features require a certain minimum # of records:

Enumeration detection.
- 100+ Rows Required.
- Number of unique values must not exceed 20 or 5% of the total number of records. (100 records will identify as Enum w/ 5 values. Up to 20 are possible given 400 or 1,000+.)
Not Null detection.
- where rowCount === field count

Full List of Detected Types

Unknown
ObjectId
UUID
Boolean
Date
Timestamp
Currency
Float
Number
Email
String
Array
Object
Null

schema-analyzer

Schema Analyzer

Analyze column type & size summary from any input JSON array

Features

Getting Started

Preview Analysis Results

Review the raw results below

Sample input dataset for the example results above

`AggregateSummary`

Properties

Length Range Data

Notes

Full List of Detected Types

Readme

Keywords

Package Sidebar

Install

Repository

Homepage

Weekly Downloads

Version

License

Unpacked Size

Total Files

Last publish

Collaborators

schema-analyzer

Schema Analyzer

Analyze column type & size summary from any input JSON array

Features

Getting Started

Preview Analysis Results

Review the raw results below

Sample input dataset for the example results above

AggregateSummary

Properties

Length Range Data

Notes

Full List of Detected Types

Readme

Keywords

Package Sidebar

Install

Repository

Homepage

DownloadsWeekly Downloads

Version

License

Unpacked Size

Total Files

Last publish

Collaborators

`AggregateSummary`

Weekly Downloads