Loading the initial HTML data. Each URL is analyzed for JavaScript information and sent down the pipe for decompiling and processing. Multiple website URLs can be specified. This operation should be cached so as not to bother the webmasters.
{
moduleName: "download-html",
downloadCache: "2 hours",
websiteList: ["http://trackthis.link"]
}
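The cache settings are written as human-readable durations such as "2 hours". A minimal sketch of how such a value might be interpreted is shown below; `parseDuration` and `isFresh` are hypothetical helper names, and the set of accepted units is an assumption rather than something the module documents.

```javascript
// Hypothetical helpers: the accepted units below are an assumption.
const UNIT_MS = {
  minute: 60 * 1000,
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
};

// Turns a string such as "2 hours" into a number of milliseconds.
function parseDuration(text) {
  const match = /^(\d+)\s+(minute|hour|day)s?$/.exec(text.trim());
  if (!match) throw new Error(`Unrecognized duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// A cached entry is fresh while its age is below the configured duration.
function isFresh(fetchedAt, maxAge, now = Date.now()) {
  return now - fetchedAt < parseDuration(maxAge);
}

console.log(parseDuration("2 hours"));
```

A stale entry would trigger a new download, while a fresh one is served from disk and the remote server is left alone.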
To keep things compact and clean, each downloaded HTML document has all of its script src="" paths resolved. For example, if a document at http://www.example.com references <script src="something.js">, the tag is converted to <script src="http://www.example.com/something.js">. This eliminates the need to keep track of document metadata (such as its source).
{
moduleName: "resolve-src-attribute-url"
}
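The resolution step can be sketched with the standard `URL` constructor, which resolves a relative path against a base. `resolveScriptSrc` is a hypothetical name, and the regular expression is a shortcut for illustration; the actual module may well use an HTML parser instead.

```javascript
// Hypothetical helper: rewrites <script src="..."> values to absolute URLs.
// A regular expression keeps the sketch short; an HTML parser would be more robust.
function resolveScriptSrc(html, baseUrl) {
  return html.replace(
    /(<script\b[^>]*?\bsrc\s*=\s*)(["'])(.*?)\2/gi,
    (match, prefix, quote, src) =>
      prefix + quote + new URL(src, baseUrl).href + quote
  );
}

const page = '<script src="something.js"></script>';
console.log(resolveScriptSrc(page, "http://www.example.com"));
// prints: <script src="http://www.example.com/something.js"></script>
```

Paths that are already absolute pass through unchanged, because `new URL` ignores the base when the first argument is a full URL.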
Extracting the website's JavaScript. Inline scripts are pulled out of the HTML first.
{
moduleName: "extract-inline-scripts"
}
Next, the remote scripts are downloaded. Here we can skip filenames by regular expression matching.
{
moduleName: "download-remote-scripts",
excludeList: [/modernizr\.min\.js$/, /detectizr\.min\.js$/]
}
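As a rough illustration of how an exclusion list like this could be applied, the sketch below drops every URL that matches at least one pattern. `filterScripts` and the sample URLs are hypothetical; the dots in the patterns are escaped here so that they match a literal "." rather than any character.

```javascript
// Hypothetical filter: keeps only the URLs that match no exclusion pattern.
function filterScripts(urls, excludeList) {
  return urls.filter(
    (url) => !excludeList.some((pattern) => pattern.test(url))
  );
}

const excludeList = [/modernizr\.min\.js$/, /detectizr\.min\.js$/];
const urls = [
  "http://www.example.com/js/modernizr.min.js",
  "http://www.example.com/js/app.js",
];

console.log(filterScripts(urls, excludeList)); // only the app.js URL remains
```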
Converting JavaScript's ObjectExpressions (AST) to plain JSON ASCII data. We target all ObjectExpressions found in all the extracted scripts.
{
moduleName: "extract-object-expressions"
}
Now, we convert the AST representation of ObjectExpressions into plain old JSON.
{
moduleName: "convert-object-expressions-to-json"
}
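The conversion can be pictured as a small recursive walk over the syntax tree. The sketch below assumes ESTree-style nodes (the shape produced by parsers such as Esprima or Acorn) and uses a hand-written tree so that it runs without a parser; `astToValue` is a hypothetical name, and mapping non-data nodes to null is a choice made for this example only.

```javascript
// Hypothetical converter for ESTree-style nodes.
function astToValue(node) {
  switch (node && node.type) {
    case "Literal":
      return node.value;
    case "ArrayExpression":
      return node.elements.map(astToValue);
    case "ObjectExpression": {
      const result = {};
      for (const prop of node.properties) {
        // Skip spread elements and computed keys: they have no static name.
        if (prop.type !== "Property" || prop.computed) continue;
        const key =
          prop.key.type === "Identifier" ? prop.key.name : String(prop.key.value);
        result[key] = astToValue(prop.value);
      }
      return result;
    }
    default:
      return null; // functions, identifiers, calls and so on carry no JSON value
  }
}

// Hand-written tree for the source text:
//   { name: "rich", links: ["http://www.example.com"] }
const ast = {
  type: "ObjectExpression",
  properties: [
    {
      type: "Property",
      computed: false,
      key: { type: "Identifier", name: "name" },
      value: { type: "Literal", value: "rich" },
    },
    {
      type: "Property",
      computed: false,
      key: { type: "Identifier", name: "links" },
      value: {
        type: "ArrayExpression",
        elements: [{ type: "Literal", value: "http://www.example.com" }],
      },
    },
  ],
};

console.log(JSON.stringify(astToValue(ast)));
// prints: {"name":"rich","links":["http://www.example.com"]}
```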
Finding the object of interest. With a simple string search we identify the JSON of interest. The keywords used here are mentioned in the target website.
{
moduleName: "select-json",
stringList: ["hypebeast", "rich", "doomsday", "influencer"]
}
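A simple string search of this kind might look like the following sketch. It assumes that an object qualifies only when every keyword appears somewhere in its serialized form; whether the real module requires all keywords or just one is not stated, and `selectJson` is a hypothetical name.

```javascript
// Hypothetical selector: keeps objects whose JSON text contains every keyword.
function selectJson(candidates, stringList) {
  return candidates.filter((candidate) => {
    const text = JSON.stringify(candidate);
    return stringList.every((keyword) => text.includes(keyword));
  });
}

const candidates = [
  { hypebeast: [], rich: [], doomsday: [], influencer: [] },
  { width: 100, height: 200 },
];
const stringList = ["hypebeast", "rich", "doomsday", "influencer"];

console.log(selectJson(candidates, stringList).length); // prints: 1
```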
Then we do a little bit of cleaning and transformation.
{
moduleName: "pretify-object-structure"
}
Now we download all websites mentioned in the object of interest and cache them for analysis, so as not to upset any webmasters.
{
moduleName: "prefetch-all-webpages",
moduleCache: "8 hours",
statusEnabled: true
}
Now, we extract titles from all the cached websites.
{
moduleName: "fetch-real-link-titles"
}
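Title extraction can be sketched with a single regular expression over the cached HTML. `extractTitle` is a hypothetical name, and the regex approach is an assumption for illustration; pages without a title element yield null here.

```javascript
// Hypothetical helper: returns the text of the first <title> element, or null.
function extractTitle(html) {
  const match = /<title\b[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  // Collapse runs of whitespace so multi-line titles become a single line.
  return match ? match[1].replace(/\s+/g, " ").trim() : null;
}

const html = "<html><head><title>\n Example  Domain </title></head></html>";
console.log(extractTitle(html)); // prints: Example Domain
```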
Markdown document creation. Now that the object contains real titles and descriptions, we proceed to create a simple markdown document that can be read on GitHub.
{
moduleName: "convert-links-to-markdown"
}
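The markdown step could be as small as the sketch below. The input shape (category names mapped to lists of `{ title, url }` entries) is an assumption made for this example, as is the name `linksToMarkdown`; the real object structure may differ.

```javascript
// Hypothetical renderer: one heading per category, one list item per link.
function linksToMarkdown(categories) {
  const lines = [];
  for (const [name, links] of Object.entries(categories)) {
    lines.push(`## ${name}`, "");
    for (const { title, url } of links) {
      lines.push(`- [${title}](${url})`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

const categories = {
  rich: [{ title: "Example Domain", url: "http://www.example.com" }],
};
console.log(linksToMarkdown(categories));
```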
Finally, we save it for everybody to see.
{
moduleName: "save-data",
fileName: "TRACKTHIS.md"
}
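Taken together, the configuration objects above form an ordered list of steps, each one receiving the output of the previous step. A minimal sketch of such a runner follows. The registry, its two stand-in modules and `runPipeline` are all hypothetical, and the sketch is synchronous for brevity, whereas steps that download pages would presumably be asynchronous.

```javascript
// Hypothetical registry: module name -> function(data, config) returning new data.
// The two stand-in modules exist only to make the sketch runnable.
const registry = {
  "to-upper": (data) => data.map((s) => s.toUpperCase()),
  "append-suffix": (data, config) => data.map((s) => s + config.suffix),
};

// Runs each step in order, passing the remaining config keys to the module.
function runPipeline(steps, initialData) {
  let data = initialData;
  for (const { moduleName, ...config } of steps) {
    const step = registry[moduleName];
    if (!step) throw new Error(`Unknown module: ${moduleName}`);
    data = step(data, config);
  }
  return data;
}

const steps = [
  { moduleName: "to-upper" },
  { moduleName: "append-suffix", suffix: ".md" },
];
console.log(runPipeline(steps, ["trackthis"])); // prints: [ 'TRACKTHIS.md' ]
```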