# Who Should I Follow?
Have you ever wanted to know who the best people to follow online are, and why?

- Who posts interesting content and who doesn't?
- Who is "trending" and why?

I wonder this all the time. So I'm building fuata (working title) to scratch my own itch.
## Data Model

I expect to store a couple of hundred million records in the database.
- fullName
- @username
- dateJoined
### Following
- following @username
- dateFollowed
- dateUnfollowed
Same for followers.
Example data structure (nested objects are easier for data updates):

```js
{
  "followers": {
    "u1": ["timestamp"],
    "u2": ["timestamp"]
  },
  "following": {
    "u3": ["timestamp"],
    "u2": ["timestamp", "timestamp2", "timestamp3"]
  }
}
```
This can be stored as a basic flat-file database, where `github-username.json` would be the file.

The key here is:

- u: username (of the person whom the user is following or being followed by)
- timestamp: start date when the person first started following / being followed, and end date when the person stopped following

In addition to creating a file per user, we should maintain an index of all the users we are tracking. The easiest way is a newline-separated list.
But... in the interest of being able to run this on Heroku (where you don't have persistent access to the filesystem, so no flat-file DB!) I'm going to use LevelDB for this. >> Use files on DigitalOcean!
## Tests
Check:
- GitHub.com is accessible
- a known GitHub user exists
- known GH user has non-zero number of followers
- known GH user is following a non-zero number of people
Scrape following/followers page:
- Scrape first page
- Check for 'next' page
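A minimal sketch of the "check for next page" step, assuming we already have the page HTML as a string. A real implementation would use a proper parser like cheerio (linked below) rather than a regex, and GitHub's actual pagination markup may differ:

```javascript
// Find the href of a pagination link whose text is "Next", or return
// null when there are no more pages. Regex is a stand-in for cheerio.
function nextPageUrl (html) {
  var match = html.match(/<a[^>]*href="([^"]*)"[^>]*>\s*Next\s*<\/a>/i);
  return match ? match[1] : null;
}
```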
New user?
- If a user doesn't exist in the database, create it.
- Set lastUpdated to now.
Read data from db/disk so we can update it.
- If a user has previously been crawled, there will be a record in the db.
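The "new user?" and "read existing record" steps above can be sketched together, with a plain object standing in for the real database (record fields follow the data model; the function name is assumed):

```javascript
// Return the existing record for a previously-crawled user, or create
// an empty one (stamped with lastUpdated) the first time we see them.
function findOrCreateUser (db, username) {
  if (!db[username]) {
    db[username] = { followers: {}, following: {}, lastUpdated: Date.now() };
  }
  return db[username]; // existing records come back unchanged, ready to update
}
```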
## Backup the Data

Given that LevelDB keeps its data on the local disk (which doesn't persist on Heroku), it makes sense to either pay for persistence or use files!
## Quantify Data Load

Each time we crawl a user's profile we add 5kb (on average) of data to the file.
So crawling the full list of GitHub users (5 million) once would require 5,000,000 × 5kb = 25 GB!
We might need to find a more efficient way of storing the data. SQL?
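As a back-of-the-envelope check of that estimate (figures taken from the paragraph above):

```javascript
// Storage estimate: 5 million users × 5kb per crawled profile.
var totalUsers = 5000000;        // ~5 million GitHub users
var kbPerCrawl = 5;              // average kb added per profile crawl
var totalKb = totalUsers * kbPerCrawl;
var totalGb = totalKb / 1000000; // 25 GB for one full crawl
```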
## Simple UI
- Upload sketch
## FAQ?
Q: How is this different from Klout?
A: Klout tries to calculate your social "influence". That's interesting, but useless for tracking makers.
## Research

- Must read up about http://en.wikipedia.org/wiki/Inverted_index so I understand how to use https://www.npmjs.org/package/level-inverted-index
- GitHub stats (node module): https://github.com/apiengine/ghstats (no tests or recent activity, but interesting functionality)
- Hard drive reliability stats: https://www.backblaze.com/blog/hard-drive-reliability-update-september-2014 (useful when selecting which drives to use in the storage array; the clear winner is the Hitachi 3TB)
- RAID explained in layman's terms: http://uk.pcmag.com/storage-devices-reviews/7917/feature/raid-levels-explained
- RAID calculator: https://www.synology.com/en-global/support/RAID_calculator (if you don't already know how much usable space you get)
- SQLite limits: https://www.sqlite.org/limits.html
## Useful Links
- Summary of Most Active GitHub users: http://git.io/top
- Intro to web-scraping with cheerio: https://www.digitalocean.com/community/tutorials/how-to-use-node-js-request-and-cheerio-to-set-up-simple-web-scraping
- GitHub background info: http://en.wikipedia.org/wiki/GitHub
## GitHub Stats API
- Github Stats API: https://developer.github.com/v3/repos/statistics/
- GitHub Followers API: https://developer.github.com/v3/users/followers/
Example:

```sh
curl -v https://api.github.com/users/pgte/followers
```

Sample response (an array of follower objects, truncated):

```js
[
  {
    "login": "methodmissing",
    "id": 379,
    "avatar_url": "https://avatars.githubusercontent.com/u/379?v=2",
    "gravatar_id": "",
    "url": "https://api.github.com/users/methodmissing",
    "html_url": "https://github.com/methodmissing",
    "followers_url": "https://api.github.com/users/methodmissing/followers",
    "following_url": "https://api.github.com/users/methodmissing/following{/other_user}",
    "gists_url": "https://api.github.com/users/methodmissing/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/methodmissing/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/methodmissing/subscriptions",
    "organizations_url": "https://api.github.com/users/methodmissing/orgs",
    "repos_url": "https://api.github.com/users/methodmissing/repos",
    "events_url": "https://api.github.com/users/methodmissing/events{/privacy}",
    "received_events_url": "https://api.github.com/users/methodmissing/received_events",
    "type": "User",
    "site_admin": false
  }
  // etc...
]
```
Issues with using the GitHub API:

- The API only returns 30 results per query.
- X-RateLimit-Limit: 60 (we can only make 60 requests per hour). 1,440 queries per day (60 per hour × 24 hours) sounds ample on the surface. But if we assume the average person has at least 2 pages' worth of followers (more than 30), a single instance/server can only track 720 people. Not really enough to do any sort of trend analysis. 😞 If we are tracking people with hundreds of followers (and growing fast), e.g. more than 300 followers (10 requests to fetch the complete list), the number of users we can track comes down to 1440 / 10 = 144 people... we burn through 1,440 requests pretty quickly.
- There's no guarantee which order the followers will be in (e.g. most recent first?).
- Results are cached, so they are not real-time like they are on the web. (Seems daft, but it's true.) Ideally there would be a streaming API, but sadly GitHub is built in Ruby-on-Rails, which is "RESTful" (not real-time).
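The rate-limit arithmetic in the bullets above, worked through in code (the 30-per-page figure comes from the first bullet; `pagesNeeded` is an illustrative helper):

```javascript
// How many users can one server track per day under the rate limit?
var perPage = 30;     // results the API returns per query
var perDay = 60 * 24; // 1440 requests/day at 60 per hour

// Requests needed to fetch one user's complete follower list:
function pagesNeeded (followerCount) {
  return Math.ceil(followerCount / perPage);
}

var usersWithTwoPages = perDay / pagesNeeded(60);  // ~2 pages each -> 720 users
var usersWithTenPages = perDay / pagesNeeded(300); // ~10 pages each -> 144 users
```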
But...

Once we know who we should be following, we can use:

- https://developer.github.com/v3/users/followers/#follow-a-user
- https://developer.github.com/v3/users/followers/#check-if-one-user-follows-another

e.g.:

```sh
curl -v https://api.github.com/users/pgte/following/visionmedia
```
## Interesting Facts

- GitHub has 3.4 million users
- Yet the most-followed person, Linus Torvalds, only has 19k followers (so it's a highly distributed network)
## Profile Data to Scrape
Interesting bits of info have blue squares drawn around them.
Basic profile details for TJ:

```js
{
  followercount: 11000,
  stared: 1000,
  followingcount: 147,
  worksfor: 'Segment.io',
  location: 'Victoria, BC, Canada',
  fullname: 'TJ Holowaychuk',
  email: 'tj@vision-media.ca',
  url: 'http://tjholowaychuk.com',
  joined: '2008-09-18T22:37:28Z',
  avatar: 'https://avatars2.githubusercontent.com/u/25254?v=2&s=460',
  contribs: 3217,
  longest: 43,
  current: 0
}
```
## Tasks
- Add lastmodified checker for DB (avoid crawling more than once a day) >> db.lastUpdated
- Save List of Users to DB
- Check Max Crawler Concurrency
- Experiment with Child Processes?
- Record Profile (basics) History
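The first task (the `db.lastUpdated` checker) could look something like this; the record shape follows the data model and the function name is assumed:

```javascript
// Only re-crawl a user if their record is missing or more than a day old.
var ONE_DAY_MS = 24 * 60 * 60 * 1000;

function needsCrawl (record, now) {
  if (!record || !record.lastUpdated) { return true; } // never crawled
  return now - record.lastUpdated >= ONE_DAY_MS;       // stale after 24h
}
```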
## Crawler Example

Hypothetical usage of the crawler module (the require path and callback signature are assumed):

```js
var crawler = require('./lib/crawler'); // assumed module path
var user = 'alanshaw';
crawler(user, function (err, profile) {
  if (err) { return console.error(err); }
  console.log(profile); // the scraped profile data
});
```
## Objective 1
- Track who the best people to follow are
- Track if I am already following a person