Work in progress, being factored out of feedbot and socdb.
Only supports Twitter at the moment, but it should work for other social networks eventually.
State is kept in PostgreSQL, both during and between runs.
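If you configure via environment variables, a run's environment might look like this (the variable names are the ones listed in the sample below; the values are placeholders):

```shell
# PostgreSQL connection (where we store the data)
export PGDATABASE=mydb
export PGPASSWORD=…

# Twitter credentials (what user we access twitter as)
export TWUSERTOKEN=…
export TWUSERTOKENSECRET=…
```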
Use something like:
```js
const Crawler = require('…')

const config = {} // use env vars if omitted here

// where we store the data:
//   PGDATABASE or config.pgDatabase
//   PGPASSWORD or config.pgPassword
// what user we access twitter as:
//   TWUSERTOKEN or config.tw.userToken
//   TWUSERTOKENSECRET or config.tw.userTokenSecret

const crawler = new Crawler(config)

crawler.start() // if you want crawling in this process; without this, you can still
                // read the database but it won't do any crawling

crawler.stop() // if you want to shut down the crawling; necessary if you want the
               // process to exit cleanly, because otherwise it's just waiting for more
               // stuff to appear

crawler.…(…) // add stuff to the queue, which flows out as priorities for related things
             // via computeContacts()

crawler.… // iterator
crawler.… // iterator
crawler.… // iter, what we have so far
crawler.… // iter, what we have so far
crawler.… // iter, what we have so far

const status = crawler.status()
// status.postsFetched
// status.postsFetchable
// status.followersFetched
// status.leadersFetched
// status.contactPostsFetched
// status.contactPostsFetchable

// Who are the contacts (from leaders, followers, posts, etc) and how
// important are they? You can provide this function if you don't like
// the default: leaders are weight 10, followers weight 1, plus 1
// point for any like / 3 points for any boost, in the past 90 days.
crawler.computeContacts = async (…) => {
  // Look at followers, leaders, posts, etc, to compute
  // a weighted table of contacts.
  //
  // Return a Map of contact -> weight.
  //
  // All weights turn into priorities by * 0.001 * user priority
}
```
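The default weighting (leaders weight 10, followers weight 1, +1 per like and +3 per boost in the past 90 days) could be sketched like this; the data shape (`leaders`, `followers`, `likes`, `boosts`) and the function name are assumptions for illustration, not the real API:

```js
// Hypothetical sketch of the default contact weighting.
const MS_PER_DAY = 24 * 60 * 60 * 1000

function defaultContactWeights ({ leaders, followers, likes, boosts }, now = Date.now()) {
  const weights = new Map()
  const bump = (contact, n) => weights.set(contact, (weights.get(contact) || 0) + n)
  const recent = ({ date }) => now - date.getTime() <= 90 * MS_PER_DAY

  for (const contact of leaders) bump(contact, 10)  // leaders are weight 10
  for (const contact of followers) bump(contact, 1) // followers are weight 1
  for (const like of likes.filter(recent)) bump(like.by, 1)    // +1 per like, past 90 days
  for (const boost of boosts.filter(recent)) bump(boost.by, 3) // +3 per boost, past 90 days

  return weights // Map of contact -> weight
}
```

A replacement `crawler.computeContacts` would gather the relevant activity and return a Map like this one, which the crawler then scales into priorities.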