Top 1000 Books Summary Comparison
The actual dataset contains 518 books.
Install & Use
npm i -g book-summary-comparator
book-summary-comparator
How
First, find out which document corresponds to which book.
Steps
- Step 1: List the reference files we have.
ls | grep "" > ref.txt # For Unix / Linux
- Step 2: Gather the book list.
// Open https://www.edebiyatogretmeni.org/etiket/1000-roman-ozeti-indir/
// Execute this JS in the browser console:
// Gather content.
const data = document…[1].innerHTML
// Generate a hidden input for copying to the clipboard.
const dummyInput = document.createElement("input");
dummyInput.value = data;
/* Select the text field */
dummyInput.select();
dummyInput.setSelectionRange(0, 99999);
/* Copy the text inside the text field */
document.execCommand("copy");
// Ta-da! Copied to your clipboard.
Paste the clipboard into list.txt:
echo `pbpaste` > list.txt
You now have both list.txt and ref.txt.
- Step 2: Match the refs and list.
node match.js # Match the books and refs. Left pad 00001 to 1.doc
node extract.js # Extract data accordingly, 1.doc to 1.txt
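The "left pad" comment can be read as mapping a zero-padded name such as 00001.doc to 1.doc. A minimal sketch of that reading follows; `unpadName` is a hypothetical helper, not a function taken from match.js, and the real script may do something different.

```javascript
// Hypothetical helper (an assumption, not the code in match.js):
// strip leading zeros from a numeric file name, keeping the extension.
function unpadName(name) {
  // Optional leading zeros, the number itself, then an optional extension.
  const match = /^0*(\d+)(\.[^.]+)?$/.exec(name);
  if (!match) return name; // Not a numeric name: leave it untouched.
  return match[1] + (match[2] || "");
}

console.log(unpadName("00001.doc"));
```

Names that are not purely numeric are returned unchanged, so the helper is safe to run over a whole directory listing.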
- Step 3: Generate db.json
Just loop over the database, then write it out with fs.writeFileSync.
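A rough sketch of this step, assuming the extracted summaries live in files named 1.txt, 2.txt, and so on, and that the titles come from list.txt in the same order. The record shape `{ id, title, summary }` and the names `buildDb` and `writeDb` are assumptions for illustration, not the layout of the real db.json.

```javascript
const fs = require("fs");
const path = require("path");

// Build one record per title. The shape { id, title, summary } is assumed.
function buildDb(titles, textDir) {
  return titles.map((title, index) => {
    const id = index + 1;
    const file = path.join(textDir, `${id}.txt`);
    // A missing summary file yields an empty summary instead of a crash.
    const summary = fs.existsSync(file)
      ? fs.readFileSync(file, "utf8").trim()
      : "";
    return { id, title, summary };
  });
}

// Serialise the records as pretty-printed JSON.
function writeDb(db, outFile) {
  fs.writeFileSync(outFile, JSON.stringify(db, null, 2));
}
```

Usage would look like `writeDb(buildDb(titles, "./txt"), "db.json")`, where `titles` is an array of book titles and `./txt` is a hypothetical directory holding the extracted text.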
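- Step 4: Compare each book with every other book, which is O(n^2).
Node.js runs a JavaScript loop on a single thread, so the comparison is not parallel by default. With 518 books that is roughly 134,000 unordered pairs (518 × 517 / 2), which a plain nested loop can work through, so no parallelisation is needed.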
node magic.js # Comparison algorithm is: Dice's Coefficient.
# Generates two indexed JSON files: case_1, case_2
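For reference, here is a minimal sketch of a pairwise comparison using Dice's coefficient over character bigrams. It assumes records shaped like `{ id, summary }`, and the names `bigrams`, `dice`, and `compareAll` are hypothetical. It is not the code in magic.js, which may tokenise differently, and it does not reproduce the case_1 and case_2 outputs.

```javascript
// Count the character bigrams of a string (lowercased, whitespace removed).
function bigrams(text) {
  const s = text.toLowerCase().replace(/\s+/g, "");
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const pair = s.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Dice's coefficient: 2 * |A ∩ B| / (|A| + |B|), on bigram multisets.
function dice(a, b) {
  const left = bigrams(a);
  const right = bigrams(b);
  let leftSize = 0;
  let rightSize = 0;
  let overlap = 0;
  for (const n of left.values()) leftSize += n;
  for (const [pair, n] of right) {
    rightSize += n;
    overlap += Math.min(n, left.get(pair) || 0);
  }
  // Two strings with no bigrams at all are treated as dissimilar.
  if (leftSize + rightSize === 0) return 0;
  return (2 * overlap) / (leftSize + rightSize);
}

// Compare every unordered pair once: n * (n - 1) / 2 comparisons.
function compareAll(books) {
  const results = [];
  for (let i = 0; i < books.length; i++) {
    for (let j = i + 1; j < books.length; j++) {
      const score = dice(books[i].summary, books[j].summary);
      results.push({ a: books[i].id, b: books[j].id, score });
    }
  }
  // Most similar pairs first.
  return results.sort((x, y) => y.score - x.score);
}
```

Starting the inner loop at `i + 1` halves the work compared with a full n × n pass, since the score is symmetric.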
The other stuff is for building the CLI :)
Cheers, Cagatay Cali.