Better Reads is a book recommendation website built on a natural language processing model that was developed with spaCy and hosted on AWS Elastic Beanstalk. It was a collaborative project with three other data scientists and six web developers. Unfortunately, we pulled the plug on the AWS-hosted backend due to mounting costs, but the website is still available on Netlify.
First, we scraped details and descriptions for 30,000 books from a popular online repository. After cleaning the data and removing non-English titles, our final database contained roughly 18,000 books. Using AWS SageMaker notebooks, we trained a vector-based spaCy model with a convolutional neural network architecture on this dataset.
To reduce the size of the deployment, we built a cosine-similarity function that compares a test description against the database and generates recommendations. With the database stored in an AWS S3 bucket and the model hosted on Elastic Beanstalk, we built a Flask app that interfaces with the Netlify website and returns the model’s top 10 recommendations as a JSON object. The resulting ISBNs were then sent to the Google Books API to retrieve the corresponding cover art and book descriptions.
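To make the recommendation step concrete, here is a minimal sketch of cosine-similarity ranking followed by a Google Books lookup URL. It is not the project's actual code: the function names, the dictionary layout, and the toy three-dimensional vectors with placeholder ISBNs are all assumptions for illustration. In the real pipeline the vectors would come from the spaCy model, and the ranking would run inside the Flask app.

```python
import json
import math
from urllib.parse import urlencode


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_recommendations(query_vector, book_vectors, k=10):
    """Return the ISBNs of the k books most similar to the query, best match first."""
    scored = [
        (cosine_similarity(query_vector, vector), isbn)
        for isbn, vector in book_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [isbn for _, isbn in scored[:k]]


def google_books_url(isbn):
    """Build a Google Books volumes query URL for one ISBN (no request is sent here)."""
    query = urlencode({"q": f"isbn:{isbn}"})
    return f"https://www.googleapis.com/books/v1/volumes?{query}"


if __name__ == "__main__":
    # Hypothetical toy data: placeholder ISBNs mapped to made-up description vectors.
    book_vectors = {
        "isbn-a": [1.0, 0.0, 0.0],
        "isbn-b": [0.6, 0.8, 0.0],
        "isbn-c": [0.0, 1.0, 0.0],
    }
    query_vector = [1.0, 0.0, 0.0]

    isbns = top_recommendations(query_vector, book_vectors, k=2)
    # The JSON payload shape is an assumption; the source only says a JSON object was returned.
    print(json.dumps({"recommendations": isbns}))
    for isbn in isbns:
        print(google_books_url(isbn))
```

Ranking with a plain similarity function like this means the deployed service only needs the precomputed book vectors and a way to vectorize the incoming description, which is in keeping with the goal of a smaller deployment.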