BnBalyze, a collaboration project with four Data Scientists and seven Web Developers, utilizes data from 22,522 AirBnB listings in Berlin from November 2018 to predict an optimal price. Our hypothetical customer owns several properties; some are priced too low, leaving extra revenue on the table, and some are priced too high, leading to low occupancy rates.

Starting with a dataset of 96 features, we settled on a model with nine key features:

pic

We ran a combination of exploratory regression models – Random Forest, XGBoost, Keras Neural Network – all had similar accuracy scores. As the final backend version we used a hyperparameter-tuned Random Forest because of both feature explainability and a preference for returning common AirBnB prices such as $40 and not overly specific amounts like $39.15.

pic2

pic3

The feature importance of the final model was unsurprisingly dominated by room type (i.e. ‘shared’, ‘single’, ‘entire apartment’) and the number of guests accommodated.

pic4

Zooming in on the twelve different regions present in the dataset, central tourist-heavy neighborhoods dominate overall room availability and higher priced rentals in particular.

pic5

The final Netlify-hosted site deployment connected to a pickled model Flask application on Heroku. Specific pricing recommendations generally varied only slightly from actual prices, but an extra $7 per night adds up to an extra $2,100 per year assuming 300 nights booked.

pic6