Unveiling Insights for Prospective Hosts using Regression Analysis.
The primary goal was to understand which factors significantly influenced rental prices in different neighborhoods, using Python (Pandas) and data visualization.
Using Multiple Linear Regression and thorough exploratory data analysis, the following were derived:
The project ran for 4 to 6 weeks, during which extensive data cleaning, exploratory data analysis, and predictive modeling were used to gain insight into the factors influencing price fluctuations in AirBnB rentals.
Throughout the project, the P.A.C.E. methodology was used to establish a structured approach to planning and execution. This project management framework helped keep the work organized at every step, manage risks effectively, and stay within scope and timeframe.
The early stages of the project focused on data exploration, data cleaning, and transformation. This required an in-depth review of the data dictionary provided by AirBnB, descriptive statistics of the dataset, and core functionality within Python packages such as Pandas, Seaborn, Matplotlib, and NumPy to investigate the state of the different variables in the data.
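A minimal sketch of what such a cleaning pass can look like in Pandas. The sample rows and the column names (`price`, `bedrooms`, `room_type`) are assumptions for illustration, not the actual dataset schema, and the median fill is only one possible way to handle missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not the real schema.
listings = pd.DataFrame({
    "price": ["$120.00", "$1,250.00", None, "$85.00"],
    "bedrooms": [1, 4, 2, np.nan],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room"],
})

# Inspect data types and missing values before changing anything.
print(listings.dtypes)
print(listings.isna().sum())

# Convert price strings such as "$1,250.00" to floats.
listings["price"] = (
    listings["price"]
    .str.replace(r"[$,]", "", regex=True)
    .astype(float)
)

# Drop rows without a price (the target), then fill missing bedrooms with the median.
cleaned = listings.dropna(subset=["price"]).copy()
cleaned["bedrooms"] = cleaned["bedrooms"].fillna(cleaned["bedrooms"].median())

# Descriptive statistics of the cleaned numeric columns.
print(cleaned.describe())
```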
Once the data was thoroughly evaluated and processed, extensive exploratory data analysis helped investigate relationships between variables, gauge their impact on rental price, and represent the results visually to tell a better story overall.
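As an illustration of this step, the sketch below ranks numeric variables by their correlation with price and compares median price across a categorical variable. The data and column names are hypothetical, and the project's actual EDA may have used different summaries.

```python
import pandas as pd

# Hypothetical cleaned data; values and column names are illustrative only.
eda = pd.DataFrame({
    "price": [95.0, 150.0, 210.0, 320.0, 80.0, 400.0],
    "bedrooms": [1, 2, 2, 3, 1, 4],
    "accommodates": [2, 4, 5, 6, 2, 8],
    "room_type": [
        "Private room", "Entire home/apt", "Entire home/apt",
        "Entire home/apt", "Private room", "Entire home/apt",
    ],
})

# Pearson correlation of each numeric variable with price, strongest first.
price_corr = (
    eda.corr(numeric_only=True)["price"]
    .drop("price")
    .sort_values(ascending=False)
)
print(price_corr)

# Median price per room type, a simple check on a categorical variable.
median_by_type = eda.groupby("room_type")["price"].median()
print(median_by_type)
```

The same correlation matrix could be passed to a heatmap in Seaborn or Matplotlib to present the relationships visually.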
After a consolidated list of variables was selected for modeling, it was time to build a model that would best address rental price predictability. The model needed to be explainable, so a Multiple Linear Regression was chosen as the best low-complexity option.
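A minimal sketch of a multiple linear regression fit, assuming ordinary least squares with an intercept and solved here with NumPy's least-squares routine. The helper `fit_linear_regression`, the sample data, and the feature names are hypothetical; the project may have used a dedicated modeling library instead.

```python
import numpy as np
import pandas as pd

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, r_squared)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coefs, 1.0 - ss_res / ss_tot

# Hypothetical listings; values and column names are illustrative only.
data = pd.DataFrame({
    "price": [90.0, 140.0, 155.0, 230.0, 75.0, 310.0, 120.0, 260.0],
    "bedrooms": [1, 2, 2, 3, 1, 4, 1, 3],
    "amenities_count": [8, 12, 15, 20, 5, 25, 14, 18],
    "room_type": [
        "Private room", "Entire home/apt", "Entire home/apt", "Entire home/apt",
        "Private room", "Entire home/apt", "Private room", "Entire home/apt",
    ],
})

# One-hot encode the categorical variable, dropping one level to avoid collinearity.
features = pd.get_dummies(
    data[["bedrooms", "amenities_count", "room_type"]],
    columns=["room_type"],
    drop_first=True,
    dtype=float,
)

coefs, r2 = fit_linear_regression(features, data["price"])
for name, value in zip(["intercept", *features.columns], coefs):
    print(f"{name}: {value:.2f}")
print(f"R-squared: {r2:.3f}")
```

Each coefficient reads as the expected change in price for a one-unit change in that variable with the others held fixed, which is what makes this model easy to explain.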
To assess how the model behaved in individual neighborhoods, a more focused analysis was performed using specific neighborhoods
as subsets of the data. Evaluating the performance of the model within each neighborhood gave additional insight into the
statistical significance of the selected independent variables. In essence, while the model performed well overall (R-squared of 0.54), performance
at the individual neighborhood level varied drastically (e.g., Sherman Oaks had an R-squared of 0.72, while Santa Monica had 0.22).
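The neighborhood-level comparison can be sketched as below. This uses synthetic data with placeholder neighborhood names and assumes the model is refit on each subset; it does not reproduce the reported values. It only shows how the same feature set can explain price well in one subset and poorly in another.

```python
import numpy as np
import pandas as pd

def r_squared(X, y):
    """R-squared of an ordinary least squares fit (with intercept) of y on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ coefs) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic data: same price relationship in both subsets, different noise levels.
rng = np.random.default_rng(0)
frames = []
for name, noise_sd in [("Neighborhood A", 10.0), ("Neighborhood B", 150.0)]:
    bedrooms = rng.integers(1, 5, size=200)
    price = 60 + 70 * bedrooms + rng.normal(0, noise_sd, size=200)
    frames.append(pd.DataFrame({"neighborhood": name, "bedrooms": bedrooms, "price": price}))
listings = pd.concat(frames, ignore_index=True)

feature_cols = ["bedrooms"]

# Fit on all listings, then refit separately within each neighborhood subset.
overall_r2 = r_squared(listings[feature_cols], listings["price"])
r2_by_neighborhood = {
    name: r_squared(group[feature_cols], group["price"])
    for name, group in listings.groupby("neighborhood")
}

print(f"Overall R-squared: {overall_r2:.2f}")
for name, value in r2_by_neighborhood.items():
    print(f"{name}: {value:.2f}")
```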
Ultimately, the data analysis revealed that the factors influencing rental prices in Los Angeles County varied across different neighborhoods.
While certain factors such as the number of bedrooms, type of rental property, and amenities offered showed statistically significant relationships
with price, the strength of these relationships varied depending on factors like neighborhood competition and demand. These findings emphasize the
importance of conducting further investigation in future projects to gain a deeper understanding of these dynamics.
Some future considerations: