This repository contains the analysis and predictive modeling work completed during DataFest 2025 at UCLA. Our team developed a machine learning approach to predict which markets are most likely to see future healthcare leasing activity. We're proud to have been selected as one of the finalists to present our work at the competition.
The healthcare real estate market is dynamic and influenced by various factors. This project aims to:
- Analyze historical leasing patterns across different industries
- Identify correlations between healthcare leasing and other industry activities
- Build predictive models to forecast markets with high healthcare leasing potential
- Provide actionable insights for real estate investment decisions
The analysis uses a commercial real estate leasing dataset (Leases.csv, not included in this repository) containing:
- Market information
- Year and quarter of lease signing
- Industry classification
- Various metrics, including rent, availability proportion, and leased square footage
During preprocessing, we:
- Converted quarterly data into a structured time-series format
- Created date features from year and quarter information
- Identified top industries by leasing frequency
- Built market-level aggregations of leasing activity
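The preprocessing steps above can be sketched in dependency-free Python. This is a minimal illustration, not the notebook's code: the column names (`market`, `year`, `quarter`, `industry`, `leased_sf`), the `Q1`-`Q4` quarter format, and the helper names are assumptions, and the inline sample stands in for Leases.csv.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import date

# Hypothetical column names and "Q1".."Q4" quarter labels; the real Leases.csv schema may differ.
SAMPLE = """market,year,quarter,industry,leased_sf
Austin,2022,Q1,Healthcare,12000
Austin,2022,Q1,Technology,30000
Austin,2022,Q2,Healthcare,8000
Denver,2022,Q1,Legal,5000
Denver,2022,Q2,Healthcare,9500
"""


def quarter_start(year: int, quarter: str) -> date:
    """Map a year and a quarter label such as 'Q3' to the first day of that quarter."""
    q = int(quarter.strip().upper().lstrip("Q"))
    if not 1 <= q <= 4:
        raise ValueError(f"bad quarter: {quarter!r}")
    return date(year, 3 * (q - 1) + 1, 1)


def load_rows(text: str) -> list:
    """Parse CSV text and attach a `period` date built from year and quarter."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["year"] = int(row["year"])
        row["leased_sf"] = float(row["leased_sf"])
        row["period"] = quarter_start(row["year"], row["quarter"])
        rows.append(row)
    return rows


def top_industries(rows, n=3):
    """Return the n industries with the most lease records."""
    return [name for name, _ in Counter(r["industry"] for r in rows).most_common(n)]


def market_quarter_counts(rows):
    """Count lease records per (market, period, industry)."""
    counts = defaultdict(int)
    for r in rows:
        counts[(r["market"], r["period"], r["industry"])] += 1
    return dict(counts)


# Example on the inline sample
rows = load_rows(SAMPLE)
top = top_industries(rows, n=1)  # ['Healthcare']
counts = market_quarter_counts(rows)
```

Keying counts by a real `date` rather than separate year and quarter fields makes the series sort chronologically without extra logic.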
We examined correlations between healthcare leasing and other industry leasing patterns to identify potential leading indicators.
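One way to look for leading indicators is to correlate healthcare lease counts in a quarter with another industry's counts some quarters earlier. The sketch below assumes the per-market quarterly count series are already aligned; `pearson` and `lagged_correlation` are hypothetical helpers, and the toy data is invented so that the lag is easy to see.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(healthcare, other, lag):
    """Correlate healthcare counts in quarter t with another industry's counts in quarter t - lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(healthcare, other)
    return pearson(healthcare[lag:], other[:-lag])


# Toy quarterly lease counts for one market, where healthcare echoes "other" one quarter later
other = [1, 5, 2, 8, 3, 9]
healthcare = [0, 1, 5, 2, 8, 3]
same_quarter = lagged_correlation(healthcare, other, lag=0)  # negative for this toy data
one_quarter_lead = lagged_correlation(healthcare, other, lag=1)  # 1.0 up to rounding
```

A high correlation at a positive lag only suggests a candidate leading indicator; it does not establish that one industry's leasing drives the other's.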
For feature engineering, we:
- Created lagged predictors from previous quarters
- Generated industry-specific leasing counts
- Incorporated COVID-19 impact indicators
- Built market clustering features
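Lagged predictors and a COVID-19 indicator might be built per market roughly as below. The `COVID_YEARS` window, the field names, and the choice of lags (1, 2, and 4 quarters) are illustrative assumptions; industry-specific counts and market clustering features are not shown.

```python
from collections import defaultdict

# Hypothetical definition of the COVID-affected window; the notebook may define it differently.
COVID_YEARS = {2020, 2021}


def build_features(records, lags=(1, 2, 4)):
    """Add per-market lagged healthcare counts and a COVID indicator.

    `records` are market-quarter rows with 'market', 'year', 'quarter' (1-4) and
    'healthcare_leases'. Lags are counted in rows, so this assumes each market has
    one row per quarter with no gaps.
    """
    by_market = defaultdict(list)
    for r in records:
        by_market[r["market"]].append(r)

    features = []
    for rows in by_market.values():
        rows.sort(key=lambda r: (r["year"], r["quarter"]))
        for i, row in enumerate(rows):
            feats = dict(row)
            for k in lags:
                # None marks quarters with too little history for this lag
                feats[f"hc_lag_{k}"] = rows[i - k]["healthcare_leases"] if i >= k else None
            feats["covid_period"] = int(row["year"] in COVID_YEARS)
            features.append(feats)
    return features


# Example: unsorted input for two markets
features = build_features([
    {"market": "A", "year": 2020, "quarter": 2, "healthcare_leases": 2},
    {"market": "A", "year": 2019, "quarter": 4, "healthcare_leases": 3},
    {"market": "A", "year": 2020, "quarter": 1, "healthcare_leases": 5},
    {"market": "B", "year": 2021, "quarter": 1, "healthcare_leases": 1},
])
a_2020q2 = next(f for f in features if f["market"] == "A" and (f["year"], f["quarter"]) == (2020, 2))
```

Grouping by market before shifting keeps one market's history from leaking into another's lag columns.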
We implemented 4 modeling strategies:
- Linear regression for initial exploration
- Time-series validation to assess model performance across years
- XGBoost classification to predict markets with future healthcare leasing
- Ensemble methods with calibrated probabilities for final predictions
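For the ensemble step, a simple approach is to average the models' predicted probabilities and then calibrate the averaged score on held-out data. The sketch uses isotonic regression (pool adjacent violators) as the calibrator; this is an assumed choice, since the notebook may have used Platt scaling or another method, and the function names are hypothetical.

```python
import bisect
from itertools import groupby


def average_ensemble(prob_lists):
    """Element-wise mean of several models' predicted probabilities for the same markets."""
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all models must score the same markets")
    return [sum(p[i] for p in prob_lists) / len(prob_lists) for i in range(n)]


def fit_isotonic(scores, labels):
    """Fit a non-decreasing map from raw score to probability (pool adjacent violators).

    Returns (thresholds, values): a score s maps to values[j] for the last j with
    thresholds[j] <= s. Fit this on held-out data, not on the model's training rows.
    """
    blocks = []  # each block: [lowest score, sum of labels, count]
    for s, group in groupby(sorted(zip(scores, labels)), key=lambda p: p[0]):
        ys = [y for _, y in group]
        blocks.append([s, float(sum(ys)), len(ys)])
        # Merge backwards while the previous block's mean label exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][1] * blocks[-1][2] > blocks[-1][1] * blocks[-2][2]:
            _, total, count = blocks.pop()
            blocks[-1][1] += total
            blocks[-1][2] += count
    return [b[0] for b in blocks], [b[1] / b[2] for b in blocks]


def apply_isotonic(model, score):
    """Look up the calibrated probability for a raw score (clamped below the lowest threshold)."""
    thresholds, values = model
    idx = bisect.bisect_right(thresholds, score) - 1
    return values[max(idx, 0)]


# Example: two models scoring three markets, calibrated on a small held-out set
ensemble = average_ensemble([[0.2, 0.8, 0.5], [0.4, 0.6, 0.3]])
calibrator = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
calibrated = [apply_isotonic(calibrator, p) for p in ensemble]
```

Because the fitted map is non-decreasing, calibration largely preserves the ranking of markets, though it can merge nearby scores into ties.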
We evaluated the models using:
- Temporal cross-validation with forward chaining
- A particular focus on performance in 2021-2023
- ROC-AUC and PR-AUC metrics
- Probability calibration for better decision support
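The forward-chaining scheme and the two ranking metrics can be sketched without libraries. Treating each calendar year as one fold is an assumption, as is the 2018 start year in the example; `average_precision` is one common estimator of PR-AUC, and a trapezoidal estimate would give slightly different numbers.

```python
def forward_chaining_splits(years, first_test_year):
    """Yield (train_years, test_year) pairs: each test year is trained on all earlier years."""
    ordered = sorted(set(years))
    for test_year in ordered:
        if test_year >= first_test_year:
            yield [y for y in ordered if y < test_year], test_year


def roc_auc(labels, scores):
    """ROC-AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common summary of the precision-recall curve (PR-AUC)."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this positive's rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Example: yearly folds with 2021-2023 as test years, and metrics on toy predictions
splits = list(forward_chaining_splits(range(2018, 2024), first_test_year=2021))
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # 0.75
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # 5/6
```

Training only on years before each test year keeps future information out of the model, which matters when leasing conditions shift, as they did around COVID-19.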
Key findings:
- Healthcare leasing activity shows significant correlations with leasing in other industry sectors
- Models achieved calibrated AUC scores between 0.46 and 0.71 across test years
- Feature importance analysis reveals key drivers of healthcare leasing
- Market-level predictions provide actionable insights for future investment
Project files:
- predicting_healthcare.ipynb: Main analysis notebook containing all code and visualizations
- Leases.csv: Dataset containing commercial real estate leasing information (not included in this repository)
- Various PNG files: Visualizations of key metrics and findings
- Presentation_Datathon.pdf: Final presentation of findings
The repository includes several visualizations:
- Availability proportion over time by building class
- Average rent over time by building class
- Healthcare opportunity by state
- Leased square feet over time by industry
- Net growth in leased space across top markets
- Senior population demographics
The final model's predictive performance improved across the test years:
- Calibrated AUC rose from 0.464 in 2021 (slightly below the 0.5 level of random guessing) to 0.714 in 2023
- Feature importance analysis identified key predictors of healthcare leasing activity
- Market ranking by probability provides actionable investment guidance
This project demonstrates that:
- Healthcare real estate leasing can be predicted using historical patterns from multiple industries
- The model provides practical guidance for identifying high-opportunity markets
- Temporal validation shows improving model performance over recent years
- Feature engineering and ensemble approaches yield robust predictions
DataFest 2025 Team at UCLA: Clyde Villacrusis, Mindy Zhu, Sanskriti Shindadkar, Selena Lam, and Vivian Yee
We thank the DataFest 2025 organizers and UCLA for providing this opportunity to work on a real-world data science challenge.