Clyde0513/datafest_2025

Healthcare Real Estate Leasing Prediction

DataFest 2025 - UCLA Project

This repository contains the analysis and predictive modeling work completed during DataFest 2025 at UCLA. Our team developed a machine learning approach to predict which markets are most likely to see future healthcare leasing activity. We're proud to have been selected as one of the finalists to present our work at the competition.

Project Overview

The healthcare real estate market is dynamic and influenced by various factors. This project aims to:

  1. Analyze historical leasing patterns across different industries
  2. Identify correlations between healthcare leasing and other industry activities
  3. Build predictive models to forecast markets with high healthcare leasing potential
  4. Provide actionable insights for real estate investment decisions

Dataset

The analysis uses commercial real estate leasing data (Leases.csv, not included in the repo) containing:

  • Market information
  • Year and quarter of lease signing
  • Industry classification
  • Various metrics including rent, availability proportion, leased square footage

Methodology

Data Preprocessing

  • Converted quarterly data into structured time series format
  • Created date features from year and quarter information
  • Identified top industries by leasing frequency
  • Built market-level aggregations of leasing activity
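
The preprocessing steps above can be sketched with pandas. This is a minimal illustration, not the notebook's code: the toy rows and the column names (`market`, `year`, `quarter`, `industry`) are assumptions, since the actual schema of Leases.csv is not shown here.

```python
import pandas as pd

# Toy rows standing in for Leases.csv; column names are assumptions.
leases = pd.DataFrame({
    "market": ["Austin", "Austin", "Austin", "Boston", "Boston"],
    "year": [2022, 2022, 2023, 2022, 2023],
    "quarter": ["Q1", "Q1", "Q2", "Q4", "Q2"],
    "industry": ["Healthcare", "Technology", "Healthcare", "Legal", "Healthcare"],
})

# Combine year and quarter into a quarterly period, e.g. "2022Q1".
leases["period"] = pd.PeriodIndex(
    leases["year"].astype(str) + leases["quarter"], freq="Q"
)
# First calendar day of each quarter, as a plain timestamp feature.
leases["date"] = leases["period"].dt.start_time

# Industries ordered by number of leases; keep the most frequent ones.
top_industries = leases["industry"].value_counts().head(2).index.tolist()

# Market-level lease counts per quarter, one column per industry.
market_counts = (
    leases.groupby(["market", "period", "industry"])
    .size()
    .unstack("industry", fill_value=0)
)
```

Using a quarterly period rather than a raw string keeps the rows sortable in time, which the lagging and temporal validation steps rely on.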

Correlation Analysis

We examined correlations between healthcare leasing and other industry leasing patterns to identify potential leading indicators.
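
One way to look for leading indicators is to correlate each industry's counts from an earlier quarter with healthcare counts in the current quarter. The sketch below assumes a table of quarterly counts per industry; the numbers are made up so that Technology leads Healthcare by exactly one quarter, and `lagged_corr` is a hypothetical helper, not a function from the notebook.

```python
import pandas as pd

# Hypothetical quarterly lease counts for a single market.
counts = pd.DataFrame(
    {
        "Technology": [3, 5, 2, 8, 6, 9, 4, 7],
        "Healthcare": [1, 3, 5, 2, 8, 6, 9, 4],
    },
    index=pd.period_range("2021Q1", periods=8, freq="Q"),
)


def lagged_corr(df, target, lag):
    """Pearson correlation of every other column at t - lag with the target at t."""
    others = df.drop(columns=target).shift(lag)
    # corrwith ignores the rows where shifting introduced missing values.
    return others.corrwith(df[target])


same_quarter = lagged_corr(counts, "Healthcare", 0)
one_quarter_lead = lagged_corr(counts, "Healthcare", 1)
```

A correlation that is higher at a positive lag than at lag zero suggests the other industry moves first, though with only a few quarters of data such estimates are noisy.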

Feature Engineering

  • Created lagged predictors from previous quarters
  • Generated industry-specific leasing counts
  • Incorporated COVID-19 impact indicators
  • Built market clustering features
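
A rough sketch of the lagged predictors and the COVID-19 indicator follows. The panel layout, the column names, and the choice of 2020Q1 as the start of the COVID window are all assumptions made for illustration; the clustering features are not shown.

```python
import pandas as pd

# Hypothetical market-by-quarter panel of healthcare lease counts.
panel = pd.DataFrame({
    "market": ["Austin"] * 4 + ["Boston"] * 4,
    "period": list(pd.period_range("2019Q4", periods=4, freq="Q")) * 2,
    "healthcare_leases": [2, 0, 1, 3, 5, 4, 0, 2],
}).sort_values(["market", "period"])

# Lag within each market so one market's history never leaks into another's.
for lag in (1, 2):
    panel[f"healthcare_lag{lag}"] = (
        panel.groupby("market")["healthcare_leases"].shift(lag)
    )

# Assumed COVID window: every quarter from 2020Q1 onward is flagged.
covid_start = pd.Period("2020Q1", freq="Q")
panel["covid"] = (panel["period"] >= covid_start).astype(int)
```

The first quarters of each market have no history, so their lag columns are missing; those rows need to be dropped or imputed before model fitting.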

Modeling Approach

Our modeling work had four components:

  1. Linear regression for initial exploration
  2. Time-series validation to assess model performance across years
  3. XGBoost classification to predict markets with future healthcare leasing
  4. Ensemble methods with calibrated probabilities for final predictions
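
The classification and calibration steps might look roughly like the sketch below. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so that the example depends on one library only, and it runs on synthetic features; neither the data nor the hyperparameters come from the notebook.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for market-quarter features such as lagged lease counts.
n_rows = 400
X = rng.normal(size=(n_rows, 3))
# Synthetic label: 1 if the market sees healthcare leasing in the next period.
noise = rng.normal(scale=0.5, size=n_rows)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

X_train, y_train = X[:300], y[:300]
X_new = X[300:]

# Gradient-boosted trees wrapped in sigmoid (Platt) calibration,
# with the calibrator fitted by 3-fold cross-validation.
base = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
model = CalibratedClassifierCV(base, method="sigmoid", cv=3)
model.fit(X_train, y_train)

# Calibrated probability of the positive class for each unseen row.
proba = model.predict_proba(X_new)[:, 1]

# Order rows (markets) from most to least likely.
ranking = np.argsort(-proba)
```

Calibration matters here because the output is read as a probability when ranking markets, not only as a class label.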

Validation Strategy

  • Temporal cross-validation with forward-chaining
  • Special focus on 2021-2023 performance
  • ROC-AUC and PR-AUC metrics
  • Probability calibration for better decision support
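
Forward-chaining validation trains on all years before a test year and evaluates on that year only. The sketch below is an illustration on synthetic data: `forward_chaining_splits` is a hypothetical helper, logistic regression stands in for the project's models, and average precision is used as the PR-AUC summary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score


def forward_chaining_splits(years, min_train_years=2):
    """Yield (train_mask, test_mask, test_year), training only on earlier years."""
    unique_years = np.sort(np.unique(years))
    for test_year in unique_years[min_train_years:]:
        yield years < test_year, years == test_year, int(test_year)


rng = np.random.default_rng(0)

# Synthetic data: 80 market rows per year for 2019 through 2023.
years = np.repeat(np.arange(2019, 2024), 80)
X = rng.normal(size=(years.size, 2))
y = (X[:, 0] + rng.normal(scale=1.0, size=years.size) > 0).astype(int)

scores = {}
for train, test, test_year in forward_chaining_splits(years):
    model = LogisticRegression().fit(X[train], y[train])
    proba = model.predict_proba(X[test])[:, 1]
    scores[test_year] = {
        "roc_auc": roc_auc_score(y[test], proba),
        "pr_auc": average_precision_score(y[test], proba),
    }
```

With two years reserved for the first training window, the test years are 2021, 2022, and 2023, and no fold ever trains on data from its own test year or later.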

Key Findings

  1. Healthcare leasing activity shows significant correlations with other industry sectors
  2. Models achieved calibrated AUC scores between 0.46 and 0.71 across test years
  3. Feature importance analysis reveals key drivers of healthcare leasing
  4. Market-level predictions provide actionable insights for future investment

Files in Repository

  • predicting_healthcare.ipynb: Main analysis notebook containing all code and visualizations
  • Leases.csv: Dataset containing commercial real estate leasing information (referenced by the notebook but not included in the repo)
  • Various PNG files: Visualizations of key metrics and findings
  • Presentation_Datathon.pdf: Final presentation of findings

Visualizations

The repository includes several visualizations:

  • Availability proportion over time by building class
  • Average rent over time by building class
  • Healthcare opportunity by state
  • Leased square feet over time by industry
  • Net growth in leased space across top markets
  • Senior population demographics

Model Performance

The final model's performance improved across test years, from below chance to moderate discrimination:

  • Calibrated AUC scores rose from 0.464 in 2021 (below the 0.5 chance level) to 0.714 in 2023
  • Feature importance analysis identified key predictors of healthcare leasing activity
  • Market ranking by probability provides actionable investment guidance

Conclusions

This project demonstrates that:

  1. Healthcare real estate leasing can be predicted using historical patterns from multiple industries
  2. The model provides practical guidance for identifying high-opportunity markets
  3. Temporal validation shows improving model performance over recent years
  4. Feature engineering and ensemble approaches yield robust predictions

Contributors

DataFest 2025 Team at UCLA -- Clyde Villacrusis, Mindy Zhu, Sanskriti Shindadkar, Selena Lam, and Vivian Yee

Acknowledgements

We thank the DataFest 2025 organizers and UCLA for providing this opportunity to work on a real-world data science challenge.
