MURD-ViT (Urban Retrofitting Detection with Vision Transformer) is a multimodal deep learning pipeline for detecting urban retrofitting interventions: micro-scale upgrades to existing urban spaces. It uses a Vision Transformer (ViT) backbone to classify urban change from temporal Google Street View (GSV) imagery combined with demographic data (changes in population density).
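A minimal sketch of how the two modalities could be fused. It assumes each GSV image has already been encoded by a ViT into a fixed-length embedding (for example, the [CLS] token output). The helpers `fuse_features` and `classify`, the difference-vector design, and the linear head are illustrative assumptions, not the project's actual architecture. NumPy stands in for a deep learning framework so the example runs on its own.

```python
import numpy as np

def fuse_features(emb_before, emb_after, pop_density, pop_density_pct_change):
    """Hypothetical late fusion of two ViT embeddings with demographic features.

    emb_before, emb_after: (N, D) embeddings of the earlier and later GSV image.
    pop_density, pop_density_pct_change: (N,) demographic features.
    """
    emb_before = np.asarray(emb_before, dtype=float)
    emb_after = np.asarray(emb_after, dtype=float)
    # The difference vector emphasizes what changed between the two captures.
    diff = emb_after - emb_before
    demo = np.column_stack([pop_density, pop_density_pct_change]).astype(float)
    # Standardize demographic columns (batch statistics here for brevity;
    # a real pipeline would reuse training-set statistics).
    demo = (demo - demo.mean(axis=0)) / (demo.std(axis=0) + 1e-8)
    return np.concatenate([emb_before, emb_after, diff, demo], axis=1)

def classify(fused, weights, bias):
    """Linear classification head returning softmax class probabilities."""
    logits = fused @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Toy example: 4 locations, 8-dim embeddings, 3 hypothetical change classes.
rng = np.random.default_rng(0)
n, d, n_classes = 4, 8, 3
e0 = rng.normal(size=(n, d))
e1 = rng.normal(size=(n, d))
fused = fuse_features(e0, e1, rng.uniform(1000, 5000, n), rng.uniform(-10, 10, n))
W = rng.normal(size=(fused.shape[1], n_classes)) * 0.1  # random, untrained weights
b = np.zeros(n_classes)
probs = classify(fused, W, b)
print(fused.shape)  # (4, 26)
```

Concatenating both embeddings plus their difference lets the head see the scene at each time step as well as the change between them; in a trained model the head's weights would be learned rather than random.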
Key Features
- Multimodal Fusion: Combines temporal image pairs (GSV captures of the same location at two points in time) with demographic features such as population density and its percentage change.
- ViT-based Backbone: Uses a Vision Transformer, whose self-attention over image patches captures global spatial dependencies in street view images.
- Spatial Stratified Sampling: Uses K-Means clustering of sample locations to ensure geographic diversity and a balanced class distribution.
- Robust Evaluation: Reports Top-2 Accuracy to account for the rarity of urban retrofitting events and the resulting class imbalance.
- Geospatial Visualization: Includes interactive Folium maps to visualize sample distributions across study areas.
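The spatial stratified sampling step can be sketched as follows, assuming each sample has a (lat, lon) coordinate and a class label. K-Means groups locations into spatial clusters, and samples are then drawn from every (cluster, class) stratum. The function name, the use of scikit-learn's `KMeans`, and the `n_clusters`/`per_stratum` values are hypothetical; the pipeline's actual settings are not specified here.

```python
import numpy as np
from sklearn.cluster import KMeans

def spatial_stratified_sample(coords, labels, n_clusters=5, per_stratum=2, seed=0):
    """Hypothetical helper: sample indices stratified by spatial cluster and class.

    coords: (N, 2) array of (lat, lon); labels: (N,) class labels.
    Returns (sorted sample indices, cluster id per point).
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # Cluster locations; raw lat/lon is treated as planar, which is a
    # reasonable approximation within a single city.
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(coords)
    rng = np.random.default_rng(seed)
    chosen = []
    # Draw up to `per_stratum` points from each (cluster, class) stratum so every
    # area and every class is represented in the sample.
    for c in np.unique(clusters):
        for y in np.unique(labels):
            idx = np.flatnonzero((clusters == c) & (labels == y))
            if idx.size:
                take = min(per_stratum, idx.size)
                chosen.extend(rng.choice(idx, size=take, replace=False).tolist())
    return np.array(sorted(chosen)), clusters

# Toy data: two well-separated neighborhoods with random binary labels.
rng = np.random.default_rng(1)
coords = np.vstack([
    rng.normal(loc=(40.0, -74.0), scale=0.01, size=(50, 2)),
    rng.normal(loc=(40.2, -73.8), scale=0.01, size=(50, 2)),
])
labels = rng.integers(0, 2, size=100)
idx, clusters = spatial_stratified_sample(coords, labels, n_clusters=2, per_stratum=3)
```

Sampling per (cluster, class) stratum, rather than uniformly, keeps dense areas and common classes from dominating the selected set.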
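Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. A minimal NumPy sketch follows; the helper `top_k_accuracy` is hypothetical and the probabilities are toy values, not project results.

```python
import numpy as np

def top_k_accuracy(probs, y_true, k=2):
    """Fraction of samples whose true class is among the k highest scores."""
    probs = np.asarray(probs)
    y_true = np.asarray(y_true)
    # argsort is ascending, so the last k columns are the k largest scores per row.
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == y_true[:, None]).any(axis=1)
    return hits.mean()

probs = np.array([
    [0.60, 0.30, 0.10],  # true 1: ranked second
    [0.20, 0.50, 0.30],  # true 0: ranked last
    [0.10, 0.20, 0.70],  # true 2: ranked first
    [0.40, 0.35, 0.25],  # true 2: ranked last
])
y = np.array([1, 0, 2, 2])
print(top_k_accuracy(probs, y, k=1))  # 0.25
print(top_k_accuracy(probs, y, k=2))  # 0.5
```

With rare classes, a model may rank the correct retrofitting type second behind a more frequent one; Top-2 accuracy credits that ranking, while Top-1 accuracy does not.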