[Figure: Solar irradiance probabilistic forecast visualization]
EPFL CFI Lab · school · research

Uncertainty-Aware Solar Forecasting

Developed probabilistic solar irradiance forecasting models that quantify prediction uncertainty, enabling more reliable grid integration of solar energy.

ml · probabilistic · forecasting
  • Implemented Bayesian neural networks and ensemble methods for probabilistic irradiance prediction
  • Achieved a 15% CRPS improvement over deterministic baselines
  • Built a pipeline for real-time ingestion of meteorological data from Swiss weather stations
  • Implemented an evaluation suite for probabilistic forecast quality (CRPS, reliability diagrams)
Stack: Python, PyTorch, Scikit-learn, Pandas, Matplotlib
Role: Research project
Team: 3 people

Overview

Integrating solar energy into the power grid takes more than answering "how much sun do we expect?": grid operators also need to know how confident that prediction is. This was a 3-person team project at EPFL's Chair of Finance and Insurance Lab on uncertainty quantification in short-term solar irradiance forecasting.

What I Built

A full forecasting pipeline from raw meteorological data to probabilistic predictions:

Data Pipeline

  • Ingested historical solar irradiance measurements from MeteoSwiss stations
  • Incorporated webcam sky imagery alongside meteorological features for richer input
  • Feature engineering: solar geometry (zenith angle, azimuth), cloud cover indices, lagged irradiance values, time-of-day cyclical features
  • Robust handling of missing data, sensor anomalies, and seasonal patterns
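Two of the features listed above, cyclical time-of-day encoding and lagged irradiance, can be sketched in a few lines of Pandas. This is a minimal illustration rather than the project's actual code: the function name `add_time_features`, the column name `ghi`, and the lag choices are all hypothetical.

```python
import numpy as np
import pandas as pd


def add_time_features(df: pd.DataFrame, lags=(1, 2, 3)) -> pd.DataFrame:
    """Add cyclical time-of-day features and lagged irradiance columns.

    Assumes a DatetimeIndex and an irradiance column named "ghi"
    (a hypothetical name used only for this sketch).
    """
    out = df.copy()

    # Fractional hour of day, mapped onto the unit circle so that
    # 23:59 and 00:00 end up close together in feature space.
    hours = out.index.hour.to_numpy() + out.index.minute.to_numpy() / 60.0
    angle = 2.0 * np.pi * hours / 24.0
    out["tod_sin"] = np.sin(angle)
    out["tod_cos"] = np.cos(angle)

    # Lagged irradiance: the value observed k steps earlier.
    # The first k rows have no history and are left as NaN.
    for k in lags:
        out[f"ghi_lag{k}"] = out["ghi"].shift(k)

    return out
```

The sine/cosine pair avoids the artificial jump that a raw hour-of-day integer would introduce at midnight. Rows with NaN lags would still need to be dropped or imputed before training.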

Modeling

  • Deterministic baselines: Gradient boosted trees (XGBoost), feedforward neural networks
  • Probabilistic models:
    • Monte Carlo Dropout for approximate Bayesian inference
    • Deep Ensembles (5 independently trained neural networks)
    • Quantile Regression Neural Networks
    • Gaussian Process regression for short horizons
  • Post-hoc calibration: Isotonic regression and temperature scaling
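Quantile regression networks are typically trained with the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. The sketch below shows that loss in plain NumPy for a single quantile level `q`; the function name is illustrative, and the project's actual training loss is not documented here.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q in (0, 1).

    Under-prediction (y_true > y_pred) is weighted by q,
    over-prediction by (1 - q).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))
```

For `q = 0.9`, predicting 2 units too low costs 1.8 while predicting 2 units too high costs only 0.2, which pushes the prediction toward the 90th percentile of the target.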

Evaluation Framework

  • Continuous Ranked Probability Score (CRPS) as the primary metric
  • Reliability diagrams and sharpness analysis
  • Coverage analysis at multiple confidence levels (50%, 80%, 90%, 95%)
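As a rough illustration of the two headline metrics, the sketch below estimates CRPS from forecast samples (for example ensemble members or MC Dropout draws) using the identity CRPS = E|X - y| - 0.5 E|X - X'|, and computes the empirical coverage of a prediction interval. The function names are assumptions for this example, not the project's API.

```python
import numpy as np


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one observation y.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X' are
    independent draws from the predictive distribution.
    """
    samples = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(samples - y))
    term_spread = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(term_obs - 0.5 * term_spread)


def empirical_coverage(lower, upper, y):
    """Fraction of observations that fall inside [lower, upper]."""
    lower, upper, y = (np.asarray(a, dtype=float) for a in (lower, upper, y))
    return float(np.mean((y >= lower) & (y <= upper)))
```

With a single sample the spread term vanishes and the estimate reduces to the absolute error, which is why CRPS can be compared directly against the MAE of a deterministic forecast.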

Technical Details

The key insight was that different sources of uncertainty matter at different forecast horizons:

  • Short-term (< 1 hour): Aleatoric uncertainty dominates — mainly from rapid cloud transients. MC Dropout captured this well.
  • Medium-term (1–6 hours): Both aleatoric and epistemic uncertainty are significant. Deep Ensembles performed best here.
  • Day-ahead: Epistemic uncertainty dominates — model uncertainty about weather patterns. Gaussian Processes gave the best-calibrated uncertainty estimates.
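One common way to separate the two kinds of uncertainty with an ensemble is the law of total variance: the average of the members' predicted variances approximates the aleatoric part, and the variance of the members' means approximates the epistemic part. The sketch below assumes each member outputs a Gaussian mean and variance per time step, which is an assumption made for illustration and not something stated about the project's models.

```python
import numpy as np


def decompose_uncertainty(member_means, member_vars):
    """Split ensemble predictive variance into aleatoric and epistemic parts.

    Both inputs have shape (n_members, n_points).
    Returns (aleatoric, epistemic, total), each of shape (n_points,).
    """
    member_means = np.asarray(member_means, dtype=float)
    member_vars = np.asarray(member_vars, dtype=float)

    # Noise each member attributes to the data itself.
    aleatoric = member_vars.mean(axis=0)
    # Disagreement between members about the mean.
    epistemic = member_means.var(axis=0)

    return aleatoric, epistemic, aleatoric + epistemic
```

When all members agree on the mean, the epistemic term is zero and the remaining variance is purely aleatoric.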

We took two complementary approaches: quantile regression for sharp, computationally efficient prediction intervals, and Bayesian neural networks for a broader view of uncertainty. Quantile regression gave us the best-calibrated intervals, especially when meteorological data was incorporated. BNNs produced wider intervals and needed more compute, but captured a richer picture of what the model didn't know.

We also implemented a horizon-adaptive ensemble that blended predictions from different models based on the forecast horizon, weighted by their historical CRPS performance.
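The exact weighting scheme is not spelled out here, so the following is only a sketch of one plausible choice: weights proportional to the inverse of each model's historical CRPS at a given horizon, normalized to sum to one. The function names and the inverse-CRPS rule are assumptions.

```python
import numpy as np


def inverse_crps_weights(crps_table):
    """Per-horizon model weights from historical CRPS (lower is better).

    crps_table has shape (n_horizons, n_models) with strictly positive
    entries. Each returned row sums to 1.
    """
    inv = 1.0 / np.asarray(crps_table, dtype=float)
    return inv / inv.sum(axis=1, keepdims=True)


def blend(model_predictions, horizon_weights):
    """Weighted average of the models' predictions for one horizon.

    The same weights can be applied to each predicted quantile in turn.
    """
    return float(np.dot(horizon_weights, model_predictions))
```

For example, if model A has a historical CRPS of 1.0 and model B of 3.0 at some horizon, the weights come out as 0.75 and 0.25, so the blend leans toward the model that has scored better there.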

Challenges & Tradeoffs

  • Calibration vs. sharpness tradeoff: A model can trivially reach or exceed nominal coverage by predicting very wide intervals, but such intervals are useless in practice. We optimized for CRPS, a proper scoring rule that rewards both calibration and sharpness.
  • Computational cost: Gaussian Processes don't scale well to large datasets. We used sparse GP approximations with inducing points.
  • Non-stationarity: Solar irradiance patterns change seasonally. We implemented online learning with exponential decay weighting of historical data.
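The exponential decay weighting mentioned in the last point can be illustrated as follows. This is a sketch under assumptions: the half-life parameterization and the function name are hypothetical, and the project's actual decay rate is not given.

```python
import numpy as np


def exponential_decay_weights(ages_days, half_life_days):
    """Normalized sample weights that halve every half_life_days.

    ages_days: age of each training sample in days (0 = most recent).
    """
    ages = np.asarray(ages_days, dtype=float)
    raw = 0.5 ** (ages / half_life_days)
    return raw / raw.sum()
```

Weights like these can be passed as `sample_weight` to scikit-learn estimators that support it, or multiplied into a per-sample loss when training a neural network, so that recent conditions dominate the fit.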

Results

  • 15% CRPS improvement over deterministic baselines
  • Well-calibrated intervals: 90% prediction intervals contained the true value 89.2% of the time, close to the nominal level
  • The horizon-adaptive ensemble outperformed any single model across all forecast horizons
  • Results presented to the lab group; potential extension to wind power forecasting discussed

What I Learned

  • Probabilistic forecasting is a genuinely different way of thinking — it changes how you design models, not just how you evaluate them
  • CRPS matters more than MSE when you care about the full predictive distribution
  • Real sensor data is messy in ways that textbook datasets aren't — missing values, calibration drift, timestamps that don't quite line up
  • Uncertainty quantification isn't just academic — it directly translates to economic value in energy trading