GuzzLabs

EPWForge — Data Sources & Methods

Transparency in how EPWForge generates, transforms, and presents climate and weather file data.

EPWForge generates simulation-ready weather files for building energy modeling using physically consistent methods based on CMIP6 climate projections, ERA5 reanalysis, and ASHRAE-aligned workflows.

The result is distribution-aware, physically coherent weather data that improves confidence in peak load estimation, HVAC sizing, and resilience analysis.

What makes it different

Physically consistent percentiles

Each percentile corresponds to a single climate model realization across all variables, preserving coherent weather states and avoiding physically impossible combinations created by variable-by-variable ranking.

Future design conditions from full hourly data

ASHRAE-style design values are computed directly from morphed 8760-hour datasets — not by applying offsets to historical design conditions.

Per-model climate ensembles

Multiple EPW files per scenario are generated using individual CMIP6 model deltas, enabling explicit evaluation of inter-model uncertainty.

Global, site-flexible weather generation

Weather files can be generated for any location on Earth using ERA5-based datasets — not limited to predefined weather stations.

System overview

Baseline Weather (TMY / ERA5)
            +
      CMIP6 Climate Deltas
            ↓
   Morphing + QA/QC + Ensembles
            ↓
EPW Files • Design Conditions • Extreme Events

Key Differentiators

EPWForge departs from conventional weather file workflows in several important ways:

Physically consistent percentiles

Percentiles are assigned at the model level. All variables at a given percentile originate from the same CMIP6 model realization, preserving internally consistent relationships between temperature, humidity, solar radiation, and wind. Approaches that rank each variable independently can produce physically impossible weather states (for example, very high temperature paired with very low solar radiation drawn from different models).
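The difference between model-level and variable-by-variable ranking can be sketched in a few lines. This is an illustration only: the model names, the delta values, the choice of temperature delta as the ranking variable, and the nearest-rank percentile rule are all assumptions, not EPWForge's documented implementation.

```python
import math

# Hypothetical per-model annual deltas; names and numbers are illustrative only.
model_deltas = {
    "model_a": {"dT": 1.8, "dRH": -1.0, "dGHI": 2.0},
    "model_b": {"dT": 2.4, "dRH": -2.5, "dGHI": 3.5},
    "model_c": {"dT": 3.1, "dRH": -3.0, "dGHI": 1.0},
    "model_d": {"dT": 2.0, "dRH": 0.5, "dGHI": -1.5},
    "model_e": {"dT": 2.7, "dRH": -1.5, "dGHI": 4.0},
}


def nearest_rank(n_items, percentile):
    """1-based nearest-rank position for a percentile in a list of n_items."""
    return max(1, math.ceil(percentile / 100 * n_items))


def model_at_percentile(deltas, percentile, rank_key="dT"):
    """Model-level ranking: pick one model, then take ALL of its variables."""
    ranked = sorted(deltas, key=lambda name: deltas[name][rank_key])
    name = ranked[nearest_rank(len(ranked), percentile) - 1]
    return name, deltas[name]


def per_variable_percentile(deltas, percentile):
    """Variable-by-variable ranking: each variable may come from a different model."""
    variables = next(iter(deltas.values())).keys()
    mixed = {}
    for var in variables:
        values = sorted(d[var] for d in deltas.values())
        mixed[var] = values[nearest_rank(len(values), percentile) - 1]
    return mixed


name, coherent = model_at_percentile(model_deltas, 95)
mixed = per_variable_percentile(model_deltas, 95)
print(name, coherent)  # every value comes from one model
print(mixed)           # values stitched together from several models
```

In this toy ensemble the per-variable result combines the temperature delta of one model with the humidity and solar deltas of two others, a combination that no single model produced.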

Design conditions from morphed hourly data

Future ASHRAE design values are derived directly from the full 8760-hour morphed dataset, ensuring consistency between extreme values and coincident conditions. This is materially different from delta-only methods that simply offset historical design values.

Per-model climate ensembles

Each CMIP6 model produces a distinct EPW file, allowing simulation across the full ensemble to quantify uncertainty in loads, energy use, and thermal comfort.

In contrast to traditional delta-only approaches, EPWForge operates on full hourly distributions rather than adjusting summary statistics — preserving extremes, coincident conditions, and inter-variable relationships throughout the morphing process.

Weather File Sources

Typical Meteorological Year (TMY)

TMY data is sourced from Climate.OneBuilding.Org, derived from ASHRAE IWEC2, TMY3, and TMYx datasets. Using the de facto reference for the energy modeling community ensures compatibility and reproducibility of simulation results across projects and firms.

Infrastructure: EPWForge maintains a local mirror, GuzzStations, of 74,693 EPW files covering 17,137 stations. The mirror is audited monthly against the upstream OneBuilding spreadsheet, so requests never depend on third-party uptime. The catalog and the file library are kept in 1:1 sync: entries that the OneBuilding spreadsheet still lists but whose files no longer exist are pruned from the station picker, so users never click into a file that doesn't exist.

Actual Meteorological Year (AMY)

AMY data is generated from the ECMWF ERA5 reanalysis, which provides hourly, gap-free global coverage from the mid-20th century to present. ERA5 integrates millions of observations — surface stations, radiosondes, satellites, aircraft, and ocean buoys — into a physically consistent atmospheric state at every timestep, so all variables are internally consistent rather than spliced from disparate sources. This eliminates the missing-data and instrument-gap issues common to traditional station-based AMY files, and extends coverage to any location on Earth.

As with all gridded datasets, ERA5 represents regional atmospheric conditions and may not fully capture localized effects such as complex terrain, coastal microclimates, or dense urban environments. EPWForge addresses the urban case through its UHI adjustment system.

Site-Specific TMYx Generation

EPWForge generates TMYx weather files for arbitrary locations using the Finkelstein–Schafer statistic in accordance with ISO 15927-4 — the same methodology used by OneBuilding for their published TMYx datasets — with multiple period variants available to match published TMYx products.
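As a rough illustration of the Finkelstein–Schafer (FS) statistic, the sketch below compares the empirical CDF of one candidate month against the long-term CDF for a single variable and selects the year with the smallest FS value. It uses a plain empirical CDF and omits the multi-variable weighting and secondary selection steps of a full typical-year procedure; the function names are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function giving the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def fs_statistic(candidate_daily, long_term_daily):
    """Mean absolute CDF difference, evaluated at the candidate month's daily values."""
    cdf_candidate = empirical_cdf(candidate_daily)
    cdf_long_term = empirical_cdf(long_term_daily)
    total = sum(abs(cdf_candidate(x) - cdf_long_term(x)) for x in candidate_daily)
    return total / len(candidate_daily)


def most_typical_year(daily_by_year):
    """Pick the year whose month is closest to the pooled long-term distribution."""
    long_term = [x for values in daily_by_year.values() for x in values]
    return min(daily_by_year, key=lambda y: fs_statistic(daily_by_year[y], long_term))


# Toy data: daily mean temperatures for the same calendar month in three years.
january = {2001: [10, 11, 12, 13], 2002: [14, 15, 16, 17], 2003: [18, 19, 20, 21]}
print(most_typical_year(january))
```

A lower FS value means the candidate month's distribution sits closer to the long-term distribution, which is why the middle year wins in this toy data set.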

Validation against several hundred OneBuilding TMYx stations across all major climate zones shows strong agreement in annual temperature and solar radiation. Deviations are primarily observed in complex terrain and high-latitude regions, consistent with the known limitations of gridded reanalysis datasets.

Future Climate Projections

Future weather files are generated using the Belcher et al. (2005) morphing methodology, the industry standard for climate-adjusted EPW generation. Climate change deltas are derived from a consistent subset of CMIP6 models and applied across all variables to preserve inter-variable relationships.
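For readers unfamiliar with morphing, the sketch below shows the dry-bulb operation from Belcher et al. (2005): each hour is shifted by the monthly mean change and, optionally, stretched about the monthly mean. The function and parameter names are hypothetical, and how EPWForge derives the stretch factor or handles other variables is not specified here; with no stretch supplied, the result is a pure monthly shift.

```python
from collections import defaultdict


def morph_dry_bulb(temps_c, months, delta_c, stretch=None):
    """Belcher-style morph: T = T0 + dT_m + alpha_m * (T0 - mean_m(T0)).

    temps_c : hourly baseline dry-bulb temperatures (deg C)
    months  : month number (1-12) for each hour
    delta_c : dict month -> monthly mean temperature change (deg C)
    stretch : optional dict month -> stretch factor alpha_m (defaults to 0)
    """
    by_month = defaultdict(list)
    for t, m in zip(temps_c, months):
        by_month[m].append(t)
    monthly_mean = {m: sum(v) / len(v) for m, v in by_month.items()}
    stretch = stretch or {}
    return [
        t + delta_c[m] + stretch.get(m, 0.0) * (t - monthly_mean[m])
        for t, m in zip(temps_c, months)
    ]


# Four illustrative hours: two in July, two in January.
baseline = [10.0, 20.0, 0.0, 4.0]
hour_months = [7, 7, 1, 1]
shifted = morph_dry_bulb(baseline, hour_months, {7: 2.0, 1: 1.0})
print(shifted)
```

A positive stretch factor widens the spread around the monthly mean, so warm hours warm more than cool hours while the monthly mean still moves by exactly the delta.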

Supported scenarios:

  • SSP1-2.6 — low emissions
  • SSP2-4.5 — moderate emissions
  • SSP3-7.0 — high emissions
  • SSP5-8.5 — very high emissions

Each scenario is available across multiple time horizons (2030, 2050, 2070, 2090) and seven percentile bands (P5–P95).

Design Conditions, Ground Temperatures & UHI

Design conditions

Design day values (heating 99.6%/99%, cooling 0.4%/1%/2% dry bulb and wet bulb, evaporation, dehumidification) are computed directly from the 8760-hour dataset using ASHRAE Fundamentals methodology. Mean coincident values are derived from percentile exceedance subsets, ensuring physically meaningful pairings between dry-bulb temperature and coincident variables.

Illustrative Example — Miami (SSP2-4.5, 2050)
Cooling 1% Dry Bulb
  Historical TMY:        32.1 °C
  EPWForge (morphed):    35.4 °C   ← derived from morphed hourly distribution
  Delta-only method:     33.6 °C   ← historical + ΔT only

Illustrative values shown — actual results vary by location and percentile. The point: morphing the full hourly distribution captures shifts in extremes that simple offsets miss.
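The idea of deriving a design value and its mean coincident partner from the same hourly record can be sketched as follows. This is a simplified single-year illustration with hypothetical names: it takes the hottest hours making up the exceedance fraction, reports the coolest dry bulb in that subset as the design value, and averages the wet bulb over the same hours. ASHRAE's published procedure works from long-term records and differs in detail.

```python
import math


def cooling_design(dry_bulb, wet_bulb, exceedance_pct):
    """Return (design dry bulb, mean coincident wet bulb) from paired hourly series."""
    n = len(dry_bulb)
    n_exceed = max(1, math.floor(n * exceedance_pct / 100))  # hours in the subset
    # Indices of the hottest hours, hottest first.
    hottest = sorted(range(n), key=lambda i: dry_bulb[i], reverse=True)[:n_exceed]
    design_db = dry_bulb[hottest[-1]]                 # threshold of the subset
    mcwb = sum(wet_bulb[i] for i in hottest) / n_exceed  # same hours, so pairs stay coincident
    return design_db, mcwb


# Synthetic 100-hour record: wet bulb always 5 K below dry bulb.
db = [float(h) for h in range(100)]
wb = [t - 5.0 for t in db]
print(cooling_design(db, wb, 10))
```

Because the wet-bulb average is taken over the same hours that define the dry-bulb threshold, the pairing reflects conditions that actually occurred together in the morphed file.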

Ground temperatures

Undisturbed ground temperatures at any depth are computed using a published two-harmonic analytical model that extends the classical Kusuda–Achenbach approach with improved seasonal asymmetry. Inputs are derived from TMY surface air temperature data.
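For orientation, the classical single-harmonic Kusuda–Achenbach relation that the two-harmonic model extends can be written as a short function. This sketch shows only the classical baseline, not the two-harmonic extension EPWForge uses; the parameter names and the example soil diffusivity are illustrative assumptions.

```python
import math


def ground_temp_c(depth_m, day_of_year, mean_c, amplitude_c, day_of_min,
                  diffusivity_m2_per_day):
    """Classical Kusuda-Achenbach undisturbed ground temperature (deg C).

    mean_c      : annual mean surface temperature
    amplitude_c : annual surface temperature amplitude
    day_of_min  : day of year with the minimum surface temperature
    """
    # Exponential damping of the annual wave with depth (1/m).
    damping = math.sqrt(math.pi / (365.0 * diffusivity_m2_per_day))
    # Phase lag of the wave at this depth (days).
    lag_days = (depth_m / 2.0) * math.sqrt(365.0 / (math.pi * diffusivity_m2_per_day))
    phase = 2.0 * math.pi / 365.0 * (day_of_year - day_of_min - lag_days)
    return mean_c - amplitude_c * math.exp(-depth_m * damping) * math.cos(phase)


# Illustrative inputs: 15 deg C mean, 10 K amplitude, coldest on day 30.
for depth in (0.0, 1.0, 4.0):
    print(depth, round(ground_temp_c(depth, 30, 15.0, 10.0, 30, 0.05), 2))
```

The annual wave is both damped and delayed with depth, so deep soil stays close to the annual mean temperature year-round.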

Urban Heat Island (UHI)

Most weather stations are located at airports or rural outskirts, which can be several °C cooler than dense urban cores at night. EPWForge offers UHI presets aligned with the Stewart & Oke (2012) Local Climate Zone framework. Diurnal profiles are applied to dry bulb temperature, dew point, and surface wind, with magnitudes consistent with observed ranges in the urban climatology literature.

Climate Ensembles & Extreme Events

Climate ensembles

A single morphed weather file is deterministic and cannot capture inter-model climate uncertainty. EPWForge generates per-model climate ensembles using real CMIP6 model deltas — multiple individual model EPWs per scenario, each representing a distinct projection. Engineers can run simulations across all members to understand the range of possible outcomes (peak loads, annual EUI, overheating hours) rather than relying on a single point estimate. This approach aligns with emerging best practice from NREL and LBNL for climate-resilient design.

Ensemble generation may take several seconds depending on location and scenario.

Extreme event injection

Typical meteorological years deliberately exclude extreme events, but resilience analysis requires understanding building performance under heat waves, cold snaps, and humidity events. EPWForge maintains a global extreme events database derived from ERA5 reanalysis, with statistically fitted return-period intensities at every grid cell. Events can be stitched into baseline or future weather files for direct analysis of questions like “if grid power fails during a 25-year heat wave in 2050, does the building remain habitable?”

Intensity scale

Event intensity is exposed as a 1–10 slider per event type, mapped piecewise to an anomaly multiplier on the historical event:

  • Slider 5 = 0.6× — typical extreme event (default)
  • Slider 7 ≈ 0.96× — severe historical (~50-yr return period)
  • Slider 10 = 1.5× — stress test, exceeds observed extremes

Sliders 8–10 are gated behind an explicit stress_test=true flag on the API and the MCP server. These produce events more extreme than anything in the observational record and are appropriate for resilience studies (“what fails first?”) but not for HVAC sizing or code compliance — downstream simulators may also exceed their psychrometric bounds at extreme intensities.
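A minimal sketch of the slider mapping and the stress-test gate, built only from the three anchor points listed above. It assumes linear interpolation between anchors and treats the slider 7 value as exactly 0.96; the mapping below slider 5 is not given in this section, so the sketch declines to guess it. The function name is hypothetical and is not the EPWForge API.

```python
# Documented anchors: (slider, anomaly multiplier).
ANCHORS = [(5, 0.6), (7, 0.96), (10, 1.5)]


def intensity_multiplier(slider, stress_test=False):
    """Map a 1-10 slider value to an anomaly multiplier (documented range only)."""
    if not 1 <= slider <= 10:
        raise ValueError("slider must be between 1 and 10")
    if slider >= 8 and not stress_test:
        raise ValueError("sliders 8-10 require stress_test=True")
    if slider < ANCHORS[0][0]:
        raise ValueError("mapping below slider 5 is not documented in this sketch")
    # Piecewise-linear interpolation between neighbouring anchors (assumption).
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if x0 <= slider <= x1:
            return y0 + (y1 - y0) * (slider - x0) / (x1 - x0)


print(intensity_multiplier(5))
print(intensity_multiplier(10, stress_test=True))
```

Calling `intensity_multiplier(8)` without the flag raises an error, mirroring the explicit opt-in described above.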

Time-warp placement (day-level)

When a user requests an event of duration N days, EPWForge maps the historical event's available days across the requested duration via a day-level time-warp: stitched day k draws from source event day floor(k × source_days / stitched_days), preserving the natural rise-peak-fall shape and each day's diurnal cycle. The peak day still appears for ~1–2 stitched days; surrounding days carry the natural shoulder pattern.
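The day-level mapping above is small enough to show directly. The sketch below implements the stated floor formula with integer division and then copies whole 24-hour blocks, so each stitched day keeps its source day's diurnal cycle. Function names are hypothetical.

```python
def time_warp_days(source_days, stitched_days):
    """Source-day index for each stitched day k: floor(k * source_days / stitched_days)."""
    return [k * source_days // stitched_days for k in range(stitched_days)]


def stitch_hourly(source_hourly, stitched_days):
    """Build the stitched hourly series by copying whole source days."""
    source_days = len(source_hourly) // 24
    stitched = []
    for day in time_warp_days(source_days, stitched_days):
        stitched.extend(source_hourly[day * 24:(day + 1) * 24])
    return stitched


# A 5-day source event stretched to 10 days: each source day appears twice.
print(time_warp_days(5, 10))  # → [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

When the requested duration is shorter than the source event, the same formula skips some source days instead of repeating them.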

This replaces an older “peak-day-cycling” approach that repeated the single hottest / coldest day for the full event duration. The cycling produced sustained extremes that were physically implausible and inflated stitched temperatures by ~10°F relative to a natural event shape — particularly noticeable when stacked with SSP morphing and high percentile bands.

Compound events

Two compound pairings are supported with co-driven physics:

  • Heat wave + Hot-humid: heat sets the temperature anomaly and hot-humid contributes its humidity profile. The secondary anomaly is layered at 0.5 × the secondary's own intensity factor, so the hot-humid slider scales the humidity contribution independently.
  • Cold snap + Cold-windy: cold sets the temperature anomaly and cold-windy contributes the wind profile, at the same 0.5 × secondary-intensity blend.

The 0.5 physical blend reflects that compound extremes are not fully additive — the atmospheric circulations driving them are partially correlated rather than independent.
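The blend arithmetic can be illustrated with a short sketch. It assumes anomalies are simple per-step lists and that the intensity factors are the slider multipliers described earlier; the names and the sample numbers are hypothetical.

```python
COMPOUND_BLEND = 0.5  # physical blend applied to the secondary event


def compound_anomalies(primary_anomaly, secondary_anomaly,
                       primary_factor, secondary_factor):
    """Scale the primary anomaly fully and the secondary anomaly at half strength."""
    primary = [primary_factor * a for a in primary_anomaly]
    secondary = [COMPOUND_BLEND * secondary_factor * a for a in secondary_anomaly]
    return primary, secondary


# Heat wave (temperature anomaly, K) with a hot-humid secondary (humidity anomaly, % RH).
temp, humidity = compound_anomalies([10.0, 8.0], [20.0, 10.0], 0.6, 0.96)
print(temp, humidity)
```

Because the secondary factor enters separately, moving the secondary slider changes only the humidity (or wind) contribution and leaves the primary temperature anomaly untouched.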

AR6 Phase 4 SSP scaling

When an SSP scenario is active, event intensity sliders auto-populate using factors from the IPCC AR6 Atlas (Phase 4): TXx (the annual maximum of daily maximum temperature) for heat events and TNn (the annual minimum of daily minimum temperature) for cold events. The factor reflects how much faster (or slower) extreme events scale than the mean climate signal at each location and horizon.

Cold-family floor. AR6 evidence suggests cold extremes warm faster than the mean (dampening future severity), but the recent observational record (Texas 2021, polar-vortex disruption events documented in Cohen 2026) doesn't yet support this cleanly. EPWForge therefore defaults cold-family events to slider 5 under SSP (no future dampening, no false amplification); users can manually override in either direction.

Improbability indicator. Each generated file carries an improbability score (1–10) computed from the joint rarity of active dials (SSP × year × percentile × UHI × event intensity × smoke). The UI surfaces this as a header warning when the combined settings drift into stress-test (≥4) or exploratory (≥7) territory.

Where the event is placed in the year

Events are stitched at the climatologically peak window of the baseline file, not at the historical event's calendar date. Heat-family events (heat waves, hot-humid events) are inserted at the hottest 14-day window of the baseline file; cold-family events (cold snaps, cold-windy) at the coldest. Smoke overlays auto-align to whichever event drives the season.

We do this for three reasons:

  • Design-condition relevance. ASHRAE 0.4% / 1% / 2% design conditions are by definition the worst hours of the year, so they land in peak season. Layering an extreme event on top of those hours is the natural extension. Placing the event off-season would produce a peak temperature below the value the design day already reaches, which has no equipment-sizing implication.
  • AR6 scaling alignment. The Phase 4 SSP intensity factor is derived from AR6 TXx (annual peak temperature) and TNn (annual minimum). Those metrics are by construction in the peak month. Scaling the event anomaly by a TXx factor while placing the event in a shoulder month would mix two different season references; both the factor and the placement should refer to the same season for the result to be physically coherent.
  • Statistical honesty. The historical event database returns the single worst event over ~75 years of ERA5 at each cell. Whether that specific event happened in May or July is partly atmospheric-circulation luck — one stochastic realization. Treating that one calendar slot as definitive isn't methodologically meaningful; the physically robust part of the historical record is the shape of the anomaly (ramp / plateau / decay), which we preserve.

What we keep from the historical event: the anomaly profile shape and the relative magnitudes across temperature, humidity, wind, and solar. What we replace: the calendar slot. The historical reference dates remain visible in the UI tooltip as provenance.
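Locating the peak window of a baseline file can be sketched with a sliding sum over daily mean temperatures. The sketch assumes daily means have already been computed from the hourly file and that windows do not wrap across the year boundary; both are simplifications, and the function name is hypothetical.

```python
def peak_window_start(daily_mean_c, window_days=14, hottest=True):
    """Index of the first day of the hottest (or coldest) window of the given length."""
    if len(daily_mean_c) < window_days:
        raise ValueError("series is shorter than the window")
    sign = 1.0 if hottest else -1.0
    window_sum = sum(daily_mean_c[:window_days])
    best_start, best_score = 0, sign * window_sum
    for start in range(1, len(daily_mean_c) - window_days + 1):
        # Slide the window one day: add the new last day, drop the old first day.
        window_sum += daily_mean_c[start + window_days - 1] - daily_mean_c[start - 1]
        if sign * window_sum > best_score:
            best_start, best_score = start, sign * window_sum
    return best_start


# Synthetic year: flat 10 deg C with one warm spell and one cold spell.
year = [10.0] * 365
year[200:214] = [30.0] * 14
year[20:34] = [-5.0] * 14
print(peak_window_start(year), peak_window_start(year, hottest=False))
```

A production version would likely need to handle windows that straddle the turn of the year, which matters for cold-family events in the Northern Hemisphere.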

Wildfire Smoke Overlay

Wildfire smoke increasingly drives building HVAC sizing in fire-prone regions but is absent from conventional EPW files. EPWForge models smoke impact via a CAMS-AOD climatology (aerosol optical depth, 86,896 global grid cells, 2003–2025 record) and applies it physically to the relevant EPW fields when active.

Physics

  • Solar attenuation — Beer-Lambert law applied to GHI / DNI / DHI using the AOD profile of the active event.
  • Asymmetric surface temperature impact — −3 °F daytime cooling per AOD unit (shaded mornings, reduced solar gain) vs. +1 °F nighttime warming per AOD unit (heat trapping under aerosol layer).
  • Humidity bump — +3 % relative humidity per AOD unit, with downstream dewpoint recomputation.
  • AOD field — written into EPW field 29 (aerosol_optical_depth) for downstream tooling that reads it.

Coefficients are mid-points of published literature ranges; site-specific tuning (by albedo, vegetation, aerosol chemistry) is not currently applied. Smoke events are auto-aligned in time to whichever event drives the season (smoke onset overlapping the peak of an active heat event, for instance).
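The adjustments listed under Physics can be combined into a per-hour sketch. Several simplifications are assumed here and are not confirmed by the text: unit air mass in the Beer-Lambert term, the same transmittance applied to all three solar fields, the humidity bump read as percentage points, and a Magnus approximation for the dew-point recomputation. The dictionary keys and function names are hypothetical.

```python
import math

DAY_COOLING_F_PER_AOD = -3.0    # daytime change, deg F per AOD unit
NIGHT_WARMING_F_PER_AOD = 1.0   # nighttime change, deg F per AOD unit
RH_POINTS_PER_AOD = 3.0         # relative-humidity points per AOD unit


def dew_point_c(dry_bulb_c, rh_pct):
    """Magnus approximation for dew point over water."""
    a, b = 17.62, 243.12
    gamma = math.log(rh_pct / 100.0) + a * dry_bulb_c / (b + dry_bulb_c)
    return b * gamma / (a - gamma)


def apply_smoke(hour, aod):
    """Return a copy of one hourly record with smoke adjustments applied."""
    transmittance = math.exp(-aod)  # Beer-Lambert with unit air mass (assumption)
    rate_f = DAY_COOLING_F_PER_AOD if hour["is_daytime"] else NIGHT_WARMING_F_PER_AOD
    dry_bulb_c = hour["dry_bulb_c"] + rate_f * aod * 5.0 / 9.0  # deg F difference to deg C
    rh_pct = min(100.0, hour["rh_pct"] + RH_POINTS_PER_AOD * aod)
    return {
        "ghi": hour["ghi"] * transmittance,
        "dni": hour["dni"] * transmittance,
        "dhi": hour["dhi"] * transmittance,
        "dry_bulb_c": dry_bulb_c,
        "rh_pct": rh_pct,
        "dew_point_c": dew_point_c(dry_bulb_c, rh_pct),
        "aod": aod,
        "is_daytime": hour["is_daytime"],
    }


noon = {"ghi": 900.0, "dni": 800.0, "dhi": 150.0,
        "dry_bulb_c": 30.0, "rh_pct": 50.0, "is_daytime": True}
print(apply_smoke(noon, 1.0))
```

With an AOD of zero the record passes through unchanged apart from the added fields, which makes the function safe to apply to every hour of a file.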

SSP scaling status

SSP-driven amplification of smoke severity is wired on both the client and the server, but the underlying scaling database, which follows the AR6 Phase 4 approach used for heat and cold events, is still being populated. Until that data lands, smoke severity defaults to the historical CAMS climatology baseline for the selected location. Users can manually increase the slider, but no automatic future amplification is applied.

Validation & Quality Control

EPWForge undergoes ongoing automated validation across the data pipeline, the Belcher morphing implementation, and the EnergyPlus simulation outputs of pre-computed reference cases. Recent audits confirm:

  • CMIP6 deltas align with IPCC AR6 assessed warming within expected bounds.
  • Physical consistency — Arctic amplification, land-vs-ocean contrast, and SSP ordering all match observed climate patterns.
  • Belcher morphing produces exact temperature shifts and physically valid humidity, solar, and wind output across every model in the ensemble.
  • EPW outputs satisfy physical bounds and have valid 8760-hour structure across all tested locations and scenarios.

Validation includes cross-checks against published TMYx datasets and internal consistency checks across all generated variables.

Detailed audit reports are maintained internally and are available to enterprise customers on request.

Limitations & Assumptions

EPWForge uses methods that are standard practice in the field, but all modeling tools carry inherent limitations that users should understand when applying results to design decisions.

  • Spatial resolution. CMIP6 deltas are computed on grids consistent with established practice. Mountain regions and immediate coastlines may show larger interpolation uncertainty.
  • Within-month uniformity. A single monthly delta value is applied uniformly across all hours in a month. This is the standard Belcher approach, but it does not capture intra-month variability in the rate of climate change.
  • Baseline file quality. Morphed future files inherit any biases or gaps present in the underlying baseline, whether TMY or AMY.
  • UHI presets are approximations. Urban microclimate is site-specific and varies by morphology, albedo, and vegetation cover. Presets are LCZ-based estimates and should not substitute for site-specific urban climate measurements where precision is critical.

These limitations are inherent to global-to-local downscaling. For critical design decisions, results should be validated against primary data sources and applicable local code requirements. EPWForge is designed to make these limitations explicit while providing the most physically consistent weather inputs practical for simulation workflows.

Ongoing R&D

EPWForge is under active development. The platform's next-generation work focuses on extending the science, sharpening calibration, and reducing infrastructure cost as usage scales. Selected workstreams currently in flight or on the near roadmap:

  • CMIP7 evaluation. Tracking the rollout of CMIP7 ScenarioMIP (Van Vuuren et al. 2026), which replaces the SSP framework with seven new emission-driven scenarios (H, HL, M, ML, L, VL, LN) and deprecates SSP5-8.5 as implausible. CMIP6 remains canonical through at least 2027 while a multi-model CMIP7 ensemble (needed for our P5–P95 percentile bands) becomes available. Schema work on a per-model CO₂ concentration store is planned for Q3 2026 in anticipation.
  • Wildfire smoke SSP scaling. Smoke overlays currently use the CAMS 2003–2025 climatology baseline. We're populating a parallel-to-AR6 extreme-event-scaling database for smoke, which will auto-amplify smoke severity under future SSP scenarios as the climate-fire feedback literature continues to consolidate. Wiring is already in place; activates when the dataset lands.
  • Diurnal-cycle climate deltas. CMIP6 morphing today uses monthly mean ΔT. The literature increasingly documents that diurnal temperature range itself changes under warming (nighttime warms faster than daytime in many regions). Investigating ingestion of sub-monthly delta fields so the morphing process captures DTR shifts, not just mean shifts.
  • AR6 → AR7 extreme-event scaling. The AR6 Phase 4 amplification factors we use for event auto-fill are the best available now. As AR7 prepares for release (2027–2028), we'll re-derive the scaling factors from the newer atlas and revisit the conservative cold-family floor if observations or new attribution studies (post-Cohen 2026) materially change the cold-extremes-under-warming story.
  • AI-weather-model interop. Tracking the rapid maturation of graph-neural-network weather models (GraphCast, Pangu, GenCast). Once their hourly skill on extreme-event timing matches ERA5 at the relevant percentiles, they become an interesting alternative substrate for AMY generation — particularly in data-sparse regions where reanalysis quality drops.
  • Cost-architecture work (ARCO / Zarr). Packing the ~75k-file station library and the climate-deltas store into cloud-native Zarr (Icechunk) for batch lookups. Targeting an order-of-magnitude speedup on multi-station / multi-scenario workloads (parametric sweeps, ensemble fan-out) without changing user-facing behavior.
  • Validation benchmarks expansion. Growing the quantitative validation set against station observations beyond annual-mean temperature and GHI bias bounds — adding seasonal extreme tracking, wet-bulb peak agreement, and comparison against published downscaling intercomparisons (CORDEX-CMIP6 where available). Goal: published QC report each year.
  • Higher-resolution UHI presets. Current LCZ-based UHI presets are continent-scale approximations. Investigating per-city tuned diurnal profiles for the largest metros where simulation accuracy materially improves outcomes (NYC, London, Tokyo, Singapore, Dubai, etc.).

If your project depends on a specific R&D item landing on a timeline, reach out — we adjust priorities based on customer signal.

Citations

Belcher, S.E., Hacker, J.N. & Powell, D.S. (2005). “Constructing design weather data for future climates.” Building Services Engineering Research and Technology, 26(1), 49–61.

Hersbach, H. et al. (2020). “The ERA5 global reanalysis.” Quarterly Journal of the Royal Meteorological Society, 146(730), 1999–2049.

Stewart, I.D. & Oke, T.R. (2012). “Local Climate Zones for urban ecosystem studies.” Bulletin of the American Meteorological Society, 93(12), 1879–1900.

Eyring, V. et al. (2016). “Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization.” Geoscientific Model Development, 9(5), 1937–1958.

IPCC. (2021). “Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change” (AR6 WG1) — including the AR6 Atlas regional CMIP6 indices used for Phase 4 event amplification factors.

Van Vuuren, D.P. et al. (2026). “The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP7.” Geoscientific Model Development, 19, 2627–2656.

Cohen, J. et al. (2026). Polar-vortex disruption and cold-extreme trends under continued Arctic warming. Used as the basis for our conservative cold-family floor in AR6 Phase 4 auto-fill (no future dampening applied).