# TerraFIRMA UKESM scenario helper — project context

## What this repo does

Loads, stitches, and summarises monthly UKESM1-2-LL climate data
(tas, pr) for the TerraFIRMA overshoot scenario ensemble.
Scenarios branch from each other at known years; the loader resolves
the full ancestry chain and returns a single continuous `xr.Dataset`.
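
The ancestry resolution described above can be pictured with a minimal sketch. The registry layout (child mapped to a parent and a branch year), the parent links, and the branch years below are all illustrative assumptions, not the actual contents of `SCENARIO_PARENTS`; `resolve_ancestry` is a hypothetical name.

```python
# Hypothetical registry: child -> (parent, branch_year), or None for a root.
# Parent links and branch years are made up for illustration only.
EXAMPLE_PARENTS = {
    "up2p0": None,
    "up2p0-gwl2p0": ("up2p0", 2050),
    "up2p0-gwl2p0-50y-dn1p0": ("up2p0-gwl2p0", 2100),
}


def resolve_ancestry(scenario, parents):
    """Return segments (name, start_year, end_year), ordered root first.

    start_year is None for the root and end_year is None for the requested
    scenario. Each ancestor contributes only the years before its child
    branches off.
    """
    segments = []
    seen = set()
    current, end_year = scenario, None
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in registry at {current!r}")
        seen.add(current)
        link = parents[current]  # KeyError for an unknown scenario
        start_year = link[1] if link else None
        segments.append((current, start_year, end_year))
        # Walk one step up: the parent ends where this scenario begins.
        current, end_year = (link[0], link[1]) if link else (None, None)
    return segments[::-1]


for segment in resolve_ancestry("up2p0-gwl2p0-50y-dn1p0", EXAMPLE_PARENTS):
    print(segment)
```

Each returned segment says which scenario to read and which years to keep from it; concatenating the cropped segments along `time` would then give the continuous record.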

---

## Key files

| File | Purpose |
|---|---|
| `terrafirma_helpers.py` | Main module: scenario registry, loaders, summary CLI |
| `glacier_grid_on_ukesm.nc` | UKESM grid with `glacier_area` and `glacier_area_by_region` |
| `new_data/` | Additional scenario NetCDF files supplementing `DIRPATH` |
| `summary/` | Output directory (created on first run) |
| `summary/flattened/` | Per-scenario flattened glacier NetCDF files |
| `summary/by_region/` | Per-RGI-region monthly CSV files |

---

## Data sources

Two directories are searched for NetCDF files:

1. **`DIRPATH`** = `/home/www/lschuster/terrafirma_oggm_proj/terrafirma_data/raw_data`
   — original fragmented files (some scenarios only go to ~2053 / ~2139).
2. **`NEW_DATA_DIRPATH`** = `./new_data/` (relative to this repo)
   — longer merged files covering GWL1.5–GWL6.0 branches.

If the same scenario appears in both, `new_data/` wins (longer record)
and a note is printed.  Three scenarios overlap:
- `up2p0` (new_data extends to 2238 vs 2139)
- `up2p0-gwl2p0-50y-dn1p0` (new_data extends to 2333 vs 2053)
- `up2p0-gwl4p0-50y-dn2p0` (new_data extends to 2285 vs 2284)
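
The precedence rule above amounts to a dictionary merge in which the second source overrides the first. The sketch below assumes each directory has already been scanned into a `{scenario: path}` mapping; the function name `pick_files` and the file names are hypothetical.

```python
from pathlib import Path


def pick_files(original, new_data):
    """Merge two {scenario: path} maps; entries from new_data take precedence."""
    chosen = dict(original)
    for scenario, path in new_data.items():
        if scenario in chosen:
            # Mirrors the printed note mentioned above (wording is illustrative).
            print(f"note: {scenario} found in both directories; using {path}")
        chosen[scenario] = path
    return chosen


# Hypothetical file names; only the directory roles follow this document.
original = {"up2p0": Path("raw_data/up2p0.nc"), "hist": Path("raw_data/hist.nc")}
new_data = {"up2p0": Path("new_data/up2p0.nc")}
chosen = pick_files(original, new_data)
```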

---

## Scenario registry (`SCENARIO_PARENTS`)

35 scenarios, grouped by global warming level (GWL 1.5 → 6.0).  All branch from
`up2p0` directly or indirectly.  `hist` and `piControl` are standalone
roots and cannot be used as parents.

---

## CLI usage

```bash
goconda   # activate oggm_env

# Run all scenarios (summaries + flattened files):
python terrafirma_helpers.py summaries

# Run a subset:
python terrafirma_helpers.py summaries up2p0,up2p0-gwl2p0

# Smoke-test a single scenario load (original CLI mode):
python terrafirma_helpers.py up2p0-gwl4p0 tas
```
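
The three invocation forms above could be told apart with a small dispatcher like the one below. This is only a sketch of the argument shapes shown in the examples; `parse_cli` and the returned dictionary keys are hypothetical and may not match how `terrafirma_helpers.py` actually parses its arguments.

```python
def parse_cli(argv):
    """Classify CLI arguments (without the program name) into a mode."""
    if argv and argv[0] == "summaries":
        # Optional second argument: comma-separated scenario subset.
        subset = argv[1].split(",") if len(argv) > 1 else None
        return {"mode": "summaries", "scenarios": subset}
    if len(argv) == 2:
        # Original smoke-test form: <scenario> <variable>
        return {"mode": "load", "scenario": argv[0], "variable": argv[1]}
    raise SystemExit("usage: summaries [SCEN1,SCEN2,...] | SCENARIO VARIABLE")


print(parse_cli(["summaries", "up2p0,up2p0-gwl2p0"]))
print(parse_cli(["up2p0-gwl4p0", "tas"]))
```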

---

## Summary outputs produced per run

| File | Contents |
|---|---|
| `summary/global_tas_monthly.csv` | Cosine-lat weighted global mean TAS, columns = scenarios |
| `summary/global_pr_monthly.csv` | Same for PR |
| `summary/glacier_global_tas_monthly.csv` | Glacier-area weighted global TAS |
| `summary/glacier_global_pr_monthly.csv` | Same for PR |
| `summary/by_region/tas_monthly_reg{01..19}.csv` | Per-RGI-region TAS |
| `summary/by_region/pr_monthly_reg{01..19}.csv` | Per-RGI-region PR |
| `summary/flattened/ukesm_{scenario}_flat_glaciers.nc` | Per-scenario flattened glacier NetCDF |

**CSV time index** is a string like `"2100-01-01 00:00:00"` (cftime
360-day calendar, normalised to day 1 of the month).  Pandas
`parse_dates=True` silently fails once the index passes 2262-04-11,
the upper limit of `datetime64[ns]`, so use string slicing instead:

```python
import pandas as pd

df = pd.read_csv("summary/global_tas_monthly.csv", index_col=0)
years = df.index.str.slice(0, 4).astype(int)  # "2100-01-01 00:00:00" -> 2100
annual = df.groupby(years).mean()             # annual mean per scenario column
```
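
For reference, the cosine-latitude weighting named in the table above can be sketched with the standard library alone. The sketch assumes a regular latitude-longitude grid with the same number of longitudes in every row; `cos_lat_weighted_mean` is a hypothetical name, not a function from `terrafirma_helpers.py`.

```python
import math


def cos_lat_weighted_mean(field, latitudes_deg):
    """Mean of field[lat][lon], weighting each latitude row by cos(lat)."""
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    # On a regular grid every cell in a row has the same area, so average
    # each row first and then weight the row means.
    row_means = [sum(row) / len(row) for row in field]
    return sum(w * m for w, m in zip(weights, row_means)) / sum(weights)


# Toy field: 1.0 along the equator row, 3.0 along the 60N row.
# The 60N row gets half the weight of the equator row.
toy_field = [[1.0, 1.0], [3.0, 3.0]]
toy_mean = cos_lat_weighted_mean(toy_field, [0.0, 60.0])
print(toy_mean)
```

With xarray, the same idea is usually written as `da.weighted(weights).mean(...)` over the horizontal dimensions, where `weights` is the cosine of the latitude coordinate; the dimension names depend on the file.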

---

## Flattened NetCDF structure

Dims: `(time, points, reg)`, where `points` are the 1155 UKESM grid
cells that have `glacier_area > 0` and `reg` indexes the RGI regions
used by `glacier_area_by_region`.

Variables: `tas`, `pr`, `glacier_area`, `glacier_area_by_region`.

Auxiliary coordinates: `latitude(points)`, `longitude(points)`.
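
As an illustration of the flattening step, the NumPy sketch below selects the glacierised cells of a toy 2 x 3 grid and computes a glacier-area weighted mean per time step. The array shapes and values are invented for the example; the real export works on the UKESM grid and may be implemented differently.

```python
import numpy as np

# Toy grid: (lat, lon) = (2, 3); three cells carry glacier area.
glacier_area = np.array([[0.0, 2.5, 0.0],
                         [1.0, 0.0, 4.0]])
# Toy variable with dims (time, lat, lon) = (2, 2, 3).
tas = np.arange(12, dtype=float).reshape(2, 2, 3)

mask = glacier_area > 0          # boolean (lat, lon)
tas_flat = tas[:, mask]          # (time, points): only glacierised cells
area_flat = glacier_area[mask]   # (points,)

# Glacier-area weighted mean for each time step.
weighted = (tas_flat * area_flat).sum(axis=1) / area_flat.sum()
print(tas_flat.shape, weighted)
```

Boolean indexing collapses the two horizontal dimensions into one `points` axis in row-major order, which is why latitude and longitude have to be carried along as auxiliary coordinates.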

---

## Data-quality fixes applied at load time

1. **Missing months** — single-month gaps in raw files are filled by
   copying the same calendar month from the prior year (`_fill_missing_months`).
   Up to 5 gaps tolerated; more raises an error.

2. **Incomplete final year** — if the last year has 11/12 months, the
   missing month is filled from the prior year.  If fewer than 11 months,
   the whole final year is cropped.

3. **Timestamp normalisation** — all timestamps are set to day 1 of
   the month (`_set_month_start_time`), replacing the original mid-month
   values.
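
Rules 1 and 2 above can be sketched on a plain `{(year, month): value}` mapping. This is a simplified stand-in for `_fill_missing_months`, not its actual code: it assumes the record starts in January, and it counts the final-year fill towards the five-gap limit, which the real loader may or may not do.

```python
def fill_missing_months(series, max_gaps=5):
    """Repair a {(year, month): value} mapping; returns a new dict."""
    if not series:
        return {}
    repaired = dict(series)
    years = sorted({year for year, _ in repaired})
    first, last = years[0], years[-1]

    # Rule 2: crop the final year if it has fewer than 11 months.
    present = [m for m in range(1, 13) if (last, m) in repaired]
    if len(present) < 11:
        for month in present:
            del repaired[(last, month)]
        last -= 1

    # Rule 1 (and the 11/12 case of rule 2): collect the remaining gaps.
    gaps = [(y, m)
            for y in range(first, last + 1)
            for m in range(1, 13)
            if (y, m) not in repaired]
    if len(gaps) > max_gaps:
        raise ValueError(f"{len(gaps)} missing months exceed limit {max_gaps}")

    # Fill each gap with the same calendar month of the prior year.
    for year, month in gaps:
        if (year - 1, month) not in repaired:
            raise ValueError(f"no prior-year value for {year}-{month:02d}")
        repaired[(year, month)] = repaired[(year - 1, month)]
    return repaired


# Toy record: 2001-06 missing, final year 2002 has 11 of 12 months.
toy = {(y, m): y * 100 + m for y in (2000, 2001, 2002) for m in range(1, 13)}
del toy[(2001, 6)]
del toy[(2002, 12)]
toy_fixed = fill_missing_months(toy)
print(len(toy_fixed), toy_fixed[(2001, 6)], toy_fixed[(2002, 12)])
```

Gaps are processed in chronological order, so a month that was itself filled can serve as the source for the same month one year later.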

---

## Environment

```bash
goconda   # → activates oggm_env (miniforge3)
```

Python 3.13, xarray (recent version with CFDatetimeCoder API),
netCDF4, numpy, pandas.

---

## Status (2026-06-12)

- [x] Scenario registry complete (35 scenarios, GWL1.5–6.0)
- [x] Dual-directory loader with duplicate detection
- [x] Missing-month and incomplete-final-year repair
- [x] Month-start timestamp normalisation
- [x] Summary CSV generation (global + glacier-weighted + per-region)
- [x] Flattened glacier NetCDF export per scenario
- [x] Timestamped progress logging
- [x] Smoke-tested: `up2p0` passes end-to-end
- [ ] Full all-scenarios run not yet executed (run with `python terrafirma_helpers.py summaries`)
