{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5d66576e",
   "metadata": {},
   "source": [
    "# Medical Signal Processing — Lab Project\n",
    "# Multi-Center Breast DCE-MRI: From Raw Images to Harmonized Biomarkers\n",
    "\n",
    "**Course:** Medical Signal Processing &nbsp;·&nbsp; **Format:** Guided build-up lab (9 parts)\n",
    "\n",
    "---\n",
    "\n",
    "In this lab you will rebuild — **one piece at a time** — a complete research pipeline that\n",
    "analyses breast **Dynamic Contrast-Enhanced MRI (DCE-MRI)** from several hospitals and turns\n",
    "the raw scans into clean, comparable numerical biomarkers.\n",
    "\n",
    "This is a *real* pipeline used in oncological imaging research. We have broken it into 9 small\n",
    "parts. Each part:\n",
    "\n",
    "1. explains the **idea** behind it (short theory),\n",
    "2. gives you a **task** (\"build this function\"),\n",
    "3. lets you **run and see** the result immediately,\n",
    "4. ends with **exercises** to test your understanding.\n",
    "\n",
    "By the end, every function you write is plugged together into the final pipeline in **Part 7**,\n",
    "and you produce the same outputs (colormaps, CSV feature tables, validation figures) as the\n",
    "original research code.\n",
    "\n",
    "> 🧭 **How to use this notebook.** Run the cells from top to bottom. Read the markdown before\n",
    "> each code cell. Do the exercises. Do **not** skip Part 0 — it creates the data everything else\n",
    "> needs.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bc708df",
   "metadata": {},
   "source": [
    "> 📝 **This is the student assignment.** Functions marked `# TODO (Part N)` are **yours to write** — they currently `raise NotImplementedError`. Implement each one from the Task description above it, then run its cell and the check cell below. Work top to bottom: later parts depend on earlier ones. Solutions are intentionally not included here."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38837031",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 0 — The Problem, the Plan, and the Data\n",
    "\n",
    "### 0.1 The clinical question\n",
    "\n",
    "After a contrast agent (gadolinium) is injected into a vein, a breast tumour \"lights up\" on MRI.\n",
    "*How* the brightness changes over the next minute tells radiologists something about the tumour:\n",
    "\n",
    "| Pattern | What the signal does after injection | Clinical hint |\n",
    "|---|---|---|\n",
    "| **Uptake**  | keeps rising | more often benign |\n",
    "| **Plateau** | rises then flattens | intermediate |\n",
    "| **Washout** | rises then **falls** | more often malignant |\n",
    "\n",
    "We acquire two 3D volumes per patient: **pre-contrast** (`_0000`, before injection) and\n",
    "**post-contrast** (`_0001`, ~1 min after). A radiologist also gives us a **segmentation mask**\n",
    "that marks exactly which voxels are tumour (the *Region Of Interest*, ROI).\n",
    "\n",
    "### 0.2 The multi-center twist\n",
    "\n",
    "The data comes from **different hospitals with different scanners**. A Siemens 3T machine and a\n",
    "GE 1.5T machine produce systematically different numbers *even for an identical tumour*. If we\n",
    "just pool all hospitals together, those scanner differences (\"**batch effects**\") contaminate\n",
    "the analysis. The last part of the pipeline removes them with **harmonization**.\n",
    "\n",
    "### 0.3 The pipeline you will build\n",
    "\n",
    "```\n",
    " raw NIfTI  ─►  explore   ─►  kinetic      ─►  pseudo-color  ─►  RGB NIfTI\n",
    " (Part 1)       & QC          analysis         visualization     export\n",
    "                              (Part 2)         (Part 3)          (Part 4)\n",
    "                                  │\n",
    "                                  ▼\n",
    "                            radiomics features ──► feature table ──► harmonization ──► validation\n",
    "                              (Part 5)              (Part 6)          (Part 6)          (Part 8)\n",
    "                                                         └────────── assembled in Part 7 ──────────┘\n",
    "```\n",
    "\n",
    "### 0.4 Setup\n",
    "\n",
    "Run the next cell. It only needs the standard scientific Python stack plus `nibabel`\n",
    "(the library for reading `.nii.gz` medical images)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cd7ac7d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:03.695079Z",
     "iopub.status.busy": "2026-05-19T20:23:03.694933Z",
     "iopub.status.idle": "2026-05-19T20:23:04.305300Z",
     "shell.execute_reply": "2026-05-19T20:23:04.304151Z"
    }
   },
   "outputs": [],
   "source": [
    "# --- Core libraries (all standard, pip-installable) ---\n",
    "import os, sys, glob, warnings, textwrap\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.colors import ListedColormap\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "try:\n",
    "    import nibabel as nib\n",
    "except ImportError:\n",
    "    print(\"nibabel is missing. Install it with:  pip install nibabel\")\n",
    "    raise\n",
    "\n",
    "np.random.seed(42)                 # reproducible results for everyone\n",
    "plt.rcParams[\"figure.dpi\"] = 110\n",
    "\n",
    "print(\"Environment ready.\")\n",
    "print(\"numpy\", np.__version__, \"| pandas\", pd.__version__, \"| nibabel\", nib.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7ebcfd0",
   "metadata": {},
   "source": [
    "### 0.5 Create a synthetic multi-center dataset\n",
    "\n",
    "The real MAMA-MIA hospital data cannot be shared (patient privacy + file size). So we **build a\n",
    "tiny phantom dataset ourselves** that behaves like the real thing:\n",
    "\n",
    "* a small 3D volume with a **tumour blob**,\n",
    "* a **pre** and a **post** contrast version (the tumour enhances),\n",
    "* a **mask** of the tumour,\n",
    "* every tumour has its **own kinetic composition** (some uptake-dominant, some\n",
    "  washout-dominant), so the per-case statistics in Part 9 are meaningful,\n",
    "* **two \"hospitals\"** (`CENTER_A`, `CENTER_B`) whose scanners apply a different *gain* and\n",
    "  *offset* and see slightly different tumour mixes — the batch effect we later remove.\n",
    "\n",
    "We save everything on disk using **exactly the folder layout of the original project** so the\n",
    "code you write here would work unchanged on the real data:\n",
    "\n",
    "```\n",
    "mamamia_data/\n",
    "  CENTER_A/\n",
    "    CENTER_A_001/  CENTER_A_001_0000.nii.gz   CENTER_A_001_0001.nii.gz\n",
    "    segment/       CENTER_A_001.nii.gz\n",
    "  CENTER_B/ ...\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac6e38ff",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:04.308541Z",
     "iopub.status.busy": "2026-05-19T20:23:04.307368Z",
     "iopub.status.idle": "2026-05-19T20:23:04.623576Z",
     "shell.execute_reply": "2026-05-19T20:23:04.622429Z"
    }
   },
   "outputs": [],
   "source": [
    "DATA_ROOT = \"mamamia_data\"\n",
    "CENTERS   = [\"CENTER_A\", \"CENTER_B\"]\n",
    "N_CASES   = 4                       # cases per center\n",
    "VOL_SHAPE = (64, 64, 24)            # small on purpose so the lab runs in seconds\n",
    "\n",
    "def _make_one_case(rng, scanner_gain, scanner_offset, mix):\n",
    "    \"\"\"Return (pre, post, mask) 3D arrays for one synthetic patient.\n",
    "    `mix` = (p_uptake, p_plateau, p_washout) proportions for this tumour.\"\"\"\n",
    "    X, Y, Z = VOL_SHAPE\n",
    "    xx, yy, zz = np.mgrid[0:X, 0:Y, 0:Z]\n",
    "\n",
    "    # background breast tissue: low signal + noise\n",
    "    pre = rng.normal(40, 5, size=VOL_SHAPE)\n",
    "\n",
    "    # a spherical tumour at a slightly random location/size\n",
    "    cx, cy, cz = X//2 + rng.integers(-6,6), Y//2 + rng.integers(-6,6), Z//2\n",
    "    radius     = rng.integers(7, 11)\n",
    "    dist       = np.sqrt((xx-cx)**2 + (yy-cy)**2 + ((zz-cz)*2.0)**2)\n",
    "    mask       = (dist < radius).astype(np.uint8)\n",
    "\n",
    "    # tumour baseline a bit brighter than background\n",
    "    pre[mask == 1] += rng.normal(25, 4, size=int(mask.sum()))\n",
    "\n",
    "    # post-contrast: enhance the tumour. Each tumour has its OWN composition\n",
    "    # (`mix`) so per-case pie charts genuinely differ.\n",
    "    post = pre.copy()\n",
    "    tumour_idx = np.where(mask == 1)\n",
    "    n = len(tumour_idx[0])\n",
    "    pattern = rng.choice(3, size=n, p=mix)            # 0=uptake 1=plateau 2=washout\n",
    "    delta = np.empty(n)\n",
    "    delta[pattern == 0] = rng.normal( 0.45, 0.05, (pattern==0).sum())  # +45% strong rise\n",
    "    delta[pattern == 1] = rng.normal( 0.05, 0.03, (pattern==1).sum())  # ~+5% flat\n",
    "    delta[pattern == 2] = rng.normal(-0.15, 0.04, (pattern==2).sum())  # -15% washout\n",
    "    post[tumour_idx] = pre[tumour_idx] * (1.0 + delta)\n",
    "\n",
    "    # ---- SCANNER / HOSPITAL effect (the batch effect) ----\n",
    "    pre  = pre  * scanner_gain + scanner_offset\n",
    "    post = post * scanner_gain + scanner_offset\n",
    "    return pre.astype(np.float32), post.astype(np.float32), mask\n",
    "\n",
    "def build_synthetic_dataset(root=DATA_ROOT):\n",
    "    # each center gets a different scanner gain & offset (intensity batch\n",
    "    # effect) AND a slightly different tumour-composition tendency.\n",
    "    scanner = {\"CENTER_A\": (1.00,  0.0),\n",
    "               \"CENTER_B\": (1.35, 30.0)}      # B reads ~35% higher + a bias\n",
    "    # Dirichlet alphas -> per-case mixes vary a lot; centers lean differently\n",
    "    comp_alpha = {\"CENTER_A\": (3.0, 2.0, 1.2),   # A: more uptake-dominant tumours\n",
    "                  \"CENTER_B\": (1.4, 2.0, 3.0)}   # B: more washout-dominant tumours\n",
    "    affine = np.diag([1.0, 1.0, 1.0, 1.0])    # 1x1x1 mm voxels\n",
    "    for center in CENTERS:\n",
    "        seg_dir = os.path.join(root, center, \"segment\")\n",
    "        os.makedirs(seg_dir, exist_ok=True)\n",
    "        gain, offset = scanner[center]\n",
    "        rng = np.random.default_rng(hash(center) % (2**32))\n",
    "        for k in range(1, N_CASES + 1):\n",
    "            cid = f\"{center}_{k:03d}\"\n",
    "            cdir = os.path.join(root, center, cid)\n",
    "            os.makedirs(cdir, exist_ok=True)\n",
    "            mix = rng.dirichlet(comp_alpha[center])     # this tumour's composition\n",
    "            pre, post, mask = _make_one_case(rng, gain, offset, mix)\n",
    "            nib.save(nib.Nifti1Image(pre,  affine), os.path.join(cdir, f\"{cid}_0000.nii.gz\"))\n",
    "            nib.save(nib.Nifti1Image(post, affine), os.path.join(cdir, f\"{cid}_0001.nii.gz\"))\n",
    "            nib.save(nib.Nifti1Image(mask, affine), os.path.join(seg_dir, f\"{cid}.nii.gz\"))\n",
    "    return root\n",
    "\n",
    "build_synthetic_dataset()\n",
    "print(\"Synthetic dataset created under:\", DATA_ROOT)\n",
    "for f in sorted(glob.glob(os.path.join(DATA_ROOT, \"*\", \"*\", \"*.nii.gz\")))[:6]:\n",
    "    print(\"  \", f)\n",
    "print(\"   ... (8 cases x 3 files total)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "880f82b4",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 1 — Reading & Exploring a NIfTI Image\n",
    "\n",
    "🎓 **Learning goals:** open a `.nii.gz` file, understand what's inside it, and look at a slice.\n",
    "\n",
    "### Concept — what is a NIfTI volume?\n",
    "\n",
    "A brain/breast MRI is a **3D grid of numbers**. `.nii.gz` (NIfTI) is the standard file format\n",
    "to store that grid plus metadata. Two things matter for us:\n",
    "\n",
    "* **`get_fdata()`** → the raw 3D `numpy` array of intensities (`shape = (X, Y, Z)`).\n",
    "* the **header / affine** → voxel size, orientation. We mostly keep it untouched and re-use it\n",
    "  when we save results, so our outputs line up with the input.\n",
    "\n",
    "> 💡 **Analogy.** A NIfTI file is like a stack of greyscale photos (the *slices*), plus a sticky\n",
    "> note (the *header*) saying how big each pixel is in millimetres.\n",
    "\n",
    "### 🎯 Task 1 — write `load_nifti` and `describe_volume`\n",
    "\n",
    "`load_nifti(path)` returns `(data_array, nifti_object)`.\n",
    "`describe_volume(name, data)` prints shape, dtype, and intensity range — the first thing any\n",
    "medical-imaging engineer does to **sanity-check** a file (this mirrors the original\n",
    "`explore_nifti.py`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65a26fcb",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:04.625382Z",
     "iopub.status.busy": "2026-05-19T20:23:04.625226Z",
     "iopub.status.idle": "2026-05-19T20:23:04.641800Z",
     "shell.execute_reply": "2026-05-19T20:23:04.640874Z"
    }
   },
   "outputs": [],
   "source": [
    "def load_nifti(path):\n",
    "    \"\"\"Load a .nii.gz file. Returns (data: np.ndarray, img: nib object).\"\"\"\n",
    "    # TODO (Part 1): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"load_nifti() is not implemented yet — Part 1\")\n",
    "\n",
    "def describe_volume(name, data):\n",
    "    \"\"\"Print a quick quality-control summary of a 3D volume.\"\"\"\n",
    "    # TODO (Part 1): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"describe_volume() is not implemented yet — Part 1\")\n",
    "\n",
    "# Try it on the first case of CENTER_A\n",
    "case_dir = os.path.join(DATA_ROOT, \"CENTER_A\", \"CENTER_A_001\")\n",
    "pre,  _   = load_nifti(os.path.join(case_dir, \"CENTER_A_001_0000.nii.gz\"))\n",
    "post, _   = load_nifti(os.path.join(case_dir, \"CENTER_A_001_0001.nii.gz\"))\n",
    "mask, _   = load_nifti(os.path.join(DATA_ROOT, \"CENTER_A\", \"segment\", \"CENTER_A_001.nii.gz\"))\n",
    "\n",
    "describe_volume(\"pre-contrast  (_0000)\", pre)\n",
    "describe_volume(\"post-contrast (_0001)\", post)\n",
    "describe_volume(\"tumour mask\",           mask)\n",
    "print(\"\\nUnique values in mask:\", np.unique(mask), \" <- binary: 0=background, 1=tumour\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea1feea1",
   "metadata": {},
   "source": [
    "### See it — visualise one slice\n",
    "\n",
    "Medical images are 3D, but screens are 2D, so we look at one **slice** at a time. We pick the\n",
    "slice that contains the *most* tumour, then show pre, post, and the mask overlaid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ebff44d4",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:04.643460Z",
     "iopub.status.busy": "2026-05-19T20:23:04.643312Z",
     "iopub.status.idle": "2026-05-19T20:23:04.907915Z",
     "shell.execute_reply": "2026-05-19T20:23:04.906825Z"
    }
   },
   "outputs": [],
   "source": [
    "def best_slice(mask):\n",
    "    \"\"\"Return the z-index of the slice with the largest tumour area.\"\"\"\n",
    "    areas = mask.reshape(-1, mask.shape[2]).sum(axis=0)  # voxels per slice\n",
    "    return int(np.argmax(areas))\n",
    "\n",
    "z = best_slice(mask)\n",
    "\n",
    "fig, ax = plt.subplots(1, 3, figsize=(12, 4))\n",
    "ax[0].imshow(pre[:, :, z].T,  cmap=\"gray\", origin=\"lower\"); ax[0].set_title(f\"Pre-contrast (z={z})\")\n",
    "ax[1].imshow(post[:, :, z].T, cmap=\"gray\", origin=\"lower\"); ax[1].set_title(\"Post-contrast\")\n",
    "ax[2].imshow(post[:, :, z].T, cmap=\"gray\", origin=\"lower\")\n",
    "ax[2].imshow(np.ma.masked_where(mask[:, :, z]==0, mask[:, :, z]).T,\n",
    "             cmap=\"autumn\", origin=\"lower\", alpha=0.6)\n",
    "ax[2].set_title(\"Tumour ROI overlay\")\n",
    "for a in ax: a.axis(\"off\")\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6a22a63",
   "metadata": {},
   "source": [
    "### ✏️ Exercises — Part 1\n",
    "\n",
    "1. The tumour looks brighter on the post-contrast image than on the pre. Where in the displayed\n",
    "   summary numbers can you *already* see that enhancement, before doing any analysis?\n",
    "2. Write a one-liner that prints how many voxels are tumour in this case.\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fdff37d",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 2 — Kinetic Analysis: Classifying Enhancement Patterns\n",
    "\n",
    "🎓 **Learning goals:** turn *two* volumes into *one* map that labels every tumour voxel as\n",
    "**uptake / plateau / washout**, and summarise it into numbers.\n",
    "\n",
    "### Concept — percent enhancement\n",
    "\n",
    "For each voxel we compute how much it changed from pre to post, **as a percentage**:\n",
    "\n",
    "$$\\Delta SI\\,[\\%] \\;=\\; \\frac{S_{post}-S_{pre}}{S_{pre}+\\varepsilon}\\times 100$$\n",
    "\n",
    "The tiny $\\varepsilon$ (`1e-10`) just prevents division by zero. Then we apply thresholds:\n",
    "\n",
    "| Class | Code | Rule (this pipeline) |\n",
    "|---|---|---|\n",
    "| Uptake  | 1 | $\\Delta SI > 15\\%$ |\n",
    "| Plateau | 2 | $-5\\% \\le \\Delta SI \\le 15\\%$ |\n",
    "| Washout | 3 | $\\Delta SI < -5\\%$ |\n",
    "| Background | 0 | outside the ROI |\n",
    "\n",
    "> ⚠️ **Real-world lesson.** The project's README *text* says \"10% / ~10% / 10%\" but the actual\n",
    "> **code** uses 15 / −5. When documentation and code disagree, **the code is the ground truth**\n",
    "> for what was really computed. Always check. We follow the code.\n",
    "\n",
    "### 🎯 Task 2 — write `percent_enhancement` and `classify_kinetics`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a87579b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:04.909753Z",
     "iopub.status.busy": "2026-05-19T20:23:04.909534Z",
     "iopub.status.idle": "2026-05-19T20:23:04.918097Z",
     "shell.execute_reply": "2026-05-19T20:23:04.917256Z"
    }
   },
   "outputs": [],
   "source": [
    "def percent_enhancement(pre, post, eps=1e-10):\n",
    "    \"\"\"Voxel-wise percent signal change from pre to post contrast.\"\"\"\n",
    "    # TODO (Part 2): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"percent_enhancement() is not implemented yet — Part 2\")\n",
    "\n",
    "def classify_kinetics(change, roi, up_thr=15, plat_lo=-5, plat_hi=15):\n",
    "    \"\"\"\n",
    "    Build the colormap of kinetic classes (uint8):\n",
    "        0 = background, 1 = uptake, 2 = plateau, 3 = washout\n",
    "    Only voxels inside `roi` (boolean) are classified.\n",
    "    \"\"\"\n",
    "    # TODO (Part 2): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"classify_kinetics() is not implemented yet — Part 2\")\n",
    "\n",
    "roi    = mask > 0\n",
    "change = percent_enhancement(pre, post)\n",
    "colormap = classify_kinetics(change, roi)\n",
    "\n",
    "print(\"Voxels per class inside the tumour:\")\n",
    "for v, name in [(1,\"Uptake\"), (2,\"Plateau\"), (3,\"Washout\")]:\n",
    "    print(f\"  {name:8s}: {(colormap==v).sum():5d}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f621172",
   "metadata": {},
   "source": [
    "### 🎯 Task 2b — summarise the map into biomarkers\n",
    "\n",
    "A radiologist does not want a million voxels — they want a few numbers per patient. The most\n",
    "important: **what fraction of the tumour is uptake / plateau / washout**, plus simple intensity\n",
    "statistics. These percentages are the headline biomarkers used later for harmonization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ac967bc",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:04.919716Z",
     "iopub.status.busy": "2026-05-19T20:23:04.919568Z",
     "iopub.status.idle": "2026-05-19T20:23:04.928932Z",
     "shell.execute_reply": "2026-05-19T20:23:04.928027Z"
    }
   },
   "outputs": [],
   "source": [
    "def kinetic_features(pre, post, mask):\n",
    "    \"\"\"Return (features dict, colormap) for one case. Mirrors the original pipeline.\"\"\"\n",
    "    # TODO (Part 2): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"kinetic_features() is not implemented yet — Part 2\")\n",
    "\n",
    "feats, cmap = kinetic_features(pre, post, mask)\n",
    "for k, v in feats.items():\n",
    "    print(f\"  {k:28s} = {v:.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6af94e88",
   "metadata": {},
   "source": [
    "### ✏️ Exercises — Part 2\n",
    "\n",
    "1. The three percentages should add up to ~100%. Verify it in code. Why *exactly* 100 here?\n",
    "2. Re-run `classify_kinetics` with a stricter uptake threshold `up_thr=30`. What happens to the\n",
    "   uptake vs plateau percentages, and why?\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c69d257d",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 3 — Pseudo-color Map Visualization\n",
    "\n",
    "🎓 **Learning goal:** turn the class map into the colour picture clinicians actually look at.\n",
    "\n",
    "### Concept\n",
    "\n",
    "Numbers don't help a radiologist at a glance — **colour does**. We draw the anatomical image in\n",
    "grey, then paint a semi-transparent layer on top: **blue = uptake, green = plateau,\n",
    "red = washout**. Red regions (washout) are the ones that worry oncologists, so they must\n",
    "\"pop out\".\n",
    "\n",
    "This reproduces the `*_complete_colormap.png` output of the original `complete_pipeline.py`.\n",
    "\n",
    "### 🎯 Task 3 — write `plot_colormap`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "636ce7fd",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:04.930701Z",
     "iopub.status.busy": "2026-05-19T20:23:04.930551Z",
     "iopub.status.idle": "2026-05-19T20:23:05.063290Z",
     "shell.execute_reply": "2026-05-19T20:23:05.062224Z"
    }
   },
   "outputs": [],
   "source": [
    "# 0=transparent background, 1=blue, 2=green, 3=red  (with alpha)\n",
    "KINETIC_CMAP = ListedColormap([(0,0,0,0), (0,0,1,0.7), (0,1,0,0.7), (1,0,0,0.7)])\n",
    "\n",
    "def plot_colormap(case_id, pre, colormap, mask, ax=None):\n",
    "    \"\"\"Overlay the kinetic class map on the grey anatomical image.\"\"\"\n",
    "    # TODO (Part 3): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"plot_colormap() is not implemented yet — Part 3\")\n",
    "\n",
    "plot_colormap(\"CENTER_A_001\", pre, cmap, mask)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "605a6b98",
   "metadata": {},
   "source": [
    "### ✏️ Exercise — Part 3\n",
    "\n",
    "Show a **montage** of every tumour-containing slice for this case (one subplot per slice) using\n",
    "`plot_colormap` and a grid of axes.\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18130600",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 4 — Exporting an RGB NIfTI for Medical Viewers\n",
    "\n",
    "🎓 **Learning goal:** save the colormap as a file clinical software (Mango, ITK-SNAP, 3D Slicer)\n",
    "can open in colour.\n",
    "\n",
    "### Concept\n",
    "\n",
    "Our `colormap` array holds class *codes* (0–3). A radiologist's viewer doesn't know \"2 means\n",
    "plateau\". The trick the original `rgb_nifti_converter.py` uses:\n",
    "\n",
    "1. normalise the grey anatomical image to 0–255 → use it as a greyscale background in **R, G,\n",
    "   B** channels;\n",
    "2. paint class colours on top (blue / green / red);\n",
    "3. store it with NIfTI's special **RGB datatype (code 128, 24-bit)** using a *structured*\n",
    "   numpy dtype with named fields `('R','G','B')`.\n",
    "\n",
    "> 💡 **Why a structured dtype?** A normal `(X,Y,Z,3)` array would be read as a *4D* volume.\n",
    "> The `('R','G','B')` record dtype tells NIfTI \"this is **one** 3D volume whose voxels are\n",
    "> colours\", which is what viewers expect.\n",
    "\n",
    "### 🎯 Task 4 — write `save_rgb_nifti`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "00d3a90a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.065079Z",
     "iopub.status.busy": "2026-05-19T20:23:05.064911Z",
     "iopub.status.idle": "2026-05-19T20:23:05.091427Z",
     "shell.execute_reply": "2026-05-19T20:23:05.090406Z"
    }
   },
   "outputs": [],
   "source": [
    "def save_rgb_nifti(ref_path, class_arr, out_path):\n",
    "    \"\"\"\n",
    "    Save a class map as an RGB-encoded NIfTI (datatype 128) so medical\n",
    "    viewers show it in colour over a greyscale anatomical background.\n",
    "    Adapted from the project's rgb_nifti_converter.py.\n",
    "    \"\"\"\n",
    "    # TODO (Part 4): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"save_rgb_nifti() is not implemented yet — Part 4\")\n",
    "\n",
    "ref_path = os.path.join(case_dir, \"CENTER_A_001_0000.nii.gz\")\n",
    "out_path = os.path.join(case_dir, \"CENTER_A_001_colormap.nii.gz\")\n",
    "save_rgb_nifti(ref_path, cmap, out_path)\n",
    "\n",
    "# verify by re-loading\n",
    "back = nib.load(out_path)\n",
    "print(\"Saved :\", out_path)\n",
    "print(\"dtype :\", back.get_data_dtype(), \"| NIfTI datatype code:\", int(back.header[\"datatype\"]))\n",
    "print(\"A washout-painted voxel reads as RGB:\",\n",
    "      np.asarray(back.dataobj)[tuple(np.argwhere(cmap == 3)[0])])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cf756c0",
   "metadata": {},
   "source": [
    "### ✏️ Exercise — Part 4\n",
    "\n",
    "If you skipped the structured-dtype step and saved the plain `(X, Y, Z, 3)` array instead, how\n",
    "would a viewer interpret the file, and why is that wrong?\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fbe2995",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 5 — Radiomics Features\n",
    "\n",
    "🎓 **Learning goal:** describe the tumour with a vector of numbers (\"**radiomics**\") instead of\n",
    "just looking at it.\n",
    "\n",
    "### Concept\n",
    "\n",
    "**Radiomics** = extracting many quantitative descriptors from a region of interest, so that\n",
    "statistics / machine learning can use the image. Categories:\n",
    "\n",
    "* **First-order** — the intensity *histogram*: mean, std, skewness, kurtosis, percentiles,\n",
    "  entropy. (We implement these by hand — it's good signal-processing practice.)\n",
    "* **Shape** — volume, surface, sphericity.\n",
    "* **Texture** — spatial patterns (GLCM, GLRLM, …).\n",
    "\n",
    "The original pipeline uses the **PyRadiomics** library for the full set. PyRadiomics is heavy\n",
    "and not always installed, so here we build the **first-order** features ourselves and treat\n",
    "PyRadiomics as an *optional* extra.\n",
    "\n",
    "### 🎯 Task 5 — write `first_order_features`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b69ec75",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.093261Z",
     "iopub.status.busy": "2026-05-19T20:23:05.093079Z",
     "iopub.status.idle": "2026-05-19T20:23:05.103810Z",
     "shell.execute_reply": "2026-05-19T20:23:05.102976Z"
    }
   },
   "outputs": [],
   "source": [
    "def first_order_features(values, prefix):\n",
    "    \"\"\"First-order (histogram-based) radiomics features for a 1D array of ROI intensities.\"\"\"\n",
    "    # TODO (Part 5): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"first_order_features() is not implemented yet — Part 5\")\n",
    "\n",
    "def radiomics_features(pre, post, mask):\n",
    "    \"\"\"First-order radiomics at both timepoints + their temporal change.\"\"\"\n",
    "    # TODO (Part 5): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"radiomics_features() is not implemented yet — Part 5\")\n",
    "\n",
    "rf = radiomics_features(pre, post, mask)\n",
    "for k, v in rf.items():\n",
    "    print(f\"  {k:22s} = {v:8.3f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4dc6cef8",
   "metadata": {},
   "source": [
    "<details>\n",
    "<summary><b>Optional: the real PyRadiomics (click — runs only if installed)</b></summary>\n",
    "\n",
    "The original pipeline calls PyRadiomics like below. It needs `SimpleITK` + `pyradiomics`\n",
    "(`pip install SimpleITK pyradiomics`). This cell is safe to run even if they're missing.\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca6c11ef",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.105539Z",
     "iopub.status.busy": "2026-05-19T20:23:05.105375Z",
     "iopub.status.idle": "2026-05-19T20:23:05.111678Z",
     "shell.execute_reply": "2026-05-19T20:23:05.110858Z"
    }
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    import SimpleITK as sitk\n",
    "    from radiomics.featureextractor import RadiomicsFeatureExtractor\n",
    "    ex = RadiomicsFeatureExtractor(binWidth=5, normalize=False,\n",
    "                                   resampledPixelSpacing=[1, 1, 1])\n",
    "    img  = sitk.ReadImage(ref_path)\n",
    "    msk  = sitk.ReadImage(os.path.join(DATA_ROOT,\"CENTER_A\",\"segment\",\"CENTER_A_001.nii.gz\"))\n",
    "    msk  = sitk.Resample(msk, img, sitk.Transform(), sitk.sitkNearestNeighbor)\n",
    "    res  = ex.execute(img, msk, label=1)\n",
    "    keys = [k for k in res if k.startswith(\"original_firstorder\")][:5]\n",
    "    print(\"PyRadiomics is available. Example features:\")\n",
    "    for k in keys:\n",
    "        print(f\"  {k} = {float(res[k]):.3f}\")\n",
    "except Exception as e:\n",
    "    print(\"PyRadiomics not available in this environment — that's fine for the lab.\")\n",
    "    print(\"We use our hand-written first_order_features() instead.\")\n",
    "    print(f\"(detail: {type(e).__name__})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f1d6834",
   "metadata": {},
   "source": [
    "### ✏️ Exercise — Part 5\n",
    "\n",
    "`skewness` measures asymmetry of the intensity histogram. Without computing it, predict the\n",
    "**sign** of `t1_skewness` for a tumour where a small bright sub-region enhances strongly, then\n",
    "check.\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61664e51",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 6 — Multi-Center Harmonization\n",
    "\n",
    "🎓 **Learning goal:** understand *why* pooling hospitals naively is wrong, and remove the\n",
    "scanner batch effect.\n",
    "\n",
    "### Concept — batch effects\n",
    "\n",
    "We deliberately built `CENTER_B` with a scanner that reads **~35% higher with a +30 offset**.\n",
    "So `CENTER_B`'s `t1_mean_intensity` is systematically larger than `CENTER_A`'s — *not* because\n",
    "its tumours differ, but because of the **machine**. If you train a model on pooled data, it may\n",
    "learn \"high intensity ⇒ center B\" instead of real biology. That contamination is a **batch\n",
    "effect**.\n",
    "\n",
    "Two tools, applied to the **feature table** (rows = patients, columns = features):\n",
    "\n",
    "1. **Normalization** (e.g. min–max to 0–1): puts features on a common scale.\n",
    "2. **Harmonization (ComBat)**: explicitly *removes the center-specific location & scale* while\n",
    "   preserving the real patient-to-patient (biological) variation. ComBat models each feature\n",
    "   as `value = grand_mean + center_shift + center_scale·noise` and uses *empirical Bayes* to\n",
    "   estimate the center terms robustly, then subtracts them.\n",
    "\n",
    "We implement a **teaching-grade location/scale ComBat** (the core idea, no empirical-Bayes\n",
    "shrinkage) and point to the production `neuroCombat` used in the original code.\n",
    "\n",
    "### 🎯 Task 6a — build the multi-center feature table\n",
    "\n",
    "First we need a table. We loop over **every case in every center** and stack the kinetic +\n",
    "radiomics features. (This is the heart of `process_case`, which Part 7 will formalise.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9caac81",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.113394Z",
     "iopub.status.busy": "2026-05-19T20:23:05.113240Z",
     "iopub.status.idle": "2026-05-19T20:23:05.223097Z",
     "shell.execute_reply": "2026-05-19T20:23:05.222024Z"
    }
   },
   "outputs": [],
   "source": [
    "def feature_table(root=DATA_ROOT):\n",
    "    # TODO (Part 6): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"feature_table() is not implemented yet — Part 6\")\n",
    "\n",
    "raw_df = feature_table()\n",
    "print(f\"Feature table: {raw_df.shape[0]} patients x {raw_df.shape[1]-2} features\\n\")\n",
    "print(raw_df.groupby(\"center\")[[\"t1_mean_intensity\",\"uptake_percentage\"]]\n",
    "            .agg([\"mean\",\"std\"]).round(2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7aa5cd35",
   "metadata": {},
   "source": [
    "Look at `t1_mean_intensity` above: `CENTER_B`'s mean is far higher than `CENTER_A`'s — exactly\n",
    "the scanner batch effect we injected. That is what harmonization must remove.\n",
    "\n",
    "### 🎯 Task 6b — normalize, then harmonize"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4317f551",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.224996Z",
     "iopub.status.busy": "2026-05-19T20:23:05.224792Z",
     "iopub.status.idle": "2026-05-19T20:23:05.304125Z",
     "shell.execute_reply": "2026-05-19T20:23:05.302969Z"
    }
   },
   "outputs": [],
   "source": [
    "def minmax_normalize(df, feature_cols):\n",
    "    # TODO (Part 6): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"minmax_normalize() is not implemented yet — Part 6\")\n",
    "\n",
    "def simple_combat(df, feature_cols, batch_col=\"center\"):\n",
    "    \"\"\"\n",
    "    Teaching-grade location/scale harmonization (the core of ComBat without\n",
    "    the empirical-Bayes shrinkage). For every feature, every center is\n",
    "    re-centred and re-scaled to the pooled (grand) mean and std.\n",
    "    \"\"\"\n",
    "    # TODO (Part 6): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"simple_combat() is not implemented yet — Part 6\")\n",
    "\n",
    "feat_cols = [c for c in raw_df.columns if c not in (\"case_id\", \"center\")]\n",
    "norm_df   = minmax_normalize(raw_df, feat_cols)\n",
    "harm_df   = simple_combat(norm_df, feat_cols)\n",
    "\n",
    "def between_center_var(df, col):\n",
    "    \"\"\"Variance of the per-center means = how much centers disagree.\"\"\"\n",
    "    return df.groupby(\"center\")[col].mean().var()\n",
    "\n",
    "print(\"Between-center variance (lower = better harmonized):\\n\")\n",
    "print(f\"{'feature':28s} {'raw':>10s} {'normalized':>12s} {'harmonized':>12s}\")\n",
    "for col in [\"t1_mean_intensity\", \"uptake_percentage\", \"washout_percentage\"]:\n",
    "    print(f\"{col:28s} {between_center_var(raw_df,col):10.4f} \"\n",
    "          f\"{between_center_var(norm_df,col):12.4f} \"\n",
    "          f\"{between_center_var(harm_df,col):12.6f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55001eaa",
   "metadata": {},
   "source": [
    "The harmonized between-center variance collapses toward ~0: the systematic hospital difference\n",
    "is gone, while patient-to-patient differences within each center are preserved.\n",
    "\n",
    "<details>\n",
    "<summary><b>Production note: neuroCombat (what the original code uses)</b></summary>\n",
    "\n",
    "The real `complete_pipeline.py` calls the `neuroCombat` library:\n",
    "\n",
    "```python\n",
    "from neuroCombat import neuroCombat\n",
    "harmonized = neuroCombat(dat=data_matrix,            # features x samples\n",
    "                         batch=center_labels,\n",
    "                         mod=None, par_prior=True, eb=True)[\"data\"]\n",
    "```\n",
    "\n",
    "`eb=True` turns on **empirical-Bayes shrinkage**, which makes the per-center estimates robust\n",
    "when you have few samples per center — important with only a handful of patients. Our\n",
    "`simple_combat` is the same idea without that shrinkage, so it's easier to read.\n",
    "</details>\n",
    "\n",
    "### ✏️ Exercise — Part 6\n",
    "\n",
    "Why do we harmonize the **feature table** and not the raw images directly? Give one practical\n",
    "reason.\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2672c3cd",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 7 — Putting It All Together: the Pipeline\n",
    "\n",
    "🎓 **Learning goal:** assemble every function you wrote into the single class the original\n",
    "project ships, and produce the same outputs (per-case colormaps + 3 CSV files).\n",
    "\n",
    "Notice there is **almost no new code** here — Part 7 just *orchestrates* Parts 1–6. That is the\n",
    "whole point of building piece by piece."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0fb65cc2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.305973Z",
     "iopub.status.busy": "2026-05-19T20:23:05.305772Z",
     "iopub.status.idle": "2026-05-19T20:23:05.596801Z",
     "shell.execute_reply": "2026-05-19T20:23:05.595889Z"
    }
   },
   "outputs": [],
   "source": [
    "class DCEMRIPipeline:\n",
    "    \"\"\"End-to-end pipeline, assembled from the functions built in Parts 1-6.\"\"\"\n",
    "\n",
    "    def process_case(self, cid, case_dir, seg_dir):\n",
    "        # TODO (Part 7): implement this function — see the Task description above.\n",
    "        # (delete this line once done)\n",
    "        raise NotImplementedError(\"process_case() is not implemented yet — Part 7\")\n",
    "\n",
    "    def run(self, root=DATA_ROOT):\n",
    "        # TODO (Part 7): implement this function — see the Task description above.\n",
    "        # (delete this line once done)\n",
    "        raise NotImplementedError(\"run() is not implemented yet — Part 7\")\n",
    "\n",
    "pipe = DCEMRIPipeline()\n",
    "raw, norm, harm = pipe.run()\n",
    "print(f\"\\nFinal table: {raw.shape[0]} patients x {raw.shape[1]-1} features\")\n",
    "raw.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d2b1c7a",
   "metadata": {},
   "source": [
    "### ✏️ Exercise — Part 7\n",
    "\n",
    "Add **one** new biomarker (e.g. `enhancement_ratio = t1_mean / t0_mean`) so it appears in all\n",
    "three CSVs. Which function do you edit — and how many others?\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adbce3e5",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 8 — Validation Dashboard\n",
    "\n",
    "🎓 **Learning goal:** produce the figure that *proves* the pipeline worked, like the original\n",
    "`combat_visualization.py`: reference kinetic curves + before/after harmonization comparison."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2369377",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.598642Z",
     "iopub.status.busy": "2026-05-19T20:23:05.598485Z",
     "iopub.status.idle": "2026-05-19T20:23:05.918018Z",
     "shell.execute_reply": "2026-05-19T20:23:05.916868Z"
    }
   },
   "outputs": [],
   "source": [
    "def reference_curves(ax):\n",
    "    \"\"\"Idealised uptake / plateau / washout kinetic curves (teaching figure).\"\"\"\n",
    "    t = np.linspace(0, 3, 100)\n",
    "    base = 75 * (1 - np.exp(-2.0 * t))           # common early rise\n",
    "    div  = 1.3                                   # divergence time\n",
    "    i    = np.argmin(np.abs(t - div))\n",
    "    up, pl, wo = base.copy(), base.copy(), base.copy()\n",
    "    for j in range(i+1, len(t)):\n",
    "        dt = t[j] - t[i]\n",
    "        up[j] = base[i] + dt*5\n",
    "        pl[j] = base[i] + dt*(-3)\n",
    "        wo[j] = base[i] + dt*(-10)\n",
    "    ax.plot(t, up, color=\"#3274A1\", lw=2, label=\"Uptake\")\n",
    "    ax.plot(t, pl, color=\"#3A923A\", lw=2, label=\"Plateau\")\n",
    "    ax.plot(t, wo, color=\"#C03D3E\", lw=2, label=\"Washout\")\n",
    "    ax.axvline(div, color=\"gray\", ls=\"--\", alpha=.6)\n",
    "    ax.set(title=\"Reference kinetic curves\", xlabel=\"Time-point\",\n",
    "           ylabel=\"Signal intensity (%)\", xlim=(0,3), ylim=(0,120))\n",
    "    ax.legend(); ax.grid(alpha=.3)\n",
    "\n",
    "def comparison_bars(ax, raw_df, harm_df, feature, title, color):\n",
    "    centers = sorted(raw_df[\"center\"].unique())\n",
    "    x = np.arange(len(centers)); w = 0.35\n",
    "    rm = [raw_df [raw_df [\"center\"]==c][feature].mean() for c in centers]\n",
    "    hm = [harm_df[harm_df[\"center\"]==c][feature].mean() for c in centers]\n",
    "    ax.bar(x-w/2, rm, w, label=\"Raw\",        color=color, alpha=.85)\n",
    "    ax.bar(x+w/2, hm, w, label=\"Harmonized\", color=color, alpha=.40)\n",
    "    ax.set(title=title, xticks=x); ax.set_xticklabels(centers)\n",
    "    ax.legend(); ax.grid(axis=\"y\", alpha=.3)\n",
    "\n",
    "raw_c  = raw.copy();  raw_c[\"center\"]  = raw_c[\"case_id\"].str.rsplit(\"_\",n=1).str[0]\n",
    "harm_c = harm.copy(); harm_c[\"center\"] = harm_c[\"case_id\"].str.rsplit(\"_\",n=1).str[0]\n",
    "\n",
    "fig = plt.figure(figsize=(15, 4.5))\n",
    "reference_curves(fig.add_subplot(1, 3, 1))\n",
    "comparison_bars(fig.add_subplot(1, 3, 2), raw_c, harm_c,\n",
    "                \"t1_mean_intensity\", \"t1 mean intensity: raw vs harmonized\", \"#1f77b4\")\n",
    "comparison_bars(fig.add_subplot(1, 3, 3), raw_c, harm_c,\n",
    "                \"uptake_percentage\", \"Uptake %: raw vs harmonized\", \"#2ca02c\")\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cabf613",
   "metadata": {},
   "source": [
    "In the middle panel the two **raw** bars are very different heights (scanner batch effect);\n",
    "after **harmonization** the bars match — the centers now agree. That single picture is the\n",
    "\"proof it worked\" you put in a paper or a report.\n",
    "\n",
    "### ✏️ Exercise — Part 8\n",
    "\n",
    "Add a 4th panel comparing `washout_percentage`. Does harmonization change it as much as\n",
    "`t1_mean_intensity`? Explain why ratio-type features are *naturally* more robust to scanner\n",
    "gain than absolute-intensity features.\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "745a36ac",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 9 — Descriptive Statistics & Composition Pie Charts\n",
    "\n",
    "🎓 **Learning goal:** summarise the whole cohort with numbers and the right pictures —\n",
    "per-case **composition pie charts** and per-center **descriptive statistics** (this reproduces\n",
    "the *signal-distribution pie charts* and *key-metrics tables* from the original web app).\n",
    "\n",
    "### Concept — describing a cohort\n",
    "\n",
    "After Part 7 we have one row per patient. Before any modelling, you always **describe** the\n",
    "data: central tendency (`mean`, `median`), spread (`std`, min–max), and *how it differs by\n",
    "center* (the batch-effect story again, now in table form).\n",
    "\n",
    "### Concept — when is a pie chart the right chart?\n",
    "\n",
    "A pie shows **parts of a single whole**. The kinetic percentages\n",
    "(uptake + plateau + washout ≈ 100%) of **one tumour** are exactly that — a perfect pie.\n",
    "\n",
    "> ⚠️ **Good practice.** Pies are *bad* for comparing across patients or centers (humans compare\n",
    "> angles poorly). For comparisons use grouped **bars** or **boxplots**. Rule of thumb:\n",
    "> *pie = composition of one thing; bar/box = comparison between things.* You will use both below.\n",
    "\n",
    "### 🎯 Task 9a — write `kinetic_pie`\n",
    "\n",
    "Draw the uptake / plateau / washout composition of **one case** as a pie, using the project's\n",
    "colour convention (**blue = uptake, green = plateau, red = washout**, same as Part 3)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "752753f6",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.920235Z",
     "iopub.status.busy": "2026-05-19T20:23:05.919600Z",
     "iopub.status.idle": "2026-05-19T20:23:05.977343Z",
     "shell.execute_reply": "2026-05-19T20:23:05.976296Z"
    }
   },
   "outputs": [],
   "source": [
    "# project-wide kinetic colours (match the Part 3 overlay)\n",
    "KIN_LABELS = [\"Uptake\", \"Plateau\", \"Washout\"]\n",
    "KIN_COLS   = [\"uptake_percentage\", \"plateau_percentage\", \"washout_percentage\"]\n",
    "KIN_HEX    = [\"#1f77b4\", \"#2ca02c\", \"#d62728\"]   # blue / green / red\n",
    "\n",
    "def kinetic_pie(row, ax=None):\n",
    "    \"\"\"Pie chart of one case's kinetic composition. `row` is a feature-table row.\"\"\"\n",
    "    # TODO (Part 9): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"kinetic_pie() is not implemented yet — Part 9\")\n",
    "\n",
    "# show the composition of the first case\n",
    "kinetic_pie(raw.iloc[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e41f92b",
   "metadata": {},
   "source": [
    "### 🎯 Task 9b — write `kinetic_summary`\n",
    "\n",
    "Return a tidy **descriptive-statistics table** of the three kinetic percentages, **grouped by\n",
    "center** (so the per-site differences are visible as numbers, not just figures). Recover the\n",
    "center from `case_id` *inside* the function so it works on any feature table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b365430",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:05.979000Z",
     "iopub.status.busy": "2026-05-19T20:23:05.978791Z",
     "iopub.status.idle": "2026-05-19T20:23:06.001463Z",
     "shell.execute_reply": "2026-05-19T20:23:06.000435Z"
    }
   },
   "outputs": [],
   "source": [
    "def kinetic_summary(df):\n",
    "    \"\"\"Per-center descriptive statistics of the kinetic percentages.\"\"\"\n",
    "    # TODO (Part 9): implement this function — see the Task description above.\n",
    "    # (delete this line once done)\n",
    "    raise NotImplementedError(\"kinetic_summary() is not implemented yet — Part 9\")\n",
    "\n",
    "print(\"Per-center descriptive statistics (raw features):\\n\")\n",
    "print(kinetic_summary(raw))\n",
    "\n",
    "print(\"\\nWhole-cohort summary (raw features):\\n\")\n",
    "print(raw[KIN_COLS].describe().round(2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7892502",
   "metadata": {},
   "source": [
    "### See it — cohort composition grid + distribution boxplots\n",
    "\n",
    "This is a *demonstration* cell (visualization plumbing, given for you). It uses your\n",
    "`kinetic_pie` for a pie-per-patient grid, then **boxplots** to compare the distribution of each\n",
    "kinetic class across centers — the comparison the pies cannot show well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9210ed23",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-19T20:23:06.003132Z",
     "iopub.status.busy": "2026-05-19T20:23:06.002955Z",
     "iopub.status.idle": "2026-05-19T20:23:06.579282Z",
     "shell.execute_reply": "2026-05-19T20:23:06.578170Z"
    }
   },
   "outputs": [],
   "source": [
    "cases = raw.reset_index(drop=True)\n",
    "n = len(cases); cols = 4; rows = int(np.ceil(n / cols))\n",
    "\n",
    "fig, axes = plt.subplots(rows, cols, figsize=(3.2*cols, 3.2*rows))\n",
    "for ax, (_, r) in zip(axes.ravel(), cases.iterrows()):\n",
    "    kinetic_pie(r, ax=ax)\n",
    "for ax in axes.ravel()[n:]:\n",
    "    ax.axis(\"off\")\n",
    "fig.suptitle(\"Per-case kinetic composition\", y=1.02, fontsize=13)\n",
    "plt.tight_layout(); plt.show()\n",
    "\n",
    "# distribution comparison across centers (the right chart for comparison)\n",
    "cmp = raw.copy()\n",
    "cmp[\"center\"] = cmp[\"case_id\"].str.rsplit(\"_\", n=1).str[0]\n",
    "fig, axes = plt.subplots(1, 3, figsize=(14, 4))\n",
    "for ax, col, hexc in zip(axes, KIN_COLS, KIN_HEX):\n",
    "    groups = [cmp.loc[cmp.center == c, col].values for c in sorted(cmp.center.unique())]\n",
    "    bp = ax.boxplot(groups, labels=sorted(cmp.center.unique()), patch_artist=True)\n",
    "    for box in bp[\"boxes\"]:\n",
    "        box.set(facecolor=hexc, alpha=0.5)\n",
    "    ax.set_title(col.replace(\"_\", \" \")); ax.set_ylabel(\"%\"); ax.grid(axis=\"y\", alpha=.3)\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd8730da",
   "metadata": {},
   "source": [
    "### ✏️ Exercises — Part 9\n",
    "\n",
    "1. From the per-center table, which kinetic class differs **most** between `CENTER_A` and\n",
    "   `CENTER_B` in the *raw* features? Is it still different after harmonization\n",
    "   (`kinetic_summary(harm)`)?\n",
    "2. Add a **coefficient of variation** (`std / mean`) column to `kinetic_summary`. Why is CV\n",
    "   sometimes more informative than std alone?\n",
    "3. You have a pie per patient. Why would a single pie of the *cohort-average* uptake/plateau/\n",
    "   washout be **misleading** for a clinician?\n",
    "\n",
    "> 💡 *Try this yourself first. The worked solution is in the instructor notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "659f0606",
   "metadata": {},
   "source": [
    "---\n",
    "## 🎉 Wrap-up\n",
    "\n",
    "You built, piece by piece, a complete multi-center DCE-MRI pipeline:\n",
    "\n",
    "| Part | You can now… |\n",
    "|---|---|\n",
    "| 1 | load & QC a NIfTI volume |\n",
    "| 2 | classify enhancement kinetics (uptake/plateau/washout) |\n",
    "| 3 | make the clinical pseudo-color overlay |\n",
    "| 4 | export an RGB NIfTI for medical viewers |\n",
    "| 5 | extract first-order radiomics features |\n",
    "| 6 | normalize **and** harmonize multi-center features |\n",
    "| 7 | orchestrate everything into one pipeline + CSV outputs |\n",
    "| 8 | produce the validation dashboard |\n",
    "| 9 | describe the cohort: composition pie charts + per-center statistics |\n",
    "\n",
    "### Mini-glossary\n",
    "\n",
    "- **DCE-MRI** — MRI series before/after a contrast agent injection.\n",
    "- **ROI / mask** — voxels marked as tumour by a radiologist.\n",
    "- **Kinetic pattern** — uptake / plateau / washout shape of the enhancement.\n",
    "- **Radiomics** — many quantitative features extracted from the ROI.\n",
    "- **Batch effect** — systematic difference caused by scanner/site, not biology.\n",
    "- **ComBat / harmonization** — statistical removal of batch effects from features.\n",
    "\n",
    "### From this lab to real research\n",
    "\n",
    "Everything here runs on the **real MAMA-MIA dataset** with *no code changes*: replace the\n",
    "`build_synthetic_dataset()` call by pointing `DATA_ROOT` at the real folders (same\n",
    "`CENTER/CASE/..._0000/_0001` + `segment/` layout), and swap `simple_combat` for `neuroCombat`\n",
    "(`pip install neuroCombat`) for the empirical-Bayes version. The original project also wraps\n",
    "all of this in a Flask web app for clinicians — a natural extension once this lab is mastered.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}