{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.0"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \ud83e\udde0 Tutorial 2 \u2014 EEG Preprocessing\n",
    "###  \u00b7 Biomedical Signals & Applications \u00b7 EEG Python Workshop\n",
    "\n",
    "---\n",
    "\n",
    "**Goal of this tutorial:** Clean the raw EEG so it is ready for analysis.\n",
    "\n",
    "**By the end you will be able to:**\n",
    "- Re-reference EEG to the average reference\n",
    "- Apply notch and bandpass filters\n",
    "- Detect and interpolate bad channels\n",
    "- Run ICA to remove eye-blink and muscle artefacts\n",
    "- Cut the continuous recording into epochs around motor imagery cues\n",
    "- Apply baseline correction and reject noisy trials\n",
    "- Save the clean epochs for Tutorial 3\n",
    "\n",
    "**Total time: ~3 hours** (including theory discussion and exercises)\n",
    "\n",
    "---\n",
    "\n",
    "> \u26a0\ufe0f **Run Tutorial 1 first** \u2014 it downloads the dataset that this notebook uses.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## \ud83c\udf93 Part 0 \u2014 Why Is Preprocessing Necessary?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Part 0 \u2014 Introduction: ~20 min**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 0.1 What Does a Raw EEG Recording Look Like?\n",
    "\n",
    "Imagine you place **64 small metal discs** (electrodes) on someone's scalp and record the tiny electrical signals produced by their brain for several minutes. What you get back is **not** a clean brain signal. Instead, you get a mixture of many things happening at the same time:\n",
    "\n",
    "- The **brain signal** you actually want \u2014 faint oscillations (5\u201350 \u00b5V)\n",
    "- **Eye blinks** \u2014 create a giant spike that can be **50 times bigger** than the brain signal\n",
    "- **Jaw and forehead muscles twitching** \u2014 adds high-frequency noise\n",
    "- **Heartbeat** \u2014 the electrical activity of the heart reaches the scalp too\n",
    "- **50 Hz hum from power sockets** \u2014 couples into the recording cables\n",
    "- **Electrode wobbling** when the person moves their head\n",
    "\n",
    "> \ud83d\udca1 **Analogy \u2014 Noisy Caf\u00e9:** Think of it like trying to record someone **whispering** in a noisy caf\u00e9. There are other people talking, music playing, coffee machines running, and chairs scraping \u2014 all mixed into your microphone at the same time. Preprocessing is the process of cleaning up that recording so you can hear what the whispering person is actually saying.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 0.2 How Big Are These Artefacts?\n",
    "\n",
    "One of the most important things to understand is the **scale difference** between brain signals and noise. This is why preprocessing is not optional \u2014 it is essential.\n",
    "\n",
    "| Source | Typical Amplitude | Compared to brain signal |\n",
    "|--------|-------------------|--------------------------|\n",
    "| **Brain EEG** (what we want) | 5 \u2013 50 \u00b5V | \u00d7 1 \u2014 the target |\n",
    "| Eye blink | 100 \u2013 2000 \u00b5V | **\u00d7 10 to \u00d7 100 larger!** |\n",
    "| Lateral eye movement | 50 \u2013 500 \u00b5V | \u00d7 5 to \u00d7 50 larger |\n",
    "| Jaw/forehead muscle | 50 \u2013 1000 \u00b5V | \u00d7 5 to \u00d7 100 larger |\n",
    "| Heartbeat (ECG) | 20 \u2013 200 \u00b5V | \u00d7 2 to \u00d7 20 larger |\n",
    "| 50 Hz power line | 5 \u2013 100 \u00b5V | Similar to or larger than brain |\n",
    "| Electrode movement | 100 \u2013 5000 \u00b5V | Can completely mask the brain |\n",
    "\n",
    "> \ud83c\udfaf **Why This Matters:** Without preprocessing, any analysis you do will mostly reflect these artefacts \u2014 not the brain. A classifier trained on noisy EEG would learn to detect eye blinks, not motor imagery. Preprocessing is the **foundation** of everything else.\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Discussion question for the class:** *\"What would happen if we tried to classify left-hand vs right-hand motor imagery on raw EEG that still contains eye blinks?\"*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 0.3 The Preprocessing Pipeline \u2014 An Overview\n",
    "\n",
    "We clean EEG in a **specific sequence** of steps. Each step targets a different type of noise, and **the order matters** (we explain why in Part 9). Here is the full pipeline we will follow today:\n",
    "\n",
    "```\n",
    "Load data\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Set electrode positions (montage)\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Mark & Interpolate bad channels  \u2190\u2500\u2500 must come FIRST\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Average re-reference             \u2190\u2500\u2500 after bad channels fixed\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Notch filter  (50 / 60 Hz)\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Bandpass filter  (1 \u2013 40 Hz)     \u2190\u2500\u2500 must come BEFORE ICA\n",
    "    \u2502\n",
    "    \u25bc\n",
    "ICA: fit \u2192 identify \u2192 remove     \u2190\u2500\u2500 must come BEFORE epoching\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Epoch (cut into trials)\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Baseline correction\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Epoch rejection\n",
    "    \u2502\n",
    "    \u25bc\n",
    "Save clean epochs  \u2192  Tutorial 3\n",
    "```\n",
    "\n",
    "> \u26a0\ufe0f **The order is not arbitrary.** We will explain the three critical ordering rules at the end of this tutorial.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## \u2699\ufe0f Setup \u2014 Imports\n",
    "\n",
    "Run this cell first every time you open the notebook.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Setup: ~5 min**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import mne\n",
    "from mne.datasets import eegbci\n",
    "from mne.io import concatenate_raws, read_raw_edf\n",
    "from mne.preprocessing import ICA\n",
    "\n",
    "mne.set_log_level(\"WARNING\")\n",
    "print(\"MNE version:\", mne.__version__)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Step 0 \u2014 Load Raw Data\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Step 0 \u2014 Load data: ~10 min**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What dataset are we using?\n",
    "\n",
    "We are using the **PhysioNet BCI Motor Movement/Imagery** dataset \u2014 one of the most widely-used public EEG datasets in research.\n",
    "\n",
    "- **Subjects:** 109 healthy adults\n",
    "- **Task:** The subject imagines moving their **left hand (T1)** or **right hand (T2)**\n",
    "- **Channels:** 64 EEG electrodes arranged in the standard 10-05 system\n",
    "- **Sampling rate:** 160 Hz (160 data points per second per channel)\n",
    "- **Duration:** ~2.5 minutes per run (we load 2 runs)\n",
    "\n",
    "The raw EEG object contains:\n",
    "- **64 channels** \u00d7 **~20,000 time points** = ~1.3 million numbers\n",
    "- Two types of events: **T1** (left hand imagery) and **T2** (right hand imagery)\n",
    "- Lots of noise \u2014 eye blinks, muscle tension, power-line interference\n",
    "\n",
    "Our job in this tutorial is to clean all of that.\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Discussion question:** *\"With 64 channels and 160 samples/second, how many numbers per second are we recording? How many per minute?\"*\n",
    ">\n",
    "> *Answer: 64 \u00d7 160 = 10,240 numbers/second = 614,400/minute*\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subject = 1\n",
    "runs    = [6, 10]\n",
    "\n",
    "raw_files = eegbci.load_data(subject, runs, verbose=False)\n",
    "raw = concatenate_raws([read_raw_edf(f, preload=True, verbose=False)\n",
    "                        for f in raw_files])\n",
    "eegbci.standardize(raw)\n",
    "raw.set_montage(mne.channels.make_standard_montage(\"standard_1005\"), verbose=False)\n",
    "\n",
    "print(\"Raw data loaded\")\n",
    "print(f\"  Channels : {len(raw.ch_names)}\")\n",
    "print(f\"  Duration : {raw.times[-1]:.1f} s\")\n",
    "print(f\"  Sfreq    : {raw.info['sfreq']} Hz\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualise the raw data\n",
    "\n",
    "Let's look at 10 seconds of the raw EEG **before any cleaning**. You should be able to spot large spikes (eye blinks) and faster noise (muscle artefacts).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show 10 seconds of raw EEG \u2014 look for obvious artefacts\n",
    "fig, ax = plt.subplots(figsize=(14, 6))\n",
    "data, times = raw.get_data(return_times=True)\n",
    "t_mask = times < 10          # first 10 seconds\n",
    "offset = np.arange(10) * 100e-6   # 100 \u00b5V spacing between channels\n",
    "\n",
    "for i, ch in enumerate(raw.ch_names[:10]):\n",
    "    ax.plot(times[t_mask], data[i, t_mask] * 1e6 + i * 100,\n",
    "            lw=0.7, label=ch)\n",
    "\n",
    "ax.set_xlabel(\"Time (s)\")\n",
    "ax.set_ylabel(\"Channels (offset for display)\")\n",
    "ax.set_yticks(np.arange(10) * 100)\n",
    "ax.set_yticklabels(raw.ch_names[:10])\n",
    "ax.set_title(\"First 10 channels \u2014 first 10 seconds of RAW EEG (no cleaning yet)\")\n",
    "ax.set_xlim(0, 10)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "print(\"\\n\ud83d\udc49 Can you spot any large spikes (eye blinks) or irregular bursts (muscle noise)?\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Step 1 \u2014 Re-Referencing\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Step 1 \u2014 Re-referencing: ~20 min**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 What Is a Reference Electrode?\n",
    "\n",
    "Before explaining re-referencing, we need to understand something fundamental about **voltage**.\n",
    "\n",
    "> \ud83d\udca1 **Voltage is never absolute \u2014 it is always a difference between two points.**\n",
    ">\n",
    "> You cannot measure \"the voltage at electrode Fp1\". You can only measure \"the voltage at Fp1 **compared to some other point**\".\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Analogy \u2014 Altitude:** Think of altitude. We say Mount Everest is 8,849 metres tall \u2014 but tall compared to *what*? Compared to sea level. If we chose the floor of the Dead Sea as our reference, Everest would have a different height number \u2014 but the mountain itself hasn't changed.\n",
    ">\n",
    "> Similarly, EEG measures brain voltage **compared to a reference point**. The brain hasn't changed \u2014 but the numbers you see depend entirely on where you placed the reference electrode.\n",
    "\n",
    "In EEG, the amplifier records the voltage difference between each active electrode (on the scalp) and **one chosen reference electrode**. If that reference electrode happens to sit on a region with brain activity, that activity gets subtracted from every channel \u2014 distorting all recordings simultaneously.\n",
    "\n",
    "**Example:** If the reference is placed at Cz (top of the head, directly over the motor cortex), then any motor cortex signal will be reduced in all other channels \u2014 because you are subtracting it out.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 The Average Reference \u2014 Our Solution\n",
    "\n",
    "The most popular solution for research EEG is the **average reference**. Instead of using one specific electrode as the reference, we use the **average of all electrodes** and subtract this average from every channel.\n",
    "\n",
    "$$V_{\\text{cleaned}}(\\text{channel}) = V_{\\text{recorded}}(\\text{channel}) - \\frac{1}{N}\\sum_{k=1}^{N} V_{\\text{recorded}}(k)$$\n",
    "\n",
    "In plain words: for each time point, add up the voltage values from all electrodes, divide by the number of electrodes, and subtract that average from every channel.\n",
    "\n",
    "> \u26a0\ufe0f **Critical rule:** Always fix bad channels **before** computing the average reference. A broken channel recording noise at 500 \u00b5V would pull the average up by ~8 \u00b5V, corrupting every single channel.\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Check your understanding:**\n",
    "> 1. If one electrode is broken and recording 500 \u00b5V of noise, how does that affect the average reference? *(It inflates the average, which gets subtracted from every channel, corrupting all 63 other channels.)*\n",
    "> 2. Why is the average reference better than a single reference electrode?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_reref = raw.copy().set_eeg_reference(\"average\", projection=False, verbose=False)\n",
    "print(\"Re-referencing: set to average reference \u2713\")\n",
    "\n",
    "# Visual comparison \u2014 one channel before vs after\n",
    "data_before, times = raw.get_data(return_times=True)\n",
    "data_after, _      = raw_reref.get_data(return_times=True)\n",
    "\n",
    "fig, axes = plt.subplots(2, 1, figsize=(12, 5), sharex=True)\n",
    "axes[0].plot(times[:1280], data_before[0, :1280] * 1e6, \"steelblue\", lw=0.8)\n",
    "axes[0].set_title(\"Channel Fc5 \u2014 Before Re-referencing\")\n",
    "axes[0].set_ylabel(\"Amplitude (\u00b5V)\")\n",
    "axes[1].plot(times[:1280], data_after[0,  :1280] * 1e6, \"tomato\",    lw=0.8)\n",
    "axes[1].set_title(\"Channel Fc5 \u2014 After Average Re-referencing\")\n",
    "axes[1].set_ylabel(\"Amplitude (\u00b5V)\")\n",
    "axes[1].set_xlabel(\"Time (s)\")\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \ud83d\udc41\ufe0f **What to check:**\n",
    "> - The two traces should look nearly identical for this dataset (it was already average-referenced during recording)\n",
    "> - In real-world data with a biased reference you would see the DC level shift\n",
    "> - The *shape* of the waveform should not change \u2014 only the overall level\n",
    "> - In your own research: **always report which reference you used** in your methods section, as different choices produce visually different results\n",
    "\n",
    "> \u26a0\ufe0f **Note for European datasets:** The PhysioNet dataset was recorded in the USA. For European datasets (e.g., from BrainVision amplifiers in a Greek hospital), the same logic applies \u2014 you still use average reference. The only difference is the notch filter in the next step (50 Hz vs 60 Hz).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Step 2 \u2014 Filtering\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Step 2 \u2014 Filtering: ~30 min**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 What Is a Frequency in EEG?\n",
    "\n",
    "EEG signals are made up of **oscillations** \u2014 rhythmic up-and-down fluctuations. A 10 Hz signal goes up and down 10 times per second. A 50 Hz signal goes up and down 50 times per second.\n",
    "\n",
    "Different types of **brain activity** produce oscillations at different speeds, and different types of **noise** also have characteristic frequencies.\n",
    "\n",
    "> \ud83d\udca1 **Analogy \u2014 Music Equaliser:** Think of the equaliser on a music player \u2014 the device that lets you boost the bass or treble. It separates the audio into frequency bands and lets you control each one independently. A filter in EEG does exactly the same thing.\n",
    "\n",
    "### 2.2 The EEG Frequency Bands\n",
    "\n",
    "| Frequency Range | What Is There? | Action |\n",
    "|---|---|---|\n",
    "| **Below 1 Hz** | Electrode drift, sweat, slow body movements \u2014 NOT brain | Remove (high-pass filter) |\n",
    "| **0.5 \u2013 4 Hz** | **Delta** waves: deep sleep | Keep (for sleep studies) |\n",
    "| **4 \u2013 8 Hz** | **Theta** waves: memory, drowsiness | Keep |\n",
    "| **8 \u2013 13 Hz** | **Alpha** waves: relaxed wakefulness; **Mu** rhythm over motor cortex | Keep \u2014 critical for motor imagery |\n",
    "| **13 \u2013 30 Hz** | **Beta** waves: active thinking, motor planning | Keep \u2014 critical for motor imagery |\n",
    "| **30 \u2013 80 Hz** | **Gamma** waves: high-level cognition; also contains some EMG | Keep or partially remove |\n",
    "| **Above 40\u201380 Hz** | Mainly muscle (EMG) noise \u2014 NOT brain | Remove (low-pass filter) |\n",
    "| **50 Hz exactly** | Power socket hum \u2014 NOT brain | Remove (notch filter) |\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Discussion question:** *\"For a motor imagery BCI, which frequency bands are most important and why?\"*\n",
    ">\n",
    "> *Expected answer: Alpha (8\u201313 Hz) and Beta (13\u201330 Hz). These contain the Mu rhythm and sensorimotor beta rhythms that change during imagined movement (Event-Related Desynchronisation).*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 The Three Filters We Apply\n",
    "\n",
    "**The Notch Filter \u2014 Removing the Power Socket Hum**\n",
    "\n",
    "European power sockets supply electricity that oscillates at exactly **50 Hz** (60 Hz in USA). This frequency radiates from the cables in the walls and couples into EEG recording cables. A notch filter removes **one very narrow frequency** while leaving everything around it completely untouched.\n",
    "\n",
    "> \ud83d\udca1 **Analogy \u2014 Radio interference:** Imagine a radio that receives a strong interference signal at exactly 100.0 MHz. A notch filter is like a tiny \"hole\" in the frequency response that blocks only that exact frequency. You can still hear everything else perfectly.\n",
    "\n",
    "---\n",
    "\n",
    "**The High-Pass Filter \u2014 Removing Slow Drifts**\n",
    "\n",
    "Electrodes and skin produce very slow voltage changes caused by sweat, small movements, and temperature. These are so slow (below 1 Hz) that they look like long, smooth drifts in the signal.\n",
    "\n",
    "> \ud83d\udcac **In plain words:** High-pass means: *let high frequencies pass through; block low frequencies*. We set the cut-off at **1 Hz**. Anything slower than 1 cycle per second is removed. Brain alpha (10 Hz), beta (20 Hz), theta (6 Hz) \u2014 all safely above 1 Hz \u2014 pass through unchanged.\n",
    "\n",
    "---\n",
    "\n",
    "**The Low-Pass Filter \u2014 Removing Muscle Noise**\n",
    "\n",
    "Muscles generate high-frequency electrical activity (EMG) when they contract. Jaw and forehead muscles are especially problematic because they sit close to EEG electrodes. Their noise is spread broadly across 20\u2013300 Hz.\n",
    "\n",
    "> \ud83d\udcac **In plain words:** Low-pass means: *let low frequencies pass through; block high frequencies*. We set the cut-off at **40 Hz**. Everything above 40 Hz is removed.\n",
    "\n",
    "---\n",
    "\n",
    "> \u2b50 **Remember:** High-pass + Low-pass together = **Bandpass filter**. Bandpass 1\u201340 Hz means: keep only the frequencies between 1 Hz and 40 Hz. This is the standard choice for **motor imagery** EEG analysis.\n",
    ">\n",
    "> \u26a0\ufe0f **For ERP studies** (like the P300 or N400), use **0.1\u201340 Hz** instead \u2014 ERPs contain very slow components that would be removed by 1 Hz filtering.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 Zero-Phase Filtering \u2014 Why It Matters\n",
    "\n",
    "When you apply a filter, there is a risk of **shifting the signal in time** \u2014 making a brain response appear earlier or later than it actually was. This is called **phase distortion**. For ERP research, where timing is everything (a P300 at 300 ms has a different meaning from a P300 at 320 ms), phase distortion would corrupt all latency measurements.\n",
    "\n",
    "MNE-Python applies filters in **zero-phase mode**, which means the filter is applied **twice**: once forward in time and once backward. The time shifts from the two passes cancel each other out exactly.\n",
    "\n",
    "> \u2b50 **Tip:** Zero-phase filtering is the **default** in MNE. You do not need to do anything special \u2014 it is automatic. Just be aware that this requires the full signal to be loaded into memory (`preload=True`), which is why we always set that option when loading EEG files.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_filt = raw_reref.copy()\n",
    "\n",
    "# 2a. Notch filter \u2014 remove power-line interference\n",
    "# Change freqs=60 to freqs=50 for European datasets!\n",
    "raw_filt.notch_filter(freqs=60, verbose=False)\n",
    "\n",
    "# 2b. Bandpass filter \u2014 keep only 1\u201340 Hz\n",
    "raw_filt.filter(l_freq=1.0, h_freq=40.0, verbose=False)\n",
    "\n",
    "print(\"Filtering complete: notch at 60 Hz + bandpass 1\u201340 Hz \u2713\")\n",
    "print()\n",
    "print(\"Remember: for European recordings, use notch_filter(freqs=50)\")\n",
    "print(\"Remember: for ERP studies,        use filter(l_freq=0.1, h_freq=40)\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualise \u2014 Power Spectral Density (PSD) Before vs After\n",
    "\n",
    "The **Power Spectral Density (PSD)** shows how much power exists at each frequency. It is the best way to verify that filtering worked correctly.\n",
    "\n",
    "> \ud83d\udca1 **Reading a PSD plot:**\n",
    "> - The x-axis is frequency (Hz)\n",
    "> - The y-axis is power (dB) \u2014 how much energy at that frequency (higher = more energy)\n",
    "> - Normal EEG has a characteristic **1/f shape** \u2014 more power at lower frequencies, gradually decreasing\n",
    "> - We will look for: the 60 Hz spike **before** filtering, and confirm it is **gone after**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare PSD before vs after filtering\n",
    "fig, axes = plt.subplots(1, 2, figsize=(13, 4))\n",
    "\n",
    "raw_reref.compute_psd(fmax=80).plot(axes=axes[0], show=False)\n",
    "axes[0].set_title(\"PSD \u2014 Before Filtering\")\n",
    "axes[0].axvline(60, color=\"red\", lw=1.5, ls=\"--\", label=\"60 Hz (power line)\")\n",
    "axes[0].legend(fontsize=9)\n",
    "\n",
    "raw_filt.compute_psd(fmax=80).plot(axes=axes[1], show=False)\n",
    "axes[1].set_title(\"PSD \u2014 After Filtering (notch 60 Hz + bandpass 1\u201340 Hz)\")\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \ud83d\udc41\ufe0f **What to check in the PSD plots:**\n",
    "> - **Before:** Large hump at low frequencies + a sharp spike at 60 Hz\n",
    "> - **After:** Clean spectrum that drops off sharply above 40 Hz; the 60 Hz spike is gone\n",
    "> - An **alpha peak (~10 Hz)** may be visible as a small bump in both plots\n",
    "> - The cut-off at 40 Hz should be clearly visible \u2014 power drops almost to zero above it\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Discussion question:** *\"Look at the PSD before filtering. Where is the power concentrated? Which part of the spectrum might contain motor imagery information?\"*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Step 3 \u2014 Bad Channel Detection & Interpolation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Step 3 \u2014 Bad channels: ~20 min**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 Why Do Channels Go Bad?\n",
    "\n",
    "In an ideal world, every electrode would make perfect contact with the scalp and record only genuine brain signals. In practice, recordings are rarely perfect. Here are the most common reasons an electrode fails:\n",
    "\n",
    "- **Poor gel contact:** not enough conductive gel between electrode and scalp \u2192 high impedance \u2192 the electrode picks up more noise than brain signal\n",
    "- **Dry electrode:** gel dried out during a long recording session\n",
    "- **Broken wire:** thin wire connecting electrode to amplifier is damaged \u2192 flat line or extremely noisy signal\n",
    "- **Bridged electrodes:** two adjacent electrodes accidentally connected by too much gel \u2192 both channels record exactly the same signal\n",
    "- **Saturated amplifier:** a very large movement pushed the amplifier beyond its input range \u2192 clipped flat line at maximum voltage\n",
    "\n",
    "### 3.2 How to Identify a Bad Channel\n",
    "\n",
    "| What you see | What it means |\n",
    "|---|---|\n",
    "| **Flat horizontal line** (zero or constant) | Broken wire, disconnected electrode, or saturated amplifier |\n",
    "| **Extreme noise** \u2014 much higher than all other channels | Poor contact, high impedance |\n",
    "| **Exact same signal** as an adjacent channel | Bridged electrodes |\n",
    "| Regular spikes at heartbeat rate (~1 per second) | Electrode over a scalp vessel |\n",
    "| Signal looks fine but **jumps suddenly** to a new level | Electrode pop \u2014 brief loss of contact, then recovers |\n",
    "\n",
    "> \ud83d\udca1 **Analogy \u2014 Microphone Array:** Think of it like a microphone array in a recording studio. If one microphone has a broken cable, you do not throw away the whole recording. Instead, you flag that microphone, remove its broken track, and **reconstruct** what it should have recorded using the signals from the nearby microphones. That reconstruction is what spherical spline interpolation does for EEG.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Spherical Spline Interpolation \u2014 Rebuilding a Bad Channel\n",
    "\n",
    "Once we have identified a bad channel, we do not simply delete it. Instead, we **reconstruct** a plausible estimate of what it should have recorded, using the signals from all the surrounding good electrodes.\n",
    "\n",
    "> \ud83d\udcac **In plain words:**\n",
    "> 1. Take all the working electrodes around the bad one\n",
    "> 2. Fit a **smooth surface** through their values at each time point *(think of it like stretching a smooth rubber sheet over the scalp, anchored to all the good electrode values)*\n",
    "> 3. Read off the value of that smooth surface at the position of the bad electrode\n",
    ">\n",
    "> That reconstructed value replaces the broken channel's data.\n",
    "\n",
    "> \u2b50 **Note:** Spherical spline interpolation works well when only a **few isolated** electrodes are bad (up to about 5 out of 64). If more than 20% of channels are bad, the interpolation becomes unreliable \u2014 that recording may need to be excluded entirely.\n",
    "\n",
    "> \u26a0\ufe0f **Critical rule:** Always interpolate bad channels **BEFORE** computing the average reference. A bad channel contaminates the channel average and will introduce artefacts everywhere.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For this demo, we manually mark Fp1 as bad\n",
    "# In a real session: visually inspect raw.plot() and mark the ones you find\n",
    "raw_bad = raw_filt.copy()\n",
    "raw_bad.info[\"bads\"] = [\"Fp1\"]\n",
    "print(\"Marked as bad:\", raw_bad.info[\"bads\"])\n",
    "\n",
    "# Interpolate the bad channel from its neighbours\n",
    "raw_interp = raw_bad.copy().interpolate_bads(reset_bads=True, verbose=False)\n",
    "print(\"After interpolation \u2014 bad channels:\", raw_interp.info[\"bads\"])  # should be empty\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visual comparison: original vs bad (simulated) vs interpolated\n",
    "idx_fp1 = raw_filt.ch_names.index(\"Fp1\")\n",
    "data_before, times = raw.get_data(return_times=True)\n",
    "t_sel   = times[:640]  # first 4 seconds\n",
    "\n",
    "data_orig   = raw_filt.get_data()[idx_fp1, :640]   * 1e6\n",
    "data_interp = raw_interp.get_data()[idx_fp1, :640] * 1e6\n",
    "\n",
    "fig, axes = plt.subplots(3, 1, figsize=(12, 7), sharex=True)\n",
    "axes[0].plot(t_sel, data_orig,       \"steelblue\", lw=1, label=\"Original\")\n",
    "axes[0].set_title(\"Fp1 \u2014 Original signal (eye blink artefacts are normal here \u2014 they will be removed by ICA)\")\n",
    "axes[1].plot(t_sel, np.zeros(640),   \"tomato\",    lw=1, label=\"Zeroed (simulating dead channel)\")\n",
    "axes[1].set_title(\"Fp1 \u2014 Simulated dead channel (all zeros = what a flat-line electrode looks like)\")\n",
    "axes[2].plot(t_sel, data_interp,     \"seagreen\",  lw=1, label=\"Interpolated\")\n",
    "axes[2].set_title(\"Fp1 \u2014 Reconstructed from neighbouring electrodes (spherical spline interpolation)\")\n",
    "\n",
    "for ax in axes:\n",
    "    ax.set_ylabel(\"\u00b5V\")\n",
    "    ax.legend(loc=\"upper right\", fontsize=8)\n",
    "    ax.grid(alpha=0.3)\n",
    "axes[2].set_xlabel(\"Time (s)\")\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \ud83d\udc41\ufe0f **What to check:**\n",
    "> - The **interpolated** channel (green) should look smoother than the original because it has no eye-blink spike \u2014 Fp1 is a frontal electrode that captures blinks, and its neighbours are slightly further from the eyes\n",
    "> - The overall shape and scale should be similar to other nearby channels\n",
    "> - If the interpolated channel looks completely flat or extremely noisy, there are likely no good neighbours (too many bad channels)\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Practical tip:** In a real EEG study, always check electrode impedances **before** the recording starts and aim for < 20 k\u03a9. This prevents most bad channels before they happen.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Step 4 \u2014 ICA: Removing Physiological Artefacts\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Step 4 \u2014 ICA (longest section): ~40 min**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1 The Problem ICA Solves\n",
    "\n",
    "Even after filtering, **eye blinks, eye movements, and cardiac artefacts** are still present in the EEG. These artefacts are **physiological** \u2014 they come from the body, not the environment \u2014 so no filter can remove them without also removing brain signals (because they overlap in frequency).\n",
    "\n",
    "We need a smarter approach.\n",
    "\n",
    "The challenge is that **every electrode records a mixture of many sources**: some brain regions, some eye muscles, some facial muscles, and the heart. We cannot separate them by looking at one electrode at a time, because all sources are mixed together everywhere.\n",
    "\n",
    "We need to look at all 64 channels **simultaneously** and use the **spatial pattern** of the signal to separate the sources.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 The Cocktail Party Analogy\n",
    "\n",
    "> \ud83d\udca1 **Analogy:** Imagine you are at a party where **5 people are talking at the same time**. You place 5 microphones around the room. Each microphone records a **mixture** of all 5 voices \u2014 some voices louder, some quieter, depending on how far each microphone is from each speaker.\n",
    ">\n",
    "> Given only the 5 microphone recordings, can you recover the 5 original voices?\n",
    ">\n",
    "> **ICA does exactly this \u2014 but for EEG.**\n",
    ">\n",
    "> - The \"microphones\" are the EEG **electrodes**\n",
    "> - The \"voices\" are the independent sources: **brain regions, eye muscles, heart**\n",
    "> - ICA analyses all channels together and figures out the original sources\n",
    "\n",
    "The key insight is that brain signals, eye blinks, and muscle activity are **statistically independent** \u2014 knowing when someone blinks tells you nothing about what their alpha rhythm is doing at that moment. ICA exploits this independence to disentangle them.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3 How ICA Works \u2014 Step by Step\n",
    "\n",
    "ICA analyses the patterns of co-variation across all 64 channels and identifies a set of **independent components (ICs)**. Each IC has two representations:\n",
    "\n",
    "1. A **topographic map** showing which electrodes \"feel\" this source the most (its spatial distribution on the scalp)\n",
    "2. A **time series** showing *when* this source was active\n",
    "\n",
    "After ICA, you look at each component and ask: *does this look like a brain source or an artefact?*\n",
    "\n",
    "| Component type | What the scalp map looks like | What the time series looks like |\n",
    "|---|---|---|\n",
    "| **Eye blink** | Large activity over the forehead electrodes (Fp1, Fp2) \u2014 symmetric | Occasional large spikes \u2014 once every few seconds |\n",
    "| **Eye movement** | Opposite activity at left-front vs right-front electrodes | Step-like shifts when the person moves their eyes |\n",
    "| **Heartbeat** | Scattered activity; often stronger at temples | Regular sharp pulses \u2014 about once per second |\n",
    "| **Jaw muscle** | Irregular patches over temporal/cheek electrodes | Bursts of high-frequency noise when tense |\n",
    "| **Brain \u2014 alpha** | Smooth, symmetric blob over the **back** of the head | Waxing and waning 10 Hz oscillation when relaxed |\n",
    "| **Brain \u2014 motor mu** | Smooth blob centred over C3 or C4 | Rhythmic 10 Hz oscillation that disappears during movement |\n",
    "\n",
    "> \u2b50 **Remember:** ICA requires a **good amount of data** to work well. That is why we apply ICA to the **full continuous recording** (before epoching), not to individual short trials. More data = better source separation. ICA also requires that the data has been **high-pass filtered at \u2265 1 Hz** first \u2014 slow drifts confuse the ICA algorithm.\n",
    "\n",
    "> \u26a0\ufe0f **Golden rule:** When in doubt \u2014 **keep** the component. Removing a brain component permanently deletes neural information. Typically remove only **1\u20133 components** (eye blinks + possibly heartbeat).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.4 Fitting the ICA Model\n",
    "\n",
    "We fit ICA on the **filtered** data. This learns 20 independent components from the continuous EEG.\n",
    "\n",
    "> \ud83d\udcac **What `n_components=20` means:** We ask ICA to find 20 independent sources. With 64 channels, we could go up to 64 components, but 20 is enough to capture the main artefact sources while keeping computation fast. Each component will represent either a brain source or an artefact source.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_components = 20\n",
    "ica = ICA(n_components=n_components, random_state=42, max_iter=\"auto\")\n",
    "ica.fit(raw_filt, verbose=False)\n",
    "\n",
    "print(f\"ICA fitted: {n_components} independent components extracted \u2713\")\n",
    "print()\n",
    "print(\"Next steps:\")\n",
    "print(\"  1. Plot component TOPOGRAPHIES \u2192 identify the spatial pattern of each source\")\n",
    "print(\"  2. Plot component TIME SERIES  \u2192 identify the temporal pattern (spikes = blinks)\")\n",
    "print(\"  3. Use automated detection    \u2192 MNE's find_bads_eog() as a starting point\")\n",
    "print(\"  4. ALWAYS visually verify     \u2192 never remove a component without inspection\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.5 Inspect Component Topographies\n",
    "\n",
    "Plot all 20 component maps. You are looking for characteristic patterns that indicate artefacts.\n",
    "\n",
    "> \ud83d\udc41\ufe0f **What to look for:**\n",
    "> - **Eye blink:** Large **symmetric blob** at the very front of the head (over Fp1/Fp2)\n",
    "> - **Lateral eye movement:** **Asymmetric** front pattern \u2014 positive left, negative right (or vice versa)\n",
    "> - **Muscle:** Focal spots at the **edges/bottom** of the scalp map\n",
    "> - **Brain (alpha):** Smooth blob over the **back** of the head (occipital)\n",
    "> - **Brain (motor mu):** Two symmetric spots over the **C3/C4** area\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Class exercise:** After the plots appear, go through each component together. Call on students to identify which components look like artefacts and which look like brain activity.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot all component topographies \u2014 look for the frontal eye-blink blob\n",
    "fig = ica.plot_components(show=True)\n",
    "plt.suptitle(\"ICA Component Topographies\\n\"\n",
    "             \"Look for: frontal symmetric blob (eye blink), \"\n",
    "             \"asymmetric front (eye movement), edge patches (muscle)\", y=1.02)\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.6 Inspect Component Time Series\n",
    "\n",
    "Plot the time series of each component. You are looking for:\n",
    "- **Occasional large sharp spikes** \u2192 eye blinks\n",
    "- **Regular pulses at ~1/second** \u2192 heartbeat\n",
    "- **Bursts of fast oscillations** \u2192 muscle activity\n",
    "- **Smooth 10 Hz oscillations** \u2192 alpha brain activity (keep this!)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot component time series \u2014 look for regular sharp triangular spikes (blinks)\n",
    "fig = ica.plot_sources(raw_filt, show=True)\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.7 Automated Eye-Artefact Detection\n",
    "\n",
    "MNE can help identify eye-movement components automatically by computing the **correlation** between each ICA component and a frontal channel (Fp1) that is close to the eyes. High correlation = likely eye artefact.\n",
    "\n",
    "> \u26a0\ufe0f **Important:** This is a **starting point**, not a final answer. Always visually verify the flagged components before removing them.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Automated eye artefact detection using Fp1 as a proxy for eye movements\n",
    "eog_ch = \"Fp1\"\n",
    "eog_idx, eog_scores = ica.find_bads_eog(raw_filt, ch_name=eog_ch, verbose=False)\n",
    "print(f\"Automated detection \u2014 suspected eye-movement components: {eog_idx}\")\n",
    "print()\n",
    "print(\"Always verify these visually before removing!\")\n",
    "\n",
    "# Plot correlation scores\n",
    "fig = ica.plot_scores(eog_scores, exclude=eog_idx, show=True,\n",
    "                      title=\"ICA component correlation with Fp1 (eye proxy)\\n\"\n",
    "                            \"Bars in red are flagged as suspected eye artefacts\")\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exclude the identified artefact components and reconstruct clean EEG\n",
    "ica.exclude = eog_idx\n",
    "raw_ica = raw_filt.copy()\n",
    "ica.apply(raw_ica, verbose=False)\n",
    "\n",
    "print(f\"Artefact components removed: {ica.exclude}\")\n",
    "print(f\"Clean data shape: {raw_ica.get_data().shape}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \ud83d\udc41\ufe0f **What to check after ICA:**\n",
    "> - The large frontal spikes (eye blinks) should be **much smaller or gone**\n",
    "> - The rest of the signal should look **broadly similar** \u2014 we only removed artefact sources\n",
    "> - If the signal looks completely different, you may have removed **too many components**\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Discussion question:** *\"If we removed 3 components, does that mean we lost brain information?\"*\n",
    ">\n",
    "> *Expected answer: Ideally no \u2014 if we identified them correctly as artefacts, we removed artefact sources. But if we accidentally removed a brain component, we have permanently lost that neural information. This is why visual inspection and the \"when in doubt, keep it\" rule are so important.*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Step 5 \u2014 Epoching\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Step 5 \u2014 Epoching: ~20 min**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.1 From Continuous Recording to Trials\n",
    "\n",
    "So far we have **one long continuous recording**. But our experiment had many individual trials \u2014 each time the subject received a cue (T1 or T2) to imagine moving their hand. **Epoching** cuts the continuous EEG into fixed-length windows aligned to each event.\n",
    "\n",
    "> \ud83d\udca1 **Analogy \u2014 Cooking Video:** Imagine you are filming someone cooking throughout the day (continuous recording). Now you want to study what they do when a kitchen timer goes off (the event). You go through the footage and cut out a clip starting 30 seconds before each timer ring and ending 3 minutes after it. Each clip is an epoch. You then stack all the clips on top of each other and look for common patterns.\n",
    "\n",
    "Each epoch runs from **\u22120.5 s** (before the cue, for baseline) to **+3.0 s** (end of the imagery period):\n",
    "\n",
    "```\n",
    "|-- baseline --|------------- imagery period ------------|\n",
    "-0.5 s         0 s                                    +3.0 s\n",
    "               \u2191 cue appears here\n",
    "```\n",
    "\n",
    "After epoching, your data changes shape:\n",
    "- **Before:** 2D array \u2014 (channels \u00d7 all time samples)\n",
    "- **After:** 3D array \u2014 (number of trials \u00d7 channels \u00d7 time samples per trial)\n",
    "\n",
    "This 3D format is the standard input for all ERP and BCI classification analyses.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.2 Baseline Correction\n",
    "\n",
    "After cutting epochs, we **subtract the mean voltage** during the pre-cue period (\u22120.5 to 0 s) from the whole epoch. This zeros out any slow drift that happened to be present at the moment the cue appeared.\n",
    "\n",
    "> \ud83d\udca1 **Analogy \u2014 Temperature Change:** Think of measuring how much the temperature changes after you turn on a heater. If you start measuring at 18\u00b0C in one room and 21\u00b0C in another, you cannot directly compare the measurements. But if you subtract the starting temperature from each measurement, you get \"change from starting temperature\" \u2014 which is directly comparable across rooms (and trials).\n",
    "\n",
    "> \ud83d\udcac **In plain words:**\n",
    "> 1. Calculate the **average voltage** during the pre-stimulus period (\u22120.5 to 0 s)\n",
    "> 2. Subtract that average from **every time point** in the epoch\n",
    ">\n",
    "> After this, every trial starts at exactly **zero** at the moment the stimulus appears. This makes trials comparable to each other regardless of what the brain was doing before.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract event markers (T1 = left hand, T2 = right hand)\n",
    "events, event_id = mne.events_from_annotations(raw_ica, verbose=False)\n",
    "print(\"Event IDs:\", event_id)\n",
    "\n",
    "# Keep only motor imagery events\n",
    "event_id_sel = {\"left_hand\": event_id[\"T1\"], \"right_hand\": event_id[\"T2\"]}\n",
    "\n",
    "tmin, tmax = -0.5, 3.0   # epoch window: 0.5 s baseline + 3.0 s imagery\n",
    "\n",
    "epochs = mne.Epochs(\n",
    "    raw_ica,\n",
    "    events,\n",
    "    event_id   = event_id_sel,\n",
    "    tmin       = tmin,\n",
    "    tmax       = tmax,\n",
    "    baseline   = (None, 0),   # baseline-correct using the pre-stimulus interval\n",
    "    preload    = True,\n",
    "    verbose    = False,\n",
    ")\n",
    "\n",
    "print(f\"\\nEpochs created:\")\n",
    "print(f\"  Trials    : {len(epochs)}\")\n",
    "print(f\"  Shape     : {epochs.get_data().shape}  (trials \u00d7 channels \u00d7 time points)\")\n",
    "print(f\"  Conditions: {list(epochs.event_id.keys())}\")\n",
    "print(f\"  Time range: {tmin} s to {tmax} s\")\n",
    "print(f\"  Baseline  : {tmin} s to 0 s (pre-cue period)\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot the first 5 epochs at channel C3\n",
    "# C3 is over the LEFT motor cortex \u2014 important for right-hand motor imagery\n",
    "ep_data  = epochs.get_data()\n",
    "ep_times = epochs.times\n",
    "idx_c3   = epochs.ch_names.index(\"C3\")\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(11, 4))\n",
    "for i in range(min(5, len(ep_data))):\n",
    "    ax.plot(ep_times, ep_data[i, idx_c3] * 1e6, alpha=0.6, lw=0.9,\n",
    "            label=f\"Trial {i+1}\")\n",
    "ax.axvline(0, color=\"k\", lw=1.5, linestyle=\"--\", label=\"Cue onset (t=0)\")\n",
    "ax.axvspan(tmin, 0, alpha=0.08, color=\"grey\", label=\"Baseline period\")\n",
    "ax.set_xlabel(\"Time (s)\")\n",
    "ax.set_ylabel(\"Amplitude (\u00b5V)\")\n",
    "ax.set_title(\"C3 (left motor cortex) \u2014 First 5 Epochs\\n\"\n",
    "             \"Each line is one trial; cue appears at t=0; baseline period is grey\")\n",
    "ax.legend(fontsize=8, loc=\"upper right\")\n",
    "ax.grid(alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Epoch image \u2014 rows=trials, columns=time, colour=amplitude\n",
    "# This gives an overview of ALL trials at once \u2014 rows are trials, colour is amplitude\n",
    "print(\"Epoch image for channel C3:\")\n",
    "print(\"  \u2192 Each row = one trial\")\n",
    "print(\"  \u2192 Colour   = voltage (blue = negative, red = positive)\")\n",
    "print(\"  \u2192 Look for patterns that appear consistently across rows (= real brain signal)\")\n",
    "print(\"  \u2192 Bright rows that look different from others (= possible artefacts to reject)\")\n",
    "fig = epochs.plot_image(picks=[\"C3\"], show=True)\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \ud83d\udc41\ufe0f **What to check in the epoch image:**\n",
    "> - Look for **consistent patterns** across rows \u2014 these represent reliable brain activity\n",
    "> - Look for **bright or dark outlier rows** \u2014 these are likely artefacted trials that will be rejected\n",
    "> - The **baseline period** (left side of plot, before t=0) should look relatively flat and consistent\n",
    "> - Any trial that looks dramatically different from its neighbours is a candidate for rejection\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Discussion question:** *\"Why do we use the period BEFORE the cue as baseline? What would happen if we used the period AFTER the cue?\"*\n",
    ">\n",
    "> *Answer: The pre-cue period should not be contaminated by the task-related brain response. Using the post-cue period as baseline would subtract the brain response we are trying to measure \u2014 destroying the signal of interest.*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Step 6 \u2014 Epoch Rejection\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Step 6 \u2014 Epoch rejection: ~10 min**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.1 Why Some Trials Must Be Discarded\n",
    "\n",
    "Despite all the preprocessing steps above, some individual trials will still contain large artefacts that could not be removed. This happens when:\n",
    "- The participant moved their head suddenly \u2014 creating a huge transient movement artefact\n",
    "- An electrode popped off and back on during a trial\n",
    "- The participant sneezed, coughed, or laughed\n",
    "- A very large muscle contraction overwhelmed the ICA decomposition\n",
    "\n",
    "These trials are so contaminated that keeping them would corrupt the analysis.\n",
    "\n",
    "### 6.2 The Amplitude Threshold Method\n",
    "\n",
    "For each trial and each channel, we compute the **peak-to-peak amplitude** (max \u2212 min). If this exceeds a threshold that would be physiologically implausible for a clean EEG signal, we reject the entire trial.\n",
    "\n",
    "| Threshold | Effect | When to use |\n",
    "|---|---|---|\n",
    "| **75 \u00b5V** | Very strict \u2014 rejects many trials | Only if you have > 100 trials and a very clean recording |\n",
    "| **100 \u00b5V** | Strict | Standard for well-controlled recordings in cooperative adults |\n",
    "| **150 \u00b5V** | Moderate \u2014 **recommended starting point** | Most EEG paradigms, motor imagery with 64 channels |\n",
    "| **200 \u00b5V** | Lenient | When you cannot afford to lose trials (few per condition) |\n",
    "\n",
    "> \u2b50 **Target:** Aim to retain at least **70\u201380% of epochs**. If you are rejecting more than 30%, something in the earlier steps may need adjustment, or the recording quality was poor from the start.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_before = len(epochs)\n",
    "\n",
    "epochs_clean = epochs.copy().drop_bad(\n",
    "    reject={\"eeg\": 150e-6},   # \u00b1150 \u00b5V threshold (150e-6 = 150 \u00b5V in Volts)\n",
    "    verbose=False,\n",
    ")\n",
    "\n",
    "n_after  = len(epochs_clean)\n",
    "retained = n_after / n_before * 100\n",
    "\n",
    "print(f\"Epochs before rejection : {n_before}\")\n",
    "print(f\"Epochs after  rejection : {n_after}\")\n",
    "print(f\"Retained                : {retained:.1f}%\")\n",
    "print()\n",
    "if retained < 70:\n",
    "    print(\"\u26a0\ufe0f  Less than 70% retained \u2014 consider adjusting the threshold or reviewing ICA\")\n",
    "else:\n",
    "    print(\"\u2713  Retention rate is acceptable\")\n",
    "print()\n",
    "print(f\"Left-hand trials  : {len(epochs_clean['left_hand'])}\")\n",
    "print(f\"Right-hand trials : {len(epochs_clean['right_hand'])}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualise \u2014 Which Channels Caused the Most Rejections?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show the drop log \u2014 what fraction of epochs was dropped and why\n",
    "epochs_clean.plot_drop_log(show=True)\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Step 7 \u2014 Save Clean Epochs\n",
    "\n",
    "We save the preprocessed epochs in MNE's `.fif` format. Tutorial 3 will load this file directly \u2014 no need to re-run all the preprocessing steps.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> \u23f1\ufe0f **Estimated time \u2014 Step 7 \u2014 Save: ~5 min**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply explicit baseline correction (already applied in Epochs, but shown for clarity)\n",
    "epochs_bc = epochs_clean.copy().apply_baseline((None, 0), verbose=False)\n",
    "\n",
    "save_path = \"/tmp/motor_imagery_epochs-epo.fif\"\n",
    "epochs_bc.save(save_path, overwrite=True, verbose=False)\n",
    "\n",
    "print(f\"Clean epochs saved to: {save_path} \u2713\")\n",
    "print(f\"Final shape: {epochs_bc.get_data().shape}  (trials \u00d7 channels \u00d7 time points)\")\n",
    "print()\n",
    "# Verify we can load them back\n",
    "epochs_loaded = mne.read_epochs(save_path, verbose=False)\n",
    "print(f\"Load check: {epochs_loaded.get_data().shape} \u2713\")\n",
    "print()\n",
    "print(\"These epochs will be the input for Tutorial 3: Feature Extraction & Classification\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## \ud83c\udf93 Part 9 \u2014 Why the Order Matters\n",
    "\n",
    "> \ud83d\udde3\ufe0f *\"Can I just do these steps in any order?\"*\n",
    ">\n",
    "> **No.** The sequence is important because each step assumes the previous ones have already been done. Here are the three most important ordering rules.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Rule 1: Fix Bad Channels BEFORE Computing the Average Reference\n",
    "\n",
    "The average reference subtracts the **mean of all channels** from each channel. If one channel is broken and recording noise at 500 \u00b5V, that noise gets added (in negative) to every other channel. You must repair the broken channel first, so the average is meaningful.\n",
    "\n",
    "> \ud83d\udca1 **Analogy:** If one person in a group is reporting absurd numbers, their number should be removed from the group average before you calculate it \u2014 not after.\n",
    "\n",
    "---\n",
    "\n",
    "### Rule 2: Apply Bandpass Filter BEFORE Running ICA\n",
    "\n",
    "ICA is confused by slow drifts (below 1 Hz). Without high-pass filtering first, ICA tends to spend several of its components modelling the slow drift instead of the eye blinks and heartbeat. The result is poor artefact separation and components that are hard to interpret.\n",
    "\n",
    "> \ud83d\udca1 **Analogy:** Before trying to separate voices at a cocktail party, you would turn off the air conditioning (the low-frequency hum). Otherwise the microphones capture that dominant rumble and the voice-separation algorithm tries to model it instead of the voices.\n",
    "\n",
    "---\n",
    "\n",
    "### Rule 3: Run ICA BEFORE Creating Epochs\n",
    "\n",
    "ICA works better on **longer continuous data**. The more data it sees, the more accurately it estimates the spatial patterns of each source. Epoched data consists of many short segments with gaps between them, which disrupts the statistical estimation.\n",
    "\n",
    "> \ud83d\udca1 **Analogy:** You would ask a music recogniser to listen to a full song before identifying the instruments \u2014 not to listen to thousands of disconnected one-second clips with silence in between.\n",
    "\n",
    "---\n",
    "\n",
    "> \u2b50 **The correct order:**\n",
    "> ```\n",
    "> Load \u2192 Montage \u2192 Bad Channels \u2192 Average Reference \u2192 Notch Filter\n",
    "> \u2192 Bandpass Filter \u2192 ICA \u2192 Epoch \u2192 Baseline Correction \u2192 Epoch Rejection \u2192 Save\n",
    "> ```\n",
    "> This order is used by research groups worldwide. Following it ensures your preprocessing decisions are scientifically defensible.\n",
    "\n",
    "> \ud83d\udde3\ufe0f **Final discussion questions:**\n",
    "> 1. *\"What would go wrong if we epoched BEFORE running ICA?\"*\n",
    "> 2. *\"What would go wrong if we computed the average reference BEFORE fixing bad channels?\"*\n",
    "> 3. *\"A student in a future project decides to bandpass filter their data to 0.1\u201340 Hz instead of 1\u201340 Hz. Is this necessarily wrong? When would it be appropriate?\"*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## \u2705 Tutorial 2 \u2014 Complete Summary\n",
    "\n",
    "| Step | MNE Command | What it does | Why it matters |\n",
    "|---|---|---|---|\n",
    "| 1 \u2014 Re-reference | `set_eeg_reference('average')` | Remove reference electrode bias | All channels contaminated by reference choice |\n",
    "| 2 \u2014 Notch filter | `notch_filter(freqs=60)` | Delete 60 Hz power-line hum | 60 Hz spike corrupts spectra and ICA |\n",
    "| 2 \u2014 Bandpass | `filter(l_freq=1, h_freq=40)` | Keep only 1\u201340 Hz | Removes drifts (low end) and muscle noise (high end) |\n",
    "| 3 \u2014 Bad channels | `info['bads'] = [...]` | Mark broken electrodes | Must be fixed before average reference |\n",
    "| 3 \u2014 Interpolate | `interpolate_bads()` | Reconstruct from neighbours | Fills gaps without deleting spatial information |\n",
    "| 4 \u2014 ICA fit | `ICA().fit(raw)` | Learn 20 independent components | Statistical learning requires full continuous data |\n",
    "| 4 \u2014 ICA identify | `find_bads_eog()` + visual | Find artefact components | Automated + visual = safest approach |\n",
    "| 4 \u2014 ICA apply | `ica.apply(raw)` | Remove artefacts; reconstruct EEG | Physiological artefacts can't be filtered \u2014 ICA is the solution |\n",
    "| 5 \u2014 Epoch | `mne.Epochs(tmin, tmax, baseline)` | Cut into 3.5 s windows | Links brain signals to experimental events |\n",
    "| 6 \u2014 Reject | `epochs.drop_bad(reject={...})` | Discard trials above threshold | Keeps data quality high; aim for \u226570% retention |\n",
    "| 7 \u2014 Save | `epochs.save('-epo.fif')` | Store to disk for Tutorial 3 | Avoid repeating preprocessing every time |\n",
    "\n",
    "---\n",
    "\n",
    "> \u26a0\ufe0f **Order matters!**\n",
    "> `Fix bad channels \u2192 Re-reference \u2192 Filter \u2192 ICA \u2192 Epoch`\n",
    "> Changing the order leads to worse artefact removal.\n",
    "\n",
    "---\n",
    "\n",
    "### \ud83d\udccb Quick Reference Cheat Sheet\n",
    "\n",
    "| Concept | One-sentence explanation | Key number / parameter |\n",
    "|---|---|---|\n",
    "| Average reference | Subtract mean of all channels from each channel | Requires \u2265 32 channels; do AFTER fixing bad channels |\n",
    "| Notch filter | Removes exactly 50 Hz (or 60 Hz) mains interference | 50 Hz in Europe; 60 Hz in USA |\n",
    "| High-pass filter | Removes everything slower than the cutoff | 1 Hz for motor imagery; 0.1 Hz for ERP studies |\n",
    "| Low-pass filter | Removes everything faster than the cutoff | 40 Hz removes most EMG |\n",
    "| Bandpass filter | High-pass + low-pass: keeps a frequency band | 1\u201340 Hz standard for motor imagery |\n",
    "| Bad channel | Electrode not recording usable brain signals | Mark if flat, extreme noise, or identical to neighbour |\n",
    "| Interpolation | Reconstruct bad channel from surrounding good channels | Works for up to ~5 bad channels out of 64 |\n",
    "| ICA | Separate EEG into independent sources; remove artefact ones | 20 components; always visually verify before removing |\n",
    "| Epoch | Fixed-length EEG window around an event marker | Typical: \u22120.5 s to +3.0 s for motor imagery |\n",
    "| Baseline correction | Subtract pre-stimulus mean from each trial \u00d7 channel | Pre-stimulus period: tmin to 0 s |\n",
    "| Epoch rejection | Remove trials exceeding amplitude threshold | Start with 150 \u00b5V; aim to keep \u2265 70% of trials |\n",
    "\n",
    "---\n",
    "\n",
    "### \u279c Next: Tutorial 3 \u2014 EEG Analysis & Classification\n",
    "\n",
    "In Tutorial 3 we will use these clean epochs to:\n",
    "- Compute ERPs and compare left vs right hand conditions\n",
    "- Analyse the frequency spectrum and band power\n",
    "- Build time-frequency maps showing ERD/ERS\n",
    "- Extract features and train a left/right classifier\n"
   ]
  }
 ]
}