# Pipeline Execution Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Run the complete pipeline from raw ERA5 NetCDF data through to the LaTeX thesis
**Architecture:** 5-stage pipeline: preprocessing (NPZ) → training (model) → evaluation (figures) → Web (verification) → thesis (LaTeX)
**Tech Stack:** PyTorch 2.12+cu126, xarray+h5netcdf, XGBoost, Flask+ECharts, XeLaTeX+ctexbook
**Prerequisites already in place:**
- ERA5 data: Jiaozuo 180 + Zhengzhou 180 files (NetCDF4, already decompressed)
- GPU: RTX 4060 Laptop (8GB), CUDA 12.6
- h5netcdf/h5py: installed
- External data: mortality_population.csv, exposure_response.csv
---
### Task 1: Fix file naming consistency
**Files:**
- Modify: `src/data/preprocess.py:537`
preprocess saves `sequences_{city}.npz`, but train loads `{city}_sequences.npz`; standardize on `{city}_sequences.npz`.
- [ ] **Step 1: Change the naming in preprocess**
```python
# Line 537: sequences_{city_key}.npz → {city_key}_sequences.npz
npz_path = DATA_PROCESSED / f"{city_key}_sequences.npz"
```
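Check every reference that starts with `sequences_` (lines 537, 564, 573). The per-city references on lines 537 and 564 must change; line 573 is the combined file, which keeps its original name: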
```python
# Line 564
npz_path = DATA_PROCESSED / f"{city_key}_sequences.npz"
# Line 573
combined_npz = DATA_PROCESSED / "sequences_combined.npz"  # the combined file keeps its original name
```
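One way to keep the two modules from drifting apart again is to build the file name in a single shared helper. The following is a minimal sketch, not the project's actual code: `sequences_path` and `combined_sequences_path` are hypothetical names, and the `DATA_PROCESSED` value shown is an assumption based on the `data/processed/` directory used elsewhere in this plan.

```python
from pathlib import Path

# Assumption: mirrors the DATA_PROCESSED constant referenced in preprocess.py.
DATA_PROCESSED = Path("data/processed")


def sequences_path(city_key: str, base: Path = DATA_PROCESSED) -> Path:
    """Return the per-city NPZ path in the unified {city}_sequences.npz format."""
    return base / f"{city_key}_sequences.npz"


def combined_sequences_path(base: Path = DATA_PROCESSED) -> Path:
    """Return the combined NPZ path, which keeps its original name."""
    return base / "sequences_combined.npz"
```

If both preprocess and train imported such a helper, a future rename would only need to happen in one place.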
- [ ] **Step 2: Commit**
```bash
git add src/data/preprocess.py
git commit -m "fix: unify NPZ naming format as {city}_sequences.npz"
```
---
### Task 2: Run the preprocessing pipeline
**Files:** `src/data/preprocess.py` (no changes needed; naming already fixed)
- [ ] **Step 1: Clean old data and run preprocessing**
```bash
cd D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究
rm -f data/processed/*.npz data/processed/*.csv
uv run python -m src.data.preprocess
```
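**Expected output:**
- Load 180 Jiaozuo NC files → daily aggregation → feature engineering → sequences of shape 14×N_feat
- Load 180 Zhengzhou NC files → same as above
- Saved: `jiaozuo_sequences.npz`, `zhengzhou_sequences.npz`, `sequences_combined.npz`, `features_combined.csv`
- The log shows each city's X/y shapes and label distribution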
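
The "sequences 14×N_feat" step presumably turns the daily feature table into overlapping 14-day windows. A minimal, dependency-free sketch of that idea follows. `make_windows` is a hypothetical helper; the real pipeline most likely works on NumPy arrays and also builds the three label columns, which are not shown here.

```python
from typing import List, Sequence

WINDOW = 14  # days per input sequence, per the 14×N_feat shape in this plan


def make_windows(daily: Sequence[Sequence[float]],
                 window: int = WINDOW) -> List[List[List[float]]]:
    """Slice a (T, F) table of daily features into overlapping (window, F) sequences.

    Window i covers days i .. i+window-1, so T days give T - window + 1 sequences.
    """
    if len(daily) < window:
        return []
    return [[list(row) for row in daily[i:i + window]]
            for i in range(len(daily) - window + 1)]
```

For example, 20 days of features would yield 7 windows, the last one ending on the final day.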
- [ ] **Step 2: Verify outputs**
```bash
uv run python -c "
import numpy as np
for f in ['jiaozuo_sequences.npz', 'zhengzhou_sequences.npz', 'sequences_combined.npz']:
    d = np.load(f'data/processed/{f}')
    print(f'{f}: X{d[\"X\"].shape} y{d[\"y\"].shape}')
    print(f'  y unique counts: {[len(set(d[\"y\"][:,i])) for i in range(3)]}')
"
```
**Expected:** roughly 10,000+ samples across the two cities; each of the three y columns has 4 classes
- [ ] **Step 3: Commit**
```bash
git add data/processed/
git commit -m "feat: ERA5 preprocessing complete, generate sequence NPZ and feature CSV"
```
---
### Task 3: Train the LSTM-Attention model
**Files:** `src/models/train.py` (no changes needed)
- [ ] **Step 1: Run training**
```bash
cd D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究
uv run python -m src.models.train
```
**Expected output:**
- "使用设备: cuda" (i.e. "Using device: cuda")
- Data loading: X (N, 14, F), y (N, 3)
- Split: train ~70%, validation ~15%, test ~15%
- Prints loss/acc/f1 every epoch
- After early stopping, saves `outputs/models/best_model.pt`
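
The split and early-stopping behaviour listed above can be sketched as follows. This is an illustration under assumptions, not the logic in `train.py`: the plan does not say whether the split is chronological or random, which metric drives early stopping, or what patience is used, so `split_sizes`, `EarlyStopping`, and the default values are all hypothetical.

```python
from typing import Tuple


def split_sizes(n: int, train_pct: int = 70, val_pct: int = 15) -> Tuple[int, int, int]:
    """Return (train, val, test) sample counts; the test set takes the remainder."""
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return n_train, n_val, n - n_train - n_val


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0) -> None:
        self.patience = patience    # hypothetical default
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, the checkpoint would typically be written whenever `step` records a new best value, so that the saved file reflects the best epoch rather than the last one.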
- [ ] **Step 2: Verify outputs**
```bash
ls -lh outputs/models/best_model.pt
ls -lh outputs/logs/training_history.json
```
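`ls -lh` only shows the files to a human reader. Where a scripted check is preferred, something like the sketch below could fail loudly on a missing or zero-byte artifact. `missing_or_empty` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List


def missing_or_empty(paths: Iterable[str]) -> List[str]:
    """Return the paths that are not regular files or that have zero size."""
    return [p for p in paths
            if not Path(p).is_file() or Path(p).stat().st_size == 0]
```

Calling it with `outputs/models/best_model.pt` and `outputs/logs/training_history.json` should return an empty list once training has finished.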
- [ ] **Step 3: Commit**
```bash
git add outputs/models/best_model.pt outputs/logs/training_history.json
git commit -m "feat: LSTM-Attention model training complete"
```
---
### Task 4: Train the XGBoost baseline and evaluate
**Files:** `src/models/evaluate.py` (no changes needed)
- [ ] **Step 1: Run evaluation**
```bash
cd D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究
uv run python -m src.models.evaluate
```
**Expected output:**
- Confusion matrices × 3 time scales (LSTM vs. XGBoost comparison)
- F1/Accuracy comparison bar chart
- Saved to `outputs/figures/`
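
For reference, the two quantities behind these figures can be computed as in the sketch below. It assumes integer class labels 0 to 3 and macro averaging; the plan does not state which F1 average `evaluate.py` reports, so treat that choice, and the helper names, as assumptions.

```python
from typing import List, Sequence

N_CLASSES = 4  # the plan expects 4 classes per label column


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int = N_CLASSES) -> List[List[int]]:
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).

    A class absent from both labels and predictions scores 0.
    """
    scores = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(len(cm))) - tp  # column total minus TP
        fn = sum(cm[k]) - tp                             # row total minus TP
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

One matrix per time scale (one per y column) and per model would give the six matrices implied by the comparison above.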
- [ ] **Step 2: Verify outputs**
```bash
ls -lh outputs/figures/confusion_matrix.png outputs/figures/model_comparison.png
```
- [ ] **Step 3: Commit**
```bash
git add outputs/figures/
git commit -m "feat: model evaluation complete, LSTM vs XGBoost comparison figures"
```
---
### Task 5: Launch the Web dashboard and verify
**Files:** `src/web/app.py`, `src/web/static/index.html` (no changes needed)
- [ ] **Step 1: Start Flask**
```bash
cd D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究
uv run python -m src.web.app
```
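- [ ] **Step 2: Verify in the browser**
Open http://localhost:5000 and check:
- [ ] All 6 panels render (temperature trend / risk display / population pie chart / time bar chart / exposure-response / historical review)
- [ ] API `/api/predict` returns correct JSON
- [ ] API `/api/history` returns 90 days of data
- [ ] API `/api/stats` returns a statistical summary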
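
The three API checks can also be scripted. The sketch below only confirms that each endpoint answers with parseable JSON. It assumes all three accept a plain GET, which the plan does not state, and it cannot check the 90-day length because the response schema is not given here. `fetch_json` and `smoke_check` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"
ENDPOINTS = ("/api/predict", "/api/history", "/api/stats")


def fetch_json(url: str) -> Any:
    """GET a URL and decode the body as JSON (assumes the endpoint accepts GET)."""
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_check(fetch: Callable[[str], Any] = fetch_json,
                base_url: str = BASE_URL) -> Dict[str, bool]:
    """Return, per endpoint, whether it answered with parseable JSON."""
    results = {}
    for path in ENDPOINTS:
        try:
            fetch(base_url + path)
            results[path] = True
        except Exception:  # connection errors, HTTP errors, invalid JSON
            results[path] = False
    return results
```

With the Flask server from Step 1 running, `smoke_check()` should map every endpoint to `True`. The `fetch` parameter exists so the check can be exercised without a live server.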
- [ ] **Step 3: Save a screenshot**
```bash
# Capture a screenshot of the dashboard with Playwright
```
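The plan leaves the actual screenshot command open. A minimal sketch using Playwright's Python sync API might look like the following. It assumes the `playwright` package and a Chromium build are installed (for example via `playwright install chromium`); the output file name and the 1920×1080 viewport are hypothetical choices.

```python
def capture_dashboard(url: str = "http://localhost:5000",
                      out_path: str = "dashboard.png") -> str:
    """Save a full-page screenshot of the dashboard and return the output path."""
    # Imported lazily so this module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1920, "height": 1080})
        # Wait for network activity to settle so the charts have data to draw.
        page.goto(url, wait_until="networkidle")
        page.screenshot(path=out_path, full_page=True)
        browser.close()
    return out_path
```

The Flask server from Step 1 has to be running when `capture_dashboard()` is called. If the ECharts panels animate on load, a short extra wait before the screenshot may be needed.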
---
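### Task 6: Compile the LaTeX thesis
**Files:** `thesis/main.tex`, `thesis/chapters/*.tex`
- [ ] **Step 1: Fill in the thesis content**
Update the following chapters:
- `ch2-data-methods.tex`: add the ERA5 variable table, the NOAA apparent temperature formula, and the model architecture description
- `ch3-model-design.tex`: detailed description of the LSTM-Attention architecture (983K parameters)
- `ch4-experiments.tex`: insert the evaluation figures from `outputs/figures/`
- `ch5-visualization.tex`: screenshots of the 6-panel Web dashboard and an architecture description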
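
When writing up the apparent temperature formula in `ch2-data-methods.tex`, it may help to have a reference implementation to check numbers against. The sketch below assumes the formula meant is the NWS heat index (Rothfusz regression), which this plan does not confirm. The function names are hypothetical, and the low- and high-humidity adjustments are left out.

```python
def heat_index_f(temp_f: float, rh_pct: float) -> float:
    """Rothfusz regression for the heat index in °F.

    temp_f is air temperature in °F, rh_pct is relative humidity in percent.
    The regression is intended for heat-index values of roughly 80 °F and above.
    """
    t, r = temp_f, rh_pct
    return (-42.379 + 2.04901523 * t + 10.14333127 * r
            - 0.22475541 * t * r - 0.00683783 * t * t
            - 0.05481717 * r * r + 0.00122874 * t * t * r
            + 0.00085282 * t * r * r - 0.00000199 * t * t * r * r)


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert °C to °F, since the regression is defined in Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert °F back to °C for reporting."""
    return (temp_f - 32.0) * 5.0 / 9.0
```

For instance, 90 °F at 50% relative humidity gives a heat index of about 94.6 °F. ERA5 temperatures are in kelvin, so a K → °C step would come first.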
- [ ] **Step 2: Compile the thesis**
```bash
cd thesis
make # xelatex + biber + xelatex + xelatex
```
- [ ] **Step 3: Verify the PDF**
```bash
ls -lh thesis/main.pdf
```
Open it in a PDF reader and check: Chinese rendering, figure clarity, citation numbering, headers and footers
- [ ] **Step 4: Commit**
```bash
git add thesis/ thesis/main.pdf
git commit -m "feat: LaTeX thesis compiled"
```
---
### Task 7: Final push
- [ ] **Step 1: Push code**
```bash
git push origin main
```
- [ ] **Step 2: Push models and figures (if needed)**
Larger files can go through git-lfs or be stored separately
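
To decide which artifacts count as "larger", a quick size scan can help. The sketch below lists files above a threshold; `files_over` and the 50 MiB default are hypothetical, and the limit should be chosen to match whatever the Git host actually enforces.

```python
import os
from typing import List, Tuple


def files_over(root: str, limit_mib: float = 50.0) -> List[Tuple[str, float]]:
    """List (path, size in MiB) for files under root larger than limit_mib, biggest first."""
    limit = limit_mib * 1024 * 1024
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                hits.append((path, size / (1024 * 1024)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Running it on `outputs/` and `data/processed/` before the push shows whether `best_model.pt` or the NPZ files are candidates for git-lfs.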