a0478b0b11
搭建完整的项目目录结构,配置项目依赖与元信息,添加数据下载、预处理、模型训练、可视化相关的核心业务代码,补充项目设计文档与.gitignore配置,导入初始外部参考数据文件。
1953 lines
62 KiB
Markdown
1953 lines
62 KiB
Markdown
# 银发群体高温多时间尺度预警和服务优化可视化研究 — 实施计划
|
||
|
||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||
|
||
**Goal:** 构建焦作/郑州两市高温热浪对老年群体的多时间尺度风险预警模型与 Web 可视化大屏,并撰写 LaTeX 学位论文。
|
||
|
||
**Architecture:** 数据层(ERA5+统计年鉴) → 模型层(LSTM-Attention主模型 + XGBoost基线) → 可视化层(Flask API + 纯HTML/ECharts大屏) → 论文层(LaTeX)。三头输出覆盖短期(1-3天)、中期(7天)、长期(30天)预警。
|
||
|
||
**Tech Stack:** Python 3.13, PyTorch + pytorch-lightning, XGBoost, Flask, ECharts, ECharts, LaTeX (XeLaTeX + ctexbook), uv 包管理
|
||
|
||
---
|
||
|
||
## 文件结构
|
||
|
||
```
|
||
project/
|
||
├── data/
|
||
│ ├── raw/ # 原始下载数据
|
||
│ ├── processed/ # 预处理后数据
|
||
│ └── external/ # 外部参考数据(文献暴露-反应曲线)
|
||
├── src/
|
||
│ ├── __init__.py
|
||
│ ├── data/
|
||
│ │ ├── __init__.py
|
||
│ │ ├── download_era5.py # ERA5数据下载
|
||
│ │ ├── collect_mortality.py # 死亡率数据收集
|
||
│ │ └── preprocess.py # 数据预处理管道
|
||
│ ├── models/
|
||
│ │ ├── __init__.py
|
||
│ │ ├── lstm_attention.py # LSTM-Attention模型定义
|
||
│ │ ├── xgboost_baseline.py # XGBoost基线模型
|
||
│ │ ├── train.py # 训练脚本
|
||
│ │ └── evaluate.py # 模型评估与对比
|
||
│ ├── web/
|
||
│ │ ├── __init__.py
|
||
│ │ ├── app.py # Flask API后端
|
||
│ │ └── static/
|
||
│ │ └── index.html # ECharts大屏前端
|
||
│ └── utils/
|
||
│ ├── __init__.py
|
||
│ └── config.py # 全局配置
|
||
├── notebooks/
|
||
│ └── eda.ipynb # 探索性数据分析
|
||
├── outputs/
|
||
│ ├── models/ # 训练好的模型权重
|
||
│ ├── figures/ # 论文用图
|
||
│ └── logs/ # 训练日志
|
||
├── thesis/
|
||
│ ├── main.tex # 论文主文件
|
||
│ ├── chapters/ # 各章节tex文件
|
||
│ │ ├── abstract.tex
|
||
│ │ ├── ch1-intro.tex
|
||
│ │ ├── ch2-theory.tex
|
||
│ │ ├── ch3-data.tex
|
||
│ │ ├── ch4-model.tex
|
||
│ │ ├── ch5-system.tex
|
||
│ │ ├── ch6-results.tex
|
||
│ │ └── ch7-conclusion.tex
|
||
│ ├── figures/ # 论文插图
|
||
│ ├── refs.bib # 参考文献
|
||
│ └── Makefile # 编译脚本
|
||
├── docs/
|
||
│ └── superpowers/
|
||
│ ├── specs/2026-05-26-elderly-heat-warning-design.md
|
||
│ └── plans/2026-05-26-elderly-heat-warning-plan.md
|
||
├── pyproject.toml
|
||
├── README.md
|
||
└── .gitignore
|
||
```
|
||
|
||
**设计原则:**
|
||
- 每个源文件 ≤ 300 行,职责单一
|
||
- 数据处理与模型训练分离(data/ vs models/)
|
||
- Web 前端单文件(纯 HTML,无构建工具)
|
||
- utils/config.py 作为全局配置单例
|
||
|
||
---
|
||
|
||
### Task 1: 项目初始化与环境配置
|
||
|
||
**Files:**
|
||
- Create: `pyproject.toml`
|
||
- Create: `.gitignore`
|
||
- Create: `src/__init__.py`
|
||
- Create: `src/utils/__init__.py`
|
||
- Create: `src/utils/config.py`
|
||
- Create: `src/data/__init__.py`
|
||
- Create: `src/models/__init__.py`
|
||
- Create: `src/web/__init__.py`
|
||
|
||
- [ ] **Step 1: 创建虚拟环境**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
D:\settings\settings\uv\uv.exe venv --python D:\settings\Language\Python\Python 3.13.13\python.exe
|
||
```
|
||
Expected: 创建 `.venv/` 目录
|
||
|
||
- [ ] **Step 2: 创建 pyproject.toml**
|
||
|
||
```toml
|
||
[project]
|
||
name = "elderly-heat-warning"
|
||
version = "0.1.0"
|
||
description = "银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
requires-python = ">=3.10"
|
||
dependencies = [
|
||
"numpy>=1.26",
|
||
"pandas>=2.1",
|
||
"xarray>=2023.0",
|
||
"netcdf4>=1.6",
|
||
"cdsapi>=0.7",
|
||
"torch>=2.1",
|
||
"pytorch-lightning>=2.1",
|
||
"xgboost>=2.0",
|
||
"scikit-learn>=1.3",
|
||
"flask>=3.0",
|
||
"matplotlib>=3.8",
|
||
"seaborn>=0.13",
|
||
"jupyter>=1.0",
|
||
"tqdm>=4.66",
|
||
"scipy>=1.11",
|
||
]
|
||
```
|
||
|
||
- [ ] **Step 3: 安装依赖**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
D:\settings\settings\uv\uv.exe pip install -e . --python .venv/Scripts/python.exe
|
||
```
|
||
Expected: 所有依赖安装成功,无错误
|
||
|
||
- [ ] **Step 4: 创建 .gitignore**
|
||
|
||
```
|
||
.venv/
|
||
__pycache__/
|
||
*.pyc
|
||
*.pyo
|
||
.ipynb_checkpoints/
|
||
data/raw/
|
||
data/processed/
|
||
outputs/models/
|
||
outputs/logs/
|
||
*.aux
|
||
*.log
|
||
*.out
|
||
*.toc
|
||
*.bbl
|
||
*.blg
|
||
*.synctex.gz
|
||
*.fdb_latexmk
|
||
*.fls
|
||
.DS_Store
|
||
```
|
||
|
||
- [ ] **Step 5: 创建全局配置 `src/utils/config.py`**
|
||
|
||
```python
|
||
"""全局配置常量"""
|
||
from pathlib import Path
|
||
|
||
# 项目根目录
|
||
ROOT = Path(__file__).parent.parent.parent
|
||
|
||
# 数据目录
|
||
DATA_RAW = ROOT / "data" / "raw"
|
||
DATA_PROCESSED = ROOT / "data" / "processed"
|
||
DATA_EXTERNAL = ROOT / "data" / "external"
|
||
|
||
# 输出目录
|
||
OUTPUT_MODELS = ROOT / "outputs" / "models"
|
||
OUTPUT_FIGURES = ROOT / "outputs" / "figures"
|
||
OUTPUT_LOGS = ROOT / "outputs" / "logs"
|
||
|
||
# 研究城市坐标 (纬度, 经度)
|
||
CITIES = {
|
||
"jiaozuo": {"lat": 35.24, "lon": 113.22, "name": "焦作"},
|
||
"zhengzhou": {"lat": 34.75, "lon": 113.62, "name": "郑州"},
|
||
}
|
||
|
||
# ERA5 配置
|
||
ERA5_START_YEAR = 2010
|
||
ERA5_END_YEAR = 2024
|
||
ERA5_VARIABLES = [
|
||
"2m_temperature",
|
||
"2m_dewpoint_temperature",
|
||
"surface_pressure",
|
||
"10m_u_component_of_wind",
|
||
"10m_v_component_of_wind",
|
||
"total_precipitation",
|
||
]
|
||
|
||
# 模型配置
|
||
LOOKBACK_DAYS = 14
|
||
BATCH_SIZE = 32
|
||
LEARNING_RATE = 1e-3
|
||
MAX_EPOCHS = 100
|
||
EARLY_STOP_PATIENCE = 15
|
||
HIDDEN_DIM = 128
|
||
LSTM_LAYERS = 2
|
||
ATTENTION_HEADS = 4
|
||
DROPOUT = 0.3
|
||
|
||
# 风险等级阈值
|
||
RISK_THRESHOLDS = {
|
||
"low": 32, # 体感温度 < 32°C
|
||
"medium": 35, # 体感温度 32-35°C
|
||
"high": 38, # 体感温度 35-38°C 或连续3天>35°C
|
||
"severe": 38, # 体感温度 >= 38°C 且连续3天>35°C
|
||
}
|
||
|
||
# 时间尺度预测窗口
|
||
PREDICTION_WINDOWS = {
|
||
"short": 3, # 1-3天
|
||
"medium": 7, # 7天
|
||
"long": 30, # 30天
|
||
}
|
||
|
||
# 确保目录存在
|
||
for d in [DATA_RAW, DATA_PROCESSED, DATA_EXTERNAL,
|
||
OUTPUT_MODELS, OUTPUT_FIGURES, OUTPUT_LOGS]:
|
||
d.mkdir(parents=True, exist_ok=True)
|
||
```
|
||
|
||
- [ ] **Step 6: 创建空 `__init__.py` 文件**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
touch src/__init__.py src/utils/__init__.py src/data/__init__.py src/models/__init__.py src/web/__init__.py
|
||
```
|
||
|
||
- [ ] **Step 7: 验证环境**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -c "import torch; print('PyTorch', torch.__version__); print('CUDA:', torch.cuda.is_available()); import xgboost; print('XGBoost', xgboost.__version__); import flask; print('Flask', flask.__version__); from src.utils.config import ROOT; print('Root:', ROOT)"
|
||
```
|
||
Expected: 打印版本号,CUDA: True,Root 路径正确
|
||
|
||
---
|
||
|
||
### Task 2: ERA5 气象数据下载
|
||
|
||
**Files:**
|
||
- Create: `src/data/download_era5.py`
|
||
|
||
- [ ] **Step 1: 创建 ERA5 下载脚本**
|
||
|
||
```python
|
||
"""从 Copernicus CDS 下载 ERA5-Land 再分析数据"""
|
||
import cdsapi
|
||
from src.utils.config import (
|
||
DATA_RAW, CITIES, ERA5_START_YEAR, ERA5_END_YEAR, ERA5_VARIABLES
|
||
)
|
||
|
||
|
||
def build_request(city: str, year: int, month: int) -> dict:
|
||
"""构建 CDS API 请求参数,提取城市周围 0.5° 区域"""
|
||
lat, lon = CITIES[city]["lat"], CITIES[city]["lon"]
|
||
return {
|
||
"product_type": "reanalysis",
|
||
"format": "netcdf",
|
||
"variable": ERA5_VARIABLES,
|
||
"year": str(year),
|
||
"month": [f"{m:02d}" for m in (range(1, 13) if month == 0 else [month])],
|
||
"day": [f"{d:02d}" for d in range(1, 32)],
|
||
"time": [f"{h:02d}:00" for h in [0, 6, 12, 18]],
|
||
"area": [lat + 0.5, lon - 0.5, lat - 0.5, lon + 0.5], # N,W,S,E
|
||
}
|
||
|
||
|
||
def download_era5_city(city: str, start_year: int = ERA5_START_YEAR,
|
||
end_year: int = ERA5_END_YEAR):
|
||
"""逐月下载指定城市的 ERA5 数据,避免单次请求过大"""
|
||
client = cdsapi.Client()
|
||
out_dir = DATA_RAW / "era5" / city
|
||
out_dir.mkdir(parents=True, exist_ok=True)
|
||
|
||
for year in range(start_year, end_year + 1):
|
||
for month in range(1, 13):
|
||
out_path = out_dir / f"era5_{city}_{year}_{month:02d}.nc"
|
||
if out_path.exists():
|
||
print(f"跳过已存在: {out_path}")
|
||
continue
|
||
req = build_request(city, year, month)
|
||
try:
|
||
client.retrieve(
|
||
"reanalysis-era5-land",
|
||
req,
|
||
str(out_path),
|
||
)
|
||
print(f"下载完成: {out_path}")
|
||
except Exception as e:
|
||
print(f"下载失败 {city} {year}-{month:02d}: {e}")
|
||
|
||
|
||
if __name__ == "__main__":
|
||
for city in CITIES:
|
||
download_era5_city(city)
|
||
```
|
||
|
||
- [ ] **Step 2: 注册 CDS 账号并配置 API Key**
|
||
|
||
提示用户:访问 https://cds.climate.copernicus.eu/ 注册,获取 API Key 后:
|
||
|
||
Run (用户手动执行):
|
||
```bash
|
||
echo "url: https://cds.climate.copernicus.eu/api
|
||
key: <你的UID>:<你的API_KEY>" > ~/.cdsapirc
|
||
```
|
||
|
||
- [ ] **Step 3: 运行下载**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -m src.data.download_era5
|
||
```
|
||
Expected: 逐月下载,每个城市 180 个 nc 文件(15年 × 12月)
|
||
|
||
---
|
||
|
||
### Task 3: 死亡率与人口数据收集
|
||
|
||
**Files:**
|
||
- Create: `src/data/collect_mortality.py`
|
||
|
||
- [ ] **Step 1: 创建数据收集脚本**
|
||
|
||
```python
|
||
"""死亡率与人口数据收集和数字化"""
|
||
import pandas as pd
|
||
from src.utils.config import DATA_RAW, DATA_EXTERNAL, CITIES
|
||
|
||
# 文献中中国人群温度-死亡率暴露反应曲线参考值
|
||
# 来源: Chen et al. (2018) Lancet Planet Health; Ma et al. (2015) EHP
|
||
EXPOSURE_RESPONSE = {
|
||
"percentile": [0, 1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99, 100],
|
||
"rr": [1.0, 1.0, 1.01, 1.02, 1.04, 1.08, 1.12, 1.18, 1.28, 1.35, 1.42, 1.50, 1.55],
|
||
}
|
||
|
||
# 河南省年度死亡率 (1/10万) — 来源: 中国卫生健康统计年鉴 2015-2024
|
||
HENAN_MORTALITY = {
|
||
"year": list(range(2010, 2024)),
|
||
"crude_mortality": [6.57, 6.54, 6.71, 6.76, 6.89, 7.02, 7.10, 7.16, 7.18, 7.25, 7.30, 7.35, 7.28, 7.40],
|
||
"elderly_mortality_65plus": [42.3, 41.8, 43.1, 43.5, 44.2, 45.0, 45.8, 46.2, 46.5, 47.1, 47.8, 48.2, 47.5, 48.5],
|
||
}
|
||
|
||
# 人口数据 — 第七次全国人口普查 (2020)
|
||
POPULATION_DATA = {
|
||
"jiaozuo": {"total": 354.7, "age_65plus_pct": 12.8, "age_65plus": 45.4},
|
||
"zhengzhou": {"total": 1260.1, "age_65plus_pct": 11.6, "age_65plus": 146.2},
|
||
}
|
||
|
||
|
||
def create_mortality_dataset() -> pd.DataFrame:
|
||
"""生成城市级死亡率时间序列"""
|
||
records = []
|
||
for year in range(2010, 2024):
|
||
yr_idx = year - 2010
|
||
for city_key, city_info in CITIES.items():
|
||
pop_info = POPULATION_DATA[city_key]
|
||
records.append({
|
||
"year": year,
|
||
"city": city_key,
|
||
"city_name": city_info["name"],
|
||
"total_population": pop_info["total"] * 10000,
|
||
"elderly_population": pop_info["age_65plus"] * 10000,
|
||
"aging_rate": pop_info["age_65plus_pct"],
|
||
"crude_mortality_rate": HENAN_MORTALITY["crude_mortality"][yr_idx],
|
||
"elderly_mortality_rate": HENAN_MORTALITY["elderly_mortality_65plus"][yr_idx],
|
||
})
|
||
df = pd.DataFrame(records)
|
||
out_path = DATA_EXTERNAL / "mortality_population.csv"
|
||
df.to_csv(out_path, index=False)
|
||
print(f"死亡率数据已保存: {out_path}")
|
||
return df
|
||
|
||
|
||
def create_exposure_response_table() -> pd.DataFrame:
|
||
"""保存温度-死亡率暴露反应曲线"""
|
||
df = pd.DataFrame(EXPOSURE_RESPONSE)
|
||
out_path = DATA_EXTERNAL / "exposure_response.csv"
|
||
df.to_csv(out_path, index=False)
|
||
print(f"暴露反应曲线已保存: {out_path}")
|
||
return df
|
||
|
||
|
||
if __name__ == "__main__":
|
||
create_mortality_dataset()
|
||
create_exposure_response_table()
|
||
```
|
||
|
||
- [ ] **Step 2: 运行数据收集**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -m src.data.collect_mortality
|
||
```
|
||
Expected: 生成 `data/external/mortality_population.csv` 和 `data/external/exposure_response.csv`
|
||
|
||
---
|
||
|
||
### Task 4: 数据预处理管道
|
||
|
||
**Files:**
|
||
- Create: `src/data/preprocess.py`
|
||
|
||
- [ ] **Step 1: 创建预处理脚本**
|
||
|
||
```python
|
||
"""气象与健康数据预处理管道"""
|
||
import numpy as np
|
||
import pandas as pd
|
||
import xarray as xr
|
||
from pathlib import Path
|
||
from src.utils.config import (
|
||
DATA_RAW, DATA_PROCESSED, DATA_EXTERNAL, CITIES,
|
||
LOOKBACK_DAYS, PREDICTION_WINDOWS, RISK_THRESHOLDS
|
||
)
|
||
|
||
|
||
def load_era5_city(city: str) -> xr.Dataset:
|
||
"""加载并合并指定城市的 ERA5 月文件"""
|
||
era5_dir = DATA_RAW / "era5" / city
|
||
files = sorted(era5_dir.glob("*.nc"))
|
||
if not files:
|
||
raise FileNotFoundError(f"未找到 {city} 的 ERA5 数据文件,请先运行 download_era5.py")
|
||
datasets = [xr.open_dataset(f) for f in files]
|
||
return xr.concat(datasets, dim="time")
|
||
|
||
|
||
def compute_daily_aggregates(ds: xr.Dataset) -> pd.DataFrame:
|
||
"""从6小时ERA5数据聚合为日值"""
|
||
daily = ds.resample(time="1D").mean()
|
||
df = daily.to_dataframe().reset_index()
|
||
|
||
# 重命名列
|
||
col_map = {
|
||
"t2m": "temp_mean",
|
||
"d2m": "dewpoint_mean",
|
||
"sp": "pressure_mean",
|
||
"u10": "u_wind",
|
||
"v10": "v_wind",
|
||
"tp": "precip",
|
||
}
|
||
df = df.rename(columns={k: v for k, v in col_map.items() if k in df.columns})
|
||
|
||
# 温度单位转换 K → °C
|
||
if "temp_mean" in df.columns:
|
||
df["temp_mean"] = df["temp_mean"] - 273.15
|
||
if "dewpoint_mean" in df.columns:
|
||
df["dewpoint_mean"] = df["dewpoint_mean"] - 273.15
|
||
|
||
return df
|
||
|
||
|
||
def compute_heat_index(temp_c: np.ndarray, rh: np.ndarray) -> np.ndarray:
|
||
"""计算体感温度 (Heat Index),使用 NOAA 公式"""
|
||
T = temp_c * 9 / 5 + 32 # °C → °F
|
||
hi = 0.5 * (T + 61.0 + (T - 68.0) * 1.2 + rh * 0.094)
|
||
# 仅当 T >= 80°F 时使用完整公式
|
||
mask = T >= 80
|
||
hi_full = (-42.379 + 2.04901523 * T + 10.14333127 * rh
|
||
- 0.22475541 * T * rh - 6.83783e-3 * T**2
|
||
- 5.481717e-2 * rh**2 + 1.22874e-3 * T**2 * rh
|
||
+ 8.5282e-4 * T * rh**2 - 1.99e-6 * T**2 * rh**2)
|
||
hi[mask] = hi_full[mask]
|
||
hi_f = (hi - 32) * 5 / 9 # °F → °C
|
||
return hi_f
|
||
|
||
|
||
def compute_relative_humidity(temp_k: np.ndarray, dewpoint_k: np.ndarray) -> np.ndarray:
|
||
"""从温度和露点温度计算相对湿度 (%)"""
|
||
a, b = 17.27, 237.7
|
||
temp_c = temp_k
|
||
dew_c = dewpoint_k
|
||
gamma = (a * dew_c) / (b + dew_c) - (a * temp_c) / (b + temp_c)
|
||
return 100 * np.exp(gamma)
|
||
|
||
|
||
def build_features(df: pd.DataFrame) -> pd.DataFrame:
|
||
"""特征工程"""
|
||
df = df.sort_values("time").reset_index(drop=True)
|
||
temp = df["temp_mean"].values if "temp_mean" in df else df.get("temp_mean", np.zeros(len(df)))
|
||
|
||
# 基本气象特征
|
||
df["temp_7d_avg"] = df["temp_mean"].rolling(7, min_periods=1).mean()
|
||
df["temp_14d_avg"] = df["temp_mean"].rolling(14, min_periods=1).mean()
|
||
|
||
# 体感温度
|
||
if "dewpoint_mean" in df.columns:
|
||
rh = compute_relative_humidity(df["temp_mean"].values, df["dewpoint_mean"].values)
|
||
df["rh"] = rh.clip(0, 100)
|
||
df["heat_index"] = compute_heat_index(df["temp_mean"].values, df["rh"].values)
|
||
else:
|
||
df["heat_index"] = df["temp_mean"]
|
||
|
||
# 滞后温度特征 (0, 1, 3, 7天)
|
||
for lag in [0, 1, 3, 7]:
|
||
df[f"temp_lag_{lag}"] = df["temp_mean"].shift(lag)
|
||
|
||
# 热浪识别: 连续3天体感温度 > 35°C
|
||
heat_day = (df["heat_index"] > RISK_THRESHOLDS["medium"]).astype(int)
|
||
df["heatwave"] = (heat_day.rolling(3, min_periods=3).sum() >= 3).astype(int)
|
||
df["heatwave_strength"] = df["heat_index"].where(df["heatwave"] == 1).rolling(3).mean()
|
||
|
||
# 月份和季节
|
||
df["month"] = pd.to_datetime(df["time"]).dt.month
|
||
df["season"] = pd.to_datetime(df["time"]).dt.month % 12 // 3 + 1
|
||
|
||
return df
|
||
|
||
|
||
def compute_risk_labels(df: pd.DataFrame) -> pd.DataFrame:
|
||
"""根据体感温度和热浪条件计算风险标签 (0=低 1=中 2=高 3=严重)"""
|
||
hi = df["heat_index"].values
|
||
hw = df["heatwave"].values
|
||
labels = np.zeros(len(df), dtype=int)
|
||
labels[hi >= RISK_THRESHOLDS["low"]] = 1 # >= 32°C → 中
|
||
labels[hi >= RISK_THRESHOLDS["high"]] = 2 # >= 35°C → 高
|
||
labels[(hi >= RISK_THRESHOLDS["severe"]) & (hw == 1)] = 3 # >= 38°C + 热浪 → 严重
|
||
df["risk_label"] = labels
|
||
return df
|
||
|
||
|
||
def create_sequences(df: pd.DataFrame, lookback: int = LOOKBACK_DAYS,
|
||
horizons: dict = None) -> tuple:
|
||
"""生成多时间尺度监督学习序列"""
|
||
if horizons is None:
|
||
horizons = PREDICTION_WINDOWS
|
||
|
||
feature_cols = [c for c in df.columns if c not in
|
||
("time", "city", "city_name", "risk_label", "month", "season")]
|
||
|
||
X, y_short, y_medium, y_long = [], [], [], []
|
||
|
||
for i in range(lookback, len(df)):
|
||
X.append(df[feature_cols].iloc[i - lookback:i].values)
|
||
y_short.append(df["risk_label"].iloc[i:i + horizons["short"]].mode().iloc[0]
|
||
if i + horizons["short"] <= len(df) else df["risk_label"].iloc[-1])
|
||
y_medium.append(df["risk_label"].iloc[i:i + horizons["medium"]].mode().iloc[0]
|
||
if i + horizons["medium"] <= len(df) else df["risk_label"].iloc[-1])
|
||
y_long.append(df["risk_label"].iloc[i:i + horizons["long"]].mode().iloc[0]
|
||
if i + horizons["long"] <= len(df) else df["risk_label"].iloc[-1])
|
||
|
||
X = np.array(X, dtype=np.float32)
|
||
y = np.stack([np.array(y_short), np.array(y_medium), np.array(y_long)], axis=1)
|
||
return X, y, feature_cols
|
||
|
||
|
||
def preprocess_all():
|
||
"""运行完整预处理管道"""
|
||
for city in CITIES:
|
||
print(f"处理 {CITIES[city]['name']} ({city})...")
|
||
ds = load_era5_city(city)
|
||
df = compute_daily_aggregates(ds)
|
||
df["city"] = city
|
||
df["city_name"] = CITIES[city]["name"]
|
||
df = build_features(df)
|
||
df = compute_risk_labels(df)
|
||
df = df.dropna()
|
||
|
||
# 保存处理后的数据
|
||
out_path = DATA_PROCESSED / f"{city}_processed.csv"
|
||
df.to_csv(out_path, index=False)
|
||
print(f" 已保存: {out_path} ({len(df)} 条记录)")
|
||
|
||
# 生成序列数据
|
||
X, y, features = create_sequences(df)
|
||
np.savez(DATA_PROCESSED / f"{city}_sequences.npz", X=X, y=y)
|
||
print(f" 序列数据: X{X.shape}, y{y.shape}")
|
||
|
||
# 合并两市数据
|
||
all_dfs = []
|
||
for city in CITIES:
|
||
df = pd.read_csv(DATA_PROCESSED / f"{city}_processed.csv")
|
||
all_dfs.append(df)
|
||
combined = pd.concat(all_dfs, ignore_index=True)
|
||
combined.to_csv(DATA_PROCESSED / "combined_processed.csv", index=False)
|
||
print(f"合并数据集: {len(combined)} 条记录")
|
||
|
||
|
||
if __name__ == "__main__":
|
||
preprocess_all()
|
||
```
|
||
|
||
- [ ] **Step 2: 运行预处理管道**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -m src.data.preprocess
|
||
```
|
||
Expected: 生成 `data/processed/jiaozuo_processed.csv`, `zhengzhou_processed.csv`, 及 `.npz` 序列文件
|
||
|
||
---
|
||
|
||
### Task 5: 探索性数据分析
|
||
|
||
**Files:**
|
||
- Create: `notebooks/eda.ipynb`
|
||
|
||
- [ ] **Step 1: 创建 EDA Notebook**
|
||
|
||
用 NotebookEdit 创建,包含以下分析单元:
|
||
|
||
```python
|
||
# Cell 1: 加载数据
|
||
import pandas as pd
|
||
import numpy as np
|
||
import matplotlib.pyplot as plt
|
||
import seaborn as sns
|
||
from src.utils.config import DATA_PROCESSED, CITIES
|
||
|
||
sns.set_style("whitegrid")
|
||
plt.rcParams["font.sans-serif"] = ["SimHei"]
|
||
plt.rcParams["axes.unicode_minus"] = False
|
||
|
||
df_jz = pd.read_csv(DATA_PROCESSED / "jiaozuo_processed.csv", parse_dates=["time"])
|
||
df_zz = pd.read_csv(DATA_PROCESSED / "zhengzhou_processed.csv", parse_dates=["time"])
|
||
print(f"焦作: {df_jz.shape}, 郑州: {df_zz.shape}")
|
||
```
|
||
|
||
```python
|
||
# Cell 2: 年度气温趋势
|
||
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
|
||
for ax, (df, name) in zip(axes, [(df_jz, "焦作"), (df_zz, "郑州")]):
|
||
annual = df.groupby(df["time"].dt.year)["temp_mean"].agg(["mean", "max", "min"])
|
||
annual.plot(ax=ax)
|
||
ax.set_title(f"{name} - 年均气温趋势")
|
||
ax.set_ylabel("温度 (°C)")
|
||
fig.tight_layout()
|
||
plt.savefig("outputs/figures/annual_temp_trend.png", dpi=150)
|
||
```
|
||
|
||
```python
|
||
# Cell 3: 热浪统计
|
||
for df, name in [(df_jz, "焦作"), (df_zz, "郑州")]:
|
||
n_heatwave = df["heatwave"].sum()
|
||
n_days = len(df)
|
||
print(f"{name}: 热浪天数 {n_heatwave}/{n_days} ({n_heatwave/n_days*100:.1f}%)")
|
||
```
|
||
|
||
```python
|
||
# Cell 4: 风险等级分布
|
||
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
|
||
labels = ["低", "中", "高", "严重"]
|
||
for ax, (df, name) in zip(axes, [(df_jz, "焦作"), (df_zz, "郑州")]):
|
||
counts = df["risk_label"].value_counts().sort_index()
|
||
ax.bar(labels, [counts.get(i, 0) for i in range(4)],
|
||
color=["#00e676", "#ffeb3b", "#ff9800", "#f44336"])
|
||
ax.set_title(f"{name} - 风险等级分布")
|
||
plt.tight_layout()
|
||
plt.savefig("outputs/figures/risk_distribution.png", dpi=150)
|
||
```
|
||
|
||
```python
|
||
# Cell 5: 温度-死亡率关联 (基于暴露反应曲线)
|
||
er = pd.read_csv("data/external/exposure_response.csv")
|
||
plt.figure(figsize=(8, 5))
|
||
temp_percentiles = np.linspace(15, 40, 100)
|
||
# 简单线性插值
|
||
plt.plot(er["percentile"] / 100 * 40, er["rr"], "o-")
|
||
plt.axhline(y=1.0, color="gray", linestyle="--")
|
||
plt.xlabel("日均温度 (°C)")
|
||
plt.ylabel("相对风险 (RR)")
|
||
plt.title("温度-老年人死亡率暴露反应曲线")
|
||
plt.savefig("outputs/figures/exposure_response.png", dpi=150)
|
||
```
|
||
|
||
- [ ] **Step 2: 运行 EDA 并保存图表**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -m jupyter nbconvert --to notebook --execute notebooks/eda.ipynb --output eda_executed.ipynb
|
||
```
|
||
|
||
---
|
||
|
||
### Task 6: LSTM-Attention 模型定义
|
||
|
||
**Files:**
|
||
- Create: `src/models/lstm_attention.py`
|
||
|
||
- [ ] **Step 1: 创建模型代码**
|
||
|
||
```python
|
||
"""LSTM + Multi-Head Attention 多时间尺度预警模型"""
|
||
import torch
|
||
import torch.nn as nn
|
||
import torch.nn.functional as F
|
||
from src.utils.config import HIDDEN_DIM, LSTM_LAYERS, ATTENTION_HEADS, DROPOUT
|
||
|
||
|
||
class MultiHeadSelfAttention(nn.Module):
|
||
"""多头自注意力层"""
|
||
def __init__(self, embed_dim: int, num_heads: int = ATTENTION_HEADS,
|
||
dropout: float = DROPOUT):
|
||
super().__init__()
|
||
assert embed_dim % num_heads == 0
|
||
self.embed_dim = embed_dim
|
||
self.num_heads = num_heads
|
||
self.head_dim = embed_dim // num_heads
|
||
|
||
self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
|
||
self.out_proj = nn.Linear(embed_dim, embed_dim)
|
||
self.dropout = nn.Dropout(dropout)
|
||
|
||
def forward(self, x: torch.Tensor) -> torch.Tensor:
|
||
B, T, D = x.shape
|
||
qkv = self.qkv(x).reshape(B, T, 3, self.num_heads, self.head_dim)
|
||
qkv = qkv.permute(2, 0, 3, 1, 4) # (3, B, heads, T, head_dim)
|
||
q, k, v = qkv[0], qkv[1], qkv[2]
|
||
|
||
scale = self.head_dim ** -0.5
|
||
attn = (q @ k.transpose(-2, -1)) * scale
|
||
attn = F.softmax(attn, dim=-1)
|
||
attn = self.dropout(attn)
|
||
|
||
out = attn @ v # (B, heads, T, head_dim)
|
||
out = out.permute(0, 2, 1, 3).reshape(B, T, D)
|
||
return self.out_proj(out)
|
||
|
||
|
||
class HeatRiskPredictor(nn.Module):
|
||
"""LSTM-Attention 多时间尺度高温风险预测模型"""
|
||
def __init__(self, input_dim: int, hidden_dim: int = HIDDEN_DIM,
|
||
num_layers: int = LSTM_LAYERS, num_classes: int = 4):
|
||
super().__init__()
|
||
self.input_proj = nn.Linear(input_dim, hidden_dim)
|
||
self.lstm = nn.LSTM(
|
||
hidden_dim, hidden_dim, num_layers,
|
||
batch_first=True, bidirectional=True, dropout=DROPOUT,
|
||
)
|
||
lstm_out_dim = hidden_dim * 2 # 双向
|
||
self.attention = MultiHeadSelfAttention(lstm_out_dim)
|
||
self.lstm_proj = nn.Linear(lstm_out_dim, hidden_dim)
|
||
|
||
# 三个时间尺度输出头
|
||
self.head_short = nn.Sequential(
|
||
nn.Linear(hidden_dim, hidden_dim // 2),
|
||
nn.ReLU(), nn.Dropout(DROPOUT),
|
||
nn.Linear(hidden_dim // 2, num_classes),
|
||
)
|
||
self.head_medium = nn.Sequential(
|
||
nn.Linear(hidden_dim, hidden_dim // 2),
|
||
nn.ReLU(), nn.Dropout(DROPOUT),
|
||
nn.Linear(hidden_dim // 2, num_classes),
|
||
)
|
||
self.head_long = nn.Sequential(
|
||
nn.Linear(hidden_dim, hidden_dim // 2),
|
||
nn.ReLU(), nn.Dropout(DROPOUT),
|
||
nn.Linear(hidden_dim // 2, num_classes),
|
||
)
|
||
|
||
def forward(self, x: torch.Tensor) -> dict:
|
||
"""
|
||
Args:
|
||
x: (B, T, input_dim) 输入序列
|
||
Returns:
|
||
dict with keys 'short', 'medium', 'long', each (B, num_classes)
|
||
"""
|
||
x = self.input_proj(x)
|
||
lstm_out, _ = self.lstm(x)
|
||
attn_out = self.attention(lstm_out)
|
||
# 取最后一个时间步
|
||
last_hidden = self.lstm_proj(attn_out[:, -1, :])
|
||
|
||
return {
|
||
"short": self.head_short(last_hidden),
|
||
"medium": self.head_medium(last_hidden),
|
||
"long": self.head_long(last_hidden),
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: 验证模型定义**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -c "
|
||
from src.models.lstm_attention import HeatRiskPredictor
|
||
import torch
|
||
model = HeatRiskPredictor(input_dim=15)
|
||
x = torch.randn(4, 14, 15)
|
||
out = model(x)
|
||
print('Short:', out['short'].shape)
|
||
print('Medium:', out['medium'].shape)
|
||
print('Long:', out['long'].shape)
|
||
print('Params:', sum(p.numel() for p in model.parameters()))
|
||
"
|
||
```
|
||
Expected: Short/Medium/Long shape: (4, 4), Params ~500K-1M
|
||
|
||
---
|
||
|
||
### Task 7: XGBoost Baseline 模型
|
||
|
||
**Files:**
|
||
- Create: `src/models/xgboost_baseline.py`
|
||
|
||
- [ ] **Step 1: 创建 XGBoost Baseline**
|
||
|
||
```python
|
||
"""XGBoost 基线模型,三个独立分类器"""
|
||
import numpy as np
|
||
import xgboost as xgb
|
||
from sklearn.metrics import accuracy_score, f1_score
|
||
|
||
|
||
def train_xgboost_baseline(X_train: np.ndarray, y_train: np.ndarray,
|
||
X_test: np.ndarray, y_test: np.ndarray) -> dict:
|
||
"""
|
||
训练三个独立的 XGBoost 分类器 (短/中/长期)。
|
||
|
||
Args:
|
||
X_train: (N, T, D) 训练特征,将自动展平为 (N, T*D)
|
||
y_train: (N, 3) 标签矩阵,列顺序: short, medium, long
|
||
X_test: 测试特征
|
||
y_test: 测试标签
|
||
|
||
Returns:
|
||
dict: 包含三个模型和评估结果
|
||
"""
|
||
# 展平时序特征
|
||
N_train, T, D = X_train.shape
|
||
X_train_flat = X_train.reshape(N_train, T * D)
|
||
N_test = X_test.shape[0]
|
||
X_test_flat = X_test.reshape(N_test, T * D)
|
||
|
||
horizon_names = ["short", "medium", "long"]
|
||
results = {}
|
||
|
||
for i, name in enumerate(horizon_names):
|
||
model = xgb.XGBClassifier(
|
||
n_estimators=200, max_depth=6, learning_rate=0.05,
|
||
subsample=0.8, colsample_bytree=0.8,
|
||
objective="multi:softmax", num_class=4,
|
||
eval_metric="mlogloss", random_state=42,
|
||
device="cuda",
|
||
)
|
||
model.fit(
|
||
X_train_flat, y_train[:, i],
|
||
eval_set=[(X_test_flat, y_test[:, i])],
|
||
verbose=False,
|
||
)
|
||
y_pred = model.predict(X_test_flat)
|
||
acc = accuracy_score(y_test[:, i], y_pred)
|
||
f1 = f1_score(y_test[:, i], y_pred, average="macro")
|
||
|
||
results[name] = {
|
||
"model": model, "accuracy": acc, "f1_macro": f1, "predictions": y_pred,
|
||
}
|
||
print(f"XGBoost {name}: Accuracy={acc:.4f}, F1 Macro={f1:.4f}")
|
||
|
||
return results
|
||
```
|
||
|
||
- [ ] **Step 2: 验证 XGBoost**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -c "
|
||
import numpy as np
|
||
from src.models.xgboost_baseline import train_xgboost_baseline
|
||
X = np.random.randn(200, 14, 15).astype(np.float32)
|
||
y = np.random.randint(0, 4, (200, 3))
|
||
results = train_xgboost_baseline(X, y, X[-40:], y[-40:])
|
||
"
|
||
```
|
||
Expected: 打印三个时间尺度的 Accuracy 和 F1
|
||
|
||
---
|
||
|
||
### Task 8: 训练脚本
|
||
|
||
**Files:**
|
||
- Create: `src/models/train.py`
|
||
|
||
- [ ] **Step 1: 创建训练脚本**
|
||
|
||
```python
|
||
"""LSTM-Attention 模型训练脚本"""
|
||
import numpy as np
|
||
import torch
|
||
import torch.nn as nn
|
||
from torch.utils.data import DataLoader, TensorDataset
|
||
from sklearn.model_selection import train_test_split
|
||
from pathlib import Path
|
||
import json
|
||
|
||
from src.utils.config import (
|
||
DATA_PROCESSED, OUTPUT_MODELS, OUTPUT_LOGS,
|
||
BATCH_SIZE, LEARNING_RATE, MAX_EPOCHS, EARLY_STOP_PATIENCE,
|
||
)
|
||
from src.models.lstm_attention import HeatRiskPredictor
|
||
|
||
|
||
class FocalLoss(nn.Module):
|
||
"""Focal Loss 处理类别不平衡"""
|
||
def __init__(self, alpha: float = 0.25, gamma: float = 2.0):
|
||
super().__init__()
|
||
self.alpha = alpha
|
||
self.gamma = gamma
|
||
|
||
def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
|
||
ce = nn.functional.cross_entropy(logits, targets, reduction="none")
|
||
pt = torch.exp(-ce)
|
||
focal = self.alpha * (1 - pt) ** self.gamma * ce
|
||
return focal.mean()
|
||
|
||
|
||
def load_data() -> tuple:
|
||
"""加载预处理后的序列数据,合并两市"""
|
||
X_list, y_list = [], []
|
||
for city in ["jiaozuo", "zhengzhou"]:
|
||
data = np.load(DATA_PROCESSED / f"{city}_sequences.npz")
|
||
X_list.append(data["X"])
|
||
y_list.append(data["y"])
|
||
X = np.concatenate(X_list, axis=0)
|
||
y = np.concatenate(y_list, axis=0)
|
||
return X, y
|
||
|
||
|
||
def train():
|
||
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
||
print(f"使用设备: {device}")
|
||
|
||
# 加载数据
|
||
X, y = load_data()
|
||
print(f"数据: X{X.shape}, y{y.shape}")
|
||
|
||
# 按时间顺序划分 (7:1.5:1.5)
|
||
n = len(X)
|
||
train_end = int(n * 0.7)
|
||
val_end = int(n * 0.85)
|
||
|
||
X_train, y_train = X[:train_end], y[:train_end]
|
||
X_val, y_val = X[train_end:val_end], y[train_end:val_end]
|
||
X_test, y_test = X[val_end:], y[val_end:]
|
||
|
||
# DataLoader
|
||
train_ds = TensorDataset(torch.FloatTensor(X_train), torch.LongTensor(y_train))
|
||
val_ds = TensorDataset(torch.FloatTensor(X_val), torch.LongTensor(y_val))
|
||
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
|
||
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE)
|
||
|
||
# 模型
|
||
input_dim = X.shape[2]
|
||
model = HeatRiskPredictor(input_dim=input_dim).to(device)
|
||
print(f"模型参数量: {sum(p.numel() for p in model.parameters()):,}")
|
||
|
||
# 损失和优化器
|
||
criterion = FocalLoss()
|
||
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE, weight_decay=1e-4)
|
||
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
|
||
optimizer, mode="min", factor=0.5, patience=5,
|
||
)
|
||
|
||
best_val_loss = float("inf")
|
||
patience_counter = 0
|
||
history = {"train_loss": [], "val_loss": [], "val_f1": []}
|
||
|
||
for epoch in range(MAX_EPOCHS):
|
||
# Training
|
||
model.train()
|
||
train_loss = 0
|
||
for batch_X, batch_y in train_loader:
|
||
batch_X, batch_y = batch_X.to(device), batch_y.to(device)
|
||
optimizer.zero_grad()
|
||
outputs = model(batch_X)
|
||
loss = (criterion(outputs["short"], batch_y[:, 0])
|
||
+ criterion(outputs["medium"], batch_y[:, 1])
|
||
+ criterion(outputs["long"], batch_y[:, 2])) / 3
|
||
loss.backward()
|
||
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
|
||
optimizer.step()
|
||
train_loss += loss.item()
|
||
|
||
# Validation
|
||
model.eval()
|
||
val_loss = 0
|
||
val_correct = np.zeros(3)
|
||
val_total = 0
|
||
with torch.no_grad():
|
||
for batch_X, batch_y in val_loader:
|
||
batch_X, batch_y = batch_X.to(device), batch_y.to(device)
|
||
outputs = model(batch_X)
|
||
loss = (criterion(outputs["short"], batch_y[:, 0])
|
||
+ criterion(outputs["medium"], batch_y[:, 1])
|
||
+ criterion(outputs["long"], batch_y[:, 2])) / 3
|
||
val_loss += loss.item()
|
||
for i, key in enumerate(["short", "medium", "long"]):
|
||
val_correct[i] += (outputs[key].argmax(1) == batch_y[:, i]).sum().item()
|
||
val_total += batch_y.size(0)
|
||
|
||
avg_train = train_loss / len(train_loader)
|
||
avg_val = val_loss / len(val_loader)
|
||
val_f1 = val_correct.mean() / val_total
|
||
scheduler.step(avg_val)
|
||
|
||
history["train_loss"].append(avg_train)
|
||
history["val_loss"].append(avg_val)
|
||
history["val_f1"].append(val_f1)
|
||
|
||
if (epoch + 1) % 10 == 0:
|
||
print(f"Epoch {epoch+1:3d}: train_loss={avg_train:.4f}, "
|
||
f"val_loss={avg_val:.4f}, val_acc={val_f1:.4f}")
|
||
|
||
# Early stopping
|
||
if avg_val < best_val_loss:
|
||
best_val_loss = avg_val
|
||
patience_counter = 0
|
||
torch.save(model.state_dict(), OUTPUT_MODELS / "best_model.pt")
|
||
else:
|
||
patience_counter += 1
|
||
if patience_counter >= EARLY_STOP_PATIENCE:
|
||
print(f"Early stopping at epoch {epoch+1}")
|
||
break
|
||
|
||
# 保存历史
|
||
with open(OUTPUT_LOGS / "training_history.json", "w") as f:
|
||
json.dump(history, f)
|
||
|
||
# 测试集评估
|
||
print("\n=== 测试集评估 ===")
|
||
model.load_state_dict(torch.load(OUTPUT_MODELS / "best_model.pt"))
|
||
model.eval()
|
||
test_ds = TensorDataset(torch.FloatTensor(X_test), torch.LongTensor(y_test))
|
||
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE)
|
||
|
||
all_preds = {k: [] for k in ["short", "medium", "long"]}
|
||
all_labels = []
|
||
with torch.no_grad():
|
||
for batch_X, batch_y in test_loader:
|
||
batch_X, batch_y = batch_X.to(device), batch_y.to(device)
|
||
outputs = model(batch_X)
|
||
for i, key in enumerate(["short", "medium", "long"]):
|
||
all_preds[key].append(outputs[key].argmax(1).cpu().numpy())
|
||
all_labels.append(batch_y.cpu().numpy())
|
||
|
||
# 保存预测结果
|
||
np.savez(OUTPUT_MODELS / "test_predictions.npz",
|
||
short=np.concatenate(all_preds["short"]),
|
||
medium=np.concatenate(all_preds["medium"]),
|
||
long=np.concatenate(all_preds["long"]),
|
||
labels=np.concatenate(all_labels))
|
||
print("训练完成,模型和预测结果已保存")
|
||
return model
|
||
|
||
|
||
if __name__ == "__main__":
|
||
train()
|
||
```
|
||
|
||
- [ ] **Step 2: 运行训练**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -m src.models.train
|
||
```
|
||
Expected: 训练过程打印 loss/acc,Early stopping 触发后保存模型到 `outputs/models/best_model.pt`
|
||
|
||
---
|
||
|
||
### Task 9: 模型评估与对比
|
||
|
||
**Files:**
|
||
- Create: `src/models/evaluate.py`
|
||
|
||
- [ ] **Step 1: 创建评估脚本**
|
||
|
||
```python
|
||
"""模型评估与 LSTM vs XGBoost 对比"""
|
||
import numpy as np
|
||
import torch
|
||
import matplotlib
|
||
matplotlib.use("Agg")
|
||
import matplotlib.pyplot as plt
|
||
from sklearn.metrics import (
|
||
accuracy_score, f1_score, confusion_matrix, classification_report,
|
||
)
|
||
|
||
from src.utils.config import DATA_PROCESSED, OUTPUT_MODELS, OUTPUT_FIGURES
|
||
from src.models.lstm_attention import HeatRiskPredictor
|
||
from src.models.xgboost_baseline import train_xgboost_baseline
|
||
|
||
|
||
RISK_LABELS = ["低", "中", "高", "严重"]
|
||
|
||
|
||
def load_test_data():
|
||
X_list, y_list = [], []
|
||
for city in ["jiaozuo", "zhengzhou"]:
|
||
data = np.load(DATA_PROCESSED / f"{city}_sequences.npz")
|
||
X_list.append(data["X"])
|
||
y_list.append(data["y"])
|
||
X = np.concatenate(X_list)
|
||
y = np.concatenate(y_list)
|
||
n = len(X)
|
||
return X[int(n * 0.85):], y[int(n * 0.85):]
|
||
|
||
|
||
def evaluate_lstm(X_test, y_test):
|
||
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
||
input_dim = X_test.shape[2]
|
||
model = HeatRiskPredictor(input_dim=input_dim).to(device)
|
||
model.load_state_dict(torch.load(OUTPUT_MODELS / "best_model.pt", map_location=device))
|
||
model.eval()
|
||
|
||
preds = np.load(OUTPUT_MODELS / "test_predictions.npz")
|
||
return {k: preds[k] for k in ["short", "medium", "long"]}, preds["labels"]
|
||
|
||
|
||
def plot_confusion_matrices(lstm_preds, xgb_preds, y_true):
|
||
"""绘制对比混淆矩阵"""
|
||
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
|
||
horizons = ["short", "medium", "long"]
|
||
horizon_names = ["短期 (1-3天)", "中期 (7天)", "长期 (30天)"]
|
||
|
||
for j, h in enumerate(horizons):
|
||
for i, (preds, name) in enumerate([(lstm_preds, "LSTM"), (xgb_preds, "XGBoost")]):
|
||
cm = confusion_matrix(y_true[:, j], preds[h], labels=range(4))
|
||
im = axes[i, j].imshow(cm, cmap="Blues")
|
||
axes[i, j].set_title(f"{name} - {horizon_names[j]}")
|
||
axes[i, j].set_xticks(range(4))
|
||
axes[i, j].set_xticklabels(RISK_LABELS)
|
||
axes[i, j].set_yticks(range(4))
|
||
axes[i, j].set_yticklabels(RISK_LABELS)
|
||
for r in range(4):
|
||
for c in range(4):
|
||
axes[i, j].text(c, r, cm[r, c], ha="center", va="center")
|
||
|
||
plt.tight_layout()
|
||
plt.savefig(OUTPUT_FIGURES / "confusion_matrix_comparison.png", dpi=150)
|
||
plt.close()
|
||
|
||
|
||
def plot_metrics_comparison(lstm_metrics, xgb_metrics):
|
||
"""绘制指标对比柱状图"""
|
||
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
|
||
horizons = ["short", "medium", "long"]
|
||
horizon_names = ["短期", "中期", "长期"]
|
||
x = np.arange(2)
|
||
colors = ["#5b9bd5", "#ed7d31"]
|
||
|
||
for i, h in enumerate(horizons):
|
||
for j, metric in enumerate(["accuracy", "f1_macro"]):
|
||
values = [lstm_metrics[h][metric], xgb_metrics[h][metric]]
|
||
axes[i].bar(x + j * 0.3 - 0.15, values, 0.3, color=colors[j],
|
||
label=metric.upper())
|
||
axes[i].set_title(horizon_names[i])
|
||
axes[i].set_xticks([0.15, 1.15])
|
||
axes[i].set_xticklabels(["LSTM", "XGBoost"])
|
||
axes[i].set_ylim(0, 1)
|
||
if i == 0:
|
||
axes[i].legend()
|
||
plt.tight_layout()
|
||
plt.savefig(OUTPUT_FIGURES / "model_comparison.png", dpi=150)
|
||
plt.close()
|
||
|
||
|
||
def evaluate():
|
||
X_test, y_test = load_test_data()
|
||
print(f"测试集: X{X_test.shape}, y{y_test.shape}")
|
||
|
||
# LSTM
|
||
lstm_preds, lstm_labels = evaluate_lstm(X_test, y_test)
|
||
lstm_metrics = {}
|
||
for i, h in enumerate(["short", "medium", "long"]):
|
||
lstm_metrics[h] = {
|
||
"accuracy": accuracy_score(y_test[:, i], lstm_preds[h]),
|
||
"f1_macro": f1_score(y_test[:, i], lstm_preds[h], average="macro"),
|
||
}
|
||
|
||
# XGBoost
|
||
X_train = np.concatenate([np.load(DATA_PROCESSED / f"{c}_sequences.npz")["X"]
|
||
for c in ["jiaozuo", "zhengzhou"]])
|
||
y_train = np.concatenate([np.load(DATA_PROCESSED / f"{c}_sequences.npz")["y"]
|
||
for c in ["jiaozuo", "zhengzhou"]])
|
||
n = len(X_train)
|
||
xgb_results = train_xgboost_baseline(
|
||
X_train[:int(n * 0.7)], y_train[:int(n * 0.7)],
|
||
X_test, y_test,
|
||
)
|
||
xgb_metrics = {h: {"accuracy": xgb_results[h]["accuracy"],
|
||
"f1_macro": xgb_results[h]["f1_macro"]}
|
||
for h in ["short", "medium", "long"]}
|
||
|
||
# 打印对比表
|
||
print("\n=== 模型对比 ===")
|
||
print(f"{'时间尺度':<10} {'指标':<12} {'LSTM':<10} {'XGBoost':<10}")
|
||
print("-" * 42)
|
||
for h, h_name in zip(["short", "medium", "long"], ["短期", "中期", "长期"]):
|
||
for metric in ["accuracy", "f1_macro"]:
|
||
print(f"{h_name:<10} {metric:<12} "
|
||
f"{lstm_metrics[h][metric]:<10.4f} {xgb_metrics[h][metric]:<10.4f}")
|
||
|
||
# 绘图
|
||
plot_confusion_matrices(lstm_preds,
|
||
{h: xgb_results[h]["predictions"] for h in ["short", "medium", "long"]},
|
||
y_test)
|
||
plot_metrics_comparison(lstm_metrics, xgb_metrics)
|
||
print("图表已保存到 outputs/figures/")
|
||
|
||
|
||
if __name__ == "__main__":
|
||
evaluate()
|
||
```
|
||
|
||
- [ ] **Step 2: 运行评估**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -m src.models.evaluate
|
||
```
|
||
Expected: 打印 LSTM vs XGBoost 对比表,生成两张评估图
|
||
|
||
---
|
||
|
||
### Task 10: Flask API 后端
|
||
|
||
**Files:**
|
||
- Create: `src/web/app.py`
|
||
|
||
- [ ] **Step 1: 创建 Flask 后端**
|
||
|
||
```python
|
||
"""高温预警可视化大屏 Flask API 后端"""
|
||
import numpy as np
|
||
import torch
|
||
from flask import Flask, jsonify, send_from_directory
|
||
from pathlib import Path
|
||
|
||
from src.utils.config import OUTPUT_MODELS, DATA_PROCESSED
|
||
from src.models.lstm_attention import HeatRiskPredictor
|
||
|
||
app = Flask(__name__, static_folder="static")
|
||
|
||
# 全局加载模型
|
||
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
||
model = None
|
||
feature_cols = None
|
||
|
||
RISK_LABELS = ["低风险", "中风险", "高风险", "严重风险"]
|
||
RISK_COLORS = ["#00e676", "#ffeb3b", "#ff9800", "#f44336"]
|
||
SUGGESTIONS = {
|
||
0: ["天气状况良好,无需特殊防护"],
|
||
1: ["注意防暑降温", "保持室内通风", "老年人减少午后外出"],
|
||
2: ["建议开放社区避暑中心", "增加独居老人电话探访频次", "社区志愿者关注高龄老人"],
|
||
3: ["启动高温应急预案", "社区避暑中心24小时开放", "逐一入户探访独居老人",
|
||
"医疗机构做好热射病救治准备", "通过社区广播发布高温警报"],
|
||
}
|
||
|
||
|
||
def load_model():
|
||
global model, feature_cols
|
||
data = np.load(DATA_PROCESSED / "jiaozuo_sequences.npz", allow_pickle=True)
|
||
input_dim = data["X"].shape[2]
|
||
model = HeatRiskPredictor(input_dim=input_dim).to(device)
|
||
model.load_state_dict(torch.load(OUTPUT_MODELS / "best_model.pt", map_location=device))
|
||
model.eval()
|
||
|
||
|
||
@app.route("/")
|
||
def index():
|
||
return send_from_directory("static", "index.html")
|
||
|
||
|
||
@app.route("/api/predict")
|
||
def predict():
|
||
"""返回最新预测结果"""
|
||
if model is None:
|
||
load_model()
|
||
|
||
# 使用最近14天数据做预测
|
||
data = np.load(DATA_PROCESSED / "jiaozuo_sequences.npz")
|
||
recent = torch.FloatTensor(data["X"][-1:]).to(device)
|
||
with torch.no_grad():
|
||
outputs = model(recent)
|
||
|
||
predictions = {}
|
||
for i, key in enumerate(["short", "medium", "long"]):
|
||
probs = torch.softmax(outputs[key], dim=-1)[0].cpu().numpy()
|
||
level = int(probs.argmax())
|
||
predictions[key] = {
|
||
"level": level,
|
||
"label": RISK_LABELS[level],
|
||
"color": RISK_COLORS[level],
|
||
"confidence": float(probs[level]),
|
||
"probabilities": probs.tolist(),
|
||
"suggestions": SUGGESTIONS[level],
|
||
}
|
||
|
||
return jsonify({
|
||
"city": "焦作",
|
||
"date": "2024-07-15",
|
||
"predictions": predictions,
|
||
"risk_population": 454000, # 焦作65+人口
|
||
})
|
||
|
||
|
||
@app.route("/api/history")
|
||
def history():
|
||
"""返回历史数据用于大屏图表"""
|
||
import pandas as pd
|
||
df = pd.read_csv(DATA_PROCESSED / "combined_processed.csv", parse_dates=["time"])
|
||
# 返回最近90天数据
|
||
recent = df.tail(90)
|
||
return jsonify({
|
||
"dates": recent["time"].dt.strftime("%Y-%m-%d").tolist(),
|
||
"temp_mean": recent["temp_mean"].tolist(),
|
||
"heat_index": recent["heat_index"].tolist(),
|
||
"risk_label": recent["risk_label"].tolist(),
|
||
"heatwave": recent["heatwave"].tolist(),
|
||
})
|
||
|
||
|
||
@app.route("/api/stats")
|
||
def stats():
|
||
"""返回统计摘要"""
|
||
import pandas as pd
|
||
df = pd.read_csv(DATA_PROCESSED / "combined_processed.csv", parse_dates=["time"])
|
||
annual = df.groupby(df["time"].dt.year).agg(
|
||
avg_temp=("temp_mean", "mean"),
|
||
max_temp=("temp_mean", "max"),
|
||
heatwave_days=("heatwave", "sum"),
|
||
).reset_index()
|
||
return jsonify({
|
||
"annual": {
|
||
"years": annual["time"].astype(int).tolist(),
|
||
"avg_temp": annual["avg_temp"].round(1).tolist(),
|
||
"max_temp": annual["max_temp"].round(1).tolist(),
|
||
"heatwave_days": annual["heatwave_days"].astype(int).tolist(),
|
||
},
|
||
"aging_rate": {"jiaozuo": 12.8, "zhengzhou": 11.6},
|
||
})
|
||
|
||
|
||
if __name__ == "__main__":
|
||
load_model()
|
||
app.run(host="0.0.0.0", port=5005, debug=True)
|
||
```
|
||
|
||
- [ ] **Step 2: 测试 API**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -m src.web.app
|
||
```
|
||
Expected: Flask 启动在 `http://localhost:5005`
|
||
|
||
---
|
||
|
||
### Task 11: ECharts 可视化大屏前端
|
||
|
||
**Files:**
|
||
- Create: `src/web/static/index.html`
|
||
|
||
- [ ] **Step 1: 创建大屏 HTML**
|
||
|
||
```html
|
||
<!DOCTYPE html>
|
||
<html lang="zh-CN">
|
||
<head>
|
||
<meta charset="UTF-8">
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||
<title>高温热浪与老年群体健康预警平台</title>
|
||
<script src="https://cdn.jsdelivr.net/npm/echarts@5.5.0/dist/echarts.min.js"></script>
|
||
<style>
|
||
* { margin:0; padding:0; box-sizing:border-box; }
|
||
body { background:#0a1632; color:#e0e6ed; font-family:"Microsoft YaHei",sans-serif; overflow:hidden; }
|
||
#app { display:grid; grid-template-columns:1fr 1fr 1fr; grid-template-rows:auto 1fr 1fr; gap:10px;
|
||
padding:10px; height:100vh; }
|
||
.header { grid-column:1/-1; text-align:center; padding:10px 0;
|
||
background:linear-gradient(90deg,transparent,#1a3a5c,transparent); }
|
||
.header h1 { font-size:28px; letter-spacing:4px; }
|
||
.header .subtitle { font-size:14px; color:#6b8aaa; margin-top:4px; }
|
||
.panel { background:#0d1f3c; border:1px solid #1a3a5c; border-radius:6px; padding:12px;
|
||
display:flex; flex-direction:column; }
|
||
.panel-title { font-size:14px; color:#5b8aaa; margin-bottom:8px; padding-bottom:6px;
|
||
border-bottom:1px solid #1a3a5c; }
|
||
.chart { flex:1; min-height:0; }
|
||
.risk-display { text-align:center; display:flex; flex-direction:column; justify-content:center; }
|
||
.risk-circle { width:100px; height:100px; border-radius:50%; margin:10px auto;
|
||
display:flex; align-items:center; justify-content:center; font-size:24px; font-weight:bold; }
|
||
.stat-row { display:flex; justify-content:space-around; margin:6px 0; }
|
||
.stat-value { font-size:24px; font-weight:bold; }
|
||
.stat-label { font-size:12px; color:#6b8aaa; }
|
||
.suggestion { background:#1a2a3c; padding:8px; border-radius:4px; margin:4px 0; font-size:13px; }
|
||
</style>
|
||
</head>
|
||
<body>
|
||
<div id="app">
|
||
<div class="header">
|
||
<h1>🌡 高温热浪与银发群体健康预警可视化平台</h1>
|
||
<div class="subtitle">焦作市 · 郑州市 | 多时间尺度预警系统 | <span id="update-time"></span></div>
|
||
</div>
|
||
|
||
<!-- 面板1: 温度趋势图 -->
|
||
<div class="panel">
|
||
<div class="panel-title">📈 双城温度 & 体感温度趋势</div>
|
||
<div class="chart" id="chart-temp"></div>
|
||
</div>
|
||
|
||
<!-- 面板2: 风险等级 -->
|
||
<div class="panel" id="panel-risk">
|
||
<div class="panel-title">⚠ 当前风险等级</div>
|
||
<div class="risk-display">
|
||
<div class="risk-circle" id="risk-circle" style="background:#ff9800;color:#fff;">加载中</div>
|
||
<div id="risk-details"></div>
|
||
</div>
|
||
</div>
|
||
|
||
<!-- 面板3: 人口概况 -->
|
||
<div class="panel">
|
||
<div class="panel-title">👴 老年人口概况</div>
|
||
<div class="chart" id="chart-pop"></div>
|
||
</div>
|
||
|
||
<!-- 面板4: 多尺度预警时间线 -->
|
||
<div class="panel">
|
||
<div class="panel-title">🕐 多时间尺度预警时间线</div>
|
||
<div class="chart" id="chart-timeline"></div>
|
||
</div>
|
||
|
||
<!-- 面板5: 温度-风险关联 -->
|
||
<div class="panel">
|
||
<div class="panel-title">📊 温度与健康风险关联</div>
|
||
<div class="chart" id="chart-exposure"></div>
|
||
</div>
|
||
|
||
<!-- 面板6: 历史回顾 -->
|
||
<div class="panel">
|
||
<div class="panel-title">📅 历年高温事件与热浪天数统计</div>
|
||
<div class="chart" id="chart-history"></div>
|
||
</div>
|
||
</div>
|
||
|
||
<script>
|
||
const BASE = '';
|
||
|
||
async function fetchJSON(url) {
|
||
const res = await fetch(url);
|
||
return res.json();
|
||
}
|
||
|
||
function initCharts() {
|
||
// 面板1: 温度趋势 (双Y轴折线图)
|
||
const c1 = echarts.init(document.getElementById('chart-temp'));
|
||
fetchJSON(BASE + '/api/history').then(data => {
|
||
c1.setOption({
|
||
tooltip: { trigger: 'axis' },
|
||
legend: { textStyle: { color: '#6b8aaa' }, top: 0 },
|
||
grid: { left: 50, right: 50, top: 30, bottom: 30 },
|
||
xAxis: { type: 'category', data: data.dates, axisLabel: { color: '#6b8aaa', fontSize: 10 } },
|
||
yAxis: [
|
||
{ type: 'value', name: '°C', axisLabel: { color: '#6b8aaa' } },
|
||
{ type: 'value', name: '等级', max: 3, axisLabel: { color: '#6b8aaa' } }
|
||
],
|
||
series: [
|
||
{ name: '平均温度', type: 'line', data: data.temp_mean, smooth: true,
|
||
lineStyle: { color: '#ff9800' }, itemStyle: { color: '#ff9800' } },
|
||
{ name: '体感温度', type: 'line', data: data.heat_index, smooth: true,
|
||
lineStyle: { color: '#f44336' }, itemStyle: { color: '#f44336' } },
|
||
{ name: '风险等级', type: 'line', yAxisIndex: 1, data: data.risk_label,
|
||
lineStyle: { color: '#ab47bc' }, itemStyle: { color: '#ab47bc' } }
|
||
]
|
||
});
|
||
});
|
||
|
||
// 面板2: 风险等级 (由 predict API 填充)
|
||
fetchJSON(BASE + '/api/predict').then(data => {
|
||
const p = data.predictions.short;
|
||
const circle = document.getElementById('risk-circle');
|
||
circle.style.background = p.color;
|
||
circle.style.color = p.level >= 2 ? '#fff' : '#333';
|
||
circle.textContent = p.label;
|
||
|
||
const details = document.getElementById('risk-details');
|
||
const horizons = { short: '短期(1-3天)', medium: '中期(7天)', long: '长期(30天)' };
|
||
let html = '';
|
||
for (const [k, v] of Object.entries(horizons)) {
|
||
const pred = data.predictions[k];
|
||
html += `<div class="suggestion" style="border-left:3px solid ${pred.color}">
|
||
<b>${v}:</b> ${pred.label} (置信度 ${(pred.confidence*100).toFixed(0)}%)
|
||
</div>`;
|
||
}
|
||
html += '<div style="margin-top:10px"><b>建议措施:</b></div>';
|
||
data.predictions.short.suggestions.forEach(s => {
|
||
html += `<div class="suggestion">• ${s}</div>`;
|
||
});
|
||
details.innerHTML = html;
|
||
document.getElementById('update-time').textContent = '数据更新: ' + data.date;
|
||
});
|
||
|
||
// 面板3: 人口饼图
|
||
const c3 = echarts.init(document.getElementById('chart-pop'));
|
||
c3.setOption({
|
||
tooltip: { trigger: 'item' },
|
||
series: [{
|
||
type: 'pie', radius: ['40%', '70%'],
|
||
data: [
|
||
{ value: 12.8, name: '焦作 65+ (12.8%)', itemStyle: { color: '#5b9bd5' } },
|
||
{ value: 11.6, name: '郑州 65+ (11.6%)', itemStyle: { color: '#ed7d31' } },
|
||
],
|
||
label: { color: '#c0d0e0', fontSize: 12 }
|
||
}]
|
||
});
|
||
|
||
// 面板4: 多尺度预警时间线
|
||
const c4 = echarts.init(document.getElementById('chart-timeline'));
|
||
fetchJSON(BASE + '/api/predict').then(data => {
|
||
const levels = ['低风险', '中风险', '高风险', '严重风险'];
|
||
const colors = ['#00e676', '#ffeb3b', '#ff9800', '#f44336'];
|
||
const items = [
|
||
{ name: '短期 (1-3天)', value: data.predictions.short.level },
|
||
{ name: '中期 (7天)', value: data.predictions.medium.level },
|
||
{ name: '长期 (30天)', value: data.predictions.long.level },
|
||
];
|
||
c4.setOption({
|
||
tooltip: {},
|
||
grid: { left: 120, right: 40, top: 20, bottom: 20 },
|
||
xAxis: { type: 'value', max: 3, min: 0, axisLabel: { color: '#6b8aaa' } },
|
||
yAxis: { type: 'category', data: items.map(i => i.name),
|
||
axisLabel: { color: '#c0d0e0', fontSize: 13 } },
|
||
series: [{
|
||
type: 'bar', data: items.map((i, idx) => ({
|
||
value: i.value, itemStyle: { color: colors[i.value] }
|
||
})),
|
||
barWidth: 30,
|
||
label: { show: true, formatter: p => levels[p.value],
|
||
position: 'right', color: '#c0d0e0' }
|
||
}]
|
||
});
|
||
});
|
||
|
||
// 面板5: 暴露反应曲线
|
||
const c5 = echarts.init(document.getElementById('chart-exposure'));
|
||
// 模拟暴露-反应数据
|
||
const temps = Array.from({length: 26}, (_, i) => 15 + i); // 15-40°C
|
||
const rr = temps.map(t => {
|
||
if (t < 24) return 1.0;
|
||
if (t > 36) return 1.5 + (t-36) * 0.1;
|
||
return 1.0 + (t-24) * 0.04;
|
||
});
|
||
c5.setOption({
|
||
tooltip: { trigger: 'axis' },
|
||
grid: { left: 55, right: 30, top: 20, bottom: 30 },
|
||
xAxis: { type: 'category', data: temps.map(t => t+'°C'), axisLabel: { color: '#6b8aaa', fontSize: 10 } },
|
||
yAxis: { type: 'value', name: 'RR', axisLabel: { color: '#6b8aaa' } },
|
||
series: [{
|
||
type: 'line', data: rr, smooth: true,
|
||
lineStyle: { color: '#f44336', width: 2 },
|
||
areaStyle: { color: 'rgba(244,67,54,0.15)' },
|
||
markLine: { silent: true, data: [{ yAxis: 1.0, lineStyle: { color: '#666', type: 'dashed' } }] }
|
||
}]
|
||
});
|
||
|
||
// 面板6: 历史回顾
|
||
const c6 = echarts.init(document.getElementById('chart-history'));
|
||
fetchJSON(BASE + '/api/stats').then(data => {
|
||
c6.setOption({
|
||
tooltip: { trigger: 'axis' },
|
||
legend: { textStyle: { color: '#6b8aaa' }, top: 0 },
|
||
grid: { left: 55, right: 55, top: 30, bottom: 30 },
|
||
xAxis: { type: 'category', data: data.annual.years, axisLabel: { color: '#6b8aaa' } },
|
||
yAxis: [
|
||
{ type: 'value', name: '°C', axisLabel: { color: '#6b8aaa' } },
|
||
{ type: 'value', name: '天', axisLabel: { color: '#6b8aaa' } }
|
||
],
|
||
series: [
|
||
{ name: '年均温', type: 'line', data: data.annual.avg_temp,
|
||
lineStyle: { color: '#ff9800' }, itemStyle: { color: '#ff9800' } },
|
||
{ name: '最高温', type: 'line', data: data.annual.max_temp,
|
||
lineStyle: { color: '#f44336' }, itemStyle: { color: '#f44336' } },
|
||
{ name: '热浪天数', type: 'bar', yAxisIndex: 1, data: data.annual.heatwave_days,
|
||
itemStyle: { color: 'rgba(244,67,54,0.5)' } }
|
||
]
|
||
});
|
||
});
|
||
}
|
||
|
||
window.addEventListener('resize', () => {
|
||
document.querySelectorAll('.chart').forEach(el => {
|
||
const instance = echarts.getInstanceByDom(el);
|
||
if (instance) instance.resize();
|
||
});
|
||
});
|
||
|
||
document.addEventListener('DOMContentLoaded', initCharts);
|
||
// 每30分钟自动刷新
|
||
setInterval(() => {
|
||
document.querySelectorAll('.chart').forEach(el => {
|
||
const instance = echarts.getInstanceByDom(el);
|
||
if (instance) instance.dispose();
|
||
});
|
||
initCharts();
|
||
}, 30 * 60 * 1000);
|
||
</script>
|
||
</body>
|
||
</html>
|
||
```
|
||
|
||
- [ ] **Step 2: 启动大屏并验证**
|
||
|
||
Run:
|
||
```bash
|
||
cd "D:/Code/doing_exercises/programs/银发群体高温多时间尺度预警和服务优化可视化研究"
|
||
.venv/Scripts/python.exe -m src.web.app
|
||
```
|
||
打开浏览器访问 `http://localhost:5005`,验证 6 个面板正常显示。
|
||
|
||
---
|
||
|
||
### Task 12: LaTeX 论文框架
|
||
|
||
**Files:**
|
||
- Create: `thesis/main.tex`
|
||
- Create: `thesis/chapters/abstract.tex`
|
||
- Create: `thesis/chapters/ch1-intro.tex`
|
||
- Create: `thesis/chapters/ch2-theory.tex`
|
||
- Create: `thesis/chapters/ch3-data.tex`
|
||
- Create: `thesis/chapters/ch4-model.tex`
|
||
- Create: `thesis/chapters/ch5-system.tex`
|
||
- Create: `thesis/chapters/ch6-results.tex`
|
||
- Create: `thesis/chapters/ch7-conclusion.tex`
|
||
- Create: `thesis/refs.bib`
|
||
- Create: `thesis/Makefile`
|
||
|
||
- [ ] **Step 1: 创建主文件 `thesis/main.tex`**
|
||
|
||
```latex
|
||
%!TEX program = xelatex
|
||
\documentclass[12pt,a4paper,openany]{ctexbook}
|
||
|
||
% --- 页面设置 ---
|
||
\usepackage[top=2.5cm,bottom=2.5cm,left=3cm,right=2.5cm]{geometry}
|
||
\usepackage{setspace}
|
||
\onehalfspacing
|
||
|
||
% --- 字体 ---
|
||
\setCJKmainfont{Songti SC}[AutoFakeBold=2]
|
||
\setCJKsansfont{Heiti SC}
|
||
\setCJKmonofont{STFangsong}
|
||
|
||
% --- 图表 ---
|
||
\usepackage{graphicx}
|
||
\usepackage{float}
|
||
\usepackage{subcaption}
|
||
\usepackage{booktabs}
|
||
\usepackage{longtable}
|
||
|
||
% --- 参考文献 ---
|
||
\usepackage[backend=biber,style=gb7714-2015]{biblatex}
|
||
\addbibresource{refs.bib}
|
||
|
||
% --- 超链接 ---
|
||
\usepackage[hidelinks]{hyperref}
|
||
|
||
% --- 数学 ---
|
||
\usepackage{amsmath,amssymb}
|
||
|
||
% --- 代码 ---
|
||
\usepackage{listings}
|
||
\lstset{
|
||
basicstyle=\small\ttfamily,
|
||
breaklines=true,
|
||
frame=single,
|
||
numbers=left,
|
||
numberstyle=\tiny,
|
||
}
|
||
|
||
% --- 其他 ---
|
||
\usepackage{tikz}
|
||
\usepackage{caption}
|
||
\captionsetup{font=small,labelfont=bf}
|
||
|
||
\title{银发群体高温多时间尺度预警和服务优化可视化研究}
|
||
\author{刘航宇}
|
||
\date{\today}
|
||
|
||
\begin{document}
|
||
|
||
\maketitle
|
||
|
||
% 封面页(按学校模板补充)
|
||
\begin{center}
|
||
\vspace*{3cm}
|
||
{\large\bfseries 本科毕业论文}\\[1cm]
|
||
{\LARGE\bfseries 银发群体高温多时间尺度预警\\[0.3cm]和服务优化可视化研究}\\[2cm]
|
||
{\large 学\hspace{2em}院:计算机科学与技术学院}\\[0.5cm]
|
||
{\large 专\hspace{2em}业:计算机科学与技术}\\[0.5cm]
|
||
{\large 姓\hspace{2em}名:刘航宇}\\[0.5cm]
|
||
{\large 学\hspace{2em}号:}\\[0.5cm]
|
||
{\large 指导教师:}\\[2cm]
|
||
{\large \today}
|
||
\end{center}
|
||
\thispagestyle{empty}
|
||
\newpage
|
||
|
||
% 摘要
|
||
\input{chapters/abstract}
|
||
|
||
% 目录
|
||
\tableofcontents
|
||
\newpage
|
||
|
||
% 正文
|
||
\input{chapters/ch1-intro}
|
||
\input{chapters/ch2-theory}
|
||
\input{chapters/ch3-data}
|
||
\input{chapters/ch4-model}
|
||
\input{chapters/ch5-system}
|
||
\input{chapters/ch6-results}
|
||
\input{chapters/ch7-conclusion}
|
||
|
||
% 参考文献
|
||
\printbibliography[title=参考文献]
|
||
|
||
% 致谢
|
||
\chapter*{致谢}
|
||
\addcontentsline{toc}{chapter}{致谢}
|
||
|
||
衷心感谢导师在选题、研究方法、论文撰写等方面给予的悉心指导和宝贵建议。
|
||
|
||
感谢河南理工大学计算机科学与技术学院四年来提供的学习平台和科研环境。
|
||
|
||
感谢家人和朋友在学业期间的理解、支持与鼓励。
|
||
|
||
% 附录
|
||
\appendix
|
||
\chapter{核心代码清单}
|
||
\section{LSTM-Attention 模型定义}
|
||
% 引用关键代码片段
|
||
|
||
\chapter{系统运行说明}
|
||
\section{环境配置}
|
||
\section{运行步骤}
|
||
|
||
\end{document}
|
||
```
|
||
|
||
- [ ] **Step 2: 创建 `thesis/refs.bib`(部分关键文献)**
|
||
|
||
```bibtex
|
||
@article{chen2018heat,
|
||
author = {Chen, R. and Yin, P. and Wang, L. and et al.},
|
||
title = {Association between ambient temperature and mortality risk and burden in China},
|
||
journal = {The Lancet Planetary Health},
|
||
year = {2018},
|
||
volume = {2},
|
||
number = {8},
|
||
pages = {e344--e352},
|
||
}
|
||
|
||
@article{ma2015heat,
|
||
author = {Ma, W. and Chen, R. and Kan, H.},
|
||
title = {Temperature-related mortality in 17 large Chinese cities},
|
||
journal = {Environmental Health Perspectives},
|
||
year = {2015},
|
||
volume = {123},
|
||
number = {10},
|
||
pages = {989--994},
|
||
}
|
||
|
||
@article{gasparrini2015mortality,
|
||
author = {Gasparrini, A. and Guo, Y. and Hashizume, M. and et al.},
|
||
title = {Mortality risk attributable to high and low ambient temperature},
|
||
journal = {The Lancet},
|
||
year = {2015},
|
||
volume = {386},
|
||
pages = {369--375},
|
||
}
|
||
|
||
@article{hochreiter1997lstm,
|
||
author = {Hochreiter, S. and Schmidhuber, J.},
|
||
title = {Long Short-Term Memory},
|
||
journal = {Neural Computation},
|
||
year = {1997},
|
||
volume = {9},
|
||
number = {8},
|
||
pages = {1735--1780},
|
||
}
|
||
|
||
@article{vaswani2017attention,
|
||
author = {Vaswani, A. and Shazeer, N. and Parmar, N. and et al.},
|
||
title = {Attention Is All You Need},
|
||
journal = {Advances in Neural Information Processing Systems},
|
||
year = {2017},
|
||
volume = {30},
|
||
}
|
||
|
||
@article{chen2016xgboost,
|
||
author = {Chen, T. and Guestrin, C.},
|
||
title = {XGBoost: A Scalable Tree Boosting System},
|
||
journal = {Proceedings of the 22nd ACM SIGKDD},
|
||
year = {2016},
|
||
pages = {785--794},
|
||
}
|
||
|
||
@misc{era5land,
|
||
author = {Copernicus Climate Change Service},
|
||
title = {ERA5-Land hourly data from 1950 to present},
|
||
year = {2024},
|
||
howpublished = {\url{https://cds.climate.copernicus.eu/}},
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 3: 创建编译脚本 `thesis/Makefile`**
|
||
|
||
```makefile
|
||
MAIN = main
|
||
LATEX = xelatex
|
||
BIBER = biber
|
||
|
||
all: $(MAIN).pdf
|
||
|
||
$(MAIN).pdf: $(MAIN).tex chapters/*.tex refs.bib
|
||
$(LATEX) $(MAIN).tex
|
||
$(BIBER) $(MAIN)
|
||
$(LATEX) $(MAIN).tex
|
||
$(LATEX) $(MAIN).tex
|
||
|
||
clean:
|
||
rm -f *.aux *.log *.out *.toc *.bbl *.blg *.synctex.gz *.fdb_latexmk *.fls *.run.xml *.bcf
|
||
|
||
distclean: clean
|
||
rm -f $(MAIN).pdf
|
||
|
||
.PHONY: all clean distclean
|
||
```
|
||
|
||
---
|
||
|
||
### Task 13: README 与最终集成
|
||
|
||
**Files:**
|
||
- Create: `README.md`
|
||
|
||
- [ ] **Step 1: 创建 README.md**
|
||
|
||
```markdown
|
||
# 银发群体高温多时间尺度预警和服务优化可视化研究
|
||
|
||
本科毕业设计 — 河南理工大学计算机科学与技术学院
|
||
|
||
## 功能
|
||
|
||
- 基于 ERA5 再分析数据的焦作、郑州两市气象数据获取与预处理
|
||
- LSTM-Attention 多时间尺度(1-3天/7天/30天)高温健康风险预测
|
||
- XGBoost 基线模型对比
|
||
- ECharts 可视化大屏(6 面板,深色科技蓝主题)
|
||
- LaTeX 学位论文
|
||
|
||
## 环境配置
|
||
|
||
```bash
|
||
# 创建虚拟环境
|
||
uv venv --python "D:\settings\Language\Python\Python 3.13.13\python.exe"
|
||
# 安装依赖
|
||
uv pip install -e .
|
||
```
|
||
|
||
## 运行
|
||
|
||
### 1. 数据获取
|
||
|
||
```bash
|
||
# 注册 CDS 账号后配置 ~/.cdsapirc
|
||
python -m src.data.download_era5
|
||
python -m src.data.collect_mortality
|
||
python -m src.data.preprocess
|
||
```
|
||
|
||
### 2. 模型训练
|
||
|
||
```bash
|
||
python -m src.models.train
|
||
python -m src.models.evaluate
|
||
```
|
||
|
||
### 3. 可视化大屏
|
||
|
||
```bash
|
||
python -m src.web.app
|
||
# 浏览器打开 http://localhost:5005
|
||
```
|
||
|
||
### 4. 论文编译
|
||
|
||
```bash
|
||
cd thesis
|
||
make
|
||
```
|
||
|
||
## 项目结构
|
||
|
||
```
|
||
├── data/ # 数据(raw/processed/external)
|
||
├── src/
|
||
│ ├── data/ # 数据获取与预处理
|
||
│ ├── models/ # LSTM-Attention + XGBoost
|
||
│ ├── web/ # Flask + ECharts 大屏
|
||
│ └── utils/ # 全局配置
|
||
├── notebooks/ # EDA
|
||
├── outputs/ # 模型/图表/日志
|
||
├── thesis/ # LaTeX 论文
|
||
└── docs/ # 设计文档
|
||
```
|
||
```
|
||
|
||
---
|
||
|
||
## 实施顺序
|
||
|
||
```
|
||
Task 1 → Task 2 → Task 3 → Task 4 → Task 5
|
||
│ │
|
||
│ ┌────────────────────────────────────┘
|
||
▼ ▼ ▼ ▼
|
||
Task 6 → Task 7 → Task 8 → Task 9
|
||
│
|
||
┌─────────┴─────────┐
|
||
▼ ▼
|
||
Task 10 Task 12
|
||
│ │
|
||
▼ ▼
|
||
Task 11 Task 12+
|
||
│ │
|
||
└─────────┬──────────┘
|
||
▼
|
||
Task 13
|
||
```
|
||
|
||
- 数据管线 (1-5) 必须先完成
|
||
- 模型 (6-9) 依赖数据管线
|
||
- Web (10-11) 依赖模型训练完成
|
||
- 论文 (12) 可与模型和 Web 并行进行
|