Add lecture materials for Model-Free, Control, and Value topics

- Added Lecture4 - ModelFree.pdf (3013 KB)
- Added Lecture5 - Control.pdf (2575 KB)
- Added Lecture6 - Value.pdf (3320 KB)
2026-04-28 20:28:00 +08:00
commit ceddbdd559
52 changed files with 117740 additions and 0 deletions
@@ -0,0 +1,17 @@
\relax
\providecommand\hyper@newdestlabel[2]{}
\providecommand*\HyPL@Entry[1]{}
\HyPL@Entry{0<</S/D>>}
\@writefile{toc}{\contentsline {section}{\numberline {1}Bagging vs Boosting}{1}{section.1}\protected@file@percent }
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces Controlled supervised model comparison (identical pipeline and split).}}{1}{table.caption.1}\protected@file@percent }
\providecommand*\caption@xref[2]{\@setref\relax\@undefined{#1}}
\newlabel{tab: supervised-comparison}{{1}{1}{Controlled supervised model comparison (identical pipeline and split)}{table.caption.1}{}}
\@writefile{toc}{\contentsline {section}{\numberline {2}Hyperparameter Optimisation}{1}{section.2}\protected@file@percent }
\@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces Optuna parameter importance. Larger bars indicate higher influence on validation macro-F1.}}{2}{figure.caption.2}\protected@file@percent }
\newlabel{fig: param-importance}{{1}{2}{Optuna parameter importance. Larger bars indicate higher influence on validation macro-F1}{figure.caption.2}{}}
\@writefile{toc}{\contentsline {section}{\numberline {3}K-Means vs GMM}{2}{section.3}\protected@file@percent }
\@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Full clustering comparison across k=2 to k=8.}}{3}{table.caption.3}\protected@file@percent }
\newlabel{tab: clustering}{{2}{3}{Full clustering comparison across k=2 to k=8}{table.caption.3}{}}
\@writefile{toc}{\contentsline {section}{\numberline {4}Personalised Improvement Reflection}{3}{section.4}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {5}AI Use Declaration}{4}{section.5}\protected@file@percent }
\gdef \@abspage@last{4}
@@ -0,0 +1,660 @@
This is XeTeX, Version 3.141592653-2.6-0.999997 (TeX Live 2025) (preloaded format=xelatex 2025.6.5) 25 APR 2026 01:38
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
**theory_and_reflection_1234560.tex
(./theory_and_reflection_1234560.tex
LaTeX2e <2024-11-01> patch level 2
L3 programming layer <2025-01-18>
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/article.cls
Document Class: article 2024/06/29 v1.4n Standard LaTeX document class
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/size11.clo
File: size11.clo 2024/06/29 v1.4n Standard LaTeX file (size option)
)
\c@part=\count192
\c@section=\count193
\c@subsection=\count194
\c@subsubsection=\count195
\c@paragraph=\count196
\c@subparagraph=\count197
\c@figure=\count198
\c@table=\count199
\abovecaptionskip=\skip49
\belowcaptionskip=\skip50
\bibindent=\dimen141
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/geometry/geometry.sty
Package: geometry 2020/01/02 v5.9 Page Geometry
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics/keyval.sty
Package: keyval 2022/05/29 v1.15 key=value parser (DPC)
\KV@toks@=\toks17
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/iftex/ifvtex.sty
Package: ifvtex 2019/10/25 v1.7 ifvtex legacy package. Use iftex instead.
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/iftex/iftex.sty
Package: iftex 2024/12/12 v1.0g TeX engine tests
))
\Gm@cnth=\count266
\Gm@cntv=\count267
\c@Gm@tempcnt=\count268
\Gm@bindingoffset=\dimen142
\Gm@wd@mp=\dimen143
\Gm@odd@mp=\dimen144
\Gm@even@mp=\dimen145
\Gm@layoutwidth=\dimen146
\Gm@layoutheight=\dimen147
\Gm@layouthoffset=\dimen148
\Gm@layoutvoffset=\dimen149
\Gm@dimlist=\toks18
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/fontspec/fontspec.sty
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/l3packages/xparse/xpars
e.sty
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/l3kernel/expl3.sty
Package: expl3 2025-01-18 L3 programming layer (loader)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/l3backend/l3backend-xet
ex.def
File: l3backend-xetex.def 2024-05-08 L3 backend support: XeTeX
\g__graphics_track_int=\count269
\l__pdf_internal_box=\box52
\g__pdf_backend_annotation_int=\count270
\g__pdf_backend_link_int=\count271
))
Package: xparse 2024-08-16 L3 Experimental document command parser
)
Package: fontspec 2024/05/11 v2.9e Font selection for XeLaTeX and LuaLaTeX
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/fontspec/fontspec-xetex
.sty
Package: fontspec-xetex 2024/05/11 v2.9e Font selection for XeLaTeX and LuaLaTe
X
\l__fontspec_script_int=\count272
\l__fontspec_language_int=\count273
\l__fontspec_strnum_int=\count274
\l__fontspec_tmp_int=\count275
\l__fontspec_tmpa_int=\count276
\l__fontspec_tmpb_int=\count277
\l__fontspec_tmpc_int=\count278
\l__fontspec_em_int=\count279
\l__fontspec_emdef_int=\count280
\l__fontspec_strong_int=\count281
\l__fontspec_strongdef_int=\count282
\l__fontspec_tmpa_dim=\dimen150
\l__fontspec_tmpb_dim=\dimen151
\l__fontspec_tmpc_dim=\dimen152
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/fontenc.sty
Package: fontenc 2021/04/29 v2.0v Standard LaTeX package
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/fontspec/fontspec.cfg))
) (d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics/graphicx.sty
Package: graphicx 2021/09/16 v1.2d Enhanced LaTeX Graphics (DPC,SPQR)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics/graphics.sty
Package: graphics 2024/08/06 v1.4g Standard LaTeX Graphics (DPC,SPQR)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics/trig.sty
Package: trig 2023/12/02 v1.11 sin cos tan (DPC)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics-cfg/graphics.c
fg
File: graphics.cfg 2016/06/04 v1.11 sample graphics configuration
)
Package graphics Info: Driver file: xetex.def on input line 106.
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics-def/xetex.def
File: xetex.def 2022/09/22 v5.0n Graphics/color driver for xetex
))
\Gin@req@height=\dimen153
\Gin@req@width=\dimen154
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/booktabs/booktabs.sty
Package: booktabs 2020/01/12 v1.61803398 Publication quality tables
\heavyrulewidth=\dimen155
\lightrulewidth=\dimen156
\cmidrulewidth=\dimen157
\belowrulesep=\dimen158
\belowbottomsep=\dimen159
\aboverulesep=\dimen160
\abovetopsep=\dimen161
\cmidrulesep=\dimen162
\cmidrulekern=\dimen163
\defaultaddspace=\dimen164
\@cmidla=\count283
\@cmidlb=\count284
\@aboverulesep=\dimen165
\@belowrulesep=\dimen166
\@thisruleclass=\count285
\@lastruleclass=\count286
\@thisrulewidth=\dimen167
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/tools/array.sty
Package: array 2024/10/17 v2.6g Tabular extension package (FMi)
\col@sep=\dimen168
\ar@mcellbox=\box53
\extrarowheight=\dimen169
\NC@list=\toks19
\extratabsurround=\skip51
\backup@length=\skip52
\ar@cellbox=\box54
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/tools/tabularx.sty
Package: tabularx 2023/12/11 v2.12a `tabularx' package (DPC)
\TX@col@width=\dimen170
\TX@old@table=\dimen171
\TX@old@col=\dimen172
\TX@target=\dimen173
\TX@delta=\dimen174
\TX@cols=\count287
\TX@ftn=\toks20
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/float/float.sty
Package: float 2001/11/08 v1.3d Float enhancements (AL)
\c@float@type=\count288
\float@exts=\toks21
\float@box=\box55
\@float@everytoks=\toks22
\@floatcapt=\box56
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/hyperref.sty
Package: hyperref 2024-11-05 v7.01l Hypertext links for LaTeX
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/kvsetkeys/kvsetkeys.sty
Package: kvsetkeys 2022-10-05 v1.19 Key value parser (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/kvdefinekeys/kvdefine
keys.sty
Package: kvdefinekeys 2019-12-19 v1.6 Define keys (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/pdfescape/pdfescape.s
ty
Package: pdfescape 2019/12/09 v1.15 Implements pdfTeX's escape features (HO)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/ltxcmds/ltxcmds.sty
Package: ltxcmds 2023-12-04 v1.26 LaTeX kernel commands for general use (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/pdftexcmds/pdftexcmds
.sty
Package: pdftexcmds 2020-06-27 v0.33 Utility functions of pdfTeX for LuaTeX (HO
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/infwarerr/infwarerr.s
ty
Package: infwarerr 2019/12/03 v1.5 Providing info/warning/error messages (HO)
)
Package pdftexcmds Info: \pdf@primitive is available.
Package pdftexcmds Info: \pdf@ifprimitive is available.
Package pdftexcmds Info: \pdfdraftmode not found.
))
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hycolor/hycolor.sty
Package: hycolor 2020-01-27 v1.10 Color options for hyperref/bookmark (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/nameref.sty
Package: nameref 2023-11-26 v2.56 Cross-referencing by name of section
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/refcount/refcount.sty
Package: refcount 2019/12/15 v3.6 Data extraction from label references (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/gettitlestring/gettit
lestring.sty
Package: gettitlestring 2019/12/15 v1.6 Cleanup title references (HO)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/kvoptions/kvoptions.sty
Package: kvoptions 2022-06-15 v3.15 Key value format for package options (HO)
))
\c@section@level=\count289
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/etoolbox/etoolbox.sty
Package: etoolbox 2025/02/11 v2.5l e-TeX tools for LaTeX (JAW)
\etb@tempcnta=\count290
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/stringenc/stringenc.s
ty
Package: stringenc 2019/11/29 v1.12 Convert strings between diff. encodings (HO
)
)
\@linkdim=\dimen175
\Hy@linkcounter=\count291
\Hy@pagecounter=\count292
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/pd1enc.def
File: pd1enc.def 2024-11-05 v7.01l Hyperref: PDFDocEncoding definition (HO)
) (d:/settings/Language/texlive/2025/texmf-dist/tex/generic/intcalc/intcalc.sty
Package: intcalc 2019/12/15 v1.3 Expandable calculations with integers (HO)
)
\Hy@SavedSpaceFactor=\count293
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/puenc.def
File: puenc.def 2024-11-05 v7.01l Hyperref: PDF Unicode definition (HO)
)
Package hyperref Info: Hyper figures OFF on input line 4157.
Package hyperref Info: Link nesting OFF on input line 4162.
Package hyperref Info: Hyper index ON on input line 4165.
Package hyperref Info: Plain pages OFF on input line 4172.
Package hyperref Info: Backreferencing OFF on input line 4177.
Package hyperref Info: Implicit mode ON; LaTeX internals redefined.
Package hyperref Info: Bookmarks ON on input line 4424.
\c@Hy@tempcnt=\count294
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/url/url.sty
\Urlmuskip=\muskip17
Package: url 2013/09/16 ver 3.4 Verb mode for urls, etc.
)
LaTeX Info: Redefining \url on input line 4763.
\XeTeXLinkMargin=\dimen176
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/bitset/bitset.sty
Package: bitset 2019/12/09 v1.3 Handle bit-vector datatype (HO)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/bigintcalc/bigintcalc
.sty
Package: bigintcalc 2019/12/15 v1.5 Expandable calculations on big integers (HO
)
))
\Fld@menulength=\count295
\Field@Width=\dimen177
\Fld@charsize=\dimen178
Package hyperref Info: Hyper figures OFF on input line 6042.
Package hyperref Info: Link nesting OFF on input line 6047.
Package hyperref Info: Hyper index ON on input line 6050.
Package hyperref Info: backreferencing OFF on input line 6057.
Package hyperref Info: Link coloring OFF on input line 6062.
Package hyperref Info: Link coloring with OCG OFF on input line 6067.
Package hyperref Info: PDF/A mode OFF on input line 6072.
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/atbegshi-ltx.sty
Package: atbegshi-ltx 2021/01/10 v1.0c Emulation of the original atbegshi
package with kernel methods
)
\Hy@abspage=\count296
\c@Item=\count297
\c@Hfootnote=\count298
)
Package hyperref Info: Driver (autodetected): hxetex.
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/hxetex.def
File: hxetex.def 2024-11-05 v7.01l Hyperref driver for XeTeX
\pdfm@box=\box57
\c@Hy@AnnotLevel=\count299
\HyField@AnnotCount=\count300
\Fld@listcount=\count301
\c@bookmark@seq@number=\count302
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/rerunfilecheck/rerunfil
echeck.sty
Package: rerunfilecheck 2022-07-10 v1.10 Rerun checks for auxiliary files (HO)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/atveryend-ltx.sty
Package: atveryend-ltx 2020/08/19 v1.0a Emulation of the original atveryend pac
kage
with kernel methods
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/uniquecounter/uniquec
ounter.sty
Package: uniquecounter 2019/12/15 v1.4 Provide unlimited unique counter (HO)
)
Package uniquecounter Info: New unique counter `rerunfilecheck' on input line 2
85.
)
\Hy@SectionHShift=\skip53
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/caption/caption.sty
Package: caption 2023/08/05 v3.6o Customizing captions (AR)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/caption/caption3.sty
Package: caption3 2023/07/31 v2.4d caption3 kernel (AR)
\caption@tempdima=\dimen179
\captionmargin=\dimen180
\caption@leftmargin=\dimen181
\caption@rightmargin=\dimen182
\caption@width=\dimen183
\caption@indent=\dimen184
\caption@parindent=\dimen185
\caption@hangindent=\dimen186
Package caption Info: Standard document class detected.
)
\c@caption@flags=\count303
\c@continuedfloat=\count304
Package caption Info: float package is loaded.
Package caption Info: hyperref package is loaded.
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/setspace/setspace.sty
Package: setspace 2022/12/04 v6.7b set line spacing
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/parskip/parskip.sty
Package: parskip 2021-03-14 v2.0h non-zero parskip adjustments
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/enumitem/enumitem.sty
Package: enumitem 2025/02/06 v3.11 Customized lists
\labelindent=\skip54
\enit@outerparindent=\dimen187
\enit@toks=\toks23
\enit@inbox=\box58
\enit@count@id=\count305
\enitdp@description=\count306
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/titlesec/titlesec.sty
Package: titlesec 2025/01/04 v2.17 Sectioning titles
\ttl@box=\box59
\beforetitleunit=\skip55
\aftertitleunit=\skip56
\ttl@plus=\dimen188
\ttl@minus=\dimen189
\ttl@toksa=\toks24
\titlewidth=\dimen190
\titlewidthlast=\dimen191
\titlewidthfirst=\dimen192
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amsmath.sty
Package: amsmath 2024/11/05 v2.17t AMS math features
\@mathmargin=\skip57
For additional information on amsmath, use the `?' option.
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amstext.sty
Package: amstext 2021/08/26 v2.01 AMS text
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amsgen.sty
File: amsgen.sty 1999/11/30 v2.0 generic functions
\@emptytoks=\toks25
\ex@=\dimen193
))
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amsbsy.sty
Package: amsbsy 1999/11/29 v1.2d Bold Symbols
\pmbraise@=\dimen194
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amsopn.sty
Package: amsopn 2022/04/08 v2.04 operator names
)
\inf@bad=\count307
LaTeX Info: Redefining \frac on input line 233.
\uproot@=\count308
\leftroot@=\count309
LaTeX Info: Redefining \overline on input line 398.
LaTeX Info: Redefining \colon on input line 409.
\classnum@=\count310
\DOTSCASE@=\count311
LaTeX Info: Redefining \ldots on input line 495.
LaTeX Info: Redefining \dots on input line 498.
LaTeX Info: Redefining \cdots on input line 619.
\Mathstrutbox@=\box60
\strutbox@=\box61
LaTeX Info: Redefining \big on input line 721.
LaTeX Info: Redefining \Big on input line 722.
LaTeX Info: Redefining \bigg on input line 723.
LaTeX Info: Redefining \Bigg on input line 724.
\big@size=\dimen195
LaTeX Font Info: Redeclaring font encoding OML on input line 742.
LaTeX Font Info: Redeclaring font encoding OMS on input line 743.
\macc@depth=\count312
LaTeX Info: Redefining \bmod on input line 904.
LaTeX Info: Redefining \pmod on input line 909.
LaTeX Info: Redefining \smash on input line 939.
LaTeX Info: Redefining \relbar on input line 969.
LaTeX Info: Redefining \Relbar on input line 970.
\c@MaxMatrixCols=\count313
\dotsspace@=\muskip18
\c@parentequation=\count314
\dspbrk@lvl=\count315
\tag@help=\toks26
\row@=\count316
\column@=\count317
\maxfields@=\count318
\andhelp@=\toks27
\eqnshift@=\dimen196
\alignsep@=\dimen197
\tagshift@=\dimen198
\tagwidth@=\dimen199
\totwidth@=\dimen256
\lineht@=\dimen257
\@envbody=\toks28
\multlinegap=\skip58
\multlinetaggap=\skip59
\mathdisplay@stack=\toks29
LaTeX Info: Redefining \[ on input line 2953.
LaTeX Info: Redefining \] on input line 2954.
)
Package fontspec Info:
(fontspec) Font family 'TimesNewRoman(0)' created for font 'Times
(fontspec) New Roman' with options [Ligatures=TeX].
(fontspec)
(fontspec) This font family consists of the following NFSS
(fontspec) series/shapes:
(fontspec)
(fontspec) - 'normal' (m/n) with NFSS spec.: <->"Times New
(fontspec) Roman/OT:script=latn;language=dflt;mapping=tex-text;"
(fontspec) - 'small caps' (m/sc) with NFSS spec.: <->"Times New
(fontspec) Roman/OT:script=latn;language=dflt;+smcp;mapping=tex-tex
t;"
(fontspec) - 'bold' (b/n) with NFSS spec.: <->"Times New
(fontspec) Roman/B/OT:script=latn;language=dflt;mapping=tex-text;"
(fontspec) - 'bold small caps' (b/sc) with NFSS spec.: <->"Times
(fontspec) New
(fontspec) Roman/B/OT:script=latn;language=dflt;+smcp;mapping=tex-t
ext;"
(fontspec) - 'italic' (m/it) with NFSS spec.: <->"Times New
(fontspec) Roman/I/OT:script=latn;language=dflt;mapping=tex-text;"
(fontspec) - 'italic small caps' (m/scit) with NFSS spec.:
(fontspec) <->"Times New
(fontspec) Roman/I/OT:script=latn;language=dflt;+smcp;mapping=tex-t
ext;"
(fontspec) - 'bold italic' (b/it) with NFSS spec.: <->"Times New
(fontspec) Roman/BI/OT:script=latn;language=dflt;mapping=tex-text;"
(fontspec) - 'bold italic small caps' (b/scit) with NFSS spec.:
(fontspec) <->"Times New
(fontspec) Roman/BI/OT:script=latn;language=dflt;+smcp;mapping=tex-
text;"
Package fontspec Info:
(fontspec) Font family 'Arial(0)' created for font 'Arial' with
(fontspec) options [Ligatures=TeX].
(fontspec)
(fontspec) This font family consists of the following NFSS
(fontspec) series/shapes:
(fontspec)
(fontspec) - 'normal' (m/n) with NFSS spec.:
(fontspec) <->"Arial/OT:script=latn;language=dflt;mapping=tex-text;
"
(fontspec) - 'small caps' (m/sc) with NFSS spec.:
(fontspec) <->"Arial/OT:script=latn;language=dflt;+smcp;mapping=tex
-text;"
(fontspec) - 'bold' (b/n) with NFSS spec.:
(fontspec) <->"Arial/B/OT:script=latn;language=dflt;mapping=tex-tex
t;"
(fontspec) - 'bold small caps' (b/sc) with NFSS spec.:
(fontspec) <->"Arial/B/OT:script=latn;language=dflt;+smcp;mapping=t
ex-text;"
(fontspec) - 'italic' (m/it) with NFSS spec.:
(fontspec) <->"Arial/I/OT:script=latn;language=dflt;mapping=tex-tex
t;"
(fontspec) - 'italic small caps' (m/scit) with NFSS spec.:
(fontspec) <->"Arial/I/OT:script=latn;language=dflt;+smcp;mapping=t
ex-text;"
(fontspec) - 'bold italic' (b/it) with NFSS spec.:
(fontspec) <->"Arial/BI/OT:script=latn;language=dflt;mapping=tex-te
xt;"
(fontspec) - 'bold italic small caps' (b/scit) with NFSS spec.:
(fontspec) <->"Arial/BI/OT:script=latn;language=dflt;+smcp;mapping=
tex-text;"
Package fontspec Info:
(fontspec) Font family 'Consolas(0)' created for font 'Consolas'
(fontspec) with options
(fontspec) [WordSpace={1,0,0},HyphenChar=None,PunctuationSpace=Word
Space].
(fontspec)
(fontspec) This font family consists of the following NFSS
(fontspec) series/shapes:
(fontspec)
(fontspec) - 'normal' (m/n) with NFSS spec.:
(fontspec) <->"Consolas/OT:script=latn;language=dflt;"
(fontspec) - 'bold' (b/n) with NFSS spec.:
(fontspec) <->"Consolas/B/OT:script=latn;language=dflt;"
(fontspec) - 'italic' (m/it) with NFSS spec.:
(fontspec) <->"Consolas/I/OT:script=latn;language=dflt;"
(fontspec) - 'bold italic' (b/it) with NFSS spec.:
(fontspec) <->"Consolas/BI/OT:script=latn;language=dflt;"
(./theory_and_reflection_1234560.aux)
\openout1 = `theory_and_reflection_1234560.aux'.
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for TU/lmr/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
LaTeX Font Info: Checking defaults for PU/pdf/m/n on input line 29.
LaTeX Font Info: ... okay on input line 29.
*geometry* driver: auto-detecting
*geometry* detected driver: xetex
*geometry* verbose mode - [ preamble ] result:
* driver: xetex
* paper: a4paper
* layout: <same size as paper>
* layoutoffset:(h,v)=(0.0pt,0.0pt)
* modes:
* h-part:(L,W,R)=(41.25641pt, 514.99506pt, 41.25641pt)
* v-part:(T,H,B)=(42.67912pt, 759.6886pt, 42.67912pt)
* \paperwidth=597.50787pt
* \paperheight=845.04684pt
* \textwidth=514.99506pt
* \textheight=759.6886pt
* \oddsidemargin=-31.01358pt
* \evensidemargin=-31.01358pt
* \topmargin=-66.59087pt
* \headheight=12.0pt
* \headsep=25.0pt
* \topskip=11.0pt
* \footskip=30.0pt
* \marginparwidth=50.0pt
* \marginparsep=10.0pt
* \columnsep=10.0pt
* \skip\footins=10.0pt plus 4.0pt minus 2.0pt
* \hoffset=0.0pt
* \voffset=0.0pt
* \mag=1000
* \@twocolumnfalse
* \@twosidefalse
* \@mparswitchfalse
* \@reversemarginfalse
* (1in=72.27pt=25.4mm, 1cm=28.453pt)
Package fontspec Info:
(fontspec) Adjusting the maths setup (use [no-math] to avoid
(fontspec) this).
\symlegacymaths=\mathgroup4
LaTeX Font Info: Overwriting symbol font `legacymaths' in version `bold'
(Font) OT1/cmr/m/n --> OT1/cmr/bx/n on input line 29.
LaTeX Font Info: Redeclaring math accent \acute on input line 29.
LaTeX Font Info: Redeclaring math accent \grave on input line 29.
LaTeX Font Info: Redeclaring math accent \ddot on input line 29.
LaTeX Font Info: Redeclaring math accent \tilde on input line 29.
LaTeX Font Info: Redeclaring math accent \bar on input line 29.
LaTeX Font Info: Redeclaring math accent \breve on input line 29.
LaTeX Font Info: Redeclaring math accent \check on input line 29.
LaTeX Font Info: Redeclaring math accent \hat on input line 29.
LaTeX Font Info: Redeclaring math accent \dot on input line 29.
LaTeX Font Info: Redeclaring math accent \mathring on input line 29.
LaTeX Font Info: Redeclaring math symbol \Gamma on input line 29.
LaTeX Font Info: Redeclaring math symbol \Delta on input line 29.
LaTeX Font Info: Redeclaring math symbol \Theta on input line 29.
LaTeX Font Info: Redeclaring math symbol \Lambda on input line 29.
LaTeX Font Info: Redeclaring math symbol \Xi on input line 29.
LaTeX Font Info: Redeclaring math symbol \Pi on input line 29.
LaTeX Font Info: Redeclaring math symbol \Sigma on input line 29.
LaTeX Font Info: Redeclaring math symbol \Upsilon on input line 29.
LaTeX Font Info: Redeclaring math symbol \Phi on input line 29.
LaTeX Font Info: Redeclaring math symbol \Psi on input line 29.
LaTeX Font Info: Redeclaring math symbol \Omega on input line 29.
LaTeX Font Info: Redeclaring math symbol \mathdollar on input line 29.
LaTeX Font Info: Redeclaring symbol font `operators' on input line 29.
LaTeX Font Info: Encoding `OT1' has changed to `TU' for symbol font
(Font) `operators' in the math version `normal' on input line 29.
LaTeX Font Info: Overwriting symbol font `operators' in version `normal'
(Font) OT1/cmr/m/n --> TU/TimesNewRoman(0)/m/n on input line 2
9.
LaTeX Font Info: Encoding `OT1' has changed to `TU' for symbol font
(Font) `operators' in the math version `bold' on input line 29.
LaTeX Font Info: Overwriting symbol font `operators' in version `bold'
(Font) OT1/cmr/bx/n --> TU/TimesNewRoman(0)/m/n on input line
29.
LaTeX Font Info: Overwriting symbol font `operators' in version `normal'
(Font) TU/TimesNewRoman(0)/m/n --> TU/TimesNewRoman(0)/m/n on
input line 29.
LaTeX Font Info: Overwriting math alphabet `\mathit' in version `normal'
(Font) OT1/cmr/m/it --> TU/TimesNewRoman(0)/m/it on input line
29.
LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `normal'
(Font) OT1/cmr/bx/n --> TU/TimesNewRoman(0)/b/n on input line
29.
LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `normal'
(Font) OT1/cmss/m/n --> TU/Arial(0)/m/n on input line 29.
LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `normal'
(Font) OT1/cmtt/m/n --> TU/Consolas(0)/m/n on input line 29.
LaTeX Font Info: Overwriting symbol font `operators' in version `bold'
(Font) TU/TimesNewRoman(0)/m/n --> TU/TimesNewRoman(0)/b/n on
input line 29.
LaTeX Font Info: Overwriting math alphabet `\mathit' in version `bold'
(Font) OT1/cmr/bx/it --> TU/TimesNewRoman(0)/b/it on input lin
e 29.
LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `bold'
(Font) OT1/cmss/bx/n --> TU/Arial(0)/b/n on input line 29.
LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `bold'
(Font) OT1/cmtt/m/n --> TU/Consolas(0)/b/n on input line 29.
Package hyperref Info: Link coloring OFF on input line 29.
(./theory_and_reflection_1234560.out) (./theory_and_reflection_1234560.out)
\@outlinefile=\write3
\openout3 = `theory_and_reflection_1234560.out'.
Package caption Info: Begin \AtBeginDocument code.
Package caption Info: End \AtBeginDocument code.
[1
]
File: ../outputs/figures/optuna_param_importance.png Graphic file (type bmp)
<../outputs/figures/optuna_param_importance.png>
[2]
[3]
[4] (./theory_and_reflection_1234560.aux)
***********
LaTeX2e <2024-11-01> patch level 2
L3 programming layer <2025-01-18>
***********
Package rerunfilecheck Info: File `theory_and_reflection_1234560.out' has not c
hanged.
(rerunfilecheck) Checksum: 52A657089D543B64425D1F91299C4F1D;807.
)
Here is how much of TeX's memory you used:
14081 strings out of 473832
277944 string characters out of 5733159
744098 words of memory out of 5000000
36945 multiletter control sequences out of 15000+600000
564585 words of font info for 103 fonts, out of 8000000 for 9000
1348 hyphenation exceptions out of 8191
79i,11n,93p,1218b,425s stack positions out of 10000i,1000n,20000p,200000b,200000s
Output written on theory_and_reflection_1234560.pdf (4 pages).
@@ -0,0 +1,122 @@
\documentclass[11pt,a4paper]{article}
\usepackage[margin=1.45cm,top=1.5cm,bottom=1.5cm]{geometry}
\usepackage{fontspec}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{array}
\usepackage{tabularx}
\usepackage{float}
\usepackage{hyperref}
\usepackage{caption}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{enumitem}
\usepackage{titlesec}
\usepackage{amsmath}
\setmainfont{Times New Roman}
\setsansfont{Arial}
\setmonofont{Consolas}
\setstretch{1.03}
\setlist[itemize]{leftmargin=1.1em,itemsep=0.12em,topsep=0.12em}
\captionsetup{font=small,labelfont=bf}
\titlespacing*{\section}{0pt}{0.6em}{0.28em}
\titlespacing*{\subsection}{0pt}{0.28em}{0.12em}
\titleformat{\section}{\large\bfseries}{\thesection.}{0.4em}{}
\newcolumntype{Y}{>{\centering\arraybackslash}X}
\pagestyle{plain}
\begin{document}
\begin{center}
{\Large \textbf{Theory and Reflection}}\\
\vspace{0.2em}
{\normalsize DTS304TC Coursework 1 \quad Student ID: 1234560}
\vspace{0.15em}
\rule{0.6\linewidth}{0.4pt}
\end{center}
\section{Bagging vs Boosting}
Bagging (Bootstrap Aggregating) trains $B$ decision-tree base learners independently, each on a bootstrap sample drawn uniformly with replacement from the original dataset, then aggregates their predictions by majority vote for classification or averaging for regression. Because each tree sees a different random resample of the data, the trees are partially decorrelated and bagging reduces variance: even if individual trees overfit, their errors partially cancel when combined. Boosting, by contrast, trains base learners sequentially: each new learner is fitted to the residuals or misclassified instances of the current ensemble, so the ensemble primarily reduces bias. The key conceptual difference is that bagging weights all base learners equally and trains them without reference to one another, while boosting adaptively concentrates on the observations that earlier learners handled poorly.
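As a brief supporting sketch, under the simplifying assumption (introduced here for illustration) that the $B$ tree predictions $\hat f_b(x)$ are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$, the variance of the bagged average is
\[
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)=\rho\,\sigma^2+\frac{1-\rho}{B}\,\sigma^2 .
\]
The second term vanishes as $B$ grows, so the remaining variance is governed by $\rho$, which is why decorrelating the trees matters. Averaging leaves the expected prediction, and hence the bias, unchanged.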
My notebook implemented a controlled comparison of Random Forest (representing bagging) and XGBoost (representing boosting) under identical preprocessing and an identical train/validation split. This design is crucial: any difference in results can then be attributed to the learning algorithm and its settings, not to differences in data preparation or evaluation.
Table~\ref{tab: supervised-comparison} summarises the key results drawn from \texttt{outputs/tables/personalised\_improvement\_summary.csv}. It shows model name, validation macro-F1, validation accuracy, the generalisation gap (train F1 minus val F1), per-class F1 scores, and training time. Four rows are shown: Baseline LR, Random Forest, untuned XGBoost, and Optuna-tuned XGBoost.
\begin{table}[H]
\centering
\caption{Controlled supervised model comparison (identical pipeline and split).}
\label{tab: supervised-comparison}
\small
\begin{tabularx}{\textwidth}{>{\raggedright\arraybackslash}p{2.3cm}YYYYYYY}
\toprule
Model & Val F1 & Val Acc & Gap & High F1 & Low F1 & Std F1 & Time (s)\\
\midrule
Baseline LR & 0.7238 & 0.7342 & 0.0146 & 0.7665 & 0.6490 & 0.7558 & --\\
Random Forest & 0.7708 & 0.7877 & \textbf{0.2292} & 0.7875 & 0.7095 & 0.8154 & 57.91\\
XGBoost & 0.8144 & 0.8371 & 0.0155 & 0.8905 & 0.6944 & 0.8583 & 67.64\\
Tuned XGBoost & 0.8520 & 0.8700 & 0.1219 & 0.9084 & 0.7620 & 0.8854 & 142.65\\
\bottomrule
\end{tabularx}\\[3pt]
{\small Gap = train\_F1 $-$ val\_F1.}
\end{table}
The results are consistent with the theoretical expectations. Random Forest achieved a training macro-F1 of $1.0000$ (perfect fit on the training set) but a validation macro-F1 of only $0.7708$, yielding a generalisation gap of $0.2292$. This severe overfitting is also reflected in the Random Forest confusion matrix produced in the notebook. XGBoost, by contrast, had a training macro-F1 of $0.8297$ and a validation macro-F1 of $0.8144$, giving a gap of only $0.0155$. The difference in gaps is striking: RF's generalisation gap is roughly 15 times larger than XGBoost's.
The per-class F1 columns in Table~\ref{tab: supervised-comparison} reveal further structure. Before tuning, RF achieved a Low-class F1 of $0.7095$, outperforming untuned XGBoost ($0.6944$) on the minority class, but this advantage disappears once XGBoost is tuned. After Optuna tuning, XGBoost's Low-class F1 rises to $0.7620$, a gain of $+0.0676$ over its untuned state and clearly higher than RF's $0.7095$. This suggests that boosting's sequential residual correction is better suited to learning the non-linear decision boundary between risk classes on this dataset. For Random Forest, the perfect training score indicates that the limiting factor is variance rather than bias: averaging over fully grown trees did not remove enough of their variance on this mixed numerical and categorical feature space, which is consistent with Random Forest underperforming here.
\section{Hyperparameter Optimisation}
I used Optuna with the TPE (Tree-structured Parzen Estimator) sampler for 30 trials, maximising validation macro-F1. The search space covered nine XGBoost hyperparameters: n\_estimators (100--500), max\_depth (3--10), learning\_rate (0.01--0.3, log-scale), min\_child\_weight (1--10), subsample (0.5--1.0), colsample\_bytree (0.5--1.0), gamma (0--5), reg\_alpha ($10^{-4}$--10, log-scale), and reg\_lambda ($10^{-4}$--10, log-scale). The mixture of discrete and continuous parameters with multiple interactions makes a full grid search computationally prohibitive; TPE avoids exhaustive enumeration by modelling the densities of good and bad trial configurations and directing subsequent trials toward promising regions of the parameter space.
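The TPE idea can be sketched as follows, using the standard formulation rather than Optuna's exact internals, which may differ in detail. Writing $x$ for a hyperparameter configuration and $y$ for its validation macro-F1, and choosing a threshold $y^{*}$ as a quantile of the scores observed so far (the symbols $y^{*}$, $\ell$ and $g$ are notation introduced here), TPE fits two densities and ranks candidates by their ratio:
\[
\ell(x)=p(x\mid y\ge y^{*}),\qquad g(x)=p(x\mid y<y^{*}),\qquad x_{\text{next}}=\arg\max_{x}\frac{\ell(x)}{g(x)} .
\]
In practice the maximisation is typically taken over a finite set of candidates sampled from $\ell$, so each new trial costs far less than evaluating a grid.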
Trial 22 produced the best validation macro-F1 of $0.8520$, a gain of $+0.0376$ over the untuned XGBoost baseline of $0.8144$. The best configuration found included \texttt{n\_estimators}$=276$, \texttt{max\_depth}$=9$, \texttt{learning\_rate}$\approx0.192$, \texttt{subsample}$\approx0.707$, \texttt{colsample\_bytree}$\approx0.799$, \texttt{reg\_lambda}$\approx5.0$, and \texttt{gamma}$\approx2.5$. These values are plausible: deep trees and a few hundred estimators, together with a learning rate in the upper part of the search range, allow the model to fit complex interactions, while subsample and colsample ratios of around $0.7$--$0.8$, combined with non-trivial \texttt{reg\_lambda} and \texttt{gamma}, provide regularisation. Figure~\ref{fig: param-importance} shows the Optuna parameter-importance plot, which indicates that structural parameters and the learning rate dominated the optimisation.
\begin{figure}[H]
\centering
\fbox{\includegraphics[width=0.58\textwidth]{../outputs/figures/optuna_param_importance.png}}
\caption{Optuna parameter importance. Larger bars indicate higher influence on validation macro-F1.}
\label{fig: param-importance}
\end{figure}
The per-class F1 changes in Table~\ref{tab: supervised-comparison} deserve particular attention, because macro-F1 weights all three classes equally. The improvement in the minority \texttt{Low} class, from $0.6944$ to $0.7620$ ($+0.0676$), is the largest, while the \texttt{High} class F1 increased from $0.8905$ to $0.9084$ ($+0.0179$) and the \texttt{Standard} class from $0.8583$ to $0.8854$ ($+0.0271$). The tuned model therefore did not sacrifice performance on any single class to raise the overall score, which is exactly the behaviour that macro-F1 rewards and is consistent with TPE optimising the class-balanced objective rather than favouring the majority class.
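
Because macro-F1 is the unweighted mean of the per-class F1 scores, the headline figures can be recovered from the per-class values quoted above. With $F_{1}^{(c)}$ denoting the F1 score of class $c$ (notation introduced here):
\[
\mathrm{F1}_{\mathrm{macro}} = \frac{1}{3}\sum_{c} F_{1}^{(c)}
\]
\[
\mathrm{untuned:}\quad \frac{0.6944 + 0.8905 + 0.8583}{3} = \frac{2.4432}{3} \approx 0.8144
\]
\[
\mathrm{tuned:}\quad \frac{0.7620 + 0.9084 + 0.8854}{3} = \frac{2.5558}{3} \approx 0.8519
\]
The untuned value matches the reported baseline, and the tuned value agrees with the reported $0.8520$ up to rounding of the per-class scores.
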
\section{K-Means vs GMM}
K-Means assigns each sample $x_i$ to the cluster $c_i\in\{1,\dots,k\}$ whose centroid is nearest, that is, the cluster that minimises the squared Euclidean distance $\|x_i-\mu_{c_i}\|^2$. This is a \textbf{hard assignment}: each sample belongs to exactly one cluster, with no notion of uncertainty or partial membership. GMM (Gaussian Mixture Model) takes a fundamentally different approach by modelling the data as a mixture of $k$ multivariate Gaussian distributions: $p(x)=\sum_{j=1}^{k}\pi_j\,\mathcal{N}(x\mid\mu_j,\Sigma_j)$, where $\pi_j$ are the mixing proportions. Each sample receives a posterior probability $p(c_j\mid x_i)$ for every component, enabling \textbf{soft assignment}: a sample can belong partially to multiple clusters. For insurance risk, where applicant profiles naturally overlap across risk bands rather than forming isolated groups, soft assignment is better aligned with the domain.
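
The two assignment rules can be written side by side. These are the standard formulations; the responsibility symbol $\gamma_{ij}$ is notation introduced here.
\[
c_i = \arg\min_{c\in\{1,\dots,k\}} \|x_i-\mu_c\|^2
\]
\[
\gamma_{ij} = p(c_j\mid x_i) = \frac{\pi_j\,\mathcal{N}(x_i\mid\mu_j,\Sigma_j)}{\sum_{l=1}^{k}\pi_l\,\mathcal{N}(x_i\mid\mu_l,\Sigma_l)}, \qquad \sum_{j=1}^{k}\gamma_{ij} = 1 .
\]
K-Means keeps only the index $c_i$, whereas GMM keeps the full vector $(\gamma_{i1},\dots,\gamma_{ik})$. As a purely hypothetical illustration with $k=2$, an applicant with $\gamma_i=(0.6,\,0.4)$ would be placed in the first cluster under a hard rule, while the soft assignment retains the $0.4$ membership in the second.
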
Table~\ref{tab: clustering} reports the clustering results from \texttt{outputs/tables/clustering\_comparison.csv}, covering $k=2$ through $k=8$. The columns shown are: $k$, K-Means inertia, K-Means silhouette score, GMM BIC, GMM AIC, and GMM silhouette score. K-Means silhouette scores remain low across the entire range, peaking at only $0.2015$ at $k=8$. This indicates that even the best K-Means configuration fails to find well-separated spherical clusters in this data. GMM achieves substantially higher silhouette scores at several values of $k$: $0.4142$ at $k=2$ and $0.4015$ at $k=5$, roughly double the best K-Means value. At $k=2$, the GMM silhouette of $0.4142$ versus $0.1740$ for K-Means is particularly revealing: it suggests that the two-cluster structure in this insurance dataset is inherently probabilistic (overlapping Gaussian components) rather than discrete (centroid-defined).
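
For reference, the silhouette score is, in its standard definition, the mean over all samples of
\[
s(i) = \frac{b(i)-a(i)}{\max\{a(i),\,b(i)\}}, \qquad -1 \le s(i) \le 1,
\]
where $a(i)$ is the mean distance from sample $i$ to the other members of its own cluster and $b(i)$ is the mean distance from sample $i$ to the members of the nearest other cluster. Values near $1$ indicate compact, well-separated clusters and values near $0$ indicate overlapping ones. For GMM, the score is assumed here to be computed on hard labels obtained by assigning each sample to its most probable component; the text above does not state this explicitly.
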
\begin{table}[H]
\centering
\caption{Full clustering comparison across $k=2$ to $k=8$.}
\label{tab: clustering}
\footnotesize
\begin{tabularx}{\textwidth}{YYYYYY}
\toprule
$k$ & K-Means Inertia & K-Means Sil & GMM BIC & GMM AIC & GMM Sil\\
\midrule
2 & 1,092,962 & 0.1740 & $-$359,251 & $-$362,062 & \textbf{0.4142}\\
3 & 1,018,587 & 0.1732 & $-$1,103,445 & $-$1,107,666 & 0.2977\\
4 & 953,249 & 0.1808 & $-$1,938,815 & $-$1,944,446 & 0.3964\\
5 & 889,285 & 0.1964 & $-$1,997,256 & $-$2,004,298 & 0.4015\\
6 & 818,951 & 0.1768 & $-$2,349,766 & $-$2,358,217 & 0.2468\\
7 & 777,658 & 0.1971 & $-$2,394,381 & $-$2,404,243 & 0.3110\\
8 & 691,941 & \textbf{0.2015} & $-$2,510,221 & $-$2,521,493 & 0.1726\\
\bottomrule
\end{tabularx}
\end{table}
The GMM BIC column in Table~\ref{tab: clustering} decreases monotonically as $k$ grows, so within the range tested BIC does not single out a preferred number of components (its minimum is at $k=8$). Adding components always allows a better fit to the training data, and here the gain in likelihood outweighs BIC's complexity penalty at every step. The largest drops occur from $k=2$ to $k=4$; beyond that the decreases are smaller and irregular, suggesting diminishing returns. The K-Means inertia curve is gradual with no sharp elbow, indicating the absence of a natural cluster count, another sign that the data does not contain clearly separable spherical structures. Overall, GMM's higher silhouette scores for every $k$ from 2 to 7 (at $k=8$ K-Means is slightly higher, $0.2015$ versus $0.1726$) suggest that insurance applicants form probabilistic subtypes with soft boundaries. This is consistent with the conceptual distinction between hard and soft assignment: GMM captures the overlapping nature of risk profiles that K-Means cannot represent. Importantly, neither clustering method is intended to replace the supervised classifier: they serve different objectives, and the unsupervised analysis is purely exploratory.
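
The two information criteria in Table~\ref{tab: clustering} differ only in their complexity penalty. In the standard definitions, with $\hat{L}$ the maximised likelihood, $p$ the number of free parameters, and $n$ the number of samples (symbols introduced here):
\[
\mathrm{BIC} = p\ln n - 2\ln\hat{L}, \qquad
\mathrm{AIC} = 2p - 2\ln\hat{L}, \qquad
\mathrm{BIC} - \mathrm{AIC} = p\,(\ln n - 2).
\]
For a Gaussian mixture $p$ grows linearly with $k$, so the difference between the two columns should grow by a constant amount per added component. The tabulated values behave this way:
\[
k=2:\; 2{,}811, \qquad k=3:\; 4{,}221, \qquad k=4:\; 5{,}631,
\]
an increase of $1{,}410$ per component, and the same step holds to within rounding up to $k=8$. This is only an internal consistency check on the table; it does not by itself indicate which $k$ is preferable.
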
\section{Personalised Improvement Reflection}
My compulsory category was \textbf{Category A: Data Quality and Missingness}. Before any modelling, I conducted an EDA that identified missing values in multiple columns. Five columns had notable missing rates: \texttt{net\_monthly\_income\_gbp}, \texttt{avg\_payment\_delay\_days}, \texttt{monthly\_investment\_gbp}, \texttt{prior\_debt\_products}, and \texttt{account\_tenure} (30.6\%, 19.0\%, 21.1\%, 7.6\%, and 4.3\% respectively). Rather than treating missing values as noise and simply applying median imputation, I added five binary missing-indicator features, one for each of these columns, and appended them to the feature set alongside median imputation. This rests on the hypothesis that the \textit{pattern} of missingness itself may be informative: a missing income value might indicate financial instability or unemployment, which is a legitimate risk signal in insurance.
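
The indicator construction can be stated precisely. For sample $i$ and each of the five columns $j$, a binary feature $m_{ij}$ records whether the raw value $x_{ij}$ was absent, and the imputed value $\tilde{x}_{ij}$ fills gaps with the column median. The symbols are introduced here for illustration, and the median is assumed to be computed on the training split only, so that no validation information enters the imputer:
\[
m_{ij} = \left\{\begin{array}{ll} 1 & \mbox{if } x_{ij} \mbox{ is missing},\\ 0 & \mbox{otherwise}, \end{array}\right.
\qquad
\tilde{x}_{ij} = \left\{\begin{array}{ll} x_{ij} & \mbox{if } x_{ij} \mbox{ is observed},\\ \mathrm{median}_j & \mbox{otherwise}. \end{array}\right.
\]
The model then receives both $\tilde{x}_{ij}$ and $m_{ij}$, so it can distinguish a genuinely median-valued applicant from one whose value was imputed.
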
After adding the five missing indicators, validation macro-F1 rose from $0.8520$ (the Optuna-tuned model) to $0.8529$ (Category A XGBoost). The gain is small ($+0.0009$), and on a single validation split it may lie within run-to-run variation, although the tuned model was already strong and left little headroom. The result is consistent with the hypothesis that missingness carries behavioural signal (in financial applications, missing income data is unlikely to occur at random), but it does not confirm that hypothesis on its own. The methodological lesson is that even small improvements should be interrogated to determine whether they reflect genuine signal or overfitting.
For my optional category, I implemented \textbf{Category D: Soft Voting Ensemble} by combining Random Forest and tuned XGBoost using soft voting (averaging predicted class probabilities). The ensemble achieved a validation macro-F1 of $0.8510$, which is below both the Category A model ($0.8529$) and the tuned XGBoost alone ($0.8520$). This outcome is instructive: it shows that model diversity alone is insufficient for ensemble improvement. The two base learners had very different prediction profiles (RF overfitted dramatically while XGBoost generalised well), and combining them diluted the boosting model's advantage rather than complementing it. In practice, effective ensembles typically require base learners that are both individually strong and diverse in their error patterns. My final model selection was therefore the Category A XGBoost, chosen strictly on the basis of validation macro-F1.
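
Soft voting over two models can be written as follows. Equal weights are assumed here, matching the description of averaging predicted class probabilities; the symbols $p_c^{\mathrm{RF}}$, $p_c^{\mathrm{XGB}}$, and $\bar{p}_c$ are introduced for illustration.
\[
\bar{p}_c(x) = \frac{1}{2}\left(p_c^{\mathrm{RF}}(x) + p_c^{\mathrm{XGB}}(x)\right), \qquad
\hat{y}(x) = \arg\max_{c}\, \bar{p}_c(x).
\]
The ensemble prediction differs from XGBoost's only on samples where RF's probabilities are strong enough to change the arg max of the average. The ensemble can therefore improve on XGBoost only if, on those samples, the averaged prediction is correct more often than XGBoost's own, which the validation result suggests was not the case here.
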
A critical prerequisite for all modelling steps was data leakage control. Before any model training, I screened every available feature using single-feature decision-tree cross-validation. The feature \texttt{bureau\_risk\_index} achieved a single-feature macro-F1 of $0.9999$, an extraordinarily high score that indicates near-perfect class separation. This exceeded the leakage detection threshold (set at $0.85$), and the feature was removed before any further experimentation. This step is fundamental: without removing the leaking feature, all subsequent validation scores in Table~\ref{tab: supervised-comparison} would be artificially inflated and every model comparison would be invalid. The leakage check also illustrates a broader principle in applied machine learning: even when a feature appears to improve performance, its relationship to the target must be examined before it is accepted.
\section{AI Use Declaration}
AI tools were used only in a limited support role throughout this coursework: they assisted with environment debugging (resolving package import and GPU configuration issues) and with \LaTeX{} formatting to produce the final document. The experimental design, leakage detection decision, controlled model comparison, personalised improvement strategy, and all written interpretations of tables, figures, and metrics were derived from my own notebook results. No claims are made about hidden-test performance; the CSV file (\texttt{test\_result\_1234560.csv}) follows the required filename and column order from the assignment brief and was generated solely to meet the submission format.
\end{document}
@@ -0,0 +1,17 @@
\relax
\providecommand\hyper@newdestlabel[2]{}
\providecommand*\HyPL@Entry[1]{}
\HyPL@Entry{0<</S/D>>}
\@writefile{toc}{\contentsline {section}{\numberline {1}Bagging 与 Boosting 对比}{1}{section.1}\protected@file@percent }
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces 受控监督模型对比(相同流程和划分)。}}{1}{table.caption.1}\protected@file@percent }
\providecommand*\caption@xref[2]{\@setref\relax\@undefined{#1}}
\newlabel{tab: supervised-comparison}{{1}{1}{受控监督模型对比(相同流程和划分)。}{table.caption.1}{}}
\@writefile{toc}{\contentsline {section}{\numberline {2}超参数优化}{1}{section.2}\protected@file@percent }
\@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces Optuna 超参数重要性图。条越长表示对验证集宏-F1 的影响越大。}}{2}{figure.caption.2}\protected@file@percent }
\newlabel{fig: param-importance}{{1}{2}{Optuna 超参数重要性图。条越长表示对验证集宏-F1 的影响越大。}{figure.caption.2}{}}
\@writefile{toc}{\contentsline {section}{\numberline {3}K-Means 与 GMM 对比}{2}{section.3}\protected@file@percent }
\@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces 完整聚类对比(k=2 到 k=8)。}}{2}{table.caption.3}\protected@file@percent }
\newlabel{tab: clustering}{{2}{2}{完整聚类对比(k=2 到 k=8)。}{table.caption.3}{}}
\@writefile{toc}{\contentsline {section}{\numberline {4}个性化改进反思}{3}{section.4}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {5}AI 使用声明}{3}{section.5}\protected@file@percent }
\gdef \@abspage@last{3}
@@ -0,0 +1,687 @@
This is XeTeX, Version 3.141592653-2.6-0.999997 (TeX Live 2025) (preloaded format=xelatex 2025.6.5) 25 APR 2026 01:43
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
**theory_and_reflection_1234560_cn.tex
(./theory_and_reflection_1234560_cn.tex
LaTeX2e <2024-11-01> patch level 2
L3 programming layer <2025-01-18>
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/article.cls
Document Class: article 2024/06/29 v1.4n Standard LaTeX document class
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/size11.clo
File: size11.clo 2024/06/29 v1.4n Standard LaTeX file (size option)
)
\c@part=\count192
\c@section=\count193
\c@subsection=\count194
\c@subsubsection=\count195
\c@paragraph=\count196
\c@subparagraph=\count197
\c@figure=\count198
\c@table=\count199
\abovecaptionskip=\skip49
\belowcaptionskip=\skip50
\bibindent=\dimen141
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/geometry/geometry.sty
Package: geometry 2020/01/02 v5.9 Page Geometry
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics/keyval.sty
Package: keyval 2022/05/29 v1.15 key=value parser (DPC)
\KV@toks@=\toks17
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/iftex/ifvtex.sty
Package: ifvtex 2019/10/25 v1.7 ifvtex legacy package. Use iftex instead.
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/iftex/iftex.sty
Package: iftex 2024/12/12 v1.0g TeX engine tests
))
\Gm@cnth=\count266
\Gm@cntv=\count267
\c@Gm@tempcnt=\count268
\Gm@bindingoffset=\dimen142
\Gm@wd@mp=\dimen143
\Gm@odd@mp=\dimen144
\Gm@even@mp=\dimen145
\Gm@layoutwidth=\dimen146
\Gm@layoutheight=\dimen147
\Gm@layouthoffset=\dimen148
\Gm@layoutvoffset=\dimen149
\Gm@dimlist=\toks18
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/xelatex/xecjk/xeCJK.sty
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/l3kernel/expl3.sty
Package: expl3 2025-01-18 L3 programming layer (loader)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/l3backend/l3backend-xet
ex.def
File: l3backend-xetex.def 2024-05-08 L3 backend support: XeTeX
\g__graphics_track_int=\count269
\l__pdf_internal_box=\box52
\g__pdf_backend_annotation_int=\count270
\g__pdf_backend_link_int=\count271
))
Package: xeCJK 2022/08/05 v3.9.1 Typesetting CJK scripts with XeLaTeX
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/ctex/ctexhook.sty
Package: ctexhook 2022/07/14 v2.5.10 Document and package hooks (CTEX)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/l3packages/xtemplate/xt
emplate.sty
Package: xtemplate 2024-08-16 L3 Experimental prototype document functions
)
\l__xeCJK_tmp_int=\count272
\l__xeCJK_tmp_box=\box53
\l__xeCJK_tmp_dim=\dimen150
\l__xeCJK_tmp_skip=\skip51
\g__xeCJK_space_factor_int=\count273
\l__xeCJK_begin_int=\count274
\l__xeCJK_end_int=\count275
\c__xeCJK_CJK_class_int=\XeTeXcharclass1
\c__xeCJK_FullLeft_class_int=\XeTeXcharclass2
\c__xeCJK_FullRight_class_int=\XeTeXcharclass3
\c__xeCJK_HalfLeft_class_int=\XeTeXcharclass4
\c__xeCJK_HalfRight_class_int=\XeTeXcharclass5
\c__xeCJK_NormalSpace_class_int=\XeTeXcharclass6
\c__xeCJK_CM_class_int=\XeTeXcharclass7
\c__xeCJK_HangulJamo_class_int=\XeTeXcharclass8
\l__xeCJK_last_skip=\skip52
\c__xeCJK_none_node=\count276
\g__xeCJK_node_int=\count277
\c__xeCJK_CJK_node_dim=\dimen151
\c__xeCJK_CJK-space_node_dim=\dimen152
\c__xeCJK_default_node_dim=\dimen153
\c__xeCJK_CJK-widow_node_dim=\dimen154
\c__xeCJK_normalspace_node_dim=\dimen155
\c__xeCJK_default-space_node_skip=\skip53
\l__xeCJK_ccglue_skip=\skip54
\l__xeCJK_ecglue_skip=\skip55
\l__xeCJK_punct_kern_skip=\skip56
\l__xeCJK_indent_box=\box54
\l__xeCJK_last_penalty_int=\count278
\l__xeCJK_last_bound_dim=\dimen156
\l__xeCJK_last_kern_dim=\dimen157
\l__xeCJK_widow_penalty_int=\count279
LaTeX template Info: Declaring template type 'xeCJK/punctuation' taking 0
(template) argument(s) on line 2396.
\l__xeCJK_fixed_punct_width_dim=\dimen158
\l__xeCJK_mixed_punct_width_dim=\dimen159
\l__xeCJK_middle_punct_width_dim=\dimen160
\l__xeCJK_fixed_margin_width_dim=\dimen161
\l__xeCJK_mixed_margin_width_dim=\dimen162
\l__xeCJK_middle_margin_width_dim=\dimen163
\l__xeCJK_bound_punct_width_dim=\dimen164
\l__xeCJK_bound_margin_width_dim=\dimen165
\l__xeCJK_margin_minimum_dim=\dimen166
\l__xeCJK_kerning_total_width_dim=\dimen167
\l__xeCJK_same_align_margin_dim=\dimen168
\l__xeCJK_different_align_margin_dim=\dimen169
\l__xeCJK_kerning_margin_width_dim=\dimen170
\l__xeCJK_kerning_margin_minimum_dim=\dimen171
\l__xeCJK_bound_dim=\dimen172
\l__xeCJK_reverse_bound_dim=\dimen173
\l__xeCJK_margin_dim=\dimen174
\l__xeCJK_minimum_bound_dim=\dimen175
\l__xeCJK_kerning_margin_dim=\dimen176
\g__xeCJK_family_int=\count280
\l__xeCJK_fam_int=\count281
\g__xeCJK_fam_allocation_int=\count282
\l__xeCJK_verb_case_int=\count283
\l__xeCJK_verb_exspace_skip=\skip57
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/fontspec/fontspec.sty
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/l3packages/xparse/xpars
e.sty
Package: xparse 2024-08-16 L3 Experimental document command parser
)
Package: fontspec 2024/05/11 v2.9e Font selection for XeLaTeX and LuaLaTeX
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/fontspec/fontspec-xetex
.sty
Package: fontspec-xetex 2024/05/11 v2.9e Font selection for XeLaTeX and LuaLaTe
X
\l__fontspec_script_int=\count284
\l__fontspec_language_int=\count285
\l__fontspec_strnum_int=\count286
\l__fontspec_tmp_int=\count287
\l__fontspec_tmpa_int=\count288
\l__fontspec_tmpb_int=\count289
\l__fontspec_tmpc_int=\count290
\l__fontspec_em_int=\count291
\l__fontspec_emdef_int=\count292
\l__fontspec_strong_int=\count293
\l__fontspec_strongdef_int=\count294
\l__fontspec_tmpa_dim=\dimen177
\l__fontspec_tmpb_dim=\dimen178
\l__fontspec_tmpc_dim=\dimen179
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/fontenc.sty
Package: fontenc 2021/04/29 v2.0v Standard LaTeX package
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/fontspec/fontspec.cfg))
) (d:/settings/Language/texlive/2025/texmf-dist/tex/xelatex/xecjk/xeCJK.cfg
File: xeCJK.cfg 2022/08/05 v3.9.1 Configuration file for xeCJK package
))
Package fontspec Info:
(fontspec) Could not resolve font "SimSun/BI" (it probably doesn't
(fontspec) exist).
Package fontspec Info:
(fontspec) Could not resolve font "SimSun/B" (it probably doesn't
(fontspec) exist).
Package fontspec Info:
(fontspec) Could not resolve font "SimSun/I" (it probably doesn't
(fontspec) exist).
Package fontspec Info:
(fontspec) Font family 'SimSun(0)' created for font 'SimSun' with
(fontspec) options [Script={CJK}].
(fontspec)
(fontspec) This font family consists of the following NFSS
(fontspec) series/shapes:
(fontspec)
(fontspec) - 'normal' (m/n) with NFSS spec.:
(fontspec) <->"SimSun/OT:script=hani;language=dflt;"
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics/graphicx.sty
Package: graphicx 2021/09/16 v1.2d Enhanced LaTeX Graphics (DPC,SPQR)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics/graphics.sty
Package: graphics 2024/08/06 v1.4g Standard LaTeX Graphics (DPC,SPQR)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics/trig.sty
Package: trig 2023/12/02 v1.11 sin cos tan (DPC)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics-cfg/graphics.c
fg
File: graphics.cfg 2016/06/04 v1.11 sample graphics configuration
)
Package graphics Info: Driver file: xetex.def on input line 106.
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/graphics-def/xetex.def
File: xetex.def 2022/09/22 v5.0n Graphics/color driver for xetex
))
\Gin@req@height=\dimen180
\Gin@req@width=\dimen181
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/booktabs/booktabs.sty
Package: booktabs 2020/01/12 v1.61803398 Publication quality tables
\heavyrulewidth=\dimen182
\lightrulewidth=\dimen183
\cmidrulewidth=\dimen184
\belowrulesep=\dimen185
\belowbottomsep=\dimen186
\aboverulesep=\dimen187
\abovetopsep=\dimen188
\cmidrulesep=\dimen189
\cmidrulekern=\dimen190
\defaultaddspace=\dimen191
\@cmidla=\count295
\@cmidlb=\count296
\@aboverulesep=\dimen192
\@belowrulesep=\dimen193
\@thisruleclass=\count297
\@lastruleclass=\count298
\@thisrulewidth=\dimen194
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/tools/array.sty
Package: array 2024/10/17 v2.6g Tabular extension package (FMi)
\col@sep=\dimen195
\ar@mcellbox=\box55
\extrarowheight=\dimen196
\NC@list=\toks19
\extratabsurround=\skip58
\backup@length=\skip59
\ar@cellbox=\box56
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/tools/tabularx.sty
Package: tabularx 2023/12/11 v2.12a `tabularx' package (DPC)
\TX@col@width=\dimen197
\TX@old@table=\dimen198
\TX@old@col=\dimen199
\TX@target=\dimen256
\TX@delta=\dimen257
\TX@cols=\count299
\TX@ftn=\toks20
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/float/float.sty
Package: float 2001/11/08 v1.3d Float enhancements (AL)
\c@float@type=\count300
\float@exts=\toks21
\float@box=\box57
\@float@everytoks=\toks22
\@floatcapt=\box58
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/hyperref.sty
Package: hyperref 2024-11-05 v7.01l Hypertext links for LaTeX
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/kvsetkeys/kvsetkeys.sty
Package: kvsetkeys 2022-10-05 v1.19 Key value parser (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/kvdefinekeys/kvdefine
keys.sty
Package: kvdefinekeys 2019-12-19 v1.6 Define keys (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/pdfescape/pdfescape.s
ty
Package: pdfescape 2019/12/09 v1.15 Implements pdfTeX's escape features (HO)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/ltxcmds/ltxcmds.sty
Package: ltxcmds 2023-12-04 v1.26 LaTeX kernel commands for general use (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/pdftexcmds/pdftexcmds
.sty
Package: pdftexcmds 2020-06-27 v0.33 Utility functions of pdfTeX for LuaTeX (HO
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/infwarerr/infwarerr.s
ty
Package: infwarerr 2019/12/03 v1.5 Providing info/warning/error messages (HO)
)
Package pdftexcmds Info: \pdf@primitive is available.
Package pdftexcmds Info: \pdf@ifprimitive is available.
Package pdftexcmds Info: \pdfdraftmode not found.
))
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hycolor/hycolor.sty
Package: hycolor 2020-01-27 v1.10 Color options for hyperref/bookmark (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/nameref.sty
Package: nameref 2023-11-26 v2.56 Cross-referencing by name of section
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/refcount/refcount.sty
Package: refcount 2019/12/15 v3.6 Data extraction from label references (HO)
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/gettitlestring/gettit
lestring.sty
Package: gettitlestring 2019/12/15 v1.6 Cleanup title references (HO)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/kvoptions/kvoptions.sty
Package: kvoptions 2022-06-15 v3.15 Key value format for package options (HO)
))
\c@section@level=\count301
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/etoolbox/etoolbox.sty
Package: etoolbox 2025/02/11 v2.5l e-TeX tools for LaTeX (JAW)
\etb@tempcnta=\count302
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/stringenc/stringenc.s
ty
Package: stringenc 2019/11/29 v1.12 Convert strings between diff. encodings (HO
)
)
\@linkdim=\dimen258
\Hy@linkcounter=\count303
\Hy@pagecounter=\count304
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/pd1enc.def
File: pd1enc.def 2024-11-05 v7.01l Hyperref: PDFDocEncoding definition (HO)
) (d:/settings/Language/texlive/2025/texmf-dist/tex/generic/intcalc/intcalc.sty
Package: intcalc 2019/12/15 v1.3 Expandable calculations with integers (HO)
)
\Hy@SavedSpaceFactor=\count305
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/puenc.def
File: puenc.def 2024-11-05 v7.01l Hyperref: PDF Unicode definition (HO)
)
Package hyperref Info: Hyper figures OFF on input line 4157.
Package hyperref Info: Link nesting OFF on input line 4162.
Package hyperref Info: Hyper index ON on input line 4165.
Package hyperref Info: Plain pages OFF on input line 4172.
Package hyperref Info: Backreferencing OFF on input line 4177.
Package hyperref Info: Implicit mode ON; LaTeX internals redefined.
Package hyperref Info: Bookmarks ON on input line 4424.
\c@Hy@tempcnt=\count306
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/url/url.sty
\Urlmuskip=\muskip17
Package: url 2013/09/16 ver 3.4 Verb mode for urls, etc.
)
LaTeX Info: Redefining \url on input line 4763.
\XeTeXLinkMargin=\dimen259
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/bitset/bitset.sty
Package: bitset 2019/12/09 v1.3 Handle bit-vector datatype (HO)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/bigintcalc/bigintcalc
.sty
Package: bigintcalc 2019/12/15 v1.5 Expandable calculations on big integers (HO
)
))
\Fld@menulength=\count307
\Field@Width=\dimen260
\Fld@charsize=\dimen261
Package hyperref Info: Hyper figures OFF on input line 6042.
Package hyperref Info: Link nesting OFF on input line 6047.
Package hyperref Info: Hyper index ON on input line 6050.
Package hyperref Info: backreferencing OFF on input line 6057.
Package hyperref Info: Link coloring OFF on input line 6062.
Package hyperref Info: Link coloring with OCG OFF on input line 6067.
Package hyperref Info: PDF/A mode OFF on input line 6072.
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/atbegshi-ltx.sty
Package: atbegshi-ltx 2021/01/10 v1.0c Emulation of the original atbegshi
package with kernel methods
)
\Hy@abspage=\count308
\c@Item=\count309
\c@Hfootnote=\count310
)
Package hyperref Info: Driver (autodetected): hxetex.
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/hyperref/hxetex.def
File: hxetex.def 2024-11-05 v7.01l Hyperref driver for XeTeX
\pdfm@box=\box59
\c@Hy@AnnotLevel=\count311
\HyField@AnnotCount=\count312
\Fld@listcount=\count313
\c@bookmark@seq@number=\count314
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/rerunfilecheck/rerunfil
echeck.sty
Package: rerunfilecheck 2022-07-10 v1.10 Rerun checks for auxiliary files (HO)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/base/atveryend-ltx.sty
Package: atveryend-ltx 2020/08/19 v1.0a Emulation of the original atveryend pac
kage
with kernel methods
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/generic/uniquecounter/uniquec
ounter.sty
Package: uniquecounter 2019/12/15 v1.4 Provide unlimited unique counter (HO)
)
Package uniquecounter Info: New unique counter `rerunfilecheck' on input line 2
85.
)
\Hy@SectionHShift=\skip60
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/caption/caption.sty
Package: caption 2023/08/05 v3.6o Customizing captions (AR)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/caption/caption3.sty
Package: caption3 2023/07/31 v2.4d caption3 kernel (AR)
\caption@tempdima=\dimen262
\captionmargin=\dimen263
\caption@leftmargin=\dimen264
\caption@rightmargin=\dimen265
\caption@width=\dimen266
\caption@indent=\dimen267
\caption@parindent=\dimen268
\caption@hangindent=\dimen269
Package caption Info: Standard document class detected.
)
\c@caption@flags=\count315
\c@continuedfloat=\count316
Package caption Info: float package is loaded.
Package caption Info: hyperref package is loaded.
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/setspace/setspace.sty
Package: setspace 2022/12/04 v6.7b set line spacing
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/parskip/parskip.sty
Package: parskip 2021-03-14 v2.0h non-zero parskip adjustments
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/enumitem/enumitem.sty
Package: enumitem 2025/02/06 v3.11 Customized lists
\labelindent=\skip61
\enit@outerparindent=\dimen270
\enit@toks=\toks23
\enit@inbox=\box60
\enit@count@id=\count317
\enitdp@description=\count318
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/titlesec/titlesec.sty
Package: titlesec 2025/01/04 v2.17 Sectioning titles
\ttl@box=\box61
\beforetitleunit=\skip62
\aftertitleunit=\skip63
\ttl@plus=\dimen271
\ttl@minus=\dimen272
\ttl@toksa=\toks24
\titlewidth=\dimen273
\titlewidthlast=\dimen274
\titlewidthfirst=\dimen275
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amsmath.sty
Package: amsmath 2024/11/05 v2.17t AMS math features
\@mathmargin=\skip64
For additional information on amsmath, use the `?' option.
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amstext.sty
Package: amstext 2021/08/26 v2.01 AMS text
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amsgen.sty
File: amsgen.sty 1999/11/30 v2.0 generic functions
\@emptytoks=\toks25
\ex@=\dimen276
))
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amsbsy.sty
Package: amsbsy 1999/11/29 v1.2d Bold Symbols
\pmbraise@=\dimen277
)
(d:/settings/Language/texlive/2025/texmf-dist/tex/latex/amsmath/amsopn.sty
Package: amsopn 2022/04/08 v2.04 operator names
)
\inf@bad=\count319
LaTeX Info: Redefining \frac on input line 233.
\uproot@=\count320
\leftroot@=\count321
LaTeX Info: Redefining \overline on input line 398.
LaTeX Info: Redefining \colon on input line 409.
\classnum@=\count322
\DOTSCASE@=\count323
LaTeX Info: Redefining \ldots on input line 495.
LaTeX Info: Redefining \dots on input line 498.
LaTeX Info: Redefining \cdots on input line 619.
\Mathstrutbox@=\box62
\strutbox@=\box63
LaTeX Info: Redefining \big on input line 721.
LaTeX Info: Redefining \Big on input line 722.
LaTeX Info: Redefining \bigg on input line 723.
LaTeX Info: Redefining \Bigg on input line 724.
\big@size=\dimen278
LaTeX Font Info: Redeclaring font encoding OML on input line 742.
LaTeX Font Info: Redeclaring font encoding OMS on input line 743.
\macc@depth=\count324
LaTeX Info: Redefining \bmod on input line 904.
LaTeX Info: Redefining \pmod on input line 909.
LaTeX Info: Redefining \smash on input line 939.
LaTeX Info: Redefining \relbar on input line 969.
LaTeX Info: Redefining \Relbar on input line 970.
\c@MaxMatrixCols=\count325
\dotsspace@=\muskip18
\c@parentequation=\count326
\dspbrk@lvl=\count327
\tag@help=\toks26
\row@=\count328
\column@=\count329
\maxfields@=\count330
\andhelp@=\toks27
\eqnshift@=\dimen279
\alignsep@=\dimen280
\tagshift@=\dimen281
\tagwidth@=\dimen282
\totwidth@=\dimen283
\lineht@=\dimen284
\@envbody=\toks28
\multlinegap=\skip65
\multlinetaggap=\skip66
\mathdisplay@stack=\toks29
LaTeX Info: Redefining \[ on input line 2953.
LaTeX Info: Redefining \] on input line 2954.
)
(./theory_and_reflection_1234560_cn.aux)
\openout1 = `theory_and_reflection_1234560_cn.aux'.
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for TU/lmr/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
LaTeX Font Info: Checking defaults for PU/pdf/m/n on input line 27.
LaTeX Font Info: ... okay on input line 27.
*geometry* driver: auto-detecting
*geometry* detected driver: xetex
*geometry* verbose mode - [ preamble ] result:
* driver: xetex
* paper: a4paper
* layout: <same size as paper>
* layoutoffset:(h,v)=(0.0pt,0.0pt)
* modes:
* h-part:(L,W,R)=(41.25641pt, 514.99506pt, 41.25641pt)
* v-part:(T,H,B)=(42.67912pt, 759.6886pt, 42.67912pt)
* \paperwidth=597.50787pt
* \paperheight=845.04684pt
* \textwidth=514.99506pt
* \textheight=759.6886pt
* \oddsidemargin=-31.01358pt
* \evensidemargin=-31.01358pt
* \topmargin=-66.59087pt
* \headheight=12.0pt
* \headsep=25.0pt
* \topskip=11.0pt
* \footskip=30.0pt
* \marginparwidth=50.0pt
* \marginparsep=10.0pt
* \columnsep=10.0pt
* \skip\footins=10.0pt plus 4.0pt minus 2.0pt
* \hoffset=0.0pt
* \voffset=0.0pt
* \mag=1000
* \@twocolumnfalse
* \@twosidefalse
* \@mparswitchfalse
* \@reversemarginfalse
* (1in=72.27pt=25.4mm, 1cm=28.453pt)
Package fontspec Info:
(fontspec) Adjusting the maths setup (use [no-math] to avoid
(fontspec) this).
\symlegacymaths=\mathgroup4
LaTeX Font Info: Overwriting symbol font `legacymaths' in version `bold'
(Font) OT1/cmr/m/n --> OT1/cmr/bx/n on input line 27.
LaTeX Font Info: Redeclaring math accent \acute on input line 27.
LaTeX Font Info: Redeclaring math accent \grave on input line 27.
LaTeX Font Info: Redeclaring math accent \ddot on input line 27.
LaTeX Font Info: Redeclaring math accent \tilde on input line 27.
LaTeX Font Info: Redeclaring math accent \bar on input line 27.
LaTeX Font Info: Redeclaring math accent \breve on input line 27.
LaTeX Font Info: Redeclaring math accent \check on input line 27.
LaTeX Font Info: Redeclaring math accent \hat on input line 27.
LaTeX Font Info: Redeclaring math accent \dot on input line 27.
LaTeX Font Info: Redeclaring math accent \mathring on input line 27.
LaTeX Font Info: Redeclaring math symbol \Gamma on input line 27.
LaTeX Font Info: Redeclaring math symbol \Delta on input line 27.
LaTeX Font Info: Redeclaring math symbol \Theta on input line 27.
LaTeX Font Info: Redeclaring math symbol \Lambda on input line 27.
LaTeX Font Info: Redeclaring math symbol \Xi on input line 27.
LaTeX Font Info: Redeclaring math symbol \Pi on input line 27.
LaTeX Font Info: Redeclaring math symbol \Sigma on input line 27.
LaTeX Font Info: Redeclaring math symbol \Upsilon on input line 27.
LaTeX Font Info: Redeclaring math symbol \Phi on input line 27.
LaTeX Font Info: Redeclaring math symbol \Psi on input line 27.
LaTeX Font Info: Redeclaring math symbol \Omega on input line 27.
LaTeX Font Info: Redeclaring math symbol \mathdollar on input line 27.
LaTeX Font Info: Redeclaring symbol font `operators' on input line 27.
LaTeX Font Info: Encoding `OT1' has changed to `TU' for symbol font
(Font) `operators' in the math version `normal' on input line 27.
LaTeX Font Info: Overwriting symbol font `operators' in version `normal'
(Font) OT1/cmr/m/n --> TU/lmr/m/n on input line 27.
LaTeX Font Info: Encoding `OT1' has changed to `TU' for symbol font
(Font) `operators' in the math version `bold' on input line 27.
LaTeX Font Info: Overwriting symbol font `operators' in version `bold'
(Font) OT1/cmr/bx/n --> TU/lmr/m/n on input line 27.
LaTeX Font Info: Overwriting symbol font `operators' in version `normal'
(Font) TU/lmr/m/n --> TU/lmr/m/n on input line 27.
LaTeX Font Info: Overwriting math alphabet `\mathit' in version `normal'
(Font) OT1/cmr/m/it --> TU/lmr/m/it on input line 27.
LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `normal'
(Font) OT1/cmr/bx/n --> TU/lmr/b/n on input line 27.
LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `normal'
(Font) OT1/cmss/m/n --> TU/lmss/m/n on input line 27.
LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `normal'
(Font) OT1/cmtt/m/n --> TU/lmtt/m/n on input line 27.
LaTeX Font Info: Overwriting symbol font `operators' in version `bold'
(Font) TU/lmr/m/n --> TU/lmr/b/n on input line 27.
LaTeX Font Info: Overwriting math alphabet `\mathit' in version `bold'
(Font) OT1/cmr/bx/it --> TU/lmr/b/it on input line 27.
LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `bold'
(Font) OT1/cmss/bx/n --> TU/lmss/b/n on input line 27.
LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `bold'
(Font) OT1/cmtt/m/n --> TU/lmtt/b/n on input line 27.
Package hyperref Info: Link coloring OFF on input line 27.
(./theory_and_reflection_1234560_cn.out)
(./theory_and_reflection_1234560_cn.out)
\@outlinefile=\write3
\openout3 = `theory_and_reflection_1234560_cn.out'.
Package caption Info: Begin \AtBeginDocument code.
Package caption Info: End \AtBeginDocument code.
LaTeX Font Warning: Font shape `TU/SimSun(0)/b/n' undefined
(Font) using `TU/SimSun(0)/m/n' instead on input line 30.
Package xeCJK Warning: Unknown CJK family `\CJKttdefault' is being ignored.
(xeCJK)
(xeCJK) Try to use `\setCJKmonofont[<...>]{<...>}' to define
(xeCJK) it.
File: ../outputs/figures/optuna_param_importance.png Graphic file (type bmp)
<../outputs/figures/optuna_param_importance.png>
[1
]
LaTeX Font Warning: Font shape `TU/SimSun(0)/m/it' undefined
(Font) using `TU/SimSun(0)/m/n' instead on input line 109.
[2]
[3] (./theory_and_reflection_1234560_cn.aux)
***********
LaTeX2e <2024-11-01> patch level 2
L3 programming layer <2022/08/05>
***********
LaTeX Font Warning: Some font shapes were not available, defaults substituted.
Package rerunfilecheck Info: File `theory_and_reflection_1234560_cn.out' has no
t changed.
(rerunfilecheck) Checksum: A703C6812D998839E80788701035983A;582.
)
Here is how much of TeX's memory you used:
15601 strings out of 473832
329643 string characters out of 5733159
807732 words of memory out of 5000000
38476 multiletter control sequences out of 15000+600000
564951 words of font info for 81 fonts, out of 8000000 for 9000
1348 hyphenation exceptions out of 8191
74i,11n,92p,1221b,478s stack positions out of 10000i,1000n,20000p,200000b,200000s
Output written on theory_and_reflection_1234560_cn.pdf (3 pages).
@@ -0,0 +1,120 @@
\documentclass[11pt,a4paper]{article}
\usepackage[margin=1.45cm,top=1.5cm,bottom=1.5cm]{geometry}
\usepackage{xeCJK}
\setCJKmainfont{SimSun}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{array}
\usepackage{tabularx}
\usepackage{float}
\usepackage{hyperref}
\usepackage{caption}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{enumitem}
\usepackage{titlesec}
\usepackage{amsmath}
\setstretch{1.03}
\setlist[itemize]{leftmargin=1.1em,itemsep=0.12em,topsep=0.12em}
\captionsetup{font=small,labelfont=bf}
\titlespacing*{\section}{0pt}{0.6em}{0.28em}
\titlespacing*{\subsection}{0pt}{0.28em}{0.12em}
\titleformat{\section}{\large\bfseries}{\thesection.}{0.4em}{}
\newcolumntype{Y}{>{\centering\arraybackslash}X}
\pagestyle{plain}
\begin{document}
\begin{center}
{\Large \textbf{Theory and Reflection}}\\
\vspace{0.2em}
{\normalsize DTS304TC Coursework 1 \quad Student ID: 1234560}
\vspace{0.15em}
\rule{0.6\linewidth}{0.4pt}
\end{center}
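\section{Bagging vs Boosting}
Bagging (Bootstrap Aggregating) draws $B$ independent bootstrap samples from the original dataset, trains one decision tree as a base learner on each sample, and aggregates the base learners' predictions by majority vote (classification) or averaging (regression). Because each tree learns from a different random subset of the data, the trees' prediction errors partially cancel when aggregated, so Bagging mainly reduces variance. Boosting, by contrast, trains base learners sequentially: each new learner focuses on the residuals or misclassified samples of the current ensemble, which reduces bias more aggressively. The core conceptual difference is that Bagging treats all base learners as equally important, whereas Boosting adapts to past errors, either by reweighting observations (as in AdaBoost) or by fitting the current residuals (as in gradient boosting, which XGBoost implements).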
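As a hedged illustration of the variance argument above (a standard textbook result, not something computed in the notebook), suppose the $B$ base predictions $\hat f_b(x)$ are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$. The symbols $\sigma^2$, $\rho$, $\eta$ and $h_m$ are introduced here for illustration only. Then
\[
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)=\rho\,\sigma^2+\frac{1-\rho}{B}\,\sigma^2 ,
\]
so averaging shrinks the second term as $B$ grows but cannot remove the correlated part $\rho\,\sigma^2$. A generic boosting update instead builds the model additively,
\[
F_m(x)=F_{m-1}(x)+\eta\,h_m(x),
\]
where $h_m$ is fitted to the errors of $F_{m-1}$ and $\eta$ is a learning rate, which is why boosting primarily attacks bias.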
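My notebook runs a controlled comparison of Random Forest (representing Bagging) and XGBoost (representing Boosting) under the same preprocessing pipeline and the same train/validation split. This design matters: with data preparation and evaluation held fixed, differences in the results can be attributed to the learning algorithms themselves.
Table~\ref{tab: supervised-comparison} summarises the key results from \texttt{outputs/tables/personalised\_improvement\_summary.csv}. For each of the four models it lists the validation macro-F1, validation accuracy, generalisation gap (training F1 minus validation F1), per-class F1, and training time.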
\begin{table}[H]
\centering
\caption{Controlled supervised model comparison (identical pipeline and split).}
\label{tab: supervised-comparison}
\small
\begin{tabularx}{\textwidth}{>{\raggedright\arraybackslash}p{2.3cm}YYYYYYY}
\toprule
Model & Val F1 & Val Acc & Gap & High F1 & Low F1 & Std F1 & Time (s)\\
\midrule
Baseline LR & 0.7238 & 0.7342 & 0.0146 & 0.7665 & 0.6490 & 0.7558 & --\\
Random Forest & 0.7708 & 0.7877 & \textbf{0.2292} & 0.7875 & 0.7095 & 0.8154 & 57.91\\
XGBoost & 0.8144 & 0.8371 & 0.0155 & 0.8905 & 0.6944 & 0.8583 & 67.64\\
Tuned XGBoost & 0.8520 & 0.8700 & 0.1219 & 0.9084 & 0.7620 & 0.8854 & 142.65\\
\bottomrule
\end{tabularx}\\[3pt]
{\small Gap = train\_F1 $-$ val\_F1 (the overfitting gap).}
\end{table}
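The results show a clear empirical contrast between the two approaches. Random Forest reaches a perfect training macro-F1 ($1.0000$) but a validation macro-F1 of only $0.7708$, a generalisation gap of $0.2292$. This severe overfitting is also visible in the Random Forest confusion matrix in the notebook output. XGBoost, by contrast, has a training macro-F1 of $0.8297$ and a validation macro-F1 of $0.8144$, a gap of only $0.0155$. The contrast is striking: RF's gap is roughly 15 times that of XGBoost.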
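The gap figures quoted above can be checked directly from the reported scores:
\begin{align*}
\text{Gap}_{\text{RF}} &= 1.0000-0.7708=0.2292,\\
\frac{\text{Gap}_{\text{RF}}}{\text{Gap}_{\text{XGB}}} &= \frac{0.2292}{0.0155}\approx 14.8 .
\end{align*}
Note that the rounded XGBoost scores give $0.8297-0.8144=0.0153$. The tabulated $0.0155$ was presumably computed from unrounded values; this is an assumption rather than something verified here.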
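The per-class F1 columns in Table~\ref{tab: supervised-comparison} reveal further detail. Before tuning, RF outperformed the untuned XGBoost on the minority class Low (F1 $=0.7095$ versus $0.6944$), but this advantage disappears once XGBoost is tuned. After Optuna tuning, XGBoost's Low-class F1 rises to $0.7620$, an improvement of $+0.0676$ over the untuned model and well above RF's $0.7095$. This suggests that Boosting's sequential residual correction is better suited to learning the non-linear decision boundaries between classes in this dataset. Bagging's variance reduction did not offset the overfitting of RF's fully grown trees on the mixed numerical and categorical feature space, which is why Random Forest underperforms here.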
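\section{Hyperparameter Optimisation}
I ran 30 Optuna trials with the TPE (Tree-structured Parzen Estimator) sampler, with the objective of maximising validation macro-F1. The search space covered nine XGBoost hyperparameters: n\_estimators (100--500), max\_depth (3--10), learning\_rate (0.01--0.3, log scale), min\_child\_weight (1--10), subsample (0.5--1.0), colsample\_bytree (0.5--1.0), gamma (0--5), reg\_alpha ($10^{-4}$--10, log scale), and reg\_lambda ($10^{-4}$--10, log scale). The mix of discrete and continuous parameters and their multi-dimensional interactions makes exhaustive grid search computationally infeasible. TPE explores the space efficiently by modelling the densities of good and bad trial configurations and steering later trials towards promising regions.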
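A minimal sketch of the idea behind TPE, assuming the standard formulation for a maximisation objective (the symbols $y^\ast$, $\ell$ and $g$ are notation introduced here, and Optuna's implementation details may differ): the completed trials are split at a score threshold $y^\ast$ into a good group and a bad group, and a density over configurations $x$ is fitted to each,
\[
\ell(x)=p(x\mid y\ge y^\ast),\qquad g(x)=p(x\mid y<y^\ast).
\]
The next configuration is then chosen among sampled candidates to maximise the ratio
\[
x_{\text{next}}=\arg\max_{x}\;\frac{\ell(x)}{g(x)},
\]
which favours regions where good trials are dense and bad trials are sparse.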
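Trial 22 achieved the best validation macro-F1 of $0.8520$, an improvement of $+0.0376$ over the untuned XGBoost baseline ($0.8144$). Its configuration was: n\_estimators$=276$, max\_depth$=9$, learning\_rate$\approx0.192$, subsample$\approx0.707$, colsample\_bytree$\approx0.799$, reg\_lambda$\approx5.0$, gamma$\approx2.5$. These values are plausible: a moderate learning rate combined with deep trees and a large number of estimators lets the model fit complex interactions, while subsample and colsample ratios of around 0.7--0.8 provide regularisation. Figure~\ref{fig: param-importance} shows the Optuna parameter importance plot, which indicates that the structural parameters and the learning rate dominated the optimisation.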
\begin{figure}[H]
\centering
\fbox{\includegraphics[width=0.58\textwidth]{../outputs/figures/optuna_param_importance.png}}
\caption{Optuna parameter importance. Larger bars indicate higher influence on validation macro-F1.}
\label{fig: param-importance}
\end{figure}
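The per-class F1 changes in Table~\ref{tab: supervised-comparison} deserve particular attention because macro-F1 weights all three classes equally. Optuna tuning raised the F1 of the Low (minority) class from $0.6944$ to $0.7620$ ($+0.0676$), the High class from $0.8905$ to $0.9084$ ($+0.0179$), and the Standard class from $0.8583$ to $0.8854$ ($+0.0271$). This across-the-board improvement suggests that TPE optimised the class-balanced objective rather than overfitting to the majority classes. The tuned model achieved this without degrading any class, which is exactly what macro-F1 rewards.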
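For reference, macro-F1 over the three classes is the unweighted mean of the per-class scores, so the tabulated values can be cross-checked. This is a consistency check on the rounded figures in Table~\ref{tab: supervised-comparison}, not a new result:
\begin{align*}
\text{macro-F1} &= \tfrac{1}{3}\left(\mathrm{F1}_{\text{High}}+\mathrm{F1}_{\text{Low}}+\mathrm{F1}_{\text{Std}}\right),\\
\text{untuned XGBoost:}\quad & \tfrac{1}{3}\,(0.8905+0.6944+0.8583)=0.8144,\\
\text{tuned XGBoost:}\quad & \tfrac{1}{3}\,(0.9084+0.7620+0.8854)\approx 0.8519 .
\end{align*}
The second value agrees with the reported $0.8520$ up to rounding of the per-class scores.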
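\section{K-Means vs GMM}
K-Means assigns each sample $x_i$ to the cluster $c_i\in\{1,\dots,k\}$ whose centroid $\mu_{c_i}$ minimises $\|x_i-\mu_{c_i}\|^2$. This is \textbf{hard assignment}: each sample belongs to exactly one cluster, with no notion of uncertainty or partial membership. GMM (Gaussian Mixture Model) takes a fundamentally different approach, modelling the data as a mixture of $k$ multivariate Gaussians: $p(x)=\sum_{j=1}^{k}\pi_j\,\mathcal{N}(x\mid\mu_j,\Sigma_j)$, where the $\pi_j$ are the mixing proportions. Each sample receives a posterior probability $p(c_j\mid x_i)$ for every component, giving \textbf{soft assignment}: a sample can belong partially to several clusters. In the insurance-risk setting, applicant profiles overlap naturally across risk tiers rather than forming isolated groups, so soft assignment is a better fit for the domain.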
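The two assignment rules can be written side by side. This is the standard textbook form, and the responsibility symbol $\gamma_{ij}$ is notation introduced here:
\[
c_i=\arg\min_{j\in\{1,\dots,k\}}\|x_i-\mu_j\|^2
\qquad\text{versus}\qquad
\gamma_{ij}=p(c_j\mid x_i)=\frac{\pi_j\,\mathcal{N}(x_i\mid\mu_j,\Sigma_j)}{\sum_{l=1}^{k}\pi_l\,\mathcal{N}(x_i\mid\mu_l,\Sigma_l)} .
\]
Each $\gamma_{ij}$ lies in $[0,1]$ and $\sum_{j}\gamma_{ij}=1$, so a sample near the boundary between two components receives a split membership rather than a single label.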
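Table~\ref{tab: clustering} presents the full clustering results from \texttt{outputs/tables/clustering\_comparison.csv} for $k=2$ to $k=8$. Its columns are, in order: $k$, K-Means inertia, K-Means silhouette, GMM BIC, GMM AIC, and GMM silhouette; the GMM log-likelihood is not tabulated. The K-Means silhouette stays low for every $k$, peaking at only $0.2015$ (at $k=8$). This indicates that even the best K-Means configuration cannot find well-separated spherical clusters in the data. GMM achieves markedly higher silhouette scores: $0.4142$ at $k=2$ and $0.4015$ at $k=5$, about twice the best K-Means value. At $k=2$, GMM's $0.4142$ against K-Means's $0.1740$ is particularly telling: it suggests that the two-cluster structure in the data is probabilistic (overlapping Gaussian components) rather than discrete (centroid-defined).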
\begin{table}[H]
\centering
\caption{Full clustering comparison across k=2 to k=8.}
\label{tab: clustering}
\footnotesize
\begin{tabularx}{\textwidth}{YYYYYY}
\toprule
$k$ & K-Means inertia & K-Means silhouette & GMM BIC & GMM AIC & GMM silhouette\\
\midrule
2 & 1,092,962 & 0.1740 & $-$359,251 & $-$362,062 & \textbf{0.4142}\\
3 & 1,018,587 & 0.1732 & $-$1,103,445 & $-$1,107,666 & 0.2977\\
4 & 953,249 & 0.1808 & $-$1,938,815 & $-$1,944,446 & 0.3964\\
5 & 889,285 & 0.1964 & $-$1,997,256 & $-$2,004,298 & 0.4015\\
6 & 818,951 & 0.1768 & $-$2,349,766 & $-$2,358,217 & 0.2468\\
7 & 777,658 & 0.1971 & $-$2,394,381 & $-$2,404,243 & 0.3110\\
8 & 691,941 & \textbf{0.2015} & $-$2,510,221 & $-$2,521,493 & 0.1726\\
\bottomrule
\end{tabularx}
\end{table}
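The GMM BIC column in Table~\ref{tab: clustering} decreases (improves) monotonically as $k$ grows, which is expected because adding components always improves the fit to the training data. BIC also penalises model complexity, however: the largest decreases occur up to $k=4$, after which the gains are smaller and uneven, indicating diminishing marginal returns. The K-Means inertia curve has no clear ``elbow'' (turning point), suggesting that the data have no natural number of clusters, which further indicates that they lack well-separated spherical structure. Overall, GMM's higher silhouette at most values of $k$ (every value except $k=8$) suggests that insurance applicants form probabilistic subtypes with soft boundaries. This supports the conceptual distinction between hard and soft assignment: GMM captures an overlap between risk profiles that K-Means cannot represent. Importantly, neither clustering method is intended to replace the supervised classifier. They serve a different goal, and the unsupervised analysis is purely exploratory.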
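To make the role of the complexity penalty concrete, recall the standard definitions, with $\hat L$ the maximised likelihood, $p$ the number of free parameters and $n$ the number of samples (none of these is reported in this document, so the symbols are used only schematically):
\[
\mathrm{BIC}=-2\ln\hat L+p\ln n,\qquad \mathrm{AIC}=-2\ln\hat L+2p,\qquad \mathrm{BIC}-\mathrm{AIC}=p\,(\ln n-2).
\]
Since $p$ grows by a fixed amount for each added mixture component, the difference between the two criteria should grow linearly in $k$. The tabulated values behave this way: $\mathrm{BIC}-\mathrm{AIC}$ is $2{,}811$ at $k=2$, $4{,}221$ at $k=3$ and $5{,}631$ at $k=4$, an increase of about $1{,}410$ per added component.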
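\section{Personalised Improvement Reflection}
My compulsory category is \textbf{Category A: Data Quality and Missing-Value Handling}. Before any modelling, I carried out EDA and found substantial missingness in several columns. The five columns with the highest missing rates are net\_monthly\_income\_gbp, avg\_payment\_delay\_days, monthly\_investment\_gbp, prior\_debt\_products, and account\_tenure (30.6\%, 19.0\%, 21.1\%, 7.6\%, and 4.3\% respectively). Rather than treating missingness as noise and simply imputing the median, I added five binary missing-indicator features, one for each column above, while retaining median imputation. The approach rests on a hypothesis: the \textit{pattern} of missingness may itself carry information. A missing income value, for example, may indicate financial instability or unemployment, which is a legitimate risk signal in insurance.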
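In symbols, the feature construction described above can be sketched as follows. The notation $m_{ij}$, $\tilde x_{ij}$ and $\operatorname{med}_j$ is introduced here, and $\operatorname{med}_j$ is assumed to be the median of the observed training values in column $j$, which the text does not state explicitly. For each of the five columns $j$,
\[
m_{ij}=\begin{cases}1,&x_{ij}\text{ missing}\\0,&\text{otherwise}\end{cases}
\qquad
\tilde x_{ij}=\begin{cases}\operatorname{med}_j,&x_{ij}\text{ missing}\\x_{ij},&\text{otherwise}\end{cases}
\]
and the model receives both $\tilde x_{ij}$ and $m_{ij}$, so the imputed value and the fact of imputation remain distinguishable.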
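After adding the five missing indicators, validation macro-F1 rose from $0.8520$ (Optuna-tuned model) to $0.8529$ (Category A XGBoost). The gain is small ($+0.0009$), which is unsurprising given that the tuned model was already strong and close to the performance ceiling implied by the feature space. A gain of this size on a single validation split is consistent with the hypothesis that missingness carries behavioural signal (in financial applications, missing income data are unlikely to be missing at random), but it is not conclusive on its own. This also illustrates an important methodological lesson: even small improvements should be scrutinised to judge whether they reflect real signal or overfitting.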
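For the optional category, I implemented \textbf{Category D: Soft-Voting Ensemble}, combining Random Forest and the tuned XGBoost by soft voting (averaging the predicted class probabilities). The ensemble achieved a validation macro-F1 of $0.8510$, below both the Category A model ($0.8529$) and the tuned XGBoost alone ($0.8520$). The result is instructive: model diversity alone is not enough to improve an ensemble. The two base learners have very different prediction profiles (RF overfits severely, while XGBoost generalises much better), and combining them dilutes the Boosting model's strengths rather than complementing them. In practice, effective ensembles usually require base learners that are both individually strong and diverse in their error patterns. My final model choice, based strictly on the validation macro-F1 evidence, is therefore the Category A XGBoost.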
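Soft voting with two equally weighted models can be written as follows. Equal weights are an assumption here, since the text only states that predicted class probabilities are averaged:
\[
\bar p_c(x)=\tfrac12\left(p_c^{\text{RF}}(x)+p_c^{\text{XGB}}(x)\right),\qquad \hat y(x)=\arg\max_{c\in\{\text{High},\,\text{Low},\,\text{Standard}\}}\bar p_c(x).
\]
If one model's probabilities are overconfident, they can outvote the other model's better-founded predictions, which is one plausible mechanism for the slight drop observed.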
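A key prerequisite for every modelling step is data-leakage control. Before training any model, I screened all available features using cross-validated single-feature decision trees. The feature bureau\_risk\_index achieved a single-feature macro-F1 of $0.9999$, an extremely high score that indicates near-perfect class separation. This exceeded the leakage-detection threshold (set to $0.85$), and the feature was removed before all further experiments. The step is essential: had the leaking feature remained, every high validation macro-F1 in Table~\ref{tab: supervised-comparison} would be artificially inflated and every model comparison invalid. Leakage detection also illustrates a broader principle: in applied machine learning, even when a feature appears to improve performance, its relationship to the target must be assessed before it is accepted.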
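The screening rule amounts to a simple threshold test. Writing $s_j$ for the cross-validated macro-F1 of a decision tree trained on feature $j$ alone and $\tau$ for the threshold (both symbols are introduced here for illustration, and whether the notebook uses a strict inequality is not stated):
\[
\text{flag feature } j \iff s_j>\tau,\qquad \tau=0.85,\qquad s_{\texttt{bureau\_risk\_index}}=0.9999>0.85 .
\]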
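\section{AI Use Declaration}
Throughout the coursework, AI tools were used only in a limited supporting role: help with environment debugging (resolving package-import and GPU-configuration issues) and with \LaTeX{} formatting for the final document. The experimental design, leakage-detection decisions, controlled model comparison, personalised improvement strategy, and all written interpretation of tables, figures, and metrics are my own and are based on my notebook results. No hidden-test performance is claimed. The CSV file (\texttt{test\_result\_1234560.csv}) follows the file name and column order required by the coursework instructions and was generated solely to satisfy the submission format.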
\end{document}