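chore: update project documentation, dependencies, and training scripts
- Update requirements.txt: add opencv-python-headless and add uv installation instructions
- Fix line endings in CSV files (convert CRLF to LF)
- Update TASK_PROGRESS.md to record the parallel training implementation and WSL support
- Tidy the code formatting of train_improved.py, removing redundant blank lines and comments
- Update the character encoding of the coursework requirements documents
- Add new TensorBoard log files and trained models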
+250
-250
@@ -1,251 +1,251 @@
XJTLU Entrepreneur College (Taicang) Cover Sheet

Module code and Title: DTS307TC Reinforcement Learning
School Title: School of AI and Advanced Computing
Assignment Title: Coursework 1
Submission Deadline: 04/May/2026 23:59
Final Word Count:
If you agree to let the university use your work anonymously for teaching and learning purposes, please type “yes” here.

I certify that I have read and understood the University’s Policy for dealing with Plagiarism, Collusion and the Fabrication of Data (available on Learning Mall Online). With reference to this policy I certify that:

• My work does not contain any instances of plagiarism and/or collusion.
• My work does not contain any fabricated data.

By uploading my assignment onto Learning Mall Online, I formally declare that all of the above information is true to the best of my knowledge and belief.

Scoring – For Tutor Use

Student ID:

Stage of Marking / Marker Code / Learning Outcomes Achieved (F/P/M/D) (please modify as appropriate): A, B, C / Final Score
1st Marker – red pen:
Moderation – green pen (IM Initials): The original mark has been accepted by the moderator (please circle as appropriate): Y/N
Data entry and score calculation have been checked by another tutor (please circle): Y
2nd Marker if needed – green pen:

For Academic Office Use
Date Received:
Days Late:
Late Penalty:
Possible Academic Infringement (please tick as appropriate): ☐ Category A ☐ Category B ☐ Category C ☐ Category D ☐ Category E
Total Academic Infringement Penalty (A, B, C, D, E; please modify where necessary): _____________________

School of Artificial Intelligence and Advanced Computing
Xi’an Jiaotong-Liverpool University

DTS307TC Reinforcement Learning

Coursework - Individual Report

Due: 04/May/2026 23:59
Weight: 40%
Maximum score: 40 marks

Overview

The purpose of this assignment is to gain experience in Python programming and the design of reinforcement learning algorithms. You are expected to implement an RL algorithm that solves a specific environment and provide an explanation of the algorithm’s methodology. You are also expected to analyse your results, including the challenges you faced and your solutions.

Learning Outcomes Assessed

A: Systematically understand the fundamental concepts and principles of reinforcement learning.
B: Critically analyse real-life problem situations and expertly map them as reinforcement learning tasks.
C: Mastery of Monte Carlo Methods and Temporal Difference Learning.
D: Proficiency in Deep Reinforcement Learning algorithms.

Late policy

5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five working days.

Avoid Plagiarism

• Do not submit work from other students.
• Do not share code/work with other students.
• Do not use open-source code as it is or without proper reference.

Risks

• Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in a loss of marks.
• The assignment must be submitted via Learning Mall. Only electronic submissions are accepted; hard-copy submissions are not.
• All students must download their file and check that it is viewable after submission. Documents may become corrupted during the uploading process (e.g. due to slow internet connections). However, students are responsible for submitting a functional and correct file for assessments.
• The Academic Integrity Policy is strictly followed.

Individual Report (40 marks)

The primary objective of this coursework is to familiarize students with the PPO algorithm using basic deep learning libraries, enabling them to improve their capability in transferring mathematical and theoretical knowledge into Python implementation, and further their understanding of the actor-critic algorithm.

Algorithm Overview

Proximal Policy Optimization (PPO) is a state-of-the-art reinforcement learning algorithm that optimizes a stochastic policy in an on-policy manner. To ensure stable training and avoid catastrophic performance collapse, PPO utilizes a clipped surrogate objective to prevent the policy update from stepping too far from the current behavior.
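
As a rough illustration of the clipped surrogate objective described above, the sketch below computes the PPO-Clip term for a small batch in plain Python. The function name `clipped_surrogate`, the default `clip_eps=0.2`, and the choice of log-probabilities as inputs are assumptions made for this example, not requirements of the coursework. A real agent would compute the same quantity on tensors in a deep learning library.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO-Clip objective over a batch (a quantity to be maximised).

    All names and the default clip_eps are illustrative assumptions.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

With a positive advantage, the objective stops growing once the ratio exceeds 1 + ε, so the gradient gives no incentive to move the policy further in that direction. With a negative advantage, the same happens once the ratio falls below 1 - ε.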

The Environment: CarRacing-v3

We will be using the Car Racing environment from Gymnasium (the Farama Foundation's maintained successor to OpenAI Gym). This environment features a top-down racing track where the agent must learn to navigate through tiles based on pixel inputs. You can find more details about this environment on its website (https://gymnasium.farama.org/environments/box2d/car_racing/).

Here’s a code snippet for you to get started:

    import gymnasium as gym

    # Create the environment; with "rgb_array", env.render() returns frames as arrays.
    env = gym.make("CarRacing-v3", render_mode="rgb_array")
    # reset() returns the first observation and an info dict.
    observation, info = env.reset()

Since CarRacing-v3 is quite computationally expensive for a standard laptop (due to the pixel processing), you might want to consider using a gray-scaling or frame-stacking wrapper to speed up training. Alternatively, you can use the lab computers, which have GPUs and the environment already set up.
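
The gray-scaling and frame-stacking idea mentioned above can be sketched without any extra dependencies. The helper `to_grayscale`, the class `FrameStacker`, and the default of four frames are hypothetical names and values chosen for this illustration. In practice frames would be NumPy arrays rather than nested lists, and Gymnasium also ships observation wrappers for both steps.

```python
from collections import deque


def to_grayscale(frame):
    """Convert an H x W x 3 RGB frame (nested lists) to H x W luminance."""
    # ITU-R BT.601 luma weights; a plain channel mean would also work.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]


class FrameStacker:
    """Keep the most recent `num_frames` grayscale frames as one observation."""

    def __init__(self, num_frames=4):
        self.frames = deque(maxlen=num_frames)

    def reset(self, frame):
        # Fill the stack by repeating the first frame of an episode.
        gray = to_grayscale(frame)
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(gray)
        return list(self.frames)

    def step(self, frame):
        # Append the newest frame; the oldest one is dropped automatically.
        self.frames.append(to_grayscale(frame))
        return list(self.frames)
```

Gray-scaling cuts the input from three channels to one, while stacking consecutive frames lets the network infer motion (speed and heading) that a single frame cannot show.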

The PPO Agent

You will implement an RL agent using PPO to play the CarRacing-v3 environment. The agent will use the standard observation and actions provided by the environment. You may edit the environment to speed up your training, but your agent must still perform well in the standard environment (e.g., removing the camera zoom at the beginning is allowed during training, but your agent should still be tested in the original environment). You should record your training and evaluation process using TensorBoard. You should also record important losses and other data for your analysis later.

The Report

Upon completion of your implementation, you are required to submit a comprehensive technical report. The report should document your engineering decisions, the theoretical grounding of your code, and a critical analysis of the agent’s performance.

1. Introduction

• Provide a brief overview of Reinforcement Learning in the context of the CarRacing-v3 environment.
• Define the state space (pixels), action space (discrete commands), and the reward structure of the task.

2. Methodology

• Mathematical Foundation: Formulate the PPO objective function. Explain the significance of the clipping parameter and the probability ratio.
• Advantage Estimation: Describe your method for calculating advantages (e.g., standard advantage vs. Generalized Advantage Estimation (GAE)).
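
For the advantage-estimation item above, the following is a minimal sketch of Generalized Advantage Estimation over a single rollout, written in plain Python. The function name `compute_gae`, its argument layout, and the defaults `gamma=0.99` and `lam=0.95` are assumptions for illustration only. Setting `lam=0` reduces it to the one-step TD advantage, and `lam=1` gives the Monte Carlo return minus the value baseline.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards[t], values[t], dones[t] describe step t; last_value is the
    critic's estimate for the state after the final step. The gamma and
    lam defaults are illustrative assumptions.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # Do not bootstrap across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Recursive form: A_t = delta_t + gamma * lam * A_{t+1}.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic: advantage plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

The backward loop is the usual design choice because each advantage depends only on the next one, so the whole rollout is processed in a single pass.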

3. Implementation Details

• Describe your implementation, including any challenges faced and how you addressed them.
• Explain the structure of your policy and value networks.
• Detail the training process and hyperparameters used.

4. Results and Analysis

• Present your results (use graphs for better clarity).
• Discuss the performance of your agent and any trends observed.
• Briefly compare your custom implementation’s stability and sample efficiency against baseline benchmarks (e.g., Stable-Baselines3).

5. Conclusion

• Summarize your key findings regarding the sensitivity of PPO to hyperparameter tuning and the effectiveness of the actor-critic framework in continuous-input environments.

Note: All figures and plots must be clearly labeled with axis titles and legends. Raw code snippets should be kept to a minimum in the report; focus on high-level logic and pseudo-code where necessary.

Important Note

• Do NOT use Stable-Baselines libraries or any other reinforcement-learning-specific libraries in your implementation (you may use TensorBoard for recording your results).
• Do NOT exceed the word count limit of 3000 words for the report, excluding references and appendices.
• Although you are allowed to use generative AI tools to assist your work, please use them responsibly. (Good use: improving your report after writing it, and always reviewing the tool's output to ensure that it is correct. Bad use: copy-pasting an entire report from AI without any effort of your own.)

Submission Requirements

Please prepare and submit the following documents:

• A cover page featuring your student ID. This page should be the first page of your report.
• A zip file containing all the source code and your trained agent model, named using your full name and student ID in the following format: CW1_ID_Name.zip
• One PDF file for your report. This file should be separate from the zip file containing your code, and should be named in the following format: CW1_ID_Name.pdf

Note that the quality of the code, the clarity of your writing, and the format/style of your report will be taken into consideration during the evaluation. The detailed rubric is outlined below.

Rubric

CW1 (40 marks): Criteria (Marks)
Code Performance: Code runs without errors and performs tasks as specified. (6)
Code Quality: Code is well-organized, includes meaningful comments, and uses appropriate variable names. (6)
Methodology: Comprehensive coverage of topics with detailed explanations of approaches and methodologies. (6)
Result analysis: Insightful analysis of results. (6)
Report Quality: Report is well-structured, formatted, and free of grammatical errors. (6)
Evidence of Work: All required elements are included and correct. (6)
Submission: Follows all requirements for submission. (4)
+259
-259
@@ -1,260 +1,260 @@
XJTLU Entrepreneur College (Taicang) Cover Sheet

Module code: DTS304TC: Machine Learning
School title: School of AI and Advanced Computing
Assessment title: Coursework Task 1
Assessment type: Coursework
Submission deadline: 01/May/2026 23:59

I certify that I have read and understood the University's Policy for dealing with Plagiarism, Collusion and the Fabrication of Data (available on Learning Mall Online).
My work does not contain any instances of plagiarism and/or collusion.
My work does not contain any fabricated data.

By uploading my assignment onto Learning Mall Online, I formally declare that all of the above information is true to the best of my knowledge and belief.

Scoring – For Tutor Use

Student ID:
Theory and Reflection PDF Word Count (Filled by Students):

Stage of Marking / Marker Code / Learning Outcomes Achieved (F/P/M/D) (please modify as appropriate): A, B, C / Final Score
1st Marker – red pen:
Moderation – green pen (IM Initials): The original mark has been accepted by the moderator (please circle as appropriate): Y/N
Data entry and score calculation have been checked by another tutor (please circle): Y
2nd Marker if needed – green pen:

For Academic Office Use
Date Received:
Days Late:
Late Penalty:
Possible Academic Infringement (please tick as appropriate): ☐ Category A ☐ Category B ☐ Category C ☐ Category D ☐ Category E
Total Academic Infringement Penalty (A, B, C, D, E; please modify where necessary): _____________________

DTS304TC Machine Learning
Coursework - Assessment Task 1

• Percentage in final mark: 50%
• Assessment type: individual coursework
• Submission files: one Jupyter notebook (.ipynb), one Coursework Answer Sheet / Theory and Reflection PDF, and one hidden-test CSV

Learning outcomes assessed

• A. Demonstrate a solid understanding of the theoretical issues related to problems that machine-learning methods try to address.
• B. Demonstrate understanding of the properties of existing machine-learning algorithms and how they behave on practical data.

Notes

• Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in a loss of marks.
• The formal procedure for submitting coursework at XJTLU is strictly followed. The submission link on Learning Mall will be provided in due course. The submission timestamp on Learning Mall will be used to check for late submission.
• 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five working days.
• All modelling work must be completed individually. Discussion of general ideas is allowed, but code, experiments, and notebooks must be independently developed.
• You may not use ChatGPT to directly generate answers for the coursework. High-scoring work must demonstrate your own experimental design, controlled comparisons, failure analysis, and image-level interpretation. ChatGPT or similar tools may be used only in a limited support role such as code understanding, debugging, or grammar support. They must not replace your method design, ablation logic, qualitative analysis, or reflection. Generic AI-produced descriptions without matching evidence in code, tables, figures, and discussion will not receive high marks.
• If you use AI tools or outside code in any meaningful way, you must fully understand, verify, and take ownership of every method, number, figure, and written claim that appears in your submission.

Question 1: Notebook-Based Coding Exercise - Insurance Premium-Risk Classification (60 Marks)

In this coursework you will build and improve a multiclass classifier for a fictionalised health-insurance dataset. The task is to predict whether each applicant belongs to a Low, Standard, or High premium-risk group before pricing a policy. The dataset is intentionally realistic: it mixes numerical and categorical variables, contains missing values and dirty entries, and includes some fields that require careful handling to avoid weak modelling practice or label leakage.

Your work should show a clear machine-learning workflow: build a sensible first pipeline, compare model families, apply stronger hyperparameter optimisation, complete one compulsory improvement category plus at least one optional category, carry out a compact K-Means/Gaussian Mixture Model (GMM) exploration, and then produce a hidden-test CSV using validation evidence only.

The prediction target variable is ‘premium_risk’, and it has 3 imbalanced classes: Standard, High, Low. The dataset contains 33 raw columns: admin/PII columns, synthetic noise features, 1 leakage feature, and genuine predictors.

Unless otherwise stated, macro-F1 is the primary validation metric because the dataset is imbalanced; accuracy is reported as a secondary metric.
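
To make the choice of macro-F1 over accuracy concrete, here is a small dependency-free sketch of both metrics. The function names are chosen for this example. In a notebook, scikit-learn's `f1_score` with `average="macro"` would normally be used instead.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); the guard avoids division by zero.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# A classifier that always predicts the majority class looks fine on
# accuracy but poor on macro-F1, because two classes get F1 = 0.
y_true = ["Standard"] * 8 + ["High", "Low"]
y_pred = ["Standard"] * 10
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because every class contributes equally to the mean, a model cannot score well on macro-F1 by ignoring the minority classes.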

(A) Clean First Pipeline and Baseline Modelling (8 marks)

• Load the provided training and validation files and define a consistent target / feature setup.
• Handle leakage features, dirty values, missing values, and categorical variables sensibly. A compact sanity check is enough; a long data-audit section is not required.
  Important: The dataset contains a leakage feature. You must identify and remove it before proceeding to the next stage of analysis; otherwise, the classification results will be severely biased by this leakage and will not be meaningful. If this occurs, multiple parts of your Coursework 1 may be affected, which could significantly impact your marks.
• Build one baseline modelling pipeline.
• Report at least one validation result using accuracy and macro-F1 score and include a confusion matrix for the baseline model.
• Keep preprocessing consistent across train, validation, and hidden-test files.
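
One possible sanity check for the leakage requirement above is to ask how well each column predicts the target on its own. The sketch below is a heuristic, not the expected method: the function name `single_feature_score` and the example columns `risk_code` and `region` are hypothetical, and only `premium_risk` comes from the brief. A column whose score is close to 1.0 on the validation rows deserves a closer look.

```python
from collections import Counter, defaultdict


def single_feature_score(train_rows, valid_rows, feature, target):
    """Validation accuracy of predicting `target` from `feature` alone.

    Each row is a dict. The majority class per feature value is learned
    on the training rows; unseen values fall back to the overall majority.
    """
    by_value = defaultdict(Counter)
    overall = Counter()
    for row in train_rows:
        by_value[row[feature]][row[target]] += 1
        overall[row[target]] += 1
    default = overall.most_common(1)[0][0]
    hits = 0
    for row in valid_rows:
        counts = by_value.get(row[feature])
        guess = counts.most_common(1)[0][0] if counts else default
        hits += guess == row[target]
    return hits / len(valid_rows)


# Hypothetical toy rows: `risk_code` encodes the label, `region` does not.
train = [
    {"risk_code": "H", "region": "north", "premium_risk": "High"},
    {"risk_code": "S", "region": "north", "premium_risk": "Standard"},
    {"risk_code": "L", "region": "south", "premium_risk": "Low"},
    {"risk_code": "S", "region": "south", "premium_risk": "Standard"},
]
valid = [
    {"risk_code": "H", "region": "south", "premium_risk": "High"},
    {"risk_code": "S", "region": "north", "premium_risk": "Standard"},
]
leak_score = single_feature_score(train, valid, "risk_code", "premium_risk")
region_score = single_feature_score(train, valid, "region", "premium_risk")
```

Scoring on validation rows rather than training rows matters here: a unique identifier column would otherwise look perfect on the data it was fitted to.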

(B) Controlled Comparison: Random Forest and One Boosting Model (8 marks)

• Using the same preprocessing pipeline, validation split, and evaluation metric (the primary metric is macro-F1; also report accuracy), carry out an initial controlled comparison between one Random Forest model and one boosting model.
• Default XGBoost is recommended because it provides a richer tuning space later, but other boosting models may also be used. Default settings or only light, sensible adjustments are acceptable in this section.
• In the notebook, report the validation result of each model and support the comparison with one or two additional analyses, such as class-wise metrics, a confusion matrix, train-versus-validation behaviour, or stability / sensitivity after tuning.
• Your goal is not to prove that one model type always wins. Your goal is to compare the two models fairly, explain the high-level learning difference between bagging and boosting, and use your own notebook evidence to give a careful, dataset-specific interpretation. A generic textbook answer without reference to your own results will receive limited credit.

(C) Advanced Hyperparameter Optimisation (12 marks)

• At least one main model should be tuned with a genuinely advanced strategy such as Optuna/TPE, Bayesian optimisation, Hyperopt, Ray Tune, or another comparably strong approach.
• Hyperparameter tuning should optimise macro-F1 score on the validation set, and the final tuned result should be reported using both accuracy and macro-F1.
• RandomizedSearchCV alone is normally not enough for the top band.
• Explain briefly why your search space and optimiser are reasonable for the chosen model.

(D) Personalised Improvement Work (18 marks)

You must complete one compulsory category based on the last digit of your XJTLU student ID, plus at least one additional optional category of your choice. A second optional category is recommended for stronger differentiation but is not compulsory. You should report accuracy and macro-F1 for improved models and include class-wise metrics where helpful. A compact ablation table should normally be included in the notebook for the personalised improvement work.

Last digit    Compulsory category
0-1           Category A - Data quality and missingness
2-3           Category B - Feature representation and engineering
4-5           Category C - Imbalance and objective design
6-7           Category D - Model robustness, calibration, or ensembling
8-9           Category E - Fairness, diagnostics, or interpretability

Category A
  Examples of what may be done: better missing-value strategy; MissForest or iterative imputation; sensible outlier handling; value cleaning
  What good evidence looks like: A concise before/after comparison with a short explanation of why the data handling changed the result

Category B
  Examples of what may be done: feature crosses; grouped categories; alternative encodings; modest feature selection; transformations
  What good evidence looks like: A compact ablation showing what representation changed and whether it helped

Category C
  Examples of what may be done: class weighting; focal-style loss if relevant; sampling / resampling; thresholding logic
  What good evidence looks like: Clear evidence of how minority or harder classes changed, even if overall score moved only slightly

Category D
  Examples of what may be done: bagging/boosting variants; calibration checks; soft voting; stacking; robustness checks
  What good evidence looks like: A meaningful diagnostic or comparison rather than a large collection of loosely connected trials

Category E
  Examples of what may be done: SHAP / feature importance; subgroup-style fairness checks; error analysis; model interpretation
  What good evidence looks like: Concrete insight into model behaviour, not only screenshots
|
||||
(E) K-Means and Gaussian Mixture Model (GMM) Exploration (6 marks)
|
||||
This is a compact exploratory section. It is not the main performance section, and it does not require clusters to match the class
|
||||
labels exactly. The aim is to show your understanding of unsupervised learning methods and your ability to interpret their results
|
||||
carefully.
|
||||
• Use a sensible processed numeric feature space and briefly explain what you clustered on.
|
||||
• Explore a small range of cluster/component numbers, such as 2-8.
|
||||
• For K-Means, provide sensible supporting evidence, such as inertia (SSE), cluster sizes, or another simple analysis..
|
||||
• For Gaussian Mixture Model (GMM), provide sensible supporting evidence, such as component sizes, posterior
|
||||
confidence/responsibility, or overlap/uncertainty between components.
|
||||
• Include at least one compact table or figure comparing K-Means and GMM.
|
||||
• If class labels are used for reference, explain clearly that unsupervised structure does not need to align exactly with supervised
|
||||
labels
|
||||
• Stronger work may additionally use silhouette score, log-likelihood trends, or a simple visualization.
|
||||
|
||||
|
||||
(F) Final Model Choice and Hidden-Test Export (8 marks)
|
||||
• Choose the final model using validation evidence only.
|
||||
• Retrain appropriately using both train and validation dataset and generate the hidden-test CSV in the required format.
|
||||
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second
|
||||
column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High,
|
||||
Low).
|
||||
Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4 marks
|
||||
from this section.
|
||||
• Do not tune on the hidden test and do not claim hidden test performance.
|
||||
• Note: Hidden test score contributes only a small portion of the final marks. High leaderboard rank alone cannot compensate for
|
||||
weak experimental design or poor documentation.
|
||||
|
||||
|
||||
Coursework Answer Sheet / Theory and Reflection (PDF) - all questions below are compulsory
|
||||
(30 Marks)
|
||||
The Coursework Answer Sheet / Theory and Reflection PDF should not repeat the notebook section by section. All prompt areas
|
||||
below are compulsory. The PDF must be concise, directly linked to your own notebook evidence, and no longer than 4 pages /
|
||||
1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from the PDF section. You should aim to
|
||||
demonstrate both your theoretical or algorithmic understanding and your experimental findings or practical observations and
|
||||
clearly link your understanding of the algorithms to your experimental analysis. At least one table, figure, or metric from the
|
||||
notebook must be referenced in each theory answer.
|
||||
|
||||
Prompt area What you should do
|
||||
(1) Briefly state the definitions and key theoretical properties of bagging
|
||||
and boosting models;
|
||||
(2) report the validation results of each model;
|
||||
(3) support your comparison with one or two additional analyses, such as
|
||||
class-wise metrics, a confusion matrix, train–validation behaviour, or
|
||||
1. Bagging versus boosting stability/sensitivity after tuning; and
|
||||
(4) provide a careful interpretation of what this comparison suggests
|
||||
about this dataset and how it relates to the theoretical properties of
|
||||
bagging versus boosting methods.
|
||||
You are not expected to prove that one model type always performs
|
||||
better.
|
||||
Explain why your optimiser and search space were reasonable for the
|
||||
chosen model, which hyperparameters you expected to matter most,
|
||||
2. Hyperparameter optimisation
|
||||
whether the tuned results matched that intuition, and what you learned
|
||||
from the tuning process.
|
||||
Explain hard versus soft assignment and the main assumption difference
|
||||
between K-Means and GMM. Then use your own compact evidence to
|
||||
3. K-Means versus Gaussian Mixture Model (GMM) discuss whether the results matched your intuition and whether GMM
|
||||
revealed anything extra, such as soft membership, uncertainty, or a
|
||||
better fit to partial cluster structure.
|
||||
Reflect on the compulsory category and on every optional category you
|
||||
implemented. Highlight any unique or interesting algorithm or strategy
|
||||
4. Personalised reflection you tried, the personal challenges you faced, the effort you made to
|
||||
address them, and the key lessons you learned. Honest reflection on a
|
||||
neutral or negative result is acceptable if the reasoning is concrete.
|
||||
State briefly what forms of AI assistance, if any, were used. Generic AI-
|
||||
5. AI-use declaration written theory that does not match your notebook evidence will receive
|
||||
limited credit.
|
||||
|
||||
|
||||
|
||||
Coding Quality, Coursework Answer Sheet Quality, and Submission Guidelines (10 marks)
|
||||
|
||||
• Submit your Jupyter Notebook in .ipynb format. It must be well organised, include clear commentary and clean code practices,
|
||||
and show visible outputs. Do not write a second mini-report repeating notebook content.
|
||||
• The notebook should be reproducible from start to finish without errors. Results cited in the PDF should be visible in the
|
||||
notebook and should match the reported values.
|
||||
• If you used supplementary code outside the notebook, submit that code as well so the full workflow remains reproducible.
|
||||
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second
|
||||
column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High,
|
||||
Low). Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4
|
||||
marks from this section.
|
||||
• Submit the Coursework Answer Sheet / Theory and Reflection in PDF format. All questions in that section are compulsory. The
|
||||
Coursework Answer Sheet / Theory and Reflection PDF must answer every required prompt, refer to your own notebook
|
||||
evidence, and remain within 4 pages and 1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from
|
||||
the PDF section.
|
||||
• Include all required components: Jupyter notebooks (code), any additional experimental scripts or custom code, the hidden
|
||||
test-results CSV file, and the Coursework Answer Sheet PDF. Submit all files through the Learning Mall platform. After
|
||||
submission, download your files to verify that they can be opened and viewed correctly to ensure the submission was
|
||||
successful.
|
||||
|
||||
Project Material Access Instructions
|
||||
|
||||
To access the complete set of materials for this project, please use the links below:
|
||||
|
||||
• OneDrive Link:
|
||||
https://1drv.ms/f/c/18f09d1a39585f84/IgCXDMbXkFYSSZUZkkTyXyZzAQ1poX9mujUqF8N3JlL0GD0?e=uNhAHq
|
||||
• The same coursework materials have also been uploaded to Learning Mall.
|
||||
When extracting the materials, use the following password to unlock the zip file: DTS304TC (case-sensitive, enter in
|
||||
uppercase).
|
||||
XJTLU Entrepreneur College (Taicang) Cover Sheet

Module code: DTS304TC: Machine Learning
School title: School of AI and Advanced Computing
Assessment title: Coursework Task 1
Assessment type: Coursework
Submission deadline: 01/May/2026 23:59

I certify that I have read and understood the University's Policy for dealing with Plagiarism, Collusion and the Fabrication of Data (available on Learning Mall Online).

My work does not contain any instances of plagiarism and/or collusion.
My work does not contain any fabricated data.

By uploading my assignment onto Learning Mall Online, I formally declare that all of the above information is true to the best of my knowledge and belief.

Scoring – For Tutor Use

Student ID:
Theory and Reflection PDF Word Count (Filled by Students):

Columns: Stage of Marking; Marker Code; Learning Outcomes Achieved (F/P/M/D) (please modify as appropriate): A, B, C; Final Score
1st Marker – red pen:
Moderation – green pen: IM Initials; The original mark has been accepted by the moderator (please circle as appropriate): Y/N; Data entry and score calculation have been checked by another tutor (please circle): Y
2nd Marker if needed – green pen:

For Academic Office Use
Date Received:
Days late:
Late Penalty:
Possible Academic Infringement (please tick as appropriate): ☐ Category A ☐ Category B ☐ Category C ☐ Category D ☐ Category E
Total Academic Infringement Penalty (A, B, C, D, E, Please modify where necessary): _____________________

DTS304TC Machine Learning
Coursework - Assessment Task 1

• Percentage in final mark: 50%
• Assessment type: individual coursework
• Submission files: one Jupyter notebook (.ipynb), one Coursework Answer Sheet / Theory and Reflection PDF, and one hidden-test CSV

Learning outcomes assessed

• A. Demonstrate a solid understanding of the theoretical issues related to problems that machine-learning methods try to address.
• B. Demonstrate understanding of the properties of existing machine-learning algorithms and how they behave on practical data.

Notes

• Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in a loss of marks.
• The formal procedure for submitting coursework at XJTLU is strictly followed. The submission link on Learning Mall will be provided in due course. The submission timestamp on Learning Mall will be used to check for late submission.
• 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five working days.
• All modelling work must be completed individually. Discussion of general ideas is allowed, but code, experiments, and notebooks must be independently developed.
• You may not use ChatGPT to directly generate answers for the coursework. High-scoring work must demonstrate your own experimental design, controlled comparisons, failure analysis, and image-level interpretation. ChatGPT or similar tools may be used only in a limited support role such as code understanding, debugging, or grammar support. They must not replace your method design, ablation logic, qualitative analysis, or reflection. Generic AI-produced descriptions without matching evidence in code, tables, figures, and discussion will not receive high marks.
• If you use AI tools or outside code in any meaningful way, you must fully understand, verify, and take ownership of every method, number, figure, and written claim that appears in your submission.

Question 1: Notebook-Based Coding Exercise - Insurance Premium-Risk Classification (60 Marks)

In this coursework you will build and improve a multiclass classifier for a fictionalised health-insurance dataset. The task is to predict whether each applicant belongs to a Low, Standard, or High premium-risk group before pricing a policy. The dataset is intentionally realistic: it mixes numerical and categorical variables, contains missing values and dirty entries, and includes some fields that require careful handling to avoid weak modelling practice or label leakage.

Your work should show a clear machine-learning workflow: build a sensible first pipeline, compare model families, apply stronger hyperparameter optimisation, complete one compulsory improvement category plus at least one optional category, carry out a compact K-Means/Gaussian Mixture Model (GMM) exploration, and then produce a hidden-test CSV using validation evidence only.

The prediction target variable is 'premium_risk', and it has 3 imbalanced classes: Standard, High, Low. The dataset contains 33 raw columns: admin/PII columns, synthetic noise features, 1 leakage feature, and genuine predictors.

Unless otherwise stated, macro-F1 is the primary validation metric because the dataset is imbalanced; accuracy is reported as a secondary metric.

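The choice of macro-F1 over accuracy can be illustrated with a small sketch. It uses only the Python standard library and invented toy labels (the 8/1/1 class split is an assumption for illustration, not the real class distribution), and it compares the two metrics for a degenerate classifier that always predicts the majority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (invented): 8 Standard, 1 High, 1 Low.
y_true = ["Standard"] * 8 + ["High", "Low"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["Standard"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro-F1={mf1:.3f}")
```

Accuracy rewards the majority-only prediction, while macro-F1 averages the per-class scores equally, so the two missed minority classes pull it down sharply.
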
(A) Clean First Pipeline and Baseline Modelling (8 marks)

• Load the provided training and validation files and define a consistent target / feature setup.
• Handle leakage features, dirty values, missing values, and categorical variables sensibly. A compact sanity check is enough; a long data-audit section is not required.
  Important: The dataset contains a leakage feature. You must identify and remove it before proceeding to the next stage of analysis; otherwise, the classification results will be severely biased by this leakage and will not be meaningful. If this occurs, multiple parts of your Coursework 1 may be affected, which could significantly impact your marks.
• Build one baseline modelling pipeline.
• Report at least one validation result using accuracy and macro-F1 score, and include a confusion matrix for the baseline model.
• Keep preprocessing consistent across train, validation, and hidden-test files.

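One rough way to screen for a leakage feature is to ask how well each single column predicts the target through a plain lookup table. The sketch below is a heuristic under stated assumptions, not a prescribed method: the rows and column names (`smoker`, `region`, `risk_band_code`) are invented for illustration and are not taken from the coursework data. A low-cardinality column with near-perfect lookup accuracy deserves a closer look; ID-like columns with one value per row also score 1.0, which is why the number of distinct values is reported alongside.

```python
from collections import Counter, defaultdict


def lookup_accuracy(values, labels):
    """Training accuracy of predicting the majority label for each distinct feature value."""
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(labels)


# Invented toy rows; the column names are hypothetical.
rows = [
    {"smoker": "yes", "region": "north", "risk_band_code": "H", "premium_risk": "High"},
    {"smoker": "no", "region": "south", "risk_band_code": "S", "premium_risk": "Standard"},
    {"smoker": "no", "region": "north", "risk_band_code": "S", "premium_risk": "Standard"},
    {"smoker": "yes", "region": "south", "risk_band_code": "S", "premium_risk": "Standard"},
    {"smoker": "no", "region": "north", "risk_band_code": "L", "premium_risk": "Low"},
    {"smoker": "no", "region": "south", "risk_band_code": "H", "premium_risk": "High"},
]
target = "premium_risk"
labels = [r[target] for r in rows]

# Map each feature column to (lookup accuracy, number of distinct values).
report = {}
for col in rows[0]:
    if col == target:
        continue
    values = [r[col] for r in rows]
    report[col] = (lookup_accuracy(values, labels), len(set(values)))

for col, (acc, n_distinct) in sorted(report.items(), key=lambda kv: -kv[1][0]):
    print(f"{col:>15}: lookup accuracy={acc:.2f}, distinct values={n_distinct}")
```

A flagged column still needs a judgement about what it means and when it would be known; the score alone does not prove leakage.
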
(B) Controlled Comparison: Random Forest and One Boosting Model (8 marks)

• Using the same preprocessing pipeline, validation split, and evaluation metric (the primary metric is macro-F1; also report accuracy), carry out an initial controlled comparison between one Random Forest model and one boosting model.
• Default XGBoost is recommended because it provides a richer tuning space later, but other boosting models may also be used. Default settings or only light, sensible adjustments are acceptable in this section.
• In the notebook, report the validation result of each model and support the comparison with one or two additional analyses, such as class-wise metrics, a confusion matrix, train-versus-validation behaviour, or stability / sensitivity after tuning.
• Your goal is not to prove that one model type always wins. Your goal is to compare the two models fairly, explain the high-level learning difference between bagging and boosting, and use your own notebook evidence to give a careful, dataset-specific interpretation. A generic textbook answer without reference to your own results will receive limited credit.

(C) Advanced Hyperparameter Optimisation (12 marks)

• At least one main model should be tuned with a genuinely advanced strategy such as Optuna/TPE, Bayesian optimisation, Hyperopt, Ray Tune, or another comparably strong approach.
• Hyperparameter tuning should optimise macro-F1 score on the validation set, and the final tuned result should be reported using both accuracy and macro-F1.
• RandomizedSearchCV alone is normally not enough for the top band.
• Explain briefly why your search space and optimiser are reasonable for the chosen model.

(D) Personalised Improvement Work (18 marks)

You must complete one compulsory category based on the last digit of your XJTLU student ID, plus at least one additional optional category of your choice. A second optional category is recommended for stronger differentiation but is not compulsory. You should report accuracy and macro-F1 for improved models and include class-wise metrics where helpful. A compact ablation table should normally be included in the notebook for the personalised improvement work.

Last digit: Compulsory category
0-1: Category A - Data quality and missingness
2-3: Category B - Feature representation and engineering
4-5: Category C - Imbalance and objective design
6-7: Category D - Model robustness, calibration, or ensembling
8-9: Category E - Fairness, diagnostics, or interpretability

For each category: examples of what may be done, and what good evidence looks like.

Category A
  Examples of what may be done: better missing-value strategy; MissForest or iterative imputation; sensible outlier handling; value cleaning
  What good evidence looks like: A concise before/after comparison with a short explanation of why the data handling changed the result

Category B
  Examples of what may be done: feature crosses; grouped categories; alternative encodings; modest feature selection; transformations
  What good evidence looks like: A compact ablation showing what representation changed and whether it helped

Category C
  Examples of what may be done: class weighting; focal-style loss if relevant; sampling / resampling; thresholding logic
  What good evidence looks like: Clear evidence of how minority or harder classes changed, even if overall score moved only slightly

Category D
  Examples of what may be done: bagging/boosting variants; calibration checks; soft voting; stacking; robustness checks
  What good evidence looks like: A meaningful diagnostic or comparison rather than a large collection of loosely connected trials

Category E
  Examples of what may be done: SHAP / feature importance; subgroup-style fairness checks; error analysis; model interpretation
  What good evidence looks like: Concrete insight into model behaviour, not only screenshots

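As one concrete illustration of the class weighting mentioned under Category C, the sketch below computes inverse-frequency weights with the formula n_samples / (n_classes * class_count), which is the same heuristic scikit-learn applies for `class_weight="balanced"`. The class counts are invented for illustration and do not reflect the coursework data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Invented class counts, for illustration only.
y_train = ["Standard"] * 60 + ["High"] * 25 + ["Low"] * 15
weights = balanced_class_weights(y_train)

for cls, w in weights.items():
    print(f"{cls}: weight={w:.3f}")
```

Rarer classes receive larger weights, and the weights summed over all training rows equal the number of rows, so the overall scale of the loss is unchanged. Whether this helps macro-F1 on a given dataset is an empirical question for the ablation table.
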
(E) K-Means and Gaussian Mixture Model (GMM) Exploration (6 marks)

This is a compact exploratory section. It is not the main performance section, and it does not require clusters to match the class labels exactly. The aim is to show your understanding of unsupervised learning methods and your ability to interpret their results carefully.

• Use a sensible processed numeric feature space and briefly explain what you clustered on.
• Explore a small range of cluster/component numbers, such as 2-8.
• For K-Means, provide sensible supporting evidence, such as inertia (SSE), cluster sizes, or another simple analysis.
• For Gaussian Mixture Model (GMM), provide sensible supporting evidence, such as component sizes, posterior confidence/responsibility, or overlap/uncertainty between components.
• Include at least one compact table or figure comparing K-Means and GMM.
• If class labels are used for reference, explain clearly that unsupervised structure does not need to align exactly with supervised labels.
• Stronger work may additionally use silhouette score, log-likelihood trends, or a simple visualization.

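The difference between a K-Means cluster label and a GMM posterior responsibility can be shown with a minimal one-dimensional sketch. It does not fit either model: the means, standard deviations, and mixing weights below are hypothetical values chosen for illustration, and the code only applies each method's assignment rule to a few points.

```python
import math


def hard_assign(x, centers):
    """K-Means rule: index of the nearest center (squared Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def responsibilities(x, weights, means, stds):
    """GMM rule: posterior probability of each component given x (Bayes' rule)."""
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(dens)
    return [d / total for d in dens]


# Hypothetical parameters for two 1-D clusters (not fitted to any real data).
means = [0.0, 4.0]
stds = [0.5, 2.0]  # component 1 is much wider than component 0
weights = [0.5, 0.5]

for x in [0.0, 1.5, 4.0]:
    label = hard_assign(x, means)
    resp = responsibilities(x, weights, means, stds)
    print(f"x={x}: K-Means label={label}, GMM responsibilities={[round(r, 3) for r in resp]}")
```

At x = 1.5 the nearest center is the first one, so K-Means assigns label 0, yet the wider second component gives that point the larger responsibility. K-Means only sees distance to the center, whereas a GMM also models each component's spread and weight and reports a graded membership.
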
(F) Final Model Choice and Hidden-Test Export (8 marks)

• Choose the final model using validation evidence only.
• Retrain appropriately using both the training and validation datasets, and generate the hidden-test CSV in the required format.
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High, Low). Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4 marks from this section.
• Do not tune on the hidden test and do not claim hidden test performance.
• Note: Hidden test score contributes only a small portion of the final marks. High leaderboard rank alone cannot compensate for weak experimental design or poor documentation.

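The required CSV layout can be sketched and self-checked with the standard `csv` module. This is a minimal sketch under several assumptions: the file needs a header row whose names equal the three column names given above, the square brackets in the file name are placeholders rather than literal characters, and the student ID, applicant IDs, and customer keys shown are invented. Check the provided materials for the authoritative template.

```python
import csv
import io

REQUIRED_COLUMNS = ["applicant_id", "customer_key", "premium_risk"]
ALLOWED_LABELS = {"Standard", "High", "Low"}


def write_submission(rows, handle):
    """Write (applicant_id, customer_key, label) rows in the required column order."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(REQUIRED_COLUMNS)  # assumption: a header row is expected
    for applicant_id, customer_key, label in rows:
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        writer.writerow([applicant_id, customer_key, label])


def check_submission(text):
    """Return a list of format problems found in CSV text; an empty list means none found."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    problems = []
    if header != REQUIRED_COLUMNS:
        problems.append(f"header is {header}, expected {REQUIRED_COLUMNS}")
    for line_no, row in enumerate(reader, start=2):
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 fields, got {len(row)}")
        elif row[2] not in ALLOWED_LABELS:
            problems.append(f"line {line_no}: unexpected label {row[2]!r}")
    return problems


# Invented ID and predictions, for illustration only.
student_id = "1234567"
filename = f"test_result_{student_id}.csv"
predictions = [("A0001", "K-17", "Standard"), ("A0002", "K-42", "High"), ("A0003", "K-08", "Low")]

buffer = io.StringIO()
write_submission(predictions, buffer)
csv_text = buffer.getvalue()

print(filename)
print(csv_text)
print(check_submission(csv_text))
```

The sketch writes to an in-memory buffer so it can be run anywhere; in a notebook the same writer could be pointed at `open(filename, "w", newline="")` instead.
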
Coursework Answer Sheet / Theory and Reflection (PDF) - all questions below are compulsory (30 Marks)

The Coursework Answer Sheet / Theory and Reflection PDF should not repeat the notebook section by section. All prompt areas below are compulsory. The PDF must be concise, directly linked to your own notebook evidence, and no longer than 4 pages / 1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from the PDF section. You should aim to demonstrate both your theoretical or algorithmic understanding and your experimental findings or practical observations, and clearly link your understanding of the algorithms to your experimental analysis. At least one table, figure, or metric from the notebook must be referenced in each theory answer.

Prompt areas and what you should do

1. Bagging versus boosting
  (1) Briefly state the definitions and key theoretical properties of bagging and boosting models;
  (2) report the validation results of each model;
  (3) support your comparison with one or two additional analyses, such as class-wise metrics, a confusion matrix, train–validation behaviour, or stability/sensitivity after tuning; and
  (4) provide a careful interpretation of what this comparison suggests about this dataset and how it relates to the theoretical properties of bagging versus boosting methods.
  You are not expected to prove that one model type always performs better.

2. Hyperparameter optimisation
  Explain why your optimiser and search space were reasonable for the chosen model, which hyperparameters you expected to matter most, whether the tuned results matched that intuition, and what you learned from the tuning process.

3. K-Means versus Gaussian Mixture Model (GMM)
  Explain hard versus soft assignment and the main assumption difference between K-Means and GMM. Then use your own compact evidence to discuss whether the results matched your intuition and whether GMM revealed anything extra, such as soft membership, uncertainty, or a better fit to partial cluster structure.

4. Personalised reflection
  Reflect on the compulsory category and on every optional category you implemented. Highlight any unique or interesting algorithm or strategy you tried, the personal challenges you faced, the effort you made to address them, and the key lessons you learned. Honest reflection on a neutral or negative result is acceptable if the reasoning is concrete.

5. AI-use declaration
  State briefly what forms of AI assistance, if any, were used. Generic AI-written theory that does not match your notebook evidence will receive limited credit.

Coding Quality, Coursework Answer Sheet Quality, and Submission Guidelines (10 marks)

• Submit your Jupyter Notebook in .ipynb format. It must be well organised, include clear commentary and clean code practices, and show visible outputs. Do not write a second mini-report repeating notebook content.
• The notebook should be reproducible from start to finish without errors. Results cited in the PDF should be visible in the notebook and should match the reported values.
• If you used supplementary code outside the notebook, submit that code as well so the full workflow remains reproducible.
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High, Low). Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4 marks from this section.
• Submit the Coursework Answer Sheet / Theory and Reflection in PDF format. All questions in that section are compulsory. The Coursework Answer Sheet / Theory and Reflection PDF must answer every required prompt, refer to your own notebook evidence, and remain within 4 pages and 1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from the PDF section.
• Include all required components: Jupyter notebooks (code), any additional experimental scripts or custom code, the hidden test-results CSV file, and the Coursework Answer Sheet PDF. Submit all files through the Learning Mall platform. After submission, download your files and verify that they can be opened and viewed correctly, to confirm that the submission was successful.

Project Material Access Instructions

To access the complete set of materials for this project, please use the links below:

• OneDrive Link: https://1drv.ms/f/c/18f09d1a39585f84/IgCXDMbXkFYSSZUZkkTyXyZzAQ1poX9mujUqF8N3JlL0GD0?e=uNhAHq
• The same coursework materials have also been uploaded to Learning Mall.

When extracting the materials, use the following password to unlock the zip file: DTS304TC (case-sensitive, enter in uppercase).