CITS5508 Machine Learning Semester 1, 2024
Assignment 2
Assessed, worth 15%. Due: 8pm, Friday 3rd May 2024
Discussion is encouraged, but all work must be done and submitted individually. This assignment has 21 tasks, of which 20 are assessed, totalling 45 marks.
You will develop Python code for classification tasks. You will use Grid Search and cross-validation to find the optimal hyperparameters of the model and discuss and interpret the different decisions and their impact on the model’s performance and interpretability.
1 Submission
Your submission consists of two files. The first file is a report describing your analysis/results. Your analysis should provide the requested plots, tables and your reflections about the results. Each deliverable task is indicated as D and a number. Your report should be submitted as a “.PDF” file. Name your file as assig1 <student id>.pdf (where you should replace <student id> with your student ID).
The second file is your Python notebook with the code supporting your analysis/results. Your code should be submitted as assig1 <student id>.ipynb, the Jupyter notebook extension.
Submit your files to LMS before the due date and time. You can submit them multiple times. Only the latest version will be marked. Your submission will follow the rules provided in LMS.
Important:
• You must submit the first part of your assignment as an electronic file in PDF format (do not send DOCX, ZIP or any other file format). Only PDF format is accepted, and any other file format will receive a zero mark.
• You should provide comments on your code.
• You must deliver parts one and two to have your assignment assessed. That is, your submission should contain your analysis and your Jupyter notebook with all coding, both with appropriate formatting.
• By submitting your assignment, you acknowledge you have read all instructions provided in this document and LMS.
• There is a general FAQ section and a section in your LMS, Assignments - Assignment 2 - Updates, where you will find updates or clarifications about the tasks when necessary. It is your responsibility to check this page regularly.
• You will be assessed on your thinking and process, not only on your results. A perfect performance without demonstrating an understanding of what you have done won’t earn you marks.
• Your answer must be concise. A few sentences (2-5) should be enough to answer most of the open questions. You will be graded on thoughtfulness. If you are writing long answers, rethink what you are doing. Probably, it is the wrong path.
• You can ask in the lab or during consultation if you need clarification about the assignment questions.
• You should be aware that some algorithms can take a while to run. A good approach to improving their speed in Python is to use the vectorised forms discussed in class. In this case, it is strongly recommended that you start your assignment soon to accommodate the computational time.
• For the functions and tasks that require a random procedure (e.g. splitting the data into 80% training and 20% validation set), you should set the seed of the random generator to the value “5508” or the one(s) specified in the question.
2 Dataset
In this assignment, you are asked to train a few decision tree classifiers on the Breast cancer wisconsin (diagnostic) dataset available on Scikit-Learn and compare their performances.
A description of this dataset can be found on the Scikit-Learn web page:
https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-wisconsin-diagnostic-dataset
There are two classes in the dataset:
• malignant (212 instances, class value 0) and
• benign (357 instances, class value 1).
Follow the example code given on the web page
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer
to read the dataset and separate it into a feature matrix and a class vector. Your feature matrix should have 569 rows × 30 columns and your class vector should have 569 elements.
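A minimal sketch of this loading step (using as_frame=True so the features keep their column names, which later tasks rely on):

```python
# Load the data as a (DataFrame, Series) pair.
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
print(X.shape)  # expected: (569, 30)
print(y.shape)  # expected: (569,)
```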
In all asked implementations using Decision Trees, Random Forests or data splits (such as when using train_test_split()), you should set random_state as specified for results reproducibility. You should aim to round your results to the second decimal place.
3 Tasks
First inspections on the dataset and preprocessing
D1 2 marks
Re-order the columns in your feature matrix (or dataframe) based on the column name. Provide a scatter plot to inspect the relationship between the first 10 features in the dataset. Use different colours in the visualisation to show instances coming from each class. (Hint: use a grid plot.)
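One possible approach for D1, assuming the feature matrix is kept as a pandas DataFrame; seaborn's pairplot produces the suggested grid plot:

```python
# Sketch of D1: sort columns by name, then pair-plot the first 10 features.
import seaborn as sns
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X = X.reindex(sorted(X.columns), axis=1)   # re-order columns by name
grid_df = X.iloc[:, :10].copy()            # first 10 features after re-ordering
grid_df["target"] = y
sns.pairplot(grid_df, hue="target")        # one colour per class
```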
D2 2 marks
Provide a few comments about what you can observe from the scatter plot:
• What can be observed regarding the relationship between these features?
• Can you observe the presence of clusters or groups? How do they relate to the target variable?
• Are there any instances that could be outliers?
• Are there features that could be removed? Why or why not?
D3 1 mark
Compute and show the correlation matrix where each cell contains the correlation coefficient between the corresponding pair of features (Hint: you may use the heatmap function from the seaborn package).
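A minimal sketch of the correlation heatmap, following the seaborn hint:

```python
# Sketch of D3: pairwise correlation matrix rendered as a heatmap.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)
plt.figure(figsize=(12, 10))
sns.heatmap(X.corr(), cmap="coolwarm", center=0)  # Pearson correlations
plt.show()
```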
D4 1 mark
Do the correlation coefficients support your previous observations?
D5
In a data science project, it’s crucial not just to remove highly correlated features but to consider the context and implications of feature selection carefully. Blindly dropping features may lead to the loss of valuable information or unintended bias in the model’s performance. Here, for the assignment context, we will drop a few features to simplify the classification tasks and speed up the computational time. Write code that drops the features mean perimeter, mean radius, worst radius, worst perimeter and radius error. These are features with a linear correlation higher than 0.97 in magnitude with some other features kept in the data.
After this process, your data matrix should be updated accordingly and contain 25 features. Task D5 must be performed; otherwise, your other deliverable tasks will be incorrect. However, there are no marks for task D5.
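A sketch of the D5 drop step, assuming the DataFrame columns carry scikit-learn's feature names:

```python
# Sketch of D5: drop the five highly correlated features.
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X = X.drop(columns=["mean perimeter", "mean radius", "worst radius",
                    "worst perimeter", "radius error"])
print(X.shape)  # expected: (569, 25)
```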
Fitting a Decision Tree model with default hyperparameters
D6 3 marks
Fit a decision tree classifier with default hyperparameters using 80% of the data. Remember to set the random generator’s state to the value “5508” (for both the split and the class). Use the trained classifier to perform predictions on the training and test sets. Provide the accuracy, precision and recall scores for both sets and the confusion matrix for the test set.
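One way D6 could be set up; a sketch that assumes X is the 25-feature DataFrame from D5 and y the class vector:

```python
# Sketch of D6: both the split and the model are seeded with 5508.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             confusion_matrix)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=5508)
clf = DecisionTreeClassifier(random_state=5508).fit(X_tr, y_tr)
for name, Xs, ys in [("train", X_tr, y_tr), ("test", X_te, y_te)]:
    pred = clf.predict(Xs)
    print(name,
          round(accuracy_score(ys, pred), 2),
          round(precision_score(ys, pred), 2),
          round(recall_score(ys, pred), 2))
print(confusion_matrix(y_te, clf.predict(X_te)))  # test-set confusion matrix
```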
D7 2 marks
Comment on these results. Do you think your classifier is overfitting? If so, why is this happening? If not, why not?
D8 2 marks
Display the decision tree built from the training process (like the one shown in Figure 6.1 of the textbook for the iris dataset).
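A minimal sketch using scikit-learn's plot_tree, assuming clf and X from the D6 sketch above:

```python
# Sketch of D8: render the fitted tree with feature and class names.
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(20, 10))
plot_tree(clf, feature_names=list(X.columns),
          class_names=["malignant", "benign"], filled=True)
plt.show()
```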
D9 2 marks
Study the tree diagram and comment on the following:
• How many levels resulted from the model?
• Did the diagram help you to confirm whether the classifier has an overfitting issue?
• What can you observe from the leaves?
• Is this an interpretable model?
D10 3 marks
Repeat the data split another four times, each using 80% of the data to train the model and the remaining 20% for testing. For these splits, set the seed of the random state to the values “5509”, “5510”, “5511” and “5512”. The random state of the model can be kept at “5508”.
For each of these four splits, fit the decision tree classifier and use the trained classifier to perform predictions on the test set. Provide three plots to show the accuracy, precision and recall scores for the test set for each split and comment on the consistency of the results in the five splits (including the original split with random state “5508”).
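A sketch of the D10 loop, assuming X and y as before; only the split seed varies while the model's random_state stays at 5508:

```python
# Sketch of D10: five splits with different seeds, same model seed.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

scores = {}
for seed in [5508, 5509, 5510, 5511, 5512]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=seed)
    pred = DecisionTreeClassifier(random_state=5508).fit(X_tr, y_tr).predict(X_te)
    scores[seed] = (accuracy_score(y_te, pred),
                    precision_score(y_te, pred),
                    recall_score(y_te, pred))
# scores can then be plotted, e.g. one line or bar chart per metric.
```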
D11 3 marks
Investigate the impact of the training size on the performance of the model. You will do five different splits: 50%-50% (training the model on 50% of the data and testing on the remaining 50%), 60%-40%, 70%-30%, 80%-20% and 90%-10%. For each of these data splits, set back the seed of the random state to the value “5508”.
Provide three plots to show the accuracy, precision and recall scores for the test set for each data split and comment on the results. Did the performance behave as you expected?
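A sketch of the D11 loop, reusing the imports from the previous sketch; the test fraction is the complement of the training fraction:

```python
# Sketch of D11: vary the test fraction, keep the seed fixed at 5508.
for test_frac in [0.5, 0.4, 0.3, 0.2, 0.1]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_frac,
                                              random_state=5508)
    pred = DecisionTreeClassifier(random_state=5508).fit(X_tr, y_tr).predict(X_te)
    print(f"train={1 - test_frac:.0%}",
          round(accuracy_score(y_te, pred), 2),
          round(precision_score(y_te, pred), 2),
          round(recall_score(y_te, pred), 2))
```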
Fitting a Decision Tree model with optimal hyperparameters
D12 4 marks
Create a training set using 80% of the data and a test set with the remaining 20%. Use 10-fold cross-validation and grid search to find the optimal combination of the decision tree hyperparameters max_depth (using values [2, 3, 4, 5]), min_samples_split (using values [2, 4, 5, 10]) and min_samples_leaf (using values [2, 5]). Remember to set the seed of the random state of the data split function and model class to the value “5508”. For the cross-validation, set the value of the random state to “42”. Use accuracy for the scoring argument of the grid-search function.
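One way the grid search could be wired up; interpreting the cross-validation random state as a shuffled 10-fold KFold with random_state=42 is an assumption about the spec:

```python
# Sketch of D12: grid search over the three tree hyperparameters.
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=5508)
param_grid = {"max_depth": [2, 3, 4, 5],
              "min_samples_split": [2, 4, 5, 10],
              "min_samples_leaf": [2, 5]}
cv = KFold(n_splits=10, shuffle=True, random_state=42)  # assumed CV setup
search = GridSearchCV(DecisionTreeClassifier(random_state=5508),
                      param_grid, scoring="accuracy", cv=cv)
search.fit(X_tr, y_tr)
print(search.best_params_)
```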
With the optimal hyperparameters obtained, retrain the model and report:

• The optimal hyperparameters;
• The obtained accuracy, precision and recall on the training set;
• The obtained accuracy, precision and recall on the test set;
• The confusion matrix on the test set.
D13 2 marks
Comment: What was the impact of fine-tuning the hyperparameters as opposed to what you obtained in D6? Has fine-tuning done what you expected?
D14 3 marks
Repeat the training of task D12 twice: once with the scoring argument of the grid-search function set to precision and once with it set to recall.
For each of the scoring options (accuracy, precision, recall), provide the optimal hyperparame- ters according to the 10-fold cross-validation and grid-search, and, after retraining each model accordingly, provide the confusion matrix on the test set. Comment on the results, considering the problem.
Fitting a Decision Tree with optimal hyperparameters and a reduced feature set
D15 1 mark
Using the model with fine-tuned hyperparameters based on accuracy (the one you obtained in D12), display the feature importance for each feature obtained from the training process. You should sort the feature importances in descending order.
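A minimal sketch for D15, assuming search is the fitted GridSearchCV from D12 (its best_estimator_ is refit on the full training set by default):

```python
# Sketch of D15: feature importances sorted in descending order.
import pandas as pd

best_tree = search.best_estimator_
importances = pd.Series(best_tree.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```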
D16 3 marks
Using the feature importance you calculated in the previous task, trim the feature dimension of the data. That is, you should retain only those features whose importance values are above 1% (i.e., 0.01). You can either write your own Python code or use the function SelectFromModel from the sklearn.feature_selection package to work out which feature(s) can be removed.
Report what features were retained and removed in the above process. Also report the total feature importance value that is retained after your dimension reduction step.
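A sketch of the SelectFromModel route, assuming best_tree and X from the previous sketches; prefit=True reuses the already-trained model:

```python
# Sketch of D16: keep only features with importance above 0.01.
from sklearn.feature_selection import SelectFromModel

selector = SelectFromModel(best_tree, threshold=0.01, prefit=True)
mask = selector.get_support()                      # True for retained features
print("retained:", list(X.columns[mask]))
print("removed:", list(X.columns[~mask]))
print("retained importance:",
      round(best_tree.feature_importances_[mask].sum(), 2))
```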
D17 3 marks
Compare the model’s performance (accuracy, precision, recall) on training and test sets when using the reduced set of features and the model trained on the complete set of features. Also, report the corresponding confusion matrices on the test sets. (You will need to consider whether you should repeat the cross-validation process to find the optimal hyperparameters).
D18 1 mark
Comment on your results. What was the impact (if any) of reducing the number of features?
Fitting a Random Forest
D19 3 marks
Considering all features and the 80%-20% data split you did before, use 10-fold cross-validation and grid search to find a Random Forest classifier’s optimal hyperparameters n_estimators (number of estimators) and max_depth. Remember to set the seed of the random state of the data split function and model class to the value “5508”. Use n_estimators: [10, 20, 50, 100, 1000] and max_depth: [2, 3, 4, 5]. For the cross-validation, set the value of the random state to “42”. Use accuracy for the scoring argument of the grid-search function.
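A sketch mirroring the D12 grid search with a Random Forest, assuming the same 80%-20% split (X_tr, y_tr) and the same shuffled 10-fold CV interpretation as before:

```python
# Sketch of D19: grid search over n_estimators and max_depth.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

param_grid = {"n_estimators": [10, 20, 50, 100, 1000],
              "max_depth": [2, 3, 4, 5]}
cv = KFold(n_splits=10, shuffle=True, random_state=42)  # assumed CV setup
rf_search = GridSearchCV(RandomForestClassifier(random_state=5508),
                         param_grid, scoring="accuracy", cv=cv)
rf_search.fit(X_tr, y_tr)
print(rf_search.best_params_)
```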
Keep the other hyperparameter values at their defaults. Use the optimal values for the n_estimators and max_depth hyperparameters to retrain the model and report:
• The obtained optimal number of estimators and max depth;
• The obtained accuracy, precision and recall on the training set;
• The obtained accuracy, precision and recall on the test set;
• The confusion matrix on the test set.
D20 2 marks
How do these performances compare with the ones you obtained in D12? What changed with the use of a Random Forest model? Is this result what you would expect?
D21 2 marks
Thinking about the application and the different models you created, discuss:
• Do you think these models are good enough and can be trusted for real use?
• Do you think a more complex model is necessary?
• Do you think using a machine learning algorithm for this task is a good idea? That is, should this decision process be automated? Justify.
• Are there any considerations regarding the dataset used?

