[PythonAI] 5.2.3 Skills Lab: Classifying Iris Flowers with the K-Nearest Neighbors Algorithm

张开发
2026/4/10 13:17:12 · 15 min read


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# kneighbors_classifier.py
"""
K-nearest neighbors in practice: iris classification
Introductory machine learning with Scikit-learn on UnionTech UOS
"""

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt

# Font with CJK glyphs, kept from the original script; harmless for English labels
plt.rcParams["font.sans-serif"] = ["WenQuanYi Zen Hei"]
plt.rcParams["axes.unicode_minus"] = False


def load_and_explore_data():
    """Load and explore the data."""
    # Load the iris dataset
    iris = load_iris()

    # Build a DataFrame with a numeric target and a readable species column
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df["target"] = iris.target
    df["species"] = df["target"].map({
        0: "Iris setosa",
        1: "Iris versicolor",
        2: "Iris virginica",
    })

    print("=" * 60)
    print("Iris dataset exploration")
    print("=" * 60)
    print(f"Number of samples: {len(df)}")
    print(f"Number of features: {len(iris.feature_names)}")
    print(f"Number of classes: {len(iris.target_names)}")
    print(f"\nFeature names: {iris.feature_names}")
    print(f"\nClass distribution:\n{df['species'].value_counts()}")
    print(f"\nData preview:\n{df.head()}")
    print(f"\nStatistical summary:\n{df.describe()}")

    return df, iris


def visualize_data(df):
    """Plot the data and save the figure to iris_exploration.png."""
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    colors = ["#e74c3c", "#2ecc71", "#3498db"]
    species_list = df["species"].unique()

    # Sepal length vs. sepal width
    for i, species in enumerate(species_list):
        subset = df[df["species"] == species]
        axes[0, 0].scatter(subset["sepal length (cm)"], subset["sepal width (cm)"],
                           c=colors[i], label=species, alpha=0.7, s=60)
    axes[0, 0].set_xlabel("Sepal length (cm)")
    axes[0, 0].set_ylabel("Sepal width (cm)")
    axes[0, 0].set_title("Sepal feature distribution")
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)

    # Petal length vs. petal width
    for i, species in enumerate(species_list):
        subset = df[df["species"] == species]
        axes[0, 1].scatter(subset["petal length (cm)"], subset["petal width (cm)"],
                           c=colors[i], label=species, alpha=0.7, s=60)
    axes[0, 1].set_xlabel("Petal length (cm)")
    axes[0, 1].set_ylabel("Petal width (cm)")
    axes[0, 1].set_title("Petal feature distribution")
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)

    # Box plot of each feature (long format: one row per sample and feature)
    df_melted = df.melt(id_vars=["species"],
                        value_vars=["sepal length (cm)", "sepal width (cm)",
                                    "petal length (cm)", "petal width (cm)"])
    df_melted.boxplot(column="value", by="variable", ax=axes[1, 0])
    axes[1, 0].set_title("Feature distribution (box plot)")
    axes[1, 0].set_xlabel("Feature")

    # Pie chart of the class distribution
    species_counts = df["species"].value_counts()
    axes[1, 1].pie(species_counts, labels=species_counts.index, autopct="%1.1f%%",
                   colors=colors, startangle=90)
    axes[1, 1].set_title("Class distribution")

    plt.tight_layout()
    plt.savefig("iris_exploration.png", dpi=150, bbox_inches="tight")
    plt.show()
    print("\n✓ Visualization saved")


def train_knn_model(X_train, X_test, y_train, y_test):
    """Train a K-nearest neighbors model and evaluate it on the test set."""
    print("\n" + "=" * 60)
    print("K-nearest neighbors model training")
    print("=" * 60)

    # Standardize the features (KNN is sensitive to feature scale)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Search for the best K with 5-fold cross-validation
    k_range = range(1, 31)
    cv_scores = []
    for k in k_range:
        knn = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(knn, X_train_scaled, y_train, cv=5)
        cv_scores.append(scores.mean())

    best_k = list(k_range)[np.argmax(cv_scores)]
    print(f"Best K: {best_k} (cross-validation accuracy: {max(cv_scores):.3f})")

    # Train the final model with the best K
    best_knn = KNeighborsClassifier(n_neighbors=best_k)
    best_knn.fit(X_train_scaled, y_train)

    # Predict
    y_pred = best_knn.predict(X_test_scaled)

    # Evaluate
    accuracy = accuracy_score(y_test, y_pred)
    print(f"\nTest set accuracy: {accuracy:.3f}")
    print(f"\nClassification report:\n"
          f"{classification_report(y_test, y_pred, target_names=load_iris().target_names)}")

    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    print(f"Confusion matrix:\n{cm}")

    return best_knn, scaler, accuracy


def predict_new_sample(model, scaler, sample):
    """Predict the species of a new sample."""
    sample_scaled = scaler.transform([sample])
    prediction = model.predict(sample_scaled)
    probabilities = model.predict_proba(sample_scaled)

    iris = load_iris()
    species = iris.target_names[prediction[0]]

    print(f"\nNew sample features: {sample}")
    print(f"Predicted class: {species}")
    print(f"Predicted probabilities: {dict(zip(iris.target_names, probabilities[0]))}")

    return species


def main():
    """Main entry point."""
    # 1. Load the data
    df, iris = load_and_explore_data()

    # 2. Visualize
    visualize_data(df)

    # 3. Prepare the data
    X = df.drop(["target", "species"], axis=1)
    y = df["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    print(f"\nTraining set size: {len(X_train)}")
    print(f"Test set size: {len(X_test)}")

    # 4. Train the model
    model, scaler, accuracy = train_knn_model(X_train, X_test, y_train, y_test)

    # 5. Predict new samples
    print("\n" + "=" * 60)
    print("New sample prediction demo")
    print("=" * 60)

    # Example: a newly measured iris flower
    new_flower = [5.1, 3.5, 1.4, 0.2]  # typical Iris setosa measurements
    predict_new_sample(model, scaler, new_flower)

    new_flower2 = [6.3, 3.3, 6.0, 2.5]  # typical Iris virginica measurements
    predict_new_sample(model, scaler, new_flower2)

    print("\n" + "=" * 60)
    print("KNN classification lab complete")
    print("=" * 60)


if __name__ == "__main__":
    main()

Run output

(uos_ai_env) Muhtar@UOS-Desktop:~/AI_Projects$ python3 kneighbors_classifier.py
============================================================
Iris dataset exploration
============================================================
Number of samples: 150
Number of features: 4
Number of classes: 3

Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Class distribution:
Iris setosa        50
Iris versicolor    50
Iris virginica     50
Name: species, dtype: int64

Data preview:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target      species
0                5.1               3.5                1.4               0.2       0  Iris setosa
1                4.9               3.0                1.4               0.2       0  Iris setosa
2                4.7               3.2                1.3               0.2       0  Iris setosa
3                4.6               3.1                1.5               0.2       0  Iris setosa
4                5.0               3.6                1.4               0.2       0  Iris setosa

Statistical summary:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
kneighbors_classifier.py:95: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  plt.show()

✓ Visualization saved

Training set size: 120
Test set size: 30

============================================================
K-nearest neighbors model training
============================================================
Best K: 5 (cross-validation accuracy: 0.967)

Test set accuracy: 0.933

Classification report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       0.83      1.00      0.91        10
   virginica       1.00      0.80      0.89        10

    accuracy                           0.93        30
   macro avg       0.94      0.93      0.93        30
weighted avg       0.94      0.93      0.93        30

Confusion matrix:
[[10  0  0]
 [ 0 10  0]
 [ 0  2  8]]

============================================================
New sample prediction demo
============================================================
/home/Muhtar/AI_Projects/uos_ai_env/lib/python3.7/site-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names
  "X does not have valid feature names, but"

New sample features: [5.1, 3.5, 1.4, 0.2]
Predicted class: setosa
Predicted probabilities: {'setosa': 1.0, 'versicolor': 0.0, 'virginica': 0.0}
/home/Muhtar/AI_Projects/uos_ai_env/lib/python3.7/site-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names
  "X does not have valid feature names, but"

New sample features: [6.3, 3.3, 6.0, 2.5]
Predicted class: virginica
Predicted probabilities: {'setosa': 0.0, 'versicolor': 0.0, 'virginica': 1.0}

============================================================
KNN classification lab complete
============================================================
(uos_ai_env) Muhtar@UOS-Desktop:~/AI_Projects$
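
Two details of the run above are worth a second look. The script standardizes the whole training set once and then cross-validates on the already-scaled data, so every validation fold has contributed to the scaler's mean and variance. It also passes a plain list to a scaler that was fitted on a DataFrame, which is what triggers the "X does not have valid feature names" warning. The sketch below shows one possible way to handle both, assuming scikit-learn's Pipeline and GridSearchCV are acceptable replacements for the manual K loop. It is a variation on the script, not the author's method, and the variable names are chosen here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Same split settings as the script: 80/20, stratified, fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaler and classifier form one estimator, so each cross-validation fold
# refits the scaler on that fold's training portion only.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Try K = 1..30 with 5-fold cross-validation, as in the manual loop
search = GridSearchCV(
    pipeline,
    param_grid={"knn__n_neighbors": list(range(1, 31))},
    cv=5,
)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print(f"Best K: {best_k} (CV accuracy: {search.best_score_:.3f})")
print(f"Test accuracy: {test_accuracy:.3f}")

# Wrapping the new sample in a DataFrame with the training column names
# keeps the feature names consistent with what the scaler was fitted on.
new_flower = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=iris.feature_names)
predicted = iris.target_names[search.predict(new_flower)[0]]
print(f"Predicted class: {predicted}")
```

Because the scaler is now refitted inside each fold, the selected K and the reported scores may differ slightly from the run shown above.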

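The training function standardizes the features because KNN is sensitive to scale. A small standard-library sketch can make that concrete. The two training points, their labels "A" and "B", and the query point are hypothetical values invented for this illustration and are not taken from the iris data. The sketch runs a 1-nearest-neighbor lookup with the second feature expressed in centimeters and then in millimeters, with and without z-score standardization (population standard deviation, the same convention StandardScaler uses).

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(train, query):
    """1-NN: label of the training point closest to the query."""
    return min(train, key=lambda item: euclidean(item[0], query))[1]


def standardize(points, reference):
    """Z-score each feature using the mean and population std of reference."""
    columns = list(zip(*reference))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def nearest_label_scaled(train, query):
    """1-NN after standardizing both the training points and the query."""
    points = [p for p, _ in train]
    labels = [label for _, label in train]
    scaled_points = standardize(points, points)
    scaled_query = standardize([query], points)[0]
    return nearest_label(list(zip(scaled_points, labels)), scaled_query)


def width_to_mm(point):
    """Convert the second feature from cm to mm; the first stays in cm."""
    length_cm, width_cm = point
    return (length_cm, width_cm * 10)


# Hypothetical data: (length in cm, width in cm)
train_cm = [((1.5, 0.3), "A"), ((3.2, 1.6), "B")]
query_cm = (3.0, 0.3)

train_mm = [(width_to_mm(p), label) for p, label in train_cm]
query_mm = width_to_mm(query_cm)

raw_cm = nearest_label(train_cm, query_cm)
raw_mm = nearest_label(train_mm, query_mm)
scaled_cm = nearest_label_scaled(train_cm, query_cm)
scaled_mm = nearest_label_scaled(train_mm, query_mm)

print(f"Raw, width in cm:    {raw_cm}")     # B
print(f"Raw, width in mm:    {raw_mm}")     # A
print(f"Scaled, width in cm: {scaled_cm}")  # A
print(f"Scaled, width in mm: {scaled_mm}")  # A
```

On the raw values the nearest neighbor changes with the unit chosen for the width, while the standardized lookup gives the same answer in both unit systems, since z-scores do not change when a feature is multiplied by a positive constant.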