[PythonAI] 5.2.3 Skills Lab: Classifying Iris Flowers with the K-Nearest Neighbors Algorithm

张开发

• 2026/4/10 13:17:12 • 15 min read


kneighbors_classifier.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
K-nearest neighbors in practice: iris classification
UnionTech UOS + Scikit-learn: an introduction to machine learning
"""

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt

# Font with CJK glyph coverage, as configured in the original script on UOS
plt.rcParams['font.sans-serif'] = ['WenQuanYi Zen Hei']
plt.rcParams['axes.unicode_minus'] = False


def load_and_explore_data():
    """Load and explore the data."""
    # Load the iris dataset
    iris = load_iris()

    # Build a DataFrame with a numeric target and a readable species column
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['target'] = iris.target
    df['species'] = df['target'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

    print("=" * 60)
    print("Iris dataset exploration")
    print("=" * 60)
    print(f"Samples: {len(df)}")
    print(f"Features: {len(iris.feature_names)}")
    print(f"Classes: {len(iris.target_names)}")
    print(f"\nFeature names: {iris.feature_names}")
    print(f"\nClass distribution:\n{df['species'].value_counts()}")
    print(f"\nData preview:\n{df.head()}")
    print(f"\nSummary statistics:\n{df.describe()}")

    return df, iris


def visualize_data(df):
    """Visualize the data."""
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # Sepal length vs. sepal width
    colors = ['#e74c3c', '#2ecc71', '#3498db']
    species_list = df['species'].unique()

    for i, species in enumerate(species_list):
        subset = df[df['species'] == species]
        axes[0, 0].scatter(subset['sepal length (cm)'], subset['sepal width (cm)'],
                           c=colors[i], label=species, alpha=0.7, s=60)
    axes[0, 0].set_xlabel('Sepal length (cm)')
    axes[0, 0].set_ylabel('Sepal width (cm)')
    axes[0, 0].set_title('Sepal feature distribution')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)

    # Petal length vs. petal width
    for i, species in enumerate(species_list):
        subset = df[df['species'] == species]
        axes[0, 1].scatter(subset['petal length (cm)'], subset['petal width (cm)'],
                           c=colors[i], label=species, alpha=0.7, s=60)
    axes[0, 1].set_xlabel('Petal length (cm)')
    axes[0, 1].set_ylabel('Petal width (cm)')
    axes[0, 1].set_title('Petal feature distribution')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)

    # Box plot of each feature (long format: one row per sample and feature)
    df_melted = df.melt(id_vars=['species'],
                        value_vars=['sepal length (cm)', 'sepal width (cm)',
                                    'petal length (cm)', 'petal width (cm)'])
    df_melted.boxplot(by='variable', ax=axes[1, 0])
    axes[1, 0].set_title('Feature distribution box plot')
    axes[1, 0].set_xlabel('Feature')

    # Pie chart of the class distribution
    species_counts = df['species'].value_counts()
    axes[1, 1].pie(species_counts, labels=species_counts.index, autopct='%1.1f%%',
                   colors=colors, startangle=90)
    axes[1, 1].set_title('Sample class distribution')

    plt.tight_layout()
    plt.savefig('iris_exploration.png', dpi=150, bbox_inches='tight')
    plt.show()
    print("\n✓ Visualization saved")


def train_knn_model(X_train, X_test, y_train, y_test):
    """Train the K-nearest neighbors model."""
    print("\n" + "=" * 60)
    print("K-nearest neighbors model training")
    print("=" * 60)

    # Standardize the features (KNN is sensitive to feature scale)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Search for the best K with 5-fold cross-validation
    k_range = range(1, 31)
    cv_scores = []

    for k in k_range:
        knn = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(knn, X_train_scaled, y_train, cv=5)
        cv_scores.append(scores.mean())

    best_k = list(k_range)[np.argmax(cv_scores)]
    print(f"Best K: {best_k} (cross-validation accuracy: {max(cv_scores):.3f})")

    # Train the final model with the best K
    best_knn = KNeighborsClassifier(n_neighbors=best_k)
    best_knn.fit(X_train_scaled, y_train)

    # Predict
    y_pred = best_knn.predict(X_test_scaled)

    # Evaluate
    accuracy = accuracy_score(y_test, y_pred)
    print(f"\nTest set accuracy: {accuracy:.3f}")
    print(f"\nClassification report:\n"
          f"{classification_report(y_test, y_pred, target_names=load_iris().target_names)}")

    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    print(f"Confusion matrix:\n{cm}")

    return best_knn, scaler, accuracy


def predict_new_sample(model, scaler, sample):
    """Predict the class of a new sample."""
    sample_scaled = scaler.transform([sample])
    prediction = model.predict(sample_scaled)
    probabilities = model.predict_proba(sample_scaled)

    iris = load_iris()
    species = iris.target_names[prediction[0]]

    print(f"\nNew sample features: {sample}")
    print(f"Predicted class: {species}")
    print(f"Predicted probabilities: {dict(zip(iris.target_names, probabilities[0]))}")

    return species


def main():
    """Main function."""
    # 1. Load the data
    df, iris = load_and_explore_data()

    # 2. Visualize
    visualize_data(df)

    # 3. Prepare the data
    X = df.drop(['target', 'species'], axis=1)
    y = df['target']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    print(f"\nTraining set size: {len(X_train)}")
    print(f"Test set size: {len(X_test)}")

    # 4. Train the model
    model, scaler, accuracy = train_knn_model(X_train, X_test, y_train, y_test)

    # 5. Predict new samples
    print("\n" + "=" * 60)
    print("New sample prediction demo")
    print("=" * 60)

    # Example: a newly measured iris flower
    new_flower = [5.1, 3.5, 1.4, 0.2]  # typical setosa measurements
    predict_new_sample(model, scaler, new_flower)

    new_flower2 = [6.3, 3.3, 6.0, 2.5]  # typical virginica measurements
    predict_new_sample(model, scaler, new_flower2)

    print("\n" + "=" * 60)
    print("KNN classification walkthrough complete")
    print("=" * 60)


if __name__ == "__main__":
    main()

Output

(uos_ai_env) Muhtar@UOS-Desktop:~/AI_Projects$ python3 kneighbors_classifier.py
============================================================
Iris dataset exploration
============================================================
Samples: 150
Features: 4
Classes: 3

Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Class distribution:
setosa        50
versicolor    50
virginica     50
Name: species, dtype: int64

Data preview:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target species
0                5.1               3.5                1.4               0.2       0  setosa
1                4.9               3.0                1.4               0.2       0  setosa
2                4.7               3.2                1.3               0.2       0  setosa
3                4.6               3.1                1.5               0.2       0  setosa
4                5.0               3.6                1.4               0.2       0  setosa

Summary statistics:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
kneighbors_classifier.py:95: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  plt.show()

✓ Visualization saved

Training set size: 120
Test set size: 30

============================================================
K-nearest neighbors model training
============================================================
Best K: 5 (cross-validation accuracy: 0.967)

Test set accuracy: 0.933

Classification report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       0.83      1.00      0.91        10
   virginica       1.00      0.80      0.89        10

    accuracy                           0.93        30
   macro avg       0.94      0.93      0.93        30
weighted avg       0.94      0.93      0.93        30

Confusion matrix:
[[10  0  0]
 [ 0 10  0]
 [ 0  2  8]]

============================================================
New sample prediction demo
============================================================
/home/Muhtar/AI_Projects/uos_ai_env/lib/python3.7/site-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names
  "X does not have valid feature names, but"

New sample features: [5.1, 3.5, 1.4, 0.2]
Predicted class: setosa
Predicted probabilities: {'setosa': 1.0, 'versicolor': 0.0, 'virginica': 0.0}
/home/Muhtar/AI_Projects/uos_ai_env/lib/python3.7/site-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names
  "X does not have valid feature names, but"

New sample features: [6.3, 3.3, 6.0, 2.5]
Predicted class: virginica
Predicted probabilities: {'setosa': 0.0, 'versicolor': 0.0, 'virginica': 1.0}

============================================================
KNN classification walkthrough complete
============================================================
(uos_ai_env) Muhtar@UOS-Desktop:~/AI_Projects$
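
Two details of the run above are worth a second look. The scaler is fitted on the whole training set before cross-validation, so each validation fold has already influenced the scaling statistics. The scaler is also fitted on a DataFrame but later asked to transform a plain Python list, which is what produces the feature-name UserWarning in the output. The following is a minimal sketch of one possible variation, not the article's own method: it wraps the scaler and classifier in a scikit-learn Pipeline, searches K with GridSearchCV, and works on NumPy arrays throughout. The names pipe and search are illustrative, and the best K it reports may differ from the value in the run above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Plain NumPy arrays everywhere, so fit and predict see the same input type
# and no feature-name mismatch can occur.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Scaling lives inside the pipeline, so it is refitted on the training part
# of every cross-validation fold rather than once on the full training set.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Same search range as the article's loop: K = 1..30 with 5-fold CV.
search = GridSearchCV(pipe, {"knn__n_neighbors": list(range(1, 31))}, cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print(f"Best K: {best_k} (cross-validation accuracy: {search.best_score_:.3f})")
print(f"Test set accuracy: {test_accuracy:.3f}")

# Predict a new flower; the pipeline applies the fitted scaler automatically.
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])
predicted_species = iris.target_names[search.predict(new_flower)[0]]
print(f"Predicted class: {predicted_species}")
```

Because GridSearchCV refits the best pipeline on the full training set by default, search can be used directly as the final model, and there is no separate scaler object to pass around when predicting.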
