电脑
anaconda(python3.6)
jupyter notebook
加载模块。# -*- coding: utf-8 -*-import numpy as npimport scipy as spfrom sklearn import treefrom sklearn.metrics import precision_recall_curvefrom sklearn.metrics import classification_reportfrom sklearn.cross_validation import train_test_split个别模块被别的东西取缔了,python给了我们一个提示。
把txt文档里面的数据读取出来:data = []labels = []with open('D:/HintSoft/Hint-W7/Desktop/data.txt') as ifile: for line in ifile: tokens = line.strip().split(' ') data.append([float(tk) for tk in tokens[:-1]]) labels.append(tokens[-1])x = np.array(data)labels = np.array(labels)print(x)print(labels)
把瘦用0代替,把胖用1代替:y = np.zeros(labels.shape)y[labels=='fat']=1
构造训练数据集和测试集:x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)
构造一个未训练的树状分类器:f = tree.DecisionTreeClassifier(criterion='entropy')用训练集来训练这个分类器:f.fit(x_train, y_train)训练的过程,就是拟合。
把训练的树状分类器保存下来:with open('D:/HintSoft/Hint-W7/Desktop/.dot', 'w') as g: g = tree.export_graphviz(f, out_file=g)打开dot文件,可以看到这个决策树的各参数。
用测试集对f进行测试:answer = f.predict(x_test)print(x_test)print(answer)print(y_test)print(np.mean( answer == y_test))
大家试一下,本文的例子,能不能用线性函数来拟合。