Notes on Machine Learning with Python (2)
Environment Problem
- numpy and scipy
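- the usual ones: pip install pandas quandl scikit-learn numpy matplotlib (the PyPI package is scikit-learn; the old "sklearn" package name is deprecated, but the import name is still sklearn)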
- pythonprogramming.net
- github
Regression
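- numpy arrays: xs*ys multiplies each element of xs by the element of ys at the same index (element-wise). This works for numpy arrays only; plain Python lists do not support list * list.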
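A quick check of the difference between numpy arrays and plain lists. The values are hypothetical, and the zip-based comprehension is just one way to get the same result from lists:

```python
import numpy as np

# Hypothetical data points as float arrays.
xs = np.array([1, 2, 3, 4, 5], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6], dtype=np.float64)

# numpy multiplies element-wise: xs[0]*ys[0], xs[1]*ys[1], ...
products = xs * ys
print(products.tolist())  # [5.0, 8.0, 18.0, 20.0, 30.0]

# Plain Python lists do not define list * list.
xs_list = [1, 2, 3]
ys_list = [4, 5, 6]
try:
    xs_list * ys_list
except TypeError:
    print("list * list is not supported")

# Element-wise product for lists, pairing items by index.
pairwise = [x * y for x, y in zip(xs_list, ys_list)]
print(pairwise)  # [4, 10, 18]
```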
Matplotlib
- plt.scatter(): draws a scatter plot
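A minimal scatter plot sketch. The data points are hypothetical, and the Agg backend is chosen here only so the script also runs without a display; in an interactive session you would normally call plt.show() instead of saving to a file:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample points.
xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)

# One marker is drawn for each (x, y) pair.
points = plt.scatter(xs, ys, color="blue", label="data")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("scatter.png")  # hypothetical output file name
```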
Classification
K Nearest Neighbors Application
- dataset
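- numpy.reshape:
example_measures
## a: the array-like object or array to be reshaped
## newshape: an int or a tuple of ints; the new shape must be compatible with the original shape. If it is an int, the result is a 1-D array of that length;
## if it is a tuple, one element may be -1, meaning that dimension is left unspecified and is inferred from the array's length and the remaining dimensions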
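scikit-learn estimators expect a 2-D input of shape (n_samples, n_features), so a single 1-D sample is usually reshaped with reshape(1, -1) before predict. A sketch of the -1 inference; the feature values below are hypothetical:

```python
import numpy as np

# Hypothetical single sample with 9 feature values.
example_measures = np.array([4, 2, 1, 1, 1, 2, 3, 2, 1])
print(example_measures.shape)  # (9,)

# -1 lets numpy infer the dimension: 9 elements in 1 row -> 9 columns.
one_row = example_measures.reshape(1, -1)
print(one_row.shape)  # (1, 9)

# Several stacked samples: reshape(len(x), -1) keeps one row per sample.
two_samples = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1],
                        [4, 2, 1, 2, 2, 2, 3, 2, 1]])
stacked = two_samples.reshape(len(two_samples), -1)
print(stacked.shape)  # (2, 9)

# An int newshape gives a 1-D array of that length.
flat = one_row.reshape(9)
print(flat.shape)  # (9,)
```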
- lib - warnings:
is set to a value less than total voting groups!')
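warnings.warn issues a warning (a UserWarning by default) and lets execution continue, unlike raising an exception. A minimal sketch of a k-NN style guard; check_k is a hypothetical helper, and the condition (warn when the number of groups is at least k) is an assumption based on the message above:

```python
import warnings


def check_k(data, k):
    """Hypothetical guard: warn when k is not larger than the number of groups."""
    if len(data) >= k:
        warnings.warn('k is set to a value less than or equal to the number of voting groups!')


dataset = {'k': [[1, 2]], 'r': [[6, 5]]}  # two groups

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_k(dataset, k=2)  # 2 groups >= k=2, so this warns
    check_k(dataset, k=3)  # k exceeds the group count, no warning

print(len(caught), caught[0].category.__name__)  # 1 UserWarning
```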
- numpy.linalg.norm: the Euclidean distance between two points is the L2 norm of their difference
np.linalg.norm(np.array(features) - np.array(predict))
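To see that the norm of the difference really is the Euclidean distance, compare it with the formula written out by hand. The two points are hypothetical and chosen so the answer is a whole number:

```python
from math import sqrt

import numpy as np

features = [1, 2]
predict = [4, 6]

# L2 norm of the difference vector = Euclidean distance.
fast = np.linalg.norm(np.array(features) - np.array(predict))

# By hand: sqrt((1-4)^2 + (2-6)^2) = sqrt(9 + 16) = 5
manual = sqrt(sum((f - p) ** 2 for f, p in zip(features, predict)))
print(fast, manual)  # 5.0 5.0
```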
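- python dictionary (class label -> list of points in that class):
import numpy as np

dataset = {'k': [[1, 2], [2, 3], [3, 1]], 'r': [[6, 5], [7, 7], [8, 6]]}
new_features = [5, 7]
distances = []
for group in dataset:
    for features in dataset[group]:
        # distance from each known point to the point being classified
        euclidean_distance = np.linalg.norm(np.array(features) - np.array(new_features))
        distances.append([euclidean_distance, group])
- lib - collections.Counter: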
from collections import Counter
vote_result = Counter(votes).most_common(1)[0][0]
- most_common(n) returns a list of tuples; the 1 here means only the single most common element is returned
- each tuple is (element, count), so [0][0] picks the most common element itself
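Putting the distance loop and the Counter vote together gives a small k-NN classifier. This is a sketch, not necessarily the tutorial's exact code; the function name k_nearest_neighbors and the default k=3 are assumptions:

```python
from collections import Counter

import numpy as np

# most_common(1) returns [(element, count)]
print(Counter(['r', 'k', 'r']).most_common(1))  # [('r', 2)]


def k_nearest_neighbors(data, predict, k=3):
    """Sketch: classify `predict` by majority vote among its k closest points."""
    distances = []
    for group in data:
        for features in data[group]:
            distance = np.linalg.norm(np.array(features) - np.array(predict))
            distances.append([distance, group])
    # Labels of the k smallest distances (sorted compares the distance first).
    votes = [group for _, group in sorted(distances)[:k]]
    # [0][0]: first tuple, then its element (not its count).
    return Counter(votes).most_common(1)[0][0]


dataset = {'k': [[1, 2], [2, 3], [3, 1]], 'r': [[6, 5], [7, 7], [8, 6]]}
print(k_nearest_neighbors(dataset, [5, 7]))  # r
```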
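- negative indexing: use [-n] on lists flexibly, e.g. full_data[:-n] is everything except the last n items and full_data[-n:] is the last n items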
test_size = 0.2
train_set = {2: [], 4: []}  # class label -> list of feature rows
test_set = {2: [], 4: []}
# everything except the last 20% goes to training, the last 20% to testing
train_data = full_data[:-int(test_size*len(full_data))]
test_data = full_data[-int(test_size*len(full_data)):]
for i in train_data:
    train_set[i[-1]].append(i[:-1])  # i[-1] is the class, i[:-1] the features
for i in test_data:
    test_set[i[-1]].append(i[:-1])
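A self-contained version of the split on hypothetical toy rows (two features plus a class label of 2 or 4 in the last column). The shuffle and the fixed seed are additions for this sketch, not taken from the notes above:

```python
import random

# Hypothetical toy rows: two feature values followed by the class label.
full_data = [[1, 1, 2], [5, 4, 4], [2, 1, 2], [6, 5, 4], [1, 2, 2],
             [7, 6, 4], [2, 2, 2], [5, 6, 4], [1, 3, 2], [6, 6, 4]]
random.seed(0)             # fixed seed only so the sketch is repeatable
random.shuffle(full_data)  # shuffle so the split is not ordered by class

test_size = 0.2
n_test = int(test_size * len(full_data))  # 2 of the 10 rows

train_set = {2: [], 4: []}
test_set = {2: [], 4: []}

# len - n_test instead of -n_test stays correct even when n_test == 0
train_data = full_data[:len(full_data) - n_test]
test_data = full_data[len(full_data) - n_test:]

for row in train_data:
    train_set[row[-1]].append(row[:-1])  # row[-1]: label, row[:-1]: features
for row in test_data:
    test_set[row[-1]].append(row[:-1])

print(len(train_data), len(test_data))  # 8 2
```

The reason for slicing with len(full_data) - n_test: if the test fraction rounds down to 0 rows, full_data[:-0] is the same as full_data[:0], which is an empty list, so the original slicing would silently leave the training set empty.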