Machine Learning Foundations Week 2 Notes (Coursera)
Learning to Answer Yes/No
Perceptron Hypothesis Set
Example: Credit Approval Problem Revisited
A Simple Hypothesis Set: the 'Perceptron'
For $x = (x_1, x_2, \ldots, x_d)$, the 'features of the customer', compute a weighted 'score' and:
<center>$approve \quad credit \quad if \quad \sum_{i=1}^{d} w_i x_i > threshold$</center>
<center>$deny \quad credit \quad if \quad \sum_{i=1}^{d} w_i x_i < threshold$</center>
$y$: $\{+1 (\text{good}), -1 (\text{bad})\}$, 0 ignored. The linear formulas $h \in H$ are:
<center>$h(x)=sign\left(\left(\sum_{i=1}^{d}w_i x_i\right)-threshold\right)$</center>
This is historically called the 'perceptron' hypothesis.
Vector Form of Perceptron Hypothesis
$$
\begin{split}
h(x) &= sign\left(\left(\sum_{i=1}^{d} w_i x_i\right) - threshold\right) \\
&= sign\left(\left(\sum_{i=1}^{d} w_i x_i\right) + \underbrace{(-threshold)}_{w_0} \cdot \underbrace{(+1)}_{x_0}\right) \\
&= sign\left(\sum_{i=0}^{d} w_i x_i\right) \\
&= sign(w^T x)
\end{split}
$$
- each 'tall' $w$ represents a hypothesis $h$ and is multiplied with the 'tall' $x$; we will use the tall versions to simplify notation
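A minimal runnable sketch of the vector-form hypothesis in Python/NumPy (the function name `perceptron_h` and the example numbers are my own, not from the course):

```python
import numpy as np

def perceptron_h(w, x):
    """Perceptron hypothesis h(x) = sign(w^T x).

    Uses the 'tall' vectors: x = (1, x_1, ..., x_d) with x_0 = 1,
    and w = (-threshold, w_1, ..., w_d) with w_0 = -threshold.
    (sign(0) is mapped to -1 here; the course simply ignores 0.)
    """
    return 1 if np.dot(w, x) > 0 else -1

# hypothetical example: two features, threshold 0.5
w = np.array([-0.5, 0.8, 0.3])   # w_0 = -threshold
x = np.array([1.0, 0.6, 0.4])    # x_0 = 1, then the two features
print(perceptron_h(w, x))        # 0.8*0.6 + 0.3*0.4 - 0.5 = 0.1 > 0 -> +1
```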
Q: What does $h$ look like?
Perceptrons in $\mathbb{R}^2$
perceptrons ⇔ linear (binary) classifiers: in $\mathbb{R}^2$, $h(x) = sign(w_0 + w_1 x_1 + w_2 x_2)$, so each hypothesis is a line with the +1 region on one side and the −1 region on the other
Perceptron Learning Algorithm (PLA)
Select g from H
- want: $g \approx f$ (hard when $f$ is unknown)
- almost necessary: $g \approx f$ on $D$, ideally $g(x_n) = f(x_n) = y_n$
- difficult: $H$ is of infinite size
- idea: start from some $g_0$, and 'correct' its mistakes on $D$
Perceptron Learning Algorithm
For $t = 0, 1, \ldots$

1. find a mistake of $w_t$, called $(x_{n(t)}, y_{n(t)})$:
<center>$sign(w_t^T x_{n(t)}) \ne y_{n(t)}$</center>
2. (try to) correct the mistake by
<center>$w_{t+1} \leftarrow w_t + y_{n(t)} x_{n(t)}$</center>

… until no more mistakes; return the last $w$ (called $w_{PLA}$) as $g$
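A runnable sketch of the loop above (the array layout and the names `pla`, `max_iters` are my assumptions, not from the course):

```python
import numpy as np

def pla(X, y, max_iters=10_000):
    """Perceptron Learning Algorithm (a sketch).

    X: (N, d+1) array; each row is a 'tall' x_n with x_n[0] = 1.
    y: (N,) array of labels in {-1, +1}.
    Returns the final weights w_PLA (= g, if the loop halts).
    """
    w = np.zeros(X.shape[1])                 # start from w_0 = 0
    for _ in range(max_iters):
        preds = np.where(X @ w > 0, 1, -1)   # sign(w_t^T x_n), sign(0) -> -1
        mistakes = np.flatnonzero(preds != y)
        if mistakes.size == 0:               # no more mistakes: halt
            return w
        n = mistakes[0]                      # a mistake (x_{n(t)}, y_{n(t)})
        w = w + y[n] * X[n]                  # correct it: w <- w + y_n x_n
    return w                                 # did not halt within max_iters
```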
My question: why does $w$ behave like a vector here (it is the normal vector of the separating boundary)?
Practical Implementation of PLA
Cyclic PLA: scan the examples $x_1, \ldots, x_N$ in order (or in a precomputed random order), correct any mistake encountered, and halt after a full cycle with no mistakes.
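A sketch of the cyclic variant under the same array conventions as above (names are mine; `shuffle=True` gives the 'precomputed random cycle' option):

```python
import numpy as np

def cyclic_pla(X, y, shuffle=False, seed=0, max_passes=1000):
    """Cyclic PLA: visit examples in a fixed cycle and correct mistakes
    as they are encountered; halt after one full pass with no mistake."""
    w = np.zeros(X.shape[1])
    order = np.arange(len(y))
    if shuffle:                               # precomputed random cycle
        order = np.random.default_rng(seed).permutation(len(y))
    for _ in range(max_passes):
        mistake_found = False
        for n in order:
            if (1 if X[n] @ w > 0 else -1) != y[n]:
                w = w + y[n] * X[n]           # correct immediately
                mistake_found = True
        if not mistake_found:                 # clean pass: halt
            break
    return w
```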
Some Remaining Issues of PLA
Algorithmic: halt (no mistake)?
- naive cyclic: ??
- random cyclic: ??
- other variant: ??
Learning: $g \approx f$?
- on $D$, if PLA halts, yes (no mistakes)
- outside $D$: ??
- if not halting: ??
Guarantee of PLA
Linear Separability
- if PLA halts (i.e., no more mistakes), then (as a necessary condition) $D$ allows some $w$ to make no mistakes
- call such $D$ linearly separable
PLA Fact: $w_t$ Gets More Aligned with $w_f$
$w_f$ perfect, hence every $x_n$ is correctly on its side of the line: $y_{n(t)} w_f^T x_{n(t)} \ge \underset{n}{\min}\, y_n w_f^T x_n > 0$
$w_f^T w_t$ increases by updating with any $(x_{n(t)}, y_{n(t)})$:
$$
\begin{split}
w_f^T w_{t+1} &= w_f^T (w_t + y_{n(t)} x_{n(t)}) \\
&\ge w_f^T w_t + \underset{n}{\min}\, y_n w_f^T x_n \\
&> w_f^T w_t + 0
\end{split}
$$
$w_t$ appears more aligned with $w_f$
Q: but a larger inner product may simply come from a longer vector; what about the length of $w_t$?
PLA Fact: $w_t$ Does Not Grow Too Fast
$w_t$ changes only when there is a mistake
<center>$\Leftrightarrow sign(w_t^T x_{n(t)}) \ne y_{n(t)} \Leftrightarrow y_{n(t)} w_t^T x_{n(t)} \le 0$</center>
- a mistake 'limits' the growth of $||w_t||^2$, even when updating with the 'longest' $x_n$:
$$
\begin{split}
||w_{t+1}||^2 &= ||w_t + y_{n(t)} x_{n(t)}||^2 \\
&= ||w_t||^2 + 2\, y_{n(t)} w_t^T x_{n(t)} + ||y_{n(t)} x_{n(t)}||^2 \\
&\le ||w_t||^2 + 0 + ||y_{n(t)} x_{n(t)}||^2 \\
&\le ||w_t||^2 + \underset{n}{\max} ||y_n x_n||^2
\end{split}
$$
(note that since $y_n = \pm 1$, $||y_n x_n||^2 = ||x_n||^2$)
start from $w_0 = 0$; after $T$ mistake corrections,
$$
\frac{w_f^T w_T}{||w_f||\,||w_T||} \ge \sqrt{T} \cdot constant
$$
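Filling in the constant (a standard step; write $\rho = \underset{n}{\min}\, y_n w_f^T x_n$ and $R^2 = \underset{n}{\max} ||x_n||^2$): applying the first fact $T$ times from $w_0 = 0$ gives $w_f^T w_T \ge T\rho$, and the second fact gives $||w_T||^2 \le T R^2$, so
$$
\frac{w_f^T w_T}{||w_f||\,||w_T||} \ge \frac{T\rho}{||w_f|| \cdot \sqrt{T}\, R} = \sqrt{T} \cdot \frac{\rho}{||w_f||\, R}
$$
Since the left-hand side is a cosine and hence at most 1, the number of corrections satisfies $T \le \frac{R^2\, ||w_f||^2}{\rho^2}$, which matches Novikoff's bound below (there $||w_f|| = 1$ and $\gamma$ plays the role of $\rho$).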
Novikoff theorem
ref: Lihang
Suppose the training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ is linearly separable, where $x_i \in \mathcal{X} = \mathbb{R}^n$, $y_i \in \mathcal{Y} = \{-1, +1\}$, $i = 1, 2, \ldots, N$. Then:
(1) There exists a hyperplane $\hat{w}_{opt} \cdot \hat{x} = w_{opt} \cdot x + b_{opt} = 0$ with $||\hat{w}_{opt}|| = 1$ that separates the training set completely and correctly, and there exists $\gamma > 0$ such that for all $i = 1, 2, \ldots, N$: $y_i(\hat{w}_{opt} \cdot \hat{x}_i) = y_i(w_{opt} \cdot x_i + b_{opt}) \ge \gamma$
(2) Let $R = \underset{1 \le i \le N}{\max} ||\hat{x}_i||$. Then the number of misclassifications $k$ made by the perceptron algorithm on the training set satisfies
$$
k \le \left(\frac{R}{\gamma}\right)^2
$$
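A quick empirical check of this bound (my own sketch, not from the course or the book): generate linearly separable data from a known target, run PLA counting corrections, and compare against $(R/\gamma)^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 2
w_f = np.array([-0.3, 1.0, -0.5])                    # a known target, bias first
X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, (N, d))])
y = np.where(X @ w_f > 0, 1, -1)                     # labels from the target

gamma = np.min(y * (X @ w_f)) / np.linalg.norm(w_f)  # margin for ||w_opt|| = 1
R = np.max(np.linalg.norm(X, axis=1))                # longest 'tall' x_n

w = np.zeros(d + 1)
k = 0                                                # number of corrections
while True:
    mistakes = np.flatnonzero(np.where(X @ w > 0, 1, -1) != y)
    if mistakes.size == 0:
        break
    w += y[mistakes[0]] * X[mistakes[0]]
    k += 1

print(f"corrections k = {k}, bound (R/gamma)^2 = {(R / gamma) ** 2:.1f}")
```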
Non-Separable Data
More about PLA
Cons: $D$ may not be 'linearly separable', and even if it is, we are not fully sure how long halting takes
Learning with Noisy Data
Line with Noise Tolerance: want the line with the fewest mistakes on the (noisy) data,
$$
w_g \leftarrow \underset{w}{\arg\min} \sum_{n=1}^{N} \left[ y_n \ne sign(w^T x_n) \right]
$$
Unfortunately, this minimization is NP-hard to solve exactly.
Pocket Algorithm
Modify PLA to keep the best weights so far 'in the pocket': run PLA-style corrections and, after enough iterations, return the weights that made the fewest mistakes on $D$.
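A sketch of the pocket algorithm under the same array conventions as the PLA code above (names are mine; picking a random mistake is one common choice):

```python
import numpy as np

def pocket(X, y, max_updates=1000, seed=0):
    """Pocket algorithm: run PLA-style updates on (possibly noisy) data,
    but keep the weights with the fewest training mistakes seen so far."""
    rng = np.random.default_rng(seed)

    def n_mistakes(w):
        return int(np.sum(np.where(X @ w > 0, 1, -1) != y))

    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), n_mistakes(w)
    for _ in range(max_updates):
        mistakes = np.flatnonzero(np.where(X @ w > 0, 1, -1) != y)
        if mistakes.size == 0:
            return w                          # separable after all
        n = rng.choice(mistakes)              # a randomly chosen mistake
        w = w + y[n] * X[n]                   # PLA-style correction
        err = n_mistakes(w)
        if err < best_err:                    # better than the pocket?
            best_w, best_err = w.copy(), err
    return best_w                             # return the pocket weights
```

Note the extra cost compared with plain PLA: every update requires a full pass over $D$ to count mistakes, which is why pocket runs slower.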
Summary
Perceptron Hypothesis Set
hyperplanes/linear classifiers in $\mathbb{R}^d$
Perceptron Learning Algorithm (PLA)
correct mistakes and improve iteratively
Guarantee of PLA
no mistakes eventually if linearly separable
Non-Separable Data
hold the somewhat 'best' weights in the pocket