Machine Learning Foundations Week 2 Notes (Coursera)

Learning to Answer Yes/No

Perceptron Hypothesis Set

Example: Credit Approval Problem Revisited

A Simple Hypothesis Set: the 'Perceptron'

  • For \(x=(x_1,x_2,...,x_d)\), the 'features of the customer', compute a weighted 'score' and:

    <center>\(\text{approve credit if } \sum_{i=1}^{d} w_i x_i > threshold\)</center>

    <center>\(\text{deny credit if } \sum_{i=1}^{d} w_i x_i < threshold\)</center>

  • \(y\in\{+1(good),-1(bad)\}\), with outputs of exactly 0 ignored; the linear formulas \(h\in H\) are:

    <center>\(h(x)=sign\left(\left(\sum_{i=1}^{d}w_i x_i\right)-threshold\right)\)</center>

This is historically called the 'perceptron' hypothesis.

Vector Form of Perceptron Hypothesis

$$
\begin{split}
h(x)&=sign\left(\left(\sum_{i=1}^{d}w_i x_i\right)-threshold\right)\\
&=sign\left(\left(\sum_{i=1}^{d}w_i x_i\right)+\underbrace{(-threshold)}_{w_0}\cdot\underbrace{(+1)}_{x_0}\right)\\
&=sign\left(\sum_{i=0}^{d}w_i x_i\right)\\
&=sign(w^T x)
\end{split}
$$

  • each 'tall' \(w\) represents a hypothesis \(h\) and is multiplied with the 'tall' \(x\); we will use the tall versions to simplify notation
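A minimal NumPy sketch of the vector form above (the helper name `perceptron_h` and the sample numbers are my own illustration):

```python
import numpy as np

def perceptron_h(w, x):
    """Perceptron hypothesis h(x) = sign(w^T x), with 'tall' vectors:
    x is prepended with x_0 = +1 so that w_0 plays the role of -threshold."""
    return np.sign(w @ x)

w = np.array([-0.5, 1.0, 1.0])  # w_0 = -threshold = -0.5, then the weights
x = np.array([1.0, 0.2, 0.7])   # x_0 = +1, then the two raw features
print(perceptron_h(w, x))       # 1.0, since the score is -0.5+0.2+0.7 = 0.4 > 0
```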

Q: What does \(h\) look like?

Perceptrons in \(R^2\)

perceptrons \(\Leftrightarrow\) linear (binary) classifiers
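To see the equivalence concretely, write out the tall vector in \(R^2\):

$$
h(x)=sign(w_0+w_1x_1+w_2x_2),\qquad
\text{boundary: } w_0+w_1x_1+w_2x_2=0
\;\Longleftrightarrow\;
x_2=-\tfrac{w_1}{w_2}x_1-\tfrac{w_0}{w_2}\quad(w_2\neq 0),
$$

a straight line that splits the plane into a \(+1\) side and a \(-1\) side.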

Perceptron Learning Algorithm (PLA)

Select g from H

  • want: \(g\approx f\) (hard when \(f\) is unknown)
  • almost necessary: \(g\approx f\) on \(D\), ideally \(g(x_n)=f(x_n)=y_n\)
  • difficult: \(H\) is of infinite size
  • idea: start from some \(g_0\), and 'correct' its mistakes on \(D\)

Perceptron Learning Algorithm

For t=0,1,...

1. find a (next) mistake of \(w_t\), called \((x_{n(t)},y_{n(t)})\):

\(sign(w_t^T x_{n(t)})\neq y_{n(t)}\)

2. (try to) correct the mistake by

\(w_{t+1}\leftarrow w_t+y_{n(t)}x_{n(t)}\)

...until no more mistakes

return last \(w\) (called \(w_{PLA}\)) as \(g\)
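A step worth making explicit: the update really pushes the score on the mistaken example toward the correct sign, since \(y_{n(t)}^2=1\) gives

$$
y_{n(t)}w_{t+1}^T x_{n(t)}
= y_{n(t)}\left(w_t+y_{n(t)}x_{n(t)}\right)^T x_{n(t)}
= y_{n(t)}w_t^T x_{n(t)}+||x_{n(t)}||^2
> y_{n(t)}w_t^T x_{n(t)}.
$$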

My question: why is the decision boundary (a line) described by a vector \(w\)?

Practical Implementation of PLA

Cyclic PLA
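A minimal sketch of Cyclic PLA in NumPy (my own illustration: `cyclic_pla`, the array layout, and the safety cap are assumptions, not the course's reference code):

```python
import numpy as np

def cyclic_pla(X, y, max_updates=10_000):
    """Cyclic PLA: scan the examples in a fixed order, correct every
    mistake encountered, and halt after a full pass with no mistakes.

    X: (N, d+1) array with x_0 = 1 already prepended to each row.
    y: (N,) array of labels in {+1, -1}.
    """
    w = np.zeros(X.shape[1])                 # start from w_0 = 0
    updates = 0
    while updates < max_updates:             # cap guards non-separable data
        mistake_found = False
        for x_n, y_n in zip(X, y):
            if np.sign(w @ x_n) != y_n:      # sign(w_t^T x_n) != y_n
                w = w + y_n * x_n            # w_{t+1} <- w_t + y_n x_n
                updates += 1
                mistake_found = True
        if not mistake_found:
            break                            # a clean pass: no mistakes left
    return w
```

On linearly separable data this halts with a perfect \(w\) on \(D\); the cap only matters when the data is not separable.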

Some Remaining Issues of PLA

Algorithmic: halt (no more mistakes)?

  • naive cyclic: ??
  • random cyclic: ??
  • other variant: ??

Learning: \(g\approx f\)?

  • on \(D\): if PLA halts, yes (no mistakes)
  • outside \(D\): ??
  • if not halting: ??

Guarantee of PLA

Linear Separability

  • if PLA halts (i.e., no more mistakes), then (necessary condition) \(D\) allows some \(w\) to make no mistakes
  • call such \(D\) linearly separable

PLA Fact: \(w_t\) Gets More Aligned with \(w_f\)

  • \(w_f\) perfect, hence every \(x_n\) is correctly away from the line: \(y_{n(t)}w_f^T x_{n(t)}\ge \underset{n}{\min}\, y_n w_f^T x_n>0\)

  • \(w_f^T w_t\uparrow\) by updating with any \((x_{n(t)},y_{n(t)})\):

$$
\begin{split}
w_f^T w_{t+1}&=w_f^T(w_t+y_{n(t)}x_{n(t)})\\
&\ge w_f^T w_t+\underset{n}{\min}\, y_n w_f^T x_n\\
&> w_f^T w_t+0
\end{split}
$$

\(w_t\) appears more aligned with \(w_f\)

Q: what about the length of the vector?

PLA fact: \(w_t\) Does Not Grow Too Fast

\(w_t\) changed only when there is a mistake

\(\Leftrightarrow sign(w_t^T x_{n(t)})\neq y_{n(t)}\Leftrightarrow y_{n(t)}w_t^T x_{n(t)}\le 0\)

  • a mistake 'limits' \(||w_t||^2\) growth, even when updating with the 'longest' \(x_n\):

$$
\begin{split}
||w_{t+1}||^2&=||w_t+y_{n(t)}x_{n(t)}||^2\\
&=||w_t||^2+2y_{n(t)}w_t^T x_{n(t)}+||y_{n(t)}x_{n(t)}||^2\\
&\le ||w_t||^2+0+||y_{n(t)}x_{n(t)}||^2\\
&\le ||w_t||^2+\underset{n}{\max}\,||y_n x_n||^2
\end{split}
$$

start from \(w_0=0\); after \(T\) mistake corrections,
$$
\frac{w_f^T w_T}{||w_f||\,||w_T||}\ge \sqrt{T}\cdot constant
$$
Since the left-hand side is a cosine and thus at most 1, \(T\) cannot exceed \(1/constant^2\): PLA must halt on linearly separable data.
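Filling in the constant (a standard derivation the slide leaves implicit): write \(\rho=\underset{n}{\min}\, y_n w_f^T x_n\) and \(R^2=\underset{n}{\max}\,||x_n||^2\). The two facts above give \(w_f^T w_T\ge T\rho\) and \(||w_T||^2\le TR^2\), so

$$
\frac{w_f^T w_T}{||w_f||\,||w_T||}\ge \frac{T\rho}{||w_f||\cdot\sqrt{T}R}=\sqrt{T}\cdot\frac{\rho}{||w_f||\,R},
$$

and \(\sqrt{T}\cdot\rho/(||w_f||R)\le 1\) yields \(T\le R^2\,||w_f||^2/\rho^2\).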

Novikoff theorem

ref: Li Hang, p. 31

Let the training set \(T=\{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}\) be linearly separable, where \(x_i\in\mathcal{X}=R^n\), \(y_i\in\mathcal{Y}=\{-1,+1\}\), \(i=1,2,...,N\). Then:

(1) There exists a hyperplane \(\hat{w}_{opt}\cdot\hat{x}=w_{opt}\cdot x+b_{opt}=0\) with \(||\hat{w}_{opt}||=1\) that separates the training set completely correctly, and there exists \(\gamma>0\) such that for all \(i=1,2,...,N\)
\[
y_i(\hat{w}_{opt}\cdot\hat{x}_i)=y_i(w_{opt}\cdot x_i+b_{opt})\ge\gamma
\]

(2) Let \(R=\underset{1\le i\le N}{\max}||\hat{x}_i||\); then the number of misclassifications \(k\) made by the perceptron algorithm on the training set satisfies
\[
k\le \left(\frac{R}{\gamma}\right)^2
\]
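A small numeric sanity check of the bound (entirely my own toy setup: the target `w_f`, the margin filter, and the data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: labels come from a known w_f.
w_f = np.array([-0.3, 1.0, -0.5])             # acts as (b, w) in hat-notation
X = np.hstack([np.ones((50, 1)),              # x_0 = 1 prepended (hat{x})
               rng.uniform(-1, 1, (50, 2))])
y = np.sign(X @ w_f)
keep = np.abs(X @ w_f) > 0.1                  # drop near-boundary points
X, y = X[keep], y[keep]

# Run PLA and count mistake corrections k.
w, k = np.zeros(3), 0
while True:
    mistakes = [i for i in range(len(y)) if np.sign(w @ X[i]) != y[i]]
    if not mistakes:
        break
    i = mistakes[0]
    w, k = w + y[i] * X[i], k + 1

# Novikoff bound: k <= (R / gamma)^2, with ||hat{w}_opt|| normalized to 1.
w_unit = w_f / np.linalg.norm(w_f)
gamma = np.min(y * (X @ w_unit))
R = np.max(np.linalg.norm(X, axis=1))
print(k, (R / gamma) ** 2)                    # k should not exceed the bound
```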

Non-Separable Data

More about PLA

Cons: the data may not be linearly separable, and it is not fully clear in advance how long halting takes.

Learning with Noisy Data

Line with Noise Tolerance

NP-hard: finding the weights with the fewest mistakes on noisy data is NP-hard in general.

Pocket Algorithm

Keep the weights with the fewest mistakes seen so far, and return them after enough iterations.
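A minimal sketch of the pocket algorithm (same assumed `X`, `y` layout as the PLA sketch above; `pocket_pla` and `num_mistakes` are my names):

```python
import numpy as np

def num_mistakes(w, X, y):
    """Count examples where sign(w^T x) != y."""
    return int(np.sum(np.sign(X @ w) != y))

def pocket_pla(X, y, max_iters=1000, seed=0):
    """Pocket PLA: run PLA updates on random mistakes, but keep the
    best weights seen so far 'in the pocket'."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    pocket_w, pocket_err = w.copy(), num_mistakes(w, X, y)
    for _ in range(max_iters):
        mistakes = np.flatnonzero(np.sign(X @ w) != y)
        if len(mistakes) == 0:
            return w                       # perfect on D: done
        i = rng.choice(mistakes)           # PLA update on a random mistake
        w = w + y[i] * X[i]
        err = num_mistakes(w, X, y)
        if err < pocket_err:               # better than the pocket: swap in
            pocket_w, pocket_err = w.copy(), err
    return pocket_w                        # best weights found
```

The extra `num_mistakes` evaluation per update is why the pocket algorithm is slower than plain PLA on the same data.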

Summary

  • Perceptron Hypothesis Set

    hyperplanes/linear classifiers in \(\mathcal{R}^d\)

  • Perceptron Learning Algorithm (PLA)

    correct mistakes and improve iteratively

  • Guarantee of PLA

    no mistakes eventually if linearly separable

  • Non-Separable Data

    hold the somewhat 'best' weights in a pocket