

Naïve Bayes Classifiers


1.1 Exact Bayes Classifier

We would like to classify a categorical output $(k_1, k_2, \ldots, k_m)$ given some attributes $(x_1, x_2, \ldots, x_n)$.

For example, suppose we would like to predict whether the output is $k_1$ or $k_2$ given three attributes $A, B, C$.

If $P(k_1|A, B, C) > P(k_2|A, B, C)$,

we say that an instance with attributes $A, B, C$ is more likely to belong to $k_1$, and vice versa.

Notation:

If attribute $A$ is present, we write $A$; if it is absent, we write $-A$.

If attribute $B$ is present, we write $B$; if it is absent, we write $-B$.

If attribute $C$ is present, we write $C$; if it is absent, we write $-C$.

Then, if we apply Bayes' Theorem,

$$P(k_1|A, B, C) = \frac{P(k_1)P(A,B,C|k_1)}{P(A,B,C)}$$

By applying the law of total probability,

$\Longrightarrow$
$$\frac{P(k_1)P(A,B,C|k_1)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$

However, storing $P(A,B,C|k_1)$ requires $2^i$ entries, where $i$ is the number of attributes ($i = 3$ in this case), and storing $P(A,B,C|k_2)$ requires another $2^i$ entries.

The frequency table looks like this:

| Frequency | A, B, C | A, B, -C | A, -B, C | A, -B, -C | -A, B, C | -A, B, -C | -A, -B, C | -A, -B, -C |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| k1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| k2 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 |

Therefore, we introduce the Naive Bayes algorithm to reduce both the storage requirement and the computational cost.

1.2 Naive Bayes Classifier

We assume class conditional independence, so that

$P(A,B,C|k_1)$ is equal to $P(A|k_1)P(B|k_1)P(C|k_1)$

$P(A,B,C|k_2)$ is equal to $P(A|k_2)P(B|k_2)P(C|k_2)$

And now we need only $2 \times i \times n$ records, where $i$ is the number of attributes and $n$ is the number of output classes we predict.

| Frequency | A | -A | B | -B | C | -C |
| --- | --- | --- | --- | --- | --- | --- |
| k1 | 1 | 2 | 3 | 4 | 5 | 6 |
| k2 | 7 | 6 | 5 | 4 | 3 | 2 |
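
As a quick sanity check of the storage argument, here is a minimal Python sketch (not part of the original article) that counts how many likelihood entries each approach must store for $i$ binary attributes and $n$ classes:

```python
from itertools import product

i, n = 3, 2   # three binary attributes A, B, C and two output classes k1, k2

# Exact Bayes: one likelihood per class for every joint combination of attributes.
joint_combinations = list(product([True, False], repeat=i))
print(len(joint_combinations))      # 8  = 2**i combinations of (A, B, C)
print(n * len(joint_combinations))  # 16 = n * 2**i entries in total

# Naive Bayes: one likelihood per class for each attribute being present or absent.
print(n * 2 * i)                    # 12 = 2 * i * n entries in total
```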

Therefore, our problem becomes

$$P(k_1|A, B, C) = \frac{P(k_1)P(A,B,C|k_1)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$

$\Longrightarrow$

$$\frac{P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]}{P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]+P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]} \quad (i)$$

$$P(k_2|A, B, C) = \frac{P(k_2)P(A,B,C|k_2)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$

$\Longrightarrow$

$$\frac{P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]}{P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]+P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]} \quad (ii)$$

We notice that (i) and (ii) share the same denominator, so we can focus only on the numerators.

$$P(k_1|A, B, C) = \frac{P(k_1)P(A,B,C|k_1)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$

$\propto$

$$P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)]$$

$$P(k_2|A, B, C) = \frac{P(k_2)P(A,B,C|k_2)}{P(k_1)P(A,B,C|k_1)+P(k_2)P(A,B,C|k_2)}$$

$\propto$

$$P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]$$

If $P(k_1)[P(A|k_1)P(B|k_1)P(C|k_1)] > P(k_2)[P(A|k_2)P(B|k_2)P(C|k_2)]$, we say that an instance with attributes $A, B, C$ is more likely to belong to $k_1$, and vice versa.
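
This decision rule is straightforward to turn into code. Below is a minimal Python sketch; the dictionaries and numbers are made up purely for illustration and are not from the article.

```python
def naive_bayes_scores(priors, likelihoods, observation):
    """Score each class k by P(k) times the product of per-attribute likelihoods."""
    scores = {}
    for k, prior in priors.items():
        score = prior
        for attr, present in observation.items():
            p = likelihoods[k][attr]               # P(attr present | k)
            score *= p if present else (1.0 - p)   # P(-attr | k) = 1 - P(attr | k)
        scores[k] = score
    return scores

# Made-up numbers, purely for illustration.
priors = {"k1": 0.5, "k2": 0.5}
likelihoods = {"k1": {"A": 0.8, "B": 0.6, "C": 0.3},
               "k2": {"A": 0.2, "B": 0.5, "C": 0.7}}

scores = naive_bayes_scores(priors, likelihoods, {"A": True, "B": True, "C": False})
print(max(scores, key=scores.get), scores)   # picks the class with the larger propensity
```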

1.3 Why not P(C | A, B) = P(C | A) * P(C | B)?

From 1.2, we know that the naive Bayes algorithm assumes class-conditional independence, so that

$P(A, B | C) = P(A | C) * P(B | C)$

But why not directly say that

$P(C | A, B) = P(C | A) * P(C | B)$?

This is because the equality holds only when $P(C) = 0$ or $P(C) = 1$, which is a meaningless, degenerate case:

$$P(C | A, B) = P(C | A) * P(C | B)$$

$\Longrightarrow$

$$\frac{P(C, A, B)}{P(A, B)} = \frac{P(C, A)}{P(A)} * \frac{P(C, B)}{P(B)}$$

$\Longrightarrow$ (treating $A$, $B$, and $C$ as independent, for the sake of argument)

$$\frac{P(C)P(A)P(B)}{P(A)P(B)} = \frac{P(C)P(A)}{P(A)} * \frac{P(C)P(B)}{P(B)}$$

$\Longrightarrow$

$$P(C) = P(C) * P(C)$$

which is possible only if $P(C) = 0$ or $1$.

Therefore, we use Bayes' theorem to convert $P(C | A, B)$ into $P(A, B | C)$ before applying the naive Bayes assumption.
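
To see concretely that $P(C|A,B)$ and $P(C|A) * P(C|B)$ generally disagree, here is a small numeric check with made-up probabilities, chosen so that $A$ and $B$ are conditionally independent given $C$ (exactly the naive Bayes assumption):

```python
p_c = 0.5
p_a_given_c = {True: 0.8, False: 0.3}   # P(A | C) and P(A | -C), made-up numbers
p_b_given_c = {True: 0.7, False: 0.4}   # P(B | C) and P(B | -C), made-up numbers

def p_joint(a, b, c):
    # Joint distribution built with A and B conditionally independent given C.
    pc = p_c if c else 1 - p_c
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return pc * pa * pb

bools = (True, False)
p_ab = sum(p_joint(True, True, c) for c in bools)
p_a = sum(p_joint(True, b, c) for b in bools for c in bools)
p_b = sum(p_joint(a, True, c) for a in bools for c in bools)

p_c_given_ab = p_joint(True, True, True) / p_ab
p_c_given_a = sum(p_joint(True, b, True) for b in bools) / p_a
p_c_given_b = sum(p_joint(a, True, True) for a in bools) / p_b

print(p_c_given_ab)                 # ~0.82
print(p_c_given_a * p_c_given_b)    # ~0.46 -- clearly not the same
```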

2. Example

Consider the following 4 SMS messages:

| Message | Label |
| --- | --- |
| I am not coming | ham |
| Good work | ham |
| Do you need viagra | spam |
| win an IMac | spam |

2.1 Compute the prior probabilities of a new SMS message being ‘spam’ or ‘ham’.

Let $p(spam)$ be the probability of a new SMS message being "spam"

Let $p(ham)$ be the probability of a new SMS message being "ham"

Since 2 of the 4 training messages are labelled spam and 2 are labelled ham,
$$p(spam) = 2/4 = 0.5$$
$$p(ham) = 2/4 = 0.5$$
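
A trivial check of the priors in code (the labels are taken from the table above):

```python
from collections import Counter

labels = ["ham", "ham", "spam", "spam"]   # labels of the four training messages
counts = Counter(labels)
p_spam = counts["spam"] / len(labels)
p_ham = counts["ham"] / len(labels)
print(p_spam, p_ham)   # 0.5 0.5
```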

2.2 For each de-capitalised keyword that appears in your training set (that is, ‘i’, ‘am’, ‘not’, ‘coming’, ‘good’, ‘work’, ‘do’, ‘you’, ‘need’, ‘viagra’, ‘win’, ‘an’ and ‘imac’), build a frequency table that records the likelihoods P(W|ham), P(-W|ham), P(W|spam) and P(-W|spam).

Each de-capitalised keyword is put into two rows (a word row and a -word row):

we mark the number of ham messages in which the keyword appears in the (word row, Ham column);

we mark the number of ham messages in which the keyword does not appear in the (-word row, Ham column);

we mark the number of spam messages in which the keyword appears in the (word row, Spam column);

we mark the number of spam messages in which the keyword does not appear in the (-word row, Spam column).

We can construct the following frequency table:

| Frequency | Ham | Spam |
| --- | --- | --- |
| -am | 1 | 2 |
| -an | 2 | 1 |
| -coming | 1 | 2 |
| -do | 2 | 1 |
| -good | 1 | 2 |
| -i | 1 | 2 |
| -imac | 2 | 1 |
| -need | 2 | 1 |
| -not | 1 | 2 |
| -viagra | 2 | 1 |
| -win | 2 | 1 |
| -work | 1 | 2 |
| -you | 2 | 1 |
| am | 1 | 0 |
| an | 0 | 1 |
| coming | 1 | 0 |
| do | 0 | 1 |
| good | 1 | 0 |
| i | 1 | 0 |
| imac | 0 | 1 |
| need | 0 | 1 |
| not | 1 | 0 |
| viagra | 0 | 1 |
| win | 0 | 1 |
| work | 1 | 0 |
| you | 0 | 1 |

Then, to record the likelihoods P(W|ham), P(-W|ham), P(W|spam) and P(-W|spam), we divide each entry in the Ham column by 2 (the total number of ham messages) and each entry in the Spam column by 2 (the total number of spam messages).

In addition, to avoid likelihoods of exactly 0 or 1, we replace any likelihood smaller than 0.05 with 0.05, and any likelihood larger than 0.95 with 0.95, using a variant of the Laplace estimator.

Therefore, we get the following likelihood table:

| Probability of the row given the column | Ham | Spam |
| --- | --- | --- |
| -am | 0.50 | 0.95 |
| -an | 0.95 | 0.50 |
| -coming | 0.50 | 0.95 |
| -do | 0.95 | 0.50 |
| -good | 0.50 | 0.95 |
| -i | 0.50 | 0.95 |
| -imac | 0.95 | 0.50 |
| -need | 0.95 | 0.50 |
| -not | 0.50 | 0.95 |
| -viagra | 0.95 | 0.50 |
| -win | 0.95 | 0.50 |
| -work | 0.50 | 0.95 |
| -you | 0.95 | 0.50 |
| am | 0.50 | 0.05 |
| an | 0.05 | 0.50 |
| coming | 0.50 | 0.05 |
| do | 0.05 | 0.50 |
| good | 0.50 | 0.05 |
| i | 0.50 | 0.05 |
| imac | 0.05 | 0.50 |
| need | 0.05 | 0.50 |
| not | 0.50 | 0.05 |
| viagra | 0.05 | 0.50 |
| win | 0.05 | 0.50 |
| work | 0.50 | 0.05 |
| you | 0.05 | 0.50 |
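
The frequency and likelihood tables above can be reproduced with a short script. This is only a sketch under the article's conventions (split on whitespace, de-capitalise, clip likelihoods to [0.05, 0.95]); the helper names are made up.

```python
messages = [("i am not coming", "ham"), ("good work", "ham"),
            ("do you need viagra", "spam"), ("win an imac", "spam")]

vocabulary = sorted({w for text, _ in messages for w in text.split()})
class_counts = {"ham": 0, "spam": 0}
word_counts = {w: {"ham": 0, "spam": 0} for w in vocabulary}

for text, label in messages:
    class_counts[label] += 1
    for w in set(text.split()):
        word_counts[w][label] += 1

def clip(p, lo=0.05, hi=0.95):
    # Keep likelihoods away from exactly 0 and 1 (the Laplace-style adjustment above).
    return min(max(p, lo), hi)

likelihoods = {w: {label: clip(word_counts[w][label] / class_counts[label])
                   for label in ("ham", "spam")}
               for w in vocabulary}

print(likelihoods["viagra"])   # {'ham': 0.05, 'spam': 0.5}
print(likelihoods["coming"])   # {'ham': 0.5, 'spam': 0.05}
```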
2.3 Predict whether the following two SMS messages "Coming home ?" and "Get Viagra now" are ham or spam.

2.3.1 For the message "coming home":

If

$$P(ham | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$

is greater than

$$P(spam | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$

we say that the message "coming home" is more likely to be a ham message, and vice versa.

According to Bayes’ Theorem,

$$P(ham | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$

Bayes' Theorem $\Longrightarrow$

$$\frac{P(ham)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac | ham)}{P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)}$$

Law of total probability $\Longrightarrow$

$$\frac
{P(ham)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac | ham)}
{P(ham)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac | ham)
+P(spam)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac | spam)}$$

Naive Bayes simplification $\Longrightarrow$

$$\frac{P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(coming|ham)P(-good|ham)P(-work|ham)P(-do|ham)P(-you|ham)P(-need|ham)P(-viagra|ham)P(-win|ham)P(-an|ham)P(-imac|ham)]}{P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(coming|ham)P(-good|ham)P(-work|ham)P(-do|ham)P(-you|ham)P(-need|ham)P(-viagra|ham)P(-win|ham)P(-an|ham)P(-imac|ham)]+P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(coming|spam)P(-good|spam)P(-work|spam)P(-do|spam)P(-you|spam)P(-need|spam)P(-viagra|spam)P(-win|spam)P(-an|spam)P(-imac|spam)]}$$

=

$$\frac{0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.95*0.95*0.95*0.95)}{0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.95*0.95*0.95*0.95)+0.5*(0.95*0.95*0.95*0.05*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)}$$

=

$$\frac{0.00545576012}{0.00545576012+0.00015112908}=0.97304582369$$

Alternatively, we can focus only on the propensities, which are proportional to the posterior probabilities.

$$P(ham | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$

$\propto$

$$P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(coming|ham)P(-good|ham)P(-work|ham)P(-do|ham)P(-you|ham)P(-need|ham)P(-viagra|ham)P(-win|ham)P(-an|ham)P(-imac|ham)]$$

=

$$0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.95*0.95*0.95*0.95) = 0.00545576012$$

Similarly, according to Bayes' Theorem,

$$P(spam | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$

Bayes' Theorem $\Longrightarrow$

$$\frac{P(spam)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac | spam)}{P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)}$$

Law of total probability $\Longrightarrow$

$$\frac
{P(spam)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac | spam)}
{P(ham)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac | ham)
+P(spam)P(-i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac | spam)}$$

Naive Bayes simplification $\Longrightarrow$

$$\frac{P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(coming|spam)P(-good|spam)P(-work|spam)P(-do|spam)P(-you|spam)P(-need|spam)P(-viagra|spam)P(-win|spam)P(-an|spam)P(-imac|spam)]}{P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(coming|ham)P(-good|ham)P(-work|ham)P(-do|ham)P(-you|ham)P(-need|ham)P(-viagra|ham)P(-win|ham)P(-an|ham)P(-imac|ham)]+P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(coming|spam)P(-good|spam)P(-work|spam)P(-do|spam)P(-you|spam)P(-need|spam)P(-viagra|spam)P(-win|spam)P(-an|spam)P(-imac|spam)]}$$

=

$$\frac{0.5*(0.95*0.95*0.95*0.05*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)}{0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.95*0.95*0.95*0.95)+0.5*(0.95*0.95*0.95*0.05*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)}$$

=

$$\frac{0.00015112908}{0.00545576012+0.00015112908}=0.0269541763$$

Alternatively, we can focus only on the propensities, which are proportional to the posterior probabilities.

$$P(spam | -i , -am , -not , coming , -good , -work , -do , -you , -need , -viagra , -win , -an , -imac)$$

$\propto$

$$P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(coming|spam)P(-good|spam)P(-work|spam)P(-do|spam)P(-you|spam)P(-need|spam)P(-viagra|spam)P(-win|spam)P(-an|spam)P(-imac|spam)]$$

=

$$ 0.5*(0.95*0.95*0.95*0.05*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)$$

=

$$0.00015112908$$

Since the propensity 0.00545576012 > 0.00015112908, or equivalently the probability 0.97304582369 > 0.0269541763, we conclude that the message "coming home" is more likely to be a ham message.

2.3.2 For the message "Get Viagra now":

$$P(ham | -i , -am , -not , -coming , -good , -work , -do , -you , -need , viagra , -win , -an , -imac)$$

$\propto$

$$P(ham)[P(-i|ham)P(-am|ham)P(-not|ham)P(-coming|ham)P(-good|ham)P(-work|ham)P(-do|ham)P(-you|ham)P(-need|ham)P(viagra|ham)P(-win|ham)P(-an|ham)P(-imac|ham)]$$

=

$$0.5*(0.5*0.5*0.5*0.5*0.5*0.5*0.95*0.95*0.95*0.05*0.95*0.95*0.95)=0.00028714526$$

$$P(spam | -i , -am , -not , -coming , -good , -work , -do , -you , -need , viagra , -win , -an , -imac) $$

$\propto$

$$P(spam)[P(-i|spam)P(-am|spam)P(-not|spam)P(-coming|spam)P(-good|spam)P(-work|spam)P(-do|spam)P(-you|spam)P(-need|spam)P(viagra|spam)P(-win|spam)P(-an|spam)P(-imac|spam)]$$

=

$$0.5*(0.95*0.95*0.95*0.95*0.95*0.95*0.5*0.5*0.5*0.5*0.5*0.5*0.5)=0.00287145269$$

To obtain the posterior probabilities, we divide 0.00028714526 and 0.00287145269 by their sum (0.00028714526 + 0.00287145269), giving 0.09090908831 and 0.90909091168, respectively.

Since the propensity 0.00028714526 < 0.00287145269, or equivalently the probability 0.09090908831 < 0.90909091168, we conclude that the message "Get Viagra now" is more likely to be a spam message.
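
Finally, here is a sketch that reproduces both hand computations. The clipped likelihood values and priors are copied from Section 2.2, and words outside the training vocabulary (such as "home", "get", "now") are simply ignored, as in the derivations above.

```python
# Clipped likelihoods P(word present | class) from the table in 2.2:
# each word appears in one of the two messages of its own class (0.5)
# and in none of the other class (clipped to 0.05).
ham_words = {"i", "am", "not", "coming", "good", "work"}
spam_words = {"do", "you", "need", "viagra", "win", "an", "imac"}
vocabulary = ham_words | spam_words
likelihoods = {w: {"ham": 0.5 if w in ham_words else 0.05,
                   "spam": 0.5 if w in spam_words else 0.05}
               for w in vocabulary}
priors = {"ham": 0.5, "spam": 0.5}

def posterior(text):
    present = set(text.lower().split())
    scores = {}
    for label, prior in priors.items():
        score = prior
        for w in vocabulary:
            p = likelihoods[w][label]                   # P(w | label)
            score *= p if w in present else (1.0 - p)   # P(-w | label) = 1 - P(w | label)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

print(posterior("Coming home ?"))    # ham ~0.973, spam ~0.027
print(posterior("Get Viagra now"))   # ham ~0.091, spam ~0.909
```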
