Data Preprocessing
As shown in the infographic, we will break data preprocessing down into 6 essential steps.
Get the dataset used in this example from here.
Step 1: Importing the libraries

import numpy as np
import pandas as pd
Step 2: Importing the dataset

dataset = pd.read_csv("Data.csv")
X = dataset.iloc[:, :-1].values   # features: every column except the last
Y = dataset.iloc[:, 3].values     # target: the fourth column
Step 3: Handling the missing data

# Replace missing values (NaN) in the numeric columns with the column mean.
# Note: Imputer was removed in scikit-learn 0.22; see the SimpleImputer sketch below.
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values="NaN", strategy="mean", axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
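If you are running a recent scikit-learn release (0.22 or later), the Imputer class used above no longer exists. A minimal sketch of the same mean-imputation step using its replacement, SimpleImputer, assuming the same X with the numeric values in columns 1 and 2:

import numpy as np
from sklearn.impute import SimpleImputer

# SimpleImputer replaces the removed Imputer class; np.nan marks the missing entries.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])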
Step 4: Encoding categorical data

# Encode the categorical column (column 0) as integer labels.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
Creating a dummy variable

# One-hot encode column 0 so the integer labels are not treated as ordered values.
# Note: the categorical_features parameter was removed in scikit-learn 0.22; see the ColumnTransformer sketch below.
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
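On current scikit-learn releases the categorical_features argument has been removed; columns are selected with ColumnTransformer instead, and OneHotEncoder accepts string categories directly, so LabelEncoder is only needed for the target. A minimal sketch of the same dummy-variable step under those assumptions:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# One-hot encode column 0 and pass the remaining columns through unchanged.
ct = ColumnTransformer([("encoder", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(X)

# The target is a single label column, so LabelEncoder is still the right tool here.
Y = LabelEncoder().fit_transform(Y)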
Step 5: Splitting the dataset into training and test sets

# Hold out 20% of the samples as a test set; random_state makes the split reproducible.
# Note: sklearn.cross_validation was removed in scikit-learn 0.20; see the model_selection sketch below.
from sklearn.cross_validation import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
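The sklearn.cross_validation module no longer exists on current releases; the same function now lives in sklearn.model_selection. The equivalent split:

# train_test_split moved to sklearn.model_selection (available since scikit-learn 0.18).
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)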
Step 6: Feature Scaling

# Standardize the features so they have zero mean and unit variance.
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
# Fit the scaler on the training set only, then reuse it on the test set to avoid data leakage.
X_test = sc_X.transform(X_test)

Done.
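For readers on a current scikit-learn release, here is a sketch that strings the six steps together using the modern imports shown above. It assumes the Data.csv layout implied by the original code: one categorical column, two numeric columns with missing values, and a label column at the end.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Step 2: load the data; columns 0-2 are features, column 3 is the label.
dataset = pd.read_csv("Data.csv")
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 3].values

# Step 3: mean-impute the numeric columns.
X[:, 1:3] = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X[:, 1:3])

# Step 4: one-hot encode the categorical column and label-encode the target.
ct = ColumnTransformer([("encoder", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(X)
Y = LabelEncoder().fit_transform(Y)

# Step 5: hold out 20% of the samples as a test set.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Step 6: standardize the features, fitting the scaler on the training set only.
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)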