Pandas Basics: Creating and Accessing Data

Jonathan Shieber, published 2019-07-30 15:42 / 3133 views

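Abstract: This article covers the basic ways to create and access the two core pandas data structures, Series and DataFrame. A Series is a one-dimensional array-like object made up of a set of data (a one-dimensional array) and an associated set of data labels (the index). When no index is specified for the data, an integer index is created automatically. A Series created from a dict can be viewed as a fixed-length ordered dict.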

Preface

Pandas is the best-known data statistics package in the Python ecosystem. It is a data analysis package built on NumPy that provides higher-level data structures and tools. Pandas is organized around two core data structures, Series and DataFrame. This article focuses on the basic ways to create and access these two structures.

Series

A Series is a one-dimensional array-like object made up of a set of data (a one-dimensional ndarray) and an associated set of data labels (the index).
Note: NumPy (Numerical Python) gives Python a multidimensional array object, the ndarray, which supports vectorized operations and is fast and memory-efficient.

(1) The pandas documentation describes Series as follows:

""" One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currently represented as NaN).

Operations between Series (+, -, /, *, **) align values based on their
associated index values-- they need not be the same length. The result
index will be the sorted union of the two indexes.

Parameters
----------
data : array-like, dict, or scalar value
    Contains data stored in Series
index : array-like or Index (1d)
    Values must be hashable and have the same length as `data`.
    Non-unique index values are allowed. Will default to
    RangeIndex(len(data)) if not provided. If both a dict and index
    sequence are used, the index will override the keys found in the
    dict.
dtype : numpy.dtype or None
    If None, dtype will be inferred
copy : boolean, default False
    Copy input data """
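
The docstring's statement that arithmetic aligns on index labels can be illustrated with a minimal sketch. The two Series and their values are made up for illustration and do not come from the article.

```python
import pandas as pd

# Two Series of different lengths with partly overlapping labels (illustrative values).
left = pd.Series([1.0, 2.0, 3.0], index=["c", "a", "b"])
right = pd.Series([10.0, 20.0], index=["a", "d"])

# Addition pairs values by label, not by position.
total = left + right

# Only "a" exists in both operands; every other label ends up as NaN.
print(total)
```

Only the label shared by both operands gets a numeric result, and the result index is the union of the two indexes.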

(2) The basic way to create a Series is shown below. The data can be array-like (list, ndarray), a dict, or a scalar value.
s = pd.Series(data, index=index)

s = pd.Series([-1.55666192,-0.75414753,0.47251231,-1.37775038,-1.64899442], index=["a", "b", "c", "d", "e"],dtype="int8" )
a   -1
b    0
c    0
d   -1
e   -1
dtype: int8

s = pd.Series(["a", -0.75414753, 123, 66666, -1.64899442], index=["a", "b", "c", "d", "e"])
a           a
b   -0.754148
c         123
d       66666
e    -1.64899
dtype: object

Note: Series supports the numpy.dtype data types, including integers, floats, complex numbers, booleans and strings. As when creating an ndarray, if no type is specified pandas tries to infer a suitable one. In the examples, data that mixes numbers and strings is inferred as object, while data given the int8 type is displayed as int8.
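
A minimal sketch of dtype inference versus an explicit dtype follows. The values are illustrative. It converts floats to int8 with astype after construction, on the assumption that some pandas versions reject non-integral floats passed straight to the constructor together with an integer dtype.

```python
import pandas as pd

floats = pd.Series([1.5, 2.5, 3.5])          # no dtype given: inferred as float64
mixed = pd.Series(["a", 1.5, 123])           # strings mixed with numbers: object
small = pd.Series([1, 2, 3], dtype="int8")   # explicit dtype is honored

# astype converts after construction; float -> int truncates toward zero.
truncated = pd.Series([-1.6, -0.7, 0.4]).astype("int8")

print(floats.dtype, mixed.dtype, small.dtype)
print(truncated.tolist())
```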

s = pd.Series(np.random.randn(5))
0    0.485468
1   -0.912130
2    0.771970
3   -1.058117
4    0.926649
dtype: float64

s.index
RangeIndex(start=0, stop=5, step=1)

s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
a    0.485468
b   -0.912130
c    0.771970
d   -1.058117
e    0.926649
dtype: float64

Note: when no index is specified for the data, Series automatically creates an integer index.

s = pd.Series({"a" : 0., "b" : 1., "c" : 2.})
a    0.0
b    1.0
c    2.0
dtype: float64

s = pd.Series({"a" : 0., "b" : 1., "c" : 2.}, index=["b", "c", "d", "a"])
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

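Note: a Series created from a Python dict can be viewed as a fixed-length ordered dict. If only a dict is passed, the index of the Series is the keys of the dict. If an index is also passed, the values matching each index label are looked up and placed in the corresponding positions; labels with no matching value get NaN.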
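
As a small follow-up sketch, the unmatched labels can be detected or dropped with isna and dropna. The variable names here are hypothetical.

```python
import pandas as pd

source = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(source, index=["b", "c", "d", "a"])

missing = s.isna()       # True where the label had no entry in the dict
present = s.dropna()     # keep only the labels that were matched

print(missing)
print(present)
```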

s = pd.Series(5., index=["a", "b", "c", "d", "e"])
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

Note: a scalar value is repeated to match the length of the index.

(3) Accessing the elements and index of a Series

s = pd.Series({"a" : 0., "b" : 1., "c" : 2.}, index=["b", "c", "d", "a"])
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

s.values
[  1.   2.  nan   0.]

s.index
Index([u"b", u"c", u"d", u"a"], dtype="object")

Note: the values and index attributes of a Series return its array representation and its index object.

s["a"]
0.0

s[["a","b"]]
a    0.0
b    1.0
dtype: float64

s[["a","b","c"]]
a    0.0
b    1.0
c    2.0
dtype: float64

s[:2] 
b    1.0
c    2.0
dtype: float64

Note: a single value or a group of values can be selected from a Series by indexing.
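
The same selections can be written with the explicit loc (label) and iloc (position) accessors, which avoids any ambiguity about whether a key is a label or a position. This is a sketch alongside the article's bracket syntax, not a replacement for it.

```python
import pandas as pd

s = pd.Series({"a": 0.0, "b": 1.0, "c": 2.0}, index=["b", "c", "d", "a"])

by_label = s.loc["a"]            # single value by label
by_labels = s.loc[["a", "b"]]    # sub-Series from a list of labels
first_two = s.iloc[:2]           # first two entries by position
label_slice = s.loc["b":"d"]     # label slices include both endpoints

print(by_label)
print(label_slice)
```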

DataFrame

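A DataFrame is a tabular (two-dimensional) data structure containing an ordered set of columns, each of which can hold a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index, and can be viewed as a dict of Series that share the same index.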

(1) The pandas documentation describes DataFrame as follows:

""" Two-dimensional size-mutable, potentially heterogeneous tabular
data structure with labeled axes (rows and columns). Arithmetic
operations align on both row and column labels. Can be thought of as a
dict-like container for Series objects. The primary pandas data
structure

Parameters
----------
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
    Dict can contain Series, arrays, constants, or list-like objects
index : Index or array-like
    Index to use for resulting frame. Will default to np.arange(n) if
    no indexing information part of input data and no index provided
columns : Index or array-like
    Column labels to use for resulting frame. Will default to
    np.arange(n) if no column labels are provided
dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer
copy : boolean, default False
    Copy data from inputs. Only affects DataFrame / 2d ndarray input """
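
The docstring's remark that arithmetic aligns on both row and column labels can be sketched as follows. The two frames are invented for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"one": [1.0, 2.0], "two": [3.0, 4.0]}, index=["a", "b"])
df2 = pd.DataFrame({"two": [10.0, 20.0], "three": [30.0, 40.0]}, index=["b", "c"])

# Rows and columns are both matched by label before adding.
total = df1 + df2

# Only the cell (row "b", column "two") exists in both frames.
print(total)
```

Every cell that is missing from either operand becomes NaN, so the result covers the union of the row labels and the union of the column labels.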

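(2) The basic way to create a DataFrame is shown below. The data can be a dict of lists, one-dimensional ndarrays or Series (lists and arrays must all have the same length), a two-dimensional ndarray, a dict of dicts, and so on.
df = pd.DataFrame(data, index=index)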

df = pd.DataFrame({"one": [1., 2., 3., 5], "two": [1., 2., 3., 4.]})
   one  two
0  1.0  1.0
1  2.0  2.0
2  3.0  3.0
3  5.0  4.0

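Note: when created from a dict of lists, each sequence becomes a column of the DataFrame. Putting the lists inside braces without keys, as in df = pd.DataFrame({[1., 2., 3., 5], [1., 2., 3., 4.]}), does not work: the braces then form a set literal, and a list is an unhashable type.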
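
A short sketch of the failure described in the note. The error comes from Python itself while it builds the set, before pandas is called.

```python
import pandas as pd

# Keys give each list a column name, so this works.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 5.0], "two": [1.0, 2.0, 3.0, 4.0]})

# Braces without keys build a set, and a set cannot contain lists.
try:
    pd.DataFrame({[1.0, 2.0], [3.0, 4.0]})
except TypeError as exc:
    error_message = str(exc)

print(df.shape)
print(error_message)
```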

df = pd.DataFrame([[1., 2., 3., 5],[1., 2., 3., 4.]],index=["a", "b"],columns=["one","two","three","four"])
   one  two  three  four
a  1.0  2.0    3.0   5.0
b  1.0  2.0    3.0   4.0

Note: created from nested lists as a table with 2 rows and 4 columns; the index and columns parameters set the row index and the column names.

# "U10" is a 10-character unicode string field ("a10" is a legacy bytes code)
data = np.zeros((2,), dtype=[("A", "i4"),("B", "f4"),("C", "U10")])
[(0,  0., "") (0,  0., "")]

Note: zeros(shape, dtype=float, order="C") returns an array of the given shape and type, filled with zeros.

data[:] = [(1,2.,"Hello"), (2,3.,"World")]        
df = pd.DataFrame(data)
   A    B      C
0  1  2.0  Hello
1  2  3.0  World

df = pd.DataFrame(data, index=["first", "second"])
        A    B      C
first   1  2.0  Hello
second  2  3.0  World

df = pd.DataFrame(data, columns=["C", "A", "B"])
       C  A    B
0  Hello  1  2.0
1  World  2  3.0

Note: as with Series, a DataFrame automatically adds an index when none is specified. When columns are specified, the columns are arranged in the given order.

data = {"one" : pd.Series([1., 2., 3.], index=["a", "b", "c"]),
        "two" : pd.Series([1., 2., 3., 4.], index=["a", "b", "c", "d"])}
df = pd.DataFrame(data)
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

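Note: when created from a dict of Series, each Series becomes a column. If no index is specified explicitly, the indexes of the Series are merged into the row index of the result. NaN fills in the missing column data.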

df = pd.DataFrame(data,index=["d", "b", "a"])
   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0

df = pd.DataFrame(data,index=["d", "b", "a"], columns=["two", "three"])
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN

data2 = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
df = pd.DataFrame(data2)
   a   b     c
0  1   2   NaN
1  5  10  20.0

Note: when created from a list of dicts, each item becomes a row of the DataFrame, and the union of the dict keys becomes its column labels.

df = pd.DataFrame(data2, index=["first", "second"])
        a   b     c
first   1   2   NaN
second  5  10  20.0

df = pd.DataFrame(data2, columns=["a", "b"])
   a   b
0  1   2
1  5  10

df = pd.DataFrame({("a", "b"): {("A", "B"): 1, ("A", "C"): 2},
                 ("a", "a"): {("A", "C"): 3, ("A", "B"): 4},
                 ("a", "c"): {("A", "B"): 5, ("A", "C"): 6}, 
                 ("b", "a"): {("A", "C"): 7, ("A", "B"): 8},  
                 ("b", "b"): {("A", "D"): 9, ("A", "B"): 10}})
       a              b
       a    b    c    a     b
A B  4.0  1.0  5.0  8.0  10.0
  C  3.0  2.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0

Note: when created from a dict of dicts, the outer keys are merged into the column index of the result, each inner dict becomes a column, and the inner keys are merged into the row index.
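
The same rule is easier to see with plain string keys instead of tuples. This is a simplified sketch with invented data.

```python
import pandas as pd

nested = {
    "one": {"a": 1.0, "b": 2.0},     # outer key -> column, inner keys -> row labels
    "two": {"b": 20.0, "c": 30.0},
}
df = pd.DataFrame(nested)

# Row "c" has no entry under "one", so that cell is NaN.
print(df)
```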

(3) Accessing the elements and index of a DataFrame

data = {"one" : pd.Series([1., 2., 3.], index=["a", "b", "c"]),
        "two" : pd.Series([1., 2., 3., 4.], index=["a", "b", "c", "d"])}
df = pd.DataFrame(data)
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

df["one"]  # or df.one
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

Note: a column of a DataFrame can be retrieved as a Series either with dict-like notation or as an attribute. The returned Series has the same index as the original DataFrame, and its name attribute is set accordingly.
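
One caveat worth a sketch: attribute access only works when the column name does not collide with an existing DataFrame attribute or method. The column name "count" below is chosen deliberately to show the collision and is not from the article.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0], "count": [10, 20]}, index=["a", "b"])

col = df["one"]       # bracket access always returns the column
same = df.one         # attribute access works for simple, non-conflicting names

# "count" is also a DataFrame method, so df.count is the method, not the column.
is_method = callable(df.count)
count_col = df["count"]

print(col.name, is_method)
```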

df[0:1]
   one  two
a  1.0  1.0

Note: an integer slice inside df[...] selects rows by position, so df[0:1] returns the first row only.
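
A minimal sketch contrasting the two meanings of square brackets on a DataFrame, using an illustrative frame: a slice picks rows, while a label or a list of labels picks columns.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [1.0, 2.0, 3.0]},
                  index=["a", "b", "c"])

first_row = df[0:1]              # integer slice -> rows by position
two_cols = df[["one", "two"]]    # list of labels -> columns

print(first_row)
print(two_cols)
```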

df.loc["a"]
one    1.0
two    1.0
Name: a, dtype: float64

df.loc[:, ["one", "two"]]
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

df.loc[["a",],["one","two"]]
   one  two
a  1.0  1.0

df.loc["a","one"]
1.0

Note: loc selects data by label.

df.iloc[0:2,0:1]  
   one
a  1.0
b  2.0

df.iloc[0:2]  
   one  two
a  1.0  1.0
b  2.0  2.0

df.iloc[[0,2],[0,1]]  # freely pick the data at the given row positions and column positions
   one  two
a  1.0  1.0
c  3.0  3.0

Note: iloc selects data by position.
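
One difference between the two accessors is easy to miss: a loc slice includes its stop label, while an iloc slice excludes its stop position. A small sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [5.0, 6.0, 7.0, 8.0]},
                  index=["a", "b", "c", "d"])

by_label = df.loc["a":"c", "one"]    # labels a, b, c: the stop label is included
by_position = df.iloc[0:2, 0]        # positions 0 and 1: the stop position is excluded

print(by_label)
print(by_position)
```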

df.ix["a"]
one    1.0
two    1.0
Name: a, dtype: float64

df.ix["a",["one","two"]]
one    1.0
two    1.0
Name: a, dtype: float64

df.ix["a",[0,1]]
one    1.0
two    1.0
Name: a, dtype: float64

df.ix[["a","b"],[0,1]]
   one  two
a  1.0  1.0
b  2.0  2.0

df.ix[1,[0,1]]
one    2.0
two    2.0
Name: b, dtype: float64

df.ix[[0,1],[0,1]]
   one  two
a  1.0  1.0
b  2.0  2.0

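Note: ix retrieves rows by mixing index labels and integer positions. ix was deprecated in pandas 0.20 and removed in pandas 1.0, so these examples only run on older versions; on current versions use loc and iloc.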

df.ix[df.one>1,:1]
   one
b  2.0
c  3.0

Note: selection by condition; this picks the rows where column one is greater than 1, together with the first column.
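
Because ix was removed in pandas 1.0, the ix examples above need rewriting on current versions. The sketch below shows one possible translation to loc and iloc, using df.columns to turn column positions into labels where a label-based call needs them; other spellings are equally valid.

```python
import pandas as pd

data = {"one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
        "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])}
df = pd.DataFrame(data)

row_a = df.loc["a"]                                # was df.ix["a"]
row_a_by_pos = df.loc["a", df.columns[[0, 1]]]     # was df.ix["a", [0, 1]]
second_row = df.iloc[1, [0, 1]]                    # was df.ix[1, [0, 1]]
filtered = df.loc[df["one"] > 1, df.columns[:1]]   # was df.ix[df.one > 1, :1]

print(filtered)
```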

df["one"]=16.8
    one  two
a  16.8  1.0
b  16.8  2.0
c  16.8  3.0
d  16.8  4.0

val = pd.Series([2,2,2],index=["b", "c", "d"])
df["one"]=val
   one  two
a  NaN  1.0
b  2.0  2.0
c  2.0  3.0
d  2.0  4.0

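Note: a column can be modified by assignment. When a list or array is assigned to a column, its length must match the length of the DataFrame. When a Series is assigned, it is aligned exactly to the index of the DataFrame, and the gaps are filled with NaN.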
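
Both rules in the note can be checked with a short sketch. The frame and the column name "bad" are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"two": [1.0, 2.0, 3.0, 4.0]}, index=["a", "b", "c", "d"])

# A Series is aligned on the index; "a" has no match and becomes NaN.
df["one"] = pd.Series([2, 2, 2], index=["b", "c", "d"])

# A plain list must have exactly len(df) items, otherwise pandas raises ValueError.
try:
    df["bad"] = [1, 2, 3]
except ValueError as exc:
    length_error = str(exc)

print(df)
print(length_error)
```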

df["four"]=[3,3,3,3]
   one  two  four
a  NaN  1.0     3
b  2.0  2.0     3
c  2.0  3.0     3
d  2.0  4.0     3

Note: assigning to a column that does not exist creates a new column.

df.index.get_loc("a")
0

df.index.get_loc("b")
1

df.columns.get_loc("one")
0

Note: get_loc returns the integer position of a label in the row or column index.
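
One use for these positions is to bridge from labels to iloc, for example to take everything from a given label onward by position. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
                  index=["a", "b", "c"])

row_pos = df.index.get_loc("b")        # 1
col_pos = df.columns.get_loc("two")    # 1

value = df.iloc[row_pos, col_pos]      # the cell at row "b", column "two"
rows_from_b = df.iloc[row_pos:]        # row "b" and everything after it

print(value)
print(rows_from_b)
```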
