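Welcome to follow along! We focus on Python, data analysis, data mining, and fun tools.
This article introduces some packages that few of us may have heard of, but that are very handy for specific problems or tasks. A technical discussion group is provided at the end of the article, and everyone is welcome to learn and discuss together.
To try these Python libraries out, we first download a dataset from Kaggle: Animal Care and Control Adopted Animals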
https://www.kaggle.com/jinbonnie/animal-data
import pandas as pd

df = pd.read_csv("animal-data-1.csv")
print("Number of pets:", len(df))
print(df.head(3))
Number of pets: 10290
      id           intakedate intakereason  istransfer sheltercode  \
0  15801  2009-11-28 00:00:00       Moving           0   C09115463
1  15932  2009-12-08 00:00:00       Moving           0   D09125594
2  28859  2012-08-10 00:00:00    Abandoned           0   D12082309

  identichipnumber animalname                breedname basecolour speciesname  \
0       0A115D7358     Jadzia      Domestic Short Hair     Tortie         Cat
1       0A11675477      Gonzo  German Shepherd Dog/Mix        Tan         Dog
2       0A13253C7B     Maggie  Shep Mix/Siberian Husky    Various         Dog

   ...         movementdate movementtype istrial returndate returnedreason  \
0  ...  2017-05-13 00:00:00     Adoption     0.0        NaN          Stray
1  ...  2017-04-24 00:00:00     Adoption     0.0        NaN          Stray
2  ...  2017-04-15 00:00:00     Adoption     0.0        NaN          Stray

  deceaseddate deceasedreason  diedoffshelter puttosleep isdoa
0          NaN   Died in care               0          0     0
1          NaN   Died in care               0          0     0
2          NaN   Died in care               0          0     0

[3 rows x 23 columns]
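Now let's get into the Python libraries themselves.
Missingno visualizes the missing values in a dataset, which is very useful during data analysis. It can also draw heatmaps or bar charts so that missing values are easier to see.
matrix - similar to a missing-value heatmap in seaborn; it shows the data density of up to 50 columns, and the sparkline on the right gives an overall view of how complete the dataset is
bar - shows the missing values column by column
heatmap - shows the correlation between missing values, that is, how strongly the presence or absence of one variable affects the presence of another. Columns with no missing values, or with no values at all, do not appear here
dendrogram - like the heatmap, the tree diagram shows how missingness correlates between columns, but it reveals the correlation across groups of columns rather than pairs
Let's look at each of these charts.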
import missingno as msno

msno.matrix(df)
msno.bar(df)
msno.heatmap(df)
msno.dendrogram(df)
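To make the numbers behind the bar chart and the heatmap more concrete, here is a minimal sketch that uses only the standard library. It is not how missingno is implemented: the tiny `rows` dataset and the helper names `missing_counts` and `nullity_correlation` are made up for illustration. It counts missing values per column and computes the Pearson correlation between the present/missing indicators of two columns.

```python
from math import sqrt

# Hypothetical mini-dataset (not the Kaggle file): None marks a missing value.
rows = [
    {"animalname": "Jadzia", "returndate": None, "deceaseddate": None},
    {"animalname": "Gonzo", "returndate": "2018-05-14", "deceaseddate": None},
    {"animalname": None, "returndate": None, "deceaseddate": None},
    {"animalname": "Maggie", "returndate": "2019-01-02", "deceaseddate": "2019-03-01"},
]


def missing_counts(rows):
    """Count missing values per column (the kind of number a bar chart shows)."""
    columns = rows[0].keys()
    return {col: sum(row[col] is None for row in rows) for col in columns}


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the present/missing indicators of two columns."""
    a = [0.0 if row[col_a] is None else 1.0 for row in rows]
    b = [0.0 if row[col_b] is None else 1.0 for row in rows]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a fully present or fully empty column has no correlation
    return cov / sqrt(var_a * var_b)


print(missing_counts(rows))
print(nullity_correlation(rows, "returndate", "deceaseddate"))
```

A value close to 1 means the two columns tend to be present or missing together, a value close to -1 means one tends to be present when the other is missing. Returning `None` for constant columns mirrors the remark above that fully present or fully empty columns are left out of the heatmap.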
We can also customize some parameters of the missingno charts
msno.matrix(
    df,
    figsize=(25, 7),
    fontsize=30,
    sort="descending",
    color=(0.494, 0.184, 0.556),
    width_ratios=(10, 1),
)
Finally, we can combine missingno with matplotlib to make nicer-looking charts
import matplotlib.pyplot as plt

# Older missingno releases needed inline=False here so the figure was not shown
# before the title was added; recent releases no longer accept that argument.
msno.matrix(
    df,
    figsize=(25, 7),
    fontsize=30,
    sort="descending",
    color=(0.494, 0.184, 0.556),
    width_ratios=(10, 1),
)
plt.title("Missing Values Pet Dataset", fontsize=55)
plt.show()
tabulate prints good-looking tables in Python. It offers smart, customizable column alignment, number and text formatting, and alignment on the decimal point, which makes it a handy tool during data analysis. Supported data types include DataFrame, list of lists or dictionaries, dictionary, and NumPy array
from tabulate import tabulate

df_pretty_printed = df.iloc[:5, [1, 2, 4, 6]]
print(tabulate(df_pretty_printed))
-  -----------  -----------------------  ------  -----
0  Jadzia       Domestic Short Hair      Female  Stray
1  Gonzo        German Shepherd Dog/Mix  Male    Stray
2  Maggie       Shep Mix/Siberian Husky  Female  Stray
3  Pretty Girl  Domestic Short Hair      Female  Stray
4  Pretty Girl  Domestic Short Hair      Female  Stray
-  -----------  -----------------------  ------  -----
We can also customize the table header with the headers parameter
print(
    tabulate(
        df_pretty_printed,
        headers="keys",
        tablefmt="fancy_grid",
        stralign="center",
    )
)
╒════╤══════════════╤═════════════════════════╤═══════════╤══════════════════╕
│    │  animalname  │        breedname        │  sexname  │  returnedreason  │
╞════╪══════════════╪═════════════════════════╪═══════════╪══════════════════╡
│  0 │    Jadzia    │   Domestic Short Hair   │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  1 │    Gonzo     │ German Shepherd Dog/Mix │   Male    │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  2 │    Maggie    │ Shep Mix/Siberian Husky │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  3 │ Pretty Girl  │   Domestic Short Hair   │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  4 │ Pretty Girl  │   Domestic Short Hair   │  Female   │      Stray       │
╘════╧══════════════╧═════════════════════════╧═══════════╧══════════════════╛
Note that tables printed by this library may not render well on a phone screen; they look best on a PC
The wikipedia library makes it easy to access Wikipedia and pull data from it
Its main features are:
Search Wikipedia - search()
Get an article summary - summary()
Get the full page content, including images, links, and so on - page()
Choose the language - set_lang()
Taking Siberian Husky from the dataset above as the keyword, let's set the language to Russian, search Wikipedia, and see what comes back
import wikipedia

wikipedia.set_lang("ru")
print(wikipedia.search("Siberian Husky"))
["Сибирский хаски", "Древние породы собак", "Маккензи Ривер Хаски", "Породы собак по классификации кинологических организаций", "Ричардсон, Кевин Майкл"]
Next we fetch the first sentence of the summary for the first search result
print(wikipedia.summary("Сибирский хаски", sentences=1))
Сибирский хаски — заводская специализированная порода собак, выведенная чукчами северо-восточной части Сибири и зарегистрированная американскими кинологами в 1930-х годах как ездовая собака, полученная от аборигенных собак Дальнего Востока России, в основном из Анадыря, Колымы, Камчатки у местных оседлых приморских племён — юкагиров, кереков, азиатских эскимосов и приморских чукчей — анкальын (приморские, поморы — от анкы (море)).
Now let's get the image information
print(wikipedia.page("Сибирский хаски").images[0])
This gives us the URL of the image
Anyone familiar with Linux will know wget as a handy shell command for downloading files. The Python library of the same name does the same job
Let's try downloading the husky picture from above
import wget

wget.download("https://upload.wikimedia.org/wikipedia/commons/a/a3/Black-Magic-Big-Boy.jpg")
"Black-Magic-Big-Boy.jpg"
Of course, the library can just as easily download HTML files
wget.download("https://www.kaggle.com/jinbonnie/animal-data")
"animal-data"
The downloaded file looks something like this:
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Animal Care and Control Adopted Animals | Kaggle</title>
  <meta charset="utf-8" />
  <meta name="robots" content="index, follow" />
  <meta name="description" content="animal situation in Bloomington Animal Shelter from 2017-2020" />
  <meta name="turbolinks-cache-control" content="no-cache" />
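If installing an extra package is not an option, roughly the same thing can be done with the standard library. The sketch below is an assumption-laden stand-in, not the wget package itself: `filename_from_url` and `download` are hypothetical helper names, the file name is simply taken from the last path segment of the URL, and `"download"` is an arbitrary fallback for URLs without one.

```python
import posixpath
import urllib.request
from urllib.parse import urlsplit


def filename_from_url(url):
    """Take the last path segment of the URL as the local file name."""
    return posixpath.basename(urlsplit(url).path) or "download"


def download(url, out=None):
    """Fetch url into a local file and return the file name (needs network access)."""
    target = out or filename_from_url(url)
    urllib.request.urlretrieve(url, target)
    return target


# Only the file names are derived here; nothing is downloaded until download() is called.
print(filename_from_url("https://upload.wikimedia.org/wikipedia/commons/a/a3/Black-Magic-Big-Boy.jpg"))
print(filename_from_url("https://www.kaggle.com/jinbonnie/animal-data"))
```

The two printed names match the ones returned by the wget calls above, which suggests the URL path is a reasonable default for naming the file.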
Faker generates fake data, which is very handy for everyday program testing. It can produce names, email addresses, phone numbers, jobs, sentences, colors, currencies, and much more. It also supports localization: pass a locale as a parameter and it generates fake data for that language, which is a really thoughtful touch
from faker import Faker

fake = Faker()
print(
    "Fake color:", fake.color(), "\n"
    "Fake job:", fake.job(), "\n"
    "Fake email:", fake.email(), "\n"
)

# Printing a list of fake Korean and Portuguese addresses
fake = Faker(["ko_KR", "pt_BR"])
for _ in range(5):
    print(fake.unique.address())  # using the `.unique` property
print("\n")

# Assigning a seed number to print always the same value / data set
fake = Faker()
Faker.seed(3920)
print("This English fake name is always the same:", fake.name())
Fake color: #212591
Fake job: Occupational therapist
Fake email: nancymoody@hotmail.com
Estrada Lavínia da Luz, 62
Oeste
85775858 Moura / SE
Residencial de Moreira, 57
Morro Dos Macacos
75273529 Farias / TO
??????? ??? ???? (??????)???? ??? ????? (????)???? ??? ??53?
This English fake name is always the same: Kim Lopez
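Why does a seed make the fake name repeatable? The idea can be sketched with the standard library's `random` module. This is only an illustration of seeding, not Faker's implementation: the word lists and the `fake_name` helper are hypothetical and far smaller than Faker's locale data.

```python
import random

# Hypothetical word lists, far smaller than Faker's locale data.
FIRST_NAMES = ["Kim", "Steven", "Helena", "Nancy", "Maggie"]
LAST_NAMES = ["Lopez", "Harris", "Moody", "Fliegner", "Karz"]


def fake_name(seed):
    """Return a made-up name; the same seed always yields the same name."""
    rng = random.Random(seed)  # private generator, global random state is untouched
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


# The same seed reproduces the same name on every call.
print(fake_name(3920) == fake_name(3920))
```

Because the generator is rebuilt from the seed on each call, every call with that seed walks through the same sequence of pseudo-random choices.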
Back to our animal dataset: it turns out that two of the animal names are not very flattering
df_bad_names = df[df["animalname"].str.contains("Stink|Pooh")]
print(df_bad_names)
     identichipnumber animalname            breedname speciesname sexname  \
1692              NaN    Stinker  Domestic Short Hair         Cat    Male
3336  981020023417175       Pooh  German Shepherd Dog         Dog  Female
3337  981020023417175       Pooh  German Shepherd Dog         Dog  Female

               returndate                     returnedreason
1692                  NaN                              Stray
3336  2018-05-14 00:00:00  Incompatible with owner lifestyle
3337                  NaN                              Stray
Let's give the cat and the dog nicer names
# Defining a function to rename the unlucky pets
def rename_pets(name):
    if name == "Stinker":
        fake = Faker()
        Faker.seed(162)
        name = fake.name()
    elif name == "Pooh":
        fake = Faker(["de_DE"])
        Faker.seed(20387)
        name = fake.name()
    return name

# Renaming the pets
df["animalname"] = df["animalname"].apply(rename_pets)

# Checking the results (select by index label, so .loc rather than .iloc)
print(df.loc[df_bad_names.index, :])
     identichipnumber            animalname            breedname speciesname  \
1692              NaN         Steven Harris  Domestic Short Hair         Cat
3336  981020023417175  Helena Fliegner-Karz  German Shepherd Dog         Dog
3337  981020023417175  Helena Fliegner-Karz  German Shepherd Dog         Dog

     sexname           returndate                     returnedreason
1692    Male                  NaN                              Stray
3336  Female  2018-05-14 00:00:00  Incompatible with owner lifestyle
3337  Female                  NaN                              Stray
Much nicer names, aren't they?
numerizer converts numbers written out in natural language into digits. Let's take a look
First we select the animals whose names contain a number word
df_numerized_names = df[["identichipnumber", "animalname", "speciesname"]][
    df["animalname"].str.contains("Two|Seven|Fifty")
]
df_numerized_names
Now we convert the number words in the names into Arabic numerals
from numerizer import numerize

df["animalname"] = df["animalname"].apply(lambda x: numerize(x))
# Select by index label, so .loc rather than .iloc
df[["identichipnumber", "animalname", "speciesname"]].loc[df_numerized_names.index, :]
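To get a feel for what such a conversion involves, here is a toy sketch. It is not numerizer's implementation: the lookup tables and the `words_to_digits` helper are hypothetical, they only cover units and a few tens, and the sample names are made up.

```python
# Hypothetical lookup tables covering only a few number words.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}


def words_to_digits(text):
    """Replace simple number words (units, tens, tens + unit) with digits."""
    out = []
    tokens = text.split()
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word in TENS:
            value = TENS[word]
            # Merge a following unit word: "fifty two" becomes 52
            if i + 1 < len(tokens) and tokens[i + 1].lower() in UNITS:
                value += UNITS[tokens[i + 1].lower()]
                i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(tokens[i])  # leave ordinary words untouched
        i += 1
    return " ".join(out)


print(words_to_digits("Seven"))           # 7
print(words_to_digits("Fifty Two Paws"))  # 52 Paws
```

The real library handles far more cases (hundreds, fractions, ordinals and so on), so this only shows the basic lookup-and-merge idea.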
The emoji library converts strings into emoji, and back, using the emoji codes defined by the Unicode Consortium. Its two main functions are emojize() and demojize()
import emoji

print(emoji.emojize(":koala:"))
print(emoji.demojize("🐨"))
print(emoji.emojize(":rana:", language="it"))
🐨
:koala:
🐸
Now let's turn our animals into symbols
print(df["speciesname"].unique())
["Cat" "Dog" "House Rabbit" "Rat" "Bird" "Opossum" "Chicken" "Wildlife"
 "Ferret" "Tortoise" "Pig" "Hamster" "Guinea Pig" "Gerbil" "Lizard"
 "Hedgehog" "Chinchilla" "Goat" "Snake" "Squirrel" "Sugar Glider" "Turtle"
 "Tarantula" "Mouse" "Raccoon" "Livestock" "Fish"]
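We convert every name to lowercase and wrap it in colons
# emoji >= 2.0 selects alias names with language="alias";
# older releases used use_aliases=True instead.
df["speciesname"] = df["speciesname"].apply(
    lambda x: emoji.emojize(f":{x.lower()}:", language="alias")
)
print(df["speciesname"].unique())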
["" "" ":house rabbit:" "" "" ":opossum:" "" ":wildlife:" ":ferret:"
 ":tortoise:" "" "" ":guinea pig:" ":gerbil:" "" "" ":chinchilla:" ""
 "" ":squirrel:" ":sugar glider:" "" ":tarantula:" "" "" ":livestock:"
 ""]
Next we swap some names for synonyms
df["speciesname"] = (
    df["speciesname"]
    .str.replace(":house rabbit:", ":rabbit:")
    .replace(":tortoise:", ":turtle:")
    .replace(":squirrel:", ":chipmunk:")
)
df["speciesname"] = df["speciesname"].apply(lambda x: emoji.emojize(x, variant="emoji_type"))
print(df["speciesname"].unique())
["" "" "?" "" "" ":opossum:?" "" ":wildlife:?" ":ferret:?" "?" ""
 "" ":guinea pig:" ":gerbil:?" "" "" ":chinchilla:?" "" "" ""
 ":sugar glider:" "" ":tarantula:?" "" "" ":livestock:?" ""]
For the remaining species that have no matching emoji name, we convert the values back to their original form
df["speciesname"] = df["speciesname"].str.replace(":", "").apply(lambda x: x.title())
print(df["speciesname"].unique())
df[["animalname", "speciesname", "breedname"]].head(3)
["" "" "?" "" "" "Opossum?" "" "Wildlife?" "Ferret?" "?" "" ""
 "Guinea Pig" "Gerbil?" "" "" "Chinchilla?" "" "" "" "Sugar Glider"
 "" "Tarantula?" "" "" "Livestock?" ""]
And that completes turning the species names into symbols
pyaztro was probably created just for fun. It predicts each day's lucky number, lucky time, lucky color, and so on for every zodiac sign. Have a play with it if you are curious
import pyaztro

pyaztro.Aztro(sign="taurus").description
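The whole name-to-symbol pipeline (lowercase, synonym swap, lookup, fall back to the title-cased name) can also be sketched with the standard library alone. Note that this swaps in a different technique: it looks characters up by their official Unicode names through `unicodedata`, not by the emoji package's shortcodes, so the symbols it picks may differ from the ones `emojize()` returns. `SYNONYMS` and `species_to_symbol` are hypothetical names.

```python
import unicodedata

# Synonyms for species whose name is not itself a Unicode character name
# (the same rabbit / turtle / chipmunk substitutions as above).
SYNONYMS = {"house rabbit": "rabbit", "tortoise": "turtle", "squirrel": "chipmunk"}


def species_to_symbol(species):
    """Return an emoji for the species, or the title-cased name if none is found."""
    key = species.lower()
    key = SYNONYMS.get(key, key)
    try:
        return unicodedata.lookup(key.upper())  # e.g. "TURTLE" is U+1F422
    except KeyError:
        return species.title()  # no character with that name: keep the text


for name in ["Tortoise", "House Rabbit", "Snake", "Wildlife"]:
    print(name, "->", species_to_symbol(name))
```

Doing the fallback inside one function avoids the intermediate step where unmatched values temporarily look like `:wildlife:` and have to be cleaned up afterwards.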
"You need to make a radical change in some aspect of your life - probably related to your home. It could be time to buy or sell or just to move on to some more promising location."
Looking at our dataset again, there is a cat and a dog named Aries
df[["animalname", "speciesname"]][(df["animalname"] == "Aries")]
And quite a few animals are named Leo
print("Leo:", df["animalname"][(df["animalname"] == "Leo")].count())
Leo: 18
Let's pretend these are the animals' zodiac signs and use the library to predict their fortunes
aries = pyaztro.Aztro(sign="aries")
leo = pyaztro.Aztro(sign="leo")

print(
    "ARIES: \n",
    "Sign:", aries.sign, "\n",
    "Current date:", aries.current_date, "\n",
    "Date range:", aries.date_range, "\n",
    "Sign description:", aries.description, "\n",
    "Mood:", aries.mood, "\n",
    "Compatibility:", aries.compatibility, "\n",
    "Lucky number:", aries.lucky_number, "\n",
    "Lucky time:", aries.lucky_time, "\n",
    "Lucky color:", aries.color,
)
When reposting, please cite this article's address: http://systransis.cn/yun/120917.html