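This article follows up on an earlier data-matching experiment done in Python. It collects the CSV read and write operations used there, together with the conversions between CSV files and common Python data structures such as dicts and lists.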
(Python Version 2.7)
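import csv

# write each element of a flat list as its own fully quoted row
def list2csv(list, file):
    with open(file, "wb") as f:
        wr = csv.writer(f, quoting=csv.QUOTE_ALL)
        for word in list:
            wr.writerow([word])

Converting a list of nested dicts to a CSV file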
my_list = [
    {"players.vis_name": "Khazri", "players.role": "Midfielder",
     "players.country": "Tunisia", "players.last_name": "Khazri",
     "players.player_id": "989", "players.first_name": "Wahbi",
     "players.date_of_birth": "08/02/1991", "players.team": "Bordeaux"},
    {"players.vis_name": "Khazri", "players.role": "Midfielder",
     "players.country": "Tunisia", "players.last_name": "Khazri",
     "players.player_id": "989", "players.first_name": "Wahbi",
     "players.date_of_birth": "08/02/1991", "players.team": "Sunderland"},
    {"players.vis_name": "Lewis Baker", "players.role": "Midfielder",
     "players.country": "England", "players.last_name": "Baker",
     "players.player_id": "9574", "players.first_name": "Lewis",
     "players.date_of_birth": "25/04/1995", "players.team": "Vitesse"},
]
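# write a list of dicts to csv; every dict is assumed to share the keys of the first one
def nestedlist2csv(list, out_file):
    with open(out_file, "wb") as f:
        w = csv.writer(f)
        # take the header from the first dict so it is written automatically
        fieldnames = list[0].keys()
        w.writerow(fieldnames)
        for row in list:
            # index by fieldnames so each value lands under its own header
            w.writerow([row[k] for k in fieldnames])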
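As an alternative to writing the header by hand, the standard library's csv.DictWriter can do the same job. The following is only a minimal sketch written for Python 3 (the article itself targets 2.7); the helper name dicts_to_csv_text and the shortened sample rows are made up for illustration, and it writes to an in-memory buffer instead of a file.

```python
import csv
import io


def dicts_to_csv_text(rows):
    """Serialize a list of flat dicts to CSV text; the header comes from the first dict."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()    # header row, in the order given by fieldnames
    writer.writerows(rows)  # each value is placed under its own column name
    return buf.getvalue()


# hypothetical, shortened rows modelled on my_list
sample = [
    {"players.player_id": "989", "players.team": "Bordeaux"},
    {"players.player_id": "989", "players.team": "Sunderland"},
]
csv_text = dicts_to_csv_text(sample)
print(csv_text)
```

Because DictWriter looks values up by column name, the output stays aligned with the header even if the dicts were built with their keys in a different order.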
# convert csv file to dict
# @params:
# key/value: the columns of the original csv file to use as the key and value of the dict
def csv2dict(in_file, key, value):
    new_dict = {}
    with open(in_file, "rb") as f:
        reader = csv.reader(f, delimiter=",")
        fieldnames = next(reader)
        reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=",")
        for row in reader:
            new_dict[row[key]] = row[value]
    return new_dict
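In new_dict[row[key]] = row[value], "key" and "value" are column names taken from the header (first) row of the CSV file. Note that this assumes a simple CSV file in which the column chosen as the key is unique. Otherwise, converting straight from CSV to a dict overwrites earlier rows that share a key, and data is lost. If the column chosen as the key contains duplicates, build a dict of lists instead, storing each value as a list; see the code for building a dict of lists.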
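To make the overwrite problem concrete, here is a small sketch in Python 3 that runs both approaches on the same in-memory data. The CSV content, the column names season and club, and both helper names are assumptions chosen for illustration only.

```python
import csv
import io

# hypothetical data: the "season" column contains a duplicate
CSV_TEXT = "season,club\n2015,Bordeaux\n2015,Sunderland\n2016,Vitesse\n"


def to_plain_dict(text, key, value):
    """One value per key: a later row silently replaces an earlier one."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row[value] for row in reader}


def to_list_dict(text, key, value):
    """A list per key: every row with the same key is kept."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        result.setdefault(row[key], []).append(row[value])
    return result


plain = to_plain_dict(CSV_TEXT, "season", "club")
grouped = to_list_dict(CSV_TEXT, "season", "club")
print(plain)    # "Bordeaux" is gone for 2015
print(grouped)  # both 2015 clubs survive
```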
# convert csv file to dict (one key-value pair per row)
def row_csv2dict(csv_file):
    dict_club = {}
    with open(csv_file) as f:
        reader = csv.reader(f, delimiter=",")
        for row in reader:
            dict_club[row[0]] = row[1]
    return dict_club
# build a dict of lists like {key: [...elements of lst_inner_value...]}
# key is a column name of the csv file
# lst_inner_value is a list of column names of the csv file
def build_list_dict(source_file, key, lst_inner_value):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            for element in lst_inner_value:
                new_dict.setdefault(row[key], []).append(row[element])
    return new_dict

# sample:
# test_club = build_list_dict("test_info.csv", "season", ["move from", "move to"])
# print test_club

Converting a CSV file to a two-level dict
id | name | age | country |
1 | danny | 21 | China |
2 | Lancelot | 22 | America |
... | ... | ... | ... |
dct = {"China": {"danny": {"id": "1", "age": "21"}},
       "America": {"Lancelot": {"id": "2", "age": "22"}}}
# build specific nested dict from csv files (country -> name)
def build_level2_dict(source_file):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row["country"], dict())
            item[row["name"]] = {k: row[k] for k in ("id", "age")}
            new_dict[row["country"]] = item
    return new_dict
# build specific nested dict from csv files
# @params:
# source_file
# outer_key: the outer level key of the nested dict
# inner_key: the inner level key of the nested dict
# inner_value: the column stored as the value for the inner key
def build_level2_dict2(source_file, outer_key, inner_key, inner_value):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            item[row[inner_key]] = row[inner_value]
            new_dict[row[outer_key]] = item
    return new_dict
# build specific nested dict from csv files
# @params:
# source_file
# outer_key: the outer level key of the nested dict
# inner_key: the inner level key of the nested dict; the remaining
#            key-value pairs are stored as the value of the inner key
def build_level2_dict(source_file, outer_key, inner_key):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        # read the header once to work out the remaining columns
        reader = csv.reader(csv_file, delimiter=",")
        fieldnames = next(reader)
        inner_keyset = fieldnames
        inner_keyset.remove(outer_key)
        inner_keyset.remove(inner_key)
        # rewind so DictReader sees the header again
        csv_file.seek(0)
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            item[row[inner_key]] = {k: row[k] for k in inner_keyset}
            new_dict[row[outer_key]] = item
    return new_dict
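The general two-level builder can also be written without reading the header twice, because csv.DictReader exposes the header through its fieldnames attribute. This is a sketch only, written for Python 3 and run on an in-memory copy of the sample table; the helper name level2_from_text is hypothetical.

```python
import csv
import io

# the sample table from this section, as comma-separated text
TABLE = "id,name,age,country\n1,danny,21,China\n2,Lancelot,22,America\n"


def level2_from_text(text, outer_key, inner_key):
    """Build {outer: {inner: {remaining columns}}} from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames is read from the first row on first access, so no seek(0) is needed
    rest = [c for c in reader.fieldnames if c not in (outer_key, inner_key)]
    result = {}
    for row in reader:
        result.setdefault(row[outer_key], {})[row[inner_key]] = {c: row[c] for c in rest}
    return result


nested = level2_from_text(TABLE, "country", "name")
print(nested)
```

Building a new list for the remaining columns also leaves the reader's own fieldnames untouched, whereas removing items from the header list in place would change what the reader uses.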
from collections import defaultdict

def build_dict(source_file):
    projects = defaultdict(dict)
    # if there is no header within the csv file you need to set the header
    # and utilize the fieldnames parameter of csv.DictReader
    # headers = ["id", "name", "age", "country"]
    with open(source_file, "rb") as fp:
        reader = csv.DictReader(fp, dialect="excel", skipinitialspace=True)
        for rowdict in reader:
            # drop surplus fields, which DictReader collects under the None key
            if None in rowdict:
                del rowdict[None]
            country = rowdict.pop("country")
            name = rowdict.pop("name")
            projects[country][name] = rowdict
    return dict(projects)
# build specific nested dict from csv files
# @params:
# source_file
# outer_key: the outer level key of the nested dict
# lst_inner_value: a list of column names, for the case where the inner
#                  values under the same outer_key are not distinct
# {outer_key: [{pairs of lst_inner_value}]}
def build_level2_dict3(source_file, outer_key, lst_inner_value):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            new_dict.setdefault(row[outer_key], []).append(
                {k: row[k] for k in lst_inner_value})
    return new_dict
# build specific nested dict from csv files
# @params:
# source_file
# outer_key: the outer level key of the nested dict
# lst_inner_value: a list of column names, for the case where the inner
#                  values under the same outer_key are not distinct
# {outer_key: {key of lst_inner_value: [...values of lst_inner_value...]}}
def build_level2_dict4(source_file, outer_key, lst_inner_value):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            # print row
            item = new_dict.get(row[outer_key], dict())
            # item.setdefault("move from", []).append(row["move from"])
            # item.setdefault("move to", []).append(row["move to"])
            for element in lst_inner_value:
                item.setdefault(element, []).append(row[element])
            new_dict[row[outer_key]] = item
    return new_dict
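The two shapes documented above, {outer_key: [{...}]} and {outer_key: {column: [...]}}, are easy to confuse, so the sketch below builds both from the same rows. It is written for Python 3 and uses in-memory text; the column names season, move from and move to come from the article's sample call, while the row values and the helper names are invented for illustration.

```python
import csv
import io

# hypothetical transfer records; 2015 appears twice
TRANSFERS = (
    "season,move from,move to\n"
    "2015,Bordeaux,Sunderland\n"
    "2015,Chelsea,Vitesse\n"
    "2016,Vitesse,Chelsea\n"
)


def rows_per_key(text, outer_key, columns):
    """{outer_key: [ {column: value, ...}, ... ]}, one small dict per row."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        result.setdefault(row[outer_key], []).append({c: row[c] for c in columns})
    return result


def columns_per_key(text, outer_key, columns):
    """{outer_key: {column: [value, ...]}}, one list per column."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        item = result.setdefault(row[outer_key], {})
        for c in columns:
            item.setdefault(c, []).append(row[c])
    return result


by_row = rows_per_key(TRANSFERS, "season", ["move from", "move to"])
by_column = columns_per_key(TRANSFERS, "season", ["move from", "move to"])
print(by_row["2015"])
print(by_column["2015"])
```

The row-oriented shape keeps each record together, while the column-oriented shape is more convenient when a single column is needed across all records of one key.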
# build specific nested dict from csv files
# @params:
# source_file
# outer_key: the outer level key of the nested dict
# lst_inner_key: the column name used as the inner key
# lst_inner_value: the column name whose values are collected in a list, for
#                  the case where values under the same inner key are not distinct
# {outer_key: {lst_inner_key: [...lst_inner_value...]}}
def build_list_dict2(source_file, outer_key, lst_inner_key, lst_inner_value):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            # print row
            item = new_dict.get(row[outer_key], dict())
            item.setdefault(row[lst_inner_key], []).append(row[lst_inner_value])
            new_dict[row[outer_key]] = item
    return new_dict

# dct = build_list_dict2("test_info.csv", "season", "move from", "move to")

Building a three-level dict
# build specific nested dict from csv files
# a dict like {outer_key: {inner_key1: {inner_key2: {rest_key: rest_value...}}}}
# the params are csv column names of your choice
def build_level3_dict(source_file, outer_key, inner_key1, inner_key2):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        reader = csv.reader(csv_file, delimiter=",")
        fieldnames = next(reader)
        inner_keyset = fieldnames
        inner_keyset.remove(outer_key)
        inner_keyset.remove(inner_key1)
        inner_keyset.remove(inner_key2)
        csv_file.seek(0)
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item[row[inner_key2]] = {k: row[k] for k in inner_keyset}
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict

# build specific nested dict from csv files
# a dict like {outer_key: {inner_key1: {inner_key2: inner_value}}}
# the params are csv column names of your choice
def build_level3_dict2(source_file, outer_key, inner_key1, inner_key2, inner_value):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item[row[inner_key2]] = row[inner_value]
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict
# build specific nested dict from csv files
# a dict like {outer_key: {inner_key1: {inner_key2: [inner_value]}}}
# for multiple inner_value entries with the same inner_key2, gathered in a list
# the params are csv column names of your choice
def build_level3_dict3(source_file, outer_key, inner_key1, inner_key2, inner_value):
    new_dict = {}
    with open(source_file, "rb") as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict
sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
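The setdefault call is what lets repeated keys accumulate in a list rather than overwrite each other, and the same idea can be chained through every level. The sketch below is written for Python 3; the column names, the row values and the helper name level3_lists are all hypothetical.

```python
import csv
import io

# hypothetical data: the same (season, club, player) combination appears twice
TEXT = (
    "season,club,player,match\n"
    "2015,Bordeaux,Khazri,home\n"
    "2015,Bordeaux,Khazri,away\n"
    "2015,Vitesse,Baker,home\n"
)


def level3_lists(text, key1, key2, key3, value):
    """Build {key1: {key2: {key3: [values...]}}} from CSV text."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        # each setdefault returns the existing container or inserts a new one
        (result.setdefault(row[key1], {})
               .setdefault(row[key2], {})
               .setdefault(row[key3], [])
               .append(row[value]))
    return result


tree = level3_lists(TEXT, "season", "club", "player", "match")
print(tree)
```

Chaining setdefault removes the need to fetch each level with get and assign it back afterwards, at the cost of creating a throwaway empty container on every call.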
def dict2csv(dict, file):
    with open(file, "wb") as f:
        w = csv.writer(f)
        # write each key/value pair on a separate row
        w.writerows(dict.items())
def dict2csv(dict, file):
    with open(file, "wb") as f:
        w = csv.writer(f)
        # write all keys on one row and all values on the next
        w.writerow(dict.keys())
        w.writerow(dict.values())
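The first layout, one key/value pair per row, is the one that reads straight back into a dict. Here is a round-trip sketch in Python 3 using an in-memory buffer; the helper names and the sample dict are assumptions made for illustration.

```python
import csv
import io


def dict_to_rows_text(d):
    """Write each key/value pair of a flat dict as its own CSV row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(d.items())
    return buf.getvalue()


def rows_text_to_dict(text):
    """Read two-column CSV text back into a dict, skipping empty rows."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}


# hypothetical flat dict with string values
clubs = {"Khazri": "Sunderland", "Baker": "Vitesse"}
as_text = dict_to_rows_text(clubs)
restored = rows_text_to_dict(as_text)
print(as_text)
print(restored)
```

The round trip is exact here only because every value is a string; numbers or other types come back from the csv module as strings.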
import csv
import pandas as pd
from collections import OrderedDict

dct = OrderedDict()
dct["a"] = [1, 2, 3, 4]
dct["b"] = [5, 6, 7, 8]
dct["c"] = [9, 10, 11, 12]

header = dct.keys()
# turn the dict of lists into a list with one dict per row
rows = pd.DataFrame(dct).to_dict("records")

with open("outTest.csv", "wb") as f:
    f.write(",".join(header))
    f.write("\n")
    for data in rows:
        f.write(",".join(str(data[h]) for h in header))
        f.write("\n")
[("a", [1, 2, 3, 4]), ("b", [5, 6, 7, 8]), ("c", [9, 10, 11, 12])]
to
[{"a": 1, "c": 9, "b": 5}, {"a": 2, "c": 10, "b": 6}, {"a": 3, "c": 11, "b": 7}, {"a": 4, "c": 12, "b": 8}]

Reading unusual CSV files
def func(id_list, input_file, output_file):
    # build the lookup set once instead of once per row
    id_set = set(id_list)
    with open(input_file, "rb") as f:
        # if the delimiter for header is "," while ";" for rows
        reader = csv.reader(f, delimiter=",")
        fieldnames = next(reader)
        reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=";")
        rows = [row for row in reader if row["players.player_id"] in id_set]
        # operation on rows...
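The mixed-delimiter trick can be tried without a file on disk. The following Python 3 sketch parses the header with one delimiter and the data rows with another; the sample text, the helper name filter_rows and the shortened column set are assumptions for illustration.

```python
import csv
import io

# hypothetical file content: "," in the header, ";" in the data rows
RAW = "players.player_id,players.team\n989;Bordeaux\n9574;Vitesse\n12;Chelsea\n"


def filter_rows(text, id_list):
    """Return the rows whose players.player_id is in id_list."""
    wanted = set(id_list)
    stream = io.StringIO(text)
    # the first reader consumes only the header line
    fieldnames = next(csv.reader(stream, delimiter=","))
    # the second reader continues from the second line with the other delimiter
    reader = csv.DictReader(stream, fieldnames=fieldnames, delimiter=";")
    return [dict(row) for row in reader if row["players.player_id"] in wanted]


matched = filter_rows(RAW, ["989", "9574"])
print(matched)
```

Passing fieldnames explicitly matters here: without it, DictReader would treat the first ";"-separated data row as the header.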
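1. 2016-12-22: Improved the two-level dict builder to make it more flexible
2. 2016-12-24 14:55:30: Added the functions for building three-level dicts
3. 2017-01-09 11:26:59: The innermost level can now hold a list of elements from specified columns
5. 2017-02-09 10:54:41: Added a new builder for two-level dicts of lists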