[Python Cookbook] Finding the most frequent element in a sequence

AZmake, published 2019-07-30 16:08 / 2345 reads

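Summary: Python Cookbook poses the following problem: given a sequence, find the element that occurs most often, for example in the list words. This post first tries the Counter class from the collections module, then a dict combined with sorted(), and compares how long the two take.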

Problem

Python Cookbook poses the following problem: given a sequence, find the element that occurs most often.
For example:

words = [
   "look", "into", "my", "eyes", "look", "into", "my", "eyes",
   "the", "eyes", "the", "eyes", "the", "eyes", "not", "around", "the",
   "eyes", "don't", "look", "around", "the", "eyes", "look", "into",
   "my", "eyes", "you're", "under"
]

Which element of words occurs most often?

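A first look

1. The Counter class from the collections module
The first thing that comes to mind is the Counter class in the collections module. For usage details, read the documentation at https://docs.python.org/3.6/l... (important enough to say three times).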

from collections import Counter

words = [
   "look", "into", "my", "eyes", "look", "into", "my", "eyes",
   "the", "eyes", "the", "eyes", "the", "eyes", "not", "around", "the",
   "eyes", "don't", "look", "around", "the", "eyes", "look", "into",
   "my", "eyes", "you're", "under"
]

# Count how many times each word occurs.
counter_words = Counter(words)
print(counter_words)
# most_common(1) returns a one-item list holding the (word, count) pair with the highest count.
most_counter = counter_words.most_common(1)
print(most_counter)

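About most_common([n]): it returns a list of the n most common elements together with their counts, ordered from most common to least. If n is omitted or None, it returns all elements.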
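As a minimal sketch of how most_common behaves, the following uses a small made-up list (letters is hypothetical sample data, not taken from the post):

```python
from collections import Counter

# Hypothetical sample data, chosen so that two values share the same count.
letters = ["a", "b", "a", "c", "b", "a", "c"]
counts = Counter(letters)

# The single most common element, as a list holding one (element, count) pair.
top_one = counts.most_common(1)

# The two most common elements, most frequent first.
top_two = counts.most_common(2)

# With no argument, every element is returned, ordered by descending count.
everything = counts.most_common()

print(top_one)
print(top_two)
print(everything)
```

Because the result is always a list of pairs, counts.most_common(1)[0] is what unpacks into the element and its count.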

2. Using the uniqueness of dict keys together with sorted()

words = [
    "look", "into", "my", "eyes", "look", "into", "my", "eyes",
    "the", "eyes", "the", "eyes", "the", "eyes", "not", "around", "the",
    "eyes", "don't", "look", "around", "the", "eyes", "look", "into",
    "my", "eyes", "you're", "under"
]

# Each distinct word becomes a key; its value is the number of occurrences.
dict_num = {}
for item in words:
    if item not in dict_num:
        dict_num[item] = words.count(item)

# print(dict_num)

# Sort the (word, count) pairs by count in descending order and take the first one.
most_counter = sorted(dict_num.items(), key=lambda x: x[1], reverse=True)[0]
print(most_counter)

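The sorted function:
Documentation: https://docs.python.org/3.6/l...

iterable: the iterable to sort;
key: a one-argument function called on each item of the iterable to produce the value that is compared; it defaults to None, in which case the items are compared directly;
reverse: the sort order. reverse=True sorts in descending order and reverse=False in ascending order; the default is False.
Return value: a new sorted list, whatever the type of iterable.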
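The parameters can be seen in action in a short sketch. The pairs list is hypothetical data invented for this illustration:

```python
# Hypothetical (word, count) pairs used only for this illustration.
pairs = [("my", 3), ("eyes", 8), ("look", 4), ("the", 5)]

# key picks the count out of each pair; reverse=True puts the largest first.
by_count_desc = sorted(pairs, key=lambda pair: pair[1], reverse=True)

# The default order (reverse=False) is ascending.
by_count_asc = sorted(pairs, key=lambda pair: pair[1])

# sorted() builds a new list even when given a tuple.
from_tuple = sorted(("b", "c", "a"))

print(by_count_desc)
print(by_count_asc)
print(from_tuple)
```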

Here we use the anonymous function key=lambda x: x[1],
which is equivalent to:

def key(x):
    return x[1]

Here we sort in descending order by the number of times each element occurs, so the first item of the result is the element that occurs most often.
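Sorting every item is more work than needed when only the top entry matters. As an alternative sketch (not the approach this post uses), the built-in max() accepts the same key argument and scans the items once. The dictionary below is a small stand-in for dict_num, written out by hand for illustration:

```python
# Hand-written counts standing in for dict_num.
dict_num = {"look": 4, "into": 3, "my": 3, "eyes": 8, "the": 5}

# max() with a key function returns the (word, count) pair with the largest count.
most_counter = max(dict_num.items(), key=lambda x: x[1])
print(most_counter)
```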

Going further

The sequence above is simple and short. Sometimes, though, a list may hold millions or tens of millions of elements. Do the two solutions then differ greatly in efficiency?
To find out, we generate a list of one million random numbers.
This uses the NumPy package, which must be installed first. The download page is https://pypi.python.org/pypi/... and our environment is CentOS 7, where we downloaded numpy-1.14.2.zip (md5, pgp), unpacked it, and ran python setup.py install.

import numpy as np

def generate_data(num=1000000):
    return np.random.randint(num // 10, size=num)  # integer bound: values fall in [0, num // 10)

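np.random.randint(low[, high, size]) returns random integers from the half-open interval [low, high). When high is omitted, as it is here, the values come from [0, low).
For details, see https://pypi.python.org/pypi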
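If installing NumPy is not convenient, a comparable list can be produced with the standard library alone. This is a swapped-in alternative, not what the post uses; the function name generate_data_stdlib and its seed parameter are made up for this sketch.

```python
import random


def generate_data_stdlib(num=1000000, seed=None):
    """Return a list of `num` random ints drawn from the half-open range [0, num // 10)."""
    rng = random.Random(seed)  # a private generator, so the global random state is untouched
    upper = num // 10
    return [rng.randrange(upper) for _ in range(num)]


# A small run keeps the demonstration fast.
sample = generate_data_stdlib(num=1000, seed=42)
print(len(sample), min(sample), max(sample))
```

Passing a seed makes the data reproducible, which helps when two counting methods are compared on identical input.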

OK, the data is ready. Now let's measure how long each of the two methods takes. The time() function from the time module is enough for this.

#!/usr/bin/python
# coding=utf-8
#
# File: most_elements.py
# Author: ralap
# Date: 2018-4-5
# Description: find most elements in list
#

from collections import Counter
import time

import numpy as np


def generate_data(num=1000000):
    # Random integers in [0, num // 10); integer division keeps the bound an int.
    return np.random.randint(num // 10, size=num)


def collect(test_list):
    # Counter tallies every element; most_common(1) picks the top one.
    counter_words = Counter(test_list)
    print(counter_words)
    most_counter = counter_words.most_common(1)
    print(most_counter)


def list_to_dict(test_list):
    # For every distinct value, list.count() scans the whole list again.
    dict_num = {}
    for item in test_list:
        if item not in dict_num:
            dict_num[item] = test_list.count(item)

    most_counter = sorted(dict_num.items(), key=lambda x: x[1], reverse=True)[0]
    print(most_counter)


if __name__ == "__main__":
    list_value = list(generate_data())

    t1 = time.time()
    collect(list_value)
    t2 = time.time()
    # time.time() returns seconds, so the difference is reported in seconds.
    print("collect took: %ss" % (t2 - t1))

    t1 = t2
    list_to_dict(list_value)
    t2 = time.time()
    print("list_to_dict took: %ss" % (t2 - t1))

The results below are from a run on my own local machine; the point is to compare the relative time the two methods take.

With larger data, the difference in running time turns out to be enormous. The next step is to look into how Counter is implemented and see what magic makes it perform so well.
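One plausible explanation, offered as a sketch and not as a result from this post: list.count() rescans the entire list once for every distinct value, so the dict approach does work proportional to the list length multiplied by the number of distinct values, while a counting loop in the style of Counter touches each element once. The helper names count_single_pass and count_with_list_count are made up for illustration, and the data set is kept small so the slow variant finishes quickly.

```python
import time
from collections import Counter


def count_single_pass(items):
    """Tally occurrences in one pass over the data."""
    tally = {}
    for item in items:
        tally[item] = tally.get(item, 0) + 1
    return tally


def count_with_list_count(items):
    """Tally occurrences with one full scan of the list per distinct value."""
    tally = {}
    for item in items:
        if item not in tally:
            tally[item] = items.count(item)
    return tally


# Small deterministic data set: 10,000 elements, 1,000 distinct values.
data = [i % 1000 for i in range(10000)]

start = time.perf_counter()
single = count_single_pass(data)
single_seconds = time.perf_counter() - start

start = time.perf_counter()
repeated = count_with_list_count(data)
repeated_seconds = time.perf_counter() - start

# Both strategies agree with Counter; only the amount of work differs.
assert single == repeated == dict(Counter(data))
print("single pass:      %.4f s" % single_seconds)
print("list.count scans: %.4f s" % repeated_seconds)
```

The absolute timings depend on the machine; what the sketch is meant to show is that both functions produce the same tallies while doing very different amounts of scanning.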

