散列表結(jié)構(gòu) 字典與集合

Bamboy 發(fā)布于2019-07-31 11:13 / 983人閱讀

摘要：散列表結(jié)構(gòu)字典與集合散列表散列表結(jié)構(gòu)是字典和集合的一種實(shí)現(xiàn)方式。使用散列表存儲(chǔ)數(shù)據(jù)時(shí)，通過(guò)一個(gè)散列函數(shù)將鍵映射為一個(gè)數(shù)字，這個(gè)數(shù)字范圍是到列表長(zhǎng)度。即使使用一個(gè)高效的散列函數(shù)，仍然存在將兩個(gè)鍵映射為同一個(gè)值的可能，這種現(xiàn)象稱為碰撞。

散列表結(jié)構(gòu) 字典與集合 散列表

散列表(Hash Table)結(jié)構(gòu)是字典(Dictionary)和集合(Set)的一種實(shí)現(xiàn)方式。散列算法的作用是盡可能快地在數(shù)據(jù)結(jié)構(gòu)中找到一個(gè)值。在散列表上插入、刪除和取用數(shù)據(jù)都非?？?，但是對(duì)于查找操作來(lái)說(shuō)卻效率地下

散列表是基于數(shù)組進(jìn)行設(shè)計(jì)的，數(shù)組的長(zhǎng)度是預(yù)先設(shè)定，如有需要可隨時(shí)增加。所有元素根據(jù)和該元素對(duì)應(yīng)的鍵，保存在數(shù)組的特定位置。使用散列表存儲(chǔ)數(shù)據(jù)時(shí)，通過(guò)一個(gè)散列函數(shù)將鍵映射為一個(gè)數(shù)字，這個(gè)數(shù)字范圍是0到列表長(zhǎng)度。散列函數(shù)的選擇依賴于鍵的數(shù)據(jù)類型，在此我們對(duì)鍵的hash值對(duì)數(shù)組長(zhǎng)度區(qū)余的方法。散列表的數(shù)組究竟應(yīng)該有多大？這是編寫(xiě)散列函數(shù)時(shí)必須要考慮的。對(duì)散列表大小的限制，通常數(shù)組的長(zhǎng)度應(yīng)該是一個(gè)質(zhì)數(shù)。

理想情況下，散列函數(shù)會(huì)將每個(gè)鍵值映射為唯一的數(shù)組索引，然而，鍵的數(shù)量是無(wú)限的，散列表的長(zhǎng)度是有限的，一個(gè)理想的目標(biāo)是讓散列函數(shù)盡量將鍵均勻地映射到散列表中。即使使用一個(gè)高效的散列函數(shù)，仍然存在將兩個(gè)鍵映射為同一個(gè)值的可能，這種現(xiàn)象稱為碰撞(collision)。當(dāng)碰撞發(fā)生時(shí)，我們需要方案去解決。

分離鏈接：實(shí)現(xiàn)散列表底層數(shù)組中，每個(gè)數(shù)組元素是一個(gè)新的數(shù)據(jù)結(jié)構(gòu)，比如另一個(gè)數(shù)組(二維數(shù)組)，這樣就能存儲(chǔ)多個(gè)鍵了。即使兩個(gè)鍵散列后的值相同，依然被保存在同樣的位置，只不過(guò)它們?cè)诘诙€(gè)數(shù)組中的位置不一樣罷了。

線性探查：當(dāng)發(fā)生碰撞時(shí)，線性探測(cè)法檢測(cè)散列表的下一個(gè)位置是否為空。如果為空，就將數(shù)據(jù)存入該位置；如果不為空，則繼續(xù)檢查下一個(gè)位置，直到找到一個(gè)空的位置為止。

負(fù)載因子：如果我們持續(xù)往散列表中添加數(shù)據(jù)空間會(huì)不夠用。負(fù)載因子是已使用的空間比散列表大小的值。比如，散列表大小為13，已使用空間位8，負(fù)載因子位0.62。通常當(dāng)負(fù)載因子超過(guò)0.8時(shí)，就要新開(kāi)辟空間并重新散列了。

散列表的操作：

方法	操作
put	向散列表添加新鍵值，或更新鍵的值
remove	從散列表刪除鍵值
get	返回鍵索引到的值

# python3
class HashTable:
    def __init__(self, size=11):
        self._keys = [None] * size
        self._values = [None] * size
        self._length = 0

    # 獲取負(fù)載因子
    @property
    def _load_factor(self):
        return self._length / float(len(self._keys))

    # 散列函數(shù)
    def _hash_func(self, key):
        l = len(self._keys)
        idx = abs(hash(key)) % l
        # 獲取空索引
        while self._keys[idx] is not None and 
                self._keys[idx] != key:
            idx = (idx + l + 1) % l
        return idx

    def put(self, key, value):
        idx = self._hash_func(key)
        # 更新
        if self._keys[idx] == key:
            self._values[idx] = value
        # 添加
        elif self._keys[idx] is None:
            self._keys[idx] = key
            self._values[idx] = value
            self._length += 1
            # 檢查負(fù)載因子
            if self._load_factor >= 0.8:
                self._rehash()

    def get(self, key):
        idx = self._hash_func(key)
        if self._keys[idx] == key:
            return self._values[idx]
        return None

    def remove(self, key):
        idx = self._hash_func(key)
        if self._keys[idx] == key:
            self._keys[idx] = None
            self._values[idx] = None
            self._length -= 1
        elif self._keys[idx] is None:
            self._values[idx] = None
        else:
            return -1

    # 重新散列，擴(kuò)展大小
    def _rehash(self):
        old_keys = self._keys
        old_value = self._values
        new_size = len(self._keys) * 2
        self._keys = [None] * new_size
        self._values = [None] * new_size
        self._length = 0
        for idx in range(len(old_keys)):
            if old_keys[idx] is not None:
                self.put(old_keys[idx], old_value[idx])

    def length(self):
        return self._length

字典

散列表的基本方法就是字典常用的方法，在此可以繼承散列表類的方法，然后完善其他的字典支持的方法。

字典的操作：

方法	操作
keys	返回所有鍵
values	返回所有值
items	返回所有鍵值對(duì)

# python3
class Dict(HashTable):
    def keys(self):
        return [key for key in self._keys if key is not None]

    def values(self):
        return [value for value in self._values if value is not None]

    def items(self):
        return [(self._keys[idx], self._values[idx])
                for idx in range(0, len(self._keys))
                if self._keys[idx] is not None]

    def __iter__(self):
        return iter(self.keys())

    def __len__(self):
        return self.length()

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, value):
        self.put(key, value)

    def __contains__(self, key):
        idx = self._hash_func(key)
        return self._keys[idx] is not None

集合

集合是一種包含不同元素的數(shù)據(jù)結(jié)構(gòu)。集合中的元素被稱為成員。集合的兩個(gè)重要特性：首先，集合中的成員是無(wú)序的；其次：集合中不允許相同的成員存在。

集合的定義：

不包含任何成員的集合稱為空集，包含一切可能成員的集合稱為全集。

如果兩個(gè)和的成員完全相同，則稱兩個(gè)集合相等。

如果一個(gè)集合中所有的成員都屬于另一個(gè)集合，則前一集合稱為后一集合的子集。

集合的運(yùn)算：

并集：將兩個(gè)集合中的成員進(jìn)行合并，得到一個(gè)新集合。

交集：兩個(gè)集合中共同存在的成員組成一個(gè)新的集合。

補(bǔ)集：屬于一個(gè)集合而不屬于另一個(gè)集合的成員組成的集合。

其實(shí)集合也是個(gè)散列表，散列表有鍵和值，在這里我們把值設(shè)置位True即可。具體實(shí)現(xiàn)如下。

集合的操作：

方法	操作
put	向集合添加成員。
remove	從集合移除成員。
union	接收一個(gè)集合進(jìn)行并集運(yùn)算返回結(jié)果
intersection	接收一個(gè)集合進(jìn)行交集運(yùn)算返回結(jié)果
difference	接收一個(gè)集合進(jìn)行補(bǔ)集運(yùn)算返回結(jié)果

# python3
class Set(HashTable):
    def put(self, key):
        return super(Set, self).put(key, value=True)

    # 并集運(yùn)算
    def union(self, other):
        if isinstance(other, Set):
            temp = other
            for key in self._keys:
                temp.put(key)
            return temp
        else:
            raise TypeError

    # 交集運(yùn)算
    def intersection(self, other):
        if isinstance(other, Set):
            temp = Set()
            for key in self._keys:
                if key in other:
                    temp.put(key)
            return temp
        else:
            raise TypeError()

    # 補(bǔ)集運(yùn)算
    def difference(self, other):
        if isinstance(other, Set):
            temp = Set()
            for key in self._keys:
                if key not in other:
                    temp.put(key)
            return temp
        else:
            raise TypeError()

    def __or__(self, other):
        return self.union(other)

    def __and__(self, other):
        return self.intersection(other)

    def __sub__(self, other):
        return self.difference(other)

    def __len__(self):
        return self._length

    def __iter__(self):
        for key in self._keys:
            if key is not None:
                yield key

    def __contains__(self, key):
        idx = self._hash_func(key)
        return self._keys[idx] is not None