[Python Tutorial] A Brief Look at Using a Multi-threading Pool

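Let's start with what a multi-threading pool (thread pool) buys you: pages that each originally took 15 s to load and scrape drop to between 3.9 seconds (pool of 10 threads) and 2.28 seconds (pool of 25 threads). You do have to watch out for thread-safety when using a thread pool, though. Get careless and, like in the picture above, you imagine every thread neatly finishing its own task while in reality the tasks run in a complete jumble.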
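To make the thread-safety warning concrete, here is a minimal sketch of one common fix: guarding shared state with a lock. The names SafeCounter and count_with_pool are made up for this illustration, and it uses multiprocessing.dummy (covered under Method 2 below) only because it ships with Python.

```python
import threading
from multiprocessing.dummy import Pool as ThreadPool


class SafeCounter:
    """Hypothetical shared counter whose updates are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self, _item=None):
        # Only one thread at a time may run this read-modify-write step.
        with self._lock:
            self.value += 1


def count_with_pool(n_tasks, n_threads=10):
    counter = SafeCounter()
    with ThreadPool(n_threads) as pool:
        # map blocks until every task has finished.
        pool.map(counter.increment, range(n_tasks))
    return counter.value


if __name__ == "__main__":
    print(count_with_pool(1000))
```

Without the lock, two threads can read the same old value and one update gets lost. The same idea applies to anything the workers share, such as a results list or an output file.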

Multi-threading Pool: Method 1

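First, install threadpool. The package was last updated in 2015 and is no longer maintained, but it is simple and convenient to use, and it currently works without problems on Python 3.7.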

pip install threadpool

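The thread pool code looks like this. Put the function you want to run where main is; list_of_args holds the arguments that threadpool passes to it. If each call takes a single argument, a plain list such as list_of_args = ['A', 'B', 'C'] is enough.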

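from threadpool import ThreadPool, makeRequests

# Define the number of threads in the pool
pool = ThreadPool(3)

# Call makeRequests to create the work requests: main is the function each thread runs,
# list_of_args holds the arguments passed to main
requests = makeRequests(main, list_of_args)

# Put every request into the thread pool
for req in requests:
    pool.putRequest(req)

# Wait for all threads to finish before exiting
pool.wait()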
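Since threadpool is no longer maintained, the same create, submit, wait flow can also be sketched with the standard library's concurrent.futures. This is a swapped-in alternative, not what the code above uses; main here is a placeholder worker and run_in_pool is a made-up helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def main(arg):
    # Placeholder for the real per-task work (for example, scraping one page).
    return arg.lower()


def run_in_pool(func, args, n_threads=3):
    # Leaving the with-block waits for all submitted tasks, much like pool.wait().
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        # Executor.map returns results in the same order as the inputs.
        return list(executor.map(func, args))


if __name__ == "__main__":
    print(run_in_pool(main, ['A', 'B', 'C']))
```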

If each call needs several arguments, list_of_args has to use the following format:

list_of_args = [(["www.yahoo.com.tw", "27.105.45.13"], None), (["www.google.com.tw", "33.125.45.13"], None)]
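To see why each entry is a (list, None) pair: as far as I understand threadpool, the list supplies the positional arguments and the second slot supplies keyword arguments (None meaning there are none). The sketch below imitates that unpacking with plain Python so it runs without the package. call_like_pool and the body of main are made up for illustration, and it runs the calls one by one rather than in threads.

```python
def main(url, ip):
    # Hypothetical worker: just reports which pair it received.
    return f"{url} via {ip}"


list_of_args = [
    (["www.yahoo.com.tw", "27.105.45.13"], None),
    (["www.google.com.tw", "33.125.45.13"], None),
]


def call_like_pool(func, args_list):
    results = []
    for positional, keyword in args_list:
        # None in the second slot is treated as "no keyword arguments".
        results.append(func(*positional, **(keyword or {})))
    return results


if __name__ == "__main__":
    for line in call_like_pool(main, list_of_args):
        print(line)
```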

This is what I ended up writing: one argument is the url, the other is the ip.

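import random
from queue import Queue

# df and ip_list are assumed to exist already: df['頁面'] is the column of page URLs, ip_list the list of IPs
q = Queue()
for i in range(len(df)):
    q.put(df['頁面'][i])

# Pair each URL with a randomly chosen IP, in the ([args], None) format shown above
lst = [([q.get(), random.choice(ip_list)], None) for _ in range(q.qsize())]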
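If the queue is only used as a staging area, the same argument list can probably be built straight from the URLs. A minimal sketch, assuming urls is any iterable of page URLs; build_args and the sample values are made up for illustration.

```python
import random


def build_args(urls, ip_list):
    # One ([url, ip], None) entry per URL, each with a randomly picked IP.
    return [([url, random.choice(ip_list)], None) for url in urls]


if __name__ == "__main__":
    sample_urls = ["www.yahoo.com.tw", "www.google.com.tw"]
    sample_ips = ["27.105.45.13", "33.125.45.13"]
    print(build_args(sample_urls, sample_ips))
```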

Multi-threading Pool: Method 2

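Personally, though, I recommend multiprocessing.dummy. Its code differs from the process pool version only by the word dummy, so switching between the two is quite convenient!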

multiprocessing.dummy thread pool

from multiprocessing.dummy import Pool as ThreadPool

def f(x):
    return x*x

if __name__ == '__main__':
    p = ThreadPool(5)
    # Run f on each input across 5 worker threads and print the results
    print(p.map(f, [1, 2, 3]))  # [1, 4, 9]
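To get a feel for the kind of speed-up mentioned at the start, here is a rough sketch that times the same I/O-style work sequentially and through a thread pool. fake_fetch stands in for a real page load by sleeping, so the numbers only show the pattern and are not the scraping measurements quoted earlier. All function names here are made up.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_fetch(page):
    # Hypothetical stand-in for loading one page: sleeping mimics waiting on the network.
    time.sleep(0.05)
    return page


def timed(func, *args):
    # Return the function's result together with the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def fetch_sequential(pages):
    return [fake_fetch(p) for p in pages]


def fetch_pooled(pages, n_threads=10):
    with ThreadPool(n_threads) as pool:
        # map keeps the results in input order.
        return pool.map(fake_fetch, pages)


if __name__ == "__main__":
    pages = list(range(20))
    _, t_seq = timed(fetch_sequential, pages)
    _, t_pool = timed(fetch_pooled, pages)
    print(f"sequential: {t_seq:.2f}s, pooled: {t_pool:.2f}s")
```

The gain comes from threads overlapping their waiting time, which is why it shows up for network-bound work like scraping.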

multiprocessing process pool

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    # Same call as above, but the work runs in 5 worker processes
    print(p.map(f, [1, 2, 3]))  # [1, 4, 9]

Finally


That wraps up this introduction to using a multi-threading pool in Python! If you have any questions, feel free to leave a comment below.
