Python: Batch POST Downloads with an Existing-File Check and Automatic Retry on Error

After using the first version I ran into some problems and fixed them in a second version. This version now meets the basic requirements.

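The main addition is retrying. When the server receives a large number of requests, it decides it is under attack and resets the connection (connection reset by peer), so we have to retry to avoid missing files.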
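The retry idea can be sketched on its own as a small helper. This is only an illustration: `retry`, `action`, `delay`, `max_attempts` and `sleep` are hypothetical names that do not appear in the script, and unlike the script it gives up after a fixed number of attempts. It uses only calls that behave the same on Python 2 and Python 3.

```python
import time

def retry(action, delay=60, max_attempts=5, sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration; the script in this post retries
    inside download.run() instead of going through a function like this.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except (IOError, OSError) as error:
            # socket.error (Python 2.6+) and ConnectionResetError (Python 3)
            # both derive from these classes, so a reset connection lands here.
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            print('attempt %d failed (%s), retrying in %s seconds'
                  % (attempt, error, delay))
            sleep(delay)
```

Here `action` would be a function that performs one complete request and download, so every attempt starts from a fresh connection. The `sleep` parameter exists only so the delay can be replaced when trying the helper out.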

Another small change: the completion time of each task is now printed, which makes it easier to watch the program run.

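One remaining problem is that the number of concurrent threads cannot be too large, otherwise a "can't start new thread" exception is raised. The code is also less efficient than it could be, since there are still too many threads. The dispatching mechanism needs to be reworked.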
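One possible way to rework the dispatching is a fixed pool of worker threads that pull IDs from a queue, so the thread count stays constant no matter how many files are requested. The sketch below is an assumption about how that could look, not part of the script: `run_pool`, `handler` and `worker_count` are hypothetical names.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def run_pool(numbers, handler, worker_count=8):
    """Process every item in numbers with handler, using worker_count threads."""
    tasks = queue.Queue()
    for number in numbers:
        tasks.put(number)   # all work is queued before any worker starts

    def worker():
        while True:
            try:
                number = tasks.get_nowait()
            except queue.Empty:
                return      # no work left, let this thread finish
            try:
                handler(number)
            except Exception as error:
                # One failed item should not kill the whole worker.
                print('%s failed: %s' % (number, error))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()       # wait until every queued item has been handled
```

With this shape, `handler` would be a plain function that downloads one ID, and `run_pool(range(first, last), handler)` would replace the loop that starts one thread per ID.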

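Below is the complete code. It can download imgloop image packs as is. To build a downloader for another site, you need to work out that site's structure yourself and then write the code along the same lines: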

#!/usr/bin/env python
import sys,os  
import httplib,urllib  
import threading  
import time

def getTime():
    """Return the current time in HH:MM:SS format."""
    return time.strftime('%H:%M:%S')

class download(threading.Thread):

    def __init__(self,number,sleepTime=60):
        threading.Thread.__init__(self,name=str(number))
        self.number=str(number)
        #Delay in seconds before retrying a failed download.
        self.sleepTime=sleepTime

    def run(self):
        params=urllib.urlencode({'name':'zipform'})
        headers={'Host':'imgloop.com',
                 'User-Agent':'Mozilla/5.0 (X11; Linux i686; rv:6.0) Gecko/20100101 Firefox/6.0',
                 'Accept':'*/*',
                 'Accept-Language':'zh-cn,zh;q=0.5',
                 'Accept-Encoding':'gzip, deflate',
                 'Accept-Charset':'GB2312,utf-8;q=0.7,*;q=0.7',
                 'Connection':'keep-alive',
                 'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
                 'X-Requested-With':'XMLHttpRequest',
                 'Referer':'http://imgloop.com/Home/article/'+self.number,
                 'Content-Length':'12'
                 }
        fileName=self.number+'.zip'
        while True:
            #Open a fresh connection for every attempt.
            conn=httplib.HTTPConnection('imgloop.com')
            try:
                conn.request('POST','http://imgloop.com/Home/zip/'+self.number,params,headers)
                response=conn.getresponse()
                if response.status!=200:
                    print '%s %s download failed.' % (getTime(),self.number)
                    return
                #Skip the file if a copy of the same size is already on disk.
                length=response.getheader('Content-Length')
                if length is not None and os.path.exists(fileName) and long(length)==os.path.getsize(fileName):
                    print '%s %s has been downloaded.' % (getTime(),self.number)
                    return
                #Binary mode keeps the zip data intact on every platform.
                f=open(fileName,'wb')
                try:
                    f.write(response.read())
                finally:
                    f.close()
                print '%s %s download completed.' % (getTime(),self.number)
                return
            except Exception:
                #For example "connection reset by peer": wait, then send the request again.
                print '%s %s has an error, redownload in %s seconds.' % (getTime(),self.number,self.sleepTime)
                time.sleep(self.sleepTime)
            finally:
                conn.close()

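if __name__ == '__main__':
    #With a single argument, download only that ID.
    if len(sys.argv)==2:
        download(int(sys.argv[1])).start()
        sys.exit()
    #With two arguments, download every ID from the first up to (not including) the second.
    for i in range(int(sys.argv[1]),int(sys.argv[2])):
        download(i).start()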
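
The existing-file check inside run() can also be written as a standalone function, which makes it easier to try out without touching the network. This is a minimal sketch, and `already_downloaded` is a hypothetical name that is not used in the script above. It runs on both Python 2 and Python 3.

```python
import os

def already_downloaded(file_name, content_length):
    """Return True if file_name exists and its size equals Content-Length.

    content_length is the raw header value (a string), or None when the
    server did not send the header.
    """
    if content_length is None or not os.path.exists(file_name):
        return False
    try:
        expected = int(content_length)
    except ValueError:
        return False    # malformed header: treat the file as not verified
    return os.path.getsize(file_name) == expected
```

A matching size only suggests that the earlier download finished. It does not prove that the contents are identical.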