Python: Batch POST Downloads with an Existing-File Check

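This is a new version built on the previous one. It adds a check for whether a file has already been fully downloaded, so files that were fetched before are not downloaded again, and the script runs more smoothly as a result.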

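The principle is still simple: os.path.exists() checks whether the file exists, and the Content-Length value from the response headers is compared with the return value of os.path.getsize(). If the two are equal, the file has already been downloaded.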
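The check described above can be pulled out into a small helper. This is only a sketch, written for Python 3 rather than the Python 2 used by the script; the function name already_downloaded and the 5-byte demo file are assumptions made for illustration, and the None guard is an addition that the original script does not have.

```python
import os
import tempfile

def already_downloaded(file_name, content_length):
    """Return True if file_name exists and its size equals the Content-Length value.

    content_length is the raw header value: a string, or None if the
    server did not send the header.
    """
    if content_length is None:
        return False  # no size reported, so a finished download cannot be confirmed
    return os.path.exists(file_name) and int(content_length) == os.path.getsize(file_name)

# Demo with a temporary 5-byte file standing in for a finished download.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1.zip")
    with open(path, "wb") as f:
        f.write(b"hello")
    same_size = already_downloaded(path, "5")        # sizes match
    different_size = already_downloaded(path, "9")   # partial or changed file
    missing_file = already_downloaded(os.path.join(tmp, "2.zip"), "5")
```

Note that a size match only suggests the file is complete; it does not verify the contents.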

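Python's logical operators short-circuit, so there is no need to worry about os.path.getsize raising an exception: it is only called when os.path.exists() has already returned True.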
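To see the short-circuit behaviour in isolation, here is a small Python 3 sketch. The file name missing.zip and the size 123 are placeholders chosen for the demo, not values from the script.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical path inside an empty directory; the file is never created.
    missing = os.path.join(tmp, "missing.zip")

    # Called on its own, os.path.getsize raises OSError for a missing file.
    try:
        os.path.getsize(missing)
        raised = False
    except OSError:
        raised = True

    # Inside an `and`, the right-hand side is skipped once the left side
    # is False, so getsize is never called here and nothing is raised.
    result = os.path.exists(missing) and os.path.getsize(missing) == 123
```

Swapping the two operands would break this: getsize would run first and raise before exists was ever consulted.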

#!/usr/bin/env python
import sys,os  
import httplib,urllib  
import threading

class download(threading.Thread):

    def __init__(self,number):
        threading.Thread.__init__(self,name=str(number))
        self.number=str(number)

    def run(self):
        params=urllib.urlencode({'name':'zipform'})
        headers={'Host':'imgloop.com',
                 'User-Agent':'Mozilla/5.0 (X11; Linux i686; rv:6.0) Gecko/20100101 Firefox/6.0',
                 'Accept':'*/*',
                 'Accept-Language':'zh-cn,zh;q=0.5',
                 'Accept-Encoding':'gzip, deflate',
                 'Accept-Charset':'GB2312,utf-8;q=0.7,*;q=0.7',
                 'Connection':'keep-alive',
                 'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
                 'X-Requested-With':'XMLHttpRequest',
                 'Referer':'http://imgloop.com/Home/article/'+self.number,
                 'Content-Length':'12'
                 }
        conn=httplib.HTTPConnection('imgloop.com')
        conn.request('POST','http://imgloop.com/Home/zip/'+self.number,params,headers)
        response=conn.getresponse()
        if response.status==200:
            fileName=self.number+'.zip'
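            # Skip the download if a file of the same size already exists.
            # The -1 default keeps long() from failing when Content-Length is absent.
            if os.path.exists(fileName) and long(response.getheader('Content-Length',-1))==os.path.getsize(fileName):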
                print self.number+' has been downloaded.'
                conn.close()
                return
            # Open in binary mode: a zip archive is binary data, and text mode can corrupt it on Windows.
            f=open(fileName,'wb')
            f.write(response.read())
            f.close()
            print self.number+' download completed.'
        else:
            print self.number+' download failed.'
        conn.close()

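if __name__ == '__main__':
    # With a single argument, download just that one file.
    if len(sys.argv)==2:
        download(int(sys.argv[1])).start()
        sys.exit()
    # With two arguments, download the range; range() excludes the second value.
    for i in range(int(sys.argv[1]),int(sys.argv[2])):
        download(i).start()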

I also added a check on the number of arguments. Previously the script could only be run as:

python download.py argument1 argument2  

Now it also supports downloading just one of the files in the batch URL series:

python download.py argument1
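The argument handling can also be written as a standalone function, which makes the two modes easier to test. This is a Python 3 sketch; the name ids_to_download is hypothetical, and raising ValueError for any other argument count is an addition that the script itself does not make.

```python
def ids_to_download(args):
    """Map command-line arguments (without the script name) to article numbers."""
    if len(args) == 1:
        # Single-file mode: one argument means one download.
        return [int(args[0])]
    if len(args) == 2:
        # Batch mode: as with the script's range(), the second value is excluded.
        return list(range(int(args[0]), int(args[1])))
    raise ValueError("expected 1 or 2 arguments, got %d" % len(args))
```

For example, ids_to_download(sys.argv[1:]) would give the numbers to pass to download() one by one.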