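Introduction

I have recently been working on a computer-vision project for an undergraduate innovation program and needed to download a large number of images from Flickr. The scraping scripts I found online were of poor quality, did not fit my needs, and were poorly documented, so I am recording my own script here.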

Crawling workflow

First, apply for your own API key and API secret on Flickr, then fill them into the script below.
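If you would rather not hard-code the key and secret in the script, one option is to read them from environment variables. The following is only a minimal sketch: the function name load_flickr_credentials and the variable names FLICKR_API_KEY and FLICKR_API_SECRET are hypothetical, chosen for illustration, and are not part of flickrapi.

```python
import os


def load_flickr_credentials(env=os.environ):
    """Return (key, secret) read from a mapping of environment variables.

    FLICKR_API_KEY and FLICKR_API_SECRET are hypothetical names chosen
    for this sketch; use whatever names suit your setup.
    """
    key = env.get("FLICKR_API_KEY")
    secret = env.get("FLICKR_API_SECRET")
    if not key or not secret:
        # Fail early with a readable message instead of a later API error.
        raise RuntimeError("Set FLICKR_API_KEY and FLICKR_API_SECRET first.")
    return key, secret
```

The returned pair can then be passed to flickrapi.FlickrAPI(key, secret, cache=True) in place of the hard-coded constants.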

If Flickr is blocked on your network, you will need a proxy or VPN to reach it.

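Notes

Flickr has two search modes: All and Tags. In the code, the tag variable holds the keyword you want to search for. Because it is passed as the tags argument of flickr.walk(), the script already performs a tag search. Note that tag_mode does not switch between the two modes: it only accepts 'any' (match at least one of the tags) or 'all' (match every tag). For a full-text search like All mode, pass the keyword as the text argument of flickr.walk() instead of tags.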
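The difference between the two kinds of search can be sketched as a small helper that builds the keyword arguments for flickr.walk(). The helper name build_search_kwargs and its mode and match parameters are hypothetical and exist only for this illustration; tags, tag_mode, text, and extras are the search arguments that flickr.walk() forwards to the Flickr API.

```python
def build_search_kwargs(query, mode="tags", match="all"):
    """Build keyword arguments for flickr.walk() (illustrative helper).

    mode="tags": query is a comma-separated tag list, and match is sent
                 as tag_mode ("any" = OR, "all" = AND).
    mode="text": query is free text for a full-text search.
    """
    if mode == "tags":
        if match not in ("any", "all"):
            raise ValueError("match must be 'any' or 'all'")
        return {"tags": query, "tag_mode": match, "extras": "url_c"}
    if mode == "text":
        return {"text": query, "extras": "url_c"}
    raise ValueError("mode must be 'tags' or 'text'")
```

For example, flickr.walk(**build_search_kwargs("cat,kitten", match="any")) would iterate over photos tagged with either word, while build_search_kwargs("cat", mode="text") gives the arguments for a full-text search.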

Code

Remember to fill in four variables: API_KEY, API_SECRET, tag, and path.

import os
import urllib.request  # a bare "import urllib" does not guarantee urllib.request is loaded

import flickrapi

API_KEY = "Enter your key here"
API_SECRET = "Enter your secret here"

flickr = flickrapi.FlickrAPI(API_KEY, API_SECRET, cache=True)
download_num = 500

tag = "Enter your tag here"
path = "Enter your path here"

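def main():
    # path is treated as the output directory; create it if it is missing.
    os.makedirs(path, exist_ok=True)
    count = 0
    # flickr.walk() is a lazy generator: the API requests (and any errors)
    # happen while iterating, so the try block has to wrap the loop itself.
    try:
        # url_c is the medium-size image URL; not every photo provides one.
        for photo in flickr.walk(tag_mode='all', tags=tag, extras='url_c'):
            if count == download_num:
                break
            url = photo.get('url_c')
            if url is None:
                print("The url is none.")
                continue
            # Save the image as <path>/<count>.jpg
            urllib.request.urlretrieve(url, os.path.join(path, str(count) + ".jpg"))
            count += 1
            print("Saved image. Count: " + str(count))
    except Exception as e:
        print("An error occurred while walking through the photos: " + str(e))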


if __name__ == "__main__":
    main()
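
The script above skips every photo that has no url_c. One possible refinement, shown here only as a sketch, is to request several sizes through extras (for example extras='url_c,url_z,url_m') and use the first one that is present. The name pick_url and the preference order are assumptions made for this illustration, not part of the original script.

```python
# Preferred size keys, tried in order (an assumed ordering for this sketch).
SIZE_PREFERENCE = ("url_c", "url_z", "url_m")


def pick_url(photo, sizes=SIZE_PREFERENCE):
    """Return the first available image URL of a photo, or None.

    photo can be any object with a .get(key) method, such as the
    elements yielded by flickr.walk() or a plain dict.
    """
    for key in sizes:
        url = photo.get(key)
        if url:
            return url
    return None
```

Inside the download loop, url = pick_url(photo) would then replace url = photo.get('url_c'), provided the matching sizes are listed in extras.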

2023-05-10 16:37:43 #Python