Extracting Pictures from Flickr with Python

A few years ago, when Flickr was new, we made the somewhat silly decision to store our images only on Flickr. That did make moving between computers easier and freed up a little bit of drive space, but we eventually decided that we'd like to pull those pictures back onto our own system.

In the past, I've tried several Flickr Downloadr (missing 'e' intended as a pun) programs, and every one of them choked or did strange things. Last night, right after I crawled into bed, I decided that I knew a better way.

I crawled out of bed, did a bit of Googling, and found an excellent Python Flickr library, along with someone who had written a Python script to back up Flickr pictures to Amazon's S3. In 15 minutes or so, I had a solution that would page through our public pictures, check to see if they were already downloaded, and store them in year and month folders.

import flickr
import urllib
import os.path
import os
page = 1
total_photos = found_photos = 0
while True:
   # Fetch one page of up to 100 public photos
   photos = flickr.people_getPublicPhotos('68432331@N00', 100, page)
   if not len(photos):
      break
   for photo in photos:
      total_photos = total_photos + 1
      # datetaken starts with the year and month, e.g. "2007-03..."
      photoYear = photo.datetaken[0:4]
      photoMonth = photo.datetaken[5:7]
      photoURL = photo.getURL('Original', 'source')
      # Raw strings keep the Windows backslashes literal
      photoPath = r"C:\FlickrPics\%s\%s\%s.jpg" % (photoYear, photoMonth, photo.id)
      # Skip anything that was already downloaded on an earlier run
      if not os.path.exists(photoPath):
         if not os.path.exists(r"C:\FlickrPics\%s" % photoYear):
            os.mkdir(r"C:\FlickrPics\%s" % photoYear)
         if not os.path.exists(r"C:\FlickrPics\%s\%s" % (photoYear, photoMonth)):
            os.mkdir(r"C:\FlickrPics\%s\%s" % (photoYear, photoMonth))
         urllib.urlretrieve(photoURL, photoPath)
         found_photos += 1
   # Advance once per page, after every photo on this page is handled
   page = page + 1
   print "  Moving to page %s" % page
print "Found %s photos, saved %s new photos" % (total_photos, found_photos)
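
The year and month folder handling in the script can also be pulled into a small helper. This is only a sketch: the name photo_path and its arguments are made up for illustration, and it assumes the date taken string looks like "2007-03-14 18:02:11", which is what the slicing in the script implies. It uses os.path.join and os.makedirs, so both folder levels are created in one call.

```python
import os
import tempfile


def photo_path(root, date_taken, photo_id):
    """Return root/YYYY/MM/<photo_id>.jpg, creating the folders if needed.

    photo_path is a hypothetical helper, not part of the Flickr library.
    """
    year = date_taken[0:4]     # "2007-03-14 18:02:11" -> "2007"
    month = date_taken[5:7]    # "2007-03-14 18:02:11" -> "03"
    folder = os.path.join(root, year, month)
    if not os.path.isdir(folder):
        os.makedirs(folder)    # creates the year and month levels together
    return os.path.join(folder, "%s.jpg" % photo_id)


# Usage: try it in a throwaway folder instead of the real picture folder
demo_root = tempfile.mkdtemp()
demo_path = photo_path(demo_root, "2007-03-14 18:02:11", "123456")
```

The helper only builds the path and the folders. Checking whether the file already exists, and downloading it, would stay in the main loop.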

While running the script, I noticed that the obvious slow part of the process was downloading the images. I wanted a way to download them in parallel, and found what I needed in an excellent Python thread pool implementation.

I added the following method to that code; it tells me the number of jobs still pending in the thread pool:

#new method in the ThreadPool class
def getWaitingTaskCount(self):
   self.__taskLock.acquire()
   count = len(self.__tasks)
   self.__taskLock.release()
   return count
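
The thread pool code itself is not reproduced in this post, so here is a rough stand-in to show where a method like getWaitingTaskCount fits. To be clear, this is not the linked implementation: the class name MiniThreadPool and everything inside it are assumptions, and only the method names queueTask, getWaitingTaskCount and joinAll are taken from how the pool is used in this post. It should run on both Python 2 and Python 3.

```python
import threading
import time


class MiniThreadPool(object):
    """Hypothetical, simplified thread pool (not the linked code)."""

    def __init__(self, num_threads):
        self._tasks = []               # pending (callable, argument) pairs
        self._lock = threading.Lock()  # guards _tasks and _joining
        self._joining = False
        self._threads = [threading.Thread(target=self._worker)
                         for _ in range(num_threads)]
        for thread in self._threads:
            thread.start()

    def queueTask(self, task, args=None):
        """Add a job; a worker will later call task(args)."""
        with self._lock:
            self._tasks.append((task, args))

    def getWaitingTaskCount(self):
        """Number of queued jobs that no worker has picked up yet."""
        with self._lock:
            return len(self._tasks)

    def joinAll(self):
        """Let the workers drain the queue, then wait for them to exit."""
        with self._lock:
            self._joining = True
        for thread in self._threads:
            thread.join()

    def _worker(self):
        while True:
            with self._lock:
                if self._tasks:
                    task, args = self._tasks.pop(0)
                elif self._joining:
                    return             # queue is empty and we were told to stop
                else:
                    task = None
            if task is None:
                time.sleep(0.01)       # nothing queued yet; poll again shortly
            else:
                task(args)
```

The count is read under the same lock that protects the task list, which is the same idea as the acquire and release calls in the method above. Note that in this sketch a job that raises an exception takes its worker thread down with it.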

I had to rewrite my code a little to put the download code into a method, and then I let 'er rip. Downloading now happened 3 at a time. It took about 2 hours to download our 5GB collection of approx 3,300 pictures, and the job was done!
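
On Python 3, the standard library can do the "3 at a time" part without a separate thread pool class. The sketch below swaps in concurrent.futures.ThreadPoolExecutor, which is not what I used for this post. The function name download_all and its fetch parameter are hypothetical; fetch defaults to urllib.request.urlretrieve (the Python 3 home of urlretrieve) and can be replaced with a fake for a dry run.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve


def download_all(jobs, fetch=urlretrieve, workers=3):
    """Call fetch(url, path) for each (url, path) pair, `workers` at a time.

    Returns the saved paths in the same order as `jobs`.
    download_all is a hypothetical helper written for this example.
    """
    def run(job):
        url, path = job
        fetch(url, path)
        return path

    # Leaving the with-block waits for all outstanding downloads
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(run, jobs))


# Dry run with a fake fetch, so nothing touches the network
fetched = []
saved = download_all(
    [("http://example.com/a.jpg", "a.jpg"),
     ("http://example.com/b.jpg", "b.jpg")],
    fetch=lambda url, path: fetched.append((url, path)),
)
```

One difference from my script: executor.map queues every job up front, so there is no equivalent of the getWaitingTaskCount check that keeps the paging loop from running far ahead of the downloads.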

I don't bother with authenticated requests (all of our pictures are public), and I don't do any error checking. It is a one-off script, after all. :)

I love Python. I really do.

import flickr
import urllib
import os.path
import os
import threading
from time import sleep

#code from ThreadPool not shown.
#Copied from the linked solution, with the addition of the new method listed above

pool = ThreadPool(3)

def getPicture(data):
   photoURL = data[0]
   photoPath = data[1]
   urllib.urlretrieve(photoURL, photoPath)
   print photoPath

page = 1
total_photos = found_photos = 0

while True:
   photos = flickr.people_getPublicPhotos("68432331@N00", 100, page)
   if not len(photos):
      break
   for photo in photos:
      total_photos = total_photos + 1
      photoYear = photo.datetaken[0:4]
      photoMonth = photo.datetaken[5:7]
      photoURL = photo.getURL('Original', 'source')
      # Raw strings keep the Windows backslashes literal
      photoPath = r"C:\FlickrPics\%s\%s\%s.jpg" % (photoYear, photoMonth, photo.id)
      if not os.path.exists(photoPath):
         if not os.path.exists(r"C:\FlickrPics\%s" % photoYear):
            os.mkdir(r"C:\FlickrPics\%s" % photoYear)
         if not os.path.exists(r"C:\FlickrPics\%s\%s" % (photoYear, photoMonth)):
            os.mkdir(r"C:\FlickrPics\%s\%s" % (photoYear, photoMonth))
         # Insert tasks into the queue and let them run
         pool.queueTask(getPicture, (photoURL, photoPath))

         found_photos += 1
   page = page + 1
   #don't get too far ahead of the download threads
   while pool.getWaitingTaskCount() > 10:
      sleep(1)
   print " Moving to page %s" % page
# When all tasks are finished, allow the threads to terminate
pool.joinAll()
print "Found %s photos, saved %s new photos" % (total_photos, found_photos)
