
First demonstration

A code sample says more than a thousand words:

import dryscrape

search_term = 'dryscrape'

# set up a web scraping session
sess = dryscrape.Session(base_url = '')

# we don't need images
sess.set_attribute('auto_load_images', False)

# visit homepage and search for a term
sess.visit('/')
q = sess.at_xpath('//*[@name="q"]')
q.set(search_term)
q.form().submit()

# extract all links
for link in sess.xpath('//a[@href]'):
    print(link['href'])

# save a screenshot of the web page
sess.render('google.png')
print("Screenshot written to 'google.png'")

In this sample, we use dryscrape to do a simple web search on Google. Note that no driver is passed to the Session constructor here, so the session uses dryscrape's default Webkit driver instance. The session instance then passes every method call it cannot resolve (such as visit(), in this case) to the underlying driver.
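Forwarding unresolved calls like this is the proxy pattern, which Python supports through __getattr__: that method is only called when normal attribute lookup fails. Below is a minimal sketch of the idea using hypothetical HypotheticalSession and HypotheticalDriver classes. It is not dryscrape's actual code, and the complete_url() helper is an assumption added for illustration.

```python
from urllib.parse import urljoin


class HypotheticalDriver(object):
    """Stand-in for a browser driver (hypothetical, not dryscrape's class)."""

    def __init__(self):
        self.visited = []

    def visit(self, url):
        # a real driver would load the page here; we only record the URL
        self.visited.append(url)
        return url


class HypotheticalSession(object):
    """Resolves a few methods itself and forwards everything else."""

    def __init__(self, driver, base_url=None):
        self.driver = driver
        self.base_url = base_url

    def complete_url(self, url):
        # defined on the session, so normal lookup finds it first
        if self.base_url:
            return urljoin(self.base_url, url)
        return url

    def __getattr__(self, name):
        # only called when normal attribute lookup fails, so names the
        # session does not define (such as visit) go to the driver
        return getattr(self.driver, name)


sess = HypotheticalSession(HypotheticalDriver(), base_url='http://example.com')
sess.visit(sess.complete_url('/'))  # visit() is forwarded to the driver
print(sess.driver.visited)  # ['http://example.com/']
```

Because __getattr__ is only a fallback, anything the session defines itself (here complete_url()) takes priority, and a name that neither object has still raises AttributeError, this time from the driver.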

A more complex example

There was nothing special about the example above. Let’s look at a more advanced example that works on a JavaScript-only application: GMail.

import time
import dryscrape

# Setup

# GMail credentials to log in with
email    = ''
password = ''

# set up a web scraping session
sess = dryscrape.Session(base_url = '')

# there are some failing HTTP requests, so we need to enter
# a more error-resistant mode (like real browsers do)
sess.set_error_tolerant(True)

# we don't need images
sess.set_attribute('auto_load_images', False)

# if we wanted, we could also configure a proxy server to use,
# so we can for example use Fiddler to monitor the requests
# performed by this script
#sess.set_proxy('localhost', 8888)

# GMail: send a mail to self

# visit homepage and log in
print("Logging in...")
sess.visit('/')

email_field    = sess.at_css('#Email')
password_field = sess.at_css('#Passwd')
email_field.set(email)
password_field.set(password)
email_field.form().submit()

# find the COMPOSE button and click it
print("Sending a mail...")
compose = sess.at_xpath('//*[contains(text(), "COMPOSE")]')
compose.click()

# compose the mail
to      = sess.at_xpath('//*[@name="to"]', timeout=10)
subject = sess.at_xpath('//*[@name="subject"]')
body    = sess.at_xpath('//*[@name="body"]')

to.set(email)
subject.set("Note to self")
body.set("Remember to try dryscrape!")

# send the mail

# seems like we need to wait a bit before clicking...
# Blame Google for this ;)
time.sleep(3)
send = sess.at_xpath('//*[normalize-space(text()) = "Send"]')
send.click()

# open the mail
print("Reading the mail...")
mail = sess.at_xpath('//*[normalize-space(text()) = "Note to self"]',
                     timeout=10)
mail.click()

# sleep a bit to give the mail a chance to open.
# This is ugly, it would be better to find something
# on the resulting page that we can wait for
time.sleep(3)

# save a screenshot of the web page
print("Writing screenshot to 'gmail.png'")
sess.render('gmail.png')

This just works.

There are some things to note about it, though:

  • at_xpath() and at_css() take an optional timeout argument that can be used to give the application some time to load content.
  • XPath is really useful, so it is worth becoming familiar with it. You can also use CSS selectors, however.
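A lookup timeout of this kind is commonly implemented by polling: retry the lookup at short intervals until it finds something or the time runs out. The sketch below shows that idea with a hypothetical wait_for() helper and a simulated page object (SlowPage); it is not dryscrape's implementation. It simply returns None when the timeout expires, which may differ from what dryscrape does. The same pattern is what the GMail script's comment about finding something "on the resulting page that we can wait for" suggests as an alternative to a fixed sleep.

```python
import time


def wait_for(find, timeout=10, interval=0.1):
    """Call find() until it returns something other than None.

    Hypothetical helper illustrating a timeout-based wait;
    returns None if nothing is found before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        # give the (simulated) application time to load more content
        time.sleep(interval)


class SlowPage(object):
    """Hypothetical page whose element only appears after a few polls."""

    def __init__(self, polls_until_ready):
        self.polls_left = polls_until_ready

    def find_to_field(self):
        # returns None while the element is still "loading"
        if self.polls_left > 0:
            self.polls_left -= 1
            return None
        return '<input name="to">'


page = SlowPage(polls_until_ready=3)
field = wait_for(page.find_to_field, timeout=2, interval=0.01)
print(field)  # <input name="to">
```

Polling with a deadline returns as soon as the element exists, whereas a fixed time.sleep() always waits the full duration and can still be too short on a slow connection.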