API Documentation
This documentation also contains the API docs for the webkit_server
module, for convenience (and because I am too lazy to set up dedicated docs
for it).
Overview

Module dryscrape.session
class dryscrape.session.Session(driver=None, base_url=None)
Bases: object
A web scraping session based on a driver instance. Implements the proxy pattern to pass unresolved method calls to the underlying driver.
If no driver is specified, the instance will create an instance of dryscrape.session.DefaultDriver to get a driver instance (defaults to dryscrape.driver.webkit.Driver).
If base_url is present, relative URLs are completed with this URL base. If not, the get_base_url method is called on itself to get the base URL.
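The proxy behaviour and base URL completion described above can be illustrated with a small, self-contained sketch. It does not use dryscrape itself: SketchSession, FakeDriver and complete_url are hypothetical names invented for this example, and the real Session may be implemented differently.

```python
from urllib.parse import urljoin


class FakeDriver:
    """Hypothetical stand-in for a real driver; records visited URLs."""

    def __init__(self):
        self.visited = []

    def visit(self, url):
        self.visited.append(url)

    def body(self):
        return "<html></html>"


class SketchSession:
    """Illustrative session that forwards unknown attributes to its driver."""

    def __init__(self, driver, base_url=None):
        self.driver = driver
        self.base_url = base_url

    def __getattr__(self, name):
        # Only called when normal lookup fails, so methods defined on the
        # session itself take precedence over the driver's.
        return getattr(self.driver, name)

    def complete_url(self, url):
        # Relative URLs are resolved against the base; absolute ones pass through.
        return urljoin(self.base_url, url) if self.base_url else url

    def visit(self, url):
        self.driver.visit(self.complete_url(url))


session = SketchSession(FakeDriver(), base_url="http://example.com/app/")
session.visit("login")
print(session.visited)  # attribute resolved on the driver via the proxy
print(session.body())   # method resolved on the driver via the proxy
```

Delegating through __getattr__ keeps the session class small, since only the methods that need session-specific behaviour (here, URL completion) have to be defined on it.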
Module dryscrape.mixins
Mixins for use in dryscrape drivers.
class dryscrape.mixins.AttributeMixin
Bases: object
Mixin that adds [] access syntax sugar to an object that supports a set_attr and get_attr method.
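As a rough illustration of what such syntax sugar could look like, the sketch below maps the [] operator onto get_attr and set_attr. BracketAccessSketch and FakeNode are hypothetical names for this example only; the actual mixin may differ in detail.

```python
class BracketAccessSketch:
    """Hypothetical mixin: maps obj[name] onto get_attr / set_attr."""

    def __getitem__(self, name):
        return self.get_attr(name)

    def __setitem__(self, name, value):
        self.set_attr(name, value)


class FakeNode(BracketAccessSketch):
    """Stand-in node that stores its attributes in a plain dict."""

    def __init__(self):
        self._attrs = {}

    def get_attr(self, name):
        return self._attrs.get(name)

    def set_attr(self, name, value):
        self._attrs[name] = value


node = FakeNode()
node["href"] = "/index.html"   # calls set_attr("href", "/index.html")
print(node["href"])            # calls get_attr("href")
```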
class dryscrape.mixins.HtmlParsingMixin
Bases: object
Mixin that adds a document method to an object that supports a body method returning valid HTML.
class dryscrape.mixins.SelectionMixin
Bases: object
Mixin that adds different methods of node selection to an object that provides an xpath method returning a collection of matches.
class dryscrape.mixins.WaitMixin
Bases: dryscrape.mixins.SelectionMixin
Mixin that allows waiting for conditions or elements.
at_css(css, timeout=1, **kw)
Returns the first node matching the given CSSv3 expression or None if a timeout occurs.
at_xpath(xpath, timeout=1, **kw)
Returns the first node matching the given XPath 2.0 expression or None if a timeout occurs.
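The "first match or None on timeout" behaviour can be pictured as a polling loop. The following is a minimal sketch under that assumption, not the mixin's actual code: wait_for_sketch, its interval parameter and FakePage are all made up for illustration.

```python
import time


def wait_for_sketch(condition, timeout=1, interval=0.05):
    """Poll condition() until it returns a truthy value or timeout elapses.

    Returns the truthy value, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


class FakePage:
    """Stand-in page whose element only appears after a few lookups."""

    def __init__(self, appears_after):
        self.calls = 0
        self.appears_after = appears_after

    def at_css(self, css):
        self.calls += 1
        return "<node %s>" % css if self.calls >= self.appears_after else None


page = FakePage(appears_after=3)
found = wait_for_sketch(lambda: page.at_css("#result"), timeout=1)
missing = wait_for_sketch(lambda: None, timeout=0.1)
print(found, missing)
```

Returning None instead of raising on timeout matches the documented contract above and lets callers write a simple truthiness check.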
Module dryscrape.driver.webkit
Headless Webkit driver for dryscrape. Wraps the webkit_server module.
class dryscrape.driver.webkit.Driver(**kw)
Bases: webkit_server.Client, dryscrape.mixins.WaitMixin, dryscrape.mixins.HtmlParsingMixin
Driver implementation wrapping a webkit_server driver.
Keyword arguments are passed through to the underlying webkit_server.Client constructor. By default, node_factory_class is set to use the dryscrape node implementation.
class dryscrape.driver.webkit.Node(client, node_id)
Bases: webkit_server.Node, dryscrape.mixins.SelectionMixin, dryscrape.mixins.AttributeMixin
Node implementation wrapping a webkit_server node.
class dryscrape.driver.webkit.NodeFactory(client)
Bases: webkit_server.NodeFactory
Overrides the NodeFactory provided by webkit_server.
Module webkit_server
Python bindings for the webkit-server.
class webkit_server.Client(connection=None, node_factory_class=<class 'webkit_server.NodeFactory'>)
Bases: webkit_server.SelectionMixin
Wrappers for the webkit_server commands.
If connection is not specified, a new instance of ServerConnection is created.
node_factory_class can be set to a value different from the default, in which case a new instance of the given class will be used to create nodes. The given class must accept a client instance through its constructor and support a create method that takes a node ID as an argument and returns a node object.
Deletes all cookies.
Returns a list of all cookies in cookie string format.
eval_script(expr)
Evaluates a piece of Javascript in the context of the current page and returns its value.
headers()
Returns a list of the last HTTP response headers. Header keys are normalized to capitalized form, as in User-Agent.
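To make "capitalized form" concrete, here is one plausible way such a normalization could work. This is an assumption for illustration only: normalize_header_key is a hypothetical helper, and representing headers as a list of (key, value) pairs is also assumed rather than taken from the library.

```python
def normalize_header_key(key):
    """Capitalize each dash-separated part, e.g. 'user-agent' -> 'User-Agent'."""
    return "-".join(part.capitalize() for part in key.split("-"))


# Assumed shape: a list of (key, value) pairs with inconsistently cased keys.
raw_headers = [("content-type", "text/html"), ("X-FRAME-OPTIONS", "DENY")]
normalized = [(normalize_header_key(k), v) for k, v in raw_headers]
print(normalized)
```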
render(path, width=1024, height=1024)
Renders the current page to a PNG file (viewport size in pixels).
set_attribute(attr, value=True)
Sets a custom attribute for our Webkit instance. Possible attributes are:
auto_load_images
dns_prefetch_enabled
plugins_enabled
private_browsing_enabled
javascript_can_open_windows
javascript_can_access_clipboard
offline_storage_database_enabled
offline_web_application_cache_enabled
local_storage_enabled
local_storage_database_enabled
local_content_can_access_remote_urls
local_content_can_access_file_urls
accelerated_compositing_enabled
site_specific_quirks_enabled
For all those options, value must be a boolean. You can find more information about these options in the Qt docs.
Sets a cookie for future requests (must be in correct cookie string format).
set_error_tolerant(tolerant=True)
DEPRECATED! This function is a no-op now.
Used to set or unset the error tolerance flag in the server. If this flag was set, dropped requests or erroneous responses would not lead to an error.
set_html(html, url=None)
Sets custom HTML in our Webkit session and allows specifying a fake URL. Scripts and CSS are dynamically fetched as if the HTML had been loaded from the given URL.
exception webkit_server.EndOfStreamError(msg='Unexpected end of file')
Bases: exceptions.Exception
Raised when the Webkit server closed the connection unexpectedly.
exception webkit_server.InvalidResponseError
Bases: exceptions.Exception
Raised when the Webkit server signaled an error.
exception webkit_server.NoResponseError
Bases: exceptions.Exception
Raised when the Webkit server does not respond.
exception webkit_server.NoX11Error
Bases: webkit_server.WebkitServerError
Raised when the Webkit server cannot connect to X.
class webkit_server.Node(client, node_id)
Bases: webkit_server.SelectionMixin
Represents a DOM node in our Webkit session.
client is the associated client instance.
node_id is the internal ID that is used to identify the node when communicating with the server.
eval_script(js)
Evaluate arbitrary Javascript with the node variable bound to the current node.
exec_script(js)
Execute arbitrary Javascript with the node variable bound to the current node.
exception webkit_server.NodeError
Bases: exceptions.Exception
A problem occurred within a Node instance method.
class webkit_server.NodeFactory(client)
Bases: object
Implements the default node factory.
client is the associated client instance.
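The contract a node factory has to satisfy (a constructor taking the client, plus a create method taking a node ID) is small enough to sketch without the library. Every class name below is hypothetical and exists only to show how a custom factory could be swapped in through a node_factory_class argument.

```python
class SketchNode:
    """Stand-in node holding the client and its server-side ID."""

    def __init__(self, client, node_id):
        self.client = client
        self.node_id = node_id


class SketchNodeFactory:
    """Default factory: accepts a client, creates nodes from node IDs."""

    def __init__(self, client):
        self.client = client

    def create(self, node_id):
        return SketchNode(self.client, node_id)


class TaggedNode(SketchNode):
    """Custom node type used to show that the factory can be replaced."""

    tagged = True


class TaggedNodeFactory(SketchNodeFactory):
    def create(self, node_id):
        return TaggedNode(self.client, node_id)


class SketchClient:
    """Stand-in client that builds its factory from node_factory_class."""

    def __init__(self, node_factory_class=SketchNodeFactory):
        self._factory = node_factory_class(self)

    def nodes_for(self, node_ids):
        return [self._factory.create(node_id) for node_id in node_ids]


default_nodes = SketchClient().nodes_for(["1", "2"])
custom_nodes = SketchClient(node_factory_class=TaggedNodeFactory).nodes_for(["7"])
print(type(default_nodes[0]).__name__, type(custom_nodes[0]).__name__)
```

Passing the factory class rather than an instance lets the client hand itself to the factory at construction time, which is what the documented constructor signature suggests.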
class webkit_server.SelectionMixin
Bases: object
Implements a generic XPath selection for a class providing _get_xpath_ids, _get_css_ids and get_node_factory methods.
class webkit_server.Server(binary=None)
Bases: object
Manages a Webkit server process. If binary is given, the specified webkit_server binary is used instead of the included one.
class webkit_server.ServerConnection(server=None)
Bases: object
A connection to a Webkit server.
server is a server instance or None if a singleton server should be connected to (will be started if necessary).
class webkit_server.SocketBuffer(f)
Bases: object
A convenience class for buffered reads from a socket.
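The methods of SocketBuffer are not listed here, so the following is only a generic sketch of what buffered reading from a stream can look like. LineBufferSketch, read_line, read and chunk_size are hypothetical names, and an io.BytesIO object stands in for the socket-backed file object.

```python
import io


class LineBufferSketch:
    """Hypothetical buffered reader over a binary file-like object."""

    def __init__(self, f, chunk_size=4096):
        self.f = f
        self.chunk_size = chunk_size
        self.buf = b""

    def _fill(self):
        # Pull another chunk from the stream; an empty read means it ended.
        data = self.f.read(self.chunk_size)
        if not data:
            raise EOFError("unexpected end of stream")
        self.buf += data

    def read_line(self):
        """Return the next line without its trailing newline."""
        while b"\n" not in self.buf:
            self._fill()
        line, self.buf = self.buf.split(b"\n", 1)
        return line

    def read(self, n):
        """Return exactly n bytes, reading more chunks as needed."""
        while len(self.buf) < n:
            self._fill()
        data, self.buf = self.buf[:n], self.buf[n:]
        return data


# A tiny chunk size forces several refills on this short stream.
reader = LineBufferSketch(io.BytesIO(b"ok\n5\nhello"), chunk_size=4)
status = reader.read_line()
length = int(reader.read_line())
payload = reader.read(length)
print(status, length, payload)
```

Keeping leftover bytes in a buffer avoids losing data when a chunk contains more than the caller asked for.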