API Documentation
This documentation also contains the API docs for the webkit_server
module, for convenience (and because I am too lazy to set up dedicated docs
for it).
Overview
Module dryscrape.session
class dryscrape.session.Session(driver=None, base_url=None)
Bases: object
A web scraping session based on a driver instance. Implements the proxy pattern to pass unresolved method calls to the underlying driver.
If no driver is specified, the session creates an instance of dryscrape.session.DefaultDriver to obtain one (defaults to dryscrape.driver.webkit.Driver).
If base_url is present, relative URLs are completed with this URL base. If not, the get_base_url method is called on itself to get the base URL.
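The proxy behaviour and URL completion described for Session can be pictured with a small stand-alone sketch. It does not import dryscrape; FakeDriver and SessionSketch are hypothetical names introduced only for illustration, and the real Session may differ in detail.

```python
from urllib.parse import urljoin


class FakeDriver:
    """Hypothetical stand-in for a driver; not part of dryscrape."""

    def __init__(self):
        self.visited = []

    def visit(self, url):
        self.visited.append(url)


class SessionSketch:
    """Illustrative only: forwards unresolved attributes to the driver."""

    def __init__(self, driver=None, base_url=None):
        # Fall back to a default driver when none is given.
        self.driver = driver if driver is not None else FakeDriver()
        self.base_url = base_url

    def complete_url(self, url):
        # Relative URLs are resolved against base_url when one is set.
        return urljoin(self.base_url, url) if self.base_url else url

    def __getattr__(self, name):
        # Invoked only when normal lookup fails, so methods defined on the
        # session win and everything else is passed to the driver.
        return getattr(self.driver, name)


session = SessionSketch(base_url="http://example.com/")
session.visit(session.complete_url("/search"))
```

Here visit is not defined on SessionSketch, so the call is resolved on the driver object instead.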
Module dryscrape.mixins
Mixins for use in dryscrape drivers.
class dryscrape.mixins.AttributeMixin
Bases: object
Mixin that adds [] access syntax sugar to an object that supports a set_attr and a get_attr method.
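A minimal sketch of how such [] sugar can be layered over get_attr and set_attr follows. AttributeSugar and DictNode are hypothetical classes written for this example, not the dryscrape implementation.

```python
class AttributeSugar:
    """Hypothetical mixin mapping [] access onto get_attr/set_attr."""

    def __getitem__(self, attr):
        return self.get_attr(attr)

    def __setitem__(self, attr, value):
        self.set_attr(attr, value)


class DictNode(AttributeSugar):
    """Hypothetical node that keeps its attributes in a plain dict."""

    def __init__(self):
        self._attrs = {}

    def get_attr(self, name):
        return self._attrs.get(name)

    def set_attr(self, name, value):
        self._attrs[name] = value


node = DictNode()
node["href"] = "/about"   # routed through set_attr
link = node["href"]       # routed through get_attr
```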
class dryscrape.mixins.HtmlParsingMixin
Bases: object
Mixin that adds a document method to an object that supports a body method returning valid HTML.
class dryscrape.mixins.SelectionMixin
Bases: object
Mixin that adds different methods of node selection to an object that provides an xpath method returning a collection of matches.
class dryscrape.mixins.WaitMixin
Bases: dryscrape.mixins.SelectionMixin
Mixin that allows waiting for conditions or elements.
at_css(css, timeout=1, **kw)
Returns the first node matching the given CSSv3 expression or None if a timeout occurs.
at_xpath(xpath, timeout=1, **kw)
Returns the first node matching the given XPath 2.0 expression or None if a timeout occurs.
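The "return the match or None on timeout" behaviour can be illustrated with a generic polling loop. This is a sketch under assumptions: wait_for_value, its interval parameter, and the polling strategy are hypothetical and are not taken from the dryscrape source.

```python
import time


def wait_for_value(condition, timeout=1.0, interval=0.05):
    """Poll condition() until it returns a truthy value or time runs out.

    Returns the value, or None if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


attempts = []


def appears_on_third_try():
    # Simulates an element that shows up only after some page activity.
    attempts.append(1)
    return "node" if len(attempts) >= 3 else None


found = wait_for_value(appears_on_third_try, timeout=1.0, interval=0.01)
missing = wait_for_value(lambda: None, timeout=0.05, interval=0.01)
```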
Module dryscrape.driver.webkit
Headless Webkit driver for dryscrape. Wraps the webkit_server module.
class dryscrape.driver.webkit.Driver(**kw)
Bases: webkit_server.Client, dryscrape.mixins.WaitMixin, dryscrape.mixins.HtmlParsingMixin
Driver implementation wrapping a webkit_server driver.
Keyword arguments are passed through to the underlying webkit_server.Client constructor. By default, node_factory_class is set to use the dryscrape node implementation.
class dryscrape.driver.webkit.Node(client, node_id)
Bases: webkit_server.Node, dryscrape.mixins.SelectionMixin, dryscrape.mixins.AttributeMixin
Node implementation wrapping a webkit_server node.
class dryscrape.driver.webkit.NodeFactory(client)
Bases: webkit_server.NodeFactory
Overrides the NodeFactory provided by webkit_server.
Module webkit_server
Python bindings for the webkit-server.
class webkit_server.Client(connection=None, node_factory_class=<class 'webkit_server.NodeFactory'>)
Bases: webkit_server.SelectionMixin
Wrappers for the webkit_server commands.
If connection is not specified, a new instance of ServerConnection is created.
node_factory_class can be set to a value different from the default, in which case a new instance of the given class will be used to create nodes. The given class must accept a client instance through its constructor and support a create method that takes a node ID as an argument and returns a node object.
clear_cookies()
Deletes all cookies.
cookies()
Returns a list of all cookies in cookie string format.
eval_script(expr)
Evaluates a piece of JavaScript in the context of the current page and returns its value.
headers()
Returns a list of the last HTTP response headers. Header keys are normalized to capitalized form, as in User-Agent.
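The capitalized form mentioned above can be sketched as follows. normalize_header_key is a hypothetical helper written for this example; the exact normalization used by webkit_server may differ (for instance for keys such as ETag).

```python
def normalize_header_key(key):
    """Capitalize each dash-separated part of an HTTP header name."""
    return "-".join(part.capitalize() for part in key.split("-"))


normalized = [normalize_header_key(k) for k in ("user-agent", "CONTENT-TYPE")]
```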
render(path, width=1024, height=1024)
Renders the current page to a PNG file (viewport size in pixels).
set_attribute(attr, value=True)
Sets a custom attribute for our Webkit instance. Possible attributes are:
auto_load_images
dns_prefetch_enabled
plugins_enabled
private_browsing_enabled
javascript_can_open_windows
javascript_can_access_clipboard
offline_storage_database_enabled
offline_web_application_cache_enabled
local_storage_enabled
local_storage_database_enabled
local_content_can_access_remote_urls
local_content_can_access_file_urls
accelerated_compositing_enabled
site_specific_quirks_enabled
For all of these options, value must be a boolean. You can find more information about these options in the Qt docs.
set_cookie(cookie)
Sets a cookie for future requests (must be in correct cookie string format).
set_error_tolerant(tolerant=True)
DEPRECATED! This function is a no-op now.
Used to set or unset the error tolerance flag in the server. If this flag was set, dropped requests or erroneous responses would not lead to an error.
set_html(html, url=None)
Sets custom HTML in our Webkit session and allows a fake URL to be specified. Scripts and CSS are fetched dynamically as if the HTML had been loaded from the given URL.
exception webkit_server.EndOfStreamError(msg='Unexpected end of file')
Bases: exceptions.Exception
Raised when the Webkit server closed the connection unexpectedly.
exception webkit_server.InvalidResponseError
Bases: exceptions.Exception
Raised when the Webkit server signaled an error.
exception webkit_server.NoResponseError
Bases: exceptions.Exception
Raised when the Webkit server does not respond.
exception webkit_server.NoX11Error
Bases: webkit_server.WebkitServerError
Raised when the Webkit server cannot connect to X.
class webkit_server.Node(client, node_id)
Bases: webkit_server.SelectionMixin
Represents a DOM node in our Webkit session.
client is the associated client instance.
node_id is the internal ID that is used to identify the node when communicating with the server.
eval_script(js)
Evaluate arbitrary JavaScript with the node variable bound to the current node.
exec_script(js)
Execute arbitrary JavaScript with the node variable bound to the current node.
exception
webkit_server.NodeError[source]¶ Bases:
exceptions.ExceptionA problem occured within a
Nodeinstance method.
class webkit_server.NodeFactory(client)
Bases: object
Implements the default node factory.
client is the associated client instance.
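The contract that webkit_server.Client expects from node_factory_class (a constructor taking the client, and a create method taking a node ID and returning a node object) might be satisfied by a class along these lines. SketchNode and SketchNodeFactory are hypothetical names; this is not the library's own factory.

```python
class SketchNode:
    """Hypothetical node type holding a client and a node ID."""

    def __init__(self, client, node_id):
        self.client = client
        self.node_id = node_id


class SketchNodeFactory:
    """Accepts a client in its constructor and builds nodes via create()."""

    def __init__(self, client):
        self.client = client

    def create(self, node_id):
        # Every node produced by this factory shares the same client.
        return SketchNode(self.client, node_id)


client = object()  # placeholder for a real client instance
node = SketchNodeFactory(client).create("42")
```

A class of this shape is what would be passed as node_factory_class to substitute a custom node type.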
class webkit_server.SelectionMixin
Bases: object
Implements a generic XPath selection for a class providing _get_xpath_ids, _get_css_ids and get_node_factory methods.
class webkit_server.Server(binary=None)
Bases: object
Manages a Webkit server process. If binary is given, the specified webkit_server binary is used instead of the included one.
class webkit_server.ServerConnection(server=None)
Bases: object
A connection to a Webkit server.
server is a server instance, or None if the connection should use a singleton server (which is started if necessary).
class webkit_server.SocketBuffer(f)
Bases: object
A convenience class for buffered reads from a socket.
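To give an idea of what buffered reads from a socket involve, here is a minimal sketch. LineBuffer and its read_line and read methods are hypothetical and written only for this example; they are not the interface of SocketBuffer, and the newline-delimited framing is an assumption.

```python
import socket


class LineBuffer:
    """Hypothetical buffered reader over a socket."""

    def __init__(self, sock):
        self.sock = sock
        self.buf = b""

    def _fill(self):
        # Pull more bytes from the socket; an empty chunk means the peer closed.
        chunk = self.sock.recv(4096)
        if not chunk:
            raise EOFError("Unexpected end of file")
        self.buf += chunk

    def read_line(self):
        # Keep receiving until the buffer holds a complete line.
        while b"\n" not in self.buf:
            self._fill()
        line, self.buf = self.buf.split(b"\n", 1)
        return line

    def read(self, n):
        # Keep receiving until at least n bytes are buffered.
        while len(self.buf) < n:
            self._fill()
        data, self.buf = self.buf[:n], self.buf[n:]
        return data


# Demonstration over a local socket pair instead of a real server.
writer, reader_sock = socket.socketpair()
writer.sendall(b"ok\n5\nhello")
writer.close()

reader = LineBuffer(reader_sock)
status = reader.read_line()
length = int(reader.read_line())
payload = reader.read(length)
reader_sock.close()
```

Keeping leftover bytes in the buffer means that a single recv call returning several messages at once does not lose data between reads.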