User agent API

Wayfarer retrieves URL contents with user agents. It supports two types of user agents:

  • stateless HTTP clients, which handle redirects explicitly;
  • stateful browsers, which carry state and follow redirects implicitly as they navigate to a URL.

Property             Stateless adapters   Stateful adapters
Interactive          no                   yes
Redirect handling    explicit             implicit

Because spawning browser processes or instantiating HTTP clients is expensive, Wayfarer keeps user agents in a pool and reuses them across jobs. This means that state carries over between jobs: each job checks out a user agent that a previous job has already used, so a browser may still hold the previous job's page and cookies. Raising an exception does not reset a user agent, so your jobs must handle exceptions themselves.

Only on certain irrecoverable errors are individual user agents destroyed and recreated. For example, when a browser process crashes, it is replaced with a fresh browser process.
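To make the pooling behaviour concrete, here is a minimal sketch of a pool that hands the same instance to consecutive jobs and recreates it only for errors listed in ::renew_on. AgentPool, ToyAgent and BrowserCrashed are hypothetical names for illustration; this is not Wayfarer's actual pool implementation. It uses Ruby's built-in Queue for check-out and check-in.

```ruby
# Hypothetical pool sketch; Wayfarer's real pool may work differently.
class AgentPool
  def initialize(strategy, size: 1)
    @strategy = strategy
    # ::renew_on is optional, so fall back to "never renew".
    @renew_on = strategy.class.respond_to?(:renew_on) ? strategy.class.renew_on : []
    @idle = Queue.new
    size.times { @idle << strategy.create }
  end

  # Checks out an instance, yields it to the job and always checks it back in.
  # Whatever state the job leaves behind stays on the instance for the next job.
  def with
    instance = @idle.pop
    begin
      yield instance
    rescue StandardError => e
      if @renew_on.any? { |klass| e.is_a?(klass) }
        # Irrecoverable error: destroy and recreate only this instance.
        @strategy.destroy(instance) if @strategy.respond_to?(:destroy)
        instance = @strategy.create
      end
      raise # the job still sees the error and has to handle it
    ensure
      @idle << instance
    end
  end
end

class BrowserCrashed < StandardError; end

# Toy agent whose "instance" is a hash recording visited URLs.
class ToyAgent
  attr_reader :created

  def self.renew_on
    [BrowserCrashed]
  end

  def initialize
    @created = 0
  end

  def create
    @created += 1
    { history: [] }
  end

  def destroy(instance)
    instance[:history].clear
  end
end

agent = ToyAgent.new
pool = AgentPool.new(agent)
pool.with { |browser| browser[:history] << "https://example.com/a" }
pool.with { |browser| p browser[:history] } # prints ["https://example.com/a"]
```

The second job sees the history the first job left behind. An ordinary exception leaves the instance in the pool unchanged, while an exception listed in ::renew_on swaps it for a fresh one.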

Implementing the user agent interfaces

You implement both stateful and stateless agents by including the Wayfarer::Networking::Strategy module and defining callback methods:

```mermaid
classDiagram
class BaseAgent {
    <<Interface>>
    +#create()*
    +#destroy(instance)*
    +::renew_on()$
}

class StatefulAgent {
    <<Interface>>
    +#navigate(instance, url)*
    +#live(instance)*
}

class StatelessAgent {
    <<Interface>>
    +#fetch(instance, url)*
}

BaseAgent |>.. StatefulAgent : implements
BaseAgent |>.. StatelessAgent : implements
```

Every user agent implementation must provide the #create instance callback, which returns an initialized user agent. Typically, it also implements the optional #destroy(instance) instance callback to free the resources held by an existing user agent.

An implementation can also define the optional class method ::renew_on, which returns an array of exception classes. When one of these exceptions is raised, the affected user agent instance is recreated (destroy-and-create).
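The callback rules can be checked mechanically. The sketch below uses a hypothetical helper, describe_agent (not part of Wayfarer), that infers whether an agent is stateless or stateful from the callbacks it defines and reports its optional parts. It assumes the callbacks are methods the class defines itself; if Wayfarer::Networking::Strategy supplies default implementations, respond_to? checks like these would not be reliable.

```ruby
# Hypothetical helper, not part of Wayfarer: classifies an agent by the
# callbacks it responds to and lists its optional parts.
def describe_agent(agent)
  # #create is required for every kind of agent.
  raise ArgumentError, "#create is required" unless agent.respond_to?(:create)

  kind =
    if agent.respond_to?(:fetch)
      :stateless
    elsif agent.respond_to?(:navigate) && agent.respond_to?(:live)
      :stateful
    else
      raise ArgumentError, "define #fetch, or both #navigate and #live"
    end

  {
    kind: kind,
    # #destroy is optional.
    destroy: agent.respond_to?(:destroy),
    # ::renew_on is an optional class method; no list means never renew.
    renew_on: agent.class.respond_to?(:renew_on) ? agent.class.renew_on : []
  }
end
```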

Stateless interface

In addition to the base interface, stateless user agents implement #fetch(instance, url) which fetches pages or indicates redirects:

  • #create() (required)
  • #fetch(instance, url) (required): Called to retrieve a URL. For responses with a 3xx status code, return redirect(url) with the redirect URL, since Wayfarer follows redirects on your behalf to avoid redirect loops. All other status codes, including 4xx and 5xx, count as successful retrievals; return success(url:, body:, status_code:, headers:) for them.
  • #destroy(instance) (optional)
  • ::renew_on (optional)
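As a more concrete sketch, the stateless interface can be implemented on top of Ruby's standard Net::HTTP library. To keep it runnable without Wayfarer, StrategyStandIn provides stand-in redirect and success helpers whose return values are invented for this example; a real agent would include Wayfarer::Networking::Strategy instead. NetHTTPAgent, the timeouts and the ::renew_on exception list are illustrative choices, not recommendations from Wayfarer.

```ruby
require "net/http"
require "uri"

# Stand-ins for the redirect/success helpers of Wayfarer::Networking::Strategy,
# so this sketch runs on its own. Their return values are made up.
module StrategyStandIn
  def redirect(url)
    [:redirect, url]
  end

  def success(url:, body:, status_code:, headers:)
    [:success, { url: url, body: body, status_code: status_code, headers: headers }]
  end
end

class NetHTTPAgent
  include StrategyStandIn # a real agent would include Wayfarer::Networking::Strategy

  # Assumed choice: recreate the instance after low-level connection failures.
  def self.renew_on
    [Errno::ECONNRESET, EOFError]
  end

  # A connection is opened per request, so the "instance" only carries settings.
  def create
    { open_timeout: 5, read_timeout: 10 }
  end

  def fetch(settings, url)
    uri = URI(url)
    response = Net::HTTP.start(uri.host, uri.port,
                               use_ssl: uri.scheme == "https",
                               open_timeout: settings[:open_timeout],
                               read_timeout: settings[:read_timeout]) do |http|
      http.request(Net::HTTP::Get.new(uri))
    end

    # 3xx: report the target explicitly. Location may be relative, so resolve
    # it against the requested URL.
    if response.is_a?(Net::HTTPRedirection)
      return redirect(URI.join(url, response["location"]).to_s)
    end

    # Every other status, 4xx and 5xx included, is a successful retrieval.
    success(url: url,
            body: response.body,
            status_code: response.code.to_i,
            headers: response.to_hash)
  end

  # Nothing to free: no connection outlives a single fetch.
  def destroy(_settings); end
end
```

A 404 response goes through success rather than raising, in line with how the interface treats error status codes.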

Stateless user agents indicate HTTP 3xx redirect responses explicitly. This is how Wayfarer provides redirect handling out of the box, including a configurable limit on the number of redirects to follow.

redirect(url) enqueues a task for the URL and stops processing the current task.
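The following sketch models the idea behind explicit redirect handling: a redirect result ends the current task and enqueues a new one for the target URL, and each task carries the number of redirects that led to it, so a redirect loop stops at a limit. RedirectTask, run_tasks, RedirectLimitExceeded and the tuple-shaped results are hypothetical. What Wayfarer itself does once its limit is reached is not covered here; this sketch simply raises.

```ruby
# Hypothetical model of explicit redirect handling; not Wayfarer's internals.
RedirectTask = Struct.new(:url, :redirects)

class RedirectLimitExceeded < StandardError; end

# fetch is a block returning [:redirect, url] or [:success, page].
def run_tasks(start_url, max_redirects:, &fetch)
  queue = [RedirectTask.new(start_url, 0)]
  pages = []
  until queue.empty?
    task = queue.shift
    kind, value = fetch.call(task.url)
    case kind
    when :redirect
      # Stop processing this task; enqueue a follow-up task for the target.
      if task.redirects >= max_redirects
        raise RedirectLimitExceeded, "too many redirects at #{task.url}"
      end
      queue << RedirectTask.new(value, task.redirects + 1)
    when :success
      pages << value
    end
  end
  pages
end

site = { "/a" => [:redirect, "/b"], "/b" => [:success, "page b"] }
run_tasks("/a", max_redirects: 3) { |url| site.fetch(url) } # => ["page b"]
```

Counting hops per task, rather than remembering visited URLs, also catches redirect chains that never revisit a URL but grow without bound.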

Pages with HTTP error status codes get routed

If an HTTP request to a URL results in an error status code (for example, 404), page retrieval is still considered successful. This allows job actions to record such responses.

Stateful interface

In addition to the base interface, stateful user agents implement two instance callbacks:

  • #create() (required)
  • #navigate(instance, url) (required): Navigates the user agent to the given URL.
  • #live(instance) (required): Turns the current user agent state into a page by calling success(url:, body:, status_code:, headers:).
  • #destroy(instance) (optional)
  • ::renew_on (optional)
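The difference in redirect handling shows up in #live: a stateful agent reads the URL from the browser's state, because the browser may have followed redirects on its own during #navigate. The sketch below uses ToyBrowser, a hypothetical in-memory stand-in for a real browser, and ToyStatefulAgent with a private stand-in for the success helper, so it runs without Wayfarer.

```ruby
# Toy in-memory "browser" (hypothetical): it follows redirects itself and
# remembers where it ended up, as a real browser would.
class ToyBrowser
  attr_reader :url, :status_code, :body, :headers

  # site maps a URL to [status, body], or for 3xx to [status, location].
  def initialize(site)
    @site = site
  end

  def goto(url, max_hops: 10)
    max_hops.times do
      status, payload = @site.fetch(url)
      if (300..399).cover?(status)
        url = payload # followed implicitly; the agent never sees the 3xx
        next
      end
      @url, @status_code, @body, @headers = url, status, payload, {}
      return
    end
    raise "too many redirects"
  end
end

class ToyStatefulAgent # a real agent would include Wayfarer::Networking::Strategy
  def initialize(site)
    @site = site
  end

  def create
    ToyBrowser.new(@site)
  end

  def navigate(browser, url)
    browser.goto(url)
  end

  # browser.url is where the browser actually ended up, which can differ
  # from the URL passed to navigate.
  def live(browser)
    success(url: browser.url, body: browser.body,
            status_code: browser.status_code, headers: browser.headers)
  end

  private

  # Stand-in for the success helper; its return value is made up.
  def success(url:, body:, status_code:, headers:)
    { url: url, body: body, status_code: status_code, headers: headers }
  end
end

site = { "/old" => [301, "/new"], "/new" => [200, "<h1>New</h1>"] }
agent = ToyStatefulAgent.new(site)
browser = agent.create
agent.navigate(browser, "/old")
agent.live(browser)[:url] # => "/new"
```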

Example implementations

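```ruby
# MyClient, MyIrrecoverableError and the response methods used below are
# placeholders for your HTTP client library.
class StatelessAgent
  include Wayfarer::Networking::Strategy

  def self.renew_on # optional
    [MyIrrecoverableError]
  end

  def create # required
    MyClient.new
  end

  def fetch(client, url) # required
    response = client.get(url)

    # 3xx: report the redirect target and let Wayfarer follow it
    return redirect(response.redirect_url) if response.redirect?

    # Every other status, including 4xx and 5xx, is a successful retrieval
    success(url: url,
            body: response.body,
            status_code: response.status_code,
            headers: response.headers)
  end

  def destroy(client) # optional
    client.close
  end
end
```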
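
```ruby
# MyBrowser and MyIrrecoverableError are placeholders for your browser
# automation library.
class StatefulAgent
  include Wayfarer::Networking::Strategy

  def self.renew_on # optional
    [MyIrrecoverableError]
  end

  def create # required
    MyBrowser.new
  end

  def navigate(browser, url) # required
    browser.goto(url) # the browser follows redirects itself
  end

  def live(browser) # required
    # Use the browser's current URL: it may differ from the navigated URL
    success(url: browser.url,
            body: browser.body,
            status_code: browser.status_code,
            headers: browser.headers)
  end

  def destroy(browser) # optional
    browser.quit
  end
end
```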

Register an instance of your agent under a name, then select it:

```ruby
Wayfarer.config[:network][:agents][:my_agent] = StatefulAgent.new # or StatelessAgent.new
Wayfarer.config[:network][:agent] = :my_agent
```