Goutte, a simple PHP Web Scraper
================================

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML
responses.

Requirements
------------

Goutte works with PHP 5.3.3 or later.

Installation
------------

Add ``fabpot/goutte`` as a dependency in the ``require`` section of your
``composer.json`` file. With `Composer`_, run:

.. code-block:: bash

    php composer.phar require fabpot/goutte:~1.0

.. tip::

    You can also download the `Goutte.phar`_ file:

    .. code-block:: php

        require_once '/path/to/goutte.phar';

Usage
-----

Create a Goutte Client instance (which extends
``Symfony\Component\BrowserKit\Client``):

.. code-block:: php

    use Goutte\Client;

    $client = new Client();

Make requests with the ``request()`` method:

.. code-block:: php

    // Go to the symfony.com website
    $crawler = $client->request('GET', 'http://www.symfony.com/blog/');

The method returns a ``Crawler`` object
(``Symfony\Component\DomCrawler\Crawler``).

Click on links:

.. code-block:: php

    // Click on the "Security Advisories" link
    $link = $crawler->selectLink('Security Advisories')->link();
    $crawler = $client->click($link);
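
Before following a link, it can help to check where it points. The sketch
below is an illustration rather than part of Goutte itself: it builds a
``Crawler`` from a hypothetical HTML snippet (the markup, the ``href`` value,
and the autoloader path are all assumptions) and reads the resolved URI of the
selected link with ``getUri()``:

.. code-block:: php

    <?php
    // Assumes Composer's autoloader sits next to this script (hypothetical path).
    require_once __DIR__.'/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup standing in for a fetched page.
    $html = '<html><body>'
        .'<a href="/blog/category/security-advisories">Security Advisories</a>'
        .'</body></html>';

    // The second argument is the page URI, used to resolve relative links.
    $crawler = new Crawler(null, 'http://www.symfony.com/blog/');
    $crawler->addContent($html);

    $link = $crawler->selectLink('Security Advisories')->link();

    // Print the absolute URI that $client->click($link) would request.
    print $link->getUri()."\n";

Working on an in-memory snippet keeps the check independent of the network;
with a live page, the same ``getUri()`` call applies to the ``$link`` object
obtained from the fetched ``$crawler``.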

Extract data:

.. code-block:: php

    // Get the latest posts in this category and display their titles
    $crawler->filter('h2.post > a')->each(function ($node) {
        print $node->text()."\n";
    });
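
The same pattern can collect several values per node instead of printing them
right away. The following is a minimal sketch under stated assumptions: the
HTML snippet, the ``$posts`` array, and the autoloader path are hypothetical,
and the closure is assumed to receive ``Crawler`` nodes, as in the example
above:

.. code-block:: php

    <?php
    // Assumes Composer's autoloader sits next to this script (hypothetical path).
    require_once __DIR__.'/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup standing in for a fetched blog page.
    $html = '<html><body>'
        .'<h2 class="post"><a href="/blog/first">First post</a></h2>'
        .'<h2 class="post"><a href="/blog/second">Second post</a></h2>'
        .'</body></html>';

    $crawler = new Crawler($html);

    // Collect the title and the raw href attribute of every matching link.
    $posts = array();
    $crawler->filter('h2.post > a')->each(function ($node) use (&$posts) {
        $posts[] = array(
            'title' => trim($node->text()),
            'url'   => $node->attr('href'),
        );
    });

    foreach ($posts as $post) {
        print $post['title'].' -> '.$post['url']."\n";
    }

Note that ``attr('href')`` returns the attribute as written in the markup, so
relative URLs stay relative here.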

Submit forms:

.. code-block:: php

    $crawler = $client->request('GET', 'http://github.com/');
    $crawler = $client->click($crawler->selectLink('Sign in')->link());
    $form = $crawler->selectButton('Sign in')->form();
    $crawler = $client->submit($form, array('login' => 'fabpot', 'password' => 'xxxxxx'));
    $crawler->filter('.flash-error')->each(function ($node) {
        print $node->text()."\n";
    });
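
To see what a form would send before submitting it, the ``Form`` object can be
inspected directly. This is a sketch only: the login form markup, the
``example.com`` URLs, and the autoloader path are hypothetical, and the field
names simply mirror the example above:

.. code-block:: php

    <?php
    // Assumes Composer's autoloader sits next to this script (hypothetical path).
    require_once __DIR__.'/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical login form standing in for a fetched page.
    $html = '<html><body>'
        .'<form action="/session" method="post">'
        .'<input type="text" name="login" />'
        .'<input type="password" name="password" />'
        .'<input type="submit" value="Sign in" />'
        .'</form>'
        .'</body></html>';

    // An absolute page URI is needed so the form action can be resolved.
    $crawler = new Crawler(null, 'http://example.com/login');
    $crawler->addContent($html);

    // Select the form through its submit button and fill in the fields.
    $form = $crawler->selectButton('Sign in')->form(array(
        'login'    => 'fabpot',
        'password' => 'xxxxxx',
    ));

    // Show the HTTP method, the target URI, and the values to be sent.
    print $form->getMethod().' '.$form->getUri()."\n";
    print_r($form->getValues());

Passing the same ``$form`` to ``$client->submit()`` would then send these
values to the resolved URI.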

More Information
----------------

Read the documentation of the BrowserKit and DomCrawler Symfony Components for
more information about what you can do with Goutte.

Technical Information
---------------------

Goutte is a thin wrapper around the following fine PHP libraries:

* Symfony Components: BrowserKit, ClassLoader, CssSelector, DomCrawler, Finder,
  and Process;

* `Guzzle`_ HTTP Component.

License
-------

Goutte is licensed under the MIT license.

.. _`Composer`: http://getcomposer.org
.. _`Goutte.phar`: http://get.sensiolabs.org/goutte.phar
.. _`Guzzle`: http://docs.guzzlephp.org