How to scrape tables with DOM Crawler?

Learn how to extract table data with DOM Crawler's filter and each methods, demonstrated with an easy-to-follow code example.

Tuan Nguyen

To scrape tables with DOM Crawler, load Symfony's DomCrawler component in your PHP script and pass your HTML to a new Crawler instance. Use the filter method to select the table, then iterate over each row and cell with nested filter and each calls. The CSS selectors pick out the elements and each runs a callback on every match, so you can extract the table's data one cell at a time.

Read on for a step-by-step walkthrough.

Step-by-Step Code Breakdown

Here’s how to scrape information from a simple HTML table using DOM Crawler:

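First, include the DOM Crawler package from Symfony. If you install it with Composer, you also need the symfony/css-selector package, because the filter method relies on it to understand CSS selectors:

<?php
// Load Composer's autoloader so the Symfony classes can be found.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;
?>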

Next, define your HTML. Here's a sample table that includes names, ages, and occupations:

<?php
$html = <<<EOD
  <table>
    <tr>
      <th>Name</th>
      <th>Age</th>
      <th>Occupation</th>
    </tr>
    <tr>
      <td>Yasoob</td>
      <td>35</td>
      <td>Software Engineer</td>
    </tr>
    <tr>
      <td>Pierre</td>
      <td>28</td>
      <td>Product Manager</td>
    </tr>
  </table>
EOD;
?>

Load your HTML into the crawler:

<?php
$crawler = new Crawler($html);
?>

Find the first table in your document:

<?php
$table = $crawler->filter('table')->first();
?>
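
On a real page the table may be missing. filter returns an empty Crawler when nothing matches, and methods such as text() throw an InvalidArgumentException on an empty node list. The following is a minimal sketch of a guard, assuming the packages were installed with Composer into a vendor directory next to the script; the sample HTML and the $message variable are hypothetical and only there for illustration.

```php
<?php
// Assumes Composer's autoloader sits next to this script.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// A page with no table, to show the failure case.
$crawler = new Crawler('<html><body><p>No tables here</p></body></html>');

$tables = $crawler->filter('table');

if ($tables->count() === 0) {
    // Nothing matched: stop before calling methods that need a node.
    $message = 'No table found';
} else {
    $table = $tables->first();
    $message = 'Found a table with ' . $table->filter('tr')->count() . ' rows';
}

echo $message . PHP_EOL;
```

Checking count() first keeps the rest of the script from failing on pages whose layout differs from what you expected.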

Go through each row in the table. The header row contains only th cells, so the inner td filter matches nothing there and only the data rows are printed:

<?php
$table->filter('tr')->each(function ($row) {
    // For every row, go through each data cell:
    $row->filter('td')->each(function ($cell) {
        // Print the cell's text on its own line:
        echo $cell->text() . PHP_EOL;
    });
});
?>
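
Printing each cell is fine for a quick look, but scraped data is usually easier to work with as structured rows. The sketch below is one possible way to do that, not the only one: it relies on each returning an array of the callback's return values, and pairs every data row with the header labels. It assumes PHP 7.4 or later (for arrow functions), a Composer install in a vendor directory next to the script, and that every data row has as many cells as the header row. The $headers, $rows, and $records names are hypothetical.

```php
<?php
// Assumes: composer require symfony/dom-crawler symfony/css-selector
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<EOD
<table>
  <tr><th>Name</th><th>Age</th><th>Occupation</th></tr>
  <tr><td>Yasoob</td><td>35</td><td>Software Engineer</td></tr>
  <tr><td>Pierre</td><td>28</td><td>Product Manager</td></tr>
</table>
EOD;

$table = (new Crawler($html))->filter('table')->first();

// Header labels come from the <th> cells.
$headers = $table->filter('th')->each(fn (Crawler $th) => $th->text());

// each() returns an array of the callback's return values, one per row.
$rows = $table->filter('tr')->each(
    fn (Crawler $row) => $row->filter('td')->each(fn (Crawler $td) => $td->text())
);

// The header row has no <td> cells, so it yields an empty array; drop it.
$rows = array_values(array_filter($rows));

// Pair each row with the header labels.
// array_combine() requires both arrays to have the same length.
$records = array_map(fn (array $cells) => array_combine($headers, $cells), $rows);

print_r($records);
```

Each entry in $records is an associative array such as ['Name' => 'Yasoob', 'Age' => '35', 'Occupation' => 'Software Engineer']. Note that the values are strings, so cast the age with (int) if you need a number.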

Conclusion

This example covers the basics of pulling data from HTML tables with DOM Crawler. Using CSS selectors together with the filter and each methods, you work through the rows and cells one at a time, which makes DOM Crawler a practical tool for extracting structured data. If you want to go further, especially with navigating and extracting data from complex web structures, consider exploring the Google Crawl API for web crawling. The Google SERP API is also worth a look for processing search engine results in data analysis projects.