How to find HTML elements by multiple tags with DOM Crawler?
Learn how to use DOM Crawler’s filter method to search for HTML elements by several tags at once, with a practical code example.


You can search for HTML elements by multiple tags in DOM Crawler by passing the `filter()` method a CSS selector group: several tag names separated by commas, such as `'h1, h2'`. The code sample below uses Guzzle to load the Serply.io homepage and then uses DOM Crawler to display the text of every h1 and h2 element.
Here is a step-by-step explanation of how it works:
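Start by fetching the webpage you're interested in. We'll use the Guzzle HTTP client for this. It works like a browser visiting the site, except that it only downloads the raw HTML for your script to work with. Install the dependencies with Composer first (`composer require guzzlehttp/guzzle symfony/dom-crawler symfony/css-selector`).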
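```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;

// Download the page and read the response body as a string.
$client = new Client();
$response = $client->get('https://serply.io/');
$html = (string) $response->getBody();
```

Next, load the fetched HTML into DOM Crawler, which parses it into a DOM tree that you can search.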
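```php
use Symfony\Component\DomCrawler\Crawler;

// Parse the HTML string into a searchable DOM tree.
$crawler = new Crawler($html);
```

Now use the `filter()` method to find every element that is an h1 or an h2. This is where you specify what you're looking for: in the selector `'h1, h2'`, the comma separates two selectors, and the result contains the elements that match either one. Note that `filter()` relies on the separate CssSelector component (`symfony/css-selector`); if it is not installed, DOM Crawler throws an exception suggesting that you install it or use `filterXPath()` instead.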
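The effect of the comma is easiest to see in isolation. Here is a minimal sketch that runs against a small inline HTML string (hypothetical sample markup, not the Serply.io page) instead of a live request. It assumes `symfony/dom-crawler` and `symfony/css-selector` are installed with Composer:

```php
<?php
// Assumes: composer require symfony/dom-crawler symfony/css-selector
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample markup, so the sketch runs without a network request.
$html = <<<'HTML'
<html><body>
  <h1>Main title</h1>
  <p>Intro text</p>
  <h2>First section</h2>
  <h3>Not selected</h3>
  <h2>Second section</h2>
</body></html>
HTML;

$crawler = new Crawler($html);

// Each selector on its own...
$h1Count = $crawler->filter('h1')->count();
$h2Count = $crawler->filter('h2')->count();

// ...and combined in a selector group: the comma means "h1 OR h2".
$groupCount = $crawler->filter('h1, h2')->count();

// Adding h3 to the group widens the match further.
$widerCount = $crawler->filter('h1, h2, h3')->count();

printf("h1: %d, h2: %d, h1+h2: %d, h1-h3: %d\n", $h1Count, $h2Count, $groupCount, $widerCount);
// prints "h1: 1, h2: 2, h1+h2: 3, h1-h3: 4"
```

The last count shows that a group can list as many tag names as you need. In the walkthrough script, the selection itself is one line: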
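```php
// One selector group matches both tags.
$headings = $crawler->filter('h1, h2');
```

Finally, loop through the matched elements and display their text content. Iterating over a Crawler with `foreach` yields `DOMElement` objects from PHP's DOM extension, so the text comes from each element's `textContent` property. This part is like reading through each section you’ve pinpointed and noting down what it says.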
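`foreach` is not the only way to read the matches. The following sketch, again on hypothetical inline markup and assuming a current symfony/dom-crawler release (5.0 or later), contrasts the raw `textContent` you get from `foreach` with `Crawler::each()` plus `text()`, which trims and collapses whitespace by default:

```php
<?php
// Assumes: composer require symfony/dom-crawler symfony/css-selector (5.0 or later)
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup with messy whitespace and a nested inline tag.
$html = "<html><body><h1>  Main\n    title </h1><h2>Section <em>one</em></h2></body></html>";

$headings = (new Crawler($html))->filter('h1, h2');

// Option 1: foreach yields DOMElement objects, so textContent is the raw
// text of the element and its descendants, whitespace included.
$raw = [];
foreach ($headings as $element) {
    $raw[] = $element->textContent;
}

// Option 2: each() wraps every match in its own Crawler; text() then
// trims and collapses runs of whitespace by default.
$clean = $headings->each(fn (Crawler $node): string => $node->text());

foreach ($clean as $line) {
    echo $line . PHP_EOL;
}
```

Whichever form you choose, the selection is the same; the walkthrough's own loop uses `foreach`: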
```php
// Print the text of each h1 and h2, one per line.
foreach ($headings as $element) {
    echo $element->textContent . PHP_EOL;
}
```

Summing It Up
This walkthrough shows how DOM Crawler, combined with the Guzzle HTTP client, handles a common web scraping task. By passing the `filter()` method a comma-separated CSS selector group, you can target several HTML tags at once, such as h1 and h2 in this case, and extract those sections of a page in a single call. If you want to go beyond HTML content, the Google Images API provides additional tools for processing image data, and the Google SERP API offers options for handling search engine results.
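As a closing sketch, the steps can be wrapped in one reusable function. `fetchTexts()` below is a hypothetical helper, not part of Guzzle or DOM Crawler. It accepts any Guzzle `ClientInterface`, so the demo can inject Guzzle's `MockHandler` with a canned response and run without a network request; in real use you would pass a plain `new Client()`:

```php
<?php
// Assumes: composer require guzzlehttp/guzzle symfony/dom-crawler symfony/css-selector
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\ClientInterface;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;
use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: fetch $url and return the normalized text of every
 * element matched by the CSS selector group $selector.
 */
function fetchTexts(ClientInterface $client, string $url, string $selector = 'h1, h2'): array
{
    // Guzzle throws an exception on 4xx/5xx responses by default (http_errors = true).
    $html = (string) $client->request('GET', $url)->getBody();

    return (new Crawler($html))
        ->filter($selector)
        ->each(fn (Crawler $node): string => $node->text());
}

// Offline demo: the MockHandler returns this canned response instead of
// contacting the real site.
$mock = new MockHandler([
    new Response(200, ['Content-Type' => 'text/html'], '<h1>Title</h1><p>x</p><h2>Sub</h2>'),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$texts = fetchTexts($client, 'https://serply.io/');
echo implode(PHP_EOL, $texts) . PHP_EOL;
```

Injecting the client keeps the scraping logic testable: the same function runs against a canned response in a test and against the live page otherwise.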