Scraping Backlinks With Advanced Google Search Operators and Python

In this article, you'll learn how to use Google search operators to automatically find backlinks with Python.

Tuan Nguyen

    Advanced Google search operators are a quick way to find high Domain Authority sites with a history of linking to pages that discuss your keyword.

    In this article, I'll show you how to use Google search operators to find backlinks with Python.

    We will create a Python script to scrape and export a list of guest post opportunities that you can use to build a list of high-quality backlinks. This guide assumes you've already completed keyword research and, using rank tracking software, have identified competitors that rank well for these queries.

    This method can also be used to gather competitor backlinks for analysis.

    This guide builds on how to scrape Google with Python and reuses some of the code, especially for parsing Google search results. The requirements are Python 3+ and two Python libraries: requests and bs4.


    To install the two libraries, run the command:
    pip install requests bs4

    We will call our script scrape.py.
    To use both libraries in our script, we need to import them, along with urllib.parse from the standard library, which we'll use later to build the query string.

    import urllib.parse

    import requests
    from bs4 import BeautifulSoup

    When we want to find backlink opportunities for guest posting, we need to find websites that allow guest posting or user-generated content.

    We will need to use search operators and footprints to find opportunities. Some examples of footprints are:

    • "write for us"
    • "guest blogger"
    • "become a guest blogger"
    • inurl:guestbook.html

    Note: To find websites related to your topic, combine a footprint with your keywords. For example, pairing the footprint "write for us" with the keyword content marketing gives the query "write for us" content marketing. Use the Serply Footprints repo for footprint data for guest posts on many platforms.

    So the next part of our script will build a Google-formatted query that combines the footprint with our keyword. The function takes two arguments, a footprint and a keyword. A large part of this function is just parsing Google results and returning them as a list.

    def query(footprint: str, keyword: str):
        """
        Return the Google results for a footprint and keyword combination.
        :param footprint: search footprint, e.g. "write for us"
        :param keyword: topic keyword to combine with the footprint
        :return: a list of dicts, each with a result's title and link
        """
        # Desktop user-agent
        USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
        # Mobile user-agent (unused here, but useful if you want mobile results)
        MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"
        HEADERS = {"user-agent": USER_AGENT}
    
        results = []
    
        # Fetch the top 300 results for this particular footprint and keyword,
        # 100 results per page (Google limits the results to 300)
        for start in range(0, 300, 100):
            params = {
                "q": "{} {}".format(footprint, keyword),
                "num": 100,
                "start": start,
            }
    
            url = "https://google.com/search?{}".format(urllib.parse.urlencode(params))
    
            resp = requests.get(url, headers=HEADERS)
    
            if resp.status_code == 200:
                # use BeautifulSoup to parse the HTML
                soup = BeautifulSoup(resp.content, "html.parser")
                gs = soup.find_all("div", class_="rc")
                # grab each result's title and link
                for g in gs:
                    anchors = g.find_all("a")
                    if anchors:
                        link = anchors[0]["href"]
                        title = g.find("h3").text
                        item = {"title": title, "link": link}
                        results.append(item)
                # fewer than a full page of results means we've reached the end
                if len(gs) < 10:
                    return results
            # Check for captcha
            elif (resp.status_code == 429) or (
                "Our systems have detected unusual traffic from your computer network."
                in resp.text  # resp.text (str), not resp.content (bytes)
            ):
                raise ValueError("Ran into captcha. Please use a proxy.")
    
        return results

    That's it. Simply call the query function to grab the top 300 results for any combination of footprint and keyword.
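    A minimal call might look like this (the footprint and keyword here are just placeholders; swap in your own):

    results = query('"write for us"', "content marketing")
    for result in results:
        print(result["title"], "->", result["link"])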

    The next step is to build a large list of footprints and loop through it to find thousands of backlink opportunities from guest posts, directories, bookmarks, .edu sites, pingbacks, microblogs, indexers, and more, as sketched below.
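    Here is one way that loop could look, assuming an example footprint list and keyword, and using the standard library's csv module to export what we find:

    import csv

    footprints = [
        '"write for us"',
        '"guest blogger"',
        '"become a guest blogger"',
        "inurl:guestbook.html",
    ]
    keyword = "content marketing"  # example keyword

    opportunities = []
    for footprint in footprints:
        try:
            opportunities.extend(query(footprint, keyword))
        except ValueError:
            break  # hit a captcha; stop here (or retry through a proxy)

    # export everything found so far to a CSV file
    with open("opportunities.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "link"])
        writer.writeheader()
        writer.writerows(opportunities)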

    This list of advanced Google search operators is powerful if used correctly, and the Python script above is especially useful for SEO.

    However, these advanced operators come at a cost. After a few queries with Google's advanced search operators, you will be faced with CAPTCHAs, which get annoying fast and limit the number of backlinks you can scrape using the above method.

    If you plan on doing a lot of advanced searches, I recommend using proxies or services like Serply for bulk searches. Serply provides a service for unlimited Google Search requests without having to deal with annoying CAPTCHAs.
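    If you go the proxy route, requests supports routing traffic through a proxy via its proxies argument. A minimal sketch (the proxy address and credentials below are placeholders; substitute your provider's):

    # hypothetical proxy endpoint; replace with your own provider's address
    PROXIES = {
        "http": "http://user:pass@proxy.example.com:8080",
        "https": "http://user:pass@proxy.example.com:8080",
    }

    resp = requests.get(url, headers=HEADERS, proxies=PROXIES, timeout=30)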