LinkedIn is a great place to find leads and engage with prospects. Before you can reach out to potential leads, you'll need a list of users to contact.

However, getting a list is difficult without some scraping knowledge.

You can search Google for potential LinkedIn users and company profiles using the following script.

Tools Required

You'll need Python 2.7+ and some packages to get started. Once you install Python, you can run the following command to install the necessary packages.

pip install requests

LinkedIn Scraper Script

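First, we import the packages the script uses.

The random package picks a random user agent for each request, requests makes the HTTP requests, and re (regex) parses the LinkedIn profiles and links out of the HTML. The argparse package is imported as well, although the snippets in this post do not use it.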

import random 
import argparse 
import requests 
import re

We create a LinkedinScraper class that tracks and holds the data for each request.

The class requires two parameters: keyword and limit.

The keyword parameter specifies the search term. The limit parameter sets the maximum number of links to search for.

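class LinkedinScraper(object):
    def __init__(self, keyword, limit):
        """
        :param keyword: a str of keyword(s) to search for
        :param limit: number of profiles to scrape
        """
        # encode spaces so the keyword can be placed in the query string
        self.keyword = keyword.replace(' ', '%20')
        # raw HTML from every results page, concatenated
        self.all_htmls = ""
        self.quantity = '100'
        self.limit = int(limit)
        # offset of the next results page to request
        self.counter = 0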

The LinkedinScraper class has three main methods: search, parse_links, and parse_people.

The search method performs the requests based on the keyword. It first builds a Google search URL from the keyword and the current result offset. Then it makes the request and appends the returned HTML to self.all_htmls, repeating in steps of 100 results until the limit is reached.

def search(self):
    """
    perform the search and append each page of results to self.all_htmls
    :return: None
    """

    # pool of user agents; one is chosen at random for each request
    user_agents = [
        'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36',
        'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0) chromeframe/10.0.648.205',
        'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1500.55 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6',
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.19 (KHTML, like Gecko) Ubuntu/11.10 Chromium/18.0.1025.142 Chrome/18.0.1025.142 Safari/535.19',
        'Mozilla/5.0 (Windows NT 5.1; U; de; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 Opera 11.00'
    ]
    # request one page of up to 100 results at a time until the limit is reached
    while self.counter < self.limit:
        headers = {'User-Agent': random.choice(user_agents)}
        url = 'http://google.com/search?num=100&start=' + str(self.counter) + '&hl=en&meta=&q=site%3Alinkedin.com/in%20' + self.keyword
        resp = requests.get(url, headers=headers)
        # stop as soon as Google serves its captcha page
        if "Our systems have detected unusual traffic from your computer network." in resp.text:
            print("Running into captchas")
            return

        self.all_htmls += resp.text
        self.counter += 100
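
To make the paging concrete, here is a small standalone sketch that lists the URLs the loop above would request. The helper name build_search_urls is hypothetical and not part of the original script, and it assumes Python 3, using urllib.parse.quote instead of the manual space replacement.

```python
from urllib.parse import quote

PAGE_SIZE = 100  # matches num=100 in the query string


def build_search_urls(keyword, limit):
    """Return the Google search URLs the scraper would request (hypothetical helper)."""
    encoded = quote(keyword)  # spaces become %20, like the replace() in __init__
    urls = []
    # same stepping as the while loop: start=0, 100, 200, ... while start < limit
    for start in range(0, int(limit), PAGE_SIZE):
        urls.append(
            'http://google.com/search?num=100&start=' + str(start)
            + '&hl=en&meta=&q=site%3Alinkedin.com/in%20' + encoded
        )
    return urls


urls = build_search_urls("Tesla Motors", 250)
print(len(urls))  # 3 pages: start=0, 100 and 200
print(urls[-1])
```

Because the offset grows by 100 each time, a limit of 250 still triggers three requests, so the script may collect more results than the limit suggests.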

The parse_links method runs a regex over the saved HTML to extract all the LinkedIn links.

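def parse_links(self):
    """
    extract LinkedIn URLs from the saved HTML
    :return: a list of LinkedIn URLs
    """
    # the pattern expects links of the form url=https://www.linkedin.com/<path>&
    reg_links = re.compile(r"url=https://www\.linkedin\.com(.*?)&")
    paths = reg_links.findall(self.all_htmls)
    # findall returns only the captured path, so prepend the host again
    return ["https://www.linkedin.com" + path for path in paths]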
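
As a quick illustration, the snippet below runs a pattern like the one in parse_links against a made-up HTML fragment. The fragment only imitates the url=...& shape the regex expects and is an assumption for demonstration purposes; real Google markup may differ.

```python
import re

# Made-up fragment imitating the link shape the regex expects (an assumption)
sample_html = (
    '<a href="/url?q=x&url=https://www.linkedin.com/in/jane-doe&sa=U">Jane</a>'
    '<a href="/url?q=y&url=https://www.linkedin.com/in/john-smith&sa=U">John</a>'
)

reg_links = re.compile(r"url=https://www\.linkedin\.com(.*?)&")
# findall returns only the captured group, i.e. the path after the host
paths = reg_links.findall(sample_html)
links = ["https://www.linkedin.com" + path for path in paths]
print(links)
```

The non-greedy (.*?) stops at the first & after the host, which is what keeps each match limited to a single profile path.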

Similarly, the parse_people method searches the HTML for each person's name and title.

def parse_people(self):
    """
    parse the saved HTML for LinkedIn profile names and titles using regex
    :return: a list of name and title strings
    """
    reg_people = re.compile(r'">[a-zA-Z0-9._ -]* -|\| LinkedIn')
    matches = reg_people.findall(self.all_htmls)
    results = []
    for match in matches:
        cleaned = match.replace(' | LinkedIn', '')
        cleaned = cleaned.replace(' - LinkedIn', '')
        cleaned = cleaned.replace(' profiles ', '')
        cleaned = cleaned.replace('LinkedIn', '')
        cleaned = cleaned.replace('"', '')
        cleaned = cleaned.replace('>', '')
        # drop leftover separators and whitespace; a bare "| LinkedIn" match becomes empty
        cleaned = cleaned.strip(' -|')
        if cleaned:
            results.append(cleaned)
    return results
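
To see what this regex captures, here is a standalone sketch run against a single made-up result title. The sample markup and the clean helper are assumptions for illustration; the helper condenses the chain of replace calls and trims leftover separators.

```python
import re

# Made-up result title (an assumption); real Google markup may differ
sample_html = '<h3 class="t">Jane Doe - Engineer - Tesla | LinkedIn</h3>'

reg_people = re.compile(r'">[a-zA-Z0-9._ -]* -|\| LinkedIn')
matches = reg_people.findall(sample_html)


def clean(match):
    """Strip markup and LinkedIn suffixes from one regex match (hypothetical helper)."""
    for junk in (' | LinkedIn', ' - LinkedIn', ' profiles ', 'LinkedIn', '"', '>'):
        match = match.replace(junk, '')
    return match.strip(' -|')


# keep only matches that still contain text after cleaning
people = [text for text in map(clean, matches) if text]
print(matches)
print(people)
```

Note that the alternation in the pattern also matches a bare "| LinkedIn", which cleans down to an empty string, so filtering out empty results matters.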

The following example uses the class to search for up to 500 profiles matching the keyword Tesla.

ls = LinkedinScraper(keyword="Tesla", limit=500)
ls.search()
links = ls.parse_links()
profiles = ls.parse_people()
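
The same profile can plausibly show up on more than one results page, so you may want to deduplicate the links before using them. The sketch below is an optional extra, not part of the original script; the sample list is made up, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
# Made-up sample of what parse_links() might return (an assumption)
links = [
    "https://www.linkedin.com/in/jane-doe",
    "https://www.linkedin.com/in/john-smith",
    "https://www.linkedin.com/in/jane-doe",
]

# dict.fromkeys keeps the first occurrence of each link and preserves order
unique_links = list(dict.fromkeys(links))
print(unique_links)
```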

This is quite a simple script, but it should be a good starting point. Note that it has no error handling, and when Google responds with a captcha page the search simply stops.

You can find the complete code at https://github.com/serply-inc/python-linkedin-scraper.

Making too many requests to Google can get your IP blocked. Please use proxies when running this script.
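
One way to wire in proxies is sketched below. The build_request_kwargs helper and the proxy address are hypothetical placeholders; the returned dictionary uses the headers, proxies, and timeout keyword arguments that requests.get accepts, and the 10 second timeout is an arbitrary choice.

```python
import random


def build_request_kwargs(user_agents, proxy_urls):
    """Pick a random user agent and proxy for one request (hypothetical helper)."""
    proxy = random.choice(proxy_urls)
    return {
        'headers': {'User-Agent': random.choice(user_agents)},
        # route both http and https traffic through the chosen proxy
        'proxies': {'http': proxy, 'https': proxy},
        'timeout': 10,  # seconds; arbitrary value for this sketch
    }


kwargs = build_request_kwargs(
    ['Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36'],
    ['http://user:pass@proxy.example.com:8080'],  # placeholder proxy address
)
print(kwargs['proxies'])
# Inside search(), the request would then become:
# resp = requests.get(url, **kwargs)
```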

Alternatively, see Serply's API docs at https://docs.serply.io/ for performing searches without getting blocked.
