Growth hacking and web scraping may sound like trendy buzzwords, but each has been around for quite a while. In fact, web scraping goes back almost as far as the web itself, and it has become a valuable technique for growth hacking an online business.
Growth hacking is a business strategy that treats growth as the one metric that matters, so it draws on techniques such as web scraping that can help an online business expand as quickly as possible. Growth hacking is very popular amongst tech startups and SMBs.
Web scraping is a technique in which bots, similar to the crawlers used by search engines, collect a specific type of data from websites and compile it into a database for later reference. For example, a retail website can use a web scraper to build a database of product prices on competitor websites, and keep that database updated in real time to show price drops and market demand.
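To make the price-tracking example a little more concrete, here is a minimal sketch using only Python's standard library. Everything specific in it is an assumption made for illustration: the sample markup, the `data-name` and `data-price` attributes, the `prices` table, and the helper names `scrape_prices` and `save_prices`. A real scraper would download the page rather than read a hard-coded string, and a real site's markup will look different.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical competitor page markup; a real scraper would download this.
SAMPLE_HTML = """
<ul>
  <li class="product" data-name="Kettle" data-price="24.99"></li>
  <li class="product" data-name="Toaster" data-price="31.50"></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from <li class="product"> elements."""

    def __init__(self):
        super().__init__()
        self.products = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self.products.append((attrs["data-name"], float(attrs["data-price"])))


def scrape_prices(html):
    """Parse the HTML and return a list of (name, price) tuples."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


def save_prices(conn, products):
    """Insert prices, overwriting the stored price when a product is seen again."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (name TEXT PRIMARY KEY, price REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO prices (name, price) VALUES (?, ?)", products
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
save_prices(conn, scrape_prices(SAMPLE_HTML))
rows = conn.execute("SELECT name, price FROM prices ORDER BY name").fetchall()
print(rows)  # [('Kettle', 24.99), ('Toaster', 31.5)]
```

Running the scrape on a schedule and calling `save_prices` each time would keep the stored prices current, which is roughly what "updated in real time" means in practice.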
If you’re interested in using web scraping as a growth hacking technique, we’ve put together a list of things you should be aware of.
There are numerous libraries and platforms for web scraping
Web scrapers are commonly written in Python, and there are plenty of libraries available to help coders get started. Some of the best-known libraries include:
- BeautifulSoup
- Pandas
- Selenium
There are also plenty of software-as-a-service platforms that can do the web scraping for you, as well as services that offer features such as proxy rotation, which you can check out on this list of proxy scrapers.
Web scraping is legal, but you should still be careful
Web scraping as a technique has caught a lot of negative attention lately, for numerous reasons. That doesn’t mean web scraping is bad, just that there are bad ways to use it.
The most prominent ruling on the legality of web scraping came from the 9th Circuit Court of Appeals, in hiQ Labs v. LinkedIn. In that case, hiQ Labs had been scraping publicly available profiles on LinkedIn, and LinkedIn claimed this was a violation of the CFAA (Computer Fraud and Abuse Act), equivalent to hacking. The court disagreed that hiQ Labs’ scraping violated the CFAA, but noted that other kinds of violations, such as copyright infringement, were still possible.
Numerous other court cases related to web scraping have been popping up, such as the 11th Circuit Court of Appeals finding that web scraping “may constitute trade secret misappropriation”. That case has many variables to consider, so it’s not exactly black-and-white. Which is exactly our point: web scraping is currently in a bit of a grey area with regard to legality, depending on how you conduct your scraping and for what purpose.
Website owners may not like it for technical reasons
There are also ethical considerations around web scraping, such as the strain placed on a server by too many bot requests. Webmasters typically include a robots.txt file on their website, which gives web crawlers instructions on how to proceed (or not proceed) with their crawling. Following the instructions in the robots.txt file is a good way of making sure your web scraper doesn’t get blocked by the webmaster.
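As a rough illustration of what following robots.txt can look like in code, the sketch below uses `urllib.robotparser` from Python's standard library. The robots.txt contents, the `shop.example.com` URLs, the `example-price-bot` user agent, and the `build_parser` helper are all made up for this example. A real scraper would load the file from the target site, for instance with the parser's `set_url()` and `read()` methods, instead of parsing a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real scraper would fetch the site's own file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""

USER_AGENT = "example-price-bot"  # assumed name for our scraper


def build_parser(robots_txt):
    """Return a RobotFileParser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Product pages are not disallowed, so the scraper may request them.
allowed = parser.can_fetch(USER_AGENT, "https://shop.example.com/products/kettle")

# Anything under /checkout/ is off limits for every user agent.
blocked = parser.can_fetch(USER_AGENT, "https://shop.example.com/checkout/cart")

# Seconds the site asks crawlers to wait between requests (None if not specified).
delay = parser.crawl_delay(USER_AGENT)

print(allowed, blocked, delay)  # True False 5
```

Checking `can_fetch` before every request, and skipping any URL for which it returns `False`, is one simple way to keep a scraper within the rules a site has published.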
If too many bots are scraping a website and sending requests faster than a human normally would, the scrapers can end up accidentally crashing the website, with an effect similar to a DDoS attack.
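One common way to avoid hammering a server is to enforce a minimum pause between requests. The sketch below shows one possible approach. The `Throttle` class, its parameters, and the example URLs are hypothetical names invented for this illustration, and the right interval depends on the site (a `Crawl-delay` value in robots.txt, where present, is a reasonable starting point).

```python
import time


class Throttle:
    """Enforces a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None    # time of the previous request, None before the first

    def wait(self):
        """Pause until min_interval has passed since the last call.

        Returns the number of seconds slept.
        """
        pause = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            pause = max(0.0, self.min_interval - elapsed)
            if pause:
                self._sleep(pause)
        self._last = self._clock()
        return pause


# Usage: call wait() before each request so requests are spaced out.
throttle = Throttle(min_interval=0.1)
urls = [
    "https://shop.example.com/products/kettle",   # hypothetical URLs
    "https://shop.example.com/products/toaster",
]
for url in urls:
    throttle.wait()
    # A real scraper would download `url` here.
    print("would fetch", url)
```

The clock and sleep functions are passed in as parameters purely so the pacing logic can be checked with a fake clock. In normal use the defaults are enough.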
Some web scrapers also ignore a website’s terms of service (ToS), or use and promote methods of getting around blocks, such as rotating IP addresses.