Web scraping, also known as web data extraction, is the process of collecting data from websites automatically using bots or scripts. With the rise of big data analytics, web scraping has become a popular technique to gather large amounts of data from the web. However, scraping raises some legal concerns, especially when it comes to a search engine like Google. So is it legal to scrape Google search results? Let’s take an in-depth look at the legality, limitations, and best practices around Google scraping.
The Legality of Scraping Google Search Results
Google’s Terms of Service
Google’s Terms of Service expressly prohibit scraping their services without permission:
You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means.
This clause bans automated access to Google’s services without their consent. Read literally, that includes automated querying of Google Search itself. How far the clause can be enforced against scraping of publicly visible organic results is less settled.
Copyright Law
Google search results are dynamically generated from Google’s index and algorithms. The organic listings themselves don’t have copyright protection.
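However, other elements of the SERP (search engine results page), such as images, snippets, and knowledge panels, can be protected by copyright held either by Google or by the third-party sites the content comes from. Scraping and reusing them may infringe those rights.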
The Computer Fraud and Abuse Act
The CFAA prohibits “unauthorized access” to computer systems. Plaintiffs have argued that violating a website’s ToS constitutes unauthorized access.
However, the Ninth Circuit held in hiQ Labs v. LinkedIn that scraping publicly available data is likely not a CFAA violation even if it breaches the ToS.
So scraping Google search results likely does not violate the CFAA since the organic listings are publicly accessible.
Database Rights
In the EU, Google may claim database rights over their collection of web pages. Scraping substantial portions of their database could infringe on these rights.
Summary of Legality
- Scraping organic Google search results sits in tension with Google’s ToS but does not clearly violate copyright law, the CFAA, or database rights. It exists in a legal gray area.
- Scraper sites have faced takedowns but not definitive legal consequences. Google aims to disrupt scraping but prefers algorithmic solutions.
- Scraping at a low volume is unlikely to cause issues. Large-scale scraping brings risks.
Google’s Stance Against Scraping
While not definitively illegal, Google strongly discourages scraping their results. Here are some of the actions they’ve taken against scrapers:
- Sending cease & desist letters to scraper sites
- Revoking API keys
- Blacklisting IP addresses
- Blocking bots via CAPTCHAs and other protections
- Adjusting algorithms to devalue scraped content
Sites built on scraped results have reportedly been taken down or switched to licensed data after complaints from Google.
Google also emphasizes that web scraping violates the spirit of their Terms of Service. While they may tolerate light scraping, they can crack down if it threatens their business model.
Limitations of Scraping Google Directly
Even if the legal coast is somewhat clear, scraping Google search directly has some drawbacks:
- Blocks and bans – Continuous scraping from one IP risks getting blocked. Proxies help but add overhead.
- CAPTCHAs – Google may insert “I’m not a robot” tests to block bots. These require solving services or manual intervention.
- Difficulty scaling – Large scrapers need robust infrastructure to handle thousands of proxies, browsers, and CAPTCHA solvers.
- High latency – Browser automation is slow, taking a second or more per result, which makes millions of results hard to collect.
- No advanced features – Scraping SERPs directly does not give you mobile-specific results, Trends data, autocomplete suggestions, or other Google features.
So while possible, scraping Google at scale takes significant technical resources. This motivates some to seek alternative data sources.
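To put the latency point in numbers, here is a minimal sketch of the arithmetic. It assumes a fixed cost of one second per result (the low end of the figure above) and perfectly parallel workers; the function name and the worker counts are illustrative, not measurements.

```python
def scrape_days(results: int, seconds_per_result: float = 1.0, workers: int = 1) -> float:
    """Estimated wall-clock days to fetch `results` at a fixed per-result latency."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    total_seconds = results * seconds_per_result / workers
    return total_seconds / 86_400  # 86,400 seconds in a day


if __name__ == "__main__":
    # Assumed scenario: one million results at one second each.
    for workers in (1, 10, 100):
        print(f"{workers:>3} workers: {scrape_days(1_000_000, 1.0, workers):.2f} days")
```

A single worker needs roughly eleven and a half days for a million results under these assumptions, which is why large jobs end up needing many parallel browsers and, with them, the proxy and CAPTCHA overhead listed above.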
Legal Alternatives to Scraping Google Directly
The ecosystem around Google offers some scraping alternatives:
Search Engine API Providers
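Companies like SerpApi, Moz, Sigma and others sell access to SERP data, including organic results, through paid APIs. These providers generally collect the results themselves rather than under a license from Google, so using one moves the scraping burden and much of the operational risk to the provider rather than settling the underlying legal questions.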
The tradeoff is cost, with prices starting around $30/month for limited queries. Still, APIs enable scalable and reliable access to Google results.
Google Product Forums
Google’s product help forums host public discussions of search behavior and results, with archives going back many years.
Archived posts are commentary about search results rather than the results themselves, and the forums are still Google services covered by the same ToS. The tradeoff is the manual effort to extract insights.
Browser Extensions
Extensions like SearchResultScraping allow individual-use scraping of Google results. This is technically against Google’s ToS but essentially unenforceable at small scale.
The downside is that collecting large datasets requires manually browsing with the extension enabled.
Public SERP Databases
Some researchers have collected organic results from Google and published the datasets publicly for analysis.
For example, Cerebro contains 31M Google results from 2020. Utilizing public SERP data avoids scraping directly.
The limitation is relying on others to collect and share data. Public datasets often have size, date range, and topic restrictions.
Aggregators
Sites like Semrush perform their own Google searches and sell aggregated keyword data. This shifts the scraping burden.
However, aggregators can be costly and lack query-level data. Their databases also have gaps relative to Google’s index that are not visible to the buyer.
Best Practices for Legally Scraping Google
If you do opt to scrape organic Google results directly, some tips:
- Scope it narrowly – Target a specific niche with clear purpose vs bulk data collection.
- Avoid republishing full results – Don’t directly copy chunks of Google’s work.
- Mention it’s scraped – State the data comes from Google when publishing analyses to give credit.
- Delay between queries – Use 6-8+ second delays to avoid detection as a bot.
- Limit requests – Keep daily volumes under Google’s unofficial scraping thresholds.
- Proxy properly – Rotate different IPs through legitimate residential proxies.
- Solve CAPTCHAs – Use a service to automatically solve tests when prompted.
- Check ToS changes – Periodically verify Google hasn’t further restricted scraping.
- Consult qualified counsel – Always get professional legal advice about your specific use case.
Scraping within reasonable limits, using the data judiciously, and staying up-to-date on policies will help avoid issues.
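The delay and volume tips above can be sketched as a small throttle. This is an illustration under stated assumptions, not a recommended configuration: the class name `PoliteThrottle` is hypothetical, the 6 to 8 second window comes from the tip above, and the default daily limit of 100 is a placeholder because Google publishes no threshold. The clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import random
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Enforces a randomized gap between requests and a hard daily cap."""

    def __init__(
        self,
        min_delay: float = 6.0,
        max_delay: float = 8.0,
        daily_limit: int = 100,  # placeholder budget, not a published Google figure
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.daily_limit = daily_limit
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None
        self._used_today = 0

    def wait(self) -> None:
        """Block until the next request is allowed; raise once the budget is spent."""
        if self._used_today >= self.daily_limit:
            raise RuntimeError("daily request budget exhausted")
        if self._last_request is not None:
            # Pick a fresh random gap each time so requests are not evenly spaced.
            target_gap = random.uniform(self.min_delay, self.max_delay)
            elapsed = self._clock() - self._last_request
            if elapsed < target_gap:
                self._sleep(target_gap - elapsed)
        self._last_request = self._clock()
        self._used_today += 1

    def reset_day(self) -> None:
        """Restore the budget; the caller decides when a new day starts."""
        self._used_today = 0
```

In use, a script would call `throttle.wait()` immediately before each request and stop cleanly when `RuntimeError` is raised. Keeping the budget check ahead of the delay means an exhausted budget fails fast instead of sleeping first.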
Scraping Google Trends Data
Google Trends provides aggregated search interest data. Unlike the core search engine, Trends is built for exploration and export, and the website lets you download results as CSV. It has not historically come with an open, unlimited public API, so treat any official API access as subject to whatever terms Google currently sets.
Working from exported Trends data is the route their terms are designed to support, while automated scraping of the Trends website falls under the same ToS as the rest of Google. Trends covers worldwide search interest by keyword with various filters.
Trends has some limitations: it reports a relative index scaled from 0 to 100 rather than absolute volumes, and geographic detail is limited. But legally accessing search interest data opens many possibilities.
Some examples of permissible uses of Trends data:
- Analyzing seasonal patterns for certain queries
- Comparing search popularity across different terms
- Finding related keywords by similar trends
- Identifying trending searches by locations
- Predicting future search patterns
Quotas and rate limits for any official Trends access are set by Google and change over time, so check the current documentation rather than relying on a fixed number. Working from exported data needs no proxies or CAPTCHAs.
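As a rough illustration of the seasonal-pattern use case, the sketch below works on data you already hold, for example values typed in from a Trends CSV export. It makes no network calls. The function names and the sample numbers are invented for the example, and `scale_to_peak` only mimics the way Trends presents interest relative to a peak of 100.

```python
from collections import defaultdict
from statistics import mean


def scale_to_peak(values: list[float]) -> list[int]:
    """Rescale a series so its maximum becomes 100."""
    peak = max(values)
    if peak <= 0:
        return [0 for _ in values]
    return [round(100 * v / peak) for v in values]


def monthly_profile(series: dict[str, float]) -> dict[int, float]:
    """Average value per calendar month from {'YYYY-MM': value} data."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for period, value in series.items():
        month = int(period.split("-")[1])
        buckets[month].append(value)
    return {month: mean(vals) for month, vals in sorted(buckets.items())}


if __name__ == "__main__":
    # Invented sample values for a seasonal query; not real Trends data.
    sample = {"2022-06": 18, "2022-12": 80, "2023-06": 22, "2023-12": 100}
    print(monthly_profile(sample))
    print(scale_to_peak([10, 20, 40]))
```

Averaging by calendar month across several years is the simplest way to expose a seasonal shape. Because the index is relative, two terms are directly comparable only when they were exported together in the same Trends query.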
Scraping Google Patent Search Results
Google provides the ability to search over patents from USPTO and beyond via Google Patents. Much like organic web results, scraping patent information raises similar questions.
Google’s Terms of Service for Patents states:
You agree not to access the Services by any means other than through the interfaces Google provides. Specific prohibited means include scraping.
So generally, scraping Google Patents seems to violate their ToS. However, patent data itself is public information, and the USPTO offers bulk data downloads free of charge.
A reasonable case could be made that scraping just the bibliographic information of patents that is available elsewhere is permissible, similar to Google search results. Still, consult an attorney before large-scale patent data extraction.
Light scraping of titles, abstracts, classification codes and other patent fields for private research use seems unlikely to cause issues if done responsibly. As always, check Terms for updates.
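One low-effort way to notice that terms have changed is to keep a fingerprint of the text you last reviewed and compare it against the current wording. The sketch below assumes you paste the terms text in yourself, so it does no fetching. The function names are hypothetical, and a changed fingerprint only tells you to reread the terms, not what changed.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(saved_fingerprint: str, current_text: str) -> bool:
    """True when the current terms text no longer matches the saved fingerprint."""
    return fingerprint(current_text) != saved_fingerprint
```

Normalizing whitespace and case before hashing keeps cosmetic reflows of the page from triggering a false alarm, while any change in wording still produces a different hash.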
Is Scraping Google Necessary vs Alternatives?
With the rise of structured data, APIs, public datasets, aggregators and licensed feeds, scraping Google may not be as necessary as it once was.
There are legal and stable ways to access:
- Search volume data
- Keyword rankings
- Related terms
- Autocomplete terms
- Product pricing
- Local business info
- Reviews and ratings
- Video transcripts
- Dictionary definitions
- Flight/hotel info
- Weather data
- Trending topics
- And much more…
Many types of data Google aggregates are available without directly crawling their search engine, especially for commercial use cases.
So consider if alternatives can meet your needs before assuming scraping Google itself is required.
Conclusion
Scraping Google search results exists in a legal gray area but with the right precautions, small-scale scraping solely for private analysis of publicly available data seems unlikely to raise issues. However, use extreme caution before scraping without qualified legal guidance.
Google provides official APIs for many of their services that enable legal data extraction without relying on scraping search result web pages directly. Their stance remains firmly against unauthorized aggregation of their search content.
With the many legal alternatives now available, from licensed feeds to public databases to Google’s own official APIs and data exports, scraping their core search engine directly may not be necessary in many cases. Seek expert legal help, use scraped data carefully, stay updated on Terms of Service, and thoroughly explore alternatives before resorting to extracting Google results at scale.