According to Incapsula, more than four percent of Googlebots are imposters that impersonate genuine Google crawlers in order to wreak havoc through DDoS attacks, spamming, scraping, and other malicious activities.
In recent research, Incapsula observed more than 400 million search engine visits to over 10,000 sites, amounting to approximately 2.19 billion page crawls over a span of 30 days.
The study had many objectives, one of which was to inspect Googlebot's crawl patterns, and the data came from a diverse sample of sites. It found that Googlebot visits the average website 187 times per day, crawling four pages per visit.
What the Real Googlebot Does
Googlebot is the most active crawler, accounting for 60.5 percent of all page crawls; Bingbot comes second at 24.5 percent. Several correlation experiments were conducted to test the hypothesis that popular sites get crawled more often. The study compared crawl rates against daily human visitors, and the results were surprising: contrary to popular belief, there is no direct correlation between a website's popularity and the number of crawls it receives. This suggests that Google does not play favorites.
This is good news for SEO professionals: since the number of crawls is not proportional to a website's popularity, they no longer have to worry about crawl rate and can focus on other factors.
Another interesting aspect of Googlebot's behavior is that it crawled most thoroughly through frequently updated websites such as forums, blogs, and news sites. Googlebot evidently prefers fresh, well-structured content, which is another helpful tip for SEO professionals, as it reinforces the belief that fresh content is indeed a major key in search engine optimization.
Googlebot mostly originates from the US, as expected, though crawls were also frequently observed from France, the UK, Denmark, Belgium, and China.
Let’s Analyze the Phony Googlebot
Almost 4 percent of Googlebots are not what they seem. These imposters use the same HTTP(S) user agent as the genuine Googlebot. A user agent is an identification string sent by a website visitor, whether that visitor is a human's browser or a bot.
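To see why this matters, here is a minimal Python sketch of the kind of naive server-side check many sites rely on. The Googlebot user-agent string shown is the one Google publishes; the function name and logic are illustrative, not any particular site's code:

```python
# The user-agent string Google documents for its desktop crawler.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def looks_like_googlebot(user_agent: str) -> bool:
    """Naive check: does the visitor's User-Agent header claim to be Googlebot?
    A fake bot passes this test trivially, because the header is client-supplied."""
    return "Googlebot" in user_agent

print(looks_like_googlebot(GOOGLEBOT_UA))                     # real or well-faked Googlebot
print(looks_like_googlebot("Mozilla/5.0 (Windows NT 10.0)"))  # ordinary browser
```

The check tells you only what the visitor claims to be, which is exactly the loophole fake Googlebots exploit.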
Why Use Fake Googlebots?
When a bot presents Google's credentials, it gains unrestricted access to almost all websites. Many websites block suspicious network activity, but they do not block Googlebots, because doing so would make the site disappear from Google, a kind of virtual death. Fake Googlebots therefore use these credentials like a VIP pass to gain unrestricted access to countless websites.
This relaxed scrutiny gives fake bots room to operate, and hackers can exploit the shortcoming to probe a website's vulnerabilities.
On further investigation, the study found that 34.3 percent of fake Googlebots were clearly malicious, and 23.5 percent of those malicious bots were programmed for Layer 7 (application-layer) DDoS attacks. DDoS attacks suit this disguise perfectly: because fake bots carry Google's credentials, they are free to crawl the target, and since most websites do not verify every visitor's credentials, they lie exposed to attack.
When a DDoS attack takes place, there is little the webmaster can do. They can either block all Googlebots and cut off their organic traffic, or allow the fake bots through and suffer downtime. A little downtime might not be very harmful, but DDoS attacks can last for months, and since neither alternative is feasible for the webmaster, the attack is essentially a success.
The Origin of Fake Googlebots
Fake Googlebots generally come from botnets, which are groups of compromised devices. Often the owners of machines in a botnet do not even know their devices are compromised; these are typically ordinary computers infected with Trojans. The main locations from which these fake bots originate are China, the US, India, and Turkey.
Fake Googlebots are not new, and they are, in fact, extremely popular among hackers for DDoS attacks. Since the average webmaster does not possess the necessary tools to counter these attacks, this technique is gaining even more popularity.
This can be distressing news for webmasters, but there are ways to distinguish genuine Googlebots from fakes, such as ASN and IP verification, which identify a bot by its network of origin. However, these techniques require software capabilities and processing power that are beyond the reach of the average webmaster.
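One widely documented form of IP verification is a two-step DNS check: do a reverse DNS lookup on the visiting IP, confirm the hostname belongs to googlebot.com or google.com, then do a forward lookup on that hostname and confirm it resolves back to the same IP. A minimal Python sketch (the function names are illustrative; the DNS lookups require live network access):

```python
import socket

# Google's crawler hostnames end in one of these domains.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

def hostname_is_google(hostname: str) -> bool:
    """Pure check: does a reverse-DNS hostname belong to a Google-owned domain?"""
    return hostname.rstrip(".").endswith(GOOGLE_DOMAINS)

def is_real_googlebot(ip: str) -> bool:
    """Two-step verification (needs network access):
    1. reverse DNS on the visiting IP,
    2. forward DNS on the result must point back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # step 1: reverse lookup
        if not hostname_is_google(hostname):
            return False
        # step 2: forward lookup must include the original IP
        return ip in socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return False
```

The suffix check with a leading dot matters: a hostname like fakegooglebot.com would otherwise pass, since an attacker can point reverse DNS at any name they like, but cannot make Google's forward DNS resolve to their IP.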