An ever-increasing amount of internet traffic is generated by non-human users. Websites increasingly face large volumes of traffic from automated scripts, or bots. According to Incapsula’s 2016 report, 51.8% of web traffic was generated by bots – bot traffic now exceeds human traffic. All this traffic can be a significant drain on a website’s available bandwidth and cause problems for real customers trying to access your site. Bot traffic (as a percentage of total traffic) has been rising year on year, and if the problem is left unmanaged, less and less bandwidth will be available for customers. On top of this, these bots may be doing things that you don’t want – for example, a competitor may be using a bot to scrape pricing data from your website.
Bad bots everywhere
Today’s hackers use bad bots to launch pre-attack scans, post comment spam, exploit vulnerabilities, execute code injection attacks, mount denial-of-service attacks, and run password-guessing attacks against your web-facing properties. These bots commit fraud through credential stuffing, repeatedly making and cancelling purchases, holding and/or consuming inventory, scraping sites, stealing information, and a host of other unwanted activities.
Price and data scraping are rampant
Bad bots scrape prices and product data as well as perform click fraud, putting the overall security of e-commerce websites, customer loyalty, and brand reputation at risk. Of all the bad bot threats, scraping bots are the most rampant and costly to e-commerce businesses. These bad actors seek to scrape information from legitimate online retailers to gain product, inventory, and pricing intelligence that can be used by their competitors. This has spawned an entire industry of opportunistic data scraping enterprises.
The five most common types of data scraping agent are:
- Price Scraping – Bots target the pricing section of a site and scrape pricing information to share with online competitors
- Product Matching – Bots collect and aggregate hundreds, or thousands, of data points from a retail site in order to make exact matches against a retailer’s wide variety of products
- Product Variation Tracking – Bots scrape product data to a level that accounts for multiple variants within a product or product line, such as color, cut and size
- Product Availability Targeting – Bots scrape product availability data to enable competitive positioning against an online retailer’s products based on inventory level and availability
- Continuous Data Refresh – Bots visit the same online retail site on a regular basis so that buyers of the scraped data can react to changes made by the targeted retail site.
Is blocking bots the answer?
So what can online businesses do to alleviate the situation? The seemingly simple solution would be to block all bot traffic. However, this would be neither effective nor desirable. A robots.txt file will stop well-behaved search engine crawlers from accessing parts of your site, but it does nothing against bots that ignore it – as “bad bots” will. Blocking all bots might seem like a good solution, but it overlooks a fundamental fact – not all bots are bad.
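To illustrate the point, here is a minimal robots.txt sketch (the paths and crawler name are hypothetical examples, not a recommendation). It asks compliant crawlers to stay out of a checkout area and asks one named crawler to slow down – but it is purely advisory, and bad bots simply ignore it:

```
# Served at https://example.com/robots.txt
User-agent: *
Disallow: /checkout/

# Ask one specific, well-behaved crawler to slow down.
# Note: Crawl-delay is a non-standard extension; some crawlers,
# including Googlebot, do not honour it.
User-agent: AhrefsBot
Crawl-delay: 10
```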
Good bots help the world go round
For every bad bot that you wish to block, there is a good bot from which you benefit. For example, just over 4% of all web traffic is created by Facebook’s mobile feed fetcher, which fetches your website and presents it within the Facebook mobile app. If the bot is blocked from accessing your site, visitors who follow Facebook links will be presented with an error. A strong social media presence is an important part of any successful company and can be a powerful tool to drive traffic to your website. But if curious potential customers can’t access your site through their social media apps, then they will never be more than potential customers.
Search engine crawlers are another type of good bot. In the modern business world, with consumers having so much information at their fingertips, it’s vital to rank highly in search engines. But if search engines can’t crawl your website, they won’t be able to accurately rank your site based on its contents, and you could miss out on potential business opportunities.
Bot management solutions
An ideal solution to the problem, therefore, needs to be able to distinguish between good bots and bad bots. Good bots should be monitored, to ensure that they don’t consume too much bandwidth, but otherwise left to access the site as necessary. Bad bots should be blocked or carefully managed.
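One common building block for telling genuine good bots from impostors is the reverse-then-forward DNS check that major search engines publish for verifying their crawlers. The sketch below assumes a small, illustrative `KNOWN_GOOD_BOTS` table (real deployments should use the vendors’ published domain lists): a visitor claiming to be a known crawler is accepted only if its IP’s reverse DNS name falls under the vendor’s domain and that name resolves back to the same IP.

```python
import socket

# Illustrative, non-exhaustive table mapping a user-agent substring of a
# known good bot to the DNS domains its vendor publishes for verification.
KNOWN_GOOD_BOTS = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def claimed_good_bot(user_agent: str):
    """Return the bot name if the user agent claims to be a known good bot."""
    for name in KNOWN_GOOD_BOTS:
        if name.lower() in user_agent.lower():
            return name
    return None

def verify_good_bot(ip: str, user_agent: str) -> bool:
    """Verify a claimed good bot with a reverse-then-forward DNS lookup."""
    name = claimed_good_bot(user_agent)
    if name is None:
        return False  # doesn't even claim to be a known good bot
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        if not host.endswith(KNOWN_GOOD_BOTS[name]):
            return False  # reverse name outside vendor domain: impostor
        # Forward lookup must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False  # no PTR record or lookup failure: treat as unverified
```

A verified crawler can then be rate-monitored but otherwise allowed through, while traffic that fails verification is handed to stricter controls such as challenges or blocking.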
The best method of removing the vast amounts of unwanted traffic generated by bad bots is to eliminate it in the cloud, upstream, before it ever reaches your websites. Some cloud security operators have the ability to eliminate bad bot traffic in their clouds, while others do not.
As the number of devices on the Internet doubles over the next four years (much of it IoT), the amount of bad bot traffic will escalate to unprecedented levels. This will affect online businesses, Internet service providers, hosting providers, and cloud application providers alike. Transporting ever-increasing amounts of bad bot traffic across their infrastructures will force providers either to increase capacity or to devise ways of eliminating malicious bot traffic as close to its source as possible.
Whether we like it or not, bots now account for the majority of web traffic, and this is unlikely to change in the future – so get bot ready!
activereach offers a flexible Bot Manager platform that is easily deployed and continuously managed. It ensures an optimal security profile to protect websites and applications from malicious bots, without sacrificing performance. It is hosted in the cloud, so there’s no new hardware to install. The solution includes a real-time dashboard, reporting, analytics, and alerts to provide rich insights into the bots coming to your websites and applications.