The world’s largest and most progressive companies depend on web scraping, a technology that allows large-scale public online data gathering.
Web scraping gives them deep insight into the markets they operate in; it underpins competitor analysis and even influences product assortment.
However, the potential of web scraping remains untapped – many businesses and even public institutions that could benefit from it are yet to discover it, as Julius Černiauskas, CEO of Oxylabs, explains in an exclusive interview with City A.M.
Web scraping now and web scraping ten or twenty years ago are probably two different things. How has web scraping evolved?
The world’s best-known web scraper is Google. For many years, web scraping and crawling were mainly associated with search engines. The technology wasn’t used much elsewhere – it was a niche, though an important one from the very beginning. However, everything changed with the digitalisation of business. Companies became data-hungry. They understood the value of information scattered across the web and wanted to get it. Industry by industry, companies started turning towards web scraping, which allowed them to gather large amounts of data automatically.
“The world’s best-known web scraper is Google.”
Julius Černiauskas, CEO of Oxylabs
Yet the use of the technology was still limited to a handful of use cases. The ecommerce and finance industries were among the first adopters of web scraping, as they quickly realised the value they could get from external data. Meanwhile, the data departments of other industries were still very much focused on internal data. However, growing competition made many rethink their strategies.
Has the pandemic changed the game?
A recent driver of web scraping’s growth was the global pandemic. Many brick-and-mortar businesses moved online, and competition became enormous. Data became the only way to keep up. With the growing use of web scraping, the technology itself had to keep up too.
Let’s go back a bit: how has the perception of web scraping changed over the years?
First of all, web scraping was long seen as a niche technology for the select few. It was associated with several sectors, like ecommerce, investment banking or the data industry in general. Now, the technology gets into the limelight much more as many different use cases emerge and new industries discover it.
“The perception of web scraping has also been strongly influenced by many scandals around personal data gathering, such as Cambridge Analytica.”
After hearing the word “data”, people tend to think of personal or private data first. However, web scraping is all about the public data scattered across the web. For a while, many looked at web scraping with caution: it was new, there was no clear legal regulation, and it seemed complicated.
And abuse was rife.
Some abusers of the technology in its early days hurt the industry’s reputation, as they didn’t respect the data they were collecting. But those wild west days are gone, and we ourselves invest a lot of time and effort in educating the industry – our clients and other market players – on how to perform ethical web scraping via best practices. As the web scraping industry is still considered new and developing, there’s a lack of quality information about it.
So web scraping has matured?
As web scraping is now used by more companies than ever before, it is becoming way better understood. So many businesses are using it daily and getting crucial insights from it that the value of this technology has become clear. In some industries, it is already considered mainstream.
Many new technologies lack proper regulation. Are there any similar legal challenges when it comes to web scraping?
The situation with web scraping is similar to that of all new technologies – it takes time for regulators to catch up. As a result, there is still no comprehensive legal regulation specifically for web scraping. Companies that collect web data must be cautious and double-check every step with legal professionals.
Legal checks are required when defining which data to collect, the sources from which it will be collected, how much data is needed and so on. Data is a sensitive issue and must be treated with the utmost care. When gathering the data, you must respect the conditions around it.
So it is mainly a legal issue?
Legal professionals evaluating the compliance of data gathering processes need to take into account case law, copyright, and intellectual property laws and regulations. We understand that it can get complicated, so we do our best to share our knowledge with the industry – for example, by explaining recent developments in relevant court cases. Just recently, there was a long-anticipated conclusion. The hiQ Labs v. LinkedIn case was closely followed by the web scraping community, and the result was exactly as we had expected: the court concluded that US computer hacking law could not be applied to web scraping. So the distinction between hacking and web scraping, which is obvious to us, was finally put on paper.
Beyond that, the case did not bring any major developments in web scraping regulation. That is why we keep raising the need for self-regulation in the industry: as long as formal regulation lags behind, we must safeguard the industry from within.
What are the main benefits of web scraping – what could be achieved with the technology?
Web scraping allows businesses and individuals to quickly gather large amounts of public data from multiple websites. Individual pieces of information scattered across the web may be of little value on their own, but common patterns can emerge when many of them are put together.
In short, web scraping enables its users to gather or cross-check information in order to draw valuable insights. When the data collection process is smooth and fast, people and businesses can focus on the other part of the data lifecycle – analysing it and extracting value from it. That value comes in many forms, the most obvious being better business decisions. However, businesses are not the only ones to benefit from web scraping.
“The majority of regular consumers are oblivious to how much convenience web scraping technology brings them daily.”
It starts with web scraping-powered search engines that deliver results within seconds, extends to travel fare aggregators that make vacation planning so much easier, and to price comparison websites that help consumers make the most cost-effective decisions. Furthermore, web scraping makes academic research more efficient and enables incredible discoveries by investigative journalists, scientists, and public institutions. It is a powerful tool.
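At its core, the process described above is simple: fetch a public page, then parse its markup to pull out the facts you need. The sketch below illustrates the parsing step with Python’s standard library only; the sample HTML, the `PriceScraper` class, and the CSS class names are invented for this example, and a real scraper would fetch live pages and respect each site’s terms of use.

```python
from html.parser import HTMLParser

# Hypothetical product listing, standing in for a fetched public web page.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">19.99</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">24.50</span></li>
</ul>
"""

class PriceScraper(HTMLParser):
    """Collects product names and prices from <span> tags marked with CSS classes."""

    def __init__(self):
        super().__init__()
        self.current = None   # CSS class of the <span> currently open, if any
        self.names = []
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current = dict(attrs).get("class")

    def handle_data(self, data):
        # Only record text that sits inside a span we care about.
        if self.current == "name":
            self.names.append(data.strip())
        elif self.current == "price":
            self.prices.append(float(data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None

scraper = PriceScraper()
scraper.feed(SAMPLE_HTML)
products = dict(zip(scraper.names, scraper.prices))
print(products)  # {'Kettle': 19.99, 'Toaster': 24.5}
```

Once many such pages are parsed, the structured results can be aggregated – this is essentially what a price comparison site does at scale, typically with dedicated parsing libraries rather than the bare-bones parser shown here.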
How do businesses use web scraping?
A typical Oxylabs customer is a large international company that requires large amounts of public data for its business operations – trivago, for example. The type and scope of that data are up to the company itself, as new ways to incorporate external data into operations constantly emerge.
The ecommerce industry is one of the heaviest users of web scraping. For these companies, the technology is crucial for understanding trends and consumer behaviour (for example, when some product categories become a hit almost overnight). Ecommerce companies collect data for market research, conduct competitor analysis, gauge customer sentiment, and predict which products will be the most popular.
Financial companies use web scraping tools to quantify and evaluate companies, as well as to discover new ones. They also rely on the technology for due diligence and risk management.
SEO companies depend on web scraping for keyword research. Marketing executives use web scraping to detect ad fraud.
Meanwhile, for some companies, web scraping forms the very foundation of the business – travel fare aggregators and price comparison websites, for example, are built on this technology.