At dMetrics, we analyze online users’ chatter. Our algorithms process large volumes of data to uncover users’ decisions, the triggers behind those decisions, and their motivations. For example, we know which kinds of users prefer product X to product Y, and why and when they switch between products.
In this post, I would like to discuss web crawling, one of the ways we collect data. Our crawlers are built on Scrapy (version 0.16.3), an open-source project written in Python on top of the Twisted framework. Twisted is known for its asynchronous programming model, which is especially well suited to I/O-intensive applications: I/O calls are non-blocking, so the framework achieves a high degree of concurrency without creating many threads.
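To make the idea of concurrency without threads concrete, here is a minimal sketch. Note that it uses Python's standard-library `asyncio` rather than Twisted, so it is an analogy and not Scrapy's or Twisted's API (Twisted expresses the same idea with Deferreds and a reactor). The `fake_fetch` and `crawl` functions are hypothetical, and `asyncio.sleep` stands in for a real non-blocking network request.

```python
import asyncio
import threading
import time


async def fake_fetch(url: str, delay: float) -> str:
    # Hypothetical stand-in for a non-blocking HTTP request: while this
    # coroutine waits, the event loop is free to run the other fetches.
    await asyncio.sleep(delay)
    return f"{url}: done on thread {threading.current_thread().name}"


async def crawl(urls, delay=0.2):
    # Start every fetch at once and wait until all of them have finished.
    return await asyncio.gather(*(fake_fetch(u, delay) for u in urls))


if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(10)]
    start = time.perf_counter()
    results = asyncio.run(crawl(urls))
    elapsed = time.perf_counter() - start
    for line in results:
        print(line)
    # The ten 0.2 s waits overlap, so the total is close to 0.2 s, not 2 s.
    print(f"elapsed: {elapsed:.2f}s")
```

Every result reports the same thread (the main thread), yet the ten waits overlap. This is the property that makes an event-loop design attractive for a crawler that spends most of its time waiting on the network.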