StormCrawler - Web crawler SDK based on Apache Storm
StormCrawler is an open source library and collection of resources that developers can use to build their own low-latency, scalable web crawlers on Apache Storm. Doing so is usually straightforward. Often, all you have to do is declare StormCrawler as a Maven dependency, write your own Topology class (tip: you can extend ConfigurableTopology), reuse the components provided by the project, and maybe write a couple of custom ones for your own secret sauce.
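As a rough illustration of the first step, the Maven dependency might look like the fragment below. The coordinates are those of the core module as published by the DigitalPebble project. The stormcrawler.version property is a placeholder assumed here, not something defined by the project, and should be set to whichever release you target.

```xml
<!-- Sketch of a pom.xml fragment; stormcrawler.version is a hypothetical
     property you would define yourself in the <properties> section. -->
<dependency>
  <groupId>com.digitalpebble.stormcrawler</groupId>
  <artifactId>storm-crawler-core</artifactId>
  <version>${stormcrawler.version}</version>
</dependency>
```

With the dependency in place, the Topology class and the reused components are wired together in your own project code.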
http://stormcrawler.net
https://github.com/DigitalPebble/storm-crawler
License:
Tech:
Tags: