Arachnode.net
An open source .NET web crawler written in C# that uses SQL Server 2005/2008. Arachnode.net is a full-featured .NET web crawler for downloading, indexing, and storing Internet content, including e-mail addresses, files, hyperlinks, images, and web pages. Its features include:
- .NET architecture
- Configurable Rules and Actions
- Lucene.NET Integration
- SQL Server 2008 and full-text indexing
- .DOC/.PDF/.PPT/.XLS Indexing
- HTML to XML and XHTML conversion
- Multi-threading and Throttling
- Respectful Crawling
- Analysis Services
- SQL Server 2008 and SSIS
- EXIF data extraction
http://arachnode.net/