Bixo
Bixo is an open source Java crawler that runs as a series of Cascading pipes. It is designed as a toolkit for building customized crawlers, so each Cascading pipe implements a discrete operation.
By building a customized Cascading pipe assembly, you can quickly create specialized crawlers that are optimized for a particular use case.
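
To make the "discrete operations assembled into a crawler" idea concrete, here is a minimal sketch in plain Java. It does not use the Cascading or Bixo APIs: the class CrawlAssemblySketch, the stage names (TRIM, KEEP_HTTP, DEDUPE), and buildAssembly() are all hypothetical and chosen only for illustration. Each stage does one thing to a batch of URLs, and the "assembly" is just the stages chained in order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustration only: these names are assumptions, not Bixo or Cascading classes.
public class CrawlAssemblySketch {

    // Stage 1: strip surrounding whitespace from every URL.
    static final Function<List<String>, List<String>> TRIM =
        urls -> urls.stream().map(String::trim).collect(Collectors.toList());

    // Stage 2: keep only http and https URLs.
    static final Function<List<String>, List<String>> KEEP_HTTP =
        urls -> urls.stream()
                    .filter(u -> u.startsWith("http://") || u.startsWith("https://"))
                    .collect(Collectors.toList());

    // Stage 3: drop exact duplicates, preserving first-seen order.
    static final Function<List<String>, List<String>> DEDUPE =
        urls -> urls.stream().distinct().collect(Collectors.toList());

    // The "assembly": discrete stages composed into one operation.
    // A specialized crawler would swap, add, or remove stages here.
    static Function<List<String>, List<String>> buildAssembly() {
        return TRIM.andThen(KEEP_HTTP).andThen(DEDUPE);
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList(
            " http://a.example/ ", "ftp://b.example/", "http://a.example/");
        System.out.println(buildAssembly().apply(seeds));
    }
}
```

In a real Bixo crawler the stages would be Cascading pipes operating on tuples rather than functions over lists, but the composition principle is the same: customizing the crawler means changing which operations are in the assembly and in what order.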
Bixo borrows heavily from the Nutch project, as well as many other open source projects at Apache and elsewhere.
Bixo is an open source project released under the MIT License.
Questions? Contact Stefan Groschupf at sg@101tec.com.