Niocchi is a free, open-source asynchronous crawling library for Java, implemented with NIO.
Niocchi is designed to crawl several thousand hosts in parallel on a single low-end server. Most Java crawling libraries use standard synchronous Java I/O, which means that crawling N documents in parallel requires at least N running threads.
Even though each thread uses few resources while it fetches content, this approach becomes costly when crawling at a large scale.
By contrast, asynchronous I/O with the NIO package introduced in Java 1.4 makes it possible to crawl many documents in parallel from a single thread.
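
To illustrate the idea, here is a minimal sketch of a single thread driving several fetches at once with a `java.nio.channels.Selector`. It is not Niocchi's API or implementation: the class name `SelectorFetchSketch`, the `fetchAll` method, and the `Fetch` state holder are hypothetical names chosen for this example, and it assumes that each server closes the connection once it has sent its response.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class, not part of Niocchi.
final class SelectorFetchSketch {

    /** Per-connection state, attached to the connection's SelectionKey. */
    private static final class Fetch {
        final InetSocketAddress target;
        final ByteBuffer request;
        final StringBuilder response = new StringBuilder();

        Fetch(InetSocketAddress target, String request) {
            this.target = target;
            // ISO-8859-1 maps bytes to chars one-to-one, so partial reads decode safely.
            this.request = ByteBuffer.wrap(request.getBytes(StandardCharsets.ISO_8859_1));
        }
    }

    /**
     * Sends the same request to every target and collects the raw responses,
     * all from the calling thread. A failed fetch is mapped to null.
     */
    static Map<InetSocketAddress, String> fetchAll(List<InetSocketAddress> targets,
                                                   String request) throws IOException {
        Map<InetSocketAddress, String> results = new LinkedHashMap<>();
        ByteBuffer readBuffer = ByteBuffer.allocate(8192);
        try (Selector selector = Selector.open()) {
            int pending = 0;
            for (InetSocketAddress target : targets) {
                SocketChannel channel = SocketChannel.open();
                try {
                    channel.configureBlocking(false);
                    // A non-blocking connect usually returns false and completes later.
                    boolean connected = channel.connect(target);
                    int ops = connected ? SelectionKey.OP_WRITE : SelectionKey.OP_CONNECT;
                    channel.register(selector, ops, new Fetch(target, request));
                    pending++;
                } catch (IOException e) {
                    channel.close();
                    results.put(target, null);
                }
            }
            while (pending > 0) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    SocketChannel channel = (SocketChannel) key.channel();
                    Fetch fetch = (Fetch) key.attachment();
                    try {
                        if (key.isConnectable() && channel.finishConnect()) {
                            key.interestOps(SelectionKey.OP_WRITE);
                        } else if (key.isWritable()) {
                            channel.write(fetch.request);
                            if (!fetch.request.hasRemaining()) {
                                key.interestOps(SelectionKey.OP_READ);
                            }
                        } else if (key.isReadable()) {
                            readBuffer.clear();
                            if (channel.read(readBuffer) < 0) {
                                // End of stream: the server closed the connection.
                                results.put(fetch.target, fetch.response.toString());
                                channel.close(); // also cancels the key
                                pending--;
                            } else {
                                readBuffer.flip();
                                fetch.response.append(
                                        StandardCharsets.ISO_8859_1.decode(readBuffer));
                            }
                        }
                    } catch (IOException e) {
                        results.put(fetch.target, null);
                        channel.close();
                        pending--;
                    }
                }
            }
        }
        return results;
    }
}
```

Calling `SelectorFetchSketch.fetchAll(targets, "GET / HTTP/1.0\r\n\r\n")` would attempt all fetches concurrently without creating any extra thread. The sketch leaves out what a real crawler needs, such as timeouts, redirects, response parsing, and per-host politeness; note also that resolving a host name when constructing an `InetSocketAddress` is itself a blocking call.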