Resources
Doug Oard's list of available text retrieval systems

Collections
University of Glasgow list of available text retrieval collections
Internet Archive (limited availability)
.GOV collection at CSIRO (part of TREC)
Linguistic Data Consortium

Open Source Search Engines
ht://Dig
Swish-e

Crawlers
Heritrix

Other Resources
A stop list (also known as a list of stop words)
IR resources (Mark Sanderson)
Cross-language information retrieval (CLIR)
WebIR
Search Engine Watch
Open Directory: Information Retrieval
Chris Manning's NLP resources