Available Resources
Overview
Language Resources, Corpora
Timed MUC7
MUC7_T consists of 100 articles from the MUC7 corpus training set, reannotated for named entities (persons, locations and organizations). Each annotation carries a time stamp recording the time measured for the linguistic decision-making process. The corpus was developed for two principal purposes: to evaluate selective sampling strategies, such as Active Learning, and to build predictive models of annotation costs. The annotation was performed by two advanced students of linguistics with good English language skills, who followed the original guidelines of the MUC7 named entity task (available in the online documentation for the MUC7 corpus).
Get it from the Linguistic Data Consortium (LDC).
Software
UIMA
I am a co-author of the jCoRe NLP tool suite, a collection of UIMA-compliant NLP components, as well as a generic UIMA type system.
AL Framework
A generic framework for Active Learning in Natural Language Processing tasks. Available for download on request.