Content Retrieval and Extraction for Advanced Tutoring Environments

The Situation

The US Army’s ability to rapidly adapt and learn affects its ability to outpace its adversaries. The rapid rate of technological change makes the Army’s capacity for accelerated learning even more important, so that Warfighters can embrace and benefit from new technologies. Currently, training is “one size fits all”: it does not account for the fact that individual learners respond differently to material and training techniques, or that individuals with the same position, job title, or mission may have significantly different duties. Additionally, training content is often stored in keyword-indexed document stores, which let a trainee retrieve all documents containing the specified keywords, ranked by a relevance score such as the within-document frequency of each keyword. But keyword searches can return many irrelevant documents, because word count is not the only measure of relevance. When a document contains homonyms of the query terms, searches also return false positives. For example, a keyword search for “tank” returns documents about both a fuel “tank” and an M1A1 Abrams “tank.” Finally, relevant documents may be missed when the query terms are synonyms of, or more specific than, the terms used in the documents. At the heart of these issues is a reliance on simple word-based indexing that ignores both the semantics of the material and the real needs of the trainee.
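To make the keyword-indexing problem concrete, here is a minimal Python sketch of a term-frequency keyword search. The documents, their IDs, and the `keyword_search` function are hypothetical illustrations, not part of CREATE; each document's score is simply the number of times the query terms occur in it, one of the relevance scores described above.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(docs: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Rank documents by how often the query terms appear in them (term frequency)."""
    terms = tokenize(query)
    results = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:  # documents that never mention a query term are not retrieved
            results.append((doc_id, score))
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Hypothetical training documents.
docs = {
    "fuel-system": "Drain the fuel tank before removing the tank cap. Inspect the tank for leaks.",
    "m1a1-crew": "The M1A1 Abrams tank crew inspects the tank before each mission.",
    "armor-ops": "Armored vehicle crews conduct pre-combat checks on the Abrams.",
}

print(keyword_search(docs, "tank"))  # → [('fuel-system', 3), ('m1a1-crew', 2)]
```

The fuel-system document outranks the Abrams document because the homonym “tank” appears in it more often, and the armor-ops document, which concerns the Abrams but says “armored vehicle” rather than “tank,” is not retrieved at all. Both failures come from matching words rather than meanings.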

The Charles River Analytics Solution

Scientists and software engineers at Charles River Analytics created Content Retrieval and Extraction for Advanced Tutoring Environments (CREATE) to improve the generation and delivery of training content. CREATE reasons about both learner-specific and mission-specific needs to customize training by position (job title and responsibilities), and it adapts to different mission requirements for skills and training, even among individuals with the same position. CREATE also models how the available training material relates to those skills, missions, and positions, which lets it generate and deliver the appropriate training to each individual. It automatically ingests and analyzes training content and uses semantic technologies to reason about the relationships between trainees, their missions, their needs, and the ways training content can be customized to meet those learning requirements. Recommender techniques such as collaborative filtering, together with semantic similarity metrics, determine which training content and topics are most likely to be useful to each trainee. Finally, a filter-based search engine lets trainees search for material intuitively through powerful semantic queries, without having to learn a semantic query language. Together, these features enable CREATE to deliver more effective training at lower cost and with less time and bandwidth.

The figure below shows the filter-based search engine with a single action filter applied (bottom left). The action filter consists of a verb, “inspect,” followed by a category (the JETDS “Radio” designation) and a specific refinement of that category (the AN/PRC-90 radio). A single document is returned (bottom right). At the top, the crowdsourcing system asks the user to validate an automatically extracted semantic relationship.
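The text does not say how CREATE’s recommender is implemented. Purely as a point of reference, the following sketch shows generic user-based collaborative filtering: trainees who rated the same content similarly are treated as neighbors, and content a trainee has not yet seen is scored by a similarity-weighted average of the neighbors’ ratings. Cosine similarity serves as the similarity metric here; the same function could in principle compare semantic topic vectors instead. The trainee IDs, content IDs, and 1-to-5 usefulness ratings are all hypothetical.

```python
import math

# Hypothetical usefulness ratings (1 to 5) that trainees gave to training content.
Ratings = dict[str, dict[str, float]]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings: Ratings, trainee: str, k: int = 3) -> list[tuple[str, float]]:
    """Predict ratings for content the trainee has not seen, highest first."""
    target = ratings[trainee]
    weighted_sum: dict[str, float] = {}
    similarity_sum: dict[str, float] = {}
    for other, their_ratings in ratings.items():
        if other == trainee:
            continue
        sim = cosine(target, their_ratings)
        if sim <= 0:
            continue  # ignore trainees with no overlapping ratings
        for item, rating in their_ratings.items():
            if item in target:
                continue  # the trainee has already seen this content
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            similarity_sum[item] = similarity_sum.get(item, 0.0) + sim
    predictions = [(item, weighted_sum[item] / similarity_sum[item]) for item in weighted_sum]
    predictions.sort(key=lambda pair: (-pair[1], pair[0]))
    return predictions[:k]


ratings: Ratings = {
    "trainee_a": {"radio-basics": 5, "prc90-maintenance": 4},
    "trainee_b": {"radio-basics": 5, "prc90-maintenance": 5, "antenna-repair": 4},
    "trainee_c": {"convoy-ops": 5, "vehicle-checks": 4},
}

print(recommend(ratings, "trainee_a"))  # → [('antenna-repair', 4.0)]
```

Trainee A is recommended the antenna-repair material because trainee B, who rated the shared radio content similarly, found it useful; trainee C’s convoy material is ignored because C has no ratings in common with A.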


CREATE filter-based search 

The Filter-based Search Engine

The figure below shows a selection of the semantic information extracted from the document returned by the filter-based search: additional references, and a list of known equipment names and model numbers. CREATE has also extracted the actions discussed in the document (e.g., inspection, repair) and the equipment those actions apply to (the AN/PRC-90), though these are not shown in the figure.

CREATE screenshot 

Semantic information extracted from a training document
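Extracted action and equipment relationships are what make an action filter like the one in the search figure possible. The sketch below is a hypothetical data model, not CREATE’s actual schema: each document is indexed by (action, category, item) annotations like those described above, and an action filter matches a verb, a category, and an optional refinement. The document IDs are invented, and treating multiple filters as a conjunction (every filter must match) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """A hypothetical extracted (action, equipment) relationship."""
    action: str    # normalized verb, e.g. "inspect"
    category: str  # equipment class, e.g. the JETDS "Radio" designation
    item: str      # specific equipment, e.g. "AN/PRC-90"


@dataclass(frozen=True)
class ActionFilter:
    verb: str
    category: str
    refinement: Optional[str] = None  # None matches any item in the category

    def matches(self, ann: Annotation) -> bool:
        if ann.action != self.verb or ann.category != self.category:
            return False
        return self.refinement is None or ann.item == self.refinement


def filter_search(index: dict[str, list[Annotation]], filters: list[ActionFilter]) -> list[str]:
    """Return IDs of documents in which every filter matches at least one annotation."""
    return sorted(
        doc_id
        for doc_id, anns in index.items()
        if all(any(f.matches(a) for a in anns) for f in filters)
    )


# Hypothetical annotation index built from extracted semantic information.
index = {
    "prc90-maintenance": [
        Annotation("inspect", "Radio", "AN/PRC-90"),
        Annotation("repair", "Radio", "AN/PRC-90"),
    ],
    "prc77-operation": [Annotation("operate", "Radio", "AN/PRC-77")],
    "prc77-inspection": [Annotation("inspect", "Radio", "AN/PRC-77")],
}

print(filter_search(index, [ActionFilter("inspect", "Radio", "AN/PRC-90")]))
# → ['prc90-maintenance']
print(filter_search(index, [ActionFilter("inspect", "Radio")]))
# → ['prc77-inspection', 'prc90-maintenance']
```

Because the filter operates on extracted relationships rather than raw words, dropping the refinement broadens the query to every radio that is inspected, while a document that merely mentions a radio without discussing its inspection is never returned.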

The Benefit

Semantic learning management, as demonstrated by CREATE, has the potential to lower the cost and increase the efficiency of training across the Army and other government agencies. Semantic learning management systems can improve the authoring and delivery of training and make on-demand training more readily available to deployed personnel. Industry, academia, and K-12 education can also benefit from the lower costs and greater adaptability of training and education systems built on this technology.


For more information about CREATE or our other semantic technology capabilities, contact us.

The research reported in this document/presentation was performed in connection with contract W911QX-13-C-0072 with the U.S. Army Research Laboratory. The views and conclusions contained in this document/presentation are those of the authors and should not be interpreted as representing the official policies or position, either expressed or implied, of the U.S. Army Research Laboratory or the U.S. Government unless so designated by other authorized documents.


