About


What is LexChecker?

LexChecker is a web-based corpus query tool that shows how English words are used. Users submit a word into the query box (like a Google search) and LexChecker returns a list of the patterns in which the word is typically used. Each pattern listed for a word is linked to sentences from the British National Corpus (BNC) that show the word occurring in that pattern. The patterns are what we have dubbed 'hybrid n-grams'. These are a uniquely useful form of corpus search result. They can consist of a string of words such as keep a close eye on or gain the upper hand. Or they could contain substitutable slots marked by specific parts of speech, for example run the risk of [v-ing] or stand [noun] in good stead or [verb] a storm of protest (as in raise/spark/cause/create/unleash a storm of protest).
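As a rough illustration of what a hybrid n-gram is, the sketch below treats a pattern as a sequence whose elements are either literal words or bracketed part-of-speech slots, and checks it against a part-of-speech-tagged sentence. The tag names, the data layout and the function names are assumptions made for this example; they are not LexChecker's actual representation.

```python
from typing import List, Tuple

# A tagged token is (word, part_of_speech). The tag labels are illustrative only.
Token = Tuple[str, str]


def matches_at(pattern: List[str], sentence: List[Token], start: int) -> bool:
    """Return True if the hybrid n-gram matches the sentence beginning at `start`."""
    if start + len(pattern) > len(sentence):
        return False
    for offset, element in enumerate(pattern):
        word, pos = sentence[start + offset]
        if element.startswith("[") and element.endswith("]"):
            # Slot element: compare against the token's part of speech.
            if element[1:-1] != pos:
                return False
        elif element != word.lower():
            # Literal element: compare against the word form itself.
            return False
    return True


def find_matches(pattern: List[str], sentence: List[Token]) -> List[int]:
    """Return every start index at which the pattern matches."""
    return [i for i in range(len(sentence)) if matches_at(pattern, sentence, i)]


# Hypothetical tagged sentence for the pattern "run the risk of [v-ing]".
PATTERN = ["run", "the", "risk", "of", "[v-ing]"]
SENTENCE = [
    ("They", "pron"), ("run", "verb"), ("the", "det"), ("risk", "noun"),
    ("of", "prep"), ("losing", "v-ing"), ("everything", "pron"),
]

print(find_matches(PATTERN, SENTENCE))  # [1]
```

The same function handles a pure word string such as keep a close eye on, since a pattern with no bracketed elements is simply matched word by word.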

Some terms for what LexChecker finds
The sorts of items that LexChecker lists in its search results have been recognized by many researchers as core to what language users need to master. The items, however, fall within a wide range of lexico-grammatical phenomena and have been given a wide variety of labels. Here we list some of those labels. While those listed below are not necessarily interchangeable, they all share some degree of 'family resemblance' to each other. To shift the metaphor, they all occupy the terrain that lies between words and grammar and between regularity and idiomaticity.

New: LexChecker now accepts multiword queries. Now, if a user submits a query of more than one word, LexChecker returns the patterns in which those words co-occur. So, for example, the query how spell returns how do you spell, among other patterns; make mistake yields make the mistake of [v-ing], among others.
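One simple way to picture a multiword query is as a filter that keeps only the stored patterns containing every query word. The sketch below does exactly that over a tiny hand-made list of patterns taken from this page. The function name and the exact-word matching are assumptions for illustration; the real tool presumably also handles inflected forms, which this sketch does not.

```python
from typing import List

# A toy pattern archive; the real archive is far larger.
PATTERNS = [
    "how do you spell",
    "make the mistake of [v-ing]",
    "stand a chance",
    "stand a chance of [v-ing]",
    "keep a close eye on",
]


def multiword_query(query: str, patterns: List[str]) -> List[str]:
    """Return the patterns that contain every word of the query."""
    wanted = query.lower().split()
    results = []
    for pattern in patterns:
        elements = pattern.lower().split()
        # Keep the pattern only if all query words occur among its elements.
        if all(word in elements for word in wanted):
            results.append(pattern)
    return results


print(multiword_query("how spell", PATTERNS))
# ['how do you spell']
print(multiword_query("stand chance", PATTERNS))
# ['stand a chance', 'stand a chance of [v-ing]']
```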


What is StringNet?

StringNet is the knowledge engine behind LexChecker and other forthcoming tools we are developing. A lexical knowledgebase, it consists of a massive archive of cross-indexed multiword patterns which we have retrieved statistically from 100,000,000 words of our licensed version of the BNC. Patterns are detected and selected by strength of association among the co-occurring elements in the pattern. Elements such as word forms, lexemes or word categories (i.e., parts of speech) can all co-occur within the same pattern. For example, the verb stand yields the strings stand a chance and don't stand a chance, but also stand a chance of [v-ing], among many other patterns.
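This page does not say which association measure StringNet uses. Purely to make the idea of "strength of association" concrete, the sketch below computes pointwise mutual information (PMI), one common measure, for two co-occurring elements. The choice of PMI and all of the counts are assumptions for illustration, not a description of StringNet's method or of BNC frequencies.

```python
import math


def pmi(pair_count: int, first_count: int, second_count: int, total: int) -> float:
    """Pointwise mutual information (in bits) of two co-occurring elements.

    PMI = log2( P(x, y) / (P(x) * P(y)) ), with probabilities estimated
    from raw counts over `total` observations.
    """
    if min(pair_count, first_count, second_count, total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(pair_count * total / (first_count * second_count))


# Hypothetical counts, invented for this example (not BNC figures).
score = pmi(pair_count=8, first_count=16, second_count=32, total=1024)
print(score)  # 4.0
```

A score well above zero means the two elements co-occur more often than chance would predict; a score of zero means they co-occur about as often as independent elements would.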

StringNet is a net rather than a list because the patterns (i.e., hybrid n-grams) are cross-indexed to indicate where different patterns share a common slot or word or where different words share a common pattern. Thus, for example, by this cross-indexing, StringNet detects a paradigmatic association between see and tell because these two verbs occupy the same slot in the hybrid n-gram as far as I can [verb] (as far as I can tell/see). This indexing among hybrid n-grams creates a hyper-dimensional graph-theoretic space. That space enables StringNet to support our statistical approaches to a variety of applications we are working on, such as learner error detection and correction [1], word (dis)similarity measurement (according to shared patterned behavior), and determination of potentially confusable words for learners, including pinpointing where confusable words are similar to and different from each other and thus might create confusion.
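The cross-indexing idea can be pictured with two small lookup tables: one from each pattern to the words that fill its slot, and one from each word to the patterns it appears in. The sketch below builds both from a few examples on this page, then uses them to find words that share a slot and to score word similarity by shared patterns. The data layout, the function names and the use of Jaccard overlap as the similarity score are assumptions for illustration only.

```python
from collections import defaultdict

# (hybrid n-gram, word observed in its slot) pairs; a toy sample.
OBSERVATIONS = [
    ("as far as I can [verb]", "tell"),
    ("as far as I can [verb]", "see"),
    ("[verb] a storm of protest", "raise"),
    ("[verb] a storm of protest", "spark"),
    ("[verb] a storm of protest", "cause"),
]


def build_indexes(observations):
    """Build the two cross-indexes: pattern -> fillers and word -> patterns."""
    fillers_by_pattern = defaultdict(set)
    patterns_by_word = defaultdict(set)
    for pattern, word in observations:
        fillers_by_pattern[pattern].add(word)
        patterns_by_word[word].add(pattern)
    return fillers_by_pattern, patterns_by_word


def slot_mates(word, fillers_by_pattern, patterns_by_word):
    """Return the other words that occupy a slot this word also occupies."""
    mates = set()
    for pattern in patterns_by_word.get(word, set()):
        mates |= fillers_by_pattern[pattern]
    mates.discard(word)
    return mates


def jaccard(word_a, word_b, patterns_by_word):
    """Similarity of two words as the overlap of the patterns they occur in."""
    a = patterns_by_word.get(word_a, set())
    b = patterns_by_word.get(word_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


fillers_by_pattern, patterns_by_word = build_indexes(OBSERVATIONS)
print(slot_mates("see", fillers_by_pattern, patterns_by_word))  # {'tell'}
print(jaccard("see", "tell", patterns_by_word))                 # 1.0
print(jaccard("see", "raise", patterns_by_word))                # 0.0
```

With a realistic archive, two words would rarely share all of their patterns, so the patterns in the intersection and those outside it would show where the words behave alike and where they diverge.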


Getting Started Tips

Here are some words (selected almost randomly and using a bit of free association) to help give a taste of what LexChecker provides. Click on a word from the list to get the LexChecker search results for it. Be sure to note at the top of the results list whether the word can be used in more than one part of speech (e.g., fit as a noun and fit as a verb) and click on each part of speech for that word.

alike attention eye fit fire
mistake place pull reach
ready root storm tide
tongue weather whether wonder


For teachers

Teachers can use LexChecker to enrich the knowledge they pass to students concerning target vocabulary. Explorations with LexChecker can feed discovery exercises concerning more than just the forms of lexical chunks. LexChecker can show that collocations like spend time, take time or make a mistake also further specify the form of their complements: spend time [V-ing], take time [to V], and make the mistake of [V-ing]. Using LexChecker to compare the patterned uses of the verbs steal and rob, for example, can show that a possession gets stolen but the possessor gets robbed. It can also reveal a wide variety of other lexico-grammatical patterns of word behavior.

LexChecker is different from typical corpus concordancing tools, which search for words or strings of words within a corpus and display instances of them in the sentences where they occur. LexChecker instead searches a massive archive of word patterns that have already been identified and stored by virtue of the strong statistical association among the words or parts of speech that co-occur within a pattern. Each pattern in a search result is listed with a link to all examples of it that occur in the BNC.


Who are we?

David Wible is Distinguished Professor of Learning and Instruction at National Central University (NCU) in Taiwan and Director of NCU's Language Center.
Email: wible@stringnet.org

Nai-Lung Tsao is Post-Doctoral Researcher at the Graduate Institute of Learning and Instruction at National Central University and Adjunct Assistant Professor at Tamkang University in Taiwan.
Email: beaktsao@stringnet.org

We have developed LexChecker as one of a suite of forthcoming tools for various aspects of second language vocabulary learning, teaching, and materials development. Our ongoing research and development has been supported by grants from Taiwan's National Science Council.


References and publications
  1. Nai-Lung Tsao and David Wible. "A Method for Unsupervised Lexical Error Detection and Correction", The NAACL Workshop on Innovative Use of NLP for Building Educational Applications, Boulder, Colorado, May 31-June 5, 2009.