Assignment 2: Spell Check : Binary Search and Binary Search Tree

Due : August 13, 2004

In this assignment you will implement a simple spell checking application using a very simple data structure and algorithm.

  1. Your program should be called bsspell.
  2. You should be able to spell check a document by running the command
    bsspell dictionary.txt document.txt
    
    where dictionary.txt is a text file containing correctly-spelled words and document.txt is a text file containing the words to spell check.
  3. The format of the dictionary file is as follows. There is one word per line. The words are not guaranteed to be in sorted order.
  4. The format of the document file is as follows. It is an ASCII text file with no restrictions. There may be punctuation, capitals, whitespace, numbers, etc.
  5. The output of your program should be a sorted list of mis-spelled words, one per line, without duplicates, in increasing order.

The outline of your program should be as follows:

  1. Read in the dictionary and store it in an array.
  2. Sort the dictionary array. You should use any O(n logn) algorithm for the sort.
  3. Go through the document considering each word in turn. Use binary search to look up the word in the dictionary. If it is not present, add the word to some data structure that keeps track of mis-spelled words.
    You must keep track of the unique mis-spelled words in an efficient manner. Use a binary search tree to store the mis-spelled words. Don't insert duplicate items into the tree. An in-order traversal of the tree at the end produces the unique mis-spelled words in sorted order.
  4. When the document file is exhausted, print the mis-spelled words that you have accumulated in sorted order, one per line.