Assignment 2: Spell Check : Binary Search and Binary Search Tree
Due : August 13, 2004
In this assignment you will implement a simple spell checking
application using a very simple data structure and algorithm.
- Your program should be called bsspell.
- You should be able to spell check a document by running the
command
bsspell dictionary.txt document.txt
where
dictionary.txt
is a text file containing correctly-spelled words and document.txt
is a text file
containing the words to spell check.
- The format of the dictionary file is as follows. There is one
word per line. The words are not guaranteed to be in sorted
order.
- The format of the document file is as follows. It is an ASCII
text file with no restrictions. There may be punctuation, capitals,
whitespace, numbers, etc.
- The output of your program should be a sorted list of mis-spelled
words, one per line, without duplicates, in increasing order.
The outline of your program should be as follows:
- Read in the dictionary and store it in an array.
- Sort the dictionary array. You should use any O(n logn) algorithm for the sort.
- Go through the document considering each word in turn. Use
binary search to look up the word in the dictionary. If it is not
present, add the word to some data structure that keeps track of
mis-spelled words.
You must keep track of the unique mis-spelled words in an efficient
manner. Use a binary search tree to store the mis-spelled words. Don't
insert duplicate items into the tree. An in-order traversal of the tree
at the end produces the unique mis-spelled words in sorted order.
- When the document file is exhausted, print the mis-spelled words
that you have accumulated in sorted order, one per line.