Word Counter
The Word Counter program reads a file and analyses the word frequency of its contents. Any of the plain-text books provided by Project Gutenberg can be used, including A Christmas Carol by Charles Dickens.
Task 1
Develop a program that reads in a file containing A Christmas Carol by Charles Dickens and outputs each word once, without punctuation.
You need to consider how you will manage the words, in particular how you handle full stops, commas, dashes and other punctuation. What about comparing don’t and do not? What about uppercase and lowercase versions of the same word?
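One possible approach to Task 1 is sketched below in Python. It is a starting point under stated assumptions, not the required solution: words are treated as runs of the letters a to z, an apostrophe inside a word is kept so that don’t stays one word, and everything is lowercased so that Marley and MARLEY match. It does not try to treat don’t and do not as the same phrase. The names extract_words, unique_words and unique_words_in_file are made up for this example, and the second sentence of the sample text is invented to show the edge cases.

```python
import re

# Assumption: a word is a run of a-z letters, optionally with internal
# apostrophes (straight ' or curly ’), e.g. "don't" or "scrooge's".
WORD_PATTERN = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def extract_words(text):
    """Return every word in text, lowercased and without punctuation."""
    return WORD_PATTERN.findall(text.lower())


def unique_words(text):
    """Return each distinct word once, in order of first appearance."""
    seen = set()
    ordered = []
    for word in extract_words(text):
        if word not in seen:
            seen.add(word)
            ordered.append(word)
    return ordered


def unique_words_in_file(path):
    """Read a UTF-8 text file and return its distinct words."""
    with open(path, encoding="utf-8") as book:
        return unique_words(book.read())


sample = "Marley was dead: to begin with. Don’t doubt it -- MARLEY was dead!"
for word in unique_words(sample):
    print(word)
```

To run it on the book itself, call unique_words_in_file with the path of the downloaded text file. Digits and accented letters are ignored by this pattern, which may or may not be what you want.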
Task 2
Modify your program so it counts the occurrences of each unique word and returns a dictionary of words and their associated totals.
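A minimal sketch of the counting step in Python follows. It assumes the same definition of a word as a run of the letters a to z with optional internal apostrophes, and the function name count_words is hypothetical. The dictionary maps each word to the number of times it has been seen so far.

```python
import re

# Assumption: same word definition as Task 1 (letters a-z, internal apostrophes).
WORD_PATTERN = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def count_words(text):
    """Return a dictionary mapping each lowercase word to its total."""
    counts = {}
    for word in WORD_PATTERN.findall(text.lower()):
        # get() returns 0 the first time a word is seen.
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words("Bah! Humbug. Bah, humbug!"))
```

Python's standard library also provides collections.Counter, which does the same job in one call. Writing the loop by hand first makes it clearer what the dictionary is doing.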
Task 3
Modify your program so a user can enter a word and the computer will either return the number of times that word occurs or tell the user that the word does not occur in the text.
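The lookup could be sketched as below, assuming counts is a dictionary of lowercase words and totals like the one produced in Task 2. The function names describe_count and prompt_for_word and the wording of the messages are illustrative choices, not part of the task. The search word is lowercased before the lookup so that it matches the stored keys.

```python
def describe_count(counts, word):
    """Return a message saying how often word occurs in counts."""
    key = word.strip().lower()
    if key in counts:
        return f"'{key}' occurs {counts[key]} time(s)."
    return f"'{key}' does not occur in the text."


def prompt_for_word(counts):
    """Ask the user for a word and print the result of the lookup."""
    word = input("Enter a word to search for: ")
    print(describe_count(counts, word))


# Small hand-made dictionary for illustration only; these are not real totals.
example_counts = {"scrooge": 3, "ghost": 1}
print(describe_count(example_counts, "Scrooge"))
print(describe_count(example_counts, "reindeer"))
```

Calling prompt_for_word(example_counts) runs the interactive version. Keeping the lookup separate from the input call makes the logic easy to test without typing anything.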
Extensions
- Write the results back out to another results file using CSV format.
- Calculate the time taken to load the words and to find a search word.
- Explore different data structures. It is possible to complete this task with arrays, but this will not be the most efficient solution. As an extension you could explore linked lists and hash tables.
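The first two extensions could be sketched in Python as follows, again assuming counts is a dictionary of words and totals. The names counts_to_rows, write_counts_csv and timed, the column headings and the sort order (most frequent first) are all choices made for this example. Timing uses time.perf_counter, which is intended for measuring short durations.

```python
import csv
import time


def counts_to_rows(counts):
    """Return (word, total) pairs, most frequent first, ties alphabetical."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_counts_csv(counts, path):
    """Write the word totals to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["word", "count"])
        writer.writerows(counts_to_rows(counts))


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Small hand-made dictionary for illustration only; these are not real totals.
example_counts = {"ghost": 1, "scrooge": 3, "bah": 1}
found, seconds = timed(example_counts.get, "scrooge")
print(f"Found total {found} in {seconds:.6f} seconds")
```

For the third extension, note that a Python dictionary is itself a hash table, so a useful experiment is to store the words in a plain list of pairs instead and use timed to compare how long a search takes with each structure.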