Well, I finally started working on the Lewis concordance we were discussing earlier (see 'Would it be legal to...' thread). I've scanned most of Till We Have Faces into my laptop and converted it with OCR (Optical Character Recognition) to a text file. My next step will be to make a little C++ program to process the text file and spit out HTML pages for each word encountered in the text.
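For the curious, here is a rough sketch of how the word-splitting step might look. It is only a starting point under my own assumptions: the function name tokenize is made up for illustration, a "word" is taken to be a run of letters (with apostrophes allowed inside, so contractions stay whole), and everything is lowercased so "The" and "the" land on the same page. OCR quirks like stray hyphens at line ends are not handled.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split a chunk of text into lowercase words.
// Assumption: a word is a run of ASCII letters; an apostrophe is kept
// only when it sits between two letters (e.g. "don't").
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalpha(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (c == '\'' && !current.empty() && i + 1 < text.size() &&
                   std::isalpha(static_cast<unsigned char>(text[i + 1]))) {
            current += '\'';  // apostrophe inside a word
        } else if (!current.empty()) {
            words.push_back(current);  // any other character ends the word
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);  // flush the last word
    }
    return words;
}
```

With this, tokenize("Don't stop, Reader!") would give the three words "don't", "stop" and "reader".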
I thought it would be best to reference each use of a word by chapter and paragraph rather than page number, since the page numbering varies for different publishers/editions. For the paragraph format, it would be good to use 32/35 (for 32nd paragraph out of 35 total in chapter)--that way it will be easier to know how far along within the chapter the particular occurrence of the word is. I'm also hoping to include a direct quote of the surrounding sentence for each reference, along with several other keywords from the paragraph to provide some context for the word.
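To make the chapter/paragraph idea concrete, here is a minimal sketch of how one reference could be stored and printed in the 32/35 style. The names Reference, formatReference, Concordance and addOccurrence are placeholders I invented for illustration, and the exact wording of the printed reference ("ch." and "para.") is just one possibility.

```cpp
#include <map>
#include <string>
#include <vector>

// One occurrence of a word (hypothetical layout).
struct Reference {
    std::string book;         // book title
    int chapter;              // chapter number
    int paragraph;            // paragraph number within the chapter
    int paragraphsInChapter;  // total paragraphs in that chapter
};

// Format a reference as "Book, ch. C, para. P/N".
std::string formatReference(const Reference& r) {
    return r.book + ", ch. " + std::to_string(r.chapter) +
           ", para. " + std::to_string(r.paragraph) +
           "/" + std::to_string(r.paragraphsInChapter);
}

// Map from each word to every place it occurs.
// std::map keeps the words in alphabetical order.
using Concordance = std::map<std::string, std::vector<Reference>>;

// Record one occurrence of a word.
void addOccurrence(Concordance& index, const std::string& word,
                   const Reference& ref) {
    index[word].push_back(ref);
}
```

For example, formatReference({"Till We Have Faces", 3, 32, 35}) produces "Till We Have Faces, ch. 3, para. 32/35". The total paragraph count is only known once the whole chapter has been read, so the program would need to fill that field in after finishing each chapter.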
The listings of a particular word will be sorted by book--but I'm debating: should I list the books in chronological order (by date published), or alphabetically, or in some other order?
I'd be curious and eager to hear any suggestions you might have for this little project--there are others here more experienced at working with HTML than I am (obviously). But I don't think it should be too hard to do. The most work will be scanning all the books in.
(It would be nice to be able to see the texts themselves online--but of course, that wouldn't jibe with the copyrights--so I'll just make this a concordance.)
Open to your input,
Micah