This will not be of interest to most users, but is here to answer some questions that I would have.
Some messages are truncated. That is, some or all of the text is missing because my attempt to delete the advertising and footers (to avoid indexing that junk) removed too much. The subject line is still there, and if you want to see the real message, you must follow the link to Yahoo and look it up there. I have fixed most of them, but am still working on the problem. Please tell me about any that you find.
These messages are password protected and are not linked to any pages that might expose them to search engines (such as Google), so they should not be harvested into public databases. Note, however, that I harvested them from Yahoo, so such protections are far from foolproof.
The e-mail address of the sender is masked. That is, when you get the e-mail by being on the list, you can see the sender's address and reply to the sender off-line. This is a great feature, but I get the messages from the message section of the Yahoo web group, and that copy truncates the address so it cannot be harvested by spammers. What you can do is click on the link to go to Yahoo, get the original message (with all of the advertisements), and try using the link there to send a reply to the sender. It will work only if the person is a member of the list and also has a Yahoo profile. If you are still having problems, write to me and I can get the address for you.
The search engine searches ALL of the messages at one time. There are now about 150,000 messages with about 30,000,000 words. You can set how many found messages appear on each screen. The messages are sorted by the frequency of the target words in the message. You can also sort them by date with the newest first.
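For the curious, the two sort orders described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Zoom actually does it: the Message class, the function names, and the simple word counting are all my own assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Message:
    # Hypothetical record for one archived message.
    subject: str
    body: str
    sent: date


def term_frequency(message, terms):
    """Count how often any of the target words occurs in the message."""
    text = (message.subject + " " + message.body).lower()
    words = re.findall(r"[a-z0-9']+", text)
    wanted = {t.lower() for t in terms}
    return sum(1 for w in words if w in wanted)


def search(messages, terms, sort_by="frequency", per_page=10, page=1):
    """Search ALL messages, then return one screen of results."""
    scored = [(term_frequency(m, terms), m) for m in messages]
    hits = [(score, m) for score, m in scored if score > 0]
    if sort_by == "date":
        # Newest first.
        hits.sort(key=lambda h: h[1].sent, reverse=True)
    else:
        # Most occurrences of the target words first.
        hits.sort(key=lambda h: h[0], reverse=True)
    start = (page - 1) * per_page
    return [m for _, m in hits[start:start + per_page]]
```

Calling search(messages, ["glass"], sort_by="date", per_page=20) would return the 20 newest messages that contain the word. A real engine works from a prebuilt index instead of rescanning 30,000,000 words on every query.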
This is very different from the Yahoo search engine that seems to have a time limit. The first time you search, it looks at 100-200 messages. It used to expand the number searched but now seems to limit it again so as to display more advertisements. Perhaps they will improve the search and make this project obsolete. So much the better for everyone.
There is a spelling feature that suggests other words when a search finds very little. This feature still needs more work.
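A rough idea of how such a suggestion feature can work is sketched below in Python, using the standard difflib module. This is not Zoom's method. The suggest function, the min_hits threshold, and the 0.75 similarity cutoff are assumptions I made up for the illustration.

```python
import difflib


def suggest(term, vocabulary, hit_count, min_hits=3, max_suggestions=3):
    """Offer similar indexed words when a search finds very little.

    vocabulary is assumed to be the list of lowercase words in the index.
    """
    if hit_count >= min_hits:
        # Enough results were found, so no suggestion is needed.
        return []
    lowered = term.lower()
    # Ask for one extra candidate in case the term itself is in the index.
    candidates = difflib.get_close_matches(
        lowered, vocabulary, n=max_suggestions + 1, cutoff=0.75
    )
    return [w for w in candidates if w != lowered][:max_suggestions]
```

With a vocabulary containing "glass", a search for the misspelling "glas" that found nothing would be offered "glass" as an alternative. Raising the cutoff gives fewer but closer suggestions.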
There are statistics generated that have limited utility. They can be found at Statistics. Note that I use the words 'message' and 'glass' for testing that things still work.
The search engine is Zoom version 6.0 from Wrensoft, written in C++ and running as a compiled CGI program on Dreamhost's web servers, which use Apache on Linux. The indexing is done by Zoom offline on my local machine. I have a shell account with many programmer-oriented features, and if you think you can successfully implement Lucene or CLucene at the local (not Apache module) level, please let me know.