You'll need at least a naive stemming algorithm to process the text first (try the Porter stemmer; free code is available in many languages). Keep the stemmed text and the original, unstemmed text in two separate arrays, each split on spaces. It was originally often called “Evergr