Word clustering is a technique for partitioning a set of words into subsets of semantically similar words. It is increasingly used in a number of NLP tasks, ranging from word sense and structural disambiguation to information retrieval and filtering.

A multiword expression is a lexeme made up of a sequence of two or more lexemes whose properties are not predictable from the properties of the individual lexemes. If such a sequence is not merged into a single lexeme, it reduces tagging accuracy. We have created a multiword list for Urdu, which is used to improve the tokenization of Urdu phrases.
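
As an illustration of how a multiword list can be applied during tokenization, the following is a minimal sketch, not the implementation used in this work. It assumes the input is already split into whitespace tokens and that the multiword list is available as sequences of tokens; the function name `merge_multiwords`, the `joiner` parameter, and the romanized sample entry are hypothetical and chosen only for illustration. The sketch greedily merges the longest matching sequence at each position into a single token.

```python
from typing import Iterable, List, Sequence


def merge_multiwords(
    tokens: Sequence[str],
    multiwords: Iterable[Sequence[str]],
    joiner: str = "_",
) -> List[str]:
    """Merge known multiword sequences into single tokens (longest match first).

    `multiwords` stands in for a multiword list; its format here is an assumption.
    """
    mwe_set = {tuple(m) for m in multiwords}
    # Longest entry in the list bounds how far ahead we need to look.
    max_len = max((len(m) for m in mwe_set), default=1)

    merged: List[str] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate window first, down to length 2.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in mwe_set:
                match_len = n
                break
        if match_len:
            # Emit the whole sequence as one token so the tagger sees one lexeme.
            merged.append(joiner.join(tokens[i:i + match_len]))
            i += match_len
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Illustrative entry only (romanized); not taken from the actual multiword list.
sample_list = [("wazir", "e", "azam")]
print(merge_multiwords(["wazir", "e", "azam", "ne", "kaha"], sample_list))
```

Longest-match-first is one reasonable design choice when list entries overlap; other strategies, such as scoring candidate merges, are equally possible.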

