N-Grams (Urdu, Sindhi and Punjabi)

An n-gram is a contiguous sequence of N items (characters or words) taken from a text or corpus. In the example that follows, the items are words.

For example, take the sentence "The cow jumps over the moon". If N=2 (known as bigrams), then the n-grams would be:

The cow
cow jumps
jumps over
over the
the moon
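A minimal Python sketch of how word n-grams like these could be generated. It assumes simple whitespace tokenization, which may differ from the tokenization actually used to build these corpora, and the function name `word_ngrams` is only illustrative.

```python
def word_ngrams(text, n):
    """Return the word n-grams of `text` as a list of tuples.

    Assumption: tokens are separated by whitespace. A real corpus
    pipeline would likely use a language-specific tokenizer.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    tokens = text.split()
    # Slide a window of n tokens across the sentence, one token at a time.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "The cow jumps over the moon"
for gram in word_ngrams(sentence, 2):
    print(" ".join(gram))
# Prints:
# The cow
# cow jumps
# jumps over
# over the
# the moon
```

A sentence of k words yields k - N + 1 n-grams, so the six-word sentence above gives five bigrams; a text shorter than N words gives none.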

  

 
Language | Source(s) | No. of words (tokens) | No. of unique words (types)
Urdu | 1600 books; newspapers (Jang, Express, Nawa-e-Waqt) | 120,756,442 | 306,942
Sindhi | Books; newspaper | 10,260,412 | 85,331
Punjabi | Wichaar website; (part of) Punjabi Wikipedia