N-Grams (Urdu, Sindhi and Punjabi)

An n-gram is a contiguous sequence of N items, such as characters or words, taken from a corpus.

For example, take the sentence "The cow jumps over the moon". If N=2 (known as bigrams), then the word n-grams would be:

  • the cow

  • cow jumps

  • jumps over

  • over the

  • the moon
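The bigram list above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `word_ngrams` is hypothetical, and it assumes that lowercasing and splitting on whitespace is an acceptable way to tokenize. Real Urdu, Sindhi or Punjabi text may need a language-specific tokenizer.

```python
def word_ngrams(text, n=2):
    """Return the word n-grams of `text` as a list of strings.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.lower().split()
    # Slide a window of n tokens over the sentence, one position at a time.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = word_ngrams("The cow jumps over the moon", n=2)
print(bigrams)
# ['the cow', 'cow jumps', 'jumps over', 'over the', 'the moon']
```

A sentence with fewer than N words yields an empty list, since no full window fits.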

  

 
Language  Source(s)                                            No. of words (tokens)  No. of unique words (types)
Urdu      1600 books; newspapers (Jang, Express, Nawa-e-Waqt)  120,756,442            306,942
Sindhi    Books; newspaper                                     10,260,412             85,331
Punjabi   Wichaar website; (part of) Punjabi Wikipedia
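
The difference between tokens (every word occurrence) and types (distinct words) can be illustrated with a small sketch. It is not the method used to produce the figures above; the function name `token_type_counts` is hypothetical, and whitespace tokenization with lowercasing is assumed.

```python
from collections import Counter


def token_type_counts(text):
    """Return (number of tokens, number of types) for `text`.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)  # maps each distinct word to its frequency
    return len(tokens), len(counts)


n_tokens, n_types = token_type_counts("The cow jumps over the moon")
print(n_tokens, n_types)
# 6 5  ("the" occurs twice, so it is two tokens but one type)
```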
   
