How many unique words are there in the Quran?
I put this question to Kais Dukes, author of the Quranic Arabic Corpus. My email read as follows:
Dear Brother,
>
> Assalamu alaikum (peace be upon you)
>
> I have gone through your website and found it essential for
> learners, researchers and curious Muslims.
>
> I have a question for you:
> How many words are there in the Holy Quran without repetition? In other
> words, how many unique words are there in the Quran?
>
> I hope you have the answer. If your answer is from a secondary source,
> please cite the relevant sources.
>
> With best wishes
>
> Md. Fazlul Haque
In response to my question, he wrote:
Salamu Alaykum Fazlul Haque,
To the best of my knowledge, our project is the first accurate annotated morphological work for the Quran by computer, so I would be surprised at an accurate unique word count from another secondary source. Although of course, I could be wrong.
The number of unique Arabic words in the Quran is not an easy question to answer. In Arabic the concept of a "word" can have multiple technical linguistic interpretations. Based on the existing annotation we have performed at the Quranic Arabic Corpus (http://corpus.quran.com), I can provide the following statistics:
Total number of space-separated words = 77,430
Number of *unique* surface forms (i.e. space-separated word-forms, including clitics) = 18,994
Number of unique words by *stem* = 12,183
Number of unique words by *root* = 1,685 (not necessarily a great metric for unique word counting, e.g. pronouns have no Semitic root)
Number of unique words by *lemma* = 3,382 (excluding verbs, and other words where lemma is not annotated)
This is a primary source (we annotated this ourselves). These figures are quite accurate, but are subject to minor revision as further checking occurs.
The terms used above have technical linguistic meanings. Thus, the number of unique "words" is not only a problem of counting. We have computers, so counting annotated data is in theory very simple; I produced the above statistics after 10 minutes of work just now. The issue is what metric to use: unique white-space separated word-forms, stems, roots, lemmas, or something else? Unlike English, Arabic is a highly inflected and morphologically rich language, with multiple segments often fused into a single word-form.
As an estimate, I would say that there are at most 7,000 unique "words" in the Quran in the sense of what you would need to have a lexicon with wide-ranging coverage for the Quran.
Something also interesting to note is the Zipfian distribution. A handful of words (e.g. the top 100 words) will cover a very large percentage of the actual Quran, i.e. most verses (the 80/20 rule).
You might be interested in these web pages:
http://corpus.quran.com/lemmas.jsp - List of unique lemmas in the Quran organized by frequency
http://corpus.quran.com/verbs.jsp - List of unique verbs in the Quran organized by frequency
Sorry for giving you such a vague linguist's response, but in Arabic the concept of a unique word is itself vague, and Arabic linguists (or at least computational Arabic linguists) tend to prefer to work with better defined terms such as the white-space separated tokens, surface form, lemma, stem and root, but even then those terms also have problems :-)
I would suggest that the above two web pages, with lists of the most frequently occurring lemmas and verb roots, are probably more what you are looking for. If you have any further questions, please ask, I would be happy to help.
--
Kais Dukes
Language Research Group
School of Computing
University of Leeds
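To make the difference between these metrics concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the token list is a hypothetical, hand-made sample of transliterated placeholders (not data from the Quranic Arabic Corpus), and the names TOKENS, FIELDS, unique_counts and coverage are my own choices. It shows how the same text gives different "unique word" counts depending on the key used, and how the coverage of the most frequent items can be measured.

```python
from collections import Counter

# Hypothetical annotated tokens: (surface form, stem, lemma, root).
# These transliterations and analyses are illustrative placeholders only,
# NOT data taken from the Quranic Arabic Corpus.
TOKENS = [
    ("wa-kitab", "kitab", "kitab", "ktb"),
    ("al-kitab", "kitab", "kitab", "ktb"),
    ("kutub", "kutub", "kitab", "ktb"),
    ("kataba", "kataba", "kataba", "ktb"),
    ("huwa", "huwa", "huwa", None),      # pronoun: no root annotated
    ("wa-huwa", "huwa", "huwa", None),
    ("al-kitab", "kitab", "kitab", "ktb"),
]
FIELDS = ("surface", "stem", "lemma", "root")


def unique_counts(tokens):
    """Count the distinct, non-missing values of each annotation field."""
    return {
        name: len({tok[i] for tok in tokens if tok[i] is not None})
        for i, name in enumerate(FIELDS)
    }


def coverage(tokens, field, top_k):
    """Fraction of all tokens covered by the top_k most frequent values of a field."""
    i = FIELDS.index(field)
    freq = Counter(tok[i] for tok in tokens if tok[i] is not None)
    covered = sum(count for _, count in freq.most_common(top_k))
    return covered / len(tokens)


if __name__ == "__main__":
    print(unique_counts(TOKENS))
    print(coverage(TOKENS, "lemma", 1))
```

On this toy sample the count falls from 6 surface forms to 4 stems, 3 lemmas and 1 root, and the single most frequent lemma covers 4 of the 7 tokens. Tokens with no root still count in the denominator of coverage, which is one reason roots alone are a weak measure of "unique words".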
Awesome and useful post
online quran learning
very useful post.
Quran learning
Thank you for your post, brother. Where can I find a list of the most commonly used words that form 80% of the Quran?
You can download the 80% words of the Quran from this link: http://mebk12.meb.gov.tr/meb_iys_dosyalar/41/02/174154/dosyalar/2014_01/02125602_denkelimekartlarngarap..pdf
Assalamu alaikum Fazlul Haque,
I followed the link you entered here and found the PDF file of the 80% words of the Quran. I would like to use the information in it in a work that I am doing. Do you know if it is copyrighted, or how I can contact the owner?
Thank you, JazakaAllahu Khair
🕌 “The Essential Book of Quranic Words” by Abrar Khan
http://quranicwords.com
The link you shared is no longer accessible. Would you be able to re-share its new location? I would love to get access to the 80% of the words in the Quran, as I am 40 and a keen memorizer of the Holy Quran.
Shukran Jazeelan Ya Akhi
Very useful, the answer is exactly the one I am looking for!
jazak allah
What is the meaning of the word wenhar in sura 108/2?
And sacrifice
نحر (Nahar) means to cut the neck of a camel
Thank you for your post, brother. Where can I find a list of 2000 words not repeated in the Quran?
Actually, that would be based on Quranic verb roots plus words without roots. I am trying to find them.
Thank you very much, brother, for your contribution.
Excellent work.
Check this book too:
https://thequranicwords.wordpress.com/free-version/
🕌 “The Essential Book of Quranic Words” by Abrar Khan
http://quranicwords.com
Alhamdulillah, grateful to Mr. Fazlul Haque for your great work, which helps me a lot.
https://arabictreelearning.com/wordgame/
Check your knowledge