Tuesday, August 13, 2013

Number of Unique Words in the Quran.

How many unique words are there in the Quran?

 I put this question to Kais Dukes, author of the Quranic Arabic Corpus. My email read:
Dear Brother,
>
> السلام عليكم
>
> I have gone through your website and found it very essential for
> learners, researchers and for curious Muslims.
>
> I have a question for you:
> How many words are there in the Holy Quran without repetition? In other
> words, how many unique words are there in the Quran?
>
> I hope you have the answer. If your answer is from a secondary source,
> please refer to the relevant sources.
>
> مع أطيب التمنيات
>
> Md. Fazlul Haque
 
 In response to my question, he wrote:

Salamu Alaykum Fazlul Haque,

To the best of my knowledge, our project is the first accurate
annotated morphological work for the Quran by computer, so I would be
surprised at an accurate unique word count from another secondary
source. Although, of course, I could be wrong. The number of unique
Arabic words in the Quran is not an easy question to answer. In Arabic
the concept of a "word" can have multiple technical linguistic
interpretations. Based on the existing annotation we have performed at
the Quranic Arabic Corpus (http://corpus.quran.com), I can provide the
following statistics:
Total number of space-separated words = 77,430
Number of *unique* surface forms (i.e. space-separated word-forms,
including clitics) = 18,994
Number of unique words by *stem* = 12,183
Number of unique words by *root* = 1,685 (not necessarily a great
metric for unique word counting, e.g. pronouns have no Semitic root)
Number of unique words by *lemma* = 3,382 (excluding verbs, and other
words where lemma is not annotated).

This is a primary source (we annotated this ourselves). These figures
are quite accurate, but are subject to minor revision as further
checking occurs. The terms used above have technical linguistic
meanings. Thus, the number of unique "words" is not only a problem of
counting. We have computers, so counting annotated data is in theory
very simple; I produced the above statistics after 10 minutes of work
just now. The issue is what metric to use ... unique white-space
separated word-forms, stems, roots, lemmas, or something else? Unlike
English, Arabic is a highly inflected and morphologically rich
language, with multiple segments often fused into a single word-form.

As an estimate, I would say that there are at most 7,000 unique
"words" in the Quran, in the sense of what you would need to have a
lexicon with wide-ranging coverage for the Quran. Also
interesting to note is the Zipfian distribution: a handful of words
(e.g. the top 100 words) will cover a very large percentage of the
actual Quran, i.e. most verses (the 80/20 rule).

You might be interested in these web pages:

http://corpus.quran.com/lemmas.jsp - List of unique lemmas in the
Quran organized by frequency
http://corpus.quran.com/verbs.jsp - List of unique verbs in the Quran
organized by frequency

Sorry for giving you such a vague linguist's response, but in Arabic
the concept of a unique word is itself vague, and Arabic linguists (or
at least computational Arabic linguists) tend to prefer to work with
better defined terms such as the white-space separated tokens, surface
form, lemma, stem and root, but even then those terms also have
problems :-)

I would suggest that the above two web pages, with lists of the most
frequently occurring lemmas and verb roots, are probably more what you
are looking for.

If you have any further questions, please ask; I would be happy to help.

-- Kais Dukes

Language Research Group
School of Computing
University of Leeds
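
To see why the choice of metric changes the answer so much, here is a minimal Python sketch of the kind of counting described in the reply. It is only an illustration under my own assumptions: the six tokens are a made-up toy sample in simplified transliteration, not data from the Quranic Arabic Corpus, and the field names (surface, stem, lemma, root) and the helper functions unique_count and coverage are hypothetical names chosen for this example.

```python
from collections import Counter

# Hypothetical toy sample: NOT taken from the Quranic Arabic Corpus.
# Each token carries four made-up annotation fields. A root of None
# stands for a word with no Semitic root (e.g. a pronoun).
TOKENS = [
    {"surface": "al-kitabu",   "stem": "kitab",  "lemma": "kitab",  "root": "ktb"},
    {"surface": "bi-l-kitabi", "stem": "kitab",  "lemma": "kitab",  "root": "ktb"},
    {"surface": "kutubun",     "stem": "kutub",  "lemma": "kitab",  "root": "ktb"},
    {"surface": "huwa",        "stem": "huwa",   "lemma": "huwa",   "root": None},
    {"surface": "al-kitabu",   "stem": "kitab",  "lemma": "kitab",  "root": "ktb"},
    {"surface": "kataba",      "stem": "kataba", "lemma": "kataba", "root": "ktb"},
]


def unique_count(tokens, key):
    """Number of distinct, non-missing values of `key` among the tokens."""
    return len({t[key] for t in tokens if t[key] is not None})


def coverage(tokens, key, top_n):
    """Share of all tokens covered by the `top_n` most frequent values of `key`."""
    counts = Counter(t[key] for t in tokens if t[key] is not None)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


if __name__ == "__main__":
    print("total tokens:", len(TOKENS))
    for key in ("surface", "stem", "lemma", "root"):
        print("unique by", key, "=", unique_count(TOKENS, key))
    # Share of the toy text covered by its single most frequent lemma.
    print("coverage of top lemma:", round(coverage(TOKENS, "lemma", 1), 2))
```

Even in this tiny sample the count of "unique words" shrinks as we move from surface forms to stems, lemmas and roots, which is the same pattern as in the figures above. The coverage function gives a rough feel for the Zipfian point: a few frequent lemmas account for a large share of the running text.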


Tuesday, August 6, 2013

Birds, Beasts and Insects Mentioned in the Quran

Here is a list of the birds, beasts and insects mentioned in the Quran.

This is, again, a nursery-style approach to learning Quranic words easily.



| Words | Meanings | Singular | Frequency | Reference |
|---|---|---|---|---|
| الطَّيْرُ | Bird | | 19 | 2:260 |
| الْهُدْهُدَ | Hoopoe | | 1 | 27:20 |
| الْغُرَابُ | Crow | | 2 | 5:31 |
| السَّلْوَىٰ | Quail | | 3 | 2:57 |
| بَعُوضَةً | Mosquito | | 1 | 2:26 |
| الذُّبَابُ | Fly | | 2 | 22:73 |
| الْجَرَادَ | Locust | | 1 | 7:133 |
| الْقُمَّلَ | Lice | قَمْلَةٌ | 1 | 7:133 |
| النَّحْل | Bee | | 1 | 16:68 |
| النَّمْلُ | Ant | | 3 | 27:18 |
| الْعَنكَبُوتُ | Spider | | 2 | 29:41 |
| الْخَيْلُ | Horse | | 3 | 16:8 |
| الْبِغَالُ | Mules | بَغْلٌ | 1 | 16:8 |
| الْحَمِيرُ | Donkey | | 4 | 16:8 |
| الْإِبِلُ | Camel | | 2 | 88:17 |
| الْجَمَلُ | Camel | | 1 | 7:40 |
| نَاقَةُ | Camel (fem) | | 7 | 11:64 |
| بَقَرَةٌ | Cow | | 9 | 2:69 |
| الضَّأْنِ | Sheep | | 1 | 6:143 |
| نَعْجَةً | Ewe | | 2 | 38:23 |
| الْمَعْزِ | Goat | | 1 | 6:143 |
| الْفِيلُ | Elephant | | 1 | 105:1 |
| الضَّفَادِعَ | Frogs | ضِفْدِعٌ | 1 | 7:133 |
| حَيَّةٌ | Snake | | 1 | 20:20 |
| ثُعْبَانٌ | Serpent | | 2 | 26:32 |
| الذِّئْبُ | Wolf | | 3 | 12:14 |
| الْخِنزِيرِ | Swine | | 4 | 16:115 |
| قَسْوَرَةٍ | Lion | | 1 | 74:51 |