Corpus

How To Analyze Quranic Arabic Corpus morphological data 0.4

To analyse the Quranic Arabic Corpus, download the morphology file from corpus.quran.com/download/, import the txt file into MS Access 2007/2010, and use the Query option to get the results you want. Analysis based on the FEATURES column is a little tricky, because that column packs several values into one field.


Before analyzing Quranic Arabic Corpus morphological data 0.4, you have to learn some terms of Corpus Linguistics.

In linguistics, a morpheme is the smallest semantically meaningful unit in a language. The field of study dedicated to morphemes is called morphology. Morphemes are of two types: free and bound. A morpheme (or word element) that can stand alone as a word is called free. It is sometimes called a stem, because other, non-free elements are added to it.

In morphology, a bound morpheme is a morpheme that appears only as part of a larger word. Bound morphemes are sometimes called affixes.

Affixes are of three main types: prefix, infix and suffix.
Affixes (prefix, suffix, infix and circumfix) are all bound morphemes.
Prefixes are bound morphemes that occur before other morphemes. Examples: un- (uncover, undo)
Infixes are bound morphemes that are inserted into other morphemes. English has no true infixes; the closest comparison is an internal vowel change such as food > feed.
Suffixes are bound morphemes that follow other morphemes.
Examples:
-er (singer, performer)
-ist (typist, pianist)
-ly (manly, friendly)

The Quranic Arabic Corpus morphological data 0.4 uses these and other related linguistic terms.

Let me explain the columns
LOCATION is the Surah:Ayah:word:morpheme reference in the Quran. FORM is the transliteration of the surface Arabic word form in Latin characters, based on Buckwalter transliteration. See the chart:
http://corpus.quran.com/java/buckwalter.jsp
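
For readers who prefer to see FORM in Arabic script, here is a minimal Python sketch of a Buckwalter-to-Arabic converter. It covers only the standard Buckwalter symbols; the chart linked above remains the authoritative reference, and any extended symbol this table does not know is passed through unchanged. The function names are made up for this illustration.

```python
def build_buckwalter_table():
    """Map standard Buckwalter ASCII symbols to Arabic code points."""
    table = {}
    # Hamza through ghain occupy U+0621..U+063A in exactly this order.
    for offset, symbol in enumerate("'|>&<}AbptvjHxd*rzs$SDTZEg"):
        table[symbol] = chr(0x0621 + offset)
    # Tatweel, feh through yeh, then the vowel marks: U+0640..U+0652.
    for offset, symbol in enumerate("_fqklmnhwYyFNKaui~o"):
        table[symbol] = chr(0x0640 + offset)
    table["`"] = chr(0x0670)  # superscript (dagger) alif
    table["{"] = chr(0x0671)  # alif wasla
    return table


BUCKWALTER = build_buckwalter_table()


def to_arabic(form):
    """Convert a Buckwalter FORM to Arabic script; unknown symbols pass through."""
    return "".join(BUCKWALTER.get(symbol, symbol) for symbol in form)


# The two morphemes of the first word, joined, and the first explicit verb.
print(to_arabic("bi") + to_arabic("somi"))
print(to_arabic("naEobudu"))
```

Note that "o" maps to sukun and "~" to shadda, so they are marks on the preceding letter, not letters of their own.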

TAG is the lexical or grammatical category of the morpheme concerned. FEATURES describe the detailed linguistic features of the morpheme.

Description of FEATURES
In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (headword). In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma. Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme.

Difference between stem and lemma
In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-", because there are words such as "production". In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed.

For explanations of the other abbreviated terms, go to the page:
http://corpus.quran.com/documentation/tagset.jsp

For Verb Forms, Refer to page:
http://corpus.quran.com/documentation/verbforms.jsp

The First Word of Quran Bismi
The first word of the Quran, Bismi, consists of two morphemes. The first, bi, is used as a prefix. The second, somi, is a noun and the stem of the word (do not read the "o" in somi as the English "o"; in Buckwalter transliteration it is the symbol for sukun). In the FEATURES column, POS means part of speech and N means noun. Its lemma is {som (where the hamzah is dropped because of widespread use), which is derived from the triliteral ROOT smw, i.e. س م و. M marks it as a masculine noun, and GEN shows that it is in the genitive case (جر) here, because it follows the preposition bi.
LOCATION FORM TAG FEATURES
(1:1:1:1) bi P PREFIX|bi+
(1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN
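
The two rows above can also be taken apart in code. This is only a sketch: the function names are hypothetical, and it assumes that FEATURES is always a "|"-separated list in which some items are KEY:value pairs (POS, LEM, ROOT) and the rest are bare flags (STEM, M, GEN), as in the rows shown.

```python
def parse_location(location):
    """Turn '(1:1:1:2)' into a (surah, ayah, word, morpheme) tuple of ints."""
    return tuple(int(part) for part in location.strip("()").split(":"))


def parse_features(features):
    """Split FEATURES on '|' into KEY:value entries and bare flags."""
    keyed, flags = {}, []
    for item in features.split("|"):
        key, separator, value = item.partition(":")
        if separator:
            keyed[key] = value
        else:
            flags.append(item)
    return keyed, flags


surah, ayah, word, morpheme = parse_location("(1:1:1:2)")
keyed, flags = parse_features("STEM|POS:N|LEM:{som|ROOT:smw|M|GEN")
print(surah, ayah, word, morpheme)
print(keyed["LEM"], keyed["ROOT"], flags)
```

Splitting on "|" first and matching whole items is safer than searching the raw text, because one tag can be a substring of another.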

The First Explicit Verb of the Quran
The first explicit verb of the Quran is the 2nd word of the fifth verse of the first chapter, Al-Fatihah:
(1:5:2:1) naEobudu V STEM|POS:V|IMPF|LEM:Eabada|ROOT:Ebd|1P
This is an IMPERFECT verb (present-future tense) used in the 1st person plural.

The Second Verb
(1:5:4:1) nasotaEiynu V STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P
This is also an IMPERFECT verb, used in Form X, and its ROOT is Ewn, i.e. ع و ن
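
The two verb rows differ mainly in the parenthesised form tag, and a small Python sketch can pull that out. Two assumptions are built in and should be checked against the tagset page: that the aspect tags are PERF, IMPF and IMPV, and that a verb row with no parenthesised Roman numeral (like naEobudu above) is a Form I verb. The function name is hypothetical.

```python
import re

# Assumed aspect tags; only IMPF appears in the rows quoted in this post.
ASPECTS = {"PERF": "perfect", "IMPF": "imperfect", "IMPV": "imperative"}


def describe_verb(features):
    """Return the aspect, verb form and root found in a verb's FEATURES."""
    items = features.split("|")
    aspect = next((ASPECTS[item] for item in items if item in ASPECTS), None)
    # A form tag looks like '(X)' or '(IV)'; assume Form I when it is absent.
    form = next(
        (item.strip("()") for item in items if re.fullmatch(r"\([IVX]+\)", item)),
        "I",
    )
    root = next((item[len("ROOT:"):] for item in items if item.startswith("ROOT:")), None)
    return {"aspect": aspect, "form": form, "root": root}


print(describe_verb("STEM|POS:V|IMPF|LEM:Eabada|ROOT:Ebd|1P"))
print(describe_verb("STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P"))
```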

How To Analyze:
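Download the txt file, then copy and paste its contents into Excel 2007/2010 (Excel 2003 won't help, because its worksheets are limited to 65,536 rows, fewer than the file contains).
The columns will be separated automatically. From here, the analysis depends on what you want out of the QAC.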
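
If you would rather not use Excel, the same file can be read with a few lines of Python. This sketch assumes the file is tab-separated, has a LOCATION/FORM/TAG/FEATURES header row, and may start with comment lines beginning with "#"; check your download before relying on it. The small SAMPLE text stands in for the real file so the example runs on its own.

```python
import csv
import io


def read_corpus(handle):
    """Yield one dict per morpheme row from a tab-separated corpus file."""
    # Skip blank lines and '#' comment lines (an assumed file preamble).
    data_lines = (
        line for line in handle if line.strip() and not line.startswith("#")
    )
    # QUOTE_NONE: Buckwalter symbols such as ' must not be read as quotes.
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    yield from reader


SAMPLE = (
    "# comment line\n"
    "LOCATION\tFORM\tTAG\tFEATURES\n"
    "(1:1:1:1)\tbi\tP\tPREFIX|bi+\n"
    "(1:1:1:2)\tsomi\tN\tSTEM|POS:N|LEM:{som|ROOT:smw|M|GEN\n"
)

rows = list(read_corpus(io.StringIO(SAMPLE)))
for row in rows:
    print(row["LOCATION"], row["FORM"], row["TAG"], row["FEATURES"])

# For the real data, open the downloaded txt file instead of SAMPLE, e.g.
# with open(path_to_downloaded_file, encoding="utf-8") as handle: ...
```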

If you want to know how many prepositions are used in the Quran, custom-filter the TAG column: click the arrow at the right corner of the column header, deselect 'Select all' in the drop-down, check P and click OK. You will get all the prepositions used in the Quran. How many?
You will get 13006. Unfortunately, you will not get this figure from the site
http://corpus.quran.com/morphologicalsearch.jsp You will get only 7679 there, because the site counts only prepositions that are stems, not the prefixed and suffixed prepositions. There are 7679 stem prepositions, 5325 prefix prepositions and 2 suffix prepositions in the Quran, so the total is 7679 + 5325 + 2 = 13006.
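
The same breakdown can be sketched in Python. It assumes that the first item of FEATURES is always the segment type (STEM, PREFIX or SUFFIX), as in the rows quoted in this post. The Ealayo row in the sample is made up for illustration, and the function name is hypothetical; on the full file the counts should match the figures above.

```python
from collections import Counter


def preposition_breakdown(rows):
    """Count TAG == 'P' rows by segment type (first item of FEATURES)."""
    counts = Counter()
    for location, form, tag, features in rows:
        if tag == "P":
            counts[features.split("|", 1)[0]] += 1
    return counts


SAMPLE_ROWS = [
    ("(1:1:1:1)", "bi", "P", "PREFIX|bi+"),
    ("(1:1:1:2)", "somi", "N", "STEM|POS:N|LEM:{som|ROOT:smw|M|GEN"),
    # Illustrative stem preposition; not copied from the corpus file.
    ("(0:0:0:0)", "Ealayo", "P", "STEM|POS:P|LEM:EalaY`"),
]
print(preposition_breakdown(SAMPLE_ROWS))

# Cross-check of the totals reported in this post.
REPORTED = {"STEM": 7679, "PREFIX": 5325, "SUFFIX": 2}
print(sum(REPORTED.values()))
```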

The Quranic Arabic Corpus morphological data 0.4 is also very helpful for finding specific data. For example, if you want to know the past passive verbs used in the Quran, you can find them within seconds. Here is the list of past passive verbs used in the Quran. (FORM is the passive form; go to the ayah and check it.)

LOCATION FORM TAG
(4:157:15:1) $ub~iha V
(6:118:3:1) *ukira V
(5:3:23:1) *ubiHa V
(5:13:15:1) *uk~iru V
(76:14:4:2) *ul~ilato V
(2:283:16:1) {&otumina V
(33:11:2:1) {botuliYa V
(2:173:14:1) {DoTur~a V
(14:26:6:1) {jotuv~ato V
(7:75:8:1) {sotuDoEifu V
(42:16:8:1) {sotujiyba V
(5:44:17:1) {sotuHofiZu V
(6:10:2:1) {sotuhozi}a V
(2:166:4:1) {t~ubiEu V
(11:110:5:2) {xotulifa V
(54:9:9:2) {zodujira V
(22:39:1:1) >u*ina V
(2:24:12:1) >uEid~ato V
(9:58:7:1) >uEoTu V
(10:22:28:1) >uHiyTa V
(2:187:1:1) >uHil~a V
(4:128:18:2) >uHoDirati V
(69:5:3:2) >uholiku V
(4:25:36:1) >uHoSi V
(77:12:3:1) >uj~ilato V
(7:120:1:2) >uloqiYa V
(4:60:21:1) >umiru V
(18:56:17:1) >un*iru V
(2:4:4:1) >unzila V
(72:10:5:1) >uriyda V
(4:91:13:1) >urokisu V
(7:6:3:1) >urosila V
(9:108:6:1) >us~isa V
(2:25:25:2) >utu V
(11:60:1:2) >utobiEu V
(6:19:11:2) >uwHiYa V
(7:43:33:1) >uwrivo V
(8:70:18:1) >uxi*a V
(2:246:40:1) >uxorijo V
(2:93:15:2) >u$oribu V
(6:70:34:1) >ubosilu V
(3:185:14:2) >udoxila V
(22:22:8:1) >uEiydu V
(51:9:4:1) >ufika V
(10:27:16:1) >ugo$iyato V
(71:25:3:1) >ugoriqu V
(2:173:9:1) >uhil~a V
(11:1:3:1) >uHokimato V
(2:196:6:1) >uHoSiro V
(5:109:7:1) >ujibo V
(16:106:9:1) >ukoriha V
(25:40:6:1) >umoTirato V
(77:11:3:1) >uq~itato V
(11:116:23:1) >utorifu V
(3:195:22:2) >uw*u V
(32:17:5:1) >uxofiYa V
(26:90:1:2) >uzolifati V
(2:101:14:1) >uwtu V
(27:8:5:1) buwrika V
(22:60:9:1) bugiYa V
(16:58:2:1) bu$~ira V
(82:4:3:1) buEovirato V
(2:258:36:2) buhita V
(26:91:1:2) bur~izati V
(56:5:1:2) bus~ati V
(2:282:77:1) duEu V
(2:61:37:2) Duribato V
(33:14:2:1) duxilato V
(69:14:4:2) duk~a V
(16:126:6:1) Euwqibo V
(2:178:16:1) EufiYa V
(6:91:31:2) Eul~imo V
(18:48:1:2) EuriDu V
(11:28:14:2) Eum~iyato V
(81:4:3:1) EuT~ilato V
(5:107:2:1) Euvira V
(16:71:10:1) fuD~ilu V
(34:54:7:1) fuEila V
(11:1:6:1) fuS~ilato V
(21:96:3:1) futiHato V
(16:110:9:1) futinu V
(82:3:3:1) fuj~irato V
(77:9:3:1) furijato V
(34:23:11:1) fuz~iEa V
(5:64:6:1) gul~ato V
(7:119:1:2) gulibu V
(11:44:7:2) giyDa V
(27:17:1:2) Hu$ira V
(34:54:1:2) Hiyla V
(3:101:14:1) hudiYa V
(69:14:1:2) Humilati V
(84:2:3:2) Huq~ato V
(3:50:11:1) Hur~ima V
(4:86:2:1) Huy~iy V
(22:40:18:2) hud~imato V
(76:21:6:2) Hul~u V
(20:87:7:1) Hum~ilo V
(100:10:1:2) HuS~ila V
(39:69:7:2) jiA@Y^'a V
(16:124:2:1) juEila V
(26:38:1:2) jumiEa V
(3:184:4:1) ku*~iba V
(12:110:8:1) ku*ibu V
(17:35:4:1) kilo V
(54:14:6:1) kufira V
(13:31:12:1) kul~ima V
(2:178:4:1) kutiba V
(11:55:3:2) kiydu V
(81:11:3:1) ku$iTato V
(27:90:4:2) kub~ato V
(58:5:6:1) kubitu V
(26:94:1:2) kubokibu V
(81:1:3:1) kuw~irato V
(5:64:8:2) luEinu V
(3:159:5:1) lin V
(23:35:4:1) mi V
(12:63:7:1) muniEa V
(84:3:3:1) mud~ato V
(18:18:20:3) muli}o V
(34:7:10:1) muz~iqo V
(7:43:29:2) nuwdu V
(68:49:7:2) nubi*a V
(18:99:7:2) nufixa V
(4:161:4:1) nuhu V
(12:110:11:2) nuj~iYa V
(6:37:3:1) nuz~ila V
(81:10:3:1) nu$irato V
(21:65:2:1) nukisu V
(74:8:2:1) nuqira V
(88:19:4:1) nuSibato V
(77:10:3:1) nusifato V
(59:11:23:1) quwtilo V
(2:11:2:1) qiyla V
(54:12:9:1) qudira V
(2:210:12:2) quDiYa V
(7:204:2:1) quri}a V
(13:31:8:1) quT~iEato V
(3:144:13:1) qutila V
(12:26:13:1) qud~a V
(33:61:5:2) qut~ilu V
(6:45:1:2) quTiEa V
(4:91:10:1) rud~u V
(88:18:4:1) rufiEato V
(41:50:17:1) r~ujiEo V
(2:25:14:1) ruziqu V
(56:4:2:1) ruj~ati V
(2:108:7:1) su}ila V
(11:77:5:1) siY^'a V
(40:37:15:2) Sud~a V
(47:15:40:2) suqu V
(7:47:2:1) Surifato V
(39:71:1:2) siyqa V
(13:33:30:2) Sud~u V
(81:12:3:1) suE~irato V
(11:108:3:1) suEidu V
(81:6:3:1) suj~irato V
(15:15:3:1) suk~irato V
(7:149:2:1) suqiTa V
(88:20:4:1) suTiHato V
(13:31:4:1) suy~irato V
(39:73:18:1) Tibo V
(9:87:6:2) TubiEa V
(8:2:10:1) tuliyato V
(5:27:10:2) tuqub~ila V
(77:8:3:1) Tumisato V
(3:112:6:1) vuqifu V
(83:36:2:1) vuw~iba V
(3:96:4:1) wuDiEa V
(13:35:4:1) wuEida V
(3:25:8:2) wuf~iyato V
(12:75:4:1) wujida V
(19:15:4:1) wulida V
(7:20:7:1) wu,riYa V
(32:11:6:1) wuk~ila V
(6:27:4:1) wuqifu V
(26:21:4:1) xifo V
(4:28:6:2) xuliqa V
(9:118:4:1) xul~ifu V
(16:88:7:1) zido V
(4:148:10:1) Zulima V
(2:212:1:1) zuy~ina V
(3:185:11:1) zuHoziHa V
(2:214:16:2) zulozilu V
(81:7:3:1) zuw~ijato V
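
A list like the one above can be produced by a simple filter. The sketch below assumes that past passive verbs carry both a PERF and a PASS item in FEATURES; those tag names should be confirmed on the tagset page. The FEATURES text of the >unzila row is illustrative, not copied from the file, and the function name is hypothetical.

```python
def is_past_passive(tag, features):
    """True for verb rows whose FEATURES contain both PERF and PASS."""
    items = features.split("|")
    return tag == "V" and "PERF" in items and "PASS" in items


SAMPLE_ROWS = [
    # Illustrative FEATURES for a form that appears in the list above.
    ("(2:4:4:1)", ">unzila", "V", "STEM|POS:V|PERF|PASS|(IV)|LEM:>anzala|ROOT:nzl|3MS"),
    # The next two rows are quoted earlier in this post.
    ("(1:5:2:1)", "naEobudu", "V", "STEM|POS:V|IMPF|LEM:Eabada|ROOT:Ebd|1P"),
    ("(1:1:1:2)", "somi", "N", "STEM|POS:N|LEM:{som|ROOT:smw|M|GEN"),
]

past_passive = [
    (location, form)
    for location, form, tag, features in SAMPLE_ROWS
    if is_past_passive(tag, features)
]
print(past_passive)
```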
Examples:
(4:157:15:1)
وَقَوْلِهِمْ إِنَّا قَتَلْنَا الْمَسِيحَ عِيسَى ابْنَ مَرْيَمَ رَسُولَ اللَّهِ وَمَا قَتَلُوهُ وَمَا صَلَبُوهُ وَلَٰكِن شُبِّهَ لَهُمْ ۚ
That they said (in boast), "We killed Christ Jesus the son of Mary, the Messenger of Allah";- but they killed him not, nor crucified him, but so it was made to appear to them,

(6:118:3:1)
فَكُلُوا مِمَّا ذُكِرَ اسْمُ اللَّهِ عَلَيْهِ إِن كُنتُم بِآيَاتِهِ مُؤْمِنِينَ
So eat of (meats) on which Allah's name hath been pronounced, if ye have faith in His signs.


(5:3:23:1)
وَمَا ذُبِحَ عَلَى النُّصُبِ وَأَن تَسْتَقْسِمُوا بِالْأَزْلَامِ
and those which are sacrificed on stone altars, and [prohibited is] that you seek decision through divining arrows.

(81:7:3:1)
وَإِذَا النُّفُوسُ زُوِّجَتْ [٨١:٧]
When the souls are sorted out, (being joined, like with like);

In Salat every day we recite إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ [١:٥] You alone we worship. You alone we ask for help. Do you know how many times the detached pronoun (iyya = alone) occurs in the Quran? It occurs 24 times:
1. With 1st person singular: 5 times
2. With 1st person plural: 2 times
3. With 3rd person masculine singular: 8 times
4. With 3rd person masculine plural: 1 time
5. With 2nd person masculine singular: 2 times
6. With 2nd person masculine plural: 6 times
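
As a quick arithmetic check, the six counts can be added up in Python. The dictionary keys are just labels chosen for this sketch.

```python
# Counts of the detached pronoun iyya by person, as listed above.
IYYA_COUNTS = {
    "1st person singular": 5,
    "1st person plural": 2,
    "3rd person masculine singular": 8,
    "3rd person masculine plural": 1,
    "2nd person masculine singular": 2,
    "2nd person masculine plural": 6,
}

total = sum(IYYA_COUNTS.values())
print(total)  # 24
```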

1. فَإِيَّايَ فَارْهَبُونِ [١٦:٥١] then fear Me (and Me alone)."
2. وَقَالَ شُرَكَاؤُهُم مَّا كُنتُمْ إِيَّانَا تَعْبُدُونَ [١٠:٢٨] and their "Partners" shall say: "It was not us alone that ye worshipped!
3. يَا أَيُّهَا الَّذِينَ آمَنُوا كُلُوا مِن طَيِّبَاتِ مَا رَزَقْنَاكُمْ وَاشْكُرُوا لِلَّهِ إِن كُنتُمْ إِيَّاهُ تَعْبُدُونَ [٢:١٧٢] O ye who believe! Eat of the good things that We have provided for you, and be grateful to Allah, if it is Him alone ye worship.
4. نَّحْنُ نَرْزُقُكُمْ وَإِيَّاهُمْ We provide sustenance for you and for them;-
5. إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ [١:٥] You alone we worship. You alone we ask for help.
6. وَلَا تَقْتُلُوا أَوْلَادَكُمْ خَشْيَةَ إِمْلَاقٍ ۖ نَّحْنُ نَرْزُقُهُمْ وَإِيَّاكُمْ Kill not your children for fear of want: We shall provide sustenance for them as well as for you.

AL Hamdu lillah. This is rather easy with Quranic Arabic Corpus.

Want to know how many times the word Rahman occurs in the Quran? It is easy. Import the corpus data into Access 2007, select the text "LEM:r~aHoma`n" and copy it. Then right-click, go to Text Filters in the drop-down menu, select Contains, paste the text and click OK. You will get 57 occurrences.
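
The same lemma count can be sketched in Python. Unlike the Contains filter, this version matches the whole LEM item, so a longer lemma that merely starts with the same characters would not be counted. The two ADJ rows in the sample are illustrative and the function name is hypothetical; on the full file the result should be the 57 reported above.

```python
def count_lemma(feature_rows, lemma):
    """Count rows whose FEATURES contain exactly the item 'LEM:<lemma>'."""
    target = "LEM:" + lemma
    return sum(1 for features in feature_rows if target in features.split("|"))


SAMPLE_FEATURES = [
    # Illustrative rows; check the real file for the exact FEATURES text.
    "STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN",
    "STEM|POS:ADJ|LEM:r~aHiym|ROOT:rHm|MS|GEN",
    # Row quoted earlier in this post.
    "STEM|POS:N|LEM:{som|ROOT:smw|M|GEN",
]
print(count_lemma(SAMPLE_FEATURES, "r~aHoma`n"))
```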

Analyzing the Quranic Arabic Corpus morphological data by word ROOT is easy. For example, to filter all the words with the ROOT:wqy (whose derivatives include muttaqeen, taqwa, waq, etc.), copy the text ROOT:wqy in MS Access 2007, right-click and select Text Filters > Contains, paste and click OK. You will get all 258 occurrences of words with this root.
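
A root filter becomes more informative if the matches are grouped by lemma, which shows which derivatives a root produces and how often. Here is a minimal Python sketch; the wqy rows in the sample are illustrative rather than copied from the file, and the function name is hypothetical.

```python
from collections import Counter


def lemmas_for_root(feature_rows, root):
    """Count occurrences of each lemma among rows tagged with the given root."""
    target = "ROOT:" + root
    counts = Counter()
    for features in feature_rows:
        items = features.split("|")
        if target in items:
            lemma = next(
                (item[len("LEM:"):] for item in items if item.startswith("LEM:")),
                "?",  # placeholder when a row has a root but no lemma
            )
            counts[lemma] += 1
    return counts


SAMPLE_FEATURES = [
    # Illustrative rows for root wqy; check the real file for exact text.
    "STEM|POS:N|ACT|PCPL|(VIII)|LEM:mut~aqiyn|ROOT:wqy|MP|GEN",
    "STEM|POS:N|ACT|PCPL|(VIII)|LEM:mut~aqiyn|ROOT:wqy|MP|NOM",
    "STEM|POS:N|LEM:taqowaY|ROOT:wqy|M|GEN",
    # Row quoted earlier in this post (different root).
    "STEM|POS:N|LEM:{som|ROOT:smw|M|GEN",
]
by_lemma = lemmas_for_root(SAMPLE_FEATURES, "wqy")
print(by_lemma.most_common())
print(sum(by_lemma.values()))
```

On the full file, the sum over all lemmas of wqy should equal the 258 occurrences reported above.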

5 comments:

  1. Salam

    I have GP to do Arabic POS tagger, and I don't know if the Jqurantree library and Quran Corpus Data will be useful and helpful ?

    please give me advice or guidelines

  2. Dear Abdullah,

    و عليكم السلام و رحمة الله
    POS has already been tagged by QAC in QAC Morphology Data 0.4. So there is no need of tagging, but you can of course suggest correction in discussion board.

    Since I know nothing about Java, I have to analyze the data using MS Access 2007 and Excel 2007. Advanced users use XML and the Java API for effective analysis.

    1. Thank you very much Dr.Fazlul

      I appreciate your respect and your help

  3. I usually use MS Access 2007 and Excel 2007 for data analysis of Quranic Arabic Morphology Data 0.4. I can help you in this regard if you need it.

  4. Can we download and export it with arabic text? So that we can analyze and sort it with arabic text.

    Thanks, very helpful indeed.
