Thursday, September 6, 2012

How To Analyze Quranic Arabic Corpus morphological data 0.4

If you want to analyse the Quranic Arabic Corpus, download the morphological data from corpus.quran.com/download/ and import the txt file into MS Access 2007/2010. You can then use the Query option to get the results you want, although analysis based on the FEATURES column is a little tricky. Once imported, the Access table has four columns: LOCATION, FORM, TAG and FEATURES.

Before analyzing the Quranic Arabic Corpus morphological data 0.4, you have to learn some terms of corpus linguistics.

In linguistics, a morpheme is the smallest semantically meaningful unit in a language. The field of study dedicated to morphemes is called morphology. Morphemes are of two types: free and bound. A morpheme (or word element) that can stand alone as a word is called free. It is sometimes called a stem, because other non-free elements are added to it.

In morphology, a bound morpheme is a morpheme that appears only as part of a larger word. Bound morphemes are sometimes called affixes.

Affixes are of three main types: prefix, infix and suffix. A fourth type, the circumfix, also exists. All of them are bound morphemes.
Prefixes are bound morphemes which occur before other morphemes. Examples: un- (uncover, undo)
Infixes are bound morphemes which are inserted into other morphemes. English has no regular infixes; the closest analogy is an internal change such as food > feed.
Suffixes are bound morphemes which occur following other morphemes.
Examples:
-er (singer, performer)
-ist (typist, pianist)
-ly (manly, friendly)

The Quranic Arabic Corpus morphological data 0.4 uses these and other related linguistic terms.

Let me explain the columns.
LOCATION is the Surah:Ayah:word:morpheme reference in the Quran. FORM is the transliteration of the surface Arabic word form into Latin characters, based on Buckwalter transliteration. See the chart:
http://corpus.quran.com/java/buckwalter.jsp

TAG is the lexical or grammatical category of the morpheme concerned. FEATURES describe the detailed linguistic features of the morpheme.
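
As a minimal sketch of how one row breaks down into these columns, the Python below splits a single line into LOCATION, FORM, TAG and FEATURES and then splits LOCATION into its four numbers. It assumes the columns in the txt file are separated by tabs; the names Segment and parse_row are made up for this illustration and are not part of the corpus.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One morpheme row of the data (hypothetical helper type)."""
    surah: int
    ayah: int
    word: int
    morpheme: int
    form: str
    tag: str
    features: str


def parse_row(line: str) -> Segment:
    """Split one data row into its columns (assumes tab-separated columns)."""
    location, form, tag, features = line.rstrip("\n").split("\t")
    # LOCATION looks like "(1:1:1:2)": drop the brackets, then split on ':'
    surah, ayah, word, morpheme = (int(n) for n in location.strip("()").split(":"))
    return Segment(surah, ayah, word, morpheme, form, tag, features)


row = parse_row("(1:1:1:2)\tsomi\tN\tSTEM|POS:N|LEM:{som|ROOT:smw|M|GEN")
print(row.surah, row.ayah, row.word, row.morpheme, row.form, row.tag)
```

Keeping the four parts of LOCATION as integers makes it easy to sort rows or to pick out all morphemes of one ayah.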

Description of FEATURES
In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (headword). In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma. Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme.

Difference between stem and lemma
In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-", because there are related words such as "production". In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed.

For explanations of the other abbreviated terms, go to:
http://corpus.quran.com/documentation/tagset.jsp

For verb forms, refer to:
http://corpus.quran.com/documentation/verbforms.jsp

The First Word of Quran Bismi
The first word of the Quran, bismi, consists of two morphemes: bi, which is a prefix, and somi, which is a noun and the stem. (Don't read the "o" in somi like the English "o"; in Buckwalter transliteration it is the symbol for sukun.) In the FEATURES column, POS means part of speech and N means noun. Its lemma is {som (where the hamzah is dropped because of widespread use), which is derived from the triliteral ROOT smw, i.e. س م و. M marks it as a masculine noun, and GEN shows that it is in the genitive case (جرّ), here because it follows the preposition bi.
LOCATION FORM TAG FEATURES
(1:1:1:1) bi P PREFIX|bi+
(1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN
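
The FEATURES string can also be taken apart programmatically. The sketch below assumes only what the two rows above show: parts are separated by "|", some parts are key:value pairs (POS, LEM, ROOT) and the rest are bare flags (STEM, M, GEN). The function name parse_features and the "flags" key are invented for this example.

```python
def parse_features(features: str) -> dict:
    """Turn 'STEM|POS:N|LEM:{som|ROOT:smw|M|GEN' into a dictionary.

    key:value parts become dictionary entries; parts without a colon
    are collected, in order, under the "flags" key.
    """
    parsed = {"flags": []}
    for part in features.split("|"):
        if ":" in part:
            # Split on the first colon only, so the value is kept whole
            key, value = part.split(":", 1)
            parsed[key] = value
        else:
            parsed["flags"].append(part)
    return parsed


info = parse_features("STEM|POS:N|LEM:{som|ROOT:smw|M|GEN")
print(info["POS"], info["LEM"], info["ROOT"], info["flags"])
```

The same function works on the prefix row: "PREFIX|bi+" has no key:value parts, so both pieces end up in the flags list.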

The First Explicit Verb of the Quran
The first explicit verb of the Quran is the 2nd word of the fifth verse of the first chapter, Al-Fatihah:
(1:5:2:1) naEobudu V STEM|POS:V|IMPF|LEM:Eabada|ROOT:Ebd|1P
This is an IMPERFECT verb (present-future tense) used in the 1st person plural.

The Second Verb
(1:5:4:1) nasotaEiynu V STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P
This is also an IMPERFECT verb, used in Form X, and the ROOT is Ewn, i.e. ع و ن
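
Since the ROOT values are written in Buckwalter transliteration, a small lookup table is enough to show them in Arabic script. The sketch below covers only a subset of consonants from the Buckwalter chart, which is enough for the three roots quoted in this post; the names BUCKWALTER_CONSONANTS and root_to_arabic are made up for this example, and vowels and diacritics are deliberately left out.

```python
# Consonant subset of the Buckwalter scheme; vowels and diacritics are omitted.
BUCKWALTER_CONSONANTS = {
    "'": "ء", "A": "ا", "b": "ب", "t": "ت", "v": "ث", "j": "ج", "H": "ح",
    "x": "خ", "d": "د", "*": "ذ", "r": "ر", "z": "ز", "s": "س", "$": "ش",
    "S": "ص", "D": "ض", "T": "ط", "Z": "ظ", "E": "ع", "g": "غ", "f": "ف",
    "q": "ق", "k": "ك", "l": "ل", "m": "م", "n": "ن", "h": "ه", "w": "و",
    "y": "ي",
}


def root_to_arabic(root: str) -> str:
    """Convert a Buckwalter root such as 'Ewn' to space-separated Arabic letters.

    Raises KeyError for any character that is not in the subset above.
    """
    return " ".join(BUCKWALTER_CONSONANTS[ch] for ch in root)


for root in ("smw", "Ebd", "Ewn"):
    print(root, "=", root_to_arabic(root))
```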

How To Analyze:
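Download the txt file, then copy and paste it into Excel 2007/2010. Excel 2003 won't help, because a worksheet there is limited to 65,536 rows and the data has about 128,000.
The rows and columns will be separated automatically. From here the analysis depends on what you want out of the QAC.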

If you want to know how many prepositions are used in the Quran, you can do so by auto-filtering the TAG column: choose Data > Filter, deselect 'Select All' in the drop-down and check P. You will get all the prepositions used in the Quran. How many?
In the first blank cell at the bottom of column C, write the formula =COUNTIF(C1:C128215, "P") and press ENTER; you will get 13006. Unfortunately, you will not get this figure from the site
http://corpus.quran.com/morphologicalsearch.jsp
There you will get only 7679, because only prepositions that are stems are counted, not the prefixed and suffixed prepositions. There are 7679 stem prepositions, 5325 prefix prepositions and 2 suffix prepositions in the Quran, so the total is 7679 + 5325 + 2 = 13006.
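
The same count can be sketched outside Excel. The Python below counts the rows with a given TAG and groups them by the first part of FEATURES (STEM, PREFIX or SUFFIX), which is the split behind the stem, prefix and suffix figures above. It runs only on three rows quoted in this post, so it does not reproduce the totals; the names SAMPLE_ROWS and count_tag_by_segment are invented for this example, and reading the full file is left out.

```python
from collections import Counter

# Three rows quoted in this post: (LOCATION, FORM, TAG, FEATURES)
SAMPLE_ROWS = [
    ("(1:1:1:1)", "bi", "P", "PREFIX|bi+"),
    ("(1:1:1:2)", "somi", "N", "STEM|POS:N|LEM:{som|ROOT:smw|M|GEN"),
    ("(1:5:2:1)", "naEobudu", "V", "STEM|POS:V|IMPF|LEM:Eabada|ROOT:Ebd|1P"),
]


def count_tag_by_segment(rows, tag):
    """Count rows whose TAG equals `tag`, grouped by segment type.

    The segment type is taken to be the first '|'-separated part of
    FEATURES (STEM, PREFIX or SUFFIX).
    """
    counts = Counter()
    for _location, _form, row_tag, features in rows:
        if row_tag == tag:
            counts[features.split("|", 1)[0]] += 1
    return counts


counts = count_tag_by_segment(SAMPLE_ROWS, "P")
print(dict(counts), "total:", sum(counts.values()))
```

Run over every row of the file, the total for "P" should correspond to the COUNTIF result, and the STEM, PREFIX and SUFFIX groups to the three figures given above.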

The Quranic Arabic Corpus morphological data 0.4 is also very helpful for finding specific data. For example, if you want to know the past passive verbs used in the Quran, you can find them within seconds. Here is the list of past passive verbs used in the Quran. (FORM is the passive form; go to the ayah and check it.)

LOCATION FORM TAG
(4:157:15:1) $ub~iha V
(6:118:3:1) *ukira V
(5:3:23:1) *ubiHa V
(5:13:15:1) *uk~iru V
(76:14:4:2) *ul~ilato V
(2:283:16:1) {&otumina V
(33:11:2:1) {botuliYa V
(2:173:14:1) {DoTur~a V
(14:26:6:1) {jotuv~ato V
(7:75:8:1) {sotuDoEifu V
(42:16:8:1) {sotujiyba V
(5:44:17:1) {sotuHofiZu V
(6:10:2:1) {sotuhozi}a V
(2:166:4:1) {t~ubiEu V
(11:110:5:2) {xotulifa V
(54:9:9:2) {zodujira V
(22:39:1:1) >u*ina V
(2:24:12:1) >uEid~ato V
(9:58:7:1) >uEoTu V
(10:22:28:1) >uHiyTa V
(2:187:1:1) >uHil~a V
(4:128:18:2) >uHoDirati V
(69:5:3:2) >uholiku V
(4:25:36:1) >uHoSi V
(77:12:3:1) >uj~ilato V
(7:120:1:2) >uloqiYa V
(4:60:21:1) >umiru V
(18:56:17:1) >un*iru V
(2:4:4:1) >unzila V
(72:10:5:1) >uriyda V
(4:91:13:1) >urokisu V
(7:6:3:1) >urosila V
(9:108:6:1) >us~isa V
(2:25:25:2) >utu V
(11:60:1:2) >utobiEu V
(6:19:11:2) >uwHiYa V
(7:43:33:1) >uwrivo V
(8:70:18:1) >uxi*a V
(2:246:40:1) >uxorijo V
(2:93:15:2) >u$oribu V
(6:70:34:1) >ubosilu V
(3:185:14:2) >udoxila V
(22:22:8:1) >uEiydu V
(51:9:4:1) >ufika V
(10:27:16:1) >ugo$iyato V
(71:25:3:1) >ugoriqu V
(2:173:9:1) >uhil~a V
(11:1:3:1) >uHokimato V
(2:196:6:1) >uHoSiro V
(5:109:7:1) >ujibo V
(16:106:9:1) >ukoriha V
(25:40:6:1) >umoTirato V
(77:11:3:1) >uq~itato V
(11:116:23:1) >utorifu V
(3:195:22:2) >uw*u V
(32:17:5:1) >uxofiYa V
(26:90:1:2) >uzolifati V
(2:101:14:1) >uwtu V
(27:8:5:1) buwrika V
(22:60:9:1) bugiYa V
(16:58:2:1) bu$~ira V
(82:4:3:1) buEovirato V
(2:258:36:2) buhita V
(26:91:1:2) bur~izati V
(56:5:1:2) bus~ati V
(2:282:77:1) duEu V
(2:61:37:2) Duribato V
(33:14:2:1) duxilato V
(69:14:4:2) duk~a V
(16:126:6:1) Euwqibo V
(2:178:16:1) EufiYa V
(6:91:31:2) Eul~imo V
(18:48:1:2) EuriDu V
(11:28:14:2) Eum~iyato V
(81:4:3:1) EuT~ilato V
(5:107:2:1) Euvira V
(16:71:10:1) fuD~ilu V
(34:54:7:1) fuEila V
(11:1:6:1) fuS~ilato V
(21:96:3:1) futiHato V
(16:110:9:1) futinu V
(82:3:3:1) fuj~irato V
(77:9:3:1) furijato V
(34:23:11:1) fuz~iEa V
(5:64:6:1) gul~ato V
(7:119:1:2) gulibu V
(11:44:7:2) giyDa V
(27:17:1:2) Hu$ira V
(34:54:1:2) Hiyla V
(3:101:14:1) hudiYa V
(69:14:1:2) Humilati V
(84:2:3:2) Huq~ato V
(3:50:11:1) Hur~ima V
(4:86:2:1) Huy~iy V
(22:40:18:2) hud~imato V
(76:21:6:2) Hul~u V
(20:87:7:1) Hum~ilo V
(100:10:1:2) HuS~ila V
(39:69:7:2) jiA@Y^'a V
(16:124:2:1) juEila V
(26:38:1:2) jumiEa V
(3:184:4:1) ku*~iba V
(12:110:8:1) ku*ibu V
(17:35:4:1) kilo V
(54:14:6:1) kufira V
(13:31:12:1) kul~ima V
(2:178:4:1) kutiba V
(11:55:3:2) kiydu V
(81:11:3:1) ku$iTato V
(27:90:4:2) kub~ato V
(58:5:6:1) kubitu V
(26:94:1:2) kubokibu V
(81:1:3:1) kuw~irato V
(5:64:8:2) luEinu V
(3:159:5:1) lin V
(23:35:4:1) mi V
(12:63:7:1) muniEa V
(84:3:3:1) mud~ato V
(18:18:20:3) muli}o V
(34:7:10:1) muz~iqo V
(7:43:29:2) nuwdu V
(68:49:7:2) nubi*a V
(18:99:7:2) nufixa V
(4:161:4:1) nuhu V
(12:110:11:2) nuj~iYa V
(6:37:3:1) nuz~ila V
(81:10:3:1) nu$irato V
(21:65:2:1) nukisu V
(74:8:2:1) nuqira V
(88:19:4:1) nuSibato V
(77:10:3:1) nusifato V
(59:11:23:1) quwtilo V
(2:11:2:1) qiyla V
(54:12:9:1) qudira V
(2:210:12:2) quDiYa V
(7:204:2:1) quri}a V
(13:31:8:1) quT~iEato V
(3:144:13:1) qutila V
(12:26:13:1) qud~a V
(33:61:5:2) qut~ilu V
(6:45:1:2) quTiEa V
(4:91:10:1) rud~u V
(88:18:4:1) rufiEato V
(41:50:17:1) r~ujiEo V
(2:25:14:1) ruziqu V
(56:4:2:1) ruj~ati V
(2:108:7:1) su}ila V
(11:77:5:1) siY^'a V
(40:37:15:2) Sud~a V
(47:15:40:2) suqu V
(7:47:2:1) Surifato V
(39:71:1:2) siyqa V
(13:33:30:2) Sud~u V
(81:12:3:1) suE~irato V
(11:108:3:1) suEidu V
(81:6:3:1) suj~irato V
(15:15:3:1) suk~irato V
(7:149:2:1) suqiTa V
(88:20:4:1) suTiHato V
(13:31:4:1) suy~irato V
(39:73:18:1) Tibo V
(9:87:6:2) TubiEa V
(8:2:10:1) tuliyato V
(5:27:10:2) tuqub~ila V
(77:8:3:1) Tumisato V
(3:112:6:1) vuqifu V
(83:36:2:1) vuw~iba V
(3:96:4:1) wuDiEa V
(13:35:4:1) wuEida V
(3:25:8:2) wuf~iyato V
(12:75:4:1) wujida V
(19:15:4:1) wulida V
(7:20:7:1) wu,riYa V
(32:11:6:1) wuk~ila V
(6:27:4:1) wuqifu V
(26:21:4:1) xifo V
(4:28:6:2) xuliqa V
(9:118:4:1) xul~ifu V
(16:88:7:1) zido V
(4:148:10:1) Zulima V
(2:212:1:1) zuy~ina V
(3:185:11:1) zuHoziHa V
(2:214:16:2) zulozilu V
(81:7:3:1) zuw~ijato V
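
A list like the one above can be produced with a simple filter on TAG and FEATURES. The sketch below assumes that past (perfect) verbs carry the flag PERF and passive verbs the flag PASS in the FEATURES column, following the abbreviations on the tagset page; check this against the file before relying on it. The first sample row is only an illustration written in the style of the file, not copied from it, and the name is_past_passive is made up for this example.

```python
def is_past_passive(tag: str, features: str) -> bool:
    """True for a verb row whose FEATURES contain both PERF and PASS.

    Assumption: the data marks perfect verbs with PERF and passive
    verbs with PASS as separate '|'-delimited flags.
    """
    flags = features.split("|")
    return tag == "V" and "PERF" in flags and "PASS" in flags


rows = [
    # Illustrative FEATURES string (assumed, not copied from the file)
    ("(2:4:4:1)", ">unzila", "V", "STEM|POS:V|PERF|PASS|(IV)|LEM:>anzala|ROOT:nzl|3MS"),
    # Row quoted earlier in this post: imperfect, so it is filtered out
    ("(1:5:2:1)", "naEobudu", "V", "STEM|POS:V|IMPF|LEM:Eabada|ROOT:Ebd|1P"),
]

for location, form, tag, features in rows:
    if is_past_passive(tag, features):
        print(location, form, tag)
```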
