CHAPTER IV
CONTENT ANALYSIS OF ARABIC TEXTS
4.1 First Design (Optimum)
4.1.1 Subjects
The following books and articles were selected from different disciplines. Together they total 165,604 words, or 960,267 characters (with spaces).
1- نحو تقويم جديد للكتابة العربية. Talib Abdurrahman. Muharram 1420 A.H. Al-Ummah Book Series, no. 69. Doha, Qatar.
2- الارتقاء بالعربية في وسائل الإعلام. Nooruddin Bulabel. Dhu al-Qi`dah 1422 A.H. Al-Ummah Book Series, no. 84. Doha, Qatar.
3- كردستان العراق: باقة لألوان الطيف. Al-Arabi Magazine. ISSUE NO 525, 1 August 2002.
4- السودان.. قارة تسكن بلداً. Al-Arabi Magazine. ISSUE NO 523, 1 June 2002.
5- العربي تظفر بجائزة العويس الـثقافية.. ليست مجلة ولكنها ساحة للأفكار الـخلاقة. Al-Arabi Magazine. ISSUE NO 522, 1 May 2002.
6- دبي اقتصاد المعرفة وثورة الميديا والمعـلوماتية. Al-Arabi Magazine. ISSUE NO 524, 1 July 2002.
7- الحياة.. معركة التحدي :دنيا وآخرة. Fathi Mari’. Al-Ahram Newspaper. August 17, 2002.
8- في فقه الأولويات، دراسة جديدة في ضوء القرآن والسنة، الطبعة رقم 1، يوسف القرضاوي. Almaktab-Alislami Publications. Beirut. 1999.
9- مسؤولون فلسطينيون يرجحون قيادة جماعية ثلاثية تتولى المناصب التي يحتلها عرفات في حال غيابه. Muhammad Ali al Qulaibi. Asharqalawsat Newspaper. August 17, 2002.
10- أقل الكلام: الاردن والبعد الاسرائيلي للحرب الامريكية على العراق. Urabe al Rintawi. Addustour Newspaper. August 17, 2002.
11- الصورة تغني عن الكلمات . PC Magazine, Arabic Edition. June 2002.
4.1.2 Tools
The documents were combined into a single Microsoft Word document in text format. MS Excel was used for the calculations.
4.1.3 Procedure
Each character was searched for in the Word document using the Find and Replace function, with the same character entered in both the Find what and Replace with fields. Replace All was then pressed, and the number of replacements reported was taken as the frequency count of that character. The More option was used to match the Kashida, Alef Hamza, diacritics, and special characters. The counts were entered into a table and sorted. The table was then imported into MS Excel, where the total number of characters was calculated. A formula then computed the percentage of usage for each character by dividing its count by the total number of characters and multiplying by 100. The results can be seen in Appendix A.
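The counting and percentage steps above can also be sketched programmatically. The following is a minimal illustration, not the procedure used in this study (which relied on Word and Excel). The function name char_frequencies, the decision to skip line breaks while counting spaces, and the sample string are all assumptions made for the example.

```python
from collections import Counter

def char_frequencies(text, skip=("\n", "\r")):
    """Count every character in text (spaces included, line breaks skipped).

    Returns (rows, total), where rows is a list of
    (character, count, percent) tuples sorted by descending count.
    """
    counts = Counter(ch for ch in text if ch not in skip)
    total = sum(counts.values())
    # percent = count / total * 100, as in the Excel step described above
    rows = [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]
    return rows, total

# Hypothetical sample text; the study's corpus is not reproduced here.
sample = "من البيت إلى البيت"
rows, total = char_frequencies(sample)
for ch, n, pct in rows:
    # Print the code point rather than the glyph to stay console-safe.
    print(f"U+{ord(ch):04X}  count={n}  share={pct:.1f}%")
```

Because each code point is counted separately, Alef with Hamza, the Kashida, and the diacritics are tallied as distinct characters, which corresponds to what the More option was used for in the manual procedure.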
4.1.4 Limitations of the Study
The findings of this study are not conclusive. In order to make decisions on such matters, more rigorous research should be done, taking into consideration the following:
1- The study does not address the issue of frequent letter combinations on a wide scale, nor does it address immediately adjacent digraphs on the same hand. According to the Gilbreths' principles of motion, sequential movements are most easily performed by alternating the hands with each motion. This develops a rhythm which not only improves speed but also spreads the work evenly.[1] Therefore, in frequent letter combinations like ال, etc., the keys should be struck by alternate hands (and ideally with the same finger on each hand). Whether this is always the case can be argued. Nevertheless, the study will take the most frequent letter combinations into account (ال, من, لم, في, إلى, أن, etc.) by making use of an admittedly simple and crude survey in Appendix B, also performed by the author of this paper.
2- Same-finger travel between the upper and lower rows is not addressed. Frequent letter sequences which cause travel between the upper and lower rows should be avoided. Putting the common letters on the home row avoids the phenomenon known as a "hurdle," where the same finger must travel from the top row to the bottom (or vice versa), "hurdling" over the home row. On the QWERTY layout there are over 1,200 common English words with multiple hurdles, such as the word "number," where the first three characters are struck with the right index finger. Hurdles are a common cause of spelling errors, because fingers tend to get "lost" while hurdling.
3- The study does not systematically consider the most frequent words in the Arabic language. However, the most frequent words will be addressed by using a list compiled by Bonnie Glover Stalls and Yaser Al-Onaizan as part of the NLP research activities at the University of Southern California's Information Sciences Institute[2].
The effects of the previous issues do not seem to be very large, and they seem to be tightly linked to (and thereby a function of) letter frequency, which is the main premise of this study.
4- The frequency count of Arabic characters does not take into account common spelling mistakes, such as neglecting the Hamza in the letters أ and إ, or putting a comma rather than a period at the end of a sentence. Nor does the analysis take into account the Egyptian practice of writing the letter Yaa ي as alif maqSuura ى. This causes problems because Egyptians configure their machines and applications to display ي as ى, so they are free to type ي in place of ى and find nothing wrong with it, since it is always displayed as ى. But when such a document is transferred to a machine that does not have this configuration, or is displayed on a machine outside Egypt, the word that was intended to be إلى appears as إلي.
5- This study does not run field tests to validate the superiority of the new designs.
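The "hurdle" notion in limitation 2 can be made concrete with a small sketch. It counts adjacent letter pairs in which the same finger jumps between the upper and lower rows. The finger and row assignments below cover only a handful of QWERTY keys under conventional touch-typing fingering, and the names KEYS and count_hurdles are hypothetical, introduced purely for illustration.

```python
# Illustrative subset of QWERTY keys: letter -> (finger, row).
# Rows: 1 = upper, 0 = home, -1 = lower. This is NOT a full layout.
KEYS = {
    "y": ("R-index", 1), "u": ("R-index", 1),
    "h": ("R-index", 0), "j": ("R-index", 0),
    "n": ("R-index", -1), "m": ("R-index", -1),
    "r": ("L-index", 1), "b": ("L-index", -1),
    "e": ("L-middle", 1),
}

def count_hurdles(word, keys=KEYS):
    """Count adjacent pairs typed by one finger that skip over the home row."""
    hurdles = 0
    for a, b in zip(word, word[1:]):
        if a not in keys or b not in keys:
            continue  # letters outside the illustrative subset are ignored
        finger_a, row_a = keys[a]
        finger_b, row_b = keys[b]
        # A hurdle is an upper<->lower move by the same finger.
        if finger_a == finger_b and row_a * row_b == -1:
            hurdles += 1
    return hurdles

print(count_hurdles("number"))  # 2: n->u and u->m are both hurdles
```

The same check could in principle be run over an Arabic word list against a candidate layout, which is the kind of analysis this study leaves for future work.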
4.1.5 Design Methodology[3]
1- The study will favor the right hand a little more than the left hand, since most people are right-handed. Dvorak's keyboard uses the right hand 56% of the time and the left hand 44%.
2- The study will try to place the most-used letters on the home keys (where your fingers rest when not being used), excluding the little finger (pinkie), which will be included in priority number four in this list.[4]
3- Next best is the home row (the middle line of alphabetic keys, where your fingers rest naturally).
4- Next best is the upper row, excluding the little-finger keys and the middle keys of the upper row, which require significant finger travel and will be included first in priority number six.
5- Next best is the middle of the lower row.
6- Least good are the outer ends of the lower row, since reaching for keys with the little finger is harder.
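The priority list above amounts to a greedy rule: the most frequent characters take the best remaining key positions. A minimal sketch of that rule follows. It deliberately ignores the right-hand bias and the digraph considerations, so it does not reproduce the layout of this study. The slot labels, the function name assign_by_priority, and the counts are invented placeholders, not values from Appendix A.

```python
def assign_by_priority(freq, tiers):
    """Greedily map characters to key slots.

    freq  : dict of character -> count
    tiers : list of lists of slot labels, best tier first
    The most frequent character gets the first slot of the best tier,
    and so on. Characters beyond the available slots are left unassigned.
    """
    slots = [slot for tier in tiers for slot in tier]
    ranked = sorted(freq, key=freq.get, reverse=True)
    return dict(zip(ranked, slots))

# Hypothetical slot labels, ordered roughly as in the priority list above.
tiers = [
    ["home-R-index", "home-L-index", "home-R-middle", "home-L-middle"],
    ["home-R-inner", "home-L-inner"],
    ["upper-R-index", "upper-L-index"],
]
# Placeholder counts for illustration only.
freq = {"ا": 120, "ل": 95, "ي": 60, "م": 50, "ن": 48, "و": 40}

layout = assign_by_priority(freq, tiers)
for ch, slot in layout.items():
    print(f"U+{ord(ch):04X} -> {slot}")
```

A fuller version would add a second pass that swaps keys so that frequent digraphs fall on alternate hands.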
4.1.6 Analysis and Design
1- The combination ال (meaning “the”) is by far the most frequent combination. Since the keys should ideally be struck by alternate hands (and ideally with the same finger on each hand), their ideal place is on the home keys, which do not require the fingers to travel, typed by the right and left index fingers respectively.
2- The next most frequent letter combination is من. Its letters were placed so as to be typed by the right and left middle fingers respectively.
3- The rest of the characters were positioned in a similar manner: by applying the stated method, and considering letter combinations when possible.
4- Since 0, 1, 2, 3 and 4 are the most frequently typed digits, their location on the number row is on the right side, so that they are typed by the right hand.
5- The characters on the number row with the Shift are insignificant in frequency as can be seen in Appendix A. They are left in the same locations as the standard Arabic keyboard to reduce re-training confusion.
6- The diacritics ( ُ ٌ َ ً ِ ٍ ْ) were left in the same locations as the standard Arabic keyboard to reduce re-training confusion, since their frequency is also insignificant.
7- The Arabic thousands separator (,)[5] is placed on the same key as the period but is typed with the Shift key.
The resulting layout is shown in Figure 5-1.
Figure 5-1: First Design of Alternative Layout

As we can see from the resulting layout, three character locations typed with the Shift are now blank due to the elimination of the ligatures (لا لأ لإ لآ).
From the table of frequency of two-letter combinations (digraphs) in Appendix B, the sum of digraphs typed by alternating hands is 243,065, while the sum of digraphs typed with the same hand is 33,685. Alternating-hand digraphs thus make up about 88% of those counted. Therefore, the principle of alternating hands has been achieved, albeit only for the combinations of the most frequent letters.
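The alternation figure can be checked with a short calculation. The sketch below first computes the share from the two totals reported above, then shows how such totals could be derived from a digraph table and a hand assignment. The function name alternation_share and the sample table are hypothetical. Only the hand assignments of ا, ل, م and ن follow the design described in this section.

```python
def alternation_share(digraph_counts, hand_of):
    """Split digraph counts into alternating-hand and same-hand totals.

    digraph_counts : dict of two-character string -> count
    hand_of        : dict of character -> "L" or "R"
    Returns (alternating, same_hand, percent_alternating).
    """
    alt = same = 0
    for digraph, n in digraph_counts.items():
        a, b = digraph
        if a not in hand_of or b not in hand_of:
            continue  # skip digraphs with unmapped characters
        if hand_of[a] != hand_of[b]:
            alt += n
        else:
            same += n
    total = alt + same
    pct = 100.0 * alt / total if total else 0.0
    return alt, same, pct

# Totals reported in the text for the first design.
alt_total, same_total = 243_065, 33_685
print(round(100 * alt_total / (alt_total + same_total), 1))  # 87.8

# Hypothetical mini-table; the counts are placeholders, not Appendix B data.
hand_of = {"ا": "R", "ل": "L", "م": "R", "ن": "L"}
sample = {"ال": 10, "من": 5, "لن": 2}
alt, same, pct = alternation_share(sample, hand_of)
print(alt, same)  # 15 2
```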
Frequent Arabic words such as من, في, لي, التي are on the home row. Words such as إلى, على, في, أن, إن do not require a Shift for their letters (as opposed to the standard keyboard). Words such as هذه, هذا, الذي, لماذا, لذلك do not require the hand to leave the basic letter pad for the letter ذ, as they do on the standard keyboard.
Words in which a character is typed with the same finger as the previous character are much harder to type. By observation, this design does necessitate the typing of such letter combinations.
4.2 Second Design (Optimal)
4.2.1 Design Methodology
This design seeks only to remedy the problematic characters listed in Chapter 1. It mainly swaps the frequent but inaccessible characters (. ، د ج ذ) with the infrequent but accessible ones (ؤ لا ء ئ ز). This also involves relocating the letter و. The characters (ء ئ ؤ), being very infrequent, will be moved to the Shift combinations in place of the current ligatures (لأ, لإ, لآ). These ligatures are remnants of the mechanical keyboard but are now redundant because of contextual analysis; they will therefore be removed from the design and replaced by the letters (ء ئ ؤ) respectively.
The results can be seen in Figure 5-2. The figure shows only the changed keys; all other keys are left intact.
Figure 5-2: Second Design of Alternative layout

As we can see, the د, which is very frequent, is in a fairly accessible place in this design rather than on the key on the top row second to the left. Not only is the little finger the shortest, it also had to go the furthest to reach the د key (the greatest finger travel), a very unreasonable position.
The period and comma in this design correspond to their locations on the QWERTY keyboard, thereby eliminating confusion for bilingual typists. They also no longer require a Shift, a requirement that is unreasonable since most sentences end with either a period or a comma.
The ج, which is also a frequent letter, was likewise relocated to a more accessible key. Its former key was filled by the ز, which is one of the least frequent letters.
The letter ذ, which is rather frequent, is now part of the basic layout of the keyboard in this design, as opposed to the far-left key of the number row.
This design differs from the current standard Arabic keyboard in only 10 keys. This is less than the difference between the Apple and IBM-clone layouts, between which migration takes an average of 7 days (based on my own experience). The transition should not be very hard for most people, and should be more feasible both economically and in terms of re-training.
The basic Arabic keyboard is already based on a reasonably efficient layout, except for those few problematic letters. Remedying these letters would achieve the greatest improvements in speed, error rate, and comfort, and the design would be less likely to be rejected.
[1] In "Touch" With the Past, The Quest, Newsletter of the Gilbreth Network. (Volume 4, Number 2), Summer 2000
[3] This study does not seek to radically change the physical structure of the keyboard, although some might argue that it is necessary. The thumb, for example, is the strongest finger on either hand. It does not make sense to use it only for the space bar. Further, most people consistently use only one thumb for the space bar, leaving the other thumb, which is the strongest finger on the hand, idle. The little finger (pinkie), on the other hand, is overused (it is used for the Shift key on both hands and requires the maximum finger-pushing force). However, these issues relate to a radical change in the physical structure of the keyboard, and are beyond the scope of this study.
[4] Since the little finger is shorter, it has to go further to reach its keys. It is also the main source of pain in RSI sufferers.
[5] 22,000.11 in Hindi numerals looks like this