APPENDIX A
Frequency table of the characters available on a standard Arabic IBM-clone keyboard. Symbols and special characters not available on the keyboard are excluded. The counts were obtained from a corpus of 165,604 words drawn from different disciplines, amounting to 960,267 characters (including spaces).
| Rank | Character | Description | Number of Occurrences | Percent |
|---|---|---|---|---|
| 1 | | Space | 165,924 | 17.11 |
| 2 | | Alif | 108,456 | 11.18 |
| 3 | | Laam | 98,086 | 10.12 |
| 4 | | Yaa' | 52,139 | 5.38 |
| 5 | | Miim | 48,347 | 4.99 |
| 6 | | Waaw | 42,163 | 4.35 |
| 7 | | Nuun | 39,365 | 4.06 |
| 8 | | Taa' | 30,794 | 3.18 |
| 9 | | Baa' | 29,293 | 3.02 |
| 10 | | Raa' | 28,955 | 2.99 |
| 11 | | Haa' | 27,593 | 2.85 |
| 12 | | 'Ayn | 26,499 | 2.73 |
| 13 | | Faa' | 21,190 | 2.19 |
| 14 | | Taa' marbuuta | 21,066 | 2.17 |
| 15 | | Daal | 19,356 | 2.00 |
| 16 | | Qaaf | 18,180 | 1.87 |
| 17 | | Hamza-on-alif | 16,871 | 1.74 |
| 18 | ، | Arabic comma | 16,686 | 1.72 |
| 19 | | Siin | 15,057 | 1.55 |
| 20 | | Kaaf | 14,965 | 1.54 |
| 21 | | Haa' | 12,107 | 1.25 |
| 22 | | Jiim | 9,670 | 1.00 |
| 23 | | Hamza-under-alif | 7,836 | 0.81 |
| 24 | | Saad | 7,535 | 0.78 |
| 25 | | Dhaal | 7,477 | 0.77 |
| 26 | | Alif maqsuura | 7,447 | 0.77 |
| 27 | . | Period, decimal point / full stop | 7,222 | 0.74 |
| 28 | | Taa' | 5,966 | 0.62 |
| 29 | | Xaa' | 5,612 | 0.58 |
| 30 | | Shiin | 5,268 | 0.54 |
| 31 | | Carriage return | 5,108 | 0.53 |
| 32 | | Thaa' | 4,990 | 0.51 |
| 33 | | Daad | 4,725 | 0.49 |
| 34 | | Ghayn | 3,515 | 0.36 |
| 35 | | Zaa' | 3,263 | 0.34 |
| 36 | ً | تنوين الفتح Tanween fateh | 2,828 | 0.29 |
| 37 | : | Colon | 2,448 | 0.25 |
| 38 | | Hamza-on-the-line | 2,439 | 0.25 |
| 39 | | Hamza-on-yaa' | 2,341 | 0.24 |
| 40 | ـ | Kashida (tatweel)[3] | 2,110 | 0.22 |
| 41 | “ | Quotation mark | 2,048 | 0.21 |
| 42 | | Zaa' | 1,879 | 0.19 |
| 43 | ) | Left parenthesis | 1,471 | 0.15 |
| 44 | ( | Right parenthesis | 1,467 | 0.15 |
| 45 | 1 | Arabic-Indic digit one | 1,168 | 0.12 |
| 46 | ّ | شدة Shadda | 1,065 | 0.11 |
| 47 | | Madda-on-alif | 1,053 | 0.11 |
| 48 | 2 | Arabic-Indic digit two | 922 | 0.10 |
| 49 | 0[4] | Arabic-Indic digit zero | 919 | 0.09 |
| 50 | | Hamza-on-waaw | 813 | 0.08 |
| 51 | 4 | Arabic-Indic digit four | 757 | 0.08 |
| 52 | 3 | Arabic-Indic digit three | 721 | 0.07 |
| 53 | 9 | Arabic-Indic digit nine | 548 | 0.06 |
| 54 | 5 | Arabic-Indic digit five | 546 | 0.06 |
| 55 | - | Hyphen-minus | 523 | 0.05 |
| 56 | 7 | Arabic-Indic digit seven | 507 | 0.05 |
| 57 | 8 | Arabic-Indic digit eight | 495 | 0.05 |
| 58 | 6 | Arabic-Indic digit six | 469 | 0.05 |
| 59 | ؟ | Arabic question mark | 267 | 0.03 |
| 60 | ُ | ضمة Damma | 257 | 0.03 |
| 61 | َ | فتحة Fatha | 243 | 0.03 |
| 62 | ! | Exclamation mark | 204 | 0.02 |
| 63 | ِ | كسرة Kasra | 110 | 0.01 |
| 64 | | Horizontal tab | 65 | 0.01 |
| 65 | ٍ | تنوين الكسر Tanween kasr | 61 | 0.01 |
| 66 | + | Plus sign | 40 | 0.00 |
| 67 | * | Arabic five pointed star | 30 | 0.00 |
| 68 | $ | Dollar sign | 25 | 0.00 |
| 69 | = | Equals sign | 20 | 0.00 |
| 70 | ؛ | Arabic semicolon | 18 | 0.00 |
| 71 | ٌ | تنوين الضم Tanween damm | 17 | 0.00 |
| 72 | / | Forward slash | 16 | 0.00 |
| 73 | } | Right curly brace | 14 | 0.00 |
| 74 | { | Left curly brace | 14 | 0.00 |
| 75 | % | Percent sign | 9 | 0.00 |
| 76 | ] | Closing square bracket / right square bracket | 6 | 0.00 |
| 77 | [ | Opening square bracket / left square bracket | 6 | 0.00 |
| 78 | _ | Underscore | 3 | 0.00 |
| 79 | \| | Broken (vertical) bar | 3 | 0.00 |
| 80 | \ | Backward slash | 1 | 0.00 |
| 81 | & | Ampersand | 1 | 0.00 |
| 82 | ْ | سكون Sukoon | 0 | 0.00 |
| 83 | > | Greater-than sign | 0 | 0.00 |
| 84 | < | Less-than sign | 0 | 0.00 |
| 85 | ~ | Tilde | 0 | 0.00 |
| 86 | ‘ | Left single quotation mark | 0 | 0.00 |
| 87 | @ | Commercial at | 0 | 0.00 |
| 88 | # | Number sign | 0 | 0.00 |
| 89 | ^ | Caret | 0 | 0.00 |
| 90 | ، | Arabic thousands separator | ?[5] | 0.00 |
| 91 | , | Arabic decimal separator | ?[6] | 0.00 |

[1] Letter often misspelled without hamza (common mistake).
[2] Letter often misspelled without hamza (common mistake).
[3] Used mostly for justification in Arabic poetry and after the letter Haa', where it forms the abbreviation for the Hijri year (هـ).
[4] Arabic and Hindi numerals occupy the same keys on the keyboard. Their appearance as Arabic or Hindi depends on the system and application settings.
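[5] Same character as the Arabic comma, so its count as a separator is hard to determine.
[6] Same character as the English comma, so its count as a separator is hard to determine.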
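
A table of this kind can be reproduced for any text by counting each character and dividing by the total number of characters. The following Python sketch illustrates the arithmetic only; it is not the procedure used to build the table above, and the function name `frequency_table` and the sample string are hypothetical.

```python
from collections import Counter


def frequency_table(text):
    """Return (character, count, percent) rows sorted by descending count.

    Percentages are relative to the total number of characters in `text`,
    spaces and control characters included, rounded to two decimals.
    """
    counts = Counter(text)
    total = sum(counts.values())
    rows = []
    # most_common() yields characters from most to least frequent
    for char, count in counts.most_common():
        percent = round(100.0 * count / total, 2)
        rows.append((char, count, percent))
    return rows


if __name__ == "__main__":
    # Hypothetical sample text; a real corpus would be read from files.
    sample = "ألف لام ميم، ألف لام."
    for rank, (char, count, percent) in enumerate(frequency_table(sample), start=1):
        print(rank, repr(char), count, percent)
```

Because the function counts code points, a diacritic such as shadda or fatha is tallied separately from the letter that carries it, which matches the way the table lists diacritics as their own rows.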