APPENDIX A

Frequency table of the characters available on a standard Arabic IBM-clone keyboard, excluding symbols and special characters that are not available on the keyboard. The counts were obtained from a corpus of 165,604 words from different disciplines, amounting to 960,267 characters (including spaces).

 

 

Rank  Character  Occurrences  Percent  Description
   1                 165,924    17.11  Space
   2                 108,456    11.18  Alif
   3                  98,086    10.12  Laam
   4                  52,139     5.38  Yaa'
   5                  48,347     4.99  Miim
   6                  42,163     4.35  Waaw
   7                  39,365     4.06  Nuun
   8                  30,794     3.18  Taa'
   9                  29,293     3.02  Baa'
  10                  28,955     2.99  Raa'
  11                  27,593     2.85  Haa'
  12                  26,499     2.73  'Ayn
  13                  21,190     2.19  Faa'
  14                  21,066     2.17  Taa' marbuuta
  15                  19,356     2.00  Daal
  16                  18,180     1.87  Qaaf
  17                  16,871     1.74  Hamza-on-alif [1]
  18  ،               16,686     1.72  Arabic comma
  19                  15,057     1.55  Siin
  20                  14,965     1.54  Kaaf
  21                  12,107     1.25  Haa'
  22                   9,670     1.00  Jiim
  23                   7,836     0.81  Hamza-under-alif [2]
  24                   7,535     0.78  Saad
  25                   7,477     0.77  Dhaal
  26                   7,447     0.77  Alif maqsuura
  27  .                7,222     0.74  Period, decimal point / full stop
  28                   5,966     0.62  Taa'
  29                   5,612     0.58  Xaa'
  30                   5,268     0.54  Shiin
  31                   5,108     0.53  Carriage return
  32                   4,990     0.51  Thaa'
  33                   4,725     0.49  Daad
  34                   3,515     0.36  Ghayn
  35                   3,263     0.34  Zaa'
  36  ً                2,828     0.29  تنوين الفتح (Tanween fath)
  37  :                2,448     0.25  Colon
  38                   2,439     0.25  Hamza-on-the-line
  39                   2,341     0.24  Hamza-on-yaa'
  40  ـ                2,110     0.22  Kashida (tatweel) [3]
  41                   2,048     0.21  Quotation mark
  42                   1,879     0.19  Zaa'
  43  )                1,471     0.15  Right parenthesis
  44  (                1,467     0.15  Left parenthesis
  45  1                1,168     0.12  Arabic-Indic digit one
  46  ّ                1,065     0.11  شدة (Shadda)
  47                   1,053     0.11  Madda-on-alif
  48  2                  922     0.10  Arabic-Indic digit two
  49  0                  919     0.09  Arabic-Indic digit zero [4]
  50                     813     0.08  Hamza-on-waaw
  51  4                  757     0.08  Arabic-Indic digit four
  52  3                  721     0.07  Arabic-Indic digit three
  53  9                  548     0.06  Arabic-Indic digit nine
  54  5                  546     0.06  Arabic-Indic digit five
  55  -                  523     0.05  Hyphen-minus
  56  7                  507     0.05  Arabic-Indic digit seven
  57  8                  495     0.05  Arabic-Indic digit eight
  58  6                  469     0.05  Arabic-Indic digit six
  59  ؟                  267     0.03  Arabic question mark
  60  ُ                  257     0.03  ضمة (Damma)
  61  َ                  243     0.03  فتحة (Fatha)
  62  !                  204     0.02  Exclamation mark
  63  ِ                  110     0.01  كسرة (Kasra)
  64                      65     0.01  Horizontal tab
  65  ٍ                   61     0.01  تنوين الكسر (Tanween kasr)
  66  +                   40     0.00  Plus sign
  67  *                   30     0.00  Arabic five pointed star
  68  $                   25     0.00  Dollar sign
  69  =                   20     0.00  Equals sign
  70  ؛                   18     0.00  Arabic semicolon
  71  ٌ                   17     0.00  تنوين الضم (Tanween damm)
  72  /                   16     0.00  Forward slash
  73  }                   14     0.00  Right curly brace
  74  {                   14     0.00  Left curly brace
  75  %                    9     0.00  Percent sign
  76  ]                    6     0.00  Closing square bracket / right square bracket
  77  [                    6     0.00  Opening square bracket / left square bracket
  78  _                    3     0.00  Underscore
  79  |                    3     0.00  Broken (vertical) bar
  80  \                    1     0.00  Backslash
  81  &                    1     0.00  Ampersand
  82  ْ                    0     0.00  سكون (Sukoon)
  83  >                    0     0.00  Greater-than sign
  84  <                    0     0.00  Less-than sign
  85  ~                    0     0.00  Tilde
  86                       0     0.00  Left single quotation mark
  87  @                    0     0.00  Commercial at
  88  #                    0     0.00  Number sign
  89  ^                    0     0.00  Caret
  90  ،                ? [5]     0.00  Arabic thousands separator
  91  ,                ? [6]     0.00  Arabic decimal separator

 

 

[1] This letter is often misspelled without the hamza (a common mistake).

[2] This letter is often misspelled without the hamza (a common mistake).

[3] Used mostly for justifying Arabic poetry, and after the letter Haa' when it serves as the abbreviation for the Hijri year (هـ).

[4] Arabic and Hindi numerals share the same keys on the keyboard; whether they appear as Arabic or Hindi numerals depends on the system and application settings.

[5] Same character as the Arabic comma, so its count is hard to determine.

[6] Same character as the English comma, so its count is hard to determine.
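A frequency table of this kind can be produced by counting every character in the corpus, sorting the characters by count, and expressing each count as a percentage of the total. The Python sketch below illustrates that procedure under stated assumptions; it is not the tool used to build this appendix. The function name `frequency_table`, the optional `allowed` set (standing in for the keyboard's character set), and the sample string are all hypothetical, and the percentages are computed against the total of the characters actually counted.

```python
from collections import Counter


def frequency_table(text, allowed=None):
    """Return (rank, character, count, percent) rows, most frequent first.

    Hypothetical helper for illustration. If `allowed` is given (e.g. the set
    of characters on the keyboard), all other characters are ignored.
    Percentages are relative to the total number of characters counted.
    """
    counts = Counter(ch for ch in text if allowed is None or ch in allowed)
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    # most_common() sorts by descending count; ties keep first-seen order.
    for rank, (ch, n) in enumerate(counts.most_common(), start=1):
        rows.append((rank, ch, n, round(100 * n / total, 2)))
    return rows


if __name__ == "__main__":
    sample = "كتب الولد الدرس"  # hypothetical sample text, not from the corpus
    for rank, ch, n, pct in frequency_table(sample):
        label = "Space" if ch == " " else ch
        # Fixed-width columns: rank, character, count with thousands separators, percent
        print(f"{rank:>4}  {label:<9}  {n:>11,}  {pct:>7.2f}")
```

Passing the set of keyboard characters as `allowed` would mirror the exclusion of off-keyboard symbols described at the top of this appendix, and rounding to two decimals matches the precision of the Percent column.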

 

 
