Fârsi Language Profile

Language Identification: ISO 639-1 fa; ISO 639-3 fas

Other Names: Persian language, Dari

Local name: فارسی Fârsi

Pronunciation: [fɒːɾˈsiː]

Region: Iran, Afghanistan, Tajikistan, Uzbekistan, Iraq, Turkmenistan, regions of Russia & Azerbaijan

Ethnic Population: Persians, Kurds, Baloch, Bakhtyārī, Lurs and other smaller minorities

Farsi users: L1 speakers 73,581,834 (in Iran 50,568,000, in Afghanistan 16,650,000, in Tajikistan 6,373,834); L2 speakers: Uzbekistan, Bahrain

Language family: Indo-European; Indo-Iranian; Iranian; Western Iranian; Southwestern Iranian; Persian

Early forms: Old Persian; Middle Persian

Language Maps: Iran, Afghanistan, Tajikistan, Uzbekistan, Iraq, Turkmenistan, regions of Russia & Azerbaijan

Language Status: Official language in Iran (as Persian), in Afghanistan (as Dari), in Tajikistan (as Tajik)

Dialects: Iranian Persian, Dari, Tajik, Bukhori, Pahlavani, Hazaragi, Aimaq, Judeo-Persian, Dehwari, Judeo-Tat, Caucasian Tat, Armeno-Tat

Typology: SOV (Subject – Object – Verb) word order.

Writing system: Persian alphabet

Linguistic map of modern Iranian languages: Farsi (green), Pashto (purple), Kurdish (turquoise), Lurish (red), and Balochi (yellow), as well as smaller communities of other Iranian languages (https://en.wikipedia.org/wiki/Iranian_languages)


Fârsi belongs to the Indo-Iranian branch of the Indo-European language family. It is spoken by an estimated 110 million people worldwide, primarily in Iran, Afghanistan, and Tajikistan. The language is known by several names. Persian is the more widely used name of the language in English, from Latin Persia, from Greek Persis. The Academy of Persian Language and Literature calls the language Persian. Farsi is the Arabicized form of Parsi, from Pars, the name of the region where the language evolved. Dari is the local name used for Persian in Afghanistan. Tajik (Tajiki) is the local name used for Persian in Tajikistan.

Fârsi is a continuation of Middle Persian, the official religious and literary language of the Sasanian Empire (224–651 CE), itself a continuation of Old Persian, which was used in the Achaemenid Empire (550–330 BC). It originated in the region of Fars (Persia) in southwestern Iran.


The history of the Persian language can be divided into the following three distinct periods: Old, Middle, and Modern.

Old Persian is one of the oldest Indo-European languages attested in original texts. It is recorded in the southwest in cuneiform inscriptions of the Persian kings of the Achaemenid dynasty (circa 550–330 BC), notably Darius I and Xerxes I. Related to Old Persian, but from a different branch of the Iranian language family, was Avestan, the language of the Avesta, the sacred scriptures of Zoroastrianism.

Except for this specific use, Avestan disappeared centuries before the advent of Islam.

Middle Persian is a later form of Old Persian. Its native name was Parsik, after Pars, the name of the ethnic group of the southwest. This is the origin of the name Farsi, which signifies New Persian. Middle Persian was the language of the Sasanian Empire (224–651 CE). It has a simpler grammar than Old Persian and was usually written in a script adapted from Aramaic; its use declined after the Arab conquest in the 7th century.

Modern Persian developed by the 9th century. It is written in Perso-Arabic script (an expanded version of Arabic script) and has been the official and cultural language of Persia since it first appeared. Its grammar is simpler than that of Middle Persian, and its vocabulary contains a vast number of Arabic loanwords.

Geographical distribution

Farsi is the language of Iran (formerly Persia) and is also widely spoken in Afghanistan and, in an archaic form, in Tajikistan and the Pamir Mountain region. Historically it was understood more widely, in an area ranging from the Middle East to India. There are also Farsi speakers in other Persian Gulf countries (Bahrain, Iraq, Oman, Yemen, and the United Arab Emirates), as well as large communities in the USA.

There are over 40 million Farsi speakers in Iran (about 60% of Iran’s population); over 14 million Dari Persian speakers in Afghanistan (50% of the population according to the CIA World Factbook and Britannica); and about 2 million Dari Persian speakers in Pakistan.

In Afghanistan, Farsi (Dari) is spoken almost everywhere, and over 50% of the country’s total population speaks it.

The official status of Fârsi:

Fârsi is the official language of Iran. In Afghanistan it is a co-official language, along with Pashto, under the name Dari; Dari is Afghanistan’s lingua franca and the native tongue of various Afghan ethnic groups. Tajiki is the official language of Tajikistan. Persian is also spoken in Kazakhstan, Kyrgyzstan, Iraq, Oman, and Qatar. Standard Persian is based on the dialect spoken in and around Tehran, the capital of Iran.


Dialects: Scholars recognize three major dialect divisions of Persian: Farsi, or the Persian of Iran; Dari Persian of Afghanistan; and Tajiki, a variant spoken in Tajikistan in Central Asia. Farsi and Dari have further dialectal variants, some with names that coincide with provincial names. All are mutually intelligible.

Dari Persian, which, until recently, was mainly spoken in Afghanistan, deferred to the Tehran standard as its model. Although there are clear phonological and morphological contrasts, Farsi and Dari Persian remain quite similar. The dialectal variation between Farsi and Dari has been described as analogous to that between European French and Canadian French. Dari is more conservative in maintaining vowel distinctions that have been lost in Farsi. Luri and Bakhtiari, languages of southwest Iran, are most closely related to Farsi, but they are difficult for speakers of the Tehran standard to understand. While speakers of Luri regard their speech as a dialect of Persian, speakers of Farsi do not agree.

Written language

The modern Farsi script, also known as Perso-Arabic, uses the Arabic alphabet with four extra letters, for a total of 32 characters. Words are written from right to left, while numbers, whose digits are similar to Arabic digits, read from left to right. As in written Arabic, there are no uppercase letters, and most letters join to their neighbours in cursive fashion.

However, while they share the same alphabet, Farsi and Arabic are very different languages belonging to different families and have completely different phonologies and grammar.

The vowel signs of Arabic script are used in Farsi, although some are pronounced differently. The pronunciation of many Farsi words with Arabic roots is also different from that of their Arabic originals.

Farsi uses numerous foreign words derived not only from Arabic, but also from Turkish and, to a lesser extent, French and English.
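The right-to-left words and left-to-right numbers described in this section are also visible in the Unicode character data that computers use to lay out text. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `bidi_classes` and the sample strings are chosen here for illustration only.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidirectional class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# The word "Fârsi": fe, alef, re, sin, ye
word = "\u0641\u0627\u0631\u0633\u06CC"
# The number 123 written with Persian digits
number = "\u06F1\u06F2\u06F3"

# Letters carry class 'AL' (Arabic letter, laid out right to left).
print(bidi_classes(word))
# Persian digits carry class 'EN' (laid out left to right, like 0-9).
print(bidi_classes(number))
```

A text renderer relies on these classes, so a number embedded in a Persian sentence keeps its most significant digit on the left even though the surrounding words run the other way.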

The Farsi alphabet

The Fârsi alphabet (Fârsi: الفبای فارسی /alefbâye fârsi/) consists of 32 letters. The appearance of a letter changes depending on its position in a word: isolated, initial (joined on the left), medial (joined on both sides), or final (joined on the right). Words are written from right to left, and the script is entirely cursive: the majority of letters in a word connect to each other. This writing style is also implemented on computers.
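On computers, a letter is normally stored once and the rendering engine picks the positional shape. Unicode also contains separate "presentation form" code points for each shape, and these can be mapped back to the base letter. The sketch below illustrates this for the letter jim with Python's standard `unicodedata` module; the names `JIM_FORMS` and `describe` are illustrative, not part of any standard.

```python
import unicodedata

# Presentation forms of jim, in the order: final, medial, initial, isolated.
JIM_FORMS = ["\uFE9E", "\uFEA0", "\uFE9F", "\uFE9D"]

def describe(ch):
    """Return the Unicode name of ch and the base letter it normalises to."""
    base = unicodedata.normalize("NFKC", ch)
    return unicodedata.name(ch), base

for form in JIM_FORMS:
    name, base = describe(form)
    # Each positional form folds back to the same base letter (U+062C).
    print(name, "->", unicodedata.name(base))
```

Normalising with NFKC before searching or comparing text is one way to make sure that the four shapes of a letter are treated as the same letter.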

Vowels and vowel diacritics

In the modern Persian script, historically short vowels are usually not written; only the historically long ones are represented in the text. As a result, words distinguished from each other only by short vowels are ambiguous in writing.
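The ambiguity can be illustrated by taking fully vocalised spellings and removing the vowel marks, which is what everyday writing does. This is a rough sketch: the two example words (kerm "worm" and karam "generosity") are chosen here for illustration, and the function name `strip_vowel_marks` is hypothetical.

```python
import unicodedata

def strip_vowel_marks(word):
    """Remove combining marks (Unicode category Mn), such as short-vowel signs."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")

# kerm "worm": kaf + kasra, re + sukun, mim
kerm = "\u06A9\u0650\u0631\u0652\u0645"
# karam "generosity": kaf + fatha, re + fatha, mim
karam = "\u06A9\u064E\u0631\u064E\u0645"

# With the marks the spellings differ; without them they are identical.
print(kerm == karam)                                        # False
print(strip_vowel_marks(kerm) == strip_vowel_marks(karam))  # True
```

Readers resolve such cases from context, since the unmarked spelling is the normal one.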


Fârsi Alphabet

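Name | Romanisation | IPA | Final | Medial | Initial | Isolated
ʾalef | ā / ʾ | [ɑ] | ـا‎ | ـا‎ | آ / ا‎ | آ / ا‎
be | b | [b] | ـب‎ | ـبـ‎ | بـ‎ | ب‎
pe | p | [p] | ـپ‎ | ـپـ‎ | پـ‎ | پ‎
te | t | [t] | ـت‎ | ـتـ‎ | تـ‎ | ت‎
s̱e | s̱ | [s] | ـث‎ | ـثـ‎ | ثـ‎ | ث‎
jim | j | [d͡ʒ] | ـج‎ | ـجـ‎ | جـ‎ | ج‎
če | č, c | [t͡ʃ] | ـچ‎ | ـچـ‎ | چـ‎ | چ‎
ḥe (-ye jimi) | ḥ | [h] | ـح‎ | ـحـ‎ | حـ‎ | ح‎
khe | x | [x] | ـخ‎ | ـخـ‎ | خـ‎ | خ‎
dāl | d | [d] | ـد‎ | ـد‎ | د‎ | د‎
ẕāl | ẕ | [z] | ـذ‎ | ـذ‎ | ذ‎ | ذ‎
re | r | [ɾ] | ـر‎ | ـر‎ | ر‎ | ر‎
ze | z | [z] | ـز‎ | ـز‎ | ز‎ | ز‎
že | ž (zh) | [ʒ] | ـژ‎ | ـژ‎ | ژ‎ | ژ‎
sin | s | [s] | ـس‎ | ـسـ‎ | سـ‎ | س‎
šin | š (sh) | [ʃ] | ـش‎ | ـشـ‎ | شـ‎ | ش‎
ṣād | ṣ | [s] | ـص‎ | ـصـ‎ | صـ‎ | ص‎
żād | ż | [z] | ـض‎ | ـضـ‎ | ضـ‎ | ض‎
ṭā | ṭ | [t] | ـط‎ | ـطـ‎ | طـ‎ | ط‎
ẓā | ẓ | [z] | ـظ‎ | ـظـ‎ | ظـ‎ | ظ‎
ʿeyn | ʿ | [ʔ] | ـع‎ | ـعـ‎ | عـ‎ | ع‎
ġeyn | ġ | [ɣ], [ɢ] | ـغ‎ | ـغـ‎ | غـ‎ | غ‎
fe | f | [f] | ـف‎ | ـفـ‎ | فـ‎ | ف‎
qāf | q | [ɢ], [ɣ], [q] | ـق‎ | ـقـ‎ | قـ‎ | ق‎
kāf | k | [k] | ـک‎ | ـکـ‎ | کـ‎ | ک‎
gāf | g | [ɡ] | ـگ‎ | ـگـ‎ | گـ‎ | گ‎
lām | l | [l] | ـل‎ | ـلـ‎ | لـ‎ | ل‎
mim | m | [m] | ـم‎ | ـمـ‎ | مـ‎ | م‎
nun | n | [n] | ـن‎ | ـنـ‎ | نـ‎ | ن‎
vāv | v / ū / ow | [v], [uː], [ow] | ـو‎ | ـو‎ | و‎ | و‎
he (-ye do-češm) | h, ẹ | [h], [e] | ـه‎ | ـهـ‎ | هـ‎ | ه‎
ye | y / ī / á | [j], [i], [ɒː] | ـی‎ | ـیـ‎ | یـ‎ | ی‎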


Fârsi Numbers

Eastern Arabic numerals are used in conjunction with the Arabic alphabet in the Perso-Arabic script. The shapes of the Persian digits four, five, and six are different from the shapes used in Arabic.

1 – ۱ yek (یک)
2 – ۲ do (دو)
3 – ۳ se (سه)
4 – ۴ chahâr (چهار)
5 – ۵ panj (پنج)
6 – ۶ shesh (شش)
7 – ۷ haft (هفت)
8 – ۸ hasht (هشت)
9 – ۹ noh (نه)
10 – ۱۰ dah (ده)
11 – ۱۱ yâzdah (یازده)
12 – ۱۲ davâzdah (دوازده)
13 – ۱۳ sizdah (سیزده)
14 – ۱۴ chahârdah (چهارده)
15 – ۱۵ poonzdah (پانزده)
16 – ۱۶ shoonzdah (شانزده)
17 – ۱۷ hifdah (هفده)
18 – ۱۸ hijdah (هجده)
19 – ۱۹ noozdah (نوزده)
20 – ۲۰ bist (بیست)
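For text processing, the digits in the list above can be converted to and from the digits 0-9. The sketch below assumes the Unicode ranges U+06F0 to U+06F9 for the Persian digits and U+0660 to U+0669 for the Arabic ones; the function names are hypothetical and chosen for illustration.

```python
# Persian digits occupy U+06F0..U+06F9; Arabic digits occupy U+0660..U+0669.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

# Translation table from 0-9 to the Persian digits.
TO_PERSIAN = str.maketrans("0123456789", PERSIAN_DIGITS)

def to_persian_digits(n):
    """Render a non-negative integer with Persian digits."""
    return str(n).translate(TO_PERSIAN)

def from_persian_digits(text):
    """Parse a string of Persian digits; int() accepts any Unicode decimal digit."""
    return int(text)

print(to_persian_digits(13))                          # prints ۱۳
print(from_persian_digits(to_persian_digits(2024)))   # prints 2024
# Persian and Arabic four are distinct characters, matching their distinct shapes.
print(PERSIAN_DIGITS[4] != ARABIC_DIGITS[4])          # prints True
```

Because the two digit sets are separate code points, a search for an Arabic-script number may need to try both ranges.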



Farsi, like all Iranian languages, shows in its basic elements the characteristic features of an Indo-European language. Among Iranian languages in contact with Indian civilization, the most noticeable non-Iranian feature often taken over is the Indo-Aryan series of retroflex sounds.

The detailed phonological and morphological structure of the Indo-European parent language has been simplified over time in the development of the Iranian languages. The basic phonological structure of Common Old Iranian has for the most part been maintained, but the morphological system has continued to be simplified.


The grammatical systems of Farsi (Western Persian) and Dari (Eastern Persian) do not differ in any significant way. The description below covers the main grammar points of both varieties. Both Dari and Farsi are inflected languages, i.e., they add suffixes to roots to express grammatical relations and to form words. Unlike many other Iranian languages, Dari and Farsi have lost most of their noun and verb inflections.

Farsi uses subject-object-verb (SOV) sentence structure. A notable feature of Persian is that there is no masculine or feminine gender: for example, the third-person pronoun او (u) means both he and she. There are no articles, and case is not marked.

Nouns can be simple or compound. Any unmodified noun in Persian may be generic, i.e., refer to one or more items. The plural is not obligatory when more than one item is implied. Some nouns borrowed from Arabic can be pluralized by using the Arabic broken plural.

Suffixes predominate in Persian morphology, though there are a small number of prefixes. Verbs can express tense and aspect, and they agree with the subject in person and number. There is no grammatical gender in modern Persian, and pronouns are not marked for natural gender; in other words, Persian pronouns are gender-neutral.

Adjectives have only one form. They agree neither in gender (since Persian has no grammatical gender) nor in number with the noun they modify. They come after the noun, are linked to it by the genitive particle «e», and can be modified by adverbs.

Farsi verbs are inflected for three singular and three plural persons. The 2nd and 3rd person plural are often used when referring to singular persons for politeness. There are three moods: indicative, subjunctive, and counterfactual conditional. Aspect is as important as tense. There are two aspects: imperfective and perfective. There are three basic tenses: present, past, and inferential past. The inferential past expresses second-hand knowledge, information, or conclusions.

Counting the combinations of tense and aspect, there are about ten tense forms in all. The greatest variety is shown in forms referring to past events. A series of past tenses (past simple, imperfect, and pluperfect) is matched by a corresponding series of perfect tenses (perfect simple, perfect continuous, and perfect pluperfect, the last of these made by adding a perfect ending to the pluperfect tense).

The present tense has a range of meanings (habitual, progressive, punctual, historic). In colloquial Persian this tense is also used with future meaning, although a separate future construction exists in formal styles. That construction is better described as a modality than as a tense (similar to the English want to/wanna + infinitive). All present and past forms may be used in a future context.

Subject pronouns are usually dropped since the verb form already carries information about person and number.
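Because each of the six person and number combinations has its own verb ending, the subject can be recovered from the verb alone. The following is a simplified sketch in romanised form, assuming the regular pattern for the verb raftan "to go" (present stem rav, past stem raft); the function names are hypothetical, and stems ending in a vowel, irregular verbs, and colloquial forms are not handled.

```python
# Personal endings for 1sg, 2sg, 3sg, 1pl, 2pl, 3pl (romanised).
PERSONS = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]
PRESENT_ENDINGS = ["am", "i", "ad", "im", "id", "and"]
PAST_ENDINGS = ["am", "i", "", "im", "id", "and"]  # 3sg past has no ending

def conjugate_present(present_stem):
    """Present indicative: prefix mi- + present stem + personal ending."""
    return {p: "mi" + present_stem + e for p, e in zip(PERSONS, PRESENT_ENDINGS)}

def conjugate_past(past_stem):
    """Simple past: past stem + personal ending."""
    return {p: past_stem + e for p, e in zip(PERSONS, PAST_ENDINGS)}

# raftan "to go"
present = conjugate_present("rav")
past = conjugate_past("raft")

print(present["1sg"])  # miravam, "I go"
print(past["3pl"])     # raftand, "they went"

# Every form in a paradigm is distinct, so a subject pronoun adds no information.
print(len(set(present.values())) == 6)  # True
```

The same idea explains why a sentence such as "miravam" is complete on its own, without the pronoun man "I".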