Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Given an alphabet (finite set of symbols) Σ, a <strong>language</strong> is a set of all sequences of such symbols in that alphabet. A language is a <strong>regular language</strong> exactly when it can be expressed in terms of a (formal) <strong>regular expression</strong> and the membership of any string can be decided by a <strong>finite-state machine</strong>.</p> <p>Regular languages belong to the highest hierarchy of the <em>Chomsky Hierarchy</em>, and are also called Type-3 grammars. They are above the Type-2 <em>context-free</em> languages which are recognized by pushdown automata, which are above the Type-1 <em>context-sensitive</em> languages recognized by linear bounded automata, and above the Type-0 <em>recursively enumerable</em> languages which can be recognized by Turing Machines. All regular languages are context-free, context-sensitive, and recursively enumerable. Formal regular expressions can be converted to deterministic finite state machines and to non deterministic finite machines and still represent the same regular language.</p> <p>Please do not confuse this with <strong>regex</strong>. Most regex engines are far more expressive than formal regular expressions, finite state machines, and can represent non-regular languages.</p> <h2>Construction of a Regular Language</h2> <p>The set of all <strong>regular languages</strong> over a given alphabet Σ can be produced exactly by this process:</p> <ul> <li>The empty language <code>{}</code>, rejecting all strings.</li> <li>The language containing only the empty string <code>ε</code></li> <li>All languages containing only a single symbol <code>s ∈ Σ</code>.</li> <li>Every language created by the <em>union</em>, <em>concatenation</em>, or <em>kleene-star</em> of regular languages. Suppose <code>v</code> and <code>w</code> are strings of a regular language <code>A</code> and <code>B</code> respectively: <ul> <li>The union <code>(v|w)</code> is also regular. It accepts languages that are in any of <code>A</code> or <code>B</code>.</li> <li>The concatenation <code>vw</code> is also regular.</li> <li>The kleene-star <code>v*</code> is also regular. It means any copies of strings in <code>A</code> concatenated, including 0.</li> </ul></li> </ul> <h2>Examples and Nonexamples of Regular Languages</h2> <ul> <li><p>Given a simple alphabet <code>Σ = {0, 1}</code>, where <code>|</code> represents union, <code>*</code> represents kleene-star, these formal regular expressions all represent represents a regular language:</p> <ul> <li>The regular expression <code>"0"</code>, <code>"1"</code>, <code>"(0|1)"</code>, <code>"01"</code>, <code>"11"</code>, <code>"0*"</code> are all regular.</li> <li>The regular expression <code>"(0(0|1)*1)"</code>, representing all binary strings beginning with 0 and ending with 1, is regular.</li> <li>Given a regular expression <code>R</code>, the language <code>"R+"</code> and <code>"R?"</code> all represent a regular language, whereas <code>+</code> represents <em>one or more</em>, and <code>?</code> represents <em>zero or one</em>. Namely, <code>"R+"</code> is equivalent to <code>"RR*"</code>, and <code>"R?"</code> is equivalent to <code>"(R|ε)"</code>.</li> <li>Given a regular expression <code>R</code>, the language <code>"R{m,n}"</code> is regular for all natural <code>m,n</code>, where <code>{m,n}</code> represents "from <code>m</code> copies to <code>n</code> copies". This is because it also involves union and concatenation: <code>"R{1,3}"</code> is expanded to <code>"(R|RR|RRR)"</code>.</li> </ul></li> <li><p>Given an alphabet used by regex engines, usually an ASCII or Unicode alphabet containing all ASCII or Unicode characters respectively:</p> <ul> <li>The regex <code>/^.+$/</code> is regular. It includes all non-empty sequences of any character.</li> <li>The regex <code>/^#[A-Za-z]{1,3}[0-9]{2,4}$/</code> represents a regular language, consisting all strings which being with a hashtag, then one to three ASCII letters, followed by two to four decimal digits.</li> <li>The regex <code>/^([\d][\w])*$/</code> represents a regular language. It consists all strings which alternate digit characters and word characters. The shorthand <code>\d</code> and <code>\w</code> are examples of union.</li> </ul></li> <li><p>Many regex engines are much more expressive than regular languages. Backreferences can cause a regex to represent a non-regular language, and consequently they cannot be decided by a finite state machine.</p> <ul> <li>The regex <code>"(.+)\1"</code> represents an <em>irregular</em> language. Involving a backreference capturing the first group <code>.+</code>, it accepts all the sequences of uppercase Latin letters repeated exactly twice. They are called <em>squares</em> in formal language theory. <ul> <li><code>"ABCABC"</code>, <code>"1234.1234."</code> are accepted</li> <li><code>"ABCAB"</code>, <code>"1234567891234567890"</code> are rejected.</li> </ul></li> </ul></li> </ul> <h2>Further Reading</h2> <ul> <li><a href="/questions/tagged/regex" class="post-tag" title="show questions tagged &#39;regex&#39;" rel="tag">regex</a>: Short for <strong>regular expression</strong>. Many regex engines nowadays are far more expressive than formal regular expressions and finite-state machines.</li> <li><a href="/questions/tagged/finite-state-machine" class="post-tag" title="show questions tagged &#39;finite-state-machine&#39;" rel="tag">finite-state-machine</a>: They are equivalent in expressiveness to formal regular expressions. They represent exactly the regular languages. Acronyms include FSM.</li> <li><a href="/questions/tagged/dfa" class="post-tag" title="show questions tagged &#39;dfa&#39;" rel="tag">dfa</a>: Deterministic finite automaton. <a href="/questions/tagged/nfa" class="post-tag" title="show questions tagged &#39;nfa&#39;" rel="tag">nfa</a> Nondeterministic finite automaton. <ul> <li>Both are equivalent in expressiveness.</li> </ul></li> <li><a href="/questions/tagged/context-free-grammar" class="post-tag" title="show questions tagged &#39;context-free-grammar&#39;" rel="tag">context-free-grammar</a>, <a href="/questions/tagged/context-sensitive-grammar" class="post-tag" title="show questions tagged &#39;context-sensitive-grammar&#39;" rel="tag">context-sensitive-grammar</a>: Tags referring to lower levels of the Chomsky hierarchy</li> <li><a href="http://en.wikipedia.org/wiki/Regular_language" rel="nofollow noreferrer">Wikipedia, includes explanation about squares and irregular regexes</a></li> <li><a href="https://stackoverflow.com/q/6718202">What is a regular language?</a></li> </ul>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload