This is the first post of a series of lecture notes about a topic called “Theory of Computation.” This post (and lecture series) assumes that the reader is familiar with the following topics:
- Formal proofs (especially proof by contradiction).
We will introduce sections from these topics here that are necessary to understand most of these lecture notes. A set is a collection of items that are unique (Note: I won’t delve into the notions of set theory, just yet…), usually written with curly braces and separated by commas. An example of a set is : all items in the set are unique. We can have nested sets within sets, such as . The empty set, written , is the unique set that contains zero items. We say that belongs to (or is a member of) a set , written , if is a member of . Likewise, we say that is not a member of a set , written , if is not a member of . We write the cardinality of a set as , and is the number of elements in . For most of these lecture notes, we will only work with finite sets.
We say that is a subset of a set , written , if every element in is a member of . The powerset of a set , written , is the collection of all possible subsets of . For example, if , then . The union of two sets and , written , is the set that contains all elements from or (removing duplicates of course). The intersection of two sets and , written , is the set that contains all elements that are both in and . The set difference (or set minus) of two sets and , written is the set that contains all elements that are in but not in .
Exercise 1 Show that for any finite set with elements, has elements.
A tuple is a sequence of items, not necessarily unique, usually written between two parentheses and separated by commas. An example is . A -tuple is a tuple that has items. The Cartesian product (or just product) of two sets and , written , is the set of all 2-tuples with the first coordinate being a member of , and the second from .
A function takes an element of a given set (called the domain), and outputs a member of another set (called the range). If is a function and is its domain, its range, we write the function formally as . In other words, for some , (there may be some that are not defined). If every element has a defined , then we call a total function.
An alphabet is a finite set of characters (just another name for the elements of ). A string is a finite sequence of characters over . If , then an example string is (technically, we should be writing strings in the tuple format, but we will largely ignore that for strings). The set (called “star”) is the set of all strings over . The empty string, , is the unique string that has length 0. A language is a subset of , and can be described as some “set of strings over .” The complement of a language , is the set of all strings in that are not in .
Computer Science is largely the field based off of being able to understand what “computation” really means (hence the name). We will approach this by looking at several basic questions, which will turn out to be very deep and are very non-trivial in general:
- What does it “mean” to “compute” something?
- How do we measure (or could we even measure) how many resources a certain computation uses?
- And relating to the last question, what kinds of resources do we care about?
- How can we compare two computations to certify if one is better than another?
- Is there a way to formally define a “computer” so that we can ask basic questions about how they compute something?
Of course, at a first glance one may run into trouble as to how to formally define (or model) a “computer.” Since there are many types of computers available to buy, does one actually compute differently than another? Of course, with different configurations of the hardware and software, we can certainly see that some computations are faster or more efficient in some manner; for example, a process in one operating system only takes 2 GB of memory to run, whereas a ported version in another takes 3 GB.
But this is not the central question we are after, for multiple reasons:
- It seems difficult to characterize what “fast” computation actually means.
- We may run into the situation where some computer model actually computes some function (let’s say) faster (or more efficiently) on some inputs, and slower (or less efficiently) on other inputs.
- For some models, we may not worry about minute differences in computation efficiency between the two models; for instance, if the two models described above used 2 GB and 2.01 GB respectively, then we may not worry about this tiny difference.
We want, instead, to characterize what a computer can do, instead of what are the means it does to perform the computation. Now, let’s discuss what, at a high-level, a computer “needs” to perform a computation.
First, there usually is some input that is encoded in some form. In most computers, the input is given in binary (i.e., consisting entirely of 0s and 1s). There may be many input devices, such as a keyboard, mouse, etc. However, we will abstract this by calling all of them “1 device.”
Exercise 2 Given strings, give a method to produce 1 string that contains all of the information of the strings that makes each bit of each string “efficiently recoverable.” What is the bound on the number of bits that you get?
Next, there is a CPU (Central Processing Unit): the basic part of a computer that takes some input (code, data, etc.), and performs some operation depending on what was given to it. Naturally, there are many possible instructions that a CPU may receive, such as add two numbers, store this value, etc. To simplify everything, we will consider the entire contents of memory of the computer as a state. Since most computers store values as binary, we can think of the entire contents of memory as a single integer. We will think about transitions between states as what the CPU does to change (or possibly not) the contents of memory.
Like the input, there is usually some output that is the result of multiple iterations of code/data/etc. entering and then leaving the CPU, also encoded in some form. As before, we will assume that there is exactly one output for simplicity. But even deciding what is the “right” type of output can be cumbersome. So we overcome this by making the “output” be a binary “flag” of sorts: either a “yes” or a “no.”
And finally, there is some kind of secondary storage that will be where “long-term” code/data/etc. is stored. However, for simplicity, we will defer this part to a future section. The reason for this is that actually having some storage like this drastically changes the computational power of the model.
Following the argument above, we need to introduce a machine model for “solving” a problem or “computing” something. We want this machine to be as deterministic as possible in order to make analyzing it easier.
Definition 1 A DFA (Deterministic Finite Automaton) is a 5-tuple where is a finite set of states, is the start state, is the set of final states, is a finite set called the alphabet, and is a total function called the transition function.
DFAs are sometimes drawn (also called “embedded into the plane”). Conventionally, the start state has an arrow coming from nowhere to the start state, each state is a circle, final states are double circles, and transitions are drawn as arrows between states with the label from on the transition. Given an input consisting of characters, we start in the start state, and progressively read characters off of and transitioning between states according to until we cannot read any other characters; in this case, we say that the machine accepts if we land in one of the final states.
Example 1 Here is an example of a DFA that accepts all strings that consist of an odd number of 0’s and an even number of 1’s, or an even number of 0’s and an odd number of 1’s.
Exercise 3 For the DFA above, what are each of the 5 pieces defined to be?
Exercise 4 Create a DFA that accepts the same strings as Example 1 but with only 2 states.
Indeed, we can talk about DFAs as being very “simple” computers, in that they are not only deterministic, but they are also fully specified from construction, and have no dependency on “past history” when running on some input string. This last fact will actually be useful in characterizing the structure of regular languages. But for now, we will introduce more definitions:
Definition 2 The Kleene-closure, sometimes called Kleene star, for an alphabet is the set of all strings for which the symbols are from , usually denoted .
Example 2 If , then consists of all possible binary sequences.
Example 3 Here is a DFA that accepts all strings in that contain 3 or more sequential 0’s.
We want to look at a particular string, say , and how it is accepted by this machine. We start off in , then read the first character of , which is a 1. We stay in , and by reading a 0, we transition to . We keep reading through the string until we reach the last character, which is a 0. At this point, we have reached , and so we take the self-loop on . Since there are no other characters left, we accept this string. Now we will make this more general for an arbitrary DFA and arbitrary string.
Definition 3 Let be a DFA, and with characters denoted as , with each . We define a computation of on as the sequence of states (with ) so that:
- for all .
A computation is accepting if .
Definition 4 The language recognized by a DFA , written , is the set:
has an accepting computation on
Definition 5 A language is regular if there exists a DFA so that is the language recognized by . In other words, .
3. Regular Operations
Since languages are sets of strings, we want to consider the usual set operations: union, intersection, and complement. We want to show that we can apply these operations to the regular languages and still have a regular language at the end. The regular operations are: union, concatenation, and star. But to prove these, we need to prove closures of other operations first.
Definition 6 We say that regular languages are closed under an operation if whenever is applied to regular languages, the result is a regular language. In other words, if are regular languages, is also regular.
We start with complement.
Theorem 7 Regular languages are closed under complement.
Proof: We want to show that if is regular, then is regular. Since is regular, there is a DFA recognizing . is also defined as the set of strings such that has an accepting computation on . But this means that is defined as the set of strings with not having an accepting computation on ; this implies the computation ends in a non-final state. So we can just make a machine for .
Theorem 8 Regular languages are closed under union.
One idea to try to prove this is to feed the input string to the first machine, and then send it to the second machine afterwards. If either machine accepts the string, we accept; otherwise, we reject. The problem with this is that we only have one chance to read the input, and each character is consumed and cannot be returned to. Instead, we will try to remember both states of both machines as if we were simulating both simultaneously. Proof: We want to show that if are regular, then is regular. Because are regular, there are DFAs for accepting , respectively. We create a DFA to recognize :
- , and either or or both,
- where and for all .
Intuitively, we keep a pair of states in which the first coordinate corresponds to a simulation of the first machine, and the second coordinate of the second.
Example 4 For the following 2 DFAs:
Here is the resulting DFA with being the union of the two DFAs’ languages, after performing the product construction on them:
Corollary 9 Regular languages are closed under intersection.
Proof: If are regular languages, then , which we know is regular from Theorems 1 and 2.
Exercise 5 Show that regular languages are closed under symmetric difference (i.e., or but not both).
Now we discuss concatenation: if are regular, is necessarily regular? Given a string , we want to decompose such that and for some integer . But it seems to be difficult to know where to split the string: would there be a way to “guess” this index? We will generalize our model of computation to allow for “guesses.”
We in general would like a generalized version of our machine that allows for multiple selections to be made. Now the model will change depending on how a “selection” is made (uniformly at random, the first one seen, lexicographically first by state name, etc.). But here, we will use a model which involves “nondeterminism” which essentially means “picks the correct choice, if one exists.” Of course, this does not exist in the real world, because if we were able to do this, then we can solve many intractable problems much more quickly.
But the idea of “nondeterminism” is quite important when talking about later topics. One such topic is NP-completeness, which are the candidate languages that have no known polynomial-time algorithm to solve (most people believe that this is true). One definition of problems in NP uses nondeterminism, and another (equivalent) one concerns “certificates” of solutions.
Definition 10 An NFA (Nondeterministic Finite Automaton) is a 5-tuple where is a finite set of states, is the start state, is the set of final states, is a finite set called the alphabet, and is the transition function.
Note the only difference is in the transition function: we allow not reading a character from the input alphabet, we map to a set of states instead of a single state, and the function is not total anymore. Now note that a computation on an NFA needs to be re-defined because of this.
Definition 11 Let be an NFA, and with characters denoted as , with each . We define a computation of on as a sequence of states (with ) so that:
- for all .
A computation is accepting if .
There are a few differences: there may be many (accepting) sequences that give a valid computation, and the next state in the sequence is a member (not just defined as) of the set given by . We define the language recognized by an NFA the same way as for a DFA, except that we only require at least one accepting computation here.
Example 5 Observe the following NFA :
We test if . We start in . Reading a 1 has us allowed to be in either state and (there is a choice). But we keep track of what states we “could” be in as it is important for later. Now, we have to see what the transition from both states on each symbol goes to (since any one of them can lead to an accepting computation); doing so puts us in states , . Since there is no transition out of , the transition function on and 1 gives the empty set, . After reading the last 1, we still end up in states and ; since neither is accepting, we have that .
Note that “choice” here is implemented in a deterministic way as handling all possible computation paths at the same time. However, in the nondeterministic world, only one computation is ever performed, no matter how many computations on a given string actually exist.
Now we discuss the relation of DFAs to NFAs. Clearly, every DFA is an NFA, as a DFA really is an NFA without any choice or “non-reading” transitions. What is not-so-obvious is whether we can transform an NFA into an equivalent DFA. We will actually show this after some build-up; for now, we present constructions for the regular operations:
Theorem 12 If and have NFAs recognizing them, then there is an NFA recognizing .
Proof: Create a new non-final start state, and an -transition from it to the original start states. Correctness details are left as an exercise for the reader.
Theorem 13 If and have NFAs recognizing them, then there is an NFA recognizing . Note: we do not say anything about regularity, since we have not proven that NFAs recognize only the regular languages yet.
Proof: Let for be NFAs for and . Construct an NFA as follows:
- Create an -transition from every to .
Another way of saying this is that whenever we have achieved a string that is in , we are in a state in –we “guess” to transition over to the second machine. If we have “guessed” and at the end of reading the input have reached a state in , then we accept a string; otherwise, we do not. Correctness details are left as an exercise for the reader.
Theorem 14 If has an NFA recognizing it, then there is an NFA recognizing .
Proof: Create a new start state that is final, with an -transition to the old start state. For all final states (except the newly created one), create an -transition to the new start state. We actually could make all final states of the machine (except the new one) to be non-final, but modifying the machine as little as possible is our main goal here. Correctness details are left as an exercise for the reader.
Now we want to show that every NFA accepts a regular language. First, we will work without -transitions; after that, we will add them with a slightly different proof.
Theorem 15 Let be an NFA without -transitions. Then is regular.
Proof: We create a DFA as follows:
Now we look at NFAs with -transitions – the above construction does not work since -transitions are not “removed” from the resulting automaton (and DFAs cannot have them). We want to handle any arbitrarily many choices of taking an -transition at the same time; in essence, we take the union of all possible “reachable” states that can be reached via a chain of -transitions.
Definition 16 Let be an NFA, with . The epsilon closure (sometimes written -closure) of , , is the smallest subset such that:
Now we just slightly modify the proof above to handle -closures instead of just taking one transition:
Theorem 17 Let be an arbitrary NFA, with or without -transitions. Then is regular.
Proof: We create a DFA as follows:
Exercise 6 Construct a family of NFAs with states for NFA such that an equivalent DFA for must have at least states.
Example 6 Here is an NFA with -transitions:
We start with state : the -closure of it is , so we create a state called that. Then we look at all states that can be reached from via 1 transition on 0 or 1, and take their epsilon closures as necessary. We keep iterating until we either exhaust all possible states (which is in this case), or have no “branch” that can be reached. The resulting constructed DFA is below:
Exercise 7 Look at the previous NFA; instead of the “fast” construction which involves 7 states, think of the “brute-force” construction which involves 64 states. How come we could make an NFA with 7 states? What has to happen with the other 57 possible states?
5. Regular Expressions
With the DFA/NFA models that we have created, they describe a machine model for being able to individually “test” strings and accept/reject them based off of whether the state at the end of the computation is an accepting one. But, one may want to describe a particular language in a convenient form that is completely unambiguous (of course after the creation of necessary definitions!). We turn to an object called a regular expression.
Definition 18 is a regular expression over iff one of the following holds:
- , and ;
- with being regular expressions;
- with being regular expressions;
- with being a regular expression.
Example 7 A regular expression for all the strings that contain the substring (with alphabet ) is: . The idea is that if there is an substring, then any string can appear before and after it. If the string does not contain , then it won’t be matched anyway.
We show that regular expressions also describe the regular languages by showing an equivalence between them and NFAs.
Theorem 19 Every language described by a regular expression is regular.
Proof: We need to convert an arbitrary regular expression into an NFA. The first case involves a 2-state NFA, and the second and third describe a 1-state NFA. The last three are implied by Theorems, 3, 4, and 5.
Theorem 20 Every regular language is described by a regular expression.
Proof: It is sufficient to convert a DFA or NFA into a regular expression. The idea is to “remove” states one-by-one, and maintain on the transitions an equivalent regular expression. However, we cannot do this as DFAs/NFAs only allow one “symbol” on the transition; therefore, we need a generalization.
Definition 21 A generalized NFA (or GNFA) is an NFA with a single start state having no incoming transition, a single final state with no outgoing transitions, and every transition is labelled with a regular expression.
Clearly, any arbitrary NFA or DFA satisfies the last requirement of GNFAs. For an arbitrary NFA, we can add a single start state with an -transition to the old start state, and an -transition from all previously final states to a new single final state without any outgoing transitions. Therefore, we have converted it into a GNFA.
Now we handle “ripping” out of states. Suppose we had a situation similar to the following:
where are regular expressions, is the state that we want to remove, and and are arbitrary other states (final or non-final). Then we can remove state and create a transition from to with the following regular expression:
Clearly, if we only have 2 states in the GNFA, then the resulting regular expression on the only transition is the converted regular expression. So for all states other than the new start and final states, we “rip” each state out one by one, and replace each transition that goes into and then out of that state with a regular expression.
Some details that we may need to check for:
- There may already be a transition not shown above (where ); if this is the case and the regular expression on this transition is , then after ripping state , the resulting regular expression is .
- If , removing introduces a loop.
- For completeness, for all “undefined” transitions, we create a new transition between the states as (but this does not affect the correctness of the algorithm).
Exercise 8 Let be a regular language. Then (all but taking their reverse instead) is also regular. (Hint: use regular expressions and the definition. This can be proved via NFAs but is a little more tricky for details. It’s even easier with regular grammars below.)
6. Regular Grammars
We now have two formalisms to describe the regular languages: a machine model, and a tool to describe the language in terms of the three regular operations. But what if instead of matching against a particular string to check whether or not it is in the language, we wanted to show that it can be generated in some way? (Regular) grammars will help us do this. Because of this, we will have an increasingly large collection of ways to describe these “simple” languages.
Definition 22 A regular grammar is a 4-tuple where:
- is a finite set of variables,
- is the start variable,
- is a finite set of terminals, and
- is a finite set of rules, of the form: . with .
Example 8 Here is an example of a regular grammar:
This has . Let’s see if the string 00011 is in the grammar. We have to start with , and we could choose either to replace with , or with . But a quick glance at ‘s rules show that none will help us derive the string, so we replace with . Then we keep iterating until we do, in fact, derive 00011.
Definition 23 We say that a particular derived string derives another string if is of the form , is of the form , and is a rule. If there is a sequence of derivations and , then is in the language of the grammar (often denoted ).
Implied by the name, we show that regular grammars recognize the regular languages.
Theorem 24 Let be an arbitrary regular language. Then there is a regular grammar with .
Proof: Since is regular, let be a DFA recognizing . The construction is as follows. Make 1 variable for each state, the start variable is the start state, the terminals are the input alphabet. For transitions, if , then add the rule ; if is a final state, add a rule .
Theorem 25 Let be an arbitrary regular grammar. Then is regular.
Proof: The construction is essentially a reverse proof of the last theorem, iterating over all the rules:
- For , make a transition from state to state on symbol .
- For , add an -transition from state to state .
- For , make a single final state (shared among all variables) with a transition from state to it on symbol .
- For , make a final state.
7. Non-Regular Languages
We have shown regularity properties for many languages, so it seems not that possible that there are non-regular languages. Suppose we wanted to have a pattern matcher that “counted” numbers of two characters in a string:
the number of 0’s in is equal to the number of 1’s in
Exercise 9 Try designing a DFA/NFA/regular expression for this language. It will turn out to be impossible, but what are the difficulties you run into?
Let’s suppose that there were a non-regular language (we haven’t proven any exist, but let’s assume). Then if we suppose that this actually non-regular language is in fact regular, then we can design a DFA for it; if we can prove that there is a contradiction somewhere, then we can prove that no DFA can in fact exist for it, implying that the language is not regular.
Now look at an arbitrary language; it either is finite or infinite.
Exercise 10 Show that every finite language is regular.
If it is infinite, it may or may not be regular. If it is, then it has a specific property: for any that we run on the DFA for it (for a long-enough ), there is a state that will be repeated in the computation steps. The “long-enough” criterion is satisfied if is at least the number of states.
Exercise 11 Show that if an arbitrary -state DFA accepts any string with , then the DFA’s language is infinite. (Hint: Pigeonhole Principle)
Here is an important lemma that is used most often in proofs involving non-regularity of languages.
Lemma 26 Let be a regular language. Then there is a positive integer , often called the “pumping constant,” so that if with then can be written as such that:
- for all .
Exercise 12 Prove this lemma. (Hint: it uses the intuitive notion of infinite languages from earlier).
This is often called the “Pumping Lemma.” A common use of the Pumping Lemma is to assume a language is regular, then “pump” a given string in the language until we violate the third condition (i.e., obtain a string not in the language). We do this by showing that no matter the decomposition into , we can pump out of the language.
Theorem 27 The language is not regular.
Proof: Suppose that were regular, and let be its pumping length. Choose . Clearly, and . What are the decompositions of ? Since , and occurs only at the beginning of the string, and consist entirely of 0’s (or are empty). We therefore have and for some constants . By the second condition, . Clearly, .
Now we pump by inserting more copies of into the middle (“pumping up”) or removing (“pumping down”). Either method works here, but we will pump up: observe the string . What is the only way that ? The only possibility is that . But since , this is not possible. Therefore, is not regular.
Exercise 13 Show that the language the number of 0’s in is the same as the number of 1’s in is not regular. (Hint: use the previous theorem, and apply a closure property with a known regular language).
Exercise 14 Show that is not regular.
What have we done in this post? Our original goal was to describe, in a formal way, a “simple” computer: one that can only say accept/reject on a input string using simple transitions between states. We called these machines DFAs, and we wanted to understand all of the languages (sets of strings) for which they recognize. We then introduced a generalization of this model that allowed a “choice” between multiple of the same transition, and showed that this model, the NFAs, are equivalent in computational power.
We next introduced a way to describe regular languages, the regular expressions, with equivalence shown between NFAs. Since NFAs are equivalent to DFAs, we have shown that regular expressions describe these languages also. And the final new model we introduced was regular grammars, which is a way of generating strings in a language, and equivalence (again) was shown with NFAs. We now have four equivalent models of computation for the same set of languages.
But we asked the question about whether there exist languages that were not regular. In order to prove their existence, we observed simple facts about long, accepted strings for a DFA, and noted that they must repeat a state. Looking further, we found that we can describe the structure of regular languages. If we can prove a language does not have that structure, it cannot be regular. And we found many languages that are not regular.
There are a lot of other questions here on regular languages that I will not expand upon, but are nevertheless important to study:
- For a given DFA , it may be important to know the number of accepted strings of a given length . For those of you who know about graphs, this relates to the adjacency matrix of a graph, and can be proved in that setting by looking at successive powers of this matrix.
- Is there an algorithm to find the smallest equivalent DFA (i.e., not changing the language)? What about for an NFA? Regular expression? Regular grammar? It turns out that only the first is computationally tractable, whereas the other three are not!
- Related to the last item: instead of the minimum number of states, but the minimum number of final states?
- We may be concerned for running time in terms of one of the conversions described earlier. Can we prove that for the NFA to DFA conversion, there are some NFAs for which we need to create states? What about on “average”?
In the next set of lecture notes, we will look at a model of computation that is “more powerful” than the ones described here (i.e., they will recognize strictly more languages than the regular ones).
9. Further Exercises
Exercise 15 Show that is not regular.
Exercise 16 Show that is not regular.
Exercise 17 Show that is not regular. (Hint: choose , and make the length of the string slightly larger than needed).
Exercise 18 (Difficult) Show that if is any unary language, then is regular (even if is itself not regular!).