Graphs

This will be a standard post that I will refer to when writing new blog posts as I get more into research topics. This particular page will contain information about graphs, and will be added as more information is needed.

A graph {G} is a 2-tuple {(V, E)} where {V} is a finite set of vertices, and {E \subseteq V \times V} is a finite set of edges. Each edge is an unordered pair of vertices; if we instead consider each edge as an ordered pair of vertices, we get a directed graph, whereas the original definition is usually called an undirected graph. A subgraph {H} of a graph {G} is a subset of both {G}‘s vertices and edges. An induced subgraph is a subgraph and has every edge’s endpoints both defined in the subset of the vertices.

A hyperedge is defined similarly to a normal edge but instead is an (un)ordered tuple on a subset of the vertices. A hypergraph then is a graph that contains hyperedges. Normally we won’t talk about hypergraphs on this blog, but they certainly have many applications for where they are used, and are a nice generalization of the standard graph definition.

Two edges are parallel if they share the same endpoints, and a multigraph is a graph with (possibly) parallel edges.

1. Example Graphs

The complete graph on {n} vertices, {K_n}, is the graph where {E = V \times V}.

The path graph on {n} vertices {\{0, 1, \cdots, v-1\}}, {P_n}, is the graph with edges {\{\{i, j\} : i = j-1, 1 \le i \le v-2\}}.

The cycle graph on {n} vertices {\{0, 1, \cdots, v-1\}}, {C_n}, is the graph with edges {\{\{i, j\} : i = j-1 \mod n\}}.

2. More Definitions

We say that two vertices are adjacent if they share the same endpoints. A path is a finite sequence of vertices {(v_1, v_2, \cdots, v_k)} in which {v_i} and {v_{i+1}} are adjacent for all {1 \le i \le k-1} (Note: there may be additional vertices in the middle of the sequence that are also equal).

A cycle in a graph is path (as defined above) in which {v_1 = v_k}. A graph is acyclic if it does not contain a cycle.

A graph is bipartite if there is a partition of the vertices {A, B \subseteq V} such that all edges are in {A \times B}.

Exercise 1 Show that a graph is bipartite if and only if it does not contain any odd-length cycles.

A {k}-clique in a graph {G} is a complete subgraph of {G} on {k} vertices.

2.1. Trees

A graph is connected if for every pair of vertices {v_1, v_2 \in V} there exists a path between {v_1} and {v_2}. A tree is a connected acyclic graph.

Exercise 2 Prove that any tree on {n} vertices has exactly {n-1} edges.

A connected component of a graph is a subset of the vertices {K \subseteq V} such that every two vertices {k_1, k_2 \in K} have a path between them. A forest is a union of (possibly disjoint) trees.

Formal Correctness

I am currently a Teaching Assistant for my advisor for a class called Introduction to Theoretical Computer Science, which essentially covers a standard curriculum on automata theory. We just recently covered the Pumping Lemma for regular languages:

Lemma 1 (Pumping Lemma for Regular Languages) Let {L} be a regular language. Then there exists a “pumping constant” p for {L} such that for all {w \in L} with {|w| \ge p}, there is a decomposition of {w} into {uvx} such that {|uv| \le p, |v| \ge 1}, and {uv^ix \in L} for all {i \ge 0}.

The main “use” of the Pumping Lemma is to prove that a language is not regular, with a simple proof by contradiction. The most standard example of non-regularity is {\{0^n 1^n : n \ge 0\}}, but another example that I have done in my recitation sections is {L = \{0^{n^2} : n \ge 0\}}, which follows from a simple argument of showing inequality with 2 consecutive perfect squares.

One of my students pointed me to this blog post. It claims to show that “checking whether a input number is prime or not” is regular (because there is a regular expression) for it. Of course, I was skeptical: I already knew the problem of even showing that the language consisting of all prime-length strings is not regular:

Theorem 2 The language L consisting of prime lengths only is not regular.

Proof: Suppose it is. Then choose {w = a^p} with {p} any prime number at least (the pumping length of {L}) + 1; note, the + 1 here is needed. A small technical detail: there may not necessarily be a prime strictly larger than {L}‘s pumping length. But we can guarantee a prime?s existence by the “infinitely many primes” proof by Euclid.

We want to decompose {w} into {uvx}, as usual. Whatever the decomposition is, {|x| \ge 2} (because of the +1 from earlier). Therefore, {|ux| \ge 2}. Choose {i = |ux|}. It suffices to show {|uv^i x|} is not prime. We have {|uv^i x| = |ux| + |v| \times |ux|}, which factors into {(1 + |v|) \times |ux|}. Since {1 + |v| \ge 2}, and {|ux| \ge 2}, then {|uv^i x| = (1 + |v|) \times |ux|} is not prime. \Box

And of course, Theorem 2¬†only shows the case of unary alphabets (although the argument can be easily extended). Also note that the language above is not even context-free (a similar counting-then-show-nondivisibility argument): in fact, if it were context free, we would immediately have shown that we can check whether a number is prime or not in {O(n^3)} time (via the well-known CYK algorithm). Looking at the reference my student sent to me, I came across this line: “…The regular expression engine will backtrack and will make (11+?) match “111″…” (emphasis mine). Of course, anyone familiar with automata theory would know that backtracking is not possible here, and can only be done (as taught in standard classes) by Turing Machines.

And that got me thinking. The reason that this student thought that the language was regular was that the name “Regular Expression” is so colloquially and mis-used (as is the norm in theory…), whereas there was a Turing Machine hidden behind the scenes! Also, where else has notation or terminology been mis-used in Computer Science?

One well-known one (in that I differ from the standard opinion) is Landau notation, specifically with big-O notation, commonly used to classify some bound on some function like running time of an algorithm. Here is a standard definition of big-O notation given in an introductory algorithms class:

Definition 3 Let {f(n), g(n)} be two non-negative, real-valued functions with {n} a natural number. Then we say that {f(n)} is big-O of {g(n)}, commonly written {f(n) = O(g(n))}, if there are fixed constants {c, n_0} (independent of {n}) such that {f(n) \le c \times g(n)} for all {n \ge n_0}.

One can note that that for some function {f(n)}, {O(f(n))} describes many functions, and is actually a set of functions, all with the property listed above. Here is my opinion: I hate the notation {f(n) = O(g(n))}. Why? It suggests that a function is equal to a set of functions, which is simply false! I’ve heard the argument that the notation implies that {f(n)} is equal to some function in {O(g(n))}. But why go the extra mile? Why not just use the correct mathematical notation of {f(n) \in O(g(n))}?

I completely understand the need for lackadaisical definitions: sometimes the math of the problem at-hand is too dense, and a more intuitive approach is necessary. However, I would argue the opposite for most cases here. My main argument stems from having worked in multi-person projects before: although many of us do it all the time, we should not assume that the person reading our code/requirements document/etc. knows exactly what we mean in every scenario. Even though we cannot always be semantically correct, there are places in where we can be formally correct, and I think taking advantage of those opportunities is what is most important.

 

Welcome!

This is going to be my academic blog, where I post about my research, or academia in general. Currently, I’m a PhD student at Arizona State University, studying Computer Science. I enjoy mathematics-style problems, especially those that have relation to CS applications, such as software/hardware testing. My current “research area” is in network reliability, and the combinatorial ideas and constructions behind it. However, here are some of the areas that I like and hopefully will discuss on here:

  • Algorithms and analysis
  • Complexity Theory, at large
  • Network Reliability

These topics will most likely not be complete, and I may add more to one topic, introduce another, or not talk about one at some future point.

I haven’t actually blogged before, and am not sure how well this will work. I will try not to devote too much time to this, but at the same time want to give high-quality posts to the community. I will (hopefully) see how things go and adjust accordingly.

And I hope \LaTeX will work easily on here!