Line 900: |
Line 900: |
| | | |
| ===6.7. Basic Notions of Formal Language Theory=== | | ===6.7. Basic Notions of Formal Language Theory=== |
| + | |
| + | <pre> |
| + | This section collects the material on formal language theory that is needed for the rest of this work. |
| + | |
| + | A "formal language" is a countable set of "expressions", each of which is a finite sequence of elements taken from a finite set of "symbols". The primitive symbols that are used to generate the expressions of a formal language are collectively called its "alphabet" or its "lexicon", depending on whether the expressions of the language are intuitively regarded as "words" or as "sentences", respectively. |
| + | |
| + | So long as one considers only words or only sentences, that is, only one level of finite sequences of symbols, it does not matter essentially what the sequences are called. Unless otherwise specified, a formal language is taken by default to be a "one level" formal language, containing only a single level of sequences. If one wants to consider both words and sentences, that is, finite sequences of symbols and then finite sequences of these lower level sequences, all in the same context of discussion, then one has to move up to an essentially more powerful concept, that of a "two level" formal language. |
| + | |
| + | Until further notice, the next part of this discussion strictly applies only to one level formal languages. When this project reaches the stage of dealing with higher level formal languages, a few of the following definitions and default assumptions will need to be adjusted slightly. |
| + | |
| + | It is convenient to have a generic term for referring to alphabets and lexicons, indifferently, without concern for their level of construction. Therefore, I describe any finite set X as a "syntactic resource" for the syntactic domain X, provided that its elements are regarded as syntactic primitives that can be used to construct the signs and expressions in X. If the primitive signs in a syntactic resource are regarded as denoting primitive objects or operations, then I refer to a collection of these objects or operations as an "objective" or an "operational" resource, as the case may be. |
| + | |
| + | It is always tempting to seek analogies between formal language theory and algebraic studies, and it is often very useful to do so. But if one tries to forge an analogy between the relation "X is a resource for X", in the formal language sense, and the relation "X is a basis for X", in the algebraic sense, then it becomes necessary to observe important differences between the two perspectives, as they are currently applied. |
| + | |
| + | In formal language theory, one typically fixes the syntactic resource X as the primary reality, that is, as the ruling parameter of discussion, and then considers each formal language X that can be generated on X as a particular subset of the maximal language that is possible on X. This direction of approach can be contrasted with what is more usual in algebraic studies, where the generated object X is taken as the primary reality, and a basis X is defined secondarily as a minimal or independent spanning set, but generally serves as only one of many possible bases. |
| + | |
| + | The linguistic relation "X is a resource for X" is thus exploited in the opposite direction from the relation "X is a basis for X". There does not seem to be any reason in principle why either study cannot be cast the other way around, but it has to be noted that the current practices, and the preferences that support them, dictate otherwise. |
| + | |
| + | By way of a general notation, I use doubly underlined capital letters to denote finite sets taken as the syntactic resources of formal languages, and I use doubly underlined lower case letters to denote their symbols. Schematically, this appears as follows: |
| + | |
| + | X = {x1, ..., xn}. |
| + | |
| + | In a formal language context, I use singly underlined capital letters to indicate the various formal languages being considered, that is, the countable sets of sequences over a given syntactic resource that are being singled out for attention, and I use singly underlined lower case letters to indicate various individual sequences in these languages. Schematically, this appears as follows: |
| + | |
| + | X = {x1, ..., xm, ...}. |
| + | |
| + | Usually, one compares different formal languages over a fixed resource, but since resources are finite it is no trouble to unite a finite number of them into a common resource. Without loss of generality, then, one typically has a fixed set X in mind throughout a given discussion and has to consider a variety of different formal languages that can be generated from the symbols of X. These sorts of considerations are aided by defining a number of formal operations on the resources X and the languages X. |
| + | |
| + | The "kth power" of X, written as Xk, is defined to be the set of all sequences of length k over X. |
| + | |
| + | Xk = {<u1, ..., uk> : ui C X, i = 1 to k}. |
| + | |
| + | By convention for the case where k = 0, this gives X0 = {<>}, that is, the singleton set consisting of the empty sequence. Depending on the setting, the empty sequence is referred to as the "empty word" or the "empty sentence", and is commonly denoted by an epsilon or a lambda. In this text, I use the symbol "!" (a stricken exclamation point) as a synonym for the empty sequence <>. In addition, I use the symbol "!" (a stricken exclamation point underscored) as a synonym for {<>}. |
| + | |
| + | It is probably worth remarking at this point that all empty sequences are indistinguishable (in a one level formal language, that is), and thus that all singleton sets consisting of an empty sequence are identical. Consequently, X0 = {<>} = ! = Y0, for all resources X and Y. However, the empty language {} and the singleton empty sequence {<>} need to be distinguished from each other. |
| + | |
| + | The "surplus" of X, written as X+, is defined to be the set of all positive length sequences over X. |
| + | |
| + | X+ = Ui=1 Xk = X1 U ... U Xk U ... |
| + | |
| + | The "kleene star" of X, written as X*, is defined to be the set of all finite sequences over X. |
| + | |
| + | X* = Ui=0 Xk = X0 U X+ = X0 U ... U Xk U ... |
| + | |
| + | A standard reference for the above material is: |
| + | |
| + | Denning, P.J., Dennis, J.B., & Qualitz, J.E. |
| + | Machines, Languages, and Computation. |
| + | Prentice Hall, Englewood Cliffs, NJ, 1978. |
| + | </pre> |
| | | |
| ===6.8. A Perspective on Computation=== | | ===6.8. A Perspective on Computation=== |