3.2 Selected Readings

● Chomsky (2006), Language and Mind, selections on phonology

● David Odden (2013), Introducing Phonology, selections

● Alan Prince & Paul Smolensky (2004), Optimality Theory: Constraint Interaction in Generative Grammar, selections

Chomsky (2006), Language and Mind, Selections on Phonology (1)

Chapter 5 The Formal Nature of Language

The Structure of the Phonological Component

The syntactic component of a generative grammar defines (that is, generates) an infinite set of pairs (D, S), where D is a deep structure and S is a surface structure. The interpretive components of the grammar assign a semantic representation to D and a phonetic representation to S.

We consider first the problem of assigning phonetic representations to surface structures. As in the discussion of universal phonetics above, we take a phonetic representation to be a string of symbols of the universal phonetic alphabet, each symbol analyzed as a set of distinctive features with specified values. The same idea can be put slightly differently: we may regard a phonetic representation as a matrix in which the rows correspond to the features of the universal system, the columns correspond to successive segments (symbols of the phonetic alphabet), and each entry is an integer specifying the value of the segment in question with respect to that feature. Two questions must be settled: first, what information must be contained in the surface structure; and second, how the rules of the phonological component of the grammar use that information to specify a phonetic matrix of this sort.
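To make the matrix format concrete, here is a minimal Python sketch (an editorial illustration, not part of Chomsky's text; the feature inventory is abbreviated and the "+"/"-" values are only approximate) of a classificatory matrix whose rows are features and whose columns are the successive segments of a formative:

    # Classificatory matrix for a formative pronounced [wat]: rows are
    # features, columns are segments, every entry is '+' or '-'.
    # (Illustrative feature set and values, not Chomsky's exact ones.)
    features = ["consonantal", "vocalic", "high", "low", "back", "voice"]
    segments = ["w", "a", "t"]  # labial glide, low back vowel, voiceless dental stop
    matrix = {
        "consonantal": ["-", "-", "+"],
        "vocalic":     ["-", "+", "-"],
        "high":        ["+", "-", "-"],
        "low":         ["-", "+", "-"],
        "back":        ["+", "+", "-"],
        "voice":       ["+", "+", "-"],
    }
    # Read off the feature bundle (one column) for each segment.
    for j, seg in enumerate(segments):
        print(seg, {f: matrix[f][j] for f in features})

The phonological rules can then be thought of as functions from such binary matrices to integer-valued phonetic matrices.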

Consider again example (4), repeated here for convenience as (5):

(5)What # disturb-ed # John # was # be-ing # dis-regard-ed # by # every-one.

Viewed in rough outline,(2) (5) can be regarded as a sequence of the formatives "what", "disturb", "ed", "John", "was", "be", "ing", "dis", "regard", "ed", "by", "every", "one", with their points of juncture expressed by the symbols "#" and "-", as shown in (5). These juncture symbols indicate how the formatives are combined, and they supply information to the interpretive rules of the phonological component. In fact, each juncture must be analyzed as a set of features, that is, as a single-column matrix in which the rows correspond to certain features of the junctural system and each entry is one of two values, which we may represent as "+" or "-". Likewise, each formative is analyzed as a matrix in which the columns stand for successive segments and the rows correspond to certain categorial features, each entry again being "+" or "-". The whole of sentence (5) can thus be regarded as a single matrix with entries "+" and "-".(3)

The categorial features include the universal features of the phonetic system together with diacritic features, which in effect mark exceptions to rules. In some dialects the phonetic representation corresponding to "what" is [wat]; the matrix corresponding to "what" therefore contains three segments, the first specified as a labial glide, the second as a low back unrounded vowel, and the third as a voiceless dental stop (these specifications given entirely in terms of the "+" and "-" values of features supplied by the universal phonetic system). In such a case, the rules of the phonological component convert these specifications in terms of "+" and "-" values into more detailed specifications in terms of integers, giving the phonetic features of each segment (degree of tongue height, degree of aspiration, and so on) to whatever degree of accuracy is required by the presupposed universal phonetic theory, and within the range of variation permitted by the language. In this example, the values assigned will simply be a refinement of the divisions already given by the "+" and "-" entries in the underlying matrix for "what" in (5).

The example just cited is unusually simple. In general, the rules of the phonological component will not only refine the underlying "+" and "-" specifications but may also change feature values substantially, and may insert, delete, or reorder segments. For example, the formative "by" is represented by an underlying matrix of two columns, the second of which is specified (in terms of feature values) as a high front vowel. The corresponding phonetic matrix, however, will contain three columns, the second specified as a low back vowel and the third as a palatal glide (these specifications now being given by integer-valued entries in the phonetic matrix).(4)

The surface structure of (5) is thus represented as a matrix in which one of two values appears in each entry. The fact that exactly two values may appear shows that the underlying matrix serves a purely classificatory function. Each sentence is classified in a way that distinguishes it from every other sentence, and in a way that determines just how the rules of the phonological component assign specific phonetic values to particular positions. We see, then, that the distinctive features of the universal phonetic system have a classificatory function in the underlying matrices that form part of surface structures, and a phonetic function in the matrices that constitute the phonetic representations of the sentences in question. Only in the former function are the distinctive features uniformly binary; only in the latter do they receive a direct physical interpretation.

The underlying classificatory matrices just described do not exhaust the information required by the interpretive phonological rules. Beyond this, it is necessary to know how the sentence in question is subdivided into phrases of varying length, and to what categories these phrases belong. In the case of (5), phonological interpretation requires the information that "disturb" and "disregard" are verbs, that "what disturbed John" is a noun phrase, that "John was being" is not a phrase at all, and so on. The relevant information can be indicated by a labeled bracketing of the sentence.(5) A unit enclosed within the paired brackets [A and ]A is a phrase of category A. Thus the sequence "what # disturbed # John" in (5) will be enclosed within the brackets [NP and ]NP, indicating that it is a noun phrase; the formative "disturb" will be enclosed within the brackets [V and ]V, indicating that it is a verb; the whole of expression (5) will be enclosed within the brackets [S and ]S, indicating that it is a sentence; and the sequence "John was being" will be enclosed within no paired brackets, since it is no phrase at all. To take an extremely simple example, the sentence "John saw Bill" might be represented as a surface structure in the following way, where each item given in orthography is to be understood as a classificatory matrix:

(6) [S [NP [N John]N ]NP [VP [V saw]V [NP [N Bill]N ]NP ]VP ]S

This expression indicates that "John" and "Bill" are nouns (N), that "saw" is a verb (V), that "John" and "Bill" are noun phrases (NP), that "saw Bill" is a verb phrase (VP), and that "John saw Bill" is a sentence (S). Information of the sort representable in the manner just described appears to be exactly what the phonological component of the grammar requires for the interpretation of sentences; we therefore assume that the surface structure of a sentence is a properly labeled bracketing of the classificatory matrices of its formatives and junctures.

The phonological component of the grammar converts surface structures into phonetic representations. We have now given a rough specification of the notions "surface structure" and "phonetic representation"; it remains to describe the rules of the phonological component and the manner in which they are organized.

The evidence presently available suggests that the rules of the phonological component are linearly ordered in a sequence R1, ..., Rn, and that this sequence of rules applies to surface structures in a cyclic fashion, as follows. In the first cycle of application, the rules R1, ..., Rn apply, in that order, to the maximal continuous parts of the surface structure that contain no internal brackets. After the last of these rules has applied, the innermost brackets are erased and the second cycle of application begins. In this cycle the rules again apply, in the same order, to the maximal continuous parts of the structure that contain no internal brackets. The innermost brackets are then erased and the third cycle begins. The process continues until the maximal domain of phonological processes is reached (in simple cases, the whole sentence). Certain rules are restricted in application to words: they can apply in a cycle only when the domain of application is a full word. Others are free to reapply at any stage of the cycle. Notice that the principle of cyclic application is highly intuitive. It says, in effect, that there is a fixed system of rules that determines the form of a larger unit from the (idealized) forms of its constituent parts.

We can illustrate cyclic application with the rules of stress assignment in English. It seems to be a fact that the phonetic representation of English must allow the distinctive feature of stress to take on five or six different values, and yet no segment is marked for stress in surface structure. That is to say, stress in English has no categorial function as a distinctive feature (except in the most marginal of cases). The complex stress contours of phonetic representation are determined by rules such as (7) and (8).(6)

(7) In nouns, assign primary stress to the leftmost of two primary-stressed vowels.

(8) Assign primary stress to the rightmost stress-peak, where a vowel V is a stress-peak in a given domain if no vowel in that domain is more heavily stressed than V.

Rule (7) applies to nouns containing two primary stresses; rule (8) applies to units of any kind. The rules apply cyclically, first (7) and then (8). By convention, when primary stress is assigned to a given position, all other stresses in the domain are weakened by one degree. Notice that if a domain contains no stressed vowel, rule (8) will assign primary stress to its rightmost vowel.

We can illustrate these rules by applying them to the surface structure (6). In accordance with the general principle of cyclic application, rules (7) and (8) first apply to the innermost units [N John]N, [V saw]V, and [N Bill]N. Rule (7) is inapplicable; rule (8) applies, assigning primary stress to the single vowel in each case. The innermost brackets are then erased. The next cycle deals with the units [NP John]NP and [NP Bill]NP, in which, by rule (8), primary stress is simply reassigned to the single vowel. The innermost brackets are then erased, giving the unit [VP saw Bill]VP as the domain of application. Rule (7) is again inapplicable, since "saw Bill" is not a noun; rule (8) assigns primary stress to the vowel of "Bill", weakening the stress on "saw" to secondary. Erasure of the innermost brackets gives the domain [S John saw Bill]S. Rule (7) is once again inapplicable, and rule (8) assigns primary stress to "Bill", weakening the other stresses; the result, the contour 2-3-1, is acceptable as an idealized representation of the stress contour.

Consider now the slightly more complex example "John's blackboard eraser". In the first cycle of application, rules (7) and (8) apply to the innermost bracketed units "John", "black", "board", and "erase"; rule (7) is inapplicable, and rule (8) assigns primary stress to the rightmost vowel in each case (the only vowel, in the first three). The next cycle involves the units "John's" and "eraser", and is vacuous.(7) The domain of application in the next cycle is [N black board]N. Being a noun, this unit is subject to rule (7), which assigns primary stress to "black" and weakens the stress on "board" to secondary. The innermost brackets are erased, and the domain of application in the next cycle is [N blackboard eraser]N. Rule (7) applies once more, assigning primary stress to "black" and weakening the other stresses by one degree. In the final cycle, the domain of application is [NP John's blackboard eraser]NP. Since this is a full noun phrase rather than a noun, rule (7) cannot apply. Rule (8) assigns primary stress to the rightmost primary-stressed vowel and weakens all the other stresses, yielding the contour 2-1-4-3. In this way, complex phonetic representations are determined by independently motivated and very simple rules, applying in accordance with the general principle of the cycle.
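The cyclic procedure just traced is mechanical enough to be written out as a small program. The following Python sketch is an editorial illustration, not Chomsky's formalism: it assumes a pre-parsed, simplified bracketing in which every word is unique and carries one stressable vowel, and it implements rules (7) and (8) together with the weakening convention:

    # Cyclic stress assignment with rules (7) and (8).
    # A structure is either a word (string) or a pair (label, [parts]).
    # Stress levels: 1 = primary, larger numbers = weaker.
    def assign(node, stress):
        if isinstance(node, str):            # innermost cycle: rule (8)
            stress[node] = 1                 # assigns primary to the one vowel
            return [node]
        label, parts = node
        words = [w for p in parts for w in assign(p, stress)]
        primaries = [w for w in words if stress[w] == 1]
        if label == "N" and len(primaries) == 2:
            peak = primaries[0]              # rule (7): leftmost of two primaries
        else:
            peak = primaries[-1]             # rule (8): rightmost stress-peak
        for w in words:                      # convention: weaken all others
            if w != peak:
                stress[w] += 1
        stress[peak] = 1
        return words

    # Simplified bracketing of "John's blackboard eraser"
    # (the vacuous "John's"/"eraser" cycles are omitted).
    tree = ("NP", ["John's", ("N", [("N", ["black", "board"]), "eraser"])])
    stress = {}
    assign(tree, stress)
    print(stress)   # {"John's": 2, 'black': 1, 'board': 4, 'eraser': 3}

Run on this bracketing, the sketch reproduces the 2-1-4-3 contour derived in the text; replacing the tree with the bracketing of (6) yields the 2-3-1 contour of "John saw Bill".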

This example is typical, and it illustrates several points of importance. A grammar of English must contain rule (7), to account for the stress contour of the noun "blackboard", and rule (8), to account for the rising contour of the phrase "black board". Strictly speaking, however, the principle of the cycle is not part of the grammar of English but part of universal grammar; it determines the application of the particular rules of English, or of any other language, whatever those rules may be. In the case just discussed, the general principle of cyclic application assigns the complex stress contours as indicated. A person who has learned rules (7) and (8), and who knows the principle of the cycle, will know(8) the proper stress contour of expressions such as "John's blackboard eraser", even of ones he may never have heard before. This is a simple example of a fundamental property of language: certain universal principles must interact with particular rules to determine the form (and the meaning) of entirely new linguistic expressions.

This example also provides evidence bearing on certain subtler and more far-reaching assumptions. There can be no doubt that phenomena such as the stress contours of English are a perceptual reality; trained observers, for example, reach a high degree of agreement in recording new utterances of their native language. But there is no reason to suppose that these stress contours reflect a physical reality; it may well be that stress contours are not represented in the physical signal in anything like the perceived detail. There is no contradiction here. If only two degrees of stress are distinguished in the physical signal, a person learning English would still have sufficient evidence to construct rules (7) and (8) (given, say, the contrast between "blackboard" and "black board"). Supposing that he knows the principle of the cycle, he will then be able to perceive the stress contour of "John's blackboard eraser" even if it is not a physical property of the signal. The evidence now available strongly suggests that this is an accurate account of how stress is perceived in English.

It is important to see that there is nothing mysterious in this account. In principle there would be no difficulty in designing an automaton that incorporates rules (7) and (8), the rules of English syntax, and the principle of the transformational cycle, and that could thereby assign a multileveled stress contour even to utterances in which stress is not represented at all (for example, sentences spelled in conventional orthography). Taking such an automaton as a rough model of speech perception (see (1) on page 103(9)), we may suppose that the hearer uses certain selected properties of the physical signal to determine which sentence was produced and to assign to it a deep and a surface structure. Whether or not stress corresponds to any physical property of the presented signal, anyone who attends carefully will "hear" the stress contour assigned by the phonological component of his grammar.

Speaking loosely, this account of speech perception assumes that the syntactic interpretation of an utterance is a precondition for "hearing" its phonetic representation in detail. It thereby rejects two other assumptions: first, that speech perception requires a full analysis of phonetic form, proceeding in strict order from a full analysis of phonetic form to a full analysis of syntactic structure and then to semantic interpretation; and second, that the perceived phonetic form is a precise point-by-point representation of the signal. It should be kept in mind that there is no evidence that these rejected assumptions are correct, and nothing mysterious about the view just sketched that rejects them. Indeed, that view seems quite plausible: on the one hand, it avoids crediting the hearer with the ability to identify, with a precision beyond anything experimentally demonstrable even under ideal conditions, certain presently undetectable physical properties of utterances; on the other, it accounts for the perception of the stress contours of new utterances(10) on the very simple assumption that rules (7) and (8) and the general principle of cyclic application are available to the perceptual system.

There is much to be said about the relative merits of various models of perception. Setting that topic aside, let us pursue the hypothesis that rules (7) and (8) and the general principle of cyclic application are available to the perceptual system and are put to use in the manner suggested. Rules (7) and (8) might plausibly be learned from simple cases of falling and rising contour (such as the contrast between "black board" and "blackboard"). But how could a person have learned the principle of cyclic application? Before facing this question, we must settle another that is logically prior to it: why assume that the principle is learned at all? There is ample evidence that the principle is put to use, but it does not follow that it was learned. In fact, it is difficult to imagine how such a principle could be learned, uniformly, by all speakers; nor is it at all clear that the physical signal contains evidence sufficient to justify the principle. The most reasonable conclusion, then, seems to be that the principle is not learned at all, but is simply part of the conceptual equipment that the learner brings to the task of language acquisition. Parallel arguments can be constructed for other principles of universal grammar.

Once again, there is nothing surprising in this conclusion. In principle there would be no difficulty in designing an automaton that incorporates the principles of universal grammar and uses them to determine which of the possible languages is the one to which it is exposed. A priori, the assumption that these principles are learned has no more to recommend it than the assumption that a person learns to interpret visual stimuli in terms of lines, angles, curves, distance, and so on, or, for that matter, that he learns to have two arms. It is purely a question of empirical fact, and no information of any general extralinguistic sort is presently available to support the hypothesis that particular principles of universal grammar are learned, or innate, or (in some respect) both. If the linguistic facts seem to show that some of these principles cannot be learned, there is no reason to regard that conclusion as paradoxical or strange.

To return to the detailed characterization of the principles of universal grammar: the phonological component of a grammar appears to consist of a sequence of rules which, as described above, apply in a cyclic fashion and assign phonetic representations to surface structures. A phonetic representation is a matrix of phonetic feature specifications, and a surface structure is a properly labeled bracketing of formatives, the formatives themselves being represented in terms of markings for categorial distinctive features. The evidence now available supports these assumptions, and they in turn provide a basis for explaining many curious features of phonetic fact.

It is worth emphasizing that these properties of the phonological component are in no way necessary a priori. These assumptions of universal grammar restrict the class of possible human languages to a very special subset of the set of imaginable "languages". The evidence at hand suggests that they belong to the language acquisition device AM of (3) on page 106,(11) forming part of the innate schematism that the child applies to the problem of language learning. Clearly, this schematism must be quite elaborate and highly restrictive; if it were not, language acquisition, within the empirically known limits of time, access to data, and variability, would be an impenetrable mystery. The ideas touched on in the preceding discussion bear directly on the problem of determining the nature of these innate mechanisms, and they therefore richly deserve careful study and attention.

David Odden (2013), Introducing Phonology, Selected Readings (12)

◆ About the Author

David Odden (b. 1954) is a prominent American phonologist. He received his BA from the University of Washington in 1975 and his PhD from the University of Illinois in 1981, and has taught at Washington State University, Michigan State University, and Yale University; he is now a professor in the Department of Linguistics at Ohio State University. He has made substantial contributions to phonology and descriptive linguistics, in particular to the study of tone in African languages, the description of Bantu languages, and the Obligatory Contour Principle.

◆ Selected Text

Chapter 3 Feature theory

This chapter explores the theory for representing language sounds as symbolic units. You will:

◆ see that sounds are defined in terms of a fixed set of universal features

◆ learn the phonetic definitions of features, and how to assign feature values to segments based on phonetic properties

◆ understand how phonological rules are formalized in terms of these features

◆ see how these features make predictions about possible sounds and rules in human language

Key terms: observation, predictions, features, natural classes

3.1 Scientific questions about speech sounds

One of the scientific questions that need to be asked about language is: what is a possible speech sound? Humans can physically produce many more kinds of sounds than are used in language. No language employs hand-clapping, finger-snapping, or vibrations of air between the hand and cheek caused by release of air from the mouth when obstructed by the palm of the hand(though such a sound can easily communicate an attitude).A goal of a scientific theory of language is to systematize such facts and explain them; thus we have discovered one limitation on language sound and its modality—language sounds are produced exclusively within the mouth and nasal passages, in the area between the lips and larynx.

Even staying within the vocal tract, languages also do not, for example, use whistles or inhalation to form speech sounds, nor is a labiolingual trill(a.k.a.“the raspberry”)a speech sound in any language. It is important to understand that even though these various odd sounds are not language sounds, they may still be used in communication. The“raspberry”in American culture communicates a contemptuous attitude; in parts of coastal East Africa and Scandinavia, inhaling with the tongue in the position for schwa expresses agreement. Such noises lie outside of language, and we never find plurality indicated with these sounds, nor are they surrounded by other sounds to form the word dog. General communication has no systematic limitations short of anatomical ones, but in language, only a restricted range of sounds are used.

The issue of possible speech sounds is complicated by manual languages such as American Sign Language. ASL is technically not a counterexample to a claim about modality framed in terms of“speech sounds.” But it is arbitrary to declare manual language to be outside the theory of language, and facts from such languages are relevant in principle. Unfortunately, knowledge of the signed languages of the world is very restricted, especially in phonology. Signed languages clearly have syntax: what isn't clear is what they have by way of phonologies. Researchers have only just begun to scratch the surface of sign language phonologies, so unfortunately we can say nothing more about them here.

The central question is: what is the basis for defining possible speech sounds? Do we use our“speech anatomy”in every imaginable way, or only in certain well-defined ways?

3.1.1 Possible differences in sounds

One way to approach the question is to collect samples of the sounds of all of the languages in the world. This search (which has never been conducted) would reveal massive repetition, and would probably reveal that the segment [m] in English is exactly the same as the segment [m] in French, German, Tübatülabal, Arabic, Swahili, Chinese, and innumerable other languages. It would also reveal differences, some of them perhaps a bit surprising. Given the richness of our transcriptional resources for notating phonetic differences between segments, you might expect that if a collection of languages had the same vowels transcribed as [i] and [ɪ], then these vowels should sound the same. This is not so.

Varieties of phonetic [i] vs. [ɪ]. Many languages have this pair of vowels; for example, Matuumbi has [i] and [ɪ]. But the actual pronunciation of [i] vs. [ɪ] differs between English and Matuumbi. Matuumbi [i] is higher than in English, and Matuumbi [ɪ] is a bit lower than English [ɪ]—to some people it almost sounds like [e] (but is clearly different from [e], even the "pure" [e] found in Spanish). This might force us to introduce new symbols, so that we can accurately represent these distinctions. (This is done in publications on Matuumbi, where the difference is notated as "extreme" versus "regular" i, u.) Before we embark on a program of adding new symbols, we should be sure that we know how many symbols to add. It turns out that the pronunciation of [i] and [ɪ] differs in many languages: these vowels exist in English, Kamba, Lomwe, Matuumbi, Bari, Kipsigis, Didinga, and Sotho, and their actual pronunciation differs in each language.

You do not have to go very far into exotic languages to find this phonetic difference, for the difference between English[i]and German[i]is also very noticeable, and is something that a language learner must master to develop a good German or English accent. Although the differences may be difficult for the untrained ear to perceive at first, they are consistent, physically measurable, and reproducible by speakers. If written symbols are to represent phonetic differences between languages, a totally accurate transcription should represent these differences. To represent just this range of vowel differences involving[i]and[ɪ],over a dozen new symbols would need to be introduced. Yet we do not introduce large numbers of new symbols to express these differences in pronunciations, because phonological symbols do not represent the precise phonetic properties of the sounds in a language, they only represent the essential contrast between sounds.

Other variants of sounds. Similar variation exists with other phonetic categories. The retroflex consonants of Telugu, Hindi, and Koti are all pronounced differently. Hindi has what might be called "mild" retroflexion, where the tip of the tongue is placed just behind the alveolar ridge, while in Telugu, the tip of the tongue is further back and contact is made between the palate and the underside of the tongue (sublaminal); in Koti, the tongue is placed further forward, but is also sublaminal. Finnish, Norwegian, and English contrast the vowels [a] and [æ], but in each of these languages the vowels are pronounced in a slightly different way. The voiced velar fricative [ɣ] found in Arabic, Spanish, and the Kurdish language Hawrami is phonetically different in each language, in subtle but audible ways.

The important details of speech. Although languages can differ substantially in the details of how their sounds are pronounced, there are limits on the types of sound differences which can be exploited contrastively, i.e. can form the basis for making differences in meaning. Language can contrast tense [i] and lax [ɪ], but cannot further contrast a hyper-tense high vowel (like that found in Matuumbi), which we might write as [i⁺], with plain tense [i] as in English, or hyper-lax [ɪ⁻] as in Matuumbi with plain lax [ɪ] as found in English. Within a language, you find at most [i] vs. [ɪ]. Languages can have one series of retroflex consonants, and cannot contrast Hindi-style [ʈ] with a Telugu-style phoneme which we might notate as [ʈ̣]. The phonology simply has "retroflex", and it is up to the phonetic component of a language to say exactly how a retroflex consonant is pronounced.

It is important to emphasize that such phonetic details are not too subtle to hear. The difference between various types of retroflex consonants is quite audible—otherwise, people could not learn the typical pronunciation of retroflex consonants in their language—and the difference between English and German[i]is appreciable. Children learning German can hear and reproduce German[i]accurately. Speakers can also tell when someone mispronounces a German[i]as an English[i],and bilingual German-English speakers can easily switch between the two phonetic vowels.

One thing that phonological theory wants to know is: what is a possible phoneme? How might we answer this? We could look at all languages and publish a list. A monumental difficulty with that is that there are nearly 7,000 languages, but useful information is available on only around 10 percent of them. Worse, this could only say what phonemic contrasts happen to exist at the present. A scientific account of language does not just ask what has actually been observed; it asks about the fundamental nature of language, including potential sounds which may have existed in a language spoken 1,000 years ago, or in some future language which will be spoken 1,000 years hence. We are not just interested in observation, we are interested in prediction.

In this connection, consider whether a "bilabial click" is a possible phoneme. We symbolize it as [ʘ]—it is like a kiss, but with the lips flat as for [m], not protruded as for [w]. Virtually all languages have bilabial consonants, and we know of dozens of languages with click consonants (Dahalo, Sotho, Zulu, Xhosa, Khoekhoe), so the question is whether the combination of concepts "bilabial" and "click" can define a phoneme. As it happens, we know that such a sound does exist, but only in two closely related languages, !Xoo and Eastern ǂHoan, members of the Khoisan language family. These languages have under 5,000 speakers combined, and given socioeconomic factors where these languages are spoken (Namibia and Botswana), it is likely that the languages will no longer be spoken in 200 years. We are fortunate in this case that we have information on these languages which allows us to say that this is a phoneme, but things could have turned out differently. The languages could easily have died out without having been recorded, and then we would wrongly conclude that a bilabial click is not a possible phoneme because it has not been observed. We need a principled, theoretical basis for saying what we think might be observed.

Predictions versus observations. A list of facts is scientifically uninteresting. A basic goal of science is to have knowledge that goes beyond what has been observed, because we believe that the universe obeys general laws. A list might be helpful in building a theory, but we would not want to stop with a list, because it would give us no explanation why that particular list, as opposed to some other arbitrary list, should constitute the possible phonemes of language. The question "what is a possible phoneme?" should thus be answered by reference to a general theory of what speech sounds are made of, just as a theory of "possible atoms" is based on a general theory of what makes up atoms and rules for putting those bits together. Science is not simply the accumulation and sorting of facts, but rather the attempt to discover laws that regulate the universe. Such laws make predictions about things that we have yet to observe: certain things should be found, other things should never be found.

The Law of Gravity predicts that a rock will fall to earth, which says what it will do and by implication what it will not do: it also won't go up or sideways. Physicists have observed that subatomic particles decay into other particles. Particles have an electrical charge—positive, negative or neutral—and there is a physical law that the charge of a particle is preserved when it decays (adding up the charges of the decay products). The particle known as a "kaon" (K) can be positive (K⁺), negative (K⁻) or neutral (K⁰); a kaon can decay into other particles known as "pions" (π) which also can be positive (π⁺), negative (π⁻) or neutral (π⁰). Thus a neutral kaon may become a positive pion and a negative pion (K⁰ → π⁺ + π⁻) or it may become one positive, one negative, and one neutral pion (K⁰ → π⁺ + π⁻ + π⁰), because in both cases the positives and negatives cancel out and the sum of charges is neutral (0). The Law of Conservation of Charge allows these patterns of decay, and prohibits a neutral kaon from becoming two positive pions (K⁰ → π⁺ + π⁺). In the myriad cases of particle decay which have been observed experimentally, none violates this law, which predicts what can happen and what cannot.

Analogously, phonological theory seeks to discover the laws for building phonemes, which predict what phonemes can be found in languages. We will see that theory, after considering a related question which defines phonology.

3.1.2 Possible rules

Previous chapters have focused on rules, but we haven't paid much attention to how they should be formulated. English has rules defining allowed clusters of two consonants at the beginning of the word. The first set of consonant sequences in(1)is allowed, whereas the second set of sequences is disallowed.

(1)pr pl br bl tr dr kr kl gr gl
*rp *lp *rb *lb *rt *rd *rk *lk *rg *lg

This restriction is very natural and exists in many languages—but it is not inevitable, and does not reflect any insurmountable problems of physiology or perception. Russian allows many of these clusters; for example [rtutʲ] 'mercury' exemplifies the sequence [rt], which is impossible in English.

We could list the allowed and disallowed sequences of phonemes and leave it at that, but this does not explain why these particular sequences are allowed. Why don't we find a language which is like English, except that the specific sequence[lb]is allowed and the sequence[bl]is disallowed? An interesting generalization regarding sequencing has emerged after comparing such rules across languages. Some languages(e.g. Hawaiian)do not allow any clusters of consonants and some(Bella Coola, a Salishan language of British Columbia)allow any combination of two consonants, but no language allows initial[lb]without also allowing[bl].This is a more interesting and suggestive observation, since it indicates that there is something about such sequences that is not accidental in English; but it is still just a random fact from a list of accumulated facts if we have no basis for characterizing classes of sounds, and view the restrictions as restrictions on letters, as sounds with no structure.

There is a rule in English which requires that all vowels be nasalized when they appear before a nasal consonant, and thus we have a rule something like (2).

(2) {i, e, æ, a, o, u, ...} → {ĩ, ẽ, æ̃, ã, õ, ũ, ...} / _ {m, n, ŋ}

If rules just replace one arbitrary list of sounds by another list when they stand in front of a third arbitrary list, we have to ask why these particular sets of symbols operate together. Could we replace the symbol[n]with the symbol[tʃ],or the symbol[õ]with the symbol[ø],and still have a rule in some language? It is not likely to be an accident that these particular symbols are found in the rule: a rule similar to this can be found in quite a number of languages, and we would not expect this particular collection of letters to assemble themselves into a rule in many languages, if these were just random collections of letters.

Were phonological rules stated in terms of randomly assembled symbols, there would be no reason to expect(3a)to have a different status from(3b).

(3) a. {p, t, tʃ, k} → {m, n, ɲ, ŋ} / _ {m, n, ɲ, ŋ}
    b. {b, p, d, q} → {d, q, b, p} / _ {s, x, o, ɪ}

Rule(3a)—nasalization of stops before nasals—is quite common, but(3b)is never found in human language. This is not an accident, but rather reflects the fact that the latter process cannot be characterized in terms of a unified phonetic operation applying to a phonetically defined context. The insight which we have implicitly assumed, and make explicit here, is that rules operate not in terms of specific symbols, but in terms of definable classes. The basis for defining those classes is a set of phonetic properties.
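The notion "definable class" can itself be made computational. Here is a minimal Python sketch (an editorial illustration; the six-segment inventory and three features are drastically simplified) that tests whether a set of segments is a natural class, i.e. whether some conjunction of feature values picks out exactly that set:

    # A set of segments is a natural class if the features its members
    # share select those segments and no others. (Toy inventory.)
    FEATURES = {
        "p": {"nasal": "-", "coronal": "-", "voice": "-"},
        "t": {"nasal": "-", "coronal": "+", "voice": "-"},
        "k": {"nasal": "-", "coronal": "-", "voice": "-"},
        "b": {"nasal": "-", "coronal": "-", "voice": "+"},
        "m": {"nasal": "+", "coronal": "-", "voice": "+"},
        "n": {"nasal": "+", "coronal": "+", "voice": "+"},
    }

    def is_natural_class(segs):
        # Features on which all members of segs agree.
        shared = {f: v for f, v in FEATURES[segs[0]].items()
                  if all(FEATURES[s][f] == v for s in segs)}
        # Segments selected by that shared description.
        selected = [s for s in FEATURES
                    if all(FEATURES[s][f] == v for f, v in shared.items())]
        return sorted(selected) == sorted(segs)

    print(is_natural_class(["m", "n"]))       # True: exactly the [+nasal] class
    print(is_natural_class(["p", "n", "b"]))  # False: nothing picks out just these

On this view, a statable rule is one whose target, change, and context are all natural classes, which is what separates (3a) from (3b).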

As a final illustration of this point, rule(4a)is common in the world's languages but(4b)is completely unattested.

(4) a. k, g → tʃ, dʒ / _ i, e
    b. p, r → i, b / _ o, n

The first rule refers to phonetically definable classes of segments(velar stops, alveopalatal affricates, front vowels), and the nature of the change is definable in terms of a phonetic difference(velars change place of articulation and become alveopalatals).The second rule cannot be characterized by phonetic properties: the sets {p, r},{i, b},and {o, n} are not defined by some phonetic property, and the change of[p]to[i]and[r]to[b]has no coherent phonetic characterization.

The lack of rules like (4b) is not just an isolated limitation of knowledge—it's not simply that we haven't found the specific rule (4b) but have found (4a)—rather, these kinds of rules represent large, systematic classes. (3b) and (4b) represent a general kind of rule, where classes of segments are defined arbitrarily. Consider the constraint on clusters of two consonants in English. In terms of phonetic classes, this reduces to the simple rule that the first consonant must be a stop and the second consonant must be a liquid. The second rule changes vowels into nasalized vowels before nasal consonants. The basis for defining these classes will be considered now.

3.2 Distinctive feature theory

Just saying that rules are defined in terms of phonetic properties is too broad a claim, since it says nothing about the phonetic properties that are relevant. Consider a hypothetical rule, stated in terms of phonetic properties:

all vowels change place of articulation so that the original difference in formant frequency between F1 and F3 is reduced to half what it originally was, when the vowel appears before a consonant whose duration ranges from 100 to 135 ms.

What renders this rule implausible(no language has one vaguely resembling it)is that it refers to specific numerical durations, and to the difference in frequency between the first and third formant.

An acoustic description considers just physical sound, but a perceptual description factors in the question of how the ear and brain process sound. The difference between 100 Hz and 125 Hz is acoustically the same as that between 5,100 Hz and 5,125 Hz. The two sets are perceptually very different, the former being perceived as“more separate”and the latter as virtually indistinguishable.

The phonetic properties which are the basis of phonological systems are general and somewhat abstract, such as voicing or rounding, and are largely the categories which we have informally been using already: they are not the same, as we will see. The hypothesis of distinctive feature theory is that there is a small set, around two dozen, of phonetically based properties which phonological analysis uses. These properties, the distinctive features, not only define the possible phonemes of human languages, but also define phonological rules.

The classical statement of features derives from Chomsky and Halle(1968).We will use an adapted set of these features, which takes into consideration refinements. Each feature can have one of two values, plus and minus, so for each speech sound, the segment either has the property(is[+Fi])or lacks the property(is[-Fi]).In this section, we follow Chomsky and Halle(1968)and present the generally accepted articulatory correlates of the features, that is, what aspects of production the feature relates to. There are also acoustic and perceptual correlates of features, pertaining to what the segment sounds like, which are discussed by Jakobson, Fant, and Halle(1952)using a somewhat different system of features.

3.2.1 Phonetic preliminaries

By way of phonetic background to understanding certain features, two phonetic points need to be clarified. First, some features are characterized in terms of the“neutral position”,which is a configuration that the vocal tract is assumed to have immediately prior to speaking. The neutral position, approximately that of the vowel[ε],defines relative movement of the tongue.

Second, you need to know a bit about how the vocal folds vibrate, since some feature definitions relate to the effect on vocal fold vibration(important because it provides most of the sound energy of speech).The vocal folds vibrate when there is enough air pressure below the glottis(the opening between the vocal folds)to force the vocal folds apart. This opening reduces subglottal pressure, which allows the folds to close, and this allows air pressure to rebuild to the critical level where the vocal folds are blown apart again. The critical factor that causes the folds to open is that the pressure below the vocal folds is higher than the pressure above.

Air flows from the lungs at a roughly constant rate. Whether there is enough drop in pressure for air to force the vocal folds open is thus determined by the positioning and tension of the vocal folds(how hard it is to force them apart), and the pressure above the glottis. The pressure above the glottis depends on how effectively pressure buildup can be relieved, and this is determined by the degree of constriction in the vocal tract. In short, the configuration of the vocal folds, and the degree and location of constriction above the glottis almost exclusively determine whether there will be voicing.

If the pressure above and below the glottis is nearly equal, air stops flowing and voicing is blocked. So if the vocal tract is completely obstructed(as for the production of a voiceless stop like[k]),air flowing through the glottis rapidly equalizes the pressure below and above the glottis, which stops voicing. On the other hand, if the obstruction in the vocal tract is negligible(as it is in the vowel[a]),the pressure differential needed for voicing is easily maintained, since air passing through the glottis is quickly vented from the vocal tract.

A voiced stop such as[g]is possible, even though it involves a total obstruction of the vocal tract analogous to that found in[k],because it takes time for pressure to build up in the oral cavity to the point that voicing ceases. Production of[g]involves ancillary actions to maintain voicing. The pharynx may be widened, which gives the air more room to escape, delaying the buildup of pressure. The larynx may be lowered, which also increases the volume of the oral cavity; the closure for the stop may be weakened slightly, allowing tiny amounts of air to flow through; the velum may be raised somewhat to increase the size of the air cavity, or it may be lowered somewhat to allow small(usually imperceptible)amounts of air to pass through the nose. The duration of the consonant can be reduced—generally, voiced stops are phonetically shorter than corresponding voiceless stops.

Certain sounds such as vowels lack a radical constriction in the vocal tract, so it is quite easy to maintain voicing during such sounds, whereas with other sounds, specifically obstruents, voicing is difficult to maintain. Some accounts of this distinction, especially that of Chomsky and Halle(1968),refer to“spontaneous voicing”,which is grounded on the assumption that voicing occurs automatically simply by positioning the vocal folds in what we might call the“default”position. For sounds that involve a significant obstruction of the vocal tract, special actions are required for voicing. The features[sonorant]and[consonantal]directly relate to the obstruction in the vocal tract, which determines whether the vocal folds vibrate spontaneously.

3.2.2 Major class features

One of the most intuitive distinctions which feature theory needs to capture is that between consonants and vowels. There are three features, the so-called major class features, which provide a rough first grouping of sounds into functional types that includes the consonant/vowel distinction.

syllabic(syl): forms a syllable peak(and thus can be stressed).

sonorant(son):sounds produced with a vocal tract configuration in which spontaneous voicing is possible.

consonantal(cons): sounds produced with a major obstruction in the oral cavity.

The feature [syllabic] is, unfortunately, simultaneously one of the most important features and one of the hardest to define physically. It corresponds intuitively to the notion "consonant" (where [h], [j], [m], [s], [t] are "consonants") versus "vowel" (such as [a], [i]): indeed the only difference between the vowels [i, u] and the corresponding glides [j, w] is that [i, u] are [+syllabic] and [j, w] are [-syllabic]. The feature [syllabic] goes beyond the intuitive vowel/consonant split. English has syllabic sonorants, such as [r̩], [l̩], [n̩]. The main distinction between the English words (American English pronunciation) ear [ɪr] and your [jr̩] resides in which segments are [+syllabic] versus [-syllabic]. In ear, the vowel [ɪ] is [+syllabic] and [r] is [-syllabic], whereas in your, [j] is [-syllabic] and [r̩] is [+syllabic]. The words eel [il] and the reduced form of you'll [jl̩] for many speakers of American English similarly differ in that [i] is the peak of the syllable (is [+syllabic]) in eel, but [l̩] is the syllable peak in you'll.

Other languages have syllabic sonorants which phonemically contrast with nonsyllabic sonorants, such as Serbo-Croatian, which contrasts syllabic [r̩] with nonsyllabic [r] (cf. groze 'fear (gen)' versus groce 'little throat'). Swahili distinguishes [mbuni] 'ostrich' and [m̩buni] 'coffee plant' by the fact that [m̩buni] is a three-syllable word, in which [m̩] is the peak (the only segment) of the first syllable, whereas [mbuni] is a two-syllable word, whose first syllable peak is [u]. Although such segments may be thought of as "consonants" in one intuitive sense of the concept, they have the feature value [+syllabic]. This is a reminder that there is a difference between popular concepts about language and technical terms. "Consonant" is not strictly speaking a technical concept of phonological theory, even though it is a term quite frequently used by phonologists—almost always with the meaning "nonpeak" in the syllable, i.e. a [-syllabic] segment.

The definition of[sonorant]could be changed so that glottal configuration is also included, then the laryngeals would be[-sonorant].There is little compelling evidence to show whether this would be correct; later, we discuss how to go about finding such evidence for revising feature definitions.

The feature[sonorant]captures the distinction between segments such as vowels and liquids where the constriction in the vocal tract is small enough that no special effort is required to maintain voicing, as opposed to sounds such as stops and fricatives which have enough constriction that effort is needed to maintain voicing. In an oral stop, air cannot flow through the vocal tract at all, so oral stops are[-sonorant].In a fricative, even though there is some airflow, there is so much constriction that pressure builds up, with the result that spontaneous voicing is not possible, thus fricatives are[-sonorant].In a vowel or glide, the vocal tract is only minimally constricted so air can flow without impedance: vowels and glides are therefore[+sonorant].A nasal consonant like[n]has a complete obstruction of airflow through the oral cavity, but nevertheless the nasal passages are open which allows free flow of air. Air pressure does not build up during the production of nasals, so nasals are[+sonorant].In the liquid[l],there is a complete obstruction formed by the tip of the tongue with the alveolar ridge, but nevertheless air flows freely over the sides of the tongue so[l]is[+sonorant].

The question whether r is [+sonorant] or [-sonorant] has no simple answer, since many phonetically different segments are transcribed as r; some are [-sonorant] and some are [+sonorant], depending on their phonetic properties. The so-called fricative r of Czech (spelled ř) has a considerable constriction, so it is [-sonorant], but the English type [ɹ] is a sonorant since there is very little constriction. In other languages there may be more constriction, but it is so brief that it does not allow significant buildup of air pressure (this would be the case with "tapped" r's). Even though spontaneous voicing is impossible for the laryngeal consonants [h, ʔ], because they are formed by positioning the vocal folds so that voicing is precluded, they are [+sonorant] since they have no constriction above the glottis, which is the essential property defining [+sonorant].

The feature [consonantal] is very similar to the feature [sonorant], but specifically addresses the question of whether there is any major constriction in the oral cavity. This feature groups together obstruents, liquids, and nasals, which are [+consonantal], versus vowels, glides, and laryngeals ([h, ʔ]), which are [-consonantal]. Vowels and glides have a minor obstruction in the vocal tract, compared to that formed by a fricative or a stop. Glottal stop is formed with an obstruction at the glottis, but none in the vocal tract, hence it is [-consonantal]. In nasals and liquids, there is an obstruction in the oral cavity, even though the overall constriction of the whole vocal tract is not high enough to prevent spontaneous voicing. Recent research indicates that this feature may not be necessary, since its function is usually covered as well or better by other features.

The most important phonological use of features is that they identify classes of segments in rules. All speech sounds can be analyzed in terms of their values for the set of distinctive features, and the set of segments that have a particular value for some feature (or set of feature values) is a natural class. Thus the segments [a i r̩ m̩] are members of the [+syllabic] class, and [j h ʔ r m s p] are members of the [-syllabic] class; [a r̩ j ʔ r m̩] are in the [+sonorant] class and [s z p b] are in the [-sonorant] class; [a i w h ʔ] are in the [-consonantal] class and [r̩ r m̩ m s p] are in the [+consonantal] class. Natural classes can be defined in terms of conjunctions of features, such as [+consonantal, -syllabic], which refers to the set of segments which are simultaneously [+consonantal] and [-syllabic].

When referring to segments defined by a combination of features, the features are written in a single set of brackets—[+cons, -syl]refers to a single segment which is both +consonantal and -syllabic, while[+cons][-syl]refers to a sequence of segments, the first being+consonantal and the second being -syllabic.
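As a concrete illustration, the following Python sketch (an editorial illustration; the inventory and values follow the definitions above but are abbreviated, with syllabic sonorants omitted) selects natural classes by conjunctions of the three major class features:

    # Major class features for a toy inventory.
    MAJOR = {
        "a": {"syl": "+", "son": "+", "cons": "-"},
        "i": {"syl": "+", "son": "+", "cons": "-"},
        "j": {"syl": "-", "son": "+", "cons": "-"},
        "h": {"syl": "-", "son": "+", "cons": "-"},
        "l": {"syl": "-", "son": "+", "cons": "+"},
        "m": {"syl": "-", "son": "+", "cons": "+"},
        "s": {"syl": "-", "son": "-", "cons": "+"},
        "p": {"syl": "-", "son": "-", "cons": "+"},
    }

    def natural_class(**spec):
        """Segments matching every value in spec, e.g. cons='+', syl='-'."""
        return [s for s, fs in MAJOR.items()
                if all(fs[f] == v for f, v in spec.items())]

    print(natural_class(cons="+", syl="-"))  # ['l', 'm', 's', 'p']
    print(natural_class(syl="-", son="+"))   # ['j', 'h', 'l', 'm']

Omitting a feature from the specification enlarges the class, exactly as the prose below describes.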

Accordingly, the three major class features combine to define five maximally differentiated classes, exemplified by the following segment groups.

Further classes are definable by omitting specifications of one or more of these features: for example, the class [-syllabic, +sonorant] includes {j, w, h, ʔ, r, l, m}.

One thing to note is that all [+syllabic] segments, i.e. all syllable peaks, are also [+sonorant]. It is unclear whether there are syllabic obstruents, i.e. [s̩], [k̩]. It has been claimed that such things exist in certain dialects of Berber, but their interpretation remains controversial, since the principles for detection of syllables are controversial. Another gap is the combination [-sonorant, -consonantal], which would be a physical impossibility. A [-sonorant] segment would require a major obstruction in the vocal tract, but the specification [-consonantal] entails that the obstruction could not be in the oral cavity. The only other possibility would be constriction of the nasal passages, and nostrils are not sufficiently constrictable.

3.2.3 Place of articulation

Features to define place of articulation are our next functional set. We begin with the features typically used by vowels, specifically the[+syllabic, -consonantal, +sonorant]segments, and then proceed to consonant features, ending with a discussion of the intersection of these features.

Vowel place features. The features which define place of articulation for vowels are the following.

high: the body of the tongue is raised from the neutral position.

low: the body of the tongue is lowered from the neutral position.

back: the body of the tongue is retracted from the neutral position.

round: the lips are protruded.

tense: sounds requiring deliberate, accurate, maximally distinct gestures that involve considerable muscular effort.

advanced tongue root: produced by drawing the root of the tongue forward.

The main features are [high], [low], [back], and [round]. Phonologists primarily distinguish just front and back vowels, governed by [back]: front vowels are [-back], since they do not involve retraction of the tongue body, and back vowels are [+back]. Phonetic central vowels are usually treated as phonological back vowels, since typically central vowels are unrounded and back vowels are rounded. Distinctions such as those between [ɨ] and [ɯ], [ɜ] and [ʌ], [y] and [ʉ], [ʚ] and [œ], or [a] and [ɑ] are usually considered to be phonologically unimportant over-differentiations of language-specific phonetic values of phonologically back unrounded vowels. The phonologically relevant question about a vowel pronounced as [ʉ] is not whether the tongue position is intermediate between that of [i] and [u], but whether it patterns with {i, e, y, ø} or with {u, ɯ, o, ʌ}—or does it pattern apart from either set? In lieu of clear examples of a contrast between central and back rounded vowels, or central and back unrounded vowels, we will not at the moment postulate any other feature for the front-back dimension, though section 3.6 considers possible evidence for the phonological relevance of the concept "central vowel". Given the phonologically questionable status of distinctive central vowels, no significance should be attributed to the use of the symbol [ɨ] versus [ɯ], and typographic convenience may determine that a [+back, -round] high vowel is typically transcribed as [ɨ].

Two main features are employed to represent vowel height. High vowels are[+high]and[-low],low vowels are[+low]and[-high].No vowel can be simultaneously[+high]and[+low]since the tongue cannot be raised and lowered simultaneously; mid vowels are[-high, -low].In addition, any vowel can be produced with lip rounding, using the feature[round].These features allow us to characterize the following vowel contrasts.

Note that[ɑ]is a back low unrounded vowel, in contrast to the symbol[ɒ]for a back low rounded vowel.
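Since the table of vowel specifications itself is not reproduced here, the following Python sketch (an editorial illustration; these are standard textbook values for a few common vowels, and the set is illustrative rather than exhaustive) shows how the four main features classify vowels:

    # [high], [low], [back], [round] values for some common vowels.
    VOWELS = {          # high  low  back round
        "i": ("+", "-", "-", "-"),
        "u": ("+", "-", "+", "+"),
        "e": ("-", "-", "-", "-"),
        "o": ("-", "-", "+", "+"),
        "æ": ("-", "+", "-", "-"),
        "ɑ": ("-", "+", "+", "-"),   # back low unrounded
        "ɒ": ("-", "+", "+", "+"),   # back low rounded
    }
    for v, (hi, lo, bk, rd) in VOWELS.items():
        print(f"[{v}] = [{hi}high, {lo}low, {bk}back, {rd}round]")

Note that no entry is simultaneously [+high] and [+low], reflecting the physical impossibility mentioned above.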

Vowels with a laxer,“less deliberate, ” and lower articulation, such as[ɪ]in English sit or[ε]in English set, would be specified as[-tense].

Korean has a set of so-called“tense”consonants but these are phonetically“glottal”consonants.

One question which has not been resolved is the status of low vowels in terms of this feature. Unlike high and mid vowels, there do not seem to be analogous contrasts in low vowels between tense and lax[æ].Another important point about this feature is that while[back],[round],[high],and[low]will also play a role in defining consonants, [tense]plays no role in consonantal contrasts.

The difference between i and ɪ, or e and ε, has also been considered to be one of vowel height (proposed in alternative models where vowel height is governed by a single scalar vowel height feature, rather than by the binary features [high] and [low]). This vowel contrast has also been described in terms of the feature "Advanced Tongue Root" (ATR), especially in the vowel systems of languages of Africa and Siberia. There has been debate over the phonetic difference between [ATR] and [tense]. Typically, [+tense] front vowels are fronter than their lax counterparts, and [+tense] back vowels are backer than their lax counterparts. In comparison, [+ATR] vowels are supposed to be generally fronter than corresponding [-ATR] vowels, so that [+ATR] back vowels are phonetically fronter than their [-ATR] counterparts. However, some articulatory studies have shown that the physical basis for the tense/lax distinction in English is no different from that which ATR is based on. Unfortunately, the clearest examples of the feature [ATR] are found in languages of Africa, where very little phonetic research has been done. Since no language contrasts both [ATR] and [tense] vowels, it is usually supposed that there is a single feature, whose precise phonetic realization varies somewhat from language to language.

Consonant place features. The main features used for defining consonantal place of articulation are the following.

coronal: produced with the blade or tip of the tongue raised from the neutral position.

anterior: produced with a major constriction located at or in front of the alveolar ridge.

strident: produced with greater noisiness.

distributed: produced with a constriction that extends for a considerable distance along the direction of airflow.

Place of articulation in consonants is primarily described with the features[coronal]and[anterior].Labials, labiodentals, dentals, and alveolars are[+anterior]since their primary constriction is at or in front of the alveolar ridge(either at the lips, the teeth, or just back of the teeth)whereas other consonants(including laryngeals)are[-anterior],since they lack this front constriction. The best way to understand this feature is to remember that it is the defining difference between[s]and[ʃ],where[s]is[+anterior]and[ʃ]is[-anterior].Anything produced where[s]is produced, or in front of that position, is[+anterior];anything produced where[ʃ]is, or behind[ʃ],is[-anterior].

Remember that the two IPA letters <tʃ> represent a single[-anterior]segment, not a combination of[+anterior][t]and[-anterior][ʃ].

Consonants which involve the blade or tip of the tongue are [+coronal], and this covers the dentals, alveolars, alveopalatals, and retroflex consonants. Consonants at other places of articulation (labial, velar, uvular, and laryngeal) are [-coronal]. Note that this feature does not encompass the body (back) of the tongue, so while velars and uvulars use the tongue, they use the body of the tongue rather than the blade or tip, and therefore are [-coronal]. The division of consonants into classes as defined by [coronal] is illustrated below.

Two other features are important in characterizing the traditional places of articulation. The feature [distributed] is used in coronal sounds to distinguish dental [t̪] from English alveolar [t], or alveopalatal [ʃ] from retroflex [ʂ]: the segments [t̪, ʃ] are [+distributed] and [t, ʈ, ʂ] are [-distributed]. The feature [distributed], as applied to coronal consonants, approximately corresponds to the traditional phonetic notion "apical" ([-distributed]) versus "laminal" ([+distributed]). This feature is not relevant for velar and labial sounds, and we will not specify any value of [distributed] for noncoronal segments.

The feature [strident] distinguishes strident [f, s] from nonstrident [φ, θ]: otherwise, the consonants [f, φ] would have the same feature specifications. Note that the feature [strident] is defined in terms of the aerodynamic property of greater turbulence (which has the acoustic correlate of greater noise), not in terms of the movement of a particular articulator—this defining characteristic is accomplished by different articulatory configurations. In terms of contrastive usage, the feature [strident] only serves to distinguish bilabials and labiodentals, or interdentals and alveolars. A sound is [+strident] only if it has greater noisiness, and "greater" implies a comparison. In the case of [φ] vs. [f], [β] vs. [v], [θ] vs. [s], or [ð] vs. [z], the second sound in the pair is noisier. No specific degree of noisiness has been proposed which would allow you to determine in isolation whether a given sound meets the definition of strident or not. Thus it is impossible to determine whether [ʃ] is [+strident], since there is no contrast between strident and nonstrident alveopalatal sounds. The phoneme [ʃ] is certainly relatively noisy—noisier than [θ]—but then [θ] is noisier than [φ] is.

[Strident] is not strictly necessary for making a distinction between [s] and [θ], since [distributed] also distinguishes these phonemes. Since [strident] is therefore only crucial for distinguishing bilabial and labiodental fricatives, it seems questionable to postulate a feature with such broad implications solely to account for the contrast between labiodental and bilabial fricatives. Nonetheless, we need a way of representing this contrast. The main problem is that there are very few languages (such as Ewe, Venda, and Shona) which have both [f] and [φ], or [v] and [β], and the phonological rules of these languages do not give us evidence as to how this distinction should be made in terms of features. We will therefore only invoke the feature [strident] in connection with the [φ, β] vs. [f, v] contrast.

Using these three features, consonantal places of articulation can be partially distinguished as follows.

Vowel features on consonants. The features[high],[low],[back],and[round]are not reserved exclusively for vowels, and these typical vowel features can play a role in defining consonants as well. As we see in(10),velar, uvular, pharyngeal, and glottal places of articulation are not yet distinguished; this is where the features[high],[low],and[back]become important. Velar, uvular, and pharyngeal consonants are[+back]since they are produced with a retracted tongue body. The difference between velar and uvular consonants is that with velar consonants the tongue body is raised, whereas with uvular consonants it is not, and thus velars are[+high]where uvulars are[-high].Pharyngeal consonants are distinguished from uvulars in that pharyngeals are[+low]and uvulars are[-low],indicating that the constriction for pharyngeals is even lower than that for uvulars.

One traditional phonetic place of articulation for consonants is that of "palatal" consonants. The term "palatal" is used in many ways; for example, the postalveolar or alveopalatal (palatoalveolar) consonants [ʃ] and [tʃ] might be referred to as palatals. This is strictly speaking a misnomer, and the term "palatal" is best used only for the "true palatals," transcribed as [c ç ɟ]. Such consonants are found in Hungarian, and also in German in words like [iç] 'I' or in Norwegian [çø:per] 'buys'. These consonants are produced with the body of the tongue raised and fronted, and therefore they have the feature values [+high, -back]. The classical feature system presented here provides no way to distinguish such palatals from palatalized velars ([kʲ]) either phonetically or phonologically. Palatalized (fronted) velars exist as allophonic variants of velars before front vowels in English, e.g. [kʲip] 'keep'; they are articulatorily and acoustically extremely similar to the palatals of Hungarian. Very little phonological evidence is available regarding the treatment of "palatals" versus "palatalized velars": it is quite possible that [c] and [kʲ], or [ç] and [xʲ], are simply different symbols, chosen on the basis of phonological patterning rather than systematic phonetic differences.

With the addition of these features, the traditional places of articulation for consonants can now be fully distinguished.

The typical vowel features have an additional function as applied to consonants, namely that they define secondary articulations such as palatalization and rounding. Palatalization involves superimposing the raised and fronted tongue position of the glide[j]onto the canonical articulation of a consonant, thus the features[+high, -back]are added to the primary features that characterize a consonant(those being the features that typify[i, j]).So, for example, the essential feature characteristics of a bilabial are[+anterior, -coronal]and they are only incidentally[-high, -back].A palatalized bilabial would be[+anterior, -coronal, +high, -back].Velarized consonants have the features[+high, +back]analogous to the features of velar consonants; pharyngealized consonants have the features[+back, +low].Consonants may also bear the feature[round].Applying various possible secondary articulations to labial consonants results in the following specifications.

Labialized (pʷ), palatalized (pʲ), velarized (pˠ), and pharyngealized (pˤ) variants are the most common categories of secondary articulation. Uvularized consonants, i.e. pʶ, are rare: uvularized clicks are attested in Juǀ'hoansi. It is unknown if there is a contrast between rounded consonants differing in secondary height, symbolized above as pʷ vs. pᵒ or pᶣ vs. pø. Feature theory allows such a contrast, so eventually we ought to find examples. If, as seems likely after some decades of research, such contrasts do not exist where predicted, there should be a revision of the theory, so that the predictions of the theory better match observations.
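The account of secondary articulation as feature superimposition can be stated directly. Here is a minimal Python sketch (an editorial illustration; the base bundle for [p] is abbreviated to the features at issue) in which each secondary articulation simply adds its vowel features to the consonant's own:

    # Secondary articulations as feature addition (per the text above).
    SECONDARY = {
        "labialized":     {"round": "+"},                # pʷ
        "palatalized":    {"high": "+", "back": "-"},    # pʲ
        "velarized":      {"high": "+", "back": "+"},    # pˠ
        "pharyngealized": {"low": "+", "back": "+"},     # pˤ
    }

    def add_secondary(base, kind):
        """Return a new feature bundle with the secondary articulation added."""
        return {**base, **SECONDARY[kind]}

    p = {"anterior": "+", "coronal": "-", "high": "-", "low": "-", "back": "-"}
    print(add_secondary(p, "palatalized"))
    # {'anterior': '+', 'coronal': '-', 'high': '+', 'low': '-', 'back': '-'}

This mirrors the statement above that a palatalized bilabial is [+anterior, -coronal, +high, -back], and it makes the impossibility of a palatalized uvular, discussed next, a simple feature clash.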

This treatment of secondary articulations makes other predictions. One is that there cannot be palatalized uvulars or pharyngeals. This follows from the fact that the features for palatalization([+high, -back])conflict with the features for uvulars([-high, +back])and pharyngeals([-high, +back, +low]).Since such segments do not appear to exist, this supports the theory: otherwise we expect—in lieu of a principle that prohibits them—that they will be found in some language. Second, in this theory a“pure”palatal consonant(such as Hungarian[ɟ])is equivalent to a palatalized(i.e. fronted)velar. Again, since no language makes a contrast between a palatal and a palatalized velar, this is a good prediction of the theory(unless such a contrast is uncovered, in which case it becomes a bad prediction of the theory).

3.2.4 Manner of articulation

Other features relate to the manner in which a segment is produced, apart from the location of the segment's constriction. The manner features are:

continuant(cont): the primary constriction is not narrowed so much that airflow through the oral cavity is blocked.

delayed release(del.rel): release of a total constriction is slowed so that a fricative is formed after the stop portion.

nasal(nas): the velum is lowered which allows air to escape through the nose.

lateral(lat): the mid section of the tongue is lowered at the side.

The feature[continuant]groups together vowels, glides, fricatives, and[h]as[+continuant].Note that[continuant]is a broader group than the traditional notion“fricative”which refers to segments such as[s],[ʃ],or[θ].

The term“fricative”generally refers to nonsonorant continuants, i.e. the class defined by the conjunction of features[+continuant, -sonorant].Since continuants are defined as sounds where air can flow continuously through the oral cavity, nasals like[m n ŋ]are[-continuant],even though they allow continuous airflow(through the nose).

Affricates such as [tʃ, pf] are characterized with the feature [+delayed release]. Necessarily, all affricates are [-continuant], since they involve complete constriction followed by a period of partial fricative-like constriction, and therefore they behave essentially as a kind of stop. This feature is in question, since [pf tʃ kx] do not act as a unified phonological class; nevertheless, some feature is needed to characterize stops versus affricates. Various alternatives have been proposed, for example that [kx] might just be the pronunciation of aspirated [kʰ], since velar [kx] and [kʰ] never seem to contrast; perhaps the feature [strident] defines [ts] vs. [t]. The proper representation of affricates is a currently unresolved issue in phonology.

The feature[+nasal]is assigned to sounds where air flows through the nasal passages, for example[n]as well as nasalized vowels like[ã].Liquids and fricatives can be nasalized as well, but the latter especially are quite rare. L-like sounds are characterized with the feature[lateral].Almost all[+lateral]sounds are coronal, though there are a few reports of velar laterals. Detailed information on the phonetics and phonology of these segments is not available.

Examples of the major manners of articulation are illustrated below, for coronal place of articulation.

3.2.5 Laryngeal features

Three features characterize the state of the glottis:

spread glottis(s.g.): the vocal folds are spread far apart.

constricted glottis(c.g.): the vocal folds are tightly constricted.

voice(voi):the vocal folds vibrate.

Voiced sounds are [+voice]. The feature [spread glottis] describes aspirated obstruents ([pʰ], [bʱ]) and breathy sonorants ([m̤], [a̤]); [constricted glottis] describes implosives ([ɓ]), ejective obstruents ([p']), and laryngealized sonorants ([m̰], [a̰]).

How to distinguish implosives from ejectives is not entirely obvious, but the standard answer is that ejectives are [-voice] and implosives are [+voice]. There are two problems with this. One is that implosives do not generally pattern with other [+voice] consonants in phonological systems, especially in how consonants affect tone (voiced consonants, but typically not implosives, may lower following tones). The second is that Ngiti and Lendu have both voiced and voiceless implosives. The languages lack ejectives, which raises the possibility that voiceless implosives are phonologically [-voice, +c.g.], which is exactly the specification given to ejective consonants. You may wonder how [-voice, +c.g.] can be realized as an ejective in languages like Navajo, Tigre, or Lushootseed, and as a voiceless implosive in Ngiti or Lendu. This is possible because feature values give approximate phonetic descriptions, not exact ones. The Korean "fortis" consonants, found in [k'ata] 'peel (noun)', [ak'i] 'musical instrument', or [alt'a] 'be ill', are often described as glottalized, and phonetic studies have shown that they are produced with glottal constrictions: thus they would be described as [-voice, +c.g.]. Nevertheless, they are not ejectives. Similarly, Khoekhoe (Nama) has a contrast between plain clicks (as in the word for 'deep') and glottalized ones (as in the word for 'kill'), but the glottalized clicks realize the feature [+c.g.] as a simple constriction of the glottis, not involving an ejective release.

The usual explanation for the difference between ejectives in Navajo and glottalized nonejective consonants in Korean or Khoekhoe is that they have the same phonological specifications, [-voice, +c.g.],but realize the features differently due to language-specific differences in principles of phonetic implementation. This is an area of feature theory where more research is required.

The representations of laryngeal contrasts in consonants are given below.

3.2.6 Prosodic features

Finally, in order to account for the existence of length distinctions, and to represent stressed versus unstressed vowels, two other features were proposed:

long: has greater duration.

stress: has greater emphasis, higher amplitude and pitch, longer duration.

These are obvious: long segments are[+long]and stressed vowels are[+stress].

A major lacuna in the Chomsky and Halle(1968)account of features is a lack of features for tone. This is remedied in chapter 9 when we introduce nonlinear representations. For the moment, we can at least assume that tones are governed by a binary feature[±high tone]—this allows only two levels of tone, but we will not be concerned with languages having more than two tone levels until chapter 9.

Alan Prince & Paul Smolensky (2004) Optimality Theory: Constraint Interaction in Generative Grammar, Selected Reading(13)

◆ About the Authors

Alan Prince (1946– ) did his undergraduate studies at McGill University in Canada and received his PhD from MIT in 1975. He has taught at Brandeis University, the University of Massachusetts Amherst, and Rutgers University–New Brunswick. He is one of the two principal founders of Optimality Theory.

Paul Smolensky (1955– ) earned a bachelor's degree in physics at Harvard University and a master's degree in physics at Indiana University, ultimately receiving a PhD in mathematical physics. He has taught at the University of Colorado Boulder and Johns Hopkins University. He is one of the two principal founders of Optimality Theory.

◆ Selected Text

Chapter 1 Preliminaries

1.1 Background and Overview

As originally conceived, the RULE of grammar was to be built from a Structural Description delimiting a class of inputs and a Structural Change specifying the operations that altered the input (e.g. Chomsky 1961). The central thrust of linguistic investigation would therefore be to explicate the system of predicates used to analyze inputs—the possible Structural Descriptions of rules—and to define the operations available for transforming inputs—the possible Structural Changes of rules. This conception has been jolted repeatedly by the discovery that the significant regularities were to be found not in input configurations, nor in the formal details of structure-deforming operations, but rather in the character of the output structures, which ought by rights to be nothing more than epiphenomenal. We can trace a path by which “conditions” on well-formedness start out as peripheral annotations guiding the interpretation of re-write rules, and, metamorphosing by stages into constraints on output structure, end up as the central object of linguistic study.

As the theory of representations in syntax ramified, the theory of operations dwindled in content, even to triviality and, for some, nonexistence. The parallel development in phonology and morphology has been underway for a number of years, but the outcome is perhaps less clear—both in the sense that one view has failed to predominate, and in the sense that much work is itself imperfectly articulate on crucial points. What is clear is that any serious theory of phonology must rely heavily on well-formedness constraints; where by ‘serious’ we mean ‘committed to Universal Grammar’. What remains in dispute, or in subformal obscurity, is the character of the interaction among the posited well-formedness constraints, and, equally, the relation between such constraints and whatever derivational rules they are meant to influence. Given the pervasiveness of this unclarity, and the extent to which it impedes understanding even the most basic functioning of the grammar, it is not excessively dramatic to speak of the issues surrounding the role of well-formedness constraints as involving a kind of conceptual crisis at the center of phonological thought.

Our goal is to develop and explore a theory of the way that representational well-formedness determines the assignment of grammatical structure. We aim therefore to ratify and to extend the results of modern research on the role of constraints in phonological grammar. This body of work is so large and various as to defy concise citation, but we would like to point to such important pieces as Kisseberth 1972, Haiman 1972, Pyle 1972, Hale 1973, Sommerstein 1974, where the basic issues are recognized and addressed; to Wheeler 1981, 1988, Bach and Wheeler 1981, Broselow 1982, Dressler 1985, Singh 1987, Paradis 1988ab, Paradis & Prunet 1991, Noske 1982, Hulst 1984, Kaye & Lowenstamm 1984, Kaye, Lowenstamm, & Vergnaud 1985, Calabrese 1988, Myers 1991, Goldsmith 1991, 1993, Bird 1990, Coleman 1991, Scobbie 1991, which all represent important strands in recent work; as well as to Vennemann 1972, Hooper [Bybee] 1972, 1985, Liberman 1975, Goldsmith 1976, Liberman & Prince 1977, McCarthy 1979, McCarthy & Prince 1986, Selkirk 1980ab, 1981, Kiparsky 1981, 1982, Kaye & Lowenstamm 1981, McCarthy 1981, 1986, Lapointe & Feinstein 1982, Cairns & Feinstein 1982, Steriade 1982, Prince 1983, 1990, Kager & Visch 1984ab, Hayes 1984, Hyman 1985, Wurzel 1985, Borowsky 1986ab, Itô 1986, 1989, Mester 1986, 1992, Halle & Vergnaud 1987, Lakoff 1988, 1993, Yip 1988, Cairns 1988, Kager 1989, Visch 1989, Clements 1990, Legendre, Miyata, & Smolensky 1990bc, Mohanan 1991, 1993, Archangeli & Pulleyblank 1992, Burzio 1992ab, Itô, Kitagawa, & Mester 1992, Itô & Mester 1992—a sample of work which offers an array of perspectives on the kinds of problems we will be concerned with—some close to, others more distant from our own, and some contributory of fundamental representational notions that will put in appearances throughout this work (for which, see the local references in the text below). Illuminating discussion of fundamental issues and an interesting interpretation of the historical development is found in Goldsmith 1990; Scobbie 1992 reviews further relevant background.

The work of Stampe 1973/79, though framed in a very different way, shares central abstract commitments with our own, particularly in its then-radical conception of substantive universality, which we will assume in a form that makes sense within our proposals. Perhaps more distantly related are chapter 9 of Chomsky & Halle 1968 and Kean 1974. The works of Wertheimer 1923, Lerdahl & Jackendoff 1983 (chs. 3 and 12), Jackendoff 1983 (chs. 7 and 8), 1987, 1991, though not concerned with phonology at all, provide significant conceptual antecedents in their focus on the role of preference; similarly, the proposals of Chomsky 1986, and especially 1989, 1992, though very different in implementation, have fundamental similarities with our own. Perlmutter 1971, Rizzi 1990, Bittner 1993, Legendre, Raymond, & Smolensky 1993, and Grimshaw 1993 are among works in syntax and semantics that resonate with our particular concerns.

The basic idea we will explore is that Universal Grammar (UG) consists largely of a set of constraints on representational well-formedness, out of which individual grammars are constructed. The representational system we employ, using ideas introduced into generative phonology in the 1970s and 1980s, will be rich enough to support two fundamental classes of constraints: those that assess output configurations per se and those responsible for maintaining the faithful preservation of underlying structures in the output. Departing from the usual view, we do not assume that the constraints in a grammar are mutually consistent, each true of the observable surface or of some level of representation or of the relation between levels of representation. On the contrary: we assert that the constraints operating in a particular language are highly conflicting and make sharply contrary claims about the well-formedness of most representations. The grammar consists of the constraints together with a general means of resolving their conflicts. We argue further that this conception is an essential prerequisite for a substantive theory of UG.

It follows that many of the conditions which define a particular grammar are, of necessity, frequently violated in the actual forms of the language. The licit analyses are those which satisfy the conflicting constraint set as well as possible; they constitute the optimal analyses of underlying forms. This, then, is a theory of optimality with respect to a grammatical system rather than of well-formedness with respect to isolated individual constraints.

The heart of the proposal is a means for precisely determining which analysis of an input best satisfies—or least violates—a set of conflicting conditions. For most inputs, it will be the case that every possible analysis violates many constraints. The grammar rates all these analyses according to how well they satisfy the whole constraint set and declares any analysis at the top of this list to be optimal. Such an analysis is assigned by the grammar as output to that input. The grammatically well-formed structures are exactly those that are optimal in this sense.

How does a grammar determine which analysis of a given input best satisfies a set of inconsistent well-formedness conditions? Optimality Theory relies on a conceptually simple but surprisingly rich notion of constraint interaction whereby the satisfaction of one constraint can be designated to take absolute priority over the satisfaction of another. The means that a grammar uses to resolve conflicts is to rank constraints in a strict domination hierarchy. Each constraint has absolute priority over all the constraints lower in the hierarchy.
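To make strict domination concrete, here is a minimal sketch (our illustration, not the authors' formalism): if each candidate is scored as a tuple of violation counts ordered from the highest-ranked constraint downward, ordinary lexicographic comparison of tuples implements absolute priority, because a lower-ranked constraint is consulted only to break ties on everything ranked above it.

```python
from typing import Callable

# One constraint = a function from a candidate to its number of violations.
Constraint = Callable[[str], int]

def violation_profile(cand: str, ranking: list[Constraint]) -> tuple[int, ...]:
    """Violation counts ordered from highest-ranked constraint to lowest."""
    return tuple(c(cand) for c in ranking)

def optimal(candidates: list[str], ranking: list[Constraint]) -> str:
    # Tuple comparison is lexicographic, so min() enforces strict domination:
    # one violation of a higher-ranked constraint outweighs any number of
    # violations of lower-ranked ones.
    return min(candidates, key=lambda c: violation_profile(c, ranking))
```

Because the comparison is lexicographic, no amount of success on lower-ranked constraints can compensate for a loss on a higher-ranked one, which is exactly the absolute priority just described.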

Such prioritizing is in fact found with surprising frequency in the literature, typically as a subsidiary remark in the presentation of complex constraints.(14) We will show that once the notion of constraint-precedence is brought in from the periphery and foregrounded, it reveals itself to be of remarkably wide generality, the formal engine driving many grammatical interactions. It will follow that much that has been attributed to narrowly specific constructional rules or to highly particularized conditions is actually the responsibility of very general well-formedness constraints. In addition, a diversity of effects, previously understood in terms of the triggering or blocking of rules by constraints (or merely by special conditions), will be seen to emerge from constraint interaction.

Although we do not draw on the formal tools of connectionism in constructing Optimality Theory, we will establish a high-level conceptual rapport between the mode of functioning of grammars and that of certain kinds of connectionist networks: what Smolensky (1983, 1986) has called ‘Harmony maximization’, the passage to an output state with the maximal attainable consistency between constraints bearing on a given input, where the level of consistency is determined exactly by a measure derived from statistical physics. The degree to which a possible analysis of an input satisfies a set of conflicting well-formedness constraints will be referred to as the Harmony of that analysis. We thereby respect the absoluteness of the term ‘well-formed’, avoiding terminological confusion and at the same time emphasizing the abstract relation between Optimality Theory and Harmony-theoretic network analysis. In these terms, a grammar is precisely a means of determining which of a pair of structural descriptions is more harmonic. Via pair-wise comparison of alternative analyses, the grammar imposes a harmonic order on the entire set of possible analyses of a given underlying form. The actual output is the most harmonic analysis of all, the optimal one. A structural description is well-formed if and only if the grammar determines it to be an optimal analysis of the corresponding underlying form.

With an improved understanding of constraint interaction, a far more ambitious goal becomes accessible: to build individual grammars directly from universal principles of well-formedness, much as Stampe 1973/79 and Bach 1965 envisioned, in the context of rule theories, building grammars from a universal vocabulary of rules. (This is clearly impossible if we imagine that constraints or rules must be surface- or level-true and hence non-interactive.) The goal is to attain a significant increase in the predictiveness and explanatory force of grammatical theory. The conception we pursue can be stated, in its purest form, as follows: Universal Grammar provides a set of highly general constraints. These often conflicting constraints are all operative in individual languages. Languages differ primarily in the way they resolve the conflicts: in how they rank these universal constraints in strict domination hierarchies that determine the circumstances under which constraints are violated. A language-particular grammar is a means of resolving the conflicts among universal constraints.

On this view, Universal Grammar provides not only the formal mechanisms for constructing particular grammars, but also the very substance that grammars are built from. Although we shall be entirely concerned in this work with phonology and morphology, we note the implications for syntax and semantics.

1.2 Optimality

The standard phonological rule aims to encode grammatical generalizations in this format:

(1) A → B / C—D

The rule scans potential inputs for structures CAD and performs the change on them that is explicitly spelled out in the rule: the unit denoted by A takes on property B. For this format to be worth pursuing, there must be an interesting theory which defines the class of possible predicates CAD (Structural Descriptions) and another theory which defines the class of possible operations A → B (Structural Changes). If these theories are loose and uninformative, as indeed they have proved to be in reality, we must entertain one of two conclusions:

(i) phonology itself simply doesn't have much content, is mostly ‘periphery’ rather than ‘core’, is just a technique for data-compression, with aspirations to depth subverted by the inevitable idiosyncrasies of history and lexicon; or

(ii) the locus of explanatory action is elsewhere.

We suspect the latter.

The explanatory burden can of course be distributed quite differently than in the re-write rule theory. Suppose that the input-output relation is governed by conditions on the well-formedness of the output, ‘markedness constraints’, and by conditions asking for the exact preservation of the input in the output along various dimensions, ‘faithfulness constraints’. In this case, the inputs falling under the influence of a constraint need share no input-specifiable structure (CAD), nor need there be a single determinate transformation (A→B) that affects them. Rather, we generate (or admit) a set of candidate outputs, perhaps by very general conditions indeed, and then we assess the candidates, seeking the one that best satisfies the relevant constraints. Many possibilities are open to contemplation, but some well-defined measure of value excludes all but the best.(15) The process can be schematically represented like this:

(2) Structure of Optimality-Theoretic Grammar
(a) Gen(In_k) → {Out_1, Out_2, ...}
(b) H-eval(Out_i, 1 ≤ i ≤ ∞) → Out_real

The grammar must define a pairing of underlying and surface forms, (input_i, output_j). Each input is associated with a candidate set of possible analyses by the function Gen (short for ‘generator’), a fixed part of Universal Grammar. In the rich representational system employed below, an output form retains its input as a subrepresentation, so that departures from faithfulness may be detected by scrutiny of output forms alone. A ‘candidate’ is an input-output pair, here formally encoded in what is called ‘Out_i’ in (2). Gen contains information about the representational primitives and their universally irrevocable relations: for example, that the node σ may dominate a node Onset or a node μ (implementing some theory of syllable structure), but never vice versa. Gen will also determine such matters as whether every segment must be syllabified—we assume not, below, following McCarthy 1979 et seq.—and whether every node of syllable structure must dominate segmental material—again, we will assume not, following Itô 1986, 1989. The function H-eval evaluates the relative Harmony of the candidates, imposing an order on the entire set. An optimal output is at the top of the harmonic order on the candidate set; by definition, it best satisfies the constraint system. Though Gen has a role to play, the burden of explanation falls principally on the function H-eval, a construction built from well-formedness constraints, and the account of interlinguistic differences is entirely tied to the different ways the constraint-system H-eval can be put together, given UG.
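The division of labor in (2) can be rendered as a toy program, under invented assumptions: the candidate set, the constraints, and the ranking below are all hypothetical, and a real Gen exercises the full representational theory rather than listing strings.

```python
def gen(inp: str) -> list[str]:
    """Toy Gen: hand-listed candidate analyses for one input (hypothetical)."""
    if inp == "pat":
        return ["pat", "pa", "pa.ta"]  # faithful / deletion / epenthesis
    return [inp]

def h_eval(inp: str, candidates: list[str]) -> str:
    """Toy H-eval: pick the most harmonic candidate under a fixed ranking."""
    def no_coda(c):  # markedness: penalize syllables ending in a consonant
        return sum(0 if syl[-1] in "aeiou" else 1 for syl in c.split("."))
    def max_io(c):   # faithfulness: penalize deletion of input segments
        return sum(1 for ch in inp if ch not in c)
    def dep_io(c):   # faithfulness: penalize inserted segments
        return max(0, len(c.replace(".", "")) - len(inp))
    ranking = [max_io, dep_io, no_coda]  # assumed ranking: MAX >> DEP >> NOCODA
    return min(candidates, key=lambda c: tuple(k(c) for k in ranking))

print(h_eval("pat", gen("pat")))  # -> "pat": faithfulness dominates NOCODA here
```

Gen proposes and H-eval disposes; all the explanatory weight sits in the ranking inside h_eval, matching the claim that interlinguistic differences live entirely in how the constraint system is put together.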

H-eval must be constructible in a general way if the theory is to be worth pursuing. There are really two notions of generality involved here: general with respect to UG, and therefore cross-linguistically; and general with respect to the language at hand, and therefore across constructions, categories, descriptive generalizations, etc. These are logically independent, and success along either dimension of generality would count as an argument in favor of the optimality approach. But the strongest argument, the one that is most consonant with the work in the area, and the one that will be pursued here, breaches the distinction, seeking a formulation of H-eval that is built from maximally universal constraints which apply with maximal breadth over an entire language. It is in this set of constraints, Con, that the substantive universals revealed by the theory lie.

Optimality Theory, in common with much previous work, shifts the burden from the theory of operations (Gen) to the theory of well-formedness (H-eval). To the degree that the theory of well-formedness can be put generally, the theory will fulfill the basic goals of generative grammar. To the extent that operation-based theories cannot be so put, they must be rejected.

Among possible developments of the optimality idea, it is useful to distinguish some basic architectural variants. Perhaps nearest to the familiar derivational conceptions of grammar is what we might call ‘harmonic serialism’, by which Gen provides a set of candidate analyses for an input, which are harmonically evaluated; the optimal form is then fed back into Gen, which produces another set of analyses, which are then evaluated; and so on until no further improvement in representational Harmony is possible. Here Gen might mean: ‘do any one thing: advance all candidates which differ in one respect from the input.’ The Gen ⇆ H-eval loop would iterate until there was nothing left to be done or, better, until nothing that could be done would result in increased Harmony. A significant proposal of roughly this character is the Theory of Constraints and Repair Strategies of Paradis 1988ab, with a couple of caveats: the constraints involved are a set of parochial level-true phonotactic statements, rather than being universal and violable, as we insist; and the repair strategies are quite narrowly specifiable in terms of structural description and structural change rather than being of the general ‘do-something-to-α’ variety. Paradis confronts the central complexity implicit in the notion ‘repair’: what to do when applying a repair strategy to satisfy one constraint results in violation of another constraint (i.e. at an intermediate level of derivation). Paradis refers to such situations as ‘constraint conflicts’ and although these are not conflicts in our sense of the term—they cannot be, as Robert Kirchner has pointed out to us, since all of her constraints are surface- or level-true and therefore never disagree among themselves in the assessment of output well-formedness—her work is of unique importance in addressing and shedding light on fundamental complexities in the idea of well-formedness-driven rule-application. The ‘persistent rule’ theory of Myers 1991 can similarly be related to the notion of Harmony-governed serialism. The program for Harmonic Phonology in Goldsmith 1991, 1993, is even more strongly of this character; within its lexical levels, all rules are constrained to apply harmonically. Here again, however, the rules are conceived of as being pretty much of the familiar sort, triggered if they increase Harmony, and Harmony itself is to be defined in specifically phonotactic terms. A subtheory which is very much in the mold of harmonic serialism, using a general procedure to produce candidates, is the ‘Move-x’ theory of rhythmic adjustment (Prince 1983, Hayes 1991/95).(16)
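The serial variant can be sketched as a simple loop; this is a hypothetical illustration of the Gen ⇆ H-eval iteration just described, not Paradis's or Goldsmith's actual systems, and both interfaces it assumes are schematic.

```python
def harmonic_serialism(form, gen_one_step, profile):
    """Iterate the Gen <-> H-eval loop until Harmony stops improving.

    gen_one_step(form): candidates differing from `form` in one respect
    ('do any one thing'); profile(form): the ranked violation tuple, where
    lexicographically lower means more harmonic. Both are assumed interfaces.
    """
    while True:
        best = min(gen_one_step(form), key=profile, default=form)
        if profile(best) >= profile(form):  # no single step increases Harmony
            return form
        form = best
```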

A contrasting view would hold that the Input → Output map has no internal structure: all possible variants are produced by Gen in one step and evaluated in parallel. In the course of this work, we will see instances of both kinds of analysis, though we will focus predominantly on developing the parallel idea, finding strong support for it, as do McCarthy & Prince 1993a. Definitive adjudication between parallel and serial conceptions, not to mention hybrids of various kinds, is a challenge of considerable subtlety, as indeed the debate over the necessity of serial Move-α illustrates plentifully (e.g. Aoun 1986, Browning 1991, Chomsky 1981), and the matter can be sensibly addressed only after much well-founded analytical work and theoretical exploration.

Optimality Theory abandons two key presuppositions of earlier work. First, that grammatical theory allows individual grammars to narrowly and parochially specify the Structural Description and Structural Change of rules. In place of this is Gen, which defines for any given input a large space of candidate analyses by freely exercising the basic structural resources of the representational theory. The idea is that the desired output lies somewhere in this space, and the constraint system is strong enough to single it out. Second, Optimality Theory abandons the widely held view that constraints are language-particular statements of phonotactic truth. In its place is the assertion that the constraints of Con are universal and of very general formulation, with great potential for disagreement over the well-formedness of analyses; an individual grammar consists of a ranking of these constraints, which resolves any conflict in favor of the higher-ranked constraint. The constraints provided by Universal Grammar must be simple and general; interlinguistic differences arise from the permutations of constraint-ranking; typology is the study of the range of systems that re-ranking permits.
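The typological claim, that grammars differ only in ranking, can be illustrated by permuting the toy constraints from the sketches above (names and data remain hypothetical): the same candidate set and the same "universal" constraints yield faithful parsing, deletion, or epenthesis depending solely on the hierarchy.

```python
from itertools import permutations

def demo(inp: str = "pat") -> None:
    cands = ["pat", "pa", "pa.ta"]  # faithful / deletion / epenthesis
    # Toy constraints, capitalized to echo OT constraint names (hypothetical).
    def NOCODA(c): return sum(0 if s[-1] in "aeiou" else 1 for s in c.split("."))
    def MAX(c):    return sum(1 for ch in inp if ch not in c)            # no deletion
    def DEP(c):    return max(0, len(c.replace(".", "")) - len(inp))     # no insertion
    for ranking in permutations([MAX, DEP, NOCODA]):
        winner = min(cands, key=lambda c: tuple(k(c) for k in ranking))
        print(" >> ".join(k.__name__ for k in ranking), "->", winner)

demo()
# MAX >> DEP >> NOCODA -> pat    (codas tolerated)
# NOCODA >> DEP >> MAX -> pa     (deletion repairs codas)
# NOCODA >> MAX >> DEP -> pa.ta  (epenthesis repairs codas)
# The six rankings yield only these three output patterns: a miniature
# factorial typology.
```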

Because they are ranked, constraints are regularly violated in the grammatical forms of a language. Violability has significant consequences not only for the mechanics of description, but also for the process of theory construction: a new class of predicates becomes usable in the formal theory, with a concomitant shift in what we can think the actual generalizations are. We cannot expect the world to stay the same when we change our way of describing it.