
by The Indefinite Hiatus of Vrnallia

Conlanging: the beauty of irregularity

It is easy, as a conlanger, to find yourself purposefully avoiding any kind of irregularity. Perhaps here and there you specify some exception, or you have some sort of assimilation or elision which could be considered irregular, but in general you might find that you are creating a very regular language.

Now, there's nothing inherently wrong with that. If you're creating an auxiliary language it's desirable, and even if not, there are certainly very regular natural languages. Quechua, for example, has basically no irregular verbs: even the verb "to be" (kay) is regular. But most languages fall somewhere on a spectrum of regularity, with the likes of Quechua at one end and Archi at the other.

(Quechua is a family of closely related language varieties, one of which served as the administrative language of the Inca Empire. And as for Archi? It's a Northeast Caucasian language, and it is weird.)

In other words, the majority of languages will have some degree of irregularity, and if your goal is to create a naturalistic language, some amount of irregularity is therefore desirable. So this guide intends to provide some tips for conlangers to achieve passable, naturalistic irregularity in languages. I'm going to assume that you, the reader, are a conlanger (or prospective conlanger) and as a result have an interest in linguistics. So I'll use linguistic terminology where appropriate, but I shall endeavour to keep things as simple as possible.

I'm also going to refer to my own conlang, Vrnallian, a bunch. I promise it isn't just shameless self-promotion! Well, mostly. But really I just want to illustrate how I used the sort of points I cover in one of my own languages. Vrnallian is intended to be naturalistic, and I can refer back to historical forms to show how once-regular forms became irregular through a millennium of sound change. And by a millennium I mean less than a second, since I used a program to apply sound changes. Heh.

Starting with regularity; or, using historical linguistics

Probably the easiest way to introduce irregularity into a language is to take a regular system (a paradigm of noun cases, or verbal conjugations, or whatever) and bludgeon it into an irregular form. To make this a truly natural process, we need to apply certain linguistic principles.

First, let me introduce some forms of two nouns. One is the reconstructed Proto-Uralic noun *wete "water", the other is the Old Vrnallian noun púhíđka "song":

nominative *wete, genitive *weten, locative *wetenä, genitive plural *wetejn
absolutive púhíđka, ergative púhíđkar, genitive puhíđkáhi, genitive plural púhiđkádéhi, locative plural púhiđkádégo

The meaning of each form is irrelevant. Note that an asterisk marks a form which is reconstructed based on evidence, rather than being directly attested in writing.

Both Proto-Uralic and Old Vrnallian were generally regular, agglutinative languages. In Proto-Uralic we can clearly discern separate suffixes *-n and *-nA for the genitive and locative cases, as well as a plural marker *-j. In Old Vrnallian the suffixes -r and -i are used for the ergative and genitive cases, -go for the locative case, with -de- marking the plural.
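To make "regular and divisible" concrete, here is a small Python sketch that builds the Proto-Uralic forms above by plain concatenation of stem, plural marker and case suffix. It assumes (a simplification of my own) that the A in *-nA is a vowel-harmony slot, surfacing as ä after a stem with no back vowels and as a otherwise; the function and table names are made up for illustration.

```python
# A regular, agglutinative paradigm: every form is stem + plural marker + case suffix.
# Simplifying assumption: "A" in a suffix is a vowel-harmony slot, realised as
# "a" if the stem contains a back vowel and "ä" otherwise.
BACK_VOWELS = set("aou")

SUFFIXES = {
    "nominative": "",
    "genitive": "n",          # *-n
    "locative": "nA",         # *-nA
    "genitive plural": "jn",  # plural *-j + genitive *-n
}


def harmonise(suffix: str, stem: str) -> str:
    """Replace the harmony slot A with a or ä depending on the stem's vowels."""
    vowel = "a" if any(ch in BACK_VOWELS for ch in stem) else "ä"
    return suffix.replace("A", vowel)


def decline(stem: str) -> dict[str, str]:
    """Build each form by simple concatenation; nothing here is irregular."""
    return {case: stem + harmonise(sfx, stem) for case, sfx in SUFFIXES.items()}


print(decline("wete"))
# {'nominative': 'wete', 'genitive': 'weten', 'locative': 'wetenä', 'genitive plural': 'wetejn'}
```

Because each piece is a separate suffix, you can read the case straight off the form, which is exactly the property that later sound changes wear away.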

Now let's look at the equivalent forms in modern Finnish and Vrnallian:

nominative vesi, genitive veden, inessive vedessä / adessive vedellä, genitive plural vesien
absolutive pujk, ergative pujkhar, genitive piekháj, genitive plural piekhazej, locative plural piekhazióh

In the modern languages, the formerly regular and divisible system has become to some extent irregular. In Finnish the irregularity consists in the alternation between a nominative singular form ending in -si and a stem ending in -te- (the change from -te- to -de- is regular). The locative ending has merged with other endings to produce the inessive ending -ssA and adessive ending -llA and is no longer divisible. Additionally, the presumed Proto-Uralic genitive plural form *wetejn has been replaced with a form which appears to be derived from an ending *-ten (possibly a fusion of the nominative plural marker *-t with the genitive marker).

In modern Vrnallian, even more irregularity has arisen. The stem unpredictably varies between pujk in the absolutive singular, pujkha- in the ergative singular and piekha- in the other cases. Also, the formerly distinct plural ending -de- now varies between two forms: z and ž (the latter being written zi before a vowel).

All these changes largely developed through a process of regular sound change. Let's look at just two of the Proto-Uralic and Finnish forms, namely the nominative and genitive singular forms:

*wete > vesi
*weten > veden

The alternation at the end of the stem can be described as a historical progression which looks something like this:

*wete > *weti > *veci [*vetsi] > *vesi > vesi
*weten > *veten > *veðen > veden

Regularly, a short e at the end of a word becomes i in Finnish, and by the time of Proto-Finnic the sequence ti had been replaced by ci, which in Finnish became si. This change did not occur when said short e occurred in the middle of a word.
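The two derivations above can be replayed as an ordered list of rewrite rules applied to every word. This is a minimal Python sketch using regular expressions; the rule list is my own simplified encoding of the changes described here (the t > ð step, part of Finnic consonant gradation, is stated only roughly as "between vowels before a closed final syllable"), not a full history of Finnish.

```python
import re

# Ordered rewrite rules: (label, regex, replacement). A simplified encoding of the
# changes described above, applied one after another to the input word.
VOWELS = "aeiouyäö"
RULES = [
    ("e > i / _#", r"e$", "i"),
    ("w > v", r"w", "v"),
    ("ti > ci", r"ti", "ci"),
    # t weakens to ð between vowels when the final syllable is closed (V_VC#)
    ("t > ð / V_VC#", rf"(?<=[{VOWELS}])t(?=[{VOWELS}][^{VOWELS}]$)", "ð"),
    ("c > s", r"c", "s"),
    ("ð > d", r"ð", "d"),
]


def derive(word: str) -> list[str]:
    """Apply every rule in order, recording each stage at which the word changes."""
    stages = [word]
    for _label, pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
        if word != stages[-1]:
            stages.append(word)
    return stages


for proto in ("wete", "weten"):
    stages = derive(proto)
    print(" > ".join(["*" + s for s in stages[:-1]] + [stages[-1]]))
# *wete > *weti > *veti > *veci > vesi
# *weten > *veten > *veðen > veden
```

Note that the rule for final e only matches at the end of the word, which is why *weten keeps its e while *wete does not.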

I won't provide the same step by step explanation for Vrnallian, since it is quite a bit more complicated. But I can assure you that the set of sound changes which turn púhiđkádéhi into piekhazej are just as regular as those which turn *wete into *vesi.

A program for sound changes

One such program is Zompist's Sound Change Applier (SCA). I won't explain how to use it, since you can just click the "Help me!" button for that, but I will explain the point of it.

Sound change is overwhelmingly a regular process. If we say something like "every instance of original /k/ becomes /s/ in such-and-such an environment", we should find almost no exceptions. If there does seem to be an exception, we would generally look for a reason: for example, if a given dialect at some point did not carry out the /k/ to /s/ change, then the apparent exception may simply have been borrowed from that dialect.

(Borrowed from a dialect? Well, it happened with the English word one, among other words: its normal pronunciation once rhymed with alone. Similarly, Standard Vrnallian borrowed the word vljempa "sapling" from a dialect which did not change mp to mb.)

By using a program such as the SCA, you can write a set of rules which apply regularly to any word you input. The advantages of such an automatic system should be obvious: if you make ad-hoc rules, then unless you are diligent in writing them down in the correct order, you will probably wind up contradicting yourself frequently. Even if you are diligent, the manual approach leaves plenty of room for mistakes which a program avoids.
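To see why rule order matters, here is a toy applier in Python that reads rules in the target/replacement/environment shape familiar from sound change appliers (V and C are categories, _ marks the target's position, # a word boundary). It is a hypothetical mini-version written for this guide, not the SCA's actual code or full syntax, and it runs the same simplified Finnish changes in two different orders.

```python
import re

# Categories for this toy example only (not a full phoneme inventory).
CATEGORIES = {"V": "aeiouyäö", "C": "bcdfghjklmnpqrstvwzð"}


def _env_to_regex(part: str, boundary: str) -> str:
    """Translate one side of an environment; '#' becomes the given anchor."""
    pieces = []
    for ch in part:
        if ch == "#":
            pieces.append(boundary)
        elif ch in CATEGORIES:
            pieces.append("[" + CATEGORIES[ch] + "]")
        else:
            pieces.append(re.escape(ch))
    return "".join(pieces)


def compile_rule(rule: str) -> tuple[re.Pattern, str]:
    """Turn 'target/replacement/before_after' into a compiled regex and replacement."""
    target, replacement, env = rule.split("/")
    before, after = env.split("_")
    pattern = ""
    if before:
        pattern += "(?<=" + _env_to_regex(before, "^") + ")"
    pattern += re.escape(target)
    if after:
        pattern += "(?=" + _env_to_regex(after, "$") + ")"
    return re.compile(pattern), replacement


def apply_in_order(word: str, rules: list[str]) -> str:
    """Apply each rule to the whole word, strictly in the order given."""
    for rule in rules:
        regex, replacement = compile_rule(rule)
        word = regex.sub(replacement, word)
    return word


ORDER_A = ["e/i/_#", "w/v/_", "ti/ci/_", "t/ð/V_VC#", "c/s/_", "ð/d/_"]
# Same rules, but ti > ci now runs before word-final e > i.
ORDER_B = ["w/v/_", "ti/ci/_", "e/i/_#", "t/ð/V_VC#", "c/s/_", "ð/d/_"]

for name, order in (("A", ORDER_A), ("B", ORDER_B)):
    print(name, [apply_in_order(w, order) for w in ("wete", "weten")])
# A ['vesi', 'veden']
# B ['veti', 'veden']
```

The two lists differ only in the position of one rule, yet *wete comes out as vesi in one and veti in the other: exactly the kind of contradiction that creeps in when rules are applied by hand.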

That said, don't just copy every output word into your dictionary. Read the next section first...

Making the irregular regular; or, how to keep your blood pressure down

I'm not going to say it's impossible for a language to be more irregular than it is regular. After all, Archi exists. But in the vast majority of cases, particularly inconsistent irregularities tend to be regularised, to a greater or lesser extent, through a process called analogy. Take a look at the following Middle Vrnallian forms of the noun "tin":

absolutive kiejlas, genitive kilúj, locative klah, ergative plural klazi, locative plural kiejlazióh

A brief glance back at pujk above should show you that this noun is on another level in terms of irregularity. Not only does the stem alternate between kiejla-, kilu- and kla-, but the ergative plural is also irregular (there is not normally an -i at the end). So here we have a noun which those regular sound changes have completely mangled.

A comparison with other nouns would show that the kind of irregularity seen with this noun is exceedingly rare. As a result, it is not surprising that this noun was made more regular. The specific process in this case is called levelling: all of those strange stem variants were replaced with the single form kiejla-, as this was the most common stem form, while the irregular case endings were replaced with the regular forms. In modern Vrnallian we have:

kiejlas, kiejlaj, kiejlóh, kiejláz, kiejlazióh

To take a slightly different example, let's look at the English verb dive. What is the past tense form of this verb? The original past tense, still used by many speakers, is dived. But a more recent variant, which mainly occurs in North America as well as in some dialects elsewhere, is dove. This variant was in fact formed by analogy with verbs such as drive > drove. In this case, analogy has produced a more irregular form. But even this irregular form results from the generalisation of an observable pattern.
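Proportional analogy (drive : drove :: dive : ?) is mechanical enough to sketch in a few lines of Python. This naive version, with a function name of my own invention, assumes the model pair differs in one contiguous stretch of letters and copies that change onto the new word; real analogy operates on sounds and morphological patterns, not spelling.

```python
from typing import Optional


def _shared_prefix_len(x: str, y: str) -> int:
    """Length of the longest common prefix of x and y."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n


def solve_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : ? by transferring the change that turns a into b onto c."""
    p = _shared_prefix_len(a, b)
    # Shared ending, measured only on what follows the shared beginning.
    s = _shared_prefix_len(a[p:][::-1], b[p:][::-1])
    old, new = a[p:len(a) - s], b[p:len(b) - s]
    ending = a[len(a) - s:]
    if not c.endswith(old + ending):
        return None  # c lacks the material the model pair alternates
    stem = c[:len(c) - len(old) - len(ending)]
    return stem + new + ending


print(solve_analogy("drive", "drove", "dive"))  # dove
print(solve_analogy("live", "lived", "dive"))   # dived
```

Given a regular model such as live : lived, the same function gives dived instead; which model a speaker reaches for decides the outcome.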

Suppletion; or, what to do when sound changes are too much

Irregularity frequently, but not always, derives from instances where a regular root has been affected by sound change. Another way that irregularity can occur is through certain formerly distinct roots being conflated within a single semantic sphere or paradigm. Examples in English are numerous:

be, is, was
go, went
I, me
she, her
good, better
bad, worse

Suppletion is incredibly common cross-linguistically. Here are examples from other natural languages:

Ancient Greek eimi "I (will) go", ēlthon "I went"
Spanish ir "to go", voy "I go", fui "I went"
Hindi-Urdu hai "is", thā "was"
Czech malý "small", menší "smaller"
Finnish hyvä "good", paras "best"
Maltese mara "woman", nisa "women"

And one from Vrnallian for good measure: jeran "it becomes", umizovun "it became".

As should be obvious, suppletion tends to be restricted to particularly basic semantic categories: verbs of motion or existence, pronouns, basic nouns and basic adjectives. But where exactly does the variation come from? We'll just look at three of the above words:

The word good goes back to a Proto-Indo-European root *ghedh- meaning "unite, suit". Meanwhile better goes back to a root *bhed- meaning, interestingly enough, "good". It's therefore clear that an original, more regular Indo-European root *bhed- became irregular when it was conflated with a different root *ghedh-. The fact that cognates of good and better are so numerous within the Germanic family (German gut, besser; Swedish god, bättre; Icelandic góður, betri) means that this conflation must have taken place by the time of Proto-Germanic itself, making it a particularly old instance of suppletion.

The Spanish verb ir "to go" uses multiple suppletive roots in its conjugation, having a present tense voy "I go" and a preterite tense fui "I went", for example. The variation in this verb goes back to multiple distinct verbs in Latin: the infinitive and related forms derive from the usual Latin verb "to go", ire. The present tense forms derive from the verb vadere, also meaning "to go" but implying movement on foot ("to walk"). The preterite forms, meanwhile, derive from fui, the perfect of esse "to be", which is why ir and ser share a preterite: fui means both "I went" and "I was".

The Vrnallian verb jeran "it becomes" (infinitive jrata), with its past tense stem umizovu-, is in fact the only verb in Vrnallian which uses suppletion in its conjugation. The infinitive and related forms derive from the Old Vrnallian for "become", érápta, while the past tense derives from a verb uamídápta "to start to be", a derivative of uápta "to be".
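If you generate your conlang's forms with a script, suppletion is easiest to model as a short table of stored forms that is checked before the regular rule applies. A minimal Python sketch, using the English comparatives good, better and bad, worse from earlier; the table and function names are hypothetical, and the naive -er rule ignores English spelling adjustments such as big > bigger.

```python
# Suppletive forms are stored, not derived: they override the regular rule.
SUPPLETIVE_COMPARATIVES = {"good": "better", "bad": "worse"}


def comparative(adjective: str) -> str:
    """Return the comparative, preferring a listed suppletive form."""
    if adjective in SUPPLETIVE_COMPARATIVES:
        return SUPPLETIVE_COMPARATIVES[adjective]
    return adjective + "er"  # the regular pattern: tall > taller


for adj in ("tall", "good", "bad"):
    print(adj, comparative(adj))
# tall taller
# good better
# bad worse
```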

Beyond words; or, irregularity on the phrase level

Irregular declension or conjugation is all well and good, but languages are not just single words strung together. So let's consider phrases.

If your language has grammatical gender, a noun can have an irregular gender. You might think "ah! Like all those Spanish nouns ending in -o which are really feminine, like la mano!" Well, that's not what I mean. We have to move away from conflating the ending of a noun with its gender: though the two often correspond, at the very least a language will have some instances of natural gender overriding a word's grammatical ending when a gender is assigned. Sticking with Spanish, el piloto "the (male) pilot" but la piloto "the (female) pilot".

What I'm really talking about is something like the (literary) French noun amour:

le bon amour (masculine)
les bonnes amours (feminine)

In the singular, amour is masculine, and takes the masculine adjective bon rather than feminine bonne. In the plural, however, the gender changes to feminine, and the feminine adjective bonnes is used instead of the masculine bons.

(As it happens, the adjective bon is also irregular: unlike most adjectives, it precedes rather than follows the noun it modifies. Compare un amour secret "a secret love".)
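One way to encode a noun like amour in a conlang's lexicon is to store gender per number rather than per noun, and let agreement read it from there. A small Python sketch; the table layout is hypothetical, livre "book" is added only as an ordinary masculine noun for comparison, and the plural is formed naively with -s.

```python
# Gender is looked up per number, so a noun can switch gender in the plural.
NOUN_GENDER = {
    "amour": {"sg": "m", "pl": "f"},  # literary amour: masculine singular, feminine plural
    "livre": {"sg": "m", "pl": "m"},  # an ordinary masculine noun for comparison
}
ARTICLE = {("m", "sg"): "le", ("f", "sg"): "la", ("m", "pl"): "les", ("f", "pl"): "les"}
BON = {("m", "sg"): "bon", ("f", "sg"): "bonne", ("m", "pl"): "bons", ("f", "pl"): "bonnes"}


def noun_phrase(noun: str, number: str) -> str:
    """Build article + bon + noun, with agreement driven by the number-specific gender."""
    gender = NOUN_GENDER[noun][number]
    form = noun + "s" if number == "pl" else noun
    return f"{ARTICLE[(gender, number)]} {BON[(gender, number)]} {form}"


for noun in NOUN_GENDER:
    print(noun_phrase(noun, "sg"), "/", noun_phrase(noun, "pl"))
# le bon amour / les bonnes amours
# le bon livre / les bons livres
```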

In languages where gender is assigned purely semantically, irregularity can be introduced where there seems to be a mismatch between a word's natural and grammatical gender. In practice, speakers of such languages seem to consider there to be some link between the noun in question and the gender to which it is assigned, or did at some point in the past. Among the Ojibwe, whose language distinguishes two genders, animate and inanimate, seemingly inanimate objects may be given animate grammatical gender if they are felt to possess a spirit: mitig "tree" and asin "rock" are both animate.

Another common area of irregularity is numeral-noun agreement. The rules in Russian look like this:

  • After odin/odna/odno "one", the noun and numeral agree in gender, number and case (yes, number. The number "one" has a plural form, odni)

  • After dva "two", tri "three" and chetyre "four", when the numeral is nominative or inanimate accusative, the noun is placed into the genitive singular; otherwise, the noun is placed into the plural of the corresponding case

  • After other numbers (including 11 to 14, but excluding larger numbers that end in one of the preceding numbers, which follow the rules above), when the numeral is nominative or accusative (regardless of animacy), the noun is placed into the genitive plural; otherwise, the noun is placed into the plural of the corresponding case

Sound complicated? Yeah, Russian is weird.
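The Russian rules boil down to a decision on the numeral's last digit or two. Here's a Python sketch that returns a description of the noun form required; the function name is mine, cases are plain strings, and it treats 11 to 14 like the larger numbers (as Russian does) while making no attempt at finer points of usage.

```python
def russian_noun_form(n: int, numeral_case: str, animate: bool = False) -> str:
    """Describe the noun form after cardinal n, following the simplified rules above."""
    last, last_two = n % 10, n % 100
    teen = 11 <= last_two <= 14  # 11-14 pattern with the larger numbers
    # "Direct" cases: nominative, or accusative of an inanimate noun.
    direct = numeral_case == "nominative" or (numeral_case == "accusative" and not animate)

    if last == 1 and not teen:
        return "agrees with numeral in gender, number and case"
    if last in (2, 3, 4) and not teen:
        return "genitive singular" if direct else f"{numeral_case} plural"
    if numeral_case in ("nominative", "accusative"):  # regardless of animacy
        return "genitive plural"
    return f"{numeral_case} plural"


print(russian_noun_form(2, "nominative"))        # genitive singular
print(russian_noun_form(2, "accusative", True))  # accusative plural
print(russian_noun_form(5, "accusative", True))  # genitive plural
print(russian_noun_form(12, "nominative"))       # genitive plural
print(russian_noun_form(5, "dative"))            # dative plural
```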

I didn't make Vrnallian that much easier, though: numbers up to and including hozuž "four" agree in case with the head noun, while the noun agrees with the numeral in number. From ku "five" to upiu "ten", and for every multiple of ten, the numeral agrees in case with the head noun, but the head is forced into the singular. Other numbers are in fact nouns which force the head noun to become a genitive attributive (and therefore the noun precedes these numerals).
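These Vrnallian rules amount to choosing an agreement strategy from the numeral's value, which a short Python sketch makes explicit. The function is hypothetical and only returns descriptions, since the inflected forms themselves aren't given here; I've also assumed it applies to positive whole numbers only.

```python
def vrnallian_numeral_strategy(n: int) -> str:
    """Pick how numeral n combines with its head noun, per the rules described above."""
    if n < 1:
        raise ValueError("this sketch only covers positive numbers")
    if n <= 4:  # up to and including hozuž "four"
        return "numeral agrees in case; noun agrees with numeral in number"
    if n <= 10 or n % 10 == 0:  # ku "five" to upiu "ten", and multiples of ten
        return "numeral agrees in case; noun forced into singular"
    # everything else: the numeral is itself a noun
    return "numeral is a noun; head noun is a genitive attributive placed before it"


for n in (3, 7, 40, 13):
    print(n, vrnallian_numeral_strategy(n))
```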

Finally, let's take a brief look at syntax. Most languages have some sort of phrase order which is most common. In English it's subject-verb-object, in Welsh it's verb-subject-object, in Latin it was subject-object-verb, in Hixkaryana it's object-verb-subject... you get the point. But many languages have ways to change the order. In English this is extremely marked (not a thing said I) and comes across as poetic or highfalutin. In Mandarin, you can use the particle bǎ to get subject-object-verb order, while in German subject-object-verb is arguably the default order. But transformations like these aren't really irregular: in German the movement of the verb within a clause is predictable, while in Mandarin bǎ is used to emphasise the object noun.

What, then, would a true syntactical irregularity be? Well, one possibility is presented by the movement paradox:

Aren't I your friend? (an acceptable sentence)
*I aren't your friend (an unacceptable sentence)
I am not your friend (an acceptable sentence)

In English, the usual contracted form of a negative question, which involves fronting the verb (here, be; elsewhere usually do), does not follow the normal rules of agreement. If it did, we would expect the question to be am I not your friend? While this is perfectly acceptable, the contracted form amn't I your friend? is far less common. The approach to syntax called transformational grammar cannot explain this as a regular process. In other words, under this approach the construction has to be treated as irregular.

The same movement paradox occurs in this German example, meaning "one attempted to fix the car yesterday":

Gestern wurde versucht, den Wagen zu reparieren
*Den Wagen wurde gestern versucht zu reparieren
Der Wagen wurde gestern versucht zu reparieren

If we treat the first sentence as the default, the regular transformation, fronting the noun den Wagen "the car (accusative)", produces an ungrammatical sentence. Instead, the noun must be placed into the nominative case, der Wagen.

The Indefinite Hiatus of Vrnallia
