# The Complete Lojban Language

### An unofficial publication

Version geklojban-1.2.5, Generated 2020-03-07

## 1.1. What is Lojban?

Lojban (pronounced LOZH-bahn) is a constructed language. Previous versions of the language were called Loglan by Dr. James Cooke Brown, who founded the Loglan Project and started the development of the language in 1955. The goals for the language were first described in the open literature in the article “Loglan”, published in Scientific American, June, 1960. Made well-known by that article and by occasional references in science fiction (most notably in Robert Heinlein's novel The Moon Is A Harsh Mistress) and computer publications, Loglan and Lojban have been built over four decades by dozens of workers and hundreds of supporters, led since 1987 by The Logical Language Group (who are the publishers of this book).

There are thousands of artificial languages (of which Esperanto is the best-known), but Loglan/Lojban has been engineered to make it unique in several ways. The following are the main features of Lojban:

• Lojban is designed to be used by people in communication with each other, and possibly in the future with computers.

• Lojban is designed to be neutral between cultures.

• Lojban grammar is based on the principles of predicate logic.

• Lojban has an unambiguous yet flexible grammar.

• Lojban has phonetic spelling, and unambiguously resolves its sounds into words.

• Lojban is simple compared to natural languages; it is easy to learn.

• Lojban's 1300 root words can be easily combined to form a vocabulary of millions of words.

• Lojban is regular; the rules of the language are without exceptions.

• Lojban attempts to remove restrictions on creative and clear thought and communication.

• Lojban has a variety of uses, ranging from the creative to the scientific, from the theoretical to the practical.

• Lojban has been demonstrated in translation and in original works of prose and poetry.

## 1.2. What is this book?

This book is what is called a reference grammar. It attempts to expound the whole Lojban language, or at least as much of it as is understood at present. Lojban is a rich language with many features, and an attempt has been made to discover the functions of those features. The word discover is used advisedly; Lojban was not invented by any one person or committee. Often, grammatical features were introduced into the language long before their usage was fully understood. Sometimes they were introduced for one reason, only to prove more useful for other reasons not recognized at the time.

By intention, this book is complete in description but not in explanation. For every rule in the formal Lojban grammar (given in Chapter 21), there is a bit of explanation and an example somewhere in the book, and often a great deal more than a bit. In essence, Chapter 2 gives a brief overview of the language, Chapter 21 gives the formal structure of the language, and the chapters in between put semantic flesh on those formal bones. I hope that eventually more grammatical material founded on (or even correcting) the explanations in this book will become available.

Nevertheless, the publication of this book is, in one sense, the completion of a long period of language evolution. With the exception of a possible revision of the language that will not even be considered until five years from publication date, and any revisions of this book needed to correct outright errors, the language described in this book will not be changing by deliberate act of its creators any more. Instead, language change will take place in the form of new vocabulary – Lojban does not yet have nearly the vocabulary it needs to be a fully usable language of the modern world, as Chapter 12 explains – and through the irregular natural processes of drift and (who knows?) native-speaker evolution. (Teach your children Lojban!) You can learn the language described here with assurance that (unlike previous versions of Lojban and Loglan, as well as most other artificial languages) it will not be subject to further fiddling by language-meisters.

It is probably worth mentioning that this book was written somewhat piecemeal. Each chapter began life as an explication of a specific Lojban topic; only later did these begin to clump together into a larger structure of words and ideas. Therefore, there are perhaps not as many cross-references as there should be. However, I have attempted to make the index as comprehensive as possible.

Each chapter has a descriptive title, often involving some play on words; this is an attempt to make the chapters more memorable. For example, the title of Chapter 1 (which you are now reading), “Lojban As We Mangle It In Lojbanistan”, is an allusion to the book English As We Speak It In Ireland, by P. W. Joyce, which is a sort of informal reference grammar of Hiberno-English. Lojbanistan is both an imaginary country where Lojban is the native language, and a term for the actual community of Lojban-speakers, scattered over the world. Why mangle? As yet, nobody in the real Lojbanistan speaks the language at all well, by the standards of the imaginary Lojbanistan; that is one of the circumstances this book is meant to help remedy.

## 1.3. What are the typographical conventions of this book?

Each chapter is broken into numbered sections; each section contains a mixture of expository text, numbered examples, and possibly tables.

The reader will notice a certain similarity in the examples used throughout the book. One chapter after another rings the changes on the self-same sentences:

Example 1.1.

 mi klama le zarci
 I go-to that-which-I-describe-as-a store.
 I go to the store.

will become wearisomely familiar before Chapter 21 is reached. This method is deliberate; I have tried to use simple and (eventually) familiar examples wherever possible, to avoid obscuring new grammatical points with new vocabulary. Of course, this is not the method of a textbook, but this book is not a textbook (although people have learned Lojban from it and its predecessors). Rather, it is intended both for self-learning (of course, at present would-be Lojban teachers must be self-learners) and to serve as a reference in the usual sense, for looking up obscure points about the language.

It is useful to talk further about Example 1.1 for what it illustrates about examples in this book. Examples usually occupy three lines. The first of these is in Lojban (in italics), the second in a word-by-word literal translation of the Lojban into English (in boldface), and the third in colloquial English. The second and third lines are sometimes called the literal translation and the colloquial translation respectively. Sometimes, when clarity is not sacrificed thereby, one or both are omitted. If there is more than one Lojban sentence, it generally means that they have the same meaning.

Words are sometimes surrounded by square brackets. In Lojban texts, these enclose optional grammatical particles that may (in the context of the particular example) be either omitted or included. In literal translations, they enclose words that are used as conventional translations of specific Lojban words, but don't have exactly the meanings or uses that the English word would suggest. In Chapter 3, square brackets surround phonetic representations in the International Phonetic Alphabet.

Many of the tables, especially those placed at the head of various sections, are in three columns. The first column contains Lojban words discussed in that section; the second column contains the grammatical category (represented by an UPPER CASE Lojban word) to which the word belongs, and the third column contains a brief English gloss, not necessarily or typically a full explanation. Other tables are explained in context.

A few Lojban words are used in this book as technical terms. All of these are explained in Chapter 2, except for a few used only in single chapters, which are explained in the introductory sections of those chapters.

## 1.4. Disclaimers

It is necessary to add, alas, that the examples used in this book do not refer to any existing person, place, or institution, and that any such resemblance is entirely coincidental and unintentional, and not intended to give offense.

When definitions and place structures of gismu, and especially of lujvo, are given in this book, they may differ from those given in the English-Lojban dictionary (which, as of this writing, is not yet published). If so, the information given in the dictionary supersedes whatever is given here.

## 1.5. Acknowledgements and Credits

Although the bulk of this book was written for the Logical Language Group (LLG) by John Cowan, who is represented by the occasional authorial I, certain chapters were first written by others and then heavily edited by me to fit into this book.

In particular: Chapter 2 is a fusion of originally separate documents, one by Athelstan, and one by Nora Tansky LeChevalier and Bob LeChevalier; Chapter 3 and Chapter 4 were originally written by Bob LeChevalier with contributions by Chuck Barton; Chapter 12 was originally written (in much longer form) by Nick Nicholas; the dialogue near the end of Chapter 13 was contributed by Nora Tansky LeChevalier; Chapter 15 and parts of Chapter 16 were originally by Bob LeChevalier. The BNF grammar, which is also in Chapter 21, was originally written by me, then rewritten by Clark Nelson, and finally touched up by me again.

The research into natural languages from which parts of Chapter 5 draw their material was performed by Ivan Derzhanski. LLG acknowledges his kind permission to use the fruits of his research.

The pictures in this book were drawn by Nora Tansky LeChevalier, except for the picture appearing in Chapter 4, which is by Sylvia Rutiser Rissell.

The index was made by Nora Tansky LeChevalier.

I would like to thank the following people for their detailed reviews, suggestions, comments, and early detection of my embarrassing errors in Lojban, logic, English, and cross-references: Nick Nicholas, Mark Shoulson, Veijo Vilva, Colin Fine, And Rosta, Jorge Llambias, Iain Alexander, Paulo S. L. M. Barreto, Robert J. Chassell, Gale Cowan, Karen Stein, Ivan Derzhanski, Jim Carter, Irene Gates, Bob LeChevalier, John Parks-Clifford (also known as pc), and Nora Tansky LeChevalier.

Nick Nicholas (NSN) would like to thank the following Lojbanists: Mark Shoulson, Veijo Vilva, Colin Fine, And Rosta, and Iain Alexander for their suggestions and comments; John Cowan, for his extensive comments, his exemplary trailblazing of Lojban grammar, and for solving the manskapi dilemma for NSN; Jorge Llambias, for his even more extensive comments, and for forcing NSN to think more than he was inclined to; Bob LeChevalier, for his skeptical overview of the issue, his encouragement, and for scouring all Lojban text his computer has been burdened with for lujvo; Nora Tansky LeChevalier, for writing the program converting old rafsi text to new rafsi text, and sparing NSN from embarrassing errors; and Jim Carter, for his dogged persistence in analyzing lujvo algorithmically, which inspired this research, and for first identifying the three lujvo classes.

Of course, the entire Loglan Project owes a considerable debt to James Cooke Brown as the language inventor, and also to several earlier contributors to the development of the language. Especially noteworthy are Doug Landauer, Jeff Prothero, Scott Layson, Jeff Taylor, and Bob McIvor. Final responsibility for the remaining errors and infelicities is solely mine.

## 1.6. Informal Bibliography

The founding document for the Loglan Project, of which this book is one of the products, is Loglan 1: A Logical Language by James Cooke Brown (4th ed. 1989, The Loglan Institute, Gainesville, Florida, U.S.A.). The language described therein is not Lojban, but is very close to it and may be considered an ancestral version. It is regrettably necessary to state that nothing in this book has been approved by Dr. Brown, and that the very existence of Lojban is disapproved of by him.

The logic of Lojban, such as it is, owes a good deal to the American philosopher W. V. O. Quine, especially Word and Object (1960, M.I.T. Press). Much of Quine's philosophical writings, especially on observation sentences, reads like a literal translation from Lojban.

The theory of negation expounded in Chapter 15 is derived from a reading of Laurence Horn's work A Natural History of Negation.

Of course, neither Brown nor Quine nor Horn is in any way responsible for the uses or misuses I have made of their works.

Depending on just when you are reading this book, there may be three other books about Lojban available: a textbook, a Lojban/English dictionary, and a book containing general information about Lojban. You can probably get these books, if they have been published, from the same place where you got this book. In addition, other books not yet foreseen may also exist.

## 1.7. Captions to Pictures

The following examples list the Lojban caption, with a translation, for the picture at the head of each chapter. If a chapter's picture has no caption, (none) is specified instead.

Chapter 1

coi .lojban.

Greetings, O Lojban!

coi rodo

Greetings, all-of you

Chapter 2

(none)

Chapter 3

.i .ai .i .ai .o

[a sequence of arbitrary Lojban words]

Chapter 4

jbobliku

Lojbanic-blocks

Chapter 5

(none)

Chapter 6

lei re nanmu cu bevri le re nanmu

The-mass-of two men carry the two men

Two men (jointly) carry two men (both of them).

Chapter 7

ma drani danfu

[What-sumti] is-the-correct type-of-answer?

.i di'e

The-next-sentence.

.i di'u .i dei

The-previous-sentence. This-sentence.

.i ri .i do'i

The-previous-sentence. An-unspecified-utterance.

Chapter 8

ko viska re prenu poi bruna la .santas.

[You!] see two persons who-are brothers-of that-named Santa.

Chapter 9

(none)

Chapter 10

za'o klama

[superfective] come/go

Something goes (or comes) for too long.

Chapter 11

le si'o kunti

The concept-of emptiness

Chapter 12

(none)

Chapter 13

.oi ro'i ro'a ro'o

[Pain!] [emotional] [social] [physical]

Chapter 14

(none)

Chapter 15

mi na'e lumci le karce

I other-than wash the car

I didn't wash the car.

Chapter 16

drata mupli pe'u .djan.

another example [please] John

Chapter 17

zai xanlerfu bu ly. .obu .jy by. .abu ny.

[Shift] hand-letters l o j b a n

"Lojban" in a manual alphabet

Chapter 18

no no

0 0

Chapter 19

(none)

Chapter 20

(none)

Chapter 21

(none)

## 1.8. Boring Legalities

Permission is granted to make and distribute verbatim copies of this book, either in electronic or in printed form, provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this book, provided that the modifications are clearly marked as such, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this book into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation that has been approved by the Logical Language Group, rather than in English.

The contents of Chapter 21 are in the public domain.

For information, contact: The Logical Language Group, 2904 Beau Lane, Fairfax VA 22031-1303 USA. Telephone: 703-385-0273. Email address: [email protected]. Web Address: http://www.lojban.org.

# Chapter 2. A Quick Tour of Lojban Grammar, With Diagrams

## 2.1. The concept of the bridi

This chapter gives diagrammed examples of basic Lojban sentence structures. The most general pattern is covered first, followed by successive variations on the basic components of the Lojban sentence. There are many more capabilities not covered in this chapter, but covered in detail in later chapters, so this chapter is a quick tour of the material later covered more slowly throughout the book. It also introduces most of the Lojban words used to discuss Lojban grammar.

Let us consider John and Sam and three statements about them:

Example 2.1.

John is the father of Sam.

Example 2.2.

John hits Sam.

Example 2.3.

John is taller than Sam.

These examples all describe relationships between John and Sam. However, in English, we use the noun father to describe a static relationship in Example 2.1, the verb hits to describe an active relationship in Example 2.2, and the adjective taller to describe an attributive relationship in Example 2.3. In Lojban we make no such grammatical distinctions; these three sentences, when expressed in Lojban, are structurally identical. The same part of speech is used to represent the relationship. In formal logic this whole structure is called a predication; in Lojban it is called a bridi, and the central part of speech is the selbri. Logicians refer to the things thus related as arguments, while Lojbanists call them sumti. These Lojban terms will be used for the rest of the book.

In a relationship, there are a definite number of things being related. In English, for example, give has three places: the donor, the recipient and the gift. For example:

Example 2.4.

John gives Sam the book.

and

Example 2.5.

Sam gives John the book.

mean two different things because the relative positions of John and Sam have been switched. Further,

Example 2.6.

The book gives John Sam.

seems strange to us merely because the places are being filled by unorthodox arguments. The relationship expressed by give has not changed.

In Lojban, each selbri has a specified number and type of arguments, known collectively as its place structure. The simplest kind of selbri consists of a single root word, called a gismu, and the definition in a dictionary gives the place structure explicitly. The primary task of constructing a Lojban sentence, after choosing the relationship itself, is deciding what you will use to fill in the sumti places.

This book uses the Lojban terms bridi, sumti, and selbri, because it is best to come to understand them independently of the English associations of the corresponding words, which are only roughly similar in meaning anyhow.

The Lojban examples in this chapter (but not in the rest of the book) use boldface (as well as the usual italics) for selbri, to help you to tell them apart.

## 2.2. Pronunciation

Detailed pronunciation and spelling rules are given in Chapter 3, but what follows will keep the reader from going too far astray while digesting this chapter.

Lojban has six recognized vowels: a, e, i, o, u and y. The first five are roughly pronounced as a as in father, e as in let, i as in machine, o as in dome and u as in flute. y is pronounced as the sound called schwa, that is, as the unstressed a as in about or around.

Twelve consonants in Lojban are pronounced more or less as their counterparts are in English: b, d, f, k, l, m, n, p, r, t, v and z. The letter c, on the other hand, is pronounced as the sh in hush, while j is its voiced counterpart, the sound of the s in pleasure. g is always pronounced as it is in gift, never as in giant. s is as in sell, never as in rose. The sound of x is not found in English in normal words. It is found as ch in Scottish loch, as j in Spanish junta, and as ch in German Bach; it also appears in the English interjection yecchh!. It gets easier to say as you practice it. The letter r can be trilled, but doesn't have to be.

The Lojban diphthongs ai, ei, oi, and au are pronounced much as in the English words sigh, say, boy, and how. Other Lojban diphthongs begin with an i pronounced like English y (for example, io is pronounced yo) or else with a u pronounced like English w (for example, ua is pronounced wa).

Lojban also has three semi-letters: the period, the comma and the apostrophe. The period represents a glottal stop or a pause; it is a required stoppage of the flow of air in the speech stream. The apostrophe sounds just like the English letter h. Unlike a regular consonant, it is not found at the beginning or end of a word, nor is it found adjacent to a consonant; it is only found between two vowels. The comma has no sound associated with it, and is used to separate syllables that might ordinarily run together. It is not used in this chapter.
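The placement rule for the apostrophe lends itself to a small check. The following Python sketch is illustrative only and not part of any official Lojban tool; the function name `apostrophes_well_placed` is hypothetical, and it tests nothing except the rule just stated (an apostrophe must sit between two vowels).

```python
# The six Lojban vowels listed in this section.
VOWELS = set("aeiouy")


def apostrophes_well_placed(word: str) -> bool:
    """Return True if every apostrophe in `word` sits between two vowels."""
    for i, ch in enumerate(word):
        if ch != "'":
            continue
        # An apostrophe at the start or end of a word has no vowel on one side.
        if i == 0 or i == len(word) - 1:
            return False
        # An apostrophe next to a consonant (or another apostrophe) is rejected.
        if word[i - 1] not in VOWELS or word[i + 1] not in VOWELS:
            return False
    return True


print(apostrophes_well_placed("zo'e"))     # True
print(apostrophes_well_placed("ni'o"))     # True
print(apostrophes_well_placed("'abu"))     # False
print(apostrophes_well_placed("lojb'an"))  # False
```

The last two inputs are deliberately malformed strings, not Lojban words; they only exercise the two ways the rule can be broken.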

Stress falls on the next-to-last syllable of all words, unless the vowel of that syllable is y, which is never stressed; in such words the third-to-last syllable is stressed. If a word has only one syllable, then that syllable is not stressed.
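As a rough illustration, the stress rule above can be written as a short Python function. This is a sketch of the simplified rule in this section only (Chapter 3 gives the full rules). It assumes the caller has already divided the word into syllables, and the name `stressed_syllable` is hypothetical. The case of a two-syllable word whose first syllable contains y is not covered by the text; returning `None` there is an assumption of this sketch.

```python
from typing import Optional, Sequence


def stressed_syllable(syllables: Sequence[str]) -> Optional[int]:
    """Return the index of the stressed syllable, or None if none is stressed."""
    if len(syllables) < 2:
        # A one-syllable word is not stressed.
        return None
    penult = len(syllables) - 2
    if "y" in syllables[penult]:
        # y is never stressed, so stress moves to the third-to-last syllable.
        # Assumption: if there is no such syllable, report no stress.
        return penult - 1 if penult >= 1 else None
    return penult


print(stressed_syllable(["kla", "ma"]))         # 0  (KLAma)
print(stressed_syllable(["jbo", "bli", "ku"]))  # 1  (jboBLIku)
print(stressed_syllable(["lo", "by", "pli"]))   # 0  (LObypli)
print(stressed_syllable(["mi"]))                # None
```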

All Lojban words are pronounced as they are spelled: there are no silent letters.

## 2.3. Words that can act as sumti

Here is a short table of single words used as sumti. This table provides examples only, not the entire set of such words, which may be found in Section 7.16.

| Word | Meaning |
|------|---------|
| mi | I/me, we/us |
| do | you |
| ti | this, these |
| ta | that, those |
| tu | that far away, those far away |
| zo'e | unspecified value (used when a sumti is unimportant or obvious) |

Lojban sumti are not specific as to number (singular or plural), nor gender (masculine/feminine/neutral). Such distinctions can be optionally added by methods that are beyond the scope of this chapter.

The cmavo ti, ta, and tu refer to whatever the speaker is pointing at, and should not be used to refer to things that cannot in principle be pointed at.

Names may also be used as sumti, provided they are preceded with the word la:

| Name | Meaning |
|------|---------|
| la .meris. | the one/ones named Mary |
| la .djan. | the one/ones named John |

Other Lojban spelling versions are possible for names from other languages, and there are restrictions on which letters may appear in Lojban names: see Section 6.12 for more information.

## 2.4. Some words used to indicate selbri relations

Here is a short table of some words used as Lojban selbri in this chapter:

| Word | Place structure |
|------|-----------------|
| vecnu | x1 (seller) sells x2 (goods) to x3 (buyer) for x4 (price) |
| tavla | x1 (talker) talks to x2 (audience) about x3 (topic) in language x4 |
| sutra | x1 (agent) is fast at doing x2 (action) |
| blari'o | x1 (object/light source) is blue-green |
| melbi | x1 (object/idea) is beautiful to x2 (observer) by standard x3 |
| cutci | x1 is a shoe/boot for x2 (foot) made of x3 (material) |
| bajra | x1 runs on x2 (surface) using x3 (limbs) in manner x4 (gait) |
| klama | x1 goes/comes to x2 (destination) from x3 (origin point) via x4 (route) using x5 (means of transportation) |
| pluka | x1 pleases/is pleasing to x2 (experiencer) under conditions x3 |
| gerku | x1 is a dog of breed x2 |
| kurji | x1 takes care of x2 |
| kanro | x1 is healthy by standard x2 |
| stali | x1 stays/remains with x2 |
| zarci | x1 is a market/store/shop selling x2 (products) operated by x3 (storekeeper) |

Each selbri (relation) has a specific rule that defines the role of each sumti in the bridi, based on its position. In the table above, that order was expressed by labeling the sumti positions as x1, x2, x3, x4, and x5.

Like the table in Section 2.3, this table is far from complete: in fact, no complete table can exist, because Lojban allows new words to be created (in specified ways) whenever a speaker or writer finds the existing supply of words inadequate. This notion is a basic difference between Lojban (and some other languages such as German and Chinese) and English; in English, most people are very leery of using words that aren't in the dictionary. Lojbanists are encouraged to invent new words; doing so is a major way of participating in the development of the language. Chapter 4 explains how to make new words, and Chapter 12 explains how to give them appropriate meanings.

## 2.5. Some simple Lojban bridi

Let's look at a simple Lojban bridi. The place structure of the gismu tavla is

Example 2.7.

x1 talks to x2 about x3 in language x4

where the x's with following numbers represent the various arguments that could be inserted at the given positions in the English sentence. For example:

Example 2.8.

John talks to Sam about engineering in Lojban.

has John in the x1 place, Sam in the x2 place, engineering in the x3 place, and Lojban in the x4 place, and could be paraphrased:

Example 2.9.

Talking is going on, with speaker John and listener Sam and subject matter engineering and language Lojban.
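One way to picture a place structure is as a template with numbered slots. The Python sketch below is purely illustrative: the function `fill_places` and the choice to render an unfilled place as "something" are assumptions made for this example, not part of Lojban or of any Lojban software.

```python
import re

# English rendering of the place structure of tavla, as given in Example 2.7.
TAVLA = "x1 talks to x2 about x3 in language x4"


def fill_places(template: str, *arguments: str) -> str:
    """Substitute the given arguments for x1, x2, ... in a place-structure template."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        # Assumption: a place with no argument is rendered as "something".
        return arguments[index] if index < len(arguments) else "something"

    return re.sub(r"\bx(\d+)\b", substitute, template)


print(fill_places(TAVLA, "John", "Sam", "engineering", "Lojban"))
# John talks to Sam about engineering in language Lojban
print(fill_places(TAVLA, "John", "Sam"))
# John talks to Sam about something in language something
```

The point of the sketch is that the meaning of each argument depends only on its position, exactly as in Example 2.8.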

The Lojban bridi corresponding to Example 2.7 will have the form

Example 2.10.

 x1 [cu] tavla x2 x3 x4

The word cu serves as a separator between any preceding sumti and the selbri. It can often be omitted, as in the following examples.

Example 2.11.

 mi tavla do zo'e zo'e
 I talk to you about something in some language.

Example 2.12.

 do tavla mi ta zo'e
 You talk to me about that thing in a language.

Example 2.13.

 mi tavla zo'e tu ly.
 I talk to someone about that thing yonder in language L.

(In Example 2.13, the word ly. is a so-called letteral for the Lojban letter l; it refers to something labelled l, most likely the language Lojban, whose first letter is l.)

When there are one or more occurrences of the cmavo zo'e at the end of a bridi, they may be omitted, a process called ellipsis. Example 2.11 and Example 2.12 may be expressed thus:

Example 2.14.

 mi tavla do
 I talk to you (about something in some language).

Example 2.15.

 do tavla mi ta
 You talk to me about that thing (in some language).

Note that Example 2.13 is not subject to ellipsis by this direct method, as the zo'e in it is not at the end of the bridi.
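The ellipsis rule is mechanical enough to sketch in a few lines of Python. The function name `elide_trailing_zohe` is hypothetical, and the sketch assumes the bridi is supplied as a list of whitespace-separated words; it only illustrates the rule described above.

```python
from typing import List, Sequence


def elide_trailing_zohe(words: Sequence[str]) -> List[str]:
    """Drop every zo'e at the end of a bridi; zo'e elsewhere is left alone."""
    result = list(words)
    while result and result[-1] == "zo'e":
        result.pop()
    return result


# Example 2.11 becomes Example 2.14.
print(" ".join(elide_trailing_zohe("mi tavla do zo'e zo'e".split())))  # mi tavla do
# Example 2.12 becomes Example 2.15.
print(" ".join(elide_trailing_zohe("do tavla mi ta zo'e".split())))    # do tavla mi ta
# Example 2.13 is unchanged: its zo'e is not at the end.
print(" ".join(elide_trailing_zohe("mi tavla zo'e tu ly.".split())))   # mi tavla zo'e tu ly.
```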

## 2.6. Variant bridi structure

Consider the sentence

Example 2.16.

 mi [cu] vecnu ti ta zo'e
 seller-x1 - sells goods-sold-x2 buyer-x3 price-x4
 I - sell this to that for some price.

Example 2.16 has one sumti (the x1) before the selbri. It is also possible to put more than one sumti before the selbri, without changing the order of sumti:

Example 2.17.

 mi ti [cu] vecnu ta
 seller-x1 goods-sold-x2 - sells buyer-x3
 I this - sell to that.
 (translates as stilted or poetic English) I this thing do sell to that buyer.

Example 2.18.

 mi ti ta [cu] vecnu
 seller-x1 goods-sold-x2 buyer-x3 - sells
 I this to that - sell
 (translates as stilted or poetic English) I this thing to that buyer do sell.

Example 2.16 through Example 2.18 mean the same thing. Usually, placing more than one sumti before the selbri is done for style or for emphasis on the sumti that are out-of-place from their normal position. (Native speakers of languages other than English may prefer such orders.)

If there are no sumti before the selbri, then it is understood that the x1 sumti value is equivalent to zo'e; i.e. unimportant or obvious, and therefore not given. Any sumti after the selbri start counting from x2.

Example 2.19.

 ta [cu] melbi
 object/idea-x1 - is-beautiful (to someone by some standard)
 That/Those - is/are beautiful.
 That is beautiful. Those are beautiful.

when the x1 is omitted, becomes:

Example 2.20.

 melbi
 unspecified-x1 is-beautiful to someone by some standard
 Beautiful! It's beautiful!

Omitting the x1 adds emphasis to the selbri relation, which has become first in the sentence. This kind of sentence is termed an observative, because it is often used when someone first observes or takes note of the relationship, and wishes to quickly communicate it to someone else. Commonly understood English observatives include Smoke! upon seeing smoke or smelling the odor, or Car! to a person crossing the street who might be in danger. Any Lojban selbri can be used as an observative if no sumti appear before the selbri.

The word cu does not occur in an observative; cu is a separator, and there must be a sumti before the selbri that needs to be kept separate for cu to be used. With no sumti preceding the selbri, cu is not permitted. Short words like cu which serve grammatical functions are called cmavo in Lojban.
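The way sumti are counted into places in this section can be summarized in a short Python sketch. It is an illustration under stated assumptions, not an implementation of the Lojban grammar: the function name `assign_places` is hypothetical, and the bridi is assumed to be already split into the sumti before the selbri and the sumti after it.

```python
from typing import Dict, Sequence


def assign_places(before: Sequence[str], after: Sequence[str]) -> Dict[str, str]:
    """Number the sumti of a bridi as x1, x2, ... in order of appearance.

    If no sumti precedes the selbri (an observative), x1 is taken to be
    zo'e and the sumti after the selbri start counting from x2.
    """
    ordered = list(before) + list(after) if before else ["zo'e"] + list(after)
    return {f"x{number}": sumti for number, sumti in enumerate(ordered, start=1)}


# Examples 2.17 and 2.18 fill the same places of vecnu in the same way.
print(assign_places(["mi", "ti"], ["ta"]))  # {'x1': 'mi', 'x2': 'ti', 'x3': 'ta'}
print(assign_places(["mi", "ti", "ta"], []))  # {'x1': 'mi', 'x2': 'ti', 'x3': 'ta'}
# Example 2.20: the observative "melbi" has an unspecified x1.
print(assign_places([], []))  # {'x1': "zo'e"}
```

Moving sumti across the selbri never changes their numbering; only removing all of them from the front does, by leaving x1 unspecified.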

## 2.7. Varying the order of sumti

For one reason or another you may want to change the order, placing one particular sumti at the front of the bridi. The cmavo se, when placed before the last word of the selbri, will switch the meanings of the first and second sumti places. So

Example 2.21.

 mi tavla do ti

has the same meaning as

Example 2.22.

 do se tavla mi ti

The cmavo te, when used in the same location, switches the meanings of the first and the third sumti places.

Example 2.23.

 mi tavla do ti

has the same meaning as

Example 2.24.

 ti te tavla do mi
 This is talked about to you by me.

Note that only the first and third sumti have switched places; the second sumti has remained in the second place.

The cmavo ve and xe switch the first and fourth sumti places, and the first and fifth sumti places, respectively. These changes in the order of places are known as conversions, and the se, te, ve, and xe cmavo are said to convert the selbri.

More than one of these operators may be used on a given selbri at one time, and in such a case they are evaluated from left to right. However, in practice they are used one at a time, as there are better tools for complex manipulation of the sumti places. See Section 9.4 for details.

The effect is similar to what in English is called the passive voice. In Lojban, the converted selbri has a new place structure that is renumbered to reflect the place reversal, thus having effects when such a conversion is used in combination with other constructs such as le selbri [ku] (see Section 2.10).
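A single conversion can be modeled as swapping two positions in a list. The Python sketch below is illustrative only: the names `SWAPPED_PLACE` and `unconvert` are hypothetical, only one conversion cmavo is handled at a time, and padding missing places with zo'e is an assumption of the sketch. Given the sumti as they appear with a converted selbri, it returns them in the place order of the unconverted selbri.

```python
from typing import List, Sequence

# Which place each conversion cmavo exchanges with the x1 place.
SWAPPED_PLACE = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def unconvert(cmavo: str, sumti: Sequence[str]) -> List[str]:
    """Return the sumti in the place order of the unconverted selbri."""
    position = SWAPPED_PLACE[cmavo] - 1
    places = list(sumti)
    # Assumption: unfilled places are padded with zo'e so the swap is possible.
    while len(places) <= position:
        places.append("zo'e")
    places[0], places[position] = places[position], places[0]
    return places


# Example 2.22: "do se tavla mi ti" fills tavla the same way as "mi tavla do ti".
print(unconvert("se", ["do", "mi", "ti"]))  # ['mi', 'do', 'ti']
# Example 2.24: "ti te tavla do mi" also corresponds to "mi tavla do ti".
print(unconvert("te", ["ti", "do", "mi"]))  # ['mi', 'do', 'ti']
```

Because a swap is its own inverse, the same function also maps the unconverted order to the converted one.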

## 2.8. The basic structure of longer utterances

People don't always say just one sentence. Lojban has a specific structure for talk or writing that is longer than one sentence. The entirety of a given speech event or written text is called an utterance. The sentences (usually, but not always, bridi) in an utterance are separated by the cmavo ni'o and i. These correspond to a brief pause (or nothing at all) in spoken English, and the various punctuation marks like period, question mark, and exclamation mark in written English. These separators prevent the sumti at the beginning of the next sentence from being mistaken for a trailing sumti of the previous sentence.

The cmavo ni'o separates paragraphs (covering different topics of discussion). In a long text or utterance, the topical structure of the text may be indicated by multiple occurrences of ni'o, with perhaps ni'oni'oni'o used to indicate a chapter, ni'oni'o to indicate a section, and a single ni'o to indicate a subtopic corresponding to a single English paragraph.

The cmavo i separates sentences. It is sometimes compounded with words that modify the exact meaning (the semantics) of the sentence in the context of the utterance. (The cmavo xu, discussed in Section 2.15, is one such word – it turns the sentence from a statement to a question about truth.) When more than one person is talking, a new speaker will usually omit the i even though she/he may be continuing on the same topic.

It is still O.K. for a new speaker to say the i before continuing; indeed, it is encouraged for maximum clarity (since it is possible that the second speaker might merely be adding words onto the end of the first speaker's sentence). A good translation for i is the and used in run-on sentences when people are talking informally: I did this, and then I did that, and ..., and ....
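The separator structure described in this section can be sketched as a simple splitter. The Python below is a toy illustration, not a Lojban parser: the function name `split_utterance` is hypothetical, the text is assumed to be whitespace-separated, the separator is accepted in either spelling (i, or .i as written in the captions of Section 1.7), and compounds such as ni'oni'o are not handled.

```python
from typing import List


def split_utterance(text: str) -> List[List[List[str]]]:
    """Split an utterance into paragraphs (at ni'o) and sentences (at i / .i)."""
    paragraphs: List[List[List[str]]] = [[[]]]
    for word in text.split():
        if word == "ni'o":
            paragraphs.append([[]])    # start a new paragraph
        elif word in ("i", ".i"):
            paragraphs[-1].append([])  # start a new sentence
        else:
            paragraphs[-1][-1].append(word)
    # Discard empty sentences and paragraphs left by leading separators.
    return [[s for s in p if s] for p in paragraphs if any(p)]


print(split_utterance("mi tavla do .i do tavla mi ni'o ta melbi"))
# [[['mi', 'tavla', 'do'], ['do', 'tavla', 'mi']], [['ta', 'melbi']]]
```

Without the separators, the words of the second sentence would run on as further sumti of the first, which is the ambiguity the separators prevent.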

## 2.9. tanru

When two gismu are adjacent, the first one modifies the second, and the selbri takes its place structure from the rightmost word. Such combinations of gismu are called tanru. For example,

Example 2.25.

sutra tavla

has the place structure

Example 2.26.

x1 is a fast type-of talker to x2 about x3 in language x4

x1 talks fast to x2 about x3 in language x4

When three or more gismu are in a row, the first modifies the second, and that combined meaning modifies the third, and that combined meaning modifies the fourth, and so on. For example

Example 2.27.

sutra tavla cutci

has the place structure

Example 2.28.

s1 is a fast-talker type of shoe worn by s2 of material s3

That is, it is a shoe that is worn by a fast talker rather than a shoe that is fast and is also worn by a talker.
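The left-to-right grouping of a simple tanru can be shown with nested pairs. This Python sketch is illustrative only: the names `group_tanru` and `place_structure_source` are hypothetical, and it covers only the plain case described here, a row of gismu with no grouping or conversion words.

```python
from functools import reduce
from typing import Sequence


def group_tanru(words: Sequence[str]):
    """Nest the words as (modifier, modified) pairs, grouping from the left."""
    return reduce(lambda modifier, modified: (modifier, modified), words)


def place_structure_source(words: Sequence[str]) -> str:
    """Return the word whose place structure the whole tanru takes: the last one."""
    return words[-1]


tanru = ["sutra", "tavla", "cutci"]
print(group_tanru(tanru))             # (('sutra', 'tavla'), 'cutci')
print(place_structure_source(tanru))  # cutci
```

The nesting shows that sutra tavla forms a unit first, and that unit as a whole modifies cutci.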

Note especially the use of type-of as a mechanism for connecting the English translations of the two or more gismu; this convention helps the learner understand each tanru in its context. Creative interpretations are also possible, however:

Example 2.29.

 bajra cutci
 runner shoe

most probably refers to shoes suitable for runners, but might be interpreted in some imaginative instances as shoes that run (by themselves?). In general, however, the meaning of a tanru is determined by the literal meaning of its components, and not by any connotations or figurative meanings. Thus

Example 2.30.

 sutra tavla
 fast talker

would not necessarily imply any trickery or deception, unlike the English idiom, and a

Example 2.31.

 jikca toldi
 social butterfly

must always be an insect with large brightly-colored wings, of the order Lepidoptera.

The place structure of a tanru is always that of the final component of the tanru. Thus, the following has the place structure of klama:

Example 2.32.

 mi [cu] sutra klama la meris.
 I - quickly-go to Mary.

With the conversion se klama as the final component of the tanru, the place structure of the entire selbri is that of se klama: the x1 place is the destination, and the x2 place is the one who goes:

Example 2.33.

 mi [cu] sutra se klama la meris.
 I - quickly am-gone-to by Mary.
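As a rough model, a place structure can be treated as an ordered tuple, with se exchanging the first two entries. The sketch below assumes the place labels for klama given in the gloss of Example 2.46; the names `KLAMA` and `se` are chosen for illustration only.

```python
# Place labels for klama, following the gloss in Example 2.46.
KLAMA = ("goer", "destination", "origin", "route", "means")


def se(places):
    """Return the place structure with x1 and x2 exchanged."""
    if len(places) < 2:
        raise ValueError("conversion with se needs at least two places")
    return (places[1], places[0], *places[2:])


# sutra klama takes the places of its final component, klama.
sutra_klama = KLAMA
# sutra se klama takes the places of se klama instead.
sutra_se_klama = se(KLAMA)

print(sutra_klama[0])     # x1 of sutra klama
print(sutra_se_klama[0])  # x1 of sutra se klama
```

Applying `se` twice returns the original order, so the conversion loses no information about the places.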

The following example shows that there is more to conversion than merely switching places, though:

Example 2.34.

 la tam. [cu] melbi tavla la meris.
 Tom - beautifully-talks to Mary.
 Tom - is a beautiful-talker to Mary.

has the place structure of tavla, but note the two distinct interpretations.

Now, using conversion, we can modify the place structure order:

Example 2.35.

 la meris. [cu] melbi se tavla la tam.
 Mary - is beautifully-talked-to by Tom.
 Mary - is a beautiful-audience for Tom.

and we see that the modification has been changed so as to focus on Mary's role in the bridi relationship, leading to a different set of possible interpretations.

Note that there is no place structure change if the modifying term is converted, and so less drastic variation in possible meanings:

Example 2.36.

 la tam. [cu] tavla melbi la meris.
 Tom - is talkerly-beautiful to Mary.

Example 2.37.

 la tam. [cu] se tavla melbi la meris.
 Tom - is audiencely-beautiful to Mary.

and we see that the manner in which Tom is seen as beautiful by Mary changes, but Tom is still the one perceived as beautiful, and Mary, the observer of beauty.

## 2.10. Description sumti

Often we wish to talk about things other than the speaker, the listener and things we can point to. Let's say I want to talk about a talker other than mi. What I want to talk about would naturally fit into the first place of tavla. Lojban, it turns out, has an operator that pulls this first place out of a selbri and converts it to a sumti called a description sumti. The description sumti le tavla ku means the talker, and may be used wherever any sumti may be used.

For example,

Example 2.38.

 mi tavla do le tavla [ku]

means the same as

Example 2.39.

I talk to you about the talker

where the talker is presumably someone other than me, though not necessarily.

Similarly le sutra tavla ku is the fast talker, and le sutra te tavla ku is the fast subject of talk or the subject of fast talk. Which of these related meanings is understood will depend on the context in which the expression is used. The most plausible interpretation within the context will generally be assumed by a listener to be the intended one.

In many cases the word ku may be omitted. In particular, it is never necessary in a description at the end of a sentence, so:

Example 2.40.

 mi tavla do le tavla
 I talk-to you about-the talker

means exactly the same thing as Example 2.38.

There is a problem when we want to say The fast one is talking. The obvious translation le sutra tavla turns out to mean the fast talker, and has no selbri at all. To solve this problem we can use the word cu, which so far has always been optional, in front of the selbri.

The word cu has no meaning, and exists only to mark the beginning of the selbri within the bridi, separating it from a previous sumti. It comes before any other part of the selbri, including other cmavo like se or te. Thus:

Example 2.41.

 le sutra tavla
 The fast talker

Example 2.42.

 le sutra cu tavla
 The fast one - is talking.

Example 2.43.

 le sutra se tavla
 The fast talked-to one

Example 2.44.

 le sutra cu se tavla
 The fast one - is talked to.

Consider the following more complex example, with two description sumti.

Example 2.45.

 mi [cu] tavla le vecnu [ku] le blari'o [ku]
 I - talk-to the seller - about the blue-green-thing. -

The sumti le vecnu contains the selbri vecnu, which has the seller in the x1 place, and uses it in this sentence to describe a particular seller that the speaker has in mind (one that he or she probably expects the listener will also know about). Similarly, the speaker has a particular blue-green thing in mind, which is described using le to mark blari'o, a selbri whose first sumti is something blue-green.

It is safe to omit both occurrences of ku in Example 2.45, and it is also safe to omit the cu.

## 2.11. Examples of brivla

The simplest form of selbri is an individual word. A word which may by itself express a selbri relation is called a brivla. The three types of brivla are gismu (root words), lujvo (compounds), and fu'ivla (borrowings from other languages). All have identical grammatical uses. So far, most of our selbri have been gismu or tanru built from gismu.

gismu:

Example 2.46.

 mi [cu] klama ti zo'e zo'e ta
 Go-er - goes destination origin route means.
 I go here (to this) using that means (from somewhere via some route).

lujvo:

Example 2.47.

 ta [cu] blari'o
 That - is-blue-green.

fu'ivla:

Example 2.48.

 ti [cu] djarspageti
 This - is-spaghetti.

Some cmavo may also serve as selbri, acting as variables that stand for another selbri. The most commonly used of these is go'i, which represents the main bridi of the previous Lojban sentence, with any new sumti or other sentence features being expressed replacing the previously expressed ones. Thus, in this context:

Example 2.49.

 ta [cu] go'i
 That - too/same-as-last selbri.
 That (is spaghetti), too.

## 2.12. The sumti di'u and la'e di'u

In English, I might say The dog is beautiful, and you might reply This pleases me. How do you know what this refers to? Lojban uses different expressions to convey the possible meanings of the English:

Example 2.50.

 le gerku [ku] cu melbi
 The dog is beautiful.

The following three sentences all might translate as This pleases me.

Example 2.51.

 ti [cu] pluka mi

Example 2.52.

 di'u [cu] pluka mi
 This (the last sentence) pleases me (perhaps because it is grammatical or sounds nice).

Example 2.53.

 la'e di'u [cu] pluka mi
 This (the meaning of the last sentence; i.e. that the dog is beautiful) pleases me.

Example 2.53 uses one sumti to point to or refer to another by inference. It is common to write la'edi'u as a single word; it is used more often than di'u by itself.

## 2.13. Possession

Possession refers to the concept of specifying an object by saying who it belongs to (or with). A full explanation of Lojban possession is given in Chapter 8. A simple means of expressing possession, however, is to place a sumti representing the possessor of an object within the description sumti that refers to the object: specifically, between the le and the selbri of the description:

Example 2.54.

 le mi gerku cu sutra
 The of-me dog - is fast.
 My dog is fast.

In Lojban, possession doesn't necessarily mean ownership: one may possess a chair simply by sitting on it, even though it actually belongs to someone else. English uses possession casually in the same way, but also uses it to refer to actual ownership or even more intimate relationships: my arm doesn't mean some arm I own but rather the arm that is part of my body. Lojban has methods of specifying all these different kinds of possession precisely and easily.

## 2.14. Vocatives and commands

You may call someone's attention to the fact that you are addressing them by using doi followed by their name. The sentence

Example 2.55.

doi .djan.

means Oh, John, I'm talking to you. It also has the effect of setting the value of do; do now refers to John until it is changed in some way in the conversation. Note that Example 2.55 is not a bridi, but it is a legitimate Lojban sentence nevertheless; it is known as a vocative phrase.

Other cmavo can be used instead of doi in a vocative phrase, with a different significance. For example, the cmavo coi means hello and co'o means good-bye. Either word may stand alone, they may follow one another, or either may be followed by a Lojbanized name surrounded by pauses.

Example 2.56.

 coi .djan.
 Hello, John.

Example 2.57.

 co'o .djan.
 Good-bye, John.

Commands are expressed in Lojban by a simple variation of the main bridi structure. If you say

Example 2.58.

 do tavla
 You are-talking.

you are simply making a statement of fact. In order to issue a command in Lojban, substitute the word ko for do. The bridi

Example 2.59.

 ko tavla

instructs the listener to do whatever is necessary to make Example 2.58 true; it means Talk! Other examples:

Example 2.60.

 ko sutra
 Be fast!

The ko need not be in the x1 place, but rather can occur anywhere a sumti is allowed, leading to possible Lojban commands that are very unlike English commands:

Example 2.61.

 mi tavla ko
 Be talked to by me. Let me talk to you.

The cmavo ko can fill any appropriate sumti place, and can be used as often as is appropriate for the selbri:

Example 2.62.

 ko kurji ko

and

Example 2.63.

 ko ko kurji

both mean You take care of you and Be taken care of by you, or to put it colloquially, Take care of yourself.

## 2.15. Questions

There are many kinds of questions in Lojban: full explanations appear in Section 19.5 and in various other chapters throughout the book. In this chapter, we will introduce three kinds: sumti questions, selbri questions, and yes/no questions.

The cmavo ma is used to create a sumti question: it indicates that the speaker wishes to know the sumti which should be placed at the location of the ma to make the bridi true. It can be translated as Who? or What? in most cases, but also serves for When?, Where?, and Why? when used in sumti places that express time, location, or cause. For example:

Example 2.64.

 ma tavla do mi
 Who? talks to-you about-me.
 Who is talking to you about me?

The listener can reply by simply stating a sumti:

Example 2.65.

 la djan.
 John (is talking to you about me).

Like ko, ma can occur in any position where a sumti is allowed, not just in the first position:

Example 2.66.

 do [cu] tavla ma
 You - talk to what/whom?

A ma can also appear in multiple sumti positions in one sentence, in effect asking several questions at once.

Example 2.67.

 ma [cu] tavla ma
 What/Who - talks to what/whom?

The two separate ma positions ask two separate questions, and can therefore be answered with different values in each sumti place.

The cmavo mo is the selbri analogue of ma. It asks the respondent to provide a selbri that would be a true relation if inserted in place of the mo:

Example 2.68.

 do [cu] mo
 You - are-what/do-what?

A mo may be used anywhere a brivla or other selbri might. Keep this in mind for later examples. Unfortunately, by itself, mo is a very non-specific question. The response to the question in Example 2.68 could be:

Example 2.69.

 mi [cu] melbi
 I am beautiful.

or:

Example 2.70.

 mi [cu] tavla
 I talk.

Clearly, mo requires some cooperation between the speaker and the respondent to ensure that the right question is being answered. If context doesn't make the question specific enough, the speaker must ask the question more specifically using a more complex construction such as a tanru (see Section 2.9).

It is perfectly permissible for the respondent to fill in other unspecified places in responding to a mo question. Thus, the respondent in Example 2.70 could have also specified an audience, a topic, and/or a language in the response.

Finally, we must consider questions that can be answered Yes or No, such as

Example 2.71.

Are you talking to me?

Like all yes-or-no questions in English, Example 2.71 may be reformulated as

Example 2.72.

Is it true that you are talking to me?

In Lojban we have a word that asks precisely that question in precisely the same way. The cmavo xu, when placed in front of a bridi, asks whether that bridi is true as stated. So

Example 2.73.

 xu do tavla mi
 Is-it-true-that you are-talking to-me?

is the Lojban translation of Example 2.71.

The answer Yes may be given by simply restating the bridi without the xu question word. Lojban has a shorthand for doing this with the word go'i, mentioned in Section 2.11. Instead of a negative answer, the bridi may be restated in such a way as to make it true. If this can be done by substituting sumti, it may be done with go'i as well. For example:

Example 2.74.

 xu do kanro
 Are you healthy?

Example 2.75.

 mi kanro
 I am healthy.

or

Example 2.76.

 go'i
 I am healthy.

(Note that do to the questioner is mi to the respondent.)

or

Example 2.77.

 le tavla cu kanro
 The talker is healthy.

or

Example 2.78.

 le tavla cu go'i
 The talker is healthy.

A general negative answer may be given by na go'i. na may be placed before any selbri (but after the cu). It is equivalent to stating It is not true that ... before the bridi. It does not imply that anything else is true or untrue, only that that specific bridi is not true. More details on negative statements are available in Chapter 15.

## 2.16. Indicators

Different cultures express emotions and attitudes with a variety of intonations and gestures that are not usually included in written language. Some of these are available in some languages as interjections (i.e. Aha!, Oh no!, Ouch!, Aahh!, etc.), but they vary greatly from culture to culture.

Lojban has a group of cmavo known as attitudinal indicators which specifically covers this type of commentary on spoken statements. They are both written and spoken, but require no specific intonation or gestures. Grammatically they are very simple: one or more attitudinals at the beginning of a bridi apply to the entire bridi; anywhere else in the bridi they apply to the word immediately to the left. For example:

Example 2.79.

 .ie mi [cu] klama
 Agreement! I - go.
 Yep! I'll go.

Example 2.80.

 .ei mi [cu] klama
 Obligation! I - go.
 I should go.

Example 2.81.

 mi [cu] klama le melbi .ui [ku]
 I - go to-the beautiful-thing and I am happy because it is the beautiful thing I'm going to -
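The attachment rule for attitudinals can be illustrated with a small Python sketch. It assumes the text has already been split into words and the elidable cu and ku left out. The set `ATTITUDINALS` contains only the three indicators used in the examples above, and the function name and the `"<whole bridi>"` label are hypothetical.

```python
# Only the attitudinals that appear in Examples 2.79 to 2.81.
ATTITUDINALS = {".ie", ".ei", ".ui"}


def attitudinal_scopes(tokens):
    """Pair each attitudinal with what it applies to: the whole bridi
    if nothing precedes it, otherwise the word immediately to its left."""
    scopes = []
    previous_word = None  # most recent token that is not an attitudinal
    for token in tokens:
        if token in ATTITUDINALS:
            target = previous_word if previous_word is not None else "<whole bridi>"
            scopes.append((token, target))
        else:
            previous_word = token
    return scopes


print(attitudinal_scopes([".ie", "mi", "klama"]))
print(attitudinal_scopes(["mi", "klama", "le", "melbi", ".ui"]))
```

In the second call the indicator attaches to melbi, which is why the happiness in Example 2.81 is about the beautiful thing rather than about the going.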

Not all indicators indicate attitudes. Discursives, another group of cmavo with the same grammatical rules as attitudinal indicators, allow free expression of certain kinds of commentary about the main utterances. Using discursives allows a clear separation of these so-called metalinguistic features from the underlying statements and logical structure. By comparison, the English words but and also, which discursively indicate contrast or an added weight of example, are logically equivalent to and, which does not have a discursive content. The average English-speaker does not think about, and may not even realize, the paradoxical idea that but basically means and.

Example 2.82.

 mi [cu] klama .i do [cu] stali
 I - go. You - stay.

Example 2.83.

 mi [cu] klama .i ji'a do [cu] stali
 I - go. In addition, you - stay. (added weight)

Example 2.84.

 mi [cu] klama .i ku'i do [cu] stali
 I - go. However, you - stay. (contrast)

Another group of indicators is called evidentials. Evidentials show the speaker's relationship to the statement, specifically how the speaker came to make the statement. These include za'a (I directly observe the relationship), pe'i (I believe that the relationship holds), ru'a (I postulate the relationship), and others. Many American Indian languages use words of this kind.

Example 2.85.

 pe'i do [cu] melbi
 I opine! You - are beautiful.

Example 2.86.

 za'a do [cu] melbi
 I directly observe! You - are beautiful.

## 2.17. Tenses

In English, every verb is tagged for the grammatical category called tense: past, present, or future. The sentence

Example 2.87.

John went to the store

necessarily happens at some time in the past, whereas

Example 2.88.

John is going to the store

is necessarily happening right now.

The Lojban sentence

Example 2.89.

 la djan. [cu] klama le zarci
 John - goes/went/will-go to-the store

serves as a translation of either Example 2.87 or Example 2.88, and of many other possible English sentences as well. It is not marked for tense, and can refer to an event in the past, the present or the future. This rule does not mean that Lojban has no way of representing the time of an event. A close translation of Example 2.87 would be:

Example 2.90.

 la djan. pu klama le zarci
 John [past] goes to-the store

where the tag pu forces the sentence to refer to a time in the past. Similarly,

Example 2.91.

 la djan. ca klama le zarci
 John [present] goes to-the store

necessarily refers to the present, because of the tag ca. Tags used in this way always appear at the very beginning of the selbri, just after the cu, and they may make a cu unnecessary, since tags cannot be absorbed into tanru. Such tags serve as an equivalent to English tenses and adverbs. In Lojban, tense information is completely optional. If unspecified, the appropriate tense is picked up from context.

Lojban also extends the notion of tense to refer not only to time but to space. The following example uses the tag vu to specify that the event it describes happens far away from the speaker:

Example 2.92.

 do vu vecnu zo'e
 You yonder sell something-unspecified.

In addition, tense tags (either for time or space) can be prefixed to the selbri of a description, producing a tensed sumti:

Example 2.93.

 le pu bajra [ku] cu tavla
 The earlier/former/past runner - - talked/talks.

(Since Lojban tense is optional, we don't know when he or she talks.)

Tensed sumti with space tags correspond roughly to the English use of this or that as adjectives, as in the following example, which uses the tag vi meaning nearby:

Example 2.94.

 le vi bajra [ku] cu tavla
 The nearby runner - - talks.
 This runner talks.

Do not confuse the use of vi in Example 2.94 with the cmavo ti, which also means this, but in the sense of this thing.

Furthermore, a tense tag can appear both on the selbri and within a description, as in the following example (where ba is the tag for future time):

Example 2.95.

 le vi tavla [ku] cu ba klama
 The here talker - - [future] goes.
 The talker who is here will go. This talker will go.

## 2.18. Lojban grammatical terms

Here is a review of the Lojban grammatical terms used in this chapter, plus some others used throughout this book. Only terms that are themselves Lojban words are included: there are of course many expressions like indicator in Chapter 16 that are not explained here. See the Index for further help with these.

• bridi: predication; the basic unit of Lojban expression; the main kind of Lojban sentence; a claim that some objects stand in some relationship, or that some single object has some property.

• sumti: argument; words identifying something which stands in a specified relationship to something else, or which has a specified property. See Chapter 6.

• selbri: logical predicate; the core of a bridi; the word or words specifying the relationship between the objects referred to by the sumti. See Chapter 5.

• cmavo: one of the Lojban parts of speech; a short word; a structural word; a word used for its grammatical function.

• brivla: one of the Lojban parts of speech; a content word; a predicate word; can function as a selbri; is a gismu, a lujvo, or a fu'ivla. See Chapter 4.

• gismu: a root word; a kind of brivla; has associated rafsi. See Chapter 4.

• lujvo: a compound word; a kind of brivla; may or may not appear in a dictionary; does not have associated rafsi. See Chapter 4 and Chapter 12.

• fu'ivla: a borrowed word; a kind of brivla; may or may not appear in a dictionary; copied in a modified form from some non-Lojban language; usually refers to some aspect of culture or the natural world; does not have associated rafsi. See Chapter 4.

• rafsi: a word fragment; one or more is associated with each gismu; can be assembled according to rules in order to make lujvo; not a valid word by itself. See Chapter 4.

• tanru: a group of two or more brivla, possibly with associated cmavo, that form a selbri; always divisible into two parts, with the first part modifying the meaning of the second part (which is taken to be basic). See Chapter 5.

• selma'o: a group of cmavo that have the same grammatical use (can appear interchangeably in sentences, as far as the grammar is concerned) but differ in meaning or other usage. See Chapter 20.

# Chapter 3. The Hills Are Alive With The Sounds Of Lojban

## 3.1. Orthography

Lojban is designed so that any properly spoken Lojban utterance can be uniquely transcribed in writing, and any properly written Lojban can be spoken so as to be uniquely reproduced by another person. As a consequence, the standard Lojban orthography must assign to each distinct sound, or phoneme, a unique letter or symbol. Each letter or symbol has only one sound or, more accurately, a limited range of sounds that are permitted pronunciations for that phoneme. Some symbols indicate stress (speech emphasis) and pause, which are also essential to Lojban word recognition. In addition, everything that is represented in other languages by punctuation (when written) or by tone of voice (when spoken) is represented in Lojban by words. These two properties together are known technically as audio-visual isomorphism.

Lojban uses a variant of the Latin (Roman) alphabet, consisting of the following letters and symbols:

 ' , . a b c d e f g i j k l m n o p r s t u v x y z

omitting the letters h, q, and w.

The alphabetic order given above is that of the ASCII coded character set, widely used in computers. By making Lojban alphabetical order the same as ASCII, computerized sorting and searching of Lojban text is facilitated.
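The claim about ASCII order is easy to check. The Python sketch below relies on the fact that `sorted` compares strings by code point, which coincides with ASCII for these characters; the word list is an arbitrary sample drawn from the examples in Chapter 2.

```python
# The Lojban letters and symbols, in the order given above.
LOJBAN_ALPHABET = "',.abcdefgijklmnoprstuvxyz"

# Sorting by code point leaves the alphabet unchanged, so Lojban
# alphabetical order and ASCII order agree.
alphabet_is_ascii_ordered = sorted(LOJBAN_ALPHABET) == list(LOJBAN_ALPHABET)
print(alphabet_is_ascii_ordered)

# An ordinary string sort therefore yields Lojban alphabetical order,
# provided the text is lowercase.
words = ["tavla", "klama", ".ui", "cutci", "melbi", "go'i"]
sorted_words = sorted(words)
print(sorted_words)
```

Capitalized syllables in Lojbanized names would sort before all lowercase letters in ASCII, so a practical sort would probably lowercase its keys first.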

Capital letters are used only to represent non-standard stress, which can appear only in the representation of Lojbanized names. Thus the English name Josephine, as normally pronounced, is Lojbanized as .DJOsefin., pronounced ['ʔdʒosɛfinʔ]. (See Section 3.2 for an explanation of the symbols within square brackets.) Technically, it is sufficient to capitalize the vowel letter, in this case O, but it is easier on the reader to capitalize the whole syllable.

Without the capitalization, the ordinary rules of Lojban stress would cause the se syllable to be stressed. Lojbanized names are meant to represent the pronunciation of names from other languages with as little distortion as may be; as such, they are exempt from many of the regular rules of Lojban phonology, as will appear in the rest of this chapter.

## 3.2. Basic Phonetics

Lojban pronunciations are defined using the International Phonetic Alphabet, or IPA, a standard method of transcribing pronunciations. By convention, IPA transcriptions are always within square brackets: for example, the word cat is pronounced (in General American pronunciation) [kæt]. Section 3.10 contains a brief explanation of the IPA characters used in this chapter, with their nearest analogues in English, and will be especially useful to those not familiar with the technical terms used in describing speech sounds.

The standard pronunciations and permitted variants of the Lojban letters are listed in the table below. The descriptions have deliberately been made a bit ambiguous to cover variations in pronunciation by speakers of different native languages and dialects. In all cases except r the first IPA symbol shown represents the preferred pronunciation; for r, all of the variations (and any other rhotic sound) are equally acceptable.

| Letter | IPA | X-SAMPA | Description |
|---|---|---|---|
| ' | [h] | [h] | an unvoiced glottal spirant |
| , | . | . | the syllable separator |
| . | [ʔ] | [?] | a glottal stop or a pause |
| a | [a], [ɑ] | [a], [A] | an open vowel |
| b | [b] | [b] | a voiced bilabial stop |
| c | [ʃ], [ʂ] | [S], [s`] | a voiceless postalveolar fricative |
| d | [d] | [d] | a voiced dental/alveolar stop |
| e | [ɛ], [e] | [E], [e] | a front mid vowel |
| f | [f], [ɸ] | [f], [p\] | an unvoiced labial fricative |
| g | [ɡ] | [g] | a voiced velar stop |
| i | [i] | [i] | a front close vowel |
| j | [ʒ], [ʐ] | [Z], [z`] | a voiced postalveolar fricative |
| k | [k] | [k] | an unvoiced velar stop |
| l | [l], [l̩] | [l], [l=] | a voiced lateral approximant (may be syllabic) |
| m | [m], [m̩] | [m], [m=] | a voiced bilabial nasal (may be syllabic) |
| n | [n], [n̩], [ŋ], [ŋ̍] | [n], [n=], [N], [N=] | a voiced dental or velar nasal (may be syllabic) |
| o | [o], [ɔ] | [o], [O] | a back mid vowel |
| p | [p] | [p] | an unvoiced bilabial stop |
| r | [r], [ɹ], [ɾ], [ʀ], [r̩], [ɹ̩], [ʀ̩] | [r], [r\], [4], [R\], [r=], [r\=], [R\=] | a rhotic sound |
| s | [s] | [s] | an unvoiced alveolar sibilant |
| t | [t] | [t] | an unvoiced dental/alveolar stop |
| u | [u] | [u] | a back close vowel |
| v | [v], [β] | [v], [B] | a voiced labial fricative |
| x | [x] | [x] | an unvoiced velar fricative |
| y | [ə] | [@] | a central mid vowel |
| z | [z] | [z] | a voiced alveolar sibilant |
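Because each letter has one preferred sound, a rough transcription is a letter-by-letter lookup. The Python sketch below uses the first IPA symbol listed for each letter; for r, where no variant is preferred, the choice of [r] is arbitrary. The names `PREFERRED_IPA` and `to_ipa` are hypothetical, and the sketch deliberately ignores stress, syllabic consonants, and the glide pronunciation of i and u in diphthongs.

```python
# First-listed IPA symbol for each Lojban letter or symbol.
PREFERRED_IPA = {
    "'": "h", ",": ".", ".": "ʔ",
    "a": "a", "b": "b", "c": "ʃ", "d": "d", "e": "ɛ", "f": "f",
    "g": "ɡ", "i": "i", "j": "ʒ", "k": "k", "l": "l", "m": "m",
    "n": "n", "o": "o", "p": "p", "r": "r", "s": "s", "t": "t",
    "u": "u", "v": "v", "x": "x", "y": "ə", "z": "z",
}


def to_ipa(text):
    """Transcribe Lojban text letter by letter (spaces are kept)."""
    result = []
    # Lowercasing discards the stress information carried by capitals.
    for ch in text.lower():
        if ch.isspace():
            result.append(" ")
        elif ch in PREFERRED_IPA:
            result.append(PREFERRED_IPA[ch])
        else:
            raise ValueError(f"not a Lojban letter: {ch!r}")
    return "".join(result)


print(to_ipa("lojban"))
print(to_ipa(".djan."))
```

Letters outside the Lojban alphabet, such as h, q, and w, are rejected rather than guessed at.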

The Lojban sounds must be clearly pronounced so that they are not mistaken for each other. Voicing and placement of the tongue are the key factors in correct pronunciation, but other subtle differences will develop between consonants in a Lojban-speaking community. At this point these are the only mandatory rules on the range of sounds.

Note in particular that Lojban vowels can be pronounced with either rounded or unrounded lips; typically o and u are rounded and the others are not, as in English, but this is not a requirement; some people round y as well. Lojban consonants can be aspirated or unaspirated. Palatalizing of consonants, as found in Russian and other languages, is not generally acceptable in pronunciation, though a following i may cause it.

The sounds represented by the letters c, g, j, s, and x require special attention for speakers of English, either because they are ambiguous in the orthography of English (c, g, s), or because they are strikingly different in Lojban (c, j, x). The English c represents three different sounds, [k] in cat and [s] in cent, as well as the [ʃ] of ocean. Similarly, English g can represent [ɡ] as in go, [dʒ] as in gentle, and [ʒ] as in the second "g" in garage (in some pronunciations). English s can be [s] as in cats, [z] as in cards, [ʃ] as in tension, or [ʒ] as in measure. The sound of Lojban x doesn't appear in most English dialects at all.

There are two common English sounds that are found in Lojban but are not Lojban consonants: the ch of church and the j of judge. In Lojban, these are considered two consonant sounds spoken together without an intervening vowel sound, and so are represented in Lojban by the two separate consonants: tc (IPA [tʃ]) and dj (IPA [dʒ]). In general, whether a complex sound is considered one sound or two depends on the language: Russian views ts as a single sound, whereas English, French, and Lojban consider it to be a consonant cluster.

## 3.3. The Special Lojban Characters

The apostrophe, period, and comma need special attention. They are all used as indicators of a division between syllables, but each has a different pronunciation, and each is used for different reasons:

The apostrophe represents a phoneme similar to a short, breathy English h (IPA [h]). The letter h is not used to represent this sound for two reasons: primarily in order to simplify explanations of the morphology, but also because the sound is very common, and the apostrophe is a visually lightweight representation of it. The apostrophe sound is a consonant in nature, but is not treated as either a consonant or a vowel for purposes of Lojban morphology (word-formation), which is explained in Chapter 4. In addition, the apostrophe visually parallels the comma and the period, which are also used (in different ways) to separate syllables.

The apostrophe is included in Lojban only to enable a smooth transition between vowels, while joining the vowels within a single word. In fact, one way to think of the apostrophe is as representing an unvoiced vowel glide.

As a permitted variant, any unvoiced fricative other than those already used in Lojban may be used to render the apostrophe: IPA [θ] is one possibility. The convenience of the listener should be regarded as paramount in deciding to use a substitute for [h].

The period represents a mandatory pause, with no specified length; a glottal stop (IPA [ʔ]) is considered a pause of shortest length. A pause (or glottal stop) may appear between any two words, and in certain cases – explained in detail in Section 4.9 – must occur. In particular, a word beginning with a vowel is always preceded by a pause, and a word ending in a consonant is always surrounded by pauses.

Technically, the period is an optional reminder to the reader of a mandatory pause that is dictated by the rules of the language; because these rules are unambiguous, a missing period can be inferred from otherwise correct text. Periods are included only as an aid to the reader.

A period also may be found apparently embedded in a word. When this occurs, such a written string is not one word but two, written together to indicate that the writer intends a unitary meaning for the compound. It is not really necessary to use a space between words if a period appears.

The comma is used to indicate a syllable break within a word, generally one that is not obvious to the reader. Such a comma is written to separate syllables, but indicates that there must be no pause between them, in contrast to the period. Between two vowels, a comma indicates that some type of glide may be necessary to avoid a pause that would split the two syllables into separate words. It is always legal to use the apostrophe (IPA [h]) sound in pronouncing a comma. However, a comma cannot be pronounced as a pause or glottal stop between the two letters separated by the comma, because that pronunciation would split the word into two words.

Otherwise, a comma is usually only used to clarify the presence of syllabic l, m, n, or r (discussed later). Commas are never required in cmevla: no two Lojban cmevla differ solely because of the presence or placement of a comma.

Here is a somewhat artificial example of the difference in pronunciation between periods, commas and apostrophes. In the English song about Old MacDonald's Farm, the vowel string which is written as ee-i-ee-i-o in English could be Lojbanized with periods as:

Example 3.1.

• .i.ai.i.ai.o

• [ʔi ʔaj ʔi ʔaj ʔo]

• Ee! Eye! Ee! Eye! Oh!

However, this would sound clipped, staccato, and unmusical compared to the English. Furthermore, although Example 3.1 is a string of meaningful Lojban words, as a sentence it makes very little sense. (Note the use of periods embedded within the written word.)

If commas were used instead of periods, we could represent the English string as a Lojbanized name, ending in a consonant:

Example 3.2.

• .i,ai,i,ai,on.

• [ʔi jaj ji jaj jonʔ]

The commas represent new syllable breaks, but prohibit the use of pauses or glottal stop. The pronunciation shown is just one possibility, but closely parallels the intended English pronunciation.

However, the use of commas in this way is risky to unambiguous interpretation, since the glides might be heard by some listeners as diphthongs, producing something like

Example 3.3.

• .i,iai,ii,iai,ion.

which is technically a different Lojban name. Since the intent with Lojbanized names is to allow them to be pronounced more like their native counterparts, the comma is allowed to represent vowel glides or some non-Lojbanic sound. Such an exception affects only spelling accuracy and the ability of a reader to replicate the desired pronunciation exactly; it will not affect the recognition of word boundaries.

Still, it is better if Lojbanized names are always distinct. Therefore, the apostrophe is preferred in regular Lojbanized names that are not attempting to simulate a non-Lojban pronunciation perfectly. (Perfection, in any event, is not really achievable, because some sounds simply lack reasonable Lojbanic counterparts.)

If apostrophes were used instead of commas in Example 3.2, it would appear as:

Example 3.4.

• .i'ai'i'ai'on.

• [ʔi haj hi haj honʔ]

which preserves the rhythm and length, if not the exact sounds, of the original English.

## 3.4. Diphthongs and Syllabic Consonants

There exist 16 diphthongs in the Lojban language. A diphthong is a vowel sound that consists of two elements, a short vowel sound and a glide, either a labial (IPA [w]) or palatal (IPA [j]) glide, that either precedes (an on-glide) or follows (an off-glide) the main vowel. Diphthongs always constitute a single syllable.

For Lojban purposes, a vowel sound is a relatively long speech-sound that forms the nucleus of a syllable. Consonant sounds are relatively brief and normally require an accompanying vowel sound in order to be audible. Consonants may occur at the beginning or end of a syllable, around the vowel, and there may be several consonants in a cluster in either position. Each separate vowel sound constitutes a distinct syllable; consonant sounds do not affect the determination of syllables.

The six Lojban vowels are a, e, i, o, u, and y. The first five vowels appear freely in all kinds of Lojban words. The vowel y has a limited distribution: it appears only in Lojbanized names, in the Lojban names of the letters of the alphabet, as a glue vowel in compound words, and standing alone as a space-filler word (like English uh or er).

The Lojban diphthongs are shown in the table below. (Variant pronunciations have been omitted, but are much as one would expect based on the variant pronunciations of the separate vowel letters: ai may be pronounced [ɑj], for example.)

| Letters | IPA | Description |
|---|---|---|
| ai | [aj] | an open vowel with palatal off-glide |
| ei | [ɛj] | a front mid vowel with palatal off-glide |
| oi | [oj] | a back mid vowel with palatal off-glide |
| au | [aw] | an open vowel with labial off-glide |
| ia | [ja] | an open vowel with palatal on-glide |
| ie | [jɛ] | a front mid vowel with palatal on-glide |
| ii | [ji] | a front close vowel with palatal on-glide |
| io | [jo] | a back mid vowel with palatal on-glide |
| iu | [ju] | a back close vowel with palatal on-glide |
| ua | [wa] | an open vowel with labial on-glide |
| ue | [wɛ] | a front mid vowel with labial on-glide |
| ui | [wi] | a front close vowel with labial on-glide |
| uo | [wo] | a back mid vowel with labial on-glide |
| uu | [wu] | a back close vowel with labial on-glide |
| iy | [jə] | a central mid vowel with palatal on-glide |
| uy | [wə] | a central mid vowel with labial on-glide |

(Approximate English equivalents of most of these diphthongs exist: see Section 3.11 for examples.)

The first four diphthongs above (ai, ei, oi, and au, the ones with off-glides) are freely used in most types of Lojban words; the ten following ones are used only as stand-alone words and in Lojbanized names and borrowings; and the last two (iy and uy) are used only in Lojbanized names.

The syllabic consonants of Lojban, [l̩], [m̩], [n̩], and [r̩], are variants of the non-syllabic [l], [m], [n], and [r] respectively. They normally have only a limited distribution, appearing in Lojbanized names and borrowings, although in principle any l, m, n, or r may be pronounced syllabically. If a syllabic consonant appears next to a l, m, n, or r that is not syllabic, it may not be clear which is which:

Example 3.5.

• .brlgan.

• [br̩l gan]

• or

• [brl̩ gan]

is a hypothetical Lojbanized name with more than one valid pronunciation; however it is pronounced, it remains the same word.

Syllabic consonants are treated as consonants rather than vowels from the standpoint of Lojban morphology. Thus Lojbanized names, which are generally required to end in a consonant, are allowed to end with a syllabic consonant. An example is .rl., which is an approximation of the English name Earl, and has two syllabic consonants.

Syllables with syllabic consonants and no vowel are never stressed or counted when determining which syllables to stress (see Section 3.9).

## 3.5. Vowel Pairs

Lojban vowels also occur in pairs, where each vowel sound is in a separate syllable. These two vowel sounds are connected (and separated) by an apostrophe. Lojban vowel pairs should be pronounced continuously with the [h] sound between (and not by a glottal stop or pause, which would split the two vowels into separate words).

All vowel combinations are permitted in two-syllable pairs with the apostrophe separating them; this includes those which constitute diphthongs when the apostrophe is not included.

The Lojban vowel pairs are:

 a'a a'e a'i a'o a'u a'y
 e'a e'e e'i e'o e'u e'y
 i'a i'e i'i i'o i'u i'y
 o'a o'e o'i o'o o'u o'y
 u'a u'e u'i u'o u'u u'y
 y'a y'e y'i y'o y'u y'y

Vowel pairs involving y appear only in Lojbanized names. They could appear in cmavo (structure words), but only .y'y. is so used – it is the Lojban name of the apostrophe letter (see Section 17.2).

When more than two vowels occur together in Lojban, the normal pronunciation pairs vowels from the left into syllables, as in the Lojbanized name:

Example 3.6.

• .meiin.

• .mei,in.

Example 3.6 contains the diphthong ei followed by the vowel i. In order to indicate a different grouping, the comma must always be used, leading to:

Example 3.7.

• .me,iin.

which contains the vowel e followed by the diphthong ii. In rough English representation, Example 3.6 is May Een, whereas Example 3.7 is Meh Yeen.

## 3.6. Consonant Clusters

A consonant sound is a relatively brief speech-sound that precedes or follows a vowel sound in a syllable; its presence either preceding or following does not add to the count of syllables, nor is a consonant required in either position for any syllable. Lojban has seventeen consonants: for the purposes of this section, the apostrophe is not counted as a consonant.

An important distinction dividing Lojban consonants is that of voicing. The following table shows the unvoiced consonants and the corresponding voiced ones:

| UNVOICED | VOICED |
|---|---|
| p | b |
| t | d |
| k | g |
| f | v |
| c | j |
| s | z |
| x | - |

The consonant x has no voiced counterpart in Lojban. The remaining consonants, l, m, n, and r, are typically pronounced with voice, but can be pronounced unvoiced.

Consonant sounds occur in languages as single consonants, or as doubled, or as clustered combinations. Single consonant sounds are isolated by word boundaries or by intervening vowel sounds from other consonant sounds. Doubled consonant sounds are either lengthened like [s] in English hiss, or repeated like [k] in English backcourt. Consonant clusters consist of two or more single or doubled consonant sounds in a group, each of which is different from its immediate neighbor. In Lojban, doubled consonants are excluded altogether, and clusters are limited to two or three members, except in Lojbanized names.

Consonants can occur in three positions in words: initial (at the beginning), medial (in the middle), and final (at the end). In many languages, the sound of a consonant varies depending upon its position in the word. In Lojban, as much as possible, the sound of a consonant is unrelated to its position. In particular, the common American English trait of changing a t between vowels into a d or even an alveolar tap (IPA [ɾ]) is unacceptable in Lojban.

Lojban imposes no restrictions on the appearance of single consonants in any valid consonant position; however, no consonant (including syllabic consonants) occurs final in a word except in Lojbanized names.

Pairs of consonants can also appear freely, with the following restrictions:

1. It is forbidden for both consonants to be the same, as this would violate the rule against double consonants.

2. It is forbidden for one consonant to be voiced and the other unvoiced. The consonants l, m, n, and r are exempt from this restriction. As a result, bf is forbidden, and so is sd, but both fl and vl, and both ls and lz, are permitted.

3. It is forbidden for both consonants to be drawn from the set c, j, s, z.

4. The specific pairs cx, kx, xc, xk, and mz are forbidden.

These rules apply to all kinds of words, even Lojbanized names. If a name would normally contain a forbidden consonant pair, a y can be inserted to break up the pair:

Example 3.8.

• .djeimyz.

• [dʒɛj məzʔ]

• James

The regular English pronunciation of James, which is [dʒɛjmz], would Lojbanize as .djeimz., which contains a forbidden consonant pair.
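The four restrictions above can be sketched as a small predicate. This is only an illustration: the function name `is_permissible_pair` and the set names are invented here, and the voicing classes are taken from the table earlier in this section.

```python
# Voicing classes from the table in this section; l, m, n, r are exempt from rule 2.
VOICED = set("bdgvjz")
UNVOICED = set("ptkfcsx")
SONORANTS = set("lmnr")
CONSONANTS = VOICED | UNVOICED | SONORANTS
SIBILANTS = set("cjsz")                      # the set named in rule 3
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}   # the specific pairs of rule 4


def is_permissible_pair(first, second):
    """Return True if the two consonants may stand next to each other."""
    if first not in CONSONANTS or second not in CONSONANTS:
        raise ValueError("expected two Lojban consonants")
    if first == second:                                # rule 1: no doubling
        return False
    both = {first, second}
    if both & VOICED and both & UNVOICED:              # rule 2: mixed voicing
        return False
    if first in SIBILANTS and second in SIBILANTS:     # rule 3
        return False
    return first + second not in FORBIDDEN             # rule 4


for pair in ("bf", "sd", "fl", "vl", "ls", "lz", "mz"):
    print(pair, is_permissible_pair(*pair))
```

Under these assumptions the sketch rejects bf, sd, and mz and accepts fl, vl, ls, and lz, which matches the examples given with the rules. The mz case is the one that makes .djeimz. invalid.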

## 3.7. Initial Consonant Pairs

The set of consonant pairs that may appear at the beginning of a word (excluding Lojbanized names) is far more restricted than the fairly large group of permissible consonant pairs described in Section 3.6. Even so, it is more than English allows, although hopefully not more than English-speakers (and others) can learn to pronounce.

There are just 48 such permissible initial consonant pairs, as follows:

 bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv

Lest this list seem almost random, a pairing of voiced and unvoiced equivalent consonants will show significant patterns which may help in learning:

 pl pr fl fr
 bl br vl vr

 cp cf ct ck cm cn cl cr
 jb jv jd jg jm
 sp sf st sk sm sn sl sr
 zb zv zd zg zm

 tc tr ts kl kr
 dj dr dz gl gr

 ml mr
 xl xr

Note that if both consonants of an initial pair are voiced, the unvoiced equivalent is also permissible, and the voiced pair can be pronounced simply by voicing the unvoiced pair. (The converse is not true: cn is a permissible initial pair, but jn is not.)

Consonant triples can occur medially in Lojban words. They are subject to the following rules:

1. The first two consonants must constitute a permissible consonant pair;

2. The last two consonants must constitute a permissible initial consonant pair;

3. The triples ndj, ndz, ntc, and nts are forbidden.

Lojbanized names can begin or end with any permissible consonant pair, not just the 48 initial consonant pairs listed above, and can have consonant triples in any location, as long as the pairs making up those triples are permissible. In addition, Lojbanized names can contain consonant clusters with more than three consonants, again requiring that each pair within the cluster is valid.
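As an illustration of this section, the following sketch stores the 48 initial pairs, checks the voicing observation made above, and tests medial consonant triples against the three rules. All names (`INITIAL_PAIRS`, `is_permissible_triple`, and so on) are invented for the example, and the pair check is a compact restatement of the rules in Section 3.6.

```python
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr "
    "xl xr zb zd zg zm zv".split()
)
FORBIDDEN_TRIPLES = {"ndj", "ndz", "ntc", "nts"}

VOICED, UNVOICED = set("bdgvjz"), set("ptkfcsx")
DEVOICE = str.maketrans("bdgvjz", "ptkfcs")   # voiced -> unvoiced counterpart
VOICE = str.maketrans("ptkfcs", "bdgvjz")     # unvoiced -> voiced counterpart


def is_permissible_pair(pair):
    """Compact form of the consonant-pair rules of Section 3.6."""
    a, b = pair
    if a == b:
        return False
    if {a, b} & VOICED and {a, b} & UNVOICED:
        return False
    if a in "cjsz" and b in "cjsz":
        return False
    return pair not in {"cx", "kx", "xc", "xk", "mz"}


def is_permissible_triple(triple):
    """Apply the three rules for medial consonant triples."""
    return (
        len(triple) == 3
        and is_permissible_pair(triple[:2])      # rule 1
        and triple[1:] in INITIAL_PAIRS          # rule 2
        and triple not in FORBIDDEN_TRIPLES      # rule 3
    )


print(len(INITIAL_PAIRS))                                             # 48
print(all(p.translate(DEVOICE) in INITIAL_PAIRS for p in INITIAL_PAIRS))  # True
print("cn".translate(VOICE) in INITIAL_PAIRS)                         # False: jn is not initial
print(is_permissible_triple("tpr"), is_permissible_triple("ndj"))     # True False
```

The second line of output confirms that devoicing any initial pair yields another initial pair. The third shows that the converse fails for cn.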

## 3.8. Buffering Of Consonant Clusters

Many languages do not have consonant clusters at all, and even those languages that do have them often allow only a subset of the full Lojban set. As a result, the Lojban design allows the use of a buffer sound between consonant combinations which a speaker finds unpronounceable. This sound may be any non-Lojbanic vowel which is clearly separable by the listener from the Lojban vowels. Some possibilities are IPA [ɪ], [ɨ], [ʊ], or even [ʏ], but there probably is no universally acceptable buffer sound. When using a consonant buffer, the sound should be made as short as possible. Two examples showing such buffering (we will use [ɪ] in this chapter) are:

Example 3.9.

• vrusi

• [ˈvru si]

• or

• [vɪ ˈru si]

Example 3.10.

• .AMsterdam.

• [ʔam ster damʔ]

• or

• [ˈʔa mɪ sɪ tɛ rɪ da mɪʔ]

When a buffer vowel is used, it splits each buffered consonant into its own syllable. However, the buffering syllables are never stressed, and are not counted in determining stress. They are, in effect, not really syllables to a Lojban listener, and thus their impact is ignored.

Here are more examples of unbuffered and buffered pronunciations:

Example 3.11.

• klama

• [ˈkla ma]

• [kɪ ˈla ma]

Example 3.12.

• xapcke

• [ˈxap ʃkɛ]

• [ˈxa pɪ ʃkɛ]

• [ˈxa pɪ ʃɪ kɛ]

In Example 3.12, we see that buffering vowels can be used in just some, rather than all, of the possible places: the second pronunciation buffers the pc consonant pair but not the ck. The third pronunciation buffers both.

Example 3.13.

• ponyni'u

• [po nə ˈni hu]

Example 3.13 cannot contain any buffering vowel. It is important not to confuse the vowel y, which is pronounced [ə], with the buffer, which has a variety of possible pronunciations and is never written. Consider the contrast between

Example 3.14.

• bongynanba

• [boŋ gə ˈnan ba]

an unlikely Lojban compound word meaning bone bread (note the use of [ŋ] as a representative of n before g) and

Example 3.15.

• bongnanba

• [boŋ ˈgnan ba]

a possible borrowing from another language (Lojban borrowings can only take a limited form). If Example 3.15 were pronounced with buffering, as

Example 3.16.

• [boŋ gɪ ˈnan ba]

it would be very similar to Example 3.14. Only a clear distinction between y and any buffering vowel would keep the two words distinct.

Since buffering is done for the benefit of the speaker in order to aid pronounceability, there is no guarantee that the listener will not mistake a buffer vowel for one of the six regular Lojban vowels. The buffer vowel should be as laxly pronounced as possible, as central as possible, and as short as possible. Furthermore, it is worthwhile for speakers who use buffers to pronounce their regular vowels a bit longer than usual, to avoid confusion with buffer vowels. The speakers of many languages will have trouble correctly hearing any of the suggested buffer vowels otherwise. By this guideline, Example 3.16 would be pronounced

Example 3.17.

• [boːŋ gɪ ˈnaːn baː]

with lengthened vowels.

## 3.9. Syllabication And Stress

A Lojban word has one syllable for each of its vowels, diphthongs, and syllabic consonants (referred to simply as vowels for the purposes of this section). Syllabication rules determine which of the consonants separating two vowels belong to the preceding vowel and which to the following vowel. These rules are conventional only; the phonetic facts of the matter about how utterances are syllabified in any language are always very complex.

A single consonant always belongs to the following vowel. A consonant pair is normally divided between the two vowels; however, if the pair constitute a valid initial consonant pair, they are normally both assigned to the following vowel. A consonant triple is divided between the first and second consonants. Apostrophes and commas, of course, also represent syllable breaks. Syllabic consonants usually appear alone in their syllables.

It is permissible to vary from these rules in Lojbanized names. For example, there are no definitive rules for the syllabication of Lojbanized names with consonant clusters longer than three consonants. The comma is used to indicate variant syllabication or to explicitly mark normal syllabication.

Here are some examples of Lojban syllabication:

Example 3.18.

• pujenaicajeba

• pu,je,nai,ca,je,ba

This word has no consonant pairs and is therefore syllabified before each medial consonant.

Example 3.19.

• ninmu

• nin,mu

This word is split at a consonant pair.

Example 3.20.

• fitpri

• fit,pri

This word is split at a consonant triple, between the first two consonants of the triple.

Example 3.21.

• sairgoi

• sair,goi

• sai,r,goi

This word contains the consonant pair rg; the r may be pronounced syllabically or not.

Example 3.22.

• klezba

• klez,ba

• kle,zba

This word contains the permissible initial pair zb, and so may be syllabicated either between z and b or before zb.
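The conventional rules above can be sketched as a short routine. This is a simplified illustration, not a reference implementation. It assumes ordinary (non-name) words whose syllable nuclei are single vowels, y, or the diphthongs ai, ei, oi, au. It treats the apostrophe as if it were a single consonant attached to the following vowel, and it ignores syllabic consonants. Where a valid initial pair could go either way, it always picks the "normal" choice of assigning both consonants to the following vowel. The name `syllabify` is invented for the example.

```python
import re

INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr "
    "xl xr zb zd zg zm zv".split()
)
# A token is a diphthong, a single vowel, or a run of non-vowel letters.
TOKEN = re.compile(r"ai|ei|oi|au|[aeiouy]|[^aeiouy]+")


def syllabify(word):
    """Return the word with commas marking the conventional syllable breaks."""
    syllables, onset = [], ""
    for tok in TOKEN.findall(word.strip(".")):
        if tok[0] in "aeiouy":            # a nucleus closes off a new syllable
            syllables.append(onset + tok)
            onset = ""
        elif not syllables:               # word-initial consonants
            onset = tok
        elif len(tok) == 1 or tok in INITIAL_PAIRS:
            onset = tok                   # all of it goes to the following vowel
        else:                             # other pairs and triples: split after
            syllables[-1] += tok[0]       # the first consonant
            onset = tok[1:]
    if onset and syllables:               # word-final consonants join the last syllable
        syllables[-1] += onset
    return ",".join(syllables)


for w in ("pujenaicajeba", "ninmu", "fitpri", "sairgoi", "klezba"):
    print(w, "->", syllabify(w))
```

For the five examples in this section the sketch yields pu,je,nai,ca,je,ba; nin,mu; fit,pri; sair,goi; and kle,zba. For the last two words it produces only one of the acceptable variants.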

Stress is a relatively louder pronunciation of one syllable in a word or group of words. Since every syllable has a vowel sound (or diphthong or syllabic consonant) as its nucleus, and the stress is on the vowel sound itself, the terms stressed syllable and stressed vowel are largely interchangeable concepts.

Most Lojban words are stressed on the next-to-the-last, or penultimate, syllable. (The Lojban term for penultimate stress is da'amoi terbasna.) In counting syllables, however, syllables whose vowel is y or which contain a syllabic consonant (l, m, n, or r) are never counted. Similarly, syllables created solely by adding a buffer vowel, such as [ɪ], are not counted.

There are actually three levels of stress – primary, secondary, and weak. Weak stress is the lowest level, so it really means no stress at all. Weak stress is required for syllables containing y, a syllabic consonant, or a buffer vowel.

Primary stress is required on the penultimate syllable of Lojban content words (called brivla). Lojbanized names (called cmevla) may be stressed on any syllable, but if a syllable other than the penultimate is stressed, the syllable (or at least its vowel) must be capitalized in writing. Lojban structural words (called cmavo) may be stressed on any syllable or none at all. However, primary stress may not be used in a syllable just preceding a brivla, unless a pause divides them; otherwise, the two words may run together.

Secondary stress is the optional and non-distinctive emphasis used for other syllables besides those required to have either weak or primary stress. There are few rules governing secondary stress, which typically will follow a speaker's native language habits or preferences. Secondary stress can be used for contrast, or for emphasis of a point. Secondary stress can be emphasized at any level up to primary stress, although the speaker must not allow a false primary stress in brivla, since errors in word resolution could result.
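The rule for primary stress in brivla can be illustrated with a few lines of code. The sketch assumes its input is already syllabified with commas, as in the examples of this section. It treats any syllable without one of a, e, i, o, u as uncounted, which covers y syllables and bare syllabic consonants but not buffer vowels, since those are never written. The function name `mark_stress` is invented for the example.

```python
def mark_stress(syllabified):
    """Capitalize the syllable that carries primary stress in a brivla."""
    syllables = syllabified.split(",")
    # Only syllables containing a regular vowel take part in the count.
    countable = [i for i, s in enumerate(syllables)
                 if any(v in s for v in "aeiou")]
    if len(countable) < 2:
        raise ValueError("a brivla needs at least two countable syllables")
    target = countable[-2]                # penultimate countable syllable
    syllables[target] = syllables[target].upper()
    return ",".join(syllables)


print(mark_stress("di,ky,jvo"))      # DI,ky,jvo
print(mark_stress("bi,sy,dja"))      # BI,sy,dja
print(mark_stress("bon,gy,nan,ba"))  # bon,gy,NAN,ba
```

Because the y syllables are skipped, the stress in dikyjvo falls on the first syllable, even though that syllable is third from the end in writing.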

The following are Lojban words with stress explicitly shown:

Example 3.23.

• dikyjvo

• DI,ky,jvo

(In a fully-buffered dialect, the pronunciation would be: [ˈdi kə ʒɪ vo].) Note that the syllable ky is not counted in determining stress. The vowel y is never stressed in a normal Lojban context.

Example 3.24.

• .armstrong.

• .ARM,strong.

This is a Lojbanized version of the name Armstrong. The final g must be explicitly pronounced. With full buffering, the name would be pronounced:

Example 3.25.

• [ˈʔa rɪ mɪ sɪ tɪ ro nɪ gɪʔ]

However, there is no need to insert a buffer in every possible place just because it is inserted in one place: partial buffering is also acceptable. In every case, however, the stress remains in the same place: on the first syllable.

The English pronunciation of Armstrong, as spelled in English, is not correct by Lojban standards; the letters ng in English represent a velar nasal (IPA [ŋ]) which is a single consonant. In Lojban, ng represents two separate consonants that must both be pronounced; you may not use [ŋ] to pronounce Lojban ng, although [ŋg] is acceptable. English speakers are likely to have to pronounce the ending with a buffer, as one of the following:

Example 3.26.

• [ˈʔarm stron gɪʔ]

• or

• [ˈʔarm stroŋ gɪʔ]

• or even

• [ˈʔarm stro nɪgʔ]

The normal English pronunciation of the name Armstrong could be Lojbanized as:

Example 3.27.

• .ARMstron.

since Lojban n is allowed to be pronounced as the velar nasal [ŋ].

Here is another example showing the use of y:

Example 3.28.

• bisydja

• BI,sy,dja

• BI,syd,ja

This word is a compound word, or lujvo, built from the two affixes bis and dja. When they are joined, an impermissible consonant pair results: sd. In accordance with the algorithm for making lujvo, explained in Section 4.11, a y is inserted to separate the impermissible consonant pair; the y is not counted as a syllable for purposes of stress determination.

Example 3.29.

• da'udja

• da'UD,ja

• da'U,dja

These two syllabications sound the same to a Lojban listener – the association of unbuffered consonants in syllables is of no import in recognizing the word.

Example 3.30.

• e'u bridi

• e'u BRI,di

• E'u BRI,di

• e'U.BRI,di

In Example 3.30, e'u is a cmavo and bridi is a brivla. Either of the first two pronunciations is permitted: no primary stress on either syllable of e'u, or primary stress on the first syllable. The third pronunciation, which places primary stress on the second syllable of the cmavo, requires that – since the following word is a brivla – the two words must be separated by a pause. Consider the following two cases:

Example 3.31.

• le re nobli prenu

• le re NObli PREnu

Example 3.32.

• le re no bliprenu

• le re no bliPREnu

If the cmavo no in Example 3.32 were to be stressed, the phrase would sound exactly like the given pronunciation of Example 3.31, which is unacceptable in Lojban: a single pronunciation cannot represent both.

## 3.10. IPA For English Speakers

There are many dialects of English, thus making it difficult to define the standardized symbols of the IPA in terms useful to every reader. All the symbols used in this chapter are repeated here, in more or less alphabetical order, with examples drawn from General American. In addition, some attention is given to the Received Pronunciation of (British) English. These two dialects are referred to as GA and RP respectively. Speakers of other dialects should consult a book on phonetics or their local television sets.

## 3.11. English Analogues For Lojban Diphthongs

Here is a list of English words that contain diphthongs that are similar to the Lojban diphthongs. This list does not constitute an official pronunciation guide; it is intended as a help to English-speakers.

| Lojban | English |
|---|---|
| ai | “pie” |
| ei | “pay” |
| oi | “boy” |
| au | “cow” |
| ia | “yard” |
| ie | “yes” |
| ii | “ye” |
| io | “yodel” (in GA only) |
| iu | “unicorn” or “few” |
| ua | “suave” |
| ue | “wet” |
| ui | “we” |
| uo | “woe” (in GA only) |
| uu | “woo” |
| iy | “million” (the “io” part, that is) |
| uy | “was” (when unstressed) |

## 3.12. Oddball Orthographies

The following notes describe ways in which Lojban has been written or could be written that differ from the standard orthography explained in the rest of this chapter. Nobody needs to read this section except people with an interest in the obscure. Technicalities are used without explanation or further apology.

There exists an alternative orthography for Lojban, which is designed to be as compatible as possible (but no more so) with the orthography used in pre-Lojban versions of Loglan. The consonants undergo no change, except that x is replaced by h. The individual vowels likewise remain unchanged. However, the vowel pairs and diphthongs are changed as follows:

• ai, ei, oi, au become ai, ei, oi, ao.

• ia through iu and ua through uu remain unchanged.

• a'i, e'i, o'i and a'o become a,i, e,i, o,i and a,o.

• i'a through i'u and u'a through u'u are changed to ia through iu and ua through uu in lujvo and cmavo other than attitudinals, but become i,a through i,u and u,a through u,u in cmevla, fu'ivla, and attitudinal cmavo.

• All other vowel pairs simply drop the apostrophe.

The result of these rules is to eliminate the apostrophe altogether, replacing it with comma where necessary, and otherwise with nothing. In addition, names and the cmavo i are capitalized, and irregular stress is marked with an apostrophe (now no longer used for a sound) following the stressed syllable.

Three points should be noted about this alternative orthography:

• It is not standard, and has not been used.

• It does not represent any changes to the standard Lojban phonology; it is simply a representation of the same phonology using a different written form.

• It was designed to aid in a planned rapprochement between the Logical Language Group and The Loglan Institute, a group headed by James Cooke Brown. The rapprochement never took place.

There also exists a Cyrillic orthography for Lojban which was designed when the introductory Lojban brochure was translated into Russian.

 а a
 б b
 ш c
 д d
 е e
 ф f
 г g
 и i
 ж j
 к k
 л l
 м m
 н n
 о o
 п p
 р r
 с s
 т t
 у u
 в v
 х x
 ъ y
 з z

The Lojban letter y is mapped onto the hard sign ъ, as in Bulgarian. The apostrophe, comma, and period are unchanged. Diphthongs are written as vowel pairs, as in the Roman representation. Capital Lojban letters are written using corresponding capital Cyrillic letters.
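Because the Cyrillic orthography is a letter-for-letter substitution, it can be sketched with a simple translation table. The function name `to_cyrillic` is invented for this illustration. The table pairs each Roman letter with the Cyrillic letter listed above (with л standing for l), and leaves the apostrophe, comma, and period untouched.

```python
# Roman letters of the Lojban alphabet and their Cyrillic counterparts,
# in the same order; capitals map to the corresponding capital letters.
ROMAN = "abcdefgijklmnoprstuvxyz"
CYRILLIC = "абшдефгижклмнопрстувхъз"
TABLE = str.maketrans(ROMAN + ROMAN.upper(), CYRILLIC + CYRILLIC.upper())


def to_cyrillic(text):
    """Transliterate Roman-script Lojban; other characters pass through unchanged."""
    return text.translate(TABLE)


print(to_cyrillic(".lojban."))
print(to_cyrillic("le re NANmu"))
print(to_cyrillic("ki'e .u'e"))
```

Diphthongs need no special handling, since they are written as vowel pairs in both scripts.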

Finally, an orthography using the Tengwar of Fëanor, a fictional orthography invented by J. R. R. Tolkien and described in the Appendixes to The Lord Of The Rings, has been devised for Lojban. The following mapping, which closely resembles that used for Westron, will be meaningful only to those who have read those appendixes. In brief, the tincotéma and parmatéma are used in the conventional ways; the calmatéma represents palatal consonants, and the quessetéma represents velar consonants.

 tinco t
 calma -
 ando d
 anga -
 thule -
 harma c
 anto -
 anca j
 numen n
 noldo -
 ore r
 anna i
 parma p
 quesse k
 umbar b
 ungwe g
 formen f
 hwesta x
 ampa v
 unque -
 malta m
 nwalme -
 vala u
 vilya -

The letters vala and anna are used for u and i only when those letters are used to represent glides. Of the additional letters, r, l, s, and z are written with rómen, lambe, silme, and áre/esse respectively; the inverted forms are used as free variants.

Lojban, like Quenya, is a vowel-last language, so tehtar are read as following the tengwar on which they are placed. The conventional tehtar are used for the five regular vowels, and the dot below for y. The Lojban apostrophe is represented by halla. There is no equivalent of the Lojban comma or period.

# Chapter 4. The Shape Of Words To Come: Lojban Morphology

## 4.1. Introductory

Morphology is the part of grammar that deals with the form of words. Lojban's morphology is fairly simple compared to that of many languages, because Lojban words don't change form depending on how they are used. English has only a small number of such changes compared to languages like Russian, but it does have changes like boys as the plural of boy, or walked as the past-tense form of walk. To make plurals or past tenses in Lojban, you add separate words to the sentence that express the number of boys, or the time when the walking was going on.

However, Lojban does have what is called derivational morphology: the capability of building new words from old words. In addition, the form of words tells us something about their grammatical uses, and sometimes about the means by which they entered the language. Lojban has very orderly rules for the formation of words of various types, both the words that already exist and new words yet to be created by speakers and writers.

A stream of Lojban sounds can be uniquely broken up into its component words according to specific rules. These so-called morphology rules are summarized in this chapter. (However, a detailed algorithm for breaking sounds into words has not yet been fully debugged, and so is not presented in this book.) First, here are some conventions used to talk about groups of Lojban letters, including vowels and consonants.

1. V represents any single Lojban vowel except y; that is, it represents a, e, i, o, or u.

2. VV represents either a diphthong, one of the following:

 ai ei oi au

or a two-syllable vowel pair with an apostrophe separating the vowels, one of the following:

 a'a a'e a'i a'o a'u e'a e'e e'i e'o e'u i'a i'e i'i i'o i'u o'a o'e o'i o'o o'u u'a u'e u'i u'o u'u

3. C represents a single Lojban consonant, not including the apostrophe, one of b, c, d, f, g, j, k, l, m, n, p, r, s, t, v, x, or z. Syllabic l, m, n, and r always count as consonants for the purposes of this chapter.

4. CC represents two adjacent consonants of type C which constitute one of the 48 permissible initial consonant pairs:

 pl pr fl fr
 bl br vl vr

 cp cf ct ck cm cn cl cr
 jb jv jd jg jm
 sp sf st sk sm sn sl sr
 zb zv zd zg zm

 tc tr ts kl kr
 dj dr dz gl gr

 ml mr
 xl xr

5. C/C represents two adjacent consonants which constitute one of the permissible consonant pairs (not necessarily a permissible initial consonant pair). The permissible consonant pairs are explained in Section 3.6. In brief, any consonant pair is permissible unless it: contains two identical letters, contains both a voiced (excluding r, l, m, n) and an unvoiced consonant, or is one of certain specified forbidden pairs.

6. C/CC represents a consonant triple. The first two consonants must constitute a permissible consonant pair; the last two consonants must constitute a permissible initial consonant pair.

Lojban has three basic word classes – parts of speech – in contrast to the eight that are traditional in English. These three classes are called cmavo, brivla, and cmevla. Each of these classes has uniquely identifying properties – an arrangement of letters that allows the word to be uniquely and unambiguously recognized as a separate word in a string of Lojban, upon either reading or hearing, and as belonging to a specific word-class.

They are also functionally different: cmavo are the structure words, corresponding to English words like and, if, the and to; brivla are the content words, corresponding to English words like come, red, doctor, and freely; cmevla are proper names, corresponding to English James, Afghanistan, and Pope John Paul II.

## 4.2. cmavo

The first group of Lojban words discussed in this chapter are the cmavo. They are the structure words that hold the Lojban language together. They often have no semantic meaning in themselves, though they may affect the semantics of brivla to which they are attached. The cmavo include the equivalent of English articles, conjunctions, prepositions, numbers, and punctuation marks. There are over a hundred subcategories of cmavo, known as selma'o, each having a specifically defined grammatical usage. The various selma'o are discussed throughout Chapter 5 to Chapter 19 and summarized in Chapter 20.

Standard cmavo occur in four forms defined by their word structure. Here are some examples of the various forms:

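| V-form | CV-form | VV-form | CVV-form |
|---|---|---|---|
| .a | ba | .au | ki'a |
| .e | ce | .ei | pei |
| .i | di | .ia | mi'o |
| .o | fo | .o'u | coi |
| .u | gu | .u'e | cu'u |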

In addition, there is the cmavo .y. (remember that y is not a V), which must have pauses before and after it.

A simple cmavo thus has the property of having only one or two vowels, or of having a single consonant followed by one or two vowels. Words consisting of three or more vowels in a row, or a single consonant followed by three or more vowels, are also of cmavo form, but are reserved for experimental use: a few examples are ku'a'e, sau'e, and bai'ai. All CVV cmavo beginning with the letter x are also reserved for experimental use. In general, though, the form of a cmavo tells you little or nothing about its grammatical use.

Experimental use means that the language designers will not assign any standard meaning or usage to these words, and words and usages coined by Lojban speakers will not appear in official dictionaries for the indefinite future. Experimental-use words provide an escape hatch for adding grammatical mechanisms (as opposed to semantic concepts) the need for which was not foreseen.

The cmavo of VV-form include not only the diphthongs and vowel pairs listed in Section 4.1, but also the following ten additional diphthongs:

 .ia .ie .ii .io .iu .ua .ue .ui .uo .uu

In addition, cmavo can have the form Cy, a consonant followed by the letter y. These cmavo represent letters of the Lojban alphabet, and are discussed in detail in Chapter 17.
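The forms described in this section can be summarized in a short classifier. This is a rough sketch under stated assumptions. The function name `cmavo_form` and its return labels are invented here. Vowels are counted by letter, as the experimental examples above suggest. The ten on-glide diphthongs are accepted only in words with no initial consonant. Vowel pairs involving y (such as .y'y.) are not handled.

```python
import re

CONSONANTS = "bcdfgjklmnprstvxz"
VOWELS = set("aeiou")                        # "V" in Section 4.1 (y excluded)
OFF_GLIDES = {"ai", "ei", "oi", "au"}
ON_GLIDES = {"ia", "ie", "ii", "io", "iu", "ua", "ue", "ui", "uo", "uu"}
SHAPE = re.compile(rf"([{CONSONANTS}]?)([aeiouy']+)")


def cmavo_form(word):
    """Return 'V', 'CV', 'VV', 'CVV', 'y', 'Cy', 'experimental', or None."""
    match = SHAPE.fullmatch(word.strip("."))  # pauses are not part of the form
    if match is None:
        return None                           # e.g. contains a consonant cluster
    onset, tail = match.groups()
    if tail == "y":
        return "Cy" if onset else "y"
    diphthongs = OFF_GLIDES if onset else OFF_GLIDES | ON_GLIDES
    chunks = tail.split("'")                  # apostrophes separate syllables
    if not all(c in VOWELS or c in diphthongs for c in chunks):
        return None
    vowel_count = sum(len(c) for c in chunks)
    if vowel_count > 2 or (onset == "x" and vowel_count == 2):
        return "experimental"
    return ("C" if onset else "") + "V" * vowel_count


for w in (".i", "pu", ".e'o", "nai", "ki'e'u'e", "sau'e", "cy", ".y.", "klama"):
    print(w, cmavo_form(w))
```

A result of `None` means only that the word does not fit any cmavo shape covered by this sketch. The label says nothing about grammatical use, in line with the remark above that form tells little about function.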

Compound cmavo are sequences of cmavo attached together to form a single written word. A compound cmavo is always identical in meaning and in grammatical use to the separated sequence of simple cmavo from which it is composed. These words are written in compound form merely to save visual space, and to ease the reader's burden in identifying when the component cmavo are acting together.

Compound cmavo, while not visually short like their components, can be readily identified by two characteristics:

1. They have no consonant pairs or clusters, and

2. They end in a vowel.

For example:

Example 4.1.

• .iseci'i

• .i se ci'i

Example 4.2.

• punaijecanai

• pu nai je ca nai

Example 4.3.

• ki'e.u'e

• ki'e .u'e

The cmavo u'e begins with a vowel, and like all words beginning with a vowel, requires a pause (represented by .) before it. This pause cannot be omitted simply because the cmavo is incorporated into a compound cmavo. On the other hand,

Example 4.4.

• ki'e'u'e

is a single cmavo reserved for experimental purposes: it has four vowels.

Example 4.5.

• cy.ibu.abu

• cy. .ibu .abu

Again the pauses are required (see Section 4.9); the pause after cy. merges with the pause before .ibu.
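The two identifying characteristics of compound cmavo suggest a simple way to take one apart: start a new simple cmavo at every consonant and at every pause. The sketch below illustrates that idea only. The function names are invented. It does not model the pause required after Cy letter-words, and it breaks words such as .ibu into .i and bu, which is further than Example 4.5 goes.

```python
import re

CONSONANTS = "bcdfgjklmnprstvxz"
SIMPLE = re.compile(rf"[{CONSONANTS}]?[aeiouy']+")   # optional consonant, then vowels
CLUSTER = re.compile(rf"[{CONSONANTS}]{{2}}")        # any two adjacent consonants


def looks_like_cmavo_compound(word):
    """No consonant pairs or clusters, and the word ends in a vowel."""
    body = word.strip(".")
    return bool(body) and body[-1] in "aeiouy" and not CLUSTER.search(body)


def split_compound(word):
    """Split a compound cmavo into simple cmavo, restoring leading pauses."""
    if not looks_like_cmavo_compound(word):
        raise ValueError(f"{word!r} is not of compound-cmavo form")
    return [part if part[0] in CONSONANTS else "." + part
            for part in SIMPLE.findall(word)]


print(" ".join(split_compound(".iseci'i")))       # .i se ci'i
print(" ".join(split_compound("punaijecanai")))   # pu nai je ca nai
print(" ".join(split_compound("ki'e.u'e")))       # ki'e .u'e
```

Note that ki'e'u'e contains no pause and no second consonant, so the sketch returns it whole, consistent with its status as a single experimental cmavo.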

There is no particular stress required in cmavo or their compounds. Some conventions do exist that are not mandatory. For two-syllable cmavo, for example, stress is typically placed on the first vowel; an example is

Example 4.6.

• .e'o ko ko kurji

• .E'o ko ko KURji

This convention results in a consistent rhythm to the language, since brivla are required to have penultimate stress; some find this esthetically pleasing.

If the final syllable of one word is stressed, and the first syllable of the next word is stressed, you must insert a pause or glottal stop between the two stressed syllables. Thus

Example 4.7.

le re nanmu

can be optionally pronounced

Example 4.8.

• le RE. NANmu

since there are no rules forcing stress on either of the first two words; the stress on re, though, demands that a pause separate re from the following syllable nan to ensure that the stress on nan is properly heard as a stressed syllable. The alternative pronunciation

Example 4.9.

• LE re NANmu

is also valid; this would apply secondary stress (used for purposes of emphasis, contrast or sentence rhythm) to le, comparable in rhythmical effect to the English phrase THE two men. In Example 4.8, the secondary stress on re would be similar to that in the English phrase the TWO men.

Both cmavo may also be left unstressed, thus:

Example 4.10.

• le re NANmu

This would probably be the most common usage.

## 4.3. brivla

Predicate words, called brivla, are at the core of Lojban. They carry most of the semantic information in the language. They serve as the equivalent of English nouns, verbs, adjectives, and adverbs, all in a single part of speech.

Every brivla belongs to one of three major subtypes. These subtypes are defined by the form, or morphology, of the word – all words of a particular structure can be assigned by sight or sound to a particular type (cmavo, brivla, or cmevla) and subtype. Knowing the type and subtype then gives you, the reader or listener, significant clues to the meaning and the origin of the word, even if you have never heard the word before.

The same principle allows you, when speaking or writing, to invent new brivla for new concepts on the fly; yet it offers people that you are trying to communicate with a good chance to figure out your meaning. In this way, Lojban has a flexible vocabulary which can be expanded indefinitely.

All brivla have the following properties:

1. always end in a vowel;

2. always contain a consonant pair in the first five letters, where y and apostrophe are not counted as letters for this purpose (see Section 4.6);

3. always are stressed on the next-to-the-last (penultimate) syllable; this implies that they have two or more syllables.

The presence of a consonant pair distinguishes brivla from cmavo and their compounds. The final vowel distinguishes brivla from cmevla, which always end in a consonant. Thus da'amei must be a compound cmavo because it lacks a consonant pair; lojban. must be a cmevla because it lacks a final vowel.

Thus, bisycla has the consonant pair sc in the first five non-y letters even though the sc actually appears in the form of syc. Similarly, the word ro'inre'o contains nr in the first five letters because the apostrophes are not counted for this purpose.
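The two tests just described (final letter, and a consonant pair within the first five counted letters) are enough to sort a word into its major type. Here is a minimal Python sketch of that sorting; the names `has_early_consonant_pair` and `classify` are invented for illustration, and the code checks only the shape of a word, not whether it is otherwise valid Lojban.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")


def has_early_consonant_pair(word: str) -> bool:
    """True if two consonants are adjacent within the first five counted letters."""
    # y, apostrophes, pauses and commas are not counted as letters for this test.
    letters = [ch for ch in word if ch not in "y'.,"][:5]
    return any(a in CONSONANTS and b in CONSONANTS
               for a, b in zip(letters, letters[1:]))


def classify(word: str) -> str:
    """Return 'cmevla', 'brivla' or 'cmavo' from the form of the word alone."""
    w = word.strip(". ").lower()
    if not w:
        raise ValueError("empty word")
    if w[-1] in CONSONANTS:
        return "cmevla"   # only cmevla end in a consonant
    if has_early_consonant_pair(w):
        return "brivla"   # the consonant pair rules out cmavo and their compounds
    return "cmavo"


for w in ["da'amei", "lojban.", "bisycla", "ro'inre'o"]:
    print(w, classify(w))
```

The four words printed are the ones discussed above: da'amei comes out as a cmavo compound, lojban. as a cmevla, and bisycla and ro'inre'o as brivla.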

The three subtypes of brivla are:

1. gismu, the Lojban primitive roots from which all other brivla are built;

2. lujvo, the compounds of two or more gismu; and

3. fu'ivla (literally copy-word), the specialized words that are not Lojban primitives or natural compounds, and are therefore borrowed from other languages.

## 4.4. gismu

The gismu, or Lojban root words, are those brivla representing concepts most basic to the language. The gismu were chosen for various reasons: some represent concepts that are very familiar and basic; some represent concepts that are frequently used in other languages; some were added because they would be helpful in constructing more complex words; some because they represent fundamental Lojban concepts (like cmavo and gismu themselves).

The gismu do not represent any sort of systematic partitioning of semantic space. Some gismu may be superfluous, or appear for historical reasons: the gismu list was compiled over almost 35 years and has been weeded out only once. Instead, the intention is that the gismu blanket semantic space: they make it possible to talk about the entire range of human concerns.

There are about 1350 gismu. In learning Lojban, you need only to learn most of these gismu and their combining forms (known as rafsi) as well as perhaps 200 major cmavo, and you will be able to communicate effectively in the language. This may sound like a lot, but it is a small number compared to the vocabulary needed for similar communications in other languages.

All gismu have very strong form restrictions. Using the conventions defined in Section 4.1, all gismu are of the forms CVC/CV or CCVCV. They must meet the rules for all brivla given in Section 4.3; furthermore, they:

1. always have five letters;

2. always start with a consonant and end with a single vowel;

3. always contain exactly one consonant pair, which is a permissible initial pair (CC) if it's at the beginning of the gismu, but otherwise only has to be a permissible pair (C/C);

4. are always stressed on the first syllable (since that is penultimate).

The five-letter length distinguishes gismu from lujvo and fu'ivla. In addition, no gismu contains an apostrophe (').

With the exception of five special brivla variables, broda, brode, brodi, brodo, and brodu, no two gismu differ only in the final vowel. Furthermore, the set of gismu was specifically designed to reduce the likelihood that two similar sounding gismu could be confused. For example, because gismu is in the set of gismu, kismu, xismu, gicmu, gizmu, and gisnu cannot be.
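The form restrictions listed above can be checked with a single pattern. The Python sketch below is a simplification and says so in its name: `has_gismu_shape` (an invented name) tests only the CVC/CV and CCVCV letter patterns. It does not check that the consonant pair is permissible, and it cannot tell whether a word of the right shape has actually been assigned as a gismu.

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# Five letters, consonant first, single vowel last, exactly one consonant pair.
GISMU_SHAPE = re.compile(rf"(?:{C}{V}{C}{C}{V}|{C}{C}{V}{C}{V})")


def has_gismu_shape(word: str) -> bool:
    """True if the word matches CVC/CV or CCVCV; pair permissibility is not checked."""
    return GISMU_SHAPE.fullmatch(word) is not None


print(has_gismu_shape("gismu"))   # True  (CVC/CV)
print(has_gismu_shape("blanu"))   # True  (CCVCV)
print(has_gismu_shape("sampli"))  # False (six letters)
```

Since neither y nor the apostrophe appears in the pattern, any word containing them is rejected, which matches the statement that no gismu contains an apostrophe.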

Almost all Lojban gismu are constructed from pieces of words drawn from other languages, specifically Chinese, English, Hindi, Spanish, Russian, and Arabic, the six most widely spoken natural languages. For a given concept, words in the six languages that represent that concept were written in Lojban phonetics. Then a gismu was selected to maximize the recognizability of the Lojban word for speakers of the six languages by weighting the inclusion of the sounds drawn from each language by the number of speakers of that language. See Section 4.14 for a full explanation of the algorithm.

Here are a few examples of gismu, with rough English equivalents (not definitions):

Example 4.11.

 creka
 shirt

Example 4.12.

 lijda
 religion

Example 4.13.

 blanu
 blue

Example 4.14.

 mamta
 mother

Example 4.15.

 cukta
 book

Example 4.16.

 patfu
 father

Example 4.17.

 nanmu
 man

Example 4.18.

 ninmu
 woman

A small number of gismu were formed differently; see Section 4.15 for a list.

## 4.5. lujvo

When specifying a concept that is not found among the gismu (or, more specifically, when the relevant gismu seems too general in meaning), a Lojbanist generally attempts to express the concept as a tanru. Lojban tanru are an elaboration of the concept of metaphor used in English. In Lojban, any brivla can be used to modify another brivla. The first of the pair modifies the second. This modification is usually restrictive – the modifying brivla reduces the broader sense of the modified brivla to form a more narrow, concrete, or specific concept. Modifying brivla may thus be seen as acting like English adverbs or adjectives. For example,

Example 4.19.

skami pilno

is the tanru which expresses the concept of computer user.

The simplest Lojban tanru are pairings of two concepts or ideas. Such tanru take two simpler ideas that can be represented by gismu and combine them into a single more complex idea. Two-part tanru may then be recombined in pairs with other tanru, or with individual gismu, to form more complex or more specific ideas, and so on.

The meaning of a tanru is usually at least partly ambiguous: skami pilno could refer to a computer that is a user, or to a user of computers. There are a variety of ways that the modifier component can be related to the modified component. It is also possible to use cmavo within tanru to provide variations (or to prevent ambiguities) of meaning.

Making tanru is essentially a poetic or creative act, not a science. While the syntax expressing the grouping relationships within tanru is unambiguous, tanru are still semantically ambiguous, since the rules defining the relationships between the gismu are flexible. The process of devising a new tanru is dealt with in detail in Chapter 5.

To express a simple tanru, simply say the component gismu together. Thus the binary metaphor big boat becomes the tanru

Example 4.20.

barda bloti

representing roughly the same concept as the English word ship.

The binary metaphor father mother can refer to a paternal grandmother (a father-ly type of mother), while mother father can refer to a maternal grandfather (a mother-ly type of father). In Lojban, these become the tanru

Example 4.21.

patfu mamta

and

Example 4.22.

mamta patfu

respectively.

The possibility of semantic ambiguity can easily be seen in the last case. To interpret Example 4.22, the listener must determine what type of motherliness pertains to the father being referred to. In an appropriate context, mamta patfu could mean not grandfather but simply father with some motherly attributes, depending on the culture. If absolute clarity is required, there are ways to expand upon and explain the exact interrelationship between the components; but such detail is usually not needed.

When a concept expressed in a tanru proves useful, or is frequently expressed, it is desirable to choose one of the possible meanings of the tanru and assign it to a new brivla. For Example 4.19, we would probably choose user of computers, and form the new word

Example 4.23.

 sampli

Such a brivla, built from the rafsi which represent its component words, is called a lujvo. Another example, corresponding to the tanru of Example 4.20, would be:

Example 4.24.

 bralo'i “big-boat” ship

The lujvo representing a given tanru is built from units representing the component gismu. These units are called rafsi in Lojban. Each rafsi represents only one gismu. The rafsi are attached together in the order of the words in the tanru, occasionally inserting so-called hyphen letters to ensure that the pieces stick together as a single word and cannot accidentally be broken apart into cmavo, gismu, or other word forms. As a result, each lujvo can be readily and accurately recognized, allowing a listener to pick out the word from a string of spoken Lojban, and if necessary, unambiguously decompose the word to a unique source tanru, thus providing a strong clue to its meaning.

The lujvo that can be built from the tanru mamta patfu in Example 4.22 is

Example 4.25.

 mampa'u

which refers specifically to the concept maternal grandfather. The two gismu that constitute the tanru are represented in mampa'u by the rafsi mam- and -pa'u, respectively; these two rafsi are then concatenated together to form mampa'u.

Like gismu, lujvo have only one meaning. When a lujvo is formally entered into a dictionary of the language, a specific definition will be assigned based on one particular interrelationship between the terms. (See Chapter 12 for how this has been done.) Unlike gismu, lujvo may have more than one form. This is because there is no difference in meaning between the various rafsi for a gismu when they are used to build a lujvo. A long rafsi may be used, especially in noisy environments, in place of a short rafsi; the result is considered the same lujvo, even though the word is spelled and pronounced differently. Thus the word brivla, built from the tanru bridi valsi, is the same lujvo as brivalsi, bridyvla, and bridyvalsi, each of which uses a different combination of rafsi.

When assembling rafsi together into lujvo, the rules for valid brivla must be followed: a consonant cluster must occur in the first five letters (excluding y and '), and the lujvo must end in a vowel.

A y (which is ignored in determining stress or consonant clusters) is inserted in the middle of the consonant cluster to glue the word together when the resulting cluster is either not permissible or the word is likely to break up. There are specific rules describing these conditions, detailed in Section 4.6.

An r (in some cases, an n) is inserted when a CVV-form rafsi attaches to the beginning of a lujvo in such a way that there is no consonant cluster. For example, in the lujvo

Example 4.26.

 soirsai from sonci sanmi “soldier meal” field rations

the rafsi soi- and -sai are joined, with the additional r making up the rs consonant pair needed to make the word a brivla. Without the r, the word would break up into soi sai, two cmavo. The pair of cmavo have no relation to their rafsi lookalikes; they will either be ungrammatical (as in this case), or will express a different meaning from what was intended.

Learning the rafsi and the rules for assembling them into lujvo is clearly necessary for making full use of the potential Lojban vocabulary.

Most important, it is possible to invent new lujvo while you speak or write in order to represent a new or unfamiliar concept, one for which you do not know any existing Lojban word. As long as you follow the rules for building these compounds, there is a good chance that you will be understood without explanation.

## 4.6. rafsi

Every gismu has from two to five rafsi, each of a different form, but each such rafsi represents only one gismu. It is valid to use any of the rafsi forms in building lujvo – whichever the reader or listener will most easily understand, or whichever is most pleasing – subject to the rules of lujvo making. There is a scoring algorithm which is intended to determine which of the possible and legal lujvo forms will be the standard dictionary form (see Section 4.12).

Each gismu always has at least two rafsi forms; one is the gismu itself (used only at the end of a lujvo), and one is the gismu without its final vowel (used only at the beginning or middle of a lujvo). These forms are represented as -CVC/CV or -CCVCV (called the 5-letter rafsi), and -CVC/C- or -CCVC- (called the 4-letter rafsi) respectively. The dashes in these rafsi form representations show where other rafsi may be attached to form a valid lujvo. When lujvo are formed only from 4-letter and 5-letter rafsi, known collectively as long rafsi, they are called unreduced lujvo.

Some examples of unreduced lujvo forms are:

Example 4.27.

 mamtypatfu from mamta patfu “mother father” or “maternal grandfather”

Example 4.28.

 lerfyliste from lerfu liste “letter list” or a “list of letters” (letters of the alphabet)

Example 4.29.

 nancyprali from nanca prali “year profit” or “annual profit”

Example 4.30.

 prunyplipe from pruni plipe “elastic (springy) leap” or “spring” (the verb)

Example 4.31.

 vancysanmi from vanci sanmi “evening meal” or “supper”
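The unreduced forms in Example 4.27 to Example 4.31 all follow one pattern: every gismu except the last loses its final vowel and is followed by y, and the last gismu is used whole. A short Python sketch of that pattern follows; the function name `unreduced_lujvo` is invented for illustration, and the code does not check that its arguments are real gismu.

```python
def unreduced_lujvo(*gismu: str) -> str:
    """Build the unreduced lujvo for a tanru given as a sequence of gismu."""
    if len(gismu) < 2:
        raise ValueError("a lujvo needs at least two components")
    *modifiers, head = gismu
    # 4-letter rafsi: drop the final vowel, then glue with y.
    # 5-letter rafsi: the final gismu itself, unchanged.
    return "".join(g[:-1] + "y" for g in modifiers) + head


print(unreduced_lujvo("mamta", "patfu"))  # mamtypatfu
print(unreduced_lujvo("vanci", "sanmi"))  # vancysanmi
```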

In addition to these two forms, each gismu may have up to three additional short rafsi, three letters long. All short rafsi have one of the forms CVC, CCV, or CVV. The total number of rafsi forms that are assigned to a gismu depends on how useful the gismu is, or is presumed to be, in making lujvo, when compared to other gismu that could be assigned the rafsi.

For example, zmadu (more than) has the two short rafsi zma and mau (in addition to its unreduced rafsi zmad and zmadu), because a vast number of lujvo have been created based on zmadu, corresponding in general to English comparative adjectives ending in -er such as whiter (Lojban labmau). On the other hand, bakri (chalk) has no short rafsi and few lujvo.

There are at most one CVC-form, one CCV-form, and one CVV-form rafsi per gismu. In fact, only a tiny handful of gismu have both a CCV-form and a CVV-form rafsi assigned, and still fewer have all three forms of short rafsi. However, gismu with both a CVC-form and another short rafsi are fairly common, partly because more possible CVC-form rafsi exist. Yet CVC-form rafsi, even though they are fairly easy to remember, cannot be used at the end of a lujvo (because lujvo must end in vowels), which justifies the assignment of an additional short rafsi to many gismu.

The intention was to use the available rafsi space – the set of all possible short rafsi forms – in the most efficient way possible; the goal is to make the most-used lujvo as short as possible (thus maximizing the use of short rafsi), while keeping the rafsi very recognizable to anyone who knows the source gismu. For this reason, the letters in a rafsi have always been chosen from among the five letters of the corresponding gismu. As a result, there is a limited set of short rafsi available for assignment to each gismu. At most seven possible short rafsi are available for consideration (of which at most three can be used, as explained above).

Here are the only short rafsi forms that can possibly exist for gismu of the form CVC/CV, like sakli. The digits in the second column represent the gismu letters used to form the rafsi.

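| Form | Letters | rafsi |
|------|---------|-------|
| CVC | 123 | -sak- |
| CVC | 124 | -sal- |
| CVV | 12'5 | -sa'i- |
| CVV | 125 | -sai- |
| CCV | 345 | -kli- |
| CCV | 132 | -ska- |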

(The only actual short rafsi for sakli is -sal-.)

For gismu of the form CCVCV, like blaci, the only short rafsi forms that can exist are:

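| Form | Letters | rafsi |
|------|---------|-------|
| CVC | 134 | -bac- |
| CVC | 234 | -lac- |
| CVV | 13'5 | -ba'i- |
| CVV | 135 | -bai- |
| CVV | 23'5 | -la'i- |
| CVV | 235 | -lai- |
| CCV | 123 | -bla- |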

(In fact, blaci has none of these short rafsi; they are all assigned to other gismu. Lojban speakers are not free to reassign any of the rafsi; the tables shown here are to help understand how the rafsi were chosen in the first place.)

There are a few restrictions: a CVV-form rafsi without an apostrophe cannot exist unless the vowels make up one of the four diphthongs ai, ei, oi, or au; and a CCV-form rafsi is possible only if the two consonants form a permissible initial consonant pair (see Section 4.1). Thus mamta, which has the same form as salci, can only have mam, mat, and ma'a as possible rafsi: in fact, only mam is assigned to it.
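The letter positions in the two tables, together with the two restrictions just stated, can be turned into a small generator of candidate rafsi. The Python sketch below is illustrative only. The name `candidate_short_rafsi` is invented, and the list of permissible initial pairs is reproduced here from the standard table that the text cites in Section 4.1; check it against that section before relying on it. The result shows which forms could have been considered, not which rafsi are actually assigned.

```python
# Permissible initial consonant pairs (assumed to match the table in Section 4.1).
PERMISSIBLE_INITIALS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv".split()
)
DIPHTHONGS = {"ai", "ei", "oi", "au"}
VOWELS = set("aeiou")


def candidate_short_rafsi(gismu: str) -> list[str]:
    """List the short rafsi forms that could exist for a five-letter gismu."""
    def pick(positions):
        # Positions are 1-based, as in the tables above.
        return "".join(gismu[i - 1] for i in positions)

    if gismu[1] in VOWELS:   # CVC/CV, like sakli
        cvc, cvv, ccv = [(1, 2, 3), (1, 2, 4)], [(1, 2, 5)], [(3, 4, 5), (1, 3, 2)]
    else:                    # CCVCV, like blaci
        cvc, cvv, ccv = [(1, 3, 4), (2, 3, 4)], [(1, 3, 5), (2, 3, 5)], [(1, 2, 3)]

    result = [pick(p) for p in cvc]
    for p in cvv:
        r = pick(p)
        result.append(r[0] + r[1] + "'" + r[2])   # the apostrophe form
        if r[1:] in DIPHTHONGS:                    # plain CVV needs a diphthong
            result.append(r)
    for p in ccv:
        r = pick(p)
        if r[:2] in PERMISSIBLE_INITIALS:          # CCV needs a permissible initial pair
            result.append(r)
    return result


print(candidate_short_rafsi("mamta"))  # ['mam', 'mat', "ma'a"]
```

For sakli and blaci the function returns the same forms, in the same order, as the two tables; for mamta it returns the three candidates named above.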

Some cmavo also have associated rafsi, usually CVC-form. For example, the ten common numerical digits, which are all CV form cmavo, each have a CVC-form rafsi formed by adding a consonant to the cmavo. Most cmavo that have rafsi are ones used in composing tanru.

The term for a lujvo made up solely of short rafsi is fully reduced lujvo. Here are some examples of fully reduced lujvo:

Example 4.32.

 cumfri from cumki lifri “possible experience”

Example 4.33.

 klezba from klesi zbasu “category make”

Example 4.34.

 kixta'a from krixa tavla “cry-out talk”

Example 4.35.

 sniju'o from sinxa djuno “sign know”

In addition, the unreduced forms in Example 4.27 and Example 4.28 may be fully reduced to:

Example 4.36.

 mampa'u from mamta patfu “mother father” or “maternal grandfather”

Example 4.37.

 lerste from lerfu liste “letter list” or a “list of letters”

As noted above, CVC-form rafsi cannot appear as the final rafsi in a lujvo, because all lujvo must end with one or two vowels. As a brivla, a lujvo must also contain a consonant cluster within the first five letters – this ensures that they cannot be mistaken for compound cmavo. Of course, all lujvo have at least six letters since they have two or more rafsi, each at least three letters long; hence they cannot be confused with gismu.

When attaching two rafsi together, it may be necessary to insert a hyphen letter. In Lojban, the term hyphen always refers to a letter, either the vowel y or one of the consonants r and n. (The letter l can also be a hyphen, but is not used as one in lujvo.)

The y-hyphen is used after a CVC-form rafsi when joining it with the following rafsi could result in an impermissible consonant pair, or when the resulting lujvo could fall apart into two or more words (either cmavo or gismu).

Thus, the tanru pante tavla (protest talk) cannot produce the lujvo patta'a, because tt is not a permissible consonant pair; the lujvo must be patyta'a. Similarly, the tanru mudri siclu (wooden whistle) cannot form the lujvo mudsiclu; instead, mudysiclu must be used. (Remember that y is not counted in determining whether the first five letters of a brivla contain a consonant cluster: this is why.)

The y-hyphen is also used to attach a 4-letter rafsi, formed by dropping the final vowel of a gismu, to the following rafsi. (This procedure was shown, but not explained, in Example 4.27 to Example 4.31.)

The lujvo forms zunlyjamfu, zunlyjma, zuljamfu, and zuljma are all legitimate and equivalent forms made from the tanru zunle jamfu (left foot). Of these, zuljma is the preferred one since it is the shortest; it thus is likely to be the form listed in a Lojban dictionary.
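The four equivalent forms come from pairing each available rafsi of the first gismu with each available rafsi of the second. The Python sketch below enumerates them; the names `join_pair` and `lujvo_forms` are invented for illustration. It handles only the y-hyphen after a 4-letter rafsi, which is all these inputs need, and it picks the preferred form by length alone, as the paragraph above does, rather than by the scoring algorithm of Section 4.12.

```python
from itertools import product


def join_pair(first: str, last: str) -> str:
    """Join two rafsi, adding y only after a 4-letter rafsi (simplified)."""
    # A 4-letter rafsi is a gismu without its final vowel; CV'V rafsi are also
    # four characters long, so the apostrophe is used to tell them apart.
    is_four_letter = len(first) == 4 and "'" not in first
    return first + ("y" if is_four_letter else "") + last


def lujvo_forms(initial_rafsi: list[str], final_rafsi: list[str]) -> list[str]:
    """Every two-part form obtainable from the given rafsi choices."""
    return [join_pair(a, b) for a, b in product(initial_rafsi, final_rafsi)]


forms = lujvo_forms(["zunl", "zul"], ["jamfu", "jma"])
print(forms)                # ['zunlyjamfu', 'zunlyjma', 'zuljamfu', 'zuljma']
print(min(forms, key=len))  # zuljma
```

Called with the rafsi of bridi and valsi, the same function yields brivla, brivalsi, bridyvla, and bridyvalsi.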

The r-hyphen and its close relative, the n-hyphen, are used in lujvo only after CVV-form rafsi. A hyphen is always required in a two-part lujvo of the form CVV-CVV, since otherwise there would be no consonant cluster.

An r-hyphen or n-hyphen is also required after the CVV-form rafsi of any lujvo of the form CVV-CVC/CV or CVV-CCVCV, since it would otherwise fall apart into a CVV-form cmavo and a gismu. In any lujvo with more than two parts, a CVV-form rafsi in the initial position must always be followed by a hyphen. If the hyphen were omitted, the supposed lujvo could be broken into smaller words: the CVV-form rafsi would be interpreted as a cmavo, and the remainder of the word as a valid lujvo that is one rafsi shorter.

An n-hyphen is only used in place of an r-hyphen when the following rafsi begins with r. For example, the tanru rokci renro (rock throw) cannot be expressed as ro'ire'o (which breaks up into two cmavo), nor can it be ro'irre'o (which has an impermissible double consonant); the n-hyphen is required, and the correct form of the hyphenated lujvo is ro'inre'o. The same lujvo could also be expressed without hyphenation as rokre'o.
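For a two-part lujvo, most of the hyphen choices described in this section depend only on the shapes of the two rafsi. The Python sketch below covers those cases; the names `shape` and `join_two` are invented for illustration. It is deliberately incomplete: after a CVC-form rafsi it detects only a doubled consonant, not the other impermissible pairs, and it does not test whether the word could fall apart, so it would wrongly accept mudsiclu. It also assumes that no hyphen is needed when a CVV-form rafsi is followed by a CCV-form rafsi, a combination the text does not list among those requiring one.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def shape(rafsi: str) -> str:
    """Map letters to C and V, leaving apostrophes as they are: ro'i gives CV'V."""
    return "".join("C" if ch in CONSONANTS else "V" if ch in VOWELS else ch
                   for ch in rafsi)


def join_two(first: str, second: str) -> str:
    """Join two rafsi into a two-part lujvo, choosing a hyphen letter (partial)."""
    s1, s2 = shape(first), shape(second)
    if s1 in ("CVCC", "CCVC"):
        hyphen = "y"                      # 4-letter rafsi always take y
    elif s1 in ("CVV", "CV'V"):
        if s2 == "CCV":
            hyphen = ""                   # the second rafsi supplies the cluster
        else:
            hyphen = "n" if second.startswith("r") else "r"
    elif s1 == "CVC" and first[-1] == second[0]:
        hyphen = "y"                      # doubled consonant, as in pat + ta'a
    else:
        hyphen = ""
    return first + hyphen + second


print(join_two("soi", "sai"))    # soirsai
print(join_two("ro'i", "re'o"))  # ro'inre'o
print(join_two("pat", "ta'a"))   # patyta'a
```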

There is also a different way of building lujvo, or rather phrases which are grammatically and semantically equivalent to lujvo. You can make a phrase containing any desired words, joining each pair of them with the special cmavo zei. Thus,

Example 4.38.

 bridi zei valsi

is the exact equivalent of brivla (but not necessarily the same as the underlying tanru bridi valsi, which could have other meanings). Using zei is the only way to get a cmavo lacking a rafsi, a cmevla, or a fu'ivla into a lujvo:

Example 4.39.

 xy. zei kantu X ray

Example 4.40.

 kulnr,farsi zei lolgai “Farsi floor-cover” Persian rug

Example 4.41.

 na'e zei .a zei na'e zei by. livgyterbilma “non-A, non-B liver-disease” non-A, non-B hepatitis

Example 4.42.

 .cerman. zei jamkarce “Sherman war-car” Sherman tank

Example 4.41 is particularly noteworthy because the phrase that would be produced by removing the zeis from it doesn't end with a brivla, and in fact is not even grammatical. As written, the example is a tanru with two components, but by adding a zei between by. and livgyterbilma to produce

Example 4.43.

 na'e zei .a zei na'e zei by. zei livgyterbilma non-A-non-B-hepatitis

the whole phrase would become a single lujvo. The longer lujvo of Example 4.43 may be preferable, because its place structure can be built from that of bilma, whereas the place structure of a lujvo without a brivla must be constructed ad hoc.

Note that rafsi may not be used in zei phrases, because they are not words. CVV rafsi look like words (specifically cmavo) but there can be no confusion between the two uses of the same letters, because cmavo appear only as separate words or in compound cmavo (which are really just a notation for writing separate but closely related words as if they were one); rafsi appear only as parts of lujvo.

## 4.7. fu'ivla

The use of tanru or lujvo is not always appropriate for very concrete or specific terms (e.g. brie or cobra), or for jargon words specialized to a narrow field (e.g. quark, integral, or iambic pentameter). These words are in effect names for concepts, and the names were invented by speakers of another language. The vast majority of words referring to plants, animals, foods, and scientific terminology cannot be easily expressed as tanru. They thus must be borrowed (actually copied) into Lojban from the original language.

There are four stages of borrowing in Lojban, as words become more and more modified (but shorter and easier to use). Stage 1 is the use of a foreign name quoted with the cmavo la'o (explained in full in Section 19.10):

Example 4.44.

me la'o ly. spaghetti .ly.

is a predicate with the place structure x1 is a quantity of spaghetti.

Stage 2 involves changing the foreign name to a Lojbanized name, as explained in Section 4.8:

Example 4.45.

me la .spagetis.

One of these expedients is often quite sufficient when you need a word quickly in conversation. (This can make it easier to get by when you do not yet have full command of the Lojban vocabulary, provided you are talking to someone who will recognize the borrowing.)

Where a little more universality is desired, the word to be borrowed must be Lojbanized into one of several permitted forms. A rafsi is then usually attached to the beginning of the Lojbanized form, using a hyphen to ensure that the resulting word doesn't fall apart.

The rafsi categorizes or limits the meaning of the fu'ivla; otherwise a word having several different jargon meanings in other languages would require the word-inventor to choose which meaning should be assigned to the fu'ivla, since fu'ivla (like other brivla) are not permitted to have more than one definition. Such a Stage 3 borrowing is the most common kind of fu'ivla.

Finally, Stage 4 fu'ivla do not have any rafsi classifier, and are used where a fu'ivla has become so common or so important that it must be made as short as possible. (See Section 4.16 for a proposal concerning Stage 4 fu'ivla.)

The form of a fu'ivla reliably distinguishes it from both the gismu and the cmavo. Like cultural gismu, fu'ivla are generally based on a word from a single non-Lojban language. The word is borrowed (actually copied, hence the Lojban tanru fukpi valsi) from the other language and Lojbanized – the phonemes are converted to their closest Lojban equivalent and modifications are made as necessary to make the word a legitimate Lojban fu'ivla-form word. All fu'ivla:

1. must contain a consonant cluster in the first five letters of the word; if this consonant cluster is at the beginning, it must either be a permissible initial consonant pair, or a longer cluster such that each pair of adjacent consonants in the cluster is a permissible initial consonant pair: spraile is acceptable, but not ktraile or trkaile;

2. must end in one or more vowels;

3. must not be gismu or lujvo, or any combination of cmavo, gismu, and lujvo; furthermore, a fu'ivla with a CV cmavo joined to the front of it must not have the form of a lujvo (the so-called slinku'i test, not discussed further in this book);

4. cannot contain y, although they may contain syllabic pronunciations of Lojban consonants;

5. like other brivla, are stressed on the penultimate syllable.

Note that consonant triples or larger clusters that are not at the beginning of a fu'ivla can be quite flexible, as long as all consonant pairs are permissible. There is no need to restrict fu'ivla clusters to permissible initial pairs except at the beginning.

This is a fairly liberal definition and allows quite a lot of possibilities within fu'ivla space. Stage 3 fu'ivla can be made easily on the fly, as lujvo can, because the procedure for forming them always guarantees a word that cannot violate any of the rules. Stage 4 fu'ivla require running tests that are not simple to characterize or perform, and should be made only after deliberation and by someone knowledgeable about all the considerations that apply.

Here is a simple and reliable procedure for making a non-Lojban word into a valid Stage 3 fu'ivla:

1. Eliminate all double consonants and silent letters.

2. Convert all sounds to their closest Lojban equivalents. Lojban y, however, may not be used in any fu'ivla.

3. If the last letter is not a vowel, modify the ending so that the word ends in a vowel, either by removing a final consonant or by adding a suggestively chosen final vowel.

4. If the first letter is not a consonant, modify the beginning so that the word begins with a consonant, either by removing an initial vowel or adding a suggestively chosen initial consonant.

5. Prefix the result of steps 1-4 with a 4-letter rafsi that categorizes the fu'ivla into a topic area. It is only safe to use a 4-letter rafsi; short rafsi sometimes produce invalid fu'ivla. Hyphenate the rafsi to the rest of the fu'ivla with an r-hyphen; if that would produce a double r, use an n-hyphen instead; if the rafsi ends in r and the rest of the fu'ivla begins with n (or vice versa), or if the rafsi ends in "r" and the rest of the fu'ivla begins with "tc", "ts", "dj", or "dz" (using "n" would result in a phonotactically impermissible cluster), use an l-hyphen. (This is the only use of l-hyphen in Lojban.)

Alternatively, if a CVC-form short rafsi is available it can be used instead of the long rafsi.

6. Remember that the stress necessarily appears on the penultimate (next-to-the-last) syllable.

In this section, the hyphen is set off with commas in the examples, but these commas are not required in writing, and the hyphen need not be pronounced as a separate syllable.

Here are a few examples:

Example 4.46.

• spaghetti (from English or Italian)

• spageti (Lojbanize)

• cidj,r,spageti (prefix long rafsi)

• dja,r,spageti (prefix short rafsi)

where cidj- is the 4-letter rafsi for cidja, the Lojban gismu for food, thus categorizing cidjrspageti as a kind of food. The form with the short rafsi happens to work, but such good fortune cannot be relied on: in any event, it means the same thing.

Example 4.47.

• Acer (the scientific name of maple trees)

• acer (Lojbanize)

• xaceru (add initial consonant and final vowel)

• tric,r,xaceru (prefix rafsi)

• ric,r,xaceru (prefix short rafsi)

where tric- and ric- are rafsi for tricu, the gismu for tree. Note that by the same principles, maple sugar could get the fu'ivla saktrxaceru, or could be represented by the tanru tricrxaceru sakta. Technically, ricrxaceru and tricrxaceru are distinct fu'ivla, but they would surely be given the same meanings if both happened to be in use.

Example 4.48.

• brie (from French)

• bri (Lojbanize)

• cirl,r,bri (prefix rafsi)

where cirl- represents cirla (cheese).

Example 4.49.

• cobra

• kobra (Lojbanize)

• sinc,r,kobra (prefix rafsi)

where sinc- represents since (snake).

Example 4.50.

• quark

• kuark (Lojbanize)

• kuarka (add final vowel)

• sask,r,kuarka (prefix rafsi)

where sask- represents saske (science). Note the extra vowel a added to the end of the word, and the diphthong ua, which never appears in gismu or lujvo, but may appear in fu'ivla.

Example 4.51.

• 자모 (from Korean)

• djamo (Lojbanize)

• lerf,r,djamo (prefix rafsi)

• ler,l,djamo (prefix short rafsi)

where ler- represents lerfu (letter). Note the l-hyphen in "lerldjamo", since "lerndjamo" contains the forbidden cluster "ndj".
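The choice among the r-hyphen, n-hyphen, and l-hyphen in step 5 of the procedure depends only on the last letter of the rafsi and the start of the Lojbanized word. A minimal Python sketch of that choice follows; the function name `stage3_fuhivla` is invented for illustration, and the code assumes that steps 1 to 4 have already been carried out by hand. It does not check that the resulting word is a valid fu'ivla.

```python
def stage3_fuhivla(rafsi: str, stem: str) -> str:
    """Attach a classifying rafsi to a Lojbanized stem with the hyphen from step 5."""
    end, start = rafsi[-1], stem[0]
    if end != "r" and start != "r":
        hyphen = "r"      # the normal case
    elif (end != "n" and start != "n"
          and not stem.startswith(("tc", "ts", "dj", "dz"))):
        hyphen = "n"      # r would produce a double r
    else:
        hyphen = "l"      # n would double, or would form a forbidden cluster
    return rafsi + hyphen + stem


print(stage3_fuhivla("cidj", "spageti"))  # cidjrspageti
print(stage3_fuhivla("lerf", "djamo"))    # lerfrdjamo
print(stage3_fuhivla("ler", "djamo"))     # lerldjamo
```

The results are written without the commas used in the examples, since those commas are optional in writing.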

The use of the prefix helps distinguish among the many possible meanings of the borrowed word, depending on the field. As it happens, spageti and kuarka are valid Stage 4 fu'ivla, but xaceru looks like a compound cmavo, and kobra like a gismu.

For another example, integral has a specific meaning to a mathematician. But the Lojban fu'ivla integrale, which is a valid Stage 4 fu'ivla, does not convey that mathematical sense to a non-mathematical listener, even one with an English-speaking background; its source – the English word integral – has various other specialized meanings in other fields.

Left uncontrolled, integrale almost certainly would eventually come to mean the same collection of loosely related concepts that English associates with integral, with only the context to indicate (possibly) that the mathematical term is meant.

The prefix method would render the mathematical concept as cmacrntegrale, if the i of integrale is removed, or something like cmacrnintegrale, if a new consonant is added to the beginning; cmac- is the rafsi for cmaci (mathematics). The architectural sense of integral might be conveyed with dinjrnintegrale or tarmrnintegrale, where dinju and tarmi mean building and form respectively.

Here are some fu'ivla representing cultures and related things, shown with more than one rafsi prefix:

Example 4.52.

 bang,r,blgaria Bulgarian (in language)

Example 4.53.

 kuln,r,blgaria Bulgarian (in culture)

Example 4.54.

 gugd,r,blgaria Bulgaria (the country)

Example 4.55.

 bang,r,kore,a Korean (the language)

Example 4.56.

 kuln,r,kore,a Korean (the culture)

Note the commas in Example 4.55 and Example 4.56, used because ea is not a valid diphthong in Lojban. Arguably, some form of the native name Chosen should have been used instead of the internationally known Korea; this is a recurring problem in all borrowings. In general, it is better to use the native name unless using it will severely impede understanding: Navajo is far more widely known than Dine'e.

## 4.8. cmevla

Lojbanized names, called cmevla, are very much like their counterparts in other languages. They are labels applied to things (or people) to stand for them in descriptions or in direct address. They may convey meaning in themselves, but do not necessarily do so.

Because names are often highly personal and individual, Lojban attempts to allow native language names to be used with a minimum of modification. The requirement that the Lojban speech stream be unambiguously analyzable, however, means that most names must be modified somewhat when they are Lojbanized. Here are a few examples of English names and possible Lojban equivalents:

Example 4.57.

 .djim. Jim

Example 4.58.

 .djein. Jane

Example 4.59.

 .arnold. Arnold

Example 4.60.

 .pit. Pete

Example 4.61.

 .katrinas. Katrina

Example 4.62.

 .kat,r,in. Catherine

(Note that syllabic r is skipped in determining the stressed syllable, so Example 4.62 is stressed on the ka.)

Example 4.63.

 .katis. Cathy

Example 4.64.

 .keit. Kate

Cmevla may have almost any form, but always end in a consonant, and are followed by a pause. They are penultimately stressed, unless unusual stress is marked with capitalization. A cmevla may have multiple parts, each ending with a consonant and pause, or the parts may be combined into a single word with no pause. For example,

Example 4.65.

 .djan. .braun.

and

Example 4.66.

 .djanbraun.

are both valid Lojbanizations of John Brown.

The final arbiter of the correct form of a name is the person doing the naming, although most cultures grant people the right to determine how they want their own name to be spelled and pronounced. The English name Mary can thus be Lojbanized as .meris., .maris., .meiris., .merix., or even .marys.. The last alternative is not pronounced much like its English equivalent, but may be desirable to someone who values spelling over pronunciation. The final consonant need not be an s; there must, however, be some Lojban consonant at the end.

Lojban cmevla are identifiable as word forms by the following characteristics:

1. They must end in one or more consonants. There are no rules about how many consonants may appear in a cluster in cmevla, provided that each consonant pair (whether standing by itself, or as part of a larger cluster) is a permissible pair.

2. They may contain the letter y as a normal, non-hyphenating vowel. They are the only kind of Lojban word that may contain the two diphthongs iy and uy.

3. They are always surrounded in speech by pauses, one right before the first consonant, and the other one right after the final consonant, both being written as a period (.).

4. They may be stressed on any syllable; if this syllable is not the penultimate one, it must be capitalized when writing. Neither names nor words that begin sentences are capitalized in Lojban, so this is the only use of capital letters.

cmevla meeting these criteria may be invented, Lojbanized from names in other languages, or formed by appending a consonant onto a cmavo, a gismu, a fu'ivla or a lujvo. Some cmevla built from Lojban words are:

Example 4.67.

 .pav.
 the One

from the cmavo pa, with rafsi pav, meaning one

Example 4.68.

 .sol.
 the Sun

from the gismu solri, meaning solar, or actually pertaining to the Sun

Example 4.69.

 .ralj.
 Chief (as a title)

from the gismu ralju, meaning principal.

Example 4.70.

 .nol.

from the gismu nobli, with rafsi nol, meaning noble.

To Lojbanize a name from the various natural languages, apply the following rules:

1. Eliminate double consonants and silent letters.

2. Add a final s or n (or some other consonant that sounds good) if the name ends in a vowel.

3. Convert all sounds to their closest Lojban equivalents.

4. If possible and acceptable, shift the stress to the penultimate (next-to-the-last) syllable. Use commas and capitalization in written Lojban when it is necessary to preserve non-standard syllabication or stress. Do not capitalize names otherwise.

5. If the name contains an impermissible consonant pair, insert a vowel between the consonants: y is recommended.

There are some additional rules for Lojbanizing the scientific names (technically known as Linnaean binomials after their inventor) which are internationally applied to each species of animal or plant. Where precision is essential, these names need not be Lojbanized, but can be directly inserted into Lojban text using the cmavo la'o, explained in Section 19.10. Using this cmavo makes the already lengthy Latinized names at least four syllables longer, however, and leaves the pronunciation in doubt. The following suggestions, though incomplete, will assist in converting Linnaean binomials to valid Lojban names. They can also help to create fu'ivla based on Linnaean binomials or other words of the international scientific vocabulary. The term back vowel in the following list refers to any of the letters a, o, or u; the term front vowel correspondingly refers to any of the letters e, i, or y.

1. Change double consonants other than cc to single consonants.

2. Change cc before a front vowel to kc, but otherwise to k.

3. Change c before a back vowel and final c to k.

4. Change ng before a consonant (other than h) and final ng to n.

5. Change x to z initially, but otherwise to ks.

6. Change pn to n initially.

7. Change final ie and ii to i.

8. Make the following idiosyncratic substitutions:

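| Original | Replacement |
|----------|-------------|
| aa | a |
| ae | e |
| ch | k |
| ee | i |
| eigh | ei |
| ew | u |
| igh | ai |
| oo | u |
| ou | u |
| ow | au |
| ph | f |
| q | k |
| sc | sk |
| w | u |
| y | i |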

However, the diphthong substitutions should not be done if the two vowels are in two different syllables.

9. Change h between two vowels to ' , but otherwise remove it completely. If preservation of the h seems essential, change it to x instead.

10. Place ' between any remaining vowel pairs that do not form Lojban diphthongs.
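The ten suggestions above are mechanical enough to sketch as a chain of string rewrites. The following Python sketch is only an illustration: the function name `lojbanize_linnaean` is hypothetical, the set of Lojban diphthongs is an assumption taken from the general morphology, and the caveat about diphthongs that span two syllables is not handled at all. The result is also not guaranteed to be a valid cmevla (for instance, it may still end in a vowel).

```python
import re

# Rule 8 substitutions, applied in one left-to-right pass so that the
# output of one substitution is never rewritten by another.
SUBSTITUTIONS = {
    "aa": "a", "ae": "e", "ch": "k", "ee": "i", "eigh": "ei", "ew": "u",
    "igh": "ai", "oo": "u", "ou": "u", "ow": "au", "ph": "f", "q": "k",
    "sc": "sk", "w": "u", "y": "i",
}
_SUB_RE = re.compile("|".join(sorted(SUBSTITUTIONS, key=len, reverse=True)))

# Assumption: the vowel pairs treated as Lojban diphthongs.
DIPHTHONGS = {"ai", "ei", "oi", "au", "ia", "ie", "ii", "io", "iu",
              "ua", "ue", "ui", "uo", "uu"}


def lojbanize_linnaean(name):
    """Apply rules 1-10 in order to a single Latin word (hypothetical helper)."""
    w = name.lower()
    w = re.sub(r"([bdfghjklmnpqrstvwxz])\1", r"\1", w)       # rule 1: doubles except cc
    w = re.sub(r"cc(?=[eiy])", "kc", w).replace("cc", "k")   # rule 2
    w = re.sub(r"c(?=[aou])|c$", "k", w)                     # rule 3
    w = re.sub(r"ng(?=[^aeiouyh])|ng$", "n", w)              # rule 4
    w = re.sub(r"^x", "z", w).replace("x", "ks")             # rule 5
    w = re.sub(r"^pn", "n", w)                               # rule 6
    w = re.sub(r"(ie|ii)$", "i", w)                          # rule 7
    w = _SUB_RE.sub(lambda m: SUBSTITUTIONS[m.group(0)], w)  # rule 8
    # Rule 9: h between vowels becomes an apostrophe, any other h is dropped.
    w = re.sub(r"(?<=[aeiou])h(?=[aeiou])", "'", w).replace("h", "")
    # Rule 10: separate vowel pairs that are not diphthongs.
    out = []
    for ch in w:
        if out and out[-1] in "aeiou" and ch in "aeiou" and out[-1] + ch not in DIPHTHONGS:
            out.append("'")
        out.append(ch)
    return "".join(out)


for latin in ("Quercus", "Picea", "Coccyx", "Rhododendron"):
    print(latin, "->", lojbanize_linnaean(latin))
```

Because the rules are applied strictly in the order listed, a different ordering could give different results for some inputs; the sketch makes no claim about which ordering the suggestions intend beyond their numbering.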

Some further examples of Lojbanized names are:

| Language | Name | Lojbanization |
|----------|------|---------------|
| English | “Mary” | .meris. or .meiris. |
| English | “Smith” | .smit. |
| English | “Jones” | .djonz. |
| English | “John” | .djan. or .jan. (American) or .djon. or .jon. (British) |
| English | “Alice” | .alis. |
| English | “Elise” | .eLIS. |
| English | “Johnson” | .djansn. |
| English | “William” | .uiliam. or .uil,iam. |
| English | “Brown” | .braun. |
| English | “Charles” | .tcarlz. |
| French | “Charles” | .carl. |
| French | “De Gaulle” | .dyGOL. |
| German | “Heinrich” | .xainrix. |
| Spanish | “Joaquin” | .xuaKIN. |
| Russian | “Svetlana” | .sfietlanys. |
| Russian | “Khrushchev” | .xrucTCOF. |
| Hindi | “Krishna” | .kricnas. |
| Polish | “Lech Walesa” | .lex. .va,uensas. |
| Spanish | “Don Quixote” | .don. .kicotes. or modern Spanish: .don. .kixotes. or Mexican dialect: .don. .ki'otes. |
| Chinese | “Mao Zedong” | .maudzydyn. |
| Japanese | “Fujiko” | .fudjikos. or .fujikos. |

## 4.9. Rules for inserting pauses

Summarized in one place, here are the rules for inserting pauses between Lojban words:

1. Any two words may have a pause between them; it is always illegal to pause in the middle of a word, because that breaks up the word into two words.

2. Every word ending in a consonant must be surrounded by pauses. Necessarily, all such words are cmevla.

3. Every word beginning with a vowel must be preceded by a pause. Such words are either cmavo, fu'ivla, or cmevla; all gismu and lujvo begin with consonants.

4. Every cmevla must be surrounded by pauses.

5. If the last syllable of a word bears the stress, and a brivla follows, the two must be separated by a pause, to prevent confusion with the primary stress of the brivla. In this case, the first word must be either a cmavo or a cmevla with unusual stress (which already ends with a pause, of course).

6. A cmavo of the form Cy must be followed by a pause unless another Cy-form cmavo follows.

7. When non-Lojban text is embedded in Lojban, it must be preceded and followed by pauses. (How to embed non-Lojban text is explained in Section 19.10.)

## 4.10. Considerations for making lujvo

Given a tanru which expresses an idea to be used frequently, it can be turned into a lujvo by following the lujvo-making algorithm which is given in Section 4.11.

In building a lujvo, the first step is to replace each gismu with a rafsi that uniquely represents that gismu. These rafsi are then attached together by fixed rules that allow the resulting compound to be recognized as a single word and to be analyzed in only one way.

There are three other complications; only one is serious.

The first is that there is usually more than one rafsi that can be used for each gismu. The one to be used is simply whichever one sounds or looks best to the speaker or writer. There are usually many valid combinations of possible rafsi. They all are equally valid, and all of them mean exactly the same thing. (The scoring algorithm given in Section 4.12 is used to choose the standard form of the lujvo – the version which would be entered into a dictionary.)

The second complication is the serious one. Remember that a tanru is ambiguous – it has several possible meanings. A lujvo, or at least one that would be put into the dictionary, has just a single meaning. Like a gismu, a lujvo is a predicate which encompasses one area of the semantic universe, with one set of places. Hopefully the meaning chosen is the most useful of the possible semantic spaces. A possible source of linguistic drift in Lojban is that as Lojbanic society evolves, the concept that seems the most useful one may change.

You must also be aware of the possibility of some prior meaning of a new lujvo, especially if you are writing for posterity. If a lujvo is invented which involves the same tanru as one that is in the dictionary, and is assigned a different meaning (or even just a different place structure), linguistic drift results. This isn't necessarily bad. Every natural language does it. But in communication, when you use a meaning different from the dictionary definition, someone else may use the dictionary and therefore misunderstand you. You can use the cmavo za'e (explained in Section 19.11) before a newly coined lujvo to indicate that it may have a non-dictionary meaning.

The essential nature of human communication is that if the listener understands, then all is well. Let this be the ultimate guideline for choosing meanings and place structures for invented lujvo.

The third complication is also simple, but tends to scare new Lojbanists with its implications. It is based on Zipf's Law, which says that the length of words is inversely proportional to their usage. The shortest words are those which are used most; the longest ones are used least. Conversely, commonly used concepts will tend to be abbreviated. In English, we have abbreviations and acronyms and jargon, all of which represent complex ideas that are used often by small groups of people, who shorten them to convey more information more rapidly.

Therefore, given a complicated tanru with grouping markers, abstraction markers, and other cmavo in it to make it syntactically unambiguous, the psychological basis of Zipf's Law may compel the lujvo-maker to drop some of the cmavo to make a shorter (technically incorrect) tanru, and then use that tanru to make the lujvo.

This doesn't lead to ambiguity, as it might seem to. A given lujvo still has exactly one meaning and place structure. It is just that more than one tanru is competing for the same lujvo. But more than one meaning for the tanru was already competing for the right to define the meaning of the lujvo. Someone has to use judgment in deciding which one meaning is to be chosen over the others.

If the lujvo made by a shorter form of tanru is in use, or is likely to be useful for another meaning, the decider then retains one or more of the cmavo, preferably ones that set this meaning apart from the shorter form meaning that is used or anticipated. As a rule, therefore, the shorter lujvo will be used for a more general concept, possibly even instead of a more frequent word. If both words are needed, the simpler one should be shorter. It is easier to add a cmavo to clarify the meaning of the more complex term than it is to find a good alternate tanru for the simpler term.

And of course, we have to consider the listener. On hearing an unknown word, the listener will decompose it and get a tanru that makes no sense or the wrong sense for the context. If the listener realizes that the grouping operators may have been dropped out, he or she may try alternate groupings, or try inserting an abstraction operator if that seems plausible. (The grouping of tanru is explained in Chapter 5; abstraction is explained in Chapter 11.) Plausibility is the key to learning new ideas and to evaluating unfamiliar lujvo.

## 4.11. The lujvo-making algorithm

The following is the current algorithm for generating Lojban lujvo given a known tanru and a complete list of gismu and their assigned rafsi. The algorithm was designed by Bob LeChevalier and Dr. James Cooke Brown for computer program implementation. It was modified in 1989 with the assistance of Nora LeChevalier, who detected a flaw in the original tosmabru test.

Given a tanru that is to be made into a lujvo:

1. Choose a 3-letter or 4-letter rafsi for each of the gismu and cmavo in the tanru except the last.

2. Choose a 3-letter (CVV-form or CCV-form) or 5-letter rafsi for the final gismu in the tanru.

3. Join the resulting string of rafsi, initially without hyphens.

4. Add hyphen letters where necessary. It is illegal to add a hyphen at a place that is not required by this algorithm. Right-to-left tests are recommended, for reasons discussed below.

1. If there are more than two words in the tanru, put an r-hyphen (or an n-hyphen) after the first rafsi if it is CVV-form. If there are exactly two words, then put an r-hyphen (or an n-hyphen) between the two rafsi if the first rafsi is CVV-form, unless the second rafsi is CCV-form (for example, saicli requires no hyphen). Use an r-hyphen unless the letter after the hyphen is r, in which case use an n-hyphen. Never use an n-hyphen unless it is required.

2. Put a y-hyphen between the consonants of any impermissible consonant pair. This will always appear between rafsi.

3. Put a y-hyphen after any 4-letter rafsi form.

5. Test all forms with one or more initial CVC-form rafsi – with the pattern CVC ... CVC + X – for tosmabru failure. X must be either a CVCCV long rafsi that happens to have a permissible initial pair as its consonant cluster, or something which has caused a y-hyphen to be installed between the previous CVC and itself by one of the above rules.

The test is as follows:

1. Examine all the C/C consonant pairs up to the first y-hyphen, or up to the end of the word in case there are no y-hyphens.

These consonant pairs are called "joints".

2. If all of those joints are permissible initials, then the trial word will break up into a cmavo and a shorter brivla. If not, the word will not break up, and no further hyphens are needed.

3. Install a y-hyphen at the first such joint.

Note that the tosmabru test implies that the algorithm will be more efficient if rafsi junctures are tested for required hyphens from right to left, instead of from left to right; when the test is required, it cannot be completed until hyphenation to the right has been determined.
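The hyphenation steps and the tosmabru test can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the names `make_lujvo`, `permissible_pair`, and `INITIAL_PAIRS` are hypothetical, the tables of permissible consonant pairs and permissible initial pairs are assumptions taken from the general morphology rather than from this section, and only the steps listed above are implemented (forbidden three-consonant clusters such as ndj are ignored). The caller is expected to have chosen the rafsi already, as in steps 1 and 2.

```python
VOWELS = set("aeiou")
VOICED = set("bdgvjz")
UNVOICED = set("ptkfcsx")
# Assumption: the permissible initial consonant pairs of Lojban.
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv kl kr "
    "ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv".split()
)


def shape(rafsi):
    """Consonant/vowel pattern of a rafsi, e.g. 'CVC', 'CCV' or "CV'V"."""
    return "".join("V" if c in VOWELS else "'" if c == "'" else "C" for c in rafsi)


def permissible_pair(a, b):
    """Assumed test for a consonant pair that may occur inside a word."""
    if a == b:
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False
    if a in "cjsz" and b in "cjsz":
        return False
    return a + b not in {"cx", "kx", "xc", "xk", "mz"}


def tosmabru_fix(rafsi, hyphens):
    """Step 5: add a y-hyphen at the first joint if the word would break up."""
    k = 0  # number of leading CVC rafsi (the last rafsi is never counted)
    while k < len(rafsi) - 1 and shape(rafsi[k]) == "CVC":
        k += 1
    if k == 0:
        return
    joints = []
    for i in range(k):
        if hyphens[i] == "y":          # stop at the first y-hyphen
            break
        joints.append(rafsi[i][-1] + rafsi[i + 1][0])
    else:
        x = rafsi[k]                   # no y-hyphen found: X must be CVCCV
        if shape(x) != "CVCCV":
            return
        joints.append(x[2:4])
    if joints and all(j in INITIAL_PAIRS for j in joints):
        hyphens[0] = "y"


def make_lujvo(rafsi):
    """Join already-chosen rafsi, adding only the hyphens the steps require."""
    n = len(rafsi)
    hyphens = [""] * (n - 1)           # hyphens[i] sits between rafsi[i] and rafsi[i+1]
    for i in range(n - 1):
        left, right = rafsi[i], rafsi[i + 1]
        if shape(left) in ("CVCC", "CCVC"):                       # step 4.3
            hyphens[i] = "y"
        elif left[-1] not in VOWELS and not permissible_pair(left[-1], right[0]):
            hyphens[i] = "y"                                      # step 4.2
        elif i == 0 and shape(left) in ("CVV", "CV'V") and (n > 2 or shape(right) != "CCV"):
            hyphens[i] = "n" if right[0] == "r" else "r"          # step 4.1
    tosmabru_fix(rafsi, hyphens)
    word = rafsi[0]
    for hyphen, part in zip(hyphens, rafsi[1:]):
        word += hyphen + part
    return word


print(make_lujvo(["ge'u", "zdani"]))   # ge'urzdani
print(make_lujvo(["tos", "mabru"]))    # tosymabru
```

Since each joint is decided independently here and the tosmabru test runs last, the sketch does not need the right-to-left ordering recommended above; that recommendation matters for implementations that interleave the test with hyphen insertion.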

## 4.12. The lujvo scoring algorithm

This algorithm was devised by Bob and Nora LeChevalier in 1989. It is not the only possible algorithm, but it usually gives a choice that people find preferable. The algorithm may be changed in the future. The lowest-scoring variant will usually be the dictionary form of the lujvo. (In previous versions, it was the highest-scoring variant.)

1. Count the total number of letters, including hyphens and apostrophes; call it L.

2. Count the number of apostrophes; call it A.

3. Count the number of y-, r-, and n-hyphens; call it H.

4. For each rafsi, find the value in the following table. Sum this value over all rafsi; call it R:

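| rafsi form | Example | Value |
|------------|---------|-------|
| CVC/CV (final) | -sarji | 1 |
| CVC/C | -sarj- | 2 |
| CCVCV (final) | -zbasu | 3 |
| CCVC | -zbas- | 4 |
| CVC | -nun- | 5 |
| CVV with an apostrophe | -ta'u- | 6 |
| CCV | -zba- | 7 |
| CVV with no apostrophe | -sai- | 8 |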

5. Count the number of vowels, not including y; call it V.

The score is then:

(1000 * L) - (500 * A) + (100 * H) - (10 * R) - V

In case of ties, there is no preference. This should be rare. Note that the algorithm essentially encodes a hierarchy of priorities: short words are preferred (counting apostrophes as half a letter), then words with fewer hyphens, words with more pleasing rafsi (this judgment is subjective), and finally words with more vowels are chosen. Each decision principle is applied in turn if the ones before it have failed to choose; it is possible that a lower-ranked principle might dominate a higher-ranked one if it is ten times better than the alternative.

Here are some lujvo with their scores (not necessarily the lowest scoring forms for these lujvo, nor even necessarily sensible lujvo):

Example 4.71.

 zbasai
 zba + sai
 (1000 * 6) - (500 * 0) + (100 * 0) - (10 * 15) - 3 = 5847

Example 4.72.

 nunynau
 nun + y + nau
 (1000 * 7) - (500 * 0) + (100 * 1) - (10 * 13) - 3 = 6967

Example 4.73.

 sairzbata'u
 sai + r + zba + ta'u
 (1000 * 11) - (500 * 1) + (100 * 1) - (10 * 21) - 5 = 10385

Example 4.74.

 zbazbasysarji
 zba + zbas + y + sarji
 (1000 * 13) - (500 * 0) + (100 * 1) - (10 * 12) - 4 = 12976
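The scoring steps of this section translate directly into a few lines of Python. In this sketch the function name `lujvo_score` and its calling convention (the finished word plus the list of rafsi it was built from) are hypothetical choices; the number of hyphens is taken to be the letters of the word that do not belong to any rafsi.

```python
VOWELS = set("aeiou")  # y is deliberately excluded, as in step 5

# Values from the rafsi table of this section, keyed by consonant/vowel shape.
RAFSI_VALUE = {
    "CVCCV": 1, "CVCC": 2, "CCVCV": 3, "CCVC": 4,
    "CVC": 5, "CV'V": 6, "CCV": 7, "CVV": 8,
}


def shape(rafsi):
    """Consonant/vowel pattern of a rafsi, e.g. 'CVC' or "CV'V"."""
    return "".join("V" if c in VOWELS else "'" if c == "'" else "C" for c in rafsi)


def lujvo_score(word, rafsi):
    """Score a hyphenated lujvo; lower scores are preferred."""
    L = len(word)                                  # letters, hyphens, apostrophes
    A = word.count("'")                            # apostrophes
    H = L - sum(len(r) for r in rafsi)             # y-, r- and n-hyphens
    R = sum(RAFSI_VALUE[shape(r)] for r in rafsi)  # rafsi values
    V = sum(1 for c in word if c in VOWELS)        # vowels other than y
    return 1000 * L - 500 * A + 100 * H - 10 * R - V


print(lujvo_score("zbasai", ["zba", "sai"]))                       # 5847
print(lujvo_score("zbazbasysarji", ["zba", "zbas", "sarji"]))      # 12976

# Picking the preferred form among several candidates for the same tanru.
forms = {
    "lotkle": ["lot", "kle"], "blokle": ["blo", "kle"], "lo'ikle": ["lo'i", "kle"],
    "lotlei": ["lot", "lei"], "blolei": ["blo", "lei"], "lo'irlei": ["lo'i", "lei"],
}
best = min(forms, key=lambda w: lujvo_score(w, forms[w]))
print(best)                                                        # blolei
```

Using `min` reflects the convention that the lowest-scoring variant is the dictionary form; ties, which the text says should be rare, are resolved here arbitrarily by dictionary order.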

## 4.13. lujvo-making examples

This section contains examples of making and scoring lujvo. First, we will start with the tanru gerku zdani (dog house) and construct a lujvo meaning doghouse, that is, a house where a dog lives. We will use a brute-force application of the algorithm in Section 4.11, using every possible rafsi.

The rafsi for gerku are:

 -ger-, -ge'u-, -gerk-, -gerku

The rafsi for zdani are:

 -zda-, -zdan-, -zdani.

Step 1 of the algorithm directs us to use -ger-, -ge'u- and -gerk- as possible rafsi for gerku; Step 2 directs us to use -zda- and -zdani as possible rafsi for zdani. The six possible forms of the lujvo are then:

 ger -zda
 ger -zdani
 ge'u -zda
 ge'u -zdani
 gerk -zda
 gerk -zdani

We must then insert appropriate hyphens in each case. The first two forms need no hyphenation: ge cannot fall off the front, because the following word would begin with rz, which is not a permissible initial consonant pair. So the lujvo forms are gerzda and gerzdani.

The third form, ge'u-zda, needs no hyphen, because even though the first rafsi is CVV, the second one is CCV, so there is a consonant cluster in the first five letters. So ge'uzda is this form of the lujvo.

The fourth form, ge'u-zdani, however, requires an r-hyphen; otherwise, the ge'u- part would fall off as a cmavo. So this form of the lujvo is ge'urzdani.

The last two forms require y-hyphens, as all 4-letter rafsi do, and so are gerkyzda and gerkyzdani respectively.

The scoring algorithm is heavily weighted in favor of short lujvo, so we might expect that gerzda would win. Its L score is 6, its A score is 0, its H score is 0, its R score is 12, and its V score is 2, for a final score of 5878. The other forms have scores of 7917, 6367, 9506, 8008, and 10047 respectively. Consequently, this lujvo would probably appear in the dictionary in the form gerzda.

For the next example, we will use the tanru bloti klesi (boat class) presumably referring to the category (rowboat, motorboat, cruise liner) into which a boat falls. We will omit the long rafsi from the process, since lujvo containing long rafsi are almost never preferred by the scoring algorithm when there are short rafsi available.

The rafsi for bloti are -lot-, -blo-, and -lo'i-; for klesi they are -kle- and -lei-. Both these gismu are among the handful which have both CVV-form and CCV-form rafsi, so there is an unusual number of possibilities available for a two-part tanru:

 lotkle blokle lo'ikle lotlei blolei lo'irlei

Only lo'irlei requires hyphenation (to avoid confusion with the cmavo sequence lo'i lei). All six forms are valid versions of the lujvo, as are the six further forms using long rafsi; however, the scoring algorithm produces the following results:

| lujvo | Score |
|-------|-------|
| lotkle | 5878 |
| blokle | 5858 |
| lo'ikle | 6367 |
| lotlei | 5867 |
| blolei | 5847 |
| lo'irlei | 7456 |

So the form blolei is preferred, but only by a tiny margin over blokle; "lotlei" and "lotkle" are only slightly worse; lo'ikle suffers because of its apostrophe, and lo'irlei because of having both apostrophe and hyphen.

Our third example will result in forming both a lujvo and a cmevla from the tanru logji bangu girzu, or logical-language group in English. (The Logical Language Group is the name of the publisher of this book and the organization for the promotion of Lojban.)

The available rafsi are -loj- and -logj-; -ban-, -bau-, and -bang-; and -gri- and -girzu, and (for cmevla purposes only) -gir- and -girz-. The resulting 12 lujvo possibilities are:

 loj -ban -gri
 loj -bau -gri
 loj -bang -gri
 logj -ban -gri
 logj -bau -gri
 logj -bang -gri
 loj -ban -girzu
 loj -bau -girzu
 loj -bang -girzu
 logj -ban -girzu
 logj -bau -girzu
 logj -bang -girzu

and the 12 cmevla possibilities are:

 loj -ban -gir
 loj -bau -gir
 loj -bang -gir
 logj -ban -gir
 logj -bau -gir
 logj -bang -gir
 loj -ban -girz
 loj -bau -girz
 loj -bang -girz
 logj -ban -girz
 logj -bau -girz
 logj -bang -girz

After hyphenation, we have:

 lojbangri lojbaugri lojbangygri logjybangri logjybaugri logjybangygri
 lojbangirzu lojbaugirzu lojbangygirzu logjybangirzu logjybaugirzu logjybangygirzu
 lojbangir lojbaugir lojbangygir logjybangir logjybaugir logjybangygir
 lojbangirz lojbaugirz lojbangygirz logjybangirz logjybaugirz logjybangygirz

The only fully reduced lujvo forms are lojbangri and lojbaugri, of which the latter has a slightly lower score: 8827 versus 8796, respectively. However, for the name of the organization, we chose to make sure the name of the language was embedded in it, and to use the clearer long-form rafsi for girzu, producing .lojbangirz.

Finally, here is a four-part lujvo with a cmavo in it, based on the tanru nakni ke cinse ctuca or male (sexual teacher). The ke cmavo ensures the interpretation teacher of sexuality who is male, rather than teacher of male sexuality. Here are the possible forms of the lujvo, both before and after hyphenation:

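| Before hyphenation | After hyphenation |
|--------------------|-------------------|
| nak -kem -cin -ctu | nakykemcinctu |
| nak -kem -cin -ctuca | nakykemcinctuca |
| nak -kem -cins -ctu | nakykemcinsyctu |
| nak -kem -cins -ctuca | nakykemcinsyctuca |
| nakn -kem -cin -ctu | naknykemcinctu |
| nakn -kem -cin -ctuca | naknykemcinctuca |
| nakn -kem -cins -ctu | naknykemcinsyctu |
| nakn -kem -cins -ctuca | naknykemcinsyctuca |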

Of these forms, nakykemcinctu is the shortest and is preferred by the scoring algorithm. On the whole, however, it might be better to just make a lujvo for cinse ctuca (which would be cinctu) since the sex of the teacher is rarely important. If there was a reason to specify male, then the simpler tanru nakni cinctu (male sexual-teacher) would be appropriate. This tanru is actually shorter than the four-part lujvo, since the ke required for grouping need not be expressed.

## 4.14. The gismu creation algorithm

The gismu were created through the following process:

1. At least one word was found in each of the six source languages (Chinese, English, Hindi, Spanish, Russian, Arabic) corresponding to the proposed gismu. This word was rendered into Lojban phonetics rather liberally: consonant clusters consisting of a stop and the corresponding fricative were simplified to just the fricative (tc became c, dj became j) and non-Lojban vowels were mapped onto Lojban ones. Furthermore, morphological endings were dropped. The same mapping rules were applied to all six languages for the sake of consistency.

2. All possible gismu forms were matched against the six source-language forms. The matches were scored as follows:

1. If three or more letters were the same in the proposed gismu and the source-language word, and appeared in the same order, the score was equal to the number of letters that were the same. Intervening letters, if any, did not matter.

2. If exactly two letters were the same in the proposed gismu and the source-language word, and either the two letters were consecutive in both words, or were separated by a single letter in both words, the score was 2. Letters in reversed order got no score.

3. Otherwise, the score was 0.

3. The scores were divided by the length of the source-language word in its Lojbanized form, and then multiplied by a weighting value specific to each language, reflecting the proportional number of first-language and second-language speakers of the language. (Second-language speakers were reckoned at half their actual numbers.) The weights were chosen to sum to 1.00. The sum of the weighted scores was the total score for the proposed gismu form.

4. Any gismu forms that conflicted with existing gismu were removed. Obviously, being identical with an existing gismu constitutes a conflict. In addition, a proposed gismu that was identical to an existing gismu except for the final vowel was considered a conflict, since two such gismu would have identical 4-letter rafsi.

More subtly: If the proposed gismu was identical to an existing gismu except for a single consonant, and the consonant was "too similar" based on the following table, then the proposed gismu was rejected.

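| proposed gismu | existing gismu |
|----------------|----------------|
| b | p, v |
| c | j, s |
| d | t |
| f | p, v |
| g | k, x |
| j | c, z |
| k | g, x |
| l | r |
| m | n |
| n | m |
| p | b, f |
| r | l |
| s | c, z |
| t | d |
| v | b, f |
| x | g, k |
| z | j, s |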

See Section 4.4 for an example.

5. The gismu form with the highest score usually became the actual gismu. Sometimes a lower-scoring form was used to provide a better rafsi. A few gismu were changed in error as a result of transcription blunders (for example, the gismu gismu should have been gicmu, but it's too late to fix it now).

The language weights used to make most of the gismu were as follows:

| Language | Weight |
|----------|--------|
| Chinese | 0.36 |
| English | 0.21 |
| Hindi | 0.16 |
| Spanish | 0.11 |
| Russian | 0.09 |
| Arabic | 0.07 |

reflecting 1985 number-of-speakers data. A few gismu were made much later using updated weights:

| Language | Weight |
|----------|--------|
| Chinese | 0.347 |
| Hindi | 0.196 |
| English | 0.16 |
| Spanish | 0.123 |
| Russian | 0.089 |
| Arabic | 0.085 |

(English and Hindi switched places due to demographic changes.)

Note that the stressed vowel of the gismu was considered sufficiently distinctive that two or more gismu may differ only in this vowel; as an extreme example, bradi, bredi, bridi, and brodi (but fortunately not brudi) are all existing gismu.
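The matching, weighting, and conflict rules of this section can be sketched in Python. Everything below is a hedged illustration: the function names are hypothetical, "letters that are the same and appear in the same order" is read here as the longest common subsequence (an interpretation, not something the text states), and the source words in the usage lines are invented purely to exercise the rules; they are not real etymologies.

```python
# 1985 language weights, as given in this section.
WEIGHTS_1985 = {"Chinese": 0.36, "English": 0.21, "Hindi": 0.16,
                "Spanish": 0.11, "Russian": 0.09, "Arabic": 0.07}

# "Too similar" consonants: proposed letter -> letters it conflicts with.
TOO_SIMILAR = {
    "b": "pv", "c": "js", "d": "t", "f": "pv", "g": "kx", "j": "cz",
    "k": "gx", "l": "r", "m": "n", "n": "m", "p": "bf", "r": "l",
    "s": "cz", "t": "d", "v": "bf", "x": "gk", "z": "js",
}


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def match_score(gismu, source):
    """Score one proposed gismu form against one Lojbanized source word."""
    common = lcs_length(gismu, source)
    if common >= 3:
        return common
    # Two shared letters count only if adjacent in both words (gap 1)
    # or separated by exactly one letter in both words (gap 2).
    for gap in (1, 2):
        g_pairs = {(gismu[i], gismu[i + gap]) for i in range(len(gismu) - gap)}
        s_pairs = {(source[i], source[i + gap]) for i in range(len(source) - gap)}
        if g_pairs & s_pairs:
            return 2
    return 0


def total_score(gismu, sources, weights=WEIGHTS_1985):
    """Sum of match scores, each divided by source length and weighted."""
    return sum(match_score(gismu, word) / len(word) * weights[lang]
               for lang, word in sources.items())


def conflicts(proposed, existing):
    """True if a proposed 5-letter form conflicts with an existing gismu."""
    if proposed[:4] == existing[:4]:   # identical, or differing only in final vowel
        return True
    diffs = [(p, e) for p, e in zip(proposed, existing) if p != e]
    if len(diffs) == 1:
        p, e = diffs[0]
        return e in TOO_SIMILAR.get(p, "")
    return False


# Invented source words, for illustration only.
print(match_score("klama", "kam"), match_score("klama", "lax"), match_score("klama", "lxa"))
print(conflicts("bradi", "bridi"), conflicts("bridi", "bride"))
```

A full gismu-maker would call `total_score` on every candidate form, discard those for which `conflicts` is true against any existing gismu, and keep the highest-scoring survivor, subject to the manual exceptions described in step 5.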

## 4.15. Cultural and other non-algorithmic gismu

The following gismu were not made by the gismu creation algorithm. They are, in effect, coined words similar to fu'ivla. They are exceptions to the otherwise mandatory gismu creation algorithm where there was sufficient justification for such exceptions. Except for the small metric prefixes and the assignable predicates beginning with brod-, they all end in the letter o, which is otherwise a rare letter in Lojban gismu.

The following gismu represent concepts that are sufficiently unique to Lojban that they were either coined from combining forms of other gismu, or else made up out of whole cloth. These gismu are thus conceptually similar to lujvo even though they are only five letters long; however, unlike lujvo, they have rafsi assigned to them for use in building more complex lujvo. Assigning gismu to these concepts helps to keep the resulting lujvo reasonably short.

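| gismu | Meaning |
|-------|---------|
| broda | 1st assignable predicate |
| brode | 2nd assignable predicate |
| brodi | 3rd assignable predicate |
| brodo | 4th assignable predicate |
| brodu | 5th assignable predicate |
| cmavo | structure word (from cmalu valsi) |
| lojbo | Lojbanic (from logji bangu) |
| lujvo | compound word (from pluja valsi) |
| mekso | Mathematical EXpression |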

It is important to understand that even though cmavo, lojbo, and lujvo were made up from parts of other gismu, they are now full-fledged gismu used in exactly the same way as all other gismu, both in grammar and in word formation.

The following three groups of gismu represent concepts drawn from the international language of science and mathematics. They are used for concepts that are represented in most languages by a root which is recognized internationally.

Small metric prefixes (values less than 1):

| Value | Prefix |
|-------|--------|
| .1 | deci |
| .01 | centi |
| .001 | milli |
| 10⁻⁶ | micro |
| 10⁻⁹ | nano |
| 10⁻¹² | pico |
| 10⁻¹⁵ | femto |
| 10⁻¹⁸ | atto |
| 10⁻²¹ | zepto |
| 10⁻²⁴ | yocto |

Large metric prefixes (values greater than 1):

| Value | Prefix |
|-------|--------|
| 10 | deka |
| 100 | hecto |
| 1000 | kilo |
| 10⁶ | mega |
| 10⁹ | giga |
| 10¹² | tera |
| 10¹⁵ | peta |
| 10¹⁸ | exa |
| 10²¹ | zetta |
| 10²⁴ | yotta |

Other scientific or mathematical terms:

The gismu sinso and tanjo were only made non-algorithmically because they were identical (having been borrowed from a common source) in all the dictionaries that had translations. The other terms in this group are units in the international metric system; some metric units, however, were made by the ordinary process (usually because they are different in Chinese).

Finally, there are the cultural gismu, which are also borrowed, but by modifying a word from one particular language, instead of using the multi-lingual gismu creation algorithm. Cultural gismu are used for words that have local importance to a particular culture; other cultures or languages may have no word for the concept at all, or may borrow the word from its home culture, just as Lojban does. In such a case, the gismu algorithm, which uses weighted averages, doesn't accurately represent the frequency of usage of the individual concept. Cultural gismu are not even required to be based on the six major languages.

The six Lojban source languages:

 Chinese (from “Zhōngguó”) English Hindi Spanish Russian Arabic

Seven other widely spoken languages that were on the list of candidates for gismu-making, but weren't used:

 Bengali Portuguese Bahasa Melayu/Bahasa Indonesia Japanese (from “Nippon”) German (from „Deutsch“) French (from « Français ») Urdu

(Urdu and Hindi began as the same language with different writing systems, but have now become somewhat different, principally in borrowed vocabulary. Urdu-speakers were counted along with Hindi-speakers when weights were assigned for gismu-making purposes.)

Countries with a large number of speakers of any of the above languages (where the meaning of large is dependent on the specific language):

 English: American British Scottish Australian Canadian

 Spanish: Argentinian Mexican

 Russian: Soviet/USSR Ukrainian

 Arabic: Palestinian Algerian Jordanian Libyan Lebanese Egyptian Moroccan Iraqi Saudi Syrian

 Bahasa Melayu/Bahasa Indonesia: Indonesian Malaysian

 Portuguese: Brazilian

 Urdu: Pakistani

The continents (and oceanic regions) of the Earth:

 North American (from berti merko) Antarctican (from cadzu cipni) South American (from “Quechua”) African Polynesian/Oceanic European Asiatic

A few smaller but historically important cultures:

 Latin/Roman Sanskrit Hebrew/Israeli/Jewish Greek (from «Hellas»)

Major world religions:

 Buddhist Taoist Islamic/Moslem Christian

A few terms that cover multiple groups of the above:

 Jehovist (Judeo-Christian-Moslem) Semitic Slavic Hispanic (New World Spanish)

## 4.16. rafsi fu'ivla: a proposal

The list of cultures represented by gismu, given in Section 4.15, is unavoidably controversial. Much time has been spent debating whether this or that culture deserves a gismu or must languish in fu'ivla space. To help defuse this argument, a last-minute proposal was made when this book was already substantially complete. I have added it here with experimental status: it is not yet a standard part of Lojban, since all its implications have not been tested in open debate, and it affects a part of the language (lujvo-making) that has long been stable, but is known to be fragile in the face of small changes. (Many attempts were made to add general mechanisms for making lujvo that contained fu'ivla, but all failed on obvious or obscure counterexamples; finally the general zei mechanism was devised instead.)

The first part of the proposal is uncontroversial and involves no change to the language mechanisms. All valid Type 4 fu'ivla of the form CCVVCV would be reserved for cultural brivla analogous to those described in Section 4.15. For example,

Example 4.75.

 tci'ile
 Chilean

is of the appropriate form, and passes all tests required of a Stage 4 fu'ivla. No two fu'ivla of this form would be allowed to coexist if they differed only in the final vowel; this rule was applied to gismu, but does not apply to other fu'ivla or to lujvo.
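The two constraints in this first part of the proposal can be sketched in code. The following is a minimal Python illustration, not a full morphology checker: it assumes that the apostrophe is ignored when matching the CCVVCV shape (as it must be for tci'ile to qualify), the function names are hypothetical, and the comparison word tci'ila is invented purely to show a clash. None of the other tests a fu'ivla must pass are attempted here.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def letter_shape(word):
    """Return the consonant/vowel pattern of a word, ignoring apostrophes."""
    shape = []
    for ch in word:
        if ch == "'":
            continue  # assumption: the apostrophe is not counted as a letter
        if ch in CONSONANTS:
            shape.append("C")
        elif ch in VOWELS:
            shape.append("V")
        else:
            shape.append("?")  # any other character can never match CCVVCV
    return "".join(shape)


def has_cultural_shape(word):
    """True if the word has the CCVVCV shape reserved by the proposal."""
    return letter_shape(word) == "CCVVCV"


def differ_only_in_final_vowel(a, b):
    """True if two distinct words of the reserved shape clash on the final-vowel rule."""
    return (
        has_cultural_shape(a)
        and has_cultural_shape(b)
        and a != b
        and a[:-1] == b[:-1]
    )


print(has_cultural_shape("tci'ile"))                     # True
print(has_cultural_shape("bloti"))                       # False (CCVCV)
print(differ_only_in_final_vowel("tci'ile", "tci'ila"))  # True (tci'ila is made up)
```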

The second, and fully experimental, part of the proposal is to allow rafsi to be formed from these cultural fu'ivla by removing the final vowel and treating the result as a 4-letter rafsi (although it would contain five letters, not four). These rafsi could then be used on a par with all other rafsi in forming lujvo. The tanru

Example 4.76.

 tci'ile ke canre tutra Chilean type-of-( sand territory)
 Chilean desert

could be represented by the lujvo

Example 4.77.

tci'ilykemcantutra

which is an illegal word in standard Lojban, but a valid lujvo under this proposal. There would be no short rafsi or 5-letter rafsi assigned to any fu'ivla, so no fu'ivla could appear as the last element of a lujvo.
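The derivation of Example 4.77 from Example 4.76 can be traced step by step. The sketch below is illustrative only and uses hypothetical function names; it assumes that the fu'ivla-derived rafsi is joined to what follows with the hyphen y, as the spelling of Example 4.77 suggests, and it takes the pieces kem, can, and tutra directly from that example rather than from a rafsi list.

```python
def rafsi_from_cultural_fuhivla(word):
    """Drop the final vowel, as the experimental part of the proposal describes."""
    if word[-1] not in "aeiou":
        raise ValueError("expected a word ending in a vowel")
    return word[:-1]


def proposed_lujvo(fuhivla, remaining_parts):
    """Join the fu'ivla-derived rafsi to the remaining lujvo parts."""
    # Assumption: the hyphen y follows the fu'ivla-derived rafsi,
    # matching the form shown in Example 4.77.
    return rafsi_from_cultural_fuhivla(fuhivla) + "y" + "".join(remaining_parts)


# kem (for ke), can (for canre) and the full gismu tutra are read off Example 4.77.
print(rafsi_from_cultural_fuhivla("tci'ile"))              # tci'il
print(proposed_lujvo("tci'ile", ["kem", "can", "tutra"]))  # tci'ilykemcantutra
```

Because only the form with the final vowel removed is available, the sketch offers no way to place a fu'ivla last in the lujvo, which mirrors the restriction stated above.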

The cultural fu'ivla introduced under this proposal are called rafsi fu'ivla, since they are distinguished from other Type 4 fu'ivla by the property of having rafsi. If this proposal is workable and introduces no problems into Lojban morphology, it might become standard for all Type 4 fu'ivla, including those made for plants, animals, foodstuffs, and other things.

# Chapter 5. “Pretty Little Girls' School”: The Structure Of Lojban selbri

## 5.1. Lojban content words: brivla

At the center, logically and often physically, of every Lojban bridi is one or more words which constitute the selbri. A bridi expresses a relationship between things: the selbri specifies which relationship is referred to. The difference between:

Example 5.1.

 do mamta mi You are-a-mother-of me.
 You are my mother.

and

Example 5.2.

 do patfu mi You are-a-father-of me.
 You are my father.

lies in the different selbri.

The simplest kind of selbri is a single Lojban content word: a brivla. There are three different varieties of brivla: those which are built into the language (the gismu), those which are derived from combinations of the gismu (the lujvo), and those which are taken (usually in a modified form) from other languages (the fu'ivla). In addition, there are a few cmavo that can act like brivla; these are mentioned in Section 5.9, and discussed in full in Chapter 7.

For the purposes of this chapter, however, all brivla are alike. For example,

Example 5.3.

 ta bloti That is-a-boat.
 That is a boat.

Example 5.4.

 ta brablo That is-a-large-boat.
 That is a ship.

Example 5.5.

 ta blotrskunri That is-a-(boat)-schooner.
 That is a schooner.

illustrate the three types of brivla (gismu, lujvo, and fu'ivla respectively), but in each case the selbri is composed of a single word whose meaning can be learned independent of its origins.

The remainder of this chapter will mostly use gismu as example brivla, because they are short. However, it is important to keep in mind that wherever a gismu appears, it could be replaced by any other kind of brivla.

## 5.2. Simple tanru

Beyond the single brivla, a selbri may consist of two brivla placed together. When a selbri is built in this way from more than one brivla, it is called a tanru, a word with no single English equivalent. The nearest analogues to tanru in English are combinations of two nouns such as lemon tree. There is no way to tell just by looking at the phrase lemon tree exactly what it refers to, even if you know the meanings of lemon and tree by themselves. As English-speakers, we must simply know that it refers to a tree which bears lemons as fruits. A person who didn't know English very well might think of it as analogous to brown tree and wonder, “What kind of tree is lemon-colored?”

In Lojban, tanru are also used for the same purposes as English adjective-noun combinations like big boy and adverb-verb combinations like quickly run. This is a consequence of Lojban not having any such categories as noun, verb, adjective, or adverb. English words belonging to any of these categories are translated by simple brivla in Lojban. Here are some examples of tanru:

Example 5.6.

 tu pelnimre tricu That-yonder is-a-lemon tree.
 That is a lemon tree.

Example 5.7.

 la .djan. barda nanla That-named John is-a-big boy.
 John is a big boy.

Example 5.8.

 mi sutra bajra I quick run
 I quickly run./I run quickly.

Note that pelnimre is a lujvo for lemon; it is derived from the gismu pelxu, yellow, and nimre, citrus. Note also that sutra can mean fast/quick or quickly depending on its use:

Example 5.9.

 mi sutra I am-fast/quick

shows sutra used to translate an adjective, whereas in Example 5.8 it is translating an adverb. (Another correct translation of Example 5.8, however, would be I am a quick runner.)

There are special Lojban terms for the two components of a tanru, derived from the place structure of the word tanru. The first component is called the seltau, and the second component is called the tertau.

The most important rule for use in interpreting tanru is that the tertau carries the primary meaning. A pelnimre tricu is primarily a tree, and only secondarily is it connected with lemons in some way. For this reason, an alternative translation of Example 5.6 would be:

Example 5.10.

That is a lemon type of tree.

This type of relationship between the components of a tanru is fundamental to the tanru concept.

We may also say that the seltau modifies the meaning of the tertau:

Example 5.11.

That is a tree which is lemon-ish (in the way appropriate to trees)

would be another possible translation of Example 5.6. In the same way, a more explicit translation of Example 5.7 might be:

Example 5.12.

John is a boy who is big in the way that boys are big.

This way that boys are big would be quite different from the way in which elephants are big; big-for-a-boy is small-for-an-elephant.

All tanru are ambiguous semantically. Possible translations of:

Example 5.13.

 ta klama jubme That is-a-goer type-of-table.

include:

• That is a table which goes (a wheeled table, perhaps).

• That is a table owned by one who goes.

• That is a table used by those who go (a sports doctor's table?).

• That is a table when it goes (otherwise it is a chair?).

In each case the object referred to is a “goer type of table”, but the ambiguous “type of” relationship can mean one of many things. A speaker who uses tanru (and pragmatically all speakers must) takes the risk of being misunderstood. Using tanru is convenient because they are short and expressive; the circumlocution required to squeeze out all ambiguity can require too much effort.

No general theory covering the meaning of all possible tanru exists; probably no such theory can exist. However, some regularities obviously do exist:

Example 5.14.

 do barda prenu You are-a-large person.

Example 5.15.

 do cmalu prenu You are-a-small person.

are parallel tanru, in the sense that the relationship between barda and prenu is the same as that between cmalu and prenu. Section 5.14 and Section 5.15 contain a partial listing of some types of tanru, with examples.

## 5.3. Three-part tanru grouping with bo

The following cmavo is discussed in this section:

 bo BO closest scope grouping

Consider the English sentence:

Example 5.16.

That's a little girls' school.

What does it mean? Two possible readings are:

Example 5.17.

That's a little school for girls.

Example 5.18.

That's a school for little girls.

This ambiguity is quite different from the simple tanru ambiguity described in Section 5.2. We understand that girls' school means a school where girls are the students, and not a school where girls are the teachers or a school which is a girl (!). Likewise, we understand that little girl means girl who is small. This is an ambiguity of grouping. Is girls' school to be taken as a unit, with little specifying the type of girls' school? Or is little girl to be taken as a unit, specifying the type of school? In English speech, different tones of voice, or exaggerated speech rhythm showing the grouping, are used to make the distinction; English writing usually leaves it unrepresented.

Lojban makes no use of tones of voice for any purpose; explicit words are used to do the work. The cmavo bo (which belongs to selma'o BO) may be placed between the two brivla which are most closely associated. Therefore, a Lojban translation of Example 5.17 would be:

Example 5.19.

 ta cmalu nixli bo ckule That is-a-small girl - school.

Example 5.18 might be translated:

Example 5.20.

 ta cmalu bo nixli ckule That is-a-small - girl school.

The bo is represented in the literal translation by a bracketed hyphen (not to be confused with the bare hyphen used as a placeholder in other glosses) because in written English a hyphen is sometimes used for the same purpose: a big dog-catcher would be quite different from a big-dog catcher (presumably someone who catches only big dogs).

Analysis of Example 5.19 and Example 5.20 reveals a tanru nested within a tanru. In Example 5.19, the main tanru has a seltau of cmalu and a tertau of nixli bo ckule; the tertau is itself a tanru with nixli as the seltau and ckule as the tertau. In Example 5.20, on the other hand, the seltau is cmalu bo nixli (itself a tanru), whereas the tertau is ckule. This structure of tanru nested within tanru forms the basis for all the more complex types of selbri that will be explained below.
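The nesting just described can be modelled as a small recursive data structure. The following Python sketch is only an illustration: the class and function names are hypothetical, and it captures grouping alone, saying nothing about the semantic relationship between seltau and tertau.

```python
from dataclasses import dataclass
from typing import Union

Selbri = Union[str, "Tanru"]  # a single brivla, or a tanru of two components


@dataclass(frozen=True)
class Tanru:
    seltau: Selbri  # the modifying component
    tertau: Selbri  # the component that carries the primary meaning


def head(selbri):
    """Follow tertau links down to the brivla that carries the primary meaning."""
    while isinstance(selbri, Tanru):
        selbri = selbri.tertau
    return selbri


def gloss(selbri):
    """Render the grouping with explicit type-of and parentheses."""
    if isinstance(selbri, str):
        return selbri
    parts = [
        f"({gloss(part)})" if isinstance(part, Tanru) else part
        for part in (selbri.seltau, selbri.tertau)
    ]
    return f"{parts[0]} type-of {parts[1]}"


ex_5_19 = Tanru("cmalu", Tanru("nixli", "ckule"))  # cmalu nixli bo ckule
ex_5_20 = Tanru(Tanru("cmalu", "nixli"), "ckule")  # cmalu bo nixli ckule

print(gloss(ex_5_19))  # cmalu type-of (nixli type-of ckule)
print(gloss(ex_5_20))  # (cmalu type-of nixli) type-of ckule
print(head(ex_5_19), head(ex_5_20))  # ckule ckule
```

In both groupings the final tertau is ckule, so both selbri describe a school; they differ only in what modifies what.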

What about Example 5.21? What does it mean?

Example 5.21.

 ta cmalu nixli ckule That is-a-small girl school.

The rules of Lojban do not leave this sentence ambiguous, as the rules of English do with Example 5.16. The choice made by the language designers is to say that Example 5.21 means the same as Example 5.20. This is true no matter what three brivla are used: the leftmost two are always grouped together. This rule is called the left-grouping rule. Left-grouping in seemingly ambiguous structures is quite common – though not universal – in other contexts in Lojban.

Another way to express the English meaning of Example 5.19 and Example 5.20, using parentheses to mark grouping, is:

Example 5.22.

 ta cmalu nixli bo ckule That is-a-small type-of (girl type-of school).

Example 5.23.

 ta cmalu bo nixli ckule That is-a-(small type-of girl) type-of school.

Because type-of is implicit in the Lojban tanru form, it has no Lojban equivalent.
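The left-grouping rule for a string of brivla with no bo amounts to a left fold over the words. A minimal Python sketch follows, with hypothetical function names; it represents each tanru as a (seltau, tertau) pair and expects at least one brivla.

```python
from functools import reduce


def left_group(brivla):
    """Group a bo-less string of brivla: the leftmost two always pair first."""
    return reduce(lambda grouped, nxt: (grouped, nxt), brivla)


def show(tree):
    """Render nested (seltau, tertau) pairs with parentheses."""
    if isinstance(tree, str):
        return tree
    seltau, tertau = tree
    return "(" + show(seltau) + " " + show(tertau) + ")"


print(show(left_group(["cmalu", "nixli", "ckule"])))  # ((cmalu nixli) ckule)
```

The output shows cmalu and nixli paired first, which is the grouping of Example 5.23.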

Note: It is perfectly legal, though pointless, to insert bo into a simple tanru:

Example 5.24.

 ta klama bo jubme That is-a-goer - table.

is a legal Lojban bridi that means exactly the same thing as Example 5.13, and is ambiguous in exactly the same ways. The cmavo bo serves only to resolve grouping ambiguity: it says nothing about the more basic ambiguity present in all tanru.

## 5.4. Complex tanru grouping

If one element of a tanru can be another tanru, why not both elements?

Example 5.25.

 do mutce bo barda gerku bo kavbu You are-a-(very type-of large) (dog type-of capturer).
 You are a very large dog-catcher.

In Example 5.25, the selbri is a tanru with seltau mutce bo barda and tertau gerku bo kavbu. It is worth emphasizing once again that this tanru has the same fundamental ambiguity as all other Lojban tanru: the sense in which the dog type-of capturer is said to be very type-of large is not precisely specified. Presumably it is his body which is large, but theoretically it could be one of his other properties.

We will now justify the title of this chapter by exploring the ramifications of the phrase pretty little girls' school, an expansion of the tanru used in Section 5.3 to four brivla. (Although this example has been used in the Loglan Project almost since the beginning – it first appeared in Quine's book Word and Object (1960) – it is actually a mediocre example because of the ambiguity of English pretty; it can mean beautiful, the sense intended here, or it can mean very. Lojban melbi is not subject to this ambiguity: it means only beautiful.)

Here are four ways to group this phrase:

Example 5.26.

 ta melbi cmalu nixli ckule That is-a-((pretty type-of little) type-of girl) type-of school.
 That is a school for girls who are beautifully small.

Example 5.27.

 ta melbi cmalu nixli bo ckule That is-a-(pretty type-of little) (girl type-of school).
 That is a girls' school which is beautifully small.

Example 5.28.

 ta melbi cmalu bo nixli ckule That is-a-(pretty type-of (little type-of girl)) type-of school.
 That is a school for small girls who are beautiful.

Example 5.29.

 ta melbi cmalu bo nixli bo ckule That is-a-pretty type-of (little type-of (girl type-of school)).
 That is a small school for girls which is beautiful.

Example 5.29 uses a construction which has not been seen before: cmalu bo nixli bo ckule, with two consecutive uses of bo between brivla. The rule for multiple bo constructions is the opposite of the rule when no bo is present at all: the last two are grouped together. Not surprisingly, this is called the right-grouping rule, and it is associated with every use of bo in the language. Therefore,

Example 5.30.

 ta cmalu bo nixli bo ckule That is-a-little type-of (girl type-of school).

means the same as Example 5.19, not Example 5.20. This rule may seem peculiar at first, but one of its consequences is that bo is never necessary between the first two elements of any of the complex tanru presented so far: all of Example 5.26 through Example 5.29 could have bo inserted between melbi and cmalu with no change in meaning.
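The interaction of the two rules can be checked mechanically. The Python sketch below is an illustration under simplifying assumptions, not a Lojban parser: it handles only brivla and bo, the function names are hypothetical, and each tanru is shown as a (seltau, tertau) pair. Brivla linked by bo are first collected into chains and grouped from the right; the resulting units are then grouped from the left.

```python
def group_with_bo(words):
    """Group a selbri made of brivla and bo into nested (seltau, tertau) pairs."""
    chains = []       # each chain holds brivla linked to one another by bo
    after_bo = False
    for w in words:
        if w == "bo":
            if not chains or after_bo:
                raise ValueError("bo must stand between two brivla")
            after_bo = True
        elif after_bo:
            chains[-1].append(w)  # continue the current bo-chain
            after_bo = False
        else:
            chains.append([w])    # start a new chain
    if after_bo or not chains:
        raise ValueError("incomplete selbri")

    def right_group(chain):
        # Within a bo-chain the last two brivla group first.
        tree = chain[-1]
        for w in reversed(chain[:-1]):
            tree = (w, tree)
        return tree

    # Across chains the ordinary left-grouping rule applies.
    units = [right_group(c) for c in chains]
    tree = units[0]
    for unit in units[1:]:
        tree = (tree, unit)
    return tree


def show(tree):
    if isinstance(tree, str):
        return tree
    return "(" + show(tree[0]) + " " + show(tree[1]) + ")"


for selbri in (
    "melbi cmalu nixli ckule",        # Example 5.26
    "melbi cmalu nixli bo ckule",     # Example 5.27
    "melbi cmalu bo nixli ckule",     # Example 5.28
    "melbi cmalu bo nixli bo ckule",  # Example 5.29
):
    print(show(group_with_bo(selbri.split())))
```

Under these assumptions, adding bo between melbi and cmalu in any of the four selbri yields the same tree as before, which is the consequence noted above.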

## 5.5. Complex tanru with ke and ke'e

The following cmavo are discussed in this section:

 ke KE start grouping
 ke'e KEhE end grouping

There is, in fact, a fifth grouping of pretty little girls' school that cannot be expressed with the resources explained so far. To handle it, we must introduce the grouping parentheses cmavo, ke and ke'e (belonging to selma'o KE and KEhE respectively). Any portion of a selbri sandwiched between these two cmavo is taken to be a single tanru component, independently of what is adjacent to it. Thus, Example 5.26 can be rewritten in any of the following ways:

Example 5.31.

 ta ke melbi cmalu ke'e nixli ckule That is-a-( pretty little ) girl school.

Example 5.32.

 ta ke ke melbi cmalu ke'e nixli ke'e ckule That is-a-( ( pretty little ) girl ) school.

Example 5.33.

 ta ke ke ke melbi cmalu ke'e nixli ke'e ckule ke'e That is-a-( ( ( pretty little ) girl ) school ).

Even more versions could be created simply by placing any number of ke cmavo at the beginning of the selbri, and a like number of ke'e cmavo at its end. Obviously, all of these are a waste of breath once the left-grouping rule has been grasped. However, the following is equivalent to Example 5.28 and may be easier to understand:

Example 5.34.

 ta melbi ke cmalu nixli ke'e ckule That is-a-( pretty type-of ( little type-of girl ) ) type-of school.
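The behaviour of ke and ke'e as grouping parentheses can be sketched with a short recursive routine. This is an illustration only, with hypothetical function names: it handles brivla, ke, and ke'e but not bo, represents each tanru as a (seltau, tertau) pair, and assumes that every ke is closed by an explicit ke'e.

```python
def group_with_ke(words):
    """Treat ke ... ke'e as parentheses and left-group everything else."""

    def parse(pos, inside_ke):
        tree = None
        while pos < len(words):
            w = words[pos]
            if w == "ke'e":
                if not inside_ke or tree is None:
                    raise ValueError("unexpected ke'e")
                return tree, pos + 1
            if w == "ke":
                unit, pos = parse(pos + 1, True)  # bracketed part is one unit
            else:
                unit, pos = w, pos + 1
            # Left-grouping: whatever has been built so far is the seltau.
            tree = unit if tree is None else (tree, unit)
        if inside_ke:
            raise ValueError("ke without a closing ke'e (assumed mandatory here)")
        if tree is None:
            raise ValueError("empty selbri")
        return tree, pos

    tree, _ = parse(0, False)
    return tree


plain = group_with_ke("melbi cmalu nixli ckule".split())                           # Example 5.26
ex_5_31 = group_with_ke("ke melbi cmalu ke'e nixli ckule".split())
ex_5_33 = group_with_ke("ke ke ke melbi cmalu ke'e nixli ke'e ckule ke'e".split())
ex_5_34 = group_with_ke("melbi ke cmalu nixli ke'e ckule".split())

print(plain == ex_5_31 == ex_5_33)  # True
print(ex_5_34)  # (('melbi', ('cmalu', 'nixli')), 'ckule')
```

The redundant brackets of Example 5.31 and Example 5.33 leave the left-grouped tree unchanged, while the brackets in Example 5.34 produce the grouping glossed for Example 5.28.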

Likewise, a ke and ke'e version of Example 5.27 would be:
