
Friendly Lojban

Chapter 14. Morphology & lujvo

Three Word Classes

Every Lojban word belongs to exactly one of three classes, identifiable by its shape alone:

cmavo — structure words
Short particles with no consonant clusters: cu, le, mi, pu, je, lo, .i. They handle grammar — articles, conjunctions, tense markers, etc. Forms: V, CV, VV, CVV.
brivla — predicate words
Content words that end in a vowel and contain a consonant pair within the first five non-y letters. Three subtypes: gismu, lujvo, fu'ivla.
cmene — proper names
End in a consonant (hence always followed by a pause): la .teris., la .alis., la .lojban..

This three-way distinction is unambiguous: you can always tell which class a word belongs to by looking at (or hearing) its shape.

Recognizing words in a stream

When splitting continuous Lojban text or speech into words, use morphology first (same tests as the parser):

  1. Pauses and quotes — A . before a vowel-initial word is a real pause; la before a name requires pauses around cmene; ' between vowels is /h/. (See Chapter 19.)
  2. cmene — Ends in a consonant; must be wrapped in pauses (and usually la).
  3. cmavo: no consonant cluster; shapes V, CV, VV, CVV and CV'V (for example .i, mi, .ai, nai, cu'i). If the letters could be a cmavo or the start of a brivla, the next letters decide.
  4. brivla — Has a permissible consonant pair in the first five letters (counting only non-y letters) and ends in a vowel.

ZOI / lo'u quotations and other non-Lojban fragments follow their own rules — see Chapter 17. For the full decision procedure, see the Word Recognition section in this chapter below — it covers the boundary cases you will actually encounter when reading.
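The shape tests above can be sketched for a single, already separated word. This is a minimal illustration, not the parser's actual procedure: `word_class` is a hypothetical helper, it ignores stress and pauses, and it does not check whether the consonant pair is a permissible one.

```python
VOWELS = set("aeiou")

def word_class(word: str) -> str:
    """Classify one already-separated Lojban word by its shape alone."""
    w = word.strip(".").lower()
    if not w:
        raise ValueError("empty word")
    # cmene: the last letter is a consonant.
    if w[-1] not in VOWELS and w[-1] != "y":
        return "cmene"
    # y and the apostrophe are not counted when looking for the pair.
    letters = [ch for ch in w if ch not in "y',"]
    head = letters[:5]
    for a, b in zip(head, head[1:]):
        if a not in VOWELS and b not in VOWELS:
            return "brivla"   # consonant pair within the first five letters
    return "cmavo"            # vowel-final and no early consonant pair

print(word_class("klama"), word_class("mi"), word_class(".alis."))
```

The order of the checks mirrors the list: names first, then the consonant-pair test that separates brivla from cmavo.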


gismu: Root Words

The ~1350 gismu are Lojban's primitive vocabulary. They are always exactly five letters long, always start with a consonant, always end in a single vowel, and always stress the first (penultimate) syllable.

Two shapes:

  • CVC/CV (written out: CVCCV): e.g. cukta, mamta, gerku
  • CCVCV — e.g. blanu, tricu, mlatu
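As a rough illustration, the two shapes can be checked with regular expressions. `gismu_shape` is a hypothetical helper; it tests the consonant/vowel pattern only and does not verify that the consonant pair is permissible or that the word is in the official gismu list.

```python
import re

C = "[bcdfgjklmnprstvxz]"   # Lojban consonants
V = "[aeiou]"               # vowels (y never appears in gismu)

GISMU_SHAPES = {
    "CVC/CV": re.compile(f"{C}{V}{C}{C}{V}"),
    "CCVCV": re.compile(f"{C}{C}{V}{C}{V}"),
}

def gismu_shape(word: str):
    """Return the name of the gismu shape the word matches, or None."""
    for name, pattern in GISMU_SHAPES.items():
        if pattern.fullmatch(word):
            return name
    return None

print(gismu_shape("cukta"))   # CVC/CV
print(gismu_shape("blanu"))   # CCVCV
```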

Each gismu comes from sounds in the six most-spoken natural languages (Mandarin, English, Hindi, Spanish, Russian, Arabic), blended to maximize recognizability across language backgrounds.

Examples:

| gismu | Meaning |
|---|---|
| klama | go/come (x₁ goes to x₂ from x₃ via x₄ by means x₅) |
| prenu | person |
| blanu | blue |
| melbi | beautiful |
| cukta | book |
| mamta | mother |
| patfu | father |
| gerku | dog |
| mlatu | cat |
| zdani | home/nest |

No two gismu differ only in their final vowel (ensuring they can't be confused). Gismu are the building blocks for all compound words.


rafsi: Word Pieces

Each gismu has 2–5 rafsi (combining forms) used to build compound words. Rafsi are not standalone words — they only appear inside lujvo.

Complete rafsi shape typology:

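| Shape | Letters | Example | Notes |
|---|---|---|---|
| CVC | 3 | ger-, ber- | Short rafsi; most common |
| CCV | 3 | bla-, kla- | Short; the consonant pair must be a permissible initial pair (the gismu itself need not start with CC: melbi → mle) |
| CV'V | 3 + apostrophe | ka'a, se'i | Short; two vowels separated by the h-sound |
| CVV | 3 | kai, mau | Short; vowel diphthong |
| CCVC or CVCC | 4 | klam-, gerk- | Long rafsi = gismu minus final vowel |
| CCVCV or CVCCV | 5 | klama, gerku | Long rafsi = the full gismu (only at end of lujvo) |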

Not every gismu has all six shapes — it depends on whether the gismu begins with CC, has a CV'V sequence, etc. Each gismu has at minimum a 4-letter and 5-letter long rafsi.
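A small sketch of how the shape labels in the table relate to actual strings. Both helpers are hypothetical names introduced for illustration; `long_rafsi` relies only on the fact stated above that every gismu has a 4-letter and a 5-letter long rafsi.

```python
VOWELS = set("aeiou")

def cv_shape(rafsi: str) -> str:
    """Spell out a rafsi as a C/V pattern, keeping apostrophes as-is."""
    return "".join(
        "'" if ch == "'" else ("V" if ch in VOWELS else "C") for ch in rafsi
    )

def long_rafsi(gismu: str):
    """Return the two long rafsi of a gismu: minus final vowel, and whole."""
    if len(gismu) != 5:
        raise ValueError("a gismu has exactly five letters")
    return gismu[:4], gismu

for r in ["ger", "bla", "ka'a", "mau", "klam", "klama"]:
    print(r, cv_shape(r))
print(long_rafsi("gerku"))
```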

The tosmabru test:

When a lujvo begins with one or more CVC rafsi, the leading CV of the first rafsi can sometimes be heard as a separate cmavo, with the rest of the word parsing as a shorter brivla. The test: look at every consonant pair where one rafsi joins the next. If all of those pairs are permissible initial pairs, the word would break apart, so insert a y-hyphen after the first CVC rafsi.

The test is named after its standard example. The hypothetical lujvo tosmabru (tos- + mabru, mammal) could be split as to + smabru, because sm is a permissible initial pair and smabru has the shape of a lujvo. The y prevents this: tosymabru.

In practice: a y also goes between any two rafsi whose consonants would otherwise form an impermissible pair at the join. The lujvo-making algorithm later in this chapter tells you when each hyphen is needed.
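A minimal sketch of the tosmabru test, restricted to the simplest case: every rafsi but the last is CVC and the last is a CVCCV long rafsi whose inner pair is itself a permissible initial. The list of initial pairs is the standard set of 48 as recalled from CLL, which should be treated as an assumption here; `fails_tosmabru` is a hypothetical name.

```python
VOWELS = set("aeiou")
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv".split()
)

def shape(s: str) -> str:
    return "".join("'" if c == "'" else ("V" if c in VOWELS else "C") for c in s)

def fails_tosmabru(rafsi: list) -> bool:
    """True if CVC...CVC + CVCCV would split into a cmavo plus a brivla."""
    *front, last = rafsi
    if not front or any(shape(r) != "CVC" for r in front):
        return False
    if shape(last) != "CVCCV" or last[2:4] not in INITIAL_PAIRS:
        return False
    # Joints: the consonant pairs where one rafsi meets the next.
    joints = [a[-1] + b[0] for a, b in zip(rafsi, rafsi[1:])]
    return all(j in INITIAL_PAIRS for j in joints)

print(fails_tosmabru(["tos", "mabru"]))   # True, so write tosymabru
print(fails_tosmabru(["mud", "siclu"]))   # False: ds is not an initial pair
```

Cases where the part after the CVC run is introduced by a y-hyphen are left out of this sketch.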

Common rafsi for frequently used gismu:

| gismu | Short rafsi |
|---|---|
| klama | kla |
| prenu | pre |
| blanu | bla |
| melbi | mel, mle |
| mamta | mam |
| patfu | paf, pa'u |
| zmadu | zma, mau |
| mlatu | lat |
| gerku | ger, ge'u |
| zdani | zda |

lujvo: Compound Words

A lujvo is built by chaining rafsi together. It encodes a tanru (metaphorical combination) as a single unambiguous word with a fixed definition.

Process:

  1. Identify the tanru: e.g. skami pilno (computer user)
  2. Find rafsi: skami → sam-, pilno → pli or -pilno
  3. Chain: sampli (computer-user)

More examples:

| Tanru | Lujvo | Meaning |
|---|---|---|
| barda bloti | brabloti | ship (big boat) |
| mamta patfu | mampa'u | maternal grandfather |
| zdani mlatu | zdamlatu | house cat |
| bridi valsi | brivla | predicate word |
| zunle jamfu | zuljma | left foot |
| skami pilno | sampli | computer user |

Unlike tanru (which are semantically vague), each lujvo has one specific fixed meaning. When you dictionary-define a lujvo, you lock in which interpretation of the underlying tanru it means.


Hyphen Letters

When chaining rafsi, every consonant pair at a join must be permissible, and the result must parse as a single word. Lojban uses hyphen letters to ensure this:

y-hyphen: inserted after a consonant-final rafsi when needed to prevent an impermissible consonant pair or a word-boundary ambiguity:

pante tavla → patyta'a (not patta'a: tt is impermissible)

mudri siclu → mudysiclu (not mudsiclu: ds pairs a voiced consonant with an unvoiced one)

r-hyphen / n-hyphen: inserted after an initial CVV or CV'V rafsi, which would otherwise fall off as a separate cmavo (the exception is a two-rafsi lujvo whose second rafsi is CCV, which already supplies the cluster):

soi + sai → soisai would be heard as two cmavo, so it needs an r-hyphen: soirsai

When the following rafsi starts with r, use n instead: ro'i + re'o → ro'inre'o
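The hyphen rules can be sketched as a small joining function. This is an illustration under stated assumptions, not a full lujvo maker: the permissible-pair rules are the standard ones as recalled from CLL (no doubled consonants, no voiced/unvoiced mix except with l m n r, no two of c j s z, and the pairs cx kx xc xk mz forbidden), and the tosmabru test and the n + dj/dz/tc/ts rule are left out. The function names are hypothetical.

```python
VOWELS = set("aeiou")
VOICED, UNVOICED = set("bdgvjz"), set("ptkfcsx")
SIBILANTS = set("cjsz")
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}

def shape(s: str) -> str:
    return "".join("'" if c == "'" else ("V" if c in VOWELS else "C") for c in s)

def permissible_pair(a: str, b: str) -> bool:
    if a == b:
        return False
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False
    if a in SIBILANTS and b in SIBILANTS:
        return False
    return a + b not in FORBIDDEN

def join_rafsi(rafsi: list) -> str:
    parts = [rafsi[0]]
    # r/n-hyphen after an initial CVV or CV'V rafsi, except CVV + CCV.
    if shape(rafsi[0]) in ("CVV", "CV'V") and len(rafsi) > 1:
        if not (len(rafsi) == 2 and shape(rafsi[1]) == "CCV"):
            parts.append("n" if rafsi[1][0] == "r" else "r")
    for prev, nxt in zip(rafsi, rafsi[1:]):
        four_letter = shape(prev) in ("CVCC", "CCVC")
        if prev[-1] not in VOWELS and (
            four_letter or not permissible_pair(prev[-1], nxt[0])
        ):
            parts.append("y")   # y-hyphen at a bad or 4-letter join
        parts.append(nxt)
    return "".join(parts)

print(join_rafsi(["pat", "ta'a"]))    # patyta'a
print(join_rafsi(["soi", "sai"]))     # soirsai
print(join_rafsi(["ro'i", "re'o"]))   # ro'inre'o
print(join_rafsi(["sam", "pli"]))     # sampli
```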


zei: Ad-hoc lujvo

When you need to form a lujvo-equivalent from words that have no rafsi (especially cmavo or fu'ivla), use zei as a joiner:

bridi zei valsi = brivla (exact equivalent)

by. zei livgyterbilma = B-type liver disease, that is, hepatitis B (where by. is the letter B)

zei lets you create compound predicates from any words, including borrowed terms.


fu'ivla: Borrowed Words

fu'ivla (copy-words) are loanwords for concepts that don't fit neatly into Lojban's gismu system — biological species, foods, technical jargon, cultural terms.

The four formal stages:

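| Stage | Form | Example for "spaghetti" | Notes |
|---|---|---|---|
| 1 | Raw foreign word in a la'o quote (with me in front to use it as a predicate) | me la'o gy. spaghetti .gy. | Always works; no Lojbanization needed |
| 2 | Lojbanized as a cmene (name) | me la spagetis. | Grammatically a name; me turns it into a predicate |
| 3 | Lojbanized stem with a classifier rafsi and hyphen in front | cidjrspageti (cidj- from cidja, food) | The standard fu'ivla; must have brivla morphology |
| 4 | Lojbanized stem with no classifier | (no form given here) | Shortest form, but each word must be checked by hand against the brivla rules; rarely made |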

Stage 3 structural requirements (the most important):

A stage 3 fu'ivla must:

  1. Be a brivla by shape without having the form of a gismu or a lujvo (and without being a cmavo or cmene).
  2. Have a consonant cluster within the first five letters (this is what makes it a brivla rather than a string of cmavo).
  3. End in a vowel.
  4. Not break apart into, or be mistaken for, other Lojban words.

Since many borrowed words don't naturally have a CC cluster early, a rafsi prefix is prepended to force the shape. The rafsi ends in a consonant and is joined to the Lojbanized stem with a hyphen consonant (r, or n when an r is adjacent), so the join always creates a cluster:

cidj- (4-letter rafsi of cidja, food) + r + spageti → cidjrspageti (spaghetti)

Not every familiar-sounding word is a fu'ivla. tcati (tea), ckafi (coffee) and blaci (glass, the material) are ordinary gismu whose forms echo natural-language words; their consonant clusters are part of the five-letter gismu shape.

The prefixed rafsi also works as a classifier that tells the listener what kind of thing the word names:

sinc- (4-letter rafsi of since, snake) + r + kobra → sincrkobra (cobra)

The choice of rafsi is semantic — it hints at the word's domain — but is otherwise flexible.
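A minimal sketch of the classifier-prefix construction, restricted to consonant-final rafsi as described above. The hyphen choice (r, then n, then l when the earlier letter would sit next to a copy of itself) follows the convention as recalled from CLL and should be read as an assumption; `stage3_fuhivla` is a hypothetical helper, and it does not run the remaining brivla checks.

```python
VOWELS = set("aeiou")

def stage3_fuhivla(classifier_rafsi: str, stem: str) -> str:
    """Prefix a consonant-final classifier rafsi to a Lojbanized stem."""
    if classifier_rafsi[-1] in VOWELS:
        raise ValueError("this sketch only handles consonant-final rafsi")
    if stem[-1] not in VOWELS:
        raise ValueError("a fu'ivla must end in a vowel")
    # Pick the first hyphen consonant that does not double a neighbor.
    for hyphen in "rnl":
        if hyphen != classifier_rafsi[-1] and hyphen != stem[0]:
            return classifier_rafsi + hyphen + stem
    raise ValueError("no usable hyphen consonant")

print(stage3_fuhivla("cidj", "spageti"))   # cidjrspageti
print(stage3_fuhivla("sinc", "kobra"))     # sincrkobra
```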

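Examples of stage 3 fu'ivla (as given in CLL):

  • cidjrspageti: spaghetti (classifier cidja, food)
  • sincrkobra: cobra (classifier since, snake)
  • cirlrbri: brie (classifier cirla, cheese)
  • ricrxaceru: maple (classifier tricu, tree; from the genus name Acer)

Note that tcati (tea), ckafi (coffee), patxu (pot), blaci (glass), mledi (mold/fungus) and xarju (pig) are gismu, not fu'ivla.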

Word Recognition Algorithm

Because of these strict morphological rules, any string of Lojban sounds can be unambiguously segmented into words without spaces. The shapes uniquely identify word boundaries:

  • cmavo: short, no consonant cluster
  • gismu: exactly 5 letters, consonant cluster, ends in vowel
  • lujvo: 6+ letters, consonant cluster in first 5, ends in vowel
  • cmene: ends in consonant (pause follows)

This means Lojban speech is unambiguous at the word level before you even consider meaning.


cmene: Lojbanization Rules in Detail

Lojban names (cmene) must end in a consonant and be surrounded by pauses. Beyond those basics, the full rules are:

Consonant clusters inside cmene: Every consonant pair inside the name must be permissible by Lojban phonology rules (the same rules as gismu and lujvo). Impermissible clusters require a buffer vowel (usually y, i, or u) inserted between them.

Names ending in a vowel: Add a consonant, typically s or n:

  • Mary → .meris. or .merin.
  • Joe → .djos.
  • Sue → .sus.

Stress: The default is penultimate stress. Non-standard stress is marked by capitalizing the stressed syllable (or just its vowel):

  • Ivan with Russian stress on the last syllable → .iVAN.; plain .ivan. is read with the default penultimate stress, on the i
  • A name like .karlos. naturally stresses kar (penultimate of the two syllables)

Lojbanization strategy:

  1. Identify the source pronunciation (not spelling).
  2. Map each sound to the nearest Lojban phoneme.
  3. Resolve impermissible clusters by inserting buffer vowels.
  4. Ensure the result ends in a consonant.
  5. Add a pause mark (period) before and after.
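Steps 4 and 5 are mechanical and can be sketched in a few lines. The sound mapping and cluster repair of steps 1 to 3 need human judgment and are not attempted here; `finish_cmene` and its `pad` parameter are hypothetical names.

```python
VOWELS = set("aeiouy")

def finish_cmene(sounds: str, pad: str = "s") -> str:
    """Make a Lojbanized name end in a consonant and wrap it in pauses."""
    name = sounds.strip(".")
    if name[-1].lower() in VOWELS:
        name += pad            # step 4: add a final consonant
    return f".{name}."         # step 5: pause marks on both sides

print(finish_cmene("meri"))        # .meris.
print(finish_cmene("meri", "n"))   # .merin.
print(finish_cmene("djan"))        # .djan.
```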

Examples:

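| Source | Lojban form | Notes |
|---|---|---|
| John | .djan. | English /dʒɑn/ → dj + a + n |
| Alice | .alis. | straightforward |
| George | .djordj. | /dʒɔrdʒ/ → dj at both ends |
| Zhang | .jang. | Chinese /ʈʂɑŋ/ → j + a + n + g (Lojban has no single letter for ŋ) |
| Smith | .smit. | th has no Lojban equivalent; t (or s) is the nearest sound |
| Nguyen | .nguin. | n + g is a permissible consonant pair |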

When a name could parse as a brivla: If a name's shape matches brivla morphology (CC cluster, ends in vowel), add a final consonant to force cmene parsing. For example, a character named Prenu ("Person") would need to be .prenus. to avoid being parsed as the gismu prenu.


Rules for Inserting Pauses

Pauses are mandatory (not just recommended) in seven situations:

  1. Before any word that begins with a vowel, including a vowel-initial cmene: The vowel would otherwise attach to the previous word. Write a period (.) before the word: .alis., .ivan..

  2. After every cmene: The final consonant needs a clear pause boundary: la .djan. not la .djana.

  3. Before and after SI/SA/SU (erasure words): These words erase what came before; they must be clearly bounded to avoid erasing the wrong thing.

  4. After ZO (single-word quoter): zo quotes the immediately following word; a pause after the quoted word ends the quotation: zo .djan. cu cmene = the word "john" is a name.

  5. Around ZOI and LA'O delimiters: The delimiter word before and after the foreign text must be surrounded by pauses so the parser knows it's a delimiter, not regular speech.

  6. After text that ends in a consonant cluster (if the next word begins with a consonant): To prevent the cluster from appearing to bridge into the next word.

  7. Around embedded non-Lojban text: Before and after any foreign-language passage embedded in Lojban speech.


Lujvo Place Structures: Selecting What Matters

A lujvo is built from a tanru, but its place structure needs to be determined — which places of the component gismu should survive in the final lujvo?

The standard method: take the place structure of the tertau (the main gismu, the last in the tanru), then add important places from the seltau where needed.

Example: mamta patfu (maternal grandfather)

  • mamta (mother): x₁ is mother of x₂
  • patfu (father): x₁ is father of x₂
  • The lujvo mampa'u: x₁ is the father of x₂, who is the mother of x₃ (so x₁ is a maternal grandfather of x₃)

The x₂ place of patfu (the child) and the x₁ place of mamta (the mother) refer to the same person, so they merge into a single place of the lujvo.

Merged and dependent places: when a place of the seltau refers to the same entity as a place of the tertau, the two merge into one place of the lujvo. A seltau place whose value is already determined by another place (a dependent place) can usually be left out altogether.

balsoi (great soldier, from banli + sonci)

  • sonci x₁ is a soldier of army x₂
  • banli x₁ is great in property x₂ by standard x₃
  • x₁ of banli and x₁ of sonci are the same entity, so they merge
  • So: balsoi x₁ is a great soldier of army x₂ in property x₃ by standard x₄

Symmetrical vs. asymmetrical lujvo:

In a symmetrical lujvo, the first place of the seltau and the first place of the tertau describe the same individual:

datpre (different person, from drata + prenu): x₁ is both a person and something different from x₂ by standard x₃

balsoi (great soldier): the same x₁ is both great and a soldier

In an asymmetrical lujvo, the overlap involves some other place, so the two components describe different things:

zdamlatu (house cat): the cat (x₁ of mlatu) is the inhabitant (x₂ of zdani), not the house

In both kinds, the seltau narrows the tertau's meaning.


Comparatives and Superlatives

Lojban expresses comparison through specific gismu and BAI particles, not through inflection:

zmadu: x₁ exceeds x₂ in property x₃ by amount x₄
mleca: x₁ is less than x₂ in property x₃ by amount x₄
dunli: x₁ equals x₂ in property x₃

mi zmadu do le ka barda
I exceed you in the property of being big. = I am bigger than you.

le plise cu mleca le perli le ka titla
The apple is less sweet than the pear.

The BAI shorthand (same meaning as zmadu / mleca, but attaching to another selbri; se conversion is often clearer than bare mau/me'a, see Chapter 10):

mi barda semau do
I am big, more than you. = I am bigger than you.

mi barda seme'a do
I am big, less than you. = I am less big than you.

Compact comparative lujvo (-mau, -me'a) — citmau, citme'a, nelcymau, klamau, and friends — plus zenba/jdika for “more than before” and traji-based extremes (citrai, balrai), are spelled out with CLL pointers in Chapter 12 (section Comparative lujvo). This section’s zmadu / mleca / traji material is the tanru-level companion.

Superlatives use traji. Places: x₁ = the extreme individual; x₂ = property (ka); x₃ = which extreme (defaults to “more”, i.e. ka zmadu); x₄ = the comparison set.

le traji be le ka barda bei zo'e bei le'i prenu
The one who is most big among the set of people.

Relative clause (note ke'a for x₁ of traji, and zo'e for x₃ so x₄ can be the set):

le prenu poi ke'a traji le ka ce'u barda ku zo'e le'i prenu
The person who is biggest among the people.

The compound verai (from ve + rai) tags traji’s fourth place — “superlative among …” — and is often the clearest shortcut:

le prenu cu barda verai le'i prenu
The person is biggest among the people.

The bare rai cmavo tags traji’s first place (“with superlative …”); for “among a set”, prefer verai or an explicit traji sentence.


Notes on gismu place structures

Unlike lujvo guidelines, gismu places were fixed case by case (the list is now frozen). A few pressures shaped them — the same ones that also influence sensible lujvo design:

| Pressure | Effect |
|---|---|
| Brevity | Fewer places = easier to learn but less specific; gismu aim for broad coverage. |
| Convenience | Extra places avoid coining new brivla when a slot already fits a common need. |
| Metaphysical necessity | Keep a place only if it is essential to the concept; drop it if instances need not vary there. |
| Regularity | Related gismu tend to share parallel places (e.g. breed/species on animals). |

Worked examples (CLL-style):

  • xekri — only “x₁ is black”: color is subjective; no “objective standard” place (ci'u or a lujvo can add one).
  • jbena — time and location places exist so le te jbena / le ve jbena are simple terms (birthday, birthplace), even though tense tags usually carry time/place for other bridi.
  • rinka — x₁ causes x₂; no agent place, because causes need not involve someone doing something (use gasnu / lujvo when you need an agent).
  • cinfo — x₂ breed exists for regularity across animal/plant gismu, even when the species is not very diverse.

Ordering habits (not strict rules): places are often ordered by salience — e.g. klama puts the goer before the route. When both appear, destination tends to come before origin. “Under conditions” / “by standard” slots are often last.

ckaji (has / is characterized by) — important for property talk and adjacent to comparatives: x₁ is the entity, x₂ is the property (usually le ka …).

le gerku cu ckaji le ka xunre
The dog has the property of being red.

For machine-checkable place types and glosses, see the project’s typed gismu reference (and the underlying formal-_gismu_.tsv in the source tree).


Lujvo Place Structures

When you form a lujvo, you do not simply inherit the place structure of the tanru it came from. A tanru always carries the place structure of its right-hand word (the tertau), but a lujvo needs to take all of its components into account. This section explains how to think about which places a lujvo should have and in what order.

The seltau and tertau

In a two-part tanru — and therefore in a two-part lujvo — the left component is called the seltau (modifier) and the right component is called the tertau (head). The overall concept is a type of whatever the tertau describes, modified by the seltau.

For example, in gerku zdani (dog house), zdani is the tertau (it's a type of house) and gerku is the seltau (the dog part is the modifier). The resulting lujvo gerzda describes a kind of zdani, not a kind of gerku.

How a lujvo gets its meaning

A tanru is deliberately vague: gerku zdani just means "some house that has something to do with some dog." The relationship between the seltau and tertau is left open. A lujvo, by contrast, locks in one specific interpretation. The lujvo-maker picks the most useful and most obvious relationship.

Almost always, the best relationship is found by noticing that one place of the seltau refers to the same thing as one place of the tertau. For gerzda:

  • zdani: z1 is a house for inhabitant z2
  • gerku: g1 is a dog of breed g2

A dog living in a house means z2 (the inhabitant) is the same as g1 (the dog). That overlap is the relationship. Since they refer to the same thing, that place only needs to appear once in the lujvo — it is merged.

So the tentative place structure of gerzda becomes:

z1 is a house for dog z2=g1 of breed g2

Dependent places

A place is dependent on another if you can predict its value once the other is known. For gerku, g2 (the breed) is dependent on g1 (the dog): once you know which specific dog you're talking about, the breed is determined. Dependent places that come from the seltau can often be dropped from the lujvo's place structure.

So in gerzda, the breed place g2 gets dropped — you're describing a doghouse, not a dog, so the breed is incidental. The final place structure is simply:

z1 is a house for dog z2

However, there's an important exception: dependent places that come from the tertau are kept. The tertau defines what kind of thing the lujvo is, and dropping its places would make the lujvo too different from the base word. If dropping a tertau place seems necessary, it's usually a sign that you've chosen the wrong tertau.

Sometimes a dependent place from the seltau is still important to keep. If you were making a lujvo for school building (kuldi'u, from ckule dinju), you'd want to keep the subject of the school even though it's technically dependent on the school identity, because music school building and elementary school building are very different.

Symmetrical and asymmetrical lujvo

When the overlap is between the first place of the seltau and the first place of the tertau — both components describing the same individual — the lujvo is called symmetrical.

Example: balsoi (great soldier), from banli sonci:

  • banli: b1 is great in property b2 by standard b3
  • sonci: s1 is a soldier of army s2

Here b1 = s1 (the same person is both great and a soldier). That's the symmetrical pattern.

When the first place of the seltau matches some other place of the tertau, the lujvo is asymmetrical.

Example: gerzda above — g1 (first place of gerku) matches z2 (second place of zdani), not z1. The lujvo is about the house, not the dog.

In principle, any asymmetrical lujvo could be made symmetrical by applying a SE conversion to one component. gerzda (asymmetrical) could be replaced by gerselzda (symmetrical: dog-housed-in), but that would make the first place the dog rather than the house, which is backwards for the meaning doghouse. Shorter and more direct is usually better.

Ordering the places

Once you've selected which places survive, you need to arrange them in a sensible order. The rules are:

For symmetrical lujvo: tertau places come first, then any surviving seltau places.

Example: balsoi place structure:

b1=s1 is a great soldier of army s2 in property b2 by standard b3

The tertau (sonci) places come first: s2 (army). Then the surviving seltau (banli) places: b2 (property), b3 (standard).

For asymmetrical lujvo: the seltau places are inserted immediately after the tertau place they share. Remaining tertau places follow after.

Example: dalmikce (veterinarian, from danlu mikce — animal doctor):

  • danlu: d1 is an animal of species d2
  • mikce: m1 is a doctor to patient m2 for ailment m3 using treatment m4

Here d1 = m2 (the animal is the patient). Place structure:

m1 is a doctor for animal m2=d1 of species d2 for ailment m3 using treatment m4

After the shared place m2=d1, the remaining seltau place d2 (species) is inserted, then the remaining tertau places m3 and m4.
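The two ordering rules can be sketched as a function over lists of place labels. This is an illustrative model of the guidelines above, not an official tool: `lujvo_places` is a hypothetical name, `shared` gives the positions of the overlapping place in each component, and `drop` lists dependent seltau places to leave out.

```python
def lujvo_places(seltau, tertau, shared, drop=()):
    """Order lujvo places; shared = (index in seltau, index in tertau)."""
    s_idx, t_idx = shared
    places = list(tertau)
    places[t_idx] = f"{tertau[t_idx]}={seltau[s_idx]}"   # merged place
    rest = [p for i, p in enumerate(seltau) if i != s_idx and p not in drop]
    if s_idx == 0 and t_idx == 0:
        # Symmetrical: all tertau places first, then surviving seltau places.
        return places + rest
    # Asymmetrical: surviving seltau places go right after the shared place.
    return places[: t_idx + 1] + rest + places[t_idx + 1 :]

# balsoi: banli (seltau) + sonci (tertau), overlap b1 = s1
print(lujvo_places(["b1", "b2", "b3"], ["s1", "s2"], (0, 0)))
# dalmikce: danlu (seltau) + mikce (tertau), overlap d1 = m2
print(lujvo_places(["d1", "d2"], ["m1", "m2", "m3", "m4"], (0, 1)))
# gerzda: the dependent breed place g2 is dropped
print(lujvo_places(["g1", "g2"], ["z1", "z2"], (0, 1), drop={"g2"}))
```

The merged label is written tertau-first (for example s1=b1), which names the same place as b1=s1 in the text.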

Lujvo with more than two parts

Multi-part lujvo are easiest to understand as nested binary tanru. Treat the whole lujvo as having two components, where one or both of those components may themselves be lujvo.

Example: bavlamdei (tomorrow), from balvi (future) + lamji (adjacent) + djedi (day). Think of it as bavla'i (next-after) + djedi (day), where bavla'i is itself an intermediate lujvo.

Build the place structure by composing the component place structures in the same way as for two-part lujvo, working from the inside out.

Eliding SE rafsi from the seltau

It is very common to drop the rafsi for SE conversion words (se, te, ve, xe) from the seltau of a lujvo, producing a shorter word. This is generally safe when the intended interpretation is clear and the alternative (without SE) would be implausible.

Example: ti'ifla (bill, proposed law), from stidi flalu (suggest + law). The second place of stidi (what is suggested) lines up with the first place of flalu (the law), but that means we'd normally need selti'i (suggested-thing) as the seltau. ti'ifla drops the sel- but still carries the same place structure as selti'ifla would have.

The convention is: give such lujvo the place structure they would have with the appropriate SE inserted. Just be aware that ambiguity is possible if another interpretation is equally plausible.

Eliding SE rafsi from the tertau — don't!

Dropping SE from the tertau is much more dangerous and should generally be avoided.

Consider translating "Jack is blue-eyed". You might be tempted to use blakanla (from blanu kanla, blue + eye). But Jack is not an eye; he has eyes. The correct tertau is selkanla (bearer-of-eyes). Using the wrong tertau produces a lujvo whose first place is the eye, not the person with the eye, which means you'd always need se blakanla to get to the right referent. Instead, use blaselkanla with the SE made explicit.

Eliding KE and KEhE rafsi

Grouping cmavo ke and ke'e are often dropped from lujvo for brevity. This is usually fine when the correct grouping is obvious from context or plausibility.

Example: zernerkla (to sneak in) almost certainly comes from zekri ke nenri klama (crime-(inside-go)), since zekri nenri (crime-inside) makes little sense as a unit. The dropped ke doesn't cause confusion here.

However, be careful when the alternative grouping is also plausible — two different lujvo with different meanings can result from the same rafsi sequence depending on how the implicit grouping is read.

Note: if you want to apply a scalar negation (na'e, to'e) or SE conversion to an entire lujvo, it is safer to keep them as two words or use an explicit ke rafsi rather than just prepending the conversion rafsi.

Abstract lujvo

NU abstractors (nu, ka, ni, du'u, etc.) can participate in lujvo construction. When they do, all the places of the abstracted predicate become extra places of the lujvo, shifted down by one to leave room for the abstraction event place at position x₁.

Example: nunkla (from nu klama, event-of-going):

nu1 is the event of k1's going to k2 from k3 via k4 by means k5

The nu place comes first (x₁ = the event), then all five places of klama follow as x₂–x₆.

For abstractors that have a second place (like ni, where x₂ is the measurement scale), that second place is placed after all the predicate places rather than before them.
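The place-shifting convention for abstract lujvo can be written down directly. A minimal sketch with hypothetical names; place labels are plain strings.

```python
def abstraction_places(abstractor, predicate_places, has_second_place=False):
    """Place list for an abstract lujvo such as nunkla (nu + klama)."""
    # x1 is the abstraction itself; the predicate's places follow in order.
    places = [f"{abstractor}1", *predicate_places]
    if has_second_place:
        # An abstractor's own second place goes after the predicate places.
        places.append(f"{abstractor}2")
    return places

print(abstraction_places("nu", ["k1", "k2", "k3", "k4", "k5"]))
print(abstraction_places("ni", ["b1"], has_second_place=True))
```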

The rafsi jax- corresponds to jai. When used in a lujvo, any fai place remains a fai place of the lujvo and does not participate in the numbered place structure.

Abstract lujvo are a common and productive pattern. English words ending in -hood, -ness, or -dom often map to nun- lujvo (from nu) or kam- lujvo (from ka): kambla = blueness.

Implicit-abstraction lujvo

A particularly important pattern arises when the seltau effectively serves as the selbri of an event abstraction that fills a place of the tertau — and that abstraction relationship is not spelled out, but is instead deducible from the semantics.

Example: ctigau (to feed), from citka gasnu (eat + agent-of). The place structure of gasnu requires its g2 place to be an event. If the seltau is citka (to eat), the listener can deduce that an event of eating is involved, even though the nu abstractor rafsi is absent. The final place structure is:

g1 (agent) causes c1 to eat c2

This is equivalent to the more explicit but wordier nunctikezgau, but shorter and equally clear in context.

Other gismu with event places (rinka, basti, galfi, jgina, etc.) can form implicit-abstraction lujvo the same way. For example, likygau (to liquefy):

g1 causes l1 to be liquid of composition l2 under conditions l3

Use implicit-abstraction lujvo when the implicit event is unambiguously recoverable. If the symmetrical interpretation (both an agent and the thing being described) is equally plausible, the implicit-abstraction reading can be confusing.

Anomalous lujvo

Some lujvo in common use don't perfectly follow the guidelines above — either because the seltau-tertau overlap is indirect, or because the veljvo doesn't fully capture the relationship. lange'u (sheepdog) is a classic example: a sheepdog is neither a sheep-breed dog nor a sheep that is a dog. Its real meaning is dog that controls a sheep flock, which requires a third component (jitro, to control) not present in the rafsi sequence. The shorter form lange'u is used as an abbreviation for the fuller but unwieldy terlantroge'u, and it inherits that longer lujvo's place structure.

Anomalous lujvo are acceptable and common — just be aware that they require more interpretive effort from the listener and should ideally have their place structure clearly documented.


The Lujvo-Making Algorithm

Given a tanru to turn into a lujvo, the formal process is:

  1. For every component except the last, choose a 3-letter or 4-letter rafsi.
  2. For the last component, choose a vowel-final short rafsi (CVV, CV'V or CCV) or the 5-letter (long) rafsi.
  3. Join the rafsi into a single string.
  4. Insert hyphen letters where required (see the rules in the Hyphen Letters section above): an r- or n-hyphen after an initial CVV or CV'V rafsi, and a y-hyphen after every 4-letter rafsi and between any impermissible consonant pair.
  5. The tosmabru test: if the lujvo begins with one or more CVC-form rafsi followed by another CVC-form rafsi, check that the sequence cannot be misread as a cmavo followed by a shorter lujvo. If it can, insert a y-hyphen or choose a different rafsi.

The algorithm was designed to be implementable by computer, and lujvo-making software can generate all valid forms automatically.
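Steps 1 to 3 amount to taking one rafsi per component and forming every combination. A minimal sketch, assuming a tiny hand-made rafsi table (the entries for gerku and zdani come from the rafsi list earlier in this chapter); hyphen insertion and the tosmabru test are deliberately left out, and `candidates` is a hypothetical name.

```python
from itertools import product

VOWELS = set("aeiou")
# Short rafsi only; the 4- and 5-letter forms are derived from the gismu.
SHORT_RAFSI = {"gerku": ["ger", "ge'u"], "zdani": ["zda"]}

def candidates(tanru):
    """List every rafsi sequence allowed by steps 1 and 2."""
    *front, last = tanru
    # Non-final components: any short rafsi or the 4-letter rafsi.
    choices = [SHORT_RAFSI[g] + [g[:4]] for g in front]
    # Final component: a vowel-final short rafsi or the full gismu.
    choices.append([r for r in SHORT_RAFSI[last] if r[-1] in VOWELS] + [last])
    return [list(combo) for combo in product(*choices)]

for combo in candidates(["gerku", "zdani"]):
    print("-".join(combo))
```

For gerku zdani this yields six rafsi sequences, one for each candidate lujvo before hyphens are added.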

Choosing the best form: the scoring algorithm

When multiple valid rafsi combinations exist for the same tanru, the lujvo scoring algorithm selects the preferred dictionary form. The lowest-scoring form wins. Here's how the score is calculated:

Let L = total letter count (including hyphens and apostrophes), A = number of apostrophes, H = number of hyphen letters (y, r, n), V = vowel count (excluding y), and R = sum of rafsi type values, using CLL's table: CVC/CV (final long rafsi) 1; CVC/C 2; CCVCV (final long rafsi) 3; CCVC 4; CVC 5; CVV with apostrophe (CV'V) 6; CCV 7; CVV without apostrophe 8.

Score = (1000 × L) − (500 × A) + (100 × H) − (10 × R) − V

In plain English, the algorithm strongly prefers shorter words, then penalizes apostrophes slightly less than full letters, then prefers fewer hyphens, then prefers "nicer" rafsi forms, and finally prefers more vowels as a tiebreaker.

Worked example (tanru gerku zdani, doghouse): using the rafsi choices for gerku and zdani, the lujvo-making algorithm (described in the previous section) builds six hyphenated candidates. Their scores (lower = better dictionary form):

| Candidate | Score |
|---|---|
| gerzda | 5878 |
| gerzdani | 7917 |
| ge'uzda | 6367 |
| ge'urzdani | 9506 |
| gerkyzda | 8008 |
| gerkyzdani | 10047 |

gerzda wins: fewest letters, no hyphens. The formula is the one given above. You do not need to score by hand unless you are coining a new lujvo for a dictionary.
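The six scores in the table can be reproduced with a direct transcription of the formula. One assumption to flag: the text states only some rafsi type values, so the values for the long forms (1, 2, 3, 4) are filled in here from CLL's table as recalled. `score` is a hypothetical helper that takes the rafsi and the hyphen letters separately.

```python
VOWELS = set("aeiou")
# Rafsi type values by shape (long-form values are an assumption from CLL).
RAFSI_VALUE = {
    "CVCCV": 1, "CVCC": 2, "CCVCV": 3, "CCVC": 4,
    "CVC": 5, "CV'V": 6, "CCV": 7, "CVV": 8,
}

def shape(s: str) -> str:
    return "".join("'" if c == "'" else ("V" if c in VOWELS else "C") for c in s)

def score(rafsi, hyphens=""):
    """Score = 1000*L - 500*A + 100*H - 10*R - V (lower is better)."""
    L = sum(len(r) for r in rafsi) + len(hyphens)
    A = sum(r.count("'") for r in rafsi)
    H = len(hyphens)
    R = sum(RAFSI_VALUE[shape(r)] for r in rafsi)
    V = sum(ch in VOWELS for r in rafsi for ch in r)   # y is never counted
    return 1000 * L - 500 * A + 100 * H - 10 * R - V

print(score(["ger", "zda"]))             # 5878
print(score(["ge'u", "zdani"], "r"))     # 9506
print(score(["gerk", "zdani"], "y"))     # 10047
```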


The Gismu Creation Algorithm

If you've ever wondered why Lojban's root words sound vaguely familiar in multiple languages, here's why. Each gismu was created by a systematic algorithm designed to maximize recognizability across the six most widely spoken languages at the time: Chinese, English, Hindi, Spanish, Russian, and Arabic.

The process:

  1. Find a word in each of the six source languages for the concept. Render it into Lojban phonetics (simplify consonant clusters, drop endings, map vowels).
  2. Try every possible 5-letter gismu shape (CVCCV or CCVCV). For each candidate, score how closely it matches the six source-language forms: 3+ matching letters in order = their count; exactly 2 matching consecutive letters = 2; otherwise 0.
  3. Divide each match score by the length of the source word and multiply by a language weight (proportional to speaker population, with second-language speakers counted at half). Sum the weighted scores.
  4. Eliminate any candidate that conflicts with an existing gismu (identical, or identical except for the final vowel — since those would share a 4-letter rafsi).
  5. The highest-scoring remaining form becomes the gismu. Occasionally a slightly lower-scoring form is used to provide a more useful rafsi.

This is why patfu (father) sounds like padre/paternal to Romance language speakers, nanmu (man) echoes nán (Chinese) and nam (Hindi), and so on.
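Steps 2 and 3 can be sketched as follows, reading "matching letters in order" as a longest common subsequence. This is a simplified interpretation of the description above, not the original program: the function names are hypothetical, and the source words and weights in the usage lines are made-up toy values, not real data.

```python
def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    row = [0] * (len(b) + 1)
    for ch in a:
        prev = 0
        for j, other in enumerate(b, start=1):
            cur = row[j]
            row[j] = prev + 1 if ch == other else max(row[j], row[j - 1])
            prev = cur
    return row[-1]

def match_score(candidate: str, source: str) -> int:
    common = lcs_len(candidate, source)
    if common >= 3:
        return common                      # 3+ letters in order
    pairs = (candidate[i:i + 2] for i in range(len(candidate) - 1))
    return 2 if any(p in source for p in pairs) else 0

def weighted_score(candidate, sources, weights):
    """Sum over languages of weight * match / length of the source word."""
    return sum(
        weights[lang] * match_score(candidate, word) / len(word)
        for lang, word in sources.items()
    )

# Toy inputs only: two invented "languages" with equal weight.
toy_sources = {"lang_a": "pater", "lang_b": "fu"}
toy_weights = {"lang_a": 0.5, "lang_b": 0.5}
print(weighted_score("patfu", toy_sources, toy_weights))
```

Running this over every legal five-letter form and keeping the highest total would correspond to step 5, once conflicting candidates have been removed.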

Cultural and non-algorithmic gismu

A small number of gismu were not created by the algorithm. They fall into a few groups:

  • Lojban-specific concepts: words like cmavo, lujvo, lojbo, mekso, gismu itself — coined by combining or shortening other Lojban words. These are conceptually lujvo-like but are given gismu status (and rafsi) to keep lujvo built from them reasonably short.
  • Assignable predicates: broda, brode, brodi, brodo, brodu — the five "pro-brivla" variables used for temporary selbri assignments (see Chapter 5).
  • International scientific vocabulary: roots for chemical elements, SI units, and mathematical constants drawn from the international language of science.
  • Cultural gismu: names for specific cultures, nations, or religions where the algorithm was inapplicable.

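Cultural gismu conventionally end in -o (lojbo, glico, and so on), which makes them easy to recognize. The other non-algorithmic groups do not all follow this pattern: broda, brode and gismu itself end in other vowels.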


| Class | Shape | Role |
|---|---|---|
| cmavo | V, CV, VV, CVV (no consonant cluster) | Grammar particles |
| gismu | CVC/CV or CCVCV (5 letters) | Root predicates |
| lujvo | 6+ letters, has consonant cluster | Compound predicates (from rafsi) |
| fu'ivla | Brivla-shaped loanwords | Borrowed concepts |
| cmene | Ends in consonant | Proper names |

Also in this chapter: Recognizing words in a stream (boundary algorithm); gerku zdani scoring table (under Choosing the best form: the scoring algorithm).

Lujvo building steps:

  1. Form a tanru expressing the concept
  2. Find rafsi for each component
  3. Chain rafsi left-to-right, inserting hyphen letters as needed
  4. Verify: consonant cluster in first 5 non-y letters, ends in vowel
  5. Optionally use zei for words without rafsi

Lujvo place structure:

  • Default: start with tertau's places, add non-dependent seltau places
  • Shared places merge into one; dependent seltau places are usually dropped
  • Symmetrical lujvo: seltau x₁ = tertau x₁; asymmetrical: the overlap involves another place

Comparatives:

  • zmadu (exceeds) / mleca (less than) / dunli (equals) with le ka property
  • BAI shortcuts: mau / se mau (from zmadu), me'a / se me'a (from mleca); verai (superlative among a set, from traji x₄); bare rai tags traji x₁ (“with superlative …”)