Here's some basic information about the Chinese language that might be helpful to people who don't speak Chinese.

  1. Introduction
  2. Chinese Dialects
  3. Traditional vs Simplified Characters
  4. Romanization
  5. Tones

Chinese Language Introduction

The Chinese language consists of one shared common written language and a whole bunch of spoken dialects.

The thing that differentiates Chinese from most other languages is the relationship between the spoken and written word. Most languages have a phonetic-based writing system, which simply means the pronunciation of a word determines how the word is written. The advantage of a phonetic-based writing system is that you can form an unlimited number of words using just a few dozen letters.

The Chinese writing system, on the other hand, is meaning-based instead of phonetic-based. Despite what the guy at the local tattoo parlor says, there is no such thing as a Chinese alphabet. Instead of representing a sound, each character represents an idea or meaning.

The advantage of a meaning-based writing system is that speakers of various Chinese dialects can share the same written language. Since the characters do not represent a sound, people of various dialects can apply their own pronunciation to each character. Everyone agrees 中文 means "Chinese language," even though they pronounce it differently.

The disadvantage of a meaning-based writing system is that there are tons and tons of characters. The twelve-volume Honjyu daai cidin (Hanyu da cidian) 漢語大詞典 "Comprehensive Chinese Word Dictionary," published in 1986-1993, contains over 23,000 characters. However, the average person will come across only 3,000 to 6,000 different characters in daily life. Each character has a different number, type, and location of strokes.

Many words are formed by compounding 2 or more characters together. For instance, 電 electric + 視 look at = 電視 television; 電 electric + 視 look at + 台 platform = 電視台 television station.

While at first glance a meaning-based writing system might seem like a completely alien concept, billions of non-Chinese people use a similar system in their everyday life. In fact, if you read the previous paragraph, you just used it. No, I have not gone crazy. Reread the last paragraph and see if you can find what I'm talking about. See it yet?

If you see it, congrats. If not, here's the answer. Read out loud "2." Now ask someone who speaks Spanish, French, German, Russian, etc. to do the same thing. Even though different languages have different ways of pronouncing "2," it means the same thing in all of them.


Chinese Dialects

Depending on who you ask, there are between 5 and 13 major Chinese dialects. Within each dialect, there are even more subdivisions. The various dialects have different pronunciation, number of tones, vocabulary and grammar.

Sometimes the differences between dialects are so great that the speakers have absolutely no idea what the other is saying. Cantonese and Mandarin are as different as Spanish and French. A Cantonese speaker listening to Mandarin might be able to understand a few words here and there, but most of the time they would be completely clueless.

Sometimes the differences in pronunciation, grammar, and vocabulary are minor and there is a high level of mutual comprehension. The speakers might think the other people talk with a real weird regional accent, but they can understand what the others are saying.

And then there are those in-between cases where the dialects have enough similarities in sound and vocabulary that people speaking different dialects can somewhat understand each other. For instance, my Cantonese-speaking mother has hung around my dad's wacky pack of Toisaan-speaking relatives long enough that she has grown accustomed to the thick "ng-" sounding Toisaan dialect. Even though she does not understand every single word they say, she has a general idea of what they are talking about. I, on the other hand, can barely understand Cantonese and have no clue whatsoever about Toisaan.

word  definition   Cantonese  Toisaan  Mandarin
修    to build     sau1       sau      xiu1
象    appearance   zoeng6     zoeng    xiang4

Just as a native French speaker who learns English as a second language often speaks English with a French accent, a native speaker of one Chinese dialect often carries an accent into another Chinese dialect. For example, my mom's Cantonese-speaking family says my dad speaks Cantonese with a Toisaan accent.

Traditional vs Simplified Characters

The Chinese Communist government believed that simplifying the written language would raise literacy rates, which would lead to a more educated public and help improve the socioeconomic conditions of the people and of the country. Starting in the 1950s, the CCP began "simplifying" the Chinese characters that it deemed overly complicated.

definition                                                 traditional  simplified  jyutping      pinyin
Kwan surname                                               關           关          gwaan1        guan1
Gwongdung (Guangdong) province, Hoiping (Kaiping) county   廣東         广东        gwong2 dung1  guang3 dong1

While the simplified characters are easier to write, the main problems with them are:

1) Mainland China adopted the simplified characters 簡體字, but Hong Kong and Taiwan still use the traditional characters 繁體字. Since people in Hong Kong and Taiwan are taught the traditional characters, they can't fully comprehend written material that comes from mainland China because they don't recognize many of the simplified characters, and vice versa.

2) Two different encoding systems were developed to display Chinese characters on computers. Documents and websites with traditional characters use "Big 5" encoding, and those with simplified characters use "GB" encoding.

3) Simplified characters are just plain ugly. Yeah, that's right...I said ugly.
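
The encoding split in point 2 can be seen with a few lines of Python. This is only a rough sketch: it assumes Python's built-in "big5" and "gb2312" codecs are reasonable stand-ins for the "Big 5" and "GB" encodings mentioned above, and the helper name can_encode is made up for illustration.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


traditional = "關"  # the Kwan surname, traditional form
simplified = "关"   # the same surname, simplified form

# Each encoding stores the character as its own pair of bytes.
print(traditional.encode("big5").hex())
print(simplified.encode("gb2312").hex())

# Assumption: "gb2312" is the codec name for the original GB standard.
# It covers the simplified form but not the traditional one.
print(can_encode(simplified, "gb2312"))
print(can_encode(traditional, "gb2312"))
```

A document saved in one encoding and opened as the other comes out as gibberish, which is why old Chinese websites often asked you to pick the right encoding in your browser.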

Romanization

Over the years, various schemes have been developed to represent and "spell" Chinese words with the Roman alphabet. Since many of the sounds in Chinese are not found in English, different people have different interpretations of how the Chinese sounds should be "spelled." For example, our last name 關 can be spelled Kwan, Kuan, Quan, Guan, Kuon, Quon, Kwuan, Cuan, etc.

Mandarin - Wade-Giles

Wade-Giles was one of the earliest romanization systems and, for decades, the most commonly used. It was first published by Thomas Francis Wade in 1867 and refined by Herbert Allen Giles in his Chinese-English dictionary (first edition 1892, second edition 1912). Wade-Giles can mark tones with superscript numbers, but in everyday use they were usually dropped, so the tones went unrepresented.

Mandarin - Hanyu Pinyin

*Han is the ethnic majority in China.

In 1958, the Chinese Communist Party (CCP) adopted Hanyu Pinyin, also known as pinyin, as the standard system of romanizing Mandarin in mainland China. One of the reasons pinyin was developed and promoted was that some wacky radicals within the party wanted to replace the Chinese characters with a phonetic-based writing system in order to increase literacy and make a clean break from traditional Chinese culture. The moderate members of the party supported a standardized romanization method but opposed the idea of completely abandoning the Chinese characters. Luckily the moderate factions prevailed, and the party decided to support the use of both Chinese characters and pinyin.

Since pinyin was officially backed by the Chinese government, the United Nations adopted pinyin as the internationally accepted standard system of romanizing Mandarin in 1977. That's why Peking (Wade-Giles) is now spelled Beijing (pinyin).

Pinyin places a tone mark above a vowel in each syllable to represent the four main tones in Mandarin; the fifth, neutral tone is left unmarked.

For more information about pinyin visit the Wikipedia pinyin page.

Cantonese - Yale, Lau, Jyutping

Unlike Mandarin, there is no internationally recognized romanization standard for Cantonese. Yale, Lau, and jyutping are some of the more widely used romanization systems for Cantonese.

Yale and Sidney Lau are two of the more popular Cantonese romanization schemes. The Yale system was developed in the 1940s at Yale University to teach Cantonese and Mandarin to white folk in the U.S. The Lau system was developed in the 1960s to teach Cantonese to the British imperialist pigs in Hong Kong.

In 1993 the Linguistic Society of Hong Kong (LSHK) decided to make things simpler by making things more complicated, and created yet another system called jyutping. The Hong Kong government is now pushing jyutping as the official romanization system. I'm using jyutping on this site.

The main difference between these schemes is how they "spell" the Cantonese sounds and represent the tones. Yale uses tonal markings plus an extra letter to represent the tones; Lau and jyutping use numbers.

The funny thing is, even with all these different schemes floating around, when you go to Hong Kong you will notice that many of the English signs and names follow none of them. People just "spell" the words however they feel like.

For more information about jyutping, visit the Wikipedia jyutping page or the Jyutping Pronunciation Guide.

Toisaan

When I first started the family tree, I wanted to include the Toisaan pronunciation of the names, since it was the language that our ancestors spoke. After doing a little research on the web, I quickly found out that no one ever bothered trying to romanize Toisaanwaa. Since I don't have the will, desire, or knowledge to invent my own system, I decided to just stick with Cantonese jyutping and Mandarin pinyin.

Tones

Tones are an integral part of the Chinese language. Every syllable you utter has a tone associated with it. Some dialects have only four tones; other dialects have as many as 13.

If you say a word in the wrong tone, people might 1) just nod and pretend they understand what you're saying even though they aren't really sure, 2) talk behind your back and laugh about how bad your Chinese is, or 3) start yelling and hitting you because you accidentally said something obscene and really offensive to them.

Cantonese Tones

Depending on who you ask, Cantonese has 6 to 10 tones.

In theory Cantonese has 10 tones.

If you ask a native speaker from Hong Kong, they will say Cantonese has eight tones. They don't distinguish between (1) high level and (2) high falling; they usually just use the high level tone. In addition, native speakers from Hong Kong usually use only the high and low clipped tones, not the mid clipped tone.

Even though native speakers count the clipped sounds as separate tones, many of the books for learning Cantonese do not. The Yale romanization system has 7 tones. It groups the high, mid, and low clipped tones with the corresponding high, mid, and low flat tones.

Other romanization systems like the Sidney Lau and jyutping systems have 6 tones. They adopt the native speaker practice of not distinguishing between the high level and high falling tones. These systems also do not count the clipped tones as separate tones.

#   tone          description                                native  Yale  jyutping  word (jyutping): definition
1   high level    high level flat tone                       1       1     1         分 (fan1): to divide
2   high falling  starts high, falls to mid                  1       2     1         分 (fan1): to divide
3   mid rising    starts at mid level, rises to high         2       3     2         粉 (fan2): powder
4   mid level     mid level flat tone                        3       4     3         瞓 (fan3): to sleep
5   low falling   starts at mid level, falls to low          5       5     4         墳 (fan4): grave
6   low rising    starts at low level, rises to mid          6       6     5         憤 (fan5): to be angry
7   low level     low level flat tone                        7       7     6         份 (fan6): a portion
8   high clipped  high level flat tone with glottal stop*    4       1     1         濕 (sap1): damp
9   mid clipped   mid level flat tone with glottal stop      -       4     3         霎 (sap3): instant
10  low clipped   low level flat tone with glottal stop      8       7     6         十 (sap6): ten

*Glottal stop is a linguistic term that means the speaker stops short instead of pronouncing the full syllable. One example is the difference between "chip" and "chipmunk": "chip" on its own is said with a puff of air at the end, while the "chip" in "chipmunk" stops short and does not end with a puff of air.
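
For anyone who wants to handle the numbered jyutping on this site with a script, the jyutping column of the table above can be turned into a small lookup. This is just a sketch, not an official tool: the function name describe_jyutping is made up, and the rule that clipped syllables are the ones ending in -p, -t, or -k is an assumption that is not stated in the table.

```python
# Jyutping tone numbers and their descriptions, taken from the table above.
JYUTPING_TONES = {
    1: "high level",
    2: "mid rising",
    3: "mid level",
    4: "low falling",
    5: "low rising",
    6: "low level",
}

# Assumption (not from the table): a syllable is "clipped" when its
# romanization ends in -p, -t, or -k, as in sap1, sap3, and sap6.
CLIPPED_FINALS = ("p", "t", "k")
CLIPPED_NAMES = {1: "high clipped", 3: "mid clipped", 6: "low clipped"}


def describe_jyutping(syllable: str) -> tuple[str, int, str]:
    """Split a syllable like 'fan2' into (base, tone number, tone name)."""
    if len(syllable) < 2 or syllable[-1] not in "123456":
        raise ValueError(f"not a numbered jyutping syllable: {syllable!r}")
    base, tone = syllable[:-1], int(syllable[-1])
    if base.endswith(CLIPPED_FINALS) and tone in CLIPPED_NAMES:
        return base, tone, CLIPPED_NAMES[tone]
    return base, tone, JYUTPING_TONES[tone]


for word in ("fan1", "fan2", "sap6"):
    print(word, describe_jyutping(word))
```

Because jyutping folds the clipped tones into tones 1, 3, and 6, the only way to tell them apart from the number alone is to look at how the syllable ends, which is what the sketch does.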

Mandarin Tones

Mandarin has four tones and one neutral tone.

#  tone     description                                                      word (pinyin): definition
1  high     high level flat tone                                             媽 (ma1): mother
2  rising   starts in the middle and rises to high                           麻 (ma2): tingling
3  dipping  starts in the middle, dips to low, and returns to the middle     馬 (ma3): horse
4  falling  starts high and falls to low                                     罵 (ma4): to scold
5  neutral  no tone                                                          (ma5)
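
The numbered pinyin used on this site (ma1, ma2, and so on) can be converted into pinyin with tone marks by a short script. The sketch below is only one way to do it. It assumes lowercase input, one syllable at a time, and the usual placement convention (mark "a" or "e" if present, the "o" in "ou", otherwise the last vowel), none of which is spelled out above. The function name add_tone_mark is made up for illustration.

```python
# Marked forms of each vowel for tones 1 through 4.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def add_tone_mark(syllable: str) -> str:
    """Convert one numbered syllable, e.g. 'ma3', to tone-mark form."""
    if len(syllable) < 2 or syllable[-1] not in "12345":
        raise ValueError(f"not a numbered pinyin syllable: {syllable!r}")
    # Accept the common keyboard spellings 'u:' and 'v' for 'ü'.
    base = syllable[:-1].replace("u:", "ü").replace("v", "ü")
    tone = int(syllable[-1])
    if tone == 5:
        return base  # the neutral tone carries no mark
    if "a" in base:
        index = base.index("a")
    elif "e" in base:
        index = base.index("e")
    elif "ou" in base:
        index = base.index("o")
    else:
        positions = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not positions:
            raise ValueError(f"no vowel to mark in {syllable!r}")
        index = positions[-1]
    marked = TONE_MARKS[base[index]][tone - 1]
    return base[:index] + marked + base[index + 1:]


for word in ("ma1", "ma2", "ma3", "ma4", "ma5"):
    print(word, add_tone_mark(word))
```

The same function works on the other pinyin on this page, such as guan1, guang3, and dong1.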