End of the Cantonese Goldlisting Project

In March 2015, I posted about my Cantonese experiment, where I used a list of characters and a dictionary to goldlist Cantonese. At first, I thought it was just going to be a slightly inefficient way of going about it, but there were several unforeseen problems that made the experiment a major failure. Now that the project is in the closing stages and I have to wait before I can keep distilling, I wanted to write about what I learned. To be frank, it was more of a waste of time and effort than expected, and I would not do it again.

The whole project has spanned five years—seven if I count the work I did before the actual experiment. While I haven’t actually spent five years on it, this project has caused me to lose more time than what I actually spent working on it. For most of the last five years, I haven’t done much goldlisting. This was partly because of work, but also because all the problems that became apparent really demotivated me. That was probably the worst outcome of the entire project, because I could have finished many shorter projects instead of wasting time on one big one.

Length of the Project

I originally thought it would only take me about 20,000 headlist lines, but I didn’t stop until I reached 40,000. There was still no end in sight, and it was apparent that I had included hundreds (if not thousands) of words I shouldn’t have. There were also many duplicates, because it was hard to keep track of which characters I had already covered.

Lack of Context

I knew this before I started, but I didn’t think lack of context would be such a big problem. I had done small-scale projects like this before, and those worked out well. I sometimes misunderstood what a word meant or how to use it, but then I’d encounter it in the wild and realize the mistake. Although I had no context, the words I was learning were all very basic, so I could always imagine a context and be fairly sure it was accurate. This time, the words ranged from very basic all the way to needlessly advanced, and I had no idea which were which. CantoDict does divide them into five categories, but almost all of the words are put in the middle category, which defeats the purpose.

Source Material

The biggest problem turned out to be my choice of source material. CantoDict is maintained by volunteers who work when they can and want to. Nobody is responsible for checking every single entry to ensure accuracy, so there are many inaccurate entries. I suspect these were added and parsed automatically, because they often lack the changed tones and are listed with their citation tones instead, and others use the wrong reading of one of the characters.

For this reason, it is useful as a reference or second opinion, but should not be relied upon alone (but be warned: many online dictionaries get their Cantonese data directly from CantoDict). Fortunately, the editors do a good job linking to discussions about each entry when they come up on the forums, in which case you do get a second opinion right there.

Early on, I did not realize just how many wrong entries there would be. I used the forums to ask when I suspected something was off, but I didn’t always get a reply there either, so I soon resorted to asking friends instead. Eventually, I got pretty good at spotting suspicious words to ask about, and this revealed another problem, namely that many of the words and expressions were completely unknown to my friends! Often, I’d ask how to pronounce something, and they’d ask me what it was supposed to mean.

The sheer number of such words was the main reason I not only stopped adding to the headlist, but also simply gave up and crossed out hundreds of the words I had left. I crossed out anything that seemed suspicious, although I’d sometimes ask someone to make sure. I’m sure I also crossed out many legitimate words by doing it this way, but it was better than wasting more time learning the useless ones.

Final Stages

The last stages of the project involved filling up the last bronze book and distilling all the words I have left. The reason for this is that the CantoDict headlist only reached 38725 lines, so I had space left over. I filled it up with Teach Yourself Cantonese, Colloquial Cantonese, and Intermediate Cantonese. I wish I had started with these instead of saving them for last, but they also came with their own problems, which I may write about later.

After cutting out all the words I didn’t trust, there wasn’t that much left to distill, but I did it anyway. Now, I’m all caught up and waiting to be able to keep distilling. I estimate that I’ll finish the entire project in less than two months from now and have 16 bronze books, 3 silver books, and 1 gold book to show for it.


This experiment was really a failure. It was meant to be a somewhat inefficient way to achieve a very useful goal, but it turned out not to be. Even if I had used a more efficient method with the same source material, I would just have failed faster (which would have been preferable, but now I know).

That’s not to say I got nothing out of it, though. I did learn thousands of useful words and expressions and I got really good at spotting suspicious information about Cantonese (which did come in handy as I was going through the textbooks at the end—Cantonese learning materials always come with lots of mistakes in them, unfortunately!). Still, it wasn’t worth the wasted time and effort in the end, and I would not do it this way again.

The Way Forward

Although the experiment is over, my Cantonese goldlisting doesn’t have to be. I only got halfway through Intermediate Cantonese before running out of pages in my bronze book. I could start a 17th bronze book and keep going, but I haven’t decided yet. For future projects, I’ll also choose dictionaries that at least have example sentences. I do happen to have one for Cantonese, namely 東方廣東語辭典 (a Cantonese–Japanese dictionary that is better than any Cantonese–English one I’ve seen), so I might goldlist that and see how it goes. It is unfortunately not without errors either, but it’s far more accurate than CantoDict.

Traditional Japanese

Most people have heard of the opposition between Traditional Chinese and the other widely used Chinese writing system, Simplified Chinese, but Japanese tends to be viewed as a monolith. This is probably because both TC and SC are widely used, but Japanese is almost always written in only one orthography—the one introduced in the postwar reforms.

Even when people have heard of the reform, it probably wouldn’t occur to them to refer to the postwar orthography as “Simplified Japanese”, even though that is exactly what it is. My use of “Traditional Japanese” mirrors that of “Traditional Chinese” to distinguish it from “Simplified Japanese”, partially for laughs, but mostly seriously.

What is Traditional Japanese?

I admit that I am, ironically, simplifying this a little bit. There are three character sets that are relevant to this discussion of Japanese writing: hiragana, katakana, and Chinese characters. Since the two former sets are interchangeable, we can consider them collectively and refer to them as kana. When I say “Traditional Japanese”, I mean both the traditional shape of the Chinese characters and the traditional kana spelling.

Traditional Characters

It is not unheard of for historical texts to now be rendered in what is often called old kana spelling or historical kana spelling, but with simplified characters. It also wasn’t unheard of historically to simplify characters here and there, but the standard shape was shared throughout the Sinosphere (roughly modern Japan, Korea, Vietnam, and Greater China), and is the shape recorded in the KangHsi Dictionary (康煕字典). Some people also use the traditional shapes of characters with the new kana orthography, but this is even less common.

Traditional characters are rather straightforward, since they correspond to the KangHsi shapes. Sometimes, though, the preferred character variants in Japan differed from those in China, and this is still the case, so you usually see 踊, 澁, 雇, 勞働, and 豫定 in Japan and 踴, 澀, 僱, 勞動, and 預定 in China. There are also some characters that were made in Japan, such as 込, 畑, and of course 働 above. In one case, four characters are collapsed into one: 辨 (and its variant 辧), 辯, 瓣, and 弁 are all written 弁 in the new orthography, but it isn’t uncommon to see it used for 辮 and 辦 as well. In addition, for some words, the character spellings were changed to variant spellings that used more common characters, such as 交叉→交差.

Traditional Kana Orthography

The traditional kana spelling is slightly more complicated. While the new kana spelling has a low number of possible spellings corresponding to phonemes (almost 1:1, but not quite), the traditional one has many ways to spell the same sound. This is because it preserves distinctions that have been lost in speech. Consider /oː/, which can be spelled ⟨おう, おお⟩ in the new orthography, but also ⟨おを, おほ, おふ, あう, あふ⟩ in the traditional one. This does make it more difficult to spell, but reading is still straightforward, because these sequences are always read /oː/, with one exception: verbs ending in /u/. A verb like ⟨會ふ⟩ ‘to meet’ would be rendered ⟨あふ⟩, but read /aꜜu/, not /oꜜː/. (Many people do read it as /oꜜː/ when reading texts in Traditional Japanese, but I consider that an erroneous reading pronunciation.)


The simplified kana orthography collapses these distinctions to correspond to pronunciation:

  • わ行 ‘wa-line’ kana and あ行 ‘a-line’ kana: ゐ→い, ゑ→え, を→お. (The new orthography retains を only as an accusative marker.)
  • Word-medial (intervocalic) は行 ‘ha-line’ and わ行 ‘wa-line’ (which is already mostly collapsed into the あ行 ‘a-line’—see above): あは→あわ, あひ→あい, あふ→あう, あへ→あえ, あほ→あお. (This only occurs within stems, and there are a few exceptions, such as あひる ‘duck’.)
  • くわ→か, ぐわ→が.
  • The so-called “yotsugana”: ぢ→じ, づ→ず. (The new orthography retains づ and ぢ where つ and ち are voiced in compound words in a process called 連濁 [rendaku]. In addition, 痔 ‘hæmorrhoids’ is still often written ぢ.)
  • あ+う sequences and お+う sequences: あう→おう, あふ→おう. (Notice that this leads to おお and おう still being distinguished.)
  • え+う sequences and よう: せう→しょう, けふ→きょう, etc.
  • い+う sequences and ゆう: きう→きゅう, じふ→じゅう, etc.
  • く, き, ち, つ→っ where it occurs before か行 ‘ka-line’ and た行 ‘ta-line’ kana (and indeed anywhere つ marks a long consonant): 學校 ‘school’ ガクカウ→ガッコウ, 決定 ‘decision’ ケツテイ→ケッテイ, びつくり→びっくり etc. (Compound words are excepted from this, so e.g. 洗濯+機 ‘washing machine’ センタク+キ does not officially become センタッ+キ.)

These changes “stack”, as it were, so は行 ‘ha-line’ kana going to the わ行 ‘wa-line’ go on to the あ行 ‘a-line’ unless they become わ /wa/, and くわう also becomes こう.
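As a rough illustration, the collapsing rules above can be sketched in Python. This is only a sketch under several assumptions, not a complete converter: the function name `modernize` and its `verb` flag are made up for this example, it takes a single word in hiragana, and it ignores the rules that need lexical knowledge (the っ rule, small kana, compound boundaries, the particle を, and exceptions such as あひる). The order of the steps is what makes the changes “stack”.

```python
import re

# Non-initial は行 kana collapse into わ and the あ行 (assumed to apply to a single stem).
MEDIAL_HA = str.maketrans("はひふへほ", "わいうえお")
# Obsolete わ行 kana and the merged yotsugana.
MERGED = str.maketrans("ゐゑをぢづ", "いえおじず")

# A kana followed by う fuses into a long vowel: a+u -> o+u, e+u -> yo+u, i+u -> yu+u.
FUSE = {}
FUSE.update(zip("あかがさざただなはばぱまやらわ", "おこごそぞとどのほぼぽもよろお"))
FUSE.update({
    "え": "よ", "け": "きょ", "げ": "ぎょ", "せ": "しょ", "ぜ": "じょ",
    "て": "ちょ", "で": "じょ", "ね": "にょ", "へ": "ひょ", "べ": "びょ",
    "ぺ": "ぴょ", "め": "みょ", "れ": "りょ",
})
FUSE.update({
    "い": "ゆ", "き": "きゅ", "ぎ": "ぎゅ", "し": "しゅ", "じ": "じゅ",
    "ち": "ちゅ", "に": "にゅ", "ひ": "ひゅ", "び": "びゅ", "ぴ": "ぴゅ",
    "み": "みゅ", "り": "りゅ",
})


def modernize(word: str, verb: bool = False) -> str:
    """Convert one hiragana word from traditional to new kana spelling (sketch)."""
    if not word:
        return word
    # 1. くわ→か, ぐわ→が. Done first, so a わ created in step 2 is left alone.
    word = word.replace("くわ", "か").replace("ぐわ", "が")
    # 2. Word-medial は行 kana (the first kana is never affected).
    word = word[0] + word[1:].translate(MEDIAL_HA)
    # 3. ゐ→い, ゑ→え, を→お, ぢ→じ, づ→ず.
    word = word.translate(MERGED)
    # 4. Vowel + う fusion. A verb's final う is kept apart (あふ 'to meet' → あう).
    ending = ""
    if verb and word.endswith("う"):
        word, ending = word[:-1], "う"
    word = re.sub(r".(?=う)", lambda m: FUSE.get(m.group(0), m.group(0)), word)
    return word + ending
```

For example, `modernize("けふ")` first turns ふ into う and then fuses けう into きょう, while `modernize("あふ", verb=True)` stops at あう.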

New Distinctions

The new orthography did actually introduce a new distinction, namely small kana. ゃ, ゅ, ょ, っ (and ゎ) used to be written や, ゆ, よ, つ (and わ), which could cause ambiguities. However, many cases of く, き, and ち also became っ.

Etymological Kana Orthography

Because the traditional orthography makes many historical distinctions, it is often included in Japanese dictionaries (though not in Japanese–English ones), and especially character dictionaries. In fact, many of these dictionaries make more distinctions than the historical orthography in Sino-Japanese readings of characters. I call this deep historical kana orthography. Sino-Japanese readings are conventionally rendered in katakana:

  • Final ン and ム. These were considered variants until the early 20th century, but are distinguished in character dictionaries to show whether a character had -n or -m, respectively, in Middle Chinese.
  • Medial ヰ and ヱ, analogous to クワ. For example, 兄 was spelled キヤウ, but is often rendered クヰヤウ in “deep” orthography. (Some dictionaries also write them as small kana in these cases, but there are no Unicode characters for small ゐ, ヰ, ゑ, or ヱ.)
  • Sometimes, ウ and イ are rendered in small kana as ゥ and ィ when they derive from a Middle Chinese /ŋ/.

My Preferred Spelling

I like to use the deepest kana orthography I possibly can, including small ヰ and ヱ, when I can help it. In practice, though, “deep” kana orthography looks exactly the same as the traditional spelling, since Sino-Japanese is almost always written in Chinese characters.

Although I prefer etymological spelling, as long as it’s (mostly) read in a regular way, I don’t really hate the new spelling. I see it as an informal shortcut that would be well suited for handwritten notes scribbled down in a hurry, but less appropriate for formal texts. This is also the attitude I have towards Simplified Chinese or indeed any kind of character variants.

How to Learn Traditional Japanese

Most serious Japanese dictionaries written for Japanese readers include the traditional spelling, but not the “deep” variant. For that, you’ll need a character dictionary written for Japanese readers, such as 漢字源, or rely on Wiktionary (Japanese Wiktionary is especially good at adding these spellings).

For traditional characters, you can usually rely on lists that show the simplified Jōyō characters (常用漢字), but you have to watch out for changed character spellings and which character to use for 弁, although some serious dictionaries actually do help you with that. An example is Sanseidō’s 新明解國語辭典, which also includes traditional spellings and Standard Japanese pitch patterns.

I hope this helps. I am also working on a learner’s dictionary and textbook for Traditional Japanese, but it will take a while to finish.

Lyrics to Korean Socialist and Patriotic Songs

I’m no socialist or supporter of either Marxism or 主體 (Juche), but I do enjoy socialist music. The best such music is currently made in north Korea, and the lyrics are readily available from sites like this one.

However, while some of the lyrics on that site include mixed-script versions, which is rare in the first place, not all of them do. Also, that site doesn’t have all the songs I like. That’s why I want to include mixed-script versions of some of my favourites here, starting with the song that got me interested in north Korean music in the first place.

Songs added so far:
攻擊戰이다、千里馬 달린다、더 높이 더 빨리、愛國歌、우리의 金正日同志、當身이 없으면 祖國도 없다


攻擊戰이다

붉은旗 추켜들고 進擊해간다
銃대를 앞세우고 突擊해간다
一心의 千萬隊伍 이끌고가는
그 모습은 先軍旗幟다

攻擊 攻擊 攻擊 앞으로
將軍님의 革命方式은
白頭山번개처럼 攻擊
正日峰우뢰처럼 攻擊
攻擊 攻擊 攻擊戰이다

山嶽이 막아서도 踏步가 없다
大敵이 밀려와도 防禦가 없다
瞬間도 멈춤없이 맞받아치는
그 戰法은 必勝不敗다

攻擊 攻擊 攻擊 앞으로
將軍님의 革命方式은
白頭山번개처럼 攻擊
正日峰우뢰처럼 攻擊
攻擊 攻擊 攻擊戰이다

目標는 強盛大國 希望峰이다
標대는 主體偉業 勝利峰이다
先軍의 值線走廊 暴風쳐가는
그 걸음은 強行軍이다

攻擊 攻擊 攻擊 앞으로
將軍님의 革命方式은
白頭山번개처럼 攻擊
正日峰우뢰처럼 攻擊
攻擊 攻擊 攻擊戰이다

千里馬 달린다

百戰百勝 勞動黨 새 時代를 열었다
千里馬의 氣像을 온 世上의 떨치자
어서가자 빨리가자 千里馬 타고서
七箇年計劃을 앞당겨 나가자
에헤 에야차 에야차
共產主義 새 언덕이 저기 보인다
에헤 에야차 에야차
共產主義 새 언덕이 저기 보인다

勞動黨의 戰士들 하나로 뭉쳤다
빛나는 새 勝利 우리를 부른다
어서가자 빨리가자 千里馬 타고서
創造와 革新으로 새 奇蹟 올리자
에헤 에야차 에야차
祖國統一 새날이 밝안다
에헤 에야차 에야차
祖國統一 새날이 밝안다

數十年을 하루로 달리여 나간다
建設과 增產의 불길을 높여라
어서가자 빨리가자 千里馬 타고서
後孫萬代 幸福할 樂園을 꾸미자
에헤 에야차 에야차
勞動黨 旗발 따라 나간다
에헤 에야차 에야차
勞動黨 旗발 따라 나간다

더 높이 더 빨리

아~ 더 높이 아~ 더 빨리
將軍님의 領導 따라
달리자 더 높이 더 빨리

붉은旗 추켜들고 革命의 노래로 疾風 같이 달리자
이 땅에 발붙이고 未來를 내다보며 우리의 式으로
더 높이 더 빨리
將軍님의 領導 따라
달리자 더 높이 더 빨리

工場을 建設해도 土地를 整理해도 우리의 式으로
科學과 技術의 目標는 占領해도 最尖端水準으로
더 높이 더 빨리
將軍님의 領導따라
달리자 더 높이 더 빨리

先軍의 銃대 우에 千萬이 굳게 뭉친 우리 힘 無限타
勝利한 그 氣勢로 또 다시 飛躍하면 더 좋은 樂園되리
더 높이 더 빨리
將軍님의 領導 따라
달리자 더 높이 더 빨리


愛國歌

아침은 빛나라 이 江山
銀金에 資源도 가득한
三千里 아름다운 내 祖國
半萬年 오랜 歷史에
燦爛한 文化로 잘한?
슬기론 人民의 이 榮光
몸과 맘 다 바쳐 이 朝鮮
길이 받드세

白頭山 氣像을 다 안고
勤勞의 精神은 깃들어
真理로 뭉쳐진 억센 뜻
온 世界 앞서 나가리
솟는 힘 怒濤도 내밀어
人民의 뜻으로 선 나라
限없이 富強하는 이 朝鮮
길이 빛내세

우리의 金正日同志

幸福한 내 나라 한지붕아래
和睦한 大家庭을 꾸려주셨네
人民을 親兄弟로 키워주신 분
그이는 우리의 金正日同志

언제나 슬기론 人民이라고
素朴한 그 생각도 政策에 담네
人民을 先生으로 부르시는 분
그 先生의 스승은 金正日同志

人民을 熱烈히 사랑하시며
한平生 人民爲해 服務하시네
人民을 하늘처럼 믿으시는 분
그 하늘의 太陽은 金正日同志

當身이 없으면 祖國도 없다

사나운 暴風도 쳐몰아내고
信念을 안겨준 金正日同志*
當身이 없으면 우리도 없고、
當身이 없으면 祖國도 없다!

未來도 希望도 다 맡아주는
民族의 運命인 金正日同志*
當身이 없으면 우리도 없고、
當身이 없으면 祖國도 없다!

世上이 열百番 變한다해도
人民은 믿는다 金正日同志*
當身이 없으면 우리도 없고、
當身이 없으면 祖國도 없다!

*同志 is the original, but since he became general, the 將軍 version has also been commonly used. The song can be sung with either 「金正日同志」 or 「金正日將軍」.

Berrjod and Berglyd

The name “Berrjod” comes from Old Norse “Berurjóðr”, which is composed of “bera” and “rjóðr”. The latter means “glade”, and the former is speculated to be the name of a river. But where the river name comes from is less clear. It could be from the verb “bera”, meaning “to carry”, or it could be from the noun “bera”, meaning “female bear”. Either way, it’s not surprising that “Berurjóðr” would become “Berrjod” (pronounced [ˈbǽʁju] locally) in modern Norwegian.

At one point, the modern form was likely mistaken as a corruption of “Bergljod”, a compound of “berg”, meaning “mountain”, and “ljod”, meaning “sound”. However, this would have had to come from “Berghljóð” or “Bjarghljóð”, which are not attested, to my knowledge. This misunderstanding of the name was written down in regular Dano-Norwegian spelling as “Berglyd” (pronounced [ˈbæ̀ʁɡˌly͑ːd] locally).

Both names are in common use, and can be used interchangeably. Although the most common form in writing is “Berglyd”, the official name of the farm is “Berrjod”. Signs in the area point to “Berglyd”, though the locals mostly say “Berrjod”. Its rarity in writing is probably also the reason that “Berrjod” is often misspelled “Berjo”.

Having said that, I personally much prefer “Berrjod”, for its authenticity.

著 and 着

Few characters are as quirky as 著. It seems to have been a variant that eventually split off from 箸, and now its own variant 着 is in the process of splitting off from 著. This illustrates a trend in the development of Chinese characters: they often start out as one sign that may represent several different (though usually similar-sounding) syllables and thus several meanings. Then, they split into more signs that share the burden of sounds and meanings associated with them. It’s rare for characters to merge, but there are many examples of such splits.


The Republic of China

It has been assigned 5 common readings in the national language of the Republic of China, or Standard Mandarin: ㄓㄨˋ、ㄓㄨㄛˊ、ㄓㄠˊ、ㄓㄠˉ、˙ㄓㄜ (zhù, zhuó, zháu, zhāu, zhe), as well as 2 that are rare enough to be ignored: ㄔㄨˊ、ㄓㄨˇ (chú, zhǔ). As far as the Ministry of Education of the Republic of China is concerned, 着 is a variant of 著, and that’s the whole story.

ㄓㄨㄛˊ is a merger of two Middle Chinese readings: one with a voiceless initial consonant, and one with a voiced one (cf. Cantonese zhoek³ and zhoek⁶). It is interesting to note that ㄓㄨㄛˊ and ㄓㄠˊ probably both developed from the reading with the voiced initial (one being literary and the other colloquial), but they ended up acquiring different meanings. ㄓㄠˉ is possibly a further development of the latter.

˙ㄓㄜ is a Mandarin-specific particle that had to be written down somehow, and this character did the job.

ㄔㄨˊ is only used in 著雍・著雝 (ㄔㄨˊ ㄩㄥˉ [chúyūng]), an alternate name for 戊, the fifth of the ten heavenly stems (十天干). ㄓㄨˇ is only used in 著任 (ㄓㄨˇ ㄖㄣˋ [zhǔrèn]), but I’m not sure what it means. Most dictionaries ignore these two, for obvious reasons.



In pre-war Japan, the situation was the same: 著 had two Sino-Japanese readings (since Japan never had the ㄓㄨㄛˊ–ㄓㄠˊ–ㄓㄠˉ split or a reading corresponding to the Mandarin particle): チョ and チャク (cho and chaku). Here is the 着 entry from 詳解漢和字典, published just after WWII, which identifies 着 as a vulgar variant of 著.



And here is the 著 entry in the same dictionary:

Screenshot at 2017-04-27 20:31:35 Screenshot at 2017-04-27 20:31:50 Screenshot at 2017-04-27 20:32:04 Screenshot at 2017-04-27 20:32:14


However, after WWII, 著 and 着 were assigned different roles: 著 for チョ, 着 for チャク. Native Japanese readings follow the meanings connected to the Sino-Japanese readings, so 著 for あらはꜜす and いちじるしꜜい (arawaꜜsu and ichijirushiꜜi) and 着 for きる and つꜜく・つくꜜ (kiru and tsuꜜku/tsukuꜜ).


Mainland China

When the Communist Party of China developed their standard, they assigned the two characters the same roles as the Japanese. 著 for ㄓㄨˋ, 着 for ㄓㄨㄛˊ、ㄓㄠˊ、˙ㄓㄜ and ㄓㄠˉ (this last one is also written 招, since they sound the same in Mandarin).



Korean usage is the same as well, suggesting this usage wasn’t just invented after the war. Korea never had any major character reforms, and still 著 is usually reserved for 저 (chŏː) and 着 is usually reserved for 착 (ch’ak). However, either can be used for either reading. In practice, though, most Koreans unfortunately don’t use characters at all anymore.



In Vietnam, characters are unfortunately used even less than in Korea, but dictionaries from the late 1800s suggest trứ and trước were both written 著, as in the Republic of China. Interestingly, there doesn’t seem to be a trược reading (corresponding to a Middle Chinese voiced initial consonant) for this character. The meanings that would be associated with that reading are listed under trước.

著 (trứ) in Bonet’s (1899) Vietnamese–French dictionary:

Screenshot at 2017-04-25 21:37:54


And 著 (trước) in the same dictionary:

Screenshot at 2017-04-25 21:38:36


著 (trứ) in Génibrel’s (1898) Vietnamese–French dictionary:

Screenshot at 2017-04-25 21:35:31


And 著 (trước) in the same dictionary:

Screenshot at 2017-04-25 21:36:45 Screenshot at 2017-04-25 21:37:16


Modern dictionaries do include 着, and though they seem to prefer the reading trước for it, some also list trứ. 著 is always listed with both readings.


Hong Kong (and Macau)

Until recently, I was under the impression that Hong Kong usage, and presumably Macanese usage as well, differed from all of the above. CantoDict distinguishes them this way: 著 for zhy³ (ㄓㄨˋ) and zhoek³ (ㄓㄨㄛˊ), 着 for zhoek⁶ (ㄓㄨㄛˊ、ㄓㄠˊ、˙ㄓㄜ and ㄓㄠˉ).

著 in CantoDict (24.04.2017)

Screenshot at 2017-04-24 22:19:32


着 in CantoDict (24.04.2017)

Screenshot at 2017-04-24 22:20:07

However, my friend Kumono Shōta showed me the List of Graphemes of Commonly-used Chinese Characters, published by the Hong Kong Education Bureau. I was surprised to learn that people on the CantoDict forums seem to be wrong about the official Hong Kong division of 著 and 着. According to this list, 著 is for zhy³ (ㄓㄨˋ) and 着 is for both zhoek³ and zhoek⁶ (ㄓㄨㄛˊ、ㄓㄠˊ、˙ㄓㄜ and ㄓㄠˉ), just like in all the other jurisdictions (except for the Republic of China, of course).


著 in the List of Graphemes of Commonly-used Chinese Characters:


着 in the List of Graphemes of Commonly-used Chinese Characters:


Correspondence List

I made a list with the readings and the meanings the characters (roughly) correspond to!

Mandarin – Cantonese – Japanese – Korean – Vietnamese – English

ㄓㄨˋ – zhy³ – チョ – 저 – trứ – notoriety, authorship

ㄓㄨㄛˊ – zhoek³ – チャク – 착 – trước – to don

ㄓㄨㄛˊ – zhoek⁶ – チャク – 착 – trước – to make contact, to apply

ㄓㄠˊ – zhoek⁶ – チャク – 착 – trước – to ignite, to affect

ㄓㄠˉ – zhoek⁶ – チャク – 착 – trước – (boardgame) move

˙ㄓㄜ – zhoek⁶ – チャク – 착 – trước – stative particle
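For anyone who wants to play with the list, here is a minimal sketch of it as a Python data structure, together with a lookup for which character each standard described above prefers. The names `Reading`, `READINGS`, `STANDARDS`, and `preferred_character` are made up for this example, and the rules only encode what this post states (Korean usage is a tendency rather than a rule, and Vietnamese is left out because the sources disagree).

```python
from typing import NamedTuple


class Reading(NamedTuple):
    mandarin: str
    cantonese: str
    japanese: str
    korean: str
    vietnamese: str
    english: str


# The correspondence list, one row per reading and meaning.
READINGS = [
    Reading("ㄓㄨˋ", "zhy³", "チョ", "저", "trứ", "notoriety, authorship"),
    Reading("ㄓㄨㄛˊ", "zhoek³", "チャク", "착", "trước", "to don"),
    Reading("ㄓㄨㄛˊ", "zhoek⁶", "チャク", "착", "trước", "to make contact, to apply"),
    Reading("ㄓㄠˊ", "zhoek⁶", "チャク", "착", "trước", "to ignite, to affect"),
    Reading("ㄓㄠˉ", "zhoek⁶", "チャク", "착", "trước", "(boardgame) move"),
    Reading("˙ㄓㄜ", "zhoek⁶", "チャク", "착", "trước", "stative particle"),
]

# Hypothetical labels for the standards discussed in this post.
STANDARDS = ("roc", "japan", "mainland", "korea", "hongkong", "cantodict")


def preferred_character(reading: Reading, standard: str) -> str:
    """Return 著 or 着 for a reading under one of the standards (sketch)."""
    if standard not in STANDARDS:
        raise ValueError(f"unknown standard: {standard}")
    if standard == "roc":
        # 着 is only a variant of 著 in the Republic of China.
        return "著"
    if standard == "cantodict":
        # CantoDict keeps 著 for zhoek³ as well.
        return "著" if reading.cantonese in ("zhy³", "zhoek³") else "着"
    # Postwar Japan, mainland China, Korea (usually), and the official Hong Kong list.
    return "著" if reading.japanese == "チョ" else "着"
```

The only row where the standards other than the Republic of China disagree is the second one (“to don”): `preferred_character(READINGS[1], "cantodict")` gives 著, while `"hongkong"` gives 着.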

Visit to North Korea

In March of 2014, I went on a “March Madness” tour to the Democratic People’s Republic of Korea. I went on the same tour as Jan van der Aa and his sister, which was nice.

I had to travel through China, so I got a dual-entry visa and traveled to Peking. Although I was going to stay there for less than 72 hours, I couldn’t take advantage of the visa-free entry, because the Communist Party only let Norwegian citizens stay for 24 hours without a visa. I suspect they were angry about the Nobel Prize that went to Lau Hiubo (劉曉波). Thanks, Nobel Committee.

We had a briefing at the Koryo Tours office, and left in a bus the next morning. I almost missed it, because I stupidly forgot my card in an ATM and ran back to retrieve it. Fortunately, the employees at Bank of China (中國銀行) were coöperative, so I got my card back and ran back to catch the bus.

When we got to the airport, we met our first north Koreans. They had just played in a football tournament, but were beaten by Portugal. It was very interesting to hear them call each other “comrade” (同志 to superiors, 동무 to everyone else). It was also fun to try talking to them in rudimentary Korean.

One of my favourite moments was when we boarded the plane. Leaving the generic-looking Chinese airport and being plunged into the atmosphere of the Air Koryo plane (from the Soviet era) was quite an experience. One of my favourite songs, My Country is the Best (내 사는 내 나라 第一로 좋아), was playing (and yes, they kept playing music until we landed). The stewardesses (all of them were female) were beautiful, and I got a seat right next to one of them. I didn’t get to take a picture with her, but since we got a newspaper that had Kim Jong-un on the front page (which every newspaper does), she did show me how to fold it in a respectful way. You can’t fold the leader of the country, so you have to fold the top and the bottom of the paper instead of the middle. The meal on the plane was something that looked like a hamburger. It was OK, I guess.

When we landed, we were reminded not to take pictures of any unfinished structures (finishing building projects quickly is apparently a matter of national pride), the military, or anything else that might put the country in a bad light. Other than that, we could take as many pictures as we liked.

We met a pair of guides, and when one of them said her name was Hwang (黃), she did it with a rising tone, so without thinking, I asked if she was Chinese. That was embarrassing. Fortunately, our guides turned out to be another pair, and they were great guides, too. Very friendly and good at their job. Both of them were called Kim (金).


On our way to Ryanggang Hotel (兩江호텔), we stopped at Pyongyang’s Arch of Triumph (凱旋門) and saw some kids on roller skates. While driving through Pyongyang, we could see the unfinished Ryugyong Hotel (柳京호텔) in the distance. Its exterior is finished, so we were allowed to take pictures of it.


We also passed the Chollima Statue (千里馬銅像), but never ended up visiting it.


The hotel was beautiful, and there was a bookstore in the lobby. By the end of the trip, I had bought a book that was all about what a great man General Kim Jong-il was.


The room was one of the best I’ve ever stayed in, but I don’t usually stay in hotels anyway. It was very clean, and the bed was comfortable. The electricity kept going off, but that was convenient, because I couldn’t figure out how to turn off the lights. The downside was that I woke up every time the power came back. I found out how to control the lights eventually, and had a great night’s sleep.

Day 1

We had breakfast downstairs, and it included some great pancake things that I haven’t been able to find since. The waitress called them 알라지 (I had her check my spelling, too). The tea was also fantastic. I don’t think I’ve ever had tea that good.


As we drove through Pyongyang, one of the things I really liked was the absence of advertisements. There were a lot of slogans and calls to support the Party, but the only ad was for Pyonghwa Motors (平和自動車).


The bus took us to Kumsusan Palace of the Sun (錦繡山太陽宮殿), where the bodies of President Kim Il-sung and General Kim Jong-il lie in state. This was the most solemn part of the tour, and I had to wear a suit. We had to pass through several wind machines that blew off dust and hair from our clothes, and we walked through long corridors filled with art and other things commemorating the two leaders. We couldn’t take pictures inside, and we had to keep our arms straight down at all times. I absentmindedly put my hands behind my back at one point, and the other group’s guide scolded both me and our guide for it. Oops.

We got to see both of the leaders, and participated in a bowing ritual in order to walk around the body. First, we faced the feet and bowed. Then, we faced the body’s left side and bowed. Then, we faced the head without bowing. Then, we faced the right side and bowed before moving on to the next room. We did a lot of bowing on this trip.


Afterwards, we took pictures outside before moving on to the Revolutionary Martyrs’ Cemetery (大城山革命烈士陵), where, among others, Kim Jong-suk, President Kim Il-sung’s first wife, is buried.


Next, we visited the Mansudae Grand Monument (萬壽臺大紀念碑), where we bowed to the bronze statues of President Kim Il-sung and General Kim Jong-il. Any pictures of the statues had to include their whole body, not just part of it.


We could also see the Grand People’s Study House (人民大學習堂) on the way.


Finally got a passable picture of the Ryugyong Hotel!


These signs were everywhere, with varying messages and slogans on them. This one says 「先軍朝鮮의 太陽 金正恩將軍 萬歲!」 (“Long live General Kim Jong-un, the sun of Songun [‘military-first’] Korea!”) in Korean letters, since Chinese characters are even rarer in the north than in the south.


Then, we visited President Kim Il-sung’s childhood home, where we saw this strange pot, which I’m sure I’ve seen on the Internet before.

Day 2


On the second day, we went to Kim Il-sung Square (金日成廣場), where the famous military parades are held.


There was no parade when we were there, and they are actually quite rare, but we did see the markings that make them possible.


This is the other side of the Grand People’s Study House, where the leader overlooks parades.


On the other side of the Taedong River (大同江), we could see the Juche Tower (主體思想塔).


We walked down the street to the foreign bookstore (外國文冊房). Our British tour leader set up the tour so that we could walk among normal people. We could also talk to them, though we didn’t have that much time.


Before going, I was told that, although we could pay in USD, RMB, or EUR, the best deals were in dollars. That was not the case. The fixed rate made EUR the best choice nearly 100% of the time, so I ended up borrowing euros from Jan and paying him back in dollars later, using the actual rate. In this case, I bought a dictionary.


Afterwards, we went to the Pyongyang bowling hall. It was fun, and even though the power went out a few times, the machine still kept track of our scores.


Then, we drove to Kaesong (開城), where we stayed in a historical guesthouse. I opted not to shower there, since everything was old-fashioned, the water was cold, and I was lazy.


The food was great as usual.


There was also a dog soup option, which the Koreans call “sweet meat soup” (단고기국). The taste was OK, but I couldn’t finish it.

Day 3


From Kaesong, the Demilitarized Zone between north and south Korea was close by. We went to the Joint Security Area (共同警備區域) and the Peace Museum (朝鮮民主主義人民共和國平和博物館).


We even got to see some tourists on the south Korean side. We waved at them, but they were not allowed to make any moves back. It’s more relaxed on the northern side.


After that, we went back to Kaesong and up on a hill that overlooks this historical city.


We made sure to bow to this statue of President Kim Il-sung, of course.


We had some great food once again.


Then, we went to a museum. Apparently, it’s famous for its ginseng.


On the way back to Pyongyang, we stopped at this monument, the Arch of Reunification, or Monument to the Three-Point Charter for National Reunification (祖國統一三大憲章記念塔).


Then, we went to visit the Juche Tower. Too bad it was on the only cloudy day. The guides told us about the symbolism behind its measurements, and how it was completed ahead of schedule.


There were plaques for all the major donors to the tower.


I went to a public toilet, and took this picture on the way. The toilet was very clean, but I don’t have a picture of it.


It was too cloudy to get a good picture from the top, but there’s Kim Il-sung Square.


Next stop was the Monument to Party Founding (黨創建紀念塔). It was pretty cool, and the local guide was very friendly. I asked her how to say “the Workers’ Party of Korea” (朝鮮勞動黨) in Korean, because I wanted to hear whether she would say 「로동」 or 「노동」. She and another guide said 「조선노동당」 at the same time. Since I didn’t want the preceding /n/ to interfere, I repeated 「朝鮮…?」 to get them to say the next part. The man said 「로동당」, but the woman said 「노동당」, even though they were both from Pyongyang. Interesting.


Here’s Mansudae in the background. Lights go on at night to make sure the leaders’ faces are always lit up. Most of them stay on even when the power fails.

We went to visit an art gallery with lots of beautiful paintings. Many of them were of nature, but there were also quite a lot of revolutionary motifs. Mt. Paektu, the Arirang Mass Games, and a painting of the Young Pioneers that looks like an actual photograph! I also liked the painting of all the major monuments of Pyongyang.


And, of course, the President and the General.

Afterwards, we went to a brewery. I don’t drink, so it wasn’t that interesting, but the waitresses were cute.


On the way back to the hotel, I noticed that the Juche Tower actually lights up at night. Cool, but I couldn’t get a good picture of it.

Day 4


On the fourth day, we first went to the Pyongyang Railway Museum (鐵道省革命史蹟館). It was mostly about the train travels of the President and General.


Most interesting to me were the old newspaper articles in mixed script. I took so many pictures of newspapers that I ended up lagging behind the group.


As always, there was a lot of information about President Kim Il-sung and General Kim Jong-il too. Not all of it had to do with trains, either.


They had a lot of old trains.


Next, we went to take the Pyongyang Metro (平壤地下鐵道). The escalator went deep down into the ground.


The metro has two or three lines: the Chollima Line (千里馬線) and the Hyoksin Line (革新線), and then there’s the Mangyongdae Line (萬景臺線), which seems to be an extension of the Chollima Line. You can press a button to activate lights that show you where to get on and off.


The stations on the Mangyongdae Line are very impressive. We started at Puhung Station (復興驛).


Then, we took the metro to Yonggwang Station (榮光驛). I was happy to hear another song I like, It is War (攻擊戰이다).


Finally, we took the metro all the way to Kaeson Station (凱旋驛), which is where the Arch of Triumph is. On the way, I sat next to a young girl who was studying Mandarin. Her English was good, too.


At Kaeson Station, this message says 「온 社會를 金日成-金正日主義化하자!」 (“Let’s Kimilsungism-Kimjongilism-ify the whole society!”) in Korean letters.

We got peach ice cream, or “eskimo” (에스키모) as the Koreans call it. It was good.


Next, we went to a factory museum. It had this interesting message: 「石炭은 工業의 食糧이다」 (“Coal is the food of industry”).


There were many old machines there.


Then, we passed one of the few things in Pyongyang to survive the Korean War, or Fatherland Liberation War (祖國解放戰爭), as they call it.


Next stop was the Grand People’s Study House, which we got to enter this time. Apparently, any citizen of the DPRK can use a computer with an Internet connection to order books from this grand library. I tried searching, but the Korean keyboard wasn’t indicated visually, and it wasn’t a layout I was familiar with.


Here’s the view of Kim Il-sung Square from the balcony, with the Juche Tower in the background.


Next, we went to my favourite museum: the Victorious Fatherland Liberation War Museum (祖國解放戰爭勝利紀念館).

They had lots of cannons, tanks, submarines, and planes.


I love the northern spelling of “tank” (땅크).


We also visited the captured American ship USS Pueblo.


And we watched this fantastic video. It puts a smile on my face every time.


Then, we entered the main museum. Unfortunately, we weren’t allowed to take pictures inside, but it was a great experience. There was a life-sized model of President Kim Il-sung greeting us in the lobby, and the building looked kind of like a luxury hotel.


I also saw these around the city. They say 「偉大한 金日成同志와 金正日同志는 永遠히 우리와 함께 계신다」 (“The great Comrade Kim Il-sung and Comrade Kim Jong-il are with us forever”) in Korean letters.


After the Victorious Fatherland Liberation War Museum, we drove through the countryside again.


On the way to Pyongsong, we could also see slogans in the countryside. This one says 「榮光스러운 朝鮮勞動黨 萬歲!」 (“Long live the glorious Workers’ Party of Korea!”) in Korean letters.


The hotel in Pyongsong was much smaller than the one in Pyongyang, but very cozy. The staff was super friendly. I was tired, so I went to bed early and missed the karaoke. Bummer, since I love north Korean music and would have loved to join.

Day 5


The next day, we went to a camp that served as a base of operations against the Japanese (or Americans … I forgot which).


On the way back to Pyongyang, we stopped at a candy factory. I bought four bottles of soda, because they were only 5 RMB each. The strawberry one was the best, but the peach one was OK too. The other two weren’t that good.


We also visited Kim Jong-suk Middle School No. 1 (金正淑 第一 中學校). I sang Chollima on the Wing (千里馬 달린다) and No Motherland Without You (當身이 없으면 祖國도 없다) with one of the classes.

There was a strange taxidermy room. The animals were kind of weird.


We left Pyongsong and headed to a “village” of movie backgrounds. There were areas for ancient Asia, South Korea, China, and more.

It was cool to walk through.


I saw only one church building on the trip. Apparently, it is in use.

Then, we went to another art gallery. I talked to a multilingual woman who knew French, among other languages.


After that, we went to visit the famous Yanggakdo International Hotel (羊角島國際호텔).

We had to visit the top floor with the revolving restaurant, of course.


After that, it was time to take the train back to China.


Got a few good pictures of the Korean countryside.


Finally, we reached Tantung (丹東). It was an interesting feeling. I felt safe in the DPRK, but not very free, since we had to stay with the guides at all times.


The train we travelled with didn’t have any toilet paper, but I had brought my own just in case.


One of the guards also had this strange mix of simplified and normal characters on his shoulder. All the other ones had it all in simplified.


I enjoyed the train ride, although it was about 24 hours long from Pyongyang to Peking.

I skipped a couple of things, like the grave of an ancient king near Kaesong and the store selling lots of imported goods, but I didn’t get any good pictures there. I hope I can visit some other places if I come back in the future!

Cantonese Goldlisting Project

My goldlisting project for this year is an average of 100 lines per day (but I’m currently 18 days ahead of schedule), and the language I’m goldlisting is Cantonese. My goal is 15,000 items, but because I need extra lines for readings, I’ll need to reach about 20,000 headlist lines.

Source and Method

I’m using a great list of characters encountered in the Taiwanese school system, finding the CantoDict page for one character at a time, and goldlisting all the “compounds” that seem worthwhile. This usually means I’ll skip:

  • Things I don’t understand the English translation of.
  • Transparent compound words.
  • Proper nouns that I don’t recognize.

The first few characters in the list have huge lists of words containing them, and the last ones may have only one or two. Some are not even in the dictionary, and others are there, but with no words. So in the beginning, I will be using the same list for a long time, but the further I get, the shorter the lists get. In addition, I keep track of what characters I’ve already goldlisted, so I skip any words containing characters I’ve already done, which further shortens the lists.


In calculating project sizes in the goldlist system, David James multiplies the number of headlist lines by three to get an approximate number of lines. In my case, the entire project should have 60,000 lines in total, although in reality it will likely have fewer than that. I’m currently at 10,300 headlist lines, but I’ve already started distilling, so I have about 8,000 lines in my distillations too. If I see every item approximately 3 times on average, I’ve finished about 1/3 of the project.

If I keep going at 100 lines per day, the remaining ~40,000 lines should take me ~400 days. But if I can manage to keep going at the current rate of 200 lines per day, I’ll need only ~200 days, which means I could finish before next year, if work and other circumstances allow it.
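The estimate above can be redone as a rough calculation. This is only a sketch of the arithmetic in the text; the function names are made up for illustration, and the multiplier of three is the rule of thumb mentioned above, not an exact figure.

```python
import math

def total_lines(headlist_lines: int, multiplier: int = 3) -> int:
    """Approximate project size: headlist lines times the rule-of-thumb multiplier."""
    return headlist_lines * multiplier

def days_remaining(total: int, done: int, lines_per_day: int) -> int:
    """Whole days needed to write the remaining lines at a steady daily rate."""
    return math.ceil(max(total - done, 0) / lines_per_day)

planned_headlist = 20_000        # target headlist size from the text
total = total_lines(planned_headlist)
done = 10_300 + 8_000            # headlist lines so far plus distillation lines so far

print(f"total lines:   {total}")
print(f"share done:    {done / total:.0%}")
print(f"at 100 a day:  {days_remaining(total, done, 100)} days")
print(f"at 200 a day:  {days_remaining(total, done, 200)} days")
```

The exact figures come out slightly above the rounded ~400 and ~200 days, since 41,700 lines remain rather than a flat 40,000.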


Initially, I planned on doing the project in 20 batches of 1,000 headlist lines each, finishing them one by one, but I found out about David’s superior batch system early on and decided to use that instead. If the numbers look scary, don’t worry; I felt completely overwhelmed when I looked at them too. But it’s actually very easy:

  • Write your headlist for the first batch (in my case 2,000 lines).
  • Distil the headlist from 1–2,000, and continue making the headlist from 2,001–3,900.
  • Return to the beginning of the book, where your D1 (first distillation) starts, and distil it. Now distil the second headlist batch, and finally continue making the headlist from 3,901–5,700.
  • Return to the beginning of the book, where your D2 starts. Distil your way through the book again, distilling one page at a time. Now continue making the headlist from 5,701–6,300.
  • Return to the beginning of the book, where your D3 starts. Distil your way through the whole book. Add 1,500 lines to your headlist.
  • Return to the beginning of the book, where your D4 starts. Distil your way through the whole book. Add 1,400 lines to your headlist.
  • Return to the beginning of the book, where your D5 starts. Distil your way through the whole book. Add 1,300 lines to your headlist.
  • Return to the beginning of the book, where your D6 starts. Distil your way through the whole book. Add 1,200 lines to your headlist.
  • And so on.

Of course, you won’t find a single book that can fit everything, so since my books have 100 sheets and are 35 lines deep, I can fit 2,500 headlist words with three distillations in each if I use the very last page and the very first as if they were a double page. After D3, I have to sample from several pages to make a new list of 25 lines per page in a new book. We call the first book “bronze” and the second “silver” – the next is “gold” and then even “platinum” if you want to keep going, but you may not need to continue once you finish the silver book.

Having explained the system I planned on using, I have to say I ended up not sticking to it after all, and there’s a good reason for that. I can only do the headlist when I have access to my source, and my source is online. Therefore, I decided to do the headlist whenever I can, since the distillations can be done anywhere (except when I sample from one book to put into another, but at least I don’t need to be online for that). I still plan to follow the batch system for distillations, though, and so far, I have.

Books and Pens

I use these 100-page, 35-line Kokuyo Campus notebooks:


Each one can fit 2,500 headlist lines, so I’ll need 8 of them at the bronze stage, and probably 2 at the silver stage. I might just use a smaller book at the gold stage.
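The book count follows from simple division, sketched below. The 25 headlist lines per double page is my reading of the 2,500 lines per 100-sheet book mentioned above (the rest of each 35-line spread being left for distillations), and the constant and function names are only illustrative.

```python
import math

SPREADS_PER_BOOK = 100           # 100 sheets, counting last page + first page as one spread
HEADLIST_LINES_PER_SPREAD = 25   # assumption: inferred from 2,500 lines / 100 spreads

def book_capacity(spreads: int = SPREADS_PER_BOOK,
                  lines_per_spread: int = HEADLIST_LINES_PER_SPREAD) -> int:
    """Headlist lines that fit in one notebook."""
    return spreads * lines_per_spread

def books_needed(headlist_lines: int, capacity: int) -> int:
    """Notebooks needed for a headlist of the given size, rounded up."""
    return math.ceil(headlist_lines / capacity)

capacity = book_capacity()
print(capacity)                        # lines per bronze book
print(books_needed(20_000, capacity))  # bronze books for the planned headlist
```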

I don’t have a specific type of pen that I use, but I try to use comfortable ones. I prefer 0.38 mm, but 0.5 mm pens are OK as well. If you want to try this at home, make sure you stock up on pens, because this really eats them up! I like to use black for the headlist, blue for D1, red for D2, and green for D3, then black for D4, and so on, rotating the colours, but you can do it with any colour you like.

Cantonese: the Key to the Sinosphere

Let’s say you want to learn some East Asian languages, especially to a high level. Not just one, but several. In most cases, that will be Mandarin, Japanese, and Korean, so I’ll talk about those, but the secrets I’m going to tell you apply to any Chinese language and even Vietnamese. You can learn all of the above languages much more easily if you first invest some time into Cantonese.

A common complaint is that Cantonese is less useful than Mandarin. That may be so, but Cantonese is still a huge language that comes with a distinct and interesting culture, most notably spoken in Hong Kong and southern China, but also in Chinatowns all over the world. But the real reason I’m writing about it is its almost magical ability to act as a key to any other language you may want to learn in what I like to call the Sinosphere. The usefulness of knowing Cantonese is bigger for some languages than for others, but it helps a lot in all cases. Let’s go through the benefits of knowing Cantonese.


As you may or may not know, all Chinese languages descended from a common language, and all but a few of them only split after what we call the Middle Chinese period. Well, these transitions are actually impossible to talk about in absolute terms, I’d argue, but that’s another discussion. Early Middle Chinese is described as having four tonal categories: 平 ‘level’, 上 ‘rising’, 去 ‘departing’, and 入 ‘entering’ (though 入 didn’t contrast directly with the others, as we shall see). We can’t really know for sure what the pronunciation of these tones was, but we assume that all tones were lower in pitch if their syllable started with a voiced initial, meaning that the vocal cords vibrate when you articulate the initial sound (you can feel voicing by touching your throat while saying ‘mmm…’).

By the Late Middle Chinese period, however, obstruents, meaning sounds that are made by restricting the airflow either wholly or partially, had lost their voicing distinctions, so that /p/ and /b/ had merged into /p/. But their tones remained the same, and the result was that the number of tones doubled. The high tones are called 陰 ‘Yin’ and the low tones 陽 ‘Yang’. Traditionally, the number of tones is then said to be 8, but if we only count contrasting tones, there were only 6. Regardless of how you prefer to count, all of these categories are preserved in Cantonese, and the high entering tone has split according to vowel length, so that short vowels have a higher tone than long ones, dividing the 入 category into three levels.

Most of the Chinese languages have lost one or more categories: Mandarin has preserved 4 out of 8 categories. Shanghainese has preserved 5. But the interesting thing is how categories were lost: in almost all cases, a category simply merged with another one. In some cases, it split and merged into two different categories, depending on whether its initial was an obstruent or a sonorant. In one case, it’s unfortunately random. In addition, there are numerous exceptions to the rules, of course, but the vast majority of syllables/characters follow the rules. Knowing what category a character belonged to in Middle Chinese lets us know what tone a character has today, as long as we know what categories the modern tones correspond to. Because Cantonese has preserved all of the categories, it lets us know what tones a character has in other languages – even in Vietnamese. Let’s look at how the tones of Cantonese let us predict Mandarin tones:

Tone 1 (陰平): high level tone in both Cantonese and Mandarin. Some Cantonese speakers also distinguish a high falling tone.

Tone 2 (陰上): mid rising tone in Cantonese and low falling/dipping tone in Mandarin. (This is usually called the third tone in Mandarin.)

Tone 3 (陰去): mid level tone in Cantonese and high falling tone in Mandarin. (This is usually called the fourth tone in Mandarin.)

Tone 4 (陽平): low falling/very low level tone in Cantonese and mid rising tone in Mandarin. (This is usually called the second tone in Mandarin.)

Tone 5 (陽上): low rising tone in Cantonese and low falling/dipping tone in Mandarin – just like tone 2 (陰上). Many obstruent-initial tone 5 syllables instead become high falling tones in Mandarin.

Tone 6 (陽去): low level tone in Cantonese and high falling tone in Mandarin – just like tone 3 (陰去).

Obstruent-final Syllables

Sometimes called tones 7 through 9 (because one category split into two in Cantonese), these are actually the same as the other three level tones, 1/3/6, except that they end in unreleased obstruents, -p/t/k. In the most common romanization systems, they are labelled 1/3/6, but it can be useful to think of them as tones 7-9, because it fits the rest of the list:

Tone 7 and 8 (陰入高 & 陰入低): high level tone and mid level tone respectively in Cantonese and random in Mandarin.

Tone 9 (陽入): low level tone in Cantonese and mid rising (if obstruent-initial) or high falling (if sonorant-initial) in Mandarin.
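The correspondences listed above can be collected into a small lookup, sketched here. It only encodes the rules as stated in this article, so it ignores the exceptions; the function name and the use of the 7–9 numbering for the obstruent-final tones are choices made for the illustration. Mandarin tones use the usual numbering: 1 high level, 2 mid rising, 3 low falling/dipping, 4 high falling.

```python
from typing import Tuple

ALL_MANDARIN_TONES = (1, 2, 3, 4)

def predict_mandarin_tones(cantonese_tone: int, obstruent_initial: bool) -> Tuple[int, ...]:
    """Return the candidate Mandarin tones for a Cantonese tone category (1-9)."""
    if cantonese_tone == 1:          # 陰平
        return (1,)
    if cantonese_tone == 2:          # 陰上
        return (3,)
    if cantonese_tone == 3:          # 陰去
        return (4,)
    if cantonese_tone == 4:          # 陽平
        return (2,)
    if cantonese_tone == 5:          # 陽上: many obstruent-initial syllables go to tone 4
        return (4, 3) if obstruent_initial else (3,)
    if cantonese_tone == 6:          # 陽去
        return (4,)
    if cantonese_tone in (7, 8):     # 陰入: no predictable outcome in Mandarin
        return ALL_MANDARIN_TONES
    if cantonese_tone == 9:          # 陽入: depends on the initial
        return (2,) if obstruent_initial else (4,)
    raise ValueError(f"not a Cantonese tone category: {cantonese_tone}")

for tone in range(1, 10):
    print(tone, predict_mandarin_tones(tone, obstruent_initial=True),
          predict_mandarin_tones(tone, obstruent_initial=False))
```

A single-element result means the rule gives one answer; several elements mean the category alone is not enough to decide.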

Mandarin doesn’t retain any of the final obstruents, but it’s actually the oddball of the lot.

Final Consonants

Cantonese preserves six final consonants: -m, -n, -ng, -p, -t, -k. Although Vietnamese for example preserves two more, -nh and -ch, the intact tonal categories of Cantonese more than make up for it, because most of the languages have lost those two finals anyway.

Mandarin preserves only two final consonants: -n and -ng. Original -m has merged with -n, so they are predictable going from Cantonese to Mandarin, but not the other way around, just like the tones. The obstruent finals are gone, but as far as I know there is no predictable pattern to go by as far as their traces are concerned.

Japanese has only one final nasal, /ɴ/ (ん). Original -n and -m correspond to this (loaned as /nu/ and /mu/), but -ng doesn’t. Instead, -ng was loaned as final /u/, and is (usually) written う even today, though usually in the combination おう, signifying long /oː/. Of the obstruent finals, -t and -k are preserved usually as /t(s)u/ (つ) and /ku/ (く), and sometimes as /t(ɕ)i/ (ち) and /ki/ (き). The last one, -p, was borrowed into Japanese as /pu/, which has since undergone a change to /hu/ (ふ) and then deletion of the /h/, meaning -p is also /u/ う in most cases. Examples: 十 ‘ten’ (-p) /ʑipu/ => /ʑihu/ => /ʑiu/ => /ʑuː/; 白 ‘white’ (-k) /paku/ => /haku/; 工 ‘work’ (-ng) /kou/ => /koː/.

Korean also preserves six final consonants, but -t was borrowed as -l (possibly due to weakening in Old Mandarin). Otherwise, the endings are the same, except for a few labial-initial syllables, because Cantonese has undergone a process of dissimilation where a labial final consonant turns into a dental one if the initial consonant is also labial. But these syllables are few and far between, so don’t worry about it. Examples: 法 ‘law’ (-p, but -t in Canto) /pʌp/; 發 ‘prosper’ (-t) /pal/; 南 (-m) /nam/.
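As a summary of this section, the final-consonant correspondences can be written as a table keyed by the Cantonese final. This is a simplified sketch of the rules described above: it leaves out the labial-initial exceptions, the vowel changes that follow Japanese /u/, and the Vietnamese finals, and the names in it are illustrative only.

```python
from typing import Dict, Tuple

# Cantonese final -> possible reflexes in each language, as described in the text.
# An empty tuple means the final consonant was lost.
FINAL_REFLEXES: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "m":  {"mandarin": ("n",),  "japanese": ("ɴ",),        "korean": ("m",)},
    "n":  {"mandarin": ("n",),  "japanese": ("ɴ",),        "korean": ("n",)},
    "ng": {"mandarin": ("ng",), "japanese": ("u",),        "korean": ("ng",)},
    "p":  {"mandarin": (),      "japanese": ("u",),        "korean": ("p",)},
    "t":  {"mandarin": (),      "japanese": ("tsu", "chi"), "korean": ("l",)},
    "k":  {"mandarin": (),      "japanese": ("ku", "ki"),  "korean": ("k",)},
}

def reflexes(cantonese_final: str, language: str) -> Tuple[str, ...]:
    """Look up the expected reflexes of a Cantonese final in another language."""
    return FINAL_REFLEXES[cantonese_final][language]

for final in FINAL_REFLEXES:
    print(f"-{final}:", {lang: reflexes(final, lang) for lang in ("mandarin", "japanese", "korean")})
```

Reading the table the other way shows why the prediction only works from Cantonese outwards: Mandarin -n, for example, could go back to either -m or -n.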


Although Cantonese (and Mandarin) lost their voicing distinctions in initials, the low tones of Cantonese still let us predict their presence in e.g. Shanghainese and Japanese. However, it should be noted that the two main categories of Sino-Japanese readings, 吳音 (Go’on) and 漢音 (Kan’on), differ in this respect, with 吳音 retaining voiced initials and the later 漢音 having only voiceless obstruent initials, so it’s not always predictable for each word. Example: 上 ‘above/rise’ (-ng) has the Go’on /ʑoː/ (じょう), but the Kan’on /ɕoː/ (しょう).

Problems with Cantonese

Recently (that is, in the last couple of centuries), Cantonese has undergone, and is still undergoing, certain sound changes that sadly make it less suited to be used as a key. However, all but one of these are preserved in every major romanization scheme:

  • Merger of (null)- and ng-. Originally, the three high tones had null-initials and their corresponding low tones had ng- (with a few exceptions; 啱 had both ng- and a high tone). Nowadays, they are used interchangeably by most people, with ng- being preferred in formal contexts and null otherwise. Not a big problem comparatively, since the tones reveal where ng- was originally.
  • Merger of n- and l-. This is common in large parts of China, especially the south. Unlike the ng-/null merger, though, this one is not otherwise predictable, so you should take care to learn which syllables start with n-, even if you decide to pronounce them l-.
  • Merger of ng and m. With the exception of 唔 ‘not’, every single instance of “m” as its own syllable in Cantonese was originally ng. Easy to predict.
  • Merger of gwo-/kwo- and go-/ko-. The same goes with this one: learn which syllables have a -w-, even if you decide not to pronounce it.
  • Merger of -n/-ng and -t/-k. These also have to be learned even if you decide not to distinguish them.
  • Merger of sh/ch/zh- and s/c/z-. This is an old merger, and is the only one not usually distinguished in romanization. However, given the choice between tone categories and consonants, I choose tone categories any day of the week.
  • Merger of -om/-op and -am/-ap. This is another old merger, but one that affects a relatively small number of words, and isn’t that useful comparatively anyway. Cantonese syllables ending in -om are -an in Peking and those ending in -am are -in in Peking.

Cantonese has also lost a fair number of medials. Vowels may not be too helpful (though you’ll notice some patterns even when they are different, such as Cantonese short /a/ corresponding to Mandarin /i/, and /oː/ corresponding to Mandarin /ɑ/). Mandarin vowels tend to be slightly more similar to Japanese and Korean ones, but vowels seem to be the least predictable feature.

So now you know why you should learn Cantonese to make everything easier for yourself. Remember that you don’t need to understand all the technical stuff in this article; your brain is great at recognizing patterns, so learning Cantonese will give you most of the benefits automatically. Have fun!


Scandinavia is a dialect continuum, so I normally talk about it as one language. Most people don’t, but that’s fine too. Let’s look at the names “Norwegian”, “Swedish”, and “Danish”. They have two main meanings, descriptively:
1. Any and all Old Norse-derived varieties spoken within the political borders of the respective Scandinavian countries.
2. The main Old Norse-derived variety used in the respective Scandinavian countries (roughly Oslo, Stockholm, and Copenhagen speech).

Basically, they are terms of convenience. If you’re a learner, you’ll probably use them more in the second meaning, so I’ll do that here, but don’t forget that Scandinavian is more than these three (four if you count Nynorsk) standards. All the standard varieties are really similar to each other, and one common way of saying it is that Norwegian and Swedish sound similar (Oslo being close to Stockholm geographically), whereas Norwegian and Danish look similar in writing (Bokmål being based on the Danish written tradition).

There’s a lot of truth to this, and with Oslo being a place where all kinds of Norwegians and other Scandinavians alike can be heard, learning Bokmål (the most popular written language in Norway) with Oslo pronunciation is a good way to access Scandinavia from the middle. Norwegians are generally better at understanding their neighbours than the other way around, which may have something to do with being in-between. I personally think Oslo-accented Bokmål gives the best coverage overall.

Varieties of Language

The terms ‘language’ and ‘dialect’ (and, indeed, ‘variety’ – a very handy term that encompasses ‘language’, ‘dialect’, ‘register’, and so on) are nothing more than terms of convenience and a reflection of culture. They are not scientific terminology. The reason for this is that it’s impossible to determine by objective linguistic criteria whether we’re dealing with one or two varieties – and that goes on all levels (even the individual). An excellent in-depth discussion of why this is so can be found in “Sociolinguistics” by R. A. Hudson.

‘Dialect’ has roughly two descriptive (as opposed to prescriptive) meanings:
1. “Variety of a certain language.”
2. “Non-standardized/non-official variety.”
(For the second meaning, a ‘language’ is then “standardized/official variety” – but a better term is simply “standard language”.)

The first meaning tends to be preferred by language enthusiasts, but it’s a lousy definition, since it’s impossible to determine what a ‘language’ is, and thus also what a ‘dialect’ is. The second tends to be hated by egalitarian-minded people because ‘language’ comes with a certain prestige that ‘dialect’ doesn’t have, but at least it makes it possible to distinguish the terms without ambiguity.

Whenever this question comes up, someone with the first definition in mind mentions mutual intelligibility as a definitive measure that can help us decide. However, mutual intelligibility is not only highly gradual; it also depends heavily on one’s previous exposure and will to understand, and so doesn’t help.