HTML 4.01 supports the ISO 8859-1 (Latin-1) character set.
The lower part of ISO 8859-1 (codes 0-127) is the original 7-bit ASCII.
Most of these characters can be used without a character reference.
The higher part of ISO 8859-1 (codes 160-255) can be written using
character entity names.
Some characters have a special meaning in HTML, like the less than sign (<)
that defines the start of an HTML tag. If we want the browser to actually
display these characters we must insert character entities in the HTML source.
A character entity has three parts: an ampersand (&), an entity name or a # and
an entity number, and finally a semicolon (;).
To display a less than sign in an HTML document we must write: &lt; or &#60;
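As a rough illustration of the three-part structure described above, the sketch below uses Python's standard html module to produce and decode character entities. The sample string and variable names are made up for this example.

```python
import html

# html.escape replaces the characters that are special in HTML
# with character entities (ampersand, name or number, semicolon).
source = 'if a < b && b > c: print("ok")'
escaped = html.escape(source)

# html.unescape reverses the process and accepts both forms:
# the entity name (&lt;) and the entity number (&#60;).
from_name = html.unescape("&lt;")
from_number = html.unescape("&#60;")

print(escaped)
print(from_name, from_number)
```

Both the named and the numbered form decode to the same less than sign, so either can be used in the HTML source.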
The advantage of using a name instead of a number is that a name is easier to
remember. The disadvantage is that not all browsers support the newest entity
names, while the support for entity numbers is very good in almost all browsers.
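Where support for an entity name is in doubt, one option is to fall back to the entity number. The following is a minimal sketch that looks up the number for a name in the HTML 4 entity table shipped with Python (html.entities.name2codepoint). The helper name to_numeric_reference is hypothetical.

```python
from html.entities import name2codepoint


def to_numeric_reference(name: str) -> str:
    """Return the entity-number form for an HTML 4 entity name.

    Raises KeyError if the name is not in the HTML 4 table.
    """
    return f"&#{name2codepoint[name]};"


# A few names from the ASCII range and from the upper part of ISO 8859-1.
for name in ("lt", "amp", "nbsp", "eacute"):
    print(f"&{name};", "->", to_numeric_reference(name))
```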
The Pinyin system also incorporates suprasegmental phonemes to represent the four tones of Mandarin. Each tone is indicated by a diacritical mark above a non-medial vowel. Note that the lower-case letter "a" in pinyin is supposed to be of the handwritten type with no curl over the top. This can be achieved by using a font in which the letter happens to look like this, or alternatively by specifying it using Unicode as we have done in the bracketed example. Note that tone marks can also appear on consonants in certain vowelless exclamations.
The first tone is represented by a macron (ˉ) added to the pinyin vowel:
The third tone is symbolized by a caron (ˇ, also known as a reverse circumflex). Note that it is officially not a breve (˘, which is rounded and lacks the downward angle), although this misuse is somewhat common on the Internet.
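The difference between the caron and the breve is easy to check in Unicode, where the two are distinct combining marks. The sketch below, which uses Python's standard unicodedata module, composes each mark with the letter "a" and prints the official character name. The constant and variable names are chosen for this example only.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON (first tone)
CARON = "\u030C"   # COMBINING CARON (third tone)
BREVE = "\u0306"   # COMBINING BREVE (not a pinyin tone mark)

# NFC normalization merges base letter + combining mark
# into a single precomposed character where one exists.
first_tone_a = unicodedata.normalize("NFC", "a" + MACRON)
third_tone_a = unicodedata.normalize("NFC", "a" + CARON)
breve_a = unicodedata.normalize("NFC", "a" + BREVE)

for ch in (first_tone_a, third_tone_a, breve_a):
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

A text that shows U+0103 (a with breve) where U+01CE (a with caron) is expected has fallen into the misuse described above.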
The fifth or neutral tone is represented by a normal vowel without any accent mark:
(ɑ) a e i o u ü A E I O U Ü
(In some cases, this is also written with a dot before the syllable; for example, ·ma.)
Since most computer fonts do not contain the macron or caron accents, a common convention is to postfix each syllable with a digit representing its tone (e.g., "tóng", tong with the rising tone, is written "tong2"). The digits follow the order in which the tones are listed above, except for the "fifth tone", which may be numbered 5, numbered zero, or left unnumbered, as in ma0 (吗/嗎, an interrogative marker).
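The digit convention can be derived mechanically from a tone-marked syllable. The sketch below is one possible approach, not a standard routine: it decomposes the syllable with Unicode NFD, strips the tone mark, and appends the matching digit. It assumes the usual pinyin marks, namely the acute accent for the second tone and the grave accent for the fourth, and it writes the neutral tone as 5. The function name to_numbered is hypothetical.

```python
import unicodedata

# Combining marks mapped to tone digits (assumption: acute = 2, grave = 4).
TONE_OF_MARK = {
    "\u0304": 1,  # combining macron
    "\u0301": 2,  # combining acute accent
    "\u030C": 3,  # combining caron
    "\u0300": 4,  # combining grave accent
}


def to_numbered(syllable: str) -> str:
    """Convert one tone-marked syllable to the digit form."""
    tone = 5  # neutral tone when no tone mark is found
    plain = []
    # NFD splits an accented letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_OF_MARK:
            tone = TONE_OF_MARK[ch]
        else:
            plain.append(ch)
    # Recompose so that u + diaeresis stays a single character.
    return unicodedata.normalize("NFC", "".join(plain)) + str(tone)


print(to_numbered("t\u00f3ng"))  # tong with the rising tone
print(to_numbered("ma"))         # unmarked syllable, neutral tone
```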
The pinyin vowels are ordered as a, o, e, i, u, and ü. Generally, the tone mark is placed on the vowel that appears first in this order. Liú is only a superficial exception: its true pronunciation is lióu, and since o precedes i, the mark falls on óu, which is contracted to ú.
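The placement rule can be sketched in code as follows. This is a simplified illustration under stated assumptions, not a complete pinyin converter: it handles one syllable at a time, treats "iu" as the contracted form described above, assumes the acute and grave accents for the second and fourth tones, and leaves vowelless syllables untouched. The function name to_marked is hypothetical.

```python
import unicodedata

# Combining marks for tones 1-4 (assumption: acute = 2, grave = 4).
COMBINING = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}
VOWEL_ORDER = "aoeiu\u00fc"  # a, o, e, i, u, ü


def to_marked(numbered: str) -> str:
    """Convert a digit-suffixed syllable such as 'tong2' to tone marks."""
    numbered = unicodedata.normalize("NFC", numbered)
    if not numbered or not numbered[-1].isdigit():
        return numbered  # unnumbered syllable: neutral tone, no mark
    body, tone = numbered[:-1], int(numbered[-1])
    if tone not in COMBINING:
        return body  # 0 or 5: neutral tone, no mark
    lowered = body.lower()
    if "iu" in lowered:
        # "iu" stands for "iou", so the mark goes on the u.
        pos = lowered.index("iu") + 1
    else:
        # First vowel of the order a, o, e, i, u, ü that occurs.
        found = next((v for v in VOWEL_ORDER if v in lowered), None)
        if found is None:
            return numbered  # vowelless syllable: out of scope here
        pos = lowered.index(found)
    marked = body[: pos + 1] + COMBINING[tone] + body[pos + 1 :]
    return unicodedata.normalize("NFC", marked)


for s in ("tong2", "liu2", "gui4", "ma0"):
    print(s, "->", to_marked(s))
```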
These tone marks are normally used only in Mandarin textbooks or in foreign learning texts, but they are essential for correct pronunciation of Mandarin syllables, as exemplified by the following classic example of five characters whose pronunciations differ only in their tones:
(They mean "mother", "hemp", "horse", "insult" and a question particle, respectively.)