
Unicode reserves 1,114,112 (= 2^20 + 2^16 or 17 × 2^16, hexadecimal 110000) code points, and currently assigns characters to more than 96,000 of them. The first 256 code points correspond to those of ISO 8859-1, the most popular 8-bit character encoding in the Western world. Because ISO 8859-1 itself extends ASCII, the first 128 characters are also identical to ASCII.
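The arithmetic and the ISO 8859-1 correspondence can be checked with a short Python sketch. The names TOTAL, MAX_CODE_POINT, latin1_matches and ascii_matches are illustrative only; the sketch relies on Python's built-in "latin-1" and "ascii" codecs.

```python
# Size of the Unicode code space, computed in the equivalent ways given above.
TOTAL = 2**20 + 2**16
assert TOTAL == 17 * 2**16 == 0x110000 == 1_114_112

# The highest code point is therefore one less than the total.
MAX_CODE_POINT = TOTAL - 1

# Decoding each byte 0..255 as ISO 8859-1 yields the code point of the same value.
latin1_text = bytes(range(256)).decode("latin-1")
latin1_matches = all(ord(ch) == i for i, ch in enumerate(latin1_text))

# Decoding bytes 0..127 as ASCII does the same for the first 128 code points.
ascii_text = bytes(range(128)).decode("ascii")
ascii_matches = all(ord(ch) == i for i, ch in enumerate(ascii_text))

print(hex(MAX_CODE_POINT), latin1_matches, ascii_matches)
```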

The Unicode code space for characters is divided into 17 planes, each with 65,536 (= 2^16) code points, although currently only a few planes are in use.
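Since each plane holds exactly 2^16 code points, the plane number is simply the code point shifted right by 16 bits. The following is a minimal sketch; the function names plane_of and plane_range are hypothetical helpers, not part of any standard library.

```python
PLANE_SIZE = 2**16   # 65,536 code points per plane
PLANE_COUNT = 17     # planes 0 through 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that contains the given code point."""
    if not 0 <= code_point < PLANE_SIZE * PLANE_COUNT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16


def plane_range(plane: int) -> tuple[int, int]:
    """Return the first and last code point of the given plane (inclusive)."""
    if not 0 <= plane < PLANE_COUNT:
        raise ValueError(f"not a Unicode plane: {plane}")
    first = plane * PLANE_SIZE
    return first, first + PLANE_SIZE - 1


first, last = plane_range(2)
print(plane_of(ord("A")), plane_of(0x1D11E), f"U+{first:04X}..U+{last:04X}")
```

Here U+1D11E is used as a sample code point from Plane 1.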

The cap of 2^20 code points outside the BMP (16 supplementary planes) exists in order to maintain compatibility with the UTF-16 encoding, which can address only that range beyond the BMP (see below). Currently, about ten percent of the Unicode code space is used. Furthermore, ranges of characters have been tentatively blocked out for every known unencoded script, and while Unicode may need another plane for ideographic characters, there are ten planes available if previously unknown scripts with tens of thousands of characters are discovered. This 20-bit limit is unlikely to be reached in the near future.
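The 20-bit figure follows from how UTF-16 represents code points above U+FFFF: as a pair of 16-bit units carrying 10 bits each. The sketch below illustrates that calculation; to_surrogate_pair is a hypothetical helper name, and the result is compared against Python's built-in "utf-16-be" codec.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: {code_point:#x}")
    offset = code_point - 0x10000        # a 20-bit value, 0 .. 2**20 - 1
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


# 1,024 high surrogates x 1,024 low surrogates = 2**20 addressable code points.
SUPPLEMENTARY_COUNT = 1024 * 1024
assert SUPPLEMENTARY_COUNT == 2**20

high, low = to_surrogate_pair(0x1D11E)
encoded = chr(0x1D11E).encode("utf-16-be")
# The hand-computed pair matches what the built-in codec produces.
assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
print(f"{high:04X} {low:04X}")
```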

Basic Multilingual Plane


The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

[Figure: visual roadmap to the Basic Multilingual Plane; colour legend not reproduced.]

As of Unicode 4.1, the BMP includes the following scripts:

Several scripts are expected to be included in the BMP in the next revision of Unicode. These scripts, and their proposed code point ranges, are the following:

Several other scripts are proposed for inclusion in the BMP, including:

Supplementary Multilingual Plane


Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.

As of Unicode 4.1, Plane One includes the following scripts:

Several scripts are expected to be included in the next revision of Unicode:

Many other scripts are proposed for inclusion in Plane One, including:

Private Use Area


A Private Use Area (PUA) is one of several ranges which are reserved for private use. For this range, the Unicode standard does not specify any characters.

The Basic Multilingual Plane includes a PUA in the range from U+E000 to U+F8FF (57344–63743). Plane Fifteen (U+F0000 to U+FFFFF) and Plane Sixteen (U+100000 to U+10FFFF) are completely reserved for private use as well.
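A range check for these three areas might look like the following sketch. The names PRIVATE_USE_RANGES and in_private_use_range are hypothetical; the ranges are taken directly from the paragraph above, so the check is plane-wide for Planes Fifteen and Sixteen even though the standard treats the last two code points of every plane as noncharacters. The standard library's unicodedata module reports private-use characters with the general category "Co".

```python
import unicodedata

# Inclusive ranges as stated in the article (assumption: plane-wide for 15 and 16).
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # PUA inside the Basic Multilingual Plane
    (0xF0000, 0xFFFFF),    # Plane Fifteen
    (0x100000, 0x10FFFF),  # Plane Sixteen
)


def in_private_use_range(code_point: int) -> bool:
    """Return True if the code point falls in one of the ranges listed above."""
    return any(low <= code_point <= high for low, high in PRIVATE_USE_RANGES)


# The decimal bounds quoted for the BMP area agree with the hexadecimal ones.
assert (0xE000, 0xF8FF) == (57344, 63743)

print(in_private_use_range(0xF8FF), unicodedata.category(chr(0xF8FF)))
```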

The PUA concept was inherited from certain Asian encoding systems. These systems had private use areas to encode Japanese Gaiji (rare personal name characters) in application-specific ways. Similarly, the ConScript Unicode Registry aims to coordinate the mapping, within the PUAs, of scripts that are not yet encoded in Unicode or have been rejected by it. The Medieval Unicode Font Initiative uses the PUA to encode various ligatures, precomposed characters, and symbols found in medieval texts.

One example of Private Use Area usage is Apple Computer's assignment of U+F8FF to the Apple logo.

Other planes


Plane 2, the Supplementary Ideographic Plane (SIP), is used for about 40,000 rare Chinese characters that are mostly historic, although there are some modern ones. Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains some non-recommended language tag characters and some variation selection characters.

Mapping tables


Unicode mapping tables, listed by plane. Each table covers a range of 4,096 code points.

BMP (under Windows Programming/Unicode/Character reference): 0000-0FFF, 1000-1FFF, 2000-2FFF, 3000-3FFF, 4000-4FFF, 5000-5FFF, 6000-6FFF, 7000-7FFF, 8000-8FFF, 9000-9FFF, plus six further tables listed without a range
SMP: 10000-10FFF, 1D000-1DFFF
SIP: 20000-20FFF, 21000-21FFF, 22000-22FFF, 23000-23FFF, 24000-24FFF, 25000-25FFF, 26000-26FFF, 27000-27FFF, 28000-28FFF, 29000-29FFF, 2A000-2AFFF, 2F000-2FFFF
SSP: E0000-E0FFF
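The table names follow a regular pattern: each covers a 4,096-code-point range whose bounds are written in hexadecimal. As a rough sketch, the label for the table that would contain a given code point can be computed as follows; table_label is a hypothetical helper, and the label format is inferred from the names listed above.

```python
TABLE_SIZE = 0x1000  # 4,096 code points per mapping table


def table_label(code_point: int) -> str:
    """Return a label such as '0000-0FFF' for the range containing the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    start = code_point - (code_point % TABLE_SIZE)   # round down to a multiple of 0x1000
    end = start + TABLE_SIZE - 1
    # At least four hex digits, matching the BMP labels; more for higher planes.
    return f"{start:04X}-{end:04X}"


print(table_label(ord("A")), table_label(0x1D11E), table_label(0xE0001))
```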



This article is licensed under the GNU Free Documentation License. It uses material from the "Mapping of Unicode characters".
