The Unicode Project

The Unicode Project is an attempt to integrate Unicode into Squeak. This page describes the current status of the project and provides downloadable files.

Here you can look at screenshots that show some Unicode-enabled applications. (The page contains 7 images with a total size of approximately 70 KBytes.)

Project Status:

last updated: November 19, 2003

Available features:

Incomplete features:

Available fonts include some very good free bdf-fonts from various Internet sites. See acknowledgements below.

Missing features:

Download Information:

The Installation Guide gives detailed information about needed files and about the installation procedure. It is recommended that you store this file on your computer.

Implementation Notes

coming soon

Test Files for the Encoding-Aware Scamper

A collection of test files for the encoding-aware version of Scamper is available on request. The collection contains HTML files for 30 different encodings.

Acknowledgements:

Most of the fonts included in this Unicode package were created by individuals who most generously share their admirable work with others. Creating glyphs for full Unicode support is a tremendous task, one that can take years. In fact, implementing full support for Unicode is beyond the capacity of any single individual; it requires cooperation.
I used glyphs from these sites:

A remark about Unihan glyphs:

It is well known that Han Unification results in a collection of glyphs that ignores the writing traditions of the people who use hanzi ideographs.

For an introductory article about the problems of Han unification, read http://www.cs.uu.nl/~otfried/Mule/unihan.htm written by Otfried Cheong.

Ken Lunde writes:

Other issues regarding ISO 10646-1:1993 have to do with proper character rendering (that is, how characters are displayed, printed, or otherwise imaged). Many (sometimes) subtle character form differences have been collapsed under ISO 10646-1:1993. Language or locale was not one of the factors used in performing Han Unification. This means that it is nearly impossible to create a single ISO 10646-1: 1993 font that meets the character form criteria of each of the four CJK locales. An ISO 10646-1:1993 code point is not enough information to render a Chinese character. If the font was specifically designed for a single locale, it is a non-problem, but if there is any CJK intent, text must be flagged for language or locale.

The hanzi ideographs in this package reflect the Taiwanese writing traditions. Some of these ideographs are not suitable for displaying Chinese, Japanese or Korean texts. Regrettably, I do not currently have writing variants of all ideographs that are not identical in all four CJK locales. However, I cannot resist sharing this observation: the TTC file format can be used to pack similar fonts into one file. The advantage of the format is that all fonts in a TTC file can share the outlines of their common glyphs. One would think that font manufacturers would use this feature to create multifont files that contain fonts for all four CJK locales. In fact, I have never seen such a TTC font. I conclude that font manufacturers are either not interested in creating fonts that support all CJK locales or believe that the differences between the CJK locales are irrelevant.
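The sharing described above works because a TTC file starts with a small header that lists one offset per font; each offset points to that font's own table directory, and several directories may refer to the same glyph data. The following Python sketch reads such a header. The function name `read_ttc_offsets` and the four offsets in the synthetic header are hypothetical and chosen only for illustration; the code does not parse a real font file.

```python
import struct

def read_ttc_offsets(data):
    """Return the table-directory offsets listed in a TTC header."""
    # Header layout (big-endian): tag 'ttcf', major version,
    # minor version, number of fonts, then one 32-bit offset per font.
    tag, major, minor, num_fonts = struct.unpack_from(">4sHHI", data, 0)
    if tag != b"ttcf":
        raise ValueError("not a TrueType Collection header")
    return list(struct.unpack_from(">%dI" % num_fonts, data, 12))

# Synthetic header for a hypothetical collection of four fonts,
# one per CJK locale. The offsets are made-up example values.
header = struct.pack(">4sHHI4I", b"ttcf", 1, 0, 4, 28, 300, 572, 844)
print(read_ttc_offsets(header))
```

A real collection would follow the header with the four table directories; glyph outlines common to all four fonts would then be stored once and referenced from each directory.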

As an alternative to the TTC file format, it is possible to use the GSUB table of an OpenType font to specify writing variants. This is often the better solution, especially for fonts with a large number of codepoints. The font Arial Unicode MS, which is part of some newer Microsoft products, uses the GSUB table to provide writing variants for Traditional Chinese, Simplified Chinese and Korean ideographs.

It should be noted that the CJK Compatibility Ideographs Supplement block (U+2F800 to U+2FA1F) contains writing variants for Unihan ideographs that cannot be used for all CJK locales. As an example, the glyph at codepoint U+771F, which is often cited as being unsuitable for Japanese, has writing variants at codepoints U+2F946 and U+2F947.
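The relation between the two variant codepoints and U+771F can be checked with the Unicode character database that ships with Python. This is a minimal sketch, assuming a Python 3 interpreter; the helper name `unified_form` is hypothetical. It also shows a caveat: Unicode normalization replaces the compatibility ideographs by the unified one, so text that relies on them loses the variant distinction once it is normalized.

```python
import unicodedata

def unified_form(codepoint):
    """Return the unified ideograph that a compatibility ideograph maps to."""
    # The compatibility ideographs carry a canonical decomposition,
    # so NFC normalization yields the unified codepoint.
    return unicodedata.normalize("NFC", chr(codepoint))

for cp in (0x2F946, 0x2F947):
    mapping = unicodedata.decomposition(chr(cp))
    print("U+%04X decomposes to U+%s" % (cp, mapping))
```

Because both variants normalize to the same character, an application that wants to keep them apart must avoid normalizing the text or record the intended locale separately.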

Your comments:

Your comments are welcome. Please mail your comments to Boris.Gaertner@gmx.net