[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

PuTTY: Glossary text.



Mr. Kamal, here is the glossary text. I gathered most of the terms found
in the http://freshmeat.net/articles/view/467/ article, and I need your
comments and suggested modifications. Sorry for the bad formatting, but
I was in a hurry to meet our deadline.

Best regards,
sayed

Detailed glossary of the terms found in [http://freshmeat.net/articles/view/467/],
article titled "A Proposal for a True Internationalization".

=================================================================================================
IME: Input Method Editor 
------------------------
The input method editor relieves users of the need to remember all possible character values. Instead, the IME monitors the user's keystrokes, anticipates the characters the user may want, and presents a list of candidate characters from which to choose. 
By default, the IME provides an IME window through which users enter keystrokes and view and select candidates. Win32®-based applications can use the input method manager (IMM) functions and messages to create and manage their own IME windows, providing a custom interface while using the conversion capabilities of the IME. 
IMM is only enabled on East Asian (Chinese, Japanese, Korean) localized Windows 95/98 and Windows NT4.0/3.51 platforms. On these systems, call GetSystemMetrics with SM_DBCSENABLED to determine if IMM is enabled. Windows 2000 provides full-featured IME support in all localized language versions. Note, however, that IMM is enabled only when an Asian language pack is installed. An IME-enabled application can call GetSystemMetrics with SM_IMMENABLED to determine if IMM is enabled.
-------------------------
http://msdn.microsoft.com
-------------------------
=================================================================================================
i18n:
-----
i18n = internationalization and l10n = localization 
i18n is the process of preparing a generic product. "Generic" in the sense that there are no locale-specific features hard-coded into the software, or the associated documentation. 
The goal of the i18n process is to separate the main product features from the locale-specific features (such as UI text, date and time handling, and much more). 
Once you have an internationalized source code base, you are not done. In a perfect world, you don't even have an English (or whatever the first language is) product yet. 
What you need next is a process for plugging one or more of the locale-specific choices into the generic code base and building the product for release. This process is what we call "l10n" or "localization".
--------------------------------------------------------------------------
http://www.i18n.com/article.pl?sid=01/11/03/189205
http://www.i18n.com/article.pl?sid=02/03/02/0345215&mode=thread&threshold=
--------------------------------------------------------------------------
=================================================================================================
gettext:
--------
The GNU `gettext' utilities are a set of tools that provide a framework to help other GNU packages produce multilingual messages. These tools include a set of conventions about how programs should be written to support message catalogs, a directory and file naming organization for the message catalogs themselves, a runtime library supporting the retrieval of translated messages, and a few stand-alone programs to massage, in various ways, the sets of translatable strings or already-translated strings.
------------------------------------
http://www.gnu.org/software/gettext/
------------------------------------
=================================================================================================
Ami:
----
Ami is an X input method server for Korean text input. 
Hangul or Hanja Korean text can be input with Ami, which responds to requests from XIM-compliant applications. 
In this package, Ami has been built as a GNOME panel applet. 
------------------------------------------------------
http://packages.debian.org/unstable/x11/ami-gnome.html
------------------------------------------------------
=================================================================================================
Kinput2:
--------
Kinput2 is a Japanese input server which supports the XIM protocol.
----------------------------------------------------------
http://www.suse.de/~mfabian/suse-cjk/kinput2.html#foot1885
----------------------------------------------------------
=================================================================================================
XIM:
----
XIM (X Input Method) is a generic API for building applications that support international input. 
Any application with support for the XIM protocol built in can be used to input Japanese, Chinese, and Korean. 
Many X11 applications already support XIM, for example: 
most KDE 2 and most GNOME applications 
browsers like Mozilla, Netscape, Konqueror 
editors like Emacs, XEmacs, gvim 
terminals like kterm and rxvt 
Java applications 
---------------------------------------------
http://www.suse.de/~mfabian/suse-cjk/xim.html
---------------------------------------------
=================================================================================================
.po files:
----------
When programmers insert user-visible text strings into a program, they enclose them in _() or N_(). These are macros that gettext uses to extract the strings from the source code and put them into a message catalog. The catalog is placed in a subdirectory called po/ in the top directory of the program's source code and is named, for example, gnome-libs.pot. The first thing you do when starting a new translation is to copy this file to XX.po, where XX is your language code (specified in ISO 639, the standard for language codes). After doing this you can start translating the actual content of the file. 
The file contains a number of strings of the form:           
	#: panel/panel_config.c:1050
        msgid "Color to use:"
        msgstr "Farge som skal brukes:"
The first line tells you which source file the string comes from: panel/panel_config.c at line 1050. The next is the original string from the .c file, and the last is the translation. The msgstr line will be empty when you first copy the .pot file to XX.po. 
To have your translation included in the build process, you must add your language code to the ALL_LINGUAS variable in configure.in. For readability, I recommend adding it in alphabetical order. 
----------------------------------------------------------------
http://developer.gnome.org/projects/gtp/translate-gnome/x22.html
----------------------------------------------------------------
=================================================================================================
Canna:
------
Canna is a client-server based Japanese input system. 
-----------------------------------------------------
http://www.suse.de/~mfabian/suse-cjk/canna.html#canna
-----------------------------------------------------
=================================================================================================
FreeWnn :
---------
FreeWnn is a client-server based input system. There are four different servers, one for Japanese, one for Korean, one for traditional Chinese and one for simplified Chinese. 
---------------------------------------------------------
http://www.suse.de/~mfabian/suse-cjk/freewnn.html#freewnn
---------------------------------------------------------
=================================================================================================
UNICODE :
---------
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.
Unicode is changing all that!
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.
Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.
----------------------------------------------------------
http://www.unicode.org/unicode/standard/WhatIsUnicode.html
----------------------------------------------------------
=================================================================================================
UTF8:
-----
UCS and Unicode are first of all just code tables that assign integer numbers to characters. There exist several alternatives for how a sequence of such characters or their respective integer values can be represented as a sequence of bytes. The two most obvious encodings store Unicode text as sequences of either 2 or 4 bytes. The official terms for these encodings are UCS-2 and UCS-4 respectively. Unless otherwise specified, the most significant byte comes first (big-endian convention). An ASCII or Latin-1 file can be transformed into a UCS-2 file by simply inserting a 0x00 byte in front of every ASCII byte. For a UCS-4 file, three 0x00 bytes have to be inserted instead before every ASCII byte. 
Using UCS-2 (or UCS-4) under Unix would lead to very severe problems. Strings in these encodings can contain, as parts of many wide characters, bytes like '\0' or '/' which have a special meaning in filenames and other C library function parameters. In addition, the majority of UNIX tools expect ASCII files and can't read 16-bit words as characters without major modifications. For these reasons, UCS-2 is not a suitable external encoding of Unicode in filenames, text files, environment variables, etc. 
The UTF-8 encoding defined in ISO 10646-1:2000 Annex D and also described in RFC 2279 as well as section 3.8 of the Unicode 3.0 standard does not have these problems. It is clearly the way to go for using Unicode under Unix-style operating systems. 
UTF-8 has the following properties: 
UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes 0x00 to 0x7F (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8. 
All UCS characters >U+007F are encoded as a sequence of several bytes, each of which has the most significant bit set. Therefore, no ASCII byte (0x00-0x7F) can appear as part of any other character. 
The first byte of a multibyte sequence that represents a non-ASCII character is always in the range 0xC0 to 0xFD and it indicates how many bytes follow for this character. All further bytes in a multibyte sequence are in the range 0x80 to 0xBF. This allows easy resynchronization and makes the encoding stateless and robust against missing bytes. 
All possible 2^31 UCS codes can be encoded. 
UTF-8 encoded characters may theoretically be up to six bytes long; however, 16-bit BMP characters are at most three bytes long. 
The sorting order of big-endian UCS-4 byte strings is preserved. 
The bytes 0xFE and 0xFF are never used in the UTF-8 encoding. 
-------------------------------------------
http://www.cl.cam.ac.uk/~mgk25/unicode.html
-------------------------------------------
=================================================================================================
UCS:
----
The international standard ISO 10646 defines the Universal Character Set (UCS). UCS is a superset of all other character set standards. It guarantees round-trip compatibility to other character sets. If you convert any text string to UCS and then back to the original encoding, then no information will be lost. 
UCS contains the characters required to represent practically all known languages. This includes not only the Latin, Greek, Cyrillic, Hebrew, Arabic, Armenian, and Georgian scripts, but also Chinese, Japanese and Korean Han ideographs as well as scripts such as Hiragana, Katakana, Hangul, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Khmer, Bopomofo, Tibetan, Runic, Ethiopic, Canadian Syllabics, Cherokee, Mongolian, Ogham, Myanmar, Sinhala, Thaana, Yi, and others. For scripts not yet covered, research on how to best encode them for computer usage is still going on and they will be added eventually. This includes not only Cuneiform, Hieroglyphs and various Indo-European languages, but even some selected artistic scripts such as Tolkien's Tengwar and Cirth. UCS also covers a large number of graphical, typographical, mathematical and scientific symbols, including those provided by TeX, PostScript, APL, MS-DOS, MS-Windows, Macintosh, OCR fonts, as well as many word processing and publishing systems, and more are being added. 
ISO 10646 defines formally a 31-bit character set. The most commonly used characters, including all those found in older encoding standards, have been placed in one of the first 65534 positions (0x0000 to 0xFFFD). This 16-bit subset of UCS is called the Basic Multilingual Plane (BMP) or Plane 0. The characters that were later added outside the 16-bit BMP are mostly for specialist applications such as historic scripts and scientific notation. Current plans are that there will never be characters assigned outside the 21-bit code space from 0x000000 to 0x10FFFF, which covers a bit over one million potential future characters. The ISO 10646-1 standard was first published in 1993 and defines the architecture of the character set and the content of the BMP. A second part ISO 10646-2 was added in 2001 and defines characters encoded outside the BMP. New characters are still being added on a continuous basis, but the existing characters will not be changed any more and are stable. 
UCS assigns to each character not only a code number but also an official name. A hexadecimal number that represents a UCS or Unicode value is commonly preceded by "U+" as in U+0041 for the character "Latin capital letter A". The UCS characters U+0000 to U+007F are identical to those in US-ASCII (ISO 646 IRV) and the range U+0000 to U+00FF is identical to ISO 8859-1 (Latin-1). The range U+E000 to U+F8FF and also larger ranges outside the BMP are reserved for private use. UCS also defines several methods for encoding a string of characters as a sequence of bytes, such as UTF-8 and UTF-16. 
The full references for the two parts of the UCS standard are 
International Standard ISO/IEC 10646-1, Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane. Second edition, International Organization for Standardization, Geneva, 2000. 
International Standard ISO/IEC 10646-2, Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 2: Supplementary Planes. First edition, International Organization for Standardization, Geneva, 2001. 
The standards can be ordered online from ISO as a set of PDF files on CD-ROM for 80 CHF (~54 EUR, ~53 USD, ~35 GBP) each. 
-------------------------------------------
http://www.cl.cam.ac.uk/~mgk25/unicode.html
-------------------------------------------
================================================================================================
UTF Glossary:
-------------

UTF. Abbreviation for Unicode (or UCS) Transformation Format.

UTF-2. Obsolete name for UTF-8.

UTF-7. Unicode (or UCS) Transformation Format, 7-bit encoding form, specified by RFC-2152.

UTF-8. Unicode (or UCS) Transformation Format, 8-bit encoding form. UTF-8 is the Unicode Transformation Format that serializes a Unicode scalar value (code point) as a sequence of one to four bytes, as specified in Table 3-1, UTF-8 Bit Distribution. (See Definition D36 in Section 3.8, Transformations.)

UTF-16. Unicode (or UCS) Transformation Format, 16-bit encoding form. UTF-16 is the Unicode Transformation Format that serializes a Unicode scalar value (code point) as a sequence of one or two 16-bit code units, in either big-endian or little-endian byte order. (See Definition D35 in Section 3.8, Transformations.)

UTF-16BE. The Unicode Transformation Format that serializes a Unicode scalar value (code point) as a sequence of 16-bit code units in big-endian format. An initial sequence corresponding to U+FEFF is interpreted as a ZERO WIDTH NO-BREAK SPACE. (See Definition D33 in Section 3.8, Transformations.)

UTF-16LE. The Unicode Transformation Format that serializes a Unicode scalar value (code point) as a sequence of 16-bit code units in little-endian format. An initial sequence corresponding to U+FEFF is interpreted as a ZERO WIDTH NO-BREAK SPACE. (See Definition D34 in Section 3.8, Transformations.)

UTF-32. The Unicode Transformation Format that serializes a Unicode code point as a sequence of four bytes, in either big-endian or little-endian format. An initial sequence corresponding to U+FEFF is interpreted as a byte order mark: it is used to distinguish between the two byte orders. The byte order mark is not considered part of the content of the text. A serialization of Unicode code points into UTF-32 may or may not begin with a byte order mark.

UTF-32BE. The Unicode Transformation Format that serializes a Unicode code point as a sequence of four bytes, in big-endian format. An initial sequence corresponding to U+FEFF is interpreted as a ZERO WIDTH NO-BREAK SPACE.

UTF-32LE. The Unicode Transformation Format that serializes a Unicode code point as a sequence of four bytes, in little-endian format. An initial sequence corresponding to U+FEFF is interpreted as a ZERO WIDTH NO-BREAK SPACE.
---------------------------------
http://www.unicode.org/glossary/
---------------------------------
===============================================================================================
AbiWord:
--------
AbiWord is a free word processing program similar to Microsoft® Word. It is suitable for typing papers, letters, reports, memos, and so forth. 
--------------------------
http://www.abisource.com/
--------------------------
=================================================================================================
glyph:
------

(1) The actual shape (bit pattern, outline) of a character image. For example, an italic 'a' and a roman 'a' are two different glyphs representing the same underlying character. In this strict sense, any two images which differ in shape constitute different glyphs. In this usage, ``glyph'' is a synonym for ``character image'', or simply ``image''.

(2) A kind of idealized surface form derived from some combination of underlying characters in some specific context, rather than an actual character image. In this broad usage, two images would constitute the same glyph whenever they have essentially the same topology (as in oblique 'a' and roman 'a'), but different glyphs when one is written with a hooked top and the other without (the way one prints an 'a' by hand). In this usage, ``glyph'' is a synonym for ``glyph type,'' where glyph is defined as in sense 1. 

----------------------------------------------
http://rcum.uni-mb.si/local/fontfaq/cf_18.htm
http://rcum.uni-mb.si/local/fontfaq/CF_1.HTM
----------------------------------------------
=================================================================================================