mirror of
git://git.code.sf.net/p/zsh/code
synced 2025-09-03 10:21:46 +02:00
27710: update FAQ on advanced character sets
parent
62c744e60b
commit
4de2fee610
2 changed files with 31 additions and 41 deletions
|
@ -1,5 +1,7 @@
|
||||||
2010-02-15 Peter Stephenson <pws@csr.com>
|
2010-02-15 Peter Stephenson <pws@csr.com>
|
||||||
|
|
||||||
|
* 27710: Etc/FAQ.yo: update sections on advanced character sets.
|
||||||
|
|
||||||
* unposted: Etc/FAQ.yo: correct outrageously old dates in FAQ.
|
* unposted: Etc/FAQ.yo: correct outrageously old dates in FAQ.
|
||||||
|
|
||||||
2010-02-13 Peter Stephenson <p.w.stephenson@ntlworld.com>
|
2010-02-13 Peter Stephenson <p.w.stephenson@ntlworld.com>
|
||||||
|
@ -12742,5 +12744,5 @@
|
||||||
|
|
||||||
*****************************************************
|
*****************************************************
|
||||||
* This is used by the shell to define $ZSH_PATCHLEVEL
|
* This is used by the shell to define $ZSH_PATCHLEVEL
|
||||||
* $Revision: 1.4899 $
|
* $Revision: 1.4900 $
|
||||||
*****************************************************
|
*****************************************************
|
||||||
|
|
68
Etc/FAQ.yo
|
@ -933,37 +933,16 @@ sect(What is zsh's support for Unicode/UTF-8?)
|
||||||
ways of supporting character sets beyond ASCII. `UTF-8' is an
|
ways of supporting character sets beyond ASCII. `UTF-8' is an
|
||||||
encoding of Unicode that is particularly natural on Unix-like systems.
|
encoding of Unicode that is particularly natural on Unix-like systems.
|
||||||
|
|
||||||
Q: Does zsh support UTF-8?
|
The production branch of zsh, 4.2, has very limited support:
|
||||||
|
the built-in printf command supports "\u" and "\U" escapes
|
||||||
A: zsh's built-in printf command supports "\u" and "\U" escapes
|
to output arbitrary Unicode characters; ZLE (the Zsh Line Editor) has
|
||||||
to output arbitrary Unicode characters. ZLE (the Zsh Line Editor) has
|
|
||||||
no concept of character encodings, and is confused by multi-octet
|
no concept of character encodings, and is confused by multi-octet
|
||||||
encodings.
|
encodings.
|
||||||
|
|
||||||
Q: Why doesn't zsh have proper UTF-8 support?
|
However, the 4.3 branch has much better support, and furthermore this
|
||||||
|
is now fairly stable. (Only a few minor areas need fixing before
|
||||||
A: The code has not been written yet.
|
this becomes a production release.) This is discussed more
|
||||||
|
fully below, see `Multibyte input and output'.
|
||||||
Q: What makes UTF-8 support difficult to implement?
|
|
||||||
|
|
||||||
A: In order to handle arbitrary encodings the correct way, significant
|
|
||||||
and intrusive changes must be made to the shell.
|
|
||||||
|
|
||||||
Q: Why can't zsh just use readline?
|
|
||||||
|
|
||||||
A: ZLE is not encapsulated from the rest of the shell. Isolating it
|
|
||||||
such that it could be replaced by readline would be a significant
|
|
||||||
effort. Furthermore, using readline would effect a significant loss of
|
|
||||||
features.
|
|
||||||
|
|
||||||
Q: What changes are planned?
|
|
||||||
|
|
||||||
A: Introduction of Unicode support will be gradual, so if you are
|
|
||||||
interested in being involved you should join the zsh-workers mailing
|
|
||||||
list. As a first step ZLE will be rewritten to use wide characters
|
|
||||||
internally. Character based widgets can then operate on a single wide
|
|
||||||
character instead of a single byte, and the proper display width can be
|
|
||||||
calculated with wcswidth().
|
|
||||||
|
|
||||||
|
|
||||||
chapter(How to get various things to work)
|
chapter(How to get various things to work)
|
||||||
|
@ -2030,7 +2009,7 @@ sect(What is multibyte input?)
|
||||||
zsh will be able to use any such encoding as long as it contains ASCII as
|
zsh will be able to use any such encoding as long as it contains ASCII as
|
||||||
a single-octet subset and the system can provide information about other
|
a single-octet subset and the system can provide information about other
|
||||||
characters. However, in the case of Unicode, UTF-8 is the only one you
|
characters. However, in the case of Unicode, UTF-8 is the only one you
|
||||||
are likely to enounter.
|
are likely to enounter that is useful in zsh.
|
||||||
|
|
||||||
(In case you're confused: Unicode is the character set, while UTF-8 is
|
(In case you're confused: Unicode is the character set, while UTF-8 is
|
||||||
an encoding of it. You might hear about other encodings, such as UCS-2
|
an encoding of it. You might hear about other encodings, such as UCS-2
|
||||||
|
@ -2063,7 +2042,7 @@ sect(How does zsh handle multibyte input and output?)
|
||||||
Note that if the shell is emulating a Bourne shell the tt(MULTIBYTE)
|
Note that if the shell is emulating a Bourne shell the tt(MULTIBYTE)
|
||||||
option is unset by default. This allows various POSIX modes to
|
option is unset by default. This allows various POSIX modes to
|
||||||
work normally (POSIX does not deal with multibyte characters). If
|
work normally (POSIX does not deal with multibyte characters). If
|
||||||
you use a "sh" or "ksh" emulation interactively you shouldprobably
|
you use a "sh" or "ksh" emulation interactively you should probably
|
||||||
set the tt(MULTIBYTE) option.
|
set the tt(MULTIBYTE) option.
|
||||||
|
|
||||||
The other option that affects multibyte support is tt(COMBINING_CHARS),
|
The other option that affects multibyte support is tt(COMBINING_CHARS),
|
||||||
|
@ -2072,9 +2051,10 @@ sect(How does zsh handle multibyte input and output?)
|
||||||
assumed to be modifications (accents etc.) to the base character and to
|
assumed to be modifications (accents etc.) to the base character and to
|
||||||
be displayed within the same screen area as the base character. As not
|
be displayed within the same screen area as the base character. As not
|
||||||
all terminals handle this, even if they correctly display the base
|
all terminals handle this, even if they correctly display the base
|
||||||
multibyte character, this option is not on by default. The KDE terminal
|
multibyte character, this option is not on by default. Recent versions
|
||||||
emulator tt(konsole), tt(rxvt-unicode), and the Unicode version of
|
of the KDE and GNOME terminal emulators tt(konsole) and
|
||||||
xterm, tt(xterm -u8) or the front-end tt(uxterm), are known to handle
|
tt(gnome-terminal) as well as tt(rxvt-unicode), and the Unicode version
|
||||||
|
of xterm, tt(xterm -u8) or the front-end tt(uxterm), are known to handle
|
||||||
combining characters.
|
combining characters.
|
||||||
|
|
||||||
The tt(COMBINING_CHARS) option only affects output; combining characters
|
The tt(COMBINING_CHARS) option only affects output; combining characters
|
||||||
|
@ -2110,12 +2090,12 @@ sect(How do I ensure multibyte input and output work on my system?)
|
||||||
edit file names that have been created using a different character
|
edit file names that have been created using a different character
|
||||||
set it won't work properly.)
|
set it won't work properly.)
|
||||||
it() The terminal emulator. Those that are supplied with a recent
|
it() The terminal emulator. Those that are supplied with a recent
|
||||||
desktop environment, such as gnome-terminal, are likely to have
|
desktop environment, such as tt(konsole) and tt(gnome-terminal), are
|
||||||
extensive support for localization and may work correctly as soon
|
likely to have extensive support for localization and may work
|
||||||
as they know the locale. You can enable UTF-8 support for
|
correctly as soon as they know the locale. You can enable UTF-8
|
||||||
tt(xterm) in its application defaults file. The following are
|
support for tt(xterm) in its application defaults file. The
|
||||||
the relevant resources; you don't actually need all of them, as
|
following are the relevant resources; you don't actually need all of
|
||||||
described below. If you use a mytt(~/.Xdefaults) or
|
them, as described below. If you use a mytt(~/.Xdefaults) or
|
||||||
mytt(~/.Xresources) file for setting resources, prefix all the lines
|
mytt(~/.Xresources) file for setting resources, prefix all the lines
|
||||||
with mytt(xterm):
|
with mytt(xterm):
|
||||||
verb(
|
verb(
|
||||||
|
@ -2147,7 +2127,12 @@ sect(How do I ensure multibyte input and output work on my system?)
|
||||||
this feature does.) If your terminal doesn't have characters
|
this feature does.) If your terminal doesn't have characters
|
||||||
that need to be input as multibyte, however, you can still use
|
that need to be input as multibyte, however, you can still use
|
||||||
the meta bindings and can ignore the warning message. Use
|
the meta bindings and can ignore the warning message. Use
|
||||||
mytt(bindkey -m 2>/dev/null) to suprress it.
|
mytt(bindkey -m 2>/dev/null) to suppress it.
|
||||||
|
|
||||||
|
You might also note that the latest version of the Cygwin environment
|
||||||
|
for Windows supports UTF-8. In previous versions, zsh was able
|
||||||
|
to compile with the tt(MULTIBYTE) option enabled, but the system
|
||||||
|
didn't provide full support for it.
|
||||||
|
|
||||||
|
|
||||||
sect(How can I input characters that aren't on my keyboard?)
|
sect(How can I input characters that aren't on my keyboard?)
|
||||||
|
@ -2187,6 +2172,9 @@ url(http://www.unicode.org/charts/)(http://www.unicode.org/charts/).
|
||||||
however, using UTF-8 massively extends the number of valid characters
|
however, using UTF-8 massively extends the number of valid characters
|
||||||
that can be produced.
|
that can be produced.
|
||||||
|
|
||||||
|
If you have a recent X Window System installation, you might find
|
||||||
|
the tt(AltGr) key helps you input accented Latin characters; for
|
||||||
|
example on my keyboard tt(AltGr-; e) gives mytt(e) with an acute accent.
|
||||||
See also url(http://www.cl.cam.ac.uk/~mgk25/unicode.html#input)(http://www.cl.cam.ac.uk/~mgk25/unicode.html#input)
|
See also url(http://www.cl.cam.ac.uk/~mgk25/unicode.html#input)(http://www.cl.cam.ac.uk/~mgk25/unicode.html#input)
|
||||||
for general information on entering Unicode characters from a keyboard.
|
for general information on entering Unicode characters from a keyboard.
|
||||||
|
|
||||||
|
|