mirror of
git://sourceware.org/git/glibc.git
synced 2025-02-17 13:00:43 +08:00
Update.
* manual/message.texi: Document new interfaces.
parent 964328be73
commit b8a46c1d5a
@ -28,6 +28,7 @@
* intl/po2test.sed: New file.
* intl/tst-gettext.c: New file.
* intl/tst-gettext.sh: New file.
* manual/message.texi: Document new interfaces.

* intl/gettext.c: Call __dcgettext directly.
@ -226,7 +226,7 @@ When an error occured the global variable @var{errno} is set to
@item EBADF
The catalog does not exist.
@item ENOMSG
The set/message ttuple does not name an existing element in the
The set/message tuple does not name an existing element in the
message catalog.
@end table
@ -470,7 +470,7 @@ This is the interface defined in the X/Open standard. If no
@var{Input-File} parameter is given input will be read from standard
input. Multiple input files will be read as if they are concatenated.
If @var{Output-File} is also missing, the output will be written to
standard output. To provide the interface one is used from other
standard output. To provide the interface one is used to from other
programs a second interface is provided.

@smallexample
@ -604,10 +604,10 @@ gencat -H ex.h -o ex.cat ex.msg
This generates a header file with the following content:

@smallexample
#define SetTwoSet 0x2 /* u.msg:8 */
#define SetTwoSet 0x2 /* ex.msg:8 */

#define SetOneSet 0x1 /* u.msg:4 */
#define SetOnetwo 0x2 /* u.msg:6 */
#define SetOneSet 0x1 /* ex.msg:4 */
#define SetOnetwo 0x2 /* ex.msg:6 */
@end smallexample

As can be seen the various symbols given in the source file are mangled
@ -768,6 +768,8 @@ categories:
@menu
* Translation with gettext:: What has to be done to translate a message.
* Locating gettext catalog:: How to determine which catalog to be used.
* Advanced gettext functions:: Additional functions for more complicated
                                situations.
* Using gettextized software:: The possibilities of the user to influence
                                the way @code{gettext} works.
@end menu
@ -800,6 +802,8 @@ the @file{libintl.h} header file. On systems where these functions are
not part of the C library they can be found in a separate library named
@file{libintl.a} (or accordingly different for shared libraries).

@comment libintl.h
@comment GNU
@deftypefun {char *} gettext (const char *@var{msgid})
The @code{gettext} function searches the currently selected message
catalogs for a string which is equal to @var{msgid}. If there is such a
@ -845,6 +849,8 @@ uses the @code{gettext} functions but since it must not depend on a
currently selected default message catalog it must specify all ambiguous
information.

@comment libintl.h
@comment GNU
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
The @code{dgettext} function acts just like the @code{gettext}
function. It only takes an additional first argument @var{domainname}
@ -857,6 +863,8 @@ As for @code{gettext} the return value type is @code{char *} which is an
anachronism. The returned string must never be modified.
@end deftypefun

@comment libintl.h
@comment GNU
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
The @code{dcgettext} adds another argument to those which
@code{dgettext} takes. This argument @var{category} specifies the last
@ -990,6 +998,8 @@ domain named @code{foo}. The important point is that at any time
exactly one domain is active. This is controlled with the following
function.

@comment libintl.h
@comment GNU
@deftypefun {char *} textdomain (const char *@var{domainname})
The @code{textdomain} function sets the default domain, which is used in
all future @code{gettext} calls, to @var{domainname}. Please note that
@ -1019,6 +1029,8 @@ This possibility is questionable to use since the domain @code{messages}
really never should be used.
@end deftypefun

@comment libintl.h
@comment GNU
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
@ -1056,6 +1068,298 @@ variable @var{errno} is set accordingly.
@end deftypefun
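The two functions above are usually called together once at program start. The following is only a minimal sketch of that setup: the domain name @code{foo} is taken from the text above, while the directory @file{/usr/share/locale}, the macro names, and the helper @code{init_messages} are assumptions made for illustration.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical values: adjust to the package's real domain and the
   directory where its catalogs are installed.  */
#define EXAMPLE_DOMAIN "foo"
#define EXAMPLE_LOCALEDIR "/usr/share/locale"

/* Select the user's locale, tell the library where the catalogs of
   EXAMPLE_DOMAIN live, and make that domain the default one.
   Returns 0 on success, -1 if either call reports an error.  */
static int
init_messages (void)
{
  /* Without this call the program stays in the "C" locale and no
     translation is looked up.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (EXAMPLE_DOMAIN, EXAMPLE_LOCALEDIR) == NULL)
    return -1;
  if (textdomain (EXAMPLE_DOMAIN) == NULL)
    return -1;
  return 0;
}
```

After a successful call, @code{textdomain (NULL)} returns the currently selected domain, which can be used to check the setup.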


@node Advanced gettext functions
@subsubsection Additional functions for more complicated situations

The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.

Looking through Unix source code before the time anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:

@smallexample
printf ("%d file%s deleted", n, n == 1 ? "" : "s");
@end smallexample

@noindent
After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
@code{"file(s)"}. Both look unnatural and should be avoided. First
attempts to solve the problem correctly looked like this:

@smallexample
if (n == 1)
  printf ("%d file deleted", n);
else
  printf ("%d files deleted", n);
@end smallexample

But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an `s' but
that is all. Once again people fell into the trap of believing the
rules their language is using are universal. But the handling of plural
forms differs widely between the language families. There are two
things which can differ between languages (and even inside language
families):

@itemize @bullet
@item
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.

@item
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romance and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this in an extra section.
@end itemize

The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization since it is
only usable for certain, hardcoded language environments. Instead the
extended @code{gettext} interface should be used.

These extra functions take, instead of the one key string, two
strings and a numerical argument. The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select, using rules specified by the translator, the right plural
form. The two string arguments then will be used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behaviour). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation but since the GNU C library
(as well as the GNU @code{gettext} package) are written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

@comment libintl.h
@comment GNU
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The @var{msgid1} parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form. If no
message catalog is found @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.

An example for the use of this function is:

@smallexample
printf (ngettext ("%d file removed", "%d files removed", n), n);
@end smallexample

Please note that the numeric value @var{n} has to be passed to the
@code{printf} function as well. It is not sufficient to pass it only to
@code{ngettext}.
@end deftypefun
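The fallback behaviour described above can be observed without any catalog installed. The following is a minimal sketch, not part of the patch: the helper name @code{format_removed} is hypothetical, and @code{%lu} is used instead of @code{%d} because @var{n} is declared here with the @code{unsigned long int} type that @code{ngettext} takes.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format the "files removed" message for N into
   BUF.  N is passed twice, once to ngettext to select the form and
   once to snprintf to fill in the number.  Returns what snprintf
   returns.  */
static int
format_removed (char *buf, size_t size, unsigned long int n)
{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

When no message catalog is found, a call with @code{n == 1} produces the singular string and every other value produces the plural string.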

@comment libintl.h
@comment GNU
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
The @code{dngettext} is similar to the @code{dgettext} function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun

@comment libintl.h
@comment GNU
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
The @code{dcngettext} is similar to the @code{dcgettext} function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun
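A library that must not depend on the default domain would use the domain-taking variants. The sketch below assumes a hypothetical domain name @code{examplelib} and hypothetical helper names; it only illustrates the argument order given in the prototypes above.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical domain name used only for this illustration.  */
#define EXAMPLE_LIB_DOMAIN "examplelib"

/* Look up the plural-aware message in an explicit domain.  */
static const char *
lib_entries_format (unsigned long int n)
{
  return dngettext (EXAMPLE_LIB_DOMAIN, "%lu entry", "%lu entries", n);
}

/* Same lookup, but with the locale category spelled out as well.  */
static const char *
lib_entries_format_messages (unsigned long int n)
{
  return dcngettext (EXAMPLE_LIB_DOMAIN, "%lu entry", "%lu entries", n,
                     LC_MESSAGES);
}
```

If no catalog for the domain is found, both helpers return the first string for @code{n == 1} and the second string otherwise, exactly as @code{ngettext} does. The returned string must not be modified.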

@subsubheading The problem of plural forms

A description of the problem can be found at the beginning of the last
section. Now there is the question how to solve it. Without the input
of linguists (which was not available) it was not possible to determine
whether there are only a few different ways in which plural forms are
formed or whether the number can increase with every new supported
language.

Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form. Since the formula varies
with every language this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions to not prevent the use of new languages). The
details are explained in the GNU @code{gettext} manual. Here only a
bit of information is provided.

The information about the plural form selection has to be stored in the
header entry (the one with the empty @code{msgid} string). There should
be something like:

@smallexample
nplurals=2; plural=n == 1 ? 0 : 1
@end smallexample

The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following @code{plural} is an expression which uses the C language
syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is @code{n}. This
expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
numeric value passed to these functions is then substituted for all uses
of the variable @code{n} in the expression. The resulting value then
must be greater than or equal to zero and smaller than the value given as the
value of @code{nplurals}.
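The selection step can be pictured in plain C. The sketch below does not show how the library parses or evaluates the header entry; it hardcodes the example expression @code{n == 1 ? 0 : 1} as a C function and only illustrates the range requirement on the result. The type @code{struct plural_rule} and both function names are hypothetical.

```c
/* Hypothetical model of one header entry: the number of forms and the
   expression that maps N to a form index.  */
struct plural_rule
{
  unsigned long int nplurals;
  unsigned long int (*plural) (unsigned long int n);
};

/* The expression from "nplurals=2; plural=n == 1 ? 0 : 1".  */
static unsigned long int
example_plural (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Evaluate the rule for N.  Returns the index of the form to use, or
   -1 if the expression yields a value outside 0 .. nplurals - 1.  */
static long int
select_form (const struct plural_rule *rule, unsigned long int n)
{
  unsigned long int idx = rule->plural (n);
  return idx < rule->nplurals ? (long int) idx : -1;
}
```

With a rule built from @code{example_plural} and @code{nplurals} set to 2, the value 1 selects form 0 and every other value selects form 1.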

@noindent
The following rules are known at this point. The languages are listed
with their families. But this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).@footnote{Additions are welcome. Send appropriate information to
@email{bug-glibc-manual@@gnu.org}.}

@table @asis
@item Only one form:
Some languages only require one single form. There is no distinction
between the singular and plural form. An appropriate header entry
would look like this:

@smallexample
nplurals=1; plural=0
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Finno-Ugric family
Hungarian
@item Asian family
Japanese
@item Turkic/Altaic family
Turkish
@end table

@item Two forms, singular used for one only
This is the form used in most existing programs since it is what English
is using. A header entry would look like this:

@smallexample
nplurals=2; plural=n != 1
@end smallexample

(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)

@noindent
Languages with this property include:

@table @asis
@item Germanic family
Danish, Dutch, English, German, Norwegian, Swedish
@item Finno-Ugric family
Finnish
@item Latin/Greek family
Greek
@item Semitic family
Hebrew
@item Romance family
Italian, Spanish
@item Artificial
Esperanto
@end table

@item Two forms, singular used for zero and one
Exceptional case in the language family. The header entry would be:

@smallexample
nplurals=2; plural=n>1
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Romance family
French
@end table

@item Three forms, special cases for one and two
The header entry would be:

@smallexample
nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Celtic
Gaeilge
@end table

@item Three forms, special case for one and all numbers ending in 2, 3, or 4
The header entry would look like this:

@smallexample
nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 ? 1 : 2
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Russian
@end table

@item Three forms, special case for one and some numbers ending in 2, 3, or 4
The header entry would look like this:

@smallexample
nplurals=3; plural=n==1 ? 0 : \
               n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
@end smallexample

(Continuation in the next line is possible.)

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Polish
@end table

@item Four forms, special case for one and all numbers ending in 2, 3, or 4
The header entry would look like this:

@smallexample
nplurals=4; plural=n==1 ? 0 : n%10==2 ? 1 : n==3 || n==4 ? 2 : 3
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Slovenian
@end table
@end table
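To see how the longer expressions in the table behave, they can be copied into C functions and tried on a few values. The sketch below transliterates three of the header entries exactly as they are listed above; it makes no claim about whether these rules are linguistically complete, and the function names are hypothetical.

```c
/* "nplurals=2; plural=n>1" (listed above for French).  */
static unsigned long int
plural_two_forms_zero_one (unsigned long int n)
{
  return n > 1;
}

/* "nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 ? 1 : 2"
   (listed above for Russian).  */
static unsigned long int
plural_three_forms_all (unsigned long int n)
{
  return n == 1 ? 0 : n % 10 >= 2 && n % 10 <= 4 ? 1 : 2;
}

/* "nplurals=3; plural=n==1 ? 0 :
      n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2"
   (listed above for Polish).  */
static unsigned long int
plural_three_forms_some (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)
           ? 1 : 2;
}
```

The difference between the last two rules shows up for values such as 12: the rule listed for Russian yields index 1, while the rule listed for Polish excludes 12 to 14 and yields index 2.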


@node Using gettextized software
@subsubsection User influence on @code{gettext}