From edcf9c237c137a7e299703f5f64d05f427372ab1 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Mon, 19 May 2003 21:38:24 +0000 Subject: [PATCH] Add error message style guidelines to the SGML documentation. --- doc/src/sgml/nls.sgml | 51 ++-- doc/src/sgml/sources.sgml | 595 +++++++++++++++++++++++++++++++++++++- 2 files changed, 626 insertions(+), 20 deletions(-) diff --git a/doc/src/sgml/nls.sgml b/doc/src/sgml/nls.sgml index 24a38643bb..634c82e90b 100644 --- a/doc/src/sgml/nls.sgml +++ b/doc/src/sgml/nls.sgml @@ -1,4 +1,6 @@ - + @@ -241,20 +243,20 @@ gmake update-po - If the original is a printf format string, the translation also - needs to be. The translation also needs to have the same + If the original is a printf format string, the translation + also needs to be. The translation also needs to have the same format specifiers in the same order. Sometimes the natural rules of the language make this impossible or at least awkward. - In this case you can use this format: + In that case you can modify the format specifiers like this: msgstr "Die Datei %2$s hat %1$u Zeichen." Then the first placeholder will actually use the second argument from the list. The digits$ needs to - follow the % and come before any other format manipulators. + follow the % immediately, before any other format manipulators. (This feature really exists in the printf - family of functions. You may not have heard of it because + family of functions. You may not have heard of it before because there is little use for it outside of message internationalization.) @@ -279,6 +281,7 @@ msgstr "Die Datei %2$s hat %1$u Zeichen." open file %s) should probably not start with a capital letter (if your language distinguishes letter case) or end with a period (if your language uses punctuation marks). + It may help to read . @@ -301,8 +304,11 @@ msgstr "Die Datei %2$s hat %1$u Zeichen." 
For the Programmer + + Mechanics + - This section describes how to support native language support in a + This section describes how to implement native language support in a program or library that is part of the PostgreSQL distribution. Currently, it only applies to C programs. @@ -348,15 +354,15 @@ fprintf(stderr, gettext("panic level %d\n"), lvl); - This may tend to add a lot of clutter. One common shortcut is to + This may tend to add a lot of clutter. One common shortcut is to use -#define _(x) gettext((x)) +#define _(x) gettext(x) Another solution is feasible if the program does much of its communication through one or a few functions, such as - elog() in the backend. Then you make this + ereport() in the backend. Then you make this function call gettext internally on all - input values. + input strings. @@ -430,19 +436,29 @@ fprintf(stderr, gettext("panic level %d\n"), lvl); The build system will automatically take care of building and installing the message catalogs. + + + + Message-writing guidelines - To ease the translation of messages, here are some guidelines: + Here are some guidelines for writing messages that are easily + translatable. - Do not construct sentences at run-time out of laziness, like + Do not construct sentences at run-time, like -printf("Files where %s.\n", flag ? "copied" : "removed"); +printf("Files were %s.\n", flag ? "copied" : "removed"); The word order within the sentence may be different in other - languages. + languages. Also, even if you remember to call gettext() on each + fragment, the fragments may not translate well separately. It's + better to duplicate a little code so that each message to be + translated is a coherent whole. Only numbers, file names, and + such-like run-time variables should be inserted at runtime into + a message text. @@ -462,8 +478,8 @@ else then be disappointed. Some languages have more than two forms, with some peculiar rules. 
We may have a solution for this in - the future, but for now this is best avoided altogether. You - could write: + the future, but for now the matter is best avoided altogether. + You could write: printf("number of copied files: %d", n); @@ -485,6 +501,7 @@ printf("number of copied files: %d", n); + diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml index c07264e5db..96d5171945 100644 --- a/doc/src/sgml/sources.sgml +++ b/doc/src/sgml/sources.sgml @@ -1,5 +1,5 @@ @@ -9,8 +9,17 @@ $Header: /cvsroot/pgsql/doc/src/sgml/sources.sgml,v 2.6 2002/01/20 22:19:56 pete Formatting - Source code formatting uses a 4 column tab spacing, currently with + Source code formatting uses 4 column tab spacing, with tabs preserved (i.e. tabs are not expanded to spaces). + Each logical indentation level is one additional tab stop. + Layout rules (brace positioning, etc) follow BSD conventions. + + + + While submitted patches do not absolutely have to follow these formatting + rules, it's a good idea to do so. Your code will get run through + pgindent, so there's no point in making it look nice + under some other set of formatting conventions. @@ -57,13 +66,593 @@ set tabstop=4 The text browsing tools more and less can be invoked as - more -x4 less -x4 + to make them show tabs appropriately. + + + Reporting Errors Within the Server + + + Error, warning, and log messages generated within the server code + should be created using ereport, or its older cousin + elog. The use of this function is complex enough to + require some explanation. + + + + There are two required elements for every message: a severity level + (ranging from DEBUG to PANIC) and a primary + message text. In addition there are optional elements, the most + common of which is an error identifier code that follows the SQL spec's + SQLSTATE conventions. 
+ ereport itself is just a shell function, that exists + mainly for the syntactic convenience of making message generation + look like a function call in the C source code. The only parameter + accepted directly by ereport is the severity level. + The primary message text and any optional message elements are + generated by calling auxiliary functions, such as errmsg, + within the ereport call. + + + + A typical call to ereport might look like this: + + ereport(ERROR, + (errcode(ERRCODE_DIVISION_BY_ZERO), + errmsg("division by zero"))); + + This specifies error severity level ERROR (a run-of-the-mill + error). The errcode call specifies the SQLSTATE error code + using a macro defined in src/include/utils/elog.h. The + errmsg call provides the primary message text. Notice the + extra set of parentheses surrounding the auxiliary function calls --- + these are annoying but syntactically necessary. + + + + Here is a more complex example: + + ereport(ERROR, + (errmsg("Unable to identify an operator %s %s %s", + format_type_be(arg1), + NameListToString(op), + format_type_be(arg2)), + errhint("Try explicitly casting the arguments to appropriate types"))); + + This illustrates the use of format codes to embed run-time values into + a message text. Also, an optional hint message is provided. + + + + The available auxiliary routines for ereport are: + + + + errcode(sqlerrcode) specifies the SQLSTATE error identifier + code for the condition. If this is not specified, it defaults to + ERRCODE_INTERNAL_ERROR, which is a convenient default since + a large number of ereport calls are in fact for internal + can't happen conditions. But never use this default when + reporting user mistakes. + + + + + errmsg(const char *msg, ...) specifies the primary error + message text, and possibly run-time values to insert into it. Insertions + are specified by sprintf-style format codes. 
In addition to + the standard format codes accepted by sprintf, the format + code %m can be used to insert the error message returned + by strerror for the current value of errno. + + + That is, the value that was current when the ereport call + was reached; changes of errno within the auxiliary reporting + routines will not affect it. That would not be true if you were to + write strerror(errno) explicitly in errmsg's + parameter list; accordingly, do not do so. + + + %m does not require any + corresponding entry in the parameter list for errmsg. + Note that the message string will be run through gettext + for possible localization before format codes are processed. + + + + + errmsg_internal(const char *msg, ...) is the same as + errmsg, except that the message string will not be + included in the internationalization message dictionary. + This should be used for can't happen cases that are probably + not worth expending translation effort on. + + + + + errdetail(const char *msg, ...) supplies an optional + detail message; this is to be used when there is additional + information that seems inappropriate to put in the primary message. + The message string is processed in just the same way as for + errmsg. + + + + + errhint(const char *msg, ...) supplies an optional + hint message; this is to be used when offering suggestions + about how to fix the problem, as opposed to factual details about + what went wrong. + The message string is processed in just the same way as for + errmsg. + + + + + errcontext(const char *msg, ...) is not normally called + directly from an ereport message site; rather it is used + in error_context_stack callback functions to provide + information about the context in which an error occurred, such as the + current location in a PL function. + The message string is processed in just the same way as for + errmsg. 
Unlike the other auxiliary functions, this can + be called more than once per ereport call; the successive + strings thus supplied are concatenated with separating newlines. + + + + + errposition(int cursorpos) specifies the textual location + of an error within a query string. Currently it is only useful for + errors detected in the lexical and syntactic analysis phases of + query processing. + + + + + + + You may also see uses of the older function elog. This + is equivalent to an ereport call specifying only severity + level and primary message. Because the error code always defaults to + ERRCODE_INTERNAL_ERROR, elog should only be + used for internal errors. + + + + Advice about writing good error messages can be found in + . + + + + + Error Message Style Guide + + + This style guide is offered in the hope of maintaining a consistent, + user-friendly style throughout all the messages generated by + PostgreSQL. + + + + What goes where + + + The primary message should be short, factual, and avoid reference to + implementation details such as specific function names. + Short means should fit on one line under normal + conditions. Use a detail message if needed to keep the primary + message short, or if you feel a need to mention implementation details + such as the particular system call that failed. Both primary and detail + messages should be factual. Use a hint message for suggestions about what + to do to fix the problem, especially if the suggestion might not always be + applicable. + + + + For example, instead of + + IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %m + (plus a long addendum that is basically a hint) + +write + + Primary: could not create shared memory segment: %m + Detail: Failed syscall was shmget(key=%d, size=%u, 0%o) + Hint: the addendum + + + + + Rationale: keeping the primary message short helps keep it to the point, + and lets clients lay out screen space on the assumption that one line is + enough for error messages. 
Detail and hint messages may be relegated to a + verbose mode, or perhaps a pop-up error-details window. Also, details and + hints would normally be suppressed from the server log to save + space. Reference to implementation details is best avoided since users + don't know the details anyway. + + + + + + Formatting + + + Don't put any specific assumptions about formatting into the message + texts. Expect clients and the server log to wrap lines to fit their own + needs. In long messages, newline characters (\n) may be used to indicate + suggested paragraph breaks. Don't end a message with a newline. Don't + use tabs or other formatting characters. (In error context displays, + newlines are automatically added to separate levels of context such as + function calls.) + + + + Rationale: Messages are not necessarily displayed on terminal-type + displays. In GUI displays or browsers these formatting instructions are + at best ignored. + + + + + + Quotation marks + + + English text should use double quotes when quoting is appropriate. + Text in other languages should consistently use one kind of quotes that is + consistent with publishing customs and computer output of other programs. + + + + Rationale: The choice of double quotes over single quotes is somewhat + arbitrary, but tends to be the preferred use. Some have suggested + choosing the kind of quotes depending on the type of object according to + SQL conventions (namely, strings single quoted, identifiers double + quoted). But this is a language-internal technical issue that many users + aren't even familiar with, it won't scale to other kinds of quoted terms, + it doesn't translate to other languages, and it's pretty pointless, too. + + + + + + Use of quotes + + + Use quotes always to delimit file names, user-supplied identifiers, and + other variables that might contain words. Do not use them to mark up + variables that will not contain words (for example, operator names). 
+ + + + There are functions in the backend that will double-quote their own output + at need (for example, format_type_be()). Do not put + additional quotes around the output of such functions. + + + + Rationale: Objects can have names that create ambiguity when embedded in a + message. Be consistent about denoting where a plugged-in name starts and + ends. But don't clutter messages with unnecessary or duplicate quote + marks. + + + + + + Grammar and punctuation + + + The rules are different for primary error messages and for detail/hint + messages: + + + + Primary error messages: Do not capitalize the first letter. Do not end a + message with a period. Do not even think about ending a message with an + exclamation point. + + + + Detail and hint messages: Use complete sentences, and end each with + a period. Capitalize the starts of sentences. + + + + Rationale: Avoiding punctuation makes it easier for client applications to + embed the message into a variety of grammatical contexts. Often, primary + messages are not grammatically complete sentences anyway. (And if they're + long enough to be more than one sentence, they should be split into + primary and detail parts.) However, detail and hint messages are longer + and may need to include multiple sentences. For consistency, they should + follow complete-sentence style even when there's only one sentence. + + + + + + Upper case vs. lower case + + + Use lower case for message wording, including the first letter of a + primary error message. Use upper case for SQL commands and key words if + they appear in the message. + + + + Rationale: It's easier to make everything look more consistent this + way, since some messages are complete sentences and some not. + + + + + + Avoid passive voice + + + Use the active voice. Use complete sentences when there is an acting + subject (A could not do B). Use telegram style without + subject if the subject would be the program itself; do not use + I for the program. 
+ + + + Rationale: The program is not human. Don't pretend otherwise. + + + + + + Present vs past tense + + + Use past tense if an attempt to do something failed, but could perhaps + succeed next time (perhaps after fixing some problem). Use present tense + if the failure is certainly permanent. + + + + There is a nontrivial semantic difference between sentences of the form + + could not open file "%s": %m + +and + + cannot open file "%s" + + The first one means that the attempt to open the file failed. The + message should give a reason, such as disk full or + file doesn't exist. The past tense is appropriate because + next time the disk might not be full anymore or the file in question may + exist. + + + + The second form indicates that the functionality of opening the named file + does not exist at all in the program, or that it's conceptually + impossible. The present tense is appropriate because the condition will + persist indefinitely. + + + + Rationale: Granted, the average user will not be able to draw great + conclusions merely from the tense of the message, but since the language + provides us with a grammar we should use it correctly. + + + + + + Type of the object + + + When citing the name of an object, state what kind of object it is. + + + + Rationale: Else no one will know what foo.bar.baz is. + + + + + + Brackets + + + Square brackets are only to be used (1) in command synopses to denote + optional arguments, or (2) to denote an array subscript. + + + + Rationale: Anything else does not correspond to widely-known customary + usage and will confuse people. + + + + + + Assembling error messages + + + When a message includes text that is generated elsewhere, embed it in + this style: + + could not open file %s: %m + + + + + Rationale: It would be difficult to account for all possible error codes + to paste this into a single smooth sentence, so some sort of punctuation + is needed.
Putting the embedded text in parentheses has also been + suggested, but it's unnatural if the embedded text is likely to be the + most important part of the message, as is often the case. + + + + + + Reasons for errors + + + Messages should always state the reason why an error occurred. + For example: + + BAD: could not open file %s + BETTER: could not open file %s (I/O failure) + + If no reason is known you better fix the code. + + + + + + Function names + + + Don't include the name of the reporting routine in the error text. We have + other mechanisms for finding that out when needed, and for most users it's + not helpful information. If the error text doesn't make as much sense + without the function name, reword it. + + BAD: pg_atoi: error in "z": can't parse "z" + BETTER: invalid input syntax for integer: "z" + + + + + Avoid mentioning called function names, either; instead say what the code + was trying to do: + + BAD: open() failed: %m + BETTER: could not open file %s: %m + + If it really seems necessary, mention the system call in the detail + message. (In some cases, providing the actual values passed to the + system call might be appropriate information for the detail message.) + + + + Rationale: Users don't know what all those functions do. + + + + + + Tricky words to avoid + + + Unable + + Unable is nearly the passive voice. Better use + cannot or could not, as appropriate. + + + + + Bad + + Error messages like bad result are really hard to interpret + intelligently. It's better to write why the result is bad, + e.g., invalid format. + + + + + Illegal + + Illegal stands for a violation of the law, the rest is + invalid. Better yet, say why it's invalid. + + + + + Unknown + + Try to avoid unknown. Consider error: unknown + response. If you don't know what the response is, how do you know + it's erroneous? Unrecognized is often a better choice. + Also, be sure to include the value being complained of. 
+ + BAD: unknown node type + BETTER: unrecognized node type: 42 + + + + + + Find vs. Exists + + If the program uses a nontrivial algorithm to locate a resource (e.g., a + path search) and that algorithm fails, it is fair to say that the program + couldn't find the resource. If, on the other hand, the + expected location of the resource is known but the program cannot access + it there then say that the resource doesn't exist. Using + find in this case sounds weak and confuses the issue. + + + + + + + Proper spelling + + + Spell out words in full. For instance, avoid: + + + + spec + + + + + stats + + + + + parens + + + + + auth + + + + + xact + + + + + + + Rationale: This will improve consistency. + + + + + + Localization + + + Keep in mind that error message texts need to be translated into other + languages. Follow the guidelines in + to avoid making life difficult for translators. + + + + +
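As a rough illustration of the translation guidance in the nls.sgml part of this patch, the following C sketch shows the two techniques it recommends: keeping each translatable message a coherent whole rather than splicing words in at run time, and using positional format specifiers such as %2$s when a translation reorders the placeholders. The helper names format_files_message and format_file_size_de are hypothetical and do not appear in the PostgreSQL source. The gettext() call is left out so the sketch stands alone, and the German string is written inline where a real program would obtain it from the message catalog. Positional specifiers are a POSIX extension to printf, so this assumes a POSIX-conforming C library.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): writes one of two complete
 * messages into buf.  Each branch holds a whole sentence, so a translator
 * never has to deal with a fragment such as "copied" on its own.
 */
int
format_files_message(char *buf, size_t len, int copied)
{
    if (copied)
        return snprintf(buf, len, "Files were copied.");
    return snprintf(buf, len, "Files were removed.");
}

/*
 * Hypothetical helper: applies a translated format string whose
 * placeholders appear in a different order than the C arguments.
 * %2$s consumes the second argument (name) and %1$u the first (nchars),
 * so the call site keeps the argument order (nchars, name).
 */
int
format_file_size_de(char *buf, size_t len, unsigned int nchars,
                    const char *name)
{
    return snprintf(buf, len, "Die Datei %2$s hat %1$u Zeichen.",
                    nchars, name);
}
```

With a 128-byte buffer, format_file_size_de(buf, sizeof buf, 42, "foo.txt") should leave the text "Die Datei foo.txt hat 42 Zeichen." in buf, even though the count is passed before the file name. Duplicating the snprintf call in format_files_message costs a line of code but gives translators two self-contained sentences to work with.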