<!--
$PostgreSQL: pgsql/doc/src/sgml/sources.sgml,v 2.16 2004/12/13 18:05:09 petere Exp $
-->
<chapter id="source">
<title>PostgreSQL Coding Conventions</title>
<sect1 id="source-format">
<title>Formatting</title>
<para>
Source code formatting uses 4-column tab spacing, with
tabs preserved (i.e., tabs are not expanded to spaces).
Each logical indentation level is one additional tab stop.
Layout rules (brace positioning, etc.) follow BSD conventions.
</para>
<para>
While submitted patches do not absolutely have to follow these formatting
rules, it's a good idea to do so. Your code will get run through
<application>pgindent</>, so there's no point in making it look nice
under some other set of formatting conventions.
</para>
<para>
For <productname>Emacs</productname>, add the following (or
something similar) to your <filename>~/.emacs</filename>
initialization file:
<programlisting>
;; check for files with a path containing "postgres" or "pgsql"
(setq auto-mode-alist
  (cons '("\\(postgres\\|pgsql\\).*\\.[ch]\\'" . pgsql-c-mode)
        auto-mode-alist))
(setq auto-mode-alist
  (cons '("\\(postgres\\|pgsql\\).*\\.cc\\'" . pgsql-c-mode)
        auto-mode-alist))
(defun pgsql-c-mode ()
  ;; sets up formatting for PostgreSQL C code
  (interactive)
  (c-mode)
  (setq-default tab-width 4)
  (c-set-style "bsd")             ; set c-basic-offset to 4, plus other stuff
  (c-set-offset 'case-label '+)   ; tweak case indent to match PG custom
  (setq indent-tabs-mode t))      ; make sure we keep tabs when indenting
</programlisting>
</para>
<para>
For <application>vi</application>, your
<filename>~/.vimrc</filename> or equivalent file should contain
the following:
<programlisting>
set tabstop=4
</programlisting>
or equivalently from within <application>vi</application>, try
<programlisting>
:set ts=4
</programlisting>
</para>
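<para>
If your editor is <application>Vim</application>, a slightly fuller
set of options may help keep edits consistent with the tab conventions
described above. This is only a sketch, not an official recommendation:
it assumes <application>Vim</application> specifically, since not every
<application>vi</application> implementation supports the
<literal>expandtab</literal> option.
<programlisting>
" show tabs as 4 columns and indent by one tab stop per level
set tabstop=4
set shiftwidth=4
" insert real tab characters rather than expanding them to spaces
set noexpandtab
</programlisting>
</para>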
<para>
The text browsing tools <application>more</application> and
<application>less</application> can be invoked as
<programlisting>
more -x4
less -x4
</programlisting>
to make them show tabs appropriately.
</para>
</sect1>
<sect1 id="error-message-reporting">
<title>Reporting Errors Within the Server</title>
<indexterm>
<primary>ereport</primary>
</indexterm>
<indexterm>
<primary>elog</primary>
</indexterm>
<para>
Error, warning, and log messages generated within the server code
should be created using <function>ereport</>, or its older cousin
<function>elog</>. The use of <function>ereport</> is complex enough to
require some explanation.
</para>
<para>
There are two required elements for every message: a severity level
(ranging from <literal>DEBUG</> to <literal>PANIC</>) and a primary
message text. In addition there are optional elements, the most
common of which is an error identifier code that follows the SQL spec's
SQLSTATE conventions.
<function>ereport</> itself is just a shell function that exists
mainly for the syntactic convenience of making message generation
look like a function call in the C source code. The only parameter
accepted directly by <function>ereport</> is the severity level.
The primary message text and any optional message elements are
generated by calling auxiliary functions, such as <function>errmsg</>,
within the <function>ereport</> call.
</para>
<para>
A typical call to <function>ereport</> might look like this:
<programlisting>
ereport(ERROR,
        (errcode(ERRCODE_DIVISION_BY_ZERO),
         errmsg("division by zero")));
</programlisting>
This specifies error severity level <literal>ERROR</> (a run-of-the-mill
error). The <function>errcode</> call specifies the SQLSTATE error code
using a macro defined in <filename>src/include/utils/errcodes.h</>. The
<function>errmsg</> call provides the primary message text. Notice the
extra set of parentheses surrounding the auxiliary function calls &mdash;
these are annoying but syntactically necessary.
</para>
<para>
Here is a more complex example:
<programlisting>
ereport(ERROR,
        (errcode(ERRCODE_AMBIGUOUS_FUNCTION),
         errmsg("function %s is not unique",
                func_signature_string(funcname, nargs,
                                      actual_arg_types)),
         errhint("Unable to choose a best candidate function. "
                 "You may need to add explicit typecasts.")));
</programlisting>
This illustrates the use of format codes to embed run-time values into
a message text. Also, an optional <quote>hint</> message is provided.
</para>
<para>
The available auxiliary routines for <function>ereport</> are:
<itemizedlist>
<listitem>
<para>
<function>errcode(sqlerrcode)</function> specifies the SQLSTATE error identifier
code for the condition. If this routine is not called, the error
identifier defaults to
<literal>ERRCODE_INTERNAL_ERROR</> when the error severity level is
<literal>ERROR</> or higher, <literal>ERRCODE_WARNING</> when the
error level is <literal>WARNING</>, otherwise (for <literal>NOTICE</>
and below) <literal>ERRCODE_SUCCESSFUL_COMPLETION</>.
While these defaults are often convenient, always consider whether they
are appropriate before omitting the <function>errcode()</> call.
</para>
</listitem>
<listitem>
<para>
<function>errmsg(const char *msg, ...)</function> specifies the primary error
message text, and possibly run-time values to insert into it. Insertions
are specified by <function>sprintf</>-style format codes. In addition to
the standard format codes accepted by <function>sprintf</>, the format
code <literal>%m</> can be used to insert the error message returned
by <function>strerror</> for the current value of <literal>errno</>.
<footnote>
<para>
That is, the value that was current when the <function>ereport</> call
was reached; changes of <literal>errno</> within the auxiliary reporting
routines will not affect it. That would not be true if you were to
write <literal>strerror(errno)</> explicitly in <function>errmsg</>'s
parameter list; accordingly, do not do so.
</para>
</footnote>
<literal>%m</> does not require any
corresponding entry in the parameter list for <function>errmsg</>.
Note that the message string will be run through <function>gettext</>
for possible localization before format codes are processed.
</para>
</listitem>
<listitem>
<para>
<function>errmsg_internal(const char *msg, ...)</function> is the same as
<function>errmsg</>, except that the message string will not be
included in the internationalization message dictionary.
This should be used for <quote>can't happen</> cases that are probably
not worth expending translation effort on.
</para>
</listitem>
<listitem>
<para>
<function>errdetail(const char *msg, ...)</function> supplies an optional
<quote>detail</> message; this is to be used when there is additional
information that seems inappropriate to put in the primary message.
The message string is processed in just the same way as for
<function>errmsg</>.
</para>
</listitem>
<listitem>
<para>
<function>errhint(const char *msg, ...)</function> supplies an optional
<quote>hint</> message; this is to be used when offering suggestions
about how to fix the problem, as opposed to factual details about
what went wrong.
The message string is processed in just the same way as for
<function>errmsg</>.
</para>
</listitem>
<listitem>
<para>
<function>errcontext(const char *msg, ...)</function> is not normally called
directly from an <function>ereport</> message site; rather it is used
in <literal>error_context_stack</> callback functions to provide
information about the context in which an error occurred, such as the
current location in a PL function.
The message string is processed in just the same way as for
<function>errmsg</>. Unlike the other auxiliary functions, this can
be called more than once per <function>ereport</> call; the successive
strings thus supplied are concatenated with separating newlines.
</para>
</listitem>
<listitem>
<para>
<function>errposition(int cursorpos)</function> specifies the textual location
of an error within a query string. Currently it is only useful for
errors detected in the lexical and syntactic analysis phases of
query processing.
</para>
</listitem>
<listitem>
<para>
<function>errcode_for_file_access()</> is a convenience function that
selects an appropriate SQLSTATE error identifier for a failure in a
file-access-related system call. It uses the saved
<literal>errno</> to determine which error code to generate.
Usually this should be used in combination with <literal>%m</> in the
primary error message text.
</para>
</listitem>
<listitem>
<para>
<function>errcode_for_socket_access()</> is a convenience function that
selects an appropriate SQLSTATE error identifier for a failure in a
socket-related system call.
</para>
</listitem>
</itemizedlist>
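As a sketch of how some of these routines combine, the following
fragment reports a failed <function>open</function> call using
<function>errcode_for_file_access</function> together with
<literal>%m</literal>. The helper name
<function>open_data_file</function> is hypothetical and exists only
for illustration; the fragment is meant to be compiled inside the
backend source tree, not as a standalone program.
<programlisting>
#include "postgres.h"

#include &lt;fcntl.h&gt;

/* Hypothetical helper, for illustration only. */
static int
open_data_file(const char *path)
{
    int         fd = open(path, O_RDONLY);

    if (fd &lt; 0)
    {
        /*
         * errcode_for_file_access() picks the SQLSTATE from the saved
         * errno; %m inserts the matching strerror() text and needs no
         * entry in the parameter list.
         */
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));
    }

    return fd;
}
</programlisting>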
</para>
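<para>
The following fragment sketches how <function>errcontext</function>
might be used from an <literal>error_context_stack</literal> callback.
The function names <function>line_error_callback</function> and
<function>process_lines</function>, and the wording of the context
message, are hypothetical; the fragment assumes it is compiled inside
the backend source tree.
<programlisting>
#include "postgres.h"

/* Hypothetical callback: adds one line of context to any message. */
static void
line_error_callback(void *arg)
{
    int         lineno = *(int *) arg;

    errcontext("while processing input line %d", lineno);
}

/* Hypothetical caller: pushes the callback, works, then pops it. */
static void
process_lines(int nlines)
{
    ErrorContextCallback errcallback;
    int         lineno;

    /* Push our callback onto the front of the context stack. */
    errcallback.callback = line_error_callback;
    errcallback.arg = (void *) &amp;lineno;
    errcallback.previous = error_context_stack;
    error_context_stack = &amp;errcallback;

    for (lineno = 1; lineno &lt;= nlines; lineno++)
    {
        /* ... work that might call ereport() goes here ... */
    }

    /* Pop the callback once the protected region is finished. */
    error_context_stack = errcallback.previous;
}
</programlisting>
Any message reported while the callback is on the stack then carries
the extra context line, without the reporting site having to know
about it.
</para>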
<para>
There is an older function <function>elog</> that is still heavily used.
An <function>elog</> call
<programlisting>
elog(level, "format string", ...);
</programlisting>
is exactly equivalent to
<programlisting>
ereport(level, (errmsg_internal("format string", ...)));
</programlisting>
Notice that the SQLSTATE errcode is always defaulted, and the message
string is not included in the internationalization message dictionary.
Therefore, <function>elog</> should be used only for internal errors and
low-level debug logging. Any message that is likely to be of interest to
ordinary users should go through <function>ereport</>. Nonetheless,
there are enough internal <quote>can't happen</> error checks in the
system that <function>elog</> is still widely used; it is preferred for
those messages for its notational simplicity.
</para>
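<para>
A typical <quote>can't happen</quote> check might look like the
following sketch. The function <function>check_node</function> is
hypothetical, and the fragment assumes it is compiled inside the
backend source tree.
<programlisting>
#include "postgres.h"

#include "nodes/nodes.h"

/* Hypothetical function, for illustration only. */
static void
check_node(Node *node)
{
    switch (nodeTag(node))
    {
        case T_List:
            /* expected node type: nothing to complain about */
            break;
        default:
            /* internal error: SQLSTATE is defaulted, text is not translated */
            elog(ERROR, "unrecognized node type: %d",
                 (int) nodeTag(node));
            break;
    }
}
</programlisting>
</para>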
<para>
Advice about writing good error messages can be found in
<xref linkend="error-style-guide">.
</para>
</sect1>
<sect1 id="error-style-guide">
<title>Error Message Style Guide</title>
<para>
This style guide is offered in the hope of maintaining a consistent,
user-friendly style throughout all the messages generated by
<productname>PostgreSQL</>.
</para>
<simplesect>
<title>What goes where</title>
<para>
The primary message should be short, factual, and avoid reference to
implementation details such as specific function names.
<quote>Short</quote> means <quote>should fit on one line under normal
conditions</quote>. Use a detail message if needed to keep the primary
message short, or if you feel a need to mention implementation details
such as the particular system call that failed. Both primary and detail
messages should be factual. Use a hint message for suggestions about what
to do to fix the problem, especially if the suggestion might not always be
applicable.
</para>
<para>
For example, instead of
<programlisting>
IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %m
(plus a long addendum that is basically a hint)
</programlisting>
write
<programlisting>
Primary: could not create shared memory segment: %m
Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
Hint: the addendum
</programlisting>
</para>
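<para>
In C source, the split shown above could be expressed roughly as in
the following sketch. The function name
<function>report_shmget_failure</function>, its parameters, the choice
of severity level, and the placeholder hint text are all assumptions
made for illustration; the fragment must run while
<literal>errno</literal> still holds the value set by the failed call.
<programlisting>
#include "postgres.h"

/* Hypothetical helper, for illustration only. */
static void
report_shmget_failure(int key, unsigned int size, unsigned int flags)
{
    ereport(ERROR,
            (errmsg("could not create shared memory segment: %m"),
             errdetail("Failed syscall was shmget(key=%d, size=%u, 0%o).",
                       key, size, flags),
             /* placeholder: the real hint would carry the addendum */
             errhint("Suggestions for fixing the problem go here.")));
}
</programlisting>
</para>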
<para>
Rationale: keeping the primary message short helps keep it to the point,
and lets clients lay out screen space on the assumption that one line is
enough for error messages. Detail and hint messages may be relegated to a
verbose mode, or perhaps a pop-up error-details window. Also, details and
hints would normally be suppressed from the server log to save
space. Reference to implementation details is best avoided since users
don't know the details anyway.
</para>
</simplesect>
<simplesect>
<title>Formatting</title>
<para>
Don't put any specific assumptions about formatting into the message
texts. Expect clients and the server log to wrap lines to fit their own
needs. In long messages, newline characters (\n) may be used to indicate
suggested paragraph breaks. Don't end a message with a newline. Don't
use tabs or other formatting characters. (In error context displays,
newlines are automatically added to separate levels of context such as
function calls.)
</para>
<para>
Rationale: Messages are not necessarily displayed on terminal-type
displays. In GUI displays or browsers these formatting instructions are
at best ignored.
</para>
</simplesect>
<simplesect>
<title>Quotation marks</title>
<para>
English text should use double quotes when quoting is appropriate.
Text in other languages should consistently use one kind of quotes that is
consistent with publishing customs and computer output of other programs.
</para>
<para>
Rationale: The choice of double quotes over single quotes is somewhat
arbitrary, but tends to be the preferred use. Some have suggested
choosing the kind of quotes depending on the type of object according to
SQL conventions (namely, strings single quoted, identifiers double
quoted). But this is a language-internal technical issue that many users
aren't even familiar with, it won't scale to other kinds of quoted terms,
it doesn't translate to other languages, and it's pretty pointless, too.
</para>
</simplesect>
<simplesect>
<title>Use of quotes</title>
<para>
Always use quotes to delimit file names, user-supplied identifiers, and
other variables that might contain words. Do not use them to mark up
variables that will not contain words (for example, operator names).
</para>
<para>
There are functions in the backend that will double-quote their own output
at need (for example, <function>format_type_be</>()). Do not put
additional quotes around the output of such functions.
</para>
<para>
Rationale: Objects can have names that create ambiguity when embedded in a
message. Be consistent about denoting where a plugged-in name starts and
ends. But don't clutter messages with unnecessary or duplicate quote
marks.
</para>
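<para>
The following sketch shows both rules in one message: the
user-supplied column name is quoted explicitly, while the output of
<function>format_type_be</function> is inserted without additional
quotes. The function <function>report_type_mismatch</function> and the
choice of error code are hypothetical, and the fragment assumes it is
compiled inside the backend source tree.
<programlisting>
#include "postgres.h"

#include "utils/builtins.h"

/* Hypothetical helper, for illustration only. */
static void
report_type_mismatch(const char *colname, Oid typid)
{
    ereport(ERROR,
            (errcode(ERRCODE_DATATYPE_MISMATCH),
             /* quote the identifier; format_type_be quotes as needed */
             errmsg("column \"%s\" is of type %s",
                    colname, format_type_be(typid))));
}
</programlisting>
</para>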
</simplesect>
<simplesect>
<title>Grammar and punctuation</title>
<para>
The rules are different for primary error messages and for detail/hint
messages:
</para>
<para>
Primary error messages: Do not capitalize the first letter. Do not end a
message with a period. Do not even think about ending a message with an
exclamation point.
</para>
<para>
Detail and hint messages: Use complete sentences, and end each with
a period. Capitalize the first word of sentences.
</para>
<para>
Rationale: Avoiding punctuation makes it easier for client applications to
embed the message into a variety of grammatical contexts. Often, primary
messages are not grammatically complete sentences anyway. (And if they're
long enough to be more than one sentence, they should be split into
primary and detail parts.) However, detail and hint messages are longer
and may need to include multiple sentences. For consistency, they should
follow complete-sentence style even when there's only one sentence.
</para>
</simplesect>
<simplesect>
<title>Upper case vs. lower case</title>
<para>
Use lower case for message wording, including the first letter of a
primary error message. Use upper case for SQL commands and key words if
they appear in the message.
</para>
<para>
Rationale: It's easier to make everything look more consistent this
way, since some messages are complete sentences and some not.
</para>
</simplesect>
<simplesect>
<title>Avoid passive voice</title>
<para>
Use the active voice. Use complete sentences when there is an acting
subject (<quote>A could not do B</quote>). Use telegram style without
subject if the subject would be the program itself; do not use
<quote>I</quote> for the program.
</para>
<para>
Rationale: The program is not human. Don't pretend otherwise.
</para>
</simplesect>
<simplesect>
<title>Present vs. past tense</title>
<para>
Use past tense if an attempt to do something failed, but could perhaps
succeed next time (perhaps after fixing some problem). Use present tense
if the failure is certainly permanent.
</para>
<para>
There is a nontrivial semantic difference between sentences of the form
<programlisting>
could not open file "%s": %m
</programlisting>
and
<programlisting>
cannot open file "%s"
</programlisting>
The first one means that the attempt to open the file failed. The
message should give a reason, such as <quote>disk full</quote> or
<quote>file doesn't exist</quote>. The past tense is appropriate because
next time the disk might not be full anymore or the file in question may
exist.
</para>
<para>
The second form indicates that the functionality of opening the named file
does not exist at all in the program, or that it's conceptually
impossible. The present tense is appropriate because the condition will
persist indefinitely.
</para>
<para>
Rationale: Granted, the average user will not be able to draw great
conclusions merely from the tense of the message, but since the language
provides us with a grammar we should use it correctly.
</para>
</simplesect>
<simplesect>
<title>Type of the object</title>
<para>
When citing the name of an object, state what kind of object it is.
</para>
<para>
Rationale: Otherwise no one will know what <quote>foo.bar.baz</>
refers to.
</para>
</simplesect>
<simplesect>
<title>Brackets</title>
<para>
Square brackets are only to be used (1) in command synopses to denote
optional arguments, or (2) to denote an array subscript.
</para>
<para>
Rationale: Anything else does not correspond to widely known customary
usage and will confuse people.
</para>
</simplesect>
<simplesect>
<title>Assembling error messages</title>
<para>
When a message includes text that is generated elsewhere, embed it in
this style:
<programlisting>
could not open file %s: %m
</programlisting>
</para>
<para>
Rationale: It would be difficult to account for all possible error codes
to paste this into a single smooth sentence, so some sort of punctuation
is needed. Putting the embedded text in parentheses has also been
suggested, but it's unnatural if the embedded text is likely to be the
most important part of the message, as is often the case.
</para>
</simplesect>
<simplesect>
<title>Reasons for errors</title>
<para>
Messages should always state the reason why an error occurred.
For example:
<programlisting>
BAD: could not open file %s
BETTER: could not open file %s (I/O failure)
</programlisting>
If no reason is known, you had better fix the code.
</para>
</simplesect>
<simplesect>
<title>Function names</title>
<para>
Don't include the name of the reporting routine in the error text. We have
other mechanisms for finding that out when needed, and for most users it's
not helpful information. If the error text doesn't make as much sense
without the function name, reword it.
<programlisting>
BAD: pg_atoi: error in "z": can't parse "z"
BETTER: invalid input syntax for integer: "z"
</programlisting>
</para>
<para>
Also avoid mentioning the names of called functions; instead, say what the
code was trying to do:
<programlisting>
BAD: open() failed: %m
BETTER: could not open file %s: %m
</programlisting>
If it really seems necessary, mention the system call in the detail
message. (In some cases, providing the actual values passed to the
system call might be appropriate information for the detail message.)
</para>
<para>
Rationale: Users don't know what all those functions do.
</para>
</simplesect>
<simplesect>
<title>Tricky words to avoid</title>
<formalpara>
<title>Unable</title>
<para>
<quote>Unable</quote> is nearly the passive voice. It is better to use
<quote>cannot</quote> or <quote>could not</quote>, as appropriate.
</para>
</formalpara>
<formalpara>
<title>Bad</title>
<para>
Error messages like <quote>bad result</quote> are really hard to interpret
intelligently. It's better to write why the result is <quote>bad</quote>,
e.g., <quote>invalid format</quote>.
</para>
</formalpara>
<formalpara>
<title>Illegal</title>
<para>
<quote>Illegal</quote> stands for a violation of the law; everything else is
<quote>invalid</quote>. Better yet, say why it's invalid.
</para>
</formalpara>
<formalpara>
<title>Unknown</title>
<para>
Try to avoid <quote>unknown</quote>. Consider <quote>error: unknown
response</quote>. If you don't know what the response is, how do you know
it's erroneous? <quote>Unrecognized</quote> is often a better choice.
Also, be sure to include the value being complained of.
<programlisting>
BAD: unknown node type
BETTER: unrecognized node type: 42
</programlisting>
</para>
</formalpara>
<formalpara>
<title>Find vs. Exists</title>
<para>
If the program uses a nontrivial algorithm to locate a resource (e.g., a
path search) and that algorithm fails, it is fair to say that the program
couldn't <quote>find</quote> the resource. If, on the other hand, the
expected location of the resource is known but the program cannot access
it there then say that the resource doesn't <quote>exist</quote>. Using
<quote>find</quote> in this case sounds weak and confuses the issue.
</para>
</formalpara>
</simplesect>
<simplesect>
<title>Proper spelling</title>
<para>
Spell out words in full. For instance, avoid:
<itemizedlist>
<listitem>
<para>
spec
</para>
</listitem>
<listitem>
<para>
stats
</para>
</listitem>
<listitem>
<para>
parens
</para>
</listitem>
<listitem>
<para>
auth
</para>
</listitem>
<listitem>
<para>
xact
</para>
</listitem>
</itemizedlist>
</para>
<para>
Rationale: This will improve consistency.
</para>
</simplesect>
<simplesect>
<title>Localization</title>
<para>
Keep in mind that error message texts need to be translated into other
languages. Follow the guidelines in <xref linkend="nls-guidelines">
to avoid making life difficult for translators.
</para>
</simplesect>
</sect1>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->