mirror of
git://gcc.gnu.org/git/gcc.git
synced 2024-12-26 23:29:44 +08:00
222bb619fb
2001-02-06 Phil Edwards <pme@sources.redhat.com> * docs/html/configopts.html: Fix HTML markup. * docs/html/install.html: Bring up to date. * docs/html/17_intro/C++STYLE: Add global variable conventions. * docs/html/21_strings/howto.html: More notes. * docs/html/22_locale/howto.html: Fix HTML markup. * docs/html/27_io/howto.html: More notes. * docs/html/27_io/binary_iostreams_kanze.txt: New file. * docs/html/27_io/binary_iostreams_kuehl.txt: New file. From-SVN: r39503
428 lines
20 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="AUTHOR" CONTENT="pme@sources.redhat.com (Phil Edwards)">
<META NAME="KEYWORDS" CONTENT="HOWTO, libstdc++, GCC, g++, libg++, STL">
<META NAME="DESCRIPTION" CONTENT="HOWTO for the libstdc++ chapter 27.">
<META NAME="GENERATOR" CONTENT="vi and eight fingers">
<TITLE>libstdc++-v3 HOWTO: Chapter 27</TITLE>
<LINK REL=StyleSheet HREF="../lib3styles.css">
<!-- $Id: howto.html,v 1.2 2001/01/23 17:02:27 pme Exp $ -->
</HEAD>
<BODY>

<H1 CLASS="centered"><A NAME="top">Chapter 27: Input/Output</A></H1>

<P>Chapter 27 deals with iostreams and all their subcomponents
and extensions. All <EM>kinds</EM> of fun stuff.
</P>


<!-- ####################################################### -->
<HR>
<H1>Contents</H1>
<UL>
<LI><A HREF="#1">Copying a file</A>
<LI><A HREF="#2">The buffering is screwing up my program!</A>
<LI><A HREF="#3">Binary I/O</A>
<LI><A HREF="#4">Iostreams class hierarchy diagram</A>
<LI><A HREF="#5">What is this &lt;sstream&gt;/stringstreams thing?</A>
<LI><A HREF="#6">Deriving a stream buffer</A>
<LI><A HREF="#7">More on binary I/O</A>
</UL>

<HR>

<!-- ####################################################### -->

<H2><A NAME="1">Copying a file</A></H2>
<P>So you want to copy a file quickly and easily, and most important,
completely portably. And since this is C++, you have an open
ifstream (call it IN) and an open ofstream (call it OUT):
<PRE>
#include &lt;fstream&gt;

std::ifstream IN ("input_file");
std::ofstream OUT ("output_file"); </PRE>
</P>
<P>Here's the easiest way to get it completely wrong:
<PRE>
OUT &lt;&lt; IN;</PRE>
For those of you who don't already know why this doesn't work
(probably from having done it before), I invite you to quickly
create a simple text file called "input_file" containing
the sentence
<PRE>
The quick brown fox jumped over the lazy dog.</PRE>
surrounded by blank lines. Code it up and try it. The contents
of "output_file" may surprise you.
</P>
<P>Seriously, go do it. Get surprised, then come back. It's worth it.
</P>
<HR WIDTH="60%">
<P>The thing to remember is that the <TT>basic_[io]stream</TT> classes
handle formatting, nothing else. In particular, formatted extraction
breaks its input up on whitespace. The actual reading, writing, and
storing of data is handled by the <TT>basic_streambuf</TT> family.
Fortunately, the <TT>operator&lt;&lt;</TT> is overloaded to take an
ostream and a pointer-to-streambuf, in order to help with just this
kind of "dump the data verbatim" situation.
</P>
<P>Why a <EM>pointer</EM> to streambuf and not just a streambuf? Well,
the [io]streams hold pointers (or references, depending on the
implementation) to their buffers, not the actual
buffers. This allows polymorphic behavior on the part of the buffers
as well as the streams themselves. The pointer is easily retrieved
using the <TT>rdbuf()</TT> member function. Therefore, the easiest
way to copy the file is:
<PRE>
OUT &lt;&lt; IN.rdbuf();</PRE>
</P>
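<P>As a minimal sketch of how that one-liner might sit inside a complete
function: the name <TT>copy_file</TT>, the choice of binary mode, and the
handling of an empty source are all assumptions made for this
illustration, not anything the library prescribes.
</P>

```cpp
#include <fstream>
#include <string>

// Copy the file named `from` to the file named `to`, verbatim.
// copy_file is a hypothetical helper; binary mode is chosen here so
// that no newline translation happens on the way through.
bool copy_file(const std::string& from, const std::string& to)
{
    std::ifstream in(from.c_str(), std::ios::in | std::ios::binary);
    std::ofstream out(to.c_str(),
                      std::ios::out | std::ios::binary | std::ios::trunc);
    if (!in || !out)
        return false;

    // An empty source needs special care: inserting zero characters
    // through operator<<(streambuf*) sets failbit on the output stream.
    if (in.peek() == std::ifstream::traits_type::eof())
        return !in.bad();

    // The streambuf overload moves characters without any formatting,
    // so whitespace and blank lines arrive untouched.
    out << in.rdbuf();
    out.flush();
    return out.good();
}
```

<P>Calling <TT>copy_file("input_file", "output_file")</TT> on the test
file described earlier should leave the blank lines around the sentence
in place.
</P>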
<P>So what <EM>was</EM> happening with OUT&lt;&lt;IN? Nothing the
Standard promises, since it defines no <TT>&lt;&lt;</TT> that takes a
stream on the right-hand side.
I have seen instances where it is implemented, but the character
extraction process removes all the whitespace, leaving you with no
blank lines and only "Thequickbrownfox...". With
libraries that do not define that operator, IN (or one of IN's
member pointers) sometimes gets converted to a void*, and the output
file then contains a perfect text representation of a hexadecimal
address (quite a big surprise). Others don't compile at all; that
includes any C++11 or later library, where the stream's conversion
to <TT>bool</TT> is explicit.
</P>
<P>Also note that none of this is specific to o<B>f</B>streams.
The operators shown above are all defined in the parent
basic_ostream class and are therefore available with all possible
descendants.
</P>
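<P>To illustrate that the same operator works away from files, here is
a small sketch that collects whatever is left in any istream into a
<TT>std::string</TT>. The helper name <TT>slurp</TT> is made up for this
example.
</P>

```cpp
#include <istream>
#include <sstream>
#include <string>

// Read everything remaining in `in` into a string, whitespace included.
// slurp is a hypothetical helper. It accepts any istream because the
// streambuf overload of operator<< belongs to basic_ostream, which
// ostringstream inherits from just as ofstream does.
std::string slurp(std::istream& in)
{
    std::ostringstream collected;
    collected << in.rdbuf();
    return collected.str();
}
```

<P>Passing an open ifstream, an istringstream, or a stream built on a
user-defined buffer all go through the same code path.
</P>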
<P>Return <A HREF="#top">to top of page</A> or
<A HREF="../faq/index.html">to the FAQ</A>.
</P>

<HR>
<H2><A NAME="2">The buffering is screwing up my program!</A></H2>
<!--
This is not written very well. I need to redo this section.
-->
<P>First, are you sure that you understand buffering? Particularly
the fact that C++ may not, in fact, have anything to do with it?
</P>
<P>The rules for buffering can be a little odd, but they aren't any
different from those of C. (Maybe that's why they can be a bit
odd.) Many people think that writing a newline to an output
stream automatically flushes the output buffer. This is true only
when the output stream is, in fact, a terminal and not a file
or some other device -- and <EM>that</EM> may not even be true
since C++ says nothing about files or terminals. All of that is
system-dependent. (The "newline-buffer-flushing only occurring
on terminals" thing is mostly true on Unix systems, though.)
</P>
<P>Some people also believe that sending <TT>endl</TT> down an
output stream only writes a newline. This is incorrect; after a
newline is written, the buffer is also flushed. Perhaps this
is the effect you want when writing to a screen -- get the text
out as soon as possible, etc -- but the buffering is largely
wasted when doing this to a file:
<PRE>
output &lt;&lt; "a line of text" &lt;&lt; endl;
output &lt;&lt; some_data_variable &lt;&lt; endl;
output &lt;&lt; "another line of text" &lt;&lt; endl; </PRE>
The proper thing to do in this case is to just write the data out
and let the libraries and the system worry about the buffering.
If you need a newline, just write a newline:
<PRE>
output &lt;&lt; "a line of text\n"
       &lt;&lt; some_data_variable &lt;&lt; '\n'
       &lt;&lt; "another line of text\n"; </PRE>
I have also joined the output statements into a single statement.
You could make the code prettier by moving the single newline to
the start of the quoted text on the third line, for example.
</P>
<P>If you do need to flush the buffer above, you can send an
<TT>endl</TT> if you also need a newline, or just flush the buffer
yourself:
<PRE>
output &lt;&lt; ...... &lt;&lt; flush; // can use std::flush manipulator
output.flush(); // or call a member fn </PRE>
</P>
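<P>One way to see the difference between <TT>'\n'</TT> and
<TT>endl</TT> for yourself is to hang a stream on a toy buffer that
counts flush requests. Everything named below (<TT>counting_buf</TT> and
the two helper functions) is invented for this sketch.
</P>

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// A toy stream buffer (hypothetical) that stores what it receives and
// counts how many times the stream asks it to synchronize.
class counting_buf : public std::streambuf
{
  public:
    counting_buf() : flushes_(0) {}
    int flushes() const { return flushes_; }
    const std::string& text() const { return text_; }

  protected:
    // No put area is set up, so every character arrives here.
    virtual int_type overflow(int_type c)
    {
        if (!traits_type::eq_int_type(c, traits_type::eof()))
            text_ += traits_type::to_char_type(c);
        return traits_type::not_eof(c);
    }

    // Reached through ostream::flush(), which is what both std::endl
    // and std::flush end up calling.
    virtual int sync()
    {
        ++flushes_;
        return 0;
    }

  private:
    std::string text_;
    int flushes_;
};

// Three lines terminated with plain newlines.
int flushes_after_newlines()
{
    counting_buf buf;
    std::ostream out(&buf);
    out << "a line of text\n" << 42 << '\n' << "another line of text\n";
    return buf.flushes();
}

// The same three lines terminated with std::endl.
int flushes_after_endls()
{
    counting_buf buf;
    std::ostream out(&buf);
    out << "a line of text" << std::endl
        << 42 << std::endl
        << "another line of text" << std::endl;
    return buf.flushes();
}
```

<P>The first function reports no flushes at all; the second reports
one per <TT>endl</TT>, three in total.
</P>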
<P>On the other hand, there are times when writing to a file should
be like writing to standard error; no buffering should be done
because the data needs to appear quickly (a prime example is a
log file for security-related information). The way to do this is
just to turn off the buffering <EM>before any I/O operations at
all</EM> have been done, i.e., as soon as possible after opening:
<PRE>
std::ofstream os ("/foo/bar/baz");
std::ifstream is ("/qux/quux/quuux");
int i;

os.rdbuf()-&gt;pubsetbuf(0,0);
is.rdbuf()-&gt;pubsetbuf(0,0);
...
os &lt;&lt; "this data is written immediately\n";
is &gt;&gt; i; // and this will probably cause a disk read </PRE>
</P>
<P>Since all aspects of buffering are handled by a streambuf-derived
member, it is necessary to get at that member with <TT>rdbuf()</TT>.
Then the public version of <TT>setbuf</TT> can be called. The
arguments are the same as those for the Standard C I/O Library
function (a buffer area followed by its size).
</P>
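<P>For a log file, the request might be wrapped up as in the sketch
below. The function name and the append mode are assumptions made for
illustration. The call to <TT>pubsetbuf(0,0)</TT> is placed
<EM>before</EM> <TT>open()</TT>, which is the most conservative
ordering: the Standard only promises an effect when the call comes
before any I/O, and at least some library versions appear to ignore the
request once the file is already open.
</P>

```cpp
#include <fstream>

// Open `path` for appending with library-level buffering turned off.
// open_unbuffered_log is a hypothetical helper. Whether the request is
// honored, and what the kernel does afterwards, remains up to the
// implementation and the system.
bool open_unbuffered_log(std::ofstream& log, const char* path)
{
    // Ask for "no buffer" while no file is attached and no I/O has
    // been done on the stream.
    log.rdbuf()->pubsetbuf(0, 0);

    log.open(path, std::ios::out | std::ios::app);
    return log.is_open();
}
```

<P>After a successful call, each <TT>log &lt;&lt; "...\n"</TT> is
handed to the operating system as soon as the library can manage it,
without waiting for a flush.
</P>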
<P>A great deal of this is implementation-dependent. For example,
<TT>streambuf</TT> does not specify any actions for its own
<TT>setbuf()</TT>-ish functions; the classes derived from
<TT>streambuf</TT> each define behavior that "makes
sense" for that class: an argument of (0,0) turns off
buffering for <TT>filebuf</TT> but has no effect on
its sibling <TT>stringbuf</TT>, and specifying anything other
than (0,0) has implementation-defined effects. Other user-defined
classes derived from streambuf can do whatever they want.
</P>
<P>A last reminder: there are usually more buffers involved than
just those at the language/library level. Kernel buffers, disk
buffers, and the like will also have an effect. Inspecting and
changing those is system-dependent.
</P>
<P>Return <A HREF="#top">to top of page</A> or
<A HREF="../faq/index.html">to the FAQ</A>.
</P>

<HR>
<H2><A NAME="3">Binary I/O</A></H2>
<P>The first and most important thing to remember about binary I/O is
that opening a file with <TT>ios::binary</TT> is not, repeat
<EM>not</EM>, the only thing you have to do. It is not a silver
bullet, and will not allow you to use the <TT>&lt;&lt;/&gt;&gt;</TT>
operators of the normal fstreams to do binary I/O.
</P>
<P>Sorry. Them's the breaks.
</P>
<P>This isn't going to try and be a complete tutorial on reading and
writing binary files (because "binary"
<A HREF="#7">covers a lot of ground</A>), but we will try and clear
up a couple of misconceptions and common errors.
</P>
<P>First, <TT>ios::binary</TT> has exactly one defined effect, no more
and no less. Normal text mode has to be concerned with the newline
characters, and the runtime system will translate between (for
example) '\n' and the appropriate end-of-line sequence (LF on Unix,
CRLF on DOS, CR on Macintosh, etc). (There are other things that
normal mode does, but that's the most obvious.) Opening a file in
binary mode disables this conversion, so reading a CRLF sequence
under Windows won't accidentally get mapped to a '\n' character, etc.
Binary mode is not supposed to suddenly give you a bitstream, and
if it is doing so in your program then you've discovered a bug in
your vendor's compiler (or some other part of the C++ implementation,
possibly the runtime system).
</P>
<P>Second, using <TT>&lt;&lt;</TT> to write and <TT>&gt;&gt;</TT> to
read isn't going to work with the standard file stream classes, even
if you turn off <TT>skipws</TT> during reading. Why not? Because
ifstream and ofstream exist for the purpose of <EM>formatting</EM>,
not reading and writing. Their job is to interpret the data into
text characters, and that's exactly what you don't want to happen
during binary I/O.
</P>
<P>Third, using the <TT>get()</TT> and <TT>put()/write()</TT> member
functions still isn't guaranteed to help you. These are
"unformatted" I/O functions, but still character-based.
(This may or may not be what you want, see below.)
</P>
<P>Notice how all the problems here are due to the inappropriate use
of <EM>formatting</EM> functions and classes to perform something
which <EM>requires</EM> that formatting not be done? There are a
seemingly infinite number of solutions, and a few are listed here:
<UL>
<LI>"Derive your own fstream-type classes and write your own
&lt;&lt;/&gt;&gt; operators to do binary I/O on whatever data
types you're using." This is a Bad Thing, because while
the compiler would probably be just fine with it, other humans
are going to be confused. The overloaded bitshift operators
have a well-defined meaning (formatting), and this breaks it.
<LI>"Build the file structure in memory, then <TT>mmap()</TT>
the file and copy the structure." Well, this is easy to
make work, and easy to break, and is pretty equivalent to
using <TT>::read()</TT> and <TT>::write()</TT> directly, and
makes no use of the iostream library at all...
<LI>"Use streambufs, that's what they're there for."
While not trivial for the beginner, this is the best of all
solutions. The streambuf/filebuf layer is the layer that is
responsible for actual I/O. If you want to use the C++
library for binary I/O, this is where you start.
</UL>
</P>
<P>How to go about using streambufs is a bit beyond the scope of this
document (at least for now), but while streambufs go a long way,
they still leave a couple of things up to you, the programmer.
As an example, byte ordering is completely between you and the
operating system, and you have to handle it yourself.
</P>
<P>Deriving a streambuf or filebuf
class from the standard ones, one that is specific to your data
types (or an abstraction thereof) is probably a good idea, and
lots of examples exist in journals and on Usenet. Using the
standard filebufs directly (either by declaring your own or by
using the pointer returned from an fstream's <TT>rdbuf()</TT>)
is certainly feasible as well.
</P>
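<P>A rough sketch of the "declare your own filebuf" route follows. The
two helper names are made up, and the whole-file, chunked approach is
just one reasonable choice among many.
</P>

```cpp
#include <fstream>
#include <vector>

// Write `bytes` to `path` through a filebuf opened in binary mode.
// write_bytes is a hypothetical helper.
bool write_bytes(const char* path, const std::vector<char>& bytes)
{
    std::filebuf fb;
    if (!fb.open(path, std::ios::out | std::ios::binary | std::ios::trunc))
        return false;

    const std::streamsize n = static_cast<std::streamsize>(bytes.size());
    const bool written = bytes.empty() || fb.sputn(&bytes[0], n) == n;

    // close() flushes what is pending and returns a null pointer on failure.
    const bool closed = (fb.close() != 0);
    return written && closed;
}

// Read the whole of `path` back, again with no newline translation.
// read_bytes is a hypothetical helper.
bool read_bytes(const char* path, std::vector<char>& bytes)
{
    std::filebuf fb;
    if (!fb.open(path, std::ios::in | std::ios::binary))
        return false;

    bytes.clear();
    char chunk[4096];
    std::streamsize got;
    // sgetn() returns how many characters it delivered; zero means
    // the end of the file has been reached.
    while ((got = fb.sgetn(chunk, static_cast<std::streamsize>(sizeof chunk))) > 0)
        bytes.insert(bytes.end(), chunk, chunk + got);
    return true;
}
```

<P>No formatting layer is involved at any point, so carriage returns,
newlines, and zero bytes all pass through as they are.
</P>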
<P>One area that causes problems is trying to do bit-by-bit operations
with filebufs. C++ is no different from C in this respect: I/O
must be done at the byte level. If you're trying to read or write
a few bits at a time, you're going about it the wrong way. You
must read/write an integral number of bytes and then process the
bytes. (For example, the streambuf functions take and return
variables of type <TT>int_type</TT>.)
</P>
<P>Another area of problems is opening text files in binary mode.
Generally, binary mode is intended for binary files, and opening
text files in binary mode means that you now have to deal with all of
those end-of-line and end-of-file problems that we mentioned before.
An instructive thread from comp.lang.c++.moderated delved into
this topic starting more or less at
<A HREF="http://www.deja.com/getdoc.xp?AN=436187505">this</A>
article and continuing to the end of the thread. (You'll have to
sort through some flames every couple of paragraphs, but the points
made are good ones.)
</P>

<HR>
<H2><A NAME="4">Iostreams class hierarchy diagram</A></H2>
<P>The <A HREF="iostreams_hierarchy.pdf">diagram</A> is in PDF. Rumor
has it that once Benjamin Kosnik has been dead for a few decades,
this work of his will be hung next to the Mona Lisa in the
<A HREF="http://www.louvre.fr/">Musee du Louvre</A>.
</P>

<HR>
<H2><A NAME="5">What is this &lt;sstream&gt;/stringstreams thing?</A></H2>
<P>Stringstreams (defined in the header <TT>&lt;sstream&gt;</TT>)
are in this author's opinion one of the coolest things since
sliced time. An example of their use is in the Received Wisdom
section for Chapter 21 (Strings),
<A HREF="../21_strings/howto.html#1.1internal"> describing how to
format strings</A>.
</P>
<P>The quick definition is: they are siblings of ifstream and ofstream,
and they do for <TT>std::string</TT> what their siblings do for
files. All that work you put into writing <TT>&lt;&lt;</TT> and
<TT>&gt;&gt;</TT> functions for your classes now pays off
<EM>again!</EM> Need to format a string before passing the string
to a function? Send your stuff via <TT>&lt;&lt;</TT> to an
ostringstream. You've read a string as input and need to parse it?
Initialize an istringstream with that string, and then pull pieces
out of it with <TT>&gt;&gt;</TT>. Have a stringstream and need to
get a copy of the string inside? Just call the <TT>str()</TT>
member function.
</P>
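<P>A short sketch of both directions follows. The function names and
the particular message layout are invented for the example.
</P>

```cpp
#include <sstream>
#include <string>

// Build a string with the same << operators used on files.
// describe is a hypothetical helper.
std::string describe(const std::string& name, int count, double ratio)
{
    std::ostringstream out;
    out << name << ": " << count << " items, ratio " << ratio;
    return out.str();   // copy of the string held inside the stream
}

// Pull a word and a number back out of a string with >>.
// parse_pair is a hypothetical helper; it returns false when either
// field is missing or the number cannot be read.
bool parse_pair(const std::string& text, std::string& word, int& number)
{
    std::istringstream in(text);
    return !(in >> word >> number).fail();
}
```

<P>For instance, <TT>describe("widgets", 3, 0.5)</TT> yields
<TT>"widgets: 3 items, ratio 0.5"</TT>, and
<TT>parse_pair("answer 42", w, n)</TT> fills in <TT>w</TT> and
<TT>n</TT> with <TT>"answer"</TT> and <TT>42</TT>.
</P>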
<P>This only works if you've written your
<TT>&lt;&lt;</TT>/<TT>&gt;&gt;</TT> functions correctly, though,
and correctly means that they take istreams and ostreams as
parameters, not i<B>f</B>streams and o<B>f</B>streams. If they
take the latter, then your I/O operators will work fine with
file streams, but with nothing else -- including stringstreams.
</P>
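<P>Here is what "correctly" might look like for a small value type.
The <TT>point</TT> struct, its text layout, and the <TT>to_text</TT>
helper are assumptions made purely for illustration.
</P>

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// A hypothetical value type used only for this example.
struct point
{
    int x;
    int y;
};

// Taking std::ostream& (not std::ofstream&) lets one operator serve
// files, std::cout, and stringstreams alike.
std::ostream& operator<<(std::ostream& out, const point& p)
{
    return out << p.x << ' ' << p.y;
}

// Likewise std::istream& rather than std::ifstream&.
std::istream& operator>>(std::istream& in, point& p)
{
    point tmp = { 0, 0 };
    if (in >> tmp.x >> tmp.y)
        p = tmp;    // only overwrite the target after a complete read
    return in;
}

// The same operator<< feeds an ostringstream with no extra code.
std::string to_text(const point& p)
{
    std::ostringstream out;
    out << p;
    return out.str();
}
```

<P>Had the operators been declared with file stream parameters instead,
<TT>to_text</TT> would not compile.
</P>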
<P>If you are a user of the strstream classes, you need to update
your code. You don't have to explicitly append <TT>ends</TT> to
terminate the C-style character array, you don't have to mess with
"freezing" functions, and you don't have to manage the
memory yourself. The strstreams have been officially deprecated,
which means that 1) future revisions of the C++ Standard may
drop them, and 2) if you use them, people will laugh at you.
</P>

<HR>
<H2><A NAME="6">Deriving a stream buffer</A></H2>
<P>Creating your own stream buffers for I/O can be remarkably easy.
If you are interested in doing so, we highly recommend two very
excellent books: <EM>Standard C++ IOStreams and Locales</EM> by
Langer and Kreft, ISBN 0-201-18395-1, and
<A HREF="http://www.josuttis.com/libbook/">The C++ Standard Library</A>
by Nicolai Josuttis, ISBN 0-201-37926-0. Both are published by
Addison-Wesley, who isn't paying us a cent for saying that, honest.
</P>
<P>Here is a simple example, io/outbuf1, from the Josuttis text. It
transforms everything sent through it to uppercase. This version
assumes many things about the nature of the character type being
used (for more information, read the books or the newsgroups):
<PRE>
#include &lt;iostream&gt;
#include &lt;streambuf&gt;
#include &lt;locale&gt;
#include &lt;cstdio&gt;

class outbuf : public std::streambuf
{
  protected:
    /* central output function
     * - print characters in uppercase mode
     */
    virtual int_type overflow (int_type c) {
        if (c != EOF) {
            // convert lowercase to uppercase
            c = std::toupper(static_cast&lt;char&gt;(c),getloc());

            // and write the character to the standard output
            if (putchar(c) == EOF) {
                return EOF;
            }
        }
        return c;
    }
};

int main()
{
    // create special output buffer
    outbuf ob;
    // initialize output stream with that output buffer
    std::ostream out(&amp;ob);

    out &lt;&lt; "31 hexadecimal: "
        &lt;&lt; std::hex &lt;&lt; 31 &lt;&lt; std::endl;
    return 0;
}
</PRE>
Try it yourself!
</P>

<HR>
<H2><A NAME="7">More on binary I/O</A></H2>
<P>Towards the beginning of February 2001, the subject of
"binary" I/O was brought up in a couple of places at the
same time. One notable place was Usenet, where James Kanze and
Dietmar K&uuml;hl separately posted articles on why attempting
generic binary I/O was not a good idea. (Here are copies of
<A HREF="binary_iostreams_kanze.txt">Kanze's article</A> and
<A HREF="binary_iostreams_kuehl.txt">K&uuml;hl's article</A>.)
</P>
<P>Briefly, the problems of byte ordering and type sizes mean that
the unformatted functions like <TT>ostream::put()</TT> and
<TT>istream::get()</TT> cannot safely be used to communicate
between arbitrary programs, or across a network, or from one
invocation of a program to another invocation of the same program
on a different platform, etc.
</P>
<P>The entire Usenet thread is instructive, and took place under the
subject heading "binary iostreams" on both comp.std.c++
and comp.lang.c++.moderated in parallel. Also in that thread,
Dietmar K&uuml;hl mentioned that he had written a pair of stream
classes that would read and write XDR, which is a good step towards
a portable binary format.
</P>


<!-- ####################################################### -->

<HR>
<P CLASS="fineprint"><EM>
Comments and suggestions are welcome, and may be sent to
<A HREF="mailto:pme@sources.redhat.com">Phil Edwards</A> or
<A HREF="mailto:gdr@gcc.gnu.org">Gabriel Dos Reis</A>.
<BR> $Id: howto.html,v 1.2 2001/01/23 17:02:27 pme Exp $
</EM></P>


</BODY>
</HTML>