TODO list for PostgreSQL
========================
Last updated:           Tue Feb 18 20:30:27 EST 2003

Current maintainer:     Bruce Momjian (pgman@candle.pha.pa.us)

The most recent version of this document can be viewed at the PostgreSQL
web site, http://www.PostgreSQL.org.

A dash (-) marks changes that will appear in the upcoming 7.4 release.

Bracketed items "[]" have more detail.

Urgent
======

* Add replication of distributed databases [replication]
        o automatic failover
        o load balancing
        o master/slave replication
        o multi-master replication
        o partition data across servers
        o sample implementation in contrib/rserv
        o queries across databases or servers (two-phase commit)
        o allow replication over unreliable or non-persistent links
        o http://gborg.postgresql.org/project/pgreplication/projdisplay.php
* Point-in-time data recovery using backup and write-ahead log
* Create native Win32 port [win32]

Reporting
=========

* Allow elog() to return error codes, module name, file name, line
  number, not just messages (Peter E)
* Add error codes (Peter E)
* Make error messages more consistent [error]
* Show location of syntax error in query [yacc]

Administration
==============

* Incremental backups
* Remove unreferenced table files and temp tables during database vacuum
  or postmaster startup (Bruce)
* Remove behavior of postmaster -o after making postmaster/postgres
  flags unique
* Allow easy display of usernames in a group
* Allow configuration files to be specified in a different directory
* Add start time to pg_stat_activity
* Allow limits on per-db/user connections
* Have standalone backend read postgresql.conf
* Add group object ownership, so groups can rename/drop/grant on objects,
  so we can implement roles
* Add the concept of dataspaces/tablespaces [tablespaces]
* Allow incremental backups
* Allow CIDR format to be used in pg_hba.conf

Data Types
==========

* Add IPv6 capability to INET/CIDR types
* Remove Money type, add money formatting for decimal type
* Change factorial to return a
  numeric
* Change NUMERIC data type to use base 10,000 internally
* Change NUMERIC to enforce the maximum precision, and increase it
* Add function to return compressed length of TOAST data values (Tom)
* Allow INET subnet tests using non-constants
* Add now("transaction|statement|clock") functionality
* -Add GUC variables to control floating number output digits
  (Pedro Ferreira)
* Have sequence dependency track use of DEFAULT sequences,
  seqname.nextval
* Disallow changing default expression of a SERIAL column
* Allow infinite dates just like infinite timestamps
* Allow pg_dump to dump sequences using NO_MAXVALUE and NO_MINVALUE
* CONVERSION
        o Allow better handling of numeric constants, type conversion
          [typeconv]
* ARRAYS
        o Allow nulls in arrays
        o Allow arrays to be ORDER'ed
        o Support construction of array result values in expressions
* BINARY DATA
        o Improve vacuum of large objects, like /contrib/vacuumlo
        o Add security checking for large objects
        o Make file in/out interface for TOAST columns, similar to large
          object interface (force out-of-line storage and no compression)
        o Auto-delete large objects when referencing row is deleted

Multi-Language Support
======================

* Add NCHAR (as distinguished from ordinary varchar)
* Allow LOCALE on a per-column basis, default to ASCII
* Support multiple simultaneous character sets, per SQL92
* Improve Unicode combined character handling
* Optimize locale to have minimal performance impact when not used
  (Peter E)
* Add octet_length_server() and octet_length_client() (Thomas, Tatsuo)
* Make octet_length_client() the same as octet_length() (?)
* Prevent a frontend/backend encoding mismatch from converting bytea
  data as if it were an encoded string
* Remove Cyrillic recode support

Views / Rules
=============

* Automatically create rules on views so they are updateable, per SQL92
  [view]
* Add the functionality for WITH CHECK OPTION clause of CREATE VIEW
* Allow NOTIFY in rules involving conditionals
* Have views on temporary tables exist in the temporary namespace
* Move psql backslash information into views
* Allow RULE recompilation

Indexes
=======

* Allow CREATE INDEX zman_index ON test (date_trunc('day', zman)
  datetime_ops); this currently fails because the index can't store
  constant parameters
* Order duplicate index entries by tid for faster heap lookups
* Allow inherited tables to inherit index, UNIQUE constraint, and primary
  key, foreign key [inheritance]
* UNIQUE INDEX on base column not honored on inserts from inherited
  table; INSERT INTO inherit_table (unique_index_col) VALUES (dup)
  should fail [inheritance]
* Add UNIQUE capability to non-btree indexes
* Add btree index support for reltime, tinterval, regproc
* Add rtree index support for line, lseg, path, point
* Certain indexes will not shrink, e.g.
  indexes on ever-increasing columns and indexes with many duplicate keys
* Use indexes for min() and max() or convert to SELECT col FROM tab
  ORDER BY col DESC LIMIT 1 if appropriate index exists and WHERE clause
  acceptable
* Allow LIKE indexing optimization for non-ASCII locales
* Use index to restrict rows returned by multi-key index when used with
  non-consecutive keys or OR clauses, so fewer heap accesses
* Be smarter about insertion of already-ordered data into btree index
* Prevent index uniqueness checks when UPDATE does not modify the column
* Use bitmaps to fetch heap pages in sequential order [performance]
* Use bitmaps to combine existing indexes [performance]
* Improve handling of index scans for NULL
* Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
  float4, numeric/decimal too [optimizer]
* Add FILLFACTOR to btree index creation
* Add concurrency to GIST
* Improve concurrency of hash indexes (Neil)
* Require DROP COLUMN CASCADE for a column that is part of a multi-column
  index

Commands
========

* Add BETWEEN ASYMMETRIC/SYMMETRIC (Christopher)
* Allow LIMIT/OFFSET to use expressions
* CREATE TABLE AS can not determine column lengths from expressions
  [atttypmod]
* Allow UPDATE to handle complex aggregates [update]
* Allow command blocks to ignore certain types of errors
* Allow backslash handling in quoted strings to be disabled for
  portability
* Return proper affected tuple count from complex commands [return]
* Allow DELETE to handle table aliases for self-joins [delete]
* Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT
* Allow REINDEX to rebuild all indexes, remove /contrib/reindex
* -Make a transaction-safe TRUNCATE (Rod)
* Add ROLLUP, CUBE, GROUPING SETS options to GROUP BY
* Add schema option to createlang
* ALTER
        o ALTER TABLE ADD COLUMN does not honor DEFAULT and non-CHECK
          CONSTRAINT
        o ALTER TABLE ADD COLUMN column DEFAULT should fill existing rows
          with DEFAULT value
        o ALTER TABLE ADD COLUMN column SERIAL doesn't create sequence
          because of the item above
        o -Add ALTER TABLE tab SET WITHOUT OIDS (Rod)
* Add ALTER SEQUENCE to modify min/max/increment/cache/cycle values
* CLUSTER
        o Automatically maintain clustering on a table
        o -Allow CLUSTER to cluster all tables (Alvaro Herrera)
* COPY
        o Allow dump/load of CSV format
        o Allow COPY to report error lines and continue; optionally
          allow error codes to be specified; requires savepoints or can
          not be run in a multi-statement transaction
        o Allow copy to understand \x as hex
* CURSOR
        o Allow BINARY option to SELECT, just like DECLARE
        o -MOVE 0 should not move to end of cursor (Bruce)
        o Allow UPDATE/DELETE WHERE CURRENT OF cursor using per-cursor
          tid stored in the backend
        o Prevent DROP of table being referenced by our own open cursor
        o Allow cursors outside transactions [cursor]
* INSERT
        o Allow INSERT/UPDATE of system-generated oid value for a row
        o Allow INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..)
        o Allow INSERT/UPDATE ... RETURNING new.col or old.col; handle
          RULE cases (Philip)
* SHOW/SET
        o Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM,
          VACUUM ANALYZE, and CLUSTER
        o Add SET SCHEMA
        o Allow EXPLAIN EXECUTE to see prepared plans
        o Allow SHOW of non-modifiable variables, like pg_controldata
        o Add GUC parameter to control the maximum number of rewrite
          cycles
* SERVER-SIDE LANGUAGES
        o Allow PL/PgSQL's RAISE function to take expressions
        o Change PL/PgSQL to use palloc() instead of malloc()
        o Add untrusted version of plpython
        o Allow Java server-side programming,
          http://pljava.sourceforge.net [java]
        o Fix problems with complex temporary table creation/destruction
          without using PL/PgSQL EXECUTE, needs cache
          prevention/invalidation
        o Fix PL/pgSQL RENAME to work on variables other than OLD/NEW
        o Improve PL/PgSQL exception handling
        o Allow parameters to be specified by name and type during
          definition
        o Allow function parameters to be passed by name,
          get_employee_salary(emp_id => 12345, tax_year => 2001)
        o Add PL/PgSQL packages
        o Allow
          array declarations and other data types in PL/PgSQL DECLARE
        o Add PL/PgSQL PROCEDURES that can return multiple values
        o Add table function support to pltcl, plperl, plpython
        o -Make PL/PgSQL %TYPE schema-aware
        o Allow PL/PgSQL to support array element assignment

Clients
=======

* Allow psql to show transaction status if backend protocol changes made
* Add XML interface: psql, pg_dump, COPY, separate server (?)
* -Add schema, cast, and conversion backslash commands to psql
  (Christopher)
* -Allow pg_dump to dump a specific schema (Neil Conway)
* Allow psql to do table completion for SELECT * FROM schema_part and
  table completion for SELECT * FROM schema_name.
* JDBC
        o Comprehensive test suite. This may be available already.
        o JDBC-standard BLOB support
        o Error Codes (pending backend implementation)
        o Support both 'make' and 'ant'
        o Fix LargeObject API to handle OIDs as unsigned ints
        o Use cursors implicitly to avoid large results (see
          setCursorName())
        o Add LISTEN/NOTIFY support to the JDBC driver (Barry)
* ECPG
        o Implement set descriptor, using descriptor
        o Make casts work in variable initializations
        o Implement SQLDA
        o Allow multi-threaded use of SQLCA
        o Solve cardinality > 1 for input descriptors / variables
        o Understand structure definitions outside a declare section
        o sqlwarn[6] should be 'W' if the PRECISION or SCALE value
          specified
        o Improve error handling
        o Allow :var[:index] or :var[] as cvariable for an array var
        o Add a semantic check level, e.g.
          check if a table really exists
        o Fix nested C comments
        o Add SQLSTATE
        o Fix handling of DB attributes that are arrays
* Python
        o Allow users to register their own types with _pg
        o Allow SELECT to return a dictionary of dictionaries
        o Allow COPY BINARY FROM

Referential Integrity
=====================

* Add MATCH PARTIAL referential integrity [foreign]
* Add deferred trigger queue file (Jan)
* Implement dirty reads and use them in RI triggers
* Enforce referential integrity for system tables
* Change foreign key constraint for array -> element to mean element
  in array
* Allow DEFERRABLE UNIQUE constraints
* Allow triggers to be disabled [trigger]
* -Support statement-level triggers (Neil)
* Support triggers on columns (Neil)

Dependency Checking
===================

* Flush cached query plans when their underlying catalog data changes
* Use dependency information to dump data in proper order

Transactions
============

* Overhaul bufmgr/lockmgr/transaction manager
* Allow savepoints / nested transactions [transactions] (Bruce)

Exotic Features
===============

* Add SQL99 WITH clause to SELECT (Tom, Fernando)
* Add SQL99 WITH RECURSIVE to SELECT (Tom, Fernando)
* Allow queries across multiple databases [crossdb]
* Add pre-parsing phase that converts non-ANSI features to supported
  features
* Allow plug-in modules to emulate features from other databases
* SQL*Net listener that makes PostgreSQL appear as an Oracle database
  to clients
* Two-phase commit to implement distributed transactions

PERFORMANCE
===========

Fsync
=====

* Delay fsync() when other backends are about to commit too [fsync]
        o Determine optimal commit_delay value
* Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options
        o Allow multiple blocks to be written to WAL with one write()

Cache
=====

* Shared catalog cache, reduce lseek()'s by caching table size in shared
  area
* Add free-behind capability for large sequential scans (Bruce)
* Allow binding query args over FE/BE protocol
* Consider use of
  open/fcntl(O_DIRECT) to minimize OS caching
* Make blind writes go through the file descriptor cache
* Cache last known per-tuple offsets to speed long tuple access

Vacuum
======

* Improve speed with indexes (perhaps recreate index instead) [vacuum]
* Reduce lock time by moving tuples with read lock, then write
  lock and truncate table [vacuum]
* Provide automatic running of vacuum in the background (Tom) [vacuum]
* Allow free space map to be auto-sized or warn when it is too small

Locking
=======

* Make locking of shared data structures more fine-grained
* Add code to detect an SMP machine and handle spinlocks accordingly
  from distributed.net, http://www1.distributed.net/source,
  in client/common/cpucheck.cpp
* Research use of sched_yield() for spinlock acquisition failure

Startup Time
============

* Experiment with multi-threaded backend [thread]
* Add connection pooling [pool]
* Allow persistent backends [persistent]
* Create a transaction processor to aid in persistent connections and
  connection pooling
* Do listen() in postmaster and accept() in pre-forked backend
* Have pre-forked backend pre-connect to last requested database or pass
  file descriptor to backend pre-forked for matching database

Write-Ahead Log
===============

* Have after-change WAL write()'s write only modified data to kernel
* Reduce number of after-change WAL writes; they exist only to guard
  against partial page writes [wal]
* Turn off after-change writes if fsync is disabled (?)
* Add WAL index reliability improvement to non-btree indexes
* Find proper defaults for postgresql.conf WAL entries
* Add checkpoint_min_warning postgresql.conf option to warn about
  checkpoints that are too frequent
* Allow xlog directory location to be specified during initdb, perhaps
  using symlinks
* Allow pg_xlog to be moved without symlinks
* Allow WAL information to recover corrupted pg_controldata

Optimizer / Executor
====================

* Improve Subplan list handling
* Allow Subplans to use efficient joins (hash, merge) with upper variable
* -Add hash for evaluating GROUP BY aggregates (Tom)
* Allow merge and hash joins on expressions not just simple variables
  (Tom)
* -Make IN/NOT IN have similar performance to EXISTS/NOT EXISTS (Tom)
* Missing optimizer selectivities for date, r-tree, etc. [optimizer]
* Allow ORDER BY ... LIMIT to select top values without sort or index
  using a sequential scan for highest/lowest values (Oleg)
* -Inline simple SQL functions to avoid overhead (Tom)
* Precompile SQL functions to avoid overhead (Neil)
* Add utility to compute accurate random_page_cost value
* Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
* Use CHECK constraints to improve optimizer decisions
* Check GUC geqo_threshold to see if it is still accurate
* Allow sorting, temp files, temp tables to use multiple work directories

Miscellaneous
=============

* Do async I/O for faster random read-ahead of data
* -Get faster regex() code from Henry Spencer
* Use mmap() rather than SYSV shared memory or to write WAL files (?)
  [mmap]
* Improve caching of attribute offsets when NULLs exist in the row
* Add a script to ask system configuration questions and tune
  postgresql.conf

Source Code
===========

* Add use of 'const' for variables in source tree
* Rename some /contrib modules from pg* to pg_*
* Move some things from /contrib into main tree
* Remove warnings created by -Wcast-align
* Move platform-specific ps status display info from ps_status.c to ports
* Modify regression tests to prevent failures due to minor numeric
  rounding
* -Add OpenBSD's getpeereid() call for local socket authentication
* Improve access-permissions check on data directory in Cygwin (Tom)
* Add --port flag to regression tests
* Add documentation for perl, including mention of DBI/DBD perl location
* Add optional CRC checksum to heap and index pages
* Change representation of whole-tuple parameters to functions
* Clarify use of 'application' and 'command' tags in SGML docs
* Better document ability to build only certain interfaces (Marc)
* Remove or relicense modules that are not under the BSD license, if
  possible
* Remove memory/file descriptor freeing before elog(ERROR) (Bruce)
* Acquire lock on a relation before building a relcache entry for it
* Research interaction of setitimer() and sleep() used by
  statement_timeout
* Wire Protocol Changes
        o Show transaction status in psql
        o Allow binding of query parameters, support for prepared queries
        o Add optional textual message to NOTIFY
        o Remove hard-coded limits on user/db/password names
        o Remove unused elements of startup packet (unused, tty,
          passlength)
        o Fix COPY/fastpath protocol?
        o Allow fastpath to pass values in portable format
        o Replication support?
        o Error codes
        o Dynamic character set handling
        o Special passing of binary values in platform-neutral format
          (bytea?)
        o ecpg improvements?
        o Add decoded type, length, precision
        o Compression?

---------------------------------------------------------------------------

Developers who have claimed items are:
--------------------------------------
* Barry is Barry Lind
* Billy is Billy G. Allie
* Bruce is Bruce Momjian of Software Research Assoc.
* Christopher is Christopher Kings-Lynne of Family Health Network
* D'Arcy is D'Arcy J.M. Cain of The Cain Gang Ltd.
* Dave is Dave Cramer
* Edmund is Edmund Mergl
* Fernando is Fernando Nasser of Red Hat
* Gavin is Gavin Sherry of Alcove Systems Engineering
* Hiroshi is Hiroshi Inoue
* Karel is Karel Zak
* Jan is Jan Wieck of PeerDirect Corp.
* Liam is Liam Stewart of Red Hat
* Marc is Marc Fournier of PostgreSQL, Inc.
* Mark is Mark Hollomon
* Michael is Michael Meskes of Credativ
* Neil is Neil Conway
* Oleg is Oleg Bartunov
* Peter M is Peter T Mount of Retep Software
* Peter E is Peter Eisentraut
* Philip is Philip Warner of Albatross Consulting Pty. Ltd.
* Rod is Rod Taylor
* Ross is Ross J. Reedstrom
* Stephan is Stephan Szabo
* Tatsuo is Tatsuo Ishii of Software Research Assoc.
* Thomas is Thomas Lockhart of Jet Propulsion Laboratory
* Tom is Tom Lane of Red Hat
* Vadim is Vadim B. Mikheev of Sector Data