mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-11-21 03:13:05 +08:00
Minor wording improvements per suggestion from Jeff Davis. Also tweak
hyphenated-word parser examples per earlier discussion with Alvaro.
parent 8a8bcb447a
commit 2aac6f10f6
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.26 2007/10/25 13:06:35 alvherre Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.27 2007/10/27 00:19:45 tgl Exp $ -->
 
 <chapter id="textsearch">
 <title id="textsearch-title">Full Text Search</title>
@@ -1770,7 +1770,7 @@ LIMIT 10;
 <row>
 <entry><literal>hword</></entry>
 <entry>Hyphenated word, all letters</entry>
-<entry><literal>político-militar</literal></entry>
+<entry><literal>lógico-matemática</literal></entry>
 </row>
 <row>
 <entry><literal>numhword</></entry>
@@ -1780,14 +1780,13 @@ LIMIT 10;
 <row>
 <entry><literal>hword_asciipart</></entry>
 <entry>Hyphenated word part, all ASCII</entry>
-<entry><literal>militar</literal> in the context
-<literal>político-militar</literal>, or <literal>postgresql</literal> in the context <literal>postgresql-beta1</literal></entry>
+<entry><literal>postgresql</literal> in the context <literal>postgresql-beta1</literal></entry>
 </row>
 <row>
 <entry><literal>hword_part</></entry>
 <entry>Hyphenated word part, all letters</entry>
-<entry><literal>físico</literal> or <literal>químico</literal>
-in the context <literal>físico-químico</literal></entry>
+<entry><literal>lógico</literal> or <literal>matemática</literal>
+in the context <literal>lógico-matemática</literal></entry>
 </row>
 <row>
 <entry><literal>hword_numpart</></entry>
@@ -1902,12 +1901,12 @@ SELECT alias, description, token FROM ts_debug('foo-bar-beta1');
 instructive example:
 
 <programlisting>
-SELECT alias, description, token FROM ts_debug('http://foo.com/stuff/index.html');
-  alias   |  description  |          token
-----------+---------------+--------------------------
+SELECT alias, description, token FROM ts_debug('http://example.com/stuff/index.html');
+  alias   |  description  |            token
+----------+---------------+------------------------------
  protocol | Protocol head | http://
- url      | URL           | foo.com/stuff/index.html
- host     | Host          | foo.com
+ url      | URL           | example.com/stuff/index.html
+ host     | Host          | example.com
  uri      | URI           | /stuff/index.html
 </programlisting>
 </para>
@@ -3093,8 +3092,9 @@ SELECT plainto_tsquery('supernovae stars');
 </para>
 
 <para>
-A GiST index is <firstterm>lossy</firstterm>, meaning it is necessary
-to check the actual table row to eliminate false matches.
+A GiST index is <firstterm>lossy</firstterm>, meaning that the index
+may produce false matches, and it is necessary
+to check the actual table row to eliminate such false matches.
 <productname>PostgreSQL</productname> does this automatically; for
 example, in the query plan below, the <literal>Filter:</literal>
 line indicates the index output will be rechecked:
@@ -3112,14 +3112,15 @@ EXPLAIN SELECT * FROM apod WHERE textsearch @@ to_tsquery('supernovae');
 index by a fixed-length signature. The signature is generated by hashing
 each word into a random bit in an n-bit string, with all these bits OR-ed
 together to produce an n-bit document signature. When two words hash to
-the same bit position there will be a false match, and if all words in
+the same bit position there will be a false match. If all words in
 the query have matches (real or false) then the table row must be
 retrieved to see if the match is correct.
 </para>
 
 <para>
-Lossiness causes performance degradation since random access to table
-records is slow; this limits the usefulness of GiST indexes. The
+Lossiness causes performance degradation due to useless fetches of table
+records that turn out to be false matches. Since random access to table
+records is slow, this limits the usefulness of GiST indexes. The
 likelihood of false matches depends on several factors, in particular the
 number of unique words, so using dictionaries to reduce this number is
 recommended.
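
Note on the last hunk: the signature scheme it describes (one hashed bit per word, all bits OR-ed into an n-bit signature, then a recheck of the table row) can be illustrated with a small sketch. This is only an approximation written for this note, not PostgreSQL's implementation: the hash (SHA-256), the default signature width of 64 bits, and all function names are assumptions chosen for illustration.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed value of n; the real width is not stated in the text


def word_bit(word, nbits=SIGNATURE_BITS):
    """Map a word to one bit position (stand-in hash, not PostgreSQL's)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nbits


def signature(words, nbits=SIGNATURE_BITS):
    """OR together one bit per word to form the n-bit signature."""
    sig = 0
    for word in words:
        sig |= 1 << word_bit(word, nbits)
    return sig


def may_match(doc_sig, query_words, nbits=SIGNATURE_BITS):
    """True if every query word's bit is set: a candidate, real or false."""
    query_sig = signature(query_words, nbits)
    return (doc_sig & query_sig) == query_sig


def search(docs, query_words, nbits=SIGNATURE_BITS):
    """Filter by signature, then recheck each candidate against the row itself.

    Returns (confirmed, false_matches); the second list holds the rows that
    were fetched only to be discarded.
    """
    confirmed, false_matches = [], []
    for doc in docs:
        words = set(doc.split())
        if may_match(signature(words, nbits), query_words, nbits):
            if set(query_words) <= words:
                confirmed.append(doc)
            else:
                false_matches.append(doc)
    return confirmed, false_matches
```

A document that really contains every query word always passes `may_match`, so the signature never hides a true match; it can only let extra rows through. Shrinking `nbits` makes collisions more likely, and with `nbits=1` every word lands on the same bit, so every non-empty document becomes a candidate and the recheck has to discard the false ones. The same effect is why the text recommends keeping the number of unique words low.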