Commit Graph

27 Commits

Author SHA1 Message Date
H. Peter Anvin (Intel)
b14dbb95a1 phash: simplify the code generators
Simplify the code generators by merging the two hash constant arrays
into one. The hash is effectively the same, although the order of the
constants differs (possibly in a way which makes the indexing easier).
The main difference is the amount of code necessary to generate
each of the output C files.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2020-07-10 19:26:52 -07:00
H. Peter Anvin (Intel)
0d17f8a7e6 phash: bloat the hashes somewhat, reducing the likelihood of false positives
Set the hash size scaling constant to 1.6, signifying 3.2 times the
hash load. This both reduces the convergence time and makes it less
likely (< 25%) that a non-entry will require a secondary comparison,
and after all, in most of our use cases non-entries are by far the
more common.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2020-07-09 23:39:58 -07:00
H. Peter Anvin (Intel)
177a05d0ce perl files: clean up warnings
Clean up some Perl warnings. Some were legitimate: apparently undef
doesn't actually take a list of arguments (a common enough mistake
that it is mentioned in the man page), and a list of variables after
"my" can be cantankerous. Others were nuisances but were easy enough
to clean up.

Maybe this can resolve the problems with very old versions of Perl?

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2019-08-09 13:30:19 -07:00
H. Peter Anvin
4b63094602 perllib/README: delete obsolete file
We have not included CPAN modules for a long time.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2017-02-23 20:24:56 -08:00
H. Peter Anvin
c33f05a9c7 phash.sh: Use int() for the size of the hash table
Pass the hash table size to int() to make it a bit more sane.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-03-26 09:25:10 -07:00
Victor van den Elzen
bc8522e3a0 Fix Perl deprecation warnings.
Use of defined on aggregates (hashes and arrays) is deprecated.
You should instead use a simple test for size.
2010-11-07 17:20:23 +01:00
H. Peter Anvin
0e7370cfa6 phash: move sample function to the sample file
read_input() shouldn't be part of the phash.ph module; instead it
should go into the sample usage file phash.pl.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-22 14:02:28 -07:00
H. Peter Anvin
3031bb8ee2 phash.ph: we haven't required the Graph module for a long time
We removed the need for the Graph module back at checkin
c593173e11 in 2008.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-18 10:36:21 -08:00
H. Peter Anvin
1df0b9ee2d phash: canonicalize order, fix handling of ignored duplicates
Canonicalize the order of the prehash entries, so we don't have to
worry about looking up both pairs of edges.

When we find a collision that we decide to ignore, there is no point
in adding the same edge into the array again; instead, just skip the
current edge.
2008-05-25 18:44:44 -07:00
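The edge canonicalization described in 1df0b9ee2d can be illustrated with a minimal sketch. This is not the Perl code from phash.ph: the function names and the dictionary-based edge store are assumptions made for illustration only.

```python
def canonical_edge(a, b):
    """Return the edge with its endpoints in a fixed (sorted) order."""
    return (a, b) if a <= b else (b, a)


def add_edges(pairs):
    """Collect unique edges from a {key: (a, b)} mapping.

    Because every edge is stored in canonical order, a single lookup
    is enough to detect a repeat; (a, b) and (b, a) are the same entry.
    A repeated edge is skipped rather than added a second time.
    """
    edges = {}
    for key, (a, b) in pairs.items():
        edge = canonical_edge(a, b)
        if edge in edges:
            continue  # ignored duplicate: skip the current edge
        edges[edge] = key
    return edges
```

With unordered pairs, a lookup would have to try both (a, b) and (b, a); sorting the endpoints once on insertion removes that second check.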
H. Peter Anvin
14f9ea2925 phash: allow collisions if the hash target is the same
If the hash target is the same value, we can permit collisions.  This
isn't relevant for the current applications of the hash generator, but
is useful for cases where one has a number of sources for the same
target.  It's easy to check, either way.
2008-05-25 18:17:49 -07:00
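The check described in 14f9ea2925 can be sketched as follows. The function name and the edge-to-target dictionary are hypothetical; the only point taken from the commit message is that two keys may share an edge when they map to the same target value.

```python
def insert_key(edges, edge, target):
    """Record edge -> target.

    Returns True if the edge is new or already maps to the same target
    (a harmless collision), and False if it maps to a different target
    (a real conflict, so this hash attempt would have to be abandoned).
    """
    if edge in edges:
        return edges[edge] == target
    edges[edge] = target
    return True
```

For example, after `insert_key(edges, (1, 2), 7)` a second call with the same edge and target 7 succeeds, while a call with target 8 reports a conflict.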
H. Peter Anvin
c593173e11 phash: massively speed up the perfect hash generator
Make the perfect hash generator about 200x faster by using a very
simple custom graph adjacency representation instead of using
Graph::Undirected.
2008-05-25 18:10:57 -07:00
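As a rough illustration of what a "very simple custom graph adjacency representation" can look like, the sketch below uses plain lists of neighbours and labels the vertices so that the two labels of each edge sum to that edge's target value. It follows the general scheme of graph-based perfect hash construction and is an assumption about the approach, not a translation of phash.ph; `assign_g` and its arguments are hypothetical names.

```python
def assign_g(n_vertices, edges):
    """edges is a list of (u, v, target) tuples.

    Returns a list g with g[u] + g[v] == target for every edge,
    or None if the graph contains a cycle (including self-loops
    and repeated edges), in which case a new attempt is needed.
    """
    # Adjacency list: for each vertex, (neighbour, target, edge index).
    adj = [[] for _ in range(n_vertices)]
    for idx, (u, v, target) in enumerate(edges):
        adj[u].append((v, target, idx))
        adj[v].append((u, target, idx))

    g = [None] * n_vertices
    for root in range(n_vertices):
        if g[root] is not None:
            continue
        g[root] = 0
        stack = [(root, -1)]  # (vertex, index of the edge used to reach it)
        while stack:
            u, via = stack.pop()
            for v, target, idx in adj[u]:
                if idx == via:
                    continue  # do not walk back along the same edge
                if g[v] is not None:
                    return None  # already labelled vertex reached: cycle
                g[v] = target - g[u]
                stack.append((v, idx))
    return g
```

The whole structure is a list of lists, so building it and walking it costs time proportional to the number of vertices plus edges, with no general-purpose graph library involved.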
H. Peter Anvin
711b0c1e39 phash: cut random vector set down a bit
Reduce the size of the random vector set somewhat
2008-05-20 16:46:36 -07:00
H. Peter Anvin
a59795c986 Use the crc64 we already use as the perfect hash function prehash
Use the same crc64 that we already use for the symbol table hash as
the perfect hash function prehash.  We appear to get radically faster
convergence this way, and the crc64 is probably *faster*, since the
table is likely to be resident in memory.
2007-10-02 17:40:00 -07:00
H. Peter Anvin
5255fd1f36 Change the token prehash function for better convergence
Combining arithmetic (add) and bitwise (xor) mixing seems to give
better results than either alone.

With the new prehash function, we find a valid hash much quicker.
2007-09-18 12:38:07 -07:00
H. Peter Anvin
ff6e9b4699 phash: Tell the user when the graph is OK
Tell the user when the graph is OK, so that we don't get quite so much of
"a list of errors followed by a long pause."
2007-09-12 16:55:57 +00:00
H. Peter Anvin
cdea6f96b6 phash: Be a bit more aggressive about trying to make a small hash
Change the threshold to 0.7 instead of 0.8.
2007-09-12 01:27:53 +00:00
H. Peter Anvin
e17a3cb29d phash.ph: yet another attempt at getting Perl to behave, arithmetically
2007-09-02 14:46:00 +00:00
H. Peter Anvin
e3e9e9fa0d phash.ph: remove some stale code
Remove old randomization code which is no longer used.
2007-09-02 06:20:15 +00:00
Chuck Crayne
757dfad900 Force use of integer values for generating hash keys.
2007-09-02 01:00:34 +00:00
H. Peter Anvin
b938e043ca phash: don't rely on the build platform Perl version of rand()
rand() in Perl can vary between platforms, so don't use it.  Instead,
remove a completely pointless level of indirection (it introduced a
permutation which cancelled itself out) and provide a canned set of
random numbers for the rest.  This guarantees we will always use the
same numbers.
2007-08-31 18:10:23 +00:00
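The idea in b938e043ca, replacing a platform-dependent rand() with a fixed table, can be sketched like this. The constants below are placeholders chosen for illustration and are not the values used by phash; `seed_for_attempt` is a hypothetical helper.

```python
# Placeholder values for illustration only; not the table used by phash.
CANNED_RANDOM = (0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344)


def seed_for_attempt(attempt):
    """Return the same seed for the same attempt number on every platform."""
    return CANNED_RANDOM[attempt % len(CANNED_RANDOM)]
```

Because the numbers are part of the source, every build host tries the same seeds in the same order and therefore arrives at the same generated hash.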
H. Peter Anvin
91a86cdb31 tokhash: Speed up the rejection of unhashed values
Speed up the rejection of unhashed values (typically identifiers) by
filling unused hash slots with a large value (but not so large that
it is likely to overflow.)  This means those values will be rejected
already by the range check, not needing strcmp().
2007-08-31 07:23:31 +00:00
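The effect described in 91a86cdb31 can be shown with a small sketch. The token list, the table contents, and the `lookup` signature are invented for illustration; only the technique (a large filler value in unused slots so that the range check rejects the lookup) comes from the commit message.

```python
SENTINEL = 1 << 20  # large, but two of them still fit easily in 32 bits


def lookup(tokens, g, a, b, text):
    """Return (index or None, whether a string comparison was needed)."""
    index = g[a] + g[b]
    if not 0 <= index < len(tokens):
        return None, False  # rejected by the range check alone
    if tokens[index] == text:
        return index, True
    return None, True


tokens = ["mov", "add"]
# Slots 0, 1 and 4 are used by real keys; the rest hold the sentinel.
g = [0, 1, SENTINEL, SENTINEL, 0, SENTINEL, SENTINEL, SENTINEL]
```

Here `lookup(tokens, g, 2, 5, "foo")` is rejected without touching the token strings, whereas a lookup that lands on slots used by real keys still needs the final comparison.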
H. Peter Anvin
fb5a599c8a phash.ph: use a bipartite graph to reduce the storage requirements
Since we fold the f- and g-functions together, if we guarantee that g is
bipartite, we can make g twice the size of f without cost.  This greatly
improves the odds of generating a smaller hash.
2007-08-30 23:42:39 +00:00
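One plausible reading of the bipartite arrangement in fb5a599c8a is that each key's two prehash values select one vertex in each half of the table. The sketch below shows that reading under stated assumptions: `bipartite_edge` is a hypothetical name, and n is assumed to be a power of two.

```python
def bipartite_edge(h1, h2, n):
    """Map two prehash values to one vertex in each half of a 2n-vertex graph.

    n is assumed to be a power of two so that masking acts as a modulo.
    """
    left = h1 & (n - 1)         # vertex in 0 .. n-1
    right = n + (h2 & (n - 1))  # vertex in n .. 2n-1
    return left, right
```

Since the two endpoints always fall in different halves, an edge can never be a self-loop, and the table holds 2n entries while each prehash value only has to span n.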
H. Peter Anvin
74cc5e569c Finishing touches on perfect hash tokenizer; actually turn the thing on
Finish the perfect hash tokenizer, and actually enable it.

Move stdscan() et al to a separate file, since it's not needed in any
of the clients of nasmlib other than nasm itself.

Run make alldeps.
2007-08-30 22:35:34 +00:00
H. Peter Anvin
215c1a3781 phash.ph: more powerful prehashing
2007-08-30 21:39:37 +00:00
H. Peter Anvin
b44d7a76a2 Make the perfect hash generator an includable module
2007-08-30 20:15:25 +00:00
H. Peter Anvin
7a089c0fc7 Add README file
2007-08-29 17:24:03 +00:00
H. Peter Anvin
16a76654b8 Create a Perl library directory, and add the Graph module to it
Graph-0.84 from CPAN
2007-08-29 17:20:09 +00:00