Simplify the code generators by merging the two hash constant arrays
into one. The hash is effectively the same, although the order of the
constants differs (possibly in a way which makes the indexing easier.)
The main difference is the amount of code necessary to generate each of
the output C files.
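A minimal sketch of the emission side in Perl (the table names and
values here are made up, not the generator's actual ones):

    #!/usr/bin/perl
    use strict;
    use warnings;
    # Hypothetical sketch: emit one merged constant array instead of
    # two, with the g-table stored directly after the f-table.
    my @f = (3, 1, 4, 1);                  # example f-table
    my @g = (5, 9, 2, 6, 5, 3, 5, 8);      # example g-table
    my @hashdata = (@f, @g);               # single merged array

    printf "static const int32_t hashdata[%d] = {\n", scalar(@hashdata);
    printf "    %s\n", join(', ', @hashdata);
    print  "};\n";
    # A lookup indexes f as hashdata[i] and g as hashdata[@f + j].

Only one array declaration needs to be emitted, which is where the
code savings come from.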
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Set the hash size scaling constant to 1.6, signifying 3.2 times the
hash load. This both reduces the convergence time and makes it less
likely (< 25%) that a non-entry will require a secondary comparison;
after all, in most of our use cases non-entries are by far the more
common case.
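As a rough sketch of where such a constant plugs in (assuming, for
illustration, that the table size is rounded up to a power of two and
that g is twice the size of f; the numbers are made up):

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $scale = 1.6;
    for my $n (50, 100, 500) {                   # example key counts
        my $size = 1;
        $size <<= 1 while $size < $scale * $n;   # round up to 2^k
        # The g-table load is the rough chance that a random non-key
        # lands on a used slot and needs the secondary comparison.
        printf "%4d keys -> f=%4d g=%4d load %.2f\n",
            $n, $size, 2 * $size, $n / (2 * $size);
    }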
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Clean up some Perl warnings. Some were legitimate: apparently undef
doesn't actually take a list of arguments (a common enough mistake
that it is mentioned in the man page!), and a list of variables after
"my" can be cantankerous. Others were nuisance warnings, but were easy
enough to clean up.
Maybe this can resolve the problems with very old versions of Perl?
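For reference, the two pitfalls look like this (the wrong forms are
commented out so the snippet still compiles):

    #!/usr/bin/perl
    use warnings;
    # Pitfall 1: perlfunc notes that undef is a unary operator, not a
    # list operator.
    my ($a, $b) = (1, 2);
    # undef($a, $b);    # WRONG: too many arguments for undef operator
    undef $a;           # right: one variable at a time
    undef $b;

    # Pitfall 2: "my" without parentheses declares only the first
    # variable in the list.
    # my $x, $y;        # WRONG: only $x becomes lexical
    my ($x, $y);        # right: parentheses declare both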
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
read_input() shouldn't be part of the phash.ph module; instead it
should go into the sample usage file phash.pl.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Canonicalize the order of the prehash entries, so we don't have to
worry about looking an edge up in both orders.
When we find a collision that we decide to ignore, there is no point
in adding the same edge into the array again; instead, just skip the
current edge.
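A minimal sketch of both points together (hypothetical names, not the
actual phash.ph code):

    #!/usr/bin/perl
    use strict;
    use warnings;
    # Store each edge with its endpoints in sorted order, so a single
    # hash lookup finds it whichever order the prehash produced.
    my @edges = ([3, 7], [7, 3], [1, 2]);   # example (k1,k2) pairs
    my (%seen, @kept);
    foreach my $e (@edges) {
        my ($lo, $hi) = sort { $a <=> $b } @$e;   # canonical order
        next if $seen{"$lo,$hi"}++;  # ignored collision: skip the edge
        push @kept, [$lo, $hi];
    }
    printf "%d unique edges\n", scalar(@kept);    # prints 2, not 3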
If the hash target is the same value, we can permit collisions. This
isn't relevant for the current applications of the hash generator, but
is useful for cases where one has a number of sources for the same
target. It's easy to check, either way.
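The check could look something like this sketch (hypothetical
structure, not the actual generator code):

    #!/usr/bin/perl
    use strict;
    use warnings;
    # A collision is harmless if both sources map to the same target.
    my %target_of;            # canonical edge -> target value
    my @inputs = (['3,7', 10], ['3,7', 10], ['1,2', 20]);
    foreach my $in (@inputs) {
        my ($edge, $target) = @$in;
        if (exists $target_of{$edge}) {
            next if $target_of{$edge} == $target;   # same target: fine
            die "real collision: different targets\n";  # must restart
        }
        $target_of{$edge} = $target;
    }
    print "hash construction can proceed\n";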
Use the same crc64 that we already use for the symbol table hash as
the perfect hash function prehash. We appear to get radically faster
convergence this way, and the crc64 is probably *faster*, since the
table is likely to be resident in memory.
Combining arithmetic (add) and bitwise (xor) mixing seems to give
better results than either alone.
With the new prehash function, we find a valid hash much more quickly.
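A sketch of the flavor of the mixing step ($crc stands in for the
crc64 of the key; the exact formula below is illustrative, not the
generated code):

    #!/usr/bin/perl
    use strict;
    use warnings;
    # Split the 64-bit prehash into two 32-bit halves, then combine
    # them with both an arithmetic add and a bitwise xor.
    my $crc  = 0x0123456789abcdef;         # stand-in for crc64($key)
    my $k1   = $crc & 0xffffffff;          # low half
    my $k2   = ($crc >> 32) & 0xffffffff;  # high half
    my $mask = 0xff;                       # example table size 256
    my $ix   = (($k1 + $k2) ^ $k2) & $mask;   # add + xor mixing
    printf "index %d\n", $ix;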
rand() in Perl can vary between platforms, so don't use it. Instead,
remove a completely pointless level of indirection (it introduced a
permutation which cancelled itself out) and provide a canned set of
random numbers for the rest. This guarantees we will always use the
same numbers.
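A sketch of the approach (the constants here are made up; the point
is only that the sequence is fixed):

    #!/usr/bin/perl
    use strict;
    use warnings;
    # Deterministic stand-in for rand(): walk a canned table, so every
    # platform and Perl version produces the identical hash.
    my @canned = (0x4c8f, 0x1b57, 0x9e21, 0x7ac3);
    my $next = 0;
    sub prng {
        my $v = $canned[$next];
        $next = ($next + 1) % @canned;
        return $v;
    }
    printf "%#x %#x\n", prng(), prng();    # always the same output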
Speed up the rejection of unhashed values (typically identifiers) by
filling unused hash slots with a large value (but not so large that
it is likely to overflow.) This means those values will be rejected
by the range check alone, without ever reaching strcmp().
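On the generator side, the filling might look like this sketch (the
filler value is chosen arbitrarily for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    # Fill holes in the table with a value far outside the valid token
    # range, but small enough that summing two table entries cannot
    # overflow; the emitted C code then rejects such indices with its
    # existing range check before ever calling strcmp().
    my @g = (3, undef, 1, undef);   # example table with unused slots
    my $filler = 0x7fff;            # large, overflow-safe filler
    defined $_ or $_ = $filler for @g;
    print join(', ', @g), "\n";     # 3, 32767, 1, 32767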
Since we fold the f- and g-functions together, if we guarantee that g is
bipartite, we can make g twice the size of f without cost. This greatly
improves the odds of generating a smaller hash.
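One way to picture the folded lookup (sizes and values invented for
illustration): the two prehash halves index disjoint halves of a
single table, which makes the underlying graph bipartite by
construction and lets the doubled second half come along for free.

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $fsize = 4;                         # size of each half
    my @tab = (7, 0, 3, 5, 2, 6, 1, 4);    # merged table, 2 * $fsize
    my ($k1, $k2) = (0x1234, 0xabcd);      # stand-in prehash halves
    # The first half is indexed directly, the second at a fixed
    # offset; no extra work compared to a single-size table.
    my $h = $tab[$k1 % $fsize] + $tab[($k2 % $fsize) + $fsize];
    printf "hash %d\n", $h;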
Finish the perfect hash tokenizer, and actually enable it.
Move stdscan() et al to a separate file, since they are not needed by
any of the clients of nasmlib other than nasm itself.
Run make alldeps.