This is a followup to PR #101344 (commit
0e06eb80bc9cf08000bf86d06c0ce57bcf7775be).
Some of them were not an issue because Godot was initializing all
members, but they were "fixed" just in case since it could become a
problem in the future.
Valgrind was specifically complaining about HashMapData &
GlobalPipelineData.
In order to make CommandQueueMT more maintainable this PR changes the
previous macro hell with variadic templates instead. This makes the
class far more explicit and will allow us to more easily change the way
the class functions in the future.
Furthermore this refactoring has allowed for some optimizations. In
particular by using std::forward to delay the decision of decaying the
type to as late as possible we are able to move the data from the
callsite into our Command buffer and later move it to the call.
In practice what this means is that compared to the old version instead
of copying values 3 times, we can now get away with 1 copy, and 1 move
for lvalues, and just 2 moves for rvalues. This saves quite a few
operations in a hot codepath.
We also now test to make sure that the amount of copies and moves are
what we expect. This way we can spot performance regressions in this
code easily.
Somewhat unscientifically, running TPS-demo by pressing enter and not
touching the controls average mspf, repeatable across many runs:
before: 6.467
after : 6.202
This fixes UBSAN errors reported by running our testsuite, importing the
TPS demo, and running the TPS demo. I have tried, wherever possible, to
fix issues related to reported issues but not directly reported by UBSAN
because thse code paths just happened to not have been exercised in
these cases.
These fixes apply only to errors reported, and caused by, core/
The following things have been changed:
* Make sure there are no implicit sign changing casts in core.
* Explicitly type enums that are part of a public API such that users of
the API cannot pass in wrongly-sized values leading to potential stack
corruption.
* Ensure that memcpy is never called with invalid or null pointers as
this is undefined behavior, and when the engine is built with
optimizations turned on leads to memory corruption and hard to debug
crashes.
* Replace enum values only used as static values with constexpr static
const values instead. This has no runtime overhead. This makes it so
that the size of the enums is explicit.
* Make sure that nan and inf is handled consistently in String.
* Implement a _to_int template to ensure that all of the paths use the
same algorhithm, and correct the negative integer case.
* Changed the way the json serializer precision work, and added tests to
verify the new behavior. The behavior doesn't quite match master in
particulary for negative doubles as the original code tried to cast -inf
to an int. This then led to negative doubles losing all but one of
their decimal points when serializing. Behavior in GDScript remains
unchanged.
3205a92ad8 was a major commit which removed `PoolVector`, and replaced
most references to `PoolVector` with `Vector` instead. In most cases,
this was appropriate, given that `PoolVector` was being replaced with
`Vector`, as an effective generalist in 64-bit address space layouts.
However, vector.h itself was left with an artifact advising the reader
to use `Vector` instead of `Vector` for large arrays. While this led
to a fascinating deep dive, and hopefully improved some of the
documentation along the way, it's probably best to clean this up for
the next person.
This helps, for importers spitting out new resources to the res://
filesystem to actually hash them to generate deterministic UIDs.
This PR also fixes the determinism for translations.
This PR makes RID_Owner lock free for fetching values, this should give a very
significant peformance boost where used.
Some considerations:
* A maximum number of elements to alocate must be given (by default 256k).
* Access to the RID structure is still safe given they are independent from addition/removals.
* RID access was never really thread-safe in the sense that the contents of the data are not protected anyway. Each server needs to implement locking as it sees fit.
Refactored the following template classes by replacing runtime checks with compile-time checks using if constexpr for improved code clarity and maintainability:
- RID_Alloc
- SortArray
- PagedAllocator
Changes made:
- Updated conditional checks for THREAD_SAFE in the RID_Alloc class.
- Updated conditional checks for Validate in the SortArray class.
- Updated conditional checks for thread_safe in the PagedAllocator class.
ResourceLoader:
- Fix invalid tokens being returned.
- Remove no longer written `ThreadLoadTask::dependent_path` and the code reading from it.
- Clear deadlock hazard by keeping the mutex unlocked during userland polling.
WorkerThreadPool:
- Include thread call queue override in the thread state reset set, which allows to simplify the code that handled that (imperfectly) in the ResourceLoader.
- Handle the mutex type correctly on entering an allowance zone.
CommandQueueMT:
- Handle the additional possibility of command buffer reallocation that mutex unlock allowance introduces.
Update CowData::insert to call ptrw() before loop, to avoid calling _copy_on_write for each item in the array, as well as repeated index checks in set and get. For larger Vectors/Arrays, this makes inserts around 10x faster for ints, 3x faster for Strings, and 2x faster for Variants. Less of an impact on smaller Vectors/Arrays, as a larger percentage of the time is spent allocating.