1) synced dca3-kos repo which has some gainzy commits
2) rwdc_common.h
- all low-level and matrix/vector routines for SH4 are now shared in
this common file, included in both RW and Liberty/Miami engines
3) CMatrix
a. assignment operator: now uses asm-optimized mat_copy()
b. multiplication operator: now uses mat_mult() SH4 routine
c. Scale(): applies a scale matrix via mat_scale
d. MultiplyInverse: fipr-optimizations
4) CQuaternion
a. multiplication: SH4 ASM FIPR optimized
b. Get(V3d& axis, float &angle): fast inversion/division
c. Set(RWMatrix&): fast division
5) CVector
a. Multiply3x3() now accelerated with mat_transpose
6) RwQuat
a. mult(): FIPR accelerated
b. length(): FIPR/FSRRA accelerated
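For reference, a rough portable sketch of the idea behind the FIPR
quaternion changes above. FIPR computes one 4-wide dot product per
instruction, so a quaternion multiply can be arranged as four such dot
products, and length as one dot product times a reciprocal square root
(the job FSRRA does approximately in hardware). The names Quat, dot4,
quatMul and quatLength are made up for illustration and are not the
actual routines in rwdc_common.h; plain C++ stands in for the SH4 asm.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical quaternion type for this sketch (x, y, z vector part, w scalar).
struct Quat { float x, y, z, w; };

// One 4-wide dot product: the operation a single SH4 FIPR performs.
inline float dot4(float a0, float a1, float a2, float a3,
                  float b0, float b1, float b2, float b3) {
    return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}

// Hamilton product a*b written as four dot products against
// sign-swizzled copies of b, so each component maps to one FIPR.
inline Quat quatMul(const Quat& a, const Quat& b) {
    Quat r;
    r.x = dot4(a.x, a.y, a.z, a.w,  b.w,  b.z, -b.y, b.x);
    r.y = dot4(a.x, a.y, a.z, a.w, -b.z,  b.w,  b.x, b.y);
    r.z = dot4(a.x, a.y, a.z, a.w,  b.y, -b.x,  b.w, b.z);
    r.w = dot4(a.x, a.y, a.z, a.w, -b.x, -b.y, -b.z, b.w);
    return r;
}

// Length as d * (1/sqrt(d)): one dot product plus a reciprocal square
// root. 1.0f / std::sqrt() is an exact stand-in for the approximate FSRRA.
inline float quatLength(const Quat& q) {
    const float d = dot4(q.x, q.y, q.z, q.w, q.x, q.y, q.z, q.w);
    assert(d >= 0.0f);            // sum of squares; also rejects NaN input
    if (d == 0.0f) return 0.0f;   // avoid 0 * inf
    return d * (1.0f / std::sqrt(d));
}
```

On real hardware FSRRA is only an approximation, so results from the
accelerated length() would be expected to differ slightly from this.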
By default, GCC for SH generates position-independent code.
Supposedly this can be less performant (although the extent to which it
is on SH4 is debatable). It also produces larger binaries because of
the extra offsets stored in the .text segment.
Added the -fno-PIC flag to the miami and liberty Makefiles. Binary size
dropped by over 8KB. Performance looks the same. Don't forget to add
this flag when building KOS too for best results.
Threads for sampman and the VMU profiler were created just using
std::thread from C++11, which doesn't allow for configuration of its
stack size or label.
Wrote a small wrapper around KOS threads (with std::thread as fallback)
which takes arguments for configuring thread label, stack size, and
whether to detach it.
1) Added common/thread/thread.h/.c thread abstraction layer.
2) Updated Makefiles
3) VmuProfiler
- averages only 10 frames now, which is avg FPS over a second
- subclasses dc::Thread and only has a 2KB stack now
- got a label for thread dumps
4) Sampman (liberty/miami)
- now uses dc::Thread
- now only needs 2KB stack size
- got a label for thread dumps
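A minimal sketch of what such a wrapper can look like, showing only the
std::thread fallback path. Everything here (the sketch namespace,
ThreadConfig, the member names) is invented for illustration and is not
the actual dc::Thread interface. In this fallback the label and stack
size are only recorded, since std::thread cannot apply them; the KOS
path would pass them on to the kernel thread at creation time.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <utility>

namespace sketch {  // hypothetical namespace, not the real dc:: one

// Options the wrapper accepts: label, stack size, detach flag.
struct ThreadConfig {
    std::string label;      // name shown in thread dumps (KOS path only)
    std::size_t stackSize;  // requested stack in bytes (KOS path only)
    bool        detached;   // detach right after creation
};

class Thread {
public:
    Thread(ThreadConfig cfg, std::function<void()> fn)
        : cfg_(std::move(cfg)) {
        assert(fn && "thread entry point must be callable");
        // Fallback: std::thread ignores label and stack size.
        thread_ = std::thread(std::move(fn));
        if (cfg_.detached) thread_.detach();
    }

    // Join on destruction so a non-detached thread never outlives its owner.
    ~Thread() { join(); }

    Thread(const Thread&) = delete;
    Thread& operator=(const Thread&) = delete;

    void join() { if (thread_.joinable()) thread_.join(); }
    bool joinable() const { return thread_.joinable(); }

    const std::string& label() const { return cfg_.label; }
    std::size_t stackSize() const { return cfg_.stackSize; }

private:
    ThreadConfig cfg_;
    std::thread  thread_;
};

}  // namespace sketch
```

Usage would be along the lines of
sketch::Thread t({"sampman", 2048, false}, [] { /* work */ });
with the 2KB figure matching the stack sizes listed above.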