psq_l with QUANTIZE_FLOAT does not use the FPU, so it does not trim the precision of the u32 input data.
We already have the helper ConvertToDouble for the floating-point u32->u64 conversion used in lfs, so let's use it here as well.
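For illustration, an integer-only widening of single-precision bits to
double-precision bits could look roughly like the sketch below. The name
SingleBitsToDoubleBits is hypothetical and this is not necessarily how
ConvertToDouble itself is written; it only shows that the conversion can
be done without the FPU, so NaN payloads pass through untouched.

```cpp
#include <cstdint>

// Hypothetical helper (not the actual ConvertToDouble): widens raw
// single-precision bits to raw double-precision bits with integer ops only.
inline std::uint64_t SingleBitsToDoubleBits(std::uint32_t value)
{
  const std::uint64_t x = value;
  const std::uint64_t sign = (x & 0x80000000u) << 32;
  std::uint64_t exp = (x >> 23) & 0xFF;
  std::uint64_t frac = x & 0x007FFFFF;

  if (exp == 0 && frac == 0)  // Signed zero
    return sign;

  if (exp == 0)  // Subnormal single: it is a normal double, so renormalize
  {
    exp = 1023 - 126;
    do
    {
      frac <<= 1;
      exp -= 1;
    } while ((frac & 0x00800000) == 0);
    return sign | (exp << 52) | ((frac & 0x007FFFFF) << 29);
  }

  if (exp == 0xFF)  // Infinity or NaN: payload is copied, never quieted
    return sign | (std::uint64_t{0x7FF} << 52) | (frac << 29);

  // Normal number: rebias the exponent from 127 to 1023.
  return sign | ((exp + (1023 - 127)) << 52) | (frac << 29);
}
```

Note that a signaling NaN such as 0x7FA00000 stays signaling after the
conversion, which a cast through a host float would not guarantee.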
Previously, PowerPC.h had four macros in it like so:
#define rPS0(i) (*(double*)(&PowerPC::ppcState.ps[i][0]))
#define rPS1(i) (*(double*)(&PowerPC::ppcState.ps[i][1]))
#define riPS0(i) (*(u64*)(&PowerPC::ppcState.ps[i][0]))
#define riPS1(i) (*(u64*)(&PowerPC::ppcState.ps[i][1]))
Casting between object representations like this is undefined behavior.
Given this is used heavily with the interpreter (that is, the most
accurate, but slowest CPU backend), we don't exactly want to allow
undefined behavior to creep into it.
Instead, this adds a helper struct for operating with the paired singles,
and replaces the four macros with a single macro for accessing the
paired-singles/floating-point registers.
This way, it's left up to the caller to explicitly decide how it wants to
interpret the data (and makes it more obvious where different
interpretations of the same data occur, as there'll be a call to one of
the [x]AsDouble() functions).
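As a rough sketch of the idea (the struct and member names below are
assumptions for illustration, not necessarily the ones used in the
change), such a helper can store the raw bits and use memcpy for every
reinterpretation, which is well-defined:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical paired-single register: raw 64-bit storage plus explicit
// accessors for each interpretation of the data.
struct PairedSingleSketch
{
  std::uint64_t PS0AsU64() const { return ps0; }
  std::uint64_t PS1AsU64() const { return ps1; }
  double PS0AsDouble() const { return BitsToDouble(ps0); }
  double PS1AsDouble() const { return BitsToDouble(ps1); }

  void SetPS0Bits(std::uint64_t bits) { ps0 = bits; }
  void SetPS1Bits(std::uint64_t bits) { ps1 = bits; }
  void SetPS0(double value) { ps0 = DoubleToBits(value); }
  void SetPS1(double value) { ps1 = DoubleToBits(value); }

private:
  // memcpy copies the object representation without aliasing violations.
  static double BitsToDouble(std::uint64_t bits)
  {
    double value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
  }

  static std::uint64_t DoubleToBits(double value)
  {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
  }

  std::uint64_t ps0 = 0;
  std::uint64_t ps1 = 0;
};
```

A caller then spells out the interpretation it wants, for example
reg.PS0AsDouble() for arithmetic and reg.PS0AsU64() for bit tests.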
These bits enable or disable paired-single execution based on how
they're set. If PSE isn't set, then all paired-single instructions are
illegal. If PSE is set, but LSQE isn't set, then psq_l, psq_lu, psq_st
and psq_stu are illegal to execute.
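The rule can be summed up in a small predicate. This is only a sketch:
the enum and function names are made up for illustration, and the two
bits are passed in as plain bools rather than read from a real HID2
register layout.

```cpp
// Hypothetical classification of paired-single instructions.
enum class PairedSingleOp
{
  Arithmetic,          // any paired-single instruction other than the below
  QuantizedLoadStore,  // psq_l, psq_lu, psq_st, psq_stu
};

// pse and lsqe stand in for the HID2.PSE and HID2.LSQE bits.
constexpr bool IsPairedSingleOpLegal(bool pse, bool lsqe, PairedSingleOp op)
{
  return pse && (op != PairedSingleOp::QuantizedLoadStore || lsqe);
}
```

An interpreter would check this before executing the instruction and
raise the appropriate exception when it returns false.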
Also thanks go out to my roommate @Veegie for letting me use his Wii as
a blasting ground for tests, since mine isn't on hand right now. It only
caught on fire twice and only burned down half of the house through the
process; what a team player.
This option completely disabled the DCBZ instruction. Users are toggling
this option in Dolphin forks and then using that same problematic config
when launching Dolphin. Removing the option from Dolphin means the config
entry will be ignored.
PowerPCState's cr_val member is an array of u64s, so we can just use the
correct printf format macro from <cinttypes>. This also avoids
truncation on operating systems that use an LLP64 data model (like
Windows), where long is 32 bits in size, not 64, which could result in
wonky values being printed, should Trace ever be used there.
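A minimal sketch of the formatting approach, with a hypothetical helper
name (FormatU64Hex is not part of the change):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit value as 16 hex digits.
// PRIx64 expands to the right length modifier for uint64_t on every
// platform, unlike a hardcoded "%lx", which assumes long is 64 bits wide.
inline std::string FormatU64Hex(std::uint64_t value)
{
  char buffer[17];  // 16 digits plus the terminating null
  std::snprintf(buffer, sizeof(buffer), "%016" PRIx64, value);
  return buffer;
}
```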
HID2.LSQE is the load/store quantize enable bit for non-indexed format
instructions (which are psq_l, psq_lu, psq_st, and psq_stu). If this bit
is not set and an attempt is made to execute any of these instructions,
then a program exception is supposed to occur.
Despite both being documented as read-only registers, only one of them
is truly read-only. An mtspr to HID1 will steamroll bits 0-4 with
bits 0-4 of whatever value is currently in the source register. The rest
of the bits are not modified: bits 5-31 are considered reserved, so
they ignore writes.
PVR, on the other hand, is truly a read-only register. Attempts to write
to it don't modify the value within it, so we model this behavior.
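The write behavior described above can be sketched as follows. The
function names are hypothetical, and the mask assumes PowerPC bit
numbering, where bit 0 is the most significant bit, so "bits 0-4" are
the top five bits of the 32-bit register.

```cpp
#include <cstdint>

// Assumption: bits 0-4 in PowerPC numbering are the five most
// significant bits of the register.
constexpr std::uint32_t HID1_WRITABLE_MASK = 0xF8000000;

// Hypothetical model of mtspr to HID1: only the writable bits change.
constexpr std::uint32_t WriteHID1(std::uint32_t current, std::uint32_t source)
{
  return (current & ~HID1_WRITABLE_MASK) | (source & HID1_WRITABLE_MASK);
}

// Hypothetical model of mtspr to PVR: the write is ignored entirely.
constexpr std::uint32_t WritePVR(std::uint32_t current, std::uint32_t /*source*/)
{
  return current;
}
```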
According to PEM 3.3.6.1, if a division by zero occurs and FPSCR.ZE is
set, then the result of the instruction operation is unchanged (see
table 3-13). Similarly, if an invalid operation occurs and FPSCR.VE is
set, then the destination should also remain unchanged (see table 3-12).
Hardware also matches this behavior.
We were handling this for other relevant instructions, but we weren't
doing so for the arithmetic instructions. This corrects that.
This also alters our NI_* functions to return an FPResult type, which
allows us to see which kind of exception in particular is set in
exceptional cases. This is necessary for cases like the fdiv
instructions, which require handling both ZE and VE being potentially
set.
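To illustrate the shape of this, here is a simplified sketch. The type
and function names are invented for the example and the real FPResult
may carry different members; the point is only that the caller can tell
which exception occurred and skip the write-back when the matching
enable bit is set.

```cpp
// Hypothetical exception kinds reported alongside a result.
enum class FPExceptionSketch
{
  None,
  ZeroDivide,
  Invalid,
};

// Hypothetical result type: the computed value plus what went wrong.
struct FPResultSketch
{
  double value;
  FPExceptionSketch exception;
};

// Stand-ins for the FPSCR.ZE and FPSCR.VE enable bits.
struct FPEnablesSketch
{
  bool ze;
  bool ve;
};

// The destination stays unchanged if the exception that occurred is enabled.
inline bool ShouldWriteDestination(const FPResultSketch& result, FPEnablesSketch enables)
{
  if (result.exception == FPExceptionSketch::ZeroDivide && enables.ze)
    return false;
  if (result.exception == FPExceptionSketch::Invalid && enables.ve)
    return false;
  return true;
}

inline void WriteBack(double& frd, const FPResultSketch& result, FPEnablesSketch enables)
{
  if (ShouldWriteDestination(result, enables))
    frd = result.value;
}
```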
Rather than introduce this handling in every system instruction that modifies
the FPSCR directly, we can just handle it within the data structure
itself, which avoids duplicating mask handling across instructions.
This also allows handling proper masking from the debugger register
windows themselves without duplicating masking behavior there either.
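A sketch of what masking inside the data structure can look like. The
struct name is hypothetical, and the mask value assumes that the only
bit to hide is reserved bit 20 (PowerPC numbering, i.e. 0x00000800).

```cpp
#include <cstdint>

// Hypothetical FPSCR wrapper: every write path applies the mask, so
// individual instructions and debugger code never have to.
struct FpscrSketch
{
  // Assumption: bit 20 (PowerPC numbering) is reserved and reads as zero.
  static constexpr std::uint32_t mask = 0xFFFFF7FF;

  std::uint32_t hex = 0;

  FpscrSketch() = default;
  explicit FpscrSketch(std::uint32_t value) : hex(value & mask) {}

  FpscrSketch& operator=(std::uint32_t value)
  {
    hex = value & mask;
    return *this;
  }

  FpscrSketch& operator|=(std::uint32_t value)
  {
    hex |= value & mask;
    return *this;
  }
};
```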
As peculiar as this may be, triggering a decrementer exception by
setting the decrementer's zeroth bit from 0 to 1 is valid behavior by
software (and is defined in Programming Environments for 32-bit
Microprocessors in section 2.3.14.1 -- Decrementer operation). Given
it's valid behavior, it doesn't make sense to use a panic alert and
halt, as this isn't a condition where everything should be considered
to be in a critical state.
Instead, change it to an info log, so we still make note of it, but
without potentially tearing down state or halting emulation.
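The condition being logged can be expressed as a small check. The
function name is hypothetical; bit 0 is taken to be the most significant
bit, per PowerPC numbering.

```cpp
#include <cstdint>

// Hypothetical check: a write to the decrementer raises the exception
// when bit 0 (the most significant bit) goes from 0 to 1.
constexpr bool DecrementerWriteRaisesException(std::uint32_t old_value,
                                               std::uint32_t new_value)
{
  return (old_value >> 31) == 0 && (new_value >> 31) == 1;
}
```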
This hardware behavior makes sense, as the FI bit is used to signify an
inexact result. An inexact result is a form of value that results during
the rounding phase of denormalization. If any bits of the significand
are lost during said rounding, then the result is considered to be
inexact.
However, NaN and infinity are not classed as subnormals and therefore
don't undergo the denormalization step, making loss of precision
impossible (in NaN's case, numerically rounding something that is
literally Not a Number doesn't even make sense).
FR is set to indicate whether or not the last arithmetic or rounding and
conversion instruction that rounded the intermediate result incremented
the fractional portion of the result. Given neither input type would be
affected by this, this should also be unset.
This corrects more of the exceptional case handling for these values to
match hardware.
Prevents implicit conversions to these types and requires explicitly
specifying them in order to construct instances of them. Given these are
used within emulation code directly, being explicit is always better
than implicit.
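As a generic illustration (the type below is a stand-in, not one of the
types touched by this change), marking a converting constructor explicit
rules out silent conversions from raw integers:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical register wrapper with an explicit converting constructor.
struct RegisterValueSketch
{
  std::uint32_t hex = 0;

  RegisterValueSketch() = default;
  explicit RegisterValueSketch(std::uint32_t value) : hex(value) {}
};

inline std::uint32_t ReadHex(RegisterValueSketch reg)
{
  return reg.hex;
}

// A bare integer no longer converts on its own, so ReadHex(0x1234u)
// would fail to compile; the caller has to name the type.
static_assert(!std::is_convertible<std::uint32_t, RegisterValueSketch>::value,
              "construction must be spelled out");
```

Call sites then read ReadHex(RegisterValueSketch{0x1234u}), which makes
the conversion visible.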
As explained within 179d73ac0d, the table
within the Programming Environments Manual for PowerPC lists the FI and
FR bits as cleared for invalid operation cases. So, we amend the
relevant cases here in order to be accurate to hardware.
As explained within commit a08ad82ace, if
an invalid exception occurs and VE is set, then the destination register
should remain unchanged. Ditto for when ZE is set and a zero divide
exception occurs.
In the PEM, Table 3-12, which lists what should occur for invalid
operation exceptions, shows the FPSCR.FI and FPSCR.FR bits as "Cleared"
both when FPSCR.VE is unset and when it is set. So we clear these bits
as well to match hardware behavior.
In the PowerPC Microprocessor Family: The Programming Environments
Manual for 32 and 64-bit Microprocessors, in section 3.3.6.1, Table
3-12 lists what should occur if an invalid operation exception occurs in
situations where VE is set and when VE is not set. In the case where VE
is set, it lists the frD as "Unchanged". It also lists the FPRF flags as
"Unchanged".
Further down, Table 3-13 lists what should occur when a zero divide
exception occurs, both for when ZE is set and when it isn't. When ZE is
set, it lists frD as "Unchanged". It also lists the FPRF flags as
"Unchanged".
This also alters the code so that we don't even calculate the result if
we don't need to compute it, making it a little bit less wasteful.
Another bit of behavior that we weren't performing correctly is the
unsetting of FPSCR.FI and FPSCR.FR when FPSCR.ZX is supposed to be set.
This is supported in PEM's section 3.3.6.1 where the following is
stated:
"
When a zero divide condition occurs, the following actions are taken:
- Zero divide exception condition bit is set FPSCR[ZX] = 1.
- FPSCR[FR, FI] are cleared.
"
And so, this fixes that behavior.
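In terms of raw FPSCR bits, the quoted actions amount to something like
the sketch below. The function name is hypothetical, the bit positions
follow the PEM's numbering (ZX is bit 5, FR is bit 13, FI is bit 14,
with bit 0 as the most significant bit), and the summary bits such as FX
and FEX are deliberately left out to keep the example small.

```cpp
#include <cstdint>

// FPSCR bit positions, converted from PowerPC numbering (bit 0 = MSB).
constexpr std::uint32_t FPSCR_ZX = 1u << (31 - 5);
constexpr std::uint32_t FPSCR_FR = 1u << (31 - 13);
constexpr std::uint32_t FPSCR_FI = 1u << (31 - 14);

// Hypothetical helper: sets ZX and clears FR and FI, leaving all other
// bits alone. FX/FEX updates are omitted from this sketch.
constexpr std::uint32_t ApplyZeroDivide(std::uint32_t fpscr)
{
  return (fpscr | FPSCR_ZX) & ~(FPSCR_FR | FPSCR_FI);
}
```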