Commit graph

2433 commits

Author SHA1 Message Date
Mai M
1054abc9cc
Merge pull request #9712 from JosJuice/jitarm64-fmul-rounding
JitArm64: Fix fmul rounding issues
2021-05-20 10:25:02 -04:00
Mai M
5949a19fe6
Merge pull request #9714 from JosJuice/jitarm64-convert-fmov
JitArm64: Prefer using FMOV when doing single/double conversion
2021-05-20 10:24:36 -04:00
Mai M
6958df5967
Merge pull request #9695 from JosJuice/jitarm64-fres
JitArm64: Implement fres and frsqrte
2021-05-20 10:23:49 -04:00
Mai M
539c2cb00e
Merge pull request #9667 from Sintendo/jit64divwx2
Jit64: Minor divwx optimizations
2021-05-20 10:22:54 -04:00
JosJuice
11be2314fe JitArm64: Fix fmul rounding issues
This is a port of 4f18f60 to JitArm64.
2021-05-15 23:27:34 +02:00
JosJuice
66e912a252 PPCAnalyst: Treat frspx output as single 2021-05-15 23:27:33 +02:00
JosJuice
77afb0f4c3 PPCAnalyst: Apply "bitexact" analysis to fprIsSingle
This lets us set fprIsSingle to true in more cases.
2021-05-15 23:27:33 +02:00
JosJuice
e5f2dcd891 JitArm64: Implement FPRF updates for fres+frsqrte 2021-05-15 19:21:17 +02:00
JosJuice
4b3fda7906 JitArm64: Implement frsqrte 2021-05-15 19:21:15 +02:00
JosJuice
85226e09f0 JitArm64: Implement fres 2021-05-15 19:16:32 +02:00
JosJuice
8c12068a03 JitArm64: Prefer using FMOV when doing single/double conversion
FMOV is faster than INS and ties UMOV. (On all CPUs I checked,
at least. It certainly shouldn't be slower, though.)
2021-05-15 18:56:40 +02:00
JosJuice
b980797a16 PPCAnalyst: Fix broken bitexact analysis
`code` points to the first instruction in the block, not the
current instruction.
2021-05-15 18:19:04 +02:00
Mat M
24b9a64c11
Merge pull request #9690 from Sintendo/jit64divwux
Jit64: divwux - Prefer three-operand IMUL
2021-05-13 06:42:14 -04:00
JosJuice
bfe8b1068d JitArm64: Implement FPRF updates 2021-05-13 11:51:00 +02:00
Sintendo
2cafa0a960 Jit64: divwux - Prefer three-operand IMUL
By taking advantage of three-operand IMUL, we can eliminate a MOV
instruction. This is a small code size win. However, due to IMUL sign
extending the immediate value to 64 bits, we can only apply this when
the magic number's most significant bit is zero.

To ensure this can actually happen, we also minimize the magic number by
checking for trailing zeroes.

Example (Unsigned division by 18)
Before:
41 BE E4 38 8E E3    mov         r14d,0E38E38E4h
4D 0F AF F5          imul        r14,r13
49 C1 EE 24          shr         r14,24h

After:
4D 69 F5 39 8E E3 38 imul        r14,r13,38E38E39h
49 C1 EE 22          shr         r14,22h
2021-05-06 19:54:33 +02:00
JosJuice
b305e4cfc1 JitArm64: Fix JitRegister::Register call for cstd
Seems like I made a little copy-paste error.
2021-05-06 00:20:47 +02:00
Léo Lam
51bf2dca21
Merge pull request #9675 from JosJuice/jit64-div-80000000
Jit64: Fix UB/infinite loop when compiling division by 0x80000000
2021-04-26 23:50:27 +02:00
JosJuice
7d4b87e7ae Jit64: Fix UB/infinite loop when compiling division by 0x80000000 2021-04-26 23:42:03 +02:00
JosJuice
ac679eb24d
Merge pull request #9666 from leoetlino/jit-block-hashtable
Jit: Optimize block link queries by using hash tables
2021-04-25 18:45:41 +02:00
JosJuice
69c14d6ec3 JitArm64: Fix frspx with single precision source
I haven't observed this breaking any game, but it didn't match
the behavior of the interpreter as far as I could tell from
reading the code, in that denormals weren't being flushed.
2021-04-25 15:56:59 +02:00
JosJuice
54451ac731 JitArm64: Use ConvertSingleToDoubleLower in RW when faster 2021-04-25 15:56:59 +02:00
JosJuice
9d6263f306 JitArm64: Add unit tests for single/double conversion 2021-04-25 15:56:58 +02:00
JosJuice
2a9d88739c JitArm64: Skip accurate single/double conversion if store-safe 2021-04-25 15:56:58 +02:00
JosJuice
1d106ceaf5 JitArm64: Optimize ConvertSingleToDouble, part 2
If we can prove that FCVT will provide a correct conversion,
we can use FCVT. This makes the common case a bit faster
and the less likely cases (unfortunately including zero,
which FCVT actually can convert correctly) a bit slower.
2021-04-25 15:56:19 +02:00
JosJuice
018e247624 JitArm64: Optimize ConvertSingleToDouble, part 1 2021-04-25 15:56:19 +02:00
JosJuice
28e4869c43 JitArm64: Optimize ConvertDoubleToSingle 2021-04-25 15:56:19 +02:00
JosJuice
6e0a5876ef JitArm64: Use accurate single/double conversions
Our old conversion approach became a lot more inaccurate when
enabling flush-to-zero, to the point of obviously breaking games.
2021-04-25 15:56:19 +02:00
JosJuice
39eccf6603 JitArm64: Call RW before FCMPE in fselx
Needed because the next commit will make RW clobber flags.
2021-04-25 15:56:19 +02:00
JosJuice
949686bbe7 JitArm64: Factor out single/double conversion code to functions
Preparation for following commits.

This commit intentionally doesn't touch paired stores,
since paired stores are supposed to flush to zero.
(Consistent with Jit64.)
2021-04-25 15:56:19 +02:00
JosJuice
fdf7744a53 JitArm64: Move float conversion code out of EmitBackpatchRoutine
This simplifies some of the following commits. It does require
an extra register, but hey, we have 32 of them.

Something I think would be nice to add to the register cache
in the future is the ability to keep both the single and double
version of a guest register in two different host registers
when that is useful. That way, the extra register we write to
here can be read by a later instruction, saving us from
having to perform the same conversion again.
2021-04-25 15:56:19 +02:00
Léo Lam
aa3a96f048
Merge pull request #9644 from JosJuice/jit-fallback-discard
Jits: Fix interpreter fallback handling of discarded registers
2021-04-25 13:20:41 +02:00
JosJuice
b3b5016f54 Jits: Fix interpreter fallback handling of discarded registers
When the interpreter writes to a discarded register, its type
must be changed so that it is no longer considered discarded.

Fixes a 62ce1c7 regression.
2021-04-25 13:01:40 +02:00
Sintendo
47e16133e5 Jit64: divwx - Eliminate XOR for constant dividend
We normally check for division by zero to know if we should set the
destination register to zero with a XOR. However, when the divisor and
destination registers are the same the explicit zeroing can be omitted.
In addition, some of the surrounding branching can be simplified as
well.

Before:
45 85 FF             test        r15d,r15d
75 05                jne         normal_path
45 33 FF             xor         r15d,r15d
EB 0C                jmp         done
normal_path:
B8 5A 00 00 00       mov         eax,5Ah
99                   cdq
41 F7 FF             idiv        eax,r15d
44 8B F8             mov         r15d,eax
done:

After:
45 85 FF             test        r15d,r15d
74 0C                je          done
B8 5A 00 00 00       mov         eax,5Ah
99                   cdq
41 F7 FF             idiv        eax,r15d
44 8B F8             mov         r15d,eax
done:
2021-04-24 21:32:21 +02:00
Sintendo
abc4c8f601 Jit64: divwx - Eliminate MOV for division by power of 2
Division by a power of two can be slightly improved when the
destination and dividend registers are the same.

Before:
8B C6                mov         eax,esi
85 C0                test        eax,eax
8D 70 03             lea         esi,[rax+3]
0F 49 F0             cmovns      esi,eax
C1 FE 02             sar         esi,2

After:
85 F6                test        esi,esi
8D 46 03             lea         eax,[rsi+3]
0F 48 F0             cmovs       esi,eax
C1 FE 02             sar         esi,2
2021-04-24 19:28:23 +02:00
Sintendo
246adf0d6d Jit64: divwx - Eliminate MOV for division by 2
When destination and input registers match, a redundant MOV instruction
can be eliminated.

Before:
8B C7                mov         eax,edi
8B F8                mov         edi,eax
C1 EF 1F             shr         edi,1Fh
03 F8                add         edi,eax
D1 FF                sar         edi,1

After:
8B C7                mov         eax,edi
C1 EF 1F             shr         edi,1Fh
03 F8                add         edi,eax
D1 FF                sar         edi,1
2021-04-24 18:53:21 +02:00
Léo Lam
c812ab6a63
Jit: Optimize block link queries by using hash tables
Repeated erase() + iteration on a std::multimap is extremely slow.

Slow enough that it causes a 7 second long stutter during some
transitions in F-Zero X (a N64 VC game that triggers many, many icache
invalidations).

And slow enough that JitBaseBlockCache::DestroyBlock shows up on a
flame graph as taking >50% of total CPU time on the CPU-GPU thread:
https://i.imgur.com/vvqiFL6.png

This commit optimises those block link queries by replacing the
std::multimap (which is typically implemented with red-black trees)
with hash tables.

Master: https://i.imgur.com/vvqiFL6.png / 7s stutters
(starting from 5.0-2021 and with branch following disabled)

This commit: https://i.imgur.com/hAO74fy.png / ~0.7s stutters, which
is pretty close to 5.0 stable. (5.0-2021 introduced the performance
regression and it is especially noticeable when branch following
is disabled, which is the case for all N64 VC games since 5.0-8377.)
2021-04-24 17:20:59 +02:00
Lioncash
adebc499f9 Jit64: Indicate explicit [[fallthrough]] within load helper 2021-04-19 17:37:44 -04:00
JMC47
5322256065
Merge pull request #9625 from leoetlino/mmu-sdr-update
MMU: Fix SDR updates being silently dropped in some cases
2021-04-06 20:23:13 -04:00
Pokechu22
dad309d365 Disable ICache emulation for some games
Specifically, 'Scooby-Doo! Mystery Mayhem', 'Scooby-Doo! Unmasked', 'Ed, Edd n Eddy: The Mis-Edventures', and the Wii version of 'Happy Feet'.

The JIT cache causes problems with emulated icache invalidation in these games, resulting in areas failing to load.
2021-04-06 12:44:10 -07:00
Léo Lam
49edd5f482
MMU: Remove a bunch of useless swaps
The swaps are confusing and don't accomplish much.

It was originally written like this:

u32 pte = bswap(*(u32*)&base_mem[pteg_addr]);

then bswap was changed to Common::swap32, and then the array access
was replaced with Memory::Read_U32, leading to the useless swaps.
2021-04-06 18:25:29 +02:00
Léo Lam
960d957f4f
MMU: Fix SDR updates being silently dropped in some cases
While 6xx_pem.pdf §7.6.1.1 mentions that the number of trailing
zeros in HTABORG must be equal to the number of trailing ones
in the mask (i.e. HTABORG must be properly aligned), this is actually
not a hard requirement. Real hardware will just OR the base address
anyway. Ignoring SDR changes would lead to incorrect emulation.

Logging a warning instead of dropping the SDR update silently is a
saner behaviour.
2021-04-06 18:25:09 +02:00
JMC47
5222a4b7e5
Merge pull request #9585 from JosJuice/jitarm64-skip-carry
JitArm64: Skip calculating carry flag when not needed
2021-04-06 04:41:16 -04:00
JMC47
99d43362e6
Merge pull request #9351 from JosJuice/discard-registers
Jits: Discard registers which we know will be overwritten
2021-04-06 04:40:26 -04:00
Pokechu22
004dfd1586 Replace uses of cassert with Common/Assert.h 2021-04-02 10:18:18 -07:00
JosJuice
b3f71f7cdc JitArm64: Allow DoJit at address 0 (fix launching Wii titles)
JitArm64::DoJit contains a check where it prints a warning and tries
to pause emulation if instructed to compile code at address 0. I'm
assuming this was done in order to provide a nicer error behavior
in cases where PC was accidentally set to null. Unfortunately, it
has started causing us problems recently, as 688bd61 writes and runs
some code at address 0 to simulate the PPC being held in reset.
What makes this worse is that calling Core::SetState from the CPU
thread is actually not allowed and will cause a deadlock instead of
the intended behavior. I don't believe there is anything on a real
console that would stop you from executing code at address 0 (as
long as the MMU has been set up to allow it), and Jit64::DoJit
doesn't contain any check like this, so let's remove the check.
2021-04-01 11:28:53 +02:00
Léo Lam
c915b780cf
Merge pull request #9596 from Minty-Meeo/apply-moar-RunAsCPUThread
Apply More Core::RunAsCPUThread
2021-03-27 01:11:34 +01:00
JosJuice
62ce1c7653 Jits: Discard registers which we know will be overwritten
This commit adds a new "discarded" state for registers.
Discarding a register is like flushing it, but without
actually writing its value back to memory. We can discard
a register only when it is guaranteed that no instruction
will read from the register before it is next written to.

Discarding reduces the register pressure a little, and can
also let us skip a few flushes on interpreter fallbacks.
2021-03-24 20:48:44 +01:00
JosJuice
901170e299 PPCTables: Use u64 for instruction flags
We've run out of space :(
2021-03-24 20:48:36 +01:00
JosJuice
1845c5948d PPCAnalyst: Rework the store-safe logic
The output of instructions like fabsx and ps_sel is store-safe
if and only if the relevant inputs are. The old code was always
marking the output as store-safe if the output was a single,
and never otherwise.

Also, the old code was treating the output of psq_l/psq_lu as
store-safe, which seems incorrect (if dequantization is disabled).
2021-03-24 12:02:09 +01:00
JosJuice
3bd920638d JitArm64: Use STP for pc/npc, part 2
I missed one place in dd8e504.
2021-03-23 21:27:07 +01:00