PPU LLVM arm64+macOS port (#12115)

* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
    enforcement while not needing to actually separate the memory
    allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
This commit is contained in:
Jeff Guo 2022-06-14 05:28:38 -07:00 committed by GitHub
parent 264253757c
commit cefc37a553
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
14 changed files with 306 additions and 16 deletions

View file

@ -196,7 +196,11 @@ void* jit_runtime_base::_add(asmjit::CodeHolder* code) noexcept
ensure(!code->relocateToBase(uptr(p)));
{
// We manage rw <-> rx transitions manually on Apple
// because it's easier to keep track of when and where we need to toggle W^X
#if !(defined(ARCH_ARM64) && defined(__APPLE__))
asmjit::VirtMem::ProtectJitReadWriteScope rwScope(p, codeSize);
#endif
for (asmjit::Section* section : code->_sections)
{
@ -248,6 +252,9 @@ void jit_runtime::initialize()
void jit_runtime::finalize() noexcept
{
#ifdef __APPLE__
pthread_jit_write_protect_np(false);
#endif
// Reset JIT memory
#ifdef CAN_OVERCOMMIT
utils::memory_reset(get_jit_memory(), 0x80000000);
@ -262,6 +269,15 @@ void jit_runtime::finalize() noexcept
// Restore code/data snapshot
std::memcpy(alloc(s_code_init.size(), 1, true), s_code_init.data(), s_code_init.size());
std::memcpy(alloc(s_data_init.size(), 1, false), s_data_init.data(), s_data_init.size());
#ifdef __APPLE__
pthread_jit_write_protect_np(true);
#endif
#ifdef ARCH_ARM64
// Flush all cache lines after potentially writing executable code
asm("ISB");
asm("DSB ISH");
#endif
}
jit_runtime_base& asmjit::get_global_runtime()
@ -432,6 +448,21 @@ static u64 make_null_function(const std::string& name)
c.db(ch);
c.db(0);
c.align(AlignMode::kData, 16);
#else
// AArch64 implementation
Label jmp_address = c.newLabel();
Label data = c.newLabel();
// Force absolute jump to prevent out of bounds PC-rel jmp
c.ldr(args[0], arm::ptr(jmp_address));
c.br(args[0]);
c.align(AlignMode::kCode, 16);
c.bind(data);
c.embed(name.c_str(), name.size());
c.embedUInt8(0U);
c.bind(jmp_address);
c.embedUInt64(reinterpret_cast<u64>(&null));
c.align(AlignMode::kData, 16);
#endif
});
@ -840,6 +871,7 @@ jit_compiler::jit_compiler(const std::unordered_map<std::string, u64>& _link, co
std::string result;
auto null_mod = std::make_unique<llvm::Module> ("null_", *m_context);
null_mod->setTargetTriple(utils::c_llvm_default_triple);
if (_link.empty())
{
@ -852,7 +884,7 @@ jit_compiler::jit_compiler(const std::unordered_map<std::string, u64>& _link, co
else
{
mem = std::make_unique<MemoryManager2>();
null_mod->setTargetTriple(llvm::Triple::normalize("x86_64-unknown-linux-gnu"));
null_mod->setTargetTriple(utils::c_llvm_default_triple);
}
// Auxiliary JIT (does not use custom memory manager, only writes the objects)