-
Notifications
You must be signed in to change notification settings - Fork 888
Codebase cleanup and assembly refactoring #64
Conversation
Not at all! On the contrary, I'm really grateful that you took your time to improve the code quality of HAXM. I can see that you spot something wrong and actually went on to fix it--that's really admirable. I've reviewed the first three patches and I think they are good to go. Please allow me more time on the last one, which is especially exciting because it is another big step toward a unified assembler for all supported platforms. |
raphaelning
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you haven't tested the new NASM code yet. I have probably overlooked some issues, which you'll run into when you build and debug the code ;-)
core/ia32.asm
Outdated
| %define reg_arg1_32 ecx | ||
| %define reg_arg1_64 rcx | ||
| %define reg_arg1 reg_arg1_64 | ||
| %define reg_arg2_16 cx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to
https://docs.microsoft.com/en-us/cpp/build/overview-of-x64-calling-conventions
arg2 is found in *DX.
core/ia32.asm
Outdated
| %define reg_ret_32 eax | ||
| %define reg_ret reg_ret_32 | ||
| %elifidn __CONV__, x64_systemv | ||
| %define reg_arg1_16 si |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to
https://wiki.osdev.org/System_V_ABI#x86-64
arg1 is found in *DI, whereas arg2 is found in *SI.
core/ia32.asm
Outdated
| ret | ||
|
|
||
| function __fls | ||
| bsr eax, ecx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eax=>reg_ret_32,ecx=>reg_arg1_32- The original implementation initializes the return value to 0 before invoking BSR, which is necessary, because if the input is 0, BSR may not just return 0. See the description of the BSR instruction in Intel SDM Vol. 2A:
If the content source operand is 0, the content of the destination operand is undefined.
core/ia32.asm
Outdated
| function __handle_cpuid | ||
| %ifidn __BITS__, 64 | ||
| push rbx | ||
| mov r8, rcx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rcx => reg_arg1
core/ia32.asm
Outdated
| ret | ||
| %else | ||
| rdtsc | ||
| mov [reg_arg2 + qword_struct.lo], eax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reg_arg2 => reg_arg1
core/ia32.asm
Outdated
| mov edx, esi | ||
| wrmsr | ||
| push esi | ||
| push edi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
core/ia32.asm
Outdated
|
|
||
| function get_kernel_rflags | ||
| pushf | ||
| pop ax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ax => reg_ret_16
core/ia32.asm
Outdated
| mov rcx, reg_arg1 | ||
| rdmsr | ||
| shl rdx, 32 | ||
| or rax, rdx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rax => reg_ret
core/ia32.asm
Outdated
| %ifidn __BITS__, 64 | ||
| rdtsc | ||
| shl rdx, 32 | ||
| or rax, rdx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rax => reg_ret
core/haxlib.vcxproj
Outdated
| <NASM Include="emulate_ops.asm"> | ||
| <FileType>Document</FileType> | ||
| </NASM> | ||
| <NASM Include="ia32.asm" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: So we don't really need to specify the FileType tag? Then shall we just delete line 181?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what caused the <FileType> tag to be added, but it's unnecessary. It's deleted now.
f2b32d1 to
77ff4b8
Compare
|
Sorry for the issues, I was waiting for the other PR to be merged to rebase and review+test the code. There's still a build error under 32-bit configurations caused by mismatching calling conventions. I will fix it and let you know when everything's ready. |
|
Sure.
I see. I managed to dig out your old comment about the previous Win32 linking issue. Not sure if it's related to this one:
I also noticed that the new |
54bf8c3 to
b67284a
Compare
raphaelning
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't expect you to finish all the NASM porting work with this PR--thanks a lot! Since the cover page still says the PR is "not ready to merge", we haven't started testing it. Today I only reviewed the updated ia32.asm; will finish the rest tomorrow.
| ; | ||
| %ifidn __CONV__, x32_cdecl | ||
| ; | ||
| ; Although cdecl does not place arguments in registers, we simulate fastcall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat, but I do have a concern--see below.
core/ia32.asm
Outdated
| global %1 | ||
| %1: | ||
| %ifidn __CONV__, x32_cdecl | ||
| mov reg_arg1, [esp + 0x4] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the function has no argument or only one argument? We will waste a couple of instructions, and perhaps access invalid memory (highly unlikely, but I'm not sure if we can rule out the possibility).
How about adding a second parameter to %macro function, for declaring the number of parameters the function takes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a second parameter to
%macro function, for declaring the number of parameters the function takes?
Agreed. I will do that.
core/ia32.asm
Outdated
| %ifidn __BITS__, 64 | ||
| push rbx | ||
| mov r8, reg_arg1 | ||
| mov rax, [r8 + vcpu_state._rax] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vcpu_state => cpuid_args
| %endmacro | ||
|
|
||
| function asm_vmxon | ||
| vmxon [reg_arg1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would require the C function to be declared differently:
vmx_error_t ASMCALL asm_vmxon(paddr_t *vmxon_addr_ptr);
where vmxon_addr_ptr points to the 8-byte buffer that stores the physical address of the VMXON region, i.e. there are two levels of indirection.
This is because the instruction requires an m64 operand:
This operand is always 64 bits and is always in memory.
I think it is designed this way so that one can pass a long (> 32-bit) physical address to it in 32-bit mode. Note that in 32-bit mode, vmxon_addr_ptr itself is 32 bits and fits in reg_arg1, which is ECX.
Since we are renaming the function anyway, changing its C signature shouldn't be a big deal. The new C function will have to be called like this:
paddr_t vmxon_addr = hax_page_pa(cpu_data->vmxon_page);
vmx_error_t err = asm_vmxon(&vmxon_addr);
That should make sure vmxon_addr is always stored in memory (on the stack), so we can keep the assembly code simple.
| vmx_check | ||
|
|
||
| function asm_vmclear | ||
| vmclear [reg_arg1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here about the m64 operand. We could change the C function signature to:
vmx_error_t ASMCALL asm_vmclear(paddr_t *vmcs_addr_ptr);
and call it like this:
paddr_t vmcs_addr = hax_page_pa(cpu_data->vmcs_page);
vmx_error_t err = asm_vmclear(&vmcs_addr);
core/vmx_ops.asm
Outdated
| vmclear [reg_arg1] | ||
| vmx_check | ||
|
|
||
| function asm_vmptrld |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here about the m64 operand.
| vmx_check | ||
|
|
||
| function asm_vmptrst | ||
| vmptrst [reg_arg1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here about the m64 operand.
core/vmx_ops.asm
Outdated
| global %1 | ||
| %1: | ||
| %ifidn __CONV__, x32_cdecl | ||
| mov reg_arg1, [esp + 0x4] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I've commented on ia32.asm, we should make this conditional on the number of arguments that the function takes.
core/vmx_ops.asm
Outdated
| ret | ||
|
|
||
| function asm_vmxrun | ||
| ; TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you have removed the old __vmxrun implementation. It's also okay if you keep it and address this TODO in a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is implemented now.
core/vmx_ops.asm
Outdated
| ret | ||
|
|
||
| function asm_invept | ||
| ; TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is implemented now.
| ret | ||
|
|
||
| function get_kernel_gdt | ||
| sgdt [reg_arg1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Does sgdt not require the operand to be labeled fword ptr like lgdt does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the keyword is not necessary. In fact, fword is not even defined in NASM. See:
https://www.nasm.us/doc/nasmdoci.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. But it makes sense, because lgdt, etc. don't really use other (standard) operand sizes, so removing fword ptr doesn't really create any ambiguity.
| ret | ||
|
|
||
| function get_kernel_idt | ||
| sidt [reg_arg1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is fword ptr not necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No (see comment above).
| #include "../../../../include/hax.h" | ||
| #include "../../../../core/include/vcpu.h" | ||
|
|
||
| void hax_disable_preemption(preempt_flag *eflags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Windows version of this function is defined in windows/hax_wrapper.c and implemented with pure C code. I have no idea why the Mac version is defined in vmcs.c, but we can't just delete it. Instead, let's move it into darwin/..../hax_wrapper.cpp, and I think it's okay to keep the inline assembly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did that: #70. We could merge it now, but then you would have to rebase this entire PR. Let us know what works best for you.
| hax_disable_irq(); | ||
| } | ||
|
|
||
| void hax_enable_preemption(preempt_flag *eflags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Let's move this one into hax_wrapper.cpp as well.
|
I wonder if we should split this PR into multiple ones, so we can merge commits early, as soon as they become ready. Right now we have a small PR #69, which is ready to merge but has some conflicts with this PR, at least with commit |
|
Sorry, last week and part of this week I was/will be mostly AFK for work reasons. Don't worry about merging other PRs first, I don't mind rebasing. Also, I expect to finish this PR next weekend. Thanks for your other comments, I will try to update my patch as soon as possible :-) |
400d196 to
d555074
Compare
|
Thanks for merging the recent PRs, they had really useful stuff! Rebasing this patch was really simple. Sorry for the delays, I don't like keeping PRs idling, but I got overwhelmed by real-life work. All features intended for this PR are finished now. There's more to come, but that's for another PR since this one is already big enough. Since your last review, I've fixed the issues brought up by your comments and additionally:
This branch build successfully under Windows in Debug/Release and Win32/x64 configurations. I haven't got time to build it under macOS or test the current changes yet. Once you approve the patch, I'll rewrite the history into three commits:
|
raphaelning
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's more to come, but that's for another PR since this one is already big enough.
Sounds good, thanks!
This branch build successfully under Windows in Debug/Release and Win32/x64 configurations. I haven't got time to build it under macOS or test the current changes yet.
Cool. If you address my new comments (there aren't many), I think the code will be good enough to be put to the test.
core/ia32.asm
Outdated
| %ifidn __CONV__, x32_cdecl | ||
| %if %2 >= 1 | ||
| mov reg_arg1, [esp + 0x4] | ||
| %elif %2 >= 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With %elif, the condition is equivalent to %2 < 1 && %2 >= 2 which is always false. I'd rewrite this part as below:
%if %2 >= 3
%error "Unsupported number of arguments"
%else
%if %2 >= 1
mov reg_arg1, [esp + 0x4]
%endif
%if %2 >= 2
mov reg_arg2, [esp + 0x8]
%endif
%endif
core/ia32.asm
Outdated
| %ifidn __BITS__, 64 | ||
| push rbx | ||
| mov r8, reg_arg1 | ||
| mov rax, [r8 + cpuid_args._eax] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rax => eax, because cpuid_args_t::eax is declared as uint32_t.
core/ia32.asm
Outdated
| push rbx | ||
| mov r8, reg_arg1 | ||
| mov rax, [r8 + cpuid_args._eax] | ||
| mov rcx, [r8 + cpuid_args._ecx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rcx => ecx
core/ia32.asm
Outdated
| mov rax, [r8 + cpuid_args._eax] | ||
| mov rcx, [r8 + cpuid_args._ecx] | ||
| cpuid | ||
| mov [r8 + cpuid_args._eax], rax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rax => eax
core/ia32.asm
Outdated
| mov rcx, [r8 + cpuid_args._ecx] | ||
| cpuid | ||
| mov [r8 + cpuid_args._eax], rax | ||
| mov [r8 + cpuid_args._ebx], rbx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rbx => ebx
core/ia32.asm
Outdated
| .hi resd 1 | ||
| endstruc | ||
|
|
||
| struc vcpu_state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused now, so I'd remove it.
core/vmx_ops.asm
Outdated
| %ifidn __CONV__, x32_cdecl | ||
| %if %2 >= 1 | ||
| mov reg_arg1, [esp + 0x4] | ||
| %elif %2 >= 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use %elif here. I've suggested an alternative in ia32.asm.
| if (result == VMX_SUCCEED) { | ||
| cpu_data->vmm_flag |= VMXON_HAX; | ||
| } else { | ||
| bool fatal = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub doesn't allow me to comment on the unchanged lines below, but the Mac-specific code should not be excluded from the refactoring:
err & VMX_FAIL_INVALID=>result == VMX_FAIL_INVALID__vmptrst()=>paddr_t vmcs_addr; asm_vmptrst(&vmcs_addr);err=>result
core/vmx_ops.asm
Outdated
| ; push the state | ||
| push reg_arg1 | ||
| cmp reg_arg2_16, 1h | ||
| mov rax, rcx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rcx => reg_arg1
core/vmx_ops.asm
Outdated
| push eax | ||
| push ebx | ||
| ; push the state | ||
| push reg_arg1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we need to copy the state pointer to eax as the original 32-bit MASM code used to do:
mov eax, reg_arg1
push eax
Otherwise, since reg_arg1 == ecx != eax, we'll be loading ECX with the wrong data at line 317. And we can't fix that by simply replacing eax with reg_arg1 in lines 317-323, because line 317 will modify ECX and thus reg_arg1.
Yes, that's a good plan. Please feel free to incorporate the miscellaneous changes (spelling, coding style, etc.) into these three at your convenience. |
|
All done, patch history is clean now. An overview of every change is updated at #64 (comment). |
|
Thanks! Clang was able to spot a few more issues when we tried to build this PR on Mac (sorry I still don't know how to comment on unchanged code):
In addition, my colleague @wcwang ran a few tests on Windows, and some of them resulted in a host freeze. He can tell you more about that. |
|
When I launched Debian in QEMU, which guest image is debian_squeeze_i386_desktop.qcow2, the host would be frozen once the launch command was entered. No QEMU window popped up or logs output in command prompt. Similar results occurred on Android virtual device launch. You may try to reproduce this issue on Windows. |
| 43C9A9E7138DDA93000A1071 /* hax_host.h in Headers */ = {isa = PBXBuildFile; fileRef = 43C9A9E6138DDA93000A1071 /* hax_host.h */; }; | ||
| 43F857E013931E75008A93D6 /* com_intel_hax_mem.h in Headers */ = {isa = PBXBuildFile; fileRef = 43F857DE13931E75008A93D6 /* com_intel_hax_mem.h */; }; | ||
| 43F857E113931E75008A93D6 /* com_intel_hax_mem.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 43F857DF13931E75008A93D6 /* com_intel_hax_mem.cpp */; }; | ||
| 4BCC4E0513FB6729005E4BE4 /* ia32.c in Sources */ = {isa = PBXBuildFile; fileRef = 4BCC4E0213FB6729005E4BE4 /* ia32.c */; }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ia32.asm and vmx_ops.asm haven't been added to the Xcode project. We're trying to do that and see if we can reproduce the host freeze issue on Mac too.
|
I was able to fix the compiler errors and add the two new Still, when loading I looked into a few of them, and they are actually C wrappers around |
|
Thanks for the report, testing and for the macOS-specific patch. I'll update my PR, and try to find and fix the bugs found by @wcwang.
Seems like a compiler bug in MSVC, see minimal example at: https://godbolt.org/g/KDjcGJ. Other compilers (gcc/clang/icc) either throw warnings/errors for this. Even MSVC throws an error when building as C++. I've reported the issue, but let's see if they even bother with it, as they tend to heavily prioritize C++ over C.
Will do so. I'll let you know with another comment once all reported issues are fixed. |
Thanks for the information! Before they fix this bug, we'll just continue to rely on Clang to detect coding errors. Regarding the host freeze issue, I wonder how you plan to debug it and narrow it down. We'd like to help, but all we are sure about now is that the bug lies in the third/last commit, which is very big. To make debugging easier, could you consider splitting it further, e.g.:
Or if you have a better idea, let us know :-) |
|
BTW:
|
|
We just tried to only pick up ia32.asm(original ia32.asm and segments.asm) related change, qemu could boot up with guest OS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were able to fix the host freeze bug with a simple change :-) The entire PR seems to be working fine on Windows now, although we are still running more tests to confirm that...
core/vmx_ops.asm
Outdated
| endstruc | ||
|
|
||
| %macro vmx_check 0 | ||
| pushf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushf => pushfw
This turns out to be the root cause of the host freeze issue! Although the pushf mnemonic is used by the Intel SDM for the 16-bit (operand size) variant of the instruction, it is still ambiguous, and in this case gets assembled into pushfq by NASM for win64. pushfw, on the other hand, is unambiguous.
core/ia32.asm
Outdated
| ret | ||
|
|
||
| function get_kernel_rflags, 0 | ||
| pushf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushf => pushfw (see my other comment)
|
Thanks a lot for tracking the bug down. I've now applied the
Sorry I couldn't made it in time, I am mostly AFK until tomorrow. I'll deal with the linking issues mentioned at #64 (comment) then. |
raphaelning
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I couldn't made it in time
No worries. There's no real need to split the last commit anymore :-)
We tested the 32-bit code and found one issue so far, which should be easy to fix. Will let you know tomorrow if there's more.
I'll deal with the linking issues mentioned at #64 (comment) then.
Thanks, and please feel free to create a separate commit for it.
core/ia32.asm
Outdated
| %else | ||
| mov ecx, reg_arg1 | ||
| rdmsr | ||
| mov [reg_arg2 + qword_struct.lo], eax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Win32 driver causes a BSOD as soon as it's loaded, and this is the culprit: Here, reg_arg2 == edx, but EDX was just overwritten by rdmsr. So we have to save reg_arg2 to another register (e.g. EBX, which itself may need to be saved to the stack first) before calling rdmsr, and use that register as the base address here.
nit: To keep the code portable (e.g. in case we want to support other calling conventions), we could surround cdecl-specific logic with %ifidn __CONV__, x32_cdecl .. %endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could surround cdecl-specific logic with %ifidn CONV, x32_cdecl .. %endif
Or perhaps %ifidn reg_arg2, edx .. %endif. Anyway, it's not easy to achieve portability across different calling conventions, since we probably have cdecl-specific logic elsewhere in the code. So it's perfectly fine if you just explain in a comment why we must save reg_arg2.
BTW, I've noticed that line 234, which is equivalent to mov ecx, ecx, isn't optimized out by NASM, even for the Release build, and even if I manually run nasm with -Ox. This isn't a big issue at all, but do you think NASM should be smarter in recognizing this no-op? Or maybe it doesn't want to be too smart, because we could just want to waste some CPU cycles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but do you think NASM should be smarter in recognizing this no-op? Or maybe it doesn't want to be too smart, because we could just want to waste some CPU cycles?
I believe that's out-of-scope for NASM: it assembles the source as-is and only optimizes code when the instruction mnemonics are ambiguous and allow different variants. E.g. it will prioritize imm8 over imm32 when the constants is in range [0x00, 0xFF]. There's more info at:
https://www.nasm.us/doc/nasmdoc2.html#section-2.1.23
As you suggested, the only real solution is writing calling-convention specific implementations of functions that have side-effects on "specific" GPRs (just asm_rdmsr and asm_wrmsr). There might be more such pseudo-nop's out there, but I don't see them as a big issue either. Fortunately, most functions are simple enough that can be implemented just with the reg_* macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks! BTW, with asm_rdmsr fixed, the new code seems to work well on Win32, so I think we are very close to getting this PR merged!
core/ia32.asm
Outdated
| mov ecx, reg_arg1 | ||
| rdmsr | ||
| mov [reg_arg2 + qword_struct.lo], eax | ||
| mov [reg_arg2 + qword_struct.hi], edx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
9f9da56 to
62a7eb7
Compare
|
All fixed. I've implemented this in separate commits (c2af1d5 and 62a7eb7) as you requested, but it's worth considering to merge them to the 3rd commit to avoid breaking git-bisect on macOS. Also, two things worth mentioning:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this causes linking issues as both file.s, file.c generate file.o, hence that annoying meaningless suffix.
I see. Another solution is to move *.S to another directory, e.g. core/asm/, but I'm totally fine with the renaming approach.
I've removed hax_enable_irq and hax_disable_irq from macOS' hax_wrapper.cpp file in favor of the generic wrapper at ia32.c, that relies on our NASM implementations rather than macOS kernel function ml_set_interrupts_enabled. I hope this is fine, if not I'll revert this.
I'd actually prefer calling the macOS kernel function, even though it would create platform-dependent code paths (#ifdef __MACH__), because:
- I think we should only resort to assembly when the same functionality cannot be implemented in C or is not already available through a C wrapper.
ml_set_interrupts_enabled()seems to do a little more than justcli/sti(see source code here). The latest PR causes a host freeze on Mac, and I wonder if that's because of this change.
it's worth considering to merge them to the 3rd commit to avoid breaking git-bisect on macOS.
Agreed. Please do that after we get the new commits working. At the moment tests are failing on both Windows and Mac (with different symptoms), and we're trying to debug the issue.
core/ia32.c
Outdated
| extern void ASMCALL asm_enable_irq(void); | ||
| extern void ASMCALL asm_disable_irq(void); | ||
|
|
||
| extern void asm_btr(uint8 *addr, uint bit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already declared in asm.h (with ASMCALL modifier). Remove?
core/ia32.c
Outdated
| extern void ASMCALL asm_disable_irq(void); | ||
|
|
||
| extern void asm_btr(uint8 *addr, uint bit); | ||
| extern void asm_bts(uint8 *addr, uint bit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
core/ia32.c
Outdated
| extern void asm_btr(uint8 *addr, uint bit); | ||
| extern void asm_bts(uint8 *addr, uint bit); | ||
|
|
||
| extern void asm_fxinit(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already declared in asm.h (with ASMCALL modifier). Remove?
core/ia32.c
Outdated
| extern void asm_bts(uint8 *addr, uint bit); | ||
|
|
||
| extern void asm_fxinit(void); | ||
| extern void asm_fxsave(mword *addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
core/ia32.c
Outdated
|
|
||
| extern void asm_fxinit(void); | ||
| extern void asm_fxsave(mword *addr); | ||
| extern void asm_fxrstor(mword *addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
core/ia32.c
Outdated
| } | ||
|
|
||
| void _vmx_vmwrite_natural(struct vcpu_t *vcpu, const char *name, | ||
| component_index_t component, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Align with the first parameter.
core/ia32.c
Outdated
|
|
||
| void _vmx_vmwrite_natural(struct vcpu_t *vcpu, const char *name, | ||
| component_index_t component, | ||
| uint64 source_val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Same here.
core/include/vmx.h
Outdated
| uint64 vmx_vmread_64(struct vcpu_t *vcpu, component_index_t component); | ||
|
|
||
| void _vmx_vmwrite(struct vcpu_t *vcpu, const char *name, | ||
| component_index_t component, mword source_val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Insert more spaces to align with the first parameter (struct vcpu_t *).
core/include/vmx.h
Outdated
| void _vmx_vmwrite(struct vcpu_t *vcpu, const char *name, | ||
| component_index_t component, mword source_val); | ||
| void _vmx_vmwrite_natural(struct vcpu_t *vcpu, const char *name, | ||
| component_index_t component, uint64 source_val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Same here.
core/include/vmx.h
Outdated
| void _vmx_vmwrite_natural(struct vcpu_t *vcpu, const char *name, | ||
| component_index_t component, uint64 source_val); | ||
| void _vmx_vmwrite_64(struct vcpu_t *vcpu, const char *name, | ||
| component_index_t component, uint64 source_val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Same here.
|
All fixed.
I fully agree with your reasoning. I've reverted that change and used |
|
Thanks! The new code looks all good, and works well on Windows. However, most Mac tests are broken, resulting in a host kernel panic. We're still trying to root cause the issue. For now, let me post the macOS kernel panic logs, in case you can spot an obvious problem. So far we've seen two kinds of macOS kernel panic: a) b) |
|
The backtrace in crash dump b) gives us some clue. The HAXM function that shows up there is _asm_vmxrun:
00000000000045aa 9c pushfq
00000000000045ab 41 50 pushq %r8
00000000000045ad 41 51 pushq %r9
00000000000045af 41 52 pushq %r10
00000000000045b1 41 53 pushq %r11
00000000000045b3 41 54 pushq %r12
00000000000045b5 41 55 pushq %r13
00000000000045b7 41 56 pushq %r14
00000000000045b9 41 57 pushq %r15
00000000000045bb 51 pushq %rcx
00000000000045bc 52 pushq %rdx
00000000000045bd 56 pushq %rsi
00000000000045be 57 pushq %rdi
00000000000045bf 55 pushq %rbp
00000000000045c0 50 pushq %rax
00000000000045c1 53 pushq %rbx
00000000000045c2 bb 14 6c 00 00 movl $0x6c14, %ebx
00000000000045c7 48 89 e0 movq %rsp, %rax
00000000000045ca 48 83 e8 08 subq $0x8, %rax
00000000000045ce 0f 79 d8 vmwriteq %rax, %rbx
00000000000045d1 5b popq %rbx
00000000000045d2 58 popq %rax
00000000000045d3 50 pushq %rax
00000000000045d4 53 pushq %rbx
00000000000045d5 57 pushq %rdi
00000000000045d6 66 83 fe 01 cmpw $0x1, %si
00000000000045da 48 89 f8 movq %rdi, %rax
00000000000045dd 48 8b 48 08 movq 0x8(%rax), %rcx
00000000000045e1 48 8b 50 10 movq 0x10(%rax), %rdx
00000000000045e5 48 8b 58 18 movq 0x18(%rax), %rbx
00000000000045e9 48 8b 68 28 movq 0x28(%rax), %rbp
00000000000045ed 48 8b 70 30 movq 0x30(%rax), %rsi
00000000000045f1 48 8b 78 38 movq 0x38(%rax), %rdi
00000000000045f5 4c 8b 40 40 movq 0x40(%rax), %r8
00000000000045f9 4c 8b 48 48 movq 0x48(%rax), %r9
00000000000045fd 4c 8b 50 50 movq 0x50(%rax), %r10
0000000000004601 4c 8b 58 58 movq 0x58(%rax), %r11
0000000000004605 4c 8b 60 60 movq 0x60(%rax), %r12
0000000000004609 4c 8b 68 68 movq 0x68(%rax), %r13
000000000000460d 4c 8b 70 70 movq 0x70(%rax), %r14
0000000000004611 4c 8b 78 78 movq 0x78(%rax), %r15
0000000000004615 48 8b 00 movq qword_struct.lo(%rax), %rax
0000000000004618 74 05 je asm_vmxrun.resume
000000000000461a 0f 01 c2 vmlaunch
000000000000461d eb 47 jmp asm_vmxrun.exit_entry_fail
asm_vmxrun.resume:
000000000000461f 0f 01 c3 vmresume
0000000000004622 eb 42 jmp asm_vmxrun.exit_entry_fail
asm_vmxrun.exit_entry:
0000000000004624 57 pushq %rdi
0000000000004625 48 8b 7c 24 08 movq 0x8(%rsp), %rdi
000000000000462a 48 89 07 movq %rax, qword_struct.lo(%rdi)
000000000000462d 48 89 4f 08 movq %rcx, 0x8(%rdi)
0000000000004631 48 89 57 10 movq %rdx, 0x10(%rdi)
0000000000004635 59 popq %rcx
0000000000004636 48 89 5f 18 movq %rbx, 0x18(%rdi)
000000000000463a 48 89 6f 28 movq %rbp, 0x28(%rdi)
000000000000463e 48 89 77 30 movq %rsi, 0x30(%rdi)
0000000000004642 48 89 4f 38 movq %rcx, 0x38(%rdi)
0000000000004646 4c 89 47 40 movq %r8, 0x40(%rdi)
000000000000464a 4c 89 4f 48 movq %r9, 0x48(%rdi)
000000000000464e 4c 89 57 50 movq %r10, 0x50(%rdi)
0000000000004652 4c 89 5f 58 movq %r11, 0x58(%rdi)
0000000000004656 4c 89 67 60 movq %r12, 0x60(%rdi)
000000000000465a 4c 89 6f 68 movq %r13, 0x68(%rdi)
000000000000465e 4c 89 77 70 movq %r14, 0x70(%rdi)
0000000000004662 4c 89 7f 78 movq %r15, 0x78(%rdi)
asm_vmxrun.exit_entry_fail:
0000000000004666 5b popq %rbx
0000000000004667 5b popq %rbx
0000000000004668 58 popq %rax
0000000000004669 5d popq %rbp
000000000000466a 5f popq %rdi
000000000000466b 5e popq %rsi
000000000000466c 5a popq %rdx
000000000000466d 59 popq %rcx
000000000000466e 41 5f popq %r15
0000000000004670 41 5e popq %r14
0000000000004672 41 5d popq %r13
0000000000004674 41 5c popq %r12
0000000000004676 41 5b popq %r11
0000000000004678 41 5a popq %r10
000000000000467a 41 59 popq %r9
000000000000467c 41 58 popq %r8
000000000000467e 66 9c pushfw
0000000000004680 66 58 popw %ax
0000000000004682 66 a9 40 00 testw $vcpu_state._r8, %ax
0000000000004686 75 14 jne ..@31.fail_valid
0000000000004688 66 a9 01 00 testw $0x1, %ax
000000000000468c 75 07 jne ..@31.fail_invalid
000000000000468e b8 00 00 00 00 movl $qword_struct.lo, %eax
0000000000004693 eb 0c jmp ..@31.continue
..@31.fail_invalid:
0000000000004695 b8 01 00 00 00 movl $0x1, %eax
000000000000469a eb 05 jmp ..@31.continue
..@31.fail_valid:
000000000000469c b8 02 00 00 00 movl $0x2, %eax
..@31.continue:
00000000000046a1 9d popfq
00000000000046a2 c3 retq
_vmx_get_rip:
00000000000046a3 48 b8 97 47 00 00 00 00 00 00 movabsq $0x4797, %rax
00000000000046ad c3 retq
_chunk_alloc:
00000000000046ae 55 pushq %rbp
00000000000046af 48 89 e5 movq %rsp, %rbp
00000000000046b2 41 57 pushq %r15
00000000000046b4 41 56 pushq %r14
00000000000046b6 41 55 pushq %r13
00000000000046b8 41 54 pushq %r12
00000000000046ba 53 pushq %rbx
00000000000046bb 50 pushq %rax
00000000000046bc 48 89 d3 movq %rdx, %rbx
00000000000046bf 49 89 f6 movq %rsi, %r14
00000000000046c2 49 89 fc movq %rdi, %r12
00000000000046c5 48 85 db testq %rbx, %rbx
00000000000046c8 74 34 je 0x46fe
00000000000046ca 66 41 f7 c4 ff 0f testw $0xfff, %r12w
00000000000046d0 74 55 je 0x4727
00000000000046d2 48 8d 05 4b 49 02 00 leaq _default_hax_log_level(%rip), %rax
00000000000046d9 41 bf ea ff ff ff movl $0xffffffea, %r15d
00000000000046df 83 38 04 cmpl $0x4, qword_struct.lo(%rax)
00000000000046e2 0f 8f f8 00 00 00 jg 0x47e0
00000000000046e8 48 8d 3d 93 4b 01 00 leaq 0x14b93(%rip), %rdi ## literal pool for: "haxm_error: chunk_alloc: base_uva 0x%llx is not page aligned.\n"
00000000000046ef 31 c0 xorl %eax, %eax
00000000000046f1 4c 89 e6 movq %r12, %rsi
00000000000046f4 e8 00 00 00 00 callq _printf
00000000000046f9 e9 e2 00 00 00 jmp 0x47e0
00000000000046fe 48 8d 05 1f 49 02 00 leaq _default_hax_log_level(%rip), %rax
0000000000004705 41 bf ea ff ff ff movl $0xffffffea, %r15d
000000000000470b 83 38 04 cmpl $0x4, qword_struct.lo(%rax)
000000000000470e 0f 8f cc 00 00 00 jg 0x47e0
0000000000004714 48 8d 3d 3f 4b 01 00 leaq 0x14b3f(%rip), %rdi ## literal pool for: "haxm_error: chunk_alloc: chunk is NULL\n"
000000000000471b 31 c0 xorl %eax, %eax
000000000000471d e8 00 00 00 00 callq _printf
0000000000004722 e9 b9 00 00 00 jmp 0x47e0
0000000000004727 66 41 f7 c6 ff 0f testw $0xfff, %r14w
000000000000472d 74 2c je 0x475b
000000000000472f 48 8d 05 ee 48 02 00 leaq _default_hax_log_level(%rip), %rax
0000000000004736 41 bf ea ff ff ff movl $0xffffffea, %r15d
000000000000473c 83 38 04 cmpl $0x4, qword_struct.lo(%rax)
000000000000473f 0f 8f 9b 00 00 00 jg 0x47e0
0000000000004745 48 8d 3d 75 4b 01 00 leaq 0x14b75(%rip), %rdi ## literal pool for: "haxm_error: chunk_alloc: size 0x%llx is not page aligned.\n"
000000000000474c 31 c0 xorl %eax, %eax
000000000000474e 4c 89 f6 movq %r14, %rsi
0000000000004751 e8 00 00 00 00 callq _printf
0000000000004756 e9 85 00 00 00 jmp 0x47e0
000000000000475b bf 18 00 00 00 movl $vcpu_state._rbx, %edi
0000000000004760 31 f6 xorl %esi, %esi
0000000000004762 e8 16 c2 00 00 callq _hax_vmalloc
0000000000004767 49 89 c5 movq %rax, %r13
000000000000476a 4d 85 ed testq %r13, %r13
000000000000476d 74 4d je 0x47bc
000000000000476f 4d 89 65 08 movq %r12, 0x8(%r13)
0000000000004773 4d 89 75 10 movq %r14, 0x10(%r13)
0000000000004777 4c 89 e7 movq %r12, %rdi
000000000000477a 4c 89 f6 movq %r14, %rsi
000000000000477d 4c 89 ea movq %r13, %rdx
0000000000004780 e8 95 1b 01 00 callq _hax_pin_user_pages
0000000000004785 41 89 c7 movl %eax, %r15d
0000000000004788 45 85 ff testl %r15d, %r15d
000000000000478b 74 4d je 0x47da
000000000000478d 48 8d 05 90 48 02 00 leaq _default_hax_log_level(%rip), %rax
0000000000004794 83 38 04 cmpl $0x4, qword_struct.lo(%rax)
0000000000004797 7f 14 jg 0x47ad
0000000000004799 48 8d 3d 84 4b 01 00 leaq 0x14b84(%rip), %rdi ## literal pool for: "haxm_error: hax_chunk: pin user pages failed, uva: 0x%llx, size: 0x%llx.\n"
00000000000047a0 31 c0 xorl %eax, %eax
00000000000047a2 4c 89 e6 movq %r12, %rsi
00000000000047a5 4c 89 f2 movq %r14, %rdx
00000000000047a8 e8 00 00 00 00 callq _printf
00000000000047ad be 18 00 00 00 movl $vcpu_state._rbx, %esi
00000000000047b2 4c 89 ef movq %r13, %rdi
00000000000047b5 e8 74 c3 00 00 callq _hax_vfree
00000000000047ba eb 24 jmp 0x47e0
00000000000047bc 48 8d 05 61 48 02 00 leaq _default_hax_log_level(%rip), %rax
00000000000047c3 41 bf f4 ff ff ff movl $0xfffffff4, %r15d
00000000000047c9 83 38 04 cmpl $0x4, qword_struct.lo(%rax)
00000000000047cc 7f 12 jg 0x47e0
00000000000047ce 48 8d 3d 27 4b 01 00 leaq 0x14b27(%rip), %rdi ## literal pool for: "haxm_error: hax_chunk: vmalloc failed.\n"
00000000000047d5 e9 41 ff ff ff jmp 0x471b
00000000000047da 4c 89 2b movq %r13, qword_struct.lo(%rbx)
00000000000047dd 45 31 ff xorl %r15d, %r15d
00000000000047e0 44 89 f8 movl %r15d, %eax
00000000000047e3 48 83 c4 08 addq $0x8, %rsp
00000000000047e7 5b popq %rbx
00000000000047e8 41 5c popq %r12
00000000000047ea 41 5d popq %r13
00000000000047ec 41 5e popq %r14
00000000000047ee 41 5f popq %r15
00000000000047f0 5d popq %rbp
00000000000047f1 c3 retqApparently, the host RIP returned by 00000000000046a3 48 b8 97 47 00 00 00 00 00 00 movabsq $0x4797, %raxwhere But the disassembly of We'll try to manually patch the address in the KEXT binary and see if it fixes the problem. But eventually, we need to understand the interface between NASM and the Mac linker and come up with a proper fix. |
|
The manual address/offset patching method works, so this I inspected If we compare the original ( i. The old code uses the Changing |
| %endif | ||
|
|
||
| function vmx_get_rip, 0 | ||
| mov reg_ret, asm_vmxrun.exit_entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=> lea reg_ret, [rel asm_vmxrun.exit_entry]
This is enough to fix the macOS kernel panic! (A lot of credit goes to @junxiaoc )
Basically, the use of lea with the REL keyword causes the address to be computed using RIP-relative addressing. The disassembly looks like this:
0000000000004a62 48 8d 05 7a ff ff ff leaq asm_vmxrun.exit_entry(%rip), %raxAlthough rel is specific to 64-bit mode, it seems to be just ignored for the Win32 build, and this lea approach works across all platforms we support. Nevertheless, for readability (especially since mov is more intuitive than lea), we could save the mov approach for 32-bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please feel free to incorporate
haxm-PR64-mac-kernel-panic-fix.zip into your 3rd commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raphaelning @junxiaoc Thank you so much for debugging+fixing the issue, and also for your patience with this PR. :-)
I'll add the patch and squash everything into the 3rd commit (it will be big, but as mentioned earlier it's convenient to avoid breaking git-bisect).
Signed-off-by: Alexandro Sanchez Bach <[email protected]>
raphaelning
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! We're excited to land this PR :-)
|
All done! |
I just cleaned up parts of the codebase, hoping to make an improvement to overall readability. I hope it's not too biased with my own tastes. If so, please feel free to reject any change and I will rewrite the patch cherry-picking only the commits you deem acceptable.
Summary
VMX_EXIT_prefix to VMX exit reasons enum. Goal is to avoid polluting the global namespace with ambiguous names such as "CPUID_INSTRUCTION" that are confusing without context (e.g. it could refer to the "cpuid" opcode, or the "cpuid" exit reason, or anything else involving that instruction)._INSTRUCTIONsuffix from VMX exit reasons enum. Goal is to keep consistency withexit_*handlers (which don't have such such suffix). It's also less verbose this way.VECTOR_XX, to remain consistent with the Intel SDM.config_tto use designated initializersfunction_*macros.vmx_error_twithvmx_result_tenum type with VMX_SUCCEED (0), VMX_FAIL_INVALID (1), VMX_FAIL_VALID (2) values. Transforming EFLAGS into these values happens in the assembly functions via the macrovmx_check. The new enum typevmx_error_tcontains the error codes for VMX_FAIL_VALID. This change required renaming a couple of variables.vmx_error_t cpu_vmentry_failed(...)tovoid cpu_vmentry_failed(...)since the return value is useless and always ignored by the caller.load_kernel_ldttoset_kernel_ldtto remain consistent with IDT/GDT setters.asm.htovmx.h.As always, based on the changes you accept, I'll rewrite the Git history to avoid polluting the repository.