Experimentation reveals that gcc ignores -mrtd for the implicit
arithmetic functions (e.g. __udivdi3), but not for the implicit
memcpy() and memset() functions. Mark the implicit arithmetic
functions with __attribute__((cdecl)) to compensate for this.
(Note: we cannot mark them with __cdecl, because we define __cdecl to
incorporate regparm(0) as well.)
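
A minimal sketch of the kind of marking described above. The names
LIBGCC_CDECL and sketch_udivdi3 are hypothetical and are not taken from
the source tree. The cdecl attribute only has meaning on 32-bit x86, so
the sketch compiles it away on other targets:

```c
#include <stdint.h>

/* Hypothetical macro: force the cdecl convention on 32-bit x86, where
 * -mrtd would otherwise make the callee pop its own arguments. On
 * other targets the attribute does not apply, so expand to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#define LIBGCC_CDECL __attribute__((cdecl))
#else
#define LIBGCC_CDECL
#endif

/* Hypothetical stand-in for an implicit arithmetic helper such as
 * __udivdi3 (unsigned 64-bit division). */
LIBGCC_CDECL uint64_t sketch_udivdi3(uint64_t num, uint64_t den);

LIBGCC_CDECL uint64_t sketch_udivdi3(uint64_t num, uint64_t den)
{
	/* Assumption for this sketch only: return 0 on division by zero
	 * rather than trapping. */
	return den ? (num / den) : 0;
}
```

Because the attribute is attached to the declaration, callers compiled
with -mrtd should still clean up the stack themselves for this one
function.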
Don't trash the %ecx value returned by relocate(). This was causing
us to round down the size for the relocation copy to the nearest 64kB
(+0x10 bytes); this just happened to work on most machines because the
last 64kB of the image is all-zeroes anyway (it's the .bss).
Add __bss16() macro, and allow use of .bss16 section by removing
link-time check for section overlaps. (In order to avoid wasting
space in the executable image, .bss16 will overlap with the following
section, which is .text).
Use fast in-situ test for gate A20 being set, to cut down on the
number of (potentially very slow) gateA20_set operations.
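
The idea behind an in-situ test can be sketched as a user-space
simulation: with gate A20 disabled, addresses that differ only in bit
20 alias the same byte, so writing through one address and reading
through the other reveals the gate state. Everything here (sim_mem,
a20_is_set, ensure_a20, the probe address 0x500) is an assumption made
for illustration, not the actual real-mode code:

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_MEM_SIZE	(2u * 1024 * 1024)	/* 2 MB of simulated RAM */
#define A20_BIT		(1u << 20)

static uint8_t sim_mem[SIM_MEM_SIZE];
bool sim_a20_enabled;			/* simulated gate state */
unsigned int sim_slow_set_calls;	/* counts "slow" enable operations */

/* With A20 disabled, address bit 20 is forced to zero. */
static uint32_t sim_translate(uint32_t addr)
{
	if (!sim_a20_enabled)
		addr &= ~A20_BIT;
	return addr % SIM_MEM_SIZE;
}

static uint8_t sim_read(uint32_t addr)
{
	return sim_mem[sim_translate(addr)];
}

static void sim_write(uint32_t addr, uint8_t value)
{
	sim_mem[sim_translate(addr)] = value;
}

/* Fast test: make the low byte differ from the high byte, then see
 * whether the two addresses still read back the same value. */
bool a20_is_set(uint32_t probe)
{
	uint32_t low = probe & ~A20_BIT;
	uint32_t high = low | A20_BIT;
	uint8_t saved = sim_read(low);
	bool enabled;

	sim_write(low, (uint8_t)~sim_read(high));
	enabled = (sim_read(low) != sim_read(high));
	sim_write(low, saved);	/* restore the original byte */
	return enabled;
}

/* Stand-in for the (potentially very slow) enable operation. */
static void sim_slow_gateA20_set(void)
{
	sim_slow_set_calls++;
	sim_a20_enabled = true;
}

/* Only pay for the slow path when the fast test says it is needed. */
void ensure_a20(void)
{
	if (!a20_is_set(0x500))
		sim_slow_gateA20_set();
}
```

Calling ensure_a20() repeatedly in this simulation performs the slow
operation only once, which is the saving the fast test is meant to buy.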
Die with a fatal error if we are unable to set gate A20; if this fails
then we are bound to experience memory corruption at a later stage,
and I'd prefer to pick it up early.
Improve error reporting for strange length combinations reported by
the UNDI stack.
Ignore obviously invalid length combinations (as returned by
e.g. VMware's PXE stack).
Limit to one packet per poll to avoid memory exhaustion.
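
A rough sketch of how the length filtering and the one-packet-per-poll
limit could fit together. The structure, the function names and the
specific validity rules are all assumptions made for illustration; the
checks actually applied to the UNDI lengths are not spelled out here:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the lengths reported for one received frame. */
struct sketch_rx_lengths {
	uint16_t buffer_len;	/* bytes in the current buffer */
	uint16_t frame_len;	/* total bytes in the frame */
	uint16_t header_len;	/* bytes of link-layer header */
};

/* Hypothetical queue of frames waiting to be polled. */
struct sketch_rx_queue {
	const struct sketch_rx_lengths *entries;
	size_t count;
	size_t next;
};

/* Assumed rules: a frame must be non-empty, and neither the buffer nor
 * the header may claim to be longer than the whole frame. */
bool sketch_lengths_valid(const struct sketch_rx_lengths *len)
{
	if (len->frame_len == 0)
		return false;
	if (len->buffer_len > len->frame_len)
		return false;
	if (len->header_len > len->frame_len)
		return false;
	return true;
}

/* Consume at most one queue entry per call, so a burst of traffic
 * cannot allocate an unbounded number of buffers in a single poll.
 * Returns true only if that entry passed the length checks. */
bool sketch_poll_once(struct sketch_rx_queue *queue)
{
	const struct sketch_rx_lengths *len;

	if (queue->next >= queue->count)
		return false;	/* nothing pending */
	len = &queue->entries[queue->next++];
	return sketch_lengths_valid(len);
}
```

An entry that fails the checks is still consumed, so a stack that keeps
reporting nonsense lengths cannot stall the queue.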
Set up %ds *before* testing a value in our data segment (d'oh!).
Always send EOI; do not chain to BIOS's default interrupt handler.
BIOS handlers are just too unpredictable; at least VMware's seems to kill the
machine if you go anywhere near it.
Disable interrupts after return from PXENV_UNDI_ISR, just in case some
dumb PXE stack enables them.