The .lkrn prefix currently provides a zImage kernel with unused setup
sectors and the whole iPXE binary placed within the "protected mode
kernel" portion of the zImage.
The work carried out years ago to create the .mrom format provides a
mechanism allowing the iPXE binary to be split into a small real-mode
header and a larger payload. This neatly matches the way that a
bzImage is loaded: the "setup sectors" can contain the header and the
"protected mode kernel" can contain the payload.
This removes the size restrictions on an iPXE .lkrn image (and hence
on derived image formats such as .iso).
Also remove obsolete copyright information, since none of the original
code or functionality now remains.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcp] Defer sending ACKs until all received packets have been processed
When running inside a virtual machine (or when using the UNDI driver),
transmitting packets can be expensive. When we receive several
packets in one poll (e.g. because a slow BIOS timer interrupt routine
has caused us to fall behind in processing), we can safely send just a
single ACK to cover all of the received packets. This reduces the
time spent transmitting and allows us to clear the backlog much
faster.
Various RFCs (starting with RFC1122) state that there should be an ACK
for at least every second segment. We choose not to enforce this
rule. Under normal operation each poll should find at most one
received packet, and we will then not delay any ACKs. We delay
(i.e. omit) ACKs only when under sufficiently heavy load that we are
finding multiple packets per poll; under these conditions it is
important to clear the backlog quickly since any delay may lead to
dropped packets.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 8540300 ("[build] Disable ccache for all relevant build
targets") attempted to generalise the rule for $(BIN)/version.o to
$(BIN)/version.% in order to apply the dependency to all relevant
build targets (debug objects, assembly listings, etc).
This generalisation appears to work for the ccache override
directives, but seems to cause make (at least, GNU make 4.0) to simply
ignore the dependency upon the git index.
Since version.c contains only some string constants, there is unlikely
to be a substantive need for its debug objects, assembly listings,
etc. Restore the previous form of the dependency and accept that
hypothetical builds with e.g. DEBUG=version will not be handled
correctly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Exclude time spent in hypervisor from profiling
When profiling, exclude any time spent inside the hypervisor
responding to our MMIO accesses. This substantially reduces the
variance accumulated on many other profilers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[profile] Allow interrupts to be excluded from profiling results
Interrupt processing adds noise to profiling results. Allow
interrupts (from within protected mode) to be profiled separately,
with time spent within the interrupt handler being excluded from any
other profiling currently in progress.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[undi] Place an upper limit on the number of PXENV_UNDI_ISR calls per poll
PXENV_UNDI_ISR calls may implicitly refill the underlying receive
ring, and so could continue to retrieve packets indefinitely. Place
an upper limit on the number of calls to PXENV_UNDI_ISR per call to
undinet_poll().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When making a call from real mode to protected mode, we save and
restore the global and interrupt descriptor table registers. The
restore currently takes place after returning to real mode, which
generates two EXCEPTION_NMIs and corresponding VM exits when running
under KVM on an Intel CPU.
Avoid the VM exits by restoring the descriptor table registers inside
prot_to_real, while still running in protected mode.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Speed up real-to-protected mode transition under KVM
Ensure that all segment registers have zero in the low two bits before
transitioning to protected mode. This allows the CPU state to
immediately be deemed to be "valid", and eliminates the need for any
further emulated instructions.
Load the protected-mode interrupt descriptor table after switching to
protected mode, since this avoids triggering an EXCEPTION_NMI and
corresponding VM exit.
This reduces the time taken by real_to_prot under KVM by around 50%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Speed up protected-to-real mode transition under KVM
On an Intel CPU supporting VMX, KVM will emulate instructions while
the CPU state remains "invalid". In real mode, the CPU state is
defined to be "invalid" if any segment register has a base which is
not equal to (sreg<<4) or a limit which is not equal to 64kB.
We don't actually use the base stored in the REAL_DS descriptor for
any significant purpose. Change the base stored in this descriptor to
be equal to (REAL_DS<<4). A segment register loaded with REAL_DS is
then automatically valid in both real and protected modes. This
allows KVM to stop emulating instructions much sooner.
The only use of REAL_DS for memory accesses currently occurs in the
indirect ljmp within prot_to_real. Change this to a direct ljmp,
storing rm_cs in .text16 as part of the ljmp instruction. This
removes the only memory access via REAL_DS (thereby allowing for the
above descriptor base address hack), and also simplifies the ljmp
instruction (which will still have to be emulated).
Load the real-mode interrupt descriptor table register before
switching to real mode, since this avoids triggering an EXCEPTION_NMI
and corresponding VM exit.
This reduces the time taken by prot_to_real under KVM by around 65%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The mode-transition code involves paths which switch back and forth
between the .text and .text16 sections. At present, only the start of
each function is labelled, which makes it difficult to decode
addresses within the parts of the function existing in a different
section.
Add explicit labels at the start of each section change, so that
addresses can be meaningfully decoded to the nearest label.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[pcbios] Do not switch to real mode to sleep the CPU
Now that we can handle interrupts while in protected mode, there is no
need to switch to real mode just to halt the CPU.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[pcbios] Do not switch to real mode to check for timer interrupt
The currticks() function is called at least once per TCP packet, and
so is performance-critical. Switching to real mode just to allow the
timer interrupt to fire is expensive when running inside a virtual
machine, and imposes a significant performance cost.
Fix by enabling interrupts without switching to real mode. This
results in an approximately 100% increase in download speed when
running under KVM.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We now have the ability to handle interrupts while in protected mode,
and so no longer need to set up a dedicated interrupt descriptor table
while running COM32 executables.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running in a virtual machine, switching to real mode may be
expensive. Allow interrupts to be enabled while in protected mode and
reflected down to the real-mode interrupt handlers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for an explicit debug level of zero, which will enable
assertions and profiling (i.e. anything controlled by NDEBUG) without
generating any debug messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Use genuine real mode to accelerate operation in virtual machines
We currently use flat real mode wherever real mode is required. This
guarantees that we will not surprise some unsuspecting external caller
which has carefully set up flat real mode by suddenly reducing the
segment limits to 64kB.
However, operating in flat real mode imposes a severe performance
penalty in some virtualisation environments, since some CPUs cannot
fully virtualise flat real mode and so the hypervisor must fall back
to emulation. In particular, operating under KVM on a pre-Westmere
Intel CPU will be at least an order of magnitude slower, to the point
that there is a visible teletype effect when printing anything to the
BIOS console. (Older versions of KVM used to cheat and ignore the
"flat" part of flat real mode, which masked the problem.)
Switch (back) to using genuine real mode with 64kB segment limits
instead of flat real mode. Hopefully this won't break anything.
Add an explicit switch to flat real mode before returning to the BIOS
from the ROM prefix, since we know that a PMM BIOS will call the ROM
initialisation point (and potentially the BEV) in flat real mode.
As noted in previous commit messages, it is not possible to restore
the real-mode segment limits after a transition to protected mode,
since there is no way to know which protected-mode segment descriptor
was originally used to initialise the limit portion of the segment
register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Inside a virtual machine, writing the RX ring tail pointer may incur a
substantial overhead of processing inside the hypervisor. Minimise
this overhead by writing the tail pointer once per batch of
descriptors, rather than once per descriptor.
Profiling under qemu-kvm (version 1.6.2) shows that this reduces the
amount of time taken to refill the RX descriptor ring by around 90%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Operations which are negligible on physical hardware (such as issuing
a posted write to the transmit ring tail register) may involve
substantial amounts of processing within the hypervisor if running in
a virtual machine.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[test] Check for correct -mrtd assumption on libgcc arithmetic functions
As observed in commit 082cedb ("[build] Fix __libgcc attribute for
recent gcc versions"), recent versions of gcc have changed the
semantics of -mrtd as applied to the implicit arithmetic functions.
It is possible for tests to succeed even if our assumptions about
gcc's interpretation of -mrtd are incorrect. In particular, if gcc
chooses to utilise a frame pointer in the calling function, then it
can tolerate a temporarily incorrect stack pointer (since the stack
pointer will shortly afterwards be restored from the frame pointer
anyway).
Add tests designed specifically to check that our implementations of
the implicit arithmetic functions manipulate the stack pointer as
expected by gcc.
The effect of these tests can be observed by temporarily reverting
commit 082cedb ("[build] Fix __libgcc attribute for recent gcc
versions"): without this fix in place, the tests will fail on gcc 4.7
and later.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[build] Fix __libgcc attribute for recent gcc versions
We observed some time ago (in commit 4ce8d61 "Import various libgcc
functions from syslinux") that gcc seems to treat calls to the
implicit arithmetic functions (e.g. __udivdi3()) as being affected by
-mregparm but unaffected by -mrtd.
This seems to be no longer the case with current gcc versions, which
treat calls to these functions as being affected by both -mregparm and
-mrtd, as expected.
There is nothing obvious in the gcc changelogs to indicate precisely
when this happened. From experimentation with available gcc versions,
the change occurred sometime between v4.6.3 and v4.7.2. We assume
that only versions up to v4.6.x require the special treatment.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On a 32-bit system, 64-bit division is implemented using the libgcc
functions provided in __udivmoddi4.c etc. Calls to these functions
are generated automatically by gcc, with a calling convention that is
somewhat empirical in nature. Add these self-tests primarily as a
check that we are using the correct calling convention.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Escape sequences received via the serial console can fail since the
cpu_nap() in getchar_timeout() can delay processing for more than the
time it takes for a single character to arrive.
Fix by enabling the UART FIFOs.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Avoid completely filling the TX descriptor ring
It is unclear from the datasheets whether or not the TX ring can be
completely filled (i.e. whether writing the tail value as equal to the
current head value will cause the ring to be treated as completely
full or completely empty). It is very plausible that this edge case
could differ in behaviour between real hardware and the many
implementations of an emulated Intel NIC found in various virtual
machines. Err on the side of caution and always leave at least one
ring entry empty.
Signed-off-by: Michael Brown <mcb30@ipxe.org>