[librm] Speed up protected-to-real mode transition under KVM
On an Intel CPU supporting VMX, KVM will emulate instructions while
the CPU state remains "invalid". In real mode, the CPU state is
defined to be "invalid" if any segment register has a base which is
not equal to (sreg<<4) or a limit which is not equal to 64kB.
We don't actually use the base stored in the REAL_DS descriptor for
any significant purpose. Change the base stored in this descriptor to
be equal to (REAL_DS<<4). A segment register loaded with REAL_DS is
then automatically valid in both real and protected modes. This
allows KVM to stop emulating instructions much sooner.
The only use of REAL_DS for memory accesses currently occurs in the
indirect ljmp within prot_to_real. Change this to a direct ljmp,
storing rm_cs in .text16 as part of the ljmp instruction. This
removes the only memory access via REAL_DS (thereby allowing for the
above descriptor base address hack), and also simplifies the ljmp
instruction (which will still have to be emulated).
Load the real-mode interrupt descriptor table register before
switching to real mode, since this avoids triggering an EXCEPTION_NMI
and corresponding VM exit.
This reduces the time taken by prot_to_real under KVM by around 65%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We now have the ability to handle interrupts while in protected mode,
and so no longer need to set up a dedicated interrupt descriptor table
while running COM32 executables.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running in a virtual machine, switching to real mode may be
expensive. Allow interrupts to be enabled while in protected mode and
reflected down to the real-mode interrupt handlers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[build] Fix __libgcc attribute for recent gcc versions
We observed some time ago (in commit 4ce8d61 "Import various libgcc
functions from syslinux") that gcc seems to treat calls to the
implicit arithmetic functions (e.g. __udivdi3()) as being affected by
-mregparm but unaffected by -mrtd.
This seems to be no longer the case with current gcc versions, which
treat calls to these functions as being affected by both -mregparm and
-mrtd, as expected.
There is nothing obvious in the gcc changelogs to indicate precisely
when this happened. From experimentation with available gcc versions,
the change occurred sometime between v4.6.3 and v4.7.2. We assume
that only versions up to v4.6.x require the special treatment.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The VESA frame buffer console uses the VESA BIOS extensions (VBE) to
enumerate video modes, selects an appropriate mode, and then hands off
to the generic frame buffer code.
The font is extracted from the VGA BIOS, avoiding the need to provide
an external font file.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of Linux apparently complain if initrds are not aligned
to a page boundary. Fix by changing INITRD_ALIGN from 4 bytes to 4096
bytes.
The amount of padding at the end of each initrd will now often be
sufficient to allow the cpio header to be prepended without crossing
an alignment boundary. The final location of the initrd may therefore
end up being slightly higher than the post-shuffle location.
bzimage_load_initrd() must therefore now copy the initrd body prior to
copying the cpio header, otherwise the start of the initrd body may be
overwritten by the cpio header. (Note that the guarantee that an
initrd will never need to overwrite an initrd at a higher location
still holds, since the overall length of each initrd cannot decrease
as a result of adding a cpio header.)
Reported-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[libc] Redefine low 8 bits of error code as "platform error code"
The low 8 bits of an iPXE error code are currently defined as the
closest equivalent PXE error code. Generalise this scheme to
platforms other than PC-BIOS by extending this definition to "closest
equivalent platform error code". This allows for the possibility of
returning meaningful errors via EFI APIs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Abstract out the ability to reboot the system to a separate reboot()
function (with platform-specific implementations), add an EFI
implementation, and make the existing "reboot" command available under
EFI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[vmware] Allow settings to be specified in the VMware .vmx file
Allow iPXE settings to be specified in the .vmx file via the VMware
GuestInfo mechanism. For example:
guestinfo.ipxe.filename = "http://boot.ipxe.org/demo/boot.php"
guestinfo.ipxe.dns = "192.168.0.1"
guestinfo.ipxe.net0.ip = "192.168.0.15"
guestinfo.ipxe.net0.netmask = "255.255.255.0"
guestinfo.ipxe.net0.gateway = "192.168.0.1"
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[i386] Use memory address constraints in __bswap_16s() and __bswap_64s()
Minimise code size by forcing the use of memory addresses for
__bswap_16s() and __bswap_64s(). (__bswap_32s() cannot avoid loading the
value into a register.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix a strict-aliasing error on certain versions of gcc.
Reported-by: Marko Myllynen <myllynen@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[i386] Optimise byte-swapping functions and provide __bswap_{16,32,64}s()
Use the "bswap" instruction to shrink the size of byte-swapping code,
and provide the in-place variants __bswap_{16,32,64}s.
"bswap" is available only on 486 and later processors. (We already
assume the presence of "cpuid" and "rdtsc", which are available only
on Pentium and later processors.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RTC-based entropy source uses the nanosecond-scale CPU TSC to
measure the time between two 1kHz interrupts generated by the CMOS
RTC. In a physical machine these clocks are driven from independent
crystals, resulting in some observable clock drift. In a virtual
machine, the CMOS RTC is typically emulated using host-OS
constructions such as SIGALRM.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[rng] Add ANS X9.82 Approved Source of Entropy Input
ANS X9.82 specifies several Approved Sources of Entropy Input (SEI).
One such SEI uses an entropy source as the Source of Entropy Input,
condensing each entropy source output after each GetEntropy call.
This can be implemented relatively cheaply in iPXE and avoids the need
to allocate potentially very large buffers.
(Note that the terms "entropy source" and "Source of Entropy Input"
are not synonyms within the context of ANS X9.82.)
Use the iPXE API mechanism to allow entropy sources to be selected at
compilation time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[pxe] Provide PXENV_FILE_EXIT_HOOK only for ipxelinux.0 builds
PXENV_FILE_EXIT_HOOK is designed to allow ipxelinux.0 to unload both
the iPXE and pxelinux components without affecting the underlying PXE
stack. Unfortunately, it causes unexpected behaviour in other
situations, such as when loading a non-embedded pxelinux.0 via
undionly.kpxe. For example:
PXE ROM -> undionly.kpxe -> pxelinux.0 -> chain.c32 to boot hd0
would cause control to return to iPXE instead of booting from the hard
disk. In some cases, this would result in a harmless but confusing
"No more network devices" message; in other cases stranger things
would happen, such as being returned to the iPXE shell prompt.
The fundamental problem is that when pxelinux detects
PXENV_FILE_EXIT_HOOK, it may attempt to specify an exit hook and then
exit back to iPXE, assuming that iPXE will in turn exit cleanly via
the specified exit hook. This is not a valid assumption in the
general case, since the action of exiting back to iPXE does not
directly cause iPXE to exit itself. (In the specific case of
ipxelinux.0, this will work since the embedded script exits as soon as
pxelinux.0 exits.)
Fix the unexpected behaviour in the non-ipxelinux.0 cases by including
support for PXENV_FILE_EXIT_HOOK only when using a new .kkkpxe format.
The ipxelinux.0 build process should therefore now use undionly.kkkpxe
instead of undionly.kkpxe.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow an initrd (such as an embedded script) to be passed to iPXE when
loaded as a .lkrn (or .iso) image. This allows an embedded script to
be varied without recompiling iPXE.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the multiple-SAN-drive capability of the iPXE core via the iPXE
command line by adding commands to hook and unhook additional drives.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[int13] Include disk signature in debugging output
The disk signature is used by some OSes (notably Windows) to identify
the boot disk, so it's useful debugging information to have.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[int13] Add infrastructure to support EDD version 4.0
Support the extensions mandated by EDD 4.0, including:
o the ability to specify a flat physical address in a disk address
packet,
o the ability to specify a sector count greater than 127 in a disk
address packet,
o support for all functions within the Fixed Disk Access and EDD
Support subsets,
o the ability to describe a device using EDD Device Path Information.
This implementation is based on draft revision 3 of the EDD 4.0
specification, with reference to the EDD 3.0 specification. It is
possible that this implementation may need to change in order to
conform to the final published EDD 4.0 specification.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[block] Replace gPXE block-device API with an iPXE asynchronous interface
The block device interface used in gPXE predates the invention of even
the old gPXE data-transfer interface, let alone the current iPXE
generic asynchronous interface mechanism. Bring this old code up to
date, with the following benefits:
o Block device commands can be cancelled by the requestor. The INT 13
layer uses this to provide a global timeout on all INT 13 calls,
with the result that an unexpected passive failure mode (such as
an iSCSI target ACKing the request but never sending a response)
will lead to a timeout that gets reported back to the INT 13 user,
rather than simply freezing the system.
o INT 13,00 (reset drive) is now able to reset the underlying block
device. INT 13 users, such as DOS, that use INT 13,00 as a method
for error recovery now have a chance of recovering.
o All block device commands are tagged, with a numerical tag that
will show up in debugging output and in packet captures; this will
allow easier interpretation of bug reports that include both
sources of information.
o The extremely ugly hacks used to generate the boot firmware tables
have been eradicated and replaced with a generic acpi_describe()
method (exploiting the ability of iPXE interfaces to pass through
methods to an underlying interface). The ACPI tables are now
built in a shared data block within .bss16, rather than each
requiring dedicated space in .data16.
o The architecture-independent concept of a SAN device has been
exposed to the iPXE core through the sanboot API, which provides
calls to hook, unhook, boot, and describe SAN devices. This
allows for much more flexible usage patterns (such as hooking an
empty SAN device and then running an OS installer via TFTP).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
COM32 binaries generally expect to run with interrupts
enabled. Syslinux does so, and COM32 programs will execute cli/sti
pairs when running a critical section, to provide mutual exclusion
against BIOS interrupt handlers. Previously, under iPXE, the IDT was
not valid, so any interrupt (e.g. a timer tick) would generally cause
the machine to triple fault.
This change introduces code to:
- Create a valid IDT at the same location that syslinux uses
- Create an "interrupt jump buffer", which contains small pieces of
code that simply record the vector number and jump to a common
handler
- Thunk down to real mode and execute the BIOS's interrupt handler
whenever an interrupt is received in a COM32 program
- Switch IDTs and enable/disable interrupts when context switching to
and from COM32 binaries
Testing done:
- Booted VMware ESX using a COM32 multiboot loader (mboot.c32)
- Built with GDBSERIAL enabled, and tested breakpoints on int22 and
com32_irq
- Put the following code in a COM32 program:
asm volatile ( "sti" );
while ( 1 );
Before this change, the machine would triple fault
immediately. After this change, it hangs as expected. Under Bochs,
it is possible to see the interrupt handler run, and the current
time in the BIOS data area gets incremented.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most of iPXE uses __attribute__((packed)) anyway, and PACKED conflicts
with an identically-named macro in the upstream EFI header files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>