[pxe] Maintain a queue for received PXE UDP packets
Some devices return multiple packets in a single poll. Handle such
devices gracefully by enqueueing received PXE UDP packets (along with
a pseudo-header to hold the IPv4 addresses and port numbers) and
dequeueing them on subsequent calls to PXENV_UDP_READ.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tftp] Explicitly abort connection whenever parent interface is closed
Fetching the TFTP file size is currently implemented via a custom
"tftpsize://" protocol hack. Generalise this approach to instead
close the TFTP connection whenever the parent data-transfer interface
is closed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow drivers to specify a supported PCI class code. To save space in
the final binary, make this an attribute of the driver rather than an
attribute of a PCI device ID list entry.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[build] Use -malign-double to build 32-bit UEFI binaries
The EDK2 codebase uses -malign-double for 32-bit builds, which causes
64-bit integers to be naturally aligned. This affects the layout of
some structures (including EFI_BLOCK_IO_MEDIA).
This mirrors wimboot commit 7b8f39d ("[build] Fix building of 32-bit
UEFI version").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[mromprefix] Allow for .mrom images larger than 128kB
The .mrom payload has a code type of 0xff and so the initialisation
length field (single byte at offset 0x02) does not need to be
present. Use only the PCI header's image length field, which allows
the .mrom payload to be up to 32MB in size.
Inspired-by: Swift Geek <swiftgeek@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[mromprefix] Use PCI length field to obtain length of individual images
mromprefix.S currently uses the initialisation length field (single
byte at offset 0x02) to determine the length of a ROM image within a
multi-image ROM BAR. For PCI ROM images with a code type other than
0, the initialisation length field may not be present.
Fix by using the PCI header's image length field instead.
Inspired-by: Swift Geek <swiftgeek@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The build process has for a long time assumed that every ROM is a PCI
ROM, and will always include the PCI header and PCI-related
functionality (such as checking the PCI BIOS version, including the
PCI bus:dev.fn address within the ROM product name string, etc.).
While real ISA cards are no longer in use, some virtualisation
environments (notably VirtualBox) have support only for ISA ROMs.
This can cause problems: in particular, VirtualBox will call our
initialisation entry point with random garbage in %ax, which we then
treat as the PCI bus:dev.fn address of the autoboot device: this
generally prevents the default boot sequence from using any network
devices.
Create .isarom and .pcirom prefixes which can be used to explicitly
specify the type of ROM to be created. (Note that the .mrom prefix
always implies a PCI ROM, since the .mrom mechanism relies on
reconfiguring PCI BARs.)
Make .rom a magic prefix which will automatically select the
appropriate PCI or ISA ROM prefix for ROMs defined via a PCI_ROM() or
ISA_ROM() macro. To maintain backwards compatibility, we default to
building a PCI ROM for anything which is not directly derived from a
PCI_ROM() or ISA_ROM() macro (e.g. bin/intel.rom).
Add a selection of targets to "make everything" to ensure that the
(relatively obscure) ISA ROM build process is included within the
per-commit QA checks.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since some PnP BIOSes fail to set %es:di to point to the PnP signature
on entry, we identify a PnP BIOS by scanning through the top 64kB of
base memory looking for the PnP structure. We therefore don't
actually use the values of %es:di provided to the initialisation entry
point, and so there is no need to preserve them.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using version 1 grant tables limits guests to using 16TB of grantable
RAM, and prevents the use of subpage grants. Some versions of the Xen
hypervisor refuse to allow the grant table version to be set after the
first grant references have been created, so the loaded operating
system may be stuck with whatever choice we make here. We therefore
currently use version 2 grant tables, since they give the most
flexibility to the loaded OS.
Current versions (7.2.0) of the Windows PV drivers have no support for
version 2 grant tables, and will merrily create version 1 entries in
what the hypervisor believes to be a version 2 table. This causes
some confusion.
Avoid this problem by attempting to use version 1 tables, since
otherwise we may render Windows unable to boot.
Play nicely with other potential bootloaders by accepting either
version 1 or version 2 grant tables (if we are unable to set our
requested version).
Note that the use of version 1 tables on a 64-bit system introduces a
possible failure path in which a frame number cannot fit into the
32-bit field within the v1 structure. This in turn introduces
additional failure paths into netfront_transmit() and
netfront_refill_rx().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[xen] Accept alternative Xen platform PCI device ID 5853:0002
At some point during XenServer development history, the Windows PV
drivers changed to using a PCI device ID of 5853:0002 rather than
5853:0001. Current (7.2.0) drivers will bind to either 5853:0001 or
5853:0002, and the general approach taken by the world at large
(including Amazon EC2) seems to be to use only 5853:0001.
However, the current version of XenServer (6.2.0) will create the
platform device as 5853:0002 (via the platform:device_id VM parameter)
for any VMs created using the built-in templates for Windows Vista or
later.
Accept either PCI ID, since the underlying device is identical.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[efi] Default to releasing network devices for use via SNP
We currently treat network devices as available for use via the SNP
API only if RX queue processing has been frozen. (This is similar in
spirit to the way that RX queue processing is frozen for the network
device currently exposed via the PXE API.)
The default state of a freshly created network device is for the RX
queue to not be frozen, and thus to be unavailable for use via SNP.
This causes problems when devices are added through code paths other
than _efidrv_start() (which explicitly releases devices for use via
SNP).
We don't actually need to freeze RX queue processing, since calls via
the SNP API will always use netdev_poll() rather than net_poll(), and
so will never trigger the RX queue processing code path anyway.
We can therefore simplify the code to use a single global flag to
indicate whether network devices are claimed for use by iPXE or
available for use via SNP. Using a global flag allows the default
state for dynamically created network devices to behave sensibly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for Xen PV-HVM domains (detected via the Xen
platform PCI device with IDs 5853:0001), including support for
accessing configuration via XenStore and enumerating devices via
XenBus.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ioapi] Fail ioremap() when attempting to map a zero bus address
When a 32-bit iPXE binary is running on a system which allocates PCI
memory BARs above 4GB, our PCI subsystem will return the base address
for any such BARs as zero (with a warning message if DEBUG=pci is
enabled). Currently, ioremap() will happily map an address pointing
to the start of physical memory, providing no sensible indication of
failure.
Fix by always returning NULL if we are asked to ioremap() a zero bus
address.
With a totally flat memory model (e.g. under EFI), this provides an
accurate failure indication since no PCI peripheral will be mapped to
the zero bus address.
With the librm memory model, there is the possibility of a spurious
NULL return from ioremap() if the bus address happens to be equal to
virt_offset. Under the current virtual memory map, the NULL virtual
address will always be the start of .textdata, and so this problem
cannot occur; a NULL return from ioremap() will always be an accurate
failure indication.
Debugged-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a single instance of EFI_DRIVER_BINDING_PROTOCOL (attached to
our image handle); this matches the expectations scattered throughout
the EFI specification.
Open the underlying hardware device using EFI_OPEN_PROTOCOL_BY_DRIVER
and EFI_OPEN_PROTOCOL_EXCLUSIVE, to prevent other drivers from
attaching to the same device.
Do not automatically connect to devices when being loaded as a driver;
leave this task to the platform firmware (or to the user, if loading
directly from the EFI shell).
When running as an application, forcibly disconnect any existing
drivers from devices that we want to control, and reconnect them on
exit.
Provide a meaningful driver version number (based on the build
timestamp), to allow platform firmware to automatically load newer
versions of iPXE drivers if multiple drivers are present.
Include device paths within debug messages where possible, to aid in
debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Allow for the PIC interrupt vector offset to be changed
Some external code (observed with FreeBSD's bootloader) will continue
to make INT 13 calls after reconfiguring the 8259 PIC to change the
vector offsets for IRQs. If an IRQ (e.g. the timer IRQ) subsequently
occurs while iPXE is in protected mode, this will cause a general
protection fault since the corresponding IDT entry is empty.
A general protection fault is INT 0x0d, which happens to overlap with
the original IRQ5. We therefore do have an ISR set up to handle a
general protection fault, but this ISR simply reflects the interrupt
down to the real-mode INT 0x0d and then attempts to return. Since our
ISR is expecting a hardware interrupt rather than a general protection
fault, it doesn't remove the error code from the stack before issuing
the iret instruction; it therefore attempts to return to a garbage
address. Since the segment part of this address is likely to be
invalid, a second general protection fault occurs. This cycle
continues until we run out of stack space and triple fault.
Fix by reflecting all INTs down to real mode. This actually reduces
the code size by four bytes (but increases the bss size by almost
2kB).
Reported-by: Brian Rak <dn@devicenull.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[lkrnprefix] Make real-mode setup code relocatable
The bzImage boot protocol allows the real-mode code to be loaded at
any segment within base memory. (The fact that both iPXE and recent
versions of Syslinux will load the real-mode code at 1000:0000 is a
coincidence; it is not guaranteed by the specification.)
Fix by making the code relocatable.
Reported-by: Andrew Stuart <andrew@shopcusa.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rework geniso and genliso to provide a single merged utility for
generating ISO images.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The .lkrn prefix currently provides a zImage kernel with unused setup
sectors and the whole iPXE binary placed within the "protected mode
kernel" portion of the zImage.
The work carried out years ago to create the .mrom format provides a
mechanism allowing the iPXE binary to be split into a small real-mode
header and a larger payload. This neatly matches the way that a
bzImage is loaded: the "setup sectors" can contain the header and the
"protected mode kernel" can contain the payload.
This removes the size restrictions on an iPXE .lkrn image (and hence
on derived image formats such as .iso).
Also remove obsolete copyright information, since none of the original
code or functionality now remains.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[profile] Allow interrupts to be excluded from profiling results
Interrupt processing adds noise to profiling results. Allow
interrupts (from within protected mode) to be profiled separately,
with time spent within the interrupt handler being excluded from any
other profiling currently in progress.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[undi] Place an upper limit on the number of PXENV_UNDI_ISR calls per poll
PXENV_UNDI_ISR calls may implicitly refill the underlying receive
ring, and so could continue to retrieve packets indefinitely. Place
an upper limit on the number of calls to PXENV_UNDI_ISR per call to
undinet_poll().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When making a call from real mode to protected mode, we save and
restore the global and interrupt descriptor table registers. The
restore currently takes place after returning to real mode, which
generates two EXCEPTION_NMIs and corresponding VM exits when running
under KVM on an Intel CPU.
Avoid the VM exits by restoring the descriptor table registers inside
prot_to_real, while still running in protected mode.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Speed up real-to-protected mode transition under KVM
Ensure that all segment registers have zero in the low two bits before
transitioning to protected mode. This allows the CPU state to
immediately be deemed to be "valid", and eliminates the need for any
further emulated instructions.
Load the protected-mode interrupt descriptor table after switching to
protected mode, since this avoids triggering an EXCEPTION_NMI and
corresponding VM exit.
This reduces the time taken by real_to_prot under KVM by around 50%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Speed up protected-to-real mode transition under KVM
On an Intel CPU supporting VMX, KVM will emulate instructions while
the CPU state remains "invalid". In real mode, the CPU state is
defined to be "invalid" if any segment register has a base which is
not equal to (sreg<<4) or a limit which is not equal to 64kB.
We don't actually use the base stored in the REAL_DS descriptor for
any significant purpose. Change the base stored in this descriptor to
be equal to (REAL_DS<<4). A segment register loaded with REAL_DS is
then automatically valid in both real and protected modes. This
allows KVM to stop emulating instructions much sooner.
The only use of REAL_DS for memory accesses currently occurs in the
indirect ljmp within prot_to_real. Change this to a direct ljmp,
storing rm_cs in .text16 as part of the ljmp instruction. This
removes the only memory access via REAL_DS (thereby allowing for the
above descriptor base address hack), and also simplifies the ljmp
instruction (which will still have to be emulated).
Load the real-mode interrupt descriptor table register before
switching to real mode, since this avoids triggering an EXCEPTION_NMI
and corresponding VM exit.
This reduces the time taken by prot_to_real under KVM by around 65%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>