[librm] Add support for running in 64-bit long mode
Add support for running the BIOS version of iPXE in 64-bit long mode.
A 64-bit BIOS version of iPXE can be built using e.g.
  make bin-x86_64-pcbios/ipxe.usb
  make bin-x86_64-pcbios/8086100e.mrom
The 64-bit BIOS version should appear to function identically to the
normal 32-bit BIOS version. The physical memory layout is unaltered:
iPXE is still relocated to the top of the available 32-bit address
space. The code is linked to a virtual address of 0xffffffffeb000000
(in the negative 2GB as required by -mcmodel=kernel), with 4kB pages
created to cover the whole of .textdata. 2MB pages are created to
cover the whole of the 32-bit address space.
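As a rough illustration of the 2MB identity map described above, the sketch below fills a flat array of page directory entries covering the 32-bit address space. It is not the code from librm: the sketch_* names are hypothetical, and the flag choice (present, writable, page size) is an assumption based on the standard x86-64 page directory entry layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard x86-64 page directory entry flag bits */
#define SKETCH_PAGE_P	0x01ULL		/* Present */
#define SKETCH_PAGE_RW	0x02ULL		/* Writable */
#define SKETCH_PAGE_PS	0x80ULL		/* Page size (2MB page) */

/* Size of a 2MB page */
#define SKETCH_2MB ( 1ULL << 21 )

/* Number of 2MB pages needed to cover 4GB (i.e. 2048) */
#define SKETCH_PDE_COUNT ( ( 1ULL << 32 ) / SKETCH_2MB )

/* Fill page directory entries to identity-map the 32-bit address space
 *
 * Hypothetical helper: "pde" must have room for SKETCH_PDE_COUNT
 * entries (four consecutive 512-entry page directories).
 */
static void sketch_identity_map ( uint64_t *pde ) {
	uint64_t i;

	assert ( pde != NULL );
	for ( i = 0 ; i < SKETCH_PDE_COUNT ; i++ ) {
		/* Physical address equals virtual address */
		pde[i] = ( ( i * SKETCH_2MB ) | SKETCH_PAGE_P |
			   SKETCH_PAGE_RW | SKETCH_PAGE_PS );
	}
}
```

With 512 entries per page directory, the 2048 entries span four page directories.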
The 32-bit portions of the code run with VIRTUAL_CS and VIRTUAL_DS
configured such that truncating a 64-bit virtual address gives a
32-bit virtual address pointing to the same physical location.
The stack pointer remains as a physical address when running in long
mode (although the .stack section is accessible via the negative 2GB
virtual address); this is done in order to simplify the handling of
interrupts occurring while executing a portion of 32-bit code with
flat physical addressing via PHYS_CODE().
Interrupts may be enabled in any of 64-bit long mode, 32-bit protected
mode with virtual addresses, 32-bit protected mode with physical
addresses, or 16-bit real mode. Interrupts occurring in any mode
other than real mode will be reflected down to real mode and handled
by whichever ISR is hooked into the BIOS interrupt vector table.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a 64-bit build, the entirety of the 32-bit address space is
identity-mapped and so any valid physical address may immediately be
used as a virtual address. Conversely, a virtual address that is
already within the 32-bit address space may immediately be used as a
physical address.
A valid virtual address that lies outside the 32-bit address space
must be an address within .textdata, and so can be converted to a
physical address by adding virt_offset.
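The conversion rules above can be sketched as follows. This is an illustration rather than the actual librm code: the sketch_* names are hypothetical, and example_virt_offset is an assumed value chosen so that the link-time base 0xffffffffeb000000 lands at an arbitrary example physical address of 0x7f000000 (the real virt_offset is determined at runtime).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example value: maps 0xffffffffeb000000 to 0x7f000000
 * (arithmetic is modulo 2^64, so this equals 0x94000000).
 */
static const uint64_t example_virt_offset =
	( 0x7f000000ULL - 0xffffffffeb000000ULL );

/* Convert a 64-bit virtual address to a physical address */
static uint64_t sketch_virt_to_phys ( uint64_t virt ) {

	/* Addresses within the 32-bit address space are identity-mapped */
	if ( virt <= 0xffffffffULL )
		return virt;

	/* Any other valid address is assumed to lie within .textdata */
	return ( virt + example_virt_offset );
}

/* Convert a physical address to a 64-bit virtual address */
static uint64_t sketch_phys_to_virt ( uint64_t phys ) {

	/* Only the 32-bit address space is identity-mapped */
	assert ( phys <= 0xffffffffULL );
	return phys;
}
```

The addition wraps modulo 2^64, which is what brings a negative-2GB virtual address back down into the 32-bit physical range.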
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Mark virt_offset, text16, data16, rm_cs, and rm_ds as constant
The physical locations of .textdata, .text16 and .data16 are constant
from the point of view of C code. Mark the relevant variables as
constant to allow gcc to optimise out redundant reads.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Add phys_call() wrapper for calling code with physical addressing
Add a phys_call() wrapper function (analogous to the existing
real_call() wrapper function) for calling code with flat physical
addressing, and use this wrapper within the PHYS_CODE() macro.
Move the relevant functionality inside librm.S, where it more
naturally belongs.
The COMBOOT code currently uses explicit calls to _virt_to_phys and
_phys_to_virt. These will need to be rewritten if our COMBOOT support
is ever generalised to be able to run in a 64-bit build.
Specifically:
  - com32_exec_loop() should be restructured to use PHYS_CODE()
  - com32_wrapper.S should be restructured to use an equivalent of
    prot_call(), passing parameters via a struct i386_all_regs
  - there appears to be no need for com32_wrapper.S to switch between
    external and internal stacks; this could be omitted to simplify
    the design.
For now, librm.S continues to expose _virt_to_phys and _phys_to_virt
for use by com32.c and com32_wrapper.S. Similarly, librm.S continues
to expose _intr_to_virt for use by gdbidt.S.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The bulk of the iPXE binary (the .textdata section) is physically
relocated at runtime to the top of the 32-bit address space in order
to allow space for an OS to be loaded. The relocation is achieved
with the assistance of segmentation: we adjust the code and data
segment bases so that the link-time addresses remain valid.
Segmentation is not available (for normal code and data segments) in
long mode. We choose to compile the C code with -mcmodel=kernel and
use a link-time address of 0xffffffffeb000000. This choice allows us
to identity-map the entirety of the 32-bit address space, and to alias
our chosen link-time address to the physical location of our .textdata
section. (This requires the .textdata section to always be aligned to
a page boundary.)
We simultaneously choose to set the 32-bit virtual address segment
bases such that the link-time addresses may simply be truncated to 32
bits in order to generate a valid 32-bit virtual address. This allows
symbols in .textdata to be trivially accessed by both 32-bit and
64-bit code.
There is no (sensible) way in 32-bit assembly code to generate the
required R_X86_64_32S relocation records for these truncated symbols.
However, subtracting the fixed constant 0xffffffff00000000 has the
same effect as truncation, and can be represented in a standard
R_X86_64_32 relocation record. We define the VIRTUAL() macro to
abstract away this truncation operation, and apply it to all
references by 32-bit (or 16-bit) assembly code to any symbols within
the .textdata section.
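The equivalence between truncation and subtraction of the fixed constant can be checked with a few lines of C. This is only a numeric illustration of the arithmetic; the real VIRTUAL() macro operates on symbols in assembly code, and the sketch_* names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The fixed constant subtracted from link-time addresses */
#define SKETCH_HIGH_BITS 0xffffffff00000000ULL

/* Truncate a 64-bit address to 32 bits */
static uint32_t sketch_truncate ( uint64_t addr ) {
	return ( ( uint32_t ) addr );
}

/* Subtract the fixed constant from a 64-bit address */
static uint64_t sketch_virtual ( uint64_t addr ) {

	/* Equivalence holds only for addresses in the top 4GB, as is
	 * the case for anything linked at 0xffffffffeb000000 onwards.
	 */
	assert ( ( addr & SKETCH_HIGH_BITS ) == SKETCH_HIGH_BITS );
	return ( addr - SKETCH_HIGH_BITS );
}
```

For any address whose upper 32 bits are all ones, both functions yield the same value, e.g. 0xffffffffeb001234 becomes 0xeb001234 either way.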
We define "virt_offset" for a 64-bit build as "the value to be added
to an address within .textdata in order to obtain its physical
address". With this definition, the low 32 bits of "virt_offset" can
be treated by 32-bit code as functionally equivalent to "virt_offset"
in a 32-bit build.
We define "text16" and "data16" for a 64-bit build as the physical
addresses of the .text16 and .data16 sections. Since a physical
address within the 32-bit address space may be used directly as a
64-bit virtual address (thanks to the identity map), this definition
provides the most natural access to variables in .text16 and .data16.
Note that this requires a minor adjustment in prot_to_real(), which
accesses .text16 using 32-bit virtual addresses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[librm] Transition to protected mode within init_librm()
Long-mode operation will require page tables, which are too large to
sensibly fit in our .data16 segment in base memory.
Add a portion of init_librm() running in 32-bit protected mode to
provide access to high memory. Use this portion of init_librm() to
initialise the .textdata variables "virt_offset", "text16", and
"data16", eliminating the redundant (re)initialisation currently
performed on every mode transition as part of real_to_prot().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move most arch/i386 files to arch/x86, and adjust the contents of the
Makefiles and the include/bits/*.h headers to reflect the new
locations.
This patch makes no substantive code changes, as can be seen using a
rename-aware diff (e.g. "git show -M5").
This patch does not make the pcbios platform functional for x86_64; it
merely allows it to compile without errors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[vmware] Expose GuestRPC mechanism in 64-bit builds
The GuestRPC mechanism (used for VMWARE_SETTINGS and CONSOLE_VMWARE)
does not use any real-mode code and so can be exposed in both 64-bit
and 32-bit builds.
Reported-by: Matthew Helton <mwhelton@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[xen] Wait for and clear XenStore event before receiving data
Older, out-of-tree Xen kernel modules (such as those provided with
SuSE Linux Enterprise Server 11) do not clear the leftover "event
pending" bit when opening an event channel. Consequently, no event is
ever delivered to indicate that there is information in the XenStore
ring buffer, and the system hangs shortly after loading the
xen-platform-pci kernel module.
Work around this problem by always waiting for the XenStore event
channel to be signalled, and clearing the event before processing the
received data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[int13con] Add basic ability to log to a local disk via INT 13
Several popular public cloud providers do not provide any sensible
mechanism for obtaining debug output from an OS which is failing to
boot. For example, Amazon EC2 provides the "Get System Log" facility,
which occasionally deigns to report a random subset of the characters
emitted via the VM's serial port, but usually returns only a blank
screen. (Amazingly, this is still superior to the debugging
facilities provided by Azure.)
Work around these shortcomings by adding a console type which sends
output to a magically detected raw disk partition, and including such
a partition within any iPXE .usb-format image.
To use this facility:
  - build an iPXE .usb image with CONSOLE_INT13 enabled
  - boot the cloud VM from this image
  - after the boot fails, attach the VM's boot disk to a second VM
  - from this second VM, use "less -f -R /dev/sdb3" (or similar) to
    view the iPXE output.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The valgrind headers are not x86-specific; they detect the CPU
architecture and contain inline assembly for multiple architectures.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[legal] Relicense files under GPL2_OR_LATER_OR_UBDL
These files cannot be automatically relicensed by util/relicense.pl
since they either contain unusual but trivial contributions (such as
the addition of __nonnull function attributes), or contain lines
dating back to the initial git revision (and so require manual
knowledge of the code's origin).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[timer] Rewrite the 8254 Programmable Interval Timer support
The 8254 timer code (used to implement udelay()) has an unknown
provenance. Rewrite this code to avoid potential licensing
uncertainty.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with memcpy(), we can reduce the code size (by an average of 0.2%)
by giving the compiler more visibility into what memset() is doing,
and by avoiding the "rep" prefix on short fixed-length sequences of
string operations.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some of the C library string functions have an unknown provenance.
Reimplement all such functions to avoid potential licensing
uncertainty.
Remove the inline-assembler versions of strlen(), memswap(), and
strncmp(); these save a minimal amount of space (around 40 bytes in
total) and are not performance-critical.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for Xen PV-HVM domains (detected via the Xen
platform PCI device with IDs 5853:0001), including support for
accessing configuration via XenStore and enumerating devices via
XenBus.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ioapi] Fail ioremap() when attempting to map a zero bus address
When a 32-bit iPXE binary is running on a system which allocates PCI
memory BARs above 4GB, our PCI subsystem will return the base address
for any such BARs as zero (with a warning message if DEBUG=pci is
enabled). Currently, ioremap() will happily map an address pointing
to the start of physical memory, providing no sensible indication of
failure.
Fix by always returning NULL if we are asked to ioremap() a zero bus
address.
With a totally flat memory model (e.g. under EFI), this provides an
accurate failure indication since no PCI peripheral will be mapped to
the zero bus address.
With the librm memory model, there is the possibility of a spurious
NULL return from ioremap() if the bus address happens to be equal to
virt_offset. Under the current virtual memory map, the NULL virtual
address will always be the start of .textdata, and so this problem
cannot occur; a NULL return from ioremap() will always be an accurate
failure indication.
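A minimal sketch of the zero-address check, assuming a totally flat memory model in which a bus address can be used directly as a virtual address. The function name sketch_ioremap() is hypothetical and this is not the actual ioremap() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a bus address as an I/O address (flat memory model assumed) */
static void * sketch_ioremap ( unsigned long bus_addr, size_t len ) {

	/* Length is unused in a flat memory model */
	( void ) len;

	/* A zero bus address indicates an unusable BAR: fail the mapping */
	if ( ! bus_addr )
		return NULL;

	/* Flat model: bus address is directly usable as a pointer */
	return ( ( void * ) ( uintptr_t ) bus_addr );
}
```

A caller can then treat a NULL return as a mapping failure in the usual way, rather than silently accessing the start of physical memory.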
Debugged-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The VESA frame buffer console uses the VESA BIOS extensions (VBE) to
enumerate video modes, selects an appropriate mode, and then hands off
to the generic frame buffer code.
The font is extracted from the VGA BIOS, avoiding the need to provide
an external font file.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[settings] Expose CPUID instruction via settings mechanism
Allow CPUID values to be read using the syntax
  ${cpuid/<register>.<function>}
For example, ${cpuid/2.0x80000001} will give the value of %ecx after
calling CPUID with %eax=0x80000001. Values for <register> are encoded
as %eax=0, %ebx=1, %ecx=2, %edx=3.
The numeric encoding is more sophisticated than described above,
allowing for settings such as the CPU model (obtained by calling CPUID
with %eax=0x80000002-0x80000004 inclusive and concatenating the values
returned in %eax:%ebx:%ecx:%edx). See the source code for details.
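To illustrate the simple form of the syntax only, the sketch below parses the "<register>.<function>" portion of a setting name such as "2.0x80000001". It deliberately ignores the more sophisticated numeric encoding mentioned above, and the structure and function names are hypothetical rather than taken from the iPXE source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parsed form of a simple CPUID setting name */
struct sketch_cpuid_setting {
	/* Register (%eax=0, %ebx=1, %ecx=2, %edx=3) */
	unsigned int reg;
	/* CPUID function (i.e. input value of %eax) */
	uint32_t function;
};

/* Parse "<register>.<function>"; returns 0 on success, -1 on error */
static int sketch_parse_cpuid ( const char *name,
				struct sketch_cpuid_setting *out ) {
	unsigned long reg;
	unsigned long function;
	char *end;

	assert ( out != NULL );

	/* Parse register number, which must be followed by '.' */
	reg = strtoul ( name, &end, 0 );
	if ( ( end == name ) || ( *end != '.' ) || ( reg > 3 ) )
		return -1;

	/* Parse function number, which must end the string */
	name = ( end + 1 );
	function = strtoul ( name, &end, 0 );
	if ( ( end == name ) || ( *end != '\0' ) )
		return -1;

	out->reg = reg;
	out->function = function;
	return 0;
}
```

Parsing "2.0x80000001" with this sketch selects register 2 (%ecx) and function 0x80000001, matching the example given above.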
The "cpuvendor" and "cpumodel" settings provide easy access to these
more complex CPUID settings.
This functionality is intended to complement the "cpuid" command,
which allows for testing individual CPUID feature bits.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
No code from the original source remains within this file; relicense
under GPL2+ with a new copyright notice.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Add faster algorithm for calculating the TCP/IP checksum
The generic TCP/IP checksum implementation requires approximately 10
CPU clocks per byte (as measured using the TSC). Improve this to
approximately 0.5 CPU clocks per byte by using "lodsl ; adcl" in an
unrolled loop.
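For reference, a portable byte-at-a-time version of the ones' complement checksum might look like the sketch below. This is a plain C illustration of the calculation in the style of RFC 1071, not the generic iPXE routine and not the "lodsl ; adcl" implementation; sketch_chksum() is a hypothetical name, and the length limit is an assumption made to keep the 32-bit accumulator from overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Calculate the 16-bit ones' complement checksum of a buffer
 *
 * Data is treated as a sequence of big-endian 16-bit words, with a
 * trailing odd byte padded with zero.
 */
static uint16_t sketch_chksum ( const uint8_t *data, size_t len ) {
	uint32_t sum = 0;
	size_t i;

	/* Keep the 32-bit accumulator safely within range */
	assert ( len <= 0xffff );

	/* Sum all complete 16-bit words */
	for ( i = 0 ; ( i + 1 ) < len ; i += 2 )
		sum += ( ( ( uint32_t ) data[i] << 8 ) | data[ i + 1 ] );

	/* Add any trailing odd byte as the high byte of a final word */
	if ( len & 1 )
		sum += ( ( uint32_t ) data[ len - 1 ] << 8 );

	/* Fold carries back into the low 16 bits */
	while ( sum >> 16 )
		sum = ( ( sum & 0xffff ) + ( sum >> 16 ) );

	/* Checksum is the ones' complement of the folded sum */
	return ( ( uint16_t ) ~sum );
}
```

An unrolled add-with-carry loop computes the same sum but processes 32 bits per instruction and defers the carry folding to the end, which is where the speedup comes from.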
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Allow for architecture-specific TCP/IP checksum routines
Calculating the TCP/IP checksum on received packets accounts for a
substantial fraction of the response latency.
Signed-off-by: Michael Brown <mcb30@ipxe.org>