Expose image tail-recursion to iPXE scripts via the "--replace"
option. This functions similarly to exec() under Unix: the
currently-executing script is replaced with the new image (as opposed
to running the new image as a subroutine).
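The analogy in C, for illustration: system() runs another program as
a subroutine and returns to the caller, whereas execv() replaces the
current process image and never returns on success, which is the
behaviour that "--replace" gives to scripts:

  #include <unistd.h>

  int main ( void ) {
          char *argv[] = { "/bin/ls", NULL };
          execv ( argv[0], argv ); /* replaces this process image */
          return 1;                /* reached only if execv() fails */
  }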
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[efi] Standardise #include guard in ipxe_download.h
The script include/ipxe/efi/import.pl relies on a particular format
for the #include guard in order to detect EFI headers that are not
imported.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PXENV_FILE_CMDLINE is an iPXE extension, and will not be supported by
most PXE stacks. Do not report any errors to the user, since in
almost all cases the error will mean simply "not loaded by iPXE".
Reported-by: Patrick Domack <patrickdk@patrickdk.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_CPU_IO_PROTOCOL is not available on all EFI platforms. In
particular, it is not available under OVMF, as used for qemu.
Since the EFI_CPU_IO_PROTOCOL is an abomination of unnecessary
complexity, banish it and use raw I/O instead.
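As a sketch (not the actual driver accessors), raw I/O amounts to a
single volatile load or store, with no protocol instance to locate
and no function pointer to call through:

  #include <stdint.h>

  static inline uint32_t raw_readl ( volatile uint32_t *addr ) {
          return *addr;
  }

  static inline void raw_writel ( uint32_t data,
                                  volatile uint32_t *addr ) {
          *addr = data;
  }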
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Attempt to restore the network device to the state it was in prior to
calling the NBP. This simplifies the task of taking follow-up action
in an iPXE script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[settings] Expose exit status of failed command via ${errno}
Allow scripts to report errors in more detail by exposing the most
recent error via the ${errno} setting. For example:
  chain ${filename} || goto failed
  ...
  :failed
  imgfetch http://192.168.0.1/ipxe_error.php?error=${errno}
Note that ${errno} is valid only immediately after executing a failed
command.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[pxeprefix] Place temporary stack after iPXE binary
Some BIOSes (observed on a Supermicro system with an AMI BIOS) seem to
use the area immediately below 0x7c00 to store data related to the
boot process. This data is currently liable to be overwritten by the
temporary stack used while decompressing and installing iPXE.
Try to avoid any such problems by placing the temporary stack
immediately after the loaded iPXE binary: any memory used by the
stack there is memory that a larger binary could have overwritten
anyway.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On i350 the datasheet contradicts itself in stating that the default
value of RXDCTL.ENABLE for queue zero is both set (according to the
"Receive Initialization" section) and unset (according to the "Receive
Descriptor Control - RXDCTL" section). Empirical evidence suggests
that the default value is unset.
Explicitly enable both transmit and receive queues to avoid any
ambiguity.
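A sketch of the resulting initialisation; the register accessors and
macro names here are illustrative rather than the driver's real
identifiers:

  /* Do not trust the documented defaults: set ENABLE explicitly
   * in both descriptor control registers for queue zero */
  writel ( readl ( regs + RXDCTL ) | RXDCTL_ENABLE, regs + RXDCTL );
  writel ( readl ( regs + TXDCTL ) | TXDCTL_ENABLE, regs + TXDCTL );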
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Refill receive ring only after enabling receiver
On 82576 (and probably others), the datasheet states that "the tail
register of the queue (RDT[n]) should not be bumped until the queue is
enabled". There is some confusion over exactly what constitutes
"enabled": the initialisation blurb says that we should "poll the
RXDCTL register until the ENABLE bit is set", while the description
for the RXDCTL register says that the ENABLE bit is set by default
(for queue zero). Empirical evidence suggests that the ENABLE bit
reads as set immediately after writing to RCTL.EN, and so polling is
not necessary.
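The resulting order of operations, again with illustrative names:

  /* Enable the receiver first ... */
  writel ( readl ( regs + RCTL ) | RCTL_EN, regs + RCTL );

  /* ... and only then hand buffers to the hardware by writing the
   * receive descriptor tail */
  refill_rx_ring ( regs );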
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[bzimage] Update setup_move_size only for protocol versions 2.00 and 2.01
The setup_move_size field is not defined in protocol versions earlier
than 2.00 (and is obsolete in versions later than 2.01). In binaries
using versions earlier than 2.00, the relevant location is likely to
contain executable code.
Interestingly, this bug has been present since support for pre-2.00
protocol versions was added in 2009, and has been unexpectedly
modifying the memtest86+ code fragment:
  mov $0x92, %dx
  inb %dx, %al
Fortuitously, the modification exactly overwrote the value loaded into
%dx, and so the net effect was limited to causing Fast Gate A20
detection to always fail.
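A sketch of the fix, using the boot protocol's version encoding
(0x0200 denotes protocol 2.00); the field name comes from the
protocol itself, the surrounding variables are illustrative:

  /* setup_move_size exists only in protocol versions 2.00 and
   * 2.01; writing it under any other version corrupts unrelated
   * bytes of the loaded binary */
  if ( ( version >= 0x0200 ) && ( version <= 0x0201 ) )
          bzhdr.setup_move_size = setup_size;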
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A window size of 256kB should be sufficient to allow for
full-bandwidth transfers over a Gigabit LAN, and for acceptable
transfer speeds over other typical links.
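The figure follows from the window/RTT bound on TCP throughput: a
256kB window sustains at most 256kB per round trip, i.e. roughly
128MBps at an illustrative 2ms LAN RTT (comfortably above Gigabit
wire speed) but only around 1.3MBps at a 200ms WAN RTT.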
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The maximum TCP throughput is fundamentally limited by the amount of
available receive buffer space. Increase the heap size from 128kB to
512kB to allow the use of larger TCP windows.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcp] Truncate TCP window to prevent future packet discards
Whenever memory pressure causes a queued packet to be discarded (and
so retransmitted), reduce the maximum TCP window to a size that would
have prevented the discard.
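A hypothetical sketch of the adjustment (structure and field names
invented for illustration):

  /* Cap the advertised window at what we were actually able to
   * buffer, so the peer cannot again overrun our memory */
  static void tcp_truncate_window ( struct tcp_connection *tcp,
                                    uint32_t buffered ) {
          if ( tcp->max_win > buffered )
                  tcp->max_win = buffered;
  }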
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Discarding the active ARP cache entry in the middle of a download will
substantially disrupt the TCP stream. Try to minimise any such
disruption by treating ARP cache entries as expensive, and discarding
them only when nothing else is available to discard.
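A sketch of how the ARP discarder might be registered at the
"expensive" priority, following iPXE's linker-table idiom; the exact
identifiers here are assumptions:

  /* Run only after all cheaper discarders have failed to free
   * enough memory */
  struct cache_discarder arp_discarder
          __cache_discarder ( CACHE_EXPENSIVE ) = {
          .discard = arp_discard,
  };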
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[iobuf] Allocate I/O buffer descriptor separately to conserve aligned memory
I/O buffers are allocated on aligned boundaries. The I/O buffer
descriptor (the struct io_buffer) is currently attached to the end of
the I/O buffer. When the size of the buffer is close to its
alignment, this can waste large amounts of aligned memory.
For example, a network card using 2048-byte receive buffers will end
up allocating 2072 bytes on a 2048-byte boundary. This effectively
wastes 50% of the available memory.
Improve the situation by allocating the descriptor separately from the
main I/O buffer if inline allocation would cause the total allocated
size to cross the alignment boundary.
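A simplified sketch of the allocation decision (error handling
omitted, and the crossing test reduced to the common case of
len <= align):

  struct io_buffer *iobuf;
  void *data;

  if ( ( len + sizeof ( *iobuf ) ) <= align ) {
          /* Descriptor fits without crossing the alignment
           * boundary: attach it inline after the data */
          data = memalign ( align, ( len + sizeof ( *iobuf ) ) );
          iobuf = ( data + len );
  } else {
          /* Inline attachment would cross the boundary: allocate
           * the descriptor separately */
          data = memalign ( align, len );
          iobuf = malloc ( sizeof ( *iobuf ) );
  }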
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Process all received packets in net_poll()
The current logic is to process at most one received packet per call
to net_poll(), on the basis that refilling the hardware descriptor
ring should be delayed as little as possible. However, this limits
the rate at which packets can be processed and ultimately ends up
adding latency which, in turn, limits the achievable throughput.
With temporary modifications in place to essentially remove all
resource constraints (heap size increased to 16MB, RX descriptor ring
increased to 64 descriptors) and a TCP window size of 1MB, the
throughput on a gigabit (i.e. 119MBps) network can be observed to fall
off exponentially from around 115MBps to around 75MBps. Changing
net_poll() to process all received packets results in a steady
119MBps throughput.
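In outline, where netdev_rx_dequeue() is iPXE's receive-queue
accessor and the handler name is illustrative:

  /* Before: at most one packet per poll */
  if ( ( iobuf = netdev_rx_dequeue ( netdev ) ) != NULL )
          handle_rx ( netdev, iobuf );

  /* After: drain everything that has already arrived */
  while ( ( iobuf = netdev_rx_dequeue ( netdev ) ) != NULL )
          handle_rx ( netdev, iobuf );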
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 196751c ("[build] Enable warnings when building utilities")
revealed a previously hidden compiler warning in util/nrv2b.c
regarding an out-of-bounds array subscript in the code
  #if defined(SWD_BEST_OFF)
      if (s->best_pos[2] == 0)
          s->best_pos[2] = key + 1;
  #endif
where best_pos[] is defined by
  #define SWD_BEST_OFF 1
  #if defined(SWD_BEST_OFF)
      unsigned int best_off[ SWD_BEST_OFF ];
      unsigned int best_pos[ SWD_BEST_OFF ];
  #endif
With SWD_BEST_OFF set to 1, it can be proven that all code paths
referring to s->best_off[] and s->best_pos[] will never be executed,
with the exception of the two lines above. Since these two lines
alone can have no effect on execution, we can safely undefine
SWD_BEST_OFF.
Verified by comparing md5sums of bin/undionly.kpxe before and after
the change.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[arp] Prevent ARP cache entries from being deleted mid-transmission
Each ARP cache entry maintains a transmission queue, which is sent out
as soon as the link-layer address is known. If multiple packets are
queued, then it is possible for memory pressure to cause the ARP cache
discarder to be invoked during transmission of the first packet, which
may cause the ARP cache entry to be deleted before the second packet
can be sent. This results in an invalid pointer dereference.
Avoid this problem by reference-counting ARP cache entries and
ensuring that an extra reference is held while processing the
transmission queue, and by using list_first_entry() rather than
list_for_each_entry_safe() to traverse the queue.
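A simplified sketch of the safe traversal; the entry field names are
assumed, and list_first_entry() here follows iPXE's convention of
returning NULL for an empty list:

  /* Hold a reference so that the cache discarder cannot free
   * the entry while its queue is still being transmitted */
  ref_get ( &arp->refcnt );

  /* Re-read the head each time: transmitting may invoke the
   * discarder, so a saved "next" pointer cannot be trusted */
  while ( ( iobuf = list_first_entry ( &arp->tx_queue,
                                       struct io_buffer, list ) ) ) {
          list_del ( &iobuf->list );
          net_tx ( iobuf, arp->netdev, arp->net_protocol,
                   arp->ll_dest, arp->netdev->ll_addr );
  }

  ref_put ( &arp->refcnt );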
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit ea61075 ("[tcp] Add support for TCP window scaling") introduced
a potential NULL pointer dereference by referring to the connection's
send window scale before checking whether or not the connection is
known.
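The corrected ordering, sketched with illustrative names:

  tcp = tcp_demux ( local_port );
  if ( ! tcp ) {
          /* Connection not known: bail out before touching it */
          rc = -ENOTCONN;
          goto discard;
  }
  /* Safe only now that tcp is known to be non-NULL */
  win <<= tcp->snd_win_scale;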
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[iobuf] Relax alignment requirement for small I/O buffers
iPXE currently aligns all I/O buffers on a 2kB boundary. This is
overkill for transmitted packets, which are typically much smaller
than 2kB.
Align I/O buffers on their own size. This reduces the alignment
requirement for small buffers, while preserving the guarantee that I/O
buffers will never cross boundaries that might cause problems for some
DMA engines.
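A sketch of the computation (allocator call illustrative): round the
length up to a power of two and use that as the alignment, so that a
buffer of size N can never straddle an N-byte (or larger) boundary:

  size_t align = 1;

  /* Round len up to the nearest power of two */
  while ( align < len )
          align <<= 1;

  data = memalign ( align, len );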
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tls] Request a maximum fragment length of 2048 bytes
The default maximum plaintext fragment length for TLS is 16kB, which
is a substantial amount of memory for iPXE to have to allocate for a
temporary decryption buffer.
Reduce the memory footprint of TLS connections by requesting a maximum
fragment length of 2kB.
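The wire encoding comes from RFC 6066: max_fragment_length is
extension type 1 with a one-byte body, and code point 3 requests
2^11-byte fragments:

  /* ClientHello extension requesting 2048-byte fragments */
  static const uint8_t tls_max_frag_len_ext[] = {
          0x00, 0x01,     /* extension type: max_fragment_length */
          0x00, 0x01,     /* extension data length: 1 byte */
          0x03,           /* MaxFragmentLength 2^11 (2048 bytes) */
  };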
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The maximum unscaled TCP window (64kB) implies a maximum bandwidth of
around 300kB/s on a WAN link with an RTT of 200ms. Add support for
the TCP window scaling option to remove this upper limit.
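The limit follows directly from the window/RTT bound: 65536 bytes
per 200ms round trip comes to roughly 320kB/s.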
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[undi] Align the received frame payload for faster processing
The undinet driver always has to make a copy of the received frame
into an I/O buffer. Align this copy sensibly so that subsequent
operations are as fast as possible.
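A sketch of the idea using iPXE's I/O buffer API; the two-byte
offset is an illustrative choice that leaves the IP header 4-byte
aligned behind a 14-byte Ethernet header:

  /* Copy the frame so that the network-layer header lands on a
   * 4-byte boundary */
  iobuf = alloc_iob ( len + 2 );
  iob_reserve ( iobuf, 2 );
  memcpy ( iob_put ( iobuf, len ), frame, len );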
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[monojob] Check for keypresses only once per timer tick
Checking for keypresses takes a non-negligible amount of time, and
measurably affects our RTT. Minimise the impact by checking for
keypresses only once per timer tick.
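In outline, using currticks() and iskey() from iPXE's timer and
console APIs (key handling simplified, surrounding variable names
assumed):

  static unsigned long last_keycheck;

  static void monojob_check_key ( void ) {
          unsigned long now = currticks();

          /* Poll the keyboard at most once per timer tick */
          if ( now == last_keycheck )
                  return;
          last_keycheck = now;
          if ( iskey() && ( getchar() == CTRL_C ) )
                  monojob_rc = -ECANCELED;
  }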
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Add faster algorithm for calculating the TCP/IP checksum
The generic TCP/IP checksum implementation requires approximately 10
CPU clocks per byte (as measured using the TSC). Improve this to
approximately 0.5 CPU clocks per byte by using "lodsl ; adcl" in an
unrolled loop.
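A non-unrolled sketch of the technique as GCC inline assembly,
assuming the length is a non-zero whole number of 32-bit words; the
point is that neither "lodsl" nor "loop" disturbs the carry flag
accumulated by "adcl":

  static uint32_t sum32 ( const void *data, size_t dwords ) {
          uint32_t sum = 0;

          __asm__ __volatile__ ( "clc\n"
                                 "1:\tlodsl\n\t"
                                 "adcl %%eax, %0\n\t"
                                 "loop 1b\n\t"
                                 "adcl $0, %0\n\t" /* final carry */
                                 "adcl $0, %0"     /* carry thereof */
                                 : "+r" ( sum ), "+S" ( data ),
                                   "+c" ( dwords )
                                 : : "eax", "cc", "memory" );
          return sum;
  }

The caller then folds the 32-bit sum down to the usual 16-bit
ones-complement checksum.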
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Allow for architecture-specific TCP/IP checksum routines
Calculating the TCP/IP checksum on received packets accounts for a
substantial fraction of the response latency.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "rep" prefix can be used with an iteration count of zero, which
allows the variable-length memcpy() to be implemented without using
any conditional jumps.
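A minimal illustration: with a zero count in %ecx, "rep movsb"
simply does nothing, so no length test or conditional jump is
needed:

  static void * memcpy_rep ( void *dest, const void *src,
                             size_t len ) {
          void *d = dest;
          const void *s = src;

          /* Executes zero iterations when len == 0 */
          __asm__ __volatile__ ( "rep movsb"
                                 : "+D" ( d ), "+S" ( s ),
                                   "+c" ( len )
                                 : : "memory" );
          return dest;
  }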
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonably large (512MB) file transferred via HTTP over Gigabit
Ethernet should complete in around 4.6 seconds. Increase the
resolution of the "time" command to tenths of a second, to allow such
transfers to be meaningfully measured.
Signed-off-by: Michael Brown <mcb30@ipxe.org>