[tcp] Truncate TCP window to prevent future packet discards
Whenever memory pressure causes a queued packet to be discarded (and
so retransmitted), reduce the maximum TCP window to a size that would
have prevented the discard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Discarding the active ARP cache entry in the middle of a download will
substantially disrupt the TCP stream. Try to minimise any such
disruption by treating ARP cache entries as expensive, and discarding
them only when nothing else is available to discard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[iobuf] Allocate I/O buffer descriptor separately to conserve aligned memory
I/O buffers are allocated on aligned boundaries. The I/O buffer
descriptor (the struct io_buffer) is currently attached to the end of
the I/O buffer. When the size of the buffer is close to its
alignment, this can waste large amounts of aligned memory.
For example, a network card using 2048-byte receive buffers will end
up allocating 2072 bytes on a 2048-byte boundary. This effectively
wastes 50% of the available memory.
Improve the situation by allocating the descriptor separately from the
main I/O buffer if inline allocation would cause the total allocated
size to cross the alignment boundary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Process all received packets in net_poll()
The current logic is to process at most one received packet per call
to net_poll(), on the basis that refilling the hardware descriptor
ring should be delayed as little as possible. However, this limits
the rate at which packets can be processed and ultimately ends up
adding latency which, in turn, limits the achievable throughput.
With temporary modifications in place to essentially remove all
resource constraints (heap size increased to 16MB, RX descriptor ring
increased to 64 descriptors) and a TCP window size of 1MB, the
throughput on a gigabit (i.e. 119MBps) network can be observed to fall
off exponentially from around 115MBps to around 75MBps. Changing
net_poll() to process all received packets results in a steady
119MBps throughput.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 196751c ("[build] Enable warnings when building utilities")
revealed a previously hidden compiler warning in util/nrv2b.c
regarding an out-of-bounds array subscript in the code
#if defined(SWD_BEST_OFF)
if (s->best_pos[2] == 0)
s->best_pos[2] = key + 1;
#endif
where best_pos[] is defined by
#define SWD_BEST_OFF 1
#if defined(SWD_BEST_OFF)
unsigned int best_off[ SWD_BEST_OFF ];
unsigned int best_pos[ SWD_BEST_OFF ];
#endif
With SWD_BEST_OFF set to 1, it can be proven that all code paths
referring to s->best_off[] and s->best_pos[] will never be executed,
with the exception of the two lines above. Since these two lines
alone can have no effect on execution, we can safely undefine
SWD_BEST_OFF.
Verified by comparing md5sums of bin/undionly.kpxe before and after
the change.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[arp] Prevent ARP cache entries from being deleted mid-transmission
Each ARP cache entry maintains a transmission queue, which is sent out
as soon as the link-layer address is known. If multiple packets are
queued, then it is possible for memory pressure to cause the ARP cache
discarder to be invoked during transmission of the first packet, which
may cause the ARP cache entry to be deleted before the second packet
can be sent. This results in an invalid pointer dereference.
Avoid this problem by reference-counting ARP cache entries and
ensuring that an extra reference is held while processing the
transmission queue, and by using list_first_entry() rather than
list_for_each_entry_safe() to traverse the queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit ea61075 ("[tcp] Add support for TCP window scaling") introduced
a potential NULL pointer dereference by referring to the connection's
send window scale before checking whether or not the connection is
known.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[iobuf] Relax alignment requirement for small I/O buffers
iPXE currently aligns all I/O buffers on a 2kB boundary. This is
overkill for transmitted packets, which are typically much smaller
than 2kB.
Align I/O buffers on their own size. This reduces the alignment
requirement for small buffers, while preserving the guarantee that I/O
buffers will never cross boundaries that might cause problems for some
DMA engines.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tls] Request a maximum fragment length of 2048 bytes
The default maximum plaintext fragment length for TLS is 16kB, which
is a substantial amount of memory for iPXE to have to allocate for a
temporary decryption buffer.
Reduce the memory footprint of TLS connections by requesting a maximum
fragment length of 2kB.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The maximum unscaled TCP window (64kB) implies a maximum bandwidth of
around 300kB/s on a WAN link with an RTT of 200ms. Add support for
the TCP window scaling option to remove this upper limit.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[undi] Align the received frame payload for faster processing
The undinet driver always has to make a copy of the received frame
into an I/O buffer. Align this copy sensibly so that subsequent
operations are as fast as possible.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[monojob] Check for keypresses only once per timer tick
Checking for keypresses takes a non-negligible amount of time, and
measurably affects our RTT. Minimise the impact by checking for
keypresses only once per timer tick.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Add faster algorithm for calculating the TCP/IP checksum
The generic TCP/IP checksum implementation requires approximately 10
CPU clocks per byte (as measured using the TSC). Improve this to
approximately 0.5 CPU clocks per byte by using "lodsl ; adcl" in an
unrolled loop.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Allow for architecture-specific TCP/IP checksum routines
Calculating the TCP/IP checksum on received packets accounts for a
substantial fraction of the response latency.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "rep" prefix can be used with an iteration count of zero, which
allows the variable-length memcpy() to be implemented without using
any conditional jumps.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonably large (512MB) file transferred via HTTP over Gigabit
Ethernet should complete in around 4.6 seconds. Increase the
resolution of the "time" command to tenths of a second, to allow such
transfers to be meaningfully measured.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use hw pointer in PCI driver data as expected by sky2_remove().
Signed-off-by: Valentine Barshak <gvaxon@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[crypto] Allow an error margin on X.509 certificate validity periods
iPXE has no concept of the local time zone, mainly because there is no
viable way to obtain time zone information in the absence of local
state. This causes potential problems with newly-issued certificates
and certificates that are about to expire.
Avoid such problems by allowing an error margin of around 12 hours on
certificate validity periods, similar to the error margin already
allowed for OCSP response timestamps.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[dhcp] Request broadcast responses when we already have an IPv4 address
FCoE requires the use of multiple local unicast link-layer addresses.
To avoid the complexity of managing multiple addresses, iPXE operates
in promiscuous mode. As a consequence, any unicast packets with
non-matching IPv4 addresses are rejected at the IPv4 layer (rather
than at the link layer).
This can cause problems when issuing a second DHCP request: if the
address chosen by the DHCP server does not match the existing address,
then the DHCP response will itself be rejected.
Fix by requesting a broadcast response from the DHCP server if the
network interface already has any IPv4 addresses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[romprefix] Treat 0xffffffff as an error return from PMM
PMM defines the return code 0xffffffff as meaning "unsupported
function". It's hard to imagine a PMM BIOS that doesn't support
pmmAllocate(), but apparently such things do exist.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[romprefix] Allow .mrom image to be placed anywhere within the BAR
A .mrom image currently assumes that it is the first image within the
expansion ROM BAR, which may not be correct when multiple images are
present.
Fix by scanning through the BAR until we locate an image matching our
build ID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[romprefix] Add a dummy ROM header to cover the .mrom payload
The header of a .mrom image declares its length to be only a few
kilobytes; the remainder is accessed via a sideband mechanism. This
makes it difficult to append an additional ROM image, such as an EFI
ROM.
Add a second, dummy ROM header covering the payload portion of the
.mrom image, allowing consumers to locate any appended ROM images in
the usual way.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[cmdline] Use "cpuid --ext" instead of "cpuid --amd"
Avoid potential confusion in the documentation by using a
vendor-neutral name for the extended (AMD-defined) feature set.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add "sync" command (loosely based on the Unix "sync"), which will wait
for any pending operations to complete.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE is fundamentally asynchronous in operation: some operations
continue in the background even after the foreground has continued to
a new task. For example, the closing FIN/ACK exchanges of a TCP
connection will take place in the background after an HTTP download
has completed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[http] Provide credentials only when requested by server
Provide HTTP Basic authentication credentials only in response to a
401 Unauthorized response from the server.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[http] Defer processing response code until after receiving all headers
Some headers can modify the meaning of the response code. For
example, a WWW-Authenticate header can change the interpretation of a
401 Unauthorized response from "Access denied" to "Please
authenticate".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[crypto] Rename KEY= to PRIVKEY= and "key" to "privkey"
The setting name "key" conflicts with the setting name "key" already
in use by the 802.11 code. Resolve the conflict by renaming the newer
setting to "privkey".
Signed-off-by: Michael Brown <mcb30@ipxe.org>