[tcp] Allow out-of-order receive queue to be discarded
Allow packets in the receive queue to be discarded in order to free up
memory. This avoids a potential deadlock condition in which the
missing packet can never be received because the receive queue is
occupying all of the memory available for further RX buffers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a facility allowing cached data to be discarded in order to
satisfy memory allocations that would otherwise fail.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Maintain a queue of received packets, so that lost packets need not
result in retransmission of the entire TCP window.
Increase the TCP window to 8kB, in order that we can potentially
transmit enough duplicate ACKs to trigger Fast Retransmission at the
sender.
Using a 10MB HTTP download in qemu-kvm with an artificial drop rate of
1 in 64 packets, this reduces the download time from around 26s to
around 4s.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Provide a test mechanism for discarding packets at random
Setting NETDEV_DISCARD_RATE to a non-zero value will cause one in
every NETDEV_DISCARD_RATE packets to be discarded at random on both
the transmit and receive datapaths, allowing the robustness of
upper-layer network protocols to be tested even in simulation
environments that provide wholly reliable packet transmission.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[virtio] Replace virtio-net with native iPXE driver
This patch adds a native iPXE virtio-net driver and removes the legacy
Etherboot virtio-net driver. The main reasons for doing this are:
1. Multiple virtio-net NICs are now supported by iPXE. The legacy
driver kept global state and caused issues in virtual machines with
more than one virtio-net device.
2. Faster downloads. The native iPXE driver downloads 100 MB over
HTTP in 12s, the legacy Etherboot driver in 37s. This simple
benchmark uses KVM with tap networking and the Python
SimpleHTTPServer both running on the same host.
Changes to core virtio code reduce vring descriptors to 256 (QEMU uses
128 for virtio-blk and 256 for virtio-net) and change the opaque token
from u16 to void*. Lowering the descriptor count reduces memory
consumption. The void* opaque token change makes driver code simpler.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The new errdb error code database is more accurate than the regular
expression-based errcode scripts. This patch removes errcode scripts
in favor of errdb.
The gpxebot.py script is no longer needed, gpxebot has been released
as a separate open source codebase:
http://git.etherboot.org/?p=people/stefanha/gpxebot.git;a=summary
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[settings] Unregister the children when unregistering the parent
The DHCP settings registered as a child of the netdevice settings are
not unregistered anywhere. This prevents the netdevice from being
freed on shutdown.
Fix by automatically unregistering any child settings when the parent
settings are unregistered.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcp] Treat ACKs as sent only when successfully transmitted
iPXE currently forces sending (i.e. sends a pure ACK even in the
absence of fresh data to send) only in response to packets that
consume sequence space or that lie outside of the receive window.
This ignores the possibility that a previous ACK was not actually sent
(due to, for example, the retransmission timer running).
This does not cause incorrect behaviour, but does cause unnecessary
retransmissions from our peer. For example:
1. Peer sends final data packet (ack 106 seq 521..523)
2. We send FIN (seq 106..107 ack 523)
3. Peer sends FIN (ack 106 seq 523..524)
4. We send nothing since retransmission timer is running for our FIN
5. Peer ACKs our FIN (ack 107 seq 524..524)
6. We send nothing since this packet consumes no sequence space
7. Peer retransmits FIN (ack 107 seq 523..524)
8. We ACK peer's FIN (seq 107..107 ack 524)
What should happen at step (6) is that we should ACK the peer's FIN,
since we can deduce that we have never sent this ACK.
Fix by maintaining an "ACK pending" flag that is set whenever we are
made aware that our peer needs an ACK (whether by consuming sequence
space or by sending a packet that appears out of order), and is
cleared only when the ACK packet has been transmitted.
Reported-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcp] Use a dedicated timer for the TIME_WAIT state
iPXE currently repurposes the retransmission timer to hold the TCP
connection in the TIME_WAIT state (i.e. waiting for up to 2*MSL in
case we are required to re-ACK our peer's FIN due to a lost ACK).
However, the fact that this timer is running will prevent such an ACK
from ever being sent, since the logic in tcp_xmit() assumes that a
running timer indicates that we ourselves are waiting for an ACK and
so blocks the transmission. (We always wait for an ACK before sending
our next packet, to keep our transmit data path as simple as
possible.)
Fix by using an entirely separate timer for the TIME_WAIT state, so
that packets can still be sent.
Reported-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[build] Speed up rebuilding on header file changes
Split src_template into deps_template (which handles the definition of
foo_DEPS) and rules_template (which handles the rules referencing
foo_DEPS). The rules_template is not affected by any included header
files and so does not need to be reprocessed following a change to an
included header file.
This reduces the time required to rebuild the Makefile rules following
a change to stdint.h by around 45%, at a cost of increasing the time
required to rebuild after a "make veryclean" by around 3%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[build] Avoid unnecessary "rm" and "touch" in dependency generation
Speed up dependency generation by omitting the totally unnecessary
"rm" and "touch" commands. This reduces the time taken to generate
dependencies by around 6%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Weak functions whose visibility is hidden may be inlined due to a bug
in GCC. Explicitly mark weak functions noinline to work around the
problem.
This makes the PXE_MENU config option work again, the PXE boot menu
was never being called because the compiler inlined a weak stub
function.
The GCC bug was identified and fixed by Richard Sandiford
<rdsandiford@googlemail.com> but in the meantime iPXE needs to
implement a workaround.
Reported-by: Steve Jones <steve@squaregoldfish.co.uk>
Reported-by: Shao Miller <shao.miller@yrdsb.edu.on.ca>
Suggested-by: Joshua Oreman <oremanj@rwcr.net>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[hci] Continue processing while prompting for shell banner
Continue calling step() while displaying the shell banner. This
potentially allows TCP connections to close gracefully after a failed
boot attempt.
Inspired-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[debug] Expose pause() and more() debugging functions
Include the pause() and more() debugging functions within the general
iPXE debugging framework, by introducing DBGxxx_PAUSE() and
DBGxxx_MORE() macros.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
putline() was introduced back in 2007 for a feature that was never
committed. No console driver implements it and no code calls it, so
remove it from struct console_driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Every other scalar integer value in struct tcp_connection is in host
byte order; change the definition of local_port to match.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
image_set_cmdline() strdup()s cmdline, which free_image() doesn't
clean up.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This bug caused .probe to fail because the NIC did not reset properly.
Signed-off-by: Andrei Faur <da3drus@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add NonVolatile Option (nvo) and NonVolatile Storage (nvs) support to
the myri10ge driver using the EEPROM read/write mechanism provided by
the NIC's Vendor Specific PCI capability.
The myri10ge NIC is capabile of storing 64KB or more of nonvolatile
options, but this patch advertises only 512 bytes of nvo storage
because iPXE malloc's a buffer matching the total size we advertise.
512 is plenty without wasting malloc'd memory. (The 2 other drivers
currently supporting nvo advertise 256 bytes or less.)
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Make Ctrl-D delete a setting, because the Text User Interface (tui)
previously provided no way to delete a setting. Also, update the
on-screen instructions to describe the new feature. Deleting settings
is especially important for settings stored in precious nonvolatile
storage.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Implement jump scrolling with "..." displayed where the settings list
continues off-screen, because there are now too many settings to fit
on screen in the "config ..." text user interface.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a PCI_CAP_ID_VNDR definition for the PCI standard "Vendor
Specific" capability ID.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existence and usage of the BEV entry point is covered by the PnP
spec, not the BBS spec; the BBS spec merely describes a policy for
selecting the boot device order. iPXE should therefore check only for
a PnP BIOS in order to decide whether or not to hook INT19.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[build] Fix broken build caused by implied dependency upon "perl"
Commit ea12dc0 ("[build] Avoid hard-coding the path to perl")
introduced a build failure for fully clean trees (e.g. after running
"make veryclean"), since the dependency upon $(PARSEROM) now includes
a dependency upon "perl" (which doesn't exist) rather than upon
"/usr/bin/perl" (which does exist).
There should of course be no dependency upon the perl binary at all;
the dependency should be upon "./util/parserom.pl" alone.
Fix by removing the $(PERL) from the definition of Perl-based utility
paths, and adding $(PERL) at the point of usage.
Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The path "/usr/bin/perl" has been hard-coded since Etherboot 5.1, for
no discernible reason. Use just "perl" instead to fix the
inconsistency and allow building on systems with Perl installed
outside of /usr/bin.
Reported-by: Gabor Z. Papp <gzp@papp.hu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[r8169] Remove driver cfg lookup, use pci_device_id->driver_data instead
This patch removes the cfg lookup made in the r8169 driver and
replaces it with equivalent information found in the driver_data field
of the pci_device_id structure.
Signed-off-by: Andrei Faur <da3drus@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The handshake record in TLS can contain multiple messages.
Originally-fixed-by: Timothy Stack <tstack@vmware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since more reference-counted structures than embedded images might
want to mark themselves unfreeable, expose a dummy ref_no_free().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[interface] Allow for non-pass-through interface methods
xfer_vredirect() should not be allowed to propagate to a pass-through
interface. For example, when an HTTPS connection is opened, the
redirect message should cause the TLS layer to reopen the TCP socket,
rather than causing the HTTP layer to disconnect from the TLS layer.
Fix by allowing for non-pass-through interface methods, and setting
xfer_vredirect() to be one such method.
This is slightly ugly, in that it complicates the notion of an
interface method call by adding a "pass-through" / "non-pass-through"
piece of metadata. However, the only current user of xfer_vredirect()
is iscsi.c, which uses it only because we don't yet have an
ioctl()-style call for retrieving the underlying socket address.
The new interface infrastructure allows for such a call to be created,
at which time this sole user of xfer_vredirect() can be removed,
xfer_vredirect() can cease to be an interface method and become simply
a wrapper around xfer_vreopen(), and the concept of a non-pass-through
interface method can be reverted.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[interface] Convert all data-xfer interfaces to generic interfaces
Remove data-xfer as an interface type, and replace data-xfer
interfaces with generic interfaces supporting the data-xfer methods.
Filter interfaces (as used by the TLS layer) are handled using the
generic pass-through interface capability. A side-effect of this is
that deliver_raw() no longer exists as a data-xfer method. (In
practice this doesn't lose any efficiency, since there are no
instances within the current codebase where xfer_deliver_raw() is used
to pass data to an interface supporting the deliver_raw() method.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[interface] Convert all name-resolution interfaces to generic interfaces
Remove name-resolution as an interface type, and replace
name-resolution interfaces with generic interfaces supporting the
resolv_done() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[interface] Convert all job-control interfaces to generic interfaces
Remove job-control as an interface type, and replace job-control
interfaces with generic interfaces supporting the close() method.
(Both done() and kill() are absorbed into the function of close();
kill() is merely close(-ECANCELED).)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[interface] Expand object interface to allow for polymorphic interfaces
We have several types of object interface at present (data-xfer, job
control, name resolution), and there is some duplication of
functionality between them. For example, job_done(), job_kill() and
xfer_close() are almost isomorphic to each other.
This updated version of the object interface mechanism allows for each
interface to export an arbitrary list of supported operations.
Advantages include:
Operations methods now receive a pointer to the object, rather than
a pointer to the interface. This allows an object to, for example,
implement a single close() method that can handle close() operations
from any of its exposed interfaces.
The close() operation is implemented as a generic operation (rather
than having specific variants for data-xfer, job control, etc.).
This will allow functions such as monojob_wait() to be used to wait
for e.g. a name resolution to complete.
The amount of boilerplate code required in objects is reduced, not
least because it is no longer necessary to include per-interface
methods that simply use container_of() to derive a pointer to the
object and then tail-call to a common per-object method.
The cost of adding new operations is reduced; adding a new data-xfer
operation such as stat() no longer incurs the penalty of adding a
.stat member to the operations table of all existing data-xfer
interfaces.
The data-xfer, job control and name resolution interfaces have not yet
been updated to use the new interface mechanism, but the code will
still compile and run.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Standardise on using timer_init() to initialise an embedded retry
timer, to match the coding style used by other embedded objects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Standardise on using ref_init() to initialise an embedded reference
count, to match the coding style used by other embedded objects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>