[netdevice] Limit MTU by hardware maximum frame length
Separate out the concept of "hardware maximum supported frame length"
and "configured link MTU", and limit the latter according to the
former.
In networks where the DHCP-supplied link MTU is inconsistent with the
hardware or driver capabilities (e.g. a network using jumbo frames),
this will result in iPXE advertising a TCP MSS consistent with a size
that can actually be received.
Note that the term "MTU" is typically used to refer to the maximum
length excluding the link-layer headers; we adopt this usage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Avoid using zero as a network device index
Avoid using zero as a network device index, so that a zero
sin6_scope_id can be used to mean "unspecified" (rather than
unintentionally meaning "net0").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic inject_fault() function that can be used to inject
random faults with configurable probabilities.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Add a generic concept of a "blocked link"
When Spanning Tree Protocol (STP) is used, there may be a substantial
delay (tens of seconds) from the time that the link goes up to the
time that the port starts forwarding packets.
Add a generic concept of a "blocked link" (i.e. a link which is up but
which is not expected to communicate successfully), and allow "ifstat"
to indicate when a link is blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[legal] Relicense files under GPL2_OR_LATER_OR_UBDL
These files cannot be automatically relicensed by util/relicense.pl
since they either contain unusual but trivial contributions (such as
the addition of __nonnull function attributes), or contain lines
dating back to the initial git revision (and so require manual
knowledge of the code's origin).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reject network devices which appear to be duplicates of those already
available via a different underlying hardware device. On a Xen PV-HVM
system, this allows us to filter out the emulated PCI NICs (which
would otherwise appear alongside the netfront NICs).
Note that we cannot use the Xen facility to "unplug" the emulated PCI
NICs, since there is no guarantee that the OS we subsequently load
will have a native netfront driver.
We permit devices with the same MAC address if they are attached to
the same underlying hardware device (e.g. VLAN devices).
Inspired-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Reset network device index when last device is unregistered
When functioning as an EFI driver, drivers can be disconnected and
reconnected multiple times (e.g. via the EFI shell "connect" command,
or by running an executable such as ipxe.efi which will temporarily
disconnect existing drivers).
Minimise surprise by resetting the network device index to zero
whenever the last device is unregistered. This is not foolproof, but
it does handle the common case of having all devices unregistered and
then reregistered in the original order.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Mark devices as open before calling open() method
When opening a VLAN device, vlan_open() will call netdev_open() on the
trunk device. This will result in a call to netdev_notify(), which
will cause vlan_notify() to call vlan_sync() on the original VLAN
device, which will see that the trunk device is now open but the VLAN
device apparently isn't (since it has not yet been flagged as open by
netdev_open()). The upshot is a second attempt to open the VLAN
device, which will result in an erroneous second call to vlan_open().
This convoluted chain of events then terminates harmlessly since
vlan_open() calls netdev_open() on the trunk device, which just
returns immediately since the trunk device is by now flagged as being
already open.
Prevent this from happening by having netdev_open() flag the device as
open prior to calling the device's open() method, and reflagging it as
closed if the open() method fails.
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Add generic concept of a network device configurator
iPXE supports multiple mechanisms for network device configuration:
DHCPv4 for IPv4, FIP for FCoE, and SLAAC for IPv6. At present, DHCPv4
requires an explicit action (e.g. a "dhcp" command), FIP is initiated
implicitly upon opening a network device, and SLAAC takes place
whenever a RA happens to be received.
Add a generic concept of a network device configurator, which provides
a common interface to triggering configuration and to reporting the
result of the configuration process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most network upper-layer drivers do not implement all three methods
(probe, notify, and remove). Save code by making all methods
optional.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
IPv6 link-local socket addresses require some way to specify a local
network device. We cannot simply use a pointer to the network device,
since a struct sockaddr_in6 may be long-lived and has no way to hold a
reference to the network device.
Using a network device index allows a socket address to cleanly refer
to a network device without worrying about whether or not that device
continues to exist.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow any iPXE command expecting a network device name to accept
"netX" as a synonym for "most recently opened network device".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Add netdev_tx_defer() to allow drivers to defer transmissions
Devices with small transmit descriptor rings may temporarily run out
of space. Provide netdev_tx_defer() to allow drivers to defer packets
for retransmission as soon as a descriptor becomes available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Use link-layer address as part of RNG seed
iPXE currently seeds the random number generator using the system
timer tick count. When large numbers of machines are booted
simultaneously, multiple machines may end up choosing the same DHCP
transaction ID (XID) value; this can cause problems.
Fix by using the least significant (and hence most variable) bits of
each network device's link-layer address to perturb the random number
generator. This introduces some per-machine unique data into the
random number generator's seed, and so reduces the chances of DHCP XID
collisions.
This does not affect the ANS X9.82-compatible random bit generator
used by TLS and other cryptography code, which uses an entirely
separate source of entropy.
Originally-implemented-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Separate VLAN support from presence of VLAN-supporting drivers
Some NICs (e.g. Hermon) provide hardware support for stripping the
VLAN tag, but do not provide any way for this support to be disabled.
Drivers for this hardware must therefore call vlan_find() to identify
a suitable receiving network device.
Provide a weak version of vlan_find() which will always return NULL if
VLAN support has not been enabled (either directly, or by enabling
a feature such as FCoE which requires VLAN support). This allows the
VLAN code to be omitted from builds where the user has not requested
support for VLANs.
Inspired-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Add vlan_tag() to get the VLAN tag of a network device
The iBFT has a VLAN field that should be filled in. Add the
vlan_tag() function to extract the VLAN tag of a network device.
Since VLAN support is optional, define a weak function that returns 0
when iPXE is built without VLAN support.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Clear network device setting before unregistering
Avoid memory leaks by clearing any (non-child) settings immediately
before unregistering the network device settings block.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Including a netdev_poll() within net_tx() can cause the net_step()
loop to end up processing hundreds or thousands of packets within a
single step, since each received packet being processed may trigger a
response which, in turn causes a poll for further received packets.
Network devices must now ensure that the TX ring is at least as large
as the RX ring, in order to avoid running out of TX descriptors. This
should not cause any problems; unlike the RX ring, there is no
substantial memory cost incurred by increasing the TX ring size.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Process all received packets in net_poll()
The current logic is to process at most one received packet per call
to net_poll(), on the basis that refilling the hardware descriptor
ring should be delayed as little as possible. However, this limits
the rate at which packets can be processed and ultimately ends up
adding latency which, in turn, limits the achievable throughput.
With temporary modifications in place to essentially remove all
resource constraints (heap size increased to 16MB, RX descriptor ring
increased to 64 descriptors) and a TCP window size of 1MB, the
throughput on a gigabit (i.e. 119MBps) network can be observed to fall
off exponentially from around 115MBps to around 75MBps. Changing
net_poll() to process all received packets results in a steady
119MBps throughput.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Allow driver to preinitialise the link-layer address
Drivers are currently expected to initialise only the hardware
address, with the link-layer protocol code taking care of converting
this into a valid link-layer address. Some drivers (e.g. undinet) can
legitimately determine both the hardware and link-layer addresses,
which may differ.
Allow for this situation by checking to see if the link-layer address
is empty before initialising it from the hardware address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Allow link layer to report broadcast/multicast packets via pull()
Allow the link layer to directly report whether or not a packet is
multicast or broadcast at the time of calling pull(), rather than
relying on heuristics to determine this at a later stage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[process] Pass containing object pointer to process step() methods
Give the step() method a pointer to the containing object, rather than
a pointer to the process. This is consistent with the operation of
interface methods, and allows a single function to serve as both an
interface method and a process step() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Allow non-completion TX errors to be recorded
Allow TX errors to be recorded against a network device even when the
packet didn't make it as far as netdev_tx().
Inspired-by: Dominik Russenberger <dominik.russenberger@terreactive.ch>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For devices that start in a link-down state, the user will see a
message such as:
[Link status: The socket is not connected (http://ipxe.org/38086001)]
Waiting for link-up on net0...
This is potentially misleading, since it suggests that there is a
genuine problem. Add a dedicated error message for "link down",
giving instead:
[Link status: Down (http://ipxe.org/38086101)]
Waiting for link-up on net0...
Reported-by: Tal Aloni <tal.aloni.il@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Mark devices as open only if opening succeeds
netdev_close() assumes that devices that are open are on the
open_list, which wasn't true if device specific opening failed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[settings] Apply settings block name in register_settings()
Pass the settings block name as a parameter to register_settings(),
rather than defining it with settings_init() (and then possibly
changing it by directly manipulating settings->name).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Allow per-device receive queue processing to be frozen
Several use cases (e.g. the UNDI API and the EFI SNP API) require
access to the raw network device receive queue, and so currently use
manual calls to netdev_poll() on a specific network device in order to
prevent received packets from being processed by the network stack.
As an alternative, provide a flag that allows receive queue processing
to be frozen on a per-device basis. When receive queue processing is
frozen, packets will be enqueued as normal, but will not be
automatically dequeued and passed up the network stack.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[list] Fix typographical error from previous commit
Fix typographical error from commit ea631f6 ("[list] Add
list_first_entry()"). The symptom was PXELINUX 3.86 causing a stack
overflow under VMware.
Tested-by: Shao Miller <shao.miller@yrdsb.edu.on.ca>
Signed-off-by: Shao Miller <shao.miller@yrdsb.edu.on.ca>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There are several points in the iPXE codebase where
list_for_each_entry() is (ab)used to extract only the first entry from
a list. Add a macro list_first_entry() to make this code easier to
read.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Pass both link-layer addresses in net_tx() and net_rx()
FCoE requires the use of fabric-provided MAC addresses, which breaks
the assumption that the net device's MAC address is implicitly the
source address for net_tx() and the (unicast) destination address for
net_rx().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Report network-layer errors via network device statistics
Errors generated by the network layer in response to received packets
are liable to be lost, since nothing systematically records these
errors and often the packets do not propagate far enough through the
stack to impact upon user-visible processes.
Improve this situation by recording network-layer errors in the
network device statistics.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Add the concept of a network upper-layer driver
Add the concept of a network upper-layer driver, which can create
arbitrary devices on top of network devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Provide a test mechanism for discarding packets at random
Setting NETDEV_DISCARD_RATE to a non-zero value will cause one in
every NETDEV_DISCARD_RATE packets to be discarded at random on both
the transmit and receive datapaths, allowing the robustness of
upper-layer network protocols to be tested even in simulation
environments that provide wholly reliable packet transmission.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Standardise on using ref_init() to initialise an embedded reference
count, to match the coding style used by other embedded objects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Access to the gpxe.org and etherboot.org domains and associated
resources has been revoked by the registrant of the domain. Work
around this problem by renaming project from gPXE to iPXE, and
updating URLs to match.
Also update README, LOG and COPYRIGHTS to remove obsolete information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>