Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[infiniband] Return status code from ib_create_cq() and ib_create_qp()
Any underlying errors arising during ib_create_cq() or ib_create_qp()
are lost since the functions simply return NULL on error. This makes
debugging harder, since a debug-enabled build is required to discover
the root cause of the error.
Fix by returning a status code from these functions, thereby allowing
any underlying errors to be propagated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Limit MTU by hardware maximum frame length
Separate out the concept of "hardware maximum supported frame length"
and "configured link MTU", and limit the latter according to the
former.
In networks where the DHCP-supplied link MTU is inconsistent with the
hardware or driver capabilities (e.g. a network using jumbo frames),
this will result in iPXE advertising a TCP MSS consistent with a size
that can actually be received.
Note that the term "MTU" is typically used to refer to the maximum
length excluding the link-layer headers; we adopt this usage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[infiniband] Avoid multiple calls to ib_cmrc_shutdown()
When a CMRC connection is closed, the deferred shutdown process calls
ib_destroy_qp(). This will cause the receive work queue entries to
complete in error (since they are being cancelled), which will in turn
reschedule the deferred shutdown process. This eventually leads to
ib_destroy_conn() being called on a connection that has already been
freed.
Fix by explicitly cancelling any pending shutdown process after the
shutdown process has completed.
Ironically, this almost exactly reverts commit 019d4c1 ("[infiniband]
Use a one-shot process for CMRC shutdown"); prior to the introduction
of one-shot processes the only way to achieve a one-shot process was
for the process to cancel itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipoib] Fix a race when chain-loading undionly.kpxe in IPoIB
The Infiniband link status change callback ipoib_link_state_changed()
may be called while the IPoIB device is closed, in which case there
will not be an IPoIB queue pair to be joined to the IPv4 broadcast
group. This leads to NULL pointer dereferences in ib_mcast_attach()
and ib_mcast_detach().
Fix by not attempting to join (or leave) the broadcast group unless we
actually have an IPoIB queue pair.
Signed-off-by: Wissam Shoukair <wissams@mellanox.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[base16] Add buffer size parameter to base16_encode() and base16_decode()
The current API for Base16 (and Base64) encoding requires the caller
to always provide sufficient buffer space. This prevents the use of
the generic encoding/decoding functionality in some situations, such
as in formatting the hex setting types.
Implement a generic hex_encode() (based on the existing
format_hex_setting()), implement base16_encode() and base16_decode()
in terms of the more generic hex_encode() and hex_decode(), and update
all callers to provide the additional buffer length parameter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[process] Pass containing object pointer to process step() methods
Give the step() method a pointer to the containing object, rather than
a pointer to the process. This is consistent with the operation of
interface methods, and allows a single function to serve as both an
interface method and a process step() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[infiniband] Add node GUID as distinct from the first port GUID
iPXE currently uses the first port's port GUID as the node GUID,
rather than using the (possibly distinct) real node GUID. This can
confuse opensm during the handover to a loaded OS: it thinks the port
already belongs to a different node and so discards our port
information with a warning message about duplicate ports. Everything
is picked up correctly on the second subnet sweep, after opensm has
established that the "old" node no longer exists, but this can delay
link-up unnecessarily by several seconds.
Fix by using the real node GUID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[infiniband] Always call ib_link_state_changed() in ib_smc_update()
ib_smc_update() potentially updates the Infiniband port state, and so
should almost always be followed by a call to ib_link_state_changed().
The one exception is the call made to ib_smc_update() before the
device is registered.
Fix by removing explicit calls to ib_link_state_changed() from drivers
using ib_smc_update(), including a call to ib_link_state_changed()
within ib_smc_update(), and creating a separate ib_smc_init() for use
prior to device registration.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[block] Replace gPXE block-device API with an iPXE asynchronous interface
The block device interface used in gPXE predates the invention of even
the old gPXE data-transfer interface, let alone the current iPXE
generic asynchronous interface mechanism. Bring this old code up to
date, with the following benefits:
o Block device commands can be cancelled by the requestor. The INT 13
layer uses this to provide a global timeout on all INT 13 calls,
with the result that an unexpected passive failure mode (such as
an iSCSI target ACKing the request but never sending a response)
will lead to a timeout that gets reported back to the INT 13 user,
rather than simply freezing the system.
o INT 13,00 (reset drive) is now able to reset the underlying block
device. INT 13 users, such as DOS, that use INT 13,00 as a method
for error recovery now have a chance of recovering.
o All block device commands are tagged, with a numerical tag that
will show up in debugging output and in packet captures; this will
allow easier interpretation of bug reports that include both
sources of information.
o The extremely ugly hacks used to generate the boot firmware tables
have been eradicated and replaced with a generic acpi_describe()
method (exploiting the ability of iPXE interfaces to pass through
methods to an underlying interface). The ACPI tables are now
built in a shared data block within .bss16, rather than each
requiring dedicated space in .data16.
o The architecture-independent concept of a SAN device has been
exposed to the iPXE core through the sanboot API, which provides
calls to hook, unhook, boot, and describe SAN devices. This
allows for much more flexible usage patterns (such as hooking an
empty SAN device and then running an OS installer via TFTP).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[retry] Hold reference while timer is running and during expiry callback
Guarantee that a retry timer cannot go out of scope while the timer is
running, and provide a guarantee to the expiry callback that the timer
will remain in scope during the entire callback (similar to the
guarantee provided to interface methods).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[interface] Convert all data-xfer interfaces to generic interfaces
Remove data-xfer as an interface type, and replace data-xfer
interfaces with generic interfaces supporting the data-xfer methods.
Filter interfaces (as used by the TLS layer) are handled using the
generic pass-through interface capability. A side-effect of this is
that deliver_raw() no longer exists as a data-xfer method. (In
practice this doesn't lose any efficiency, since there are no
instances within the current codebase where xfer_deliver_raw() is used
to pass data to an interface supporting the deliver_raw() method.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Standardise on using timer_init() to initialise an embedded retry
timer, to match the coding style used by other embedded objects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Standardise on using ref_init() to initialise an embedded reference
count, to match the coding style used by other embedded objects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Access to the gpxe.org and etherboot.org domains and associated
resources has been revoked by the registrant of the domain. Work
around this problem by renaming project from gPXE to iPXE, and
updating URLs to match.
Also update README, LOG and COPYRIGHTS to remove obsolete information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[infiniband] Make node description invariant across all ports
IBA section 14.2.5.2 states that "the contents of the NodeDescription
attribute are the same for all ports on a node". Satisfy this by
using the HCA GUID rather than the port GUID to form the node
description string.
[infiniband] Disambiguate CM connection rejection reasons
There is diagnostic value in being able to disambiguate between the
various reasons why an IB CM has rejected a connection attempt. In
particular, reason 8 "invalid service ID" can be used to identify an
incorrect SRP service_id root-path component, and reason 28 "consumer
reject" corresponds to a genuine SRP login rejection IU, which can be
passed up to the SRP layer.
For rejection reasons other than "consumer reject", we should not pass
through the private data, since it is most likely generated by the CM
without any protocol-specific knowledge.
[infiniband] Generate more specific errors in response to failure MADs
Generate errors within individual MAD transaction consumers such as
ib_pathrec.c and ib_mcast.c, rather than within ib_mi.c. This allows
for more meaningful error messages to eventually be displayed to the
user.
SRP is the SCSI RDMA Protocol. It allows for a method of SAN booting
whereby the target is responsible for reading and writing data using
Remote DMA directly to the initiator's memory. The software initiator
merely sends and receives SCSI commands; it never has to touch the
actual data.