[netdevice] Allow the hardware and link-layer addresses to differ in size
IPoIB has a 20-byte link-layer address, of which only eight bytes
represent anything relating to a "hardware address".
The PXE and EFI SNP APIs expect the permanent address to be the same
size as the link-layer address, so fill in the "permanent address"
field with the initial link layer address (as generated by
register_netdev() based upon the real hardware address).
[netdevice] Separate out the concept of hardware and link-layer addresses
The hardware address is an intrinsic property of the hardware, while
the link-layer address can be changed at runtime. This separation is
exposed via APIs such as PXE and EFI, but is currently elided by gPXE.
Expose the hardware and link-layer addresses as separate properties
within a net device. Drivers should now fill in hw_addr, which will
be used to initialise ll_addr at the time of calling
register_netdev().
[infiniband] Add infrastructure for RC queue pairs
Queue pairs are now assumed to be created in the INIT state, with a
call to ib_modify_qp() required to bring the queue pair to the RTS
state.
ib_modify_qp() no longer takes a modification list; callers should
modify the relevant queue pair parameters (e.g. qkey) directly and
then call ib_modify_qp() to synchronise the changes to the hardware.
The packet sequence number is now a property of the queue pair, rather
than of the device.
Each queue pair may have an associated address vector. For RC queue
pairs, this is the address vector that will be programmed in to the
hardware as the remote address. For UD queue pairs, it will be used
as the default address vector if none is supplied to ib_post_send().
[infiniband] Make qkey and rate optional parameters to ib_post_send()
The queue key is stored as a property of the queue pair, and so can
optionally be added by the Infiniband core at the time of calling
ib_post_send(), rather than always having to be specified by the
caller.
This allows IPoIB to avoid explicitly keeping track of the data queue
key.
[ipoib] Clarify new role of IPoIB peer cache as for MAC addresses only
Now that path record lookups are handled entirely via
ib_resolve_path(), the only role of the IPoIB peer cache is as a
lookup table for MAC addresses. Update the code structure and
comments to reflect this.
The IPoIB broadcast MAC address varies according to the partition key.
Now that the broadcast MAC address is a property of the network device
rather than of the link layer, we can expose this real MAC address
directly.
The broadcast LID is now identified via a path record lookup; this is
marginally inefficient (since it was present in the MCMemberRecord
GetResponse), but avoids the need to special-case broadcasts when
constructing the address vector in ipoib_transmit().
Currently, all Infiniband users must create a process for polling
their completion queues (or rely on a regular hook such as
netdev_poll() in ipoib.c).
Move instead to a model whereby the Infiniband core maintains a single
process calling ib_poll_eq(), and polling the event queue triggers
polls of the applicable completion queues. (At present, the
Infiniband core simply polls all of the device's completion queues.)
Polling a completion queue will now implicitly refill all attached
receive work queues; this is analogous to the way that netdev_poll()
implicitly refills the RX ring.
Infiniband users no longer need to create a process just to poll their
completion queues and refill their receive rings.
[infiniband] Centralise assumption of 2048-byte payloads
IPoIB and the SMA have separate constants for the packet size to be
used to I/O buffer allocations. Merge these into the single
IB_MAX_PAYLOAD_SIZE constant.
(Various other points in the Infiniband stack have hard-coded
assumptions of a 2048-byte payload; we don't currently support
variable MTUs.)
[netdevice] Make ll_broadcast per-netdevice rather than per-ll_protocol
IPoIB has a link-layer broadcast address that varies according to the
partition key. We currently go through several contortions to pretend
that the link-layer address is a fixed constant; by making the
broadcast address a property of the network device rather than the
link-layer protocol it will be possible to simplify IPoIB's broadcast
handling.
[netdevice] Add netdev argument to link-layer push and pull handlers
In order to construct outgoing link-layer frames or parse incoming
ones properly, some protocols (such as 802.11) need more state than is
available in the existing variables passed to the link-layer protocol
handlers. To remedy this, add struct net_device *netdev as the first
argument to each of these functions, so that more information can be
fetched from the link layer-private part of the network device.
Updated all three call sites (netdevice.c, efi_snp.c, pxe_undi.c) and
both implementations (ethernet.c, ipoib.c) of ll_protocol to use the
new argument.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
[i386] Change [u]int32_t to [unsigned] int, rather than [unsigned] long
This brings us in to line with Linux definitions, and also simplifies
adding x86_64 support since both platforms have 2-byte shorts, 4-byte
ints and 8-byte long longs.
[infiniband] Respect hop pointer when building directed route SMP return path
The return path in directed route SMPs lists the egress ports in order
from SM to node, rather than from node to SM.
To write to the correct offset within the return path, we need to
parse the hop pointer. This is held within the class-specific data
portion of the MAD header, which was previously unused by us and
defined to be a uint16_t. Define this field to be a union type; this
requires some rearrangement of ib_mad.h and corresponding changes to
ipoib.c.
Some Infiniband cards will not be as accommodating as the Arbel and
Hermon cards in providing enough space for us to push a fake extra
header at the start of the received packet. We must therefore make do
with squeezing enough information to identify source and destination
addresses into the two bytes of padding within a genuine IPoIB
link-layer header.
[infiniband] Split subnet management agent client out into ib_smc.c
Not all Infiniband cards have embedded subnet management agents.
Split out the code that communicates with such an embedded SMA into a
separate ib_smc.c file, and have drivers call ib_smc_update()
explicitly when they suspect that the answers given by the embedded
SMA may have changed.
[infiniband] Pass address vector in receive completions
Receive completion handlers now get passed an address vector
containing the information extracted from the packet headers
(including the GRH, if present), and only the payload remains in the
I/O buffer.
This breaks the symmetry between transmit and receive completions, so
remove the ib_completer_t type and use an ib_completion_queue_operations
structure instead.
Rename the "destination QPN" and "destination LID" fields in struct
ib_address_vector to reflect its new dual usage.
Since the ib_completion structure now contains only an IB status code,
("syndrome") replace it with a generic gPXE integer status code.
[infiniband] Flush uncompleted work queue entries at QP teardown
Avoid leaking I/O buffers in ib_destroy_qp() by completing any
outstanding work queue entries with a generic error code. This
requires the completion handlers to be available to ib_destroy_qp(),
which is done by making them static configuration parameters of the CQ
(set by ib_create_cq()) rather than being provided on each call to
ib_poll_cq().
This mimics the functionality of netdev_{tx,rx}_flush(). The netdev
flush functions would previously have been catching any I/O buffers
leaked by the IPoIB data queue (though not by the IPoIB metadata
queue).
[netdevice] Change link-layer push() and pull() methods to take raw types
EFI requires us to be able to specify the source address for
individual transmitted packets, and to be able to extract the
destination address on received packets.
Take advantage of this to rationalise the push() and pull() methods so
that push() takes a (dest,source,proto) tuple and pull() returns a
(dest,source,proto) tuple.
[netdevice] Split multicast hashing out into an mc_hash method
Multicast hashing is an ugly overlap between network and link layers.
EFI requires us to provide access to this functionality, so move it
out of ipv4.c and expose it as a method of the link layer.
[undi] Fill in ProtType correctly in PXENV_UNDI_ISR
Determine the network-layer packet type and fill it in for UNDI
clients. This is required by some NBPs such as emBoot's winBoot/i.
This change requires refactoring the link-layer portions of the
gPXE netdevice API, so that it becomes possible to strip the
link-layer header without passing the packet up the network stack.
Add ability for network devices to flag link up/down state to the
networking core.
Autobooting code will now wait for link-up before attempting DHCP.
IPoIB reflects the Infiniband link state as the network device link state
(which is not strictly correct; we also need a succesful IPoIB IPv4
broadcast group join), but is probably more informative.
[Infiniband] Add preliminary multiple port support for Hermon cards
Infiniband devices no longer block waiting for link-up in
register_ibdev().
Hermon driver needs to create an event queue and poll for link-up events.
Infiniband core needs to reread MAD parameters when link state changes.
IPoIB needs to cope with Infiniband link parameters being only partially
available at probe and open time.
[Infiniband] Add preliminary support for multi-port devices.
Arbel and Hermon cards both have multiple ports. Add the
infrastructure required to register each port as a separate IB
device. Don't yet register more than one port, since registration
will currently fail unless a valid link is detected.
Use ib_*_{set,get}_{drv,owner}data wrappers to access driver- and
owner-private data on Infiniband structures.
Added an almost obscene amount of debugging and assertion code while
tracking down a bug that turned out to be a free_iob() used where I
needed a netdev_tx_complete(). This left the freed I/O buffer on the
net device's TX list, with bad, bad consequences later.
Also fixed the bug in question.