[infiniband] Make qkey and rate optional parameters to ib_post_send()
The queue key is stored as a property of the queue pair, and so can
optionally be added by the Infiniband core at the time of calling
ib_post_send(), rather than always having to be specified by the
caller.
This allows IPoIB to avoid explicitly keeping track of the data queue
key.
[ipoib] Clarify new role of IPoIB peer cache as for MAC addresses only
Now that path record lookups are handled entirely via
ib_resolve_path(), the only role of the IPoIB peer cache is as a
lookup table for MAC addresses. Update the code structure and
comments to reflect this.
The IPoIB broadcast MAC address varies according to the partition key.
Now that the broadcast MAC address is a property of the network device
rather than of the link layer, we can expose this real MAC address
directly.
The broadcast LID is now identified via a path record lookup; this is
marginally inefficient (since it was present in the MCMemberRecord
GetResponse), but avoids the need to special-case broadcasts when
constructing the address vector in ipoib_transmit().
Generalise the subnet management agent into a general management agent
capable of sending and responding to MADs, including support for
retransmissions as necessary.
Currently, all Infiniband users must create a process for polling
their completion queues (or rely on a regular hook such as
netdev_poll() in ipoib.c).
Move instead to a model whereby the Infiniband core maintains a single
process calling ib_poll_eq(), and polling the event queue triggers
polls of the applicable completion queues. (At present, the
Infiniband core simply polls all of the device's completion queues.)
Polling a completion queue will now implicitly refill all attached
receive work queues; this is analogous to the way that netdev_poll()
implicitly refills the RX ring.
Infiniband users no longer need to create a process just to poll their
completion queues and refill their receive rings.
[infiniband] Centralise assumption of 2048-byte payloads
IPoIB and the SMA have separate constants for the packet size to be
used to I/O buffer allocations. Merge these into the single
IB_MAX_PAYLOAD_SIZE constant.
(Various other points in the Infiniband stack have hard-coded
assumptions of a 2048-byte payload; we don't currently support
variable MTUs.)
[netdevice] Make ll_broadcast per-netdevice rather than per-ll_protocol
IPoIB has a link-layer broadcast address that varies according to the
partition key. We currently go through several contortions to pretend
that the link-layer address is a fixed constant; by making the
broadcast address a property of the network device rather than the
link-layer protocol it will be possible to simplify IPoIB's broadcast
handling.
[ata] Make ATA command issuing partially asynchronous
Move the icky call to step() from aoe.c to ata.c; this takes it at
least one step further away from where it really doesn't belong.
Unfortunately, AoE has the ugly aoe_discover() mechanism which means
that we still have a step() loop in aoe.c for now; this needs to be
replaced at some future point.
[xfer] Always nullify interface while sending close() message
Objects typically call xfer_close() as part of their response to a
close() message. If the initiating object has already nullified the
xfer interface then this isn't a problem, but it can lead to
unexpected behaviour when the initiating object is aiming to reuse the
connection and so does not nullify the interface.
Fix by always temporarily nullifying the interface during xfer_close()
(as was already being done by xfer_vreopen() in order to work around
this specific problem).
Reported-by: infernix <infernix@infernix.net>
Tested-by: infernix <infernix@infernix.net>
These commands can be used to activate or deactivate the PXE API (on a
specifiable network interface).
This is currently of limited use, since most image formats will call
shutdown() before booting the image, meaning that the underlying net
device gets shut down during remove_devices() anyway.
ifcommon_exec() was long-ago marked as __attribute__((regparm(2))) in
order to minimise the size of functions that call into it. Since
then, gPXE has added -mregparm=3 as a general compilation option, and
this "optimisation" is now counter-productive.
Change (and simplify) the prototype to minimise code size given the
current compilation conditions.
[pxe] Make pxe_init_structures() an initialisation function
pxe_init_structures() fills in the fields of the !PXE and PXENV+
structures that aren't known until gPXE starts up. Once gPXE is
started, these values will never change.
Make pxe_init_structures() an initialisation function so that PXE
users don't have to worry about calling it.
[pxe] Update UNDI transmit count before transmitting packet
It is possible that the UNDI ISR may be triggered before netdev_tx()
returns control to pxenv_undi_transmit(). This means that
pxenv_undi_isr() may see a zero undi_tx_count, and so not check for TX
completions. This is not a significant problem, since it will check
for TX completions on the next call to pxenv_undi_isr() anyway; it
just means that the NBP will see a spurious IRQ that was apparently
caused by nothing.
Fix by updating the undi_tx_count before calling netdev_tx(), so that
pxenv_undi_isr() can decrement it and report the TX completion.
Symantec Ghost requires working multicast support. gPXE configures
all (sufficiently supported) network adapters into "receive all
multicasts" mode, which means that PXENV_UNDI_SET_MCAST_ADDRESS is
actually a no-op, but the current implementation returns
PXENV_STATUS_UNSUPPORTED instead.
Fix by making PXENV_UNDI_SET_MCAST_ADDRESS return success. For good
measure, also implement PXENV_UNDI_GET_MCAST_ADDRESS, since the
relevant functionality is now exposed by the net device core.
Note that this will silently fail if the gPXE driver for the NIC being
used fails to configure the NIC in "receive all multicasts" mode.
The PXE debugging messages have remained pretty much unaltered since
Etherboot 5.4, and are now difficult to read in comparison to most of
the rest of gPXE.
Bring the pxe_undi debug messages up to normal gPXE standards.
[hci] Expose ifcommon_exec() in a local header so wireless commands can use it
This keeps code size down, since the wireless interface management
commands have the same command-line interface and overall structure as
the wired commands.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
[ifmgmt] Move link-up status messages from autoboot() to iflinkwait()
With the addition of link status codes, we can now display a detailed
error indication if iflinkwait() fails.
Putting the error output in iflinkwait avoids code duplication, and
gains symmetry with the other interface management routines; ifopen()
already prints an error directly if it cannot open its interface.
Modified-by: Michael Brown <mcb30@etherboot.org>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
[netdevice] Add mechanism for reporting detailed link status codes
Expand the NETDEV_LINK_UP bit into a link_rc status code field,
allowing specific reasons for link failure to be reported via
"ifstat".
Originally-authored-by: Joshua Oreman <oremanj@rwcr.net>
[pxe] Fix interoperability with the Symantec (undipd) DOS UNDI driver
The Symantec UNDI DOS driver fails when run on top of gPXE because we
return our interface type as "gPXE" rather than one of the predefined
NDIS interface type strings.
Fix by returning the standard "DIX+802.3" string; this isn't
necessarily always accurate, but it's highly unlikely that anything
trying to use the UNDI API would understand our IPoIB link-layer
pseudo-header anyway.
[pxe] Fix interoperability with the Intel DOS UNDI driver
The Intel DOS UNDI driver fails when run on top of gPXE because we do
not fill in the ServiceFlags field in PXENV_UNDI_GET_IFACE_INFO.
Fix by filling in the ServiceFlags field with reasonable values
indicating our approximate feature capabilities.
[pxe] Fix interoperability with the 3Com DOS UNDI driver
The 3Com DOS UNDI driver fails when run on top of gPXE for two
reasons: firstly because PXENV_UNDI_SET_PACKET_FILTER is unsupported,
and secondly because gPXE enters the NBP without enabling interrupts
on the NIC, and the 3Com driver never calls PXENV_UNDI_OPEN.
Fix by always returning success from PXENV_UNDI_SET_PACKET_FILTER
(which is no worse than the current situation, since we already ignore
the receive packet filter in PXENV_UNDI_OPEN), and by forcibly
enabling interrupts on the NIC within PXENV_UNDI_TRANSMIT. The latter
is something of a hack, but avoids the need to implement a complete
base-code ISR that we would otherwise need if we were to enter the NBP
with interrupts enabled.
[tcp] Avoid rewinding sequence numbers on receiving old duplicate ACKs
Commit 558c1a4 ("[tcp] Improve robustness in the presence of duplicated
received packets") introduced a regression in that an old duplicate
ACK received while in the ESTABLISHED state would pass through normal
ACK processing, including updating tcp->snd_seq.
Fix by ensuring that ACK processing ignores all duplicate ACKs.
[tcp] Attempt to catch all possible error cases with debug messages
All TCP errors or unusual events should now generate a debugging
message at DBGLVL_LOG, with enough information (SEQ and ACK numbers)
to be able to identify the corresponding packet (or missing packet) in
a network trace from the remote end.
[tcp] Move high-frequency debug messages to DBGLVL_EXTRA
This makes it possible to leave TCP debugging enabled in order to see
interesting TCP events, without flooding the console with at least one
message per packet.
[image] Modify imgfree command to accept an argument
This resolves potential difficulties occurring when more than one script
is used. Total cost: 88 bytes uncompressed.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
[netdevice] Add netdev argument to link-layer push and pull handlers
In order to construct outgoing link-layer frames or parse incoming
ones properly, some protocols (such as 802.11) need more state than is
available in the existing variables passed to the link-layer protocol
handlers. To remedy this, add struct net_device *netdev as the first
argument to each of these functions, so that more information can be
fetched from the link layer-private part of the network device.
Updated all three call sites (netdevice.c, efi_snp.c, pxe_undi.c) and
both implementations (ethernet.c, ipoib.c) of ll_protocol to use the
new argument.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
The 93C66 is identical to the 93C56 in programming interface and
addressing, but twice as large in data storage (4096 bits). It's
used in some RTL8185 wireless cards.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
[tcp] Improve robustness in the presence of duplicated received packets
gPXE responds to duplicated ACKs with an immediate retransmission,
which can lead to a sorceror's apprentice syndrome. It also responds
to out-of-range (or old duplicate) ACKs with a RST, which can cause
valid connections to be dropped.
Fix the sorceror's apprentice syndrome by leaving the retransmission
timer running (and so inhibiting the immediate retransmission) when we
receive a potential duplicate ACK. This seems to match the behaviour
of Linux observed via wireshark traces.
Fix the RST issue by sending RST only on out-of-range ACKs that occur
before the connection is fully established, as per RFC 793.
These problems were exposed during development of the 802.11 wireless
link layer; the 802.11 protocol has a failure mode that can easily
cause duplicated packets. The fixes were tested in a controlled way
by faking large numbers of duplicated packets in the rtl8139 driver.
Originally-fixed-by: Joshua Oreman <oremanj@rwcr.net>