[aoe] Use an AoE config query to identify the target MAC address
The AoE spec does not specify that the source MAC address of a
received packet actually matches the MAC address of the AoE target.
In principle an AoE server can respond to an AoE request on any
interface available to it, which may not be an address configured to
accept AoE requests.
This issue is resolved by implementing AoE device discovery. The
purpose of AoE discovery is to find out which addresses an AoE target
can use for requests. An AoE configuration command is sent when the
AoE attach is attempted. The AoE target must respond to that
configuration query from an interface that can accept requests.
Based on a patch from Ryan Thomas <ryan@coraid.com>
[i386] Change [u]int32_t to [unsigned] int, rather than [unsigned] long
This brings us in to line with Linux definitions, and also simplifies
adding x86_64 support since both platforms have 2-byte shorts, 4-byte
ints and 8-byte long longs.
[infiniband] Add raw packet parser and constructor
This can be used with cards that require the driver to construct and
parse packet headers manually. Headers are optionally handled
out-of-line from the packet payload, since some such cards will split
received headers into a separate ring buffer.
[infiniband] Split subnet management agent client out into ib_smc.c
Not all Infiniband cards have embedded subnet management agents.
Split out the code that communicates with such an embedded SMA into a
separate ib_smc.c file, and have drivers call ib_smc_update()
explicitly when they suspect that the answers given by the embedded
SMA may have changed.
[infiniband] Pass address vector in receive completions
Receive completion handlers now get passed an address vector
containing the information extracted from the packet headers
(including the GRH, if present), and only the payload remains in the
I/O buffer.
This breaks the symmetry between transmit and receive completions, so
remove the ib_completer_t type and use an ib_completion_queue_operations
structure instead.
Rename the "destination QPN" and "destination LID" fields in struct
ib_address_vector to reflect its new dual usage.
Since the ib_completion structure now contains only an IB status code,
("syndrome") replace it with a generic gPXE integer status code.
[infiniband] Flush uncompleted work queue entries at QP teardown
Avoid leaking I/O buffers in ib_destroy_qp() by completing any
outstanding work queue entries with a generic error code. This
requires the completion handlers to be available to ib_destroy_qp(),
which is done by making them static configuration parameters of the CQ
(set by ib_create_cq()) rather than being provided on each call to
ib_poll_cq().
This mimics the functionality of netdev_{tx,rx}_flush(). The netdev
flush functions would previously have been catching any I/O buffers
leaked by the IPoIB data queue (though not by the IPoIB metadata
queue).
[netdevice] Retain and report detailed error breakdowns
netdev_rx_err() and netdev_tx_complete_err() get passed the error
code, but currently use it only in debug messages.
Retain error numbers and frequencey counts for up to
NETDEV_MAX_UNIQUE_ERRORS (4) different errors for each of TX and RX.
This allows the "ifstat" command to report the reasons for TX/RX
errors in most cases, even in non-debug builds.
[aoe] Start retry timer before potential temporary transmission failure
The retry timer needs to be running as soon as we know that we are
trying to transmit a command. If transmission fails because of a
temporary error condition, then the timer will allow us to retry the
transmission later.
[settings] Add the notion of a "tag magic" to numbered settings
Settings can be constructed using a dotted-decimal notation, to allow
for access to unnamed settings. The default interpretation is as a
DHCP option number (with encapsulated options represented as
"<encapsulating option>.<encapsulated option>".
In several contexts (e.g. SMBIOS, Phantom CLP), it is useful to
interpret the dotted-decimal notation as referring to non-DHCP
options. In this case, it becomes necessary for these contexts to
ignore standard DHCP options, otherwise we end up trying to, for
example, retrieve the boot filename from SMBIOS.
Allow settings blocks to specify a "tag magic". When dotted-decimal
notation is used to construct a setting, the tag magic value of the
originating settings block will be ORed in to the tag number.
Store/fetch methods can then check for the magic number before
interpreting arbitrarily-numbered settings.
[netdevice] Change link-layer push() and pull() methods to take raw types
EFI requires us to be able to specify the source address for
individual transmitted packets, and to be able to extract the
destination address on received packets.
Take advantage of this to rationalise the push() and pull() methods so
that push() takes a (dest,source,proto) tuple and pull() returns a
(dest,source,proto) tuple.
[netdevice] Split multicast hashing out into an mc_hash method
Multicast hashing is an ugly overlap between network and link layers.
EFI requires us to provide access to this functionality, so move it
out of ipv4.c and expose it as a method of the link layer.
[makefile] Add -Wformat-nonliteral as an extra warning category
-Wformat-nonliteral is not enabled by -Wall and needs to be explicitly
specified.
Modified the few files that use nonliteral format strings to work with
this new setting in place.
Inspired by a patch from Carl Karsten <carl@personnelware.com> and an
identical patch from Rorschach <r0rschach@lavabit.com>.
[iscsi] Change default initiator name prefix to "iqn.2000-01.org.etherboot:"
The domain etherboot.org was actually registered on 2000-01-09, not
2000-09-01. (To put it another way, it was registered on 1/9/2000 (US
date format) rather than 1/9/2000 (sensible date format); this may
illuminate the cause of the error.)
"iqn.2000-09.org.etherboot:" is still valid as per RFC3720, but may be
surprising to users, so change it to something less unexpected.
Thanks to the anonymous contributor for pointing this one out.
[undi] Fill in ProtType correctly in PXENV_UNDI_ISR
Determine the network-layer packet type and fill it in for UNDI
clients. This is required by some NBPs such as emBoot's winBoot/i.
This change requires refactoring the link-layer portions of the
gPXE netdevice API, so that it becomes possible to strip the
link-layer header without passing the packet up the network stack.
[dhcp] Do not restrict minimum retry time for ProxyDHCPREQUEST
The ProxyDHCPREQUEST is a unicast packet, so the first request will
almost always be lost due to not having the IP address in the ARP
cache. If the minimum retry time is set to one second (as per commit
ff2b6a5), then ProxyDHCP will time out and give up before managing to
successfully transmit a request.
The DHCP timers need to be reworked anyway, so this mild hack is
acceptable for now.
[retry] Added configurable timeouts to retry timer
New min_timeout and max_timeout fields in struct retry_timer allow
users of this timer to set their own desired minimum and maximum
timeouts, without being constrained to a single global minimum and
maximum. Users of the timer can still elect to use the default global
values by leaving the min_timeout and max_timeout fields as 0.
[pxe] If no ProxyDHCPACK exists, use DHCPACK for the fake ProxyDHCPACK packet
WinPE seems to have a bug that causes it to always use the TFTP server
IP address and filename from the ProxyDHCPACK packet, even if the
ProxyDHCPACK packet doesn't exist. This causes it to end up
attempting to fetch a file such as
tftp://0.0.0.0/bootmgr.exe
If we don't have a ProxyDHCPACK to use, we pretend that it was a copy
of the DHCPACK packet. This works around the problem, and hopefully
won't surprise any NBPs.
Altiris erroneously cares about the ordering of DHCP options, and will
get confused if we don't construct them in the order it expects.
This is observed (so far) only when attempting to deploy 64-bit Win2k3.
[ftp] Terminate processing after receiving an error
When an error reply (not 1xx, 2xx or 3xx) was received, ftp_reply()
invoked ftp_done() to close connections, but did not return, and the
rest of code in this function could try to send commands to the closed
control connection.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
[ftp] Cope with RETR completion prior to all data received
Based on a patch contributed by Sergey Vlasov <vsu@altlinux.ru> :
In my testing with "qemu -net user" the 226 response to RETR was
often received earlier than final packets of the data connection;
this caused the received file to become truncated without any error
indication. Fix this by adding an intermediate state FTP_TRANSFER
between FTP_RETR and FTP_QUIT, so that the transfer is considered to
be complete only when both the end of data connection is encountered
and the final reply to the RETR command is received.
Verifying server ID and DHCP transaction ID is insufficient to
differentiate between DHCPACK and ProxyDHCPACK when the DHCP server and
Proxy DHCP server are the same machine.
[dhcp] Allow DHCP non-option settings to be cleared
dhcppkt_store() is supposed to clear the setting if passed NULL for the
setting data. In the case of fixed-location fields (e.g. client IP
address), this requires setting the content of the field to all-zeros.
Perform the same test for a matching DHCP_SERVER_IDENTIFIER on
ProxyDHCPACKs as we do for DHCPACKs. Otherwise, a retransmitted
DHCPACK can end up being treated as the ProxyDHCPACK.
I have a vague and unsettling memory that this test was deliberately
omitted, but I can't remember why, and can't find anything in the VC
logs.
[slam] Add support for SLAM window lengths of greater than one packet
Add the definition of SLAM_MAX_BLOCKS_PER_NACK, which is roughly
equivalent to a TCP window size; it represents the maximum number of
packets that will be requested in a single NACK.
Note that, to keep the code size down, we still limit ourselves to
requesting only a single range per NACK; if the missing-block list is
discontiguous then we may request fewer than SLAM_MAX_BLOCKS_PER_NACK
blocks.
On any fast network, or with any driver that may drop packets
(e.g. Infiniband, which has very small RX rings), the traditional
usage of the SLAM protocol will result in enormous numbers of packet
drops and a consequent large number of retransmissions.
By adapting the client behaviour, we can force the server to act more
like a multicast TFTP server, with flow control provided by a single
master client.
This behaviour should interoperate with any traditional SLAM client
(e.g. Etherboot 5.4) on the network. The SLAM protocol isn't actually
documented anywhere, so it's hard to define either behaviour as
compliant or otherwise.
[dhcp] Do not transition to DHCPREQUEST without a valid DHCPOFFER
A missing test for dhcp->dhcpoffer in dhcp_timer_expired() was causing
the client to transition to DHCPREQUEST after timing out on waiting
for ProxyDHCP even if no DHCPOFFERs had been received.
[slam] Request all remaining blocks if we run out of space for the blocklist
In a SLAM NACK packet, if we run out of space to represent the
missing-block list, then indicate all remaining blocks as missing.
This avoids the need to wait for the one-second timeout before
receiving the blocks that otherwise wouldn't have been requested due
to running out of space.
[slam] Speed up NACK transmission by restricting the block-list length
Shorter NACK packets take less time to construct and spew out less
debug output, and there's a limit to how useful it is to send a
complete missing-block list anyway; if the loss rate is high then
we're going to have to retransmit an updated missing-block list
anyway.
Also add pretty debugging output to show the list of requested blocks.
[udp] Verify local socket address (if specified) for UDP sockets
UDP sockets can be used for multicast, at which point it becomes
plausible that we could receive packets that aren't destined for us
but that still match on a port number.
Maintain state for the advertised window length, and only ever increase
it (instead of calculating it afresh on each transmit). This avoids
triggering "treason uncloaked" messages on Linux peers.
Respond to zero-length TCP keepalives (i.e. empty data packets
transmitted outside the window). Even if the peer wouldn't otherwise
expect an ACK (because its packet consumed no sequence space), force an
ACK if it was outside the window.
We don't yet generate TCP keepalives. It could be done, but it's unclear
what benefit this would have. (Linux, for example, doesn't start sending
keepalives until the connection has been idle for two hours.)
[iSCSI] Produce meaningful errors on login failure
Return the most appropriate of EACCES, EPERM, ENODEV, ENOTSUP, EIO or
EINVAL depending on the exact error returned by the target, rather than
just always returning EPERM.
Also, ensure that error strings exist for these errors.
From: Viswanath Krishnamurthy <viswa.krish@gmail.com>
The current ipv4 incorrectly checks the IP address for multicast address.
This causes valid IPv4 unicast address to be trated as multicast address
For e.g if the PXE/tftp server IP address is 192.168.4.XXX where XXX is
224 or greater, it gets treated as multicast address and a ethernet
multicast address is sent out on the wire causing timeouts
[iSCSI] Offer CHAP authentication only if we have a username and password
Some EMC targets will fail if we advertise that we can authenticate with
CHAP, but the target is configured to allow unauthenticated access to that
target. We advertise AuthMethod=CHAP,None; the target should (I think)
select AuthMethod=None for unprotected targets. IETD does this, but an
EMC Celerra NS83 doesn't.
Fix by offering only AuthMethod=None if the user hasn't supplied a
username and password; this means that we won't be offering CHAP
authentication unless the user is expecting to use it (in which case the
target is presumably configured appropriately).
Many thanks to Alessandro Iurlano <alessandro.iurlano@gmail.com> for
reporting and helping to diagnose this problem.
[Infiniband] Add preliminary multiple port support for Hermon cards
Infiniband devices no longer block waiting for link-up in
register_ibdev().
Hermon driver needs to create an event queue and poll for link-up events.
Infiniband core needs to reread MAD parameters when link state changes.
IPoIB needs to cope with Infiniband link parameters being only partially
available at probe and open time.
[http] gPXE is a HTTP/1.0 client, not a HTTP/1.1 client
gPXE is not compliant with the HTTP/1.1 specification (RFC 2616),
since it lacks support for "Transfer-Encoding: chunked". gPXE is,
however, compliant with the HTTP/1.0 specification (RFC 1945), which
does not require "Transfer-Encoding: chunked" to be supported.
The only HTTP/1.1 feature that gPXE uses is the "Host:" header, but
servers universally accept that one from HTTP/1.0 clients as an
optional extension (it is obligatory for HTTP/1.1). gPXE does not,
for example, appear to support connection caching. Advertising as a
HTTP/1.0 client will typically make the server close the connection
immediately upon sending the last data, which is actually beneficial
if we aren't going to keep the connection alive anyway.