[dhcp] Choose ProxyDHCP port based on presence of PXE options
If the ProxyDHCPOFFER already includes PXE options (i.e. option 60 is
set to "PXEClient" and option 43 is present) then assume that the
ProxyDHCPREQUEST can be sent to port 67, rather than port 4011. This
is a reasonable assumption, since in that case the ProxyDHCP server
has already demonstrated by responding to the DHCPDISCOVER that it is
listening on port 67. (If the ProxyDHCP server were not listening on
port 67, then the standard DHCP server would have been configured to
respond with option 60 set to "PXEClient" but no option 43 present.)
The PXE specification is ambiguous on this point; the specified
behaviour covers only the cases in which option 43 is *not* present in
the ProxyDHCPOFFER. In these cases, we will continue to send the
ProxyDHCPREQUEST to port 4011.
This change is required in order to allow us to interoperate with
dnsmasq, which listens only on port 67. (dnsmasq relies on
unspecified behaviour of the Intel PXE stack, which it seems will
retain the ProxyDHCPOFFER as an options source and never issue a
ProxyDHCPREQUEST, thereby enabling dnsmasq to omit listening on port
4011.)
IBM Tivoli PXE Server 5.1.0.3 is reported to send trailing garbage
bytes at the end of the OACK packet, which causes gPXE to reject the
packet and abort the TFTP transfer.
Work around the problem by processing as much as possible of the OACK,
and treating name/value parsing errors as non-fatal.
Reported-by: Shao Miller <Shao.Miller@yrdsb.edu.on.ca>
[dhcp] Send broadcast PXE boot server discovery requests to port 67
We currently send all boot server discovery requests to port 4011.
Section 2.2.1 of the PXE spec states that boot server discovery
packets should be "sent broadcast (port 67), multicast (port 4011), or
unicast (port 4011)". Adjust our behaviour so that any boot server
discovery packets that are sent to the broadcast address are directed
to port 67 rather than port 4011.
This is required for operation with dnsmasq as a PXE server, since
dnsmasq listens only on port 67, and relies upon this (specified)
behaviour.
This change may break some setups using the (itself very broken) Linux
PXE server from kano.org.uk. This server will, in its default
configuration, listen only on port 4011. It never constructs a boot
server list (PXE_BOOT_SERVERS, option 43.8), and uses the wrong
definitions for the discovery control bits (PXE_DISCOVERY_CONTROL,
option 43.6). The upshot is that it will always instruct the client
to perform multicast and broadcast discovery only. In setups lacking
a valid multicast route on the server side, this used to work because
gPXE would eventually give up on the (non-responsive) multicast
address and send a broadcast request to port 4011, which the Linux PXE
server would respond to. Now that gPXE correctly sends this broadcast
request to port 67 instead, it is never seen by the Linux PXE server,
and the boot fails. The fix is to either (a) set up a multicast route
correctly on the server side before starting the PXE server, or (b)
edit /etc/pxe.conf to contain the server's unicast address in the
"multicast_address" field (a hack that happens to work).
Suggested-by: Simon Kelley <simon@thekelleys.org.uk>
[dhcp] Perform ProxyDHCP only if we do not already have PXE options
This prevents gPXE from wasting time attempting to contact a ProxyDHCP
server on port 4011 if the DHCP response already contains the relevant
PXE options. This behaviour is hinted at (though not explicitly
specified) in the PXE spec, and seems to match what the Intel client
does.
Suggested-by: Simon Kelley <simon@thekelleys.org.uk>
[tables] Incorporate table data type information into table definition
Eliminate the potential for mismatches between table names and the
table entry data type by incorporating the data type into the
definition of the table, rather than specifying it explicitly in each
table accessor method.
[tables] Redefine methods for accessing linker tables
Intel's C compiler (icc) chokes on the zero-length arrays that we
currently use as part of the mechanism for accessing linker table
entries. Abstract away the zero-length arrays, to make a port to icc
easier.
Introduce macros such as for_each_table_entry() to simplify the common
case of iterating over all entries in a linker table.
Represent table names as #defined string constants rather than
unquoted literals; this avoids visual confusion between table names
and C variable or type names, and also allows us to force a
compilation error in the event of incorrect table names.
[xfer] Make consistent assumptions that xfer metadata can never be NULL
The documentation in xfer.h and xfer.c does not say that the metadata
parameter is optional in calls such as xfer_deliver_iob_meta() and the
deliver_iob() method. However, some code in net/ is prepared to
accept a NULL pointer, and xfer_deliver_as_iob() passes a NULL pointer
directly to the deliver_iob() method.
Fix this mess of conflicting assumptions by making everything assume
that the metadata parameter is mandatory, and fixing
xfer_deliver_as_iob() to pass in a dummy metadata structure (as is
already done in xfer_deliver_iob()).
[pxe] Obey lists of PXE Boot Servers and associated Discovery Control bits
Various combinations of options 43.6, 43.7 and 43.8 dictate which
servers we send Boot Server Discovery requests to, and which servers
we should accept responses from. Obey these options, and remove the
explicit specification of a single Boot Server from start_pxebs() and
dependent functions.
[iobuf] Add iob_disown() and use it where it simplifies code
There are many functions that take ownership of the I/O buffer they
are passed as a parameter. The caller should not retain a pointer to
the I/O buffer. Use iob_disown() to automatically nullify the
caller's pointer, e.g.:
xfer_deliver_iob ( xfer, iob_disown ( iobuf ) );
This will ensure that iobuf is set to NULL for any code after the call
to xfer_deliver_iob().
iob_disown() is currently used only in places where it simplifies the
code, by avoiding an extra line explicitly setting the I/O buffer
pointer to NULL. It should ideally be used with each call to any
function that takes ownership of an I/O buffer. (The SSA
optimisations will ensure that use of iob_disown() gets optimised away
in cases where the caller makes no further use of the I/O buffer
pointer anyway.)
If gcc ever introduces an __attribute__((free)), indicating that use
of a function argument after a function call should generate a
warning, then we should use this to identify all applicable function
call sites, and add iob_disown() as necessary.
A TFTP DATA packet with a block number of zero (representing a
negative offset within the file) could potentially cause problems.
Fixed by explicitly rejecting such packets.
Identified by Stefan Hajnoczi <stefanha@gmail.com>.
The DHCP client code now implements only the mechanism of the DHCP and
PXE Boot Server protocols. Boot Server Discovery can be initiated
manually using the "pxebs" command. The menuing code is separated out
into a user-level function on a par with boot_root_path(), and is
entered in preference to a normal filename boot if the DHCP vendor
class is "PXEClient" and the PXE boot menu option exists.
Try to qualify relative names in the DNS resolver using the DHCP Domain
Name. For example:
DHCP Domain Name: etherboot.org
(Relative) Name: www
yields:
www.etherboot.org
Only names with no dots ('.') will be modified. A name with one or more
dots is unchanged.
[tftp] Temporary fix for conveying TFTP block size to callers
pxe_tftp.c assumes that the first seek on its data-transfer interface
represents the block size. Apart from being an ugly hack, this will
also screw up file size calculation for files smaller than one block.
The proper solution would be to extend the data-transfer interface to
support the reporting of stat()-like data. This is not going to
happen until the cost of adding interface methods is reduced (a fix I
have planned since June 2008).
In the meantime, abuse the xfer_window() method to return the block
size, since it is not being used for anything else and is vaguely
justifiable.
Astonishingly, having returned the incorrect TFTP blocksize via
PXENV_TFTP_OPEN for almost a year seems not to have affected any of
the test cases run during that time; this bug was found only when
someone tried running the heavily-patched version of pxegrub found in
OpenSolaris.
PXE dictates a mechanism for boot menuing, involving prompting the
user with a variable message, waiting for a predefined keypress,
displaying a boot menu, and waiting for a selection.
This breaks the currently desirable abstraction that DHCP is a process
that can happen in the background without any user interaction.
Remove the lazy assumption that ProxyDHCP == "DHCP with option 60 set
to PXEClient", and explicitly separate the notion of ProxyDHCP from
the notion of packets containing PXE options.
It is possible to configure a DHCP server to hand out PXE options
without a ProxyDHCP server present. This requires setting option 60
to "PXEClient", which will cause gPXE to attempt ProxyDHCP.
We assume in several places that dhcp->proxydhcpack is set to the
DHCPACK packet containing option 60 set to "PXEClient". When we
transition into ProxyDHCPREQUEST, set dhcp->proxydhcpack=dhcp->dhcpack
so that this assumption holds true.
We ought to rename several references to "proxydhcp" to something more
accurate, such as "pxedhcp". Treating a single DHCP response as
potentially both DHCPOFFER and ProxyDHCPOFFER does make the code
smaller, but the variable names get confusing.
Pick out the first boot menu item from the boot menu (option 43.9) and
pass it to the boot server as the boot menu item (option 43.71).
Also improve DHCP debug messages to include more details of the
packets being transmitted.
[dhcp] Add preliminary support for PXE Boot Servers
Some PXE configurations require us to perform a third DHCP transaction
(in addition to the real DHCP transaction and the ProxyDHCP
transaction) in order to retrieve information from a "Boot Server".
This is an experimental implementation, since the actual behaviour is
not well specified in the PXE spec.
[dhcp] Centralise DHCP successful state transitions
Move all the DHCP state transition logic into a single function
dhcp_next_state(). This will make it easier to add support for PXE
Boot Servers, since it abstracts away the difference between "mark
DHCP as complete" and "transition to boot server discovery".
[dhcp] Allow for missing server ID in ProxyDHCPACK
The Linux PXE server (http://www.kano.org.uk/projects/pxe) does not
set the server identifier in its ProxyDHCP responses. If the server
ID is missing, do not treat this as an error.
This resolves the "vague and unsettling memory" mentioned in commit
fdb8481d ("[dhcp] Verify server identifier on ProxyDHCPACKs").
Note that we already accept ProxyDHCPOFFERs without a server
identifier; they get treated as potential BOOTP packets.
[settings] Add the notion of a "tag magic" to numbered settings
Settings can be constructed using a dotted-decimal notation, to allow
for access to unnamed settings. The default interpretation is as a
DHCP option number (with encapsulated options represented as
"<encapsulating option>.<encapsulated option>".
In several contexts (e.g. SMBIOS, Phantom CLP), it is useful to
interpret the dotted-decimal notation as referring to non-DHCP
options. In this case, it becomes necessary for these contexts to
ignore standard DHCP options, otherwise we end up trying to, for
example, retrieve the boot filename from SMBIOS.
Allow settings blocks to specify a "tag magic". When dotted-decimal
notation is used to construct a setting, the tag magic value of the
originating settings block will be ORed in to the tag number.
Store/fetch methods can then check for the magic number before
interpreting arbitrarily-numbered settings.
[dhcp] Do not restrict minimum retry time for ProxyDHCPREQUEST
The ProxyDHCPREQUEST is a unicast packet, so the first request will
almost always be lost due to not having the IP address in the ARP
cache. If the minimum retry time is set to one second (as per commit
ff2b6a5), then ProxyDHCP will time out and give up before managing to
successfully transmit a request.
The DHCP timers need to be reworked anyway, so this mild hack is
acceptable for now.
[retry] Added configurable timeouts to retry timer
New min_timeout and max_timeout fields in struct retry_timer allow
users of this timer to set their own desired minimum and maximum
timeouts, without being constrained to a single global minimum and
maximum. Users of the timer can still elect to use the default global
values by leaving the min_timeout and max_timeout fields as 0.
Altiris erroneously cares about the ordering of DHCP options, and will
get confused if we don't construct them in the order it expects.
This is observed (so far) only when attempting to deploy 64-bit Win2k3.
Verifying server ID and DHCP transaction ID is insufficient to
differentiate between DHCPACK and ProxyDHCPACK when the DHCP server and
Proxy DHCP server are the same machine.
Perform the same test for a matching DHCP_SERVER_IDENTIFIER on
ProxyDHCPACKs as we do for DHCPACKs. Otherwise, a retransmitted
DHCPACK can end up being treated as the ProxyDHCPACK.
I have a vague and unsettling memory that this test was deliberately
omitted, but I can't remember why, and can't find anything in the VC
logs.
[slam] Add support for SLAM window lengths of greater than one packet
Add the definition of SLAM_MAX_BLOCKS_PER_NACK, which is roughly
equivalent to a TCP window size; it represents the maximum number of
packets that will be requested in a single NACK.
Note that, to keep the code size down, we still limit ourselves to
requesting only a single range per NACK; if the missing-block list is
discontiguous then we may request fewer than SLAM_MAX_BLOCKS_PER_NACK
blocks.
On any fast network, or with any driver that may drop packets
(e.g. Infiniband, which has very small RX rings), the traditional
usage of the SLAM protocol will result in enormous numbers of packet
drops and a consequent large number of retransmissions.
By adapting the client behaviour, we can force the server to act more
like a multicast TFTP server, with flow control provided by a single
master client.
This behaviour should interoperate with any traditional SLAM client
(e.g. Etherboot 5.4) on the network. The SLAM protocol isn't actually
documented anywhere, so it's hard to define either behaviour as
compliant or otherwise.
[dhcp] Do not transition to DHCPREQUEST without a valid DHCPOFFER
A missing test for dhcp->dhcpoffer in dhcp_timer_expired() was causing
the client to transition to DHCPREQUEST after timing out on waiting
for ProxyDHCP even if no DHCPOFFERs had been received.
[slam] Request all remaining blocks if we run out of space for the blocklist
In a SLAM NACK packet, if we run out of space to represent the
missing-block list, then indicate all remaining blocks as missing.
This avoids the need to wait for the one-second timeout before
receiving the blocks that otherwise wouldn't have been requested due
to running out of space.
[slam] Speed up NACK transmission by restricting the block-list length
Shorter NACK packets take less time to construct and spew out less
debug output, and there's a limit to how useful it is to send a
complete missing-block list anyway; if the loss rate is high then
we're going to have to retransmit an updated missing-block list
anyway.
Also add pretty debugging output to show the list of requested blocks.
The PXE spec is (as usual) unclear on precisely when ProxyDHCPREQUESTs
should be issued. We adapt the following, slightly paranoid approach:
If an offer contains an IP address, then it is a normal DHCPOFFER.
If an offer contains an option #60 "PXEClient", then it is a
ProxyDHCPOFFER. Note that the same packet can be both a normal
DHCPOFFER and a ProxyDHCPOFFER.
After receiving the normal DHCPACK, if we have received a
ProxyDHCPOFFER, we unicast a ProxyDHCPREQUEST back to the ProxyDHCP
server on port 4011. If we time out waiting for a ProxyDHCPACK, we
treat this as a non-fatal error.
[Settings] Remove assumption that all settings have DHCP tag values
Allow for settings to be described by something other than a DHCP option
tag if desirable. Currently used only for the MAC address setting.
Separate out fake DHCP packet creation code from dhcp.c to fakedhcp.c.
Remove notion of settings from dhcppkt.c.
Rationalise dhcp.c to use settings API only for final registration of the
DHCP options, rather than using {store,fetch}_setting throughout.
[DHCP] Fix up fake-packet creation as used by PXENV_GET_CACHED_INFO
Add dedicated functions create_dhcpdiscover(), create_dhcpack() and
create_proxydhcpack() for use by external code such as the PXE preboot
code.
Register ProxyDHCP options under the global scope "proxydhcp".
Unregister previously-acquired DHCP and ProxyDHCP settings when DHCP
succeeds.