[ipv4] Send gratuitous ARPs whenever a new IPv4 address is applied
In a busy network (such as a public cloud), IPv4 addresses may be
recycled rapidly. When this happens, unidirectional traffic (such as
UDP syslog) will succeed, but bidirectional traffic (such as TCP
connections) may fail due to stale ARP cache entries on other nodes.
The remote ARP cache expiry timeout is likely to exceed iPXE's
connection timeout, meaning that boot attempts can fail before the
problem is automatically resolved.
Fix by sending gratuitous ARPs whenever an IPv4 address is changed, to
attempt to update stale remote ARP cache entries. Note that this is
not a guaranteed fix, since ARP is an unreliable protocol.
We avoid sending gratuitous ARPs unconditionally, since otherwise any
unrelated settings change (e.g. "set dns 192.168.0.1") would cause
unexpected gratuitous ARPs to be sent.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Avoid generating positive zero for transmitted UDP checksums
TCP/IP checksum fields are one's complement values and therefore have
two possible representations of zero: positive zero (0x0000) and
negative zero (0xffff).
In RFC768, UDP over IPv4 exploits this redundancy to repurpose the
positive representation of zero (0x0000) to mean "no checksum
calculated"; checksums are optional for UDP over IPv4.
In RFC2460, checksums are made mandatory for UDP over IPv4. The
wording of the RFC is such that the UDP header is mandated to use only
the negative representation of zero (0xffff), rather than simply
requiring the checksum to be correct but allowing for either
representation of zero to be used.
In RFC1071, an example algorithm is given for calculating the TCP/IP
checksum. This algorithm happens to produce only the positive
representation of zero (0x0000); this is an artifact of the way that
unsigned arithmetic is used to calculate a signed one's complement
sum (and its final negation).
A common misconception has developed (exemplified in RFC1624) that
this artifact is part of the specification. Many people have assumed
that the checksum field should never contain the negative
representation of zero (0xffff).
A sensible receiver will calculate the checksum over the whole packet
and verify that the result is zero (in whichever representation of
zero happens to be generated by the receiver's algorithm). Such a
receiver will not care which representation of zero happens to be used
in the checksum field.
However, there are receivers in existence which will verify the
received checksum the hard way: by calculating the checksum over the
remainder of the packet and comparing the result against the checksum
field. If the representation of zero used by the receiver's algorithm
does not match the representation of zero used by the transmitter (and
so placed in the checksum field), and if the receiver does not
explicitly allow for both representations to compare as equal, then
the receiver may reject packets with a valid checksum.
For UDP, the combined RFCs effectively mandate that we should generate
only the negative representation of zero in the checksum field.
For IP, TCP and ICMP, the RFCs do not mandate which representation of
zero should be used, but the misconceptions which have grown up around
RFC1071 and RFC1624 suggest that it would be least surprising to
generate only the positive representation of zero in the checksum
field.
Fix by ensuring that all of our checksum algorithms generate only the
positive representation of zero, and explicitly inverting this in the
case of transmitted UDP packets.
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipv4] Allow IPv4 socket addresses to include a scope ID
Extend the IPv6 concept of "scope ID" (indicating the network device
index) to IPv4 socket addresses, so that IPv4 multicast transmissions
may specify the transmitting network device.
The scope ID is not (currently) exposed via the string representation
of the socket address, since IPv4 does not use the IPv6 concept of
link-local addresses (which could legitimately be specified in a URI).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipv4] Redefine IP address constants to avoid unnecessary byte swapping
Redefine various IPv4 address constants and testing macros to avoid
unnecessary byte swapping at runtime, and slightly rename the macros
to prevent code from accidentally using the old definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
At some point in the past few years, binutils became more aggressive
at removing unused symbols. To function as a symbol requirement, a
relocation record must now be in a section marked with @progbits and
must not be in a section which gets discarded during the link (either
via --gc-sections or via /DISCARD/).
Update REQUIRE_SYMBOL() to generate relocation records meeting these
criteria. To minimise the impact upon the final binary size, we use
existing symbols (specified via the REQUIRING_SYMBOL() macro) as the
relocation targets where possible. We use R_386_NONE or R_X86_64_NONE
relocation types to prevent any actual unwanted relocation taking
place. Where no suitable symbol exists for REQUIRING_SYMBOL() (such
as in config.c), the macro PROVIDE_REQUIRING_SYMBOL() can be used to
generate a one-byte-long symbol to act as the relocation target.
If there are versions of binutils for which this approach fails, then
the fallback will probably involve killing off REQUEST_SYMBOL(),
redefining REQUIRE_SYMBOL() to use the current definition of
REQUEST_SYMBOL(), and postprocessing the linked ELF file with
something along the lines of "nm -u | wc -l" to check that there are
no undefined symbols remaining.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[legal] Relicense files under GPL2_OR_LATER_OR_UBDL
These files cannot be automatically relicensed by util/relicense.pl
since they either contain unusual but trivial contributions (such as
the addition of __nonnull function attributes), or contain lines
dating back to the initial git revision (and so require manual
knowledge of the code's origin).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The implementation of inet_aton() has an unknown provenance. Rewrite
this code to avoid potential licensing uncertainty.
Also move the code from core/misc.c to its logical home in net/ipv4.c,
and add a few extra test cases.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Provide tcpip_mtu() to determine the maximum transmission unit
Provide the function tcpip_mtu() to allow external code to determine
the (transport-layer) maximum transmission unit for a given socket
address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcpip] Provide tcpip_netdev() to determine the transmitting network device
Provide the function tcpip_netdev() to allow external code to
determine the transmitting network device for a given socket address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for equivalent IPv4 and IPv6 settings (which requires equivalent
settings to be adjacent within the settings list).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[settings] Allow for IPv6 setting types in non-IPv6 builds
Allow for the existence of references to IPv6 setting types without
dragging in the whole IPv6 stack, by placing the definition of
setting_type_ipv6 in core/settings.c and providing weak stub methods
for parse_ipv6_setting() and format_ipv6_setting().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[settings] Explicitly separate the concept of a completed fetched setting
The fetch_setting() family of functions may currently modify the
definition of the specified setting (e.g. to add missing type
information). Clean up this interface by requiring callers to provide
an explicit buffer to contain the completed definition of the fetched
setting, if required.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Merge common functionality between IPv4 and IPv6 ICMP echo handling,
and add support for transmitting ICMP echo requests and delivering
ICMP echo replies to a (not yet implemented) ping_rx() function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the existing partially-implemented IPv6 stack with a fresh
implementation.
This implementation is not yet complete. The IPv6 transmit and
receive datapaths are functional (including fragment reassembly and
parsing of arbitrary extension headers). NDP neighbour solicitations
and advertisements are supported. ICMPv6 echo is supported.
At present, only link-local addresses may be used, and there is no way
to specify an IPv6 address as part of a URI (either directly or via
a DNS lookup).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[dhcp] Request broadcast responses when we already have an IPv4 address
FCoE requires the use of multiple local unicast link-layer addresses.
To avoid the complexity of managing multiple addresses, iPXE operates
in promiscuous mode. As a consequence, any unicast packets with
non-matching IPv4 addresses are rejected at the IPv4 layer (rather
than at the link layer).
This can cause problems when issuing a second DHCP request: if the
address chosen by the DHCP server does not match the existing address,
then the DHCP response will itself be rejected.
Fix by requesting a broadcast response from the DHCP server if the
network interface already has any IPv4 addresses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow packet transmission to be deferred pending successful ARP
resolution. This avoids the time spent waiting for a higher-level
protocol (e.g. TCP or TFTP) to attempt retransmission.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipv4] Use broadcast link-layer address for all broadcast IPv4 addresses
When transmitting, use the broadcast link-layer address for any
broadcast address (e.g. 192.168.0.255), not just INADDR_BROADCAST
(255.255.255.255).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Explicitly discard any unicast packets for addresses that we do not
control, to avoid unexpected behaviour when operating in promiscuous
mode (which is now the default, thanks to FCoE).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Allow link layer to report broadcast/multicast packets via pull()
Allow the link layer to directly report whether or not a packet is
multicast or broadcast at the time of calling pull(), rather than
relying on heuristics to determine this at a later stage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
At the time of attempting ARP resolution, we already know the
transmitting network device. We can therefore record ARP errors using
netdev_tx_err() so that they show up in the output of "ifstat".
Inspired-by: Dominik Russenberger <dominik.russenberger@terreactive.ch>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipv4] Include network device metadata in packet traces
(Ab)use the "ident" field in transmitted IPv4 packets to convey
metadata about the network device. In particular:
bits 0-3 represent the low bits of the "RX" good packet counter
bits 4-7 represent the low bits of the "RXE" bad packet counter
bits 8-15 represent the transmitted packet sequence number
This allows some relevant information about the internal state of the
network device to be read out from a packet trace from a non-debug
build of iPXE. In particular, it allows a packet trace containing
packets transmitted by iPXE to indicate whether or not any packets
have been received by iPXE.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Improve the appearance of the "config" user interface by ensuring that
settings appear in some kind of logical order.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Pass both link-layer addresses in net_tx() and net_rx()
FCoE requires the use of fabric-provided MAC addresses, which breaks
the assumption that the net device's MAC address is implicitly the
source address for net_tx() and the (unicast) destination address for
net_rx().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[retry] Hold reference while timer is running and during expiry callback
Guarantee that a retry timer cannot go out of scope while the timer is
running, and provide a guarantee to the expiry callback that the timer
will remain in scope during the entire callback (similar to the
guarantee provided to interface methods).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[build] Fix misaligned table entries when using gcc 4.5
Declarations without the accompanying __table_entry cause misalignment
of the table entries when using gcc 4.5. Fix by adding the
appropriate __table_entry macro or (where possible) by removing
unnecessary forward declarations.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Standardise on using timer_init() to initialise an embedded retry
timer, to match the coding style used by other embedded objects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Access to the gpxe.org and etherboot.org domains and associated
resources has been revoked by the registrant of the domain. Work
around this problem by renaming project from gPXE to iPXE, and
updating URLs to match.
Also update README, LOG and COPYRIGHTS to remove obsolete information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipv4] Ignore non-open net devices when performing routing
We do not discard routing table entries when closing an interface. It
is plausible that multiple interfaces may be on the same physical
network; if so, then we may end up in a situation whereby outbound
packets attempt to route via a closed interface.
Fix by ignoring non-open net devices in ipv4_route().
ipv4.c calculates the default subnet mask before calling
fetch_ipv4_setting() to retrieve the configured subnet mask (if any).
However, as of commit 612f4e7 "[settings] Avoid returning
uninitialised data on error in fetch_xxx_setting()",
fetch_ipv4_setting() will zero the IP address if the setting does not
exist, rather than leaving it unaltered.
Fix by fetching the setting first and calculating the default subnet
mask only if necessary.
[ipv4] Use a zero address to indicate "no gateway", rather than INADDR_NONE
ipv4.c uses a gateway address of INADDR_NONE to represent "no
gateway". It initialises the gateway address to INADDR_NONE before
calling fetch_ipv4_setting() to retrieve the configured gateway
address (if any).
However, as of commit 612f4e7 "[settings] Avoid returning
uninitialised data on error in fetch_xxx_setting()",
fetch_ipv4_setting() will zero the IP address if the setting does not
exist, rather than leaving it unaltered.
Fix by using a zero IP address to indicate "no gateway", so that a
non-existent gateway address setting will be treated as such.
[netdevice] Make ll_broadcast per-netdevice rather than per-ll_protocol
IPoIB has a link-layer broadcast address that varies according to the
partition key. We currently go through several contortions to pretend
that the link-layer address is a fixed constant; by making the
broadcast address a property of the network device rather than the
link-layer protocol it will be possible to simplify IPoIB's broadcast
handling.
[tcpip] Allow for transmission to multicast IPv4 addresses
When sending to a multicast address, it may be necessary to specify
the source address explicitly, since the multicast destination address
does not provide enough information to deduce the source address via
the miniroute table.
Allow the source address specified via the data-xfer metadata to be
passed down through the TCP/IP stack to the IPv4 layer, which can use
it as a default source address.
[netdevice] Split multicast hashing out into an mc_hash method
Multicast hashing is an ugly overlap between network and link layers.
EFI requires us to provide access to this functionality, so move it
out of ipv4.c and expose it as a method of the link layer.