[ipv4] Allow IPv4 socket addresses to include a scope ID
Extend the IPv6 concept of "scope ID" (indicating the network device
index) to IPv4 socket addresses, so that IPv4 multicast transmissions
may specify the transmitting network device.
The scope ID is not (currently) exposed via the string representation
of the socket address, since IPv4 does not use the IPv6 concept of
link-local addresses (which could legitimately be specified in a URI).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipv4] Redefine IP address constants to avoid unnecessary byte swapping
Redefine various IPv4 address constants and testing macros to avoid
unnecessary byte swapping at runtime, and slightly rename the macros
to prevent code from accidentally using the old definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Avoid using zero as a network device index
Avoid using zero as a network device index, so that a zero
sin6_scope_id can be used to mean "unspecified" (rather than
unintentionally meaning "net0").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipv6] Treat a missing network device name as "netX"
When an IPv6 socket address string specifies a link-local or multicast
address but does not specify the requisite network device name
(e.g. "fe80::69ff:fe50:5845" rather than "fe80::69ff:fe50:5845%net0"),
assume the use of "netX".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove AXTLS headers now that no AXTLS code remains, with many thanks
to the AXTLS project for use of their cryptography code over the past
several years.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the AES implementation from AXTLS with a dedicated iPXE
implementation which is slightly smaller and around 1000% faster.
This implementation has been verified using the existing self-tests
based on the NIST AES test vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise the existing support for performing CBC-mode block cipher
tests, and update the code to use okx() for neater reporting of test
results.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[build] Fix compiler warnings on some gcc versions
xfer_buffer() uses intf_get_dest_op() to obtain the destination
interface for xfer_deliver(), in order to check that this is the same
interface which provides xfer_buffer(). The return value from
intf_get_dest_op() (which contains the actual method implementing
xfer_deliver()) is not used.
On some gcc versions, this triggers a "value computed is not used"
warning, since the explicit type cast included within the
intf_get_dest_op() macro is treated as a "value computed".
Fix by explicitly casting the result of intf_get_dest_op() to void.
Reported-by: Matthew Helton <mwhelton@gmail.com>
Reported-by: James A. Peltier <jpeltier@sfu.ca>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the cost of implementing object methods which convey no
information beyond the fact that the method has been called.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[profile] Add profile_custom() for profiling with arbitrary time units
Provide profile_custom() as a trivial wrapper around profile_update()
to allow for the use of the profiling infrastructure by code using
timers other than the default profile_timestamp() provider.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[fault] Add inject_corruption() to randomly corrupt data
Provide an inject_corruption() function that can be used to randomly
corrupt data bytes with configurable probabilities.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic inject_fault() function that can be used to inject
random faults with configurable probabilities.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a named configuration for qemu, based on the config.ipxe.general.h
file taken from the current qemu repository and enabling the option to
work around the missing EFI_PXE_BASE_CODE_PROTOCOL.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE does not currently provide EFI_PXE_BASE_CODE_PROTOCOL: this
causes failures when chainloading bootloaders such as shim.efi which
assume that this protocol will be present.
Provide the ability to work around these problems via the build
configuration option EFI_DOWNGRADE_UX. If this option is enabled,
then we will not install our usual EFI_LOAD_FILE_PROTOCOL
implementation, thereby allowing the platform firmware to install its
own EFI_PXE_BASE_CODE_PROTOCOL implementation on top of our
EFI_SIMPLE_NETWORK_PROTOCOL handle.
A somewhat major side-effect of this workaround is that almost all
iPXE features will be disabled.
This configuration option will be removed in future when support for
EFI_PXE_BASE_CODE_PROTOCOL is added.
Requested-by: Laszlo Ersek <lersek@redhat.com>
Requested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[efi] Fix receive and transmit completion reporting
Fix the TxBuf value filled in by GetStatus() to report the transmit
buffer address as required by the (now clarified) specification.
Simplify "interrupt" handling in GetStatus() to report only that one
or more packets have been transmitted or received; there is no need to
report one GetStatus() "interrupt" per packet.
Simplify receive handling to dequeue received packets immediately from
the network device into an internal list (thereby avoiding the hacks
previously used to determine when to report new packet arrivals).
Originally-fixed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the high bit of the VGA text attribute byte via the ANSI SGR
parameters 5 ("blink on") and 25 ("blink off").
Note that some video cards (and virtual machines) may display a high
intensity background colour instead of blinking text.
Signed-off-by: Christian Nilsson <nikize@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Multicast MAC addresses will never have REMAC cache entries, and the
corresponding multicast IPoIB MAC address cannot be obtained simply by
issuing an ARP request.
For the trivial volume of multicast packets that we expect to send in
any realistic scenario, the simplest solution is to send them as
broadcasts instead.
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcp] Gracefully close connections during shutdown
We currently do not wait for a received FIN before exiting to boot a
loaded OS. In the common case of booting from an HTTP server, this
means that the TCP connection is left consuming resources on the
server side: the server will retransmit the FIN several times before
giving up.
Fix by initiating a graceful close of all TCP connections and waiting
(for up to one second) for all connections to finish closing
gracefully (i.e. for the outgoing FIN to have been sent and ACKed, and
for the incoming FIN to have been received and ACKed at least once).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[xen] Wait for and clear XenStore event before receiving data
Older, out-of-tree Xen kernel modules (such as those provided with
SuSE Linux Enterprise Server 11) do not clear the leftover "event
pending" bit when opening an event channel. Consequently, no event is
ever delivered to indicate that there is information in the XenStore
ring buffer, and the system hangs shortly after loading the
xen-platform-pci kernel module.
Work around this problem by always waiting for the XenStore event
channel to be signalled, and clearing the event before processing the
received data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ipoib] Attempt to generate ARPs as needed to repopulate REMAC cache
The only way to map an eIPoIB MAC address (REMAC) to an IPoIB MAC
address is to intercept an incoming ARP request or reply.
If we do not have an REMAC cache entry for a particular destination
MAC address, then we cannot transmit the packet. This can arise in at
least two situations:
- An external program (e.g. a PXE NBP using the UNDI API) may attempt
to transmit to a destination MAC address that has been obtained by
some method other than ARP.
- Memory pressure may have caused REMAC cache entries to be
discarded. This is fairly likely on a busy network, since REMAC
cache entries are created for all received (broadcast) ARP
requests. (We can't sensibly avoid creating these cache entries,
since they are required in order to send an ARP reply, and when we
are being used via the UNDI API we may have no knowledge of which
IP addresses are "ours".)
Attempt to ameliorate the situation by generating a semi-spurious ARP
request whenever we find a missing REMAC cache entry. This will
hopefully trigger an ARP reply, which would then provide us with the
information required to populate the REMAC cache.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with the neighbour cache, discarding an REMAC cache entry is
potentially very disruptive.
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[pxe] Always reconstruct packet for PXENV_GET_CACHED_INFO
Avoid accidentally returning stale packets (e.g. for a previously
attempted network device) by always constructing a fresh DHCP packet.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the link is blocked (e.g. due to a Spanning Tree Protocol port not
yet forwarding packets) then defer DHCP discovery until the link
becomes unblocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[stp] Add support for detecting Spanning Tree Protocol non-forwarding ports
A fairly common end-user problem is that the default configuration of
a switch may leave the port in a non-forwarding state for a
substantial length of time (tens of seconds) after link up. This can
cause iPXE to time out and give up attempting to boot.
We cannot force the switch to start forwarding packets sooner, since
any attempt to send a Spanning Tree Protocol bridge PDU may cause the
switch to disable our port (if the switch happens to have the Bridge
PDU Guard feature enabled for the port).
For non-ancient versions of the Spanning Tree Protocol, we can detect
whether or not the port is currently forwarding and use this to inform
the network device core that the link is currently blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[netdevice] Add a generic concept of a "blocked link"
When Spanning Tree Protocol (STP) is used, there may be a substantial
delay (tens of seconds) from the time that the link goes up to the
time that the port starts forwarding packets.
Add a generic concept of a "blocked link" (i.e. a link which is up but
which is not expected to communicate successfully), and allow "ifstat"
to indicate when a link is blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[ethernet] Add minimal support for receiving LLC frames
In some Ethernet framing variants the two-byte protocol field is used
as a length, with the Ethernet header being followed by an IEEE 802.2
LLC header. The first two bytes of the LLC header are the DSAP and
SSAP.
If the received Ethernet packet appears to use this framing, then
interpret the two-byte DSAP and SSAP as being the network-layer
protocol. This allows support for receiving Spanning Tree Protocol
frames (which use an LLC header with {DSAP,SSAP}=0x4242) to be added
without requiring a full LLC protocol layer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[mromprefix] Report a dummy size at offset 0x02 of .mrom payload
The size of the .mrom payload (the second PCI ROM image) is defined in
its PCI header. The code type for the .mrom payload image is
deliberately set to an invalid value (0xff) to ensure that no BIOS
tries to parse anything in the image other than the PCI header.
Since the code type is not set to 0x00 ("Intel x86, PC-AT
compatible"), bytes 0x02-0x17 should not be interpreted by the BIOS as
being in the standard ISA expansion ROM format. In particular, the
byte at offset 0x02 does not represent the length of the ROM image (in
512-byte blocks).
However, some Dell BIOSes seem to erroneously use the byte at offset
0x02 to determine the length of the .mrom payload when walking the
list of PCI ROM images. Since this byte is currently set to zero,
this can lead to the BIOS getting stuck in an infinite loop during
POST. (This problem may not arise if the .mrom payload is the final
image in the ROM, since the BIOS will then have no reason to attempt
to locate the next image.)
One possible workaround would be to put the real payload size in this
byte, but doing so would constrain the .mrom payload size to 128kB
(see commit 8049a52 ("[mromprefix] Allow for .mrom images larger than
128kB") for more details).
Another possible workaround would be to put the real payload size as a
word in bytes 0x02-0x03 (as is done for EFI ROMs). This would not
constrain the .mrom payload size, but a payload size which happened to
be exactly 128kB would result in a zero value in the byte at offset
0x02 and so could still result in infinite loops on BIOSes with this
bug.
We choose to place a fixed value of 0x01 in the byte at offset 0x02.
This should at least prevent the BIOS from getting stuck in an
infinite loop. (The BIOS may walk into the middle of the .mrom
payload, where it will almost certainly not find a valid {0x55,0xaa}
signature or a valid PCIR header, and will therefore hopefully abort
processing.)
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[tcp] Do not shrink window when discarding received packets
We currently shrink the TCP window permanently if we are ever forced
(by a low-memory condition) to discard a previously received TCP
packet. This behaviour was intended to reduce the number of
retransmissions in a lossy network, since lost packets might
potentially result in the entire window contents being retransmitted.
Since commit e0fc8fe ("[tcp] Implement support for TCP Selective
Acknowledgements (SACK)") the cost of lost packets has been reduced by
around one order of magnitude, and the reduction in the window size
(which affects the maximum throughput) is now the more significant
cost.
Remove the code which reduces the TCP maximum window size when a
received packet is discarded.
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some HP BIOSes (observed with an HP ProLiant m710p Server Cartridge)
have a bug in the implementation of INT 1a,b101: they blithely assume
that real-mode code is able to read from anywhere in the 32-bit memory
space.
This problem affects the call to INT 1a,b101 made from within
pcibios_num_bus() (which uses REAL_CODE() and hence executes in
genuine real mode) but does not affect the call made from within
romprefix.S (since with a PMM BIOS, that call executes in flat real
mode anyway).
Work around the problem by explicitly calling flatten_real_mode()
before invoking INT 1a,b101. This is a rarely-used code path, and so
the extra overhead of emulating instructions in some VM configurations
(see commit 6d4deee ("[librm] Use genuine real mode to accelerate
operation in virtual machines") for more details) is negligible.
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Debugged-by: Wissam Shoukair <wissams@mellanox.com>
Debugged-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>