[hyperv] Cope with Windows Server 2016 enlightenments
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed. When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.
Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR). Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).
Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection. Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host). Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[block] Provide abstraction to allow system to be quiesced
When performing a SAN boot via INT 13, there is no way for the
operating system to indicate that it has finished using the INT 13 SAN
device. We therefore have no opportunity to clean up state before the
loaded operating system's native drivers take over. This can cause
problems when booting Windows, which tends not to be forgiving of
unexpected system state.
Windows will typically write a flag to the SAN device as the last
action before transferring control to the native drivers. We can use
this as a heuristic to bring the system to a quiescent state (without
performing a full shutdown); this provides us an opportunity to
temporarily clean up state that could otherwise prevent a successful
Windows boot.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On most Intel NICs, Auto-Speed Detection Enable (ASDE) can be used to
automatically detect the correct link speed by sampling the link using
the internal PHY. This feature is automatically inhibited when not
appropriate for the physical link (e.g. when using internal SerDes
mode on the 8254x).
On the i350 datasheet ASDE is a reserved bit, but the relevant
auto-speed detection hardware appears still to be present. However,
enabling ASDE on the i350 1000BASE-KX backplane NIC seems to cause an
immediate link failure. It is possible that the auto-speed detection
hardware is still present, is not connected to a physical link, and is
not inhibited from being applied in this mode.
Work around this problem by adding an INTEL_NO_ASDE flag bit
(analogous to INTEL_NO_PHY_RST), and applying this for the i350
backplane NIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Show original CTRL and STATUS values in debugging output
In situations where iPXE fails to reach link-up as expected, it is
useful to know the original values of the CTRL and STATUS registers
prior to our reset attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[block] Allow use of a non-default EFI SAN boot filename
Some older operating systems (e.g. RHEL6) use a non-default filename
on the root disk and rely on setting an EFI variable to point to the
bootloader. This does not work when performing a SAN boot on a
machine where the EFI variable is not present.
Fix by allowing a non-default filename to be specified via the
"sanboot --filename" option or the "san-filename" setting. For
example:
sanboot --filename \efi\redhat\grub.efi \
iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6
or
option ipxe.san-filename code 188 = string;
option ipxe.san-filename "\\efi\\redhat\\grub.efi";
option root-path "iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6";
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[thunderx] Use ThunderxConfigProtocol to obtain board configuration
Following changes were introduced:
- added GetBgxProp and GetLmacProp methods to ThunderxConfigProtocol
- replaced direct BOARD_CFG access with usage of introduced methods
- removed redundant BOARD_CFG
- changed GUID of ThunderxConfigProtocol, as this is not compatible
with previous version
- changed UINTN* to UINT64* buffer type to fix issue on 32-bit
platforms with MAC address
This change allows us to avoid alignment of BOARD_CFG definitions
every time it changes in UEFI.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TEST UNIT READY command is issued automatically when the device is
opened, and is not the result of a command being issued by the caller.
This is required in order that a permanent TEST UNIT READY failure can
be used to identify unusable paths in a multipath SAN device.
Since the TEST UNIT READY command is not part of the caller's command
issuing process, it is not covered by any external retry loops (such
as the main retry loop in sandev_command()).
We must therefore be prepared to retry the TEST UNIT READY command
within the SCSI layer itself. We retry only the TEST UNIT READY
command so as not to multiply the number of potential retries for
normal commands (which are already retried by sandev_command()).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[http] Notify data transfer interface when underlying connection is ready
HTTP implements xfer_window_changed() on the underlying server
connection using http_step(), which does not propagate the window
change notification to the data transfer interface. This breaks the
multipath-capable SAN boot code, which relies on the window change
notification to discover that the HTTP block device is ready for
commands to be issued.
Fix by sending xfer_window_changed() in http_step() once the
underlying connection has been determined to be ready.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some block device protocols, the active path may continue to
receive xfer_window_changed() notifications during normal use. These
currently result in the active path being erroneously closed.
Fix by ignoring any xfer_window_changed() messages if this path is
already the active path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[block] Retry reopening indefinitely for multipath devices
For multipath SAN devices, verify that the device is capable of being
opened (i.e. that all URIs are parseable and that at least one path is
alive) and thereafter retry indefinitely to reopen the device as
needed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[block] Add a small delay between attempts to reopen SAN targets
When all SAN targets are completely unreachable, there will be a
natural delay between reopening attempts due to the network connection
timeout on the unreachable targets.
However, some SAN targets may accept connections instantly and report
a temporary unavailability by e.g. failing the TEST UNIT READY
command. If all targets are behaving this way then there will be no
natural delay, and we will attempt to saturate the network with
connection attempts.
Fix by introducing a small delay between attempts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the SAN retry count to be configured via the ${san-retry}
setting, defaulting to the current value of 10 retries if not
specified.
Note that setting a retry count of zero is inadvisable, since iSCSI
targets in particular will often report spurious errors such as "power
on occurred" for the first few commands.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[int13con] Avoid overwriting random portions of SAN boot disks
The INT13 console type (CONSOLE_INT13) autodetects at initialisation
time a magic partition to be used for logging iPXE console output. If
the INT13 drive number mapping is subsequently changed (e.g. because
iPXE was used to perform a SAN boot), then the console logging output
will be written to the incorrect disk.
Fix by recording the INT13 vector at initialisation time, and using
this original vector to emulate INT13 calls for all subsequent
accesses. This should be robust against drive remapping performed
either by ourselves or by another bootloader (e.g. a chainloaded
undionly.kpxe which then performs a SAN boot).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[int13] Improve geometry guessing for unaligned partitions
Some partition tables have partitions that are not aligned to a
cylinder boundary, which confuses the current geometry guessing logic.
Enhance the existing logic to ensure that we never reduce our guesses
for the number of heads or sectors per track, and add extra logic to
calculate the exact number of sectors per track if we find a partition
that starts within cylinder zero.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for multipath block devices. The "sanboot" and
"sanhook" commands now accept a list of SAN URIs. We open all URIs
concurrently. The first connection to become available for issuing
block device commands is marked as the active path and used for all
subsequent commands; all other connections are then closed. Whenever
the active path fails, we reopen all URIs and repeat the process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a dummy SAN device which allows the "sanhook" command to be tested
even when no SAN booting capability is present on the platform. This
allows substantial portions of the SAN boot code to be run in Linux
under Valgrind.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[scsi] Avoid duplicate call to scsicmd_close() on TEST UNIT READY failure
When the TEST UNIT READY command receives an error response, the
shutdown of the command's block data interface will result in
scsidev_ready() closing the SCSI device. This will subsequently
result in a duplicate call to scsicmd_close(), leading to an assertion
failure when list_del() is called for the second time.
Fix by removing the command from the list of outstanding commands
before shutting down the command's interfaces.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[iobuf] Increase minimum I/O buffer size to 128 bytes
The eIPoIB translation layer needs to translate outbound ARP packets
from Ethernet to IPoIB. A 64-byte buffer (starting with the Ethernet
header) does not provide enough tailroom to expand to hold the two
20-byte IPoIB MAC addresses. The result is that an UNDI API user will
be unable to send ARP packets.
We could potentially shuffle the packet contents to reuse the space
occupied by the stripped Ethernet link-layer header, but this would
add complexity. Instead, fix by increasing the minimum allocation
size to 128 bytes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
B0_CTST is a 24bit register according to the vendor driver (sk98lin).
A 16bit read on B0_CTST will always return 0 for Y2_VAUX_AVAIL
(1<<16), so use a 32bit read when testing Y2_VAUX_AVAIL.
[This patch is copied directly from the Linux kernel tree.]
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The value of ( ( x & 0x0c00 ) | 0x0c00 ) is always 0x0c00 regardless
of the value of x, and so the read_csr() is redundant. (There are no
read side effects for this register, according to the datasheet.)
This line of code originated in Linux kernel 2.3.19pre1 as
a->write_csr(ioaddr, 80, a->read_csr(ioaddr, 80) | 0x0c00);
and was modified in kernel 2.3.41pre4 to read
a->write_csr(ioaddr, 80, (a->read_csr(ioaddr, 80) & 0x0C00) | 0x0c00);
In the absence of commit messages, the intention of the code is
unclear. However, the logic resulting in a fixed value of 0x0c00 has
remained unaltered for over 17 years, and can probably be assumed to
have the correct overall result.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Track the current and maximum heap usage, and display the maximum
during shutdown when DEBUG=malloc is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our asprintf() implementation guarantees that strp will be NULL on
allocation failure, but this is not standard behaviour. Detect errors
by checking for a negative return value instead of a NULL pointer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid potential division by zero when performing the check against
multiplication overflow. (Note that if the width is zero then there
can be no overflow anyway, so it is then safe to bypass the check.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>