[romprefix] Change from opt-in to opt-out when booting via INT19
On non-BBS systems, we have to hook INT 19 in order to be able to boot
from the gPXE ROM at all. However, doing this unconditionally will
prevent the user from booting via any other devices.
Previously, the INT 19 entry point would prompt the user to press B in
order to boot from gPXE, which makes it impossible to perform an
unattended network boot. We now prompt the user to press N to skip
booting from gPXE, which allows for unattended operation.
This should be a better match for most real-world scenarios. Most
modern systems support BBS and so are unaffected by this change. Very
old (non-BBS) systems tend not to have PXE ROMs by default anyway; if
the user has added a gPXE ROM then they probably do want to boot from
the network. Newer non-BBS systems are essentially limited to IBM
servers, which will recapture the INT 19 vector anyway and implement
their own boot-ordering selection mechanism.
This driver is based on Stefan Hajnoczi's summer work, which
is in turn based on version 1.01 of the linux b44 driver.
I just assembled the pieces and fixed/added a few pieces
here and there to make it work for my hardware.
The most major limitation is that this driver won't work
on systems with >1GB RAM due to the card not having enough
address bits for that and gPXE not working around this
limitation.
Still, other than that the driver works well enough for
at least 2 users :) and the above limitation can always
be fixed when somebody wants it bad enough :)
Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com>
[netdevice] Kill off the various guess_boot_netdev() functions
Remove the assortment of miscellaneous hacks to guess the "network
boot device", and replace them each with a call to last_opened_netdev().
It still isn't guaranteed correct, but it won't be any worse than
before, and it will at least be consistent.
[netdevice] Provide function to retrieve the most recently opened net device
There are currently four places within the codebase that use a
heuristic to guess the "boot network device", with varying degrees of
success. Add a feature to the net device core to maintain a list of
open network devices, in order of opening, and provide a function
last_opened_netdev() to retrieve the most recently opened net device.
This should do a better job than the current assortment of
guess_boot_netdev() functions.
[aoe] Use an AoE config query to identify the target MAC address
The AoE spec does not specify that the source MAC address of a
received packet actually matches the MAC address of the AoE target.
In principle an AoE server can respond to an AoE request on any
interface available to it, which may not be an address configured to
accept AoE requests.
This issue is resolved by implementing AoE device discovery. The
purpose of AoE discovery is to find out which addresses an AoE target
can use for requests. An AoE configuration command is sent when the
AoE attach is attempted. The AoE target must respond to that
configuration query from an interface that can accept requests.
Based on a patch from Ryan Thomas <ryan@coraid.com>
EFI_STATUS is defined as an INTN, which maps to UINT32 (i.e. unsigned
int) on i386 and UINT64 (i.e. unsigned long) on x86_64. This would
require a cast each time the error status is printed.
Add efi_strerror() to avoid this ickiness and simultaneously enable
prettier reporting of EFI status codes.
[i386] Change [u]int32_t to [unsigned] int, rather than [unsigned] long
This brings us in to line with Linux definitions, and also simplifies
adding x86_64 support since both platforms have 2-byte shorts, 4-byte
ints and 8-byte long longs.
[i386] Free allocated base memory on exit, if possible
Code paths that automatically allocate memory from the FBMS at 40:13
should also free it, if possible.
Freeing this memory will not be possible if either
1. The FBMS has been modified since our allocation, or
2. We have not been able to unhook one or more BIOS interrupt vectors.
_filesz was incorrectly forced to be aligned up to MAX_ALIGN. In a
non-compressed build, this would cause a build failure unless _filesz
happened to already be aligned to MAX_ALIGN.
[infiniband] Respect hop pointer when building directed route SMP return path
The return path in directed route SMPs lists the egress ports in order
from SM to node, rather than from node to SM.
To write to the correct offset within the return path, we need to
parse the hop pointer. This is held within the class-specific data
portion of the MAD header, which was previously unused by us and
defined to be a uint16_t. Define this field to be a union type; this
requires some rearrangement of ib_mad.h and corresponding changes to
ipoib.c.
[romprefix] Use smaller PMM allocations if possible
The only way that PMM allows us to request a block in a region with
A20=0 is to ask for a block with an alignment of 2MB. Due to the PMM
API design, the only way we can do this is to ask for a block with a
size of 2MB.
Unfortunately, some BIOSes will hit problems if we allocate a 2MB
block. In particular, it may not be possible to enter the BIOS setup
screen; the BIOS setup code attempts a PMM allocation, fails, and
hangs the machine.
We now try allocating only as much as we need via PMM. If the
allocated block has A20=1, we free the allocated block, double the
allocation size, and try again. Repeat until either we obtain a block
with A20=0 or allocation fails. (This is guaranteed to terminate by
the time we reach an allocation size of 2MB.)
[linda] Add support for QLogic 7220-based Infiniband HCAs
These cards very nearly support our current IB Verbs model. There is
one minor difference: multicast packets will always be delivered by
the hardware to QP0, so the driver has to redirect them to the
appropriate QP. This means that QP owners may see receive completions
for buffers that they never posted. Nothing in our current codebase
will break because of this.
[infiniband] Add raw packet parser and constructor
This can be used with cards that require the driver to construct and
parse packet headers manually. Headers are optionally handled
out-of-line from the packet payload, since some such cards will split
received headers into a separate ring buffer.
Some Infiniband cards will not be as accommodating as the Arbel and
Hermon cards in providing enough space for us to push a fake extra
header at the start of the received packet. We must therefore make do
with squeezing enough information to identify source and destination
addresses into the two bytes of padding within a genuine IPoIB
link-layer header.
[infiniband] Split subnet management agent client out into ib_smc.c
Not all Infiniband cards have embedded subnet management agents.
Split out the code that communicates with such an embedded SMA into a
separate ib_smc.c file, and have drivers call ib_smc_update()
explicitly when they suspect that the answers given by the embedded
SMA may have changed.
[infiniband] Pass address vector in receive completions
Receive completion handlers now get passed an address vector
containing the information extracted from the packet headers
(including the GRH, if present), and only the payload remains in the
I/O buffer.
This breaks the symmetry between transmit and receive completions, so
remove the ib_completer_t type and use an ib_completion_queue_operations
structure instead.
Rename the "destination QPN" and "destination LID" fields in struct
ib_address_vector to reflect its new dual usage.
Since the ib_completion structure now contains only an IB status code,
("syndrome") replace it with a generic gPXE integer status code.
[infiniband] Flush uncompleted work queue entries at QP teardown
Avoid leaking I/O buffers in ib_destroy_qp() by completing any
outstanding work queue entries with a generic error code. This
requires the completion handlers to be available to ib_destroy_qp(),
which is done by making them static configuration parameters of the CQ
(set by ib_create_cq()) rather than being provided on each call to
ib_poll_cq().
This mimics the functionality of netdev_{tx,rx}_flush(). The netdev
flush functions would previously have been catching any I/O buffers
leaked by the IPoIB data queue (though not by the IPoIB metadata
queue).
Add the simplified ne2k_isa driver. It is just a selective copy+paste
of the relevant parts from ns8390.c plus a little trivial hacking to
make it actually work.
It is true that the code is pretty ugly, but:
a) ns8390.c is worse
b) It is only 372 lines and no #ifdefs
c) It works both in qemu/bochs and in real hardware
and we all know it is easier to cleanup working code
Hope someone will find the time to rewrite this driver properly,
but until then at least for me this is an ok solution.
Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com>
[netdevice] Retain and report detailed error breakdowns
netdev_rx_err() and netdev_tx_complete_err() get passed the error
code, but currently use it only in debug messages.
Retain error numbers and frequencey counts for up to
NETDEV_MAX_UNIQUE_ERRORS (4) different errors for each of TX and RX.
This allows the "ifstat" command to report the reasons for TX/RX
errors in most cases, even in non-debug builds.
Halting the PEGs breaks platforms where there is sideband access to
the NIC (e.g. HP machines using iLO). (We have to retain the
unhalting code because on some other platforms (e.g. IBM blades with
BOFM) the pre-PXE firmware must halt the PEGs to avoid issues with the
BIOS rereading via the expansion ROM BAR.)
[aoe] Start retry timer before potential temporary transmission failure
The retry timer needs to be running as soon as we know that we are
trying to transmit a command. If transmission fails because of a
temporary error condition, then the timer will allow us to retry the
transmission later.
[settings] Ensure fetch_string_setting() returns a NUL-terminated string
This fixes a regression introduced in commit 612f4e7:
[settings] Avoid returning uninitialised data on error in fetch_xxx_setting()
in which the memset() was moved from fetch_string_setting() to
fetch_setting(), in order that it would be useful for non-string
setting types. However, this neglects to take into account the fact
that fetch_string_setting() shrinks its buffer by one byte (to allow
for the NUL) before calling fetch_setting().
Restore the memset() in fetch_string_setting(), so that the
terminating NUL is guaranteed to actually be a NUL.
[i386] Add data32 prefixes to all lgdt/lidt instructions
With a 16-bit operand, lgdt/lidt will load only a 24-bit base address,
ignoring the high-order bits. This meant that we could fail to fully
restore the GDT across a call into gPXE, if the GDT happened to be
located above the 16MB mark.
Not all of our lgdt/lidt instructions require a data32 prefix (for
example, reloading the real-mode IDT can never require a 32-bit base
address), but by adding them everywhere we will hopefully not forget
the necessary ones in future.
[phantom] Allow for PXE boot to be enabled/disabled on a per-port basis
This is something of an ugly hack to accommodate an OEM requirement.
The NIC has only one expansion ROM BAR, rather than one per port. To
allow individual ports to be selectively enabled/disabled for PXE boot
(as required), we must therefore leave the expansion ROM always
enabled, and place the per-port enable/disable logic within the gPXE
driver.