[intel] Use autoloaded MAC address instead of EEPROM MAC address
The i350 (and possibly other Intel NICs) have a non-trivial
correspondence between the PCI function number and the external
physical port number. For example, the i350 has a "LAN Function Sel"
bit within the EEPROM which can invert the mapping so that function 0
becomes port 3, function 1 becomes port 2, etc.
Unfortunately the MAC addresses within the EEPROM are indexed by
physical port number rather than PCI function number. The end result
is that when anything other than the default mapping is used, iPXE
will use the wrong address as the base MAC address.
Fix by using the autoloaded MAC address if it is valid, and falling
back to reading the MAC address directly from the EEPROM only if no
autoloaded address is available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the ID for the LM variant and differentiate it from the I217-V.
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Apply PBS/PBA errata workaround only to ICH8 PCI device IDs
ICH8 devices have an errata which requires us to reconfigure the
packet buffer size (PBS) register, and correspondingly adjust the
packet buffer allocation (PBA) register. The "Intel I/O Controller
Hub ICH8/9/10 and 82566/82567/82562V Software Developer's Manual"
notes for the PBS register that:
10.4.20 Packet Buffer Size - PBS (01008h; R/W)
Note: The default setting of this register is 20 KB and is
incorrect. This register must be programmed to 16 KB.
Initial value: 0014h
0018h (ICH9/ICH10)
It is unclear from this comment precisely which devices require the
workaround to be applied. We currently attempt to err on the side of
caution: if we detect an initial value of either 0x14 or 0x18 then the
workaround will be applied. If the workaround is applied
unnecessarily, then the effect should be just that we use less than
the full amount of the available packet buffer memory.
Unfortunately this approach does not play nicely with other device
drivers. For example, the Linux e1000e driver will rewrite PBA while
assuming that PBS still contains the default value, which can result
in inconsistent values between the two registers, and a corresponding
inability to transmit or receive packets. Even more unfortunately,
the contents of PBS and PBA are not reset by anything less than a
power cycle, meaning that this error condition will survive a hardware
reset.
The Linux driver (written and maintained by Intel) applies the PBS/PBA
errata workaround only for devices in the ICH8 family, identified via
the PCI device ID. Adopt a similar approach, using the PCI_ROM()
driver data field to indicate when the workaround is required.
Reported-by: Donald Bindner <dbindner@truman.edu>
Debugged-by: Donald Bindner <dbindner@truman.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Exclude time spent in hypervisor from profiling
When profiling, exclude any time spent inside the hypervisor
responding to our MMIO accesses. This substantially reduces the
variance accumulated on many other profilers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Inside a virtual machine, writing the RX ring tail pointer may incur a
substantial overhead of processing inside the hypervisor. Minimise
this overhead by writing the tail pointer once per batch of
descriptors, rather than once per descriptor.
Profiling under qemu-kvm (version 1.6.2) shows that this reduces the
amount of time taken to refill the RX descriptor ring by around 90%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Operations which are negligible on physical hardware (such as issuing
a posted write to the transmit ring tail register) may involve
substantial amounts of processing within the hypervisor if running in
a virtual machine.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Avoid completely filling the TX descriptor ring
It is unclear from the datasheets whether or not the TX ring can be
completely filled (i.e. whether writing the tail value as equal to the
current head value will cause the ring to be treated as completely
full or completely empty). It is very plausible that this edge case
could differ in behaviour between real hardware and the many
implementations of an emulated Intel NIC found in various virtual
machines. Err on the side of caution and always leave at least one
ring entry empty.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Expose functionality to be shared with intelx driver
The Intel 10 Gigabit NICs have a datapath that is almost
register-compatible with the Intel 1 Gigabit NICs. Expose common
functionality to avoid duplication of code in the new "intelx" driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Remove hardcoded offsets for descriptor ring registers
The Intel 10 Gigabit NICs use the same simplified (aka "legacy")
descriptor format and the same layout for descriptor register blocks
as the Intel 1 Gigabit NICs. The offsets of the descriptor register
blocks are not the same.
Simplify reuse of the existing code by removing all hardcoded offsets
for registers within descriptor register blocks, and ensuring that all
offsets are calculated using the descriptor register block base
address provided via intel_init_ring().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Poll RX queue if hardware reports RX overflow
The Intel NIC emulation in some versions of VMware seems to suffer
from a flaw whereby the Interrupt Cause Register (ICR) fails to assert
the usual "packet received" bit (ICR.RXT0) if a receive overflow
(ICR.RXO) has also occurred.
Work around this flaw by polling for completed descriptors whenever
either ICR.RXT0 or ICR.RXO is asserted.
Reported-by: Miroslav Halas <miroslav.halas@bankofamerica.com>
Debugged-by: Miroslav Halas <miroslav.halas@bankofamerica.com>
Tested-by: Miroslav Halas <miroslav.halas@bankofamerica.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On i350 the datasheet contradicts itself in stating that the default
value of RXDCTL.ENABLE for queue zero is both set (according to the
"Receive Initialization" section) and unset (according to the "Receive
Descriptor Control - RXDCTL" section). Empirical evidence suggests
that the default value is unset.
Explicitly enable both transmit and receive queues to avoid any
ambiguity.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
[intel] Refill receive ring only after enabling receiver
On 82576 (and probably others), the datasheet states that "the tail
register of the queue (RDT[n]) should not be bumped until the queue is
enabled". There is some confusion over exactly what constitutes
"enabled": the initialisation blurb says that we should "poll the
RXDCTL register until the ENABLE bit is set", while the description
for the RXDCTL register says that the ENABLE bit is set by default
(for queue zero). Empirical evidence suggests that the ENABLE bit
reads as set immediately after writing to RCTL.EN, and so polling is
not necessary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>