/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright 2009 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
*/
/*
* FIT rates - assume leaf devices are somewhat less reliable than
* root complexes, switches and bridges
*/
/*
* SERD parameters.
*
* PCI Express correctable link errors are automatically handled by the
* hardware, so they have relatively little impact and we can allow quite a
* high frequency. We will also be quite conservative about nonfatal internal
* errors reported by the driver.
*
* driver which may cause intermittent performance/responsiveness problems, so
* we have tighter serd parameters for these. These are most likely errors in
* data cache parity errors.
*/
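/*
* Illustrative sketch (Python, not Eversholt): the SERD engines declared in
* this file with N and T values behave roughly like a sliding window that
* fires once N events have been seen within time T. The class name
* SerdEngine is hypothetical. Firing at ">= N" events is also an assumption;
* the exact threshold semantics of fmd's SERD engines may differ.
*
```python
from collections import deque


class SerdEngine:
    """Illustrative SERD engine: fires when n events fall within t seconds."""

    def __init__(self, n, t):
        self.n = n              # event count threshold (cf. N=...)
        self.t = t              # window length in seconds (cf. T=...)
        self.events = deque()   # timestamps of events still in the window

    def record(self, now):
        """Record an event at time `now`; return True if the engine fires."""
        self.events.append(now)
        # Discard events that have aged out of the sliding window.
        while now - self.events[0] > self.t:
            self.events.popleft()
        return len(self.events) >= self.n


# Example: a hypothetical engine that fires on 3 events within 10 seconds.
engine = SerdEngine(n=3, t=10)
results = [engine.record(ts) for ts in (0, 5, 20, 25, 28)]
```
*
* Allowing corrlink errors "quite a high frequency" corresponds to a large
* N for a given T. Tighter parameters mean a smaller N or a longer T.
*/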
/*
* if the source-id payload is valid, then check it matches
*/
(!payloadprop_defined("source-valid") || \
(payloadprop_defined("source-valid") && \
/*
* Other useful macros. These use the EXCAP property (PCI Express Capabilities
* register) to find the type for PCI Express devices, and the CLASS-CODE
* property (PCI Class Code register) to find the type of PCI devices behind
* a PCI Express-PCI bridge. Note that class codes 0x60400 and 0x60401 are
* defined as PCI-PCI bridges; everything else is considered a PCI leaf device.
*/
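/*
* Illustrative sketch of how these two properties might be decoded. It
* assumes that EXCAP holds the raw 16-bit PCI Express Capabilities register
* and that CLASS-CODE holds the 24-bit class code.
*
* Per the PCI Express Base Specification, the Device/Port Type field occupies
* bits 7:4 of that register. 0x060400 and 0x060401 are the normal and
* subtractive-decode PCI-to-PCI bridge class codes. The function names are
* hypothetical.
*
```python
# Device/Port Type values (bits 7:4 of the PCI Express Capabilities register).
PORT_TYPES = {
    0x0: "endpoint",
    0x1: "legacy endpoint",
    0x4: "root port",
    0x5: "upstream switch port",
    0x6: "downstream switch port",
    0x7: "pcie-pci bridge",
    0x8: "pci-pcie bridge",
    0x9: "rc integrated endpoint",
    0xA: "rc event collector",
}

# Class codes treated as PCI-PCI bridges; anything else is a PCI leaf.
PCI_PCI_BRIDGE_CLASSES = {0x060400, 0x060401}


def pciex_port_type(excap):
    """Decode the Device/Port Type field from a raw EXCAP value."""
    return PORT_TYPES.get((excap >> 4) & 0xF, "reserved")


def pci_device_kind(class_code):
    """Classify a PCI function behind a PCI Express-PCI bridge."""
    if class_code in PCI_PCI_BRIDGE_CLASSES:
        return "pci-pci bridge"
    return "pci leaf"
```
*/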
/*
* define faults
*/
N=CORRLINK_COUNT, T=CORRLINK_TIME;
N=CORRLINK_COUNT, T=CORRLINK_TIME;
N=CORRLINK_COUNT, T=CORRLINK_TIME;
N=CORRLINK_COUNT, T=CORRLINK_TIME;
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* Handling of internal errors detected by leaf drivers. Use a serd engine
* if there is no service impact; otherwise fail immediately.
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*/
/*
* handling of service impact ereports.
*/
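/*
* Illustrative sketch of the policy described above:
* - an internal-error ereport with a service impact faults immediately;
* - otherwise the ereport is fed to a serd engine.
*
* The service_impact flag and the function names are hypothetical stand-ins
* for the service-impact ereports that the real rules use. The ">= n"
* threshold is an assumption.
*
```python
from collections import deque


def make_serd(n, t):
    """Return a recorder that reports True once n events fall within t."""
    window = deque()

    def record(now):
        window.append(now)
        # Drop events older than the window.
        while now - window[0] > t:
            window.popleft()
        return len(window) >= n

    return record


def diagnose_device_interr(record, now, service_impact):
    """Decide how to handle a leaf driver's internal-error ereport."""
    if service_impact:
        return "fault"  # service impact: fail immediately
    # No service impact: let the serd engine decide.
    return "fault" if record(now) else "pending"
```
*/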
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* A faulty PCI Express hostbridge (root complex) may cause:
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* - nr-d: the device not to respond to a valid upstream request
* - ca-d: the device to completer abort a valid upstream request
* - mtlp-d: a malformed tlp to be transmitted downstream
* - badreq-d: a bad downstream request - not CRC error (may cause
* completer to respond with ur or ca)
* - ecrcreq-d: request with end-to-end CRC error transmitted downstream
* - ecrccomp-d: completion with end-to-end CRC error transmitted downstream
* - poisreq-d: poisoned request transmitted downstream
* - poiscomp-d: poisoned completion transmitted downstream
* - corrlink: correctable link or physical level error
* - fatlink: fatal link or physical level error
*/
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* A faulty PCI Express leaf device or upstream switch port may cause:
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* - flt-nr-u: the device not to respond to a valid downstream request
* - flt-ca-u: the device to completer abort a valid downstream request
* - flt-badreq-u: a bad upstream request - not CRC error (may cause
* completer to respond with ur or ca) - leaf only
* - flt-mtlp-u: a malformed tlp transmitted upstream - leaf only
* - flt-ecrcreq-u: request with end-to-end CRC error transmitted upstream
* - flt-ecrccomp-u: compl with end-to-end CRC error transmitted upstream
* - flt-poisreq-u: poisoned request transmitted upstream
* - flt-poiscomp-u: poisoned completion transmitted upstream
* - device: internal error reported by leaf device
* - corrlink: correctable link or physical level error
* - fatlink: fatal link or physical level error
*/
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* A faulty PCI Express downstream switch port may cause:
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* - nr-d: the device not to respond to an upstream request
* - ca-d: the device to completer abort an upstream request
* - ecrcreq-d: request with end-to-end CRC error transmitted downstream
* - ecrccomp-d: completion with end-to-end CRC error transmitted downstream
* - poisreq-d: poisoned request transmitted downstream
* - poiscomp-d: poisoned completion transmitted downstream
* - corrlink: correctable link or physical level error
* - fatlink: fatal link or physical level error
*/
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* A faulty PCIEX bus may cause:
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* - flt-nr-u: a device to not respond because the link is down
* - nr-d: a device to not respond because the link is down
* - corrlink: correctable link or physical level error
* - fatlink: fatal link or physical level error
*/
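/*
* Illustrative sketch: the four lists above, written as a table and then
* inverted, so that a given effect yields the component types that could
* have caused it. The component labels paraphrase the list headings. This
* is not how the diagnosis engine stores its rules.
*
```python
# Effects each faulty component type may cause, copied from the lists above.
CAN_CAUSE = {
    "root complex": {
        "nr-d", "ca-d", "mtlp-d", "badreq-d", "ecrcreq-d", "ecrccomp-d",
        "poisreq-d", "poiscomp-d", "corrlink", "fatlink",
    },
    "leaf or upstream switch port": {
        "flt-nr-u", "flt-ca-u", "flt-badreq-u", "flt-mtlp-u",
        "flt-ecrcreq-u", "flt-ecrccomp-u", "flt-poisreq-u",
        "flt-poiscomp-u", "device", "corrlink", "fatlink",
    },
    "downstream switch port": {
        "nr-d", "ca-d", "ecrcreq-d", "ecrccomp-d", "poisreq-d",
        "poiscomp-d", "corrlink", "fatlink",
    },
    "pciex bus": {"flt-nr-u", "nr-d", "corrlink", "fatlink"},
}


def candidate_sources(effect):
    """Return the component types whose lists include `effect`, sorted."""
    return sorted(c for c, effects in CAN_CAUSE.items() if effect in effects)
```
*/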
/*
* ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* A faulty pciex-pci bridge may cause
* ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* the following errors to propagate onto the PCI Express fabric:
* - flt-nr-u: the device not to respond to a valid downstream request
* - flt-ca-u: the device to completer abort a valid downstream request
* - flt-ecrcreq-u: request with end-to-end CRC error transmitted upstream
* - flt-ecrccomp-u: compl with end-to-end CRC error transmitted upstream
* - flt-poisreq-u: poisoned request transmitted upstream
* - flt-poiscomp-u: poisoned completion transmitted upstream
* - corrlink: correctable link or physical level error upstream
* - fatlink: fatal link or physical level error upstream
* - sec-interr: internal error on pci express to pci bridge
*
* And the following errors to propagate onto the secondary pci or pci/x bus
* (these will be handled by code in the pci.esc file).
* - nr-pw-d: the device not to respond to a valid upstream request
* - nr-drw-d: the device not to respond to a valid upstream request
* - retry-to-d: failure to retry a downstream delayed request
* - ta-pw-d: the device responds with a ta to a valid upstream
* request
* - ta-drw-d: the device responds with a ta to a valid upstream
* request
* - scpe-d: split completion to get corrupted during downstream transmission
*/
/*
* the following rules for ptlp and ecrc faults are split into fatal and
* nonfatal, depending on the service impact reported by the leaf driver
*/
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* declarations
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*/
/*
* handling of fatal and nonfatal error messages propagated up to root complex
*
* Use these for errors reported by root-complex on behalf of another device.
* The source-id payload can be used to identify where the message came from.
*/
/*
* link-level errors - could generate ereports at either end of link
*
* We can use may propagations here, as these ereports are only seen for
* these faults.
*/
/*
* bridge internal error
*/
/*
* downstream poisoned request
*
* on route must raise a ptlp ereport while any switch ports forwarding
* the poisoned request must raise sec-mdpe ereports. The originator of the
* poisoning (be it root complex or downstream port of a switch) also raises
* sec-mdpe. A hardened leaf driver will also raise ptlp. A target-mdpe may
* be seen at the leaf (which may be a pci device beyond the bridge).
*
* The root complex will see and report an ma; use flt-ur-u to represent this.
*
* The fault can always be recognized and the source identified using the ptlp
* and sec-mdpe ereports.
*/
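/*
* Illustrative sketch, assuming the sec-mdpe reporters along the path are
* all known. The originator and every forwarding switch port raise sec-mdpe.
* Ports upstream of the originator presumably forward a clean request and
* report nothing. The originator should therefore be the upstream-most node
* on the path that reported sec-mdpe. Node names and the ereport shorthand
* are hypothetical.
*
```python
def poison_originator(path, ereports):
    """Return the upstream-most node on `path` that reported sec-mdpe.

    path:     node names ordered from the root complex down to the leaf.
    ereports: dict mapping node name -> set of (shortened) ereport names.
    """
    for node in path:
        if "sec-mdpe" in ereports.get(node, set()):
            return node
    return None  # not enough information to identify the originator
```
*/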
/*
* downstream poisoned completion
*
* route must raise ptlp and mdpe ereports. A hardened leaf driver will also
* reported, and though we should still see a nonfatal error reported from
* the root complex identifying the leaf device, we won't actually be informed
* that the error was an ptlp.
*/
/*
* downstream request with ecrc error.
*
* route can optionally raise an ecrc ereport. A hardened leaf driver may also
* raise ecrc. For non-hardened leaf devices, no ecrc may be reported, and
* though we should still see a nonfatal error reported from the root complex
* identifying the leaf device, we won't actually be informed that the error
* was an ecrc.
*
* eventually get a cto at the root complex - so use an nr-u at the pciex
* leaf or bridge to get the appropriate behaviour. For the case where the leaf
* driver wasn't hardened we may be able to identify the leaf device (and
* therefore any intermediate switches which might have caused the problem)
* either via a target-ma ereport if available or via the nonfatal error
* reported from the root complex identifying the leaf device. The combination
* of a nonfatal error reported from the root complex and a cto from the root
* complex is sufficient to positively identify this case.
*/
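/*
* Illustrative sketch of the identification step described above, assuming
* all ereports have been collected. A nonfatal message plus a cto at the root
* complex marks this case. The nonfatal message's source-id, when
* source-valid is set, names the leaf. The dict layout and the shortened
* ereport names are hypothetical. A target-ma ereport, if available, could
* be used instead.
*
```python
def downstream_ecrcreq_suspect(rc_ereports):
    """Return the source-id of the implicated leaf, or None.

    rc_ereports maps a shortened ereport name seen at the root complex
    to its payload dict.
    """
    # A nonfatal message plus a cto, both at the root complex.
    if "cto" not in rc_ereports or "nonfatal" not in rc_ereports:
        return None
    payload = rc_ereports["nonfatal"]
    # Only trust source-id when the payload says it is valid.
    if payload.get("source-valid"):
        return payload.get("source-id")
    return None
```
*/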
/*
* downstream completion with ecrc error.
*
* route can optionally raise an ecrc ereport. A hardened leaf driver may
* also raise ecrc. For non-hardened leaf devices, no ecrc may be reported,
* and though we should still see a nonfatal error reported from the root
* complex identifying the leaf device, we won't actually be informed that
* the error was an ecrc.
*
* eventually get a cto. Note the leaf ereports are optional (ie in case the
* driver is not hardened), but if we get both ecrc and cto we need to
* distinguish this from the cto-only case, which would be an nr-d.
*/
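/*
* Illustrative sketch of the distinction above, as seen by a hardened leaf
* requester:
* - ecrc together with cto points to a corrupted completion (ecrccomp-d);
* - a cto alone points to no completion at all (nr-d).
* The ereport names are shortened and hypothetical.
*
```python
def classify_leaf_cto(leaf_ereports):
    """Classify a completion timeout seen by a hardened leaf requester."""
    if "cto" not in leaf_ereports:
        return None
    # ecrc + cto: the completion arrived corrupted and was dropped.
    # cto alone: no completion was sent at all.
    return "ecrccomp-d" if "ecrc" in leaf_ereports else "nr-d"
```
*/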
/*
* upstream poisoned request
*
* - flt-poisreq-u is on the pciex node which generated the fault
* - source-poisreq-u refers to at least one leaf or bridge device
* whose bdf (if leaf) must match the source-id in the payload of the
* ereport generated from the root complex.
* - poisreq-u propagates up to the root complex and any switch ports on
* route will raise a ptlp ereport, while any upstream devices generating
* or forwarding the poisoned packet will raise an mdpe ereport. The root
* complex should also report a ptlp.
*
* Additionally, as the root complex may treat the request as a ur, which the
* ta onto the child pci bus if this was a delayed write).
*
* We can always recognize what sort of fault this is from the ptlp (with no
* sec-mdpe) at the root complex. Recognizing which originating devices may be
* implicated can be done using the mdpe ereport (for a hardened leaf driver),
* or for a non-hardened leaf driver by using the source-id payload in the ptlp
* at the intervening switches will narrow the fault down to a single suspect.
*/
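/*
* Illustrative sketch, assuming a flat table of leaf BDFs. A ptlp without
* sec-mdpe at the root complex identifies a poisoned request. A valid
* source-id then narrows the suspects, mirroring the source-valid check
* defined near the top of this file.
*
* bdf() packs bus, device and function into a 16-bit requester ID as in the
* PCI Express specification. The other names are hypothetical.
*
```python
def bdf(bus, dev, fn):
    """Pack bus/device/function into a 16-bit PCI Express requester ID."""
    return (bus << 8) | (dev << 3) | fn


def upstream_poisreq_suspects(rc_ereports, payload, leaves):
    """Return leaf names that may have sent a poisoned request upstream.

    rc_ereports: set of shortened ereport names seen at the root complex.
    payload:     dict with optional "source-valid"/"source-id" keys.
    leaves:      dict mapping leaf name -> BDF.
    """
    # ptlp without sec-mdpe at the root complex marks a poisoned request.
    if "ptlp" not in rc_ereports or "sec-mdpe" in rc_ereports:
        return []
    # If source-id is absent or not valid, every leaf stays a suspect.
    if not payload.get("source-valid"):
        return sorted(leaves)
    return sorted(n for n, b in leaves.items() if b == payload.get("source-id"))
```
*/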
/*
* the remaining propagations are also used for poisoned requests propagating
* up due to a fault behind a pcie-pci bridge
*/
/*
* upstream poisoned completion
*
* - flt-poiscomp-u is on the pciex node which generated the fault. There will
* be a target-mdpe downstream from here.
* - source-poiscomp-u refers to at least one leaf or bridge device
* whose bdf (if leaf) must match the source-id in the payload of the
* ereport generated from the root complex.
* - poiscomp-u propagates up to the root complex and any switches on
* route will raise ptlp and sec-mdpe ereports. The root complex will also
* raise a sec-mdpe and ptlp.
*
* the root complex. Recognizing which originating devices may be implicated
* can be done using the source-id payload in the ptlp ereport to identify the
* switches will narrow the fault down to a single suspect.
*/
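/*
* Illustrative sketch contrasting the two upstream poisoned cases at the root
* complex. Per the text, a poisoned completion produces both ptlp and
* sec-mdpe there, whereas a poisoned request produces ptlp without sec-mdpe.
* The function name is hypothetical.
*
```python
def classify_rc_poison(rc_ereports):
    """Classify a poisoned TLP seen at the root complex (simplified)."""
    if "ptlp" not in rc_ereports:
        return None
    # sec-mdpe accompanies ptlp only for a poisoned completion.
    return "poiscomp-u" if "sec-mdpe" in rc_ereports else "poisreq-u"
```
*/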
/*
* the remaining propagations are also used for poisoned completions propagating
* up due to a fault behind a pcie-pci bridge
*/
/*
* upstream request with ecrc error.
*
* - flt-ecrcreq-u is on the pciex node which generated the fault.
* - source-ecrcreq-u cascades down to at least one leaf device (pciex or pci),
* whose bdf (if pciex) must match the source-id in the payload of the
* ereport generated from the root complex.
* - ecrcreq-u propagates up to the root complex which must report it with an
* ecrc ereport, and any switches on route can optionally raise an ecrc ereport.
*
* Additionally, as the root complex will just throw away the packet, we may
* eventually get a cto - so use an nr-d at the pciex leaf or bridge to get
* the appropriate behaviour.
*
* We can always recognize what sort of fault this is from the ecrc (with no
* cto) at the root complex. Recognizing which leaf device may be implicated
* can be done from the cto ereport (for a hardened leaf driver) or for a
* non-hardened leaf using the source-id payload of the ecrc.
*/
/*
* upstream completion with ecrc error.
*
* - flt-ecrccomp-u is on the pciex node which generated the fault.
* - source-ecrccomp-u cascades down to at least one leaf device (pciex or pci),
* whose bdf (if pciex) must match the source-id in the payload of the
* ereport generated from the root complex.
* - ecrccomp-u propagates up to the root complex, which should report it with
* an ecrc ereport and any switches on route can optionally raise an ecrc
* ereport.
*
* Additionally, as the root complex will just throw away the packet, we'll
* eventually get a cto - so use an flt-nr-u at the pciex leaf or bridge to get
* the appropriate behaviour.
*
* root complex. Recognizing which leaf device may be implicated can be done
* using either the source-id payload of the ecrc or the target-ma ereport if
* available.
*/
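/*
* Illustrative sketch contrasting the two upstream ecrc cases at the root
* complex. It assumes that any later cto has already arrived:
* - ecrc with a cto indicates a dropped completion (ecrccomp-u);
* - ecrc with no cto indicates a dropped request (ecrcreq-u).
* A real diagnosis would have to allow for the completion timeout before
* concluding the second case. The function name is hypothetical.
*
```python
def classify_rc_ecrc(rc_ereports):
    """Classify an ecrc ereport at the root complex (simplified)."""
    if "ecrc" not in rc_ereports:
        return None
    # A dropped completion leaves the root complex's own request
    # outstanding, so it later reports a cto; a dropped upstream
    # request instead times out at the leaf.
    return "ecrccomp-u" if "cto" in rc_ereports else "ecrcreq-u"
```
*/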
/*
* no response to downstream requester
*
* - nr-d will effectively cascade downstream to the requester. The fault here
* is always at the root complex. For a hardened leaf device driver, we will
* always be able to recognize this as the requester will report this as a
* cto. For non-hardened leaf devices, no cto will be reported, and though we
* should still see a nonfatal error reported from the root complex
* identifying the leaf device, we won't actually be informed that the error
* was a cto.
*/
/*
* no response to upstream requester
*
* - flt-nr-u will effectively cascade upstream to the root complex which will
* report it as a cto.
*
* We have to use target-ma to inform us which device failed to respond.
*/
/*
* downstream malformed tlp
*
* This will cascade downstream to the receiver which will report it as an mtlp.
* For non-hardened leaf drivers, no mtlp will be reported, and though we should
* still see a fatal error reported from the root complex identifying the leaf
* device, we won't actually be informed that the error was an mtlp.
* Note that sw-mtlp-d is to handle the case where the switch is actually
* the target of the packet (config request etc).
*/
/*
* upstream malformed tlp
*
* This will cascade upstream to the receiver which will report it as an mtlp.
*/
/*
* downstream completer aborts
*
* This could be the fault of the root complex or a switch reporting an internal
* error, or of the leaf device sending an invalid request (the latter is
* handled by the flt-badreq-u case below).
*
* This is reported by the completer or by an intervening downstream switch
* port. The completer abort response propagates down to the initiator which
* will set the legacy pci bit rta.
*
* The fault can always be recognized by the ca ereport from the root complex
* or downstream switch port. The originator of the request can be recognized
* by the rta for a hardened driver or by using the source-id payload of the
* ca ereport for a non-hardened driver.
*/
/*
* upstream completer aborts
*
* This could be the fault of the leaf device or a switch reporting an
* internal error, or of the root complex sending an invalid request (the
* latter case is handled by badreq-d below).
*
* This is reported as a ca by the completer. The completer (for non-posted
* requests) sends the appropriate error bits in the completion message to
* the initiator which will set the legacy pci bit sec-rta.
*
* The fault can always be recognized from the sec-rta bit at the root complex.
*
* If the fault was with a PCI Express leaf with a hardened driver, then we
* will identify the device from the ca ereport.
*
* If the fault was with a PCI Express leaf with a non-hardened driver, then we
* can still identify the leaf device from the source-id payload of the nonfatal
* message ereport from the root complex or from the target-rta ereport.
*/
/*
* upstream bad request
*
* When detecting bad data on a request the completer (or any switch on the
* way to the completer) may report ur or ca. If the switch detects the problem
* first then the request doesn't get forwarded on to the completer.
*
* then sends the appropriate error bits in the completion message to the
* initiator which will set the legacy pci bits ma or rta.
*
* identifies the initiator.
*
* complex or downstream switch port. The originator of the request can be
*/
/*
* downstream bad request
*
* When detecting bad data on a request the completer (or any switch on the
* way to the completer) may report ur or ca. If the switch detects the problem
* first then the request doesn't get forwarded on to the completer.
*
* hardened leaf driver when all we get is a nonfatal error from the root
* complex identifying the leaf device). The reporter then sends the appropriate
* error bits in the completion message to the initiator which will set the
* legacy pci bits ma or rta (oddly there is no equivalent in pcie error
* reporting).
*/
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* Stub unused legacy pci ereports at root complex.
* Stub tl.uc as we can't do anything useful with it (we should eventually
* get a cto which we can do something with - a uc without a cto is a genuinely
* spurious completion which is at least harmless).
* Stub messages that the root complex sends to itself.
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*/
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* rules for propagations from child PCI bus
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*/
/*
* ma-u will only propagate on to pciex bus for non-posted accesses. It
* is then represented as an unsupported request.
*/
/*
* ta-u will only propagate on to pciex bus for non-posted accesses. It is
* then represented as a completer abort.
*/
/*
* PERR# on a delayed write is represented as an unsupported request
*/
/*
* propagate onto pci express as a poisoned tlp
*/
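/*
* Illustrative sketch of the translations described in the ma-u, ta-u and
* PERR# comments above. The poisoned-tlp case is left out because its
* trigger is not spelled out here. The event names are hypothetical
* shorthand.
*
```python
def translate_child_error(event, posted):
    """Map a secondary-bus event onto its PCI Express representation.

    Returns None when the event does not propagate onto the pciex bus.
    """
    if event in ("ma", "ta"):
        if posted:
            return None  # posted accesses get no completion to carry it
        return "ur" if event == "ma" else "ca"
    if event == "perr-delayed-write":
        return "ur"
    raise ValueError("unhandled event: " + event)
```
*/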
/*
* If the bridge sees an address or attribute parity error it is considered
* a fatal error.
*/
/*
* If the bridge sees a split completion error (pci-x only) it could
* result in a number of things
* - unrecovered split completion message data error (uscmd). This would
* happen on a pio write. A completer abort is returned to the initiator.
* - for various faults in the split completion (eg address parity error)
* we will respond with a target abort (which the child device will treat
* as a split completion ta)
* - for other faults we can't tell who sent the split completion and so
* just drop the request (which the child device sees as a split
* completion ma)
*/
/*
* Similarly a child device may have responded with a master abort or
* target abort to one of our split completions. The hardware just logs these.
*/
/*
* SERR# is considered fatal
*/
/*
* Retry time-out is nonfatal. The initial requester has stopped retrying so
* there's nothing else the hardware can do but flag the error.
*/
/*
* A bad dma request (eg with invalid address) propagates onto pci express
* as a bad dma request. The end result may be a master abort or target abort
* (depending on whether the child is pci-x or pci).
*/
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* target- propagations
*
* A Root Complex driver may generate "target-" ereports when knowledge of the
* physical address associated with a fault allows the target device to be
* determined. This is not a requirement of the Diagnosis Engine, but can be
* valuable when available.
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*/
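/*
* Illustrative sketch of how a root complex driver might map a physical
* address back to a target device. It assumes non-overlapping address
* ranges, for example decoded from BARs. The AddressMap class and the device
* path strings are hypothetical; the text above only says that such a
* mapping may be available.
*
```python
import bisect


class AddressMap:
    """Map physical address ranges to device paths (hypothetical data)."""

    def __init__(self, ranges):
        # ranges: iterable of (base, size, device); assumed non-overlapping.
        self.ranges = sorted(ranges)
        self.bases = [base for base, _, _ in self.ranges]

    def target(self, addr):
        """Return the device whose range contains addr, or None."""
        # Find the last range starting at or below addr.
        i = bisect.bisect_right(self.bases, addr) - 1
        if i < 0:
            return None
        base, size, device = self.ranges[i]
        return device if addr < base + size else None
```
*/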
/*
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* stub unused pciex-pci bridge ereports
* - ignore usc/sec-unex-spl
* - ignore sec-spl-or/sec-spl-dly as these aren't really faults (tuning info)
* - ignore ecc.ce ereports for now (could do serd on these)
* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*/