hermon_wr.c revision 949b58c70cf907006b9f724dfad665d44eca5881
* Implements all the routines necessary to provide the PostSend(),
* PostRecv() and PostSRQ() verbs. Also contains all the code
* necessary to implement the Hermon WRID tracking mechanism.

/* initialize the FMA retry loop */
/* Grab the lock for the WRID list */
/* Save away some initial QP state */

* Check for "queue full" condition. If the queue
* is already full, then no more WQEs can be posted.
* So break out, ring a doorbell (if necessary) and

/* need to know the count of destination nds for backward loop */
for (dnds = 0, i = 0; i < nds; i++) {
* Build a Send or Send_LSO WQE

nopcode |= (1 << 6);  /* ReRead bit must be set */
last_ds++;  /* real last ds of wqe to fill */
for (j = nds; --j >= i; ) {
* Fill in the Data Segment(s) for the current WQE, using the
* information contained in the scatter-gather list of the

/* Update some of the state in the QP */
/* Now set the ownership bit and opcode (first dword). */
/* do the invalidate of the headroom */
/* the FMA retry loop starts for Hermon doorbell register. */
/* the FMA retry loop ends. */
/* do the invalidate of the headroom */

/* initialize the FMA retry loop */
/* make sure we see any update of wq_head */
/* Save away some initial QP state */

* Check for "queue full" condition. If the queue
* is already full, then no more WQEs can be posted.
* So break out, ring a doorbell (if necessary) and

* Validate the operation type. For RC requests, we allow
* "Send", "RDMA Read", "RDMA Write", various "Atomic"
* operations, and memory window "Bind".

* If this is an RDMA Read or RDMA Write request, then fill
* in the "Remote Address" header fields.

* Build the Remote Address Segment for the WQE, using
* the information from the RC work request.

/* Update "ds" for filling in Data Segments (below) */

* If this is one of the Atomic type operations (i.e.
* Compare-Swap or Fetch-Add), then fill in both the "Remote
* Address" header fields and the "Atomic" header fields.

* Build the Remote Address and Atomic Segments for
* the WQE, using the information from the RC Atomic

/* Update "ds" for filling in Data Segments (below) */

* Update "nds" and "sgl" because Atomic requests have
* only a single Data Segment.

* If this is a memory window Bind operation, then we call the
* hermon_wr_bind_check() routine to validate the request and
* to generate the updated RKey. If this is successful, then
* we fill in the WQE's "Bind" header fields.

* Build the Bind Memory Window Segments for the WQE,
* using the information from the RC Bind memory

* Update the "ds" pointer. Even though the "bind"
* operation requires no SGLs, this is necessary to
* facilitate the correct descriptor size calculations

* Now fill in the Data Segments (SGL) for the Send WQE based
* on the values set up above (i.e. "sgl", "nds", and the "ds"
* pointer). Start by checking for a valid number of SGL entries.

last_ds++;  /* real last ds of wqe to fill */
for (i = nds; --i >= 0; ) {
* Fill in the Data Segment(s) for the current WQE, using the
* information contained in the scatter-gather list of the

/* Update some of the state in the QP */
/* Now set the ownership bit of the first one in the chain. */
/* do the invalidate of the headroom */
/* the FMA retry loop starts for Hermon doorbell register. */
/* the FMA retry loop ends. */
/* do the invalidate of the headroom */

* Update the "num_posted" return value (if necessary).
* Then drop the locks and return success.

* Context: Can be called from interrupt or base context.

/* initialize the FMA retry loop */

* Check for user-mappable QP memory. Note: We do not allow kernel
* clients to post to QP memory that is accessible directly by the
* user. If the QP memory is user accessible, then return an error.

* Check QP state. Cannot post Send requests from the "Reset",
* "Init", or "RTR" states.

/* Use these optimized functions most of the time */
/* general loop for non-optimized posting */
/* Grab the lock for the WRID list */
/* Save away some initial QP state */
/* Initialize posted_cnt */

* For each ibt_send_wr_t in the wr[] list passed in, parse the
* request and build a Send WQE. NOTE: Because we are potentially
* building a chain of WQEs to post, we want to build them all first,
* and set the valid (HW Ownership) bit on all but the first.
* However, we do not want to validate the first one until the
* entire chain of WQEs has been built. Then, in the final step,
* we set the valid bit in the first, flush if needed, and as a last
* step ring the appropriate doorbell. NOTE: the doorbell ring may
* NOT be needed if the HCA is already processing, but the doorbell
* ring will be done regardless. NOTE ALSO: It is possible for
* more Work Requests to be posted than the HW will support at one
* shot. If this happens, we need to be able to post and ring
* several chains here until the entire request is complete.
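The chain-posting scheme in this comment works in four steps. Every WQE is built first, all but the first are handed to hardware right away, the first is validated last, and the doorbell is rung once per chain. A toy ring can model this. It is a minimal sketch, not the driver's code. `struct sketch_sq`, `sketch_post()`, the ring size and the headroom size are all hypothetical, and the ownership bit is reduced to a `bool`. Only the 256-WQEs-per-doorbell limit is quoted from the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes; only MAX_WQES_PER_DB is quoted from the comments. */
#define SQ_SIZE          1024u  /* ring entries (power of two) */
#define SQ_HEADROOM      16u    /* entries kept free ahead of the tail */
#define MAX_WQES_PER_DB  256u   /* "256 currently" per doorbell ring */

struct sketch_sq {
	bool     hw_owned[SQ_SIZE]; /* stand-in for each WQE's ownership bit */
	uint32_t head;              /* free-running index of oldest outstanding WQE */
	uint32_t tail;              /* free-running index of the next WQE to build */
	uint32_t doorbells;         /* how many times the doorbell was rung */
};

/* Post n requests; returns how many were posted before the queue filled. */
static uint32_t
sketch_post(struct sketch_sq *sq, uint32_t n)
{
	uint32_t posted = 0;

	assert(sq->tail - sq->head <= SQ_SIZE);
	while (posted < n) {
		uint32_t first = sq->tail;  /* first WQE of this chain */
		uint32_t chain = 0;

		while (posted < n && chain < MAX_WQES_PER_DB) {
			/* "queue full" check, including the headroom */
			if (sq->tail - sq->head + SQ_HEADROOM >= SQ_SIZE)
				break;
			/* ... build the WQE at slot (sq->tail & (SQ_SIZE - 1)) ... */
			if (chain != 0)     /* not the first: hand it over now */
				sq->hw_owned[sq->tail & (SQ_SIZE - 1)] = true;
			sq->tail++;
			chain++;
			posted++;
		}
		if (chain == 0)
			break;              /* queue already full: nothing built */

		/*
		 * Validate the first WQE last, then ring once for the chain.
		 * A real driver would also need store ordering here.
		 */
		sq->hw_owned[first & (SQ_SIZE - 1)] = true;
		sq->doorbells++;

		if (chain < MAX_WQES_PER_DB && posted < n)
			break;              /* stopped early because the queue filled */
	}
	return (posted);
}
```

For example, posting 600 requests to an empty ring produces three chains (256 + 256 + 88) and three doorbell rings. The sketch tests `>=` as a defensive choice. As the comments point out, an exact comparison is enough when the tail advances one WQE at a time.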
* NOTE ALSO: the term "chain" is used to differentiate it from the
* Work Request List passed in, and because that is the terminology
* from the previous generations of HCA; the WQEs are not, in fact,
* chained together for Hermon.

* For the first WQE on a new chain we need "prev" to point
* to the current descriptor.

* Unlike Tavor & Arbel, tail will maintain the number of the
* next (this) WQE to be posted. Since there is no backward linking
* in Hermon, we can always just look ahead.

* Before we begin, save the current "tail index" for later

/* NOTE: don't need to go back one like arbel/tavor */

* Break the request up into lists that are less than or
* equal to the maximum number of WQEs that can be posted
* per doorbell ring - 256 currently.

* Check for "queue full" condition. If the queue
* is already full, then no more WQEs can be posted.
* So break out, ring a doorbell (if necessary) and

* Increment the "tail index". Check for "queue
* full" condition incl. headroom. If we detect that
* the current work request is going to fill the work
* queue, then we mark this condition and continue.
* Don't need >=, because going one-by-one we have to
* hit it exactly sooner or later.

* Get the address of the location where the next
* Send WQE should be built.

* Call hermon_wqe_send_build() to build the WQE
* at the given address. This routine uses the
* information in the ibt_send_wr_t list (wr[]) and
* returns the size of the WQE when it returns.

* Now, build the Ctrl Segment based on
* now, build up the control segment, leaving the

/* Ensure correctness, set the ReRead bit */

* If this is not the first descriptor on the current
* chain, then set the ownership bit.

if (currindx != 0) {
/* not the first */

* Update the current "tail index" and increment

* If we reach here and there are one or more WQEs which have
* been successfully built as a chain, we have to finish up
* and prepare them for writing to the HW
* 1. do the headroom fixup
* 2. add in the size of the headroom for the sync
* 3. write the owner bit for the first WQE
* 5. fix up the structures
* 6. hit the doorbell in UAR

* Save away updated "tail index" for the DMA sync
* including the headroom that will be needed

/* do the invalidate of the headroom */
/* Do a DMA sync for current send WQE(s) */
/* Update some of the state in the QP */

* Now set the ownership bit of the first

/* the FMA retry loop starts for Hermon doorbell. */
/* the FMA retry loop ends. */

* Update the "num_posted" return value (if necessary).
* Then drop the locks and return success.

* Context: Can be called from interrupt or base context.

* Check for user-mappable QP memory. Note: We do not allow kernel
* clients to post to QP memory that is accessible directly by the
* user. If the QP memory is user accessible, then return an error.

/* Initialize posted_cnt */

* Check if QP is associated with an SRQ

* Check QP state. Cannot post Recv requests from the "Reset" state.

/* Check that work request transport type is valid */

* Grab the lock for the WRID list, i.e., membar_consumer().
* This is not needed because the mutex_enter() above has

/* Save away some initial QP state */

* Before we begin, save the current "tail index" for later

/* Save away updated "tail index" for the DMA sync */
/* Update the doorbell record w/ wqecntr */

* Context: Can be called from interrupt or base context.

* Check for user-mappable QP memory. Note: We do not allow kernel
* clients to post to QP memory that is accessible directly by the
* user. If the QP memory is user accessible, then return an error.

* Check SRQ state. Cannot post Recv requests when SRQ is in error.

/* Ring the doorbell w/ wqecntr */

* hermon_wqe_send_build()
* Context: Can be called from interrupt or base context.

/* Initialize the information for the Data Segments */

* Building a Send WQE depends first and foremost on the transport
* type of Work Request (i.e. UD, RC, or UC).

/* Ensure that work request transport type matches QP type */

* Validate the operation type. For UD requests, only the
* "Send" and "Send LSO" operations are valid.

* If this is a Special QP (QP0 or QP1), then we need to
* build MLX WQEs instead. So jump to hermon_wqe_mlx_build()
* and return whatever status it returns.

* Otherwise, if this is a normal UD Send request, then fill
* all the fields in the Hermon UD header for the WQE. Note:
* to do this we'll need to extract some information from the
* Address Handle passed with the work request.

* Build the Unreliable Datagram Segment for the WQE, using
* the information from the address handle and the work

/* mutex_enter(&ah->ah_lock); */
} else {  /* IBT_WRC_SEND_LSO */
/* mutex_exit(&ah->ah_lock); */
/* Update "ds" for filling in Data Segments (below) */

/* Ensure that work request transport type matches QP type */

* Validate the operation type. For RC requests, we allow
* "Send", "RDMA Read", "RDMA Write", various "Atomic"
* operations, and memory window "Bind".

* If this is a Send request, then all we need to do is break
* out here and begin the Data Segment processing below.

* If this is an RDMA Read or RDMA Write request, then fill
* in the "Remote Address" header fields.

* Build the Remote Address Segment for the WQE, using
* the information from the RC work request.

/* Update "ds" for filling in Data Segments (below) */

* If this is one of the Atomic type operations (i.e.
* Compare-Swap or Fetch-Add), then fill in both the "Remote
* Address" header fields and the "Atomic" header fields.

* Build the Remote Address and Atomic Segments for
* the WQE, using the information from the RC Atomic

/* Update "ds" for filling in Data Segments (below) */

* Update "nds" and "sgl" because Atomic requests have
* only a single Data Segment (and they are encoded
* somewhat differently in the work request).

* If this is a memory window Bind operation, then we call the
* hermon_wr_bind_check() routine to validate the request and
* to generate the updated RKey. If this is successful, then
* we fill in the WQE's "Bind" header fields.

* Build the Bind Memory Window Segments for the WQE,
* using the information from the RC Bind memory

* Update the "ds" pointer. Even though the "bind"
* operation requires no SGLs, this is necessary to
* facilitate the correct descriptor size calculations

/* Ensure that work request transport type matches QP type */

* Validate the operation type. For UC requests, we only
* allow "Send", "RDMA Write", and memory window "Bind".
* Note: Unlike RC, UC does not allow "RDMA Read" or "Atomic" operations.

* If this is a Send request, then all we need to do is break
* out here and begin the Data Segment processing below.

* If this is an RDMA Write request, then fill in the "Remote
* Address" header fields.

* Build the Remote Address Segment for the WQE, using
* the information from the UC work request.

/* Update "ds" for filling in Data Segments (below) */

* If this is a memory window Bind operation, then we call the
* hermon_wr_bind_check() routine to validate the request and
* to generate the updated RKey. If this is successful, then
* we fill in the WQE's "Bind" header fields.

* Build the Bind Memory Window Segments for the WQE,
* using the information from the UC Bind memory

* Update the "ds" pointer. Even though the "bind"
* operation requires no SGLs, this is necessary to
* facilitate the correct descriptor size calculations

* Now fill in the Data Segments (SGL) for the Send WQE based on
* the values set up above (i.e. "sgl", "nds", and the "ds" pointer).
* Start by checking for a valid number of SGL entries.

* For each SGL in the Send Work Request, fill in the Send WQE's data
* segments. Note: We skip any SGL with zero size because Hermon
* hardware cannot handle a zero for "byte_cnt" in the WQE. Actually
* the encoding for zero means a 2GB transfer.

last_ds++;  /* real last ds of wqe to fill */

* Return the size of the descriptor (in 16-byte chunks).
* For Hermon, we want them (for now) to be on stride size

for (j = nds; --j >= i; ) {
* Fill in the Data Segment(s) for the current WQE, using the
* information contained in the scatter-gather list of the

* Context: Can be called from interrupt or base context.

/* Initialize the information for the Data Segments */

* Pull the address handle from the work request. The UDAV will
* be used to answer some questions about the request.

* If the request is for QP1 and the destination LID is equal to
* the Permissive LID, then return an error. This combination is

* Calculate the size of the packet headers, including the GRH

* Begin to build the first "inline" data segment for the packet
* headers. Note: By specifying "inline" we can build the contents
* of the MAD packet headers directly into the work queue (as part of the
* descriptor). This has the advantage of both speeding things up
* memory for the packet headers.

* Build Local Route Header (LRH)
* We start here by building the LRH into a temporary location.
* When we have finished we copy the LRH data into the descriptor.

* Notice that the VL values are hardcoded. This is not a problem
* because VL15 is decided later based on the value in the MLX
* transport "next/ctrl" header (see the "vl15" bit below), and it
* is otherwise (meaning for QP1) chosen from the SL-to-VL table
* values. This rule does not hold for loopback packets, however
* (all of which bypass the SL-to-VL tables), and it is the reason
* that non-QP0 MADs are set up with VL hardcoded to zero below.

* Notice also that Source LID is hardcoded to the Permissive LID
* (0xFFFF). This is also not a problem because if the Destination
* LID is not the Permissive LID, then the "slr" value in the MLX
* transport "next/ctrl" header will be set to zero and the hardware
* will pull the LID from the value in the port.

* Build Global Route Header (GRH)
* This is only built if necessary as defined by the "grh" bit in
* the address vector. Note: We also calculate the offset to the
* next header (BTH) based on whether or not the "grh" bit is set.
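These header comments imply a fixed layout: the LRH, an optional GRH, the BTH and the DETH. The GRH bit decides where the BTH starts. The sketch below computes that layout under stated assumptions. The header lengths (LRH 8, GRH 40, BTH 12, DETH 8 bytes) and the LRH "link next header" codes come from the InfiniBand specification. `struct mad_hdr_layout` and `mad_layout()` are hypothetical. The sketch also ignores any per-segment control word that the Hermon inline segment may carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Header lengths (bytes) defined by the InfiniBand Architecture spec. */
#define IB_LRH_LEN   8u
#define IB_GRH_LEN   40u
#define IB_BTH_LEN   12u
#define IB_DETH_LEN  8u

/* LRH "LNH" (link next header) codes. */
#define IB_LNH_IBA_LOCAL   0x2u  /* BTH follows the LRH */
#define IB_LNH_IBA_GLOBAL  0x3u  /* GRH follows the LRH */

struct mad_hdr_layout {
	unsigned lnh;        /* next-header code to put in the LRH */
	size_t   bth_off;    /* offset of the BTH from the start of the LRH */
	size_t   hdr_len;    /* LRH + [GRH] + BTH + DETH */
	size_t   padded_len; /* hdr_len rounded up to a 16-byte boundary */
};

/* Hypothetical helper: lay out the inline MAD headers of an MLX WQE. */
static struct mad_hdr_layout
mad_layout(bool grh)
{
	struct mad_hdr_layout l;

	l.lnh = grh ? IB_LNH_IBA_GLOBAL : IB_LNH_IBA_LOCAL;
	l.bth_off = IB_LRH_LEN + (grh ? IB_GRH_LEN : 0u);
	l.hdr_len = l.bth_off + IB_BTH_LEN + IB_DETH_LEN;
	l.padded_len = (l.hdr_len + 15u) & ~(size_t)15u;
	assert(l.padded_len % 16u == 0);
	return (l);
}
```

Without a GRH the headers take 28 bytes, padded to 32. With a GRH they take 68 bytes, padded to 80. This is why the offset of the next header has to be computed from the "grh" bit.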
* If the request is for QP0, then return an error. The
* combination of global routing (GRH) and QP0 is not allowed.

* Build Base Transport Header (BTH)
* Notice that the M, PadCnt, and TVer fields are all set
* to zero implicitly. This is true for all Management Datagrams
* (MADs), whether GSI or SMI.

* Build Datagram Extended Transport Header (DETH)

/* Ensure that the Data Segment is aligned on a 16-byte boundary */

* Now fill in the Data Segments (SGL) for the MLX WQE based on the
* values set up above (i.e. "sgl", "nds", and the "ds" pointer).
* Start by checking for a valid number of SGL entries.

* For each SGL in the Send Work Request, fill in the MLX WQE's data
* segments. Note: We skip any SGL with zero size because Hermon
* hardware cannot handle a zero for "byte_cnt" in the WQE. Actually
* the encoding for zero means a 2GB transfer. Because of this special
* encoding in the hardware, we mask the requested length with
* HERMON_WQE_SGL_BYTE_CNT_MASK (so that 2GB will end up encoded as

for (i = 0; i < nds; i++) {
* Fill in the Data Segment(s) for the MLX send WQE, using
* the information contained in the scatter-gather list of

* Search through the contents of all MADs posted to QP0 to
* initialize pointers to the places where Directed Route "hop
* pointer", "hop count", and "mgmtclass" would be. Hermon
* needs these updated (i.e. incremented or decremented, as
* necessary) by software.

* Hermon's Directed Route MADs need to have the "hop pointer"
* incremented or decremented depending on whether it is
* currently less than or greater than the "hop count" (i.e. whether
* the MAD is a request or a response).

* Now fill in the ICRC Data Segment. This data segment is inlined
* just like the packet headers above, but it is only four bytes and
* set to zero (to indicate that we wish the hardware to generate ICRC).

* Return the size of the descriptor (in 16-byte chunks).
* For Hermon, we want them (for now) to be on stride size

* hermon_wqe_recv_build()
* Context: Can be called from interrupt or base context.

* Fill in the Data Segments (SGL) for the Recv WQE. We don't
* need to reserve space for a ctrl segment (there is none on the
* recv queue for hermon), but we will need to put an invalid
* (null) scatter pointer per PRM.

/* Check for valid number of SGL entries */

* For each SGL in the Recv Work Request, fill in the Recv WQE's data
* segments. Note: We skip any SGL with zero size because Hermon
* hardware cannot handle a zero for "byte_cnt" in the WQE. Actually
* the encoding for zero means a 2GB transfer. Because of this special
* encoding in the hardware, we mask the requested length with
* HERMON_WQE_SGL_BYTE_CNT_MASK (so that 2GB will end up encoded as

* Fill in the Data Segment(s) for the receive WQE, using the
* information contained in the scatter-gather list of the

/* put the null sgl pointer as well if needed */

* Context: Can be called from interrupt or base context.
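The receive-side SGL rules in these comments fit in one small loop. The loop rejects too many entries, skips zero-length entries, masks the byte count, and ends the list with a null scatter entry when there is room. This is a minimal sketch built on hypothetical types and constants. `struct sketch_sge`, `struct sketch_dseg`, `SKETCH_BYTE_CNT_MASK` and `SKETCH_INVALID_LKEY` are illustrations, not the Hermon definitions. The invalid-LKey value in particular is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: neither is taken from the Hermon headers. */
#define SKETCH_BYTE_CNT_MASK  0x7FFFFFFFu  /* 2GB (0x80000000) masks to 0 */
#define SKETCH_INVALID_LKEY   0x00000100u  /* assumed "null scatter" key */

struct sketch_sge  { uint64_t addr; uint32_t lkey; uint32_t len; };
struct sketch_dseg { uint32_t byte_cnt; uint32_t lkey; uint64_t addr; };

/*
 * Fill up to max_ds receive data segments from sgl[0..nds). Zero-length
 * entries are skipped; if room is left, the list is terminated with an
 * invalid (null) entry. Returns the number of segments written, or -1
 * if the caller passed more SGL entries than the WQE can hold.
 */
static int
sketch_recv_fill(struct sketch_dseg *ds, size_t max_ds,
    const struct sketch_sge *sgl, size_t nds)
{
	size_t n = 0;

	if (nds > max_ds)
		return (-1);        /* invalid number of SGL entries */
	for (size_t i = 0; i < nds; i++) {
		if (sgl[i].len == 0)
			continue;   /* byte_cnt 0 would mean a 2GB transfer */
		ds[n].byte_cnt = sgl[i].len & SKETCH_BYTE_CNT_MASK;
		ds[n].lkey = sgl[i].lkey;
		ds[n].addr = sgl[i].addr;
		n++;
	}
	if (n < max_ds) {           /* null sgl pointer terminates the list */
		ds[n].byte_cnt = 0;
		ds[n].lkey = SKETCH_INVALID_LKEY;
		ds[n].addr = 0;
		n++;
	}
	assert(n <= max_ds);
	return ((int)n);
}
```

The comments imply an asymmetry here. A zero-length entry is dropped, but a genuine 2GB entry is kept and ends up with a `byte_cnt` of 0.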
/* Fill in the Data Segments (SGL) for the Recv WQE */
/* Check for valid number of SGL entries */

* For each SGL in the Recv Work Request, fill in the Recv WQE's data
* segments. Note: We skip any SGL with zero size because Hermon
* hardware cannot handle a zero for "byte_cnt" in the WQE. Actually
* the encoding for zero means a 2GB transfer. Because of this special
* encoding in the hardware, we mask the requested length with
* HERMON_WQE_SGL_BYTE_CNT_MASK (so that 2GB will end up encoded as

* Fill in the Data Segment(s) for the receive WQE, using the
* information contained in the scatter-gather list of the

* put in the null sgl pointer as well, if needed

* hermon_wr_get_immediate()
* Context: Can be called from interrupt or base context.

* This routine extracts the "immediate data" from the appropriate
* location in the IBTF work request. Because of the way the
* work request structure is defined, the location for this data
* depends on the actual work request operation type.

/* For RDMA Write, test if RC or UC */
} else {  /* IBT_UC_SRV */
/* For Send, test if RC, UD, or UC */
} else {  /* IBT_UC_SRV */

* If any other type of request, then immediate is undefined

* Context: can be called from interrupt or base, currently only from
* Routine that fills in the headroom for the Send Queue

if (j == 0) {  /* 1st section of wqe */
/* preserve ownership bit */
/* or just invalidate it */

* Context: Can be called from interrupt or base context.

/* Get the DMA handle from SRQ context */
/* get base addr of the buffer */
/* Get the DMA handle from QP context */
/* Determine the base address of the QP buffer */

* Depending on the type of the work queue, we grab information
* about the address ranges we need to DMA sync.

* There are two possible cases for the beginning and end of the WQE
* chain we are trying to sync. Either this is the simple case, where
* the end of the chain is below the beginning of the chain, or it is
* the "wrap-around" case, where the end of the chain has wrapped over
* the end of the queue. In the former case, we simply need to
* calculate the span from beginning to end and sync it. In the latter
* case, however, we need to calculate the span from the top of the
* work queue to the end of the chain and sync that, and then we need
* to find the other portion (from beginning of chain to end of queue)
* and sync that as well. Note: if the "top to end" span is actually
* zero length, then we don't do a DMA sync because a zero length DMA
* sync unnecessarily syncs the entire work queue.

/* "From Beginning to End" */
/* "From Beginning to Bottom" */

* Context: Can be called from interrupt or base context.

/* Check for a valid Memory Window handle in the WR */
/* Check for a valid Memory Region handle in the WR */

* Check here to see if the memory region has already been partially
* deregistered as a result of a hermon_umap_umemlock_cb() callback.
* If so, this is an error, return failure.

/* Check for a valid Memory Window RKey (i.e. a matching RKey) */
/* Check for a valid Memory Region LKey (i.e. a matching LKey) */

* Now check for valid "vaddr" and "len". Note: We don't check the
* "vaddr" range when "len == 0" (i.e. on unbind operations).

* Validate the bind access flags. Remote Write and Atomic access for
* the Memory Window require that Local Write access be set in the
* corresponding Memory Region.

/* Calculate the new RKey for the Memory Window */

* hermon_wrid_from_reset_handling()
* Context: Can be called from interrupt or base context.

/* grab the cq lock(s) to modify the wqavl tree */
/* Chain the newly allocated work queue header to the CQ's list */

* Now we repeat all the above operations for the receive work queue,
* or shared receive work queue.
* Note: We still use the 'qp_rq_cqhdl' even in the SRQ case.

* hermon_wrid_to_reset_handling()
* Context: Can be called from interrupt or base context.

* If there are unpolled entries in these CQs, they are

* Grab the CQ lock(s) before manipulating the lists.

* Flush the entries on the CQ for this QP's QPN.

* hermon_wrid_get_entry()
* Context: Can be called from interrupt or base context.

* Determine whether this CQE is a send or receive completion.

/* Find the work queue for this QP number (send or receive side) */

* Regardless of whether the completion is the result of a "success"
* or a "failure", we lock the list of "containers" and attempt to
* search for the first matching completion (i.e. the first WR
* with a matching WQE addr and size). Once we find it, we pull out
* the "wrid" field and return it (see below). XXX Note: One possible
* future enhancement would be to enable this routine to skip over
* any "unsignaled" completions to go directly to the next "signaled"

/* put wqe back on the srq free list */

* hermon_wrid_workq_find()
* Context: Can be called from interrupt or base context.

* Walk the CQ's work queue list, trying to find a send or recv queue
* with the same QP number. We do this even if we are going to later
* create a new entry because it helps us easily find the end of the

* hermon_wrid_wqhdr_create()
* Context: Can be called from base context.

* Allocate space for the wqhdr, and an array to record all the wrids.
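The `hermon_wrid_wqhdr_create()` comment mentions a work queue header plus an array that records every wrid. One simple way to use such an array is shown here as an assumption, not as the driver's method. At post time the wrid is stored under the WQE's ring slot. At completion time it is looked up by a WQE counter, which the sketch assumes the completion reports. All names are hypothetical. The lookup does not model the container search described in the `hermon_wrid_get_entry()` comment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical work queue header: one wrid slot per WQE in the ring. */
struct sketch_wqhdr {
	uint32_t  qpn;   /* QP number this queue belongs to */
	uint32_t  size;  /* number of WQEs (power of two) */
	uint32_t  mask;  /* size - 1, maps a WQE index/counter to a slot */
	uint64_t *wrid;  /* wrid[i] is the caller's ID for the WQE at slot i */
};

static struct sketch_wqhdr *
sketch_wqhdr_create(uint32_t qpn, uint32_t size)
{
	struct sketch_wqhdr *wq;

	assert(size != 0 && (size & (size - 1)) == 0);
	if ((wq = malloc(sizeof (*wq))) == NULL)
		return (NULL);
	if ((wq->wrid = calloc(size, sizeof (uint64_t))) == NULL) {
		free(wq);
		return (NULL);
	}
	wq->qpn = qpn;
	wq->size = size;
	wq->mask = size - 1;
	return (wq);
}

/* Post side: remember the caller's wrid under the WQE's slot. */
static void
sketch_wrid_record(struct sketch_wqhdr *wq, uint32_t wqe_index, uint64_t wrid)
{
	wq->wrid[wqe_index & wq->mask] = wrid;
}

/* Completion side: an (assumed) WQE counter from the CQE selects the slot. */
static uint64_t
sketch_wrid_get(const struct sketch_wqhdr *wq, uint32_t wqe_counter)
{
	return (wq->wrid[wqe_counter & wq->mask]);
}

static void
sketch_wqhdr_free(struct sketch_wqhdr *wq)
{
	free(wq->wrid);
	free(wq);
}
```

Slots are reused after the ring wraps. In an 8-entry queue, index 13 maps to the same slot as index 5, so an entry has to be consumed before its slot is posted again.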
* Context: Can be called from interrupt or base context.

* hermon_cq_workq_remove()
* Context: Can be called from interrupt or base context.
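The DMA sync comment earlier in this file distinguishes a simple span from a wrap-around chain and warns against zero-length syncs. This is a minimal sketch of the range computation. It assumes the chain is given as start and end indices already masked to the ring. `struct sync_span` and `sync_spans()` are hypothetical. A driver would then hand each resulting span to its DMA sync call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte range (offset, length) within the work queue buffer. */
struct sync_span { size_t off; size_t len; };

/*
 * Hypothetical helper: compute the ranges covering WQEs first..last-1
 * (indices already masked to the ring) of a queue holding qsize WQEs of
 * wqe_size bytes each. Returns 0, 1, or 2 spans. A zero-length
 * "top to end" span is never emitted, because a zero-length sync would
 * cover the whole queue.
 */
static int
sync_spans(uint32_t first, uint32_t last, uint32_t qsize, size_t wqe_size,
    struct sync_span out[2])
{
	int n = 0;

	assert(first < qsize && last < qsize);
	if (first == last)
		return (0);             /* empty chain: nothing to sync */

	if (last > first) {
		/* "From Beginning to End": the chain did not wrap */
		out[n].off = (size_t)first * wqe_size;
		out[n].len = (size_t)(last - first) * wqe_size;
		n++;
		return (n);
	}

	/* Wrap-around: top of the queue up to the end of the chain ... */
	if (last != 0) {
		out[n].off = 0;
		out[n].len = (size_t)last * wqe_size;
		n++;
	}
	/* ... then "From Beginning to Bottom" of the queue. */
	out[n].off = (size_t)first * wqe_size;
	out[n].len = (size_t)(qsize - first) * wqe_size;
	n++;
	return (n);
}
```

Take an 8-entry queue of 64-byte WQEs. A chain that starts at index 6 and wraps to index 2 produces the spans (0, 128) and (384, 128). A chain that ends exactly at the bottom of the queue (end index 0) produces only (384, 128). The sketch treats equal indices as an empty chain. This assumes the headroom reservation keeps a chain from ever filling the whole ring.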