tcp_output.c revision 6e91bba0d6c6bdabbba62cefae583715a4a58e2a
* If it is an SNMP message, don't get behind the squeue. "tcp_wput_proto, dropping one...");
* All Solaris components should pass a db_credp * for this TPI message, hence we ASSERT. * But in case there is some other M_PROTO that looks * like a TPI message sent by some other kernel * component, we check and return an error. * This was an SNMP request. * Most ioctls can be processed right away without going via * squeues; process them right here. Those that do require * squeue (currently _SIOCSOCKFALLBACK) * are processed by tcp_wput_ioctl(). * The TCP normal data output path. * NOTE: the logic of the fast path is duplicated from this function. * tcp_wput_data() with NULL mp should only be called when /* Really tacky... but we need this for detached closes. */ * Don't allow data after T_ORDREL_REQ or T_DISCON_REQ, * or before a connection attempt has begun. "tcp_wput_data: data after ordrel, %s",
"tcp_wput_data: data after ordrel, %s\n",
/* If we are the first on the list ... */ /* If tiny tx and room in txq tail, pullup to save mblks. */ /* Tack on however many more positive length mblks we have */ * Note that tcp_mss has been adjusted to take into account the * timestamp option if applicable. Because SACK options do not * appear in every TCP segment and they are of variable lengths, * they cannot be included in tcp_mss. Thus we need to calculate * the actual segment length when we need to send a segment which * The three-way connection establishment handshake is not * complete yet. We want to queue the data for transmission * after entering ESTABLISHED state (RFC 793). A jump to the * "done" label effectively leaves data on the queue. * In the special case when cwnd is zero, which can only * happen if the connection is ECN capable, return now. * New segments are sent using tcp_timer(). The timer * is set in tcp_input_data(). * Note that tcp_cwnd is 0 before 3-way handshake is /* NOTE: trouble if xmitting while SYN not acked? */ * Check if the receiver has shrunk the window. If * tcp_wput_data() with NULL mp is called, tcp_fin_sent * cannot be set as there is unsent data, so FIN cannot * be sent out. Otherwise, we need to take into account * the FIN, as it consumes an "invisible" sequence number. * The receiver has shrunk the window and we have sent * -usable_r bytes beyond the window; re-adjust. * If TCP window scaling is enabled, there can be * a round-down error, as the advertised receive window * is actually right-shifted n bits. This means that * the information in the lower n bits is lost, and it will * look as if the window has shrunk. Do a check here to * see if the shrunk amount is actually within the * error in window calculation. If it is, just * return. Note that this check is inside the * shrunk window check. This makes sure that even * though tcp_process_shrunk_swnd() is not called, * we will stop further processing.
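The rounding check described above can be sketched in C. This is a minimal illustration, not the code from this file: the helper name shrink_within_scale_error() and its parameters are hypothetical, and the tolerance of fewer than 2^snd_ws bytes is inferred from the statement that the lower n bits of the advertised window are wiped out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the illumos code): is an apparent window
 * shrink only the rounding loss caused by window scaling?
 *
 * usable_r: room left in the peer's window, (suna + swnd) - snxt.
 *           A negative value means -usable_r bytes were already sent
 *           beyond the right edge.
 * snd_ws:   the peer's window-scale shift count (0 if scaling is off).
 */
bool
shrink_within_scale_error(int32_t usable_r, unsigned snd_ws)
{
	assert(snd_ws <= 14);	/* RFC 7323 limits the shift to 14 */

	if (usable_r >= 0)
		return (false);	/* the window has not shrunk at all */

	/*
	 * The advertised window was right-shifted by snd_ws bits, so up to
	 * 2^snd_ws - 1 bytes can be lost; treat a shrink that small as noise.
	 */
	uint32_t shrunk = (uint32_t)(-(int64_t)usable_r);
	return (shrunk < (UINT32_C(1) << snd_ws));
}
```

A caller would compute usable_r in 32-bit sequence arithmetic and simply return when the helper reports a rounding artifact, matching "If it is, just return."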
/* usable = MIN(swnd, cwnd) - unacked_bytes */ /* usable = MIN(usable, unsent) */ /* usable = MAX(usable, {1 for urgent, 0 for data}) */ /* Bypass all other unnecessary processing. */ * "Our" Nagle Algorithm. This is not the same as in the old * BSD. This is more in line with the true intent of Nagle. * 1. The amount of unsent data (or amount of data which can be * sent, whichever is smaller) is less than the Nagle limit. * 2. The last sent size is also less than the Nagle limit. * 3. There is unack'ed data. * 4. Urgent pointer is not set. Send urgent data ignoring the * Nagle algorithm. This reduces the probability that urgent * bytes get "merged" together. * 5. The app has not closed the connection. This eliminates the * wait time of the receiving side waiting for the last piece of * If all are satisfied, exit without sending anything. Note * that the Nagle limit can be smaller than 1 MSS. The Nagle limit is * the smaller of 1 MSS and global tcp_naglim_def (default to be * If tcp_zero_win_probe is not set and the tcp->tcp_cork option * is set, then we have to force TCP not to send a partial segment * (smaller than MSS bytes). We calculate usable based * on a full MSS and save the remaining data for * later. When tcp_zero_win_probe is set, TCP needs to send out * something to do a zero window probe. /* Update the latest receive window size in TCP header. */ /* Pretend that all we were trying to send really got sent */ * If new data was sent, we need to update the notsack * list, which is, after all, data blocks that have * not been sack'ed by the receiver. New data is /* len is a negative value. */ * Didn't send anything. Make sure the timer is running * so that we will probe a zero window. /* Note that len is the amount we just sent but with a negative sign */ * Initial STREAMS write-side put() procedure for sockets. It tries to * handle the T_CAPABILITY_REQ which sockfs sends down while setting * up the socket without using the squeue.
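The three usable-window comments and the five "our Nagle" conditions listed above translate fairly directly into code. The sketch below is an illustration under assumptions: the function names usable_bytes() and nagle_should_hold(), their parameters, and the use of 64-bit intermediates are hypothetical, and naglim_def stands in for the global tcp_naglim_def.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))
#define MAX(a, b)	((a) > (b) ? (a) : (b))

/*
 * The three usable-window comments, evaluated in order.
 * All arguments are byte counts; urgent selects the floor of 1 byte.
 */
int32_t
usable_bytes(uint32_t swnd, uint32_t cwnd, uint32_t unacked,
    uint32_t unsent, bool urgent)
{
	int64_t usable = (int64_t)MIN(swnd, cwnd) - (int64_t)unacked;

	usable = MIN(usable, (int64_t)unsent);
	usable = MAX(usable, (int64_t)(urgent ? 1 : 0));
	return ((int32_t)usable);
}

/*
 * Hypothetical predicate for the five Nagle conditions: returns true
 * when every condition holds, i.e. hold the data back for now.
 * len is MIN(unsent, what the window allows), per condition 1.
 */
bool
nagle_should_hold(uint32_t len, uint32_t last_sent, uint32_t mss,
    uint32_t naglim_def, uint32_t unacked, bool urgent, bool app_closed)
{
	uint32_t naglim = MIN(mss, naglim_def);	/* smaller of 1 MSS and global */

	assert(mss > 0);
	return (len < naglim &&		/* 1. small amount to send */
	    last_sent < naglim &&	/* 2. last segment was small too */
	    unacked > 0 &&		/* 3. data is still unacknowledged */
	    !urgent &&			/* 4. urgent data bypasses Nagle */
	    !app_closed);		/* 5. don't delay the final piece */
}
```

Condition 1 is expressed by passing len as the smaller of the unsent amount and the result of usable_bytes(); a true result from nagle_should_hold() corresponds to "exit without sending anything."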
Non-T_CAPABILITY_REQ messages * are handled by tcp_wput() as usual. * All further messages will also be handled by tcp_wput() because we cannot * be sure that the above shortcut is safe later. * Called by tcp_wput() to handle miscellaneous non-M_DATA messages. * TCP is D_MP and qprocsoff() is done towards the end of tcp_close(). * Once the close starts, streamhead and sockfs will not let any data * packets come down (close ensures that there are no threads using the * queue and no new threads will come down) but since qprocsoff() * hasn't happened yet, an M_FLUSH or some non-data message might * get reflected back (in response to our own FLUSHRW) and get * processed after tcp_close() is done. The conn would still be valid * because a ref would have been added, but we need to check the state * before actually processing the packet. /* tcp_wput_flush is called by tcp_wput_nondata to handle M_FLUSH messages. */ /* TODO: How should flush interact with urgent data? */ * Flush only data that has not yet been put on the wire. If * we flush data that we have already transmitted, life, as we * know it, may come to an end. * We have no unsent data, so unsent must be less than * conn_sndlowat, so re-enable flow. * TODO: you can't just flush these, you have to increase rwnd for one * thing. For another, how should urgent data interact? * tcp_wput_iocdata is called by tcp_wput_nondata to handle all M_IOCDATA /* Make sure it is one of ours. */ * If the conn is closing, then error the ioctl here. Otherwise * use the CONN_IOCTLREF_* macros to hold off tcp_close until /* Copy out the strbuf. */ /* Check alignment of the strbuf */ /* Copy out the address */ * tcp_wput_ioctl is called by tcp_wput_nondata() to handle all M_IOCTL * Try and ASSERT the minimum possible references on the * conn early enough. Since we are executing on the write side, * the connection is obviously not detached and that means * there is a ref each for TCP and IP.
Since we are behind * the squeue, the minimum references needed are 3. If the * conn is in classifier hash list, there should be an * extra ref for that (we check both the possibilities). * Either sockmod is about to be popped and the socket * would now be treated as a plain stream, or a module * is about to be pushed so we could no longer use read- * side synchronous streams for fused loopback tcp. * Drain any queued data and disable direct sockfs * If the conn is closing, then error the ioctl here. Otherwise bump the * conn_ioctlref to hold off tcp_close until we're done here. * This routine is called by tcp_wput() to handle all TPI requests. * Try and ASSERT the minimum possible references on the * conn early enough. Since we are executing on write side, * the connection is obviously not detached and that means * there is a ref each for TCP and IP. Since we are behind * the squeue, the minimum references needed are 3. If the * conn is in classifier hash list, there should be an * extra ref for that (we check both the possibilities). /* TODO: options, flags, ... from user */ /* Set length to zero for reclamation below */ "tcp_wput_proto, dropping one...");
* save the kssl_ent_t from the next block, and convert this * back to a normal bind_req. * Note: no support for snmpcom_req() through new * T_OPTMGMT_REQ. See comments in ip.c * All Solaris components should pass a db_credp * for this TPI message, hence we ASSERT. * But in case there is some other M_PROTO that looks * like a TPI message sent by some other kernel * component, we check and return an error. * If EINPROGRESS is returned, the request has been queued * for subsequent processing by ip_restart_optmgmt(), which * will do the CONN_DEC_REF(). * We were crossing FINs and got a reset from * the other side. Just ignore it. "tcp_wput_proto, T_ORDREL_REQ out of " "tcp_wput_proto, bogus TPI msg, type %d",
* We used to M_ERROR. Sending TNOTSUPPORT gives the user * Handle special out-of-band ioctl requests (see PSARC/2008/265). * The TCP fast-path write put procedure. * NOTE: the logic of the fast path is duplicated from tcp_wput_data() * Try and ASSERT the minimum possible references on the * conn early enough. Since we are executing on the write side, * the connection is obviously not detached and that means * there is a ref each for TCP and IP. Since we are behind * the squeue, the minimum references needed are 3. If the * conn is in classifier hash list, there should be an * extra ref for that (we check both the possibilities). /* Bypass tcp protocol for fused tcp loopback */ * If ZEROCOPY has been turned off, try not to send any zero-copy message * Criteria for fast path: * 2. single mblk in request * 3. connection established /* queue new packet onto retransmission queue */ /* find out how much we can send */ * |--------------|-----------------| * tcp_suna tcp_snxt tcp_suna+tcp_swnd /* start sending from tcp_snxt */ * Check to see if this connection has been idle for some * time and no ACK is expected. If it is, we need to slow * start again to get back the connection's "self-clock" as * described in VJ's paper. * Reinitialize tcp_cwnd after idle. /* usable can be < 0 if the congestion window is smaller */ /* Can't send complete M_DATA in one shot */ * Determine if there is anything to send (Nagle). * 1. len < tcp_mss (i.e. small) * 2. unacknowledged data present * 4. last packet sent < Nagle limit (previous packet sent) * This was the first unsent packet and normally * mss < xmit_hiwater so there is no need to worry * about flow control. The next packet will go * through the flow control check in tcp_wput_data(). /* leftover work from above */ * len <= tcp->tcp_mss && len == unsent so no sender silly window. Can /* we have always sent something */ /* adjust tcp header information */ /* Update the latest receive window size in TCP header.
*/ /* see if we need to allocate an mblk for the headers */ /* NOTE: we assume allocb returns an OK_32PTR */ /* Leave room for Link Level header */ /* Fill in the timestamp option. */ /* copy header into outgoing packet */ * Set the ECN info in the TCP header. Note that this * is not the template header. * If we ran out of memory, we pretend to have sent the packet * and that it was lost on the wire. /* leftover work from above */ * Try to force urgent data out on the wire. Even if we have unsent * data, this will at least send the urgent flag. * XXX does not handle more flag correctly. /* Bypass tcp protocol for fused tcp loopback */ /* Strip off the T_EXDATA_REQ if the data is from TPI */ * Called by the STREAMS close routine via squeues when our client blows off her * descriptor. We take this to mean: "close the stream state NOW, close the tcp * connection politely". When SO_LINGER is set (with a non-zero linger time and * it is not a nonblocking socket), this routine sleeps until the FIN is * NOTE: tcp_close potentially returns an error when lingering. * However, the stream head currently does not pass these errors * to the application. 4.4BSD only returns EINTR and EWOULDBLOCK * errors to the application (from tsleep()) and not errors * like ECONNRESET caused by receiving a reset packet. /* Cleanup for listener */ msg = "tcp_close, during connect";
* Close during the connect 3-way handshake * but here there may or may not be pending data * already on the queue. Process almost the same as in * If SO_LINGER has set a zero linger time, abort the * connection with a reset. msg = "tcp_close, zero lingertime";
* Abort connection if there is unread data queued. msg = "tcp_close, unread data";
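The three msg assignments above, together with the SO_LINGER description of the close routine, amount to a small decision procedure. The sketch below assumes simplified boolean inputs in place of the real tcp_t and conn_t fields; the enum, the function name and its parameters are hypothetical, and treating the "during connect" case as the state in which our SYN is still outstanding is an assumption based on the neighboring comment that the other handshake case is processed much like an established connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of a close; the strings are the msg values above. */
typedef enum {
	CLOSE_ABORT_CONNECT,		/* "tcp_close, during connect" */
	CLOSE_ABORT_ZERO_LINGER,	/* "tcp_close, zero lingertime" */
	CLOSE_ABORT_UNREAD,		/* "tcp_close, unread data" */
	CLOSE_FIN_AND_LINGER,		/* send FIN, sleep until it is acked */
	CLOSE_FIN_AND_DETACH		/* send FIN, let TCP finish detached */
} close_action_t;

/*
 * connecting: assumed to mean our SYN is still outstanding.
 * unread:     bytes queued for the application but never read.
 * Checks run in the same order as the comments above.
 */
close_action_t
choose_close_action(bool connecting, bool linger, unsigned lingertime,
    size_t unread, bool nonblocking)
{
	if (connecting)
		return (CLOSE_ABORT_CONNECT);
	if (linger && lingertime == 0)
		return (CLOSE_ABORT_ZERO_LINGER);	/* abort with a reset */
	if (unread > 0)
		return (CLOSE_ABORT_UNREAD);
	if (linger && !nonblocking) {
		assert(lingertime > 0);	/* zero was handled above */
		return (CLOSE_FIN_AND_LINGER);
	}
	return (CLOSE_FIN_AND_DETACH);
}
```

Only the first three outcomes set msg; the last two correspond to the orderly path in which the FIN is transmitted before the tcp_t is detached.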
* We have done a qwait() above which could have possibly * drained more messages in turn causing transition to a * different state. Check whether we have to do the rest * of the processing or not. * Transmit the FIN before detaching the tcp_t. * no longer owns the tcp_t thus others can modify it. * If lingering on close then wait until the fin is acked, /* tcp_close_linger_timeout will finish close */ * Check if we need to detach or just close * Make sure that no other thread will access the conn_rq of * this instance (through lookups etc.) as conn_rq will go * Need to cancel those timers which will not be used when * TCP is detached. This has to be done before the conn_wq * If delta is zero the timer event wasn't executed and was * successfully canceled. In this case we need to restart it * with the minimal delta possible. /* Detach did not complete. Still need to remove q from stream. */ * Don't change the queues in the case of a listener that has * eagers in its q or q0. It could surprise the eagers. * Instead wait for the eagers outside the squeue. /* Signal tcp_close() to finish closing. */ * We were crossing FINs and got a reset from * the other side. Just ignore it. "tcp_shutdown_output() out of state %s",
* Check here to avoid sending a zero-copy message down to IP when * ZEROCOPY capability has been turned off. We only need to deal with * the race condition between sockfs and the notification here. * Since we have tried to back off the tcp_xmit_head when turning * zero-copy off and new messages in tcp_output(), we simply drop * the dup'ed packet here and let tcp retransmit, if tcp_xmit_zc_clean /* Guard against an RST having blown it away while on the squeue */ * tcp_send() is called by tcp_wput_data() and returns one of the following: * -1 = failed allocation. * 0 = success; burst count reached, or usable send window is too small, * and we'd rather wait until later before sending again. * Check LSO possibility. The value of tcp->tcp_lso indicates whether * the underlying connection is LSO capable. We check whether there is * enough available data to initiate LSO transmission in the for(){} * Burst count reached, return successfully. * Calculate the maximum payload length we can send at one * Check whether we are able to do LSO for the current * Adjust num_burst_seg here. break;
/* success; too small */ * Sender silly-window avoidance. * Ignore this if we are going to send a * TODO: force data into microscopic window? * ==> (!pushed || (unsent > usable)) * If the retransmit timer is not running * we start it so that we will retransmit * in the case when the receiver has * decremented the window. * We are not supposed to send * anything. So let's wait a little * bit longer before breaking SWS * What should the value be? * Suggestion: MAX(init rexmit time, break;
/* success; too small */ * The reason to adjust len here is that we need to set flags * and calculate checksum. *
usable -=
len;
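The sender silly-window avoidance comments in tcp_send() reduce to the condition (!pushed || (unsent > usable)) once the easy cases are excluded. The sketch below follows the sender-side rules of RFC 1122, section 4.2.3.4 (send if a full MSS fits, if at least half the largest window the peer has offered fits, or if pushed data can be flushed completely); the function name and parameters, including max_swnd, are hypothetical, and the real routine also handles zero window probes and the retransmit timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sender SWS check after RFC 1122, 4.2.3.4.
 * Returns true when the sender should hold back for now.
 *   usable   - bytes the current send window allows
 *   mss      - sender's maximum segment size
 *   max_swnd - largest window the peer has ever advertised
 *   pushed   - the queued data has been pushed by the application
 *   unsent   - bytes queued but not yet sent
 */
bool
sws_should_wait(int32_t usable, uint32_t mss, uint32_t max_swnd,
    bool pushed, uint32_t unsent)
{
	assert(mss > 0);

	if (usable <= 0)
		return (true);		/* nothing may be sent at all */
	if ((uint32_t)usable >= mss)
		return (false);		/* a full-sized segment fits */
	if ((uint32_t)usable >= max_swnd / 2)
		return (false);		/* half the peer's largest window */
	/* Small window: wait if (!pushed || (unsent > usable)). */
	return (!pushed || unsent > (uint32_t)usable);
}
```

Using the peer's largest-ever window rather than the current one is the RFC's choice: it keeps a shrinking receiver from talking the sender into ever smaller segments.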
/* Approximate - can be adjusted later */ * Prime the pump for IP's checksumming on our behalf. * Include the adjustment for a source route if any. * In case of LSO, the partial pseudo-header checksum should * exclude the TCP length, so zero tha_sum before IP calculates the * pseudo-header checksum for partial checksum offload. * Branch off to tcp_xmit_mp() if any of the VALID bits is * set. For the case when TCP_FSS_VALID is the only valid * bit (normal active close), branch off only when we think * that the FIN flag needs to be set. Note that in this case * (snxt + len) may not reflect the actual seg_len, * as len may be further reduced in tcp_xmit_mp(). If len * gets modified, we will end up here again. /* Restore tcp_snxt so we get the amount sent right. */ * If the previous timestamp is still in use, if (len <= mss)
/* LSO is unusable (!do_lso_send) */ *snxt += len;
/* Adjust later if we don't send all of len */ /* Are the bytes above us in flight? */ if (len <= mss)
/* LSO is unusable */ return (-1);
/* out_of_mem */ * If the old timestamp is no longer in use, * sample a new timestamp now. if (len <= mss)
/* LSO is unusable (!do_lso_send) */ return (-1);
/* out_of_mem */ * There are four reasons to allocate a new hdr mblk: * 1) The bytes above us are in use by another packet * 2) We don't have good alignment * 3) The mblk is being shared * 4) We don't have enough room for a header /* NOTE: we assume allocb returns an OK_32PTR */ return (-1);
/* out_of_mem */ /* Leave room for Link Level header */ * Fill in the header using the template header, and add * options such as time-stamp, ECN and/or SACK, as needed. * If we're a little short, tack on more mblks until * there is no more spillover. * Excess data in mblk; can we split it? * If LSO is enabled for the connection, * keep on splitting as this is a transient * Don't split if the stream head was * told to break up larger writes * Next mblk is less than SMSS/2 * rounded up to the nearest 64 bytes; * let it get sent as part of the /* Stash for rtt use later */ return (-1);
/* out_of_mem */ /* Trim back any surplus on the last mblk */ * We did not send everything we could in * order to remain within the b_cont limit. /* Append LSO information to the mp. */ * Restore values of ixa_fragsize and ixa_extra_ident. * Make sure to clean up LSO information. Wherever a * new mp uses the prepended header room after dupb(), * lso_info_cleanup() should be called. * Initiate closedown sequence on an active connection. (May be called as * writer.) Return value zero for OK return, non-zero for error return. * Invalid state, only states TCPS_SYN_RCVD, * TCPS_ESTABLISHED and TCPS_CLOSE_WAIT are valid * If there is nothing more unsent, send the FIN now. * Otherwise, it will go out with the last segment. * Couldn't allocate msg. Pretend we got it out. * Wait for rexmit timeout. * If needed, update tcp_rexmit_snxt as tcp_snxt is * If tcp->tcp_cork is set, then the data will not get sent, * so we have to check that and unset it first. * If TCP does not get enough samples of RTT or tcp_rtt_updates * is 0, don't update the cache. * We do not have a good algorithm to update ssthresh at this time. * So don't do any update. * Note that uinfo is kept for conn_faddr in the DCE. Could update even * if source routed but we don't. * If we are going to create a DCE we'd better have * Send out a control packet on the tcp connection specified. This routine * is typically called where we need a simple ACK or RST generated. * Save sum for use in source route later. /* If a text string is passed in with the request, pass it to strlog. */ "tcp_xmit_ctl: '%s', seq 0x%x, ack 0x%x, ctl 0x%x",
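The closedown comments above allow only SYN_RCVD, ESTABLISHED and CLOSE_WAIT, and say the FIN goes out immediately only when nothing is left unsent. A minimal sketch of that bookkeeping follows, assuming local stand-in state names (not the illumos TCPS_* constants) and the RFC 793 state transitions; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for TCP states; not the illumos TCPS_* values. */
typedef enum {
	ST_SYN_RCVD, ST_ESTABLISHED, ST_CLOSE_WAIT,
	ST_FIN_WAIT_1, ST_LAST_ACK, ST_OTHER
} tcp_st_t;

typedef struct {
	bool		ok;		/* false: closedown invalid here */
	tcp_st_t	next;		/* state after queuing our FIN */
	bool		fin_now;	/* nothing unsent: send a bare FIN */
	uint32_t	fin_seq;	/* sequence number the FIN occupies */
} closedown_t;

/*
 * Hypothetical sketch of initiating closedown. The FIN takes the
 * sequence number right after the last byte of queued data.
 */
closedown_t
start_closedown(tcp_st_t st, uint32_t snxt, uint32_t unsent)
{
	closedown_t r = { false, st, false, 0 };

	assert(st <= ST_OTHER);
	switch (st) {
	case ST_SYN_RCVD:
	case ST_ESTABLISHED:
		r.next = ST_FIN_WAIT_1;
		break;
	case ST_CLOSE_WAIT:
		r.next = ST_LAST_ACK;	/* the peer has already sent its FIN */
		break;
	default:
		return (r);		/* invalid state: report an error */
	}
	r.ok = true;
	r.fin_now = (unsent == 0);	/* else FIN rides on the last segment */
	r.fin_seq = snxt + unsent;	/* wraps modulo 2^32 like any seq */
	return (r);
}
```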
* Don't send TSopt w/ TH_RST packets per RFC 1323. /* Update the latest receive window size in TCP header. */ /* Track what we sent to the peer */ * Include the adjustment for a source route if any. * Generate a reset based on an inbound packet, connp is set by caller * when RST is in response to an unexpected inbound packet for which * there is active tcp state in the system. * IPSEC NOTE : Try to send the reply with the same protection as it came * in. We have the ip_recv_attr_t which is reversed to form the ip_xmit_attr_t. * That way the packet will go out at the same level of protection as it * If connp != NULL we use conn_ixa to keep IP_NEXTHOP and other * options from the listener. In that case the caller must ensure that * we are running on the listener = connp squeue. * We get a safe copy of conn_ixa so we don't need to restore anything * we or ip_output_simple might change in the ixa. * IXAF_VERIFY_SOURCE is overkill since we know the "tcp_xmit_early_reset: '%s', seq 0x%x, ack 0x%x, " * We skip reversing source route here. * (for now we replace all IP options with EOL) * Make sure that src address isn't flagrantly invalid. * Not all broadcast address checking for the src address * is possible, since we don't know the netmask of the src * addr. No check for destination address is done, since * IP will not pass up a packet with a broadcast dest * address to TCP. Similar checks are done below for IPv6. /* Remove any extension headers assuming partial overlay */ /* Discard any old label */ * Apply IPsec based on how IPsec was applied to * the packet that caused the RST. /* Note: mp already consumed and ip_drop_packet done */ * This is in clear. The RST message we are building * here should go out in clear, independent of our policy. * NOTE: one might consider tracing a TCP packet here, but * this function has no active TCP state and no tcp structure * that has a trace buffer. 
If we traced here, we would have * to keep a local trace buffer in tcp_record_trace(). * Generate a "no listener here" RST in response to an "unknown" segment. * connp is set by caller when RST is in response to an unexpected * inbound packet for which there is active tcp state in the system. * Note that we are reusing the incoming mp to construct the outgoing RST. * The conn_t parameter is NULL because we already know char *,
"Could not reply with RST to mp(1)",
ip2dbg(("tcp_xmit_listeners_reset: not permitted to reply\n"));
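The "no listener here" reset, together with the note that a segment carrying neither ACK, SYN nor RST is simply dropped, can be sketched using the reset-generation rules of RFC 793. This is an illustration under assumptions: the flag macros, rst_reply_t and listener_reset() are hypothetical, and the real routine also deals with IPsec, labels and RST rate limiting.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP header flag bits (values from RFC 793). */
#define F_FIN	0x01
#define F_SYN	0x02
#define F_RST	0x04
#define F_ACK	0x10

typedef struct {
	bool	 send;	/* false: drop the segment without replying */
	uint32_t seq;	/* sequence number of our RST */
	uint32_t ack;	/* acknowledgment number, valid if F_ACK set */
	uint8_t	 flags;	/* F_RST or F_RST | F_ACK */
} rst_reply_t;

/*
 * Hypothetical sketch: choose the reply to a segment that matched no
 * connection. seg_len counts payload bytes only.
 */
rst_reply_t
listener_reset(uint8_t in_flags, uint32_t seg_seq, uint32_t seg_ack,
    uint32_t seg_len)
{
	rst_reply_t r = { false, 0, 0, 0 };

	if (in_flags & F_RST)
		return (r);	/* never answer a RST with a RST */

	if (in_flags & F_ACK) {
		/* <SEQ=SEG.ACK><CTL=RST> */
		r.send = true;
		r.seq = seg_ack;
		r.flags = F_RST;
	} else if (in_flags & F_SYN) {
		/* <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>; SYN, FIN count */
		r.send = true;
		r.ack = seg_seq + seg_len + 1 + ((in_flags & F_FIN) ? 1 : 0);
		r.flags = F_RST | F_ACK;
	}
	/* Otherwise no ACK, SYN or RST: drop it, as the comment says. */
	return (r);
}
```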
* Here we violate the RFC. Note that a normal * TCP will never send a segment without the ACK * flag, except for RST or SYN segments. This * segment is neither. Just drop it on the * tcp_xmit_mp is called to return a pointer to an mblk chain complete with * IP and TCP headers ready to pass down to IP. If the mp passed in is * non-NULL, then up to max_to_send bytes of data will be dup'ed off that * mblk. (If sendall is not set, the dup'ing will stop at an mblk boundary; * otherwise it will dup partial mblks.) * Otherwise, an appropriate ACK packet will be generated. This * routine is not usually called to send new data for the first time. It * is mostly called out of the timer for retransmits, and to generate ACKs. * If offset is not NULL, the returned mblk chain's first mblk's b_rptr will * be adjusted by *offset. And after dupb(), the offset and the ending mblk * of the original mblk chain will be returned in *offset and *end_mp. /* Allocate for our maximum TCP header + link-level */ * Note that tcp_mss has been adjusted to take into account the * timestamp option if applicable. Because SACK options do not * appear in every TCP segment and they are of variable lengths, * they cannot be included in tcp_mss. Thus we need to calculate * the actual segment length when we need to send a segment which /* We use offset as an indicator that end_mp is not NULL. */ /* This could be faster with cooperation from downstream */ * Don't send the next mblk since the whole mblk /* Update the latest receive window size in TCP header. */ * Using tcp_unsent to determine if the PUSH bit should be set assumes * that this function was called from tcp_wput_data(). Thus, when called * to retransmit data, the setting of the PUSH bit may appear somewhat * random in that it might get set when it should not. This * should not pose any performance issues. * Only set the ECT bit and ECN_CWR if a segment contains new data.
* There is no TCP flow control for non-data segments, and * only data segments are transmitted reliably. * If TCP_ISS_VALID and the seq number is tcp_iss, * TCP can only be in SYN-SENT, SYN-RCVD or * FIN-WAIT-1 state. It can be FIN-WAIT-1 if * our SYN is not ack'ed but the app closes this * Tack on the MSS option. It is always needed * for both active and passive open. * The MSS option value should be the interface MTU minus the minimum * TCP/IP header size according to RFC 793, as it means * the maximum segment size TCP can receive. But * to get around some broken middleboxes/end hosts * out there, we allow the option value to be the * same as the MSS option size on the peer side. * In this way, the other side will not send * anything larger than it can receive. * Note that for the SYN_SENT state, the ndd param * tcp_use_smss_as_mss_opt has no effect as we * don't know the peer's MSS option value. So * the only case we need to take care of is in * the SYN_RCVD state, which is done later. /* Update the offset to cover the additional word */ * Note that the following way of filling in * TCP options is not optimal. Some NOPs can * be saved. But there is no need at this time * to optimize it. When it is needed, we will * Set up all the bits to tell the other side * Reset the MSS option value to be SMSS * We should probably add back the bytes * for the timestamp option and IPsec. We * don't do that as this is a workaround * is better for us to be more cautious. * They may not take these things into * account in their SMSS calculation. Thus * the peer's calculated SMSS may be smaller * than what it can be. This should be OK. * If the other side is ECN-capable, reply * that we are also ECN-capable. * The above ASSERT() makes sure that this * must be the FIN-WAIT-1 state. Our SYN has * not been ack'ed so retransmit it. /* allocb() of adequate mblk assures space */ * Get IP set to checksum on our behalf * Include the adjustment for a source route if any. u1 = (u1 >> 16) + (u1 & 0xFFFF);
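The statement u1 = (u1 >> 16) + (u1 & 0xFFFF) is the end-around-carry fold used in Internet checksums. A short sketch, assuming the RFC 1071 formulation (both function names are hypothetical), shows why a single fold may leave a carry and how the folded sum relates to the final checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fold a 32-bit accumulator of 16-bit words into 16 bits with
 * end-around carry: the "u1 = (u1 >> 16) + (u1 & 0xFFFF)" step.
 * The first fold can itself carry out of bit 15, so fold twice.
 */
uint16_t
csum_fold(uint32_t sum)
{
	sum = (sum >> 16) + (sum & 0xFFFF);
	sum = (sum >> 16) + (sum & 0xFFFF);
	return ((uint16_t)sum);
}

/*
 * One's-complement sum of big-endian 16-bit words, as described in
 * RFC 1071; an odd trailing byte is padded with zero. The accumulator
 * cannot overflow for buffers shorter than 128 KiB.
 */
uint16_t
ones_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	assert(len < 131072);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return (csum_fold(sum));
}
```

For the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7 the raw sum is 0x2ddf0, which folds to 0xddf2; the checksum placed in the header is its complement, 0x220d.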
* Note the trick here. u1 is unsigned. When tcp_urg * is smaller than seq, u1 will become a very large value. * So the comparison will fail. Also note that tcp_urp * should be positive; see RFC 793, page 17. * Include the adjustment for a source route if any. * If this routine returns B_TRUE, TCP can generate an RST in response * to a segment. If it returns B_FALSE, TCP should not respond. * TCP needs to protect itself from generating too many RSTs. * An attacker can mount a DoS attack by sending us random segments * What we do here is to have a limit of tcp_rst_sent_rate RSTs * in each 1 second interval. In this way, TCP still generates * RSTs in normal cases but when under attack, the impact is * This function handles all retransmissions if SACK is enabled for this * connection. First it calculates how many segments can be retransmitted * based on tcp_pipe. Then it goes through the notsack list to find eligible * segments. A segment is eligible if sack_cnt for that segment is greater * than or equal to tcp_dupack_fast_retransmit. After it has retransmitted * all eligible segments, it checks to see if TCP can send some new segments * (fast recovery). If it can, it sets the appropriate flag for tcp_input_data(). * tcp_t *tcp: the tcp structure of the connection. * uint_t *flags: in return, appropriate value will be set for /* Defensive coding in case there is a bug... */ * Limit the amount of outstanding data in the network to be * tcp_cwnd_ssthresh, which is half of the original congestion window. /* At least retransmit 1 MSS of data. */ /* Make sure no new RTT samples will be taken. */ * All holes are filled. Manipulate tcp_cwnd to send more * if we can. Note that after the SACK recovery, tcp_cwnd is * set to tcp_cwnd_ssthresh. * Note that we may send more than usable_swnd allows here * because of rounding, but no more than 1 MSS of data. /* This should not happen. Defensive coding again... */ * Update the send timestamp to avoid false retransmission.
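The RST rate limit described above (at most tcp_rst_sent_rate RSTs in each one-second interval) can be sketched as a fixed-window counter. The struct, the function name and the millisecond clock are assumptions for illustration; only the rate and the one-second interval come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limiter state: one fixed one-second window. */
typedef struct {
	int64_t  interval_start_ms;	/* start of the current interval */
	uint32_t sent_in_interval;	/* RSTs sent during that interval */
	uint32_t rate;			/* cap per interval, like tcp_rst_sent_rate */
} rst_limiter_t;

/*
 * Returns true if one more RST may be sent at time now_ms (a
 * monotonic millisecond clock) and counts it; false means stay silent.
 */
bool
rst_allowed(rst_limiter_t *rl, int64_t now_ms)
{
	assert(rl != NULL);

	if (now_ms - rl->interval_start_ms >= 1000) {
		rl->interval_start_ms = now_ms;	/* begin a new interval */
		rl->sent_in_interval = 0;
	}
	if (rl->sent_in_interval >= rl->rate)
		return (false);		/* budget used up for this interval */
	rl->sent_in_interval++;
	return (true);
}
```

A fixed window is the simplest reading of "in each 1 second interval"; a token bucket would smooth bursts at interval boundaries but is not what the text describes.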
* Update tcp_rexmit_max to extend this SACK recovery phase. * This happens when new data sent during fast recovery is * also lost. If TCP retransmits that new data, it needs * to extend the SACK recovery phase to avoid starting another * tcp_ss_rexmit() is called to do slow-start retransmission after a timeout * To limit the number of duplicate segments, we limit the number of segments * sent at one time to tcp_snd_burst, the burst variable. * Note that tcp_rexmit can be set even though TCP has retransmitted * Update the send timestamp to avoid false * If we have transmitted all we have at the time * we started the retransmission, we can leave * the rest of the job to tcp_wput_data(). But we * need to check the send window first. If the * window is not 0, go on with tcp_wput_data(). /* Only call tcp_wput_data() if there is data to be sent. */ * Do slow-start retransmission after ICMP errors signaling PMTU changes. * All sent data has been acknowledged or no data left to send, just * tcp_get_seg_mp() is called to get the pointer to a segment in the * send queue which starts at the given sequence number. If the given * sequence number is equal to the last valid sequence number (tcp_snxt), the * returned mblk is the last valid mblk, and off is set to the length of * send queue which starts at the given seq. no. * tcp_t *tcp: the tcp instance pointer. * uint32_t seq: the starting seq. no of the requested segment. * int32_t *off: after the execution, *off will be the offset to * the returned mblk which points to the requested seq no. * It is the caller's responsibility to pass in a non-NULL off. * An mblk_t pointer pointing to the requested segment in the send queue. /* Defensive coding. Make sure we don't send incorrect data. */ * This routine adjusts next-to-send sequence number variables, in the * case where the receiver has shrunk its window. /* Get the mblk, and the offset in it, as per the shrunk window */ * This handles the case when the receiver has shrunk its window.
Per RFC 1122, * if the receiver shrinks the window, i.e. moves the right edge of the window to the * left, then we should not send new data, but should retransmit normally the * old unacked data between suna and suna + swnd. We might have sent data * that is now outside the new window; pretend that we didn't send it. /* Pretend we didn't send the data outside the window */ /* Reset all the values per the now-shrunk window */ * If the SACK option is set, delete the entire list of * Make sure the timer is running so that we will probe a zero * tcp_fill_header is called by tcp_send() to fill the outgoing TCP header * with the template header, as well as other options such as time-stamp, /* Header of outgoing packet */ /* dst and src are opaque 32-bit fields, used for copying */ /* Fill time-stamp option if needed */ * Copy the template header; is this really more efficient than * calling bcopy()? For simple IPv4/TCP, it may be the case, * but perhaps not for other scenarios. * Set the ECN info in the TCP header if it is not a zero * window probe. A zero window probe is only sent in * tcp_wput_data() and tcp_timer(). /* Fill in SACK options */
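tcp_get_seg_mp(), described above, maps a sequence number to a buffer in the send queue plus an offset into it. The sketch below uses a hypothetical buf_t list in place of an mblk_t chain, a hypothetical function name, and wraparound-safe comparisons in the style of BSD's SEQ_LT()/SEQ_GT() macros; as in the description, a sequence number equal to snxt yields the last valid buffer with the offset equal to its length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mblk_t chain: each node holds len bytes. */
typedef struct buf {
	struct buf *next;
	uint32_t    len;
} buf_t;

/* Wraparound-safe sequence comparisons (modulo 2^32). */
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

/*
 * head holds the byte numbered suna. Returns the buffer containing seq
 * and stores the offset of seq within it in *off; returns NULL if seq
 * is outside [suna, snxt] or the queue is too short.
 */
buf_t *
get_seg_buf(buf_t *head, uint32_t suna, uint32_t snxt, uint32_t seq,
    int32_t *off)
{
	buf_t *b = head;
	int64_t cnt;

	assert(off != NULL);		/* the caller must supply off */
	if (head == NULL || SEQ_LT(seq, suna) || SEQ_GT(seq, snxt))
		return (NULL);		/* defensive: never point at bad data */

	cnt = (uint32_t)(seq - suna);	/* bytes to skip from the head */
	while (cnt > 0 && b != NULL) {
		if (cnt <= b->len)
			break;		/* seq is inside b or at its end */
		cnt -= b->len;
		b = b->next;
	}
	if (b == NULL)
		return (NULL);
	*off = (int32_t)cnt;
	return (b);
}
```

Returning the earlier buffer with an offset equal to its length at a boundary, rather than the next buffer at offset 0, keeps the seq == snxt case uniform with every other boundary.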