pxtcp.c revision 1c08b0ec28ca5c600c21c0ab5a53cae73f1c821d
/* -*- indent-tabs-mode: nil; -*- */ /* NetBSD doesn't report POLLHUP for TCP sockets */ * Ring buffer for inbound data. Filled with data from the host * socket on poll manager thread. Data consumed by scheduling * tcp_write() to the pcb on the lwip thread. * NB: There is actually a third party present, the lwip stack itself. * Thus the buffer doesn't have dual free vs. data split, but rather * three-way free / sent and unACKed data / unsent data split. * Start of free space, producer writes here (up till "unacked"). * Start of sent but unacknowledged data. The data are "owned" by * the stack as it may need to retransmit. This is the free space * limit for producer. * "vacant"). Not declared volatile since it's only accessed from * the consumer thread. * references depend on this "inheritance". * Host (external) side of the proxied connection. * Socket events we are currently polling for. * Socket error. Currently used to save connect(2) errors so that * we can decide if we need to send ICMP error. * Interface that we have got the SYN from. Needed to send ICMP * with correct source address. * For tentatively accepted connections for which we are in * process of connecting to the real destination this is the * initial pbuf that we might need to build ICMP error. * When connection is established this is used to hold outbound * pbuf chain received by pxtcp_pcb_recv() but not yet completely * forwarded over the socket. We cannot "return" it to lwIP since * the head of the chain is already sent and freed. * Guest has closed its side. Reported to pxtcp_pcb_recv() only * once and we might not be able to forward it immediately if we * Outbound half-close has been done on the socket. * External has closed its side. We might not be able to forward * it immediately if we have unforwarded data. * Inbound half-close has been done on the pcb.
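The three-way split of the inbound ring buffer described in the comments above can be sketched roughly as follows. This is a minimal illustration, not the actual pxtcp structure: the struct name, the helper names and the convention that one slot always stays empty are assumptions made for the example; only the index names vacant, unacked and unsent come from the comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: names mirror the comments, not the real struct. */
struct ringbuf_sketch {
    size_t bufsize;  /* capacity; assumed: one slot is always kept empty */
    size_t vacant;   /* start of free space, producer writes here */
    size_t unacked;  /* start of sent but unacknowledged data */
    size_t unsent;   /* start of data not yet handed to the stack */
};

/* Bytes the producer may write: from vacant up to the slot before unacked. */
static size_t
sketch_free_space(const struct ringbuf_sketch *rb)
{
    return (rb->unacked + rb->bufsize - rb->vacant - 1) % rb->bufsize;
}

/* Bytes received from the socket but not yet passed to tcp_write(). */
static size_t
sketch_unsent_data(const struct ringbuf_sketch *rb)
{
    return (rb->vacant + rb->bufsize - rb->unsent) % rb->bufsize;
}

/* Bytes passed to the stack that the guest has not yet ACKed. */
static size_t
sketch_unacked_data(const struct ringbuf_sketch *rb)
{
    return (rb->unsent + rb->bufsize - rb->unacked) % rb->bufsize;
}
```

Under these assumptions the three regions always add up to bufsize - 1, which is why the producer is limited by unacked rather than by unsent: data the stack may still retransmit must not be overwritten.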
* On systems that report POLLHUP as soon as the final FIN is * received on a socket we cannot continue polling for the rest of * input, so we have to read (pull) last data from the socket on * manager thread. See comment in pxtcp_pmgr_pump() POLLHUP case. * When poll manager schedules delete we may not be able to delete * a pxtcp immediately if not all inbound data has been acked by * the guest: lwIP may need to resend and the data are in pxtcp's * inbuf::buf. We defer delete until all data are acked to * It's also implied by inbound_pull. It probably means that * "deferred" is not a very fortunate name. * Ring-buffer for inbound data. * lwIP thread's strong reference to us. * We use static messages to call functions on the lwIP thread to /* poll manager callbacks for pxtcp related channels */ /* poll manager callbacks for individual sockets */ /* get incoming traffic into ring buffer */ /* convenience functions for poll manager callbacks */ /* lwip thread callbacks called via proxy_lwip_post() */ /* poll manager handlers for pxtcp channels */ * Init PXTCP - must be run before either the lwIP tcpip thread or the poll * manager threads have been created. * Listen to outgoing connection from guest(s). * Syntactic sugar for sending pxtcp pointer over poll manager * channel. Used by lwip thread functions. * Syntactic sugar for sending weak reference to pxtcp over poll * manager channel. Used by lwip thread functions. * Counterpart of pxtcp_chan_send(). * Counterpart of pxtcp_chan_send_weak(). * Register pxtcp with poll manager. * Used for POLLMGR_CHAN_PXTCP_ADD and by port-forwarding. Since * error handling is different in these two cases, we leave it up to * Unregister pxtcp with poll manager. * Used for POLLMGR_CHAN_PXTCP_RESET and by port-forwarding (on error * POLLMGR_CHAN_PXTCP_ADD handler. * Get new pxtcp from lwip thread and start polling its socket. DPRINTF0((
"pxtcp_add: new pxtcp %p; pcb %p; sock %d\n",
* POLLMGR_CHAN_PXTCP_POLLOUT handler. * pxtcp_pcb_forward_outbound() on the lwIP thread tried to send data * and failed, it now requests us to poll the socket for POLLOUT and * schedule pxtcp_pcb_forward_outbound() when sock is writable again. * POLLMGR_CHAN_PXTCP_POLLIN handler. * POLLMGR_CHAN_PXTCP_DEL handler. * Schedule pxtcp deletion. We only need this if host system doesn't * report POLLHUP for fully closed tcp sockets. DPRINTF((
"PXTCP_DEL: pxtcp %p; pcb %p; sock %d\n",
#endif /* !HAVE_TCP_POLLHUP */ * POLLMGR_CHAN_PXTCP_RESET handler. * Close the socket with RST and delete pxtcp. DPRINTF0((
"PXTCP_RESET: pxtcp %p; pcb %p; sock %d\n",
* Exported to fwtcp to create pxtcp for incoming port-forwarded * connections. Completed with pcb in pxtcp_pcb_connect(). * Counterpart to pxtcp_create_forwarded() to destruct pxtcp that * fwtcp failed to register with poll manager to post to lwip thread DPRINTF((
"%s: pxtcp %p <-> pcb %p\n",
* We must have dissociated from a fully closed pcb immediately * since lwip recycles them and we don't want to mess with what * would be someone else's pcb that we happen to have a stale * Lwip thread callback invoked via pxtcp::msg_delete * Since we use static messages to communicate to the lwip thread, we * cannot delete pxtcp without making sure there are no unprocessed * messages in the lwip thread mailbox. * The easiest way to ensure that is to send this "delete" message as * the last one and when it's processed we know there are no more and * it's safe to delete pxtcp. * Poll manager handlers should use pxtcp_schedule_delete() DPRINTF((
"%s: pxtcp %p, pcb %p, sock %d%s\n",
?
" (was deferred)" :
"")));
* pxtcp is no longer registered with poll manager, so it's safe * We might have already dissociated from a fully closed pcb, or * guest might have sent us a reset while msg_delete was in * transit. If there's no pcb, we are done. * Have we completely forwarded all inbound traffic to the guest? * We may still be waiting for ACKs. We may have failed to send * some of the data (tcp_write() failed with ERR_MEM). We may * have failed to send the FIN (tcp_shutdown() failed with " unacked %d, unsent %d, vacant %d, %s - DEFER!\n",
* If we couldn't delete pxtcp right away in the msg_delete callback * from the poll manager thread, we repeat the check at the end of * relevant pcb callbacks. * Poll manager callbacks should use this convenience wrapper to * schedule pxtcp deletion on the lwip thread and to deregister from * If pollmgr_refptr_get() is called by any channel before * scheduled deletion happens, let them know we are gone. * Schedule deletion. Since poll manager thread may be pre-empted * right after we send the message, the deletion may actually * happen on the lwip thread before we return from this function, * so it's not safe to refer to pxtcp after this call. /* tell poll manager to deregister us */ * Lwip thread callback invoked via pxtcp::msg_reset * Like pxtcp_pcb_delete(), but sends RST to the guest before DPRINTF0((
"%s: pxtcp %p, pcb %p, sock %d\n",
* Poll manager callbacks should use this convenience wrapper to * schedule pxtcp reset and deletion on the lwip thread and to * deregister from the poll manager. * See pxtcp_schedule_delete() for additional comments. * Reject proxy connection attempt. Depending on the cause (sockerr) * we may just drop the pcb silently, generate an ICMP datagram or * Called from poll manager thread via pxtcp::msg_accept when proxy * failed to connect to the destination. Also called when we failed * to register pxtcp with poll manager. * This is like pxtcp_pcb_reset_pxtcp() but is more discriminate in * how this unestablished connection is terminated. DPRINTF0((
"%s: pxtcp %p, pcb %p, sock %d: errno %d\n",
* Convenience wrapper for poll manager connect callback to reject * Like pxtcp_schedule_reset(), but the callback is more discriminate * in how this unestablished connection is terminated. * Global tcp_proxy_accept() callback for proxied outgoing TCP * connections from guest(s). * TCP first calls accept callback when it receives the first SYN * and "tentatively accepts" new proxied connection attempt. When * proxy "confirms" the SYN and sends SYN|ACK and the guest * replies with ACK the accept callback is called again, this time * with the established connection. /* save initial datagram in case we need to reply with ICMP */ DPRINTF0((
"%s: pcb %p, sock %d: errno %d\n",
* tcp_proxy_accept() callback for accepted proxied outgoing TCP * connections from guest(s). This is "real" accept with three-way /* send any inbound data that are already queued */ * Initial poll manager callback for proxied outgoing TCP connections. * pxtcp_pcb_accept() sets pxtcp::pmhdl::callback to this. * Waits for connect(2) to the destination to complete. On success * replaces itself with pxtcp_pmgr_pump() callback common to all * established TCP connections. if (
status < 0) {
/* should not happen */ perror(
"connect: getsockopt");
/* confirm accept to the guest */ * Switch to common callback used for all established proxied * Initially we poll for incoming traffic only. Outgoing * traffic is fast-forwarded by pxtcp_pcb_recv(); if it fails * it will ask us to poll for POLLOUT too. /* should never get here */ DPRINTF0((
"%s: pxtcp %p, sock %d: unexpected revents 0x%x\n",
* Called from poll manager thread via pxtcp::msg_accept when proxy * connected to the destination. Finalize accept by sending SYN|ACK /* we are not going to reply with ICMP, so we can drop initial pbuf */ * If lwIP failed to enqueue SYN|ACK because it's out of pbufs it * abandons the pcb. Retrying that is not very easy, since it * would require keeping "fractional state". From guest's point * of view there is no reply to its SYN so it will either resend * the SYN (effectively triggering full connection retry for us), * or it will eventually time out. * else if (error != ERR_OK): even if tcp_output() failed with * ERR_MEM - don't give up, that SYN|ACK is enqueued and will be * retransmitted eventually. * Entry point for port-forwarding. * fwtcp accepts new incoming connection, creates pxtcp for the socket * (with no pcb yet) and adds it to the poll manager (polling for * errors only). Then it calls this function to construct the pcb and * perform connection to the guest. /* nit: compares PF and AF, but they are the same everywhere */ /* lwip port arguments are in host order */ * Port-forwarded connection to guest is successful, pump data. DPRINTF0((
"%s: new pxtcp %p; pcb %p; sock %d\n",
/* ACK on connection is like ACK on data in pxtcp_pcb_sent() */ * Have we done sending previous batch? * Return an error to tell TCP to hold onto that pbuf. * It will be presented to us later from tcp_fasttmr(). * Unlike data, p == NULL indicating orderly shutdown is * NOT presented to us again * Got data, send what we can without blocking. * Guest half-closed its TX side of the connection. * Called either immediately from pxtcp_pcb_recv() when it gets NULL, * or from pxtcp_pcb_forward_outbound() when it finishes forwarding * previously unsent data and sees pxtcp::outbound_close flag saved by DPRINTF((
"outbound_close: pxtcp %p; pcb %p %s\n",
* NB: set the flag first, since shutdown() will trigger POLLHUP * if inbound is already closed, and poll manager asserts * outbound_close_done (maybe it should not?). * On NetBSD POLLHUP is not reported for TCP sockets, so we need * to nudge poll manager manually. /* no more outbound data coming to us */ * If we have already done inbound close previously (active close * on the pcb), then we must not hold onto a pcb in TIME_WAIT * state since those will be recycled by lwip when it runs out of * The test is true also for a pcb in CLOSING state that waits * just for the ACK of its FIN (to transition to TIME_WAIT). * Forward outbound data from pcb to socket. * Called by pxtcp_pcb_recv() to forward new data and by callout * triggered by POLLOUT on the socket to send previously unsent data. * (Re)schedules one-time callout if not all data are sent. * TODO: This is where application-level proxy can hook into * to process outbound traffic. /* successfully sent this chain fragment completely */ /* successfully sent only some data */ /* find the first pbuf that was not completely forwarded */ * Some errors are really not errors - if we get them, * it's not different from getting nsent == 0, so filter if (q ==
NULL) {
/* everything is forwarded? */ /* free forwarded pbufs at the beginning of the chain */ /* advance payload pointer past the forwarded part */ * Connection reset will be detected by poll and * pxtcp_schedule_reset() will be called. * Otherwise something *really* unexpected must have happened, /* call error callback manually since we've already dissociated */ /* schedule one-shot POLLOUT on the socket */ #else /* RT_OS_WINDOWS */ #endif /* RT_OS_WINDOWS */ * Callback from poll manager (on POLLOUT) to send data from * pxtcp::unsent pbuf to socket. * Common poll manager callback used by both outgoing and incoming * (port-forwarded) connections that has connected socket. if (
status < 0) {
/* should not happen */ * If host does not report POLLHUP for closed sockets * (e.g. NetBSD) we should check for full close manually. * Linux and Darwin seem to report POLLHUP when both * directions are shut down. And they do report POLLHUP even * when there's unread data (which they also report as POLLIN * along with that POLLHUP). * FreeBSD (from source inspection) seems to follow Linux, * reporting POLLHUP when both directions are shut down, but * POLLHUP is always accompanied with POLLIN. * NetBSD never reports POLLHUP for sockets. * If external half-closes first, we don't get POLLHUP, we * recv 0 bytes from the socket as EOF indicator, stop polling * for POLLIN and poll with events == 0 (with occasional * one-shot POLLOUT). When guest eventually closes, we get * If guest half-closes first things are more tricky. As soon * as host sees the FIN from external it will spam POLLHUP, * even when there's unread data. The problem is that we * might have stopped polling for POLLIN because the ring * buffer is full or we were polling POLLIN but can't read all * of the data because buffer doesn't have enough space. * Either way, there's unread data but we can't keep polling /* there's no unread data, we are done */ * We cannot just set a flag here and let pxtcp_pcb_sent() * notice and start pulling, because if we are preempted * before setting the flag and all data in inbuf is ACKed * there will be no more calls to pxtcp_pcb_sent() to * We cannot set a flag and then send a message to make * sure it noticed, because if it has and it has read all * data while the message is in transit it will delete * In a sense this message is like msg_delete (except we * ask to pull some data first). #endif /* HAVE_TCP_POLLHUP */ * Read data from socket to ringbuf. This may be used both on lwip * and poll manager threads. * Flag pointed to by pstop is set when further reading is impossible, * either temporarily when buffer is full, or permanently when EOF is * Returns number of bytes read. NB: EOF is reported as 1! * Returns zero if nothing was read, either because buffer is full, or * if no data is available (EAGAIN, EINTR &c). * Returns -errno on real socket errors. /* lim is the index we can NOT write to */ lim =
sz -
1;
/* empty slot at the end */ lim =
sz;
/* empty slot at the beginning */ * Buffer is full, stop polling for POLLIN. * pxtcp_pcb_sent() will re-enable POLLIN when guest ACKs * data, freeing space in the ring buffer. /* free space in one chunk */ /* free space in two chunks */ * TODO: This is where application-level proxy can hook into to * process inbound traffic. DPRINTF2((
"pxtcp %p: sock %d read %d bytes\n",
DPRINTF2((
"pxtcp %p: sock %d read EOF\n",
/* haven't read anything, just return */ DPRINTF2((
"pxtcp %p: sock %d read cancelled\n",
DPRINTF0((
"pxtcp %p: sock %d read errno %d\n",
#else /* RT_OS_WINDOWS */ #endif /* RT_OS_WINDOWS */ * Callback from poll manager (pxtcp::msg_inbound) to trigger output * We switch it on when tcp_write() or tcp_shutdown() fail with * ERR_MEM to prevent connection from stalling. If there are ACKs or * more inbound data then pxtcp_pcb_forward_inbound() will be * triggered again, but if neither happens, tcp_poll() comes to the * If the last thing holding up deletion of the pxtcp was failed * tcp_shutdown() and it succeeded, we may be the last callback. * Forward inbound data from ring buffer to the guest. * Scheduled by poll manager thread after it receives more data into * the ring buffer (we have more data to send). * Also called from tcp_sent() callback when guest ACKs some data, * increasing pcb->snd_buf (we are permitted to send more data). * Also called from tcp_poll() callback if previous attempt to forward * inbound data failed with ERR_MEM (we need to try again). * If we have just confirmed accept of this connection, the * pcb is in SYN_RCVD state and we still haven't received the * ACK of our SYN. It's only in SYN_RCVD -> ESTABLISHED * transition that lwip decrements pcb->acked so that that ACK * is not reported to pxtcp_pcb_sent(). If we send something * now and immediately close (think "daytime", e.g.) while * still in SYN_RCVD state, we will move directly to * FIN_WAIT_1 and when our confirming SYN is ACK'ed lwip will * report it to pxtcp_pcb_sent(). DPRINTF2((
"forward_inbound: pxtcp %p; pcb %p %s - later...\n",
* Else, there's no data to send. * If there is free space in the buffer, producer will * reschedule us as it receives more data and vacant (lim) * If buffer is full when all data have been passed to * tcp_write() but not yet acknowledged, we will advance * unacked on ACK, freeing some space for producer to write to * Can't send anything now. As guest ACKs some data, TCP will * call pxtcp_pcb_sent() callback and we will come here again. * We have three limits to consider: * - how much data we have in the ringbuf * - how much data we are allowed to send if (
lim <
beg) {
/* lim wrapped */ if (
sndbuf <
toeob) {
/* but we are limited by sndbuf */ /* so beg is not going to wrap, treat sndbuf as lim */ lim =
beg +
sndbuf;
/* ... and proceed to the simple case */ else {
/* we are limited by the end of the buffer, beg will wrap */ /* we are done sending, but ... */ DPRINTF2((
"forward_inbound: pxtcp %p, pcb %p: sent %d bytes\n",
if (
nsent > 0) {
/* first write succeeded, second failed */ DPRINTF2((
"forward_inbound: pxtcp %p, pcb %p: sent %d bytes only\n",
DPRINTF((
"forward_inbound: pxtcp %p, pcb %p: ERR_MEM\n",
DPRINTF((
"forward_inbound: pxtcp %p, pcb %p: %s\n",
/* XXX: We shouldn't get ERR_ARG. Check ERR_CONN conditions early? */ DPRINTF((
"inbound_close: pxtcp %p; pcb %p: %s\n",
DPRINTF((
"inbound_close: pxtcp %p; pcb %p:" " tcp_shutdown: error=%s\n",
* If we have already done outbound close previously (passive * close on the pcb), then we must not hold onto a pcb in LAST_ACK * state since those will be deleted by lwip when that last ack * NB: We do NOT check for deferred delete here, even though we * have just set one of its conditions, inbound_close_done. We * let pcb callbacks that called us do that. It's simpler and * Check that all forwarded inbound data is sent and acked, and that * inbound close is scheduled (we aren't called back when it's acked). * tcp_sent() callback - guest acknowledged len bytes. * We can advance inbuf::unacked index, making more free space in the * ringbuf and wake up producer on poll manager thread. * We can also try to send more data if we have any since pcb->snd_buf * was increased and we are now permitted to send more. DPRINTF2((
"%s: pxtcp %p; pcb %p: +%d ACKed:" " unacked %d, unsent %d, vacant %d\n",
if (
/* __predict_false */ len == 0) {
/* we are notified to start pulling */ * Advance unacked index. Guest acknowledged the data, so it * won't be needed again for potential retransmits. /* arrange for more inbound data */ /* wake up producer, in case it has stopped polling for POLLIN */ * We haven't got enough room in the ring buffer to read atm, * but we don't want to lose notification from WSAW4ME when * space would be available, so we reset event with empty recv * Since we are pulling, pxtcp is no longer registered * with poll manager so we can kill it directly. /* forward more data if we can */ * NB: we might have dissociated from a pcb that transitioned * to LAST_ACK state, so don't refer to pcb below. /* have we got all the acks? */ DPRINTF((
"%s: pxtcp %p; pcb %p; all data ACKed\n",
/* no more retransmits, so buf is not needed */ /* no more acks, so no more callbacks */ * We may be the last callback for this pcb if we have also * successfully forwarded inbound_close. * Callback from poll manager (pxtcp::msg_inpull) to switch * pxtcp_pcb_sent() to actively pull the last bits of input. See * POLLHUP comment in pxtcp_pmgr_pump(). * pxtcp::sock is deregistered from poll manager after this callback * pcb is not passed to this callback since it may be already * deallocated by the stack, but we can't do anything useful with it * anyway since connection is gone. * ERR_CLSD is special - it is reported here when: * . guest has already half-closed * . we send FIN to guest when external half-closes * Since connection is closed but receive has been already closed * lwip can only report this via tcp_err. At this point the pcb * is still alive, so we can peek at it if need be. * The interesting twist is when the ACK from guest that acks our * FIN also acks some data. In this scenario lwip will NOT call * tcp_sent() callback with the ACK for that last bit of data but * instead will call tcp_err with ERR_CLSD right away. Since that * ACK also acknowledges all the data, we should run some of * pxtcp_pcb_sent() logic here. " unacked %d, unsent %d, vacant %d\n",
DPRINTF0((
"tcp_err: pxtcp=%p, error=%s\n",