socket.c revision d98c74e2ec5b96bd22aa4ed6d893e8993787493b
 * Copyright (C) 1998, 1999, 2000 Internet Software Consortium.
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 * THE SOFTWARE IS PROVIDED "AS IS" AND INTERNET SOFTWARE CONSORTIUM DISCLAIMS
 * ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL INTERNET SOFTWARE
 * CONSORTIUM BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
 * DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
 * PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
 * ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS

 * Some systems define the socket length argument as an int, some as size_t,
 * some as socklen_t. This is here so it can be easily changed if needed.

 * Define what the possible "soft" errors can be. These are non-fatal returns
 * of various network related functions, like recv() and so on.

 * For some reason, BSDI (and perhaps others) will sometimes return <0
 * from recv() but will have errno==0. This is broken, but we have to

 * DLVL(70) -- Socket "correctness" -- including returning of events, etc.

 * IPv6 control information. If the socket is an IPv6 socket we want
 * to collect the destination address and interface so the client can
 * set them on outgoing packets.

 * NetBSD and FreeBSD can timestamp packets. XXXMLG Should we have
 * a setsockopt() like interface to request timestamps, and if the OS
 * doesn't do it for us, call gettimeofday() on every UDP receive?

 * Check to see if we have even basic support for cracking messages from

 /* Locked by socket lock. */

 * Internal events. Posted when a descriptor is readable or
 * writable. These are statically allocated and never freed.
 * They will be set to non-purgeable before use.

 /* Locked by manager lock. */

 * send() and recv() iovec counts

 * Poke the select loop when there is something for us to do.
 * We assume that if a write completes here, it will be inserted into the
 * queue fully. That is, we will not get partial writes.
	"write() failed during watcher poke: %s",

 * Read a message on the internal fd.
	"read() failed during watcher poke: %s",
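The poke mechanism described in the comments above (write a small message to an internal fd, read it back in the select loop) can be sketched roughly as follows. This is a minimal illustration, not the code from this file: the names sketch_make_nonblock(), sketch_poke() and sketch_read_poke() are hypothetical, and the sketch assumes the internal fds are the two ends of a pipe() carrying int-sized messages.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode. */
static int
sketch_make_nonblock(int fd) {
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return (-1);
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return (-1);
	return (0);
}

/*
 * Hypothetical helper: wake the select loop by writing one int-sized
 * message to the write end of the pipe. A short write is treated as a
 * failure, matching the "no partial writes" assumption in the comment.
 */
static int
sketch_poke(int wfd, int msg) {
	ssize_t cc;

	do {
		cc = write(wfd, &msg, sizeof(msg));
	} while (cc < 0 && errno == EINTR);
	return (cc == (ssize_t)sizeof(msg) ? 0 : -1);
}

/*
 * Hypothetical helper: read one message from the read end.
 * Returns 1 if a message was read, 0 if none is pending on a
 * non-blocking fd, and -1 on any other failure.
 */
static int
sketch_read_poke(int rfd, int *msg) {
	ssize_t cc;

	do {
		cc = read(rfd, msg, sizeof(*msg));
	} while (cc < 0 && errno == EINTR);
	if (cc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return (0);
	return (cc == (ssize_t)sizeof(*msg) ? 1 : -1);
}
```

A caller would create the pair with pipe(), make the read end non-blocking, add it to the read set passed to select(), and call sketch_read_poke() whenever that descriptor is reported readable.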
"fcntl(%d, F_SETFL, %d): %s",
 * Process control messages received on a socket.

 * sock is used only when ISC_NET_BSD44MSGHDR and USE_CMSG are defined.
 * msg and dev are used only when ISC_NET_BSD44MSGHDR is defined.
 * They are all here, outside of the CPP tests, because it is
 * more consistent with the usual ISC coding style.
#else /* defined ISC_NET_BSD44MSGHDR */
	"interface received on ifindex %u",
#endif /* ISC_NET_BSD44MSGHDR */

 * Construct an iov array and attach it to the msghdr passed in. Return
 * 0 on success, non-zero on failure. This is the SEND constructor, which
 * will use the used region of the buffer (if using a buffer list) or
 * will use the internal region (if a single buffer I/O is requested).

 * Nothing can be NULL, and the done event must list at least one buffer
 * on the buffer linked list for this function to be meaningful.

 * If write_countp != NULL, *write_countp will hold the number of bytes
 * this transaction can send.

 * Single buffer I/O? Skip what we've done so far in this region.

 * Skip the data in the buffer list that we have already written.
	"sendto pktinfo data, ifindex %u",
#else /* ISC_NET_BSD44MSGHDR */
#endif /* ISC_NET_BSD44MSGHDR */

 * Construct an iov array and attach it to the msghdr passed in. Return
 * 0 on success, non-zero on failure. This is the RECV constructor, which
 * will use the available region of the buffer (if using a buffer list) or
 * will use the internal region (if a single buffer I/O is requested).

 * Nothing can be NULL, and the done event must list at least one buffer
 * on the buffer linked list for this function to be meaningful.

 * If read_countp != NULL, *read_countp will hold the number of bytes
 * this transaction can receive.

 /* If needed, steal one iovec for overflow detection. */

 * Single buffer I/O? Skip what we've done so far in this region.

 * If needed, set up to receive that one extra byte. Note that
 * we know there is at least one iov left, since we stole it
 * at the top of this function.
#else /* ISC_NET_BSD44MSGHDR */
#endif /* ISC_NET_BSD44MSGHDR */
	printf("\t\t%d\tbase %p, len %d\n", i,

#define DOIO_SOFT 1 /* i/o ok, soft error, no event sent */
#define DOIO_HARD 2 /* i/o error, event sent */
#define DOIO_EOF 3 /* EOF, no event sent */
	"doio_recv: recvmsg(%d) %d bytes, err %d/%s",
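The "steal one iovec for overflow detection" idea from the RECV constructor comments can be illustrated with a small sketch. It is an assumption-laden example rather than this file's implementation: sketch_recv_overflow() is a hypothetical name, the caller supplies a single buffer, and one extra byte is appended as a second iovec so that receiving more than the buffer holds becomes visible.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: receive into buf (capacity len) and detect
 * overflow with one extra byte in a second iovec. Returns the number
 * of bytes stored in buf, or -1 on error. *overflowed is set to 1 if
 * the message was longer than len.
 */
static ssize_t
sketch_recv_overflow(int fd, void *buf, size_t len, int *overflowed) {
	char extra;
	struct iovec iov[2];
	struct msghdr msg;
	ssize_t cc;

	iov[0].iov_base = buf;
	iov[0].iov_len = len;
	iov[1].iov_base = &extra;	/* the "stolen" overflow byte */
	iov[1].iov_len = 1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;
	msg.msg_iovlen = 2;

	*overflowed = 0;
	cc = recvmsg(fd, &msg, 0);
	if (cc < 0)
		return (-1);

	/* More bytes than the caller's buffer holds: flag it, adjust by one. */
	if ((size_t)cc > len) {
		*overflowed = 1;
		cc--;
	}
	return (cc);
}
```

On a datagram socket, a 10-byte message received into a 4-byte buffer would report 4 bytes with the overflow flag set, while a 3-byte message would report 3 bytes with the flag clear.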
 * On TCP, zero length reads indicate EOF, while on
 * UDP, zero length reads are perfectly valid, although

 * Overflow bit detection. If we received MORE bytes than we should,
 * this indicates an overflow situation. Set the flag in the
 * dev entry and adjust how much we read by one.

 * If there are control messages attached, run through them and pull
 * out the interesting bits.

 * update the buffers (if any) and the i/o count

 * If we read less than we expected, update counters,
 * and let the upper layer poke the descriptor.

 * full reads are posted, or partials if partials are ok.

 /* XXXMLG Should verify that we didn't overflow MAXSCATTERGATHER? */

 * check for error or block condition

 * The other error types depend on whether or not the
 * socket is UDP or TCP. If it is UDP, some errors
 * that we expect to be fatal under TCP are merely
 * annoying, and are really soft errors.
 * However, these soft errors are still returned as
	"internal_send: send() returned 0");

 * if we write less than we expected, update counters,

 * Exactly what we wanted to write. We're done with this
 * entry. Post its completion event.

 * Caller must ensure that the socket is not locked and no external

 * No one has this socket open, so the watcher doesn't have to be
 * poked, and the socket doesn't have to be locked.

 * XXX should reset manager->maxfd here
#if USE_CMSG
 /* Let's hope the OSs are sane, and pad correctly XXXMLG */

 * set up list of readers and writers to be initially empty
	"isc_mutex_init() failed");

 * Initialize readable and writable events
 err2: /* cmsg allocated */
 err1:
 /* socket allocated */

 * This event requires that the various lists be empty, that the reference
 * count be 1, and that the magic number is valid. The other socket bits,
 * like the lock, must be initialized as well. The fd associated must be
 * marked as closed, by setting it to -1 on close, or this routine will

 * Create a new 'type' socket managed by 'manager'. The socket's
 * parameters are specified by 'expires' and 'interval'. Events
 * will be posted to 'task' and when dispatched 'action' will be
 * called with 'arg' as the arg value. The new socket is returned
	(void *)&on, sizeof on) < 0) {
	"setsockopt(%d) failed", sock->fd);
#endif /* SO_TIMESTAMP */
	(void *)&on, sizeof (on)) < 0)) {
	"setsockopt(%d) failed: %s",
#endif /* ISC_PLATFORM_HAVEIPV6 */

 * Note we don't have to lock the socket like we normally would because
 * there are no external references to it yet.

 * Attach to a socket. Caller must explicitly detach when it is done.

 * Dereference a socket. If this is the last reference to it, clean things
 * up by destroying the socket.

 * I/O is possible on a given socket. Schedule an event to this task that
 * will call an internal function to do the I/O. This will charge the
 * task with the I/O operation and let our select loop handler get back
 * to doing something real as fast as possible.

 * The socket and manager must be locked before calling this function.

 * Dispatch an internal accept event.

 * Are there any done events left, or were they all canceled
 * before the manager got the socket lock?

 * Dequeue an item off the given socket's read queue, set the result code
 * in the done event to the one provided, and send it to the task it was

 * If the event to be sent is on a list, remove it before sending. If
 * asked to, send and detach from the socket as well.

 * Caller must have the socket locked.

 * See comments for send_recvdone_event() above.

 * Caller must have the socket locked.

 * Call accept() on a socket, to get the new file descriptor. The listen
 * socket is used as a prototype to create a new isc_socket_t. The new
 * socket has one outstanding reference. The task receiving the event
 * will be detached from just after the event is delivered.

 * On entry to this function, the event delivered is the internal
 * readable event, and the first item on the accept_list should be
 * the done event we want to send. If the list is empty, this is a no-op,
 * so just unlock and return.

 * Get the first item off the accept list.
 * If it is empty, unlock the socket and return.

 * Try to accept the new connection. If the accept fails with
 * EAGAIN or EINTR, simply poke the watcher to watch this socket

 * If some other error, ignore it as well and hope
 * for the best, but log it.
"internal_accept: accept() failed: %s",
 * Pull off the done event.

 * Poke watcher if there are more pending accepts.
	"internal_accept: make_nonblock() failed: %s",

 * -1 means the new socket didn't happen.

 * Save away the remote address
	"accepted connection, new socket %p",

 * Fill in the done event details and send it off.
	"internal_recv: task %p got event %p", me, ev, sock);
 * Try to do as much I/O as possible on this socket. There are no
 * limits here, currently. If some sort of quantum read count is
 * desired before giving up control, make certain to process markers

 * If this is a marker event, post its completion and

 * read of 0 means the remote end was closed.
 * Run through the event queue and dispatch all
 * the events with an EOF result code. This will
 * set the EOF flag in markers as well, but

 * Find out what socket this is and lock it.
	"internal_send: task %p got event %p", me, ev, sock);
 * Try to do as much I/O as possible on this socket. There are no
 * limits here, currently. If some sort of quantum write count is
 * desired before giving up control, make certain to process markers

 * If this is a marker event, post its completion and

 * This is the thread that will loop forever, always in a select or poll

 * When select returns something to do, track down what thread gets to do
 * this I/O and post the event to it.

 * Get the control fd here. This will never change.
	"select(%d, ...) == %d, errno %d/%s",

 * Process reads on internal, control fd.
	"watcher got message %d", msg);

 * handle shutdown message. We really should
 * jump out of this loop right away, but
 * it doesn't matter if we have to do a little

 * This is a wakeup on a socket. Look
 * at the event queue for both read and write,
 * and decide if we need to watch on it now

 * If there are no events, or there
 * is an event but we have already
 * queued up the internal event on a
 * task's queue, clear the bit.

 * Process read/writes on other fds here. Avoid locking
 * and unlocking twice if both reads and writes are possible.
	for (i = 0; i < maxfd; i++) {
 * Create a new socket manager.
	"isc_mutex_init() failed");
	"isc_condition_init() failed");

 * Create the special fds that will be used to wake up the
 * select/poll loop when something internal needs to be done.

 * Set up initial state for the select loop
	"isc_thread_create() failed");

 * Destroy a socket manager.

 * Wait for all sockets to be destroyed.

 * Here, poke our select/poll thread. Do this by closing the write
 * half of the pipe, which will send EOF to the read half.

 * Wait for thread to exit.
	"isc_thread_join() failed");
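The shutdown technique mentioned above (closing the write half of the pipe so that the read half sees EOF) can be shown with a short sketch. The function name sketch_saw_eof() is hypothetical, and the sketch assumes a plain pipe() pair; it waits in select() and then distinguishes EOF from ordinary data by the return value of read().

```c
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: block in select() until rfd is readable, then
 * read one byte. Returns 1 if the read end reports EOF (the write end
 * was closed), 0 if a data byte was read, and -1 on error.
 */
static int
sketch_saw_eof(int rfd) {
	fd_set readfds;
	char c;
	ssize_t cc;

	FD_ZERO(&readfds);
	FD_SET(rfd, &readfds);

	/* A closed write end makes the read end readable, just like data. */
	if (select(rfd + 1, &readfds, NULL, NULL, NULL) < 0)
		return (-1);

	cc = read(rfd, &c, 1);
	if (cc < 0)
		return (-1);
	return (cc == 0 ? 1 : 0);
}
```

A watcher thread built this way would treat a return of 1 as the signal to leave its loop, after which the destroying thread can join it.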
 *** From here down, only ISC_R_SUCCESS can be returned. Any further
 *** error information will result in the done event being posted
 *** to the task rather than this function failing.

 * UDP sockets are always partial read

 * Move each buffer from the passed in list to our internal one.

 * If the read queue is empty, try to do the I/O right now.

 * We couldn't read all or part of the request right now, so queue

 * Attach to socket and to task

 * Enqueue the request. If the socket was previously not being
 * watched, poke the watcher to start paying attention to it.
	"isc_socket_recvv: event %p -> task %p", dev, ntask);

 * UDP sockets are always partial read

 * If the read queue is empty, try to do the I/O right now.

 * We couldn't read all or part of the request right now, so queue

 * Attach to socket and to task

 * Enqueue the request. If the socket was previously not being
 * watched, poke the watcher to start paying attention to it.
	"isc_socket_recv: event %p -> task %p", dev, ntask);
 * REQUIRE() checking performed in isc_socket_sendto()
	"pktinfo structure provided, ifindex %u (set to 0)",

 * Set the pktinfo index to 0 here, to let the kernel decide
 * what interface it should send on.

 * If the write queue is empty, try to do the I/O right now.

 * We couldn't send all or part of the request right now, so queue

 * Enqueue the request. If the socket was previously not being
 * watched, poke the watcher to start paying attention to it.
	"isc_socket_sendto: event %p -> task %p", dev, ntask);

 *** From here down, only ISC_R_SUCCESS can be returned. Any further
 *** error information will result in the done event being posted
 *** to the task rather than this function failing.

 * Move each buffer from the passed in list to our internal one.

 * If the write queue is empty, try to do the I/O right now.

 * We couldn't send all or part of the request right now, so queue

 * Enqueue the request. If the socket was previously not being
 * watched, poke the watcher to start paying attention to it.
	"isc_socket_sendtov: event %p -> task %p", dev, ntask);
 * Set up to listen on a given socket. We do this by creating an internal
 * event that will be dispatched when the socket has read activity. The
 * watcher will send the internal event to the task when there is a new

 * Unlike in read, we don't preallocate a done event here. Every time there
 * is a new connection we'll have to allocate a new one anyway, so we might
 * as well keep things simple rather than having to track them.

 * This should try to do aggressive accept() XXXMLG

 * Sender field is overloaded here with the task we will be sending
 * this event to. Just before the actual event is delivered the
 * actual ev_sender will be touched up to be the socket.

 * Attach to socket and to task

 * poke watcher here. We still have the socket locked, so there
 * is no race condition. We will keep the lock for such a short
 * bit of time that waking it up now or later won't matter all that much.

 * Try to do the connect right away, as there can be only one
 * outstanding, and it might happen to complete.

 * If connect completed, fire off the done event

 * poke watcher here. We still have the socket locked, so there
 * is no race condition. We will keep the lock for such a short
 * bit of time that waking it up now or later won't matter all that much.

 * Called when a socket with a pending connect() finishes.

 * When the internal event was sent the reference count was bumped
 * to keep the socket around for us. Decrement the count here.

 * Has this event been canceled?

 * Get any possible error status here.

 * If the error is EAGAIN, just re-select on this
 * fd and pretend nothing strange happened.

 * Translate other errors into ISC_R_* flavors.
	"internal_connect: connect() %s",
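The connect path described above (try the connect right away, then collect any error status once the pending connect() finishes) is commonly done with a non-blocking connect() followed by getsockopt(SO_ERROR). The sketch below shows that general pattern under stated assumptions; it is not taken from this file, and sketch_start_connect() and sketch_finish_connect() are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>		/* struct sockaddr_in, for callers */
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make fd non-blocking and start a connect.
 * Returns 0 if the connect completed or is in progress, -1 otherwise.
 */
static int
sketch_start_connect(int fd, const struct sockaddr *sa, socklen_t len) {
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return (-1);
	if (connect(fd, sa, len) == 0 || errno == EINPROGRESS)
		return (0);
	return (-1);
}

/*
 * Hypothetical helper: wait until fd is writable, then fetch the
 * pending error status. Returns 0 if the connect succeeded, or the
 * errno-style error code otherwise.
 */
static int
sketch_finish_connect(int fd) {
	fd_set writefds;
	int err = 0;
	socklen_t optlen = sizeof(err);

	FD_ZERO(&writefds);
	FD_SET(fd, &writefds);
	if (select(fd + 1, NULL, &writefds, NULL, NULL) < 0)
		return (errno);

	/* SO_ERROR reports, and clears, the result of the pending connect. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &optlen) < 0)
		return (errno);
	return (err);
}
```

A caller that gets a non-zero code back would map it to its own result type, much as the comment above describes translating errors into ISC_R_* values.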
 * Run through the list of events on this socket, and cancel the ones
 * queued for task "task" of type "how". "how" is a bitmask.

 * Quick exit if there is nothing to do. Don't even bother locking

 * All of these do the same thing, more or less.
 * o If the internal event is marked as "posted" try to
 * remove it from the task's queue. If this fails, mark it
 * as canceled instead, and let the task clean it up later.
 * o For each I/O request for that task of that type, post
 * its done event with status of "ISC_R_CANCELED".
 * o Reset any state needed.

 * Connecting is not a list.

 * Need to guess if we need to poke or not... XXX

 * If the queue is empty, simply return the last error we got on
 * this socket as the result code, and send off the done event.

 * Bad luck. The queue wasn't empty. Insert this in the proper
	"isc_socket_recvmark: event %p -> task %p", dev, ntask);
 * If the queue is empty, simply return the last error we got on
 * this socket as the result code, and send off the done event.

 * Bad luck. The queue wasn't empty. Insert this in the proper
	"isc_socket_sendmark: event %p -> task %p", dev, ntask);