/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 *
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright (c) 1993, 2012, Oracle and/or its affiliates. All rights reserved.
 */

/*
 * switch for kernel async I/O
 */
int _kaio_ok = 0;		/* 0 = disabled, 1 = on, -1 = error */

/*
 * Key for thread-specific data
 */

/*
 * Array for determining whether or not a file supports kaio.
 * Initialized in _kaio_init().
 */

/*
 * (__aio_mutex lock protects circular linked list of workers)
 */

/*
 * worker for notification requests.
 */

int hz;		/* clock ticks per second */

/*
 * The aio subsystem is initialized when an AIO request is made.
 * Constants are initialized like the max number of workers that
 * the subsystem can create, and the minimum number of workers
 * permitted before imposing some restrictions.  Also, some
 * workers are created.
 */

	/*
	 * Allocate and initialize the hash table.
	 * Do this only once, even if __uaio_init() is called twice.
	 */

	/*
	 * Initialize worker's signal mask to only catch SIGAIOCANCEL.
	 */

	/*
	 * Create one worker to send asynchronous notifications.
	 * Do this only once, even if __uaio_init() is called twice.
	 */

	/*
	 * And later check whether at least one worker is created;
	 * lwp_create() calls could fail because of segkp exhaustion.
	 */

/*
 * Called from close() before actually performing the real _close().
 */
	if (fd < 0)	/* avoid cancelling everything */

	/*
	 * Cancel all outstanding aio requests for this file descriptor.
	 */

	/*
	 * If we have allocated the bit array, clear the bit for this file.
	 * The next open may re-use this file descriptor and the new file
	 * may have different kaio() behaviour.
	 */

/*
 * special kaio cleanup thread sits in a loop in the
 * kernel waiting for pending kaio requests to complete.
 */

#endif /* !defined(_LP64) */

	/* initialize kaio */

	/*
	 * _aio_do_request() needs the original request code (mode) to be able
	 * to choose the appropriate 32/64 bit function.  All other functions
	 * only require the difference between READ and WRITE (umode).
	 */

	/*
	 * Try kernel aio first.
	 */

	/*
	 * _aio_do_request() checks reqp->req_op to differentiate
	 * between 32 and 64 bit access.
	 */

	/*
	 * _aio_req_add() only needs the difference between READ and
	 * WRITE to choose the right worker queue.
	 */

/*
 * This must be async safe and cancel safe
 */

	/*
	 * Check for a valid specified wait time.
	 * If it is invalid, fail the call right away.
	 */

	/* aiowait() awakened by an aionotify() */

	/* time is up; return */

	/*
	 * Some time left.  Round up the remaining time
	 * in nanoseconds to microseconds.  Retry the call.
	 */

/*
 * _aio_get_timedelta calculates the remaining time and stores the result
 * into timespec_t *wait.
 */

/*
 * If closing by file descriptor: we will simply cancel all the outstanding
 * aio's and return.  Those aio's in question will have either noticed the
 * cancellation notice before, during, or after initiating io.
 */

	/*
	 * finally, check if there are requests on the done queue that
	 * should be canceled.
	 */

	/* this should be the last req in list */

/*
 * Cancel requests from a given work queue.  If the file descriptor
 * parameter, fd, is non-negative, then only cancel those requests
 * in this queue that are to this file descriptor.  If the fd
 * parameter is -1, then cancel all requests.
 */

	/*
	 * cancel queued requests first.
	 */

	/*
	 * Caller's locks were dropped.
	 * reqp is invalid; start traversing
	 * the list from the beginning again.
	 */

	/*
	 * Since the queued requests have been canceled, there can
	 * only be one in-progress request that should be canceled.
	 */

/*
 * Cancel a request.  Return 1 if the caller's locks were temporarily
 * dropped, otherwise return 0.
 */
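The aiowait() and _aio_get_timedelta comments above describe computing the time left before a deadline and rounding leftover nanoseconds up to whole microseconds before retrying. A minimal sketch of that arithmetic follows, assuming both the deadline and the current time are available as `struct timespec`; `get_timedelta()` and `timespec_to_timeval_roundup()` are hypothetical names, not this library's functions, and the real code's clock source and error handling are not shown here.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper in the spirit of _aio_get_timedelta():
 * store deadline - now into *wait.  Returns 0 if some time is
 * left, -1 if the deadline has already passed ("time is up").
 */
static int
get_timedelta(const struct timespec *deadline, const struct timespec *now,
    struct timespec *wait)
{
	long sec = (long)(deadline->tv_sec - now->tv_sec);
	long nsec = deadline->tv_nsec - now->tv_nsec;

	if (nsec < 0) {			/* borrow one second */
		sec--;
		nsec += 1000000000L;
	}
	if (sec < 0 || (sec == 0 && nsec == 0))
		return (-1);		/* time is up */
	wait->tv_sec = sec;
	wait->tv_nsec = nsec;
	return (0);
}

/*
 * Convert the remaining time to a timeval, rounding the nanoseconds
 * UP to whole microseconds so a retried wait never ends early.
 */
static void
timespec_to_timeval_roundup(const struct timespec *ts, struct timeval *tv)
{
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
	tv->tv_sec = ts->tv_sec;
	tv->tv_usec = (ts->tv_nsec + 999) / 1000;
	if (tv->tv_usec >= 1000000) {	/* carry into seconds */
		tv->tv_sec++;
		tv->tv_usec -= 1000000;
	}
}
```

Rounding up rather than truncating means a retried wait may overshoot the deadline by less than a microsecond, but it can never return before the deadline and spin on a zero timeout.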
	/*
	 * If not on the done queue yet, just mark it CANCELED,
	 * _aio_work_done() will do the necessary clean up.
	 * This is required to ensure that aiocancel_all() cancels
	 * all the outstanding requests, including this one which
	 * is not yet on done queue but has been marked done.
	 */

	/* Cancel the queued aio_fsync() request */

	/*
	 * Set the result values now, before _aiodone() is called.
	 * We do this because the application can expect aio_return
	 * and aio_errno to be set to -1 and ECANCELED, respectively,
	 * immediately after a successful return from aiocancel().
	 */

	/*
	 * Put the new worker thread in the right queue.
	 */

/*
 * This is the worker's main routine.
 * The task of this function is to execute all queued requests;
 * once the last pending request is executed this function will block
 * in _aio_idle().  A new incoming request must wake up this thread to
 *
 * Every worker has its own work queue.  The queue lock is required
 * to synchronize the addition of new requests for this worker or
 *
 * Cancellation scenarios:
 * The cancellation of a request is being done asynchronously using
 * _aio_cancel_req() from another thread context.
 * A queued request can be cancelled in different ways:
 * a) request is queued but not "in progress" or "done" (AIO_REQ_QUEUED):
 *	- lock the queue -> remove the request -> unlock the queue
 * b) request is in progress (AIO_REQ_INPROGRESS):
 *	- this function first allows the cancellation of the running
 *	  request with the flag "work_cancel_flg=1"
 *		see _aio_req_get() -> _aio_cancel_on()
 *	  During this phase, it is allowed to interrupt the worker
 *	  thread running the request (this thread) using the SIGAIOCANCEL
 *	  signal.
 *	  Once this thread returns from the kernel (because the request
 *	  is just done), then it must disable a possible cancellation
 *	  and proceed to finish the request.  To disable the cancellation
 *	  this thread must use _aio_cancel_off() to set "work_cancel_flg=0".
 * c) request is already done (AIO_REQ_DONE || AIO_REQ_DONEQ):
 *	  same procedure as in a)
 *
 * This thread uses sigsetjmp() to define the position in the code where
 * it wishes to continue working in the case that a SIGAIOCANCEL signal
 * arrives.
 * Normally this thread should get the cancellation signal during the
 * kernel phase (reading or writing).  In that case the signal handler
 * aiosigcancelhndlr() is activated using the worker thread context,
 * which in turn uses the siglongjmp() function to break the standard
 * code flow and jump to the "sigsetjmp" position, provided that
 * "work_cancel_flg" is set to "1".
 * Because the "work_cancel_flg" is only manipulated by this worker
 * thread and it can only run on one CPU at a given time, it is not
 * necessary to protect that flag with the queue lock.
 * Returning from the kernel (read or write system call) we must
 * first disable the use of the SIGAIOCANCEL signal and accordingly
 * the use of the siglongjmp() function to prevent a possible deadlock:
 * - It can happen that this worker thread returns from the kernel and
 *   blocks in "work_qlock1",
 * - then a second thread cancels the apparently "in progress" request
 *   and sends the SIGAIOCANCEL signal to the worker thread,
 * - the worker thread gets assigned the "work_qlock1" and will return,
 * - the kernel detects the pending signal and activates the signal
 *   handler,
 * - if the "work_cancel_flg" is still set then the signal handler
 *   should use siglongjmp() to cancel the "in progress" request and
 *   it would try to acquire the same work_qlock1 in _aio_req_get()
 *   for a second time => deadlock.
 * To avoid that situation we disable the cancellation of the request
 * in progress BEFORE we try to acquire the work_qlock1.
 * In that case the signal handler will not call siglongjmp() and the
 * worker thread will continue running the standard code flow.
 * Then this thread must check the AIO_REQ_CANCELED flag to emulate
 * a possibly required siglongjmp(), freeing the work_qlock1 and
 * avoiding a deadlock.
 */

	/*
	 * We resume here when an operation is cancelled.
	 * On first entry, aiowp->work_req == NULL, so all
	 * we do is block SIGAIOCANCEL.
	 */

	/*
	 * Put completed requests on aio_done_list.  This has
	 * to be done as part of the main loop to ensure that
	 * we don't artificially starve any aiowait'ers.
	 */

	/* consume any deferred SIGAIOCANCEL signal here */

	/*
	 * The SUSv3 POSIX spec for aio_write() states:
	 *	If O_APPEND is set for the file descriptor,
	 *	write operations append to the file in the
	 *	same order as the calls were made.
	 * but, somewhat inconsistently, it requires pwrite()
	 * to ignore the O_APPEND setting.  So we have to use
	 * fcntl() to get the open modes and call write() for
	 * the O_APPEND case.
	 */

	/*
	 * The SUSv3 POSIX spec for aio_write() states:
	 *	If O_APPEND is set for the file descriptor,
	 *	write operations append to the file in the
	 *	same order as the calls were made.
	 * but, somewhat inconsistently, it requires pwrite()
	 * to ignore the O_APPEND setting.  So we have to use
	 * fcntl() to get the open modes and call write() for
	 * the O_APPEND case.
	 */

#endif /* !defined(_LP64) */

	/*
	 * All writes for this fsync request are now
	 * acknowledged.  Now make these writes visible
	 * and put the final request into the hash table.
	 */
	    "request already in hash table");
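The O_APPEND comment above explains why the write path consults the open modes with fcntl() and uses write() instead of pwrite() for appending descriptors. A minimal sketch of that dispatch follows; `aio_style_write()` is a hypothetical name, and the sketch covers only the mode check, not the library's request bookkeeping, cancellation, or 32/64-bit variants.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: honor O_APPEND the way the comment above
 * describes.  If the descriptor was opened with O_APPEND, use
 * write() so data lands at end of file in call order (the offset
 * argument is ignored); otherwise use pwrite() at the given offset.
 */
static ssize_t
aio_style_write(int fd, const void *buf, size_t len, off_t offset)
{
	int flags;

	assert(buf != NULL || len == 0);
	flags = fcntl(fd, F_GETFL);	/* fetch the open modes */
	if (flags == -1)
		return (-1);		/* errno set by fcntl() */
	if (flags & O_APPEND)
		return (write(fd, buf, len));
	return (pwrite(fd, buf, len, offset));
}
```

Note that write() on an appending descriptor still moves that descriptor's file offset, which is acceptable here because appends do not depend on it.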
/*
 * Perform the tail processing for _aio_do_request().
 * The in-progress request may or may not have been cancelled.
 */

	/*
	 * If it was canceled, this request will not be
	 * added to done list.  Just free it.
	 */

	/*
	 * Notify any thread that may have blocked
	 * because it saw an outstanding request.
	 */

/*
 * Sleep for 'ticks' clock ticks to give somebody else a chance to run,
 * hopefully to consume one of our queued signals.
 */

/*
 * Actually send the notifications.
 * We could block indefinitely here if the application
 * is not listening for the signal or port notifications.
 */

/*
 * Asynchronous notification worker.
 */

	/*
	 * This isn't really necessary.  All signals are blocked.
	 */

	/*
	 * Notifications are never cancelled.
	 * All signals remain blocked, forever.
	 */

/*
 * Do the completion semantics for a request that was either canceled
 * by _aio_cancel_req() or was completed by _aio_do_request().
 */

	/*
	 * We call _aiodone() only for Posix I/O.
	 */

	/*
	 * Figure out the notification parameters while holding __aio_mutex.
	 * Actually perform the notifications after dropping __aio_mutex.
	 * This allows us to sleep for a long time (if the notifications
	 * incur delays) without impeding other async I/O operations.
	 */

	/*
	 * __aio_waitn() sets AIO_WAIT_INPROGRESS and
	 * __aio_suspend() increments "_aio_kernel_suspend"
	 * when they are waiting in the kernel for completed I/Os.
	 *
	 * _kaio(AIONOTIFY) awakens the corresponding function
	 * in the kernel; then the corresponding __aio_waitn() or
	 * __aio_suspend() function could reap the recently
	 * completed I/Os (_aiodone()).
	 */

	/*
	 * If all the lio requests have completed,
	 * prepare to notify the waiting thread.
	 */
	} else {		/* thread or port */

/*
 * The request is completed; now perform the notifications.
 */

	/*
	 * We usually put the request on the notification
	 * queue because we don't want to block and delay
	 * other operations behind us in the work queue.
	 * Also we must never block on a cancel notification
	 * because we are being called from an application
	 * thread in this case and that could lead to deadlock
	 * if no other thread is receiving notifications.
	 */

	/*
	 * We already put the request on the done queue,
	 * so we can't queue it to the notification queue.
	 * Just do the notification directly.
	 */

/*
 * Delete fsync requests from list head until there is
 * only one left.  Return 0 when there is only one,
 * otherwise return a non-zero value.
 */

/*
 * A worker is set idle when its work queue is empty.
 * The worker checks again that it has no more work
 * and then goes to sleep waiting for more work.
 */

	/*
	 * A cancellation handler is not needed here.
	 * aio worker threads are never cancelled via pthread_cancel().
	 */

	/*
	 * The idle flag is normally cleared before the worker is awakened
	 * by aio_req_add().  On error (EINTR), we clear it ourselves.
	 */

/*
 * A worker's completed AIO requests are placed onto a global
 * done queue.  The application is only sent a SIGIO signal if
 * the process has a handler enabled and it is not waiting via
 */

	/*
	 * Request got cancelled after it was marked done.  This can
	 * happen because _aio_finish_request() marks it AIO_REQ_DONE
	 * and drops all locks.  Don't add the request to the done
	 * queue and just discard it.
	 */

/*
 * The done queue consists of AIO requests that are in either the
 * AIO_REQ_DONE or AIO_REQ_CANCELED state.  Requests that were cancelled
 * are discarded.  If the done queue is empty then NULL is returned.
 * Otherwise the address of a done aio_result_t is returned.
 */

	/* is queue empty? */

/*
 * Set the return and errno values for the application's use.
 *
 * For the Posix interfaces, we must set the return value first followed
 * by the errno value because the Posix interfaces allow for a change
 * in the errno value from EINPROGRESS to something else to signal
 * the completion of the asynchronous request.
 *
 * The opposite is true for the Solaris interfaces.  These allow for
 * a change in the return value from AIO_INPROGRESS to something else
 * to signal the completion of the asynchronous request.
 */

/*
 * Add an AIO request onto the next work queue.
 * A circular list of workers is used to choose the next worker.
 */

	/*
	 * Try to acquire the next worker's work queue.  If it is locked,
	 * then search the list of workers until a queue is found unlocked,
	 * or until the list is completely traversed at which point another
	 * worker will be created.
	 */

	/* try to find an idle worker */

	/* try to acquire some worker's queue lock */

	/*
	 * Create more workers when the workers appear overloaded.
	 * Either all the workers are busy draining their queues
	 * or no worker's queue lock could be acquired.
	 */

	/*
	 * No worker available and we have created
	 * _max_workers, keep going through the
	 * list slowly until we get a lock
	 */

	/*
	 * give someone else a chance
	 */

	/*
	 * Put request onto worker's work queue.
	 */

	/*
	 * Awaken worker if it is not currently active.
	 */

/*
 * Get an AIO request for a specified worker.
 * If the work queue is empty, return NULL.
 */

	/*
	 * Remove a POSIX request from the queue; the
	 * request queue is a singly linked list
	 * with a previous pointer.  The request is
	 * removed by updating the previous pointer.
	 *
	 * Non-posix requests are left on the queue
	 * to eventually be placed on the done queue.
	 */

	/*
	 * if this is the first request on the queue, move
	 * the lastrp pointer forward.
	 */

	/*
	 * if this request is pointed to by work_head1, then
	 * make work_head1 point to the last request that is
	 * present on the queue.
	 */
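The work-queue comments above describe choosing a worker from a circular list: first look for an idle worker, then take whichever worker's queue lock can be acquired without blocking, and only after a full traversal fails create another worker. A minimal sketch of that search follows, assuming POSIX threads; `struct worker`, its fields, and `pick_worker()` are hypothetical stand-ins (the library's own lock is `work_qlock1`, whose type is not shown here), and worker creation and the `_max_workers` limit are left to the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker record; only the fields this sketch needs. */
struct worker {
	struct worker	*next;		/* circular list link */
	pthread_mutex_t	qlock;		/* protects this worker's queue */
	int		idle;		/* nonzero when the queue is empty */
};

/*
 * Walk the circular list once starting at *nextp.  On success the
 * chosen worker's qlock is held, *nextp advances past it, and the
 * worker is returned.  NULL means no queue lock could be taken, so
 * the caller may create another worker or retry.
 */
static struct worker *
pick_worker(struct worker **nextp)
{
	struct worker *start = *nextp, *w;

	assert(start != NULL);
	/* try to find an idle worker ("idle" is read as a hint) */
	w = start;
	do {
		if (w->idle && pthread_mutex_trylock(&w->qlock) == 0) {
			*nextp = w->next;
			return (w);
		}
		w = w->next;
	} while (w != start);

	/* try to acquire some worker's queue lock */
	w = start;
	do {
		if (pthread_mutex_trylock(&w->qlock) == 0) {
			*nextp = w->next;
			return (w);
		}
		w = w->next;
	} while (w != start);
	return (NULL);
}
```

Using a try-lock keeps the submitting thread from stalling behind a worker that is busy draining its queue; a busy queue simply sends the search on to the next worker.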
	/*
	 * work_prev1 is used only in the non-posix case and it
	 * points to the current AIO_REQ_INPROGRESS request.
	 * If work_prev1 points to this request which is being
	 * deleted, make work_prev1 NULL and set work_done1
	 *
	 * A worker thread can be processing only one request
	 */

/*
 * caller owns the _aio_mutex
 */

	/* request in done queue */

	/* only one request on queue */

	/* only one request on queue */

/*
 * An AIO request is identified by an aio_result_t pointer.  The library
 * maps this aio_result_t pointer to its internal representation using a
 * hash table.  This function adds an aio_result_t pointer to the hash table.
 */

/*
 * Remove an entry from the hash table.
 */

/*
 * find an entry in the hash table
 */

/*
 * AIO interface for POSIX
 */

	/* initialize kaio */

	/*
	 * If we have been called because a list I/O
	 * kaio() failed, we don't want to repeat the
	 */

	/*
	 * Try kernel aio first.
	 * fall back to the thread implementation.
	 */

	/*
	 * If an LIO request, add the list head to the aio request
	 */

	/*
	 * Reuse the sigevent structure to contain the port number
	 * and the user value.  Same for SIGEV_THREAD, below.
	 */

	/*
	 * The sigevent structure contains the port number
	 * and the user value.  Same for SIGEV_PORT, above.
	 */

/*
 * 64-bit AIO interface for POSIX
 */

	/* initialize kaio */

	/*
	 * If we have been called because a list I/O
	 * kaio() failed, we don't want to repeat the
	 */

	/*
	 * Try kernel aio first.
	 * fall back to the thread implementation.
	 */

	/*
	 * If an LIO request, add the list head to the aio request
	 */

#endif /* !defined(_LP64) */
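The hash-table comments above describe mapping an application's aio_result_t pointer to the library's internal request, with add, remove, and find operations. A minimal sketch of such a pointer-keyed table with separate chaining follows; `result_t`, `struct req`, `HASHSZ`, and the `hash_*` functions are hypothetical stand-ins, the table size and hash function are assumptions, and this sketch does no locking. Rejecting a duplicate key mirrors the "request already in hash table" message that appears in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	HASHSZ	64	/* hypothetical table size (power of two) */

/* Stand-in for the application's result block (aio_result_t). */
typedef struct { long ret; int err; } result_t;

/* Hypothetical internal request, chained through its bucket. */
struct req {
	result_t	*resultp;	/* key: caller's result pointer */
	struct req	*hnext;		/* next request in the same bucket */
};

static struct req *hashtab[HASHSZ];

static size_t
hash_ptr(const void *p)
{
	/* skip low bits, which alignment makes mostly zero */
	return (((uintptr_t)p >> 3) & (HASHSZ - 1));
}

/* Add reqp; returns -1 if its result pointer is already present. */
static int
hash_insert(struct req *reqp)
{
	struct req **bucket, *r;

	assert(reqp->resultp != NULL);
	bucket = &hashtab[hash_ptr(reqp->resultp)];
	for (r = *bucket; r != NULL; r = r->hnext)
		if (r->resultp == reqp->resultp)
			return (-1);	/* request already in hash table */
	reqp->hnext = *bucket;
	*bucket = reqp;
	return (0);
}

/* Find the request for resultp, or NULL. */
static struct req *
hash_find(const result_t *resultp)
{
	struct req *r = hashtab[hash_ptr(resultp)];

	while (r != NULL && r->resultp != resultp)
		r = r->hnext;
	return (r);
}

/* Unlink and return the request for resultp, or NULL. */
static struct req *
hash_delete(const result_t *resultp)
{
	struct req **rpp = &hashtab[hash_ptr(resultp)];
	struct req *r;

	while ((r = *rpp) != NULL) {
		if (r->resultp == resultp) {
			*rpp = r->hnext;	/* unlink from the chain */
			r->hnext = NULL;
			return (r);
		}
		rpp = &r->hnext;
	}
	return (NULL);
}
```

Walking the chain through a pointer-to-pointer (`rpp`) lets deletion unlink the head of a bucket and an interior node with the same code.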