Lines Matching refs:to

4  * The contents of this file are subject to the terms of the
23 * Use is subject to license terms.
41 * Pseudo-code to aid in understanding the control flow of the
46 * ! Determine whether to use the FP register version
50 * ! dst addresses can be aligned to long word, word,
61 * go to small_copy; ! to speed short copies
65 * go to small_copy;
67 * go to small_copy;
68 * go to FPBLK_copy;
72 * go to small_copy;
74 * go to small_copy;
75 * go to FPBLK_copy;
79 * go to small_copy;
81 * go to small_copy;
82 * go to FPBLK_copy;
86 * go to small_copy;
88 * go to small_copy;
89 * go to FPBLK_copy;
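The four-way dispatch sketched above (pick the tunable matching the mutual alignment of src and dst, then choose the small or FP-block path) can be illustrated in C. This is a hedged sketch: the `hw_copy_limit_?` names follow the comment's convention, but the limit values and the `choose_copy_path` helper are illustrative assumptions, not the kernel's actual tunables.

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Illustrative stand-ins for the per-alignment tunables described above.
 * The real limits live in the kernel; these values are assumptions. */
static size_t hw_copy_limit_1 = 256;  /* byte-aligned copies      */
static size_t hw_copy_limit_2 = 512;  /* halfword-aligned copies  */
static size_t hw_copy_limit_4 = 1024; /* word-aligned copies      */
static size_t hw_copy_limit_8 = 1024; /* longword-aligned copies  */

enum copy_path { SMALL_COPY, FPBLK_COPY };

/* Mirror of the decision tree: find the worst mutual alignment of src
 * and dst, compare the length against that alignment's limit, and fall
 * back to the small copy when the limit is zero (FPBLK disabled). */
enum copy_path choose_copy_path(uintptr_t src, uintptr_t dst, size_t len)
{
	uintptr_t both = src | dst;   /* low bits set => worse alignment */
	size_t limit;

	if ((both & 1) != 0)
		limit = hw_copy_limit_1;
	else if ((both & 3) != 0)
		limit = hw_copy_limit_2;
	else if ((both & 7) != 0)
		limit = hw_copy_limit_4;
	else
		limit = hw_copy_limit_8;

	if (limit == 0 || len <= limit)
		return SMALL_COPY;
	return FPBLK_COPY;
}
```

Checking `src | dst` rather than each pointer separately matches the pseudo-code's idea that only the *mutual* alignment matters: a bit set in either address rules out the wider move size.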
96 * go to sm_left; ! special finish up code
99  * go to sm_med; ! tuned by alignment
124 * go to sm_movebytes
126 * go to sm_movehalf
128 * go to sm_moveword
135 * if one byte left, go to sm_byte
136 * else go to sm_half
143 * if one byte left, go to sm_byte
144 * else go to sm_half
147 * move a byte if needed to align src on halfword
150 * if one byte left, go to sm_byte
151 * else go to sm_half
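The sm_movebytes/sm_movehalf/sm_moveword selection above, including the narrow-move cleanup of any remainder, can be sketched in C. `small_copy` is a hypothetical helper name; the real code is branch-tuned assembly, while this sketch only shows the move-size choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Sketch of the small-copy dispatch above: use the widest move size the
 * two pointers mutually allow, then mop up the remainder with narrower
 * moves, byte by byte at the end. */
void small_copy(void *dstv, const void *srcv, size_t len)
{
	unsigned char *dst = dstv;
	const unsigned char *src = srcv;
	uintptr_t both = (uintptr_t)dst | (uintptr_t)src;

	if ((both & 3) == 0 && len >= 4) {
		/* sm_moveword: mutually word aligned */
		for (; len >= 4; len -= 4, src += 4, dst += 4)
			memcpy(dst, src, 4);
	} else if ((both & 1) == 0 && len >= 2) {
		/* sm_movehalf: mutually halfword aligned */
		for (; len >= 2; len -= 2, src += 2, dst += 2)
			memcpy(dst, src, 2);
	}
	/* sm_movebytes / sm_byte: finish byte by byte */
	while (len-- > 0)
		*dst++ = *src++;
}
```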
164 * ! Kernel threads do not have pcb's in which to store
206 * We've tried to restore fp state from the stack and failed. To
213 * The initial optimization decision in this code is to determine
214 * whether to use the FP registers for a copy or not. If we don't
217 * is required, allowing short copies to be completed more quickly.
219 * dst do not align to allow simple ldx,stx operation), the FP
225 * moved whether the FP registers need to be saved, and some other
226 * minor issues. The average additional overhead is estimated to be
228 * around 10 clocks, an elaborate calculation would slow down all
243 * data to gain much benefit from prefetching. But when there
244 * is more data and that data is not in cache, failing to prefetch
246 * which will cause the non-FPBLK inner loop to slow for larger copies.
252 * risk while still gaining the primary benefits of the improvements to
255 * of hw_copy_limit_* can be used to make further adjustments if
264 * some initial alignment activity of moving 0 to 3 bytes,
275 * If hw_copy_limit_? is set to zero, then use of FPBLK copy is
277 * If hw_copy_limit_? is set to a value between 1 and VIS_COPY_THRESHOLD (256)
280 * It is provided to allow for disabling FPBLK copies and to allow
286 * threshold to speed up all shorter copies (less than 256). That
297 * But, tests on running kernels show that src and dst to copy code
303 * Several times, tests for length are made to split the code
304 * into subcases. These tests often allow later tests to be
307 * to use a 4-way unrolled loop for the general byte copy case
311 * align src and dst. We try to minimize special case tests in
313 * to the total time.
315 * For the medium sized cases, we allow ourselves to adjust the
318 * to decide between short and medium size was chosen to be 39
323 * to be increased, this number would also need to be adjusted.
333 * loops to ensure that loops are aligned so that their instructions
337 * to be readjusted. Misaligned loops can add a clock per loop
338 * iteration to the loop timing.
340 * In a few cases, code is duplicated to avoid a branch. Since
345 * loop needs to be explained as it is not standard. Two
349 * cache line more time to reach the processor for systems with
351 * can cause that prefetch to be dropped. Putting a second
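The doubled-prefetch idea described above can be sketched in C with the compiler prefetch hint: each iteration prefetches a line far ahead (giving it time to arrive) and also re-prefetches a nearer line, so a prefetch the hardware dropped gets a second chance. The `PF_FAR`/`PF_NEAR` distances and the helper name are illustrative assumptions, not the kernel's tuned values.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

#define LINE    64           /* assumed cache-line size */
#define PF_FAR  (8 * LINE)   /* first-chance prefetch distance  */
#define PF_NEAR (4 * LINE)   /* second-chance prefetch distance */

/* Copy loop with two prefetches per iteration; the second prefetch
 * covers a line whose earlier (far) prefetch may have been dropped.
 * __builtin_prefetch is a hint only, so overshooting the buffer end
 * does not fault. */
void copy_with_prefetch(void *dstv, const void *srcv, size_t len)
{
	char *dst = dstv;
	const char *src = srcv;

	while (len >= LINE) {
		__builtin_prefetch(src + PF_FAR, 0, 0);  /* first chance  */
		__builtin_prefetch(src + PF_NEAR, 0, 0); /* second chance */
		memcpy(dst, src, LINE);
		src += LINE; dst += LINE; len -= LINE;
	}
	memcpy(dst, src, len);   /* sub-line tail */
}
```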
364 * When a copyOP decides to use fp we may have to preserve existing
365 * floating point state. It is not the caller's state that we need to
380 * We therefore need a per-call place in which to preserve fp state -
387 * return to a caller which may initiate other fp operations that could
392 * to the registered lofault handler. There is no need for any
393 * membars for these - eg, our store to t_lofault will always be visible to
398 * to or from userland the extent of the damage is known - the destination
399 * buffer is incomplete. So trap handlers will trampoline to the lofault
400 * handler in this case which should take some form of error action to
414 * is no need to repeat this), and we must force delivery of deferred
416 * Failure to do so results in lost kernel state being interpreted as
421 * point state that does not belong to the caller (see examples above),
422 * we must be careful in how we do this in order to prevent corruption
431 * use. Bit 2 (TRAMP_FLAG) indicates that the call was to bcopy, and a
439 * data from the stack, the error handler can check this flag to see if
442 * 4. Code run under the new lofault handler must be kept to a minimum. In
443 * particular, any calls to FP_ALLOWMIGRATE, which could result in a call
444 * to kpreempt(), should not be made until after the lofault handler has
450 * to "break even" using FP/VIS-accelerated memory operations.
452 * to be moved on entry. Check that code carefully before
456 * This shadows sys/machsystm.h which can't be included due to the lack of
470 * Indicates that we're to trampoline to the error handler.
480 * first prefetch moves data from L2 to L1 (n_reads)
481 * second prefetch moves data from memory to L2 (one_read)
489 * Size of stack frame in order to accommodate a 64-byte aligned
491 * All copy functions use two quadrants of fp registers; to assure a
492 * block-aligned two block buffer in which to save we must reserve
494 * or need to preserve %gsr but we use HWCOPYFRAMESIZE for all.
497 * | We may need to preserve 2 quadrants |
499 * | BST/BLD we need room in which to |
500 * | align to VIS_BLOCKSIZE bytes. So |
503 * | 8 bytes to save %fprs | <-- - SAVED_FPRS_OFFSET
505 * | 8 bytes to save %gsr | <-- - SAVED_GSR_OFFSET
520 * In FP copies if we do not have preserved data to restore over
521 * the fp regs we used then we must zero those regs to avoid
522 * exposing portions of the data to later threads (data security).
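The register-scrubbing rule above (zero any FP registers that held copy data and have no preserved state to restore, so later threads cannot observe it) has a direct C analogy: scrub scratch storage after it has carried user data. This is a sketch under assumptions; `struct scratch` and `copy_via_scratch` are hypothetical stand-ins for the borrowed FP quadrants, and the volatile loop plays the role of the explicit register zeroing.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

struct scratch {
	unsigned char regs[64];  /* stand-in for the borrowed FP quadrants */
};

/* Bounce the copy through scratch state, then scrub that state.
 * The volatile-qualified wipe keeps the compiler from deleting the
 * stores as dead, mirroring why the assembly zeroes the regs itself. */
void copy_via_scratch(void *dst, const void *src, size_t len,
    struct scratch *sc)
{
	size_t done = 0;

	while (done < len) {
		size_t chunk = len - done;
		if (chunk > sizeof (sc->regs))
			chunk = sizeof (sc->regs);
		memcpy(sc->regs, (const unsigned char *)src + done, chunk);
		memcpy((unsigned char *)dst + done, sc->regs, chunk);
		done += chunk;
	}

	/* No preserved data to restore, so zero: data security. */
	volatile unsigned char *p = sc->regs;
	for (size_t i = 0; i < sizeof (sc->regs); i++)
		p[i] = 0;
}
```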
569 * Macros to save and restore quadrants 1 and 3 or 2 and 4 to/from the stack.
570 * Used to save and restore in-use fp registers when we want to use FP
571 * and find fp already in use and copy size still large enough to justify
574 * A membar #Sync is needed before save to sync fp ops initiated before
575 * the call to the copy function (by whoever has fp in use); for example
576 * an earlier block load to the quadrant we are about to save may still be
577 * "in flight". A membar #Sync is required at the end of the save to
578 * sync our block store (the copy code is about to begin ldd's to the
582 * the copy operation to complete before we fill the quadrants with their
584 * of the restore complete before we return to whoever has the fp regs
586 * of the copy code to membar #Sync immediately after copy is complete
629 * prevent preemption if there is no t_lwp to save FP state to on context
635 * to use - we just use any outputs we want.
691 * `to' takes a kernel pagefault which cannot be resolved.
699 kcopy(const void *from, void *to, size_t count)
710 	bleu,pt	%ncc, .kcopy_small		! go to small copy
723 bleu,pt %ncc, .kcopy_small ! go to small copy
725 ba,pt %ncc, .kcopy_more ! otherwise go to large copy
736 bleu,pt %ncc, .kcopy_small ! go to small copy
738 ba,pt %ncc, .kcopy_more ! otherwise go to large copy
747 bleu,pt %ncc, .kcopy_small ! go to small copy
749 ba,pt %ncc, .kcopy_more ! otherwise go to large copy
757 bleu,pt %ncc, .kcopy_small ! go to small copy
759 ba,pt %ncc, .kcopy_more ! otherwise go to large copy
790 and %l6, TRAMP_FLAG, %l0 ! copy trampoline flag to %l0
810 ! Need to cater for the different expectations of kcopy
812 ! If it fires, we're expected to just return the error code
813 ! and *not* to invoke any existing error handler. As far as
815 ! existing lofault handler. In that case we're expected to
844 .asciz "Unable to restore fp state after copy operation"
875 * Copy a block of storage - must not overlap (from + len <= to).
886 bcopy(const void *from, void *to, size_t count)
894 	bleu,pt	%ncc, .bcopy_small		! go to small copy
907 bleu,pt %ncc, .bcopy_small ! go to small copy
909 ba,pt %ncc, .bcopy_more ! otherwise go to large copy
920 bleu,pt %ncc, .bcopy_small ! go to small copy
922 ba,pt %ncc, .bcopy_more ! otherwise go to large copy
931 bleu,pt %ncc, .bcopy_small ! go to small copy
933 ba,pt %ncc, .bcopy_more ! otherwise go to large copy
941 bleu,pt %ncc, .bcopy_small ! go to small copy
943 ba,pt %ncc, .bcopy_more ! otherwise go to large copy
964 bz,pt %ncc, .bc_sm_word ! branch to word aligned case
966 sub %o2, 3, %o2 ! adjust count to allow cc zero test
1089 ! Now long word aligned and have at least 32 bytes to move
1092 sub %o2, 31, %o2 ! adjust count to allow cc zero test
1106 addcc %o2, 24, %o2 ! restore count to long word offset
1107 ble,pt %ncc, .bc_med_lextra ! check for more long words to move
1139 ! Now word aligned and have at least 36 bytes to move
1142 sub %o2, 15, %o2 ! adjust count to allow cc zero test
1156 addcc %o2, 12, %o2 ! restore count to word offset
1157 ble,pt %ncc, .bc_med_wextra ! check for more words to move
1186 ! Now half word aligned and have at least 38 bytes to move
1189 sub %o2, 7, %o2 ! adjust count to allow cc zero test
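The recurring "adjust count to allow cc zero test" / "restore count" pairs above (e.g. `sub %o2, 31, %o2` before the loop, `addcc %o2, 24, %o2` after) use a count-biasing trick: subtract unroll−1 up front so the loop-continue condition is a simple sign test on the counter, then add part of the bias back to learn how many elements the tail must finish. A C sketch of the same technique, with a hypothetical `copy_longwords` helper and a 4-way unroll:

```c
#include <stddef.h>
#include <assert.h>

/* 4-way unrolled copy using the count-biasing trick: bias the count by
 * 3 so "biased > 0" means "at least 4 words remain", then un-bias to
 * recover the 0..3 leftover words for the tail loop. */
void copy_longwords(long *dst, const long *src, size_t nwords)
{
	long biased = (long)nwords - 3;  /* adjust count for zero test */

	while (biased > 0) {             /* unrolled main loop */
		dst[0] = src[0];
		dst[1] = src[1];
		dst[2] = src[2];
		dst[3] = src[3];
		dst += 4; src += 4;
		biased -= 4;
	}

	long left = biased + 3;          /* restore count: 0..3 remain */
	while (left-- > 0)
		*dst++ = *src++;
}
```

The payoff in assembly is that the loop-closing branch can test the condition codes set by the subtract itself, with no separate compare instruction.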
1214 * The _more entry points are not intended to be used directly by
1215 * any caller from outside this file. They are provided to allow
1236 ! We need to mark ourselves as being from bcopy since both
1245 * Also, use of FP registers has been tested to be enabled
1274 ! TMP = bytes required to align DST on FP_BLOCK boundary
1457 ovbcopy(const void *from, void *to, size_t count)
1464 bgu,a %ncc, 1f ! nothing to do or bad arguments
1465 subcc %o0, %o1, %o3 ! difference of from and to address
1472 2: cmp %o2, %o3 ! cmp size and abs(from - to)
1475 cmp %o0, %o1 ! compare from and to addresses
1476 blu %ncc, .ov_bkwd ! if from < to, copy backwards
1484 stb %o3, [%o1] ! write to address
1487 inc %o1 ! inc to address
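The ovbcopy fragments above compare the two addresses and copy backwards when `from < to`, so an overlapping destination never clobbers source bytes before they are read. A C sketch of that direction choice, keeping ovbcopy's (from, to, count) argument order:

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Sketch of ovbcopy's overlap handling: if the source lies below an
 * overlapping destination, copy from the high end down; otherwise a
 * plain forward copy is safe. */
void ovbcopy_sketch(const void *from, void *to, size_t count)
{
	const unsigned char *s = from;
	unsigned char *d = to;

	if (s < d && d < s + count) {
		/* from < to and regions overlap: copy backwards */
		while (count-- > 0)
			d[count] = s[count];
	} else {
		/* copy forwards */
		while (count-- > 0)
			*d++ = *s++;
	}
}
```

This is the same rule libc's `memmove` implements; `bcopy` proper can skip the check because its contract forbids overlap.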
1530 ! %l1 - pointer to saved fpregs
1642 * Transfer data to and from user space -
1654 * allows other callers (e.g. uiomove(9F)) to work correctly.
1659 * which currently are intended to handle requests of <= 16 bytes from
1660 * do_unaligned. Future enhancement to make them handle 8k pages efficiently
1665 * Copy user data to kernel space (copyOP/xcopyOP/copyOP_noerr)
1675 * causes the default handlers to trampoline to the previous handler
1679 * we need to do a HW block copy operation. This saves a window
1694 * Copy kernel data to user space (copyout/xcopyout/xcopyout_little).
1720 * fault occurs in (x)copyin/(x)copyout. In order for this to function
1722 * This allows us to share common code for all the flavors of the copy
1726 * calling REAL_LOFAULT. So the real handler can vector to the appropriate
1781	bleu,pt	%ncc, .copyout_small		! go to small copy
1794 bleu,pt %ncc, .copyout_small ! go to small copy
1796 ba,pt %ncc, .copyout_more ! otherwise go to large copy
1807 bleu,pt %ncc, .copyout_small ! go to small copy
1809 ba,pt %ncc, .copyout_more ! otherwise go to large copy
1818 bleu,pt %ncc, .copyout_small ! go to small copy
1820 ba,pt %ncc, .copyout_more ! otherwise go to large copy
1828 bleu,pt %ncc, .copyout_small ! go to small copy
1830 ba,pt %ncc, .copyout_more ! otherwise go to large copy
1852 bz,pt %ncc, .co_sm_word ! branch to word aligned case
1854 sub %o2, 3, %o2 ! adjust count to allow cc zero test
1987 ! Now long word aligned and have at least 32 bytes to move
1990 sub %o2, 31, %o2 ! adjust count to allow cc zero test
1991 sub %o1, 8, %o1 ! adjust pointer to allow store in
2010 addcc %o2, 24, %o2 ! restore count to long word offset
2011 ble,pt %ncc, .co_med_lextra ! check for more long words to move
2045 ! Now word aligned and have at least 36 bytes to move
2048 sub %o2, 15, %o2 ! adjust count to allow cc zero test
2065 addcc %o2, 12, %o2 ! restore count to word offset
2066 ble,pt %ncc, .co_med_wextra ! check for more words to move
2098 ! Now half word aligned and have at least 38 bytes to move
2101 sub %o2, 7, %o2 ! adjust count to allow cc zero test
2150 * The _more entry points are not intended to be used directly by
2151 * any caller from outside this file. They are provided to allow
2199 ! TMP = bytes required to align DST on FP_BLOCK boundary
2407	bleu,pt	%ncc, .xcopyout_small		! go to small copy
2420 bleu,pt %ncc, .xcopyout_small ! go to small copy
2422 ba,pt %ncc, .xcopyout_more ! otherwise go to large copy
2433 bleu,pt %ncc, .xcopyout_small ! go to small copy
2435 ba,pt %ncc, .xcopyout_more ! otherwise go to large copy
2444 bleu,pt %ncc, .xcopyout_small ! go to small copy
2446 ba,pt %ncc, .xcopyout_more ! otherwise go to large copy
2454 bleu,pt %ncc, .xcopyout_small ! go to small copy
2456 ba,pt %ncc, .xcopyout_more ! otherwise go to large copy
2553 * Copy user data to kernel space (copyin/xcopyin/xcopyin_little)
2567	bleu,pt	%ncc, .copyin_small		! go to small copy
2580 bleu,pt %ncc, .copyin_small ! go to small copy
2582 ba,pt %ncc, .copyin_more ! otherwise go to large copy
2593 bleu,pt %ncc, .copyin_small ! go to small copy
2595 ba,pt %ncc, .copyin_more ! otherwise go to large copy
2604 bleu,pt %ncc, .copyin_small ! go to small copy
2606 ba,pt %ncc, .copyin_more ! otherwise go to large copy
2614 bleu,pt %ncc, .copyin_small ! go to small copy
2616 ba,pt %ncc, .copyin_more ! otherwise go to large copy
2638 bz,pt %ncc, .ci_sm_word ! branch to word aligned case
2640 sub %o2, 3, %o2 ! adjust count to allow cc zero test
2773 ! Now long word aligned and have at least 32 bytes to move
2776 sub %o2, 31, %o2 ! adjust count to allow cc zero test
2793 addcc %o2, 24, %o2 ! restore count to long word offset
2794 ble,pt %ncc, .ci_med_lextra ! check for more long words to move
2828 ! Now word aligned and have at least 36 bytes to move
2831 sub %o2, 15, %o2 ! adjust count to allow cc zero test
2848 addcc %o2, 12, %o2 ! restore count to word offset
2849 ble,pt %ncc, .ci_med_wextra ! check for more words to move
2880 ! Now half word aligned and have at least 38 bytes to move
2883 sub %o2, 7, %o2 ! adjust count to allow cc zero test
2929 * The _more entry points are not intended to be used directly by
2930 * any caller from outside this file. They are provided to allow
2978 ! TMP = bytes required to align DST on FP_BLOCK boundary
3185	bleu,pt	%ncc, .xcopyin_small		! go to small copy
3198 bleu,pt %ncc, .xcopyin_small ! go to small copy
3200 ba,pt %ncc, .xcopyin_more ! otherwise go to large copy
3211 bleu,pt %ncc, .xcopyin_small ! go to small copy
3213 ba,pt %ncc, .xcopyin_more ! otherwise go to large copy
3222 bleu,pt %ncc, .xcopyin_small ! go to small copy
3224 ba,pt %ncc, .xcopyin_more ! otherwise go to large copy
3232 bleu,pt %ncc, .xcopyin_small ! go to small copy
3234 ba,pt %ncc, .xcopyin_more ! otherwise go to large copy
3338 * Copy a block of storage - must not overlap (from + len <= to).
3339 * No fault handler installed (to be called under on_fault())
3352	bleu,pt	%ncc, .copyin_ne_small		! go to small copy
3365 bleu,pt %ncc, .copyin_ne_small ! go to small copy
3367 ba,pt %ncc, .copyin_noerr_more ! otherwise go to large copy
3378 bleu,pt %ncc, .copyin_ne_small ! go to small copy
3380 ba,pt %ncc, .copyin_noerr_more ! otherwise go to large copy
3389 bleu,pt %ncc, .copyin_ne_small ! go to small copy
3391 ba,pt %ncc, .copyin_noerr_more ! otherwise go to large copy
3399 bleu,pt %ncc, .copyin_ne_small ! go to small copy
3401 ba,pt %ncc, .copyin_noerr_more ! otherwise go to large copy
3435 * Copy a block of storage - must not overlap (from + len <= to).
3436 * No fault handler installed (to be called under on_fault())
3450	bleu,pt	%ncc, .copyout_ne_small		! go to small copy
3463 bleu,pt %ncc, .copyout_ne_small ! go to small copy
3465 ba,pt %ncc, .copyout_noerr_more ! otherwise go to large copy
3476 bleu,pt %ncc, .copyout_ne_small ! go to small copy
3478 ba,pt %ncc, .copyout_noerr_more ! otherwise go to large copy
3487 bleu,pt %ncc, .copyout_ne_small ! go to small copy
3489 ba,pt %ncc, .copyout_noerr_more ! otherwise go to large copy
3497 bleu,pt %ncc, .copyout_ne_small ! go to small copy
3499 ba,pt %ncc, .copyout_noerr_more ! otherwise go to large copy
3542 ! %l1 - pointer to saved %d0 block
3648 * Copy 32 bytes of data from src (%o0) to dst (%o1)
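The fully unrolled 32-byte move named above can be sketched in C as four 8-byte load/store pairs with no loop overhead. `copy32` is a hypothetical helper name; using fixed-size `memcpy` for each 8-byte chunk keeps the sketch safe for unaligned pointers while still compiling down to single load/store pairs on typical targets.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Copy exactly 32 bytes from src to dst: four 8-byte chunks, fully
 * unrolled. All loads complete before the stores begin, mirroring a
 * load-group/store-group schedule. */
void copy32(void *dst, const void *src)
{
	uint64_t a, b, c, d;

	memcpy(&a, (const char *)src +  0, 8);
	memcpy(&b, (const char *)src +  8, 8);
	memcpy(&c, (const char *)src + 16, 8);
	memcpy(&d, (const char *)src + 24, 8);
	memcpy((char *)dst +  0, &a, 8);
	memcpy((char *)dst +  8, &b, 8);
	memcpy((char *)dst + 16, &c, 8);
	memcpy((char *)dst + 24, &d, 8);
}
```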