memset.s revision fad5204e207119133cdc503293923b09417b233b
 * CDDL HEADER START
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * See the License for the specific language governing permissions
 * and limitations under the License.
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 * CDDL HEADER END
 *
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 *
 * Copyright (c) 2008, Intel Corporation
 * All rights reserved.
 *
 * Portions Copyright 2009 Advanced Micro Devices, Inc.
 * memset algorithm overview:
 *
 * Thresholds used below were determined experimentally.
 *
 * Pseudo code:
 *
 * NOTE: On AMD NO_SSE is always set. Performance on Opteron did not improve
 * using 16-byte stores. Setting NO_SSE on AMD should be re-evaluated on
 * future AMD processors.
 *
 * If (size <= 144 bytes) {
 *     do unrolled code (primarily 8-byte stores) regardless of alignment.
 * } else {
 *     Align destination to 16-byte boundary
 *
 *     if (NO_SSE) {
 *         If (size > largest level cache) {
 *             Use 8-byte non-temporal stores (64-bytes/loop)
 *         } else {
 *             if (size >= 2K) {
 *                 Use rep sstoq
 *             } else {
 *                 Use 8-byte stores (128 bytes per loop)
 *             }
 *         }
 *     } else { **USE SSE**
 *         If (size <= 192 bytes) {
 *             do unrolled code using primarily 16-byte stores (SSE2)
 *         } else {
 *             If (size > largest level cache) {
L(ck2):
L(ck_align):
L(aligned_now):
L(byte32sse2_pre):
L(byte32sse2):
L(sse2_nt_move):
L(Loop8byte_pre):
L(Loop8byte):
L(use_rep):
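/*
 * Illustrative sketch, not part of the original file: the size-based
 * dispatch described in the algorithm overview, restated as a small C
 * function. The enum names, the function name choose_memset_strategy(),
 * and the no_sse / largest_cache parameters are hypothetical and exist
 * only for this example. The overview excerpt stops inside the SSE
 * branch for sizes above 192 bytes, so that case is returned as a single
 * placeholder value rather than guessed at.
 */

```c
#include <stddef.h>

/* Hypothetical names for the store strategies listed in the overview. */
enum memset_strategy {
	UNROLLED_8BYTE,	/* size <= 144: unrolled, primarily 8-byte stores */
	NT_8BYTE_LOOP,	/* NO_SSE, size > largest cache: 8-byte non-temporal stores */
	REP_STORE,	/* NO_SSE, size >= 2K: rep sstoq */
	LOOP_8BYTE,	/* NO_SSE, smaller sizes: 8-byte stores, 128 bytes per loop */
	UNROLLED_SSE2,	/* SSE, size <= 192: unrolled, primarily 16-byte stores */
	SSE2_LARGE	/* SSE, size > 192: not detailed in the excerpt */
};

/*
 * Pick a strategy from the fill size, the NO_SSE setting, and the size of
 * the largest cache level. Thresholds are the ones quoted in the overview.
 */
enum memset_strategy
choose_memset_strategy(size_t size, int no_sse, size_t largest_cache)
{
	if (size <= 144)
		return (UNROLLED_8BYTE);

	/* The assembly aligns the destination to 16 bytes at this point. */

	if (no_sse) {
		if (size > largest_cache)
			return (NT_8BYTE_LOOP);
		if (size >= 2048)
			return (REP_STORE);
		return (LOOP_8BYTE);
	}

	if (size <= 192)
		return (UNROLLED_SSE2);

	/* Placeholder: the overview excerpt ends before describing this path. */
	return (SSE2_LARGE);
}
```

/*
 * Because the 144-byte check runs before the NO_SSE test, the SSE2
 * unrolled path in this sketch applies only to sizes 145 through 192.
 */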