 * Copyright (c) 2008, Oracle and/or its affiliates. All rights reserved.
 * Copyright (C) 2004-2005 Nicolai Haehnle et al.
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * on the rights to use, copy, modify, merge, publish, distribute, sub
 * license, and/or sell copies of the Software, and to permit persons to whom
 * the Software is furnished to do so, subject to the following conditions:
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
 * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 * USE OR OTHER DEALINGS IN THE SOFTWARE.
 * This file contains registers and constants for the R300. They have been
 * found mostly by examining command buffers captured using glxtest, as well
 * as by extrapolating some known registers and constants from the R200.
 * I am fairly certain that they are correct unless stated otherwise in
 * comments.
// This register is written directly and also starts data
// section in many 3d CP_PACKET3's
 /* State based - direct writes to registers trigger vertex generation */
 /* I don't think I saw these three used..
*/
/* index size - when not set the indices are assumed to be 16 bit */
/* BEGIN: Vertex data assembly - lots of uncertainties */
// Where do we get our vertex data?
// Vertex data comes either from immediate mode registers or from
// vertex arrays.
// There appears to be no mixed mode (though we can force the pitch of
// vertex arrays to 0, effectively reusing the same element over and over
// again).
// Immediate mode is controlled by the INPUT_CNTL registers. I am not sure
// if these registers influence vertex array processing.
// Vertex arrays are controlled via the 3D_LOAD_VBPNTR packet3.
// In both cases, vertex attributes are then passed through INPUT_ROUTE.
// Beginning with INPUT_ROUTE_0_0 is a list of WORDs that route vertex data
// into the vertex processor's input registers.
// The first word routes the first input, the second word the second, etc.
// The corresponding input is routed into the register with the given index.
// The list is ended by a word with INPUT_ROUTE_END set.
// Always set COMPONENTS_4 in immediate mode. */
// - always set up to produce at least two attributes:
//   if vertex program uses only position, fglrx will set normal, too
// - INPUT_CNTL_0_COLOR and INPUT_CNTL_COLOR bits are always equal */
// Words parallel to INPUT_ROUTE_0; All words that are active in INPUT_ROUTE_0
// are set to a swizzling bit pattern, other words are 0.
// In immediate mode, the pattern is always set to xyzw. In vertex array
// mode, the swizzling pattern is e.g.
// used to set zw components in texture
// coordinates with only two components
// BEGIN: Upload vertex program and data
// The programmable vertex shader unit has a memory bank of unknown size
// that can be written to in 16 byte units by writing the address into
// UPLOAD_ADDRESS, followed by data in UPLOAD_DATA (multiples of 4 DWORDs).
// Pointers into the memory bank are always in multiples of 16 bytes.
// The memory bank is divided into areas with fixed meaning.
// Starting at address UPLOAD_PROGRAM: Vertex program instructions.
// Native limits reported by drivers from ATI suggest size 256 (i.e. 4KB),
// whereas the difference between known addresses suggests size 512.
// Starting at address UPLOAD_PARAMETERS: Vertex program parameters.
// Native reported limits and the VPI layout suggest size 256, whereas
// difference between known addresses suggests size 512.
// At address UPLOAD_POINTSIZE is a vector (0, 0, ps, 0), where ps is the
// floating point pointsize. The exact purpose of this state is uncertain,
// as there is also the R300_RE_POINTSIZE register.
// Multiple vertex programs and parameter sets can be loaded at once,
// which could explain the size discrepancy.
 * I do not know the purpose of this register. However, I do know that
 * it is set to 221C_CLEAR for clear operations and to 221C_NORMAL
 * Sometimes, END_OF_PKT and 0x2284=0 are the only commands sent between
 * rendering commands and overwriting vertex program parameters.
 * Therefore, I suspect writing zero to 0x2284 synchronizes the engine and
 * avoids bugs caused by still running shaders reading bad data from memory.
/* Absolutely no clue what this register is about. */
 * Addresses are relative to the vertex program instruction area of the
 * memory bank.
 * PROGRAM_END points to the last instruction of the active
 * program.
 * The meaning of the two UNKNOWN fields is obviously not known. However,
 * experiments so far have shown that both *must* point to an instruction
 * inside the vertex program, otherwise the GPU locks up.
 * fglrx usually sets CNTL_3_UNKNOWN to the end of the program and
 * CNTL_1_UNKNOWN points to the instruction where the last write to position
 * takes place. Most likely this is used to ignore the rest of the program
 * in cases where a group of verts isn't visible.
 * For some reason this "section" sometimes accepts other instructions
 * that have no relationship with position calculations.
/* Addresses are relative to the vertex program parameters area. */
// The entire range from 0x2300 to 0x2AC inclusive seems to be used for
 * be correct and are here so we can use one register file instead
 // each of the following is 3 bits wide, specifies number
 * UNK30 seems to enable point to quad transformation on
 * textures (or something closely related to that). This bit
 * is rather fatal at the time being due to lackings at pixel
 /* each of the following is 2 bits wide */
/* MSPOS - positions for multisample antialiasing (?) */
 /* shifts - each of the fields is 4 bits */
 /* each of the following is 2 bits wide */
 // the following use the same constants as above, but meaning is
 // times 2 (i.e. instead of 32 words it means 64) */
 /* watermarks, 3 bits wide */
/* The upper enable bits are guessed, based on fglrx reported limits. */
// The pointsize is given in multiples of 6. The pointsize can be
// enormous: Clear() renders a single point that fills the entire
 * The line width is given in multiples of 6.
 * In default mode lines are classified as vertical lines.
 * VE: vertical or horizontal
 * HO & VE: no classification
/* Some sort of scale or clamp value for texcoordless textures. */
 * Not sure why there are duplicates of factor and constant values.
 * My best guess so far is that there are separate zbiases for test
 * Some of the tests indicate that fgl has a fallback implementation
 * of zbias via pixel shaders.
 * This register needs to be set to (1<<1) for RV350 to correctly
 * perform depth test (see --vb-triangles in r300_demo)
 * Don't know about other chips. - Vladimir
 * This is set to 3 when GL_POLYGON_OFFSET_FILL is on.
 * My guess is that there are two bits for each zbias
 * primitive (FILL, LINE, POINT).
 * One to enable depth test and one for depth write.
 * Yet this doesn't explain why depth writes work ...
// BEGIN: Rasterization / Interpolators - many guesses
// 0_UNKNOWN_18 has always been set except for clear operations.
// TC_CNT is the number of incoming texture coordinate sets (i.e. it depends
// on the vertex program, *not* the fragment program) */
 /* number of color interpolators used */
/* Guess: RS_CNTL_1 holds the index of the highest used RS_ROUTE_n register. */
// Only used for texture coordinates.
// Use the source field to route texture coordinate input from the
// vertex program to the desired interpolator. Note that the source
// field is relative to the outputs the vertex program *actually*
// writes. If a vertex program only writes texcoord[1], this will
// be source index 0. Set INTERP_USED on all interpolators that
// produce data used by the fragment program. INTERP_USED looks
// like a swizzling mask, but I haven't seen it used that way.
// Note: The _UNKNOWN constants are always set in their respective register.
// I don't know if this is necessary.
*/
// These DWORDs control how vertex data is routed into fragment program
// registers, after interpolators. */
// Special handling for color: When the fragment program uses color,
// the ROUTE_0_COLOR bit is set and ROUTE_0_COLOR_DEST contains the
/* As above, but for secondary color */
// BEGIN: Scissors and cliprects
// There are four clipping rectangles. Their corner coordinates are inclusive.
// Every pixel is assigned a number from 0 to 15 by setting bits 0-3 depending
// on whether the pixel is inside cliprects 0-3, respectively. For example,
// if a pixel is inside cliprects 0 and 1, but outside 2 and 3, it is assigned
// the number 3 (binary 0011).
// Iff the bit corresponding to the pixel's number in RE_CLIPRECT_CNTL is set,
// the pixel is rendered.
// In addition to this, there is a scissors rectangle. Only pixels inside the
// scissors rectangle are drawn. (coordinates are inclusive)
// For some reason, the top-left corner of the framebuffer is at (1440, 1440)
// for the purpose of clipping and scissors. */
// BEGIN: Texture specification
// The texture specification dwords are grouped by meaning and not
// by texture unit. This means that e.g. the offset for texture
// image unit N is found in register TX_OFFSET_0 + (4*N) */
 * NOTE: NEAREST doesn't seem to exist
 * I'm not setting MAG_FILTER_MASK and (3 << 11) on for all
 * anisotropy modes because that would void selected mag filter
 /* The interpretation of the format word by Wladimir van der Laan */
 * The X, Y, Z and W refer to the layout of the components.
 * They are given meanings as R, G, B and Alpha by the swizzle
 /* 0x16 - some 16 bit green format.. ??
*/
 /* Floating point formats */
 /* Note - hardware supports both 16 and 32 bit floating point */
 /* alpha modes, convenience mostly */
 // if you have alpha, pick constant appropriate to the
 // number of channels (1 for I8, 2 for I8A8, 4 for R8G8B8A8, etc.)
 /* 2.0*Z, everything above 1.0 is set to 0.0 */
 /* 2.0*W, everything above 1.0 is set to 0.0 */
 /* Convenience macro to take care of layout and swizzling */
 /* These can be ORed with result of R300_EASY_TX_FORMAT() */
/* We don't really know what they do. Take values from a constant color ? */
 /* obvious missing in gap */
/* BEGIN: Guess from R200 */
 /* ff00ff00 == { 0, 1.0, 0, 1.0 } */
// BEGIN: Fragment program instruction set
// Fragment programs are written directly into register space.
// There are separate instruction streams for texture instructions and ALU
// instructions.
// In order to synchronize these streams, the program is divided into up
// to 4 nodes. Each node begins with a number of TEX operations, followed
// by a number of ALU operations.
// The first node can have zero TEX ops, all subsequent nodes must have at least
// one TEX op.
// All nodes must have at least one ALU op.
// The index of the last node is stored in PFS_CNTL_0: A value of 0 means
// 1 node, a value of 3 means 4 nodes.
// The total number of instructions is defined in PFS_CNTL_2. The offsets are
// offsets into the respective instruction streams, while *_END points to the
// last instruction relative to this offset.
// There is an unshifted value here which has so far always been equal to the
// index of the highest used temporary register.
// Nodes are stored backwards. The last active node is always stored in
// NODE_3.
// Example: In a 2-node program, NODE_0 and NODE_1 are set to 0.
// The first node is stored in NODE_2, the second node is stored in NODE_3.
// Offsets are relative to the master offset from PFS_CNTL_2.
// LAST_NODE is set for the last node, and only for the last node.
/* #define R300_PFS_NODE_LAST_NODE (1 << 22) */
// As far as I can tell, texture instructions cannot write into output
// registers directly. A subsequent ALU instruction is always necessary,
// even if it's just MAD o0, r0, 1, 0
 /* GUESS based on layout and native limits */
 * Unsure if these are opcodes, or some kind of bitfield, but this is how
 * they were set when I checked
// The ALU instructions register blocks are enumerated according to the order
// in which fglrx. I assume there is space for 64 instructions, since
// each block has space for a maximum of 64 DWORDs, and this matches reported
// The basic functional block seems to be one MAD for each color and alpha,
// and an adder that adds all components after the MUL.
// - ADD, MUL, MAD etc.: use MAD with appropriate neutral operands
// - DP4: Use OUTC_DP4, OUTA_DP4
// - DP3: Use OUTC_DP3, OUTA_DP4, appropriate alpha operands
// - DPH: Use OUTC_DP4, OUTA_DP4, appropriate alpha operands
// - CMP: If ARG2 < 0, return ARG1, else return ARG0
// - RSQ: use ABS modifier for argument
// - Use OUTC_REPL_ALPHA to write results of an alpha-only operation (e.g. RCP)
// - apparently, there's no quick DST operation
// - fglrx set FPI2_UNKNOWN_31 on a "MAX r2, r1, c0"
// - fglrx once set FPI0_UNKNOWN_31 on a "FRC r1, r1"
// First stage selects three sources from the available registers and
// constant parameters. This is defined in INSTR1 (color) and INSTR3 (alpha).
// fglrx sorts the three source fields: Registers before constants,
// lower indices before higher indices; I do not know whether this is necessary.
// fglrx fills unused sources with "read constant 0"
// According to specs, you cannot select more than two different constants.
// Second stage selects the operands from the sources. This is defined in
// INSTR0 (color) and INSTR2 (alpha). You can also select the special constants
// Swizzling and negation happen in this stage, as well.
// Important: Color and alpha seem to be mostly separate, i.e. their source
// selection appears to be fully independent (the register storage is probably
// physically split into a color and an alpha section).
// However (because of the apparent physical split), there is some interaction
// WRT swizzling. If, for example, you want to load an R component into an
// Alpha operand, this R component is taken from a *color* source, not from
// an alpha source. The corresponding register doesn't even have to appear in
// the alpha sources list. (I hope this all makes sense to you)
// The destination register index is in FPI1 (color) and FPI3 (alpha) together
// There are separate enable bits for writing into temporary registers
// (DSTC_REG_* /DSTA_REG) and program output registers
// (DSTC_OUTPUT_* /DSTA_OUTPUT).
// You can write to both at once, or not write at all (the same index
// Note: There is a special form for LRP
// - Argument order is the same as in ARB_fragment_program.
// Arbitrary LRP (including support for swizzling) requires vanilla MAD+MAD
/* Fragment program parameters in 7.16 floating point */
/* GUESS: PARAM_31 is last, based on native limits reported by fglrx */
// - AFAIK fglrx always sets BLEND_UNKNOWN when blending is used
// - AFAIK fglrx always sets BLEND_NO_SEPARATE when CBLEND and
//   ABLEND are set to the same
//   function (both registers are always set up completely in any case)
// - Most blend flags are simply copied from R200 and not tested yet
/* the following only appear in CBLEND */
/* the following are shared between CBLEND and ABLEND */
// Bit 18: Extremely weird tile like, but some pixels duplicated?
 * Set to 0A before 3D operations, set to 02 afterwards.
 * There seems to be no "write only" setting, so use
 * Z-test = ALWAYS for this. Bit (1<<8) is the "test"
 * bit, so plain write is 6 - vd
 * front and back refer to operations done for front
 * and back faces, i.e.
 * separate stencil function support
 * BEGIN: Vertex program instruction set
 * Every instruction is four dwords long:
 * DWORD 0: output and opcode
 * - ABS r, a is implemented as MAX r, a, -a
 * - MOV is implemented as ADD to zero
 * - XPD is implemented as MUL + MAD
 * - FLR is implemented as FRC + ADD
 * - apparently, fglrx tries to schedule instructions so that there
 *   is at least one instruction between the write to a temporary
 *   and the first read from said temporary; however, violations
 *   of this scheduling are allowed
 * - register indices seem to be unrelated with OpenGL aliasing to
 * - only one attribute and one parameter can be loaded at a time;
 * - the second software argument for POW is the third hardware
 * - MAD with only temporaries as input seems to use VPI_OUT_SELECT_MAD_2
 * There is some magic surrounding LIT:
 * The single argument is replicated across all three inputs, but swizzled:
 * Whenever the result is used later in the fragment program, fglrx forces
 * x and w to be 1.0 in the input selection; I don't know whether this is
 * Used in GL_POINT_DISTANCE_ATTENUATION_ARB,
 /* Used in fog computations, scalar(scalar) */
 * Used in GL_POINT_DISTANCE_ATTENUATION_ARB,
 /* all temps, vector(scalar, vector, vector) */
 /* GUESS based on fglrx native limits */
 /* GUESS based on fglrx native limits */
 * The R300 can select components from the input register arbitrarily.
 * Use the following constants, shifted by the component shift you
/* BEGIN: Packet 3 commands */
// A primitive emission dword.
// Draw a primitive from vertex data in arrays loaded via 3D_LOAD_VBPNTR.
// 0. The first parameter appears to be always 0
// 1. The second parameter is a standard primitive emission dword.
// Specify the full set of vertex arrays as (address, stride).
// The first parameter is the number of vertex arrays specified.
// The rest of the command is a variable length list of blocks, where
// each block is three dwords long and specifies two arrays.
// The first dword of a block is split into two words, the less significant
// word refers to the first array, the more significant word to the second
// array.
// The low byte of each word contains the size of an array entry in dwords,
// the high byte contains the stride of the array.
// The second dword of a block contains the pointer to the first array,
// the third dword of a block contains the pointer to the second array.
// Note that if the total number of arrays is odd, the third dword of
// the last block is omitted.
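// A minimal sketch of the 3D_LOAD_VBPNTR parameter layout described above,
// written as a standalone helper. The names struct vb_array, vb_attr_word()
// and vbpntr_pack() are hypothetical and not part of this header. The packet
// header itself is not emitted here, and the units of the stride byte are
// assumed to be whatever the hardware expects, since the notes do not say.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one vertex array (illustration only). */
struct vb_array {
	uint32_t address;	/* pointer to the first array element */
	uint8_t size;		/* size of one array entry, in dwords */
	uint8_t stride;		/* stride of the array (units assumed) */
};

/* Per-array word: low byte = entry size, high byte = stride. */
static uint16_t
vb_attr_word(const struct vb_array *a)
{
	return (uint16_t)(a->size | ((uint16_t)a->stride << 8));
}

/*
 * Write the packet parameters for n arrays into out[] and return the
 * number of dwords written. out[] must hold 1 + 3 * (n / 2) + 2 * (n % 2)
 * dwords.
 */
static size_t
vbpntr_pack(const struct vb_array *arrays, uint32_t n, uint32_t *out)
{
	size_t k = 0;
	uint32_t i;

	assert(out != NULL);
	assert(arrays != NULL || n == 0);

	out[k++] = n;		/* first parameter: number of arrays */

	/* Full blocks: three dwords describing two arrays each. */
	for (i = 0; i + 1 < n; i += 2) {
		out[k++] = (uint32_t)vb_attr_word(&arrays[i]) |
		    ((uint32_t)vb_attr_word(&arrays[i + 1]) << 16);
		out[k++] = arrays[i].address;
		out[k++] = arrays[i + 1].address;
	}

	/* Odd count: the last block omits its third dword. */
	if (i < n) {
		out[k++] = (uint32_t)vb_attr_word(&arrays[i]);
		out[k++] = arrays[i].address;
	}

	return k;
}
```

// For three arrays this yields six dwords: the array count, one full
// three-dword block, and a final block without its third dword.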