i915_gem_tiling.c revision 1450
 * Copyright (c) 2006, 2013, Oracle and/or its affiliates. All rights reserved.

 * Copyright (c) 2009, 2013, Intel Corporation.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 *
 * Eric Anholt <eric@anholt.net>

 * Support for managing tiling state of buffer objects.
 *
 * The idea behind tiling is to increase cache hit rates by rearranging
 * pixel data so that a group of pixel accesses are in the same cacheline.
 * Performance improvement from doing this on the back/depth buffer are on
 *
 * Intel architectures make this somewhat more complicated, though, by
 * adjustments made to addressing of data when the memory is in interleaved
 * mode (matched pairs of DIMMS) to improve memory bandwidth.
 * For interleaved memory, the CPU sends every sequential 64 bytes
 * to an alternate memory channel so it can get the bandwidth from both.
 *
 * The GPU also rearranges its accesses for increased bandwidth to interleaved
 * memory, and it matches what the CPU does for non-tiled. However, when tiled
 * it does it a little differently, since one walks addresses not just in the
 * X direction but also Y. So, along with alternating channels when bit
 * 6 of the address flips, it also alternates when other bits flip -- Bits 9
 * (every 512 bytes, an X tile scanline) and 10 (every two X tile scanlines)
 * are common to both the 915 and 965-class hardware.
 *
 * The CPU also sometimes XORs in higher bits as well, to improve
 * bandwidth doing strided access like we do so frequently in graphics. This
 * is called "Channel XOR Randomization" in the MCH documentation. The result
 * is that the CPU is XORing in either bit 11 or bit 17 to bit 6 of its address
 *
 * All of this bit 6 XORing has an effect on our memory management,
 * as we need to make sure that the 3d driver can correctly address object
 *
 * If we don't have interleaved memory, all tiling is safe and no swizzling is
 *
 * When bit 17 is XORed in, we simply refuse to tile at all. Bit
 * 17 is not just a page offset, so as we page an object out and back in,
 * individual pages in it will have different bit 17 addresses, resulting in
 * each 64 bytes being swapped with its neighbor!
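The channel selection described above can be sketched in C. This is a minimal illustration, assuming the selected channel is simply the XOR of the listed address bits. The function names `addr_bit`, `cpu_channel`, `gpu_x_tiled_channel`, and `channel_selfcheck` are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Return bit n of an address as 0 or 1. */
static unsigned addr_bit(uint64_t addr, unsigned n)
{
	return (unsigned)((addr >> n) & 1u);
}

/*
 * Hypothetical model of the CPU side: bit 6 picks the channel, optionally
 * XORed with one higher bit (11 or 17) when channel XOR randomization is on.
 * xor_bit == 0 means no randomization.
 */
unsigned cpu_channel(uint64_t addr, unsigned xor_bit)
{
	unsigned ch = addr_bit(addr, 6);

	if (xor_bit != 0)
		ch ^= addr_bit(addr, xor_bit);
	return ch;
}

/*
 * Hypothetical model of the GPU side for an X-tiled surface: the channel
 * also alternates when bit 9 or bit 10 flips.
 */
unsigned gpu_x_tiled_channel(uint64_t addr)
{
	return addr_bit(addr, 6) ^ addr_bit(addr, 9) ^ addr_bit(addr, 10);
}

/* Small self-check showing why bit 17 is troublesome. */
void channel_selfcheck(void)
{
	uint64_t offset = 0x40;                    /* same offset within a page */
	uint64_t page_a = 0;                       /* physical page, bit 17 clear */
	uint64_t page_b = (uint64_t)1 << 17;       /* physical page, bit 17 set */

	/* Same page offset, different physical page: the channel flips. */
	assert(cpu_channel(page_a | offset, 17) != cpu_channel(page_b | offset, 17));
}
```

In this model, bit 11 lies inside a 4 KiB page offset, so it survives paging unchanged, while bit 17 depends on which physical page the data lands in. That is the property the comment above gives as the reason for refusing to tile.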
 * Otherwise, if interleaved, we have to tell the 3d driver what the address
 * swizzling it needs to do is, since it's writing with the CPU to the pages
 * (bit 6 and potentially bit 11 XORed in), and the GPU is reading from the
 * pages (bit 6, 9, and 10 XORed in), resulting in a cumulative bit swizzling
 * required by the CPU of XORing in bit 6, 9, 10, and potentially 11, in order
 * to match what the GPU expects.

 * Detects bit 6 swizzling of address lookup between IGD access and CPU
 * access through main memory.

 /* Enable swizzling when the channels are populated with
 * identically sized dimms. We don't need to check the 3rd
 * channel because no cpu with gpu attached ships in that
 * configuration. Also, swizzling only makes sense for 2

 /* On IRONLAKE whatever DRAM config, GPU always do

 /* As far as we know, the 865 doesn't have these bit 6

 /* On mobile 9xx chipsets, channel interleave by the CPU is
 * determined by DCC. For single-channel, neither the CPU
 * nor the GPU do swizzling. For dual channel interleaved,
 * the GPU's interleave is bit 9 and 10 for X tiled, and bit
 * 9 for Y tiled. The CPU's interleave is independent, and
 * can be based on either bit 11 (haven't seen this yet) or

 /* This is the base swizzling by the GPU for

 /* Work around: for fixing 965GM flickering issue on OpenArena */

 /* Bit 17 swizzling by the CPU in addition. */

 /* The 965, G33, and newer, have a very flexible memory
 * configuration. It will enable dual-channel mode
 * (interleaving) on as much memory as it can, and the GPU
 * will additionally sometimes enable different bit 6
 * swizzling for tiled objects from the CPU.
 * Here's what I found on the G965:
 *    slot fill          memory size   swizzling
 * 1024 1024 1024 0      2048 1024     O
 *
 * We could probably detect this based on either the DRB
 * matching, which was the case for the swizzling required in
 * the table above, or from the 1-ch value being less than
 * the minimum size of a rank.

/* Check pitch constraints for all chips & tiling formats */

 /* Linear is always fine */

 /* check maximum stride & object size */

 /* i965+ stores the end address of the gtt mapping in the fence
 * reg, so don't bother to check the size */

 /* 965+ just needs multiples of tile width */

 /* Pre-965 needs power of two tile widths */

/* Is the current GTT allocation valid for the change in tiling? */

 * Sets the tiling mode of an object, returning the required swizzling of
 * bit 6 of addresses in the object.

 /* Hide bit 17 swizzling from the user. This prevents old Mesa
 * from aborting the application on sw fallbacks to bit 17,
 * If there was a user that was relying on the swizzle
 * break it, but we don't have any of those.

 /* If we can't handle the swizzling, make it untiled. */

 /* We need to rebind the object if its current allocation
 * no longer meets the alignment restrictions for its new
 * tiling mode. Otherwise we can just leave it alone, but
 * need to ensure that any fence register is cleared.
 * the next fenced (either through the GTT or by the BLT unit
 * After updating the tiling parameters, we then flag whether
 * we need to update an associated fence register. Note this
 * has to also include the unfenced register the GPU uses
 * whilst executing a fenced command for an untiled object.
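The pitch rules named in the comments above ("Linear is always fine", "965+ just needs multiples of tile width", "Pre-965 needs power of two tile widths") can be sketched as a standalone check. This is a minimal illustration and not the driver's function: `enum tiling_mode`, `stride_ok`, and the `is_965_plus` flag are hypothetical names, the tile width is passed in by the caller, and the maximum stride and object size checks are left out because their limits do not appear in the surviving text. It assumes the pre-965 rule means the stride itself must be a power of two and at least one tile wide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tiling modes for this sketch. */
enum tiling_mode { TILING_NONE, TILING_X, TILING_Y };

/* True when v is a non-zero power of two. */
static bool is_power_of_two(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Check a stride (in bytes) against the rules quoted in the comments.
 * tile_width is the width of one tile row in bytes, e.g. 512 for the
 * X tile scanline mentioned earlier in the file.
 */
bool stride_ok(enum tiling_mode mode, uint32_t stride,
	       uint32_t tile_width, bool is_965_plus)
{
	/* Linear is always fine. */
	if (mode == TILING_NONE)
		return true;

	/* A tiled surface must be at least one tile wide. */
	if (tile_width == 0 || stride < tile_width)
		return false;

	/* 965+ just needs multiples of tile width. */
	if (is_965_plus)
		return (stride % tile_width) == 0;

	/* Pre-965: assumed to require a power-of-two stride. */
	return is_power_of_two(stride);
}
```

For example, a 1536-byte stride with a 512-byte tile width passes the 965+ rule (three tiles wide) but fails the pre-965 rule, since 1536 is not a power of two.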
 /* Rebind if we need a change of alignment */

 /* Force the fence to be reacquired for GTT access */

 /* we have to maintain this existing ABI... */

 /* Try to preallocate memory required to save swizzling on put-pages */

 * Returns the current tiling mode and required bit 6 swizzling for the object.

 /* Hide bit 17 from the user -- see comment in i915_gem_set_tiling */

 * Swap every 64 bytes of this page around, to account for it having a new
 * bit 17 of its physical address and therefore being interpreted differently
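The swap described in the last comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the driver's routine: the names `page_needs_reswizzle` and `swap_64byte_pairs` are hypothetical, and a 4096-byte page is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_BYTES 4096	/* assumed page size */
#define SKETCH_SWAP_BYTES 64	/* one interleave unit */

/* True when bit 17 of the physical address changed across a page-in. */
bool page_needs_reswizzle(uint64_t old_phys, uint64_t new_phys)
{
	return ((old_phys ^ new_phys) >> 17) & 1u;
}

/* Exchange each 64-byte block of the page with its neighbor, in place. */
void swap_64byte_pairs(uint8_t *page)
{
	uint8_t tmp[SKETCH_SWAP_BYTES];
	size_t i;

	for (i = 0; i < SKETCH_PAGE_BYTES; i += 2 * SKETCH_SWAP_BYTES) {
		memcpy(tmp, page + i, SKETCH_SWAP_BYTES);
		memcpy(page + i, page + i + SKETCH_SWAP_BYTES, SKETCH_SWAP_BYTES);
		memcpy(page + i + SKETCH_SWAP_BYTES, tmp, SKETCH_SWAP_BYTES);
	}
}
```

A caller would presumably record bit 17 of each page's physical address when the pages are released and apply the swap only to pages where `page_needs_reswizzle` reports a change. Applying the swap twice restores the original contents.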