A texture mapping unit (TMU) is a component in modern graphics processing units (GPUs). A TMU can rotate, resize, and distort a bitmap image (performing texture sampling) so that it can be placed onto an arbitrary plane of a given 3D model as a texture. This process is called texture mapping. In modern graphics cards it is implemented as a discrete stage in a graphics pipeline. When first introduced, it was implemented as a separate physical processor, as on the Voodoo2 graphics card.
The TMU came about because of the compute demands of sampling a flat image (the texture map) and transforming it to the angle and perspective it must have in 3D space. The computation is dominated by matrix arithmetic performed for every pixel, which CPUs of the time (early Pentiums) could not handle at acceptable performance.
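One concrete part of this per-pixel work is perspective-correct interpolation of texture coordinates. Attributes divided by the vertex depth w vary linearly across the screen, so a rasterizer can interpolate u/w, v/w, and 1/w and then divide once per pixel. The sketch below is a minimal illustration under that assumption, not a description of how any particular TMU was built; the function names `perspective_correct_uv` and `affine_uv` are hypothetical.

```python
def perspective_correct_uv(t, uv0, w0, uv1, w1):
    """Interpolate (u, v) at screen-space fraction t between two endpoints.

    uv0, uv1: texture coordinates at the endpoints (hypothetical inputs)
    w0, w1:   clip-space w (view depth) at the endpoints
    """
    # u/w, v/w and 1/w are linear in screen space, so interpolate those.
    inv_w = (1 - t) / w0 + t / w1
    u_over_w = (1 - t) * uv0[0] / w0 + t * uv1[0] / w1
    v_over_w = (1 - t) * uv0[1] / w0 + t * uv1[1] / w1
    # One division per pixel recovers the perspective-correct coordinate.
    return (u_over_w / inv_w, v_over_w / inv_w)


def affine_uv(t, uv0, uv1):
    """Naive screen-space interpolation, which ignores perspective."""
    return ((1 - t) * uv0[0] + t * uv1[0], (1 - t) * uv0[1] + t * uv1[1])
```

For a span whose far end is three times as deep as its near end, the screen-space midpoint samples u = 0.25 rather than the 0.5 that naive interpolation gives, which is why textures "swim" without the per-pixel division.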
As of 2013, TMUs are part of the shader pipeline and decoupled from the render output pipelines (ROPs). For example, in AMD's Cypress GPU, each of the 20 shader pipelines has four TMUs, giving the GPU 80 TMUs. Chip designers do this to closely couple the shaders with the texture engines they work with.
3D scenes are generally composed of two things: 3D geometry and the textures that cover that geometry. Texture units in a video card take a texture and 'map' it to a piece of geometry. That is, they wrap the texture around the geometry and produce textured pixels that can then be written to the screen. A texture can be an actual image, a lightmap, or even a normal map used for advanced surface lighting effects.
To render a 3D scene, textures are mapped over polygon meshes. This is called texture mapping and is performed by the texture mapping units (TMUs) on the video card. Texture fill rate is a measure of how quickly a particular card can perform texture mapping. It is usually quoted in texels per second and is commonly computed as the number of TMUs multiplied by the core clock frequency.
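As a worked example of that arithmetic, the sketch below computes a theoretical peak fill rate under the simplifying assumption that every TMU delivers one filtered texel per clock. The function name `texture_fill_rate` is hypothetical, and the 850 MHz clock in the usage line is an assumed example value rather than a figure taken from this article.

```python
def texture_fill_rate(tmus: int, core_clock_mhz: float) -> float:
    """Theoretical peak texture fill rate in gigatexels per second.

    Assumes each TMU produces one filtered texel per clock cycle;
    real throughput is lower when filtering needs extra cycles.
    """
    texels_per_second = tmus * core_clock_mhz * 1e6
    return texels_per_second / 1e9


# 80 TMUs (as in the Cypress example) at an assumed 850 MHz core clock.
print(texture_fill_rate(80, 850.0))  # 68.0
```

Peak figures like this compare cards only on paper. Which card is faster in practice also depends on memory bandwidth and on the shader-to-TMU balance discussed next.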
Though pixel shader processing is becoming more important, texture fill rate still carries some weight. The best example of this is the Radeon X1600 XT, which has a 3:1 ratio of pixel shader processors to texture mapping units. As a result, the X1600 XT performs worse than other GPUs of the same era and class, such as Nvidia's GeForce 7600 GT. In the mid range, texture mapping can still very much be a bottleneck. At the high end, however, the Radeon X1900 XTX has the same 3:1 ratio but performs well, because display resolutions have a practical upper limit and the card has more than enough texture mapping power to handle any display.
Textures need to be addressed and filtered. This job is done by TMUs, which work in conjunction with the pixel and vertex shader units to apply texture operations to pixels. The number of texture units in a graphics processor is used when comparing cards for texturing performance, and it is reasonable to assume that, all else being equal, the card with more TMUs will be faster at processing texture information. In modern GPUs, TMUs contain Texture Address Units (TA) and Texture Filtering Units (TF). Texture Address Units map texels to pixels and apply texture addressing modes, which decide what happens to coordinates that fall outside the texture (for example, wrapping or clamping them). Texture Filtering Units optionally perform hardware-based texture filtering.
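The division of labour between addressing and filtering can be illustrated in software. The sketch below assumes a texture stored as a list of rows of scalar texels, three common addressing modes (wrap, clamp-to-edge, mirrored repeat), and bilinear filtering with texel centres at half-integer positions. It is a simplified floating-point model, not how any GPU implements the stages; real hardware uses fixed-point arithmetic and more filter types. The names `address` and `sample_bilinear` are hypothetical.

```python
import math


def address(coord: int, size: int, mode: str) -> int:
    """Addressing stage: map an integer texel index into [0, size)."""
    if mode == "wrap":    # repeat the texture
        return coord % size
    if mode == "clamp":   # clamp to the edge texel
        return min(max(coord, 0), size - 1)
    if mode == "mirror":  # mirrored repeat
        period = 2 * size
        c = coord % period
        return c if c < size else period - 1 - c
    raise ValueError(f"unknown addressing mode: {mode}")


def sample_bilinear(texture, u: float, v: float, mode: str = "wrap") -> float:
    """Bilinearly filter a 2D grid of scalar texels at normalized (u, v)."""
    height, width = len(texture), len(texture[0])
    # Convert to texel space; texel centres sit at integer + 0.5.
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    # Addressing: resolve the four neighbouring texels.
    xa, xb = address(x0, width, mode), address(x0 + 1, width, mode)
    ya, yb = address(y0, height, mode), address(y0 + 1, height, mode)
    # Filtering: weighted average of the four texels.
    top = texture[ya][xa] * (1 - fx) + texture[ya][xb] * fx
    bottom = texture[yb][xa] * (1 - fx) + texture[yb][xb] * fx
    return top * (1 - fy) + bottom * fy


tex = [[0.0, 1.0],
       [2.0, 3.0]]
print(sample_bilinear(tex, 0.5, 0.5))  # 1.5, the average of all four texels
```

Sampling exactly at a texel centre returns that texel unchanged. Near the edges, the chosen addressing mode determines which neighbours are blended.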
The term pipeline describes the graphics card's architecture and gives a generally accurate idea of the computing power of a graphics processor.
Pipeline is not a formally defined technical term. A graphics processor contains several pipelines, because separate functions are being performed at any given time. Historically, the term referred to a pixel processor attached to a dedicated TMU. A GeForce 3 had four pixel pipelines, each of which had two TMUs. The rest of the pipeline handled tasks such as depth and blending operations.
The ATI Radeon 9700 was the first to break this mould by making its vertex shader engines independent of the pixel shaders. The R300 GPU used in the Radeon 9700 had four global vertex shaders but split the rest of the rendering pipeline in half (it was, so to speak, dual-core). Each half, called a quad, had four pixel shaders, four TMUs, and four ROPs.
Some units are used more than others, so to increase the processor's overall performance, designers tried to find a "sweet spot" in the number of each unit needed for optimum efficiency without spending silicon on excess units. In this architecture the name pixel pipeline lost its meaning, because pixel processors were no longer attached to single TMUs.
The vertex shader had been decoupled since the R300, but decoupling the pixel shader was harder, because it required colour data (e.g. texture samples) to work with and therefore needed to be closely coupled to a TMU.
This coupling remains today. The shader engine, made of units able to process either vertex or pixel data, is tightly coupled to a TMU, but a crossbar dispatcher sits between its output and the bank of ROPs.
Render output pipeline is an inherited term; the unit is more often called the render output unit. Its job is to control the sampling of pixels (each pixel is a dimensionless point), so it controls antialiasing, in which more than one sample is merged into one pixel. All rendered data has to pass through the ROP to be written to the framebuffer, from which it can be transmitted to the display.
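Merging several samples into one pixel can be sketched as a simple average, often called a box-filter resolve. The example below assumes each pixel stores a list of (r, g, b) samples with components in [0, 1]. Real ROPs may weight or filter samples differently, and the functions `resolve_pixel` and `resolve` are hypothetical names for illustration.

```python
def resolve_pixel(samples):
    """Average the (r, g, b) colour samples stored for one pixel."""
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(3))


def resolve(framebuffer_samples):
    """Resolve a multisampled framebuffer into one colour per pixel.

    framebuffer_samples: rows of pixels, each pixel a list of samples.
    """
    return [[resolve_pixel(px) for px in row] for row in framebuffer_samples]


# A pixel straddling a white/black edge: two of four samples are covered.
edge = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(resolve_pixel(edge))  # (0.5, 0.5, 0.5)
```

The grey result for the edge pixel is what softens the staircase pattern that single-sample rendering produces.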
Therefore, the ROP is where the GPU's output is assembled into a bitmapped image ready for display.
In GPGPU computing, texture maps in 1, 2, or 3 dimensions may be used to store arbitrary data. Because the texture mapping unit interpolates between stored values, it provides a convenient means of approximating arbitrary functions with lookup tables.
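The idea can be mimicked in software with a 1D table and linear interpolation, the same operation a TMU performs with linear filtering and clamp-to-edge addressing. This is a minimal sketch under stated simplifications: samples sit at the table endpoints rather than at texel centres, and the names `make_table` and `lookup_linear` are hypothetical.

```python
import math


def make_table(f, lo, hi, n):
    """Sample f at n >= 2 evenly spaced points over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [f(lo + i * step) for i in range(n)]


def lookup_linear(table, lo, hi, x):
    """Approximate f(x) by linear interpolation, clamping x to [lo, hi]."""
    n = len(table)
    pos = (x - lo) / (hi - lo) * (n - 1)
    pos = min(max(pos, 0.0), n - 1.0)   # clamp-to-edge behaviour
    i = min(int(pos), n - 2)            # left neighbour index
    frac = pos - i
    return table[i] * (1 - frac) + table[i + 1] * frac


sine = make_table(math.sin, 0.0, math.pi, 257)
approx = lookup_linear(sine, 0.0, math.pi, 1.0)
print(abs(approx - math.sin(1.0)) < 1e-4)  # True
```

The approximation error shrinks with the square of the sample spacing, so a modest table can stand in for an expensive function when some error is acceptable.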