MATROX GRAPHICS INC.
Setup Engine DDK
VERSION 1.0
By Mathieu Raby
July 12, 1999
Table of Contents
List of figures
List of tables
1 Introduction
2 Setup Engine
3 Setting up the setup engine
3.1 Register list
3.2 First time initialization
3.3 Loading micro-code
4 Vertex format
4.1 Structure of a vertex
4.2 Talking to the WARPs
4.3 Support for lists, strips and fans
5 Pipes
5.1 Pipes combinations
5.2 Pipes parameters
Annex 1 - Warp initialization code
Annex 2 - Loading micro-codes in busmastering
Annex 3 - Loading micro-codes in Direct access
Figure 2-1 Basic block diagram of Matrox chip
Figure 4-1 Difference between busmastering methods of sending vertex information.
Figure 4-2 Sequence of cache location for triangle list, strip and fan
Table 3-1 Warp registers list
Table 5-1 Registers set by the WARP pipes
1 Introduction
This document explains how to set up the setup engine (WARP) of the G200/G400 accelerator chips. It first explains what the WARP is and how to set it up, then describes how to send information to the WARP micro-code.
This document is intended for people who want to write a driver for an alternate 3D interface that uses WARP engine acceleration. It complements the G200/G400 register specification. Differences between the G200 and the G400 are explained where necessary.
2 Setup Engine
This chapter describes what the setup engine is and what it is used for.
The setup engine, or W.A.R.P. (Windows Accelerator Rendering Pre-processor), is a device that pre-processes commands sent to the graphic engine (the graphic engine being the accelerator itself).
The driver can send information to the WARP (to be pre-processed), to the graphic engine (to be executed directly) and of course to the frame buffer. Information sent to the WARPs is toggled automatically between WARP0 and WARP1 by the input path controller. The output path controller serializes information sent to the graphic engine, which can come from the driver via the AGP or from the WARPs.
Figure 2-1 Basic block diagram of Matrox chip
There are two major advantages to using the WARP: it is completely programmable, and it can do complex computations in parallel with the graphic engine and the CPU.
The format and interpretation of the information sent to the WARP depend on the micro-code. For the purpose of 3D interfaces, the micro-codes are used to set up triangles in the graphic engine; hence, the information is interpreted as vertex information.
With the WARP running, the driver just has to set up the graphic engine with the type of primitive to draw (texture format, Z compare, alpha blend…) and send vertex information to the micro-codes.
The micro-codes (or pipes) distributed in this DDK are those in use in the current G200/G400 Direct3D/MCD/ICD drivers. These pipes can be the starting point of a driver that uses the WARP. Because each 3D interface is different, a completely optimized driver would need the WARP pipes to be redesigned for each specific use.
3 Setting up the setup engine
This chapter describes the register set that needs to be set up for the WARP to work properly.
The major difference between the WARP in the G200 and in the G400 is that there are 2 WARPs in the G400, which adds some synchronization registers. Aside from that, some new WARP instructions and hardware support for strips and fans have been added.
In the G400, the 2 WARPs are independent, which means that each has its own register set, ALU and instruction cache.
3.1 Register list
The registers concerning the WARP all begin with the letter W. The table below gives a short description. Refer to the full register specification for more detail.
Table 3-1 Warp registers list
Register name | Description
WACCEPTSEQ* | Used to support triangle lists, strips and fans.
WCODEADDR / WCODEADDR1 | In WARP cache operation INTERRUPT (see the WMISC register description, field WMASTER). Not used in current implementations.
WFLAG / WFLAG1 | ALU flags, CONFIG flags and PROGRAMMABLE flags for WARP0 and WARP1.
WGETMSB | Configuration register. This is always initialized with the value 0x000F0E00 for OpenGL and 0x000B0E00 for Direct3D. The difference between those two values is WGETMSB.centersnap, which controls the sub-pixel adjustment to the center of the pixel (OpenGL) or to the upper left (Direct3D).
WIADDR / WIADDR2 | This is the instruction pointer of the micro-code. It is also used to start/stop/suspend/resume the execution of the micro-code. WIADDR sets the WARP0 instruction pointer, while WIADDR2 sets both the WARP0 and WARP1 instruction pointers.
WIMEMADDR / WIMEMADDR1 | Used in conjunction with WIMEMDATA/WIMEMDATA1 when WMISC=00. Used to load the micro-code into the instruction memory.
WIMEMDATA / WIMEMDATA1 | Used in conjunction with WIMEMADDR/WIMEMADDR1 when WMISC=00. Used to load the micro-code into the instruction memory.
WMISC | Micro-code fetch operation mode. There are 3 operation modes.
WR0 to WR63 / WR64 to WR127 | These are the internal 32-bit registers of WARP0 and WARP1.
WVRTXSZ | Gives the size of each vertex and the size of the first primitive (see WACCEPTSEQ).
WBRKPTSTS / WBRKPTSTS1* | Status registers. Used to know whether the WARPs are frozen and whether they have finished processing.
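The WGETMSB entry in the register table above can be checked with a few lines of C. This is only a sketch: the two constants are the initialization values quoted in the table, while the helper names are hypothetical and not part of the DDK. The bit position it finds is derived purely from those two values, not from the register specification.

```c
#include <assert.h>
#include <stdint.h>

/* Initialization values quoted in the register table. */
#define WGETMSB_INIT_OPENGL   0x000F0E00u
#define WGETMSB_INIT_DIRECT3D 0x000B0E00u

/* Hypothetical helper: bits that differ between the two values. */
static uint32_t wgetmsb_centersnap_mask(void)
{
    uint32_t diff = WGETMSB_INIT_OPENGL ^ WGETMSB_INIT_DIRECT3D;

    /* Exactly one bit is expected to differ. */
    assert(diff != 0 && (diff & (diff - 1u)) == 0);
    return diff;
}

/* Hypothetical helper: pick the initialization value for an API. */
static uint32_t wgetmsb_init_value(int use_opengl)
{
    return use_opengl ? WGETMSB_INIT_OPENGL : WGETMSB_INIT_DIRECT3D;
}
```

The two values differ in a single bit (mask 0x00040000), which is consistent with the table's statement that only the centersnap setting changes between OpenGL and Direct3D.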
3.2 First time initialization
At first initialization, the following registers should be initialized: WIADDR2, WGETMSB, WVRTXSZ, WACCEPTSEQ and WMISC. Of those, only WMISC is a one-time initialization register. Normally, busmastering capability is known at load time and cannot change without rebooting the machine, so the WMISC value can be determined at load time. Values 0x00000008 and 0x00000003 are the most used. See Annex 1 - Warp initialization code for a code example.
3.3 Loading micro-code
The way to load the micro-code depends on WMISC. The 2 most used values are WMISC=3 (full busmastering) and WMISC=0 (direct access).
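The choice of WMISC value can be sketched as a small helper. The bit assignments below are assumptions, chosen only so that they agree with the values quoted in this document (mode 3 for full busmastering, mode 0 for direct access, and 0x00000008 for direct access with a cache flush as in Annex 1). The enum and function names are hypothetical; check the register specification before relying on any of them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit assignments (not confirmed by this document). */
#define WMISC_WUCODECACHE 0x00000001u
#define WMISC_WMASTER     0x00000002u
#define WMISC_WCACHEFLUSH 0x00000008u

typedef enum {
    WARP_AGP_BUSMASTER,
    WARP_PCI_BUSMASTER,
    WARP_PCI_DIRECTACCESS
} WarpBufferType;

/* Hypothetical helper: compute the WMISC value for a buffer type. */
static uint32_t warp_select_wmisc(WarpBufferType type, int flush_cache)
{
    uint32_t wmisc = 0;     /* mode 0: direct access */

    assert(type == WARP_AGP_BUSMASTER ||
           type == WARP_PCI_BUSMASTER ||
           type == WARP_PCI_DIRECTACCESS);

    if (type != WARP_PCI_DIRECTACCESS)
        wmisc = WMISC_WUCODECACHE | WMISC_WMASTER;  /* mode 3 */
    if (flush_cache)
        wmisc |= WMISC_WCACHEFLUSH;
    return wmisc;
}
```

Because busmastering capability is fixed at load time, a driver would typically call such a helper once and keep the result.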
In busmastering, each WARP fetches the micro-code itself. To do this, the driver has to stop the WARPs (WIADDR2.wmode to 00), set the new AGP address of the micro-code (WIADDR2.wiaddr) and restart the WARPs (WIADDR2.wmode to 11). In other words, the programming sequence to set a new micro-code in the WARPs contains 2 writes to the WIADDR2 register, one to stop the WARPs and one to restart them with the new address. A code example is given in Annex 2 - Loading micro-codes in busmastering.
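The two-write sequence can be sketched against a mock register so that it runs without hardware. The placement of wmode in the two low bits and the 8-byte alignment of the micro-code address are assumptions, as are all names below; the real field layout is in the register specification. The xfer_bits parameter stands in for the warpXferProtocol value used in Annex 2.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: wmode occupies the two low bits of WIADDR2. */
#define WIADDR_WMODE_SUSPEND 0x0u   /* wmode = 00 */
#define WIADDR_WMODE_START   0x3u   /* wmode = 11 */

/* Mock register: records the values written to WIADDR2. */
typedef struct {
    uint32_t writes[2];
    int      count;
} Wiaddr2Log;

static void write_wiaddr2(Wiaddr2Log *log, uint32_t value)
{
    assert(log->count < 2);
    log->writes[log->count++] = value;
}

/* Hypothetical helper: switch both WARPs to a new pipe. */
static void warp_set_pipe_busmaster(Wiaddr2Log *log, uint32_t ucode_addr,
                                    uint32_t xfer_bits)
{
    /* Assumption: the address leaves the low bits free for wmode. */
    assert((ucode_addr & 7u) == 0);

    write_wiaddr2(log, WIADDR_WMODE_SUSPEND);                        /* stop    */
    write_wiaddr2(log, ucode_addr | WIADDR_WMODE_START | xfer_bits); /* restart */
}
```

In a real driver, the other parameter registers and the DMAPAD write described in Annex 2 would sit between the two WIADDR2 writes.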
In direct access, the WARPs use the instruction cache as code memory. Since the WARPs will not fetch the micro-code, the driver must download it into the instruction cache. So, the driver stops the WARPs, downloads the micro-code, then restarts the WARPs (code example in Annex 3 - Loading micro-codes in Direct access). Registers WBRKPTSTS / WBRKPTSTS1 are used to know when the WARPs are idle.
To download the micro-code, registers WIMEMADDRx and WIMEMDATAx are used. Register WIMEMADDRx contains the address in the instruction cache where the micro-code will be put. These auto-increment registers hold addresses in units of 8 bytes, as the WARPs work in 64-bit instruction slices. Every 2 writes to WIMEMDATAx increment WIMEMADDRx by 1. The micro-code size is always 8-byte aligned. On the G400, each WARP has its own instruction cache, so the driver must download the micro-code into both of them.
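The auto-increment behavior can be illustrated with a small software model. This is a sketch under the assumptions stated in the paragraph above (one address step per two DWORD writes); the structure and function names are hypothetical, and the model tracks addresses only, not the instruction data.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of one WARP's WIMEMADDR/WIMEMDATA pair. */
typedef struct {
    uint32_t wimemaddr;   /* current address, in 8-byte instruction units */
    uint32_t pending;     /* DWORDs written since the last increment */
} WarpIMemModel;

/* Round a byte size up to the next multiple of 8. */
static uint32_t round_up_to_8(uint32_t size_bytes)
{
    return (size_bytes + 7u) & ~7u;
}

/* Model of one write to WIMEMDATA: every second write advances the address. */
static void model_write_wimemdata(WarpIMemModel *m, uint32_t dword)
{
    (void)dword;                       /* the model tracks addresses only */
    if (++m->pending == 2u) {
        m->pending = 0;
        m->wimemaddr++;
    }
}

/* Download 'size_bytes' of code starting at address 0; returns the final address. */
static uint32_t model_download(WarpIMemModel *m, const uint32_t *code,
                               uint32_t size_bytes)
{
    uint32_t i;

    assert((size_bytes & 7u) == 0);    /* micro-code is 8-byte aligned */
    m->wimemaddr = 0;
    m->pending = 0;
    for (i = 0; i < size_bytes / 4u; i++)
        model_write_wimemdata(m, code[i]);
    return m->wimemaddr;
}
```

For example, downloading 24 bytes (6 DWORDs) leaves the modeled address at 3, one step per 64-bit instruction.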
4 Vertex format
This chapter discusses the vertex format expected by the pipes, how to send information to the WARPs, and the support for triangle lists, triangle fans and triangle strips.
4.1 Structure of a vertex
The pipes receive vertex information based on the Direct3D TLVERTEX. To support the multiple texture stages of the G400, another format has been added to send the second set of texture coordinates.
typedef struct _COLOR {
BYTE Blue,
Green,
Red,
Alpha;
} COLOR, *LPCOLOR;
typedef struct _VERTEX {
float x; /* Screen coordinates */
float y;
float z;
float rhw; /* Reciprocal of homogeneous w */
COLOR color; /* Vertex color */
COLOR specular; /* Specular component of vertex. Alpha component is the fog factor */
float tu0; /* Texture coordinates for stage 0 */
float tv0;
} VERTEX, *LPVERTEX;
typedef struct _VERTEX2 {
float x; /* Screen coordinates */
float y;
float z;
float rhw; /* Reciprocal of homogeneous w */
DWORD color; /* Vertex color. From MSB to LSB: A, R, G, B */
DWORD specular; /* Specular component of vertex. From MSB to LSB: Fog, Rs, Gs, Bs */
float tu0; /* Texture coordinates for stage 0 */
float tv0;
float tu1; /* Texture coordinates for stage 1*/
float tv1;
} VERTEX2, *LPVERTEX2;
Depending on the pipe selected, one of these vertex formats will be used to send the triangle information. For the hardware to know what constitutes a vertex, the field WVRTXSZ.wvrtxsz must be set according to the size of the vertex in DWORDs (which would be 8 for VERTEX and 10 for VERTEX2).
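The DWORD counts quoted above can be verified from the structure layouts. The sketch below restates the two structures with fixed-width types (the type and function names are illustrative, not the DDK's) and assumes a 4-byte float with no structure padding, which holds on common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t blue, green, red, alpha; } Color;

/* Same field order as VERTEX. */
typedef struct {
    float x, y, z, rhw;
    Color color;
    Color specular;
    float tu0, tv0;
} Vertex;

/* Same field order as VERTEX2. */
typedef struct {
    float    x, y, z, rhw;
    uint32_t color;
    uint32_t specular;
    float    tu0, tv0;
    float    tu1, tv1;
} Vertex2;

/* Size of a vertex expressed in DWORDs. */
static unsigned vertex_dwords(size_t size_bytes)
{
    assert(size_bytes % 4u == 0);
    return (unsigned)(size_bytes / 4u);
}
```

Calling vertex_dwords(sizeof(Vertex)) gives 8 and vertex_dwords(sizeof(Vertex2)) gives 10, matching the figures in the text.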
4.2 Talking to the WARPs
In busmastering, there are 5 types of information that can be transmitted to the G200/G400, 2 of which are for the WARPs. In direct access, only 4 are provided, one of which is for the WARPs.
In busmastering, the first method consists of using the DMA VERTEX WRITE mode of register PRIMADDRESS or register SECADDRESS. The driver copies the vertices into an AGP buffer, which the G200/G400 busmastering reads and sends to the WARPs.
The second method consists of using the DMA VERTEX FIXED LENGTH SETUP LIST mode of register SETUPADDRESS. This method requires more consistent use of the AGP aperture throughout the transformation/lighting operations. It relies on the transform/lighting stage having already put the vertices in AGP buffers, so the driver does not need to do the copy operation. The driver only needs to send AGP pointers to the vertices. The G200/G400 busmastering then reads each vertex (using the size in WVRTXSZ.wvrtxsz).
In direct access, the only way to send information to the WARPs is to use the DMA VERTEX WRITE method of the OPMODE.dmamod field (like PRIMADDRESS and SECADDRESS in busmastering). In this method, the driver copies the vertex information into the 7k window of the G200/G400 (addresses 0-1bff).
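As a quick check of the window size, the address range 0-1bff covers 0x1C00 bytes, which is 7168 bytes or 7k. The sketch below also computes how many whole vertices of a given size fit in one pass through the window, assuming (this is not stated in the document) that the entire window is available for vertex data. The names are hypothetical.

```c
#include <assert.h>

#define WARP_WINDOW_FIRST 0x0000u
#define WARP_WINDOW_LAST  0x1BFFu

/* Size of the direct access window in bytes. */
static unsigned window_bytes(void)
{
    return WARP_WINDOW_LAST - WARP_WINDOW_FIRST + 1u;
}

/* Whole vertices that fit in the window for a vertex size given in DWORDs. */
static unsigned vertices_per_window(unsigned vertex_dwords)
{
    assert(vertex_dwords > 0);
    return window_bytes() / (vertex_dwords * 4u);
}
```

With these assumptions, the window holds 224 vertices of 8 DWORDs or 179 vertices of 10 DWORDs.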
In all cases, the DWORDs that are directed to the WARPs go through the input path controller (see Figure 2-1 Basic block diagram of Matrox chip).
Figure 4-1 Difference between busmastering methods of sending vertex information.
Figure note: In this figure, the primary DMA channel is set to GENERAL PURPOSE DMA, the secondary DMA channel SECADDRESS is set to DMA VERTEX WRITE and the secondary DMA channel SETUPADDRESS is set to DMA VERTEX FIXED LENGTH SETUP LIST. Vertices are all of the same size "n". The transform/lighting stage of the 3D interface fills the AGP buffers with vertices #1…#m. The driver copies vertices A and B from buffers received from the transform/lighting stage.
4.3 Support for lists, strips and fans
This section describes the support for primitive lists, strips and fans. Since there is no such hardware support in the G200, this section is relevant only to the G400.
The hardware contains a cache for 3 vertices. For each triangle, any one of those 3 locations can be replaced (or all of them). Also, the culling direction may differ between triangles. With 2 WARPs, this means that odd triangles will be culled on one WARP while even triangles will be culled by the other.
Now, let’s look at the 3 types of triangles (refer to Figure 4-2). For a triangle list, each triangle always consists of 3 new vertices. Also, the culling is always in the same direction.
For a triangle strip, the first primitive must receive 3 vertices, then 1 new vertex per triangle. The culling direction changes between each triangle. Cache allocation follows the sequence 123, 423, 453, 456… This sequence is reset between strips.
A triangle fan differs from the strip by the sequence of the cache locations: 123, 143, 145, 165…
Figure 4-2 Sequence of cache location for triangle list, strip and fan
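The cache sequences quoted in the text can be reproduced with a short routine. This is an illustrative model only: it assumes that a strip replaces cache locations 1, 2, 3 in rotation and that a fan keeps location 1 and alternates between locations 2 and 3, which is what the sequences 123, 423, 453, 456 and 123, 143, 145, 165 suggest. All names are hypothetical.

```c
#include <assert.h>

typedef enum { PRIM_LIST, PRIM_STRIP, PRIM_FAN } PrimType;

/* Fills slots[0..2] with the 1-based vertex numbers held in the three
 * cache locations when triangle 'tri' (0-based) is drawn. */
static void cache_contents(PrimType type, unsigned tri, unsigned slots[3])
{
    unsigned k;

    assert(slots != 0);

    if (type == PRIM_LIST) {            /* 3 new vertices per triangle */
        slots[0] = 3u * tri + 1u;
        slots[1] = 3u * tri + 2u;
        slots[2] = 3u * tri + 3u;
        return;
    }

    slots[0] = 1u; slots[1] = 2u; slots[2] = 3u;   /* first primitive */
    for (k = 1u; k <= tri; k++) {
        unsigned vertex = k + 3u;       /* one new vertex per triangle */
        unsigned slot = (type == PRIM_STRIP)
                      ? (k - 1u) % 3u           /* locations 1, 2, 3, 1, ... */
                      : 1u + (k - 1u) % 2u;     /* locations 2, 3, 2, ...    */
        slots[slot] = vertex;
    }
}

/* In a strip, the culling direction flips on every other triangle. */
static int strip_cull_reversed(unsigned tri)
{
    return (int)(tri & 1u);
}
```

For the fourth triangle (tri = 3), the model gives 4, 5, 6 for a strip and 1, 6, 5 for a fan, as in the sequences above.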
To set the correct sequence in the WARPs, the registers of interest are WACCEPTSEQ, WVRTXSZ.primsz and WFLAGx.Q. Register WACCEPTSEQ determines the sequence, the length of the sequence and the culling direction for each WARP. Field WVRTXSZ.primsz determines the size of the first primitive. Flag WFLAGx.Q is set automatically depending on the values in WACCEPTSEQ.wsametag and WACCEPTSEQ.firsttag. Flag WFLAGx.Q is used by the pipes to know whether the culling direction must be reversed. The sequence is reset when WACCEPTSEQ is reprogrammed.
The values to set in WACCEPTSEQ.seqdst depend entirely on the pipes. See section "5.2 - Pipes parameters" for values.
5 Pipes
This chapter describes the pipes that are in this DDK as well as the parameters needed to make them work properly.
5.1 Pipes combinations
The attributes supported by the graphic engine are triangle intensity (Gouraud interpolation of the red, green and blue channels), alpha channel, Z buffer, specular highlights, vertex fog, texture stage 0 and texture stage 1.
The pipes compute the graphic engine’s values for all those attributes in addition to the setup of the triangles themselves. Table 5-1 shows the list of programmed registers for each attribute.
Table 5-1 Registers set by the WARP pipes
Attribute | Registers
Gouraud intensity | DR4, DR6 and DR7 for the red component. DR8, DR10 and DR11 for the green component. DR12, DR14 and DR15 for the blue component.
Alpha channel | ALPHASTART, ALPHAXINC and ALPHAYINC.
Z buffer | DR0, DR2 and DR3.
Specular highlights | SPECRSTART, SPECRXINC and SPECRYINC for the red component. SPECGSTART, SPECGXINC and SPECGYINC for the green component. SPECBSTART, SPECBXINC and SPECBYINC for the blue component.
Vertex fog | FOGSTART, FOGXINC and FOGYINC.
Texture stages | TMR0, TMR1 and TMR6 for the S component. TMR2, TMR3 and TMR7 for the T component. TMR4, TMR5 and TMR8 for the Q component. And finally TEXHEIGHT and TEXWIDTH.
Edge registers | AR0, AR1, AR2, AR4, AR5, AR6, SGN, YDST, FXLEFT, FXRIGHT and LEN.
For optimum performance, more than one pipe is provided, to minimize unneeded computations. Those pipes are TGZ, TGZS, TGZA, TGZF, TGZSA, TGZSF, TGZAF, TGZSAF, T2GZ, T2GZS, T2GZA, T2GZF, T2GZSA, T2GZSF, T2GZAF and T2GZSAF.
Where:
T stands for computation for texture stage 0.
T2 stands for computation of both texture stage 0 and texture stage 1.
G stands for computation for triangle intensity (Gouraud interpolation).
Z stands for computation for Z buffer interpolation.
S stands for computation of specular highlight.
A stands for computation of the alpha channel.
F stands for computation of the vertex fog interpolation.
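The naming scheme above can be expressed as a small function that builds a pipe name from the set of optional features. This is a sketch only: the feature flags and the function are hypothetical, and how a real driver maps a name to the corresponding micro-code image is not covered here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature flags for the optional parts of a pipe. */
enum {
    PIPE_TEX1     = 1 << 0,   /* second texture stage (T2 instead of T) */
    PIPE_SPECULAR = 1 << 1,   /* S */
    PIPE_ALPHA    = 1 << 2,   /* A */
    PIPE_FOG      = 1 << 3    /* F */
};

/* Builds the pipe name; the longest name, "T2GZSAF", needs 8 bytes. */
static void pipe_name(unsigned features, char out[8])
{
    assert((features & ~0xFu) == 0);

    out[0] = '\0';
    strcat(out, (features & PIPE_TEX1) ? "T2" : "T");
    strcat(out, "GZ");                  /* G and Z are always computed */
    if (features & PIPE_SPECULAR) strcat(out, "S");
    if (features & PIPE_ALPHA)    strcat(out, "A");
    if (features & PIPE_FOG)      strcat(out, "F");
}
```

Two texture choices combined with three optional attributes give 2 x 2 x 2 x 2 = 16 names, which matches the 16 pipes listed above.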
5.2 Pipes parameters
There are some programmable parameters that the driver must set for all pipes. These parameters enable the rest of the hardware features to be used. Table 5-2 shows the list of parameters and their descriptions.
Table 5-2 Pipes parameters
Register | Description
WACCEPTSEQ | Triangle list=0x18000000. VERTEX structure: Strips=0x02010200, Fans=0x01000408. VERTEX2 structure: Strips=0x02020400, Fans=0x01000810. Reprogram when a new pipe is used and on each new strip or fan.
WFLAG.U / WFLAG1.U | Enable culling of the triangles.
WFLAG.F / WFLAG1.F | 0 to cull counter clockwise, 1 to cull clockwise.
WFLAG.F16 / WFLAG1.F16 | 0 to cull counter clockwise, 1 to cull clockwise.
WFLAG.F17 / WFLAG1.F17 | For texture stage 0. Set to 1 when a fix of the U coordinate is needed.
WFLAG.F18 / WFLAG1.F18 | For texture stage 0. Set to 1 when a fix of the V coordinate is needed.
WFLAG.F19 / WFLAG1.F19 | For texture stage 1. Set to 1 when a fix of the U coordinate is needed.
WFLAG.F20 / WFLAG1.F20 | For texture stage 1. Set to 1 when a fix of the V coordinate is needed.
WFLAG.F22 / WFLAG1.F22 | For texture stage 0. Set to 1 when both the U and V coordinates are in REPEAT mode.
WFLAG.F23 / WFLAG1.F23 | For texture stage 1. Set to 1 when both the U and V coordinates are in REPEAT mode.
WFLAG / WFLAG1 (all flags above) | Reprogram when a new pipe is used and on state change.
WVRTXSZ | VERTEX structure: 0x00000187. VERTEX2 structure: 0x000001E9. Reprogram when a new vertex size is used.
WR56 | This register is set to 12800.0f and is reprogrammed when a new pipe is used.
WR49 and WR57 | For texture stage 0. Those registers must be set to 0. Reprogram on stage change.
WR54 | For texture stage 0. Value of the TEXWIDTH register or’ed with 0x40. Reprogram on stage change.
WR62 | For texture stage 0. Value of the TEXHEIGHT register or’ed with 0x40. Reprogram on stage change.
WR53 and WR61 | For texture stage 1. Those registers must be set to 0. Reprogram on stage change.
WR52 | For texture stage 1. Value of the TEXWIDTH register or’ed with 0x40. Reprogram on stage change.
WR60 | For texture stage 1. Value of the TEXHEIGHT register or’ed with 0x40. Reprogram on stage change.
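Two of the parameters above are easy to get wrong in code: WR56 takes the bit pattern of the float 12800.0f, not the integer 12800, and the texture size parameters are the TEXWIDTH/TEXHEIGHT values with 0x40 or'ed in. The sketch below shows both, assuming IEEE-754 single-precision floats. The helper names are hypothetical, and memcpy is used here in place of the pointer cast seen in the annexes because it avoids aliasing problems on modern compilers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the 32-bit pattern of a float, to be written to a WARP register. */
static uint32_t float_bits(float f)
{
    uint32_t bits;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Value for WR54/WR62 (stage 0) or WR52/WR60 (stage 1), derived from the
 * value programmed in TEXWIDTH or TEXHEIGHT. */
static uint32_t warp_tex_dim_param(uint32_t tex_reg_value)
{
    return tex_reg_value | 0x40u;
}
```

On an IEEE-754 platform, float_bits(12800.0f) yields 0x46480000, which is the DWORD that ends up in WR56.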
Annex 1 - Warp initialization code
void WarpInit ( void )
{
    DWORD wmisc = 0;    // WMISC value, selected below from warpBufferType

    // Check if the board can accept register accesses (FIFOSTATUS register).
    // Five registers are written below.
    //
    CheckFifoSpace(5);

    // First stop the Warp in case it's running.
    //
    WriteDword( WIADDR2, WIADDR_WMODE_SUSPEND );

    // Set the GetMSB instruction limits, and initialize the vertex size to 7.
    // WGETMSB_MIN is 0. WGETMSB_MAX is 14.
    //
    WriteDword( WGETMSB, WGETMSB_WFASTCROP |
                         WGETMSB_WBRKLEFTTOP |
                         WGETMSB_WBRKRIGHTTOP |
                         WGETMSB_MIN |
                         (WGETMSB_MAX << WGETMSB_WGETMSBMAX_SHIFT) );
    WriteDword( WVRTXSZ, (24 << WVRTXSZ_PRIMSZ_SHIFT) | 7 );
    WriteDword( WACCEPTSEQ, WACCEPTSEQ_SEQOFF_TRUE );

    // busmastering microcode upload (source is agp)
    //
    if ( warpBufferType == WARP_AGP_BUSMASTER )
        wmisc = WMISC_WUCODECACHE | WMISC_WMASTER | WMISC_WCACHEFLUSH;

    // busmastering microcode upload (source is pci locked)
    //
    if ( warpBufferType == WARP_PCI_BUSMASTER )
        wmisc = WMISC_WUCODECACHE | WMISC_WMASTER | WMISC_WCACHEFLUSH;

    // direct access microcode upload
    //
    if ( warpBufferType == WARP_PCI_DIRECTACCESS )
        wmisc = WMISC_WCACHEFLUSH;

    WriteDword( WMISC, wmisc );
}
Annex 2 - Loading micro-codes in busmastering
This function "loads" the micro-code into the WARP. This is not really "loading" since, in busmastering, the WARP fetches the micro-code itself; the function merely sets the WIADDR2 register.
In this function, register WVRTXSZ is reprogrammed since it can only change when the micro-code changes (driver architecture). Register WR56 is also reset.
In busmastering, the DMAPAD register must be set before restarting the WARPs (hardware bug).
Parameter pContext identifies the Direct3D context.
Parameter pwarpcode is the AGP address of the pipe to be used by both WARPs.
VOID WarpWriteCodeBM ( D3DCONTEXT *pContext, void *pwarpcode)
{
DMA_INTERFACE * pDMA;
float fParam= 12800.0f;
// Get a DMA buffer for the warp micro-code
//
pDMA = pContext->pDMA;
// Check for room in the DMA buffer
//
CheckCurBufferSpace( pDMA, 2*DMA_PACKET_SIZE );
// Stop the warp + program warp setup + resume warp
//
*(pDMA->Cur + 0) = DMA_TAG( WIADDR2, WVRTXSZ, WR56, DMAPAD); // tag order must match the data order below
*(pDMA->Cur + 1) = WIADDR_WMODE_SUSPEND;
*(pDMA->Cur + 2) = pContext->hwWVRTXSZ;
*(pDMA->Cur + 3) = *((DWORD*)(& fParam));
WriteToDMAPAD(pDMA->Cur + 4);
*(pDMA->Cur + 5) = DMA_TAG( DMAPAD, DMAPAD, DMAPAD, WIADDR2);
WriteToDMAPAD(pDMA->Cur + 6);
WriteToDMAPAD(pDMA->Cur + 7);
WriteToDMAPAD(pDMA->Cur + 8);
*(pDMA->Cur + 9) = ((DWORD) pwarpcode ) | WIADDR_WMODE_START | warpXferProtocol;
pDMA->Cur += 10;
}
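The function above writes two packets, each made of one tag DWORD followed by four data DWORDs, where each data DWORD goes to the register named at the same position in the tag. The sketch below illustrates that layout only. The packing of four 8-bit register indices into the tag, with the first register in the lowest byte, is an assumption, and dma_tag and dma_emit_packet are hypothetical stand-ins for the driver's DMA_TAG macro and buffer handling.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_PACKET_DWORDS 5u   /* 1 tag + 4 data */

/* Assumed tag layout: four 8-bit register indices, first register lowest. */
static uint32_t dma_tag(uint8_t r0, uint8_t r1, uint8_t r2, uint8_t r3)
{
    return (uint32_t)r0 |
           ((uint32_t)r1 << 8) |
           ((uint32_t)r2 << 16) |
           ((uint32_t)r3 << 24);
}

/* Writes one packet at 'cur' and returns the position after it.
 * data[i] is the value for the i-th register named in the tag. */
static uint32_t *dma_emit_packet(uint32_t *cur, uint32_t tag,
                                 const uint32_t data[4])
{
    unsigned i;

    assert(cur != 0 && data != 0);
    cur[0] = tag;
    for (i = 0; i < 4u; i++)
        cur[1u + i] = data[i];
    return cur + DMA_PACKET_DWORDS;
}
```

Keeping the tag and its data together in one helper makes it harder for the register order in the tag and the order of the data DWORDs to drift apart.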
Annex 3 - Loading micro-codes in Direct access
This function loads the micro-code into the WARP.
In this function, register WVRTXSZ is reprogrammed since it can only change when the micro-code changes (driver architecture). Register WR56 is also reset.
Parameter pContext identifies the Direct3D context.
Parameter pvwarpcode is the linear address of the pipe to be used by both WARPs.
Parameter warpcodesize is the size of the pipe to load.
Global variable "warpCurrentPipe" contains the last pipe that has been sent. It is set outside this function.
VOID WarpWriteCodeDA ( D3DCONTEXT *pContext, void *pvwarpcode, DWORD warpcodesize )
{
DWORD i;
DWORD *pwarpcode;
float fParam = 12800.0f;
// microcode to be downloaded
//
pwarpcode = (DWORD *) pvwarpcode;
// If we are in busmastering and we are using this function, we MUST sync the HW and
// break the Bus Mastering to make sure not to mix busmastering and direct access
//
SendCurBuffer( pContext->pDMA );
if ( IS_SUPPORTED(BusMastering) )
SyncHw();
// Upload the warp code only if we really need to!
//
if (pwarpcode != warpCurrentPipe)
{
// Here, we must reload the WARPs micro-code.
//
// Suspend the warp engine
//
CheckFifoSpace(1);
WriteDword (WIADDR2, WIADDR_WMODE_SUSPEND );
// Wait for warp idle state (for the 2 warps...)
//
PollDword( WBRKPTSTS, WBRKPTSTS_WARPSTS_IDLE, WBRKPTSTS_WARPSTS_MASK );
PollDword( WBRKPTSTS1, WBRKPTSTS_WARPSTS_IDLE, WBRKPTSTS_WARPSTS_MASK );
// Modify the warp parameters (four register writes follow before the next FIFO check)
//
CheckFifoSpace(4);
WriteDword (WVRTXSZ, pContext->hwWVRTXSZ );
WriteDword (WR56, *((DWORD*)(&fParam)) );
// Program the microcode cache address
//
WriteDword (WIMEMADDR, 0 );
WriteDword (WIMEMADDR1, 0 );
// Two accesses to WIMEMDATA are required for one complete instruction
//
warpcodesize = (warpcodesize + 7) & (~7);
// Download the microcode (64 bits, or 8 bytes, at a time) in chunks of 16 DWORDs.
// Full chunks are sent in the loop; the last chunk is handled after it.
//
while ( warpcodesize > (ECLIPSE_FIFO_SIZE*4) )
{
// Send to the first WARP
CheckFifoSpace(ECLIPSE_FIFO_SIZE);
for (i=0; i < ECLIPSE_FIFO_SIZE; i++)
WriteDword (WIMEMDATA, pwarpcode[i] );
// Send to the second WARP
CheckFifoSpace(ECLIPSE_FIFO_SIZE);
for (i=0; i < ECLIPSE_FIFO_SIZE; i++)
WriteDword (WIMEMDATA1, *pwarpcode++ );
warpcodesize -= (ECLIPSE_FIFO_SIZE*4);
}
// warpcodesize now holds the size in bytes of the last chunk (0 if nothing is left)
if (warpcodesize != 0 )
{
// Send to the first WARP
CheckFifoSpace(ECLIPSE_FIFO_SIZE);
for (i=0; i < (warpcodesize/4); i++)
WriteDword (WIMEMDATA, pwarpcode[i] );
// Send to the second WARP
CheckFifoSpace(ECLIPSE_FIFO_SIZE);
for (i=0; i < (warpcodesize/4); i++)
WriteDword (WIMEMDATA1, *pwarpcode++ );
}
// resume the warp
CheckFifoSpace(1);
WriteDword (WIADDR2, WIADDR_WMODE_START );
}
else
{
// Only stop/start the WARPs since it is the same pipe.
//
// Stop the warp
//
CheckFifoSpace( 1 );
WriteDword (WIADDR2, WIADDR_WMODE_SUSPEND );
// Wait for warp idle state (for the 2 warps...)
//
PollDword( WBRKPTSTS, WBRKPTSTS_WARPSTS_IDLE, WBRKPTSTS_WARPSTS_MASK );
PollDword( WBRKPTSTS1, WBRKPTSTS_WARPSTS_IDLE, WBRKPTSTS_WARPSTS_MASK );
// Modify the parameters
//
CheckFifoSpace( 3 );
WriteDword (WVRTXSZ, pContext->hwWVRTXSZ );
WriteDword (WR56, *((DWORD*)(&fParam)) );
// Resume the warp
WriteDword (WIADDR2, WIADDR_WMODE_START );
}
}