Photo by sskennel.

Today’s Question & Answer session comes to us courtesy of SuperUser, a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.

The Question

SuperUser reader Sathya posed the question:

Here you can see a screenshot of a small C++ program called Triangle.exe with a rotating triangle based on the OpenGL API.

Admittedly a very basic example but I think it’s applicable to other graphic cards operations.

I was just curious and wanted to know the whole process from double clicking on Triangle.exe under Windows XP until I can see the triangle rotating on the monitor. What happens, how do CPU (which first handles the .exe) and GPU (which finally outputs the triangle on the screen) interact?

I guess involved in displaying this rotating triangle is primarily the following hardware/software among others:

Hardware

HDD, System Memory (RAM), CPU, Video memory, GPU, LCD display

Software

Operating System, DirectX/OpenGL API, Nvidia Driver

Can anyone explain the process, maybe with some sort of flow chart for illustration?

It should not be a complex explanation that covers every single step (guess that would go beyond the scope), but an explanation an intermediate IT guy can follow.

I’m pretty sure a lot of people that would even call themselves IT professionals could not describe this process correctly.

The Answer

Although multiple community members answered the question, Oliver Salzburg went the extra mile and answered it not only with a detailed response but also with excellent accompanying graphics.

Image by JasonC, available as wallpaper here.

He writes:

I decided to write a bit about the programming aspect and how components talk to each other. Maybe it’ll shed some light on certain areas.

The Presentation

What does it take to even have that single image, that you posted in your question, drawn on the screen?

There are many ways to draw a triangle on the screen. For simplicity, let’s assume no vertex buffers were used. (A vertex buffer is an area of memory where you store coordinates.) Let’s assume the program simply told the graphics processing pipeline about every single vertex (a vertex is just a coordinate in space) in a row.

But, before we can draw anything, we first have to run some scaffolding. We’ll see why later:
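As a minimal sketch of what such per-frame code could look like in legacy (immediate-mode) OpenGL: clear the buffers, load the identity matrix, then submit three colored vertices. The call names and argument lists follow the OpenGL API, but the gl* functions defined at the top of this listing are hypothetical logging stand-ins, present only so that the sketch compiles without a graphics context. The vertex coordinates, the colors and the drawFrame name are illustrative assumptions as well.

```cpp
#include <cassert>
#include <string>
#include <vector>

// --- Hypothetical stand-ins (NOT the real OpenGL library) ------------------
// They mirror the names and argument lists of the legacy OpenGL calls and
// only record which call was made. In a real program, remove this section
// and include the platform's OpenGL header instead.
constexpr unsigned GL_DEPTH_BUFFER_BIT = 0x00000100;
constexpr unsigned GL_COLOR_BUFFER_BIT = 0x00004000;
constexpr unsigned GL_TRIANGLES        = 0x0004;

std::vector<std::string> callLog;  // filled by the stand-ins below

void glClear(unsigned)               { callLog.push_back("glClear"); }
void glLoadIdentity()                { callLog.push_back("glLoadIdentity"); }
void glBegin(unsigned mode) {
    assert(mode == GL_TRIANGLES);    // this sketch only draws triangles
    callLog.push_back("glBegin");
}
void glColor3f(float, float, float)  { callLog.push_back("glColor3f"); }
void glVertex3f(float, float, float) { callLog.push_back("glVertex3f"); }
void glEnd()                         { callLog.push_back("glEnd"); }
// ---------------------------------------------------------------------------

// Work done once per frame (drawFrame is an illustrative name).
void drawFrame() {
    // Clear the color buffer and the depth buffer.
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // Reset the current matrix to the identity (no transformation).
    glLoadIdentity();

    // Start a list of triangles; every 3 vertices form one triangle.
    glBegin(GL_TRIANGLES);
    glColor3f(1.0f, 0.0f, 0.0f);     // red
    glVertex3f(0.0f, 1.0f, 0.0f);    // top corner
    glColor3f(0.0f, 1.0f, 0.0f);     // green
    glVertex3f(-1.0f, -1.0f, 0.0f);  // bottom-left corner
    glColor3f(0.0f, 0.0f, 1.0f);     // blue
    glVertex3f(1.0f, -1.0f, 0.0f);   // bottom-right corner
    glEnd();                         // done with this list of triangles
}
```

Calling drawFrame() once per frame would issue ten calls in total: one clear, one matrix reset, glBegin, three color/vertex pairs and glEnd.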

So what did that do?

When you write a program that wants to use the graphics card, you’ll usually pick some kind of interface to the driver. Some well known interfaces to the driver are:

OpenGL, Direct3D, CUDA

For this example we’ll stick with OpenGL. Now, your interface to the driver is what gives you all the tools you need to make your program talk to the graphics card (or the driver, which then talks to the card).

This interface is bound to give you certain tools. These tools take the shape of an API which you can call from your program.

That API is what we see being used in the example above. Let’s take a closer look.

The Scaffolding

Before you can really do any actual drawing, you’ll have to perform a setup. You have to define your viewport (the area that will actually be rendered), your perspective (the camera into your world), what anti-aliasing you will be using (to smooth out the edges of your triangle)…

But we won’t look at any of that. We’ll just take a peek at the stuff you’ll have to do every frame. Like:

Clearing the screen

The graphics pipeline is not going to clear the screen for you every frame. You’ll have to tell it to. Why?

If you don’t clear the screen, you’ll simply draw over it every frame. That’s why we call glClear with the GL_COLOR_BUFFER_BIT set. The other bit (GL_DEPTH_BUFFER_BIT) tells OpenGL to clear the depth buffer. This buffer is used to determine which pixels are in front of (or behind) other pixels.

Transformation


Transformation is the part where we take all the input coordinates (the vertices of our triangle) and apply our ModelView matrix. This is the matrix that describes how our model (the vertices) is rotated, scaled, and translated (moved).

Next, we apply our Projection matrix. This moves all coordinates so that they face our camera correctly.

Now we transform once more, with our Viewport matrix. We do this to scale our model to the size of our viewport on the monitor. Now we have a set of vertices that are ready to be rendered!
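To make the last step concrete, here is a minimal sketch of the viewport mapping, assuming the vertex has already been reduced to normalized device coordinates (x and y each in the range -1 to 1). The names WindowPoint and toWindow are hypothetical. OpenGL performs this mapping itself once the viewport has been defined, so this only illustrates the arithmetic.

```cpp
#include <cassert>

struct WindowPoint { float x; float y; };

// Map normalized device coordinates (each in [-1, 1]) to window pixels for a
// viewport whose lower-left corner is at (originX, originY).
WindowPoint toWindow(float ndcX, float ndcY,
                     int originX, int originY, int width, int height) {
    assert(width > 0 && height > 0);
    return { (ndcX + 1.0f) * 0.5f * width  + originX,
             (ndcY + 1.0f) * 0.5f * height + originY };
}
```

With an 800 by 600 viewport at the origin, the center of the scene (0, 0) ends up at pixel (400, 300), and the corner (-1, -1) ends up at (0, 0).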

We’ll come back to transformation a bit later.

Drawing

To draw a triangle, we can simply tell OpenGL to start a new list of triangles by calling glBegin with the GL_TRIANGLES constant. There are also other forms you can draw, like a triangle strip or a triangle fan. These are primarily optimizations, as they require less communication between the CPU and the GPU to draw the same number of triangles.
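The saving can be put into numbers with a small sketch. Independent triangles need three vertices each, while in a strip or a fan the first triangle needs three vertices and every further triangle reuses two earlier ones and adds only one. Both function names are hypothetical.

```cpp
#include <cassert>

// Vertices submitted for n independent triangles (a plain triangle list).
int verticesForTriangleList(int triangles) {
    assert(triangles >= 0);
    return 3 * triangles;
}

// Vertices submitted for n connected triangles in a strip or a fan:
// three for the first triangle, one more for each additional triangle.
int verticesForStripOrFan(int triangles) {
    assert(triangles >= 0);
    return triangles == 0 ? 0 : triangles + 2;
}
```

For 100 connected triangles that is 300 vertices as a list versus 102 as a strip.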

After that, we can provide a list of sets of 3 vertices which should make up each triangle. Every vertex uses 3 coordinates (as we’re in 3D space). Additionally, I also provide a color for each vertex, by calling glColor3f before calling glVertex3f.

The shade between the 3 vertices (the 3 corners of the triangle) is calculated by OpenGL automatically. It will interpolate the color over the whole face of the polygon.
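A rough sketch of that interpolation, assuming a plain linear blend with barycentric weights (three weights that sum to 1 and say how close a pixel is to each corner). Real hardware also corrects for perspective, which is left out here. The Color struct and the interpolate function are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Color { float r, g, b; };

// Blend the three corner colors with barycentric weights w0 + w1 + w2 == 1.
Color interpolate(const Color& c0, const Color& c1, const Color& c2,
                  float w0, float w1, float w2) {
    assert(std::fabs(w0 + w1 + w2 - 1.0f) < 1e-5f);
    return { w0 * c0.r + w1 * c1.r + w2 * c2.r,
             w0 * c0.g + w1 * c1.g + w2 * c2.g,
             w0 * c0.b + w1 * c1.b + w2 * c2.b };
}
```

With red, green and blue corners, a pixel exactly on the red corner (weights 1, 0, 0) stays pure red, and a pixel halfway along the red-green edge (weights 0.5, 0.5, 0) gets half red and half green.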

Interaction

Now, when you click the window, the application only has to capture the window message that signals the click. Then you can run any action in your program you want.

This gets a lot more difficult once you want to start interacting with your 3D scene.

You first have to clearly know at which pixel the user clicked the window. Then, taking your perspective into account, you can calculate the direction of a ray, from the point of the mouse click into your scene. You can then calculate if any object in your scene intersects with that ray. Now you know if the user clicked an object.
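The intersection step can be sketched as follows. The answer does not say which test to use, so this assumes the Möller-Trumbore ray/triangle algorithm, one common choice. Turning the clicked pixel into a ray origin and direction is not shown. Vec3 and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Moeller-Trumbore test: true if the ray origin + t * dir (with t > 0)
// passes through the triangle (v0, v1, v2).
bool rayHitsTriangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2) {
    assert(dot(dir, dir) > 0.0f);             // the ray needs a direction
    const float eps = 1e-6f;
    const Vec3 e1 = sub(v1, v0);
    const Vec3 e2 = sub(v2, v0);
    const Vec3 p = cross(dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;   // ray is parallel to the triangle
    const float inv = 1.0f / det;
    const Vec3 s = sub(origin, v0);
    const float u = dot(s, p) * inv;          // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    const float v = dot(dir, q) * inv;        // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    const float t = dot(e2, q) * inv;         // distance along the ray
    return t > eps;                           // hit must lie in front of the origin
}
```

A ray that starts in front of the triangle and points at it reports a hit. The same ray pointing away, or one that passes beside the triangle, does not.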

So, how do you make it rotate?

Transformation

I am aware of two types of transformations that are generally applied:

Matrix-based transformation, Bone-based transformation

The difference is that bones affect single vertices. Matrices always affect all drawn vertices in the same way. Let’s look at an example.

Example

Earlier, we loaded our identity matrix before drawing our triangle. The identity matrix is one that simply provides no transformation at all. So, whatever I draw, is only affected by my perspective. So, the triangle will not be rotated at all.

If I want to rotate it now, I could either do the math myself (on the CPU) and simply call glVertex3f with other coordinates (that are rotated). Or I could let the GPU do all the work, by calling glRotatef with a rotation angle, amount, and a rotation axis before drawing.

amount is, of course, just a fixed value. If you want to animate, you’ll have to keep track of amount and increase it every frame.
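Keeping track of amount could look something like this sketch. It advances the angle by elapsed time rather than by a fixed step per frame, which is a design choice of this example and not something the answer prescribes. advanceAngle and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Advance the rotation angle (in degrees) by speed * elapsed time and wrap
// the result into the range [0, 360).
float advanceAngle(float amount, float degreesPerSecond, float elapsedSeconds) {
    assert(elapsedSeconds >= 0.0f);
    const float next = std::fmod(amount + degreesPerSecond * elapsedSeconds, 360.0f);
    return next < 0.0f ? next + 360.0f : next;
}

// Per frame, a program might then do (the axis 0, 1, 0 is an assumption):
//   amount = advanceAngle(amount, 90.0f, secondsSinceLastFrame);
//   glRotatef(amount, 0.0f, 1.0f, 0.0f);
//   ... draw the triangle ...
```

Tying the angle to elapsed time keeps the rotation speed the same on fast and slow machines. Adding a fixed value every frame would make the triangle spin faster at higher frame rates.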

So, wait, what happened to all the matrix talk earlier?

In this simple example, we don’t have to care about matrices. We simply call glRotatef and it takes care of all that for us.

Well, thanks for that!

\[
\begin{pmatrix}
x^{2}(1-c)+c & xy(1-c)-zs & xz(1-c)+ys & 0 \\
yx(1-c)+zs & y^{2}(1-c)+c & yz(1-c)-xs & 0 \\
xz(1-c)-ys & yz(1-c)+xs & z^{2}(1-c)+c & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]
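For the curious, the matrix above can be built by hand. In it, c is the cosine and s the sine of the rotation angle, and (x, y, z) is the rotation axis scaled to length 1. The following is a sketch of doing that math on the CPU. It is not how any particular driver implements glRotatef, and Mat4, rotationMatrix and rotatePoint are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][column]

// Build the rotation matrix for an angle in degrees around the axis (x, y, z).
Mat4 rotationMatrix(float angleDegrees, float x, float y, float z) {
    const float len = std::sqrt(x * x + y * y + z * z);
    assert(len > 0.0f);              // a zero-length axis has no direction
    x /= len; y /= len; z /= len;    // the formula expects a unit-length axis
    const float rad = angleDegrees * 3.14159265358979f / 180.0f;
    const float c = std::cos(rad);
    const float s = std::sin(rad);
    const float t = 1.0f - c;
    Mat4 m{};
    m[0] = { x * x * t + c,     x * y * t - z * s, x * z * t + y * s, 0.0f };
    m[1] = { y * x * t + z * s, y * y * t + c,     y * z * t - x * s, 0.0f };
    m[2] = { x * z * t - y * s, y * z * t + x * s, z * z * t + c,     0.0f };
    m[3] = { 0.0f, 0.0f, 0.0f, 1.0f };
    return m;
}

// Apply the rotation to the point (px, py, pz); returns {x', y', z'}.
std::array<float, 3> rotatePoint(const Mat4& m, float px, float py, float pz) {
    return { m[0][0] * px + m[0][1] * py + m[0][2] * pz,
             m[1][0] * px + m[1][1] * py + m[1][2] * pz,
             m[2][0] * px + m[2][1] * py + m[2][2] * pz };
}
```

Rotating the point (1, 0, 0) by 90 degrees around the y axis (0, 1, 0) lands, up to rounding, on (0, 0, -1), and a rotation by 0 degrees leaves every point where it was.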

Conclusion

What becomes obvious is, there’s a lot of talk to OpenGL. But it’s not telling us anything. Where is the communication?

The only thing that OpenGL is telling us in this example is when it’s done. Every operation will take a certain amount of time. Some operations take incredibly long, others are incredibly quick.

Sending a vertex to the GPU will be so fast, I wouldn’t even know how to express it. Sending thousands of vertices from the CPU to the GPU, every single frame, is, most likely, no issue at all.

Clearing the screen can take a millisecond or worse (keep in mind, you usually only have about 16 milliseconds to draw each frame if you are targeting 60 frames per second), depending on how large your viewport is. To clear it, OpenGL has to draw every single pixel in the color you want to clear to, and that could be millions of pixels.
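Both numbers are easy to check with a little arithmetic. The sketch below assumes a target of 60 frames per second and a 1920 by 1080 viewport purely as examples. The function names are hypothetical.

```cpp
#include <cassert>

// Milliseconds available to draw one frame at a given frame rate.
double frameBudgetMs(double framesPerSecond) {
    assert(framesPerSecond > 0.0);
    return 1000.0 / framesPerSecond;
}

// Number of pixels a full clear of the viewport has to write.
long long pixelsToClear(int width, int height) {
    assert(width >= 0 && height >= 0);
    return static_cast<long long>(width) * height;
}
```

At 60 frames per second the budget is 1000 / 60, or roughly 16.7 milliseconds per frame, and a 1920 by 1080 viewport holds 2,073,600 pixels.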

Other than that, we can pretty much only ask OpenGL about the capabilities of our graphics adapter (max resolution, max anti-aliasing, max color depth, …).

But we can also fill a texture with pixels that each have a specific color. Each pixel thus holds a value and the texture is a giant “file” filled with data. We can load that into the graphics card (by creating a texture buffer), then load a shader, tell that shader to use our texture as an input and run some extremely heavy calculations on our “file”.

We can then “render” the result of our computation (in the form of new colors) into a new texture.
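The idea of "texture in, computation, texture out" can be imitated on the CPU to show the data flow. This is only a sketch: on a real GPU the per-texel function would be a shader running on many texels in parallel, whereas here it is an ordinary loop. Texel, Texture, shade and runPass are hypothetical names, and squaring the color channels is an arbitrary stand-in for the "heavy calculation".

```cpp
#include <cassert>
#include <vector>

struct Texel { float r, g, b, a; };
using Texture = std::vector<Texel>;  // the "file" of data, one entry per pixel

// Stand-in for a shader: invoked once for every texel of the input.
Texel shade(const Texel& in) {
    return { in.r * in.r, in.g * in.g, in.b * in.b, in.a };
}

// "Render" the input texture into a new texture of the same size.
Texture runPass(const Texture& input) {
    Texture output;
    output.reserve(input.size());
    for (const Texel& t : input) {
        output.push_back(shade(t));
    }
    assert(output.size() == input.size());
    return output;
}
```

The output texture holds the results in the same layout as the input, so it can be read back or fed into another pass.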

That’s how you can make the GPU work for you in other ways. I assume CUDA works similarly in that respect, but I never had the opportunity to work with it.

We really only slightly touched the whole subject. 3D graphics programming is a hell of a beast.


Have something to add to the explanation? Sound off in the comments. Want to read more answers from other tech-savvy Stack Exchange users? Check out the full discussion thread here.