Help by OpenGL expert kindly requested

D2X-XL - Descent II update for modern systems with many new features and enhanced graphics.

Moderators: Grendel, Aus-RED-5

Diedel
D2X Master
Posts: 5278
Joined: Thu Nov 05, 1998 12:01 pm

Post by Diedel »

As you may have noticed, frame rate drops dramatically in D2X-XL, particularly when pixel shaders are used (for lightmaps or hi-res textures).

I am not an OpenGL expert, and I'd appreciate it very much if someone who knows something about this subject could take a look at D2X-XL (I'd gladly assist where I can) and give me hints on where and how to improve performance.
pATCheS
DBB Ace
Posts: 187
Joined: Sat Apr 03, 2004 9:12 pm

Post by pATCheS »

Post by Diedel »

The code isn't buggy; rather, it is conceptually flawed.
Iceman
DBB Habitual Type Killer
Posts: 4929
Joined: Thu Apr 20, 2000 2:01 am
Location: Huntsville, AL. USA

Post by Iceman »

gDEBugger is more than a debugger. It is also a native OpenGL optimization tool. It can help you identify the problem areas of code so that you can concentrate on them.

Post by Diedel »

I tried it, but it's of no use to me.

As I said, the problems are rather conceptual.

Post by pATCheS »

That program can help you identify what calls are expensive, as well as find and remove redundancies. You can also try some shader performance analysis tools and see what's slowing things down. nVidia has some excellent stuff in that department, but I don't know if it'll run on ATI hardware.

A fatal flaw D2X has in general is inefficient management of state. State changes have associated costs, and the most expensive state change by far is changing to a different texture. D2X changes textures pretty much for every single face drawn. Not a particularly good way to go about things.

The easiest way to fix D2X-XL's OpenGL shortcomings would be to render geometry and corresponding state information into a buffer and process/sort the buffer appropriately before sending it off for rendering. This would also allow aggregation of vertex data into arrays so that you can use the array draw functions (glDrawArrays/glDrawElements), which is much better than individual calls to the glVertex functions.
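A minimal sketch of the buffering idea described above, in plain C++ with no OpenGL calls so it stands on its own. The names `QueuedFace`, `Batch` and `buildBatches` are hypothetical and not taken from the D2X-XL source; the sketch assumes every face is a quad with three floats per corner.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one level face queued for drawing.
struct QueuedFace {
    unsigned texture;   // texture handle the face needs bound
    float    verts[12]; // four corners, xyz each
};

// One run of faces sharing a texture: a candidate for a single array draw.
struct Batch {
    unsigned    texture;
    std::size_t firstVertex; // index of the batch's first vertex
    std::size_t vertexCount; // number of vertices in the batch
};

// Sorts the queue by texture, packs all vertices into one contiguous
// array, and returns one batch per distinct texture.
std::vector<Batch> buildBatches(std::vector<QueuedFace>& queue,
                                std::vector<float>& vertexArray) {
    std::stable_sort(queue.begin(), queue.end(),
                     [](const QueuedFace& a, const QueuedFace& b) {
                         return a.texture < b.texture;
                     });
    std::vector<Batch> batches;
    vertexArray.clear();
    for (const QueuedFace& f : queue) {
        // Start a new batch whenever the texture changes.
        if (batches.empty() || batches.back().texture != f.texture)
            batches.push_back({f.texture, vertexArray.size() / 3, 0});
        vertexArray.insert(vertexArray.end(), f.verts, f.verts + 12);
        batches.back().vertexCount += 4;
    }
    return batches;
}
```

With something like this in place, the renderer would bind each batch's texture once and issue one array draw per batch (for example glDrawArrays with the batch's first vertex and count) instead of one bind and a run of glVertex calls per face. Whether that helps in practice depends on where the time actually goes.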

Are there any decent free/GPL/open source/etc programs that do code visualization? Such a tool would help immensely, and I can't seem to find anything. Even a text call tree would be awesome. I'm just about ready to kludge together a program that picks out function calls and dumps them into formatted HTML or something.

Post by Diedel »

patches,

very interesting!

Changing to a different texture is expensive even if it is already loaded onto the gfx card? That means if I first sorted all faces to render by their texture, and then rendered all faces with the same texture one after another, rendering might speed up considerably? Doesn't this possibly conflict with hidden pixel removal (i.e. if I render foreground faces first, the gfx card will already drop a lot of pixels with bigger Z)?

There is other stuff, like D2 computing the vertex coordinates afresh for every frame depending on player position and orientation. This is done to be able to remove hidden segments (cubes) and surfaces, but I wonder whether there is a more efficient way to handle this. I have done some profiling, and the function calls for these computations are very expensive. I have tried to replace these functions with more efficient ones, but the new functions yield slightly different results in some cases. That is no problem for the renderer, but they also affect level geometry calculation (e.g. normals; still no problem in itself), and those results are used in level CRC computation, so you will get 'invalid level' errors when playing multiplayer against non-D2X-XL versions. What a mess.

About the shaders: they're actually extremely simple. They just blend several textures together, either for super transparency with hires textures (color key transparency) or to add lightmaps. Beyond that, these blending shaders are only executed if the face to be rendered actually has such textures. I am really clueless why this slows down rendering so tremendously, even on my Radeon X800 XT PE, which runs far more complex games as fast or faster. On the other hand, when just rendering standard D2 levels w/o all the new gimmicks, it gives me framerates up to 1000 fps.

Post by pATCheS »

Determining cube visibility shouldn't involve transforming the entire level..? I'm not too informed on the math and methods in determining visibility, but doing a dot product between the player's forward vector and all points would certainly be faster than transforming all cubes based on player position and orientation. Is there a way to get unique points rather than points of a cube? Maybe merge extraneous verts when the level is loaded and keep a list of which cubes have which verts, and determine visibility using that.

The level's CRC only needs to be computed once, right? Why not use the original function to compute it after loading the level, and then use something faster for rendering?


As far as I know, changing textures is the most expensive state change. It'd be even more so if it weren't already on the gfx card. I don't know the underlying reasons for its expense, but I imagine it has a lot to do with the driver and with changing and synchronizing hardware state. So, draw faces sorted by texture, front to back. If you're really worried about overdraw, do a Z-only pass first before rendering the geometry. But really, this game at most has two textures per pixel per face (three for lightmapping) and not a whole lot of opportunity for overdraw (as long as you're doing a good job of visibility), so it's not any great concern, even for older generation graphics cards.

I remember from my experimentations with D2X-W32's code a long while back that I found some interesting artifacts which suggest that doors are drawn more than once, and by different parts of the code which are using different render state. Getting up close to a door caused a definite slowdown on my laptop when I had it, and I think even when I was using the 9800 Pro in my desktop. This weirdness really ought to be fixed...

Come to think of it, shaders are another thing that's expensive to be switching around a lot, again due to driver and card state. Perhaps this is the root cause of the slowdown.


Here's another idea. Instead of using constants in the shader, use texture alpha for normal transparency, and a separate one-channel texture map for textures that have super-transparency. I don't know how this will affect shader performance (it'll certainly increase memory usage), but hopefully it would be faster (and easier to maintain) than shaders with constants. The color keying operation could also be fairly intensive depending on how you determine it; to eliminate interpolative sampling artifacts.. hell, I don't even know how you could use color keying in a shader without multiple texture samples (probably either 4 or 9) whose positions are derived from some very repetitive math. Either that or a nearest texel filter.
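To illustrate the texture alpha idea, here is a rough sketch of resolving the color key once at load time rather than in the shader. It assumes 8-bit palettized source texels and a palette of 256 RGB triples; `expandWithAlpha` and its `keyIndex` parameter are made up for this example and are not taken from the D2X-XL source.

```cpp
#include <cstdint>
#include <vector>

// Expands palettized texels to RGBA. Texels equal to keyIndex become
// fully transparent black; all others become opaque palette colors.
// keyIndex is an assumed parameter, not a value from the game source.
std::vector<std::uint8_t> expandWithAlpha(
        const std::vector<std::uint8_t>& indices,
        const std::uint8_t* palette,   // 256 * 3 bytes, RGB order
        std::uint8_t keyIndex) {
    std::vector<std::uint8_t> rgba;
    rgba.reserve(indices.size() * 4);
    for (std::uint8_t i : indices) {
        const bool keyed = (i == keyIndex);
        for (int c = 0; c < 3; ++c)
            rgba.push_back(keyed ? std::uint8_t{0} : palette[i * 3 + c]);
        rgba.push_back(keyed ? std::uint8_t{0} : std::uint8_t{255});
    }
    return rgba;
}
```

Once the key lives in the alpha channel, ordinary alpha test or blending can handle the transparency and the shader no longer needs to compare colors per sample. A second pass with a different `keyIndex` could build the separate one-channel mask suggested for super-transparency.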


I have an nVidia graphics card now (6800 Ultra. very fast, I'm spoiled by FSAA :)), so I can use nVidia's powerful shader performance analysis tools to see where the slowdown is.

Post by Diedel »

D2 has one vertex list; Cubes have indices into that list for each corner. I don't know anything really about 3D math though, so I cannot judge whether alternative methods of visibility culling would be faster and work with D2.

Drawing doors in standard levels should not be any slower than drawing single textures, as door frames are merged with the underlying textures when loading the textures from disk (so D2 dynamically creates additional textures). The color key transparency is already resolved when merging the textures that way.

I'd appreciate very much you doing some OpenGL profiling and telling me the results.

Post by pATCheS »

For two unit-length vectors (normals), the dot product is the cosine of the angle between them. You multiply corresponding components of the two normals, then add them all up (x1*x2 + y1*y2 + z1*z2). Inverse cosine gives the angle, but you needn't do that most of the time. When the dot product is less than 0, the normals are more than 90 degrees apart. The closer to 1 the DP is, the closer together they are.

All you would need for visibility is a bunch of dot products. If you can find information for determining normals pointing to the corners of the viewport (it's probably already in the source in some form), you can take the dot between one of those and the forward vector of the player. This essentially gives you a cone for testing visibility (because it's based on angle and not on the actual viewport). Then it's a simple matter to draw all faces attached to verts whose normalized direction from the player has a dot product with the forward vector greater than that threshold. The video card does its own clipping practically instantly, so the little extra geometry won't have any significant impact.
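The cone test can be sketched in a few lines. This is only an illustration under stated assumptions: `Vec3`, `dot` and `inViewCone` are hypothetical names, `forward` is assumed to be unit length, and `cosHalfAngle` is the dot product between the forward vector and a normal pointing at a viewport corner.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Sum of the products of corresponding components.
float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true if `point` lies inside the cone with its apex at `eye`,
// its axis along the unit vector `forward`, and a half-angle whose
// cosine is `cosHalfAngle`.
bool inViewCone(const Vec3& eye, const Vec3& forward,
                const Vec3& point, float cosHalfAngle) {
    const Vec3 d{point.x - eye.x, point.y - eye.y, point.z - eye.z};
    const float len = std::sqrt(dot(d, d));
    if (len == 0.0f)
        return true; // the point coincides with the eye
    // dot(d, forward) / len is the cosine of the angle to the view axis;
    // multiplying through by len avoids normalizing d.
    return dot(d, forward) >= cosHalfAngle * len;
}
```

Note that a per-vertex test like this is conservative only for geometry that is small relative to the cone: a large cube can span the view while all of its corners fall outside it, so the cube containing the player (and probably its neighbors) would need to be drawn regardless.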


Set the texture alpha on doors to .5 and see what happens (vertex alpha would work too but you'd need to use either a shader or a texture environment setting which I don't remember). I know for a fact that, at least in older versions, they were being drawn more than once.


man, it's almost 5 am here... I'll play with it some time in the next few days, my main rig doesn't have net access

Post by Diedel »

I know how to compute a dot product, but I don't know how to interpret the result in terms of something being visible. So I have no clue how to compute that cone and test vertices whether they are in the cone. It's one thing to compute a dot product, and another thing to understand its geometrical implications.

As I told you before: with standard D2 textures, textures containing color keyed transparency are merged with the underlying texture into new textures frame by frame. I am sure that they are not drawn more than once.

Post by Diedel »

patches,

optimization attempts:
  • group all faces with the same texture together when rendering them - no speed increase.
  • Suppress glBindTexture() calls when texture already bound - no speed increase.
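
For reference, suppressing redundant binds usually amounts to something like the sketch below. `TextureBinder` is a hypothetical helper, not code from D2X-XL, and the real bind call is passed in as a callback so the example runs without an OpenGL context.

```cpp
#include <functional>
#include <utility>

// Remembers the last texture bound and skips binds that would not
// change anything.
class TextureBinder {
public:
    explicit TextureBinder(std::function<void(unsigned)> bindFn)
        : bind_(std::move(bindFn)) {}

    // Returns true if a real bind was issued, false if it was skipped.
    bool bind(unsigned texture) {
        if (hasBound_ && texture == current_)
            return false;
        bind_(texture);
        current_  = texture;
        hasBound_ = true;
        return true;
    }

    // Call after any code that binds textures behind the cache's back.
    void invalidate() { hasBound_ = false; }

private:
    std::function<void(unsigned)> bind_;
    unsigned current_  = 0;
    bool     hasBound_ = false;
};
```

In the game, the callback would wrap glBindTexture(GL_TEXTURE_2D, texture). That neither this nor the grouping brought a speed increase suggests the bottleneck lies somewhere other than texture state changes.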

Post by pATCheS »

Google for "dot product java" for interactive visualizations of what the dot product does.

I need to find a code profiling program that shows function execution times... I can't do much else without one.

I did find something interesting about the menus. In gDEBugger, if you enable profiling mode and force 2x2 textures, menus gain an inordinate amount of speed, at the obvious expense of readability. Forcing 2x2 textures with the abort game menu up while in the initial big room of Kool Kave caused my framerate to go from 90 to 120, and with the menu down, the difference was from 130 to 140. -fastmenu causes menu text to be rendered into textures so it can be drawn all at once rather than drawing individual letters, which helps a lot after the initial pause for rendering, but it doesn't solve the real problem. Such simple geometry should not be so slow to render. Unfortunately, there's no way to disable specific calls in gDEBugger... Have you done much profiling of the menu code?

Post by Diedel »

Did you try the menu in a one-cube level?

I know what -fastmenu does - I have implemented it. ;)

I have had > 4000 fps in a 10-cube level. I believe D2X-XL is very CPU dependent.

I can profile D2X-XL at work, and most of the time was consumed in the geometry computation functions.

Edit: I have replaced the integer square root function used in D2X-XL with a call to the runtime lib function double sqrt(double) and have had a speed increase of almost 50%.
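
A sketch of what such a replacement can look like. `isqrtBitwise` is a generic bit-by-bit integer square root used here as a stand-in, since the actual D2X-XL routine is not shown in this thread; `isqrtLibrary` is the assumed replacement built on the runtime library's sqrt.

```cpp
#include <cmath>
#include <cstdint>

// Generic bit-by-bit integer square root (floor of the real root).
// A stand-in for the original routine, not the D2X-XL code itself.
std::uint32_t isqrtBitwise(std::uint32_t n) {
    std::uint32_t result = 0;
    std::uint32_t bit = 1u << 30; // highest power of four in 32 bits
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}

// Same result via the runtime library: convert to double, take the
// root, truncate back to an integer.
std::uint32_t isqrtLibrary(std::uint32_t n) {
    return static_cast<std::uint32_t>(std::sqrt(static_cast<double>(n)));
}
```

For 32-bit inputs the two agree, because a double represents every such value exactly and the truncated root is the floor. A fixed-point routine with different rounding could still differ in the last bit, which would matter wherever the result feeds the level CRC.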