Some more relatively minor things to try out before I call this one a wrap; at least for now. Ultimately there is one huge inefficiency in the code, which is filling a dynamic vertex buffer for surfaces each frame. There are various ways and means of addressing this, but they require a heavier reworking of the surface refresh than I'm prepared to undertake right now, so they're going to have to wait. The current work is more in terms of addressing CPU-side bottlenecks; specifically the building of lots of large intermediate tables.
I'm hoping to get this finally released over the next few days, so watch this space.
Thursday, April 28, 2011
Been reworking some of the surface refresh code and have now got back the additional speed that I'd lost yesterday. Yesterday's code was nonetheless somewhat faster than before, but today's is much faster.
Posted by mhquake at 10:39 PM
Wednesday, April 27, 2011
The next one will go maybe twice as fast when under heavy load. Lots of optimizations in the brush model drawing functions are coming through, which is resolving one of the major bottlenecks in the engine. Neat.
Update - lost some of the extra speed while fixing up some things. Still faster with brush surfaces though, and still starting to move in the right direction for removing some major bottlenecks from them.
Posted by mhquake at 11:19 PM
Tuesday, April 26, 2011
I haven't updated for a few days, but I've been slowly bringing things together for the next release, which should hopefully happen soon-ish. This is shaping up to be one of the most stable DirectQ versions ever as lots of little glitches and niggles are getting nicely smoothed off.
I've also done yet another full run through ID1 over the Easter weekend, and my previous revised impressions of e4 are coming through even stronger - definitely the best.
There's going to be another rewrite of the main surface refresh coming up shortly too. The current version uses a dynamic vertex buffer which I intend switching over to static; this should see major performance improvements in maps with complex brushwork or high polycounts. There are still some - hopefully relatively minor - niggling issues to be worked out regarding handling of brush models, but I'm starting to get a clearer picture of what it's going to look like and how it's going to work. That's a few releases away yet, so I'll say nothing more for now.
Wanted: Sane Doom Source Port
I'm in the market for a relatively sane Doom source port. I've gone hunting through a few, including ZDoom, PrBoom-Plus and Doomsday, and I've been fairly dismayed by what I've found. In one way they remind me of where Quake engine ports were at maybe 10 years ago, in that they all seem to include:
- Lots and lots of DLLs that are necessary to run the engine.
- Additional external content that the engine won't run without.
- Unnecessary eye-candy effects.
- Support for mouselooking, MD2 files, high-resolution textures, etc etc yadda yadda.
- Third-party audio and/or video libraries (SDL/OpenAL/etc) needed.
Posted by mhquake at 11:23 PM
Wednesday, April 20, 2011
This was going to be a discussion of some of the choices, and the reasons behind them, for the underwater fog implementation I did lately, but I'm holding back on that one until later as there is something more important that you need to know.
For the first time ever on this blog I've deleted a person's comment (one that wasn't spam anyway) and have gone back and edited an old post to read differently than it previously did (I have deleted posts before where events have overtaken their contents).
This was not something that I did lightly, and it was not on account of the person's comment, which was a perfectly valid and fair question. It was instead on account of my own realisation that the way I had worded something was open to misinterpretation. My screw-up in other words.
I'm not going to hide this, but I do want to discuss it a little more in context. It's only right that you know about what it was.
The original post (made yesterday) contained a discussion of changes made to video startup code that I had felt to be safer, owing to the fact that the new code always started in a non-exclusive non-fullscreen mode. Exclusive video modes made me slightly paranoid as they mean that the program now has sole ownership of the video hardware. Any fuck-up I make in code could have implications for stable running of your PC. That's obviously not a good thing.
The most likely place for such a fuck-up to occur is during video startup. DirectQ is reading in your cvars, it doesn't know what values you're using, it doesn't yet know your hardware capabilities, and - in theory - potentially anything could happen. In practice there are a LOT of safeguards in place, both within D3D and within my own code, to ensure that the possibility of this happening is significantly lower than the possibility of you winning the national lottery. But I still had this slight nagging doubt. What if I missed something?
Once video startup completes and the engine is up and running this possibility more or less vanishes. Every previously unknown factor is now known, and the situation is no worse than with any other Quake engine.
As things turned out this description is actually no longer relevant; it's been overtaken by events and the restructured video mode handling code I also did has made those changes completely unnecessary. But in the interests of fairness and openness it's only right that you know the background.
The comment was a question along the lines of "so does this mean that D3D can lock up your GPU?" I consider that a perfectly fair and valid question given the original wording I had used in my original post. However it does not describe the full story, and a reply I made to it only made the possibility for misinterpretation worse. I want to avoid that, but I also want to be open about stuff.
When starting up D3D one of two things can happen: it can succeed or it can fail. If it succeeds then there is nothing you need to worry about; startup has completed and everything has made the transition from unknown to known. If it fails there is also nothing you need to worry about - it won't even have taken exclusive access to the video hardware (which only applies to fullscreen modes anyway; windowed modes never have exclusive access), so there is no possibility of GPU lockups there.
Yes, D3D can lock up your GPU. So can OpenGL. I've seen it happen with D3D when you try to read back the result from an occlusion query that hasn't yet been issued. I've seen it happen with OpenGL when you update a dynamic lightmap on Intel drivers (that was actually part of the original reason for why DirectQ exists).
This is however all down to bad code or bad drivers, and is not an inherent feature of either D3D or OpenGL. That's the important thing to realise. This kind of thing gets caught in testing and is fixed. Otherwise both APIs are incredibly robust - I've actually made past releases that contain horrific things - like reading well beyond the end of a vertex buffer - but which have had no ill effects whatsoever during runtime. D3D is actually incredibly good at the "crash early and crash hard" philosophy, and has excellent debugging tools (which I regularly use) to ensure that these things get caught.
So that's about where we're at now, and I hope this clarifies a few things. Next up - the fog discussion.
Posted by mhquake at 11:47 PM
A fix for the eyes model has just gone in. This resolves a problem where DirectQ incorrectly positioned the eyes when their size was doubled. More about that one here.
The difficulties some people had with vsync should now be definitively resolved. I've identified a case where the value of your vid_vsync cvar was not being respected during video startup and have reworked the code that handled that. I've also taken the opportunity to clean up a lot of the video state change functions.
All of this work is heading towards being able to bring back variable refresh rate settings and introduce multisampling. It also cleans out more legacy crap from the first few versions of DirectQ (where I had really made a botch of the video code).
Posted by mhquake at 12:34 AM
Tuesday, April 19, 2011
Some more changes coming through.
A bug with save games (where it always saved as save_00000 if using the menus) has been fixed. I've also revised the menu slightly to show the savegame name, which might be handy for some folks.
I've disabled timer decoupling for multiplayer games. This is quite intentional - the whole reason for decoupled timers is to resolve physics wonkiness in single player games by running the server at a forced 72 FPS and the client at any arbitrary frame rate; it has no purpose (and may even have detrimental effects) in multiplayer.
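The decoupled-timer scheme - fixed-rate server, free-running client - can be sketched as a classic fixed-timestep accumulator. This is an illustrative sketch under invented names, not DirectQ's actual code:

```cpp
// Hypothetical sketch of decoupled timers: the server advances in fixed
// 1/72-second ticks while the client runs once per (arbitrary) host frame.
struct DecoupledHost {
    static constexpr double SERVER_HZ = 72.0;
    double serverAccum = 0.0;   // real time owed to the server
    int serverFrames = 0;       // fixed-rate server ticks run so far
    int clientFrames = 0;       // variable-rate client frames run so far

    // Called once per host frame with the real elapsed time in seconds.
    void Frame(double frametime) {
        serverAccum += frametime;
        const double tick = 1.0 / SERVER_HZ;
        // Run as many fixed 72 FPS server ticks as the accumulated time allows.
        while (serverAccum >= tick) {
            serverAccum -= tick;
            ++serverFrames;     // server physics would run here
        }
        ++clientFrames;         // rendering/interpolation runs every frame
    }
};
```

At 144 client FPS this runs the client twice per server tick; in multiplayer the scheme buys you nothing because the server isn't yours to step.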
I've added optional underwater fog (controlled by setting r_underwaterfog 1). There was a cvar called gl_underwaterfog in earlier versions, but it did nothing and I had chosen a bad default (on by default) for it so I needed to rename it. Ouch - must never do that again. As a first-cut of a new feature this might be a little rough around the edges in places - fair warning.
I'll probably do another release soon enough - in particular the save game bug is serious enough to require one.
Posted by mhquake at 6:31 PM
Tuesday, April 12, 2011
I'm hoping to get the next DirectQ release out tomorrow; no real performance improvements for you this time, but I have been fixing up some of the multiplayer aspect a little. The key thing is that .loc files now work again; the previous breaking of them was due to some code I had commented out while tracing through other problems with the ProQuake messaging system (this is the thing I had referred to some time back where the comment on the relevant part of the code was "added this").
I know that other ProQuake features have been requested many times in the past - stuff like team scores, match times, etc - but you're really going to have to take my word on this one and bear with me. The system is complex and it's not just something you can drop in casually and expect to work first time; especially not in an engine that's been as heavily revised as DirectQ has (and I don't just mean the renderer). Hopefully this latest change will have worked out most of the bugs and problems that have accumulated to date with it; if so I'll feel more comfortable pushing ahead with the next part.
One other thing I didn't get done was some required fixes for weirdness in the FOV code. For now you should just play with values until you find something that works well enough; the next release will do it right. I also didn't manage to revise entity alpha, as time ran out while chasing other bugs.
As you may guess from my recent removal of 16bpp modes, the video mode list will have changed and your selected video mode will likely no longer be valid. I have a fix - of sorts - for this in place, but it hasn't really been tested in the wild just yet, so tomorrow's release will be an opportunity to see what happens. All going well you should get an 800x600 (or similar) windowed mode when you launch it the first time. You will then be able to go into the video modes menu and select the mode you really want.
Something I see on occasion is people complaining that Quake is taking too much CPU, that an old game shouldn't do this, and on occasion engine authors may even go so far as to put a Sleep call into their code to prevent this from happening.
Here's my take on it.
Quake is a real-time interactive application. Now, one of the features of a real-time interactive application is that it needs to respond more or less instantaneously to user input, and it needs to update its display (and internal book-keeping) to reflect the current situation more or less instantaneously too. Otherwise it's no longer real-time or interactive.
The only reliable way to do this is to keep the CPU running constantly. That way whenever anything happens it's ready to react. Minimum lag between the event and the reaction. This is important.
Now, most of the time when Quake is running it's actually doing nothing. Just spinning around in a loop waiting for a certain amount of time to have passed. Something cool about decoupling the timers is that you can now make some productive use of a good chunk of that "doing nothing" time, but that's irrelevant for this discussion, and the majority of time is still spent doing nothing.
So why not just send the CPU to sleep for a short while during these time periods? Here's where the trouble arises. Sleep function timers are imprecise; when you tell the CPU to sleep for n milliseconds it doesn't actually sleep for n milliseconds; it sleeps for n + x milliseconds, where x is a number that may be between 0 and infinity (in practice it's going to be around the 1-20 range). This is outside of your control.
So right away you're introducing a hugely significant element of imprecision and uncertainty into your timing, and the end result is that Quake is no longer real-time or interactive; every time the CPU sleeps there will be a certain amount of lag that you don't know in advance, that may cause you to miss frames, and that results in uneven response which only gets worse as the framerates get higher.
You as the player definitely do not want that, and that's why DirectQ will always chew as much CPU time as it can get.
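The "n + x" behaviour is easy to demonstrate, and so is the engine's alternative. Here's a small illustrative sketch (standard C++, not DirectQ's actual timing code, which uses the Windows timer APIs):

```cpp
#include <chrono>
#include <thread>

// Ask the OS to sleep for n milliseconds and time how long it really took.
// The overshoot (the "x" in "n + x") is outside the program's control.
double MeasureOversleepMs(int requestedMs) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(requestedMs));
    auto end = clock::now();
    double actualMs =
        std::chrono::duration<double, std::milli>(end - start).count();
    return actualMs - requestedMs;   // how much longer than asked for
}

// The engine's alternative: spin ("chew CPU") until the target time arrives.
// Precise to the clock's resolution, at the cost of burning a core.
void BusyWaitMs(double ms) {
    using clock = std::chrono::steady_clock;
    auto target = clock::now() + std::chrono::duration<double, std::milli>(ms);
    while (clock::now() < target) { /* spin */ }
}
```

On a default Windows timer resolution the measured oversleep is commonly in the 1-15 ms range, which at high framerates is several whole frames' worth of lag.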
Posted by mhquake at 8:42 PM
Monday, April 11, 2011
I'm thinking of removing support for 16bpp modes. There are a few reasons for this, so let's go through them:
- Unlike OpenGL, which gives you a 32bpp backbuffer and dithers it down to 16bpp for display, D3D gives you a genuine 16bpp backbuffer when you create a 16bpp mode. I suspect that this may be a potential root cause of some folks having observed certain quality loss issues with textures in DirectQ.
- The performance advantages on older hardware of running in a 16bpp mode are dubious at best with modern (at least D3D9-class) hardware. At worst it may actually run slower as it has to translate textures and blending results from 32bpp to 16bpp (this has been observed - switching from 16bpp to 32bpp gave an over 2.5x framerate increase for one person). I suspect that some people may even be deliberately selecting 16bpp modes out of habit from GLQuake and not even be aware that they're getting nowhere near the best out of DirectQ.
- Availability of 32bpp modes is not an issue. Since DirectQ needs D3D9-class hardware anyway, if you're able to run your desktop at 1280x1024x32bpp then you've got a 32bpp mode of the same resolution available. This is guaranteed. The other major advantage of having 16bpp modes - as a fallback if the equivalent 32bpp mode is unavailable - is no longer relevant.
- In addition to the above, removing 16bpp modes simplifies the startup and video mode selection code, making it more robust.
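The mode-list side of that last point can be sketched as a simple filter. The struct and function names here are invented for illustration, not DirectQ's actual code:

```cpp
#include <vector>
#include <algorithm>

struct VideoMode { int width, height, bpp; };

// Strip all 16bpp entries from the enumerated mode list. On D3D9-class
// hardware every 32bpp desktop guarantees matching 32bpp fullscreen modes,
// so nothing of value is lost and startup only has to match resolution.
std::vector<VideoMode> Strip16bppModes(std::vector<VideoMode> modes) {
    modes.erase(std::remove_if(modes.begin(), modes.end(),
                               [](const VideoMode& m) { return m.bpp == 16; }),
                modes.end());
    return modes;
}
```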
Update - too late! The deed is done.
Posted by mhquake at 2:14 PM
Friday, April 8, 2011
This release is somewhat experimental, unfinished and maybe a bit rough around the edges; I personally feel that it's not quite ready yet, but I'm releasing it all the same for the following reasons which I feel are important:
- It fixes a few critical crash bugs in certain maps.
- It includes some important multiplayer fixes.
- It includes some important renderer fixes.
- The performance improvements are very worthwhile.
- It's probably timely to start getting some public feedback on it.
Note that this release features decoupled timers but they are disabled by default. If you want to experiment you can do so by setting host_decoupletimers 1 and then adjusting the value of host_maxfps to taste.
I've also hard-disabled the occlusion query code as I'm still not fully confident in it; I've had experiences with D3D occlusion queries either hard-locking the PC, or (on WDDM drivers) resetting the driver every few seconds, so this is a feature that I really feel I need to make totally bullet-proof before I'd be comfortable inflicting it on you.
Posted by mhquake at 7:50 PM
Wednesday, April 6, 2011
Historically GLQuake sets its far clipping plane distance to 4096. In the era of big maps this is woefully inadequate and is easily exceeded by maps that don't even stretch other limits by too much (I'm thinking of The Marcher Fortress here).
Some engines resolve this by pushing the far clipping plane out further - to 8192, 16384 or whatever. I don't like that because it's really just a war of attrition; an arms race between engines and maps in which each tries to outdo the other. Other engines resolve it by having an "r_farclip" cvar which lets the player push the far clipping plane out themselves, but I don't like that either as it smells too much like "I couldn't be bothered fixing it properly so I'll make it the player's problem instead" for my tastes.
Previously I've resolved it by using an infinite projection. I was always unhappy with that setup as it scrunches all distant geometry at the far end of the depth buffer and may cause Z-fighting problems. Z-fighting is also an issue with setting your far clipping plane further and further away. Because the depth buffer is non-linear it has much higher resolution for nearer distances but drops off quickly for further ones.
So to fix it properly I've gone back to an old idea where I just get the distance to the furthest surface (well, plane, actually), use the knowledge that it represents one side of a right-angled triangle, invoke Pythagoras, and derive a dynamically adjusting far clipping plane from that.
So now if the current scene needs a far clipping distance of 80 billion, you'll get it. On the other hand if only 22 is what's needed, that's what you'll get too. No need for you to do anything, and no need to compromise the quality of the scene.
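One way the Pythagoras trick could look: take the farthest extent of the relevant geometry from the view origin along each axis, then the hypotenuse is a guaranteed-sufficient far clip distance. This is a sketch of the idea with invented names, not DirectQ's actual code:

```cpp
#include <cmath>
#include <algorithm>

// Compute a dynamic far clip: the distance from the view origin to the
// farthest possible corner of the bounding box of visible geometry.
// Each axis contributes one side of a right-angled triangle; applying
// Pythagoras over all three gives the corner distance.
float DynamicFarClip(const float viewOrigin[3],
                     const float mins[3], const float maxs[3]) {
    float sum = 0.0f;
    for (int i = 0; i < 3; ++i) {
        // Farthest extent on this axis: whichever side of the box is
        // further away from the viewer.
        float d = std::max(std::fabs(viewOrigin[i] - mins[i]),
                           std::fabs(viewOrigin[i] - maxs[i]));
        sum += d * d;
    }
    return std::sqrt(sum);  // hypotenuse: nothing can be farther than this
}
```

Recomputed per frame, this shrinks the depth range to exactly what the scene needs, which is what keeps the non-linear depth buffer's precision where it matters.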
An interesting note here regarding sky. One other reason for a large far clipping plane is to accommodate skyboxes, which commonly need to be drawn as the farthest object in the scene. I don't need it for that reason because I'm actually drawing skyboxes as cubemaps on the original sky surfaces. So yeah; that skybox that looks so far away might really just have its actual surfaces a mere 20 units above your head.
Posted by mhquake at 8:40 PM
The interesting thing about some of my current work is that it throws up all kinds of subtle and curious bugs. I'm going to save the punchline on this one till the end; for now let me describe what was happening.
I knew that I was getting some strange rendering stats. I could see that easily in some maps; epoly counts (which - remember - are now mostly irrelevant for performance, but still handy to keep around as we'll soon see) were see-sawing wildly between large (80k ish) figures and relatively smaller (20k ish) ones. Stuff I was doing that should have given a good speed boost wasn't quite having the effect I expected. Plus I had this weird thing happening with entity alpha. Stick with me because it's all related.
Let's talk about the latter. I was running some tests and noticed that certain models which should have had alpha were behaving strangely. Instead of a nice steady translucency they were flickering wildly between the alpha they should have had and full solid. Quite funky stuff, and I was initially blaming the Nehahra code (which uses its own rather weird standards, and none of what's coming up is going to stop me doing the cleanup it badly needs).
Some digging around followed; a lot of Con_Printf'ing and hard-coding alpha values into entities to see what happened. All I could determine was that it was as if something was switching alpha blending off when it shouldn't be.
I removed redundant state filtering, ran with the debug runtimes and under PIX; no luck. The only useful observation was that it seemed to go full-on or full-off under odd circumstances too - such as when the console was down or when certain menu screens (like the Quit confirm) were up.
Definitely a mismatched state problem I still thought (hint - it wasn't), but I was damned if I could find it. Forced alpha blending on all surfaces - no luck. Hard-coded alpha into my shaders - no luck.
One thing I did notice while comparing with other maps was that it only seemed to happen on static entities. Now we're getting somewhere, but what the hell was going on?
Even more confusingly - this only happened when host_maxfps was set to above 72. Set it to 72 or lower and it didn't happen. This should have been the giveaway but by then my brain was frazzled enough to completely miss it.
So I traced back through the lifetime of these compared to server entities, when something that I should have been aware of a long time ago came up.
I'm running with decoupled timers.
Server entities are added to the visedicts list on every pass through the main loop when the server runs.
Static entities are added to the visedicts list on every pass through the main loop when the client runs.
The client typically runs between 5 and 35 times faster than the server in this kind of setup.
So we had an initial visedicts list, then static entities get added to it and stuff gets drawn. Then static entities get added to it and stuff gets drawn again. Then static entities get added to it and stuff gets drawn again. Then static entities get added to it and stuff gets drawn again. Then static entities get added to it and stuff gets drawn again. And so on for 5 to 35 frames until the list is cleared again when a server frame runs. OUCH!
And the reason why it didn't happen when the console was down? When you bring down the console in a singleplayer game I throttle framerates back to 72. Simple! And what about the menu screens? In there I run a few screen updates to flush pending stuff before popping up the notification dialog and listening for keystrokes, during which no screen updates happen and both the client and the server pause. It doesn't take long for drawing the same entity over and over again to fill to solid, and with the visedicts list never being cleared during that time it never went back to translucent.
Fortunately the fix was easy enough - just prevent static entities from being added if the list hasn't been cleared this frame. But wow; the torture I was inflicting on the renderer (and on the visedicts list) was something to behold.
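The bug and its fix can be boiled down to a few lines. This is an illustrative sketch with invented names, not DirectQ's real structures:

```cpp
#include <vector>

// Static entities must only be appended to the visedicts list on frames
// where the list was actually cleared (i.e. when a server frame ran);
// otherwise, with decoupled timers, they pile up once per client frame.
struct VisEdicts {
    std::vector<int> list;           // entity handles queued for drawing
    bool clearedThisFrame = false;

    void ServerFrame() {             // runs at the fixed server rate
        list.clear();                // fresh list for this server frame
        clearedThisFrame = true;
    }

    void AddStatics(int count) {     // runs every (much faster) client frame
        if (!clearedThisFrame) return;  // the fix: skip duplicate adds
        for (int i = 0; i < count; ++i) list.push_back(i);
        clearedThisFrame = false;
    }
};
```

Without the guard, 5 to 35 client frames per server frame means every static entity gets drawn 5 to 35 times over, which is both the flickering alpha and the lost framerate in one.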
So as a bonus I've got something like a doubling of framerates in situations where this occurs heavily, as well as a certain amount of sanity checking that I need to do elsewhere. In particular I need to check if running CL_RelinkEntities every client frame wouldn't be a better idea. But that's all for later. Right now I'm just glad that I didn't release with that one in.
Posted by mhquake at 2:51 AM
Monday, April 4, 2011
...is that there are so many of them, of course.
Following on from my previous post I've been reviewing the entity alpha code, and what a hellish mess it's become. Does it use a U_TRANS bit? Does it use a U_ALPHA bit? Does it read from the entities lump? Does it come from an entity field? Is it a part of the protocol? Is it an extension to protocol 15? Is it in the 0..1 range? Is it in the 0..255 range? Is it sent as a byte? Is it sent as a float? At least thankfully there isn't a server message or a QC builtin for it too. (Note to self - mustn't give people ideas!)
IMO this is one serious weakness of the open source world (not open source itself but I'll come to that). It allows for - nay, encourages - proliferation of standards like this, as everyone has their own idea of what the best way of doing things is, and everyone implements the same thing in different ways. Sometimes a half thought-through implementation gets out, sometimes an incomplete or bugged (or just plain crap) implementation gets out, and they all become part of the standard and all need to be supported.
(Aside: if it was up to me alpha would use U_ALPHA, be 0..255, sent as a byte, need explicit protocol support and be either in the entities lump or from a field.)
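Just to make the mess concrete, here's a hypothetical normalisation helper in the spirit of the aside above - everything funnelled into one 0..255 byte. It's illustrative, not anyone's actual code, and the hard-coded assumption it has to make (values in (0, 1] are the float convention) is exactly the ambiguity that makes the proliferation painful:

```cpp
// Accept entity alpha either as a 0..1 float (field/lump conventions) or
// as a 0..255 value (protocol conventions) and return a canonical byte.
// Assumption: anything in (0, 1] is the float convention, so an alpha of
// exactly 1 always means opaque - a byte-convention alpha of 1/255 is
// indistinguishable and gets absorbed. That overlap is the problem.
unsigned char NormalizeAlpha(float value) {
    if (value <= 0.0f) return 0;           // unset / fully transparent
    if (value <= 1.0f)                     // 0..1 float convention
        return (unsigned char)(value * 255.0f + 0.5f);
    if (value >= 255.0f) return 255;       // clamp out-of-range input
    return (unsigned char)(value + 0.5f);  // already 0..255
}
```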
Now, this isn't a natural consequence of any open source project; it's perfectly possible to have an open source project that doesn't display these tendencies. Look at Firefox as a great example. So if being open source doesn't inevitably lead to this kind of mess, then what does?
Thinking over it, it's quite clear - design by committee is the real culprit. It's just that in the open source world a lot of standards tend to evolve by committee (or at least by different people trying out different approaches to the same thing in rough and loose cooperation - a de facto committee, pretty much). Without strong leadership behind an evolving standard, without someone to call the shots and have final say on the way things are, the result is a mess.
So while open source doesn't spontaneously generate a mess of conflicting standards in and of itself, what it does do is create an environment where people can more easily take something that was once at least reasonably clear and graft the mess on top of it. That's not to say that open source is bad because of this; I said it was just a weakness remember. Open source with a firm guiding principle behind it, and a final arbiter who has final veto doesn't create this mess.
This has parallels in something else I've experienced, namely D3D vs OpenGL. Now, OpenGL isn't open source but an open standard (it existed long before the term "open source" did - at least in its familiar, modern context), but the same principle applies. In ye olde dayes it was held as a significant thing that (dark clouds and thunder) vith ze D3D only ze eeeevil Microsoft called ze shots, but (rays of sunshine and birdsong) with OpenGL the ARB group guided its evolution in the golden light of greater communal wisdom. Anyone who's ever had to wrestle with the mess of extensions that OpenGL has become, where each vested interest group is trying to pull it in different directions (with the poor developers stuck in the middle desperately crying STOP! but nobody listens) should agree that things don't always work that way in the real world.
So do I use GL_ARB_shader_objects or GL_EXT_shader_objects? Or do I assume a specific GL_VERSION and it's part of the core? What about GL_NV_fragment_program? Does this have a software emulation fallback or not? Is this deprecated in 3.3 but what if I'm targeting 2.1? Do my shaders use 16 bit, 24 bit or 32 bit floats by default? Can I use glVertexAttribPointer here or glVertexPointer there and what if one needs client memory but the other needs GPU memory? What if the driver I'm on tells me it's 3.2 but doesn't support glPointParameteri?
Sound familiar? Yeah, it's just like entity alpha.
So I didn't mean this to turn into a dig at OpenGL, and I'm rambling now, so it's time to stop.
Posted by mhquake at 8:11 PM
Still some bugs coming through; entity alpha appears to be broken again. This is a fairly direct consequence of trying to integrate Nehahra alpha and modern entity alpha in a single engine, and has caused me trouble before. This is the kind of crap that makes an engine fragile, and probably needs to be gutted and rewritten more than anything else at this stage. Time to crack open the debugger.
Occlusion queries are back. I was a little premature in my prior statement that "the hardware that needs them doesn't support them and the hardware that supports them doesn't need them". OK, I was wrong, and a 3x framerate increase in certain situations proves that. They still need a little fine-tuning as I suspect that I may have made them slightly over-conservative, but on balance that's the side it's better to err on with occlusion queries.
Been working some on bounding boxes. Not the server-side bounding boxes, but the bounding boxes that are being used for frustum culling and occlusion queries. I've now got proper nice tight bounding boxes around all entities (you wouldn't believe what QBSP does to some brush model bounding boxes), per-frame bounding boxes on MDLs (which also get interpolated!), proper bounding box rotation support on brush models and MDLs, and an r_showbboxes cvar to let you look at them. As far as I'm concerned changes to the server-side bounding boxes are also a gameplay change, so these are untouchable.
Next up is fixing entity alpha, which means more delays on the release, but I think we'll all agree that it's better that it happen this way.
Posted by mhquake at 6:11 PM
Saturday, April 2, 2011
This also seems to happen if you manually bsp2prt a map and then re-vis it, but I'm going to use the term "vispatched" here to cover both cases.
So, one feature of DirectQ is that it sets the contents colour when underwater to the correct colour of the water surface above you. I always thought it was slightly weird in Quake that when you dived into blue water you got a sludgy brown colour, so I wanted to fix that. To do this it does some tracing through the BSP tree at load time to figure out the various colours needed.
It turns out that if you use a vispatched map it doesn't get all spots correctly. I'm not sure if this is a fault in my BSP tracing or if it is a fault in the bsp2prt/re-vis process, but that's not totally relevant for this discussion. What is relevant is that when you pass through these spots you'll get brief flashes alternating between the new colour and the old.
The fix for this involved tracking the last good contents colour and using that instead if you're in one of those spots (which was easy to identify). There are various extra conditions around whether this last good colour can be used, which are intended to catch situations where you get a contents transition or haven't yet been in water; overall it seems a lot more robust than it was.
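The core of that fix can be sketched in a few lines. Names and the default colour here are invented for illustration; DirectQ's real version carries extra conditions around contents transitions:

```cpp
// Track the last known-good underwater contents colour and fall back to it
// when the load-time BSP trace failed to assign a colour for this spot,
// so passing through a bad spot doesn't flash between colours.
struct ContentsColour {
    float rgb[3];
    bool valid;      // did the load-time trace produce a colour here?
};

struct UnderwaterFog {
    // Hypothetical default until we've actually been in traced water.
    ContentsColour lastGood { {0.1f, 0.1f, 0.3f}, true };

    ContentsColour Resolve(const ContentsColour& traced) {
        if (traced.valid) {
            lastGood = traced;   // remember the most recent good colour
            return traced;
        }
        return lastGood;         // bad spot: reuse the last good colour
    }
};
```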
It's interesting to note that this doesn't happen with maps which have been originally processed for translucent water the correct (and longer) way. Maybe there are differences in the BSP tree this generates?
Whatever, the moral of this particular story is that with 15 years of accumulated content there will always be more than ample room for something to come along and surprise you, and that there are still plenty of places where even 15 years of accumulated knowledge and fixes isn't enough, and you have to do something new.
Posted by mhquake at 4:38 PM
OK, the release is being held back to try to catch some last-minute minor bugs, so in the meantime here's a brief list of things that there's no point in bothering doing with DirectQ.
Specifying a custom heap size has no effect. DirectQ's memory manager is completely rewritten and it will use as much (or as little) heap as it needs, up to a maximum of 512 MB. That doesn't mean that DirectQ uses 512 MB of memory all the time when running. If the current map only needs 10 MB then only 10 MB will actually be used.
Note that DirectQ is much more efficient than GLQuake with its memory usage, so the amount of heap memory it needs is probably between about half and three quarters that of GLQuake, depending on content.
DirectQ has no effective particle limit and setting -particles has no effect at all.
Again, DirectQ is not GLQuake. If you're running a 32 bpp desktop then you'll get a 32 bpp video mode, so forcing a colour depth is only needed if - for some bizarre reason - you're actually running a 16 bpp desktop.
These are almost completely irrelevant. I've amended the counters to provide some more useful and relevant information, but the old wpoly and (especially with the upcoming version) epoly counts don't mean much any more.
For wpoly counts, the number of draw calls needed (and each draw call more or less represents a texture change) is far more important. Traversal of a deep and complex BSP tree will be what hurts you more on the CPU side. If lots and lots of entities are on-screen, QC overhead will be more likely to be your bottleneck. But poly counts? Forget about them, they're not much of a meaningful metric.
epoly counts are even more meaningless. MDLs are completely stored in GPU memory with all operations (interpolation, lighting) being done on the GPU. Each MDL is - at most - one draw call. With instancing enabled multiple MDLs (up to 400) can be handled in a single call. Transfer of data from the CPU to the GPU only occurs with instancing (maybe 80 bytes per MDL) or with the view model (which needs a few blendweights to fix up muzzleflash interpolation).
Posted by mhquake at 3:22 PM
Friday, April 1, 2011
Been doing some more work on lightmaps and I'm thinking about how I want to evolve them in the future. One thing that has tickled my interest a little in the past is implementing something like software Quake's surface caching system, but using dynamic textures located in hardware. Now that I have lightmap updates running quite fast, it would be a shame to pass up the potential of being able to update textures this way.
The way I'd see it working is that you'd have a set of really huge textures - something like 4096 x 4096 perhaps - which you'd then use as a texture atlas (like how scraps and lightmaps are currently done) and write in the results of colormap lookups. Then you use those textures for rendering, which should give colour and lighting quality approaching that of software Quake but with many of the advantages of a hardware renderer, such as automatic perspective correction and cheap bilinear filtering. You'd also quite neatly bypass problems like non-power-of-two textures in the original game data.
Of course it would mean that video RAM overhead would be rather large, and that all texture data would need to be kept in software. You'd also need to do miplevel selection in software too, but on balance I think it should be an acceptable tradeoff. Today's PCs are more than fast enough, and resources are cheap and plentiful.
The biggest problem I can think of is how you would handle external textures and RGBA lightmaps. The initial implementation is probably not going to support these, as the first priority will be to get the basic procedures working. Once that's done I could look to finding a good solution for these, with perhaps RGBA lightmaps taking priority.
External textures, by the way, were the reason why I went for 4096 x 4096 as the surface cache texture size - it obviously needs to be large enough to support most reasonable external texture sizes.
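The scrap-style atlas allocation mentioned above works roughly like GLQuake's Scrap_AllocBlock: track the current fill height of every column and place each rectangle on the lowest span of columns that can take it. A sketch of that allocator, with a small atlas and invented names (a real surface cache would use 4096 x 4096):

```cpp
#include <vector>
#include <algorithm>

struct Atlas {
    int width, height;
    std::vector<int> colHeight;   // current occupied height per column

    Atlas(int w, int h) : width(w), height(h), colHeight(w, 0) {}

    // Find room for a w x h rectangle; returns true and sets x,y on success.
    bool Alloc(int w, int h, int& x, int& y) {
        int best = height;        // lowest placement found so far
        int bestX = -1;
        for (int i = 0; i + w <= width; ++i) {
            // The rectangle must sit on top of the tallest column it spans.
            int top = *std::max_element(colHeight.begin() + i,
                                        colHeight.begin() + i + w);
            if (top + h <= height && top < best) {
                best = top;
                bestX = i;
            }
        }
        if (bestX < 0) return false;      // atlas is full for this size
        x = bestX;
        y = best;
        std::fill(colHeight.begin() + bestX,
                  colHeight.begin() + bestX + w, best + h);
        return true;
    }
};
```

The same approach already serves for the 2D scrap and for lightmap packing, which is why it's the natural fit here too.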
So all in all it's definitely an interesting idea, and a new bunch of challenges worth pursuing. Ever onwards!
Posted by mhquake at 12:27 AM