Forum archive

Real-time voxel raytracing, in development..

IJs at 2008-01-28 09:48
I'm currently working on a real-time raytracing project (http://ijs.mtasa.com) which utilizes voxels aligned in a uniform fixed-grid, and is fully implemented on the GPU instead of conventional CPU raytracers.

Inspired by Ken's Voxlap engine which we know utilizes some runtime length encoding based raycasting algorithm, I reckoned it would be cool to get a raytracer running capable of tracing a given voxel data set. I know there's been a lot of research (e.g. medical research) into this subject, and I've seen several approaches on rendering voxel data sets with reasonable speeds but I figured out they were mostly raycasted, which is not what I was looking after.

Latest relevant snapshots
(01-2009) Instant radiosity & ISMs at work
http://ijs.bastage.net/img9l.png

Early proof-of-concept

(07-2008) Beware of bugs! First-hit ray-tracing only (phong shading).

Application: http://ijs.mtasa.com/files/voxeltracer_demo.exe
Dependency libraries: http://ijs.mtasa.com/files/voxeltracer_demo_dlls.exe

You will need a CUDA-compatible graphics card to run this.

http://ijs.mtasa.com/images/voxeldemo1.png

One of the keys things here is that the programming, being executed on your GPU, is entirely different than that of a CPU-equivalent but it's pretty interesting and looks like a nice challenge anyways. The framerates are different every day, cause there's hardly any performance debugging available - so it's a matter of slightly brute-forcing stuff and figuring out what runs best. Everything is chopped up in logic blocks (or passes), so secondary (and more) ray tracing should be possible by just slapping another trace pass behind the first one. I'm fairly confident I can use the same technique for photon mapping (voxels seem to lend themselves nicely for the storage structure).. hopefully I'll be able to post some progress on that any time soon.

Currently it's running on my first generation NVidia GeForce 8800GTS (not one of the fastest.. G80, theoretical GFLOPS at ~345, 320-bit memory bus).

The cool thing with voxels in the current implementation is that as you increase your scene complexity in terms of numbers of voxels in your grid, the performance of the tracer remains mostly constant - whereas with polygons (with the tracing done on your GPU) you would tend to get into trouble due to the required random memory lookups in combination with the fairly limited memory bus. I use an acceleration structure that is easy to update through the CPU, so animation is possible.

In short, the voxel grid is traversed using a heavily modified Bresenham algorithm (if you're familiar with the subject you probably know Bresenham used to be the fastest line drawing algorithm around, albeit with some graphical error) utilizing some GPU-optimized memory scheme. Everything is coded in C/C++ using NVIDIA CUDA 2.1.
Edited by IJs at 2009-01-17 06:35
Awesoken at 2008-01-28 14:35
Re: Real-time voxel raytracing, in development..
Your video looks very impressive. Please do not hesitate to release a working demo at some point. I must warn you: apegomp (who recently registered under the name, "asdgdhfsg") will soon be your new best friend! : )
Edited by Awesoken at 2008-01-29 03:15
asdgdhfsg at 2008-01-28 17:08
Feel free to use the Voxelstein 3D dataset (http://rapidshare.com/files/86075076/Voxelstein_3D.rar) :)
asdgdhfsg at 2008-01-28 20:59
IJs said at 2008-01-28 09:48
No real-time voxel rayTRACERS that I know of though.
REARViEw is a real-time CPU voxel rayTRACER, but it runs in software, and I don't think it takes advantage of parallel processing and I think the project is dead..
http://rearview.sourceforge.net/index.shtml

There is also a real-time Flash ActionScript 3.0 (voxel?) Raytracer demo here:
http://www.strille.net/works/as3/raytracer/

A raytraced voxel game that uses a resolution of 320x240 would run super-smooth on today's computers, especially when powered by the GPU! :o :o :o
Edited by asdgdhfsg at 2008-01-28 22:59
IJs at 2008-01-29 00:07
I must say that the video is pretty outdated (time to release a new one).. more specifically, before I realized it's probably better for the overall quality to keep the size of 1 voxel equal to 1 pixel on my screen.. the performance to pull it off is there as you can see. That's also the reason why I probably can't use the Voxelstein 3D (although very impressive) dataset at it's current resolution.

And as far as those other raytracers are concerned: I'm not seeing the potential. They come nowhere near the performance and quality that can be achieved by using a good parallel-based algorithm. The AS3 raytracer, although impressive, is probably just based on sphere/plane intersections.

I should be able to release a demo whenever I reach the standards I have in mind, which shouldn't be too long from now. You're gonna need a pretty fast graphics card though : P
Edited by IJs at 2008-01-29 00:16
asdgdhfsg at 2008-01-29 01:52
IJs said at 2008-01-29 00:07
it's probably better for the overall quality to keep the size of 1 voxel equal to 1 pixel on my screen.. the performance to pull it off is there as you can see. That's also the reason why I probably can't use the Voxelstein 3D (although very impressive) dataset at it's current resolution.
I could try making a high-res dataset.

IJs said at 2008-01-29 00:07
You're gonna need a pretty fast graphics card though : P
My brother owns a gaming computer! :o 8) D:

IJs said at 2008-01-29 00:07
I should be able to release a demo whenever I reach the standards I have in mind, which shouldn't be too long from now.
Be sure to include transparent water/glass voxels in your upcoming demo! ;) just kidding :P
IJs at 2008-01-29 02:33
asdgdhfsg said at 2008-01-29 01:52
I could try making a high-res dataset.
That'd be very interesting. Right now I'm just looking into getting some standard raytracer benchmark scenes into there (e.g. Cornell box) as to get the lighting working properly, but it won't be that interesting cause it doesn't show how cool and detailed voxels can be.

I don't think Ken's voxel editing tools support these massive datasets though, that why I'm going to make a suitable polygonal mesh->voxel dataset converter some time soon for very high polycount meshes. But if you're used to voxel editing.. I really don't know any voxel modeling tools out there that seem to do the job.

As far as that 1 pixel per voxel resolution is concerned.. where 1024x1024x256 would be the size of a medium level in Voxlap, 1024x1024x256 would (very) roughly be the size of two human models with this approach in the worst case (given that you're watching them from the same distance in the above screenshot - which would be pretty close anyways). Just to give you an idea of the slightly bigger scale.

asdgdhfsg said at 2008-01-29 01:52
Be sure to include transparent water/glass voxels in your upcoming demo! ;) just kidding :P
That's really not that stupid. Photon mapping seems to fit right into there, so with any luck the first demo should contain a preliminary version of photon mapping (if it doesn't take too long), and eventually you could have caustics and all that : ) - gotta get the basics working first though.

Edited by IJs at 2008-01-29 02:55
asdgdhfsg at 2008-01-29 04:23
Here is a short list of the most important voxel types you should try to put in your engine:

Air voxel: 100% transparent, does not block the player (is not an obstacle)
Solid voxel: 0% transparent, blocks player (obstacle)
grass voxel: ~15% transparent, green, reduces the player's speed (eg. when moving through a big bush), is burnable (when the voxel burns, it a) sends out lightning or b) gets surrounded by fire voxels and subsequently gets replaced with an ash voxel)! :D
Ash voxel: 0% transparent, not heavy, does not burn.
Fire voxel: ~90% transparent, follows/moves in a fluid pattern, sets nearby burnable voxels on fire (DUH! :P)
Water voxel: ~90% transparent, same as the fire voxel, except that it extinguishes fire and is much heavier.
Wood voxel: 0% transparent, heavier than grass,
rubber voxel: 0% transparent, bounces and is burnable! :D
metal voxel: 0% transparent, reflects light, does not burn, weighs a lot :P makes a "pling" sound when hit :P
Stone voxel: 0% transparent, same as metal voxel, except that it does not make a "pling" sound when dropped etc :D
blood voxel: This is the most important voxel! :P should be red in color, ~60% transparent, extinguishes fire, coagulates (gets replaced by a solid organic/grass voxel)!

:D

Blender is fun :D I really want to contribute with something.. therefore I'm going to try to sculpture a high-res human model :D

We can't stop the evolution of violent video games, so we might as well keep pushing the envelope and continue to evolve the genre! :D We should make a voxel game called Childhunt! :o Childhunt will render Manhunt obsolete! :P Imagine pulling a plastic bag over an innocent little girl's head and subsequently punching her 3 times in her face and then stabbing her in multiple times in her chest with a rusty screwdriver LOL! :P just kidding :P

I will continue to learn using Blender now..
Edited by asdgdhfsg at 2008-01-29 07:48
Slang at 2008-01-29 23:11
Hi IJs,

I would like to strongly recommend you check out Coherent Grid Traversal (pdf link). It is basically for accelerating traversals of coherent primary rays (first-hit) in real-time 'polygon' raytracing, but I believe that it can be used in voxel raytracing as well.

If you have some time to experiment with it and eventually get some results, then please let us know. ;)
Spacerat at 2008-01-30 02:46
IJs said at 2008-01-29 00:07
I should be able to release a demo whenever I reach the standards I have in mind, which shouldn't be too long from now. You're gonna need a pretty fast graphics card though : P
I can't wait to get it ;D Really impressive stuff.

Did you implement the algorithm in CUDA or by Shaders?
Maybe you can try to raycast even much bigger data-sets. I found its not a big difference between raycasting 1024x1024x1024 and 40.000x1024x40.000 using mipmapping.

However, in my case with CUDA, I found its still very slow in some cases - about 16fps. Hopefully I can increase the speed using some empty-space skipping algorithm...
Will there be a paper once the algorithm is finished ? I'm looking forward to it ^^

A recent paper which might be interesting for the shading is this one here about ambient occlusions for volume rendering and further related papers.
http://viscg.uni-muenster.de/publications/

Oh, and last here two further links to merge 2 other recent discussions about voxel with this thread:
http://www.gamedev.net/community/forums/topic.asp?whichpage=2&pagesize=25&topic_id=480208
http://ompf.org/forum/viewtopic.php?f=4&t=736

-Sven
Edited by Spacerat at 2008-01-30 15:21
asdgdhfsg at 2008-01-31 00:31
Here is my high-resolution mesh so far:
http://i225.photobucket.com/albums/dd202/highwingx3/poop-5.png
:(

I think I will continue making low-res voxel models since I am just a hobbyist. Modeling lots of high-resolution voxel models (without the use of a big medical CT scanner of some sort) is a time consuming process that only a big game company can do practically.

A raytraced low-res Voxelstein 3D dataset would run smooth on most computers, especially when played at 320x240. It would be cool if we could play a low-res voxel game while we are waiting for the game industry to stop using those old obsolete boring triangles.
IJs at 2008-01-31 23:46
Slang said at 2008-01-29 23:11
I would like to strongly recommend you check out Coherent Grid Traversal (pdf link). It is basically for accelerating traversals of coherent primary rays (first-hit) in real-time 'polygon' raytracing, but I believe that it can be used in voxel raytracing as well.
Thanks. I think I've read a similar paper before, but that must've been before the voxel times. I'm currently a little limited by the algorithm and it's not-so-wide flexibility but who knows.

Spacerat said at 2008-01-30 02:46
Did you implement the algorithm in CUDA or by Shaders?
Maybe you can try to raycast even much bigger data-sets. I found its not a big difference between raycasting 1024x1024x1024 and 40.000x1024x40.000 using mipmapping.

However, in my case with CUDA, I found its still very slow in some cases - about 16fps. Hopefully I can increase the speed using some empty-space skipping algorithm...
Will there be a paper once the algorithm is finished ? I'm looking forward to it ^^
It's not implemented using CUDA due to future plans.. apart from the fact how much I like NVidia's cards, CUDA would require me to completely rewrite the whole thing if I ever wanted to run it on multiple platforms. The Rapidmind library provides an abstraction in between so I don't have to worry about that for now.

As far as mipmapping is concerned, it seems to be a matter of keeping things like the step size pretty constant. I'm using a LOD empty-space skipping algorithm right now that doubles the rendering speed, but no mipmapping is used yet. I'll investigate this pretty soon anyways.

Mind you that raycasting is, in essence, a fraction of the complexity of raytracing.

asdgdhfsg said at 2008-01-31 00:31
I think I will continue making low-res voxel models since I am just a hobbyist. Modeling lots of high-resolution voxel models (without the use of a big medical CT scanner of some sort) is a time consuming process that only a big game company can do practically.
That is correct, and that's why I'm working on a polygon->voxel converter that accepts really high polycount meshes. Mind you that polygons aren't that bad, as long as you have lots and lots of them, and it'll be just a matter of converting those into an acceptable voxel data set. Are you actually using blender to make voxel datasets?
Edited by IJs at 2008-01-31 23:49
asdgdhfsg at 2008-02-01 01:35
IJs said at 2008-01-31 23:46
Are you actually using blender to make voxel datasets?
No, I use Milkshape 3D and poly2vox.
Spacerat at 2008-02-01 16:28
IJs said at 2008-01-31 23:46
It's not implemented using CUDA due to future plans.. apart from the fact how much I like NVidia's cards, CUDA would require me to completely rewrite the whole thing if I ever wanted to run it on multiple platforms. The Rapidmind library provides an abstraction in between so I don't have to worry about that for now.

As far as mipmapping is concerned, it seems to be a matter of keeping things like the step size pretty constant. I'm using a LOD empty-space skipping algorithm right now that doubles the rendering speed, but no mipmapping is used yet. I'll investigate this pretty soon anyways.

Mind you that raycasting is, in essence, a fraction of the complexity of raytracing.
Hm.. Rapidmind is really a good idea to use - especially as Intel etc. are developing CPUs with up to 80 cores.
However, I found that multicore CPUs are not yet as powerful as the GPU mostly due to the limited memory bandwidth..

In case of mipmapping, the advantage is that the number of slabs/RLE elements can be heavily reduced. I am not sure how your algorithm traces the RLE-structure, but I guess you might use binary-search in combination with a coarse 3D-grid to figure out the RLE elements that are hit by a ray. If Mipmaps are included, then the vertical search is at least one iteration faster for each level and the trace-length of the ray is doubled.

Its true that raytracing in general has a higher cost, but it really depends on the scene which algorithm is faster. In my case, I get a lot of overdraw for example, as I have to render most of the vertical elements without knowing which ones are visible. The only help are floating horizons from the top and the bottom that allow to skip RLE elements out of range.
Another thing is the view-transformation for each RLE-element that is required with the conventional voxlap algorithm - meaning a rotation and projection onto the screen. This is not necessary for the ray-tracer.

I'm looking forward to see your algorithms speed with mipmapping ;D I guess you might get pretty large scenes raycasted. Also it might be useful for reflected rays etc, if the result doesnt need to be too exact.
IJs at 2008-02-02 12:25
Primary ray hitting is where raycasting stops and raytracing starts. One of the key points with raytracing will be the lighting equations that can, and will, be added as I've sure you've seen in other raytracers: this is one of the reasons I started developing a real-time raytracing algorithm.

Realistic and physically correct (or approximated, dynamic) global illumination is something you will never be able to achieve with raycasting in it's current form: even rasterization is currently able to get a far higher quality than conventional raycasting with less work. Another point is that the viewpoint is not limited to any particular range either, like you see with some raycasters with nifty encoding techniques. e.g. In the case of photon mapping, it'll be really easy to extend the current raytracing algorithm: instead of tracing from an arbitrary point in space (as camera), just trace a single frame from the lightpoint's view and let the photons bounce around the scene. (This does of course rely on primary, secondary, and deeper rays.) That's just an example of how flexible raytracing can be.

As far as GPU's versus CPU's are concerned, GPU's currently pack a lot more power and are, in comparison with your CPU, more dedicated to 3D maths. They currently seem to be growing significantly faster in their performance, which is imo a good indication of their role in the future - they outrank the fastest desktop CPU out there with a GFLOPS rating that is an order of magnitude higher. Unfortunately it's memory bus does not scale the same way, thus being the major bottleneck in projects like these: it's key is to absolutely keep the bandwidth as low as possible.

My voxels are not encoded in an RLE- or tree-fashion due to performance reasons. However, their data (normals, etc.), which are stored separately from the voxel grid itself, are RLE-encoded (for each z-column) and then linearly iterated after a voxel hit has been found. This requires a maximum iteration of z per ray (in case there are z distinct voxels in one column), where z is the height of the voxel data set. This lookup can be done better, but it's performance hit can be neglected compared to the performance hit of the voxel grid traversal algorithm.
asdgdhfsg at 2008-02-08 00:35
IJ, do you really think you will be able to make a real-time GPU+CPU voxel raytracer?

The problem with using high-resolution datasets is a) much higher memory requirements, b) a high-resolution dataset leads to increased screen resolution (+ raytracing = deadly combination) which leads to super-slow framerate and c) Only a pro modeler can make good-looking high-resolution polygon meshes practically, a game project's development time will therefore be much longer.

If you could allow users to use a screen resolution of 320x200 and a low-resolution voxel dataset, people who do not have a quadruple-core AMD-256 Sledgehammer 80 XT supercomputer can enjoy raytracing games too.. :P

http://www.nlm.nih.gov/research/visible/vhp_conf/schieman/images/vhtitlc.jpg
BOOM!! BRAINSHOT!! ;D :P
Edited by asdgdhfsg at 2008-02-08 01:01
IJs at 2008-02-10 08:21
I don't really care about people having old graphics cards systems and being compatible with them. I'm developing something for future usage, whenever the current high-end GPU cards and other parallel systems (e.g. Sony's PS3) will be mainstream.

I don't have an AMD-256 Sledgehammer 80 XT supercomputer.. In case you did not notice yet, the point of this project is to create a real-time raytracer that can run on conventional hardware, so that excludes any dedicated hardware or cluster computers.

A resolution of 320x200 and low-resolution voxel datasets are unacceptable for anything that's meant to be high-quality.

The memory problems have been taken care of.

As far as I know with professional game development, most modelling is done with high-resolution models which are then optimized using today's techniques into low-polygon meshes (with all kinds of bump/normal/parallax mapping tricks).

Anyways, I've been a bit busy lately but I hope to be picking up development pretty soon.
ConsistentCallsign at 2008-02-16 10:31
IJs said at 2008-02-10 08:21
the point of this project is to create a real-time raytracer that can run on conventional hardware, so that excludes any dedicated hardware or cluster computers.
Raytracing games can never be rendered real-time @ a fast framerate until they are executed on parallel processing cluster computers that consist of at least 64 "Playstation 3"s (or until they are played at a resolution of 320x200 ;D or until some new revolutionary technology replaces the current type of transistor that we use today)

Cheap Cell cluster computers will eventually become conventional hardware, right? :o
Edited by ConsistentCallsign at 2008-02-16 11:10
IJs at 2008-02-18 01:42
Right.

Anyways, besides the fact whether or when this is possible or not, I've finally managed to create a voxelization program that can turn high poly convex/concave meshes into the currently used 1024x1024x256 voxel format. It basically slices the scene into 256 slices, and uses OpenGL's stencil buffer to create a "silhouette" of the object at that particular depth. Currently it renders the slices on GPU, downloads and processes the stencil buffer contents on the CPU, which is eventually uploaded back to the GPU (in the renderer), but the entire thing can be modified to run entirely inside the GPU (at very high speeds), which may open a possibility for animations or dynamically generated voxels-through-triangles somewhere in the future.

It's not quite done yet. Since the renderer is based on Bresenham, it requires some extra work on the "inner" mesh voxels, but I'm trying to get the Cornell box in there pretty soon.

It'd be a whole lot easier if I didn't have to write these stupid triangle->voxel converters, they take up a tremendous amount of effort. But.. c'est la vie.
Edited by IJs at 2008-02-18 01:51

Slang at 2008-02-18 04:03

IJs said at 2008-02-18 01:42

It basically slices the scene into 256 slices, and uses OpenGL's stencil buffer to create a "silhouette" of the object at that particular depth.

Is it really necessary to use stencil buffers?

Well, I actually implemented the same kind of GPU-based voxelizer a year ago when I personally experimented with per-pixel voxel raycasting on the SM3.0 GPU. Although the only difference was that I used framebuffers rather than stencil buffers because you could directly render-to-3D-texture the result and thus easily get the voxel-volume data by simply calling glGetTexImage(). (Also, you could automatically generate mipmaps via GL_GENERATE_MIPMAP.)

Here's the code from my archive:


void Voxelizer::RenderToSlice(const Model& model, const word z) {

	static const Vec4 FORE_COLOR(1.0f);
	static const Vec4 BACK_COLOR(0.0f);
	static const Vec3& N(-Vec3::Z_AXIS);
	const Vec3 P(0.0f, 0.0f, m_AABB.GetMin().z + m_SliceSize*(float)z);
	const double plane[4] = { (double)N.x, (double)N.y, (double)N.z, (double)(-N.Dot(P)) };
	
	glClearColor(BACK_COLOR.x, BACK_COLOR.y, BACK_COLOR.z, BACK_COLOR.w);
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

	glEnable(GL_CLIP_PLANE0);
	glClipPlane(GL_CLIP_PLANE0, &plane[0]);
	
	glColor4fv(FORE_COLOR.ToFloatPtr());
	glCullFace(GL_FRONT);
	model.Draw();

	glColor4fv(BACK_COLOR.ToFloatPtr());
	glCullFace(GL_BACK);
	model.Draw();
}


glViewport(0, 0, m_VolumeSize, m_VolumeSize);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(min.x, max.x, min.y, max.y, min.z, max.z);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();

for (word z=0; z<m_VolumeSize; ++z) {
	RenderToSlice(model, z);
	glBindTexture(GL_TEXTURE_3D, m_VolumeTexture);
	glCopyTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, z, 0, 0, m_VolumeSize, m_VolumeSize);
}

Much simpler, isn't it? ;)

Edited by Slang at 2008-02-18 04:21

IJs at 2008-02-18 05:57
Unfortunately that's unsuitable for the 3D volume data sets I require (1024x1024x256 x 3 = 768MB for uniformly storing the xyz normals), so that's out of the question. My 2D slices are 1024x1024 each, for all 256.

Also, the meshes need to be floodfilled (filled in with "inside" voxels) at this point, that's where the stencil buffers come in handy.
Edited by IJs at 2008-02-18 06:01
Slang at 2008-02-18 06:58
IJs said at 2008-02-18 05:57
Unfortunately that's unsuitable for the 3D volume data sets I require (1024x1024x256 x 3 = 768MB for uniformly storing the xyz normals), so that's out of the question. My 2D slices are 1024x1024 each, for all 256.
Well, you could 1) use a single-component texture format such as GL_RED to reduce the size (resulting 256 MBytes in this case) 2) use FBOs to solve the RBBCTT's screen-size-limited issue 3) upload the 'binary' voxel-volume data to system memory and calculate the normal vectors on the host side 4) download the normal-volume data to VRAM and raycast/raytrace it.

IJs said at 2008-02-18 05:57
Also, the meshes need to be floodfilled (filled in with "inside" voxels) at this point, that's where the stencil buffers come in handy.
Actually, the code above automatically fills the volume without any effort. 8)
Edited by Slang at 2008-02-18 12:19
IJs at 2008-02-18 07:35
Yes you could. But then you'll have to agree with me that the stencil buffer turns out to be the easier all-in-one option right now, requiring only ~4MB of GPU memory at any point.

Right now it's working alright, it takes about 10 seconds to sample an entire scene and the scene data is accurate. The problem right now is that the floodfilled voxels near the edges of the mesh need to have the same normal as the voxels that actually form the edges. I hope to be solving this pretty soon.
Edited by IJs at 2008-02-18 07:38
Slang at 2008-02-18 09:57
IJs said at 2008-02-18 07:35
Yes you could. But then you'll have to agree with me that the stencil buffer turns out to be the easier all-in-one option right now, requiring only ~4MB of GPU memory at any point.
You know, it's also possible in the framebuffer-based algorithm to upload the voxel-volume data on a slice-to-slice basis -- use a temporal 2D texture (let's say "slice buffer") rather than one big 3D texture:
```
for (word z=0; z<m_VolumeSize; ++z) {
	RenderToSlice(model, z);
	glBindTexture(GL_TEXTURE_2D, m_VolumeTexture);
	glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, m_VolumeSize, m_VolumeSize);
	glGetTexImage(GL_TEXTURE_2D, 0, GL_RED, GL_UNSIGNED_BYTE, ...);
}
```
In this case, it only requires 1024x1024 pixels * 8 bits = 1 MBytes of VRAM. ;)
Although I strongly believe there will be a lot of overheads due to the hundreds of glGetTexImage() calls. Asynchronous PBOs might be helpful.

IJs said at 2008-02-18 07:35
The problem right now is that the floodfilled voxels near the edges of the mesh need to have the same normal as the voxels that actually form the edges. I hope to be solving this pretty soon.
I think that you would end up storing only surface voxels just like Voxlap if you are going to support per-voxel texture mapping.
Edited by Slang at 2008-02-18 10:49
IJs at 2008-02-19 06:36
How does your code handle concave meshes? It seems to always draw the closest front and backfacing polygons, not something you'd want for e.g. a sphere mesh that looks something like this:

http://www.cs.tut.fi/~tgraf/harjoitustyot/tutorial/image6.4.jpg

Unless I'm mistaken.

Another problem with our sliced based approaches is that polygons that are perpendicular to the orthogonal camera are nullified, e.g. the sides of a box that is standing upright. Apart from slightly rotating the scene, which seems like an ugly hack, I'm not exactly sure of a quick fix for this. Maybe change the frustum so it's nearly orthogonal? Perhaps the edges are automatically reconstructed if I used the clip planes instead of the frustum planes.

I've tried cutting the slices by setting the near/far plane to the top and bottom coords of each slice, so you get a nice silhouette (basically all the surface pixels) of the model for every slice without using the stencil buffer. Basically what you're doing above, but by using the near/far planes instead. This works fairly good for sampling the normals.. except the entire mesh is full of holes because of the above problem.

EDIT: Never mind, I figured it out by using a combination of GL_FILL and GL_LINE along with 2 clipping planes.
Edited by IJs at 2008-02-19 10:21
Slang at 2008-02-19 20:13
The algorithm surely can handle concave/perpendicular polygons without any modification. However, it can't handle non-closed polygons and/or polygons intersecting other polygons, so it's not 100% practical and robust.

Anyway, check out "Real-Time Volume Graphics" (pp.316-320) for more details.
IJs at 2008-02-22 02:28
Alright, the voxelizer is now working.

All I need now is some high polycount meshes to sample, maybe a detailed stone wall or something, so I can start creating a test room for the lighting.
ConsistentCallsign at 2008-02-22 05:03
IJs said at 2008-02-22 02:28
Alright, the voxelizer is now working.

All I need now is some high polycount meshes to sample, maybe a detailed stone wall or something, so I can start creating a test room for the lighting.
I'm on the case! ;D

EDIT
The polycount isn't very high, but I guess it's OK since it's a just a stonewall, stones are allowed to be a little edgy:
http://rapidshare.com/files/93764113/stonewall.rar
High resolution texture included! :o
Edited by ConsistentCallsign at 2008-02-22 05:51
IJs at 2008-02-22 06:56
Hey, that's pretty nice. I should be able to smoothen that out and do some deformations on it with a modeller as to increase the quality.

Thanks.

EDIT: And here we go for a preliminary result (really simple Phong shading). Note that I'm in debug mode (poor performance), and I'm crossing the 1-voxel-per-pixel boundary (result: jaggy edges). Don't worry about it.

The model's actually taking up about 20% of the entire scene, so relatively low quality (as I'm currently limited by the 256 voxel height). Remember these are voxels: increasing the resolution will _not_ decrease the performance in any way.

http://solid.student.utwente.nl/volume23b.png
Edited by IJs at 2008-02-22 08:34
ConsistentCallsign at 2008-02-22 16:48
LOL, 4 FPS :D

I'm making a high-resolution hand, LOL!
http://i225.photobucket.com/albums/dd202/highwingx3/hand.png

I made a fork, LOL:
http://i225.photobucket.com/albums/dd202/highwingx3/fork.png
http://rapidshare.com/files/93918823/fork.3ds
Edited by ConsistentCallsign at 2008-02-22 21:08
IJs at 2008-02-22 21:25
And performance speed mode just for the ones that didn't read..

http://solid.student.utwente.nl/volume23c.png
ConsistentCallsign at 2008-02-22 22:36
One of the advantages of voxel engines is their ability to render lots of dense volumetric smoke without killing the framerate. Therefore, you should add fluid physics or you might as well make a vector engine instead.. Volumetric smoke generator plugins for 3D modeling/animation programs all use voxels because voxels are so much faster to render than polygons.

And after you have implemented a fluid physics algorithm, you just need to add transparent voxels and then you have water and fire! :o

Spectacular volumetric smoke effects as shown in the fake, computer-generated Killzone 2 (PS3) trailer 3 years ago (http://www.youtube.com/watch?v=Ko9xC6TMdiw), are only practical to render in real-time with a voxel engine.
Edited by ConsistentCallsign at 2008-02-22 22:53
Spacerat at 2008-02-23 00:44
ConsistentCallsign said at 2008-02-22 22:36
One of the advantages of voxel engines is their ability to render lots of dense volumetric smoke without killing the framerate.
Um.. thats not true. Rendering scenes without transparencies is very fast. However, in case of transparencies as for volumetric smoke, you cant use early ray termination efficiently so its getting quite slow. I already made some tests..
A solution is to render the opaque stuff first and the transparent things in a second pass. Then you get early ray termination and may also use low resolution raycasting for the transparent things.
IJs at 2008-02-23 02:39
ConsistentCallsign said at 2008-02-22 22:36
One of the advantages of voxel engines is their ability to render lots of dense volumetric smoke without killing the framerate. Therefore, you should add fluid physics or you might as well make a vector engine instead.. Volumetric smoke generator plugins for 3D modeling/animation programs all use voxels because voxels are so much faster to render than polygons.
Perhaps just one of the other advantages. The only advantage that is important right now is the uniformity of a voxel data set, as opposed to tree-based structures (with triangles or primitive shapes) that are currently less than optimal for a GPU to process, thus their advantage in speed combined with their advantage in quality (although that's still being worked on : ).

I like your enthousiasm, but I don't think you really understand the reason why I'm doing this. Perhaps it'll start making sense once I get my first demo with proper quality ready.

Again, this is not a raycaster or rasterizer. This is a ray-tracer, although it doesn't show yet, it really is. I'll be adding texturing pretty soon, as well as reflection rays and preliminary photon mapping.
Edited by IJs at 2008-02-23 02:44
IJs at 2008-02-28 04:12
And here's an update. I managed to get texturing to work first:

http://solid.student.utwente.nl/volume25b.png

I then spent some time in attaching a second ray-tracing pass, resulting in support for reflections:

http://solid.student.utwente.nl/volume27.png

It's starting to take shape. Of course at the cost of some speed, high performance mode currently runs at a worst-case 10fps so there's some room for improvement: rays aren't terminated early, even if they point into void (all the black colors), so it's still unoptimized.
Edited by IJs at 2008-02-28 04:27
ConsistentCallsign at 2008-02-28 04:49
Now all you need are some transparent voxels.

:o3D Raster Ray Tracing (RRT) is the future!! :o
http://www.cs.sunysb.edu/~vislab/projects/volume/Papers/Discrete.html said
In conventional ray tracing, computation time grows with the number of objects, and performance is greatly influenced by the type of objects comprising the scene; intersection calculation between a ray and a parametric surface is significantly more complex than intersecting the ray with a sphere or a polygon. In contrast, RRT completely eliminates the a computationally expensive ray-object intersection calculation, and instead relies solely on fast discrete ray traversal mechanism and a single simple type of object the voxel. Consequently, RRT performance is effectively independent of the number of objects in the scene or the objects' complexity or type. Therefore, for a given resolution, ray tracing time is nearly constant and can even decrease as the number of objects in the scene increases, as less stepping is necessary before an object is encountered.
Voxels 1 - Polygons 0 8)
Edited by ConsistentCallsign at 2008-02-28 04:56
IJs at 2008-02-28 05:45
Transparency is no problem at all. I am a little concerned about making the scene dynamic though, as it's currently as static as can be. Suggestions are welcome.
Spacerat at 2008-02-28 12:11
IJs said at 2008-02-28 04:12
It's starting to take shape. Of course at the cost of some speed, high performance mode currently runs at a worst-case 10fps so there's some room for improvement: rays aren't terminated early, even if they point into void (all the black colors), so it's still unoptimized.
10 fps sounds good. Do you think its possible to get about 20-30 fps with a 640x480 resolution ? Then it gets interesting for gaming. 10 fps is also similiar to what I am getting at the moment with GPU voxel ray-casting.
Yesterday I finally got the x- and y-rotations working. However, I found I have to cast up to 4x as much columns as the screen has to get all pixels filled.. Thats really a handicap as this ends up in 4096 raycasts for a 1024x1024 screen resolution.
Edited by Spacerat at 2008-02-28 12:14
ConsistentCallsign at 2008-02-28 18:59
Spacerat said at 2008-02-28 12:11
Do you think its possible to get about 20-30 fps with a 640x480 resolution ? Then it gets interesting for gaming.
It's possible to get a thousand frames per second with a screen resolution of 1x1, 'cause then there will only be 1 ray, I think, LOL! ;D

IJs: Are you planning to make a cool voxel game engine or are you just trying to show that GPU raster raytracing is possible?
If it's just an experiment and you are not going to add fluid physics etc and use it as a game engine or let other people license it and use it as a game engine, then you might as well help me with Voxelstein 3D :P

When cheap Cell cluster computers become available on the mainstream market, then the game industry will make cool raster raytracing engines in a matter of days. You are like inventing the car before the oil has been discovered or the invention of the wheel. GPU raytracing is still slow, real-time raytracing will never be practical until cluster computers have become conventional hardware or you use a really low screen resolution (320x200).
IJs at 2008-02-29 01:34
Spacerat said at 2008-02-28 12:11
10 fps sounds good. Do you think its possible to get about 20-30 fps with a 640x480 resolution ? Then it gets interesting for gaming. 10 fps is also similiar to what I am getting at the moment with GPU voxel ray-casting.
Yes, definately. There's a number of "slow" factors involved in my current situation:
- I'm on a Revision 1 G80 gpu which has 96/112 stream processors, 64GB/s memory bandwidth and a theoretical 345 GFLOPS. If you compare that with the newer G92 or GTX, which have 128 stream processors, 64 to 86 GB/s memory bandwidth and theoretical 518 or 624 GFLOPS ratings, you can see that my GPU is at the bottom of the high-end range cards performance. Also, since I'm using Rapidmind as platform, it can easily be ported to the PlayStation 3, which should be even faster although the specs about memory bandwidth and performance are pretty hard to find. I know there will be a performance gain in more expensive hardware setups, but I'm not sure of the exact magnitude (apart from theoretical calculations). I guess there's only one way to find out.

- The current implementation is still unoptimized. I've been forced to use some workarounds here and there and some choices I've made regarding stuff like texture or data types are probably far from optimal at this point, because basically up until now I've been trying to program this as a (trail and error) proof-of-concept with the option of eventually rewriting it. It's a fairly new platform, so it probably requires some work to tune and tweak (and read lots of tech specs) it down to awesome performance.

I think people generally consider real-time as >1 FPS or something. My goal is to stay far above that, so it remains suitable for gaming.
When cheap Cell cluster computers become available on the mainstream market, then the game industry will make cool raster raytracing engines in a matter of days. You are like inventing the car before the oil has been discovered or the invention of the wheel. GPU raytracing is still slow, real-time raytracing will never be practical until cluster computers have become conventional hardware or you use a really low screen resolution (320x200).
Yeah, sure you can wait another 20 years for it to become practical to just use a stream-based raytracer without any fancy techniques, why bother now? Because we can: the technology is already in place. The technique this is based on (Bresenham) was invented decades and decades ago, and is up until today one of the simplest techniques around. Don't forget that as those magic Cell cluster computers become available, your conventional raytracer's performance grows and becomes real-time, but this technique's performance grows n times faster than that and already was real-time.

It'd be interesting if you checked out http://www.ompf.org/ once in a while. They're actually doing a real-time (CPU, SIMD) raytracing game engine over there that looks pretty well. I think they're making a big mistake by utilizing the CPU for it though while a good GPU packs so much more power. They're two fundamentally different things though.

And yeah, this is beyond an experiment. This is the base of a new engine. I should really make a roadmap.
Edited by IJs at 2008-02-29 01:51
ConsistentCallsign at 2008-02-29 03:37
IJs said at 2008-02-29 01:34
And yeah, this is beyond an experiment. This is the base of a new engine. I should really make a roadmap.
In that case, it is imperative that you implement volumetric smoke/water/fire/fog because if your raster raytracing engine is going to be suitable for next-next-gen games (next-next-gen games are games made in the new era when raytracing games rule the world:P), you must make it so that the engine can take advantage of at least 512 parallel processors to calculate the all the rays, but when cheap Cell cluster computers become conventional hardware and everyone has at least 512 parallel processors and can use your engine, we will have the processing power to simulate fluid physics in real-time at a fast framerate too and then nobody would want to use your engine if it doesn't have ultra-awesome explosions like in the fake Killzone 2 trailer :D

Imagine cheap Cell cluster computer with 512 CPUs that has an additional 512 Cells dedicated to smoke/fluid simulation :D that would be sweet.. But wouldn't 1024 Cell CPUs use alot of electricity? ???
Slang at 2008-02-29 06:41
ConsistentCallsign said at 2008-02-29 03:37
...when cheap Cell cluster computers become conventional hardware and everyone has at least 512 parallel processors and can use your engine, we will have the processing power to simulate fluid physics in real-time at a fast framerate too...
Actually, voxel-based fluid simulation is bandwidth-bound, not compute-bound:

Theodore Kim, "Hardware-Aware Analysis and Optimization of Stable Fluids," I3D 2008.
http://www.cs.unc.edu/~kim/I3D08/

Even if many-core MIMD processors become commoditized in the near future, I suspect that there would be almost no commercial game product that has voxel fluids in it as long as there wouldn't be any innovation that drastically speeds up memory access. (Before then, the current process-miniaturization technology will plateau soon at around 20nm in any way...The future is not bright.)

BTW, check out this video from GDC 2008:
http://www.gametrailers.com/player/30825.html

P.S. You guys also might want to check this out:
http://www.gametrailers.com/player/31317.html
Edited by Slang at 2008-02-29 21:44
IJs at 2008-03-01 01:07
Those movies are nice, it's always nice to see some innovation from companies like Epic Megagames or Id software, it's still mostly rasterization though ; )

I have to admit that getting dynamic scenes working is vital for a project like this. I've been thinking about running the entire animation code on the GPU, as some form of "modifier" that changes the voxel (and color) data without invoking the CPU in the process. I'll be giving this some extra thought.

On a sidenote, here are some specs on the reflection scene:
- 1024x1024x256 voxels
- voxel data is 16MB in size and does not change depending on complexity
- RLE data changes depending on color/normal variety and is currently 6.4MB in size
- RLE indexing data takes up 4MB of space
- total memory cost: 16+6.4+4 = 26.4MB

It's indeed the memory bus that remains the bottleneck for applications like this, so it requires clever usage in order to get more performance.

Less space would be required if the scene data was divided into objects (bounding boxes), instead of taking the 1024x1024x256 scene as a whole. This may also benefit any future animation plans.
Edited by IJs at 2008-03-01 01:12
ConsistentCallsign at 2008-03-01 01:17
IJs said at 2008-03-01 01:07
It's indeed the memory bus that remains the bottleneck for applications like this, so it requires clever usage in order to get more performance.
What we need is a new revolutionary type of memory/bus hardware that will solve all our bandwidth problems.. Until then, you might as well help me with Voxelstein 3D :P
IJs at 2008-03-01 05:36
Come on, don't be obstinate. Think of a solution instead. You know high resolution voxels are the only way if you care about quality. How does Voxlap do animations?
Edited by IJs at 2008-03-01 05:41
ConsistentCallsign at 2008-03-01 06:57
IJs said at 2008-03-01 05:36
You know high resolution voxels are the only way if you care about quality.
You know there are lots of people that still enjoy playing low-resolution games? Gameplay over graphics! :P
Seriously.. Of course I want better graphics.. but I am an impatient man, I don't want to wait anymore, I'm fed up with polygons. I want to play a real voxel game, not that Outcast shit that is just a static voxel terrain engine with some polygon characters moving around on top, and the closest thing to a real voxel game/engine is Ken's Voxlap.

IJs said at 2008-03-01 05:36
How does Voxlap do animations?
Ken's kwalk voxel animation program lets you move voxel parts around a point, but Ken created the program a long time ago and he found new and better ways to animate them by bending and stretching the voxels too as the in-game command /curvy=nameofmodel.kv6 shows:
http://i225.photobucket.com/albums/dd202/highwingx3/dope-1.gif
http://i225.photobucket.com/albums/dd202/highwingx3/KVX50001-11.png
IJs at 2008-03-01 09:13
Interesting. However, I don't think stretching voxels will really do here.

I'm pretty certain that I can easily perform any transformations on my voxel data (very much like a geometry shader if you will). The only problem is the RLE data texture. It contains the RLE elements, the information (colors, normals, etc) for each unique voxel in the scene, and is currently run-time length encoded (down the Z axis) based on every column in the scene: a ray hits a voxel, uses its X,Y voxel coords to get a value from a particular texture, the RLE index data. This value is the pointer to the start (texture coordinate) of the first RLE element for that specific X,Y column, which it iterates until the Z value is found for the voxel that was hit. It works fairly good.

But applying transformations to that RLE data seems fairly costly since it involves shifting data (bad for performance) or padding data beforehand (bad for memory usage). There are other ways of doing this, so it's worth investigating.

Gameplay over graphics doesn't really work for me by the way.
Edited by IJs at 2008-03-01 09:15
esuvs at 2008-03-02 03:13
IJs said at 2008-03-01 09:13
Interesting. However, I don't think stretching voxels will really do here.

I'm pretty certain that I can easily perform any transformations on my voxel data (very much like a geometry shader if you will). The only problem is the RLE data texture. It contains the RLE elements, the information (colors, normals, etc) for each unique voxel in the scene, and is currently run-time length encoded (down the Z axis) based on every column in the scene: a ray hits a voxel, uses its X,Y voxel coords to get a value from a particular texture, the RLE index data. This value is the pointer to the start (texture coordinate) of the first RLE element for that specific X,Y column, which it iterates until the Z value is found for the voxel that was hit. It works fairly good.

But applying transformations to that RLE data seems fairly costly since it involves shifting data (bad for performance) or padding data beforehand (bad for memory usage). There are other ways of doing this, so it's worth investigating.

Gameplay over graphics doesn't really work for me by the way.
Well if you were just storing 'raw' texture data (rather than RLE form) then presumably you could use render-to-texture functionality to edit the volume. This should be damn fast. But instead of storing the volume as one big texture you could store it as a series of blocks with each block represented by a texture. Then you can omit to store blocks for homogenous regions in order to achieve some compression. Dos that make any sense?

I think RLE encoding, while good for rendering, is bad from the perspective of making the volume dynamic :-(
Spacerat at 2008-03-02 12:12
I think animtion by recomputing the complete volume data is not such a good idea for gaming.
Here is a paper which does this for medical volume data http://graphics.usc.edu/~trhee/papers/Rhee07_PG.pdf
It takes about 2.5 seconds for deforming 255x255x100 using skinned skeletal animation (usually 3 matrix multiplications are required for each voxel).
More suitable for animation are characters made from particles. The only difficulty there is to fill the gaps if the density at certain points is insufficient. I think the characters in voxlap are already rendered as particles, aren't they?

Another interesting link is the OpenRT project http://www.openrt.de/publications.php
However, there hasnt been any update for already one year..

Slang said at 2008-02-29 06:41
Actually, voxel-based fluid simulation is bandwidth-bound, not compute-bound:
Today I was surprised that my raycaster also seems to be bandwidth bound. Here the reason:
Casting 1000 Rays: Render time ca. 50ms
Casting 4000 Rays: Render time ca. 50ms
Switch MipMaps at 1000 (z distance): Render time ca. 50ms
Switch MipMaps at 400 (z distance): Render time ca. 25ms

I don't know if its the same as for ray-tracing.

Edited by Spacerat at 2008-03-02 22:02
IJs at 2008-03-03 05:17
Spacerat said at 2008-03-02 12:12
I think animtion by recomputing the complete volume data is not such a good idea for gaming.
Here is a paper which does this for medical volume data http://graphics.usc.edu/~trhee/papers/Rhee07_PG.pdf
It takes about 2.5 seconds for deforming 255x255x100 using skinned skeletal animation (usually 3 matrix multiplications are required for each voxel).
More suitable for animation are characters made from particles. The only difficulty there is to fill the gaps if the density at certain points is insufficient. I think the characters in voxlap are already rendered as particles, aren't they?
Mind you, we're not recomputing the volume data (e.g. re-uploading or re-sampling it), but merely changing (transforming) what we already had in memory. Don't forget that the volume data (e.g. 1024x1024x256) lends itself perfectly for parallel processing.

For example, all you need is a 1024x1024 viewport with each shader (or pixel) processing that particular Z column (256 voxels) and applying a certain transformation for whichever voxels had to be transformed. If my raytracer looks up roughly 400 of these voxels on average at 25 FPS, it would be really fast to transform voxels any way we wanted.

Also, since we're bandwidth bound, there's a lot of computational power left for the GPU to transform the voxels.
SOMEGUY at 2008-03-03 12:54
Ah i been reading this article for awhile and really interested in this voxel engine projects.

I saw this link with really high voxel count with a trillion
http://www.crs4.it/vic/cgi-bin/bib-page.cgi?id=%27Gobbetti:2005:FV%27
and another with real-time raytracing
http://www.crs4.it/vic/cgi-bin/multimedia-page.cgi?id=%27129%27
Please keep chugging until this thing gets off the ground, also do you plan to upload any source code?
Spacerat at 2008-03-03 16:18
SOMEGUY said at 2008-03-03 12:54
I saw this link with really high voxel count with a trillion　and another with real-time raytracing
Yes, the results of Far Voxels are really impressive. It's actually a mix of polygons and particles and not a raytracer/raycaster. Near geometry is rendered as geometry using bsp for culling and far Geometry by view-dependent particles.

IJs said at 2008-03-03 05:17
Mind you, we're not recomputing the volume data (e.g. re-uploading or re-sampling it), but merely changing (transforming) what we already had in memory. Don't forget that the volume data (e.g. 1024x1024x256) lends itself perfectly for parallel processing.

For example, all you need is a 1024x1024 viewport with each shader (or pixel) processing that particular Z column (256 voxels) and applying a certain transformation for whichever voxels had to be transformed. If my raytracer looks up roughly 400 of these voxels on average at 25 FPS, it would be really fast to transform voxels any way we wanted.

Also, since we're bandwidth bound, there's a lot of computational power left for the GPU to transform the voxels.
Hm.. but how would you do it ? With RLE encoded data it might be very slow to animate everything and store it again in GPU memory as you need to reencode everything - the RLE structure of the animated object will be very different. Applying this for raw data is also a problem as it doesnt fit into GPU memory. A different idea might be an inverse animation, where you lookup voxels near the ray while tracing, if they might hit the ray once deformed - this might be fast enough by using bounding boxes e.g.

But as we are bandwidth bound, its true that we should add more computations ;D For me, I first need faster culling and better lighting..
Edited by Spacerat at 2008-03-03 16:51
Slang at 2008-03-03 23:16
Why do you guys stick to voxels for animation? Since you are using GPUs, I think that it's just a matter of integrating polygons with voxels via depth buffering (where the polygonal models are animated by mesh skinning/deforming using GPU vertex shaders).

You know, voxel raycasting/tracing simply never replace GPU rasterization. Let's use the hard-wired resource. ;)
Edited by Slang at 2008-03-03 23:47
IJs at 2008-03-04 08:41
Slang said at 2008-03-03 23:16
Why do you guys stick to voxels for animation? Since you are using GPUs, I think that it's just a matter of integrating polygons with voxels via depth buffering (where the polygonal models are animated by mesh skinning/deforming using GPU vertex shaders).

You know, voxel raycasting/tracing simply never replace GPU rasterization. Let's use the hard-wired resource. ;)
That's really besides the point here.

Either way, I might be able to do a "sprite"-based approach: divide the scene into blocks containing 6 faces with a unique sprite representing the data of the inner voxels. This allows easy read & write operations.
IJs at 2008-03-13 05:31
I've ported the whole thing to NVIDIA CUDA, so my framerates (for the first-post reflection screenshot) are at 20FPS again as you should be able to see on my webpage.

http://solid.student.utwente.nl/media/1/20080311-volume28.png

Also, I've started experimenting with animations. It seems I have enough bandwidth (a theoretical 6 GB/s, practical 2.6 GB/s with my poor setup) over the PCI-Express bus available to manipulate the voxel data in real-time by using a hybrid CPU/GPU approach.

I'm currently adding the code to make these scenes dynamic and with any luck I should have a video available any time soon, if no problems occur that is. I'm currently only getting a hit of about 5 fps for syncing data between the CPU and GPU every frame thanks to CUDA.
Edited by IJs at 2008-03-13 05:38
bitshit at 2008-03-14 09:30
Interesting project! One of the most promising effords into volume rendering!

PS: Does Cuda give you more performance compared to Rapidmind? I always through Rapidmind could do the same tricks as Cuda but more scalable among different architectures (GPU, multicore CPU's, SPU's etc)
IJs at 2008-03-15 03:08
Yes, I literally went up 200% in performance due to better memory management tricks. I have another few tricks up my sleeve, e.g. using the shared memory for ray coherence to speed it up even further (or using it to do anti-aliasing).

Also, the initialisation overhead went down from like 5 seconds to 1 second. Not sure what to blame for that.

I'm happy to offer up the compatibility for better performance.. I don't have a PlayStation 3 to develop on either way.
bitshit at 2008-03-15 09:37
IJs said at 2008-03-15 03:08
Yes, I literally went up 200% in performance due to better memory management tricks. I have another few tricks up my sleeve, e.g. using the shared memory for ray coherence to speed it up even further (or using it to do anti-aliasing).
Ok, sounds like a good deal then :)

IJs said at 2008-03-15 03:08
Also, the initialisation overhead went down from like 5 seconds to 1 second. Not sure what to blame for that.
Could be because of JIT evaluation / recompilation tricks Rapidmind employes?

IJs said at 2008-03-15 03:08
I'm happy to offer up the compatibility for better performance.. I don't have a PlayStation 3 to develop on either way.
I can imagine, the only disadvantage I see is that it's not "platform" independand this way (hardcoded to the GPU platform)... say we get multicore cpu's (or something else like the labberee) in the near future that could pull this off more efficient, then everything would have to be rewritten?

PS: The screens look amazing, but whats up with these "cracks" you see here and there? Is it a side effect of the modified bressenham algorithm you mentioned?
IJs at 2008-03-16 23:55
bitshit said at 2008-03-15 09:37
I can imagine, the only disadvantage I see is that it's not "platform" independand this way (hardcoded to the GPU platform)... say we get multicore cpu's (or something else like the labberee) in the near future that could pull this off more efficient, then everything would have to be rewritten?
It took me 2 days to convert the entire thing to CUDA. If you don't count the dumb issues I've had (typo's, forgetting signs, etc.) it would've been a couple of hours. Note that it's only a library/compiler suite, not a different language. If I were to convert this to the multicore CPU approach you mentioned, ideally, I would only have to change stuff like the data types, since it's virtually all C. Porting it to a PS3 would be more difficult though.

bitshit said at 2008-03-15 09:37
PS: The screens look amazing, but whats up with these "cracks" you see here and there? Is it a side effect of the modified bressenham algorithm you mentioned?
It's an error in the LOD (Level Of Detail) algorithm, it skips over a single voxel on each LOD boundary I believe (if you look good you can see the "cracks" are actually formed by planes that are aligned to the XYZ axis). Nothing that can't be fixed. The other pixel-sized cracks are caused by the scene mesh itself and regular ray-tracing aliasing. Apart from the aliasing (which can be solved), I think Bresenham is behaving particularly well since I've managed to get around it's issues.

I've written down a few key milestones:
1. Dynamic scenes (being worked on)
2. Lighting (e.g. photon mapping with density estimation)
3. Object management (e.g. streaming objects in and out)
4. Anti-aliasing (the tracer's worst nightmare, preferably using ray coherency)
5. Integration into game engine / demo
Edited by IJs at 2008-03-17 00:09
IJs at 2008-03-19 02:22
Well it's fairly early yet but I got some form of voxel manipulation working:

http://bastage.student.utwente.nl/waves.avi
http://bastage.student.utwente.nl/waves2.avi (debug reflections)

Some perlin noise function I borrowed from a water deformation project applied on a 110k voxel object. No ray transformation but pure per-voxel transformations by the CPU. And yeah, the object actually has 2 sides (its a flat box). Performance runs slightly lower than the tracer's and is choking a little (mainly because of the recording program), but that can be solved.

I still have to fix the associated LOD problems, voxel data and optimize the performance so I can enable the lighting and the reflections again. I've also added support for voxel objects and matrix transformations on each one of those.
Edited by IJs at 2008-03-19 02:38
Spacerat at 2008-03-22 00:04
IJs said at 2008-03-13 05:31
I've ported the whole thing to NVIDIA CUDA, so my framerates (for the first-post reflection screenshot) are at 20FPS again as you should be able to see on my webpage.
If you get 20 fps with reflection then you get 40 without? Thats pretty fast.
IJs at 2008-03-22 02:12
Spacerat said at 2008-03-22 00:04
If you get 20 fps with reflection then you get 40 without? Thats pretty fast.
No, it's not that easy. I think I get about 30 FPS without secondary reflections.

ConsistentCallsign said at 2008-02-22 22:36
One of the advantages of voxel engines is their ability to render lots of dense volumetric smoke without killing the framerate. Therefore, you should add fluid physics or you might as well make a vector engine instead.. Volumetric smoke generator plugins for 3D modeling/animation programs all use voxels because voxels are so much faster to render than polygons.
There you go. You should be able to see a video at: http://solid.student.utwente.nl

Edited by IJs at 2008-03-22 02:16
ConsistentCallsign at 2008-03-22 10:07
IJs said at 2008-03-22 02:12
There you go.
Good; now you just need to make the dynamic voxels transparent and you will get real-time photorealistic smoke/fog/fire/water! :o ;D
IJs at 2008-03-30 07:44
And here are some photon mapping results. Just indirect lighting, photons only, no original voxel colors. Correct lighting is still miles off, but the basic principles are now in place.

Scene is a Cornell box (red left, green right, white everywhere else, light top-center).

For anyone that knows how photon mapping works, the irradiance estimation is actually incorrect right now considering the alternative photon storing structures I use: no kd-trees or whatever, just the uniform grid I've been using all along. Works excellent, I just need to find a robust way to estimate the radiance using my structure (I do have a paper about photons and uniform grids).

http://solid.student.utwente.nl/media/1/20080329-volume32.png

The photon tracing uses the same raytracer functions, so all it does is grab some few frames off the performance. If it's done once, there is no performance drop at all. ~20 fps on 8800GTS (G80) is still maintained.

A key point here is that no photon gathering is required during the raytracing any more. Since we can store colors (plus normals, and photon data) for each independant voxel in the scene (this scene currently takes up 16MB for that purpose), we can now perform the gathering once after every photon trace pass and store the results directly into the scene.
Edited by IJs at 2008-03-30 08:53
esuvs at 2008-03-30 08:39
That's really is cool! Those shadows look great :-)
Maren at 2008-03-30 08:41
Amazing 8)
ConsistentCallsign at 2008-03-30 08:42
looks like a photorealistic oil painting
Slang at 2008-03-31 05:32
I would be surprised if there is a commercial game product in the near future, which features the photon-mapped voxel raytracing rather than polygon(GPU rasterization)-based indirect lighting such as Geomerics (real-time radiosity).

http://www.youtube.com/watch?v=TCPQiCliKmg
http://www.youtube.com/watch?v=ewQuQxeyIP8

Personally, I do believe that the true value of voxels is dynamic destructibility, so I would like to see a game that is fun to play rather than photorealistic if it uses voxels (no offense ;)).
Edited by Slang at 2008-03-31 05:51
ConsistentCallsign at 2008-03-31 06:46
Slang said at 2008-03-31 05:32
Personally, I do believe that the true value of voxels is dynamic destructibility, so I would like to see a game that is fun to play rather than photorealistic if it uses voxels (no offense ;)).
Voxelstein 3D will be fun to play :D
Spacerat at 2008-04-02 21:15
IJ, do you plan to publish a paper about your engine ?
Would be interesting for others to know about.
Perhaps NVISION might be a good chance to promote voxels.

http://www.nvision2008.com/
(Proposal Deadline: April 4th, 2008, 300words max)
psychorosti at 2008-04-03 02:24
at least you could join if you are not afraid to go to a nvidia party where everything is coupled with them.

But since you are interested to port it to cuda anyway you might even get help from them if you join :)
IJs at 2008-04-03 02:35
Although I'm sure it would be interesting for all of us, I have 0.0 experience in writing papers and I'd rather get a demo up and running first, so getting a paper proposal done for the 4th is impossible.

However, I was actually thinking of participating in the demo contest at nvision '08 using this renderer + some voxel and music content. I think that would be way more exciting as well, although it would mean that something really good needs to be there in a few months (afaik the deadline's a month before the whole event starts)..

Downside is that flying over from here to San Jose is pretty expensive, but I think it's not required to be physically present there.

Here's a quote from nvision.scene.org:
April 1, 2008 - Two updates in the ruleset

Many has asked us about two things concerning the compo rules: CUDA and D3D10. We have augmented the rule specifications to explicitly allow those two, so feel free to experiment with new methods!
IJs at 2008-04-10 08:17
I've been trying to implement and research several ways of doing the lighting, e.g. photon mapping, radiosity or other approaches such as SH lighting.

IMO now that I have a ray-tracer that runs in real-time, I should be exploiting this for the global illumination calculations so it's most natural to pick something that involves shooting rays and preferably without any precomputation.

Photon mapping brings some issues with it. It's "gathering" stages rely on nearest neighbour searching. This means that in order to calculate lighting, we need to look for x nearest voxels around that point in 3d space for any photons that were stored during the photon tracing pass. Problem here is that you need to walk the surrounding "blocks" in the uniform grid, and that's a nice hit on the memory bus. (A crappy form of this was used in the earlier screenshot by the way.)

Then there's photon splatting, that involves creating "splats" (discs) during the photon tracing pass, whereever a photon hits the surface, so eventually all the photons blend with each other. Same problem here though. Another option here using a 2D texture atlas. This involves turning all surfaces into unique textures, as we know them from rasterization, and outlaying them next to each other in what is basically one big 2D texture. You could then "splat" the photons and it would only require circular 2D texture access. Problem here is that making a texture atlas and laying it out may be tricky (e.g. you need to align the surface edges or photon splatting won't work that well), and the quality from what I've seen in the paper is not that good.

Yet another thing is radiosity and all it's techniques. All in all, this basically involves shooting a relatively big series of rays from each "patch" (e.g. voxel) in the scene.. not ideal.

Spherical harmonic lighting is another lighting solution, although it's basically a clever way of "encoding" precomputed lighting (as traced per radiosity or raytracer) and putting it to use in your renderer. This has some disadvantages (limited dynamics, e.g. no deformations) and doesn't seem that suitable.

As you can see it's yet another way of finding and adjusting a suitable technique for use with voxels. I'm currently trying to adjust photon mapping to my needs. Any ideas are always welcome though.
Edited by IJs at 2008-04-10 08:20
Spacerat at 2008-04-10 13:39
You could try to implement SSAO - looks pretty nice in Crysis
http://en.wikipedia.org/wiki/Screen_Space_Ambient_Occlusion
psychorosti at 2008-04-10 18:27

Or try the extension of it - HSAO (Horizon Split Ambient Occlusion) developed by Miguel Sainz from NVIDIA. Gives a more realistic picture, take a look:

http://www.abload.de/img/hsaocn8.jpg
IJs at 2008-04-11 03:07
Thanks. I've tried another approach using photons, here's a few new results:

http://solid.student.utwente.nl/media/1/20080410-volume35.png
http://solid.student.utwente.nl/media/1/20080410-volume35b.png

Little crappy around the edges of each object, it's supposed to be fixed by a "boundary bias" but I haven't found the correct magic numbers for that yet.
Edited by IJs at 2008-04-11 03:29
Slang at 2008-04-11 06:12
You might want to check out this video now, IJs.
http://research.microsoft.com/users/kunzhou/2008/kd-tree.avi

For more info:
Real-Time KD-Tree Construction on Graphics Hardware (pdf link)
IJs at 2008-04-11 08:40
Very useful, thanks. It may indeed be smart to have a second structure for photons. Nice to see some research in real-time kd-tree construction.
ConsistentCallsign at 2008-06-10 15:50
something that might help IJs in reply to this post "Post by: IJs on April 10, 2008, 08:17:59 AM" is the fact that for radiosity, you may not have to be "shooting a relatively big series of rays from each "patch" (e.g. voxel)". There are some obscure things called "radiation transfer factors" that you can look up if you hunt hard enough. They were developed to figure out how hot different parts of a thing get based on how hot the other things around it are, based on radiant heat. To do it, they came up with a set of simple factors (just a number) that explain how much of one patch is coupled to any other patch. So calculating lighting becomes a matter of lookup from a reference table, once you know what your patch's shape is and its angle relative to the receiving patch. It's not really pre-computation, because the table of factors is forever fixed for every pair of patches. You just have to figure out which pair of patches you're dealing with. Anyway that may be a lead IJs would like to follow up.
IJs at 2008-06-26 00:07
Sorry, I've been busy the last few weeks due to university work. However, I've been figuring out ways of fixing the lighting in my renderer.

Here is one of the results:
http://solid.student.utwente.nl/media/1/20080613-photon10.png

I will be trying my best to get it to acceptable speeds again as soon as possible. This is my top priority concern right now.

ConsistentCallsign said at 2008-06-10 15:50
something that might help IJs in reply to this post "Post by: IJs on April 10, 2008, 08:17:59 AM" is the fact that for radiosity, you may not have to be "shooting a relatively big series of rays from each "patch" (e.g. voxel)". There are some obscure things called "radiation transfer factors" that you can look up if you hunt hard enough. They were developed to figure out how hot different parts of a thing get based on how hot the other things around it are, based on radiant heat. To do it, they came up with a set of simple factors (just a number) that explain how much of one patch is coupled to any other patch. So calculating lighting becomes a matter of lookup from a reference table, once you know what your patch's shape is and its angle relative to the receiving patch. It's not really pre-computation, because the table of factors is forever fixed for every pair of patches. You just have to figure out which pair of patches you're dealing with. Anyway that may be a lead IJs would like to follow up.
I'm always interested in finding out more on this subject. If you could supply me with a little more information, I'd very much appreciate it. There is not much on-line information available concerning these heat or radiation transfer factors.
Edited by IJs at 2008-06-26 00:10
Slang at 2008-06-26 00:23
Hey IJs,

I strongly suggest you check out the new DAAMIT's Ruby demo -- the developer claims that it's real-time voxel raytracing running on two RV770 cards.

Also, there's a thread on this topic at ompf.
http://ompf.org/forum/viewtopic.php?f=6&t=882
Slang at 2008-07-04 03:57
Now, the developer has officially confirmed that the Ruby demo uses real-time voxel raytracing.

Cinema 2.0 follow-up with Jules Urbach - LightStage
http://youtube.com/watch?v=Bz7AukqqaDQ
(@2:52) We basically had two separate demos, one for just showing the fact that we can look around the voxelized scene and render it. And this one now, this demo is a slight update on the original one, where I'm able to actually look around and essentially place voxels in the scene but also relight it as well.
(@3:52) Right now, the reason why we are not loading the entire animation is that the frame data is about 700 MBytes for every frame. We can easily compress that down to 1/100 the size, we can do about 1/1000 the size. And then with that we will be able to load much larger voxel data sets and actually have you navigating pretty far throughout the scene and still keep the raytracing and voxelization good enough that you don't really see any sort of pixelized or voxelized data sets too closely.
(@5:08) If we move to voxel rendering, which I'm planning to do for LightStage as soon as we're done with the Ruby demo, we will be able to have voxelized assets rendering in real-time at much higher resolutions than this. And that's gonna be giving us characters that look better than anything we can show in any of these videos. And we should have that ready probably before the end of the year.
Edited by Slang at 2008-07-04 04:06
ConsistentCallsign at 2008-07-04 04:27
Slang said at 2008-07-04 03:57
Voxels 1 - Polygons 0
8)
counting_pine at 2008-07-04 10:22
What, still one-nil?
Spacerat at 2008-07-04 14:46
Hm.. I still doubt that the complete scene is rendered as voxels.
I assume they do something similar to Relief Mapping of Non-Height-Field Surface Details or Shark-Skinning
Doing so, I think its possible to split most geometry into convex parts, which can be raycasted in the pixel-shader - also its still possible to have animations, which is difficult for pure voxel stuff.

http://www.inf.ufrgs.br/~oliveira/RTM.html
http://ompf.org/forum/viewtopic.php?f=6&t=882
http://www.gamedev.net/community/forums/topic.asp?topic_id=437642&whichpage=1&#2904184
Edited by Spacerat at 2008-07-04 14:58
IJs at 2008-07-08 10:23
Slang said at 2008-06-26 00:23
Hey IJs,

I strongly suggest you check out the new DAAMIT's Ruby demo -- the developer claims that it's real-time voxel raytracing running on two RV770 cards.

Also, there's a thread on this topic at ompf.
http://ompf.org/forum/viewtopic.php?f=6&t=882
May I remind you to look at the ompf.org critics thread again, before getting your hopes up.

http://ompf.org/forum/viewtopic.php?f=6&t=882&st=0&start=50
Slang at 2008-07-09 05:36
IJs said at 2008-07-08 10:23
May I remind you to look at the ompf.org critics thread again, before getting your hopes up.
Well, I still believe that voxel raytracing/raycasting will never replace GPU rasterization in the near future, and I don't expect this Cinema 2.0 in terms of my "hope", namely, fully dynamic destructibility in games. Note that the animation in the Ruby demo is frame-by-frame pre-voxelized/computed and thus it's obviously not appropriate for dynamic destruction.

I just posted the demo here since it's exactly what you're trying to do -- photon-mapped photorealistic voxel raytracing on the GPU.

Anyway, we can't determine any technical detail of the demo (esp. voxel traversal algorithm) simply due to information asymmetry, so it would be better to wait for the upcoming SIGGRAPH presentation rather than just presuming from fragmentary info.

P.S. Id Software is going to talk at SIGGRAPH on voxel raycasting. Not Carmack, though. ;)
http://ompf.org/forum/viewtopic.php?f=3&p=8319
Edited by Slang at 2008-07-09 07:32
Slang at 2008-07-11 04:51
OTOY Developing Server-Side 3D Rendering Technology
http://www.techcrunch.com/2008/07/09/otoy-developing-server-side-3d-rendering-technology/

So, the Transformers scene is rendered by real-time voxel raytracing using three RV770 cards -- one for the buildings and the other two for robots. If it's really possible to render such a highly-detailed model by even a single RV770 card using voxels, then that's pretty awesome.
Edited by Slang at 2008-07-11 06:44
Spacerat at 2008-07-11 10:25
http://www.tkarena.com/Articles/tabid/59/ctl/ArticleView/mid/382/ArticleID/38/PageID/169/Default.aspx

I just came over another article - they claim that they can visualize unlimitid sized voxel data with their "secret" method -
also they got lots of screenshots and a comparison to crysis. However, they don't have a demo there.

"Alternative to polygon system
The Unlimited Detail system consists of a compiler that takes point cloud data and converts it in to a compressed format, the engine is then capable of accessing this data in such a way that it only accesses the pixels needed on screen and ignores the others generating real-time graphics that look like unlimited polygons. it is also the best available way of displaying laser scanned environments, they can be of unlimited size and this will not slow down the system.
the system isn’t ray tracing at all or anything like ray tracing. Ray tracing uses up lots of nasty multiplication and divide operators and so isn’t very fast or friendly.
Unlimited Detail is a sorting algorithm that retrieves only the 3d atoms (I wont say voxels any more it seems that word doesn’t have the prestige in the games industry that it enjoys in medicine and the sciences) that are needed, exactly one for each pixel on the screen, it displays them using a very different procedure from individual 3d to 2d conversion, instead we use a mass 3d to 2d conversion that shares the common elements of the 2d positions of all the dots combined. And so we get lots of geometry and lots of speed, speed isn’t fantastic yet compared to hardware, but its very good for a software application that’s not written for dual core. We get about 24-30 fps 1024*768 for that demo of the pyramids of monsters. This will probably be released as “backgrounds only” for the next few years, until we have made a lot more tools to work with, then we will move in to sprites as well."

After reading this, QSplat firstly came into my mind. They have a demo here: http://graphics.stanford.edu/papers/qsplat/
Similar is also farvox.

PS: in the last vid of the techcrunch article at 1:12, you can see him putting the cam below the street and then see the street disappearing - its basically backface cullling. Common voxel engines don't have backfaceculling as this would remove small details sized a few voxels; perhaps they use separate cases for culling..

Oh and another paper I just came over - their interpolation method seems terribly slow, but the results are nice:
http://citeseer.ist.psu.edu/tiede98high.html
Edited by Spacerat at 2008-07-11 20:48
IJs at 2008-07-12 23:45
I believe I owed you guys a demo a really long time ago, so here's a demo I put together without any of the lighting stuff I was working on.

Beware of bugs!

Application: http://solid.student.utwente.nl/voxeltracer_demo.exe
Dependency libraries: http://solid.student.utwente.nl/voxeltracer_demo_dlls.exe

Oh, and you need a CUDA-compatible graphics card to run this. Let me know how it works out.

EDIT: Added some boundary checks.

http://solid.student.utwente.nl/voxeldemo1.png
Edited by IJs at 2008-07-13 00:35
esuvs at 2008-07-13 00:04
I get an error - "This application has failed to start because cudart.dll was not found. Re-installing the application may fix this problem".

Do I need to install Cuda separately or can you just bundle the .dll?
ConsistentCallsign at 2008-07-13 00:18
esuvs said at 2008-07-13 00:04
I get an error - "This application has failed to start because cudart.dll was not found. Re-installing the application may fix this problem".

Do I need to install Cuda separately or can you just bundle the .dll?
The .dll files (http://solid.student.utwente.nl/voxeltracer_demo_dlls.exe) and the .exe file (http://solid.student.utwente.nl/voxeltracer_demo.exe) needs to be in the same folder.
Edited by ConsistentCallsign at 2008-07-13 00:26
IJs at 2008-07-13 00:32
Correct. I updated the dll SFX to include the bin/ directory just now.
Spacerat at 2008-07-13 03:50
In my case it's working fine - I'm getting about 50fps for the camera view in the screenshot :-)
(GeForce 8800GTS, 320MB,CUDA1.1)
I only got problems with the steering.. but I just saw you wrote already in the readme.
IJs at 2008-07-14 06:28
Nice to know it runs faster than my 8800GTX. I'm a bit surprised though, as I believe my memory bus is theoretically wider (384-bit as opposed to 256-bit) than the 8800GTS.

Spacerat said at 2008-07-11 10:25
http://www.tkarena.com/Articles/tabid/59/ctl/ArticleView/mid/382/ArticleID/38/PageID/169/Default.aspx

I just came over another article - they claim that they can visualize unlimitid sized voxel data with their "secret" method -
also they got lots of screenshots and a comparison to crysis. However, they don't have a demo there.
The pictures reminded me of some kind of shear-warp or "texcell" volume rendering techniques I've seen. They basically rely upon stacking slices and possibly warping them. But you may be right about QSplat, it actually looks more like some kind of point splatting approach than anything else. I can't really distinguish any discs, but you can clearly observe the circles. I have yet to see how their approach turns out to be, the article doesn't really convince me.

Slang said at 2008-07-09 05:36
I just posted the demo here since it's exactly what you're trying to do -- photon-mapped photorealistic voxel raytracing on the GPU.
My main goal is trying to maintain as much properties of voxels as possible (such as dynamic destructability) while providing a realistic image.
Edited by IJs at 2008-07-14 06:33
Spacerat at 2008-07-25 17:04
For your voxel raycaster I was just wondering if there is a possibility to speed up the rendering by a 2 pass coarse to fine process..
Like raycasting the half resolution (256x256) in a first step, and in a second to raycast the full resolution.
The advantage could be, that nearby pixels have similar depth, this means you dont need to shoot the ray from the origin again, you can start at the depth of one of the neighbour pixels..
(Or maybe you already use something like this ?)

-Sven
IJs at 2008-07-27 06:59
Good point. I've thought about something similar (coherent or packet ray tracing, something like that). CUDA divides the screen up into blocks of a specific size (currently 8x8) and 16KB region of low-latency shared memory that's accessible from within each block. I can keep the previous hits/blocks in the shared memory and re-use them for the other rays in the same block and give the ray tracing pass a significant speedup with any luck.

Also, I'm currently ditching Ken's original RLE-encoded voxel data (originally in .vxl, I believe) in favour of a "spatial hashing" approach as part of my efforts to add dynamic voxel scenes and streaming to the renderer. I managed to animate my voxel scenes using Ken's format, but it was far from ideal. The spatial hashing is meant for static arrays, but I believe I can calculate them fast enough to allow dynamic scenes. I'm also trying to mimic the MegaTexture (or Sparse Virtual Texture) approach for streaming voxel world data into the renderer so we can start navigating around a large world instead of being restrained to a 1024x1024x256 box.
Spacerat at 2008-07-27 10:58
Shared memory sounds good. I'm also using it at the moment for quick occlusion tests, where each bit in the shared mem is linked to one pixel on the screen; it made the rendering about 25% faster in my case.

Yes, larger scenes is something we definitely need - the only problem I have so far is an efficient memory management for handling the dynamic allocations.
Something I came over is this lib: http://daniel.haxx.se/projects/dbestfit/
Its a very quick best fit memory manager for a given memory pool - however, it does not have garbage collection, which means it might force the algorithm to stop is the memory is too fragmented.. I am experimentig with it now to see if its sufficient - but also already thought about writing a manager myself with moving garbage collection to prevent fragmentation.
IJs at 2008-07-29 06:51
Fragmentation is definitely something that needs to be controlled. I'm not sure what kind of structure you're using (I recall someone writing a RLE-based raycast paper a while ago), but the spatial hashes I'm using should have a similar behaviour since they take up 2 * num_voxels (compared to RLE's 1 * num_voxels) on average if I'm not mistaken. My world is currently divided up into blocks of a certain size that can be streamed in and out (apparently similar to http://www.silverspaceship.com/src/svt/).

I just completed the spatial hashes and it's looking good in terms of performance (quality is, of course, the same). It seems like they're considerably faster than the previous RLE approach since data can be looked up in O(1) (constant) time, but I haven't benchmarked properly yet.
Edited by IJs at 2008-07-29 07:02
IJs at 2008-08-01 22:38
Now that the new virtual memory management stuff is complete and ready to be used I'm going to look into combining skeletal animation with voxels.
Alfalfa at 2008-08-19 07:03
I was interested in ray-tracing for a while, after reading about Intel's plans in that direction, but their acceleration structure seemed a bit static and their Larrabee hardware is still beyond the horizon, so I forgot about it.

Later I was wondering how one would go about making an XCom remake/modernization, especially the fully destructible environment aspect, when I ran across Ken Silverman's Voxlap engine, which looked very interesting. Then I started looking up information on voxels and remembered an interview with John Carmack (http://www.pcper.com/article.php?aid=532) I had read when I was researching ray-tracing but before I was familiar with voxels. His proposed voxel-octree approach seems like it would have faster performance and be easier to manipulate than Intel's BSP polygons. I also found an interesting paper (Interactive Gigavoxels: http://artis.imag.fr/Publications/2008/CNL08/RR-6567.pdf) detailing optimizations for both performance and memory.

The main thing I've been thinking about is how to manipulate the voxel scene. It seems as though moving large amounts of voxels would be extremely expensive, and would scale poorly with voxel size. Also, wouldn't repeatedly changing the orientation of voxel models deteriorate them? Or could you rotate the original model and place it in the scene at each frame?

How do you do reflections? Unless you store their normals, I wouldn't think a voxel has enough information for that, unless you sampled the surrounding voxels. In that case, if you had destructibility the surface could change, and then what?

What is programming for a GPU like? What are the limitations?

Sorry for all the questions, but I was toying with the idea of trying something like this, and you seem to be pretty far along.
IJs at 2008-09-08 09:31
Alfalfa said at 2008-08-19 07:03
I was interested in ray-tracing for a while, after reading about Intel's plans in that direction, but their acceleration structure seemed a bit static and their Larrabee hardware is still beyond the horizon, so I forgot about it.

Later I was wondering how one would go about making an XCom remake/modernization, especially the fully destructible environment aspect, when I ran across Ken Silverman's Voxlap engine, which looked very interesting. Then I started looking up information on voxels and remembered an interview with John Carmack (http://www.pcper.com/article.php?aid=532) I had read when I was researching ray-tracing but before I was familiar with voxels. His proposed voxel-octree approach seems like it would have faster performance and be easier to manipulate than Intel's BSP polygons. I also found an interesting paper (Interactive Gigavoxels: http://artis.imag.fr/Publications/2008/CNL08/RR-6567.pdf) detailing optimizations for both performance and memory.

The main thing I've been thinking about is how to manipulate the voxel scene. It seems as though moving large amounts of voxels would be extremely expensive, and would scale poorly with voxel size. Also, wouldn't repeatedly changing the orientation of voxel models deteriorate them? Or could you rotate the original model and place it in the scene at each frame?

How do you do reflections? Unless you store their normals, I wouldn't think a voxel has enough information for that, unless you sampled the surrounding voxels. In that case, if you had destructibility the surface could change, and then what?

What is programming for a GPU like? What are the limitations?

Sorry for all the questions, but I was toying with the idea of trying something like this, and you seem to be pretty far along.
Sorry for the belated response by the way, I've been switching back and forth from lighting techniques to animation techniques. Anyways, this is exactly what I've been pondering about for a while until I came with the idea of spatial hashes and a pretty solid implementation.

The references you're making to binary-space trees (and related) structures make sense. By using trees you basically have control over the size of each individual voxel, which gives you the potential advantage of scaling them properly without losing quality (e.g. as the camera gets closer). My implementation is a uniform grid, probably many times faster, but currently not capable of doing this. I'm basically using two uniform grids at different scales, that divide up the scene into blocks and subsequently into voxels. This scheme is used for various reasons including very fast real-time recalculation of voxels (dynamic scenes), as well as streaming in new blocks of new voxel data at the boundaries of the world.

Some info on my current implementation..

To store my voxels I basically make use of bitmasks (0/1 empty/full) and a perfect spatial hash table for the actual voxel data (normal, color, reflectivity, etc.). Perfect spatial hashing basically means you have a hash table "T" (containing the actual data) and a 3D hash function "h(x,y,z)" that takes a 3D voxel position as input and points to a unique(!) location within T.This allows for an extremely cheap constant O(1) lookup time, instead of a linear lookup time in case of Ken's original RLE approach.

Amazingly, the hash tables only take up roughly 2 times the amount of total voxels in your scene, a price I'm happy to pay for the gain in performance. However, although various papers on "spatial hashing" noted it was unsuitable for dynamic scenes, I did came up with a hybrid GPU/CPU approach (CUDA + SSE2 basically) that allows the engine to re-calculate the spatial hash table of the scene blocks effectively within a few milliseconds.

To allow dynamic scenes with voxels the way they were meant to be used, you would indeed have to consider each individual voxel of any object (consisting of voxels) that you would want to transform. Luckely, this is a highly parallel independent process that is suitable to be done completely on the GPU, so that's what I've done. The renderer is currently able to transform all the voxels of a particular object, given a transformation matrix, again within a few milliseconds (on the volume bitmasks alone). After that, the spatial hash would have to be recalculated, which also happens in real-time.

Coincidentally, I first tried to transform the voxels that were already in the GPU's memory (some sort of remove & re-add process without touching the rest of the scene), and it indeed led to a severe deterioration of quality given that voxels use absolute positions and these integer positions multiplied by the floating point transformation matrices lead to rounding errors and holes everywhere. After that I went for the above approach which basically recalculates the scene entirely on the GPU by linearly reading out all the "original" voxels (stored independently in GPU memory) of every object, transforming and adding these to the scene, without the need to involve the CPU for these voxels or any massive CPU/GPU memory transfers. I'm actually considering writing a paper on this technique if I get the time and motivation.

Besides applying transformation matrices on objects, creating or removing sets of new voxels is also possible in the above process (e.g. note the linearly stored voxels, we can easily add new voxels here). Although I've not yet implemented these features, this would also allow all kinds of real-time deformations of objects, one of the key advantages of using voxels in the first place.

Programming for a GPU is fundamentally different than regular CPU programming, since you have to make sure your implementations can run completely parallel and as independent of other "instances" as possible. Graphics cards have severe limitations, specifically the strain on memory bandwidth and looping or branching can be a big problem, as well as weird errors or tweaks you have to perform. But it (CUDA) is a pretty new platform, it's very exciting and it has a lot of potential.
Edited by IJs at 2008-09-08 20:32
Alfalfa at 2008-09-12 15:02
Sorry for the belated response by the way, I've been switching back and forth from lighting techniques to animation techniques. Anyways, this is exactly what I've been pondering about for a while until I came with the idea of spatial hashes and a pretty solid implementation.
Apologies on my part as well, but no worries, I figured you were implementing some cool stuff. Thank you for your in-depth reply.
The references you're making to binary-space trees (and related) structures make sense. By using trees you basically have control over the size of each individual voxel, which gives you the potential advantage of scaling them properly without losing quality (e.g. as the camera gets closer). My implementation is a uniform grid, probably many times faster, but currently not capable of doing this. I'm basically using two uniform grids at different scales, that divide up the scene into blocks and subsequently into voxels. This scheme is used for various reasons including very fast real-time recalculation of voxels (dynamic scenes), as well as streaming in new blocks of new voxel data at the boundaries of the world.
An octree is basically a hierarchy of uniform grids, with pointers between parent and child elements, or at least that's my understanding of it. In that Interactive Gigavoxels paper, I think they use an octree to accelerate ray tracing as well as to selectively load voxel bricks into memory (when a ray hits it). The voxels themselves are mipmapped to various levels of detail, which one is loaded being determined by the distance from the camera. Even if you don't have LODs, traversing an acceleration structure rather than a uniform grid should have performance benefits, if only allowing you to skip over empty space quickly.
To store my voxels I basically make use of bitmasks (0/1 empty/full) and a perfect spatial hash table for the actual voxel data (normal, color, reflectivity, etc.). Perfect spatial hashing basically means you have a hash table "T" (containing the actual data) and a 3D hash function "h(x,y,z)" that takes a 3D voxel position as input and points to a unique(!) location within T.This allows for an extremely cheap constant O(1) lookup time, instead of a linear lookup time in case of Ken's original RLE approach.

Amazingly, the hash tables only take up roughly 2 times the amount of total voxels in your scene, a price I'm happy to pay for the gain in performance. However, although various papers on "spatial hashing" noted it was unsuitable for dynamic scenes, I did came up with a hybrid GPU/CPU approach (CUDA + SSE2 basically) that allows the engine to re-calculate the spatial hash table of the scene blocks effectively within a few milliseconds.
I don't have any experience with hashing; any good sources for information on this? What do you mean take up 2 times the voxels, do you mean 2 times the memory?
To allow dynamic scenes with voxels the way they were meant to be used, you would indeed have to consider each individual voxel of any object (consisting of voxels) that you would want to transform. Luckely, this is a highly parallel independent process that is suitable to be done completely on the GPU, so that's what I've done. The renderer is currently able to transform all the voxels of a particular object, given a transformation matrix, again within a few milliseconds (on the volume bitmasks alone). After that, the spatial hash would have to be recalculated, which also happens in real-time.
So you perform the expensive transformations on the small data, then just re-reference the big stuff? Nice. 8)
Coincidentally, I first tried to transform the voxels that were already in the GPU's memory (some sort of remove & re-add process without touching the rest of the scene), and it indeed led to a severe deterioration of quality given that voxels use absolute positions and these integer positions multiplied by the floating point transformation matrices lead to rounding errors and holes everywhere. After that I went for the above approach which basically recalculates the scene entirely on the GPU by linearly reading out all the "original" voxels (stored independently in GPU memory) of every object, transforming and adding these to the scene, without the need to involve the CPU for these voxels or any massive CPU/GPU memory transfers. I'm actually considering writing a paper on this technique if I get the time and motivation.
I'm sorry, I don't quite understand this part. What is the difference in how you handle the transformations that avoids deterioration? Would love to read a paper on this.

You mentioned you were working on animation: how is that coming along?

Thanks again.
IJs at 2008-09-23 09:18
I'm not exactly sure what kind of deterioration you were originally pointing at, but what I can tell is that I store the original voxel meshes (like how they were imported) somewhere on the GPU. This original mesh is then transformed according to a per-object transformation matrix. I've had some trouble with significant artifacts in the past (due to the Bresenham's "approximation" algorithm as well as rounding problems when applying a specific rotation to meshes). However I did manage to fix this by making each voxel surface at least 3 or 4 voxels wide, this prevents any holes in the surfaces in case of integer errors.

My goal is bringing out at least one paper on the subject of voxel/GPU dynamic transformations (or additionally the ray casting algorithm) this year which should shed some light on things. Meanwhile I'm stuck with figuring out a suitable global illumination technique for my renderer to make it complete, and I'm not putting out another demo until I figure this out :(
theamusementmachine at 2008-09-24 00:19
Hello, please have a look at our site http://theamusementmachine.net or straight to some videos here http://www.youtube.com/user/theamusementmachine. Some notes: realtime raytracing/raycasting on the GPU, mostly procedural type datasets... Outdoor natural type stuff not a lot of lighting done on it... The most important thing to me right now is fast compressed access to the voxels...choices include regular 3d grid, octree, hash map, displacement map(2d) and apparently voxlap uses RLE? Anyway I think you will enjoy our videos for now. It does amaze me how fast voxlap is on the cpu... thanks

Lee

ps
- also, IJs - sounds like what I am thinking about doing...in terms of spatial hashing the volume. The compression and access times it gives are tough to beat, especially the access time. In terms of traversing an acceleration structure versus a 3d grid, I found as far as the octree goes at least that it is very costly to traverse up and down or even just down the tree. In terms of empty space skipping, there is no reason you can't skip empty space in a regular grid if you have extra information and assume a relatively conservative change in slope. The question is how far is it safe to step from this point in 3d space. Sphere tracing stores the answer at every point in the volume which is really nice but even that is not a directional quantity and thus a conservative step. I'm new on this board so please cut me some slack...thanks again
Edited by theamusementmachine at 2008-09-24 01:04
IJs at 2008-09-24 21:34
Hey Lee,

Good work on your ray-tracer. It seems (from what I've seen) it's based on some kind of slice-based or heightmap algorithm, something I thought about using before I got access to CUDA.

3D grids for voxel data are extremely big and sparse, and I'm generally not a fan of octrees and related tree structures for GPU applications. The RLE approach is pretty nice actually but can be costly to lookup as well. Anyways, spatial hashes have extremely fast constant lookup times (e.g. hash_map[3D_to_hash(x,y,z)]). The question is whether you can find an implementation that can recalculate the hash maps fast enough.

And as far as empty space skipping is concerned, I use multiple uniform grids at different scales to skip certain fixed distances along the rays.
Edited by IJs at 2008-09-24 21:36
Spacerat at 2008-09-25 01:18
As for octrees, I also would like to add a reference: the siggraph presentation of zelex's work :)
His SVO implementation renders a 32k^3 voxel creature @60fps on a GTX280.
http://s08.idav.ucdavis.edu/olick-current-and-next-generation-parallelism-in-games.pdf

@Lee: can you tell what I do wrong ? I think you saw my screenshots. Perhaps the attached log helps.
Edited by Spacerat at 2008-09-25 13:57
theamusementmachine at 2008-09-27 11:06
Spacerat, I saw your screenshots...but they dont give enough info...look at the help.log, that is your best bet, If you can't get it to work try the chat link on my website....dont waste too much time on it just yet, we are hoping to have things updated some (finally) this weekend with any luck... If all goes well the website will be in 3d after you install The Amusement Machine, and will have a basic development system to write your own gpu raytracing code...all in realtime (without having to leave the exe), in summary look at the help.log and just wait a little longer to run The Amusement Machine.

IJs, thanks very much, it used to use heightmap data and probably still will for terrains, as you can see in one of our videos the terrain even though it is a heightmap need not be non-overlapping, i think the caves video shows what i mean. Hash Maps.... we are working right now on this and I practically guarantee they are the method everyone should be using but why do you need to recalculate them all the time? For dynamic objects? As i said to spacerat above soon you will have a 3d dev environment for GPU raytracing with some of my example code already in it ready to tweak. Right now there is no CUDA in this dev environment but if I go with CUDA then I would have to drop support for ATI/AMD cards? .... I know there is not a good solution there right now.... anyway the code is all in HLSL for now so that is what everyone will get to start.... Thanks to all who viewed and ill post when i update (which will be soon)

Lee
IJs at 2008-09-27 21:28
You would need this for dynamic objects.

As far as CUDA is concerned, it is indeed targetted at NVIDIA cards, but it's very powerful. On the other hand, there seems to be some progress on other platforms such as OpenCL or RapidMind (although I stepped away from this at some point due to low performance compared to CUDA). It's a choice you have to make.
straaljager at 2008-10-07 01:35
Hi,

I am very interested in the potential of voxel rendering. I have read somewhere that the guy that made the AMD Ruby demo uses geometry maps (or geometry images) to store the voxel data of "Light Stage 5 structured data" which is characters I suppose. Geometry maps seem to be useful for raycasting dynamic objects according to a paper by Carr et al. (Fast GPU Ray Tracing of Dynamic Meshes using Geometry Images, 2006). They're using geometry maps to store triangles instead of voxels, and I think they could get an extra speedup if they were storing voxels for raycasting. I wonder if any of you have tried this method for dynamic objects yet.

IJs, do you think it's possible to use your technology to render a highly detailed GTA-like environment with moving cars, people (and multi-player :)) ?
Spacerat at 2008-10-07 15:52
Hm.. in my eyes, the raycasting performance of a 512x512 map is really poor. But an adaptive geometrymap rasterizer is something I thought about recently. The algorithm: Rasterize every n'th texel in the map in u/v and then backcheck the rasterized screen-space coordinates whether there is space left in between (by using zbuffer of course). If there is, upsample and rasterize more texels in this area until there is no change in screen-space coordinates anymore. Also necessary for this idea are mipmaps of the original image, tiles (quadtree perhaps) that are loaded dynamically with higher resolution and a boundary map to lead the upsampling process. I think thats superior in speed to raycasting since multiple per-pixel tree-traversals are not required, which is the biggest performance-hit in SIMD raycasting. In case a quadtree-structure for the geometry-map is used, its possible to store lists of each tree level and then simply rasterize all tiles - so there wont be a single tree-traversal during rendering. Also the result will be a smooth surface as hardware accelerated texture filtering can be used. The method also allows to add antialiaing very easy.

The only problem might be the memory consumption in case of a big and detailed model.. its like 3 floats (12*8=96 bit) per coordinate - in a pointerless, binary octree its like 1.x bit / coordinate, so about 100 times more. Perhaps wavelet compression, 16 bit floats etc might be a help there..
Edited by Spacerat at 2008-10-07 16:38
straaljager at 2008-10-08 04:38
Thanks for the comprehensive answer spacerat. As for the compression, the OTOY guy said he was using wavelet compression in one of the first video's.
Spacerat at 2008-10-09 20:52
I think it could also be interesting to use geometry-maps for skeletal animation. In this case, 1 matrix multiplication and 2 matrix blending operations would be required for 3 weights per rasterized pixel. Since the upsampling in screenspace is adaprive, there wont be much overdraw in the ideal case.
ConsistentCallsign at 2008-10-27 07:15
The sad little gamer boy, whose parents died in a car crash :'( :'(, does not have a CUDA-compatible GPU!! >:( >:(

oh well.. CUDA will eventually become obsolete anyway..
IJs at 2009-01-17 06:39
I guess this topic more or less slipped my mind. Oops. My apologies. Happy new year by the way.

Anyways, I've updated the dead links in the first post (moved servers again) and added a little more info to my website as well.

I've been scanning through lots of papers on lighting lately and coming up with implementations that could be applied to my renderer. Here is a snapshot that's based on Keller's "Instant Radiosity" technique + something called "Imperfect Shadow Maps" plus some other fancy algorithms in a standard Cornell Box scene:

http://ijs.bastage.net/img9l.png

It basically involves a few photons, Virtual Point Lights, and a lot of voxels. And yes, it's still real-time. Also, before you mention the vagueness of the shadows and colors; I'm a little unsure about how to proceed with blending the different types of lighting together, but I'll probably figure it out.

Now that the lighting is at an adequate level, I guess it would be fair enough to spend some more time on getting the voxels in the scene moving around massively at real-time performance as to make it a little more practical for everyday usage.
Edited by IJs at 2009-01-17 06:50
hark at 2009-01-17 07:52
Beautiful. How's the performance? I'm guessing it's good enough since you're proceeding with more interactive elements.

Forum archive

Real-time voxel raytracing, in development..

Re: Real-time voxel raytracing, in development..