What's the big (64-bit) deal, anyway?
I run Linux at home. It is so much easier to code for external peripherals it isn't funny! The only problem I've had with my particular version is that it chokes on malloc, but works fine if I just declare 500 MB of static RAM.
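For what it's worth, here is a minimal C sketch of that workaround: the buffer is declared statically, with a malloc() call kept only to show the contrast. The 500 MB figure is just the one from the post above, and the plain byte-array layout is an assumption.

    /* Minimal sketch: a large static buffer instead of malloc().
       The 500 MB size and the byte-array layout are assumptions. */
    #include <stdio.h>
    #include <stdlib.h>

    #define POOL_SIZE (500UL * 1024UL * 1024UL)

    static unsigned char pool[POOL_SIZE];      /* reserved at load time */

    int main(void)
    {
        unsigned char *heap = malloc(POOL_SIZE);   /* may fail on some setups */
        if (heap == NULL)
            printf("malloc of %lu bytes failed, falling back to the static pool\n",
                   (unsigned long)POOL_SIZE);
        else
            free(heap);

        pool[0] = 1;                               /* the static pool is always there */
        printf("static pool at %p, %lu bytes\n", (void *)pool,
               (unsigned long)POOL_SIZE);
        return 0;
    }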
I think the model we use can be tuned to fit the machines we have at hand. If we go a million steps in time over a 400x400x400 cube of space and things look OK, that's a pretty good clue about how to build an experiment. If it blows up, you know how *not* to build an experiment!
The first nuclear reactors were built with slide rules and 10 digit accuracy lookup tables. I think the toys we have on our desks are quite sufficient to build a fusion reactor.
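For scale, here's the back-of-the-envelope arithmetic on that 400x400x400 grid as a few lines of C. The six doubles per cell (an E and a B vector, say) are an assumed layout, not part of the model above, but the total already lands in the multi-gigabyte range, which is exactly where the 64-bit question starts to matter.

    /* Back-of-the-envelope memory estimate for a 400x400x400 grid.
       The six doubles per cell are an assumed field layout. */
    #include <stdio.h>

    int main(void)
    {
        const long n = 400L * 400L * 400L;               /* 64,000,000 cells   */
        const long bytes_per_cell = (long)(6 * sizeof(double));
        double gib = (double)n * bytes_per_cell / (1024.0 * 1024.0 * 1024.0);

        printf("%ld cells, about %.1f GiB just for the field data\n", n, gib);
        return 0;
    }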
drmike wrote: The first nuclear reactors were built with slide rules and 10 digit accuracy lookup tables. I think the toys we have on our desks are quite sufficient to build a fusion reactor.

True that. But how much better the toys available for just a bit more!
Our hardware guy tells me that we're paying ~$6k for a four-processor, dual-core Xeon Dell with 16-20 GB of RAM.
I read your white paper on the calculations, BTW. It's exercising brain cells I haven't used in almost 20 years. Good stuff.
Clearspeed has a floating-point accelerator that will turn your desktop into a floating-point monster.
http://www.wired.com/science/discoverie ... 3/10/60791
That five-year-old Wired story is obsolete... Clearspeed's latest product, the e620, puts 80 GFLOPS on your desktop.
http://www.clearspeed.com/docs/resource ... _05_07.pdf
Too bad they can't seem to get their product into the channel. IBM is the only one carrying it, and they seem to want $15k a copy.
Dr Mike,
Yeah, the current driver situation in Winders machines sucks.
When I have my druthers I like DOS. Well tested. Simple. And drivers are a real piece of cake.
Writing drivers that interface with C is butt ugly, though. Still better than Winders' 13 levels of indirection.
Engineering is the art of making what you want from what you can get at a profit.
I remember looking up Clearspeed. I drooled a lot! Glad to hear they are still alive. But if push comes to shove, you just get a bunch of FPGAs and build your own compute engine dedicated to the problem at hand. It just costs a lot.
I've been looking at the math some more, and I like the idea of going with integrals versus differentials. I think in the end, the difference between charged species can be dealt with more easily if we integrate first, then compute differences to get forces, then operate on particles. In a purely differential equation set, small differences really are a big problem.
It will be fun to see what kinds of toys we can put the models on!
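As a toy illustration of that small-differences problem (the density numbers here are invented, not model output): subtract two nearly equal species densities in float and in double and watch the float version lose the signal.

    /* Toy illustration of the "small differences" problem:
       subtracting two nearly equal densities in single vs. double precision.
       The densities are made-up numbers for the example. */
    #include <stdio.h>

    int main(void)
    {
        double ni_d = 1.0e20;            /* ion density, per m^3 (assumed)     */
        double ne_d = 1.0e20 - 1.0e12;   /* electron density, slightly smaller */

        float ni_f = (float)ni_d;
        float ne_f = (float)ne_d;

        printf("double: net charge density proxy = %.6e\n", ni_d - ne_d);
        printf("float : net charge density proxy = %.6e\n", (double)(ni_f - ne_f));
        return 0;
    }

The double keeps the 1e12 difference; in float both densities round to the same value, so the net charge comes out exactly zero.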

If you decide to go the FPGA route let me know. I have some experience along those lines.
We can build a FORTH engine to handle the CPU-type stuff needed, along with a custom ALU or three to do the math.
We could even make it look like an x86 machine, since that is stack-oriented.
We would just make the stacks deeper for convenience.
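If it helps picture the soft core, here's a tiny stack-machine inner loop sketched in C. The opcodes and encoding are invented; it's just the flavor of thing a FORTH engine on an FPGA would do in hardware.

    /* Very small sketch of a stack engine: a data stack plus a handful
       of primitives.  Opcode names and encoding are invented. */
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    int main(void)
    {
        /* program: push 3, push 4, add, push 2, mul, print  -> (3+4)*2 = 14 */
        int prog[] = { OP_PUSH, 3, OP_PUSH, 4, OP_ADD,
                       OP_PUSH, 2, OP_MUL, OP_PRINT, OP_HALT };

        int stack[64];          /* "deeper stacks for convenience" */
        int sp = 0, pc = 0;

        for (;;) {
            switch (prog[pc++]) {
            case OP_PUSH:  stack[sp++] = prog[pc++];        break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_MUL:   sp--; stack[sp - 1] *= stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
            case OP_HALT:  return 0;
            }
        }
    }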
Engineering is the art of making what you want from what you can get at a profit.
If someone wants a challenge, look at writing the simulator to run on a modern video card (http://en.wikipedia.org/wiki/GPGPU). Fairly simple code running over a massively parallel dataset of vectors might be a good fit.
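To give a feel for the shape of that work, here is the per-element piece written as a plain C loop; on the card each iteration would become its own thread. The particle-push update, the array size, and the field layout are invented stand-ins, not anything from an actual simulator.

    /* Sketch of the per-element work that maps well onto a GPU:
       one simple update applied independently across a big array of vectors.
       Written as a plain C loop here; on a GPU each iteration is one thread. */
    #include <stdio.h>
    #include <stdlib.h>

    struct vec3 { float x, y, z; };

    int main(void)
    {
        const size_t n = 1u << 20;                 /* ~1M particles (assumed) */
        struct vec3 *v = calloc(n, sizeof *v);     /* velocities              */
        struct vec3 *e = calloc(n, sizeof *e);     /* field at each particle  */
        if (!v || !e) return 1;

        const float qm_dt = 0.01f;                 /* charge/mass * dt, made up */
        for (size_t i = 0; i < n; ++i) {           /* each i is independent     */
            v[i].x += qm_dt * e[i].x;
            v[i].y += qm_dt * e[i].y;
            v[i].z += qm_dt * e[i].z;
        }

        printf("updated %zu particle velocities\n", n);
        free(v); free(e);
        return 0;
    }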
Yeah, that sounds like a great idea! Here is the abstract from one of the references:
Abstract
In visualization and computer graphics it has been shown that the numerical solution of PDE problems can be obtained much faster on graphics processors (GPUs) than on CPUs. However, GPUs are restricted to single precision floating point arithmetics which is insufficient for most technical scientific computations. Since we do not expect double precision support natively in graphics hardware in the medium-term, we demonstrate how to accelerate double precision iterative solvers for Finite Element simulations with current GPUs by applying a mixed precision defect correction approach. Our prototypical algorithm already runs more than two times faster than a highly tuned pure CPU solver while maintaining the same accuracy. We present a series of tests and discuss multiple optimization options.
Acceleware seems to be building whole systems bundled with third-party software bolted on:
http://www.acceleware.com/about/overview_LoLC8h.cfm
They claim that with the latest NVIDIA card, the Tesla, they can get close to 1 TFLOP:
http://www.tgdaily.com/content/view/34656/135/
Here's the NVIDIA page for the Tesla:
http://www.nvidia.com/object/tesla_comp ... tions.html
The product spec says they're still only doing single-precision floating-point, but they plan on adding a 64-bit version Real Soon Now:
http://www.nvidia.com/docs/IO/43395/Com ... _Dec07.pdf
The chipset comes with a C SDK, apparently something nobody else had thought of before.
Some interesting discussions here:
http://www.gpgpu.org
drmike, here's the core of their approach from the paper you linked to:
We present a mixed precision defect correction algorithm for the iterative solution of linear equation systems. The core idea of the algorithm is to split the solution process into a computationally intensive but less precise inner iteration running in 32 bit on the GPU and a computationally simple but precise outer correction loop running in 64 bit on the CPU. Our approach can be easily implemented on top of an existing GPU-based single precision iterative solver in applications where higher precision is necessary. The algorithm requires two input parameters, ε_inner and ε_outer, as stopping criteria for the inner and outer solver respectively. Let A denote the (sparse) coefficient matrix, b the right hand side, x the initial guess for the solution and α a scaling factor. Subscript 32 indicates single precision vectors stored in GPU memory and 64 indicates double precision vectors stored in CPU memory.

So I guess it's easier to get the second 32-bit float representing the second half of the calculation than it is to get the first half.
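Purely as a sketch of that recipe in C, with a tiny made-up system, a few Jacobi sweeps in float standing in for the GPU inner solver, and the scaling factor α left out:

    /* Sketch of mixed precision defect correction: a cheap single precision
       inner solver wrapped in a double precision correction loop.  The matrix,
       tolerances and iteration counts are all invented for illustration. */
    #include <stdio.h>
    #include <math.h>

    #define N 3

    /* inner solver: approximate solve of A d = r in single precision */
    static void inner_jacobi32(const float A[N][N], const float r[N], float d[N])
    {
        float tmp[N];
        for (int i = 0; i < N; ++i) d[i] = 0.0f;
        for (int sweep = 0; sweep < 25; ++sweep) {
            for (int i = 0; i < N; ++i) {
                float s = r[i];
                for (int j = 0; j < N; ++j)
                    if (j != i) s -= A[i][j] * d[j];
                tmp[i] = s / A[i][i];
            }
            for (int i = 0; i < N; ++i) d[i] = tmp[i];
        }
    }

    int main(void)
    {
        /* small diagonally dominant test system (made up) */
        double A64[N][N] = { { 4, -1,  0 },
                             {-1,  4, -1 },
                             { 0, -1,  4 } };
        double b[N] = { 1.0, 2.0, 3.0 };
        double x[N] = { 0.0, 0.0, 0.0 };

        float A32[N][N];
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                A32[i][j] = (float)A64[i][j];

        for (int outer = 0; outer < 50; ++outer) {
            /* defect (residual) in double precision on the "CPU" side */
            double r64[N], norm = 0.0;
            for (int i = 0; i < N; ++i) {
                r64[i] = b[i];
                for (int j = 0; j < N; ++j)
                    r64[i] -= A64[i][j] * x[j];
                norm += r64[i] * r64[i];
            }
            norm = sqrt(norm);
            printf("outer %2d  residual %.3e\n", outer, norm);
            if (norm < 1e-12)            /* eps_outer, chosen arbitrarily */
                break;

            /* hand the defect to the cheap 32-bit inner solver */
            float r32[N], d32[N];
            for (int i = 0; i < N; ++i) r32[i] = (float)r64[i];
            inner_jacobi32(A32, r32, d32);

            /* correct the double precision iterate */
            for (int i = 0; i < N; ++i) x[i] += (double)d32[i];
        }

        printf("x = %.15f %.15f %.15f\n", x[0], x[1], x[2]);
        return 0;
    }

The expensive part (the inner solve) runs entirely in 32 bit, while the accuracy comes from recomputing the defect and accumulating the corrections in 64 bit, which is the split the quote describes.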
NICE!
It's worth comparing some simple calculations with single and double precision. The GPU's job is really to create pretty pictures, and maybe the visualization part of the task is complicated enough that it's all we need the GPU for. But that Acceleware looks really cool.
As I grind thru the math I'll keep all these comments floating by; it might help guide my thought process about how to go about solving the problem.
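A quick version of that single-vs-double comparison (step size and count picked arbitrarily): accumulate a small time step a million times in each precision and look at the drift.

    /* Accumulate a small time step a million times in float and in double.
       The step size and count are arbitrary choices for the comparison. */
    #include <stdio.h>

    int main(void)
    {
        const int steps = 1000000;
        float  t32 = 0.0f;
        double t64 = 0.0;

        for (int i = 0; i < steps; ++i) {
            t32 += 0.001f;
            t64 += 0.001;
        }

        printf("exact : 1000.0\n");
        printf("double: %.10f\n", t64);
        printf("float : %.10f\n", (double)t32);
        return 0;
    }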
Lost in the announcement regarding the Mac Air notebook was the new Mac Pro deskside computer:
http://www.apple.com/macpro/
Dual Intel Xeon Harpertown (four cores each) @ 3.2 GHz, up to 32 GB RAM (ships with 2 GB, but Apple always overcharges for RAM), $2799 a copy.
Interesting! It seems Apple finally woke up. That's a reasonable cost for what you get, and you can even program it! Check out their developer stuff:
Open Source
It'd be fun to make all the fans hum on that puppy. 32GB of *RAM*!!
Can't wait to see what comes out next week.
