Programming languages

BenTC
Posts: 410
Joined: Tue Jun 09, 2009 4:54 am

Post by BenTC »

MSimon wrote:Well no matter. If you want ultimate speed design a FORTH CPU in FPGA.
What size FPGA do you need? (also, how do you rate FPGA size?)
eg 5000 LUT, 12000 LUT?

For instance, for this code http://www.opencores.org/project,fcpu,vhdl%20code I have no idea how to judge the compiled size.
In theory there is no difference between theory and practice, but in practice there is.

MSimon
Posts: 14334
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois

Post by MSimon »

BenTC wrote:
MSimon wrote:Well no matter. If you want ultimate speed design a FORTH CPU in FPGA.
What size FPGA do you need? (also, how do you rate FPGA size?)
eg 5000 LUT, 12000 LUT?

For instance, for this code http://www.opencores.org/project,fcpu,vhdl%20code I have no idea how to judge the compiled size.
For a 16 bit machine an 18 X 18 FPGA is good.

If you go to 32 bits a 34 X 34 would work.

I don't know how many LUTs that would be. It has been a while since I looked into it.

A 16-bitter was implemented in an XC4005XL 9K-gate FPGA board:

http://www.sandpipers.com/cpuclass.html

So figure 40K "gates" or so for a 32 bit machine.

The XC4005 is 466 LUTs - for a 16 bit machine.

So a 32 bit machine might be around 2,000 LUTs. That would allow you to implement a simple 16-cycle 32X32 multiplier (a two-word adder and a 32-bit shifter), i.e. you multiply 2 bits at a time. Now given carry speed in the adders it might take 32 clock cycles. But it saves a LOT of gates vs. a full parallel multiplier.
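
To make the add-and-shift scheme concrete, here is a software model of it - a minimal sketch in Forth, retiring one bit per pass where the hardware would retire two, and assuming 32-bit cells (on a 64-bit Forth it multiplies by the low 32 bits only):

: SHIFT-ADD* ( u1 u2 -- product )   \ software model of an add-and-shift multiplier
   0                                \ running partial product
   32 0 DO
      OVER 1 AND IF 2 PICK + THEN   \ low multiplier bit set? add in the multiplicand
      ROT 1 LSHIFT ROT ROT          \ multiplicand <<= 1
      SWAP 1 RSHIFT SWAP            \ multiplier  >>= 1
   LOOP
   ROT ROT 2DROP ;                  \ drop the worked operands, keep the product

3 5 SHIFT-ADD* .                    \ prints 15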

If you can run the chip at 20 MHz (very conservative) you get a full 32X32 multiply with a 64 bit product about every 2 uSec. Adequate for most control situations. A divider could run at about the same speed.

That means a PID loop could run at a 100 KHz rate. More than adequate for a 10KHz process.
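
Checking the arithmetic: 32 clocks / 20 MHz = 1.6 uSec, call it 2 uSec with load/store overhead. A PID pass is three multiplies plus a divide, roughly 4 x 2 uSec = 8 uSec, which leaves margin at a 10 uSec (100 KHz) loop rate. For flavor, a PID pass in Forth might look something like this - a minimal sketch, with hypothetical gains in fixed point scaled by 100 (real code would also clamp the integral against windup):

VARIABLE INTEGRAL   VARIABLE LAST-ERR
0 INTEGRAL !   0 LAST-ERR !

100 CONSTANT KP   10 CONSTANT KI   25 CONSTANT KD   \ hypothetical gains, x100 fixed point

: PID ( error -- output )
   DUP INTEGRAL +!           \ accumulate the integral term
   DUP LAST-ERR @ - KD *     \ derivative term: (error - last-error) * KD
   OVER KP * +               \ plus the proportional term
   INTEGRAL @ KI * +         \ plus the integral term
   SWAP LAST-ERR !           \ remember this error for the next pass
   100 / ;                   \ strip the x100 gain scaling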

http://www.xilinx.com/support/documenta ... /ds015.pdf

The XC4036 looks nice for a 32 bitter.

Costs:

http://search.digikey.com/

I also like the TI DSP chips.
Engineering is the art of making what you want from what you can get at a profit.

mad_derek
Posts: 46
Joined: Tue Jan 08, 2008 4:08 am
Location: UK (mostly)

Post by mad_derek »

Yes, well. I can do COBOL, ALGOL, FORTRAN, BASIC, Pascal, C, C+, C++ ... unfortunately I just don't understand Forth, LISP etc. Must be a mindset thing. I bought all the books (ages ago admittedly) but I just cannot get a Forth or a LISP program to do anything. I don't mean anything predictable - I mean anything at all ... down to and including the 'Hello world' program ...

Oh well!
Insanity Rules!

blaisepascal
Posts: 191
Joined: Thu Jun 05, 2008 3:57 am
Location: Ithaca, NY

Post by blaisepascal »

mad_derek wrote:Yes, well. I can do COBOL, ALGOL, FORTRAN, BASIC, Pascal, C, C+, C++ ... unfortunately I just don't understand Forth, LISP etc. Must be a mindset thing. I bought all the books (ages ago admittedly) but I just cannot get a Forth or a LISP program to do anything. I don't mean anything predictable - I mean anything at all ... down to and including the 'Hello world' program ...

Oh well!
I'm more experienced with PostScript than with Forth. Perhaps it's the dialects of Forth I've used, but I've found it to lack a certain level of abstraction I like. With PostScript, you can push strings, symbols, blocks of code, and other first-class types on the stack, but in Forth it seems you are limited to numbers.

The trouble with just doing COBOL, ALGOL, FORTRAN, BASIC (other than the immense amount of capital letters), Pascal, C, C++, etc. is that basically they all work in very similar ways. Sure, the ALGOL derivatives are a bit more structured than COBOL, FORTRAN, or BASIC, but all of them are basically imperative and procedural. It is very beneficial to learn languages that work differently, like Lisp (or other mainly functional languages like ML or Haskell), or Prolog, or Smalltalk. It's not so much that you'll use them, but that they teach you a new way to think about programming. I've written plenty of code in a functional programming style in C++, for instance, and occasionally a declarative program in Lisp.

MSimon
Posts: 14334
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois

Post by MSimon »

blaisepascal wrote:With PostScript, you can push strings, symbols, blocks of code, and other first-class types on the stack, but in Forth it seems you are limited to numbers.
With Forth you just push the address of the object on the stack and go from there. There are other ways of course - but that is the most common.
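
For instance, a string is just an address/length pair on the stack - standard words, any ANS Forth should run this:

: SHOUT ( addr len -- ) TYPE ." !" ;   \ consumes a string as an address/length pair
: DEMO S" any object" SHOUT ;          \ S" pushes the address and length
DEMO                                   \ prints: any object!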

Hello world in Forth:

: Hello ." Hello World" ;

You invoke it by typing Hello <cr>. No main or any of that other crap required. No prototype. No separate compile step (well, actually ":" starts the compile and ";" ends it, but the compile is invisible for short stuff).

Use:

: Foo 1 2 + . Hello ;

Foo <cr>

And you get: 3 Hello World

The "." prints the top of the stack. If you want it to be non-destructive you would do: DUP .

BTW <cr> = Enter
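
For example, at the prompt:

42 DUP . .   \ prints 42 42 and leaves the stack empty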

It is true though - most folks corrupted by other languages don't adapt well to Forth. Forth is about factoring - the way you are supposed to program. Most languages don't encourage factoring. Why? Because in them context changes (calls) are expensive.
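
For a taste of factoring (hypothetical words - the point is that each definition is one line, has a meaningful name, and can be tested alone at the prompt):

: SCALED ( raw -- mv ) 33 10 */ ;            \ ADC counts to millivolts
: IN-RANGE? ( mv -- flag ) 0 5000 WITHIN ;   \ sanity-check the reading
: .READING ( mv -- ) . ." mV " ;
: CHECK ( raw -- )
   SCALED DUP IN-RANGE?
   IF .READING ELSE DROP ." out of range " THEN ;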

Let me add: anything that you can do in PostScript can be done in Forth. Anything missing is simple to add. After all, PostScript was modeled on Forth.
Engineering is the art of making what you want from what you can get at a profit.

MSimon
Posts: 14334
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois

Post by MSimon »

The very best book on factoring I have ever read:

Thinking Forth, by Leo Brodie
Engineering is the art of making what you want from what you can get at a profit.

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

MSimon wrote:
Luzr wrote:
MSimon wrote: Forth did this 20 to 30 years ago. It is a very nice wheel. Why reinvent it?
Forth? Joking?

If I remember Forth well, there are at least 2 problems with it:

- its stack-based model is totally incompatible with modern high-performance CPU architecture.

- it has zero compile-time checks. That means it can be used for small toy projects only; I cannot imagine Forth being used for 100000-line projects with a team of developers. How are you going to maintain interfaces when there are no function signatures?
Dual Stacks are easy on MODERN architectures. ARM for instance. Although I will admit that the ARM architecture is crippled for RETURN stacks.
Easy != effective. I suggest you spend some time studying MODERN CPU architectures. You know, Tomasulo-style out-of-order execution, register renaming and stuff.

Then I suggest you investigate how much trouble that great Forth-like x87 FPU stack caused for FP performance in the past, and why SSE2 is so much of an improvement over it in terms of performance, even for scalar computations.
You want compile time checks? Adding them is trivial. But you have a point. 100,000 line programs are tough. But FORTH is not for brute force programmers. It is for programmers who would rather finesse a problem in 10,000 lines.
Really, the number of lines is only one benchmark. More important is total development time, and even more important than that is maintenance cost.
Your kind of thinking is why programming is in such a sorry state. Everyone thinks - we will get 100 programmers and brute force it. Give me the top 10 out of that 100 and let me finesse it. There will be less code. It will be more thoroughly tested. And it will be easier to maintain.
Just wishful thinking. There is a reason why Forth is only used in niche scenarios.

Well, talk is cheap, show me some code. If you want to prove your point, reimplement this in Forth:

http://www.ultimatepp.org/www$uppweb$vsd$en-us.html

and then do benchmarks.
It is similar to the way we design processors - throw more gates at the problem. Fortunately the speed of light is forcing us back in the direction of simplicity.
I doubt you really know enough about CPU design to judge - and it is in fact a showcase of ultimate engineering. Unlike empty claims about would-be CPUs, it is a very competitive area (albeit with only three or four real competitors), and lagging behind means you go out of business.

Do you think it is some evil conspiracy that all modern CPUs basically follow a similar design? (And I am not speaking about x86 here, but the real internal design - the Tomasulo OOO execution model, register renaming, branch prediction, L1/L2 caches, etc...)

MSimon
Posts: 14334
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois

Post by MSimon »

Then I suggest you investigate how much trouble that great Forth-like x87 FPU stack caused for FP performance in the past, and why SSE2 is so much of an improvement over it in terms of performance, even for scalar computations.
Oh. I'm sure. I've done math on the 8051 using the stack and if you don't think out the problem in advance and really think it through you can get in a LOT of trouble.

But if you do THINK about what you are doing and order the operands correctly data stacks really simplify things a lot.

It would have helped if the x87 FPU (same as the 9511, IIRC) had a deeper stack - it would have been much better. BTW I designed the 9511 into a Z-80 board that I sold about 10K copies of. For the stuff I was doing at the time I never needed it.

But if you just want to brute force your way through problems without a lot of deep thinking, modern processors are excellent. Kluge piled on kluge. And totally untestable in any rigorous way, because stimulating all the possible combinations of ways things could execute is out of the question. You just fix anomalous results with exception handlers - if you can recognize them.

And so true about speed of development. You might want to find out why FedEx went with Forth again for their delivery register. They had the option of choosing any language (they had been using Forth) and decided on Forth for the 3X to 6X gain in development speed. And the smaller memory footprint. Very important in a battery powered unit.

But the method is somewhat applicable to any project: you develop the specs so well that the code practically writes itself. I was chief architect (about 70% of the design) on the aerospace CAN bus tester I mentioned above. Hard real time. We used C - but it did cause problems. You never knew what the compiler was going to do to your source. Often we had to write snippets and then examine the assembly code to see what abomination the compiler was producing and develop workarounds.

But aerospace is different. Working code needs seven nines uptime and systems need 1E9 hrs MTBF or better. And of course a tester should be about 10X better than the unit tested. On your typical machine these days the Reset button is an integral part of operations.

Let me add that my boss on that project was sceptical about my methods. He was one of those "hard man" bosses, always smoked at me for my attitude. And then at the end (we made schedule - unusual in aerospace) he looked at the costs and we came in about 10% below budget - also unusual in aerospace. He was rather pleased. Even gave me an attaboy.

But I was able to use Forth on that project the same way Open Boot is used - where the manual was unclear, I could check out the SJA 1000 with Forth in minutes and then hand that knowledge to the coders.

And let me reiterate - we have the best tools possible (hardware and software) for the way things are currently done. Long pipelines are real speed-ups for long routines. But is writing long routines the best way to write software? It makes testing harder. And then all that wonderful branch predicting so you can be ready for a branch. It seems to me that a two-stack architecture which is always ready to branch would be simpler. And if you design your processor right, a return instruction can be included with most other types of instruction. So you are already fetching from the stack while doing your add (or whatever) - no branch predictor required.

The speed of light is encroaching on complexity. We will eventually get back to simplicity. But a few things will need to be changed along with the simpler processors.

We are really more stuck by our way of thinking than by what is possible.

But to tell you the truth I don't mind too much on a personal level. The way things are currently done gives me lots of opportunities. Or it did before I had to stay home to take care of my son.

And your point about caches? Well taken. I really like small simple processors and devoting the rest of the available silicon to on chip RAM. It speeds things up a lot.
Engineering is the art of making what you want from what you can get at a profit.

MSimon
Posts: 14334
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois

Post by MSimon »

It is not evil conspiracy.

Just the usual "we have always done it this way and besides we have so much invested in doing it this way".

The whole idea of register based machines came out of the WW2 code breaking processors. But there are advantages to zero operand machines and data flow architectures.

But everyone is comfortable with the current ways. Why change until a brick wall is hit?

It is the old path dependence thing. There may be better ways - but that was not the path taken and for the time being we are stuck.

===

At the aerospace company I worked for we had some very sharp people in technical management. They understood all the above points I made and agreed with them. But in the end the decision was made based on how wrenching making the change would be. And they decided to stick with what they were doing. And you know? I respected that because:

1. They understood and agreed with my points
2. My points were not the only criteria

In fact one of my managers was quite conversant with Forth and we spent about a week (off and on) discussing the pros and cons of various architectures.

As I said. Some very sharp cookies.
Engineering is the art of making what you want from what you can get at a profit.

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

MSimon wrote:
Then I suggest you investigate how much trouble that great Forth-like x87 FPU stack caused for FP performance in the past, and why SSE2 is so much of an improvement over it in terms of performance, even for scalar computations.
Oh. I'm sure. I've done math on the 8051 using the stack and if you don't think out the problem in advance and really think it through you can get in a LOT of trouble.

But if you do THINK about what you are doing and order the operands correctly data stacks really simplify things a lot.
Hey, no quarrel about that. Stack-based architectures ARE simple and quite effective if your HW is poor. That is why many early compilers used stack-based intermediate code ("p-code") and usually based the compiled result on stack operations. Converting standard expressions to stack ops is trivial. Been there, done that...

But it is SLOW.
But if you just want to brute force your way through problems without a lot of deep thinking modern processors are excellent.
Well, you must be solving problems of a different scale then.

And let me reiterate - we have the best tools possible (hardware and software) for the way things are currently done. Long pipelines are real speed-ups for long routines.
What do long routines have to do with that?
And then all that wonderful branch predicting so you can be ready for a branch.
Branch prediction is not about "being ready". It is about "ignore the branch" (and redo if it went the other way).
It seems to me that a two-stack architecture which is always ready to branch would be simpler. And if you design your processor right, a return instruction can be included with most other types of instruction. So you are already fetching from the stack while doing your add (or whatever) - no branch predictor required.
Sorry, but you do not seem to have a clue....

Actually, I do not blame you. I have observed that most of my coworkers do not really know how a modern out-of-order CPU works... In fact, it is not easy to understand.

OK, just for starters: something like "already fetching from the stack" is trivial. OOO CPUs with branch prediction actually EXECUTE up to hundreds of instructions ahead of the branch. Well, some of them - those that have data available. While finishing instructions before the branch. While renaming registers to reduce dependencies.

Is it simple? No way. It is hard as hell to design. But as you correctly say, we are approaching the speed of light. Simple recipes for performance do not work anymore. You cannot bump the frequency. What you CAN do is increase parallelism and avoid bottlenecks. And OOO excels at both tasks.
We are really more stuck by our way of thinking than by what is possible.
Really?
And your point about caches? Well taken. I really like small simple processors and devoting the rest of the available silicon to on chip RAM. It speeds things up a lot.
Let me note that the average cache cell is 5-10 times more expensive than a DRAM cell and requires more power...

Of course, if all you need to do is some simple 8051-class tasks, your approach is fine.

If you want to simulate Polywell, you better stick with high-performance computing. Get latest Intel or AMD CPU and good C++ compiler.

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

MSimon wrote:But there are advantages to zero operand machines and data flow architectures.
Yes, of course there are. They are simple. If die size is the constraint, it is a fine way to save some.
But everyone is comfortable with the current ways. Why change until a brick wall is hit?
But they hit the brick wall as soon as more instruction-level parallelism is to be exploited. Too many dependencies in the data flow.

MSimon
Posts: 14334
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois

Post by MSimon »

But it is SLOW.
Only if your code is threaded interpreted. No one doing Forths does it that way these days. The compiler I'm writing should run compiled code as fast as any compiled C program. Faster if you are writing code in small modules. Forth has no stack thrash.
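
A small illustration of what "no stack thrash" means: intermediate results stay on the data stack across calls, so there is no frame to set up or tear down:

: AREA ( w h -- a ) * ;
: VOLUME ( w h d -- v ) >R AREA R> * ;   \ operands flow straight through AREA
2 3 4 VOLUME .                           \ prints 24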

As to small processors having limits. Well yes. But we are already up against limits these days. The speed of light.

And of course there is the speed of development and size of working programs. Forth development runs 3X to 6X faster. I was competing with a big defense contractor once using Forth and a very small team. They used C. The Navy would come in and ask for a change in design - rather major.

I could get my hardware design produced and the new code up in one month. The big guys were still wailing after 6 months that we were unfair. My team was two software guys and me designing hardware and managing software. The big defense contractor had 30 guys on the job.

Let us assume 20 software guys. That is better than a 60X advantage in productivity. Nothing to sneeze at. And I had a 6X real-time advantage in development.

And I did all that wonderful goodness in 4K bytes of compiled code. With the usual 50% margin in spare EPROM the Navy required for new eqpt. (to allow for upgrades).

And it didn't happen just once. It happened 3 times against these guys. Unfortunately the company I worked for was run by crooks and the Navy declined to do business with them. But the Navy guys just loved to regale us with tales of how the big company was sweating blood to keep up with us. They would tell us how pained the big company was to be beaten by a company that had no record of being able to develop anything more than second-source stuff.

I had an inspector from the Navy check our code. He pored over it for a day. He said it was the most readable code he had come across in YEARS in any language. Practically self-documenting. But I had very rigid standards. I refused to let my guys get sloppy. Everything had a name related to its function.

There would be constructs like

Turn-On Motor

and

Frequency Display
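
Something like this sketch (hypothetical port and masks - the real words were different, and Forth being postfix, the noun usually comes first):

VARIABLE PORT   0 PORT !                \ stands in for a memory-mapped output register
: ON  ( mask -- ) PORT @ OR PORT ! ;
: OFF ( mask -- ) INVERT PORT @ AND PORT ! ;
: MOTOR  ( -- mask ) 1 ;
: HEATER ( -- mask ) 2 ;

MOTOR ON   HEATER OFF                   \ reads almost like English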

BTW my 6X improvement in calendar results mirrors what FedEx claimed Forth did for them.

But change won't come for a while in most areas. Too much invested in the old way of doing things.
Engineering is the art of making what you want from what you can get at a profit.

MSimon
Posts: 14334
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois

Post by MSimon »

Branch prediction is not about "being ready". It is about "ignore the branch" (and redo if it went the other way).
If your branch costs you zero cycles then there is nothing to redo. You can then throw away all that branch prediction hardware.

That is what I mean by always ready.

And long pipelines cost you when you have to do a flush. It militates against short routines.

Modern Forth chips do use a one level deep pipeline for speed. So the cost of a flush is minimized.

I can tell you have kept up with mainline chips. But Forth chips have moved on from what you used to know.
Engineering is the art of making what you want from what you can get at a profit.

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

MSimon wrote: As to small processors having limits. Well yes. But we are already up against limits these days. The speed of light.
The speed of light is a performance limit of small simple CPUs too - you cannot bump the frequency to make them fast enough.
And of course there is the speed of development and size of working programs. Forth development runs 3X to 6X faster. I was competing with a big defense contractor once using Forth and a very small team. They used C. The Navy would come in and ask for a change in design - rather major.
Based on a single piece of anecdotal evidence?
I could get my hardware design produced and the new code up in one month. The big guys were still wailing after 6 months that we were unfair. My team was two software guys and me designing hardware and managing software. The big defense contractor had 30 guys on the job.
Well, looks like a bunch of incompetent programmers in a big company vs. a small agile team.

Actually, I have been through this many times. I do not know why that is, but big companies tend to hire big teams of subpar developers. Then you have all these meetings and teambuilding and other sh*t and in the end, there is not much work done. Everybody is afraid to write a single line.
Let us assume 20 software guys. That is better than a 60X advantage in productivity. Nothing to sneeze at. And I had a 6X real-time advantage in development.
I would say the bigger the team, the slower the development. Especially in relatively small projects (say up to 20000 lines).

20000 lines of C++ is something I can do myself in 3 months. If I had to collaborate with 20 people on them, all the design and interface negotiations would prolong that to 6 months or more (been there too).
And I did all that wonderful goodness in 4K bytes of compiled code. With the usual 50% margin in spare EPROM the Navy required for new eqpt. (to allow for upgrades).
4K is nothing. For such a small project, on a small CPU, you do not actually need a compiler or Forth.

I used to write 30KB of assembly for the Z80. In two or three months.
But change won't come for a while in most areas. Too much invested in the old way of doing things.
Well, just tell me how you would write (and run!) a Polywell simulator in Forth...

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

MSimon wrote:
Branch prediction is not about "being ready". It is about "ignore the branch" (and redo if it went the other way).
If your branch costs you zero cycles then there is nothing to redo. You can then throw away all that branch prediction hardware.

That is what I mean by always ready.

And long pipelines cost you when you have to do a flush. It militates against short routines.
What flush? Actually, current CPUs barely know they are in a routine. Call/Ret are virtually zero cost.

Of course, if you can pass most parameters in registers (on amd64 you can in most cases), even better.

I believe you are referring to the "failed prediction" issue - in that case there is a pipeline flush, but prediction never fails for unconditional jumps (obviously), so I still do not get your "long routines" theory... Actual CPUs really care only a little about short or long routines.

Besides, in C++, short routines are usually inlined and completely 'dissolved' in the code - note that this goes much further than just copying the routine's code into place.
