Control Processor

Discuss the technical details of an "open source" community-driven design of a polywell reactor.

Moderators: tonybarry, MSimon

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Indrek wrote: Also I don't like functions that return many parameters (or for that matter pass many of them). In my book functions are for modifying memory, not for returning parameters (and in practice, in real-world applications, most of them are). It's true there are several ways to design a computer architecture, but I believe the current prevalent way is a compromise of the best.
Why not return many parameters if it helps?

i.e. modified data, modified addresses (which are naturally modified by the required processing)

It is a matter of passing temporaries without having to write them to memory. Much faster.

You only believe current ways are best because you do not have a deep understanding of what is possible. Think of it as ITER vs IEC. I'm the IEC guy all the way. ALL the way.
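The multiple-return idea is concrete in FORTH: a word like /MOD leaves both the remainder and the quotient on the data stack, no out-parameters needed. A minimal sketch of that convention, in Python rather than FORTH (the list-as-stack and the name `slash_mod` are mine, for illustration only):

```python
# Minimal sketch of a FORTH-style data stack. Words take their
# inputs from the stack and leave their results on it --
# "returning" as many values as they like.

stack = []

def push(n):
    stack.append(n)

def slash_mod():
    """Simulates FORTH's /MOD ( n1 n2 -- rem quot )."""
    n2 = stack.pop()
    n1 = stack.pop()
    stack.append(n1 % n2)   # remainder left below...
    stack.append(n1 // n2)  # ...quotient left on top

push(17)
push(5)
slash_mod()
print(stack)  # -> [2, 3]  (remainder 2, quotient 3)
```

Both results are simply there for the next word to consume; nothing was written to memory to pass them.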

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Indrek wrote: The way I see it, stack-based systems are slow - memory has to be accessed all the time. Accessing memory is slow. That's why we have registers, caches, branch prediction and random access of the stack. Those tools minimize slow direct accesses to main memory.
Stacks are faster. You put the stack on the chip. Bottleneck eliminated. Registers eliminated (except for special functions). The stack is your cache. It can be quite small: 16 deep is adequate for most stuff, 64 deep if you have an unusual problem. Branch prediction unnecessary. Random access of the stack (a la C) unnecessary. Everything is where you want it when you need it. No need to go poking around to get stuff. Really. It is a very, very, very hard thing to understand if you haven't done it.
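As a rough illustration of the claim (a toy model, not any real FORTH chip), here is a two-stack machine in Python with the small fixed depth described above:

```python
# Toy model of a two-stack (MISC-style) machine: a shallow
# on-chip data stack plus a return stack for calls. Sketch
# for illustration only; not a model of any real chip.

class StackMachine:
    DEPTH = 16  # "16 deep is adequate for most stuff"

    def __init__(self):
        self.data = []  # data stack -- the "cache"
        self.ret = []   # return stack for call/return

    def push(self, n):
        if len(self.data) >= self.DEPTH:
            raise OverflowError("data stack full")
        self.data.append(n)

    def add(self):
        b, a = self.data.pop(), self.data.pop()
        self.push(a + b)

    def call(self, addr, pc):
        self.ret.append(pc)  # return address stays on-chip, not in RAM
        return addr

m = StackMachine()
m.push(3)
m.push(4)
m.add()
print(m.data)  # -> [7]
```

Overflowing the 16-deep stack raises an error here; on the real thing, the small depth is the design discipline, forcing words to be factored so deeply nested state never accumulates.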

Then without all that extra crap you get smaller area. Smaller area = faster. The #1 limitation in current top-of-the-line processors is the delay due to distance. Make everything smaller and you cut down the distance.

It is a different way of thinking. Don't fight nature. Work with it. IEC vs ITER.

It amazes me that the idea of working with the forces of nature vs brute force does not translate to all disciplines. Why can so many see it in IEC vs ITER but not in hardware/software design?

You are not unusual in that respect. You are in the 99.99% majority. It still does not mean the 0.01% is wrong.

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Indrek wrote: Another common feature of computers is bulk processing of data. That's why we have pipelines. They make the processor slightly slower for single operations, but for bulk operations (which is the common case) they increase speed.
Read, modify, write. If the processor is fast enough it can be done in one memory cycle. You don't need bulk data in the processor. And even if it helped with data size n you are out of luck with data size 2n. And data size is always increasing.

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Indrek wrote: Your argument about an AMD burning up does not make sense. Just pour liquid nitrogen on it if it heats up, it's cheap and plentiful ;) Real arguments please. Real time reason? Sure.
Because that LN2 plumbing is just another point of failure.

And bigger power supplies have to come from somewhere and fit into the total hardware.

If I can get the speed with 1 W and no cooling why is it good engineering practice to get it with 200 W and massive cooling? The cooling is going to cost you more watts.

You are thinking like a user, not an engineer. Besides, I save $200 on the processor and can sell the saved 199 W (actually 1 kW if LN2 cooling is counted) to a customer. Lower capital costs. More to sell. Besides, it is ecological.

You substitute improved design for brute force. You reduce the labor required to solve the problem. Sure, out of 500 software designers only 5 will get it. The 5 will be sufficient. Think of all the tail (as in tooth vs tail) that this eliminates.

Think of today's hardware designers. It takes thousands of them to build a top-of-the-line AMD. If I build a FORTH chip with the same processing power I can do it with 10. Maybe even 2. Why not? Why waste all that human capital if you don't have to?

I HATE waste.

Read "The Mythical Man Month" by Brooks. He discusses cascading management failure in a software project. I'd prefer to avoid it.

Indrek
Posts: 113
Joined: Wed Jul 04, 2007 1:51 pm
Location: Estonia
Contact:

Post by Indrek »

MSimon wrote:In FORTH you just leave the parameters on the stack and call the subroutine. Other than the initial build no rebuilding is necessary.
If you haven't worked with it, it is very hard to believe. Very hard.
To leave parameters on the stack you have to leave them in a specified order. That is, you have to reorder your parameters depending on the function you call. Also, a stack-based system complicates working with lots of arguments - additional overhead. Having 10 registers in place really does help.
MSimon wrote: In current processors executing instructions takes multiple clocks. Leading to long pipelines, branch predictors and all that rotten kludge. The way out? Execute all (or almost all) instructions in a single cycle.
Um. Weren't multiple clock cycles invented as a revolutionary speed improvement? As in: instead of every instruction taking 1 long clock cycle = 5 ms, only the few biggest instructions take 5 clock cycles = 5 ms, and most take only 1 clock cycle = 1 ms? This is again a feature to make computers faster, not slower ;) Though now that we can put more gates on chips this advantage is diminishing. Pipelining was just a natural extension of that - how to get many slow operations working in parallel. All of that rotten kludge allowed computers to be fast with the limited silicon and memory the technology allowed. Brain over muscle.

And branch prediction is necessary to preload and preprocess instructions for faster execution. Even your FORTH systems have branching and need simple branch predictors. If you don't have them it means your system is just built to take it slow. Take for example: 1 cycle to execute an instruction, 1 cycle to load the next instruction, 1 cycle to execute, 1 cycle to load the next instruction = total 4 cycles. Instead it could be 1 cycle to execute + parallel preload of the next instruction, 1 cycle to execute + parallel preload of the next instruction = total 2 cycles. See? Half the cycles for the same work - and branch prediction is what keeps that preloading correct across branches. So I have no idea what you don't like about it.
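Indrek's cycle arithmetic can be written out directly. A toy cycle-count model under his assumptions (one cycle to fetch, one to execute, and in the overlapped case the first fetch is already done):

```python
# Toy cycle-count model for Indrek's example: each instruction
# needs 1 cycle to fetch and 1 cycle to execute.

def cycles_sequential(n):
    # fetch, execute, fetch, execute, ... nothing overlaps
    return 2 * n

def cycles_prefetched(n):
    # each execute overlaps the fetch of the next instruction;
    # like Indrek's example, assume the first fetch is already done
    return n

print(cycles_sequential(2))   # -> 4 cycles, as in the example
print(cycles_prefetched(2))   # -> 2 cycles
print(cycles_sequential(100)) # -> 200
print(cycles_prefetched(100)) # -> 100, half the work
```

The saving only holds while the preloaded instruction is the one actually executed next; on a taken branch the prefetch is wasted, which is exactly the gap branch prediction fills.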

- Indrek

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Indrek wrote: Managing a project versus the architecture/language used - in my mind those two are entirely different things. More so, I think the C vs. Forth discussion is irrelevant - they are both established technologies. Strong management and qualified professionals are what make a project succeed, not some technological gizmo on its own. I've seen plenty of IT projects fail that stressed technology over process - the weak management's idea is "let's buy the most expensive buzzword-filled 'product' off the street, throw lots of programmers who just graduated high school at it, and success is guaranteed".
They are one thing. Managing 5 is much easier than managing 500.

The throwing lots of programmers at a problem is exactly what I'm arguing against. I can do a project that requires 500 C programmers with 5 FORTH programmers (I have done it).

It is not a matter of some buzz. It is a matter of a different way of solving the problem. Fewer resources human and material. The less you require the easier it is to manage.

I had a team of 2 FORTHers against a team of 30 C guys (different companies). The Navy would come out with a new requirement. My two guys finished in a month. The 30 C guys were still struggling after 6 months. They hated us. We made them look bad. We were upstarts. They were professionals. Let me see: 2 vs 30, that is 15X. 1 month vs 6+ months, that is 6X. 6 x 15 = 90 (100 in round numbers).

Again. Why waste human and material resources? It goes against my grain.

Indrek
Posts: 113
Joined: Wed Jul 04, 2007 1:51 pm
Location: Estonia
Contact:

Post by Indrek »

MSimon wrote: Stacks are faster. You put the stack on the chip. Bottleneck eliminated. Registers eliminated (except for special functions). The stack is your cache. It can be quite small: 16 deep is adequate for most stuff, 64 deep if you have an unusual problem. Branch prediction unnecessary. Random access of the stack (a la C) unnecessary. Everything is where you want it when you need it. No need to go poking around to get stuff. Really. It is a very, very, very hard thing to understand if you haven't done it.
The problem with this is that the amount of stack is limited. Even if you constrain the application's stack usage (this is something I've worked on to get software working on embedded systems), modern computer systems have hundreds of threads running around. In the old days this was just not practical - the technology was limited. I can agree to a certain extent that these days this might just start making some sense. In a narrow and very specialized system it might make a lot of sense. I don't see this working in a general-purpose computer though. Also, to get any speed you need the data cache anyway, trust me - so why bother with the stack. Von Neumann all the way! But considering the legacy this will never happen.
MSimon wrote: Then without all that extra crap you get smaller area. Smaller area = faster. The #1 limitation in current top-of-the-line processors is the delay due to distance. Make everything smaller and you cut down the distance.
Distance is a problem indeed. My desktop has 8 GB of RAM. AFAIK they have not yet figured out how to put all that into the CPU. Sorry. Once they figure that out (and I believe they will some day) - then we'll have to review our thinking.
MSimon wrote: It amazes me that the idea of working with the forces of nature vs brute force does not translate to all disciplines. Why can so many see it in IEC vs ITER but not in hardware/software design?
I agree with you here, but actually your approach to computing seems like brute force to me ;)
MSimon wrote: You are not unusual in that respect. You are in the 99.99% majority. It still does not mean the 0.01% is wrong.
I'm sorry. The many years I have lived on this little planet have taught me one thing. There is no silver bullet.

- Indrek

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Indrek wrote: To leave parameters on the stack you have to leave them in a specified order. That is, you have to reorder your parameters depending on the function you call. Also, a stack-based system complicates working with lots of arguments - additional overhead. Having 10 registers in place really does help.
The order is natural.

If you design right no re-ordering is necessary. Most of designing right is inherent in FORTH. Really. If you have never done it, it is very, very, very hard to understand.

You want to solve the out of order problem.

I say why create the problem in the first place? Just so you don't have to think clearly about the problem? Just so you can do sloppy design? OK. The standards you keep are up to you.
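The "natural order" claim is easiest to see in a sketch: the values a word leaves on the stack are already positioned as the inputs of the next word, so composed definitions need no reshuffling. A toy version in Python rather than FORTH (the word names and the integer-division choice are mine, for illustration):

```python
# Sketch of why stack order can feel "natural": each word's
# outputs land exactly where the next word's inputs are
# expected, so composition needs no argument shuffling.

stack = []

def push(n):
    stack.append(n)

def add():
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def divide():
    b, a = stack.pop(), stack.pop()
    stack.append(a // b)  # integer division, for the toy

def avg():
    # FORTH-ish definition: : AVG  + 2 /  ;
    add()       # sum is left on the stack...
    push(2)
    divide()    # ...right where divide expects it

push(10)
push(20)
avg()
print(stack)  # -> [15]
```

`avg` never names its arguments or moves them around; factoring the problem so this keeps working is the design discipline being described.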

I can counter your every point - from experience. I see no use in it. It is like arguing IEC with a tokamak guy.

I don't need 10 registers. In fact I don't need any (well, sometimes two are handy - for memory re-ordering problems). You want to solve problems. I prefer not to create them. Different way of thinking.

I'd put the chance of your getting it at .001%. Why should I waste the effort? Your simulations are good and you can work with what you have got. That is sufficient for now and I'm thankful for it. Very thankful.

Good design will eliminate almost all the problems C and current processor designs were built to solve.

If you want to understand my way of thinking may I suggest starting with "Starting FORTH" by Brodie to get the basics and then go on to "Thinking FORTH" by Brodie. Both free on the 'net. I have links at IEC Fusion Tech - on the sidebar.

If you factor a problem properly thrashing can be eliminated.

In many cases the key to solving problems is to not create them. A skill that is not taught in any school I'm aware of. Very smart people can solve very complex problems. The essence of genius is simplification. Chuck Moore is a genius. K&R were merely smart. Very smart, but only smart.

Indrek
Posts: 113
Joined: Wed Jul 04, 2007 1:51 pm
Location: Estonia
Contact:

Post by Indrek »

MSimon wrote:Because that LN2 plumbing is just another point of failure.

And bigger power supplies have to come from somewhere and fit into the total hardware.

If I can get the speed with 1 W and no cooling why is it good engineering practice to get it with 200 W and massive cooling? The cooling is going to cost you more watts.

You are thinking like a user, not an engineer. Besides, I save $200 on the processor and can sell the saved 199 W (actually 1 kW if LN2 cooling is counted) to a customer. Lower capital costs. More to sell. Besides, it is ecological.

You substitute improved design for brute force. You reduce the labor required to solve the problem. Sure, out of 500 software designers only 5 will get it. The 5 will be sufficient. Think of all the tail (as in tooth vs tail) that this eliminates.

Think of today's hardware designers. It takes thousands of them to build a top-of-the-line AMD. If I build a FORTH chip with the same processing power I can do it with 10. Maybe even 2. Why not? Why waste all that human capital if you don't have to?
I'm sorry. I believe in fixing simple problems with brute force. Fixing fundamental issues with brute force is a no-no though. Also, I'm looking at it from a manager's standpoint rather than an engineer's. Instead of buying one big copper heatsink you completely change the architecture of your entire system. Does not make sense to me - this is another extreme of the brute-force approach. Very often engineers only see the trees and don't realize that there's a forest. Also, I made some fun of you with the LN2 comment (your brute-force approach to polywell design); it was not a serious engineering proposal ;)

- Indrek

Indrek
Posts: 113
Joined: Wed Jul 04, 2007 1:51 pm
Location: Estonia
Contact:

Post by Indrek »

MSimon wrote: The order is natural.

If you design right no re-ordering is necessary. Most of designing right is inherent in FORTH. Really. If you have never done it, it is very, very, very hard to understand.

You want to solve the out of order problem.

I say why create the problem in the first place? Just so you don't have to think clearly about the problem? Just so you can do sloppy design? OK. The standards you keep are up to you.
But I keep very high standards for my coding. When I write C/C++ I'm always conscious of what kind of assembly instructions the compiler will generate. I can tell you most other programmers don't. I consider them incompetent ;)

But also: there are the trees and there is the forest. Most (99%+) of the bloat and slowness in modern computing is not caused by stack thrashing or poor coding on a line-by-line basis. It's caused by the bigger architectural issues. Most IEEE types I know only know about the trees. They write poor code. Most mathematicians I know have no idea about the trees or the forest, but they know what an idealized tree should look like. They write even poorer code. One has to know about both.

Now, compilers also optimize your code - that's part of what they do. It seems FORTH forces a lot of that work onto the programmer. That means the programmer must understand more of what he does and so makes fewer mistakes, but there is also more overhead and the programmer must have a bigger-than-average brain. This also makes reading and modifying old code more complicated. You know, I don't use comments in my code. I believe that code that needs comments to be understood later on is poor code. You write code once but you read it 10 times - that's where the priority should be. In that sense Perl is an abomination that should be erased off the earth's surface ;)

Now I might have stressed this issue a bit too much. I see forth might avoid some stack thrashing that is inherent in C. But. IMHO the stack thrashing is not a problem one needs to solve. Because. It's not a problem in general purpose computing.

As for forth forcing to think out your code and so writing better code and being more productive. There are other languages that do that. Whether they use stacks or register underneath is irrelevant.
MSimon wrote: I can counter your every point - from experience. I see no use in it. It is like arguing IEC with a tokamak guy.

I don't need 10 registers. In fact I don't need any (well, sometimes two are handy - for memory re-ordering problems). You want to solve problems. I prefer not to create them. Different way of thinking.
The thinking is different. You say you have the stack in the CPU. Well, I have the registers in the CPU. It's almost the same thing.
MSimon wrote: I'd put the chance of your getting it at .001%. Why should I waste the effort? Your simulations are good and you can work with what you have got. That is sufficient for now and I'm thankful for it. Very thankful.

Good design will eliminate almost all the problems C and current processor designs were built to solve.

If you want to understand my way of thinking may I suggest starting with "Starting FORTH" by Brodie to get the basics and then go on to "Thinking FORTH" by Brodie. Both free on the 'net. I have links at IEC Fusion Tech - on the sidebar.

If you factor a problem properly thrashing can be eliminated.

In many cases the key to solving problems is to not create them. A skill that is not taught in any school I'm aware of. Very smart people can solve very complex problems. The essence of genius is simplification. Chuck Moore is a genius. K&R were merely smart. Very smart, but only smart.
Actually I'm not a big C fan. Every time I revert from C++ to C I start cursing. So I do C++. But I think C++ is too complicated; I think most people who use it have no idea what they are doing. If you asked me what the most beautiful computer language is, I'd say SML (Python might be second). So you've never met any competent C programmers? I'm sorry. I might just be your first ;) And yes, experience is not taught in schools.

I can't read books off computer screens, sorry. Maybe in some other life.

- Indrek

Nanos
Posts: 363
Joined: Thu Jul 05, 2007 8:57 pm
Location: Treasure Island

Post by Nanos »

If you had been in the UK, you would have loved the Jupiter ACE when it came out in the 80s: a computer with a built-in FORTH language;

http://en.wikipedia.org/wiki/Jupiter_ACE

I particularly like this page about building your own;

http://home.micros.users.btopenworld.co ... erAce.html


I'm also a fan of the ARM CPU - beautiful instruction set it's got.

One of my little projects in the future is to design/build my own CPUs (possibly optical at first, with a mechanical version later to handle power cuts), and I can imagine MSimon being the ideal kind of person to work on a project like that.

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Nanos wrote: If you had been in the UK, you would have loved the Jupiter ACE when it came out in the 80s: a computer with a built-in FORTH language;

http://en.wikipedia.org/wiki/Jupiter_ACE

I particularly like this page about building your own;

http://home.micros.users.btopenworld.co ... erAce.html


I'm also a fan of the ARM CPU - beautiful instruction set it's got.

One of my little projects in the future is to design/build my own CPUs (possibly optical at first, with a mechanical version later to handle power cuts), and I can imagine MSimon being the ideal kind of person to work on a project like that.
The way to do it (for prototypes) is with FPGAs.

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Indrek,

FORTH has about the simplest inheritance model I have ever seen. In fact it was the first object-oriented language, about 10 years before C++.

And yes I have met good C programmers. I can run circles around them with FORTH.

BTW you don't have to read the books off the screen. Download them and print them out - if I have piqued your interest.

MSimon
Posts: 14335
Joined: Mon Jul 16, 2007 7:37 pm
Location: Rockford, Illinois
Contact:

Post by MSimon »

Indrek,

Brute force is nice. If you can afford it.

Still a team of 5 is going to produce higher quality software than a team of 500.

People build new processors all the time. And if you insist on a C model a C compiler can be built.

However, there really is no shortage of FORTH programmers. Java is a quasi-FORTH language. Why? The Sun influence.
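The "quasi FORTH" remark has a concrete basis: the JVM executes stack-based bytecode, pushing operands and popping them into operations much as a FORTH machine does. CPython's bytecode is stack-based too, which makes the idea easy to see from a prompt (exact opcode names vary by Python version):

```python
import dis

def add(a, b):
    return a + b

# The disassembly is stack-machine code: the arguments are
# pushed (LOAD_FAST), then a binary-add opcode pops two values
# and pushes the result, FORTH-style.
dis.dis(add)
```

Neither VM exposes registers to the bytecode; every operation works on the top of an evaluation stack.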

Indrek
Posts: 113
Joined: Wed Jul 04, 2007 1:51 pm
Location: Estonia
Contact:

Post by Indrek »

MSimon wrote:Indrek,

Brute force is nice. If you can afford it.

Still a team of 5 is going to produce higher quality software than a team of 500.

People build new processors all the time. And if you insist on a C model a C compiler can be built.

However, there really is no shortage of FORTH programmers. Java is a quasi-FORTH language. Why? The Sun influence.
I've seen what a small team can do. I'm with you there. I've seen what a 3-5 person team of C++ programmers can do - you'd be amazed (returning the same statement to you here ;). And I've seen the degradation of an organization as it hires too many people.

I don't insist on the C model, no way dude. It's just that some of your comments seem so provocative I couldn't help but poke some holes in them and start a flame war :)

And as I said before - which established technology gets used is not really of big relevance in my opinion. Code can be written quickly in pretty much any programming language.

I feel that in your mind you associate C/C++ with a 500-person team and dead-slow progress. That's an invalid perception. It's not the language; it's the management and the size of the team.

- Indrek
