Posted: Tue Apr 06, 2010 9:59 am
by Luzr
MSimon wrote:
Totally unsuitable for the task.
Since control interests me more than crunching vast volumes of data.

The right tool for the job. Why use a 20 Hp jack hammer for pounding a couple of nails?
That is true. Diversity is nice. There might be some niche for GA144.

However, I would say most such niches are occupied by ARM, which serves the purpose better....

Posted: Tue Apr 06, 2010 10:06 am
by Luzr
JohnFul wrote: They are indeed Xeon cores. The point, however, was that the multiple cores are being used for specific repetitive tasks.
Yes, specific tasks, agreed. Rendering a movie is a task that parallelizes ideally. All you need to do is have each core render a single frame of the movie. There are no dependencies between frames; all each node has to deliver is a single output image.

Still, each node has to be pretty powerful to render a single frame in any reasonable time (or at all). In fact, it has to be an almost complete PC, with gigabytes of RAM, gigabit Ethernet, etc. So I guess this is not a niche for a design built from many simple cores.
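
To illustrate the "one frame per node, no dependencies" idea, here is a minimal Python sketch. Everything in it is an assumption for illustration: render_frame is a hypothetical stand-in for a real renderer, and a thread pool stands in for the render farm (a real farm would use separate processes or machines).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def render_frame(frame_index: int) -> bytes:
    """Hypothetical stand-in for a renderer: returns a tiny fake image."""
    width, height = 4, 2
    return bytes((frame_index + x + y) % 256
                 for y in range(height) for x in range(width))


def render_movie(frame_count: int, workers: int = 4) -> List[bytes]:
    """Render every frame independently; results come back in frame order."""
    # Frames share no state, so each one can go to whichever worker is free.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_frame, range(frame_count)))


if __name__ == "__main__":
    frames = render_movie(8)
    print(len(frames), "frames rendered")
```

The only coordination needed is handing out frame numbers and collecting images, which is why the job scales with node count as long as each node can hold and render a whole frame.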

Posted: Thu Apr 08, 2010 8:03 am
by MSimon
Luzr wrote:
MSimon wrote:
Totally unsuitable for the task.
Since control interests me more than crunching vast volumes of data.

The right tool for the job. Why use a 20 Hp jack hammer for pounding a couple of nails?
That is true. Diversity is nice. There might be some niche for GA144.

However, I would say most such niches are occupied by ARM, which serves the purpose better....
Hardware that interfaces to the real world would work a LOT better if the interface was handled asynchronously without regard to interrupt level. Every process gets its own processor so the latency for a given "interrupt" is short (nanoseconds) and well defined. You then stitch it all together with data flow.

And yes ARM has got the market and I like some of the STM ARM automotive chips with CAN, SPI, I2C, USB, Ethernet, etc. In fact my recent start of a Forth Direct Threaded Compiler (Compiles Machine Code) was because I wanted one for some ARM Projects I'm contemplating. And then I got the idea of doing my first one as a Z-80 and use the first assembler I ever owned.

The GA 144 is nice in the way it handles data flow. It can be blocking - lowest cost in time and program space. i.e. No data? Wait here. Or it can be non-blocking i.e. No data? Check back later. Which costs you more cycles since transfer is not automatic.
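
A rough sketch of the two styles in Python rather than Forth. This is not GA144 code: a queue.Queue stands in for a port between two cores, and the function names are made up for illustration.

```python
import queue
import threading


def blocking_read(port: queue.Queue) -> int:
    """'No data? Wait here.' The caller sleeps until a value arrives."""
    return port.get()


def polling_read(port: queue.Queue, max_polls: int):
    """'No data? Check back later.' Returns (value or None, polls used)."""
    polls = 0
    while polls < max_polls:
        polls += 1
        try:
            return port.get_nowait(), polls
        except queue.Empty:
            pass  # the caller could do other work here before polling again
    return None, polls


if __name__ == "__main__":
    port = queue.Queue()
    print(polling_read(port, max_polls=3))   # nothing has been sent yet
    port.put(42)
    print(polling_read(port, max_polls=3))   # value is already waiting
    # A "neighbour" delivers a value a little later; the blocking reader waits.
    threading.Timer(0.05, port.put, args=(7,)).start()
    print(blocking_read(port))
```

The polling version needs the extra loop and bookkeeping, which is the cost in cycles and program space mentioned above.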

Posted: Thu Apr 08, 2010 9:14 am
by Luzr
MSimon wrote:
Luzr wrote:
MSimon wrote: Since control interests me more than crunching vast volumes of data.

The right tool for the job. Why use a 20 Hp jack hammer for pounding a couple of nails?
That is true. Diversity is nice. There might be some niche for GA144.

However, I would say most such niches are occupied by ARM, which serves the purpose better....
Hardware that interfaces to the real world would work a LOT better if the interface was handled asynchronously without regard to interrupt level. Every process gets its own processor so the latency for a given "interrupt" is short (nanoseconds) and well defined. You then stitch it all together with data flow.

And yes ARM has got the market and I like some of the STM ARM automotive chips with CAN, SPI, I2C, USB, Ethernet, etc. In fact my recent start of a Forth Direct Threaded Compiler (Compiles Machine Code) was because I wanted one for some ARM Projects I'm contemplating. And then I got the idea of doing my first one as a Z-80 and use the first assembler I ever owned.

The GA 144 is nice in the way it handles data flow. It can be blocking - lowest cost in time and program space. i.e. No data? Wait here. Or it can be non-blocking i.e. No data? Check back later. Which costs you more cycles since transfer is not automatic.
OK, OK. But where does Forth fit into the picture?

Note that the simplest ARM core is about 30,000 gates - and likely much more powerful than a GA144 core.

Posted: Thu Apr 08, 2010 9:25 am
by MSimon
Note that the simplest ARM core is about 30,000 gates - and likely much more powerful than a GA144 core.
But is it more powerful than 144 Green Array cores? I don't think so. The GA device also gives you the option of making any set of pins any kind of low speed (under 10 Mbs) serial device. You just change which set of pins get the software. No switching matrix to select functions required.

Posted: Thu Apr 08, 2010 5:08 pm
by Luzr
MSimon wrote:
Note that the simplest ARM core is about 30,000 gates - and likely much more powerful than a GA144 core.
But is it more powerful than 144 Green Array cores?
Well, with 130 nm, you should be able to build about 1,000,000 gates with a good yield. That means about 30 ARM cores on a single chip.

Would 30 32-bit cores, each with one instruction per cycle and 16 registers, be more powerful than 144 simple Forth cores?

I would bet so.
I don't think so. The GA device also gives you the option of making any set of pins any kind of low speed (under 10 Mbs) serial device. You just change which set of pins get the software. No switching matrix to select functions required.
That is definitely nice. But it is not unique (AFAIK) and it is quite unrelated to cores being Forth processors.

Posted: Thu Apr 08, 2010 8:17 pm
by kunkmiester
Well, with 130 nm, you should be able to build about 1,000,000 gates with a good yield. That means about 30 ARM cores on a single chip.
You just reminded me of another advantage to multiple cores--redundancy in manufacturing. The PlayStation 3 uses this--they build 8 core processors, but ship 7 core, or something like that. With a single core, if something didn't turn out, it's totally screwed.

If you built, say, a 100-core processor but sold it as a 90-core part, you'd have a lot more chips that could be sold rather than scrapped, making the chips cheaper as a whole.
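
A back-of-the-envelope sketch of that effect in Python. It assumes each core fails independently with the same probability, which real defect distributions only approximate, and the 95% per-core yield is a made-up number for illustration.

```python
from math import comb


def sellable_fraction(total_cores: int, required_cores: int,
                      core_yield: float) -> float:
    """Probability that at least `required_cores` of `total_cores` work.

    Assumes independent, identically likely core failures (binomial model).
    """
    return sum(
        comb(total_cores, good)
        * core_yield ** good
        * (1.0 - core_yield) ** (total_cores - good)
        for good in range(required_cores, total_cores + 1)
    )


if __name__ == "__main__":
    core_yield = 0.95  # hypothetical per-core yield
    print("all 100 cores needed:", sellable_fraction(100, 100, core_yield))
    print("any 90 of 100 needed:", sellable_fraction(100, 90, core_yield))
```

Under these assumptions, requiring all 100 cores leaves well under 1% of dies sellable, while requiring any 90 of them makes the large majority sellable.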

Posted: Thu Apr 08, 2010 9:27 pm
by Luzr
kunkmiester wrote: You just reminded me of another advantage to multiple cores--redundancy in manufacturing. The PlayStation 3 uses this--they build 8 core processors, but ship 7 core, or something like that.
Actually, GPUs - same story. And AMD even sells failed quadcores as triple-cores and even dual-cores. I believe they had, in the past, sold failed dualcores as singles.

Also, Athlon 64s with failed cache lines were sold as Semprons with less cache. In fact, as cache often takes up most of the CPU die, a failure there was the most likely. Same story with the 486 and 486SX - this time, a CPU with a defective FPU. So your claim
With a single core, if something didn't turn out, it's totally screwed.
is not entirely true.

What is more, in today's manufacturing processes, dies have some degree of redundancy. E.g. there is surplus cache memory. Defective cache lines can be repaired by replacing them from this surplus (I believe they use lasers to cut some wires to reconfigure the chip).

Pretty smart if you ask me :)
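
A toy model of that repair step in Python. It is only a sketch of the idea: the line numbering and the function name are invented, and it assumes the spare lines themselves are defect-free.

```python
from typing import Dict, Iterable, Optional


def build_remap_table(defective: Iterable[int], main_lines: int,
                      spare_lines: int) -> Optional[Dict[int, int]]:
    """Map each defective main line to a spare line, or None if unrepairable.

    Main lines are numbered 0..main_lines-1; spares follow directly after.
    """
    bad = sorted(set(defective))
    if any(not 0 <= line < main_lines for line in bad):
        raise ValueError("defective line index out of range")
    if len(bad) > spare_lines:
        return None  # more defects than spares: cannot be fully repaired
    return {line: main_lines + i for i, line in enumerate(bad)}


if __name__ == "__main__":
    print(build_remap_table([5, 2], main_lines=8, spare_lines=2))
    print(build_remap_table([1, 2, 3], main_lines=8, spare_lines=2))
```

In this model a die with no more defects than spares ships with its full cache; anything beyond that would have to be sold with less cache or scrapped.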

Posted: Fri Apr 09, 2010 4:47 am
by MSimon
Luzr wrote:
MSimon wrote:
Note that the simplest ARM core is about 30,000 gates - and likely much more powerful than a GA144 core.
But is it more powerful than 144 Green Array cores?
Well, with 130 nm, you should be able to build about 1,000,000 gates with a good yield. That means about 30 ARM cores on a single chip.

Would 30 32-bit cores, each with one instruction per cycle and 16 registers, be more powerful than 144 simple Forth cores?

I would bet so.
I don't think so. The GA device also gives you the option of making any set of pins any kind of low speed (under 10 Mbs) serial device. You just change which set of pins get the software. No switching matrix to select functions required.
That is definitely nice. But it is not unique (AFAIK) and it is quite unrelated to cores being Forth processors.
Depends on the speed difference. The GA144 trades complexity for speed. You then restore the complexity in software.

Posted: Fri Apr 09, 2010 10:55 am
by Luzr
MSimon wrote: Depends on the speed difference. The GA144 trades complexity for speed. You then restore the complexity in software.
Somehow I do not see what you mean...

If you think that Forth cores are simple and fast, then I have to disagree. AFAIK, they are simple and slow. If you wanted to build a fast Forth core, you would need much more die area than for a fast ARM core.

If you want top performance, even the venerable x86 ISA is better suited to the task than anything stack-based.

Posted: Fri Apr 09, 2010 2:02 pm
by MSimon
650 MIPS on 4.5 mA does seem rather fast to me.

To make effective use of such a bit you have to think about what you are trying to accomplish differently.

No skin off my nose if you don't like them.

Posted: Fri Apr 09, 2010 10:11 pm
by Luzr
MSimon wrote: 650 MIPS on 4.5 mA does seem rather fast to me.
What is MIPS? Totally irrelevant. I believe they simply have a 650 MHz clock and, for PR purposes, translate it to MIPS. Show me some real-world benchmark. Can it sort a 1M-element array of strings quickly? Multiply floating-point matrices?

Posted: Sat Apr 10, 2010 5:02 pm
by MSimon
Luzr wrote:
MSimon wrote: 650 MIPS on 4.5 mA does seem rather fast to me.
What is MIPS? Totally irrelevant. I believe they simply have a 650 MHz clock and, for PR purposes, translate it to MIPS. Show me some real-world benchmark. Can it sort a 1M-element array of strings quickly? Multiply floating-point matrices?
It is not designed for sorting arrays. It is designed for controlling hardware.

And it doesn't clock. It is asynchronous. Except for "wake up on change" or "wake up on data".

Posted: Mon Apr 12, 2010 9:01 am
by Luzr
MSimon wrote:
Luzr wrote:
MSimon wrote: 650 MIPS on 4.5 mA does seem rather fast to me.
What is MIPS? Totally irrelevant. I believe they simply have a 650 MHz clock and, for PR purposes, translate it to MIPS. Show me some real-world benchmark. Can it sort a 1M-element array of strings quickly? Multiply floating-point matrices?
It is not designed for sorting arrays. It is designed for controlling hardware.

And it doesn't clock. It is asynchronous. Except for "wake up on change" or "wake up on data".
Well, in that case: Any advantage over programmable logic? :)

Posted: Mon Apr 12, 2010 11:17 am
by MSimon
Luzr wrote:
MSimon wrote:
Luzr wrote: What is MIPS? Totally irrelevant. I believe they simply have a 650 MHz clock and, for PR purposes, translate it to MIPS. Show me some real-world benchmark. Can it sort a 1M-element array of strings quickly? Multiply floating-point matrices?
It is not designed for sorting arrays. It is designed for controlling hardware.

And it doesn't clock. It is asynchronous. Except for "wake up on change" or "wake up on data".
Well, in that case: Any advantage over programmable logic? :)
You can change functionality in a few tens of nanoseconds if the code can fit in RAM.