Skynet is coming.

Discuss life, the universe, and everything with other members of this site. Get to know your fellow polywell enthusiasts.

Moderators: tonybarry, MSimon

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

ScottL wrote:
You are interpreting the law in strange ways. But even so, you are wrong about the resulting computer power. It looks like you mistake "clock speed" for "performance". Adding more cores IS a legitimate way to sidestep the MHz limit. E.g. especially for simulating neural networks, more cores are just as good as more GHz.
The "law" states a doubling over an 18-24 month period. I've seen no doubling from 2002 to 2012. In 10 years we've gone from, as you say, 2.2GHz processors to 2x 2.2GHz processors.
Where have you missed 3 times more IPC performance?
Also note we have not accomplished a successful decoupling of multi-threaded programming;
Not sure what you mean, but I am quite sure that if you as a programmer are unable to put those cores to work where possible, you should perhaps consider another career.

Yes, there are some 'new' things to learn, but really, there is no magic.
Quad cores aren't the standard, and the chips aren't 20x faster; each core contains the capability of a single P4 x.x GHz chip.
Sandy Bridge has 3 times more IPC. Where have you happened to miss this?

Plus, my one-year-old quad-core Sandy Bridge notebook was exactly the same price as the dual-core Core 2 I bought 4 years ago. IMO, that is the law in action. And I always run the basic benchmark important for my work (compiling huge C++ projects), and it really does it 3 times faster.
Furthermore, the law isn't about performance over price.
Sure it is. There were no meaningful metrics in the past. Sure, today Intel CPUs are the fastest cores in the world, but it made no sense in the 1990s to compare Crays with Apples...
Assuming a Quad core at 2.2Ghz per core, 8.8 GHz, assuming your mentioned 2GHz after 2 years should be 4GHz, after 2 more should be 8GHz, after 2 more should be 16GHz, 2 more 32GHz, and finally 2 more for 64GHz.
You really are a victim of the MHz myth! A 2GHz Sandy Bridge single core is the performance equivalent of a 6GHz Pentium 4 single core... (HT not included)
These are a specialty processor for graphics. Sure there's documentation on them being capable of doing some impressive stuff, but, they aren't a CPU and have no positive or negative impact on neural networks unless specifically setup to do them and even then they'd be no better than your multi-core processor.
So, looking for "major breakthrough", but too lazy to write a bit of OpenCL code? :)

While old enough to have witnessed this claim, I am also wise enough to note the breakthrough that was the Integrated Circuit. I've specifically stated it would take another breakthrough of equivalent magnitude to push us beyond. I mean, come on, I was running dual AMD Athlon MP 1600+ in 2000.
Memory is such a fragile thing...

http://en.wikipedia.org/wiki/Athlon#Athlon_XP.2FMP

"Palomino was also the first socketed Athlon officially supporting dual processing, with chips certified for that purpose branded as the Athlon MP"

"First release: October 9, 2001"
That's 2x 1.4GHz, and I was "behind the times" at that point while still clocking in at 2.8GHz. At that time 2x 1.8 and 2x 2.0 were definitely available, so 3.6GHz and 4GHz respectively have been available for 12 years, and we're coming in at a whopping 8.8GHz after nearly 12 years.
The first 2GHz x86 CPU was Northwood, in 2002...
As always I'm not saying never, I'm saying we need a break-through and so far, chip makers are happy just adding cores. At some point the individual core will need to become faster.
But they do, quite a lot. It is just that it is possible to make a core faster without increasing the clock speed. You simply invest more gates to make it smarter.

Side note: The clock speed is mostly limited by power consumption going too high. But this was ALWAYS true in the past, during all eras of Moore's law. You might have witnessed it in the growing heatsinks of the 1990s. Pre-1990 CPUs (80386, 68000) did not even have heatsinks. The 486 had a primitive one, the Pentium a larger one, the Pentium 4 a gigantic one. It was just that prior to the Pentium 4, consumption was less than 50W and nobody cared...

Diogenes
Posts: 6968
Joined: Mon Jun 15, 2009 3:33 pm

Post by Diogenes »

williatw wrote:
Diogenes wrote: Yes, the pro-gun forces have effectively won the war on this issue. One of the ways that they have done so is through the use of Public Service Announcements (informing the public) such as the humorous Youtube video up above. In many respects, we have Bill Clinton to thank for rallying the Gun forces and taking over legislatures all across the country. His Ban on Semi-Automatic fire arms was the wake up call for Pro-Gun Americans everywhere. It is to our credit that we have not stopped applying pressure on this issue.
Obama is a eunuch on this issue.
Yes he is, but there is one caveat: judges. If Obama wins a 2nd term he would have more opportunities to appoint more federal judges like Sotomayor and Kagan, one of whom flat out lied during her confirmation hearings about her support for the 2nd amendment. To say nothing of Ruth Bader Ginsburg's feelings that our constitution is "flawed" and that other emerging democracies like Egypt should look elsewhere. It is fascinating, in a tragic way, to hear the Obama administration talk about not arming the rebels in Syria because "more guns would just make the situation worse". In other words, if you arm the rebels and they defeat Syria's military or at least hold their own, you can't argue against the whole idea of an armed citizenry being a deterrent against a tyrant. People who think that the constitution is just a loose guideline to be followed or ignored based on the whim of a judge who thinks they know what's best for us poor serfs.
I fear that if Obama gets re-elected (I consider his legitimacy as yet unproven) then our more pressing problem will be economic system collapse. But yes, barring that, Obama's appointments to the Federal Judiciary could only be horrible.

At the moment I don't see any of the current selection of Republican contenders as being much of a threat to him. I think it was Yogi Berra who said, "You can't beat somebody with nobody."

I don't like ANY of them, but the one I dislike least is Gingrich; he comes with so much baggage, though, that he just won't fly in a general election. At the moment it is not looking good for getting rid of this first usurper/tyrant President.
‘What all the wise men promised has not happened, and what all the damned fools said would happen has come to pass.’
— Lord Melbourne —

ScottL
Posts: 1122
Joined: Thu Jun 02, 2011 11:26 pm

Post by ScottL »

Where have you missed 3 times more IPC performance?
This is the number of instructions per cycle, not the measure of cycles. Think of it as two hoses, the same size and generally the same throughput. You're pouring water through the hose, but in the "new quad-core hose" you have a tech that is able to shrink the water molecules before they enter. Of course you can pump more molecules through. The hose is still the same size, and stuff in general still flows through it at the same speed.
Not sure what you mean, but I am quite sure that if you as programmer are unable to put those cores to work where possible, you should perhaps consider another career.
I often wish many people would get other careers, but I don't get the say. The problem here is that the only applications currently utilizing multi-core power are video editing/compression technologies. Open up MS Word and note that it only runs on your first core. Open up, oh, I dunno, World of Warcraft: still just running on that first core. You name it, 9 times out of 10 it's running on that first core. These things are not distributed by the OS either... there isn't a consensus on how to successfully do that, per se. Feel free to read up on it, though; there are huge debates going on. Battlefield 3 is the 1 out of 10.
Sandy Bridge has 3 times more IPC. Where have you happened to miss this?

Plus, my one-year-old quad-core Sandy Bridge notebook was exactly the same price as the dual-core Core 2 I bought 4 years ago. IMO, that is the law in action. And I always run the basic benchmark important for my work (compiling huge C++ projects), and it really does it 3 times faster.
Once again, it's part of the instruction set, not the cycles per second, and it has little to do with the speed of the core itself.

In response to the price/performance vs performance/time (Moore had several postulates)
Sure it is. There were no meaningful metrics in the past. Sure, today Intel CPUs are the fastest cores in the world, but it made no sense in the 1990s to compare Crays with Apples...
I point out this:
His bold prediction, popularly known as Moore's Law, states that the number of transistors on a chip will double approximately every two years.
This is not the case, as we're hitting the limit I previously mentioned at the nanometer level.
You really are a victim of the MHz myth! A 2GHz Sandy Bridge single core is the performance equivalent of a 6GHz Pentium 4 single core... (HT not included)
You have a tightened instruction set. Modify any P4 to handle the same instruction set and you'll get the same amount of instructions handled per cycle.
So, looking for "major breakthrough", but too lazy to write a bit of OpenCL code?
I wouldn't recommend it for neural networks, given what you need access to.
Memory is such a fragile thing...

http://en.wikipedia.org/wiki/Athlon#Athlon_XP.2FMP

"Palomino was also the first socketed Athlon officially supporting dual processing, with chips certified for that purpose branded as the Athlon MP"
I stand corrected, 2001, not 2000. Technically my chip was an Athlon XP 1400, but registered as an MP so I purchased a second MP and dual board. I'm surprised I was that close in the date actually.
But they do, quite a lot. It is just that it is possible to make a core faster without increasing the clock speed. You simply invest more gates to make it smarter.
Slimming the instruction set so you can jam more through isn't really my interpretation of speeding up the processor. We're at a cycle limit right now; the next area is, as noted, tightening the instruction set, and the final one is to fit as many potential cores as we can on a single die. We're coming to this point with the i7 and beyond. We need a breakthrough, and while there are various researchers looking for the answer, they have not found it. Even by Intel's admission, Moore's "law" has slowed and become a 3-year doubling instead of a 1.5-2-year doubling. It's flatlining as I predicted and will continue to do so until some new innovation.

Supported slowing:
This trend has continued for more than half a century. 2005 sources expected it to continue until at least 2015 or 2020.[note 1][12] However, the 2010 update to the International Technology Roadmap for Semiconductors has growth slowing at the end of 2013,[13] after which time transistor counts and densities are to double only every 3 years.
http://www.itrs.net/

And this data is skewed by the fact that NAND components have shrunk, so more are fit on the circuit. Had this tech not been part of the data, the slowing would be more prominent.

hanelyp
Posts: 2261
Joined: Fri Oct 26, 2007 8:50 pm

Post by hanelyp »

ScottL wrote: The "law" states a doubling over an 18-24 month period. I've seen no doubling from 2002 to 2012. In 10 years we've gone from as you say 2.2Ghz processors to 2x 2.2Ghz processor
My latest PC uses a 3+GHz 6 core processor. In addition, each new generation of processor develops new tricks to get more done with each clock tick.

2.xGHz dual cores are the standard because they are fast enough for what most people do, and cost less.
These are a specialty processor for graphics. Sure there's documentation on them being capable of doing some impressive stuff, but, they aren't a CPU and have no positive or negative impact on neural networks unless specifically setup to do them and even then they'd be no better than your multi-core processor.
Given that so much of the processing on a typical PC is the type the GPUs do, advances there make a big difference in desktop performance. As for neural networks, with a little smart design I can see GPUs doing them very well, calculating circles around a general purpose processor.

ScottL
Posts: 1122
Joined: Thu Jun 02, 2011 11:26 pm

Post by ScottL »

hanelyp wrote:
ScottL wrote: The "law" states a doubling over an 18-24 month period. I've seen no doubling from 2002 to 2012. In 10 years we've gone from as you say 2.2Ghz processors to 2x 2.2Ghz processor
My latest PC uses a 3+GHz 6 core processor. In addition, each new generation of processor develops new tricks to get more done with each clock tick.

2.xGHz dual cores are the standard because they are fast enough for what most people do, and cost less.
These are a specialty processor for graphics. Sure there's documentation on them being capable of doing some impressive stuff, but, they aren't a CPU and have no positive or negative impact on neural networks unless specifically setup to do them and even then they'd be no better than your multi-core processor.
Given that so much of the processing on a typical PC is the type the GPUs do, advances there make a big difference in desktop performance. As for neural networks, with a little smart design I can see GPUs doing them very well, calculating circles around a general purpose processor.
Congrats?

Given that most of the processing done on a PC is done in the CPU, while GPUs usually stick with their namesake, Graphics Processing Units, and have entirely different instruction sets. So let me get this straight: you want an AI that willfully utilizes both the processors and the video processors of every server in a data center, and you don't see the error? You know, IBM's Watson is the closest thing we've built to AI, and even it can't think.

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

ScottL wrote:
Where have you missed 3 times more IPC performance?
This is the number of instructions per cycle, not the measure of cycles.
Well, sorry, now you have lost me. What are you talking about?

Is it your interpretation that Moore's law is about clock frequency?!
I often wish many people would get other careers, but I don't get the say. The problem here is that the only applications currently utilizing multi-core power are video editing/compression technologies. Open up MS Word and note that it only runs on your first core.
Open up, oh, I dunno, World of Warcraft: still just running on that first core. You name it, 9 times out of 10 it's running on that first core. These things are not distributed by the OS either... there isn't a consensus on how to successfully do that, per se. Feel free to read up on it, though; there are huge debates going on. Battlefield 3 is the 1 out of 10.
It is true that there is code that does not benefit from more cores most of the time (word processing). It is typically code that does not need much processing power either way.

But if you need that power, it really is not hard to make use of it. Plus, IME, software is either single-core or multi-core. The actual number of cores does not matter: if you go "multi" (and do it right), all cores will be utilized, no matter how many of them there are.

Now, to be correct, there are some future limits on the number of cores you can put into the CURRENT infrastructure. 32 cores on Sandy Bridge would probably starve for memory bandwidth. But these are things that can and will be solved (e.g. those 6-core monsters use more memory channels).
Once again, it's part of the instruction set, not the cycles per second, and it has little to do with the speed of the core itself.
Could you decipher this for me, please? Same software, same CPU instructions, 3 times the speed for a single Sandy Bridge core vs a Pentium 4 at the same clock.

What am I missing? Your hose analogy is flawed; it is the same water, just a bigger diameter for Sandy Bridge.
You really are a victim of the MHz myth! A 2GHz Sandy Bridge single core is the performance equivalent of a 6GHz Pentium 4 single core... (HT not included)
You have a tightened instruction set. Modify any P4 to handle the same instruction set and you'll get the same amont of instructions handled per cycle.
Bullsh*t. These are benchmarks with the same code, plain old 32-bit x86 with SSE2. If you counted in the enhancements of the instruction set (especially x86-64 and AVX), it would be even faster.
Slimming the instruction set so you can jam more through isn't really my interpretation of speeding up the processor.
We're at a cycle limit right now; the next area is, as noted, tightening the instruction set
Yes, the ISA gets better (but it is being EXTENDED, not reduced), but to say that we are at a "cycle limit" is flawed. Yes, there are no doublings, but Intel still manages a healthy 20% improvement in IPC in each generation, ISA improvements not counted.
This trend has continued for more than half a century. 2005 sources expected it to continue until at least 2015 or 2020.[note 1][12] However, the 2010 update to the International Technology Roadmap for Semiconductors has growth slowing at the end of 2013,[13] after which time transistor counts and densities are to double only every 3 years.
Which is still in the future...

To be fair, there could be "another Moore's law" at work, the one about the exponentially growing cost of manufacturing equipment. It could be that for most consumer electronics, the current state is enough. Frankly, people like me who benefit from all those 8 logical cores are rare. Most people will do just as well with a Nehalem-generation 2-core CPU and will not notice a difference beyond that.

Maybe the real breakthrough we need is some consumer software task that needs all that power and makes sense :)

ScottL
Posts: 1122
Joined: Thu Jun 02, 2011 11:26 pm

Post by ScottL »

Well, sorry, now you have lost me. What are you talking about?
Is it your interpretation that Moore's law is about clock frequency?!
Sorry if I was unclear. My comment has nothing to do with Moore's law. We agree that the cycle speed of a P4 2.2GHz and of a single core of a 2.2GHz quad core are the same, correct? If so, my argument is that we're looking for "new tricks" to squeeze out the last bit of computing power, and it's my contention that there are finite levels to that squeeze.

For instance, you've noted the combining of some instructions into a single cycle vs. perhaps an add/sub/multiply each taking its own cycle. This, in my opinion, is a manipulation; albeit a clever one, it still has finite terms based on the number of registers, etc.

Next comes the number of cores per die, once again a finite number due to sizing, and that die real estate is almost exclusively kept for cores.
Bullsh*t. These are benchamarks with the same code, plain old 32bit x86 with SSE2. If you would count in enhancements of instructions set (especially x86-64 and AVX), you would get even much faster.
My contention is that what is happening is this:

P4 single core => Add instruction = 1 cycle, Sub instruction = 1 cycle, Mult instruction = 1 cycle, totalling 3 cycles

Quadcore => Add instruction + Sub instruction = 1 cycle, Mult instruction = 1 cycle, totalling 2 cycles

Yes, you get more instructions per cycle, but... (there are always buts) the P4 with a modified IR can also do this; it just wasn't designed to do so. The great thing about the P4 being cheap now is that hobbyists have had partial success getting the P4 to execute more instructions per cycle. I'm stating that this is possible and has been done.
Yes, ISA gets better (but it is being EXTENDED, not reduced), but to say that we are at "cycle limit" is flawed. Yes, there are no doublings, but Intel still manages to get healthy 20% improvement in IPC in each generation, ISA improvements not counted in.
2.2GHz, 3.3GHz; we really haven't moved in the last 6 years. It's all about the number of cores now and less about the clock speed. They're using creative manipulations to get around the hard limits via FSB, MSB, caches, etc. These have finite growth as the boards fill up. We're already seeing them resort to onboard SSD caching.

To be fair, there could be "other Moore's law" working, the one about exponentially growing costs of manufacturing equipment. It could be that for most consumer electronics, current state is enough. Frankly, people like me who benefit from all those 8 logical cores are rare. Most people will do just as fine with nehalem generation CPU 2cores and will not notice difference beyond.
He has several postulates; I hesitate to call them laws, as there have been trend lines showing they don't always hold true. Most notably the single-core vs single-core comparison, where we haven't necessarily doubled our speed every 2 years but instead went the route of more cores and said, well, the chip as a whole has held this postulate true.

I hope you don't mind but I've grouped the quotes based on differing arguments, so here is my programmer argument:
It is true that there is code that does not benefit from more cores most of the time (word processing). It is typically code that does not need much processing power either way.

But if you need that power, it really is not hard to make use of it. Plus, IME, software is either single-core or multi-core. The actual number of cores does not matter: if you go "multi" (and do it right), all cores will be utilized, no matter how many of them there are.

Now, to be correct, there are some future limits on the number of cores you can put into the CURRENT infrastructure. 32 cores on Sandy Bridge would probably starve for memory bandwidth. But these are things that can and will be solved (e.g. those 6-core monsters use more memory channels).
As a current full-time programmer: we as a whole generally aren't worried about single vs multi. At the standard application level (Word, Excel, IE, Firefox, Chrome, etc.) it should be the operating system that identifies the bottleneck and transfers or executes the process on a different core. As programmers, we aren't thinking about how the work is distributed, because then we'd have different code accounting for varying numbers of cores.

There are some exceptions to this rule in the way of, say, Battlefield 3 and any major video compression tool. In the case of this few-month-old game, it can identify (as far as I know) up to 4 cores and separate duties among them. The reason for this need for all cores in BF3 is that you're distributing physics calculations that don't tie back to each other and/or shader, texture, etc. calculations of high magnitude. This, however, is largely overkill for any everyday application, although you can manually set which core an app runs on. Repeating: this is mostly an OS issue of core management, not a programming issue.

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

ScottL wrote:
Well, sorry, now you have lost me. What are you talking about?
Is it your interpretation that Moore's law is about clock frequency?!
Sorry if I was unclear. My comment has nothing to do with Moore's law. We agree that the cycle speed of a P4 2.2GHz and of a single core of a 2.2GHz quad core are the same, correct? If so, my argument is that we're looking for "new tricks" to squeeze out the last bit of computing power, and it's my contention that there are finite levels to that squeeze.

For instance, you've noted the combining of some instructions into a single cycle vs. perhaps an add/sub/multiply each taking its own cycle. This, in my opinion, is a manipulation; albeit a clever one, it still has finite terms based on the number of registers, etc.
Ah, I see. You have read something somewhere about CPUs and you think this is the full knowledge... but it is a sort of 'elementary school' oversimplification.

Modern CPUs are much more complicated than you ever thought. Let me just say that the code flow written in programs is not 'what they really do'. All that matters is that, within a single thread, memory content and I/O are consistent with the algorithm. What they do internally is wildly different from what is written in the assembly. For starters, they DO NOT HAVE fixed registers as defined in the ISA. E.g. x86-64 defines 16 general-purpose 'logical' registers in the ISA, but CPUs tend to have on the order of 150-250 physical registers that really do the work - they are used to reduce opcode dependencies. Some opcodes are broken down into simpler ones, others are fused into a single operation.

Obviously, the goal is to increase instruction-level parallelism. And this is more than a manipulation: with more gates, you can have more execution units that really do things in parallel. And it works.

My contention is that what is happening is this:

P4 single core => Add instruction = 1 cycle, Sub instruction = 1 cycle, Mult instruction = 1 cycle, totalling 3 cycles

Quadcore => Add instruction + Sub instruction = 1 cycle, Mult instruction = 1 cycle, totalling 2 cycles
Totally misguided. It is NOT about quad core vs P4. It is about the complex Sandy Bridge OOO core vs the less complex P4 OOO core (yes, the P4 WAS OOO as well, just not as good). There is no "add + sub". There is just a number of integer execution units that can process a couple of instructions in parallel.

In an OOO core, easily about 100 instructions are 'waiting' for execution - usually they are waiting for memory operands or for an execution unit to become available. Once memory operands and execution units are available, they get executed. It is not about combining a single add/sub. In fact, it is even possible for instructions from two iterations of a single loop to be performed in REVERSE ORDER (if it does not change the meaning of the algorithm) - this is what OOO - out of order - means...

Surely, all that requires VERY complex logic, but this is where those gates you seem to fail to notice go (without increasing the clock speed).
He has several postulates; I hesitate to call them laws, as there have been trend lines showing they don't always hold true. Most notably the single-core vs single-core comparison, where we haven't necessarily doubled our speed every 2 years but instead went the route of more cores and said, well, the chip as a whole has held this postulate true.
The original law is about the doubling of the number of gates (transistors) for the same price, not about performance... in other words, going to a finer process, 180nm -> 130nm -> 90nm -> ... and this one seems to 'tick' quite regularly. If the industry spends it on increasing the number of cores, that is not the fault of the law.

The other point you miss is that increasing clock speed was just a side result of decreasing the node size, and that is not the correct interpretation of the law anyway. The increasing performance is really the intelligence of Intel's designers in making good use of those additional gates (as currently opposed to AMD's designers in the Bulldozer fiasco :-)

But if you decided on a "brute force" brain simulation, a simple node 'tick' should be quite fine.

ScottL
Posts: 1122
Joined: Thu Jun 02, 2011 11:26 pm

Post by ScottL »

Without going into too much detail, and since clearly you've opted not to listen to anything I say, I'll defend this much.

My "reading a little on CPUs" comes from the following course in my undergrad, dealing mostly with the Athlon-series vs Intel P4-series processors. Writing out the gated bus for IRQ selection was not fun.

https://www.cse.ohio-state.edu/cgi-bin/ ... st.pl?r=12
675.01 Introduction to Computer Architecture UG 3
Description: Computer system components, instruction set design, hardwired control units, arithmetic algorithms/circuits, floating-point operations, introduction to memory and I/O interfaces. Au, Wi, Sp Qtrs. 3 cl. Prereq: 360 or ECE 265; Math 366; ECE 261. Not open to students with credit for 675 or 675.02. Intended for students with previous knowledge of Digital Logic Design.
So if a senior-level undergrad course dealing with the instruction set and design of the Intel P4 and Athlon processors is "elementary," OK.

As noted from your own mention of the "MHz Myth," a simplified summary from Wikipedia:
The megahertz myth, or less commonly the gigahertz myth, refers to the misconception of only using clock rate to compare the performance of different microprocessors. While clock rates are a valid way of comparing the performance of different speeds of the same model and type of processor, other factors such as pipeline depth and instruction sets can greatly affect the performance when considering different processors. For example, one processor may take two clock cycles to add two numbers and another clock cycle to multiply by a third number, whereas another processor may do the same calculation in two clock cycles. Comparisons between different types of processors are difficult because performance varies depending on the type of task. A benchmark is a more thorough way of measuring and comparing computer performance.
Glad you mentioned Sandy Bridge, because it's all about caching.
Wikipedia, because it's an easy source:
Upgraded features from Nehalem include:
32 kB data + 32 kB instruction L1 cache (3 clocks) and 256 kB L2 cache (8 clocks) per core
Shared L3 cache includes the processor graphics (LGA 1155)
64-byte cache line size
Two load/store operations per CPU cycle for each memory channel
Decoded micro-operation cache and enlarged, optimized branch predictor
Improved performance for transcendental mathematics, AES encryption (AES instruction set), and SHA-1 hashing
256-bit/cycle ring bus interconnect between cores, graphics, cache and System Agent Domain
Advanced Vector Extensions (AVX) 256-bit instruction set with wider vectors, new extensible syntax and rich functionality
Intel Quick Sync Video, hardware support for video encoding and decoding
Up to 8 physical cores or 16 logical cores through Hyper-threading
What I said still holds true. There are physical differences between a P4 and any 2011 Sandy Bridge processor, for sure; if there weren't, Intel would've pissed off a lot of people. They increased the cache, much as I have repeatedly said, and allowed 2 load/store operations per cycle (increased registers and/or an extended instruction set).

As for the original law as quoted from Intel:
http://www.intel.com/content/www/us/en/ ... ology.html
Intel co-founder Gordon Moore is a visionary.

His bold prediction, popularly known as Moore's Law, states that the number of transistors on a chip will double approximately every two years.

Intel, which has maintained this pace for decades, uses this golden rule as both a guiding principle and a springboard for technological advancement, driving the expansion of functions on a chip at a lower cost per function and lower power per transistor by introducing and using new materials and transistor structures.

The announcement of the historic Intel® 22nm 3-D Tri-Gate transistor technology assures us that the promise of Moore’s Law will continue to be fulfilled.

hanelyp
Posts: 2261
Joined: Fri Oct 26, 2007 8:50 pm

Post by hanelyp »

The 80486 processor was limited to a subset of the instructions for best speed. The Pentium I was worse, sensitive to instruction order to make best use of the hardware. Since then, processors have introduced methods of breaking down and reordering whatever instruction stream they get, finding inherent parallelism and dividing the jobs between numerous execution units sharing a large set of internal registers. MHz has been at a near standstill for several years. Throughput continues to expand.

On a side note, the internal architecture of modern processors bears little resemblance to the instruction code model programmed for. It shouldn't be all that difficult for the masters of CPU design to allow for programming alternate instruction sets: CPU 0 could run the default instruction set, while CPU 1, with identical hardware, runs another. The hazard is that these microprograms would be specific to the CPU model.

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

ScottL wrote:They increased the cache much as I have repeatedly said and the allowed 2 load/store operations per cycle (increased registers and/or extended instruction set).
So, do you think that those "2 load/store operations per cycle" are explicitly visible in the programming model? :)

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

hanelyp wrote: On a side note, the internal architecture of modern processors has little resemblance to the instruction code model programmed for.
Well, you can also say that for high performance CPU, instruction set matters much less than people tend to believe.

ScottL
Posts: 1122
Joined: Thu Jun 02, 2011 11:26 pm

Post by ScottL »

Luzr wrote:
ScottL wrote:They increased the cache much as I have repeatedly said and the allowed 2 load/store operations per cycle (increased registers and/or extended instruction set).
So, do you think that those "2 load/store operations per cycle" are explicitly visible in programming model ? :)
If they are linear yes, otherwise no.

Luzr
Posts: 269
Joined: Sun Nov 22, 2009 8:23 pm

Post by Luzr »

ScottL wrote:
Luzr wrote:
ScottL wrote:They increased the cache much as I have repeatedly said and the allowed 2 load/store operations per cycle (increased registers and/or extended instruction set).
So, do you think that those "2 load/store operations per cycle" are explicitly visible in programming model ? :)
If they are linear yes, otherwise no.
I see. Means you have no clue. Either the course you are so proud of was not deep enough (it probably concentrated on the programming model only), or you did not pay enough attention.

I recommend you spend some time studying the real stuff. Those CPU cores are more complex - and more beautiful - than you ever thought.

You can start e.g. here:

http://en.wikipedia.org/wiki/Out-of-order_execution

ScottL
Posts: 1122
Joined: Thu Jun 02, 2011 11:26 pm

Post by ScottL »

Luzr wrote:
ScottL wrote:
Luzr wrote: So, do you think that those "2 load/store operations per cycle" are explicitly visible in programming model ? :)
If they are linear yes, otherwise no.
I see. Means you have no clue. Either the course you are so proud of was not deep enough (it probably concentrated on the programming model only), or you did not pay enough attention.

I recommend you spend some time studying the real stuff. Those CPU cores are more complex - and more beautiful - than you ever thought.

You can start e.g. here:

http://en.wikipedia.org/wiki/Out-of-order_execution
I don't think you read my comment right. Your link reiterates exactly what I said. To avoid out-of-order execution, linear code is "queued" accordingly; otherwise it can be combined in a single cycle. The article states this, and I agree 100%.

Post Reply