xman1

More CPU threads


The one thing I hope for with ArmA III is more CPU threads. While playing ArmA II, I watched CPU usage and noticed that 3 threads are used fairly well, with possibly a fourth thread in use.

Why am I asking for more threads? Simple. ArmA II has shown that it is not a GPU-limited game but a CPU-limited one. A kick-ass GPU will not get you much in the way of extra FPS. It is all the objects in the game that bog it down, plus physics, plus AI, so if you have a slow CPU, even with a bad-ass GPU, this game will likely still suck for you.

This brings me to the new architectures that are coming out this year. AMD's Bulldozer (the first new architecture out of AMD since 2003) is about to be released with 8 cores at 3.8 GHz with 4.2 GHz turbo. All cores can turbo. In addition, you will have 8MB of shared L2 cache, which is unheard of on x86, since shared L2 is typically found on architectures like SPARC. As a comparison, Intel only has 256K of L2 on their $1000 i7 980x.

Back to threads - Bulldozer will support 16 threads simultaneously and act like 16 CPUs in your system. If we can put this to use, the CPU slowdowns we have seen with ArmA II might be a thing of the past. Maybe I am dreaming for ArmA III, but for games and apps, CMT is the future. Plain and simple. Let's use it.

Just my two cents.

-X


Arma 2 uses up to 16 threads (as I have seen in Process Explorer); I think I have even seen 17.

Threading a process is no problem; the problem is gaining something from it.

Actually, BIS with ARMA 2 has one of the best multicore performance gains in gaming history. And still it is the most CPU-demanding game I have played.

My friend, I don't think you understand what you are writing.

This brings me to the new architectures that are coming out this year. AMD's Bulldozer (the first new architecture out of AMD since 2003) is about to be released with 8 cores at 3.8 GHz with 4.2 GHz turbo. All cores can turbo. In addition, you will have 8MB of shared L2 cache, which is unheard of on x86, since shared L2 is typically found on architectures like SPARC. As a comparison, Intel only has 256K of L2 on their $1000 i7 980x.

Where the fuck did you get that shit from?

i7 980x has SIX times 256K L2 and 12MB L3. Get your facts straight.


Back to threads - Bulldozer will support 16 threads simultaneously and act like 16 CPUs in your system. If we can put this to use, the CPU slowdowns we have seen with ArmA II might be a thing of the past. Maybe I am dreaming for ArmA III, but for games and apps, CMT is the future. Plain and simple. Let's use it.

Just my two cents.

-X

You're confusing things. Interlagos is the only Bulldozer CPU with 16 threads, and that's for SERVERS. If you're crazy enough to buy a server chip to play games, go ahead.

On the Bulldozer architecture there is no difference between threads and cores, so the number of cores is the same as the number of threads. For desktops, the Zambezi CPU will be an 8-core, 8-thread processor. Interlagos will just be an MCM with 2 Orochi dies.


Agreed. It may use more threads, but it is really only using 3 effectively.

Arma 2 uses up to 16 threads (as I have seen in Process Explorer); I think I have even seen 17.

Threading a process is no problem; the problem is gaining something from it.

Actually, BIS with ARMA 2 has one of the best multicore performance gains in gaming history. And still it is the most CPU-demanding game I have played.

My friend, I don't think you understand what you are writing.

I agree. It is the first game I have seen that will actually touch a 4th core. It is just not realizing its potential, in my opinion. If you really have 17 threads running, I'd like a screenshot as proof of that. I could see 17 threads, but it is not really using more than 3 effectively, with maybe a 4th for some additional work.

And BTW, for the future - please don't tell me what I do or don't understand. Trust me, it is my job to understand CMT. Leave the patronizing BS at the door if you can and just tell me facts. I can be wrong on my facts, since that is an easy thing to do, but just tell me where I flubbed up and I am good with that. No harm done though.

Where the fuck did you get that shit from?

i7 980x has SIX times 256K L2 and 12MB L3. Get your facts straight.

Classy response. But if you must know, I screwed up in not saying the whole thing. It is 6 x 256K. That is still pathetic when compared to Bulldozer, shared L2 is still exotic for x86, and Intel does not have shared L2.

You're confusing things. Interlagos is the only Bulldozer CPU with 16 threads, and that's for SERVERS. If you're crazy enough to buy a server chip to play games, go ahead.

On the Bulldozer architecture there is no difference between threads and cores, so the number of cores is the same as the number of threads. For desktops, the Zambezi CPU will be an 8-core, 8-thread processor. Interlagos will just be an MCM with 2 Orochi dies.

It is 8 cores and 16 threads. See the following architecture diagram. Bulldozer is a leap from current x86, and this two-threaded design is in all of them:

[Image: bulldozer_ht-w.jpg]

Friggen awesome. :)

Some tidbits can be found on it here:

http://www.bit-tech.net/hardware/cpus/2010/08/24/amd-previews-fusion-details/2

BTW - this is not the server version. That is getting something different.

-X

---------- Post added at 09:34 PM ---------- Previous post was at 09:20 PM ----------

And BTW, after analyzing Bulldozer, I expect it to trump Intel's best for only $300. However, I also expect it to be 'short lived'. Intel is probably thinking like I am at this point and has seen the writing on the wall. They are investing like mad in 22nm. Billions! They will probably have a counter to Bulldozer by the end of the year or sometime next year.

You know what all this means? It rocks to be a consumer in this competitive world! :) Instead of buying a product from a company resting on its laurels, we all make out with awesome products, no matter whose architecture you buy.

Let the fight continue! Woo hoo! Just my two cents.

Now let's see what ArmA 3 can do with all this processing power.

-X

Edited by xman1


I hope they don't delay. Tired of being gouged at the CPU seller and the pump lately. There is no reason I should have to pay $1000 for a 980x or 990x CPU. We need some competition here.

Of course, unlike Intel, AMD can't afford a Sandy Bridge style launch, which cost Intel $1 billion. I was thinking about Intel for an upgrade, but if the SATA 3 has issues, what else is broken on the die? SATA is the only problem to come out at this point. Are the transistors weak in other portions?

At the end of the day though, if we can get Intel's crazy prices out of the clouds, I am good with that.

-X


I agree with the core of the main post, all spec arguments aside. ARMA needs to utilize multiple cores and hyperthreaded cores better.


It is 8 cores and 16 threads. See the following architecture diagram. Bulldozer is a leap from current x86, and this two-threaded design is in all of them:

Obviously you didn't understand that slide. What you see there is a Bulldozer module, which is built around the concept of sharing resources, especially the front end. That Bulldozer module has TWO INTEGER CORES, which means you have real hardware for 2 threads. So your OS won't see a module; it will see 2 cores, as if it were a normal dual core, except that they share front-end resources plus the L1 instruction cache. Each module will be capable of two 128-bit FMACs or one 256-bit AVX operation. Each integer core can only do ONE THREAD, so there's no such thing as 8 cores, 16 threads.

A Zambezi CPU (full Orochi die) will have 4 modules. 4 X 2 = 8 Cores 8 Threads.

A server Interlagos CPU will be composed of 2 Orochi dies in MCM (multi chip module). 16 cores 16 threads

I don't blame you. Either most of the articles on the web are pure crap, or you didn't read them properly. The Bulldozer architecture was a bit confusing to a lot of people.

---------- Post added at 11:49 AM ---------- Previous post was at 11:30 AM ----------

In addition, you will have 8MB of shared L2 cache

-X

WRONG!!!! You will have 8 MB of L2 Cache yes, but it's 2 MB of L2 per module and L2 is not shared at CHIP level, it is shared at MODULE level only. The 8MB L3 Cache will be shared at CHIP level.

You are confused. This is a good article for you:

http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333
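
The core, thread and L2 arithmetic in this post can be written out as a short sketch. The class and field names below are hypothetical and only encode the figures quoted above (4 modules per Orochi die, 2 integer cores per module, one thread per core, 2 MB of L2 per module); this is not an AMD specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BulldozerChip:
    """Hypothetical model of the figures quoted in the post above."""
    name: str
    dies: int                    # Orochi dies in the package
    modules_per_die: int = 4     # a full Orochi die, as stated in the post
    cores_per_module: int = 2    # two integer cores per module
    l2_mb_per_module: int = 2    # L2 is shared within a module only

    @property
    def modules(self) -> int:
        return self.dies * self.modules_per_die

    @property
    def cores(self) -> int:
        return self.modules * self.cores_per_module

    @property
    def threads(self) -> int:
        # One hardware thread per integer core, so threads == cores.
        return self.cores

    @property
    def l2_total_mb(self) -> int:
        # Sum of the per-module L2 slices; no slice is shared chip-wide.
        return self.modules * self.l2_mb_per_module


zambezi = BulldozerChip("Zambezi", dies=1)
interlagos = BulldozerChip("Interlagos", dies=2)

for chip in (zambezi, interlagos):
    print(f"{chip.name}: {chip.modules} modules, "
          f"{chip.cores} cores, {chip.threads} threads")
print(f"Zambezi L2: {zambezi.modules} x {zambezi.l2_mb_per_module} MB "
      f"= {zambezi.l2_total_mb} MB, shared per module")
```

Under these assumptions the desktop part comes out at 8 cores and 8 threads, and the two-die server part at 16 and 16, which is the count argued for above.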

Edited by CarlosTex


If it is shared L2 for a single core with two integer units, that is not as cool. You are also right that there are a lot of crap articles out there. This is a testament to how well AMD has locked down its secrets.

From what I understood of that slide, that was two cores exactly like that on a single module for a total of two cores, four integer pipes, for a total of 8 cores and 16 integer pipes per CPU.

Let me read your article. Thanks for the link.

-X

---------- Post added at 10:06 AM ---------- Previous post was at 09:45 AM ----------

Details in a pinch from some further research:

Bulldozer module consists of the following:

Up to 2048 kB of L2 cache inside each module (shared between the cores in a module)

16 kB 4-way L1 data cache (way-predicted) per core and 64 kB 2-way L1 instruction cache per module, one way for each of the two cores

Two dedicated integer cores

- each consists of 2 ALUs and 2 AGUs, which are capable of a total of 4 independent arithmetic and memory operations per clock per core

- duplicating the integer schedulers and execution pipelines gives each of the two threads dedicated hardware, which significantly increases performance in multithreaded integer applications

- the second integer core increases the Bulldozer module die area by around 12%, which at chip level adds about 5% of total die space[16]

Two symmetrical 128-bit FMAC (fused multiply-add (FMA) capability) floating-point pipelines per module that can be unified into one large 256-bit wide unit if one of the integer cores dispatches an AVX instruction, and two symmetrical x87/MMX/SSE capable FPPs for backward compatibility with SSE2 non-optimized software

So each core can do 4 independent arithmetic and memory operations per clock cycle for a total of 16 per CPU. Both integer units share an FPU unit though.

Now here it is from the horse's mouth (AMD):

There have been some questions about the Bulldozer architecture so this should help clear up any confusion.

First, Bulldozer is based on a modular architecture where two integer cores are teamed up with an extra-large FPU to create what we call a Bulldozer module. Bulldozer modules are the basis of all of the designs that will be coming from this architecture, and its modular nature not only allows us to build processors with different sized core counts but also provides flexibility for future designs that could allow other modular components like GPUs to be added into the designs. The Bulldozer module is a concept and part of an architectural design, it is not something that the user will come in contact with. For instance, when an Interlagos system boots up, the hardware will see 16 integer cores, not 8 modules. When the OS loads, it will see 16 integer cores, not 8 modules, and the applications will see 16 cores as well. Because of this extremely consistent manner by which the whole system sees the integer core (and not modules), it is only natural that Interlagos will be marketed as a 16-core processor. It would actually be more confusing to call it an 8-core processor, because there is no point where a customer would see 8-cores.

Secondly, there was a question about the amount of die space that is consumed by having 2 integer cores in a module versus just one. Bulldozer was designed to be a modular architecture where 2 integer cores are able to share certain resources where it makes sense (in order to reduce power consumption) yet still retain discrete components in order to ensure great performance and no bottlenecks. It was never designed as a single integer core in each module, so dissecting the module components becomes a bit more tricky. Some have compared this to SMT and made statements that SMT customers could see a modest increase in performance for only a fraction of die space. We believe that our Bulldozer architecture will provide far greater performance gains than SMT with up to 80% greater expected throughput when running 2 threads simultaneously compared to a single thread running on a single integer core. Our engineers estimate that the amount of discrete circuitry that is added to each Bulldozer module in order to allow for a second integer thread to run is ~12% at the core level, but because the integer cores are only a portion of the overall die space, the addition of the second integer core in each module only adds ~5% of circuitry to the total die. We believe this is an excellent balance of greater performance with a very small silicon cost.

Finally, there are those that have suggested that the two integer cores in the Bulldozer module could potentially be merged together into a single core. This is not true. Perhaps they are confusing the functionality of the FPU, which is flexible enough to be split between the two cores in the module, giving each a 128-bit FMAC simultaneously, OR can be combined into a 256-bit FMAC for one integer core to use exclusively if the second integer core does not need any FPU commands in that cycle.

We hope this clarifies the questions that seem to be most prevalent.

Still trying to determine what is missing on the consumer CPU. What is the codename for the desktop variant?

---------- Post added at 10:23 AM ---------- Previous post was at 10:06 AM ----------

My final thoughts:

Anyway, after skimming that article I couldn't see the desktop processor listed anywhere. I am thinking that 8 threads is the count, and not 16, though it will seem like 16 due to the ability to do two integer and memory lookups per clock cycle.

I prefer a real core per thread anyway (no sharing!). And from what I can see, the idea that a module is only one core is incorrect. It really has 8 cores but a shared FPU. The shared L2 only being shared between 2 cores is not as good as I thought, but it is a big improvement at least on x86. The sheer size of it alone, at 2MB per module and 8MB per CPU, is huge. I'd rather have that over a large L3 any day.

Anyway, here is to hoping AMD finally brings back that crown of fastest proc for a short time. It is just good for us consumers if they do.

-X

---------- Post added at 10:47 AM ---------- Previous post was at 10:23 AM ----------

One more thing I just found:

http://wccftech.com/2011/05/21/amd-bulldozer-zambezifx-cpu-performance-exposed-beats-i7-2600k-cinebench-benchmark/

Seems I will get my wish for some ratcheted-up competition between AMD and Intel. I'd like to see something official though. None of these obscure blog post numbers.

-X

Edited by xman1


All I know is that I am looking forward to it for the price ($350 for the 8-core one) for my second PC upgrade. I still think the i7 970 at 800 per processor was a bit over the top :)


Same, though I am still skeptical about single-threaded performance. It has to excel at single-threaded work as much as it excels in multi-threaded apps, so I hope a single core can turbo pretty far.

If Intel's single-threaded performance is way better, and it is not much slower multi-threaded, I will likely buy Intel for my next CPU.

The biggest things I am hoping for are better ArmA II performance and cheaper CPU prices. At the end of the day, that is my goal.

-X

ArmA 2 can use up to 31 cores in theory, but experiments have shown that with most scenes the gain above 4 cores is small and above 8 cores unmeasurable.

The explanation is Amdahl's law - only parts of the application use all cores. See the Real Virtuality Going Multicore blog.
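
To make the Amdahl's law point concrete, here is a minimal sketch of the formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can run in parallel and n is the number of cores. The 60% parallel fraction is a made-up figure for illustration only; it is not a measurement of ArmA 2.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical: assume 60% of each frame's work can be spread across cores.
PARALLEL_FRACTION = 0.6

for cores in (1, 2, 4, 8, 16, 31):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:2d} cores -> at most {speedup:.2f}x")
```

Whatever fraction is assumed, the serial part sets a ceiling of 1 / (1 - p), which is why each extra core past the first few buys less and less.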

---------- Post added at 13:58 ---------- Previous post was at 13:54 ----------

In build 76122 and newer the default for dualcores will be changed to -exThreads=3 based on user feedback.

We have also changed the cpu core detection, therefore depending on how many logical cpus are present, default -cpuCount values will be as follows:

logical CPUs   default -cpuCount
1              1
...
6              6
7              7
8              4
9              4
10             5
11             5
12             6
13             6
14             7
....

Some day hopefully we will find the time to provide proper HT detection, but until then I think the above provides quite reasonable default settings.
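
The listed defaults look like a simple rule: up to 7 logical CPUs the count is used as-is, and from 8 upward it is halved (presumably on the guess that half of them are Hyper-Threading siblings). The sketch below is only an inference from the rows shown; the function name is hypothetical, it is not the engine's actual code, and the rows elided with "..." are assumed to follow the same pattern.

```python
def default_cpu_count(logical_cpus: int) -> int:
    """Guess at the default -cpuCount for a given number of logical CPUs.

    Inferred from the listed rows only; elided rows are assumed to
    follow the same pattern.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if logical_cpus <= 7:
        # Few logical CPUs: treat each one as a real core.
        return logical_cpus
    # 8 or more: halve, rounding down, as in the listed rows.
    return logical_cpus // 2


for n in (1, 6, 7, 8, 9, 10, 11, 12, 13, 14):
    print(f"{n:2d} logical CPUs -> -cpuCount={default_cpu_count(n)}")
```

A side effect of such a rule is that a CPU with 8 real cores and no Hyper-Threading would also default to 4, unless the user sets -cpuCount manually.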

/10 chars


It took 3 to 4 years after quad-cores arrived for applications to take advantage of them... I am sceptical that we are going to cut that timeframe in half for the same thing to happen with more cores. As long as BIS implements it in A3, I will probably jump on that octo-core stuff though.


I disagree, because I think that delay was because of a fundamental shift from single threaded to multi threaded programming. I would expect the process of utilizing more cores in programming to accelerate from here on.

If it is shared L2 for a single core with two integer units, that is not as cool. You are also right that there are a lot of crap articles out there. This is a testament to how well AMD has locked down its secrets.

You shouldn't think it is not as good; this is necessary for the architecture to function properly. There are 3 levels of cache, and that's how it should be in this case.

From what I understood of that slide, that was two cores exactly like that on a single module for a total of two cores, four integer pipes, for a total of 8 cores and 16 integer pipes per CPU.

Forget the integer pipes; it is irrelevant to talk about how many there are in total, because they aren't going to work together. All you need to know is this:

Bulldozer is a 4-way design with a modular approach where each core shares the FRONTEND, L1 INSTRUCTION CACHE, L2 CACHE (at MODULE level) and FPU. The FPU is big enough to allow one 128-bit FMAC op per core. Each core has its own integer scheduler, so think of a Bulldozer module as an optimized dual-core.

Still trying to determine what is missing on the consumer CPU. What is the codename for the desktop variant?

Zambezi (1 Orochi die) is 8 core for Desktops.

Some defective, harvested Orochi dies may be used for the 4-core and 6-core variants, and once yields are very strong AMD will just disable modules for the lesser-core variants.

Interlagos (2 Orochi dies in MCM) for Servers.

Valencia (1 Orochi die) 8 core for Servers.

I prefer a real core per thread anyway (no sharing!). And from what I can see, the idea that a module is only one core is incorrect. It really has 8 cores but a shared FPU. The shared L2 only being shared between 2 cores is not as good as I thought, but it is a big improvement at least on x86. The sheer size of it alone, at 2MB per module and 8MB per CPU, is huge. I'd rather have that over a large L3 any day.

This architecture will be the future. You cannot just keep adding more and more transistors, because one day you'll hit a wall. Intel will follow this approach too.

Sharing allows die and transistor savings. Bulldozer is their first design, so there are probably some sharing penalties, but they will improve on this and make the penalties marginal or nonexistent. AMD has already announced Zambezi's replacement, "Komodo", which will be a Bulldozer enhancement.

As time passes, AMD is fine-tuning and figuring out a lot of stuff that can be improved. There is still a lot to improve in this design, something that was not true of their previous design.

EDIT: BTW don't take those benchmark leaks too seriously.

Edited by CarlosTex

I disagree, because I think that delay was because of a fundamental shift from single threaded to multi threaded programming. I would expect the process of utilizing more cores in programming to accelerate from here on.

I had that in mind (that's why I halved the timeframe as a hypothesis). Still sceptical...

ArmA 2 can use up to 31 cores in theory, but experiments have shown that with most scenes the gain above 4 cores is small and above 8 cores unmeasurable.

...

This is a given for Arma 2 then; it essentially depends on the ability to parallelize a given process, or to split it into independent instructions.

Edited by gammadust

The one thing I hope for with ArmA III is more CPU threads. While playing ArmA II, I watched CPU usage and noticed that 3 threads are used fairly well, with possibly a fourth thread in use.

Actually, I couldn't care less about how many threads ArmA3 is going to use. I just hope the game will be a noticeable step forward from its predecessor and provide a smooth playing experience. :)

I notice that people with little or no background in software engineering tend to reduce their general model of a good game architecture to something like "more threads = better performance", which is simply not true.

The hard part as a coder is of course to figure out *WHAT* parts of your code can be effectively scheduled for concurrent execution without getting yourself into a total debugging nightmare...

I'm quite confident that BIS will be tackling this challenge with some smart concepts, as has happened before (see memory management as just one example).


It would be really nice if they ran the in-game server as a separate process, so you could set its affinity to other cores.

I really, really WANT that; that way I can run my own little war with much more AI...

Sure, I can set up a dedicated server, but I'm too tired (and lazy) to set it up.


My PC:

Motherboard: Asus P6T V2 Deluxe

Processor: i7 950, 3.07 GHz

Graphics card: Asus GTX 480 Nvidia

CPU cooler: Noctua NH-D14

Power supply: Corsair HX 1000W

RAM: 12 GB Kingston DDR3 at 1333

Hard drive: 1 TB

Keyboard: Logitech G-19

Headset: Logitech G-35 7.1 surround

Mouse: Logitech G-9

Monitor: LG LED 22
