darkpeace

Multi cpu / hyperthreaded flashpoint server s/w


BTW: did you run your Linux server on Linux 2.4 or Linux 2.6?


2.4.25 was what they set up last week on the 3.0 GHz.

The Xeon was more up to date. I won't bother with updating Linux now since Win2k will be installed.


I have a dual-CPU system at home with FD loaded, and I still see Windows showing 100% load on CPU0 no matter what I try.

FD and the OFP server are simply not using both CPU0 and CPU1. I can see it using CPU0 and maxing it out while "claiming" it's at 50%, but that's because it is 50% of the total system resources and not individual CPU usage.
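The 50% reading is what you get when one CPU is pinned and the other is idle. A minimal sketch, assuming the overall figure is just the plain average of the per-CPU loads (the function name is made up for illustration):

```python
def total_load_percent(per_cpu_loads):
    """Overall load as the plain average of the per-CPU loads (in percent)."""
    return sum(per_cpu_loads) / len(per_cpu_loads)


# One single-threaded server pinning CPU0 on a dual-CPU box, CPU1 idle.
print(total_load_percent([100.0, 0.0]))

# Same server on a box that shows four logical CPUs (e.g. dual CPU with HT on).
print(total_load_percent([100.0, 0.0, 0.0, 0.0]))
```

So a single-threaded server that is flat out shows as 50% on two CPUs and 25% on four logical CPUs, even though it has no headroom left.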

We had this debate years ago: HT or a second CPU will simply not help OFP. They will help Windows and the OS run, so in a sense one CPU is 100% dedicated to OFP, unlike on single-CPU servers.

This is not the case for Linux-based HT systems, where HT will choke the server. Even a dual 3.2 Xeon with HT disabled will still share the load across the CPUs, and that I dislike. But since its FSB is 533, a P4 3.0 or 3.2 with its faster FSB will stomp on the Xeon when it comes to OFP servers, for both Linux and Win2k.

I can see the 64-bit machines as a possibility.

I was really disappointed with Linux and Xeons, and the lack of FSB will make me stay away from them.

Raw power and FSB rule!


If you just want raw memory bandwidth, the Opteron is meant to support up around the 19.2 GB/sec mark (vs 6.4 GB/sec on 800 FSB Pentium 4s).

Mind you, that is the equivalent of 6-channel DDR400 (PC3200), so I'll have to dig around to see which chipset supports it.
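For anyone wanting to check those two figures, here is a rough sketch of the arithmetic. It assumes peak bandwidth is simply transfers per second times bus width, with a 64-bit (8-byte) data path for both a DDR400 channel and the P4 front-side bus; the function name is only for illustration.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/s (decimal, 1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


# Six channels of DDR400 (400 million transfers per second each).
print(peak_bandwidth_gb_s(400, channels=6))

# A single 800 FSB Pentium 4 front-side bus.
print(peak_bandwidth_gb_s(800))
```

These are theoretical peaks only; real sustained throughput will be lower.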

Also, Intel has new 1066 FSB Xeons on the way (as well as 1066 FSB Pentium 4s).

http://www.amd.com

http://www.amd.com/us-en....00.html

http://www.intel.com

(I use the .au domains, as I am in Australia)


Managed would like us to try one, but without a 64-bit Windows app I won't bother.

They did quote us $60 if I ship a server out. Renting an Opteron for $58 more (we pay $110 for the 3.0) is not scary; what's scary is buying the OS for it (W2k3 Server). Win XP 64-bit is in beta as we speak and may be released in a few months, or so I read on their site. Once that is out, the Opterons will be an option.

Then again, there is a 180-day trial for W2k3 64-bit .......

http://www.microsoft.com/windows....lt.mspx

I sent an email to managed.com to see if this would be possible. If so, this would be the new WGL CTI league server.

http://www.wglcti.com


For those who may be interested:

I went out and bought myself a P4 with Hyper-Threading and now I've done some testing on it.

Some stuff that may or may not be news:

HT is an extension of the pipe architecture used by most processors. A pipe simply executes one instruction at a time. A processor with multiple pipes can execute multiple instructions at a time. This is properly called superscalar execution (people often lump it in with pipelining) and it makes life hell for assembler programmers.

Hyper-Threading allows the pipes to have different execution positions. This means the pipes can be split between 2 threads, making 1 processor behave more like 2 processors. One processor can therefore execute code in parallel at two different memory positions, while the old pipe system could only execute multiple instructions in parallel at one memory location.

There are limitations.

The pipes able to handle floating point instructions are limited in the Pentium. Two hyper-threads cannot execute floating point instructions at the same time. I'm sure there are other limitations too, but this is what came out clearly in my tests.

In multi-threading, integer processing was almost double the speed, while floating point processing was the same speed as for a single thread.

Summary: In a multi-threaded environment HT gives a 20-50% performance boost on average. Not double the speed, as some rumours suggest.

But just as multiple processors won't do much for game clients, the same goes for Hyper-Threading. It will, however, make the entire desktop experience a bit smoother and perhaps handle background tasks better, making the game jerk less when other processes take processor resources away from your gameplay.

I like Hyper-Threading :)


So do I. I love it for work but not for gaming (I don't mean game servers).

For those interested in the difference between an XP 2500 (333 FSB) and a P4 2.8 (800) with HT, here are two links:

AMD with Ti4800 (note this card is better than the 5700 OFP-wise)

http://service.futuremark.com/compare?2k1=6501687

P4 with the same Ti4800

http://service.futuremark.com/compare?2k1=7919299

The AMD was 1.83 OC'd to 2.1 and the P4 was 2.8 OC'd to 3.2. Even though the AMD is slower in GHz, it waxed the P4 big time.

What to use each for:

AMD = gaming

P4 = workstations / game server platform

Xeon = web servers / workstations

Opterons = all the above, but HT and Xeons are still nice for workstations and web servers


Certa, see my Dhrystone / Whetstone benchmarks a page or so back. They have all the info one would need (bear in mind these are the optimized Dhrystone and Whetstone, not the standard ones).

Dhrystone = MIPS

Whetstone = MFLOPS

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ahhh hell, I'll post them again:

Here are some SANDRA 2004 scores of various CPUs:

[CPU Name]
[Dhrystone, Whetstone FPU, Whetstone iSSE2, MHz]
[Performance Index vs P4-C w/o HT]

Intel Pentium 4-C [2 SMT] 3.2GHz 512L2 (HyperThreading)
9808, 4059, 7095, 3200
 119,  171,  164,  100

Intel Pentium 4-C 3.2GHz 512L2 (No HyperThreading)
8243, 2368, 4330, 3200
 100,  100,  100,  100

AMD Athlon FX53/Opteron 150 2.4GHz 1ML2
9749, 3764, 4903, 2400
 118,  159,  113,   75 (so for 3.2GHz add 33%)

AMD Athlon XP 3200+ 2.2GHz 512L2
8380, 3465, no SSE2, 2200
 102,  146,     n/a, 68.75 (so for 3.2GHz add 45%)

Intel Pentium M 1.8GHz 1ML2 (*)
7084, 2499, 3196, 1800
  86,  106,   74, 56.25 (so for 3.2GHz add about 78%)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

HT FPU/SSE added a 71% performance boost vs non-HT

HT FPU/SSE2 added a 64% performance boost vs non-HT
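If anyone wants to redo the index rows, this is a small sketch of how they fall out of the raw scores. It assumes the index is simply each score divided by the non-HT P4-C score, times 100; the names are only for illustration.

```python
# Raw SANDRA 2004 scores from the post: (Dhrystone, Whetstone FPU, Whetstone iSSE2, MHz).
BASELINE = (8243, 2368, 4330, 3200)  # Pentium 4-C 3.2GHz without HT

SCORES = {
    "Pentium 4-C 3.2GHz HT": (9808, 4059, 7095, 3200),
    "Athlon FX53/Opteron 150": (9749, 3764, 4903, 2400),
    "Athlon XP 3200+": (8380, 3465, None, 2200),  # no SSE2 on this chip
    "Pentium M 1.8GHz": (7084, 2499, 3196, 1800),
}


def index_row(scores, baseline=BASELINE):
    """Each score as a percentage of the baseline; None where a score is missing."""
    return tuple(
        None if score is None else 100 * score / base
        for score, base in zip(scores, baseline)
    )


for name, scores in SCORES.items():
    row = index_row(scores)
    print(name, ["n/a" if value is None else f"{value:.0f}" for value in row])

# The HT gain on the same chip is just the index minus 100.
ht_row = index_row(SCORES["Pentium 4-C 3.2GHz HT"])
print(f"FPU gain from HT: {ht_row[1] - 100:.0f}%, iSSE2 gain: {ht_row[2] - 100:.0f}%")
```

The last line is where the 71% and 64% figures come from.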

Bear in mind the above was on some very optimized code that runs threaded very well, so these are the expected maximums in performance one could theoretically gain by using such a feature.

Main thing I noticed was that SSE2+HT was about 75% faster than SSE+HT, and SSE2 w/o HT was 83% faster (so the code benefits from SSE2 heaps).

Also, the Athlon 64 FPU was 59% faster in plain (non-threaded) floating point ops. So in theory the majority of applications that have FPU/SSE support, ***but not hyperthreading or SSE2***, would perform much better on an Athlon 64 than on a P4 Northwood core (and this is comparing 2.4 GHz to 3.2 GHz).

Once HyperThreading came into the equation, the Athlon 64 only trailed by 8%. So for non-threaded code (the most common type) performance should be excellent, while in threaded code it is only 8% slower. That is a damn fine trade-off if you ask me.
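The 59% and 8% figures can be checked straight from the Whetstone FPU scores posted above. A quick sketch (the helper name is just for illustration):

```python
def percent_faster(score, reference):
    """How much faster `score` is than `reference`, in percent."""
    return 100 * (score / reference - 1)


# Whetstone FPU scores from the post.
P4_FPU, P4_HT_FPU, A64_FPU = 2368, 4059, 3764

print(f"Athlon 64 vs P4 without HT: {percent_faster(A64_FPU, P4_FPU):.0f}% faster")
print(f"P4 with HT vs Athlon 64:    {percent_faster(P4_HT_FPU, A64_FPU):.0f}% faster")
```

Note the 8% is the P4 with HT measured against the Athlon 64; measured the other way round, the Athlon 64 comes out a little over 7% behind.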

HyperThreading is only good in video editing, some web-server-specific roles, and running 2 archivers (WinRAR) at once, one on each 'logical' CPU (note that each RAR operation will take longer than it would have alone, but doing 2 at once makes up for it).
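To put rough numbers on the two-archivers case: a sketch assuming HT lifts combined throughput by some factor (1.25 here, picked from the 20-50% range quoted earlier in the thread) and that both jobs are the same size. The function name and the 100-second job are made up for the example.

```python
def paired_job_times(single_job_time, ht_throughput_gain):
    """Return (finish time when both jobs run at once, finish time back to back).

    `ht_throughput_gain` is the combined throughput of two threads relative to
    one thread running alone (1.0 = no gain, 2.0 = a true second CPU).
    """
    concurrent = 2 * single_job_time / ht_throughput_gain
    sequential = 2 * single_job_time
    return concurrent, sequential


# Hypothetical: each archive takes 100 s alone, HT gives 25% more throughput.
both_at_once, one_after_another = paired_job_times(100.0, 1.25)
print(both_at_once, one_after_another)
```

With those assumed numbers each archive takes 160 s instead of 100 s, but both are done at 160 s rather than 200 s.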

It has a future in gaming, but I doubt we will see any real support (aka: over 75% CPU usage reported by Task Manager) in games for a long time to come (and longer still for gaming servers)

Summary:

~~~~~~

If only all software could be coded so well, or 'exploit' features (eg: use the ALU and FPU at the same time, and perform as much floating point as possible in batches that need the same operation performed).

In reality however, this is not possible, as the human mind can only comprehend so much at once.


Thanks for the numbers, darkpeace.

These are the actual numbers I get in my own test app on a 2.8 GHz P4 with HT and 1MB L2 cache. The numbers might not say much without a reference, but you can compare the single and multi-thread values.

[Screenshot: HT28.gif]


The whole "Relative to PII-400" casts doubt over that benchmark's SSE support :), as the PII-400 only supported MMX and had some extra cache.

Methinks the program you are using is a 'tad' dated.

Still valid figures all the same, to see how a P4C+HT handles PII FPU code.


I wrote the test app myself. The "relative to P2" option is just a relative comparison to my old P2, if that radio button is checked. However, the numbers displayed in this case are the calculation scores, not the comparison.

It does not test MMX, 3DNow! or any specific instruction sets. It tests generic Intel/FPU code (zlib compression/decompression, fixed point and floating point matrix calculations and random memory access) compiled from C++ by Visual Studio 6 with standard release optimizations.

I'd say most code and games today are compiled with Visual Studio 6 (only Pentium 1 optimized). I consider my test app realistic and not in any way ideal for the specific processors.

Programmers are slowly moving over to Visual Studio 7 even though the resistance to .NET is making it slow. VS7 can compile code better optimized for modern processors.

In the not too distant future, if game programmers move over to managed code, JIT and install-time compilation can optimize specifically for the installed processor(s). But we are not there yet.


Before we end up in a benchmark argument, I'd like to underline that my test app was designed for benchmarking servers with one or several CPUs.

Server applications normally do not use MMX, SSE, 3DNow! or any geometry processors in graphics cards :)

Most game programmers do not write specific assembler code for these instructions (some cool dudes do) but rely on the support from DirectX, OpenGL or other geometry libs to perform the calculations for them. Hopefully the libraries take advantage of processor or graphics card specific features.

That is outside the scope of my test app.


Good to hear :). Not here to argue about benchmarks.

Bear in mind that the P4 Northwood was only made around dual-threaded software, and that the 4-way threading performed (last screenshot above) may hinder some results slightly.

Either way, I can see how it is related to the Flashpoint Server exe, as a program.

I started this forum with the intention of server-only optimisation, and intend to keep it that way. The game engine itself (the client side of it) can be left to the other 98%+ of the forums.

Didn't think the app was coded by yourself. Well done.

Based on the screenshot, you can see how I came to my conclusions though - lol.

It wouldn't surprise me if it did have some MMX support, as good compilers these days add a CPU detection routine to the final EXE, and add reasonable support for certain features on certain CPUs (based on compiler date).
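As a toy illustration of that kind of CPU detection and dispatch (not how any particular compiler does it): the program carries more than one version of a routine and picks one at startup based on what the CPU reports. Everything here is hypothetical, including the feature names and the two routines, and the feature list is passed in by hand rather than read from the real CPU.

```python
def sum_squares_generic(values):
    """Baseline path: plain loop that would run on any CPU."""
    total = 0.0
    for value in values:
        total += value * value
    return total


def sum_squares_wide(values):
    """Stand-in for a path built around newer instructions; same result."""
    return sum(value * value for value in values)


# Ordered from most to least demanding. The empty set means "no requirements",
# so the generic path always matches as a fallback.
CODE_PATHS = [
    ({"sse2"}, sum_squares_wide),
    (set(), sum_squares_generic),
]


def select_code_path(cpu_features):
    """Pick the first routine whose required features the CPU reports."""
    available = set(cpu_features)
    for required, routine in CODE_PATHS:
        if required <= available:
            return routine
    raise RuntimeError("no usable code path")


# A 486-class CPU reports nothing special; a P4 reports MMX, SSE and SSE2.
print(select_code_path([]).__name__)
print(select_code_path(["mmx", "sse", "sse2"]).__name__)
```

The point is only that the same EXE can run on an old CPU and still take a faster route on a newer one.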

For example: Half-Life runs on a 486 (try it, it really does, esp. with a TNT2 PCI video card on later 486s with a PCI bus), yet it is Pentium optimized and will use different code on Pentium systems.

There are also better compilers than the MS or Intel ones lying around, but support for them is limited :(

Still, I find it very weird that your floating point benchmark was slower when threaded, while the optimized version of Whetstone (and the iSSE2 version as well) is much, much faster with hyperthreading. They are similar benchmarks and both utilise floating point instructions.

Perhaps there is a difference (likely) at the compiler / final code level, between HyperThreading and actually having 2 CPUs. In that a given compiler that was designed with HyperThreading as well as Dual CPU in mind would give vastly different results.

It all comes down to the code being fed to the CPU(s), I guess version 6 was just not meant to create (terrific) P4 output programs (as it may predate the Northwood).

Also, was the P4 you tested it on using a decent i865PE or i875P (or any dual-DDR) chipset? Large numbers of 80-bit or 64-bit floating point values eat memory bandwidth. Just curious, as it may have affected just the FPU results, since most integer values are only 32 bits wide.
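A rough sketch of why the element width matters for memory traffic. The one-million-element data set is an arbitrary example, and the sizes are the unpadded ones (4 bytes for a 32-bit integer, 8 for a 64-bit double, 10 for an 80-bit x87 extended value).

```python
# Unpadded element sizes in bytes; compilers often pad 80-bit values further.
ELEMENT_BYTES = {"int32": 4, "float64": 8, "float80": 10}


def working_set_bytes(element_count, element_type):
    """Bytes that have to cross the memory bus to read the data set once."""
    return element_count * ELEMENT_BYTES[element_type]


# Hypothetical data set of one million values, e.g. a 1000 x 1000 matrix.
for element_type in ELEMENT_BYTES:
    print(element_type, working_set_bytes(1_000_000, element_type))
```

So for the same number of values, the floating point run moves two to two and a half times as much data as the 32-bit integer run, which is where a single-channel chipset could hurt the FPU results specifically.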

As for DirectX and OpenGL benchmarks, etc., leave it to the game forums. I do not see a dedicated server using OpenGL for assistance with AI. It is not unheard of to have video cards do things other than video, but what I really want probably needs a special AI card/device with its own bus to the CPU / Northbridge, something like an AI accelerator (like the early 3D video cards were.....)

Oh well, maybe in a decade or two......

Hehehe, it would have its own language and not use antique instruction sets, yet still plug into x86 systems. OpenAI and the MS ripoff DirectAI.

Maybe when DirectX 12 comes out such hardware will exist.


Anyone know if the VBS1 server is actually threaded (such that it can use 100% of all CPUs if required, assuming no other bottlenecks are introduced)?

Would be mad to have MFCTI at a LAN with 32 per side with server running at 50fps, even if it means AU$200 per player to get a copy of VBS1.

(Yes I am serious, if it is threaded as above, and someone can prove it to me with a screenshot while hosting 1 single server, not 4 servers over 4 CPUs, then I'll go out and do it)
