darkpeace

Multi cpu / hyperthreaded flashpoint server s/w


I was digging for more ways to make the server balance its load over multiple CPUs (not just sit at 25% load across 4 CPUs), and VMware came to mind as well (not the cheapest solution; it would only be needed for big CTI games and the like)

http://www.vmware.com

http://www.vmware.com/products/server/vsmp_features.html

I *suspect* one of their products makes multiple CPUs appear as one single CPU to the "Guest OS", thus if you run OFP:R Server on the guest OS it should hit 100% usage (or never, ever dip below 50fps unless you hammer it very badly)

I'll ask around at work and see who says what, the above may be untrue (I only *suspect* it may be the case)

I have also e-mailed VMware themselves to confirm this.

Would be damn nice for a Quad Opteron if it did work though :)

Mind you, on a game level my Barton 2500 OC'd to 3200 kicks the ass of my P4 2.8 (HT 800) OC'd to 3.06 by 2000 points on 3DMark 2001.

After spending money on that "upgrade" I was really disappointed.

It would be interesting to see if you, RN Malboeuf, set up your own overclocked Barton-core Athlon XP (running SuSe Linux or Windows 2K/XP) as an OFP:R Server and ran some CTI maps on it to compare performance.

(cuz I reckon RN Malboeuf can pull it off easily, and has the Via 4in1 drivers or nForce2 drivers ready to rock)

Or is there anyone out there (Prime?) who could (be arsed to) make a map that could be used to "manually" benchmark servers? (With the server's ViewDistance noted manually or by the map)

Maybe the ViewDistance was something dodgy on the DevGRU Barton (are you sure it was a Barton?)

Also, when you say 2500, you mean a PR2500+ clocked to PR3200+, on say a 400 FSB with 400MHz DDR-SDRAM, yeah?


Thank you so much, darkpeace, for your research and effort!

best regards


I have told Mal this, and I will tell you all this: multiple CPUs and OFP will work. Two words: Windows and FireDaemon. The only downside is that you can't see the OFP console, so if you want to view someone's ID later on you're shit out of luck; you'd better #userlist and get it while you can. It runs completely transparently in the background as a Windows service, and also uses about 15% less memory than running as an application.

Come test it out....

Dual Xeon 2.8 w/HTT using FireDaemon 1.7

69.93.58.148:2032 - 100 player

30-player SHoP CTI, 18-player MF 1.16, 24-player WGL, you name it, it runs it lag free. The highest the CPUs have peaked is 58%, and that's both of them together. Keep in mind we also run a 100-user Ventrilo server and a 20-player private server.

If anyone has some huge ass maps, I WANT THEM FOR TESTING.


We run a modified exe that lets us log everything out to a text file, so we can do things like check on IDs. It also lets us set a limit on total sound files and not just individual ones. A little off topic, but I thought I would let you know.


Few questions for "backwoods" (2 posts up):

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Do you play (MF)CTI or RTS or other large scale warfare maps ?

What is the peak CPU load that a single given OFP:R Server thread will use on your server ?

If you load the server heavily and get it down to 10fps (for testing only) does it use near 100% of all available CPUs ?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I have checked out:

http://www.firedaemon.com/register/

I assume I need to be running the one with SMP Support for the subprocess optimisations to have any effect (once I get a MFCTI server at over 75% available load, on P4+HT, I will be a very happy man)

The SMP version will "force" the server process to run on all available CPUs, correct? And thus get above 50% (or 25% in some cases) utilization? (That is assuming server fps is well below 50fps, or 32fps in some cases, indicating it has "hit the usage wall", as I have come to call it.)
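To put rough numbers on the "usage wall": if the server does its work on a single thread, the most the overall CPU figure can show is 100% divided by the number of logical CPUs. The sketch below is only an illustration of that arithmetic. The function names and the 5-point tolerance are made up for the example and are not taken from OFP or FireDaemon.

```python
def single_thread_ceiling(logical_cpus: int) -> float:
    """Highest overall CPU % that one fully busy thread can produce."""
    if logical_cpus < 1:
        raise ValueError("need at least one logical CPU")
    return 100.0 / logical_cpus


def hit_usage_wall(total_usage_pct: float, logical_cpus: int,
                   server_fps: float, fps_cap: float = 50.0,
                   tolerance_pct: float = 5.0) -> bool:
    """True when fps is below the cap while overall usage sits at
    (roughly) the single-thread ceiling. The tolerance is an assumption."""
    ceiling = single_thread_ceiling(logical_cpus)
    at_ceiling = abs(total_usage_pct - ceiling) <= tolerance_pct
    return server_fps < fps_cap and at_ceiling


if __name__ == "__main__":
    for cpus in (1, 2, 4):
        print(cpus, "logical CPUs ->", single_thread_ceiling(cpus), "% ceiling")
    # Hypothetical reading: 52% overall on 2 logical CPUs at 18 fps.
    print(hit_usage_wall(total_usage_pct=52.0, logical_cpus=2, server_fps=18.0))
```

So 4 logical CPUs give the 25% figure and 2 give the 50% figure mentioned above.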

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[evil laugh - lol] I found it (doh):

http://www.firedaemon.com/downloads/FireDaemon-Pro-1_7-Pre-GA.exe

The above is the direct link; see the main http://www.firedaemon.com for more information. It appears I can test SMP after all, on a 30-day trial, so now I can really begin the fun. (Maybe I can keep my overclocked Northwood and actually see some REAL performance... I was going to build an Athlon 64 (FX maybe) for a LAN in a few weeks, so I could save some dosh here, as FireDaemon looks like an inexpensive solution.)

Unless of course SuSe have something better on the Linux front, and I doubt many gamers will fork out for VMware and the VMware Virtual SMP add-in, etc.

Cheers all, keep the info coming in, this is helping many gamers as we speak :)

Tabris.DarkPeace

GarageLAN, ACT


We play nothing but CTI and RTS type games. We ran a 30-player CTI last night. I don't know about a single CPU yet; I would assume it would run in the 80s or 90s. All I know is even the fastest servers out there hit 100% and hold it throughout a CTI game, even with 14 players. Ours peaks at 58% on a 30-player CTI 60 minutes into the game, with 350MB peak memory usage.

Like I said, come try it out... Our connection is much faster now too; I am pinging 10 - 15 ms less than on our Klan-Host server. I had it running at 6FPS at one point with a heavily flooded COOP 1.16 CTI on DTB/INSANE, and server utilization didn't go past 55% that time. You can watch the CPU graph and see how it evens the load over both CPUs; it's the kewlest thing.

Dark, the licenses for FD 1.7 are cheap. I found a site where someone was selling them for like 15 bucks.


All I need to complete my research is the #MONITOR 5 figures about 90 minutes (or longer) into a CTI game.

Thanks for the uber fast replies btw.


Cheers,

I am at <darkpeace@internode.on.net> if anyone feels like emailing server screenshots with Task Manager (each CPU, plus kernel usage as the red line) with Flashpoint in the foreground, as well as a "#MONITOR 5" report (enter it when logged on as Admin to a server, or voted in as admin) about 90 - 120 minutes into a CTI game. Just go nuts!

***!!! Please use PNG or JPEG or JPEG2000 format as required !!!***

Also basic server specs and OS would be nice with the above.


I will take some ingame screenies, and some taskmanager screenies as well.

Well, our new server will be an Athlon 64, so I guess I can tell you in a short time if I notice any difference :D

I don't think you'll see any performance gains running a 32-bit app.


OFP with FD is nothing new. Frag ran it that way for years, but his problem was a shit CPU speed.

Ed did post that Windows lies about the actual CPU usage. In our pic below, from the original RN whore house (P4 3.0/3.3, FSB 880, with HT, Win2k and FD), it showed 53% usage. The reason I say it is lying is that no matter what stress we put on the server it never climbed over 55%, so only 1 CPU must really be working its ass off.

http://www.roughnecks.org/ofp2/images/cpu.jpg

Don't get me wrong, I still think Win2k and FD is the best way to go, but the next question is HT: does it hamper ofpserver the way it does on Linux?

Here is some interesting reading:

http://www.2cpu.com/articles/41_1.html

Page 5 shows what OFP server would be similar to in these benchmarks; look for VolanoMark.


After the HT question is answered, the next will be the Prescotts. Sure, some say it hampers itself with a longer pipeline, but maybe not for OFP... I don't know if I will dish out for Prescotts in a few months; more FSB is what I will be going for.

Quote: "it showed 53% usage, the reason it is lying"

It is not exactly "lying". It shows 50% because the second CPU is idle, which is normal unless you are running two threads. There is an option in Task Manager to show each CPU separately; however, it will not help much, as the same thread will often get rescheduled to a different logical CPU (which is normal and does not present any performance problem per se).

Conclusion is: CPU usage on HT cpu is a difficult thing to measure.
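A toy model of why the per-CPU view does not help much: one busy thread is given time slices on either logical CPU, and the usage is tallied per CPU and overall. This is only a sketch with made-up schedules, not a measurement of any real scheduler.

```python
def per_cpu_usage(schedule, logical_cpus):
    """schedule holds one CPU index per time slice in which the single
    busy thread ran. Returns the % busy for each logical CPU."""
    slices = len(schedule)
    busy = [0] * logical_cpus
    for cpu in schedule:
        busy[cpu] += 1
    return [100.0 * b / slices for b in busy]


def total_usage(per_cpu):
    """Overall figure, taken as the average over all logical CPUs."""
    return sum(per_cpu) / len(per_cpu)


if __name__ == "__main__":
    pinned = [0] * 10        # thread stays on logical CPU 0
    migrating = [0, 1] * 5   # thread bounces between CPU 0 and CPU 1
    for name, sched in (("pinned", pinned), ("migrating", migrating)):
        usage = per_cpu_usage(sched, logical_cpus=2)
        print(name, usage, total_usage(usage))
    # pinned    [100.0, 0.0] 50.0
    # migrating [50.0, 50.0] 50.0
```

In both cases the overall figure is 50%, but in the migrating case neither per-CPU graph looks maxed, even though the single thread never stopped working.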


Yeah, for example 7-Zip reads as maxing out both logical CPUs on an HT system, but that does not mean the time to 7-Zip something is halved.

I should get off my ass and time it again. It was still a marked improvement over a non-HT system though.

But anyway, with 2000+ AI surely it would make sense to run it on a well threaded server (software wise). Then we could really see P4-Northwoods take off in OFP:R Server performance, and Dual Xeon/Opteron systems would perform even better.

More and more people I know are switching from Counter-Strike to Flashpoint, as an Athlon XP 2000+ can be had cheap and runs Flashpoint well, even on a uni-student budget (aka: not me), so they are "evolving" their PCs and their gaming requirements.

Anyways, as I said earlier I would pay $$$$ lots for a well threaded OFP:R Server that runs on SuSe or Win2K/XP....I can not really justify spending $$$$ lots on VBS1 though.

(Can anyone confirm or deny if VBS1 Server is threaded any better than Resistance 1.96RC Server ?)

That VolanoMark is bloody weird.

All we need now is an OFP:R-196RC-ServerMark to confirm if it really is that memory intensive.

I have a slightly different explanation for the VolanoMark scores though. The 2 logical CPUs on the Northwood are competing for the same cache and same L2 cache when running Java, which really hurts performance. The same is true on UltraSPARC chips, thus the need for 8MB caches and more 'real' (not logical) CPUs when using good old "efficient" (sarcasm) Java.

As Intel stated, the Prescott has double the cache, and other features that 'help', thus when the 2 logical CPUs are competing against each other the extra cache means more cache hits, and performance does not drop as much.

(Quote: Yes, its results were better with HT disabled but when enabling HT we only saw a decrease in performance in the neighborhood of 12%. The Northwood-based processor suffered a 35% decrease in performance. Morbid, but true.)

I know Flashpoint uses a few scripts here and there, but the AI code can (and should) be threaded. Hell, 64 squads of 12 soldiers each could use 64 threads which do not rely on results from previous calculations to process, thus they can be run in unison... (maybe even 64x12 threads, except there is a bit of overhead there, surely)

Quote: "I know Flashpoint uses a few scripts here and there, but the AI code can (and should) be threaded. Hell, 64 squads of 12 soldiers each could use 64 threads which do not rely on results from previous calculations to process, thus they can be run in unison... (maybe even 64x12 threads, except there is a bit of overhead there, surely)"

This is something which looks nice and easy on paper, but a real implementation is much more difficult. All squads live in the same world and therefore use the same "objects". Proper "object locking" needs to be implemented, which is a huge task.
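A very small sketch of what "object locking" means here, assuming one thread per squad and a single shared world object (an ammo crate invented for the example). It has nothing to do with how the OFP engine is actually written, and Python is used only to show the pattern: every object that more than one squad can touch needs this kind of protection, which is where the work piles up.

```python
import threading


class AmmoCrate:
    """A shared world object that several squads may use at once."""

    def __init__(self, rounds: int) -> None:
        self.rounds = rounds
        self._lock = threading.Lock()

    def take(self, wanted: int) -> int:
        # The check and the update must happen as one step, hence the lock.
        with self._lock:
            taken = min(wanted, self.rounds)
            self.rounds -= taken
            return taken


def run_squads(squads: int, soldiers: int, crate: AmmoCrate) -> int:
    """One thread per squad; every soldier tries to take one round."""
    totals = [0] * squads  # each thread writes only its own slot

    def squad_ai(index: int) -> None:
        for _ in range(soldiers):
            totals[index] += crate.take(1)

    threads = [threading.Thread(target=squad_ai, args=(i,))
               for i in range(squads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)


if __name__ == "__main__":
    crate = AmmoCrate(rounds=500)
    handed_out = run_squads(squads=64, soldiers=12, crate=crate)
    print(handed_out, crate.rounds)  # 500 0
```

With 64x12 = 768 requests against 500 rounds, the lock guarantees that exactly 500 are handed out and the crate never goes negative.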


"As Intel stated the PreScott has double the cache, and other features that 'help', thus when the 2 logical CPUs are competing against each other the extra cache means more cache hits, thus performance does not drop as much."

Yeah, I have a Prescott and it has 1MB, I think.


But Flashpoint 2 will have a fully threaded server....right.....right.....?

Well, our new server will be an Athlon 64, so I guess I can tell you in a short time if I notice any difference :D

I don't think you'll see any performance gains running a 32-bit app.

Actually, all benchmarks show that you do see a big performance improvement running 32-bit apps on the Athlon 64.


Yeah the way I see it:

If it is HT+SSE2 optimized, the P4 (any core) will do well.

If it is not HT-optimized or doesn't benefit from it, then the Athlon 64/64FX tends to perform faster (plain x87 is over twice as fast as a P4-Northwood's plain FPU - AT EQUAL CLOCK SPEEDS). SSE2 tilts it in Intel's favour, as AMD opted for less strong SSE2 performance, and faster x87 and SSE performance.

Downside is the AMD-64 does not have an HT function.

See my Dhrystone etc. scores above (last page).

That, and with the AMD-64 the pipeline is shorter and there is more L1 cache, 'nuff said.


Just try it, I did... The only lag you may see is a little skipping from any 200+ pingers.

This is a screenshot of our server after we turned hyper-threading on and bound 2 OFP servers and 1 Ventrilo server to cpu0, cpu1, cpu2, and cpu3. I took this during an MFCTI 1.16 game with 18 players, 75 minutes into the game. Not a drop of desync; everyone was at 0. Almost everyone had sub-100 pings.

Click Here
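For anyone wondering what binding to cpu0..cpu3 looks like in numbers: on Windows a process affinity is a bitmask where bit n stands for logical CPU n. The sketch below only does that arithmetic. The layout of which process goes on which CPU is hypothetical, since the post does not say how the three processes were split, and nothing here calls FireDaemon or Windows itself.

```python
def affinity_mask(cpus):
    """Build an affinity bitmask: bit n set means logical CPU n is allowed."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """Inverse of affinity_mask: list the CPU indices set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Hypothetical split over 4 logical CPUs (2 Xeons with HT on).
    layout = {
        "ofp_server_1": [0],
        "ofp_server_2": [2],
        "ventrilo": [1, 3],
    }
    for name, cpus in layout.items():
        print(f"{name}: CPUs {cpus} -> mask {affinity_mask(cpus):#06b}")
```

The same numbers are what you would end up ticking in Task Manager's "Set Affinity" dialog, one checkbox per bit.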


I believe HT works with Win2k, unlike Linux, but I'm sure CPU 0 is maxed. I ran tests last night with my P4 2.8 HT. Suma once said running a server and the game together would be a world of lag issues; not with HT. It did show me the usage can reach 100% with two powerful programs running. Most things are leading up to the fact that the server thread is run on the more powerful CPU0 and not on the HT one, unlike on Linux.

We had zero desync with our old P4 3.3 running Win2k and FD and were quite happy with her till she went missing (long story). She was the best, but now that we're back in the 800 FSB game I plan on making the whore a very busy place with WGL CTI, since she will be combined with www.wglcti.com


What is funny is that we have had 6 or so servers:

Sept 2001 - XP - AMD Athlon 1.4 oc 1.6 (139) - 512 meg SDR - T1

Jan 2002 - XP - Dual AMD MP 1700 (133) - 1 gig DDR - T3

Dec 2002 - XP - P4 2.4 (533) - 1 gig DDR - 5mbit @klanhost

Mar 2003 - XP - P4 3.0 (800) - 1 gig DDR - 5mbit @klanhost <- world record breaker

Aug 2003 - win2k - P4 3.0 oc 3.3 (880) HT - 1 gig DDR - 5mbit

Sept 2003 - win2k/FD - P4 3.0 oc 3.6 (960) HT - 1 gig DDR - 5mbit <- World's fastest OFP server

April 2004 - Linux - Xeon 2.8 HT (533) HT - 2 gig DDR - 100mbit

June 2004 - Linux - P4 3.0 (800) - 1 gig DDR - 100mbit

Late June 2004 - win2k - P4 3.0 (800) - 1 gig DDR - 100mbit

Coming in August 2004:

win2k - P4 3.4 oc 3.96 (880) - 1 gig DDR - 100mbit

(I doubt the chip will function at 100% CPU usage past a 10% oc)

Right back to the P4s with their raw power, and Win2k with its stability and support that Linux just cannot offer.

I don't know why I even tested Linux; guess I'll mark it off as a "moment".

Linux is great for web hosting; after that I hope Win2k web editions KILLS it off.

HOME OF THE FASTEST OFP SERVER IN THE WORLD

Regardless of CPUs, and since we just got rid of our identical server :) I think anyone's P4 3.0 HT with 800 FSB will wax it, which is why after running dual Xeons for 3 months we're switching back to P4s.

Both are strong but the P4 will win hands down.

Our old P4 has the record for connects and players in game:

141 player connects

117 in game

Frag house was 102 in game and connects

:P
