Dwarden

Check your CPU>GPU GPU<CPU bandwidth etc.


The point of this thread is to help those who have issues on that front (SMtoVRAM) or various stutters (DPC spikes etc.).

Transfer speeds from GPU to CPU and vice versa may be very important to overall game smoothness (texture transfers etc.).

Download the PCIe speed test from AMD/ATI:

http://download2-developer.amd.com/amd/GPU/executables/PCIeSpeedTest_v0.2.zip

Make sure you are not running any other 3D application or any crucial work (in some cases VPU Recovery may restart the GPU subsystem).

This test works with Intel and AMD CPUs and AMD/ATI cards (unsure about NVIDIA, but most likely not).

And something more 'real life like' (especially for the ArmA 2 situation):

D3D Bandwidth Test from Kegetys:

http://www.kegetys.net/dl.php/D3Dbandwidth.zip

Another test is a bit older and its values are way different (it's OpenGL only), yet it may give you some values to compare:

http://www.pbernert.com/pete_fbread_benchmark.zip

The idea for this thread comes from the way ArmA 2 works and was confirmed e.g. by Hovora in that 'not so great test by some Czech reviewer' :)

You can compare results or expand the discussion in other forums, for example:

http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=110351

http://www.xtremesystems.org/forums/showthread.php?t=225823

http://forum.beyond3d.com/showthread.php?t=54321

http://www.rage3d.net/board/showthread.php?t=33946670

Another important factor to watch is DPC (Deferred Procedure Call) latency.

download http://www.thesycon.de/dpclat/dpclat.exe

read more http://www.thesycon.de/eng/latency_check.shtml

On an ideal idle system the latencies are <20 µs,

and on most systems, depending on the number of applications / workload and various drivers, a stable <75 µs,

with high spikes only when something intensive is going on (e.g. starting a new application or doing CPU- or I/O-intensive operations).

If your system shows different values or random / looping spikes, then it is going to have performance issues with real-time data streams.
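
A rough Python sketch of the rule of thumb above, for readings typed in by hand from the latency checker. The 20 µs and 75 µs limits come from this post; the function name, the sample lists, and the "one spike is tolerable, several are not" cut-off are assumptions made for illustration only.

```python
from typing import Iterable

IDLE_IDEAL_US = 20   # "ideal idle system" bound quoted in the post
LOADED_OK_US = 75    # "most systems, stable" bound quoted in the post


def classify_dpc(samples_us: Iterable[float]) -> str:
    """Classify a run of DPC latency readings given in microseconds."""
    samples = list(samples_us)
    if not samples:
        raise ValueError("need at least one sample")
    worst = max(samples)
    if worst < IDLE_IDEAL_US:
        return "ideal"
    if worst < LOADED_OK_US:
        return "ok"
    # Assumption: a single reading over the bound is treated as an
    # isolated spike, more than one as a recurring problem.
    spikes = sum(1 for s in samples if s >= LOADED_OK_US)
    return "occasional spike" if spikes == 1 else "recurring spikes"


if __name__ == "__main__":
    print(classify_dpc([3, 4, 22]))      # idle readings like those posted later
    print(classify_dpc([80, 85, 75]))    # hovering around 75-85 µs
```

Treat the labels as a starting point for comparing systems, not as a diagnosis.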

Edited by Dwarden


Max scores PCIe bandwidth

CPU->GPU 4.9 GB/s

GPU->CPU 3.1 GB/s

Max DPC latency 120µs

Don't know who can get <20µs.... Will test by disabling Network and Sound.

Edited by Dwarden


Vista 64

Q9550 @3.4 GHz

HD4870 X2

8 GB RAM

CPU>GPU peak: 4.743 GB/s.

GPU>CPU peak: 6.583 GB/s.

It would've been nice if the PCIe speed test distributed the workload to all 4 cores instead of one at 100%. Think it would've improved my CPU>GPU transfer rate.

DPC latency hovers around ~75-85 µs.

I don't know if this is bad or not. What I know is ArmA 2 isn't running like it should. My card should be able to handle 1920 x 1200 @ 200% fillrate but it doesn't. I don't think ArmA 2 utilizes Crossfire properly.

Edit: D3D Bandwidth Test link doesn't work. It just sends me to Kegetys' index and I can't find the link anywhere.

Edited by 7


Kegetys link doesn't work.

Vista 64

Q9550 @3.4 GHz

HD4870 X2

8 GB RAM

CPU>GPU peak: 4.743 GB/s.

GPU>CPU peak: 6.583 GB/s.

It would've been nice if the PCIe speed test distributed the workload to all 4 cores instead of one at 100%. Think it would've improved my CPU>GPU transfer rate.

You're pretty much maxing it out already. Haven't seen scores higher.

Edited by Skeptic


Devices found: 1

===> Testing device 0 <===

Device type: RV770

Max resource 2D width/height: 8192/8192

Total GPU memory size: 1024 MB

Total CPU cached space size: 64 MB

Total CPU uncached space size: 128 MB

GPU engine clock: 850 MHz

GPU memory clock: 975 MHz

Number of timing loops: 100

[ 16 bytes] CPU->GPU= 871.466 KB/sec, GPU->CPU 670.641 KB/sec

[ 32 bytes] CPU->GPU= 1.063 MB/sec, GPU->CPU 885.410 KB/sec

[ 64 bytes]

All gibberish to me...Anybody feel free to translate :P

Edited by RAINF


Could someone with knowledge tell me what those different speeds mean?

Pete's framebuffer read speed test

Mail: BlackDove@addcom.de

Homepage: http://home.t-online.de/home/PeteBernert/

Vendor: NVIDIA Corporation

Version: 3.0.0

Renderer: GeForce 9800 GT/PCI/SSE2

Frontbuffer reading speed (back)

---------------------------------

Format: 01 - Speed: 107.195 Mpix/s 321.585 MB/s

Format: 02 - Speed: 67.9608 Mpix/s 271.843 MB/s

Format: 03 - Speed: 35.6797 Mpix/s 107.039 MB/s

Format: 04 - Speed: 39.8459 Mpix/s 159.384 MB/s

Frontbuffer reading speed (front)

---------------------------------

Format: 01 - Speed: 41.0474 Mpix/s 123.142 MB/s

Format: 02 - Speed: 33.5544 Mpix/s 134.218 MB/s

Format: 03 - Speed: 27.2193 Mpix/s 81.6579 MB/s

Format: 04 - Speed: 25.9959 Mpix/s 103.984 MB/s

Second test

Frontbuffer reading speed (back)

---------------------------------

Format: 01 - Speed: 110.363 Mpix/s 331.088 MB/s

Format: 02 - Speed: 109.576 Mpix/s 438.305 MB/s

Format: 03 - Speed: 130.268 Mpix/s 390.804 MB/s

Format: 04 - Speed: 114.797 Mpix/s 459.189 MB/s

Frontbuffer reading speed (front)

---------------------------------

Format: 01 - Speed: 109.795 Mpix/s 329.384 MB/s

Format: 02 - Speed: 109.147 Mpix/s 436.588 MB/s

Format: 03 - Speed: 129.805 Mpix/s 389.415 MB/s

Format: 04 - Speed: 114.732 Mpix/s 458.927 MB/s
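
One thing the numbers themselves give away: dividing the MB/s column by the Mpix/s column yields the bytes moved per pixel. The sketch below does that for the first "back" run above. The helper name is hypothetical, and reading the result as 24-bit versus 32-bit pixel formats is an assumption, since the tool's output does not say what formats 01 to 04 are.

```python
# (Mpix/s, MB/s) pairs copied from the first "back" run above.
FIRST_RUN_BACK = {
    "01": (107.195, 321.585),
    "02": (67.9608, 271.843),
    "03": (35.6797, 107.039),
    "04": (39.8459, 159.384),
}


def bytes_per_pixel(mpix_per_s: float, mb_per_s: float) -> int:
    """Bytes transferred per pixel, rounded to the nearest whole byte."""
    return round(mb_per_s / mpix_per_s)


if __name__ == "__main__":
    for fmt, (mpix, mb) in FIRST_RUN_BACK.items():
        print(f"Format {fmt}: {bytes_per_pixel(mpix, mb)} bytes/pixel")
```

Formats 01 and 03 come out at 3 bytes per pixel and 02 and 04 at 4, so the Mpix/s column is the fairer one to compare between formats.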

Edited by Potatomasher


dpclat.exe, 1 minute length (windows idle):

max: 22 µs

av: 3-4 µs

readtest.exe, second run:

Frontbuffer reading speed (back)

---------------------------------

Format: 01 - Speed: 292.072 Mpix/s 876.216 MB/s

Format: 02 - Speed: 323.052 Mpix/s 1292.21 MB/s

Format: 03 - Speed: 368.662 Mpix/s 1105.99 MB/s

Format: 04 - Speed: 396.536 Mpix/s 1586.15 MB/s

Frontbuffer reading speed (front)

---------------------------------

Format: 01 - Speed: 296.572 Mpix/s 889.717 MB/s

Format: 02 - Speed: 320.384 Mpix/s 1281.53 MB/s

Format: 03 - Speed: 365.997 Mpix/s 1097.99 MB/s

Format: 04 - Speed: 392.692 Mpix/s 1570.77 MB/s

Edited by Dwarden


Frontbuffer reading speed (back)

---------------------------------

Format: 01 - Speed: 213.21 Mpix/s 639.631 MB/s

Format: 02 - Speed: 229.376 Mpix/s 917.504 MB/s

Format: 03 - Speed: 259.588 Mpix/s 778.764 MB/s

Format: 04 - Speed: 280.625 Mpix/s 1122.5 MB/s

Frontbuffer reading speed (front)

---------------------------------

Format: 01 - Speed: 236.76 Mpix/s 710.279 MB/s

Format: 02 - Speed: 255.896 Mpix/s 1023.58 MB/s

Format: 03 - Speed: 296.878 Mpix/s 890.634 MB/s

Format: 04 - Speed: 322.328 Mpix/s 1289.31 MB/s

Edited by diveplane


This also explains why running 200% fillrate on high resolutions kills framerates for most people:

A single frame rendered at 1920x1080 at 200% fillrate will essentially be 3840x2160, or roughly 8.3 megapixels. That means in order to achieve just 25fps, you would need a transfer speed of over 200 Mpix/s. From what I've seen here so far, average transfer speeds seem to be in the 200-350 Mpix/s range, giving people (theoretical) max. framerates of roughly 24-42fps. And that's just by pure bandwidth limits, not even counting the actual rendering time.
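
The arithmetic in the paragraph above, written out as a small Python sketch. It takes the post's premise at face value, namely that every rendered pixel has to be transferred once per frame and that 200% fillrate doubles each axis; the function and parameter names are made up for illustration.

```python
def max_fps(width: int, height: int, fillrate_percent: float,
            readback_mpix_per_s: float) -> float:
    """Upper bound on frames per second if each rendered pixel is
    transferred once per frame at the given rate (premise of the post)."""
    scale = fillrate_percent / 100.0   # 200% -> each axis doubled
    pixels_per_frame = (width * scale) * (height * scale)
    return readback_mpix_per_s * 1_000_000 / pixels_per_frame


if __name__ == "__main__":
    for rate in (200, 350):
        print(f"{rate} Mpix/s -> {max_fps(1920, 1080, 200, rate):.1f} fps")
```

For 200 and 350 Mpix/s this gives about 24 and 42 fps, which is where the range in the post comes from.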

All in all, I think we definitely need proper antialiasing. This fillrate optimizer just isn't a good solution.


These are very useful, but are there any programs that let us optimise based on the findings from these tools?


RAINF wrote:

[ 16 bytes] CPU->GPU= 871.466 KB/sec, GPU->CPU 670.641 KB/sec

[ 32 bytes] CPU->GPU= 1.063 MB/sec, GPU->CPU 885.410 KB/sec

[ 64 bytes]

All gibberish to me...Anybody feel free to translate :P

Let it finish and note max numbers for CPU->GPU= and GPU->CPU=. It might not finish 1Gb transfer to show Peak Value at the end of benchmark.
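
If you save the benchmark output to a text file, picking out the max numbers can be scripted. This is only a sketch: the regular expression is modelled on the two complete lines RAINF posted, so lines in any other shape are skipped, and treating KB/MB/GB as multiples of 1024 is an assumption about the tool. The names `peak_rates` and `SAMPLE` are hypothetical.

```python
import re

# Pattern modelled on the posted lines, e.g.
# "[ 16 bytes] CPU->GPU= 871.466 KB/sec, GPU->CPU 670.641 KB/sec"
LINE = re.compile(
    r"\[\s*\d+\s*bytes\]\s*"
    r"CPU->GPU=\s*([\d.]+)\s*(KB|MB|GB)/sec,\s*"
    r"GPU->CPU=?\s*([\d.]+)\s*(KB|MB|GB)/sec"
)

# Assumption: the tool uses binary multiples.
TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def peak_rates(log_text: str) -> tuple[float, float]:
    """Return (peak CPU->GPU, peak GPU->CPU) in MB/sec."""
    up = down = 0.0
    for m in LINE.finditer(log_text):
        up = max(up, float(m.group(1)) * TO_MB[m.group(2)])
        down = max(down, float(m.group(3)) * TO_MB[m.group(4)])
    return up, down


SAMPLE = """\
[ 16 bytes] CPU->GPU= 871.466 KB/sec, GPU->CPU 670.641 KB/sec
[ 32 bytes] CPU->GPU= 1.063 MB/sec, GPU->CPU 885.410 KB/sec
[ 64 bytes]
"""

if __name__ == "__main__":
    up, down = peak_rates(SAMPLE)
    print(f"peak CPU->GPU {up:.3f} MB/sec, peak GPU->CPU {down:.3f} MB/sec")
```

The truncated `[ 64 bytes]` line is ignored, so an unfinished run still yields the best values seen so far.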

dpclat.exe, 1 minute length (windows idle):

max: 22 µs

av: 3-4 µs

Wow, that's pretty nice latency!


Can someone please explain to me which factors can influence the results of this test?


I'm a little worried that my GPU>CPU is only 212MB/s even though the CPU>GPU is 4.4GB/s. I'm running a core i7 920 OCd at 3.8GHZ, ATI 4870 512Mb and 6Gb RAM on Vista64. Arma2 runs great but I'm getting only 25FPS everywhere (on V HIGH settings)..... anyone got any ideas how to get up in the 4Gb/s GPU>CPU that others are getting ? Or am I fussing too much ?

Edited by Kremator

I'm a little worried that my GPU>CPU is only 212MB/s even though the CPU>GPU is 4.4GB/s. I'm running a core i7 920 OCd at 3.8GHZ, ATI 4870 512Mb and 6Gb RAM on Vista64. Arma2 runs great but I'm getting only 25FPS everywhere (on V HIGH settings)..... anyone got any ideas how to get up in the 4Gb/s GPU>CPU that others are getting ? Or am I fussing too much ?
How many PCI-E cards do you have installed, and what is your motherboard? You may be running only in x4 mode. Then there is the latest BIOS and the newest chipset drivers, etc.
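
A quick way to sanity-check the x4 theory is to compare the measured figure with the link's theoretical ceiling. The per-lane rates below are the standard PCIe 1.x and 2.0 figures (250 and 500 MB/s per lane per direction after 8b/10b encoding); which generation and slot a given board actually uses is an assumption you have to fill in yourself, and the function name is hypothetical.

```python
# Usable bandwidth per lane, per direction, in MB/s (after 8b/10b encoding).
PER_LANE_MB_S = {1: 250, 2: 500}   # PCIe 1.x and PCIe 2.0


def link_ceiling_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-way ceiling of a PCIe link in GB/s (decimal)."""
    return PER_LANE_MB_S[gen] * lanes / 1000


if __name__ == "__main__":
    measured_cpu_to_gpu = 4.4   # GB/s, the CPU>GPU figure quoted above
    for gen, lanes in ((2, 4), (2, 8), (1, 16), (2, 16)):
        ceiling = link_ceiling_gb_s(gen, lanes)
        verdict = "possible" if measured_cpu_to_gpu <= ceiling else "ruled out"
        print(f"PCIe {gen}.x x{lanes}: {ceiling:.1f} GB/s ceiling -> {verdict}")
```

By this estimate a 4.4 GB/s upload is already above what an x4 or x8 PCIe 2.0 link could carry, which suggests the low GPU>CPU figure has some other cause than lane width. Real transfers stay below the ceiling because of protocol overhead, so read it as a rough bound only.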


Just the one card and mobo is an ASUS P6T Deluxe... gonna check that I'm not running in 4x mode. Will check drivers etc .... but it DOES seem a little on the low side !

Let it finish and note max numbers for CPU->GPU= and GPU->CPU=. It might not finish 1Gb transfer to show Peak Value at the end of benchmark.

Wow, that's pretty nice latency!

maybe my damn old p35 board :)

Can someone please explain to me which factors can influence the results of this test?

I'd say Motherboard has an important role (chipset, north and southbridge).

CPU, FSB and memory.

As for DPC Latency Checker:

Unfortunately, many existing device drivers do not conform to this advice. Such drivers spend an excessive amount of time in their DPC routines, causing an exceptional large latency for any other driver's DPCs. For a device driver that handles data streams in real-time it is crucial that a DPC scheduled from its interrupt routine is executed before the hardware issues the next interrupt. If the DPC is delayed and runs after the next interrupt occurred, typically a hardware buffer overrun occurs and the flow of data is interrupted. A drop-out occurs.


I run the sim across two 19-inch monitors at 2304x864.

Not bad performance for me; yet to test multiplayer though.


Peak:

Cpu->GPU: 2.9 GB/s

GPU->CPU: 840 MB/s

:eek::j:


Frontbuffer reading speed (back)

---------------------------------

Format: 01 - Speed: 163.425 Mpix/s 490.275 MB/s

Format: 02 - Speed: 174.348 Mpix/s 697.39 MB/s

Format: 03 - Speed: 200.649 Mpix/s 601.948 MB/s

Format: 04 - Speed: 213.189 Mpix/s 852.754 MB/s

Frontbuffer reading speed (front)

---------------------------------

Format: 01 - Speed: 160.738 Mpix/s 482.214 MB/s

Format: 02 - Speed: 173.386 Mpix/s 693.546 MB/s

Format: 03 - Speed: 195.887 Mpix/s 587.661 MB/s

Format: 04 - Speed: 215.81 Mpix/s 863.24 MB/s


These are my results:

Frontbuffer reading speed (back)

---------------------------------

Format: 01 - Speed: 230.04 Mpix/s 690.12 MB/s

Format: 02 - Speed: 248.906 Mpix/s 995.625 MB/s

Format: 03 - Speed: 285.041 Mpix/s 855.124 MB/s

Format: 04 - Speed: 320.78 Mpix/s 1283.12 MB/s

Frontbuffer reading speed (front)

---------------------------------

Format: 01 - Speed: 231.259 Mpix/s 693.776 MB/s

Format: 02 - Speed: 247.256 Mpix/s 989.026 MB/s

Format: 03 - Speed: 283.46 Mpix/s 850.379 MB/s

Format: 04 - Speed: 319.736 Mpix/s 1278.94 MB/s

Second test (in the first one ArmA 2 was running in the background):

Frontbuffer reading speed (back)

---------------------------------

Format: 01 - Speed: 232.381 Mpix/s 697.142 MB/s

Format: 02 - Speed: 249.063 Mpix/s 996.252 MB/s

Format: 03 - Speed: 281.972 Mpix/s 845.917 MB/s

Format: 04 - Speed: 322.63 Mpix/s 1290.52 MB/s

Frontbuffer reading speed (front)

---------------------------------

Format: 01 - Speed: 231.118 Mpix/s 693.353 MB/s

Format: 02 - Speed: 247.169 Mpix/s 988.677 MB/s

Format: 03 - Speed: 284.78 Mpix/s 854.34 MB/s

Format: 04 - Speed: 323.457 Mpix/s 1293.83 MB/s

Edited by Richieb0y


Vendor: ATI Technologies Inc.

Version: 2.1.8674

Renderer: ATI Radeon HD 4870 X2

---------------------------------

OpenGL extensions:

GL_AMDX_vertex_shader_tessellator GL_AMD_draw_buffers_blend GL_AMD_performance_monitor GL_AMD_texture_texture4 GL_ARB_color_buffer_float GL_ARB_copy_buffer GL_ARB_depth_buffer_float GL_ARB_depth_texture GL_ARB_draw_buffers GL_ARB_draw_instanced GL_ARB_fragment_program GL_ARB_fragment_program_shadow GL_ARB_fragment_shader GL_ARB_framebuffer_object GL_ARB_framebuffer_sRGB GL_ARB_half_float_pixel GL_ARB_half_float_vertex GL_ARB_instanced_arrays GL_ARB_map_buffer_range GL_ARB_multisample GL_ARB_multitexture GL_ARB_occlusion_query GL_ARB_pixel_buffer_object GL_ARB_point_parameters GL_ARB_point_sprite GL_ARB_shader_objects GL_ARB_shader_texture_lod GL_ARB_shading_language_100 GL_ARB_shadow GL_ARB_shadow_ambient GL_ARB_texture_border_clamp GL_ARB_texture_buffer_object GL_ARB_texture_compression GL_ARB_texture_compression_rgtc GL_ARB_texture_cube_map GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_float GL_ARB_texture_mirrored_repeat GL_ARB_texture_non_power_of_two GL_ARB_texture_rectangle GL_ARB_texture_rg GL_ARB_texture_snorm GL_ARB_transpose_matrix GL_ARB_vertex_array_object GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_window_pos GL_ATI_draw_buffers GL_ATI_envmap_bumpmap GL_ATI_fragment_shader GL_ATI_meminfo GL_ATI_separate_stencil GL_ATI_texture_compression_3dc GL_ATI_texture_env_combine3 GL_ATI_texture_float GL_ATI_texture_mirror_once GL_EXT_abgr GL_EXT_bgra GL_EXT_bindable_uniform GL_EXT_blend_color GL_EXT_blend_equation_separate GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_compiled_vertex_array GL_EXT_copy_buffer GL_EXT_copy_texture GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_draw_range_elements GL_EXT_fog_coord GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXT_framebuffer_object GL_EXT_framebuffer_sRGB GL_EXT_gpu_program_parameters GL_EXT_gpu_shader4 GL_EXT_multi_draw_arrays GL_EXT_packed_depth_stencil 
GL_EXT_packed_float GL_EXT_packed_pixels GL_EXT_point_parameters GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_specular_color GL_EXT_shadow_funcs GL_EXT_stencil_wrap GL_EXT_subtexture GL_EXT_texgen_reflection GL_EXT_texture3D GL_EXT_texture_array GL_EXT_texture_buffer_object GL_EXT_texture_compression_latc GL_EXT_texture_compression_rgtc GL_EXT_texture_compression_s3tc GL_EXT_texture_cube_map GL_EXT_texture_edge_clamp GL_EXT_texture_env_add GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_EXT_texture_filter_anisotropic GL_EXT_texture_integer GL_EXT_texture_lod GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp GL_EXT_texture_object GL_EXT_texture_rectangle GL_EXT_texture_sRGB GL_EXT_texture_shared_exponent GL_EXT_texture_snorm GL_EXT_texture_swizzle GL_EXT_transform_feedback GL_EXT_vertex_array GL_IBM_texture_mirrored_repeat GL_KTX_buffer_region GL_NV_blend_square GL_NV_conditional_render GL_NV_copy_depth_to_color GL_NV_primitive_restart GL_NV_texgen_reflection GL_SGIS_generate_mipmap GL_SGIS_texture_edge_clamp GL_SGIS_texture_lod GL_SUN_multi_draw_arrays GL_WIN_swap_hint WGL_EXT_swap_control

---------------------------------

Frontbuffer reading speed (back)

---------------------------------

Format: 01 - Speed: 11.7965 Mpix/s 35.3894 MB/s

Format: 02 - Speed: 43.7776 Mpix/s 175.11 MB/s

Format: 03 - Speed: 4.10686 Mpix/s 12.3206 MB/s

Format: 04 - Speed: 46.0718 Mpix/s 184.287 MB/s

Frontbuffer reading speed (front)

---------------------------------

Format: 01 - Speed: 4.95889 Mpix/s 14.8767 MB/s

Format: 02 - Speed: 4.86901 Mpix/s 19.476 MB/s

Format: 03 - Speed: 4.93705 Mpix/s 14.8111 MB/s

Format: 04 - Speed: 4.87151 Mpix/s 19.486 MB/s

?????

latency 2-4ms on XP and 120 on Win7???

Edited by =S= Den


BTW, rely more on the D3D test from Kegetys, as Pete's one is OpenGL based ...

His test is a more real-life scenario for a DX game ...

There is also a CUDA-based test (part of the CUDA SDK), but that's similar to the AMD test,

and they offer faster transfers than the DX test from Kegetys.

Anyway, keep posting results, and if you need to compare with more machines, take a look at all the topics I linked ...

If you have something I should add or update, send me a private message and I will update the first post.

