
How (Not) to Generate Misleading Performance Results for Servers


November 11, 2015, ARM TechCon, Santa Clara, CA - Markus Levy from EEMBC talked about the need for care in reviewing benchmark results. Even though many benchmarks are easy to apply and provide hard numbers, they may not be appropriate for the systems being measured. EEMBC has been in the standard benchmark arena for 18 years as a non-profit consortium that defines and develops application-specific benchmarks.

Traditional server performance measures tend to be single-threaded programs that represent databases, compilers, and interpreters on one or a few machines. CPU and memory benchmarks of this kind include LINPACK, SPECint, LMbench, CoreMark, and others. The problem, however, is that these benchmarks do not represent the workloads of big data and cloud operations.

For example, SPECint is a mix of cache-friendly and memory-intensive applications that focuses on CPU (scalar) performance. There is no I/O or hypervisor impact, and no sharing or communication. An alternative is MultiBench, which is similar to SPECint except that it includes the OS and cooperative tasks.

Transaction-oriented benchmarks are also not suitable for cloud and big data. For example, TPC includes the system overhead, but it requires big systems and can be large and expensive to set up. SPECjbb requires Java, and the JVM can make a big difference in the measurements; it also uses a transaction model.

Other benchmarks include SPEC OSG, which addresses some components of cloud environments such as SaaS, PaaS, and IaaS. It is geared toward hardware and cloud providers and measures characteristics such as agility, provisioning, and elasticity. The EPFL CloudSuite uses specific sets of workloads and does not specifically address SaaS, PaaS, or IaaS. It is good for academic evaluations, but is not designed for ease of use, verification, or validity.

A further problem is that the server configuration alone can lead to significantly different instruction miss rates. Big data and cloud workloads have widely differing datasets, and their architectures are segmented for special purposes such as web serving, databases, caching layers, map-reduce clusters, and so on. In addition, the hardware will be configured for a particular application, with varying memory size, optimization for scalar performance or throughput, storage capacity, and hardware accelerators. All of these factors change benchmark performance.

EEMBC has released ScaleMark to address cloud and big data workloads. Its internal parameters are well defined, and the tests are repeatable and verifiable. Memcached is used in data centers to optimize performance and energy use, and the EEMBC version defines specific parameters to ensure repeatability and verifiability.
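To make the workload concrete, the following minimal sketch shows the kind of key-value set/get traffic such a benchmark exercises. The pymemcache client library and a local memcached server on the default port are assumptions for illustration, not part of the EEMBC benchmark itself.

    # Illustrative only: a single SET and GET against a local memcached server.
    # Assumes the third-party pymemcache package and memcached running on port 11211.
    from pymemcache.client.base import Client

    client = Client(("localhost", 11211))
    client.set("user:42", b"x" * 32)   # store a 32 B value
    value = client.get("user:42")      # read it back
    print(len(value), "bytes returned")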

The EEMBC benchmarks use selected parameters that represent a strict definition of realistic workloads and translate to real application performance. The benchmarks use a standardized key-value store implementation and a standardized server code base. The load generator and protocols, as well as the workload parameters, are standardized for the number of connections, key/value sizes and distributions, inter-arrival (IA) distribution, and set/get ratios.
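A rough sketch of how such standardized workload parameters might be captured in code is shown below. The field names and default values are hypothetical, chosen only to mirror the parameters listed in the article, not EEMBC's actual configuration format.

    # Hypothetical workload-parameter record; names and values are illustrative.
    from dataclasses import dataclass

    @dataclass
    class WorkloadConfig:
        connections: int = 1024        # number of simultaneous client connections
        key_size_bytes: int = 32       # key size, or drawn from a standardized distribution
        value_size_bytes: int = 256    # value size, likewise distribution-driven in practice
        set_get_ratio: float = 0.03    # fraction of requests that are SETs rather than GETs
        inter_arrival_ms: float = 0.1  # mean inter-arrival (IA) time between requests

    config = WorkloadConfig()
    print(config)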

The reason these specific parameters matter is that measurements such as queries per second (QPS) can vary over a 3:1 range depending on the value distributions. The distributions for the benchmarks are based on the SIGMETRICS 2012 research paper "Workload Analysis of a Large-Scale Key-Value Store" by Atikoglu et al. The resulting ETC profiles are used to set all workload parameters, and the resulting benchmarks match real-world implementations.
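The sensitivity to value distribution is easy to see with a toy calculation: a heavy-tailed value-size distribution moves far more bytes per GET than a fixed small value, so the same server sustains fewer queries per second. The distributions below are illustrative stand-ins, not the ETC profiles from the paper.

    # Illustrative only: why the value-size distribution drives QPS.
    import random

    def fixed_32b():
        return 32                      # every value is exactly 32 B

    def heavy_tailed():
        # Pareto-like spread of value sizes, capped at 1 MB; parameters are made up
        return min(int(random.paretovariate(1.5) * 64), 1_000_000)

    def mean_bytes_per_get(sample, n=100_000):
        return sum(sample() for _ in range(n)) / n

    for name, dist in (("fixed 32 B", fixed_32b), ("heavy-tailed", heavy_tailed)):
        print(f"{name}: ~{mean_bytes_per_get(dist):.0f} bytes per GET on average")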

The benchmark results can compare servers of different architectures running the same source code under the same operating conditions. Testing has shown that a configuration may achieve its best throughput at a different number of connections for 32 B GET requests. In another test, latency also varied with the number of connections, with notable and significant differences between 1024 and 2048 connections for 32 B GET results.
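A sketch of this comparison methodology is given below: sweep the connection count and record throughput and latency for each server under identical conditions, then compare servers only at matching connection counts. The load-generator function itself is assumed and not provided here.

    # Sketch only: sweep connection counts and collect (QPS, latency) per count.
    def sweep_connections(run_load, server, counts=(512, 1024, 1536, 2048)):
        """run_load(server, connections) is assumed to drive 32 B GET traffic and
        return (queries_per_second, p99_latency_ms) for that connection count."""
        results = {}
        for n in counts:
            qps, p99_ms = run_load(server, n)
            results[n] = {"qps": qps, "p99_ms": p99_ms}
        return results

    # Comparing two servers is only meaningful at matching connection counts:
    # results_a = sweep_connections(my_load_generator, "server-a")
    # results_b = sweep_connections(my_load_generator, "server-b")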

In fact, both servers achieved their best throughput at 1536 connections, and both achieved less than 10 ms latency, with under 1 ms at maximum throughput. They found that it is crucial to test under exactly the same conditions to avoid hitting a sweet spot or sore spot on one machine. They also found that it is possible to optimize memcached to improve performance on either system.
 

