PerfSuite Default PAPI-Derived Metrics

The following derived performance counter metrics are currently computed by the PerfSuite XSL stylesheet and other post-processing tools. The list is being expanded, and additions, corrections, and suggestions are welcome. Users are also free to define arbitrary metrics of their own.
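Each entry in the table is an arithmetic expression over raw PAPI preset counts, so a user-defined metric can be as simple as a named function of those counts. The Python sketch below shows one way to do this outside PerfSuite. The `METRICS` table and the `evaluate` helper are hypothetical and are not part of PerfSuite, and the counter values are invented. Not every preset is available on every processor, so a metric whose counter is missing, or whose denominator is zero, evaluates to `None` instead of aborting the whole report.

```python
from typing import Callable, Dict, Mapping, Optional

Counters = Mapping[str, float]

# Hypothetical user-defined metrics: each maps a display name to a function
# of raw PAPI preset counts. These reuse formulas from the metric table.
METRICS: Dict[str, Callable[[Counters], float]] = {
    "Graduated instructions per cycle":
        lambda c: c["PAPI_TOT_INS"] / c["PAPI_TOT_CYC"],
    "Percentage of cycles with no instruction issue":
        lambda c: 100.0 * (c["PAPI_STL_ICY"] / c["PAPI_TOT_CYC"]),
    "Ratio of mispredicted to correctly predicted branches":
        lambda c: c["PAPI_BR_MSP"] / c["PAPI_BR_PRC"],
}


def evaluate(counters: Counters) -> Dict[str, Optional[float]]:
    """Evaluate every metric; None if a counter is missing or a denominator is zero."""
    results: Dict[str, Optional[float]] = {}
    for name, formula in METRICS.items():
        try:
            results[name] = formula(counters)
        except (KeyError, ZeroDivisionError):
            # The preset was not collected on this run, or it counted zero events.
            results[name] = None
    return results


if __name__ == "__main__":
    # Made-up counts for illustration only; no branch counters were collected.
    sample = {"PAPI_TOT_INS": 3.0e9, "PAPI_TOT_CYC": 2.0e9, "PAPI_STL_ICY": 5.0e8}
    for name, value in evaluate(sample).items():
        print(f"{name}: {value}")
```

Returning `None` per metric keeps one unavailable preset from hiding all the metrics that could still be computed.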

The complete PAPI preset listing is available at:
http://icl.cs.utk.edu/projects/papi/files/html_man/papi_presets.html

Metric	Formula
Instructions
Graduated instructions per cycle	PAPI_TOT_INS/PAPI_TOT_CYC
Issued instructions per cycle	PAPI_TOT_IIS/PAPI_TOT_CYC
Graduated floating point instructions per cycle	PAPI_FP_INS/PAPI_TOT_CYC
Percentage floating point instructions	100.0 * (PAPI_FP_INS/PAPI_TOT_INS)
Ratio of graduated instructions to issued instructions	PAPI_TOT_INS/PAPI_TOT_IIS
Percentage of cycles with no instruction issue	100.0 * (PAPI_STL_ICY/PAPI_TOT_CYC)
Data references per instruction	PAPI_L1_DCA/PAPI_TOT_INS
Ratio of floating point instructions to L1 data cache accesses	PAPI_FP_INS/PAPI_L1_DCA
Ratio of floating point instructions to L2 cache accesses (data)	PAPI_FP_INS/PAPI_L2_DCA
Issued instructions per L1 instruction cache miss	PAPI_TOT_IIS/PAPI_L1_ICM
Graduated instructions per L1 instruction cache miss	PAPI_TOT_INS/PAPI_L1_ICM
L1 instruction cache miss ratio	PAPI_L2_ICR/PAPI_L1_ICR
Cache & Memory Hierarchy
Graduated loads & stores per cycle	PAPI_LST_INS/PAPI_TOT_CYC
Graduated loads & stores per floating point instruction	PAPI_LST_INS/PAPI_FP_INS
L1 cache line reuse (data)	((PAPI_LST_INS - PAPI_L1_DCM) / PAPI_L1_DCM)
L1 cache data hit rate	1.0 - (PAPI_L1_DCM/PAPI_LST_INS)
L1 data cache miss ratio	PAPI_L1_DCM/PAPI_L1_DCA
L2 cache line reuse (data)	((PAPI_L1_DCM - PAPI_L2_DCM) / PAPI_L2_DCM)
L2 cache data hit rate	1.0 - (PAPI_L2_DCM/PAPI_L1_DCM)
L2 cache miss ratio	PAPI_L2_TCM/PAPI_L2_TCA
L3 cache line reuse (data)	((PAPI_L2_DCM - PAPI_L3_DCM) / PAPI_L3_DCM)
L3 cache data hit rate	1.0 - (PAPI_L3_DCM/PAPI_L2_DCM)
L3 data cache miss ratio	PAPI_L3_DCM/PAPI_L3_DCA
L3 cache data read ratio	PAPI_L3_DCR/PAPI_L3_DCA
L3 cache instruction miss ratio	PAPI_L3_ICM/PAPI_L3_ICR
Bandwidth used (Lx cache, MB/s)	((PAPI_Lx_TCM * Lx_linesize(bytes)) / PAPI_TOT_CYC) * Clock(MHz)
Branching
Ratio of mispredicted to correctly predicted branches	PAPI_BR_MSP/PAPI_BR_PRC
Processor Stalls
Percentage of cycles waiting for memory access	100.0 * (PAPI_MEM_SCY/PAPI_TOT_CYC)
Percentage of cycles stalled on any resource	100.0 * (PAPI_RES_STL/PAPI_TOT_CYC)
Aggregate Performance
MFLOPS (CPU cycles)	(PAPI_FP_INS/PAPI_TOT_CYC) * Clock(MHz)
MFLOPS (effective)	PAPI_FP_INS/Wallclock time(usec)
MIPS (CPU cycles)	(PAPI_TOT_INS/PAPI_TOT_CYC) * Clock(MHz)
MIPS (effective)	PAPI_TOT_INS/Wallclock time(usec)
Processor utilization	PAPI_TOT_CYC / (Clock(MHz) * Wallclock time(usec))
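The aggregate and bandwidth rows mix event counts, clock rates, and wall-clock time, so their units are easy to get wrong. The Python sketch below is a worked example under stated assumptions. Wall-clock time is in microseconds, so a count divided by it is already in millions per second. The clock rate is in MHz. The L2 line size of 128 bytes is hypothetical, and every counter value is made up. The sketch computes utilization as PAPI_TOT_CYC / (Clock(MHz) * Wallclock time(usec)), which is the number of counted cycles divided by the number of cycles that elapse during the run. The `Run` class and function names are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    """Raw inputs for one measured run; the field names are hypothetical."""
    tot_cyc: float    # PAPI_TOT_CYC
    tot_ins: float    # PAPI_TOT_INS
    fp_ins: float     # PAPI_FP_INS
    l2_tcm: float     # PAPI_L2_TCM
    clock_mhz: float  # processor clock rate in MHz
    wall_usec: float  # elapsed wall-clock time in microseconds
    l2_linesize: int  # L2 line size in bytes (platform-specific assumption)


def mflops_cpu(r: Run) -> float:
    # FP instructions per cycle times 10^6 cycles per second.
    return (r.fp_ins / r.tot_cyc) * r.clock_mhz


def mflops_effective(r: Run) -> float:
    # A count per microsecond equals millions per second.
    return r.fp_ins / r.wall_usec


def mips_cpu(r: Run) -> float:
    return (r.tot_ins / r.tot_cyc) * r.clock_mhz


def mips_effective(r: Run) -> float:
    return r.tot_ins / r.wall_usec


def utilization(r: Run) -> float:
    # Counted cycles divided by cycles available during the elapsed time.
    return r.tot_cyc / (r.clock_mhz * r.wall_usec)


def l2_bandwidth_mb_s(r: Run) -> float:
    # Bytes per cycle times 10^6 cycles per second gives MB/s (1 MB = 10^6 bytes).
    return (r.l2_tcm * r.l2_linesize / r.tot_cyc) * r.clock_mhz


if __name__ == "__main__":
    # Made-up numbers: a 1000 MHz processor that counts 1.5e9 of the
    # 2.0e9 cycles elapsing in 2 seconds of wall-clock time.
    run = Run(tot_cyc=1.5e9, tot_ins=3.0e9, fp_ins=1.2e9, l2_tcm=1.0e7,
              clock_mhz=1000.0, wall_usec=2.0e6, l2_linesize=128)
    print(f"MFLOPS (CPU cycles): {mflops_cpu(run):.1f}")
    print(f"MFLOPS (effective):  {mflops_effective(run):.1f}")
    print(f"Utilization:         {utilization(run):.2f}")
    print(f"L2 bandwidth (MB/s): {l2_bandwidth_mb_s(run):.1f}")
    # The effective rate equals the cycle-based rate scaled by utilization.
    assert math.isclose(mflops_effective(run), mflops_cpu(run) * utilization(run))
```

With these units, the effective rate is always the cycle-based rate multiplied by utilization: PAPI_FP_INS/Wallclock time(usec) = (PAPI_FP_INS/PAPI_TOT_CYC) * Clock(MHz) * PAPI_TOT_CYC / (Clock(MHz) * Wallclock time(usec)). A large gap between the "CPU cycles" and "effective" figures therefore points to low processor utilization rather than to slow instruction throughput.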

PerfSuite
perfsuite@ncsa.uiuc.edu
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign