PerfSuite Default PAPI-Derived Metrics

The derived performance counter metrics currently computed by the PerfSuite XSL stylesheet and other post-processing tools are listed below. The list is being expanded, and additions, corrections, and suggestions are welcome. Users are also free to define arbitrary metrics of their own.
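Below is a minimal sketch of how a user-defined metric might be evaluated. It assumes the raw event totals are already available as a Python dictionary keyed by PAPI preset name. The `evaluate_metric` function and the `counts` layout are hypothetical illustrations, not part of PerfSuite. The function accepts only numbers, counter names, parentheses, and the operators +, -, * and /. Terms such as Clock(MHz) or Wallclock time would therefore first have to be rewritten as plain names, for example a hypothetical CLOCK_MHZ entry in `counts`.

```python
from __future__ import annotations

import ast
import operator

# Hypothetical evaluator for user-defined metrics written in the same notation
# as the formulas in this list, e.g. "100.0 * (PAPI_FP_INS/PAPI_TOT_INS)".
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_metric(formula: str, counts: dict[str, float]) -> float:
    """Evaluate a formula built from numbers, counter names, parentheses and
    + - * /. Anything else (function calls, attributes, etc.) is rejected."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.Name):
            # Raises KeyError if the counter was not collected in this run.
            return float(counts[node.id])
        raise ValueError(f"unsupported syntax in metric formula: {formula!r}")

    return walk(ast.parse(formula, mode="eval"))


# Made-up totals for illustration only.
counts = {"PAPI_LST_INS": 1_000_000, "PAPI_L1_DCM": 40_000,
          "PAPI_FP_INS": 250_000, "PAPI_TOT_INS": 1_000_000}
print(evaluate_metric("100.0 * (PAPI_FP_INS/PAPI_TOT_INS)", counts))
print(evaluate_metric("((PAPI_LST_INS - PAPI_L1_DCM) / PAPI_L1_DCM)", counts))
```

Parsing with the `ast` module instead of calling `eval` keeps arbitrary Python code out of metric definitions.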

The complete PAPI preset listing is available at:
http://icl.cs.utk.edu/projects/papi/files/html_man/papi_presets.html

Metric: Formula
Instructions
Graduated instructions per cycle: PAPI_TOT_INS/PAPI_TOT_CYC
Issued instructions per cycle: PAPI_TOT_IIS/PAPI_TOT_CYC
Graduated floating point instructions per cycle: PAPI_FP_INS/PAPI_TOT_CYC
Percentage of floating point instructions: 100.0 * (PAPI_FP_INS/PAPI_TOT_INS)
Ratio of graduated instructions to issued instructions: PAPI_TOT_INS/PAPI_TOT_IIS
Percentage of cycles with no instruction issue: 100.0 * (PAPI_STL_ICY/PAPI_TOT_CYC)
Data references per instruction: PAPI_L1_DCA/PAPI_TOT_INS
Ratio of floating point instructions to L1 data cache accesses: PAPI_FP_INS/PAPI_L1_DCA
Ratio of floating point instructions to L2 cache accesses (data): PAPI_FP_INS/PAPI_L2_DCA
Issued instructions per L1 instruction cache miss: PAPI_TOT_IIS/PAPI_L1_ICM
Graduated instructions per L1 instruction cache miss: PAPI_TOT_INS/PAPI_L1_ICM
L1 instruction cache miss ratio: PAPI_L2_ICR/PAPI_L1_ICR
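The instruction metrics above are plain ratios of counter totals. The "Percentage" rows are the same kind of ratio multiplied by 100.0. The following Python sketch evaluates a few of them, assuming the totals are supplied as a dictionary keyed by PAPI preset name. The helper names are hypothetical. Not every preset is available on every processor, so the sketch returns None when a metric's events are missing or its denominator is zero instead of raising an error.

```python
from __future__ import annotations


def ratio(counts: dict[str, float], num: str, den: str) -> float | None:
    """counts[num] / counts[den], or None if an event is missing or den is zero."""
    if num not in counts or den not in counts or counts[den] == 0:
        return None
    return counts[num] / counts[den]


def percent(value: float | None) -> float | None:
    """Scale a ratio by 100.0, passing None through unchanged."""
    return None if value is None else 100.0 * value


def instruction_metrics(counts: dict[str, float]) -> dict[str, float | None]:
    # Hypothetical subset of the "Instructions" metrics.
    return {
        "graduated_ipc": ratio(counts, "PAPI_TOT_INS", "PAPI_TOT_CYC"),
        "issued_ipc": ratio(counts, "PAPI_TOT_IIS", "PAPI_TOT_CYC"),
        "graduated_per_issued": ratio(counts, "PAPI_TOT_INS", "PAPI_TOT_IIS"),
        "pct_fp_ins": percent(ratio(counts, "PAPI_FP_INS", "PAPI_TOT_INS")),
        "pct_no_issue_cycles": percent(ratio(counts, "PAPI_STL_ICY", "PAPI_TOT_CYC")),
        "data_refs_per_ins": ratio(counts, "PAPI_L1_DCA", "PAPI_TOT_INS"),
    }


# Made-up totals for illustration; PAPI_L1_DCA is deliberately absent.
counts = {
    "PAPI_TOT_CYC": 2_000_000,
    "PAPI_TOT_INS": 3_000_000,
    "PAPI_TOT_IIS": 3_200_000,
    "PAPI_FP_INS": 600_000,
    "PAPI_STL_ICY": 500_000,
}
for name, value in instruction_metrics(counts).items():
    print(f"{name}: {value}")
```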
Cache & Memory Hierarchy
Graduated loads & stores per cycle: PAPI_LST_INS/PAPI_TOT_CYC
Graduated loads & stores per floating point instruction: PAPI_LST_INS/PAPI_FP_INS
L1 cache line reuse (data): ((PAPI_LST_INS - PAPI_L1_DCM) / PAPI_L1_DCM)
L1 cache data hit rate: 1.0 - (PAPI_L1_DCM/PAPI_LST_INS)
L1 data cache miss ratio: PAPI_L1_DCM/PAPI_L1_DCA
L2 cache line reuse (data): ((PAPI_L1_DCM - PAPI_L2_DCM) / PAPI_L2_DCM)
L2 cache data hit rate: 1.0 - (PAPI_L2_DCM/PAPI_L1_DCM)
L2 cache miss ratio: PAPI_L2_TCM/PAPI_L2_TCA
L3 cache line reuse (data): ((PAPI_L2_DCM - PAPI_L3_DCM) / PAPI_L3_DCM)
L3 cache data hit rate: 1.0 - (PAPI_L3_DCM/PAPI_L2_DCM)
L3 data cache miss ratio: PAPI_L3_DCM/PAPI_L3_DCA
L3 cache data read ratio: PAPI_L3_DCR/PAPI_L3_DCA
L3 cache instruction miss ratio: PAPI_L3_ICM/PAPI_L3_ICR
Bandwidth used (Lx cache, MB/s): ((PAPI_Lx_TCM * Lx_linesize(bytes)) / PAPI_TOT_CYC) * Clock(MHz)
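Each level's hit rate and line reuse are built from the same two quantities. The first is the number of references that reach the level: PAPI_LST_INS for L1, the L1 data misses for L2, and the L2 data misses for L3. The second is that level's data misses. The two metrics are therefore related by hit rate = reuse / (reuse + 1). The following Python sketch evaluates these forms and the bandwidth estimate using hypothetical helper names and made-up counts. The 64-byte line size and 1500 MHz clock are assumptions for illustration only. Because Lx_linesize is in bytes and Clock(MHz) is in millions of cycles per second, the bandwidth comes out in 10^6 bytes per second.

```python
def hit_rate(refs: float, misses: float) -> float:
    """1.0 - (misses / refs), the 'cache data hit rate' form."""
    return 1.0 - misses / refs


def line_reuse(refs: float, misses: float) -> float:
    """(refs - misses) / misses: hits per miss, roughly how many additional
    hits each line fetched on a miss receives."""
    return (refs - misses) / misses


def bandwidth_mb_per_s(tcm: float, linesize_bytes: int,
                       tot_cyc: float, clock_mhz: float) -> float:
    """((PAPI_Lx_TCM * Lx_linesize) / PAPI_TOT_CYC) * Clock(MHz).

    Bytes per cycle times 10**6 cycles per second gives 10**6 bytes/s.
    """
    return (tcm * linesize_bytes / tot_cyc) * clock_mhz


# Made-up totals; 64-byte lines and a 1500 MHz clock are assumptions.
lst_ins, l1_dcm, l2_dcm, l2_tcm = 1_000_000, 50_000, 10_000, 12_000
tot_cyc, clock_mhz, linesize = 2_000_000, 1500.0, 64

print("L1 hit rate:", hit_rate(lst_ins, l1_dcm))
print("L1 line reuse:", line_reuse(lst_ins, l1_dcm))
print("L2 hit rate:", hit_rate(l1_dcm, l2_dcm))
print("L2 line reuse:", line_reuse(l1_dcm, l2_dcm))
print("L2 bandwidth (MB/s):", bandwidth_mb_per_s(l2_tcm, linesize, tot_cyc, clock_mhz))
```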
Branching
Ratio of mispredicted to correctly predicted branches: PAPI_BR_MSP/PAPI_BR_PRC
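This ratio compares mispredictions with correct predictions rather than with all branches. Suppose every conditional branch is counted as either correctly predicted (PAPI_BR_PRC) or mispredicted (PAPI_BR_MSP). Then a ratio r corresponds to a misprediction fraction of r / (1 + r). Here is a small Python sketch with hypothetical helper names and made-up counts:

```python
def mispredict_ratio(br_msp: int, br_prc: int) -> float:
    """PAPI_BR_MSP / PAPI_BR_PRC, as listed above."""
    return br_msp / br_prc


def mispredict_fraction(br_msp: int, br_prc: int) -> float:
    """PAPI_BR_MSP / (PAPI_BR_MSP + PAPI_BR_PRC); not a listed PerfSuite
    metric, shown only to relate the ratio to a per-branch rate."""
    return br_msp / (br_msp + br_prc)


msp, prc = 5_000, 95_000  # made-up counts
r = mispredict_ratio(msp, prc)
print(f"ratio = {r:.4f}, fraction = {mispredict_fraction(msp, prc):.4f}")
```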
Processor Stalls
Percentage of cycles waiting for memory access: 100.0 * (PAPI_MEM_SCY/PAPI_TOT_CYC)
Percentage of cycles stalled on any resource: 100.0 * (PAPI_RES_STL/PAPI_TOT_CYC)
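Both stall metrics use PAPI_TOT_CYC as the denominator. "Any resource" may include memory, so the two percentages are not meant to be added. A stall-cycle count can also be turned into approximate time: with Clock(MHz) in millions of cycles per second, cycles / (Clock(MHz) * 10^6) gives seconds. The Python sketch below uses made-up counts, hypothetical helper names, and an assumed 2000 MHz clock.

```python
def percent_of_cycles(event_cycles: float, tot_cyc: float) -> float:
    """100.0 * (event / PAPI_TOT_CYC), the form used by both stall rows."""
    return 100.0 * (event_cycles / tot_cyc)


def cycles_to_seconds(cycles: float, clock_mhz: float) -> float:
    """Convert a cycle count to seconds at a clock rate given in MHz."""
    return cycles / (clock_mhz * 1.0e6)


# Made-up totals; the 2000 MHz clock is an assumption for illustration.
tot_cyc = 4_000_000_000
mem_scy = 1_000_000_000
res_stl = 1_800_000_000
clock_mhz = 2000.0

print(f"memory stalls: {percent_of_cycles(mem_scy, tot_cyc):.1f}% "
      f"(~{cycles_to_seconds(mem_scy, clock_mhz):.2f} s)")
print(f"any-resource stalls: {percent_of_cycles(res_stl, tot_cyc):.1f}%")
```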
Aggregate Performance
MFLOPS (CPU cycles): (PAPI_FP_INS/PAPI_TOT_CYC) * Clock(MHz)
MFLOPS (effective): PAPI_FP_INS/Wallclock time(usec)
MIPS (CPU cycles): (PAPI_TOT_INS/PAPI_TOT_CYC) * Clock(MHz)
MIPS (effective): PAPI_TOT_INS/Wallclock time(usec)
Processor utilization: (PAPI_TOT_CYC/Clock(MHz)) / Wallclock time(usec)
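The aggregate metrics differ mainly in their units. Clock(MHz) equals cycles per microsecond, so if wallclock time is also measured in microseconds, each rate comes out directly in millions per second. CPU time is then PAPI_TOT_CYC / Clock(MHz) microseconds. The following Python sketch, with hypothetical helper names and made-up numbers, evaluates the rates this way. It also checks one consequence of these definitions: the effective rate equals the cycle-based rate multiplied by processor utilization.

```python
def per_cycle_rate_millions(count: float, tot_cyc: float, clock_mhz: float) -> float:
    """(count / PAPI_TOT_CYC) * Clock(MHz): MFLOPS or MIPS over CPU cycles."""
    return (count / tot_cyc) * clock_mhz


def effective_rate_millions(count: float, wall_usec: float) -> float:
    """count / wallclock microseconds = millions per wallclock second."""
    return count / wall_usec


def utilization(tot_cyc: float, clock_mhz: float, wall_usec: float) -> float:
    """CPU time (PAPI_TOT_CYC / Clock(MHz), in microseconds) over wallclock time."""
    return (tot_cyc / clock_mhz) / wall_usec


# Made-up run: 3e9 FP and 9e9 total instructions, 6e9 cycles at 2000 MHz, 4 s wallclock.
fp_ins, tot_ins, tot_cyc = 3.0e9, 9.0e9, 6.0e9
clock_mhz, wall_usec = 2000.0, 4.0e6

mflops_cpu = per_cycle_rate_millions(fp_ins, tot_cyc, clock_mhz)
mflops_eff = effective_rate_millions(fp_ins, wall_usec)
mips_cpu = per_cycle_rate_millions(tot_ins, tot_cyc, clock_mhz)
mips_eff = effective_rate_millions(tot_ins, wall_usec)
util = utilization(tot_cyc, clock_mhz, wall_usec)

print(f"MFLOPS (CPU cycles) = {mflops_cpu:.1f}, MFLOPS (effective) = {mflops_eff:.1f}")
print(f"MIPS (CPU cycles) = {mips_cpu:.1f}, MIPS (effective) = {mips_eff:.1f}")
print(f"utilization = {util:.2f}")
```

Utilization is returned as a fraction; multiply by 100.0 to express it as a percentage.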

PerfSuite
perfsuite@ncsa.uiuc.edu
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign