The following derived performance counter metrics are currently computed by the XSL stylesheet and other post-processing tools. The list is being expanded, and additions, corrections, and suggestions are welcome. Users are also free to define arbitrary metrics of their own.
The complete PAPI preset listing is available at:
http://icl.cs.utk.edu/projects/papi/files/html_man/papi_presets.html
Metric | Formula |
---|---|
**Instructions** | |
Graduated instructions per cycle | PAPI_TOT_INS/PAPI_TOT_CYC |
Issued instructions per cycle | PAPI_TOT_IIS/PAPI_TOT_CYC |
Graduated floating point instructions per cycle | PAPI_FP_INS/PAPI_TOT_CYC |
Percentage floating point instructions | 100.0 * (PAPI_FP_INS/PAPI_TOT_INS) |
Ratio of graduated instructions to issued instructions | PAPI_TOT_INS/PAPI_TOT_IIS |
Percentage of cycles with no instruction issue | 100.0 * (PAPI_STL_ICY/PAPI_TOT_CYC) |
Data references per instruction | PAPI_L1_DCA/PAPI_TOT_INS |
Ratio of floating point instructions to L1 data cache accesses | PAPI_FP_INS/PAPI_L1_DCA |
Ratio of floating point instructions to L2 cache accesses (data) | PAPI_FP_INS/PAPI_L2_DCA |
Issued instructions per L1 instruction cache miss | PAPI_TOT_IIS/PAPI_L1_ICM |
Graduated instructions per L1 instruction cache miss | PAPI_TOT_INS/PAPI_L1_ICM |
L1 instruction cache miss ratio | PAPI_L2_ICR/PAPI_L1_ICR |
**Cache & Memory Hierarchy** | |
Graduated loads & stores per cycle | PAPI_LST_INS/PAPI_TOT_CYC |
Graduated loads & stores per floating point instruction | PAPI_LST_INS/PAPI_FP_INS |
L1 cache line reuse (data) | ((PAPI_LST_INS - PAPI_L1_DCM) / PAPI_L1_DCM) |
L1 cache data hit rate | 1.0 - (PAPI_L1_DCM/PAPI_LST_INS) |
L1 data cache miss ratio | PAPI_L1_DCM/PAPI_L1_DCA |
L2 cache line reuse (data) | ((PAPI_L1_DCM - PAPI_L2_DCM) / PAPI_L2_DCM) |
L2 cache data hit rate | 1.0 - (PAPI_L2_DCM/PAPI_L1_DCM) |
L2 cache miss ratio | PAPI_L2_TCM/PAPI_L2_TCA |
L3 cache line reuse (data) | ((PAPI_L2_DCM - PAPI_L3_DCM) / PAPI_L3_DCM) |
L3 cache data hit rate | 1.0 - (PAPI_L3_DCM/PAPI_L2_DCM) |
L3 data cache miss ratio | PAPI_L3_DCM/PAPI_L3_DCA |
L3 cache data read ratio | PAPI_L3_DCR/PAPI_L3_DCA |
L3 cache instruction miss ratio | PAPI_L3_ICM/PAPI_L3_ICR |
Bandwidth used (Lx cache, MB/s) | ((PAPI_Lx_TCM * Lx_linesize(bytes)) / PAPI_TOT_CYC) * Clock(MHz) |
**Branching** | |
Ratio of mispredicted to correctly predicted branches | PAPI_BR_MSP/PAPI_BR_PRC |
**Processor Stalls** | |
Percentage of cycles waiting for memory access | 100.0 * (PAPI_MEM_SCY/PAPI_TOT_CYC) |
Percentage of cycles stalled on any resource | 100.0 * (PAPI_RES_STL/PAPI_TOT_CYC) |
**Aggregate Performance** | |
MFLOPS (CPU cycles) | (PAPI_FP_INS/PAPI_TOT_CYC) * Clock(MHz) |
MFLOPS (effective) | PAPI_FP_INS/Wallclock time(µs) |
MIPS (CPU cycles) | (PAPI_TOT_INS/PAPI_TOT_CYC) * Clock(MHz) |
MIPS (effective) | PAPI_TOT_INS/Wallclock time(µs) |
Processor utilization | PAPI_TOT_CYC / (Clock(MHz) * Wallclock time(µs)) |
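As a rough illustration of how a post-processing tool might evaluate some of the formulas in the table, here is a minimal Python sketch. It assumes that the raw counter totals have already been collected into a dictionary keyed by PAPI preset name, that the clock rate is given in MHz, and that wallclock time is given in microseconds. The names `METRICS` and `derive` are hypothetical and are not part of PAPI or of any existing tool. Not every preset is available on every platform, so a missing counter or a zero denominator yields `None` rather than an error.

```python
"""Sketch: evaluate a few derived metrics from raw PAPI counter totals.

Assumes counter values were already collected into a dict keyed by PAPI
preset name. METRICS and derive() are hypothetical names for illustration.
"""
from typing import Callable, Dict, Optional

Counters = Dict[str, float]

# Hypothetical metric table: name -> f(counters, clock_mhz, wall_us).
# clock_mhz is the processor clock in MHz; wall_us is wallclock time in
# microseconds (both units are assumptions of this sketch).
METRICS: Dict[str, Callable[[Counters, float, float], float]] = {
    "Graduated instructions per cycle":
        lambda c, mhz, us: c["PAPI_TOT_INS"] / c["PAPI_TOT_CYC"],
    "Percentage floating point instructions":
        lambda c, mhz, us: 100.0 * (c["PAPI_FP_INS"] / c["PAPI_TOT_INS"]),
    "L1 cache line reuse (data)":
        lambda c, mhz, us: (c["PAPI_LST_INS"] - c["PAPI_L1_DCM"]) / c["PAPI_L1_DCM"],
    "L1 cache data hit rate":
        lambda c, mhz, us: 1.0 - c["PAPI_L1_DCM"] / c["PAPI_LST_INS"],
    # FP instructions per cycle times cycles per microsecond = millions per second.
    "MFLOPS (CPU cycles)":
        lambda c, mhz, us: (c["PAPI_FP_INS"] / c["PAPI_TOT_CYC"]) * mhz,
    # FP instructions per microsecond = millions per second.
    "MFLOPS (effective)":
        lambda c, mhz, us: c["PAPI_FP_INS"] / us,
    # Cycles actually counted divided by cycles available in the wallclock interval.
    "Processor utilization":
        lambda c, mhz, us: c["PAPI_TOT_CYC"] / (mhz * us),
}


def derive(counters: Counters, clock_mhz: float,
           wall_us: float) -> Dict[str, Optional[float]]:
    """Evaluate every metric in METRICS.

    A metric whose counter is missing, or whose denominator is zero,
    maps to None instead of raising.
    """
    results: Dict[str, Optional[float]] = {}
    for name, formula in METRICS.items():
        try:
            results[name] = formula(counters, clock_mhz, wall_us)
        except (KeyError, ZeroDivisionError):
            results[name] = None
    return results
```

Because the metric table is an ordinary dictionary, a user-defined metric takes only one more entry, for example `METRICS["Data references per instruction"] = lambda c, mhz, us: c["PAPI_L1_DCA"] / c["PAPI_TOT_INS"]`.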