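CentOS 7 Floating-Point Performance Test Procedure (version of 2025-05-31 19:50:53)
Author: 罗定丰. Published 2025-05-31 under the category / Service Documents; edited 2025-05-31.

Version history

Note                 Modified                                Modified by
Delivered version    2025-06-05 16:09:38 [current version]   罗定丰
Initial version      2025-05-31 19:50:53                     罗定丰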

Floating-Point Performance Test Procedure

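1. Preface and Overview

This procedure is used to test the floating-point performance of the CPU and GPU on CentOS 7.9.

Note that this document uses the root user throughout.

The root user's home directory, /root, is the base directory for the whole procedure.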

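2. Preparing the Environment

2.1 Switching the package repository

CentOS 7.9 is no longer maintained upstream, so the default repositories must be replaced with a mirror in mainland China. This procedure uses the Alibaba Cloud mirror as an example: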

wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

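2.2 Installing the test software and its dependencies

2.2.1 Installing the NVIDIA driver

Install dkms and the kernel headers (dkms comes from EPEL, whose package is named epel-release):

yum install epel-release

yum install dkms kernel-devel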
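The driver installer builds a kernel module, so the installed kernel-devel package needs to match the running kernel. The following is a minimal sketch of such a check; the function name `has_matching_kernel_devel` and the version strings in the demo are hypothetical and only illustrate the comparison.

```shell
# Returns 0 if a kernel-devel package matching the running kernel is listed.
# $1: running kernel release (output of `uname -r`)
# $2: installed packages (output of `rpm -q kernel-devel`, one per line)
has_matching_kernel_devel() {
    printf '%s\n' "$2" | grep -Fxq "kernel-devel-$1"
}

# Demo with hypothetical values; on a real system use:
#   has_matching_kernel_devel "$(uname -r)" "$(rpm -q kernel-devel)"
running="3.10.0-1160.el7.x86_64"
installed="kernel-devel-3.10.0-1160.119.1.el7.x86_64"
if has_matching_kernel_devel "$running" "$installed"; then
    echo "match"
else
    echo "mismatch: update the kernel and reboot, or install kernel-devel-$running"
fi
```

In the demo the two versions differ, so the mismatch branch is taken.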

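Download the driver from NVIDIA's official site and install it:

Download link: https://us.download.nvidia.com/XFree86/Linux-x86_64/570.153.02/NVIDIA-Linux-x86_64-570.153.02.run

Edit the nouveau-related configuration:

nano /lib/modprobe.d/dist-blacklist.conf

Find the line that mentions nvidia (on a stock CentOS 7 system this is usually blacklist nvidiafb), comment it out with #, and append the following two lines at the bottom: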

blacklist nouveau

options nouveau modeset=0
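Before regenerating the initramfs, the edit can be sanity-checked. The sketch below assumes a helper named `nouveau_blacklisted` (hypothetical) that looks for both uncommented lines in a given modprobe configuration file; the demo runs against a temporary file rather than the real one.

```shell
# Returns 0 if the modprobe config file $1 contains both nouveau lines,
# uncommented and at the start of a line.
nouveau_blacklisted() {
    grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$1" &&
    grep -Eq '^[[:space:]]*options[[:space:]]+nouveau[[:space:]]+modeset=0[[:space:]]*$' "$1"
}

# Demo on a temporary file; on a real system pass the edited blacklist file.
tmp=$(mktemp)
printf '%s\n' '#blacklist nvidiafb' 'blacklist nouveau' 'options nouveau modeset=0' > "$tmp"
if nouveau_blacklisted "$tmp"; then
    echo "nouveau is blacklisted"
else
    echo "entries missing"
fi
rm -f "$tmp"
```

After the reboot, `lsmod | grep nouveau` printing nothing is a further indication that the module is no longer loaded.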

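After saving the file, run dracut --force to regenerate the initramfs, then reboot.

After the reboot, run init 3 to switch to text mode and continue with the steps below.

Make the installer executable: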

chmod +x NVIDIA-Linux-x86_64-570.153.02.run

Run it:

./NVIDIA-Linux-x86_64-570.153.02.run

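Accept the default for every prompt. When the installation finishes, run init 5 to switch back to graphical mode; a reboot is usually not needed for the driver to take effect.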

2.2.2 Installing the CUDA Toolkit

Download the CUDA installation package from NVIDIA's official site:

wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda-repo-rhel7-12-4-local-12.4.0_550.54.14-1.x86_64.rpm

Install the repository package:

sudo rpm -i cuda-repo-rhel7-12-4-local-12.4.0_550.54.14-1.x86_64.rpm

Clear the yum cache:

sudo yum clean all

Install the toolkit:

sudo yum -y install cuda-toolkit-12-4
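Once the toolkit is installed, it can be useful to confirm which release nvcc reports. The sketch below assumes the toolkit lives under /usr/local/cuda-12.4 and that `nvcc --version` prints a line of the form shown in the demo; the helper name `nvcc_release` is hypothetical.

```shell
# Extracts the "release X.Y" number from `nvcc --version` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Demo on a sample line (format assumed); on a real system:
#   export PATH=/usr/local/cuda-12.4/bin:$PATH
#   nvcc --version | nvcc_release
echo "Cuda compilation tools, release 12.4, V12.4.99" | nvcc_release
```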

2.2.3 Installing dependencies

sudo yum install -y gcc gcc-c++ make cmake
sudo yum groupinstall -y "Development Tools"

2.2.4 Installing HPL

Download and install Intel MKL:

wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/dc93af13-2b3f-40c3-a41b-2bc05a707a80/intel-onemkl-2025.1.0.803_offline.sh
sh ./intel-onemkl-2025.1.0.803_offline.sh

Download and install Intel MPI:

wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/6b6e395e-8f38-4da3-913d-90a2bcf41028/intel-mpi-2021.15.0.495_offline.sh
sh ./intel-mpi-2021.15.0.495_offline.sh

Set the environment variables:

echo 'source /opt/intel/oneapi/setvars.sh' >> ~/.bashrc
source ~/.bashrc
ln -s /opt/intel/oneapi/compiler/latest/lib/libiomp5.so /lib/libiomp5.so

Download the HPL source code and unpack it:

wget http://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
tar -xzf hpl-2.3.tar.gz
cd hpl-2.3

Create and edit the Make.test file:

nano ~/hpl-2.3/Make.test

#  
#  -- High Performance Computing Linpack Benchmark (HPL)                
#     HPL - 2.3 - December 2, 2018                          
#     Antoine P. Petitet                                                
#     University of Tennessee, Knoxville                                
#     Innovative Computing Laboratory                                 
#     (C) Copyright 2000-2008 All Rights Reserved                       
#                                                                       
#  -- Copyright notice and Licensing terms:                             
#                                                                       
#  Redistribution  and  use in  source and binary forms, with or without
#  modification, are  permitted provided  that the following  conditions
#  are met:                                                             
#                                                                       
#  1. Redistributions  of  source  code  must retain the above copyright
#  notice, this list of conditions and the following disclaimer.        
#                                                                       
#  2. Redistributions in binary form must reproduce  the above copyright
#  notice, this list of conditions,  and the following disclaimer in the
#  documentation and/or other materials provided with the distribution. 
#                                                                       
#  3. All  advertising  materials  mentioning  features  or  use of this
#  software must display the following acknowledgement:                 
#  This  product  includes  software  developed  at  the  University  of
#  Tennessee, Knoxville, Innovative Computing Laboratory.             
#                                                                       
#  4. The name of the  University,  the name of the  Laboratory,  or the
#  names  of  its  contributors  may  not  be used to endorse or promote
#  products  derived   from   this  software  without  specific  written
#  permission.                                                          
#                                                                       
#  -- Disclaimer:                                                       
#                                                                       
#  THIS  SOFTWARE  IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,  INCLUDING,  BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
#  OR  CONTRIBUTORS  BE  LIABLE FOR ANY  DIRECT,  INDIRECT,  INCIDENTAL,
#  SPECIAL,  EXEMPLARY,  OR  CONSEQUENTIAL DAMAGES  (INCLUDING,  BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT,  STRICT LIABILITY,  OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
# ######################################################################
#  
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = test
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = /root/hpl-2.3
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a 
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        = /opt/intel/oneapi/mpi/latest
MPinc        = -I$(MPdir)/include
MPlib        = -lmpi
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /opt/intel/oneapi/mkl/latest/lib
LAinc        = -I/opt/intel/oneapi/mkl/latest/include
LAlib        = -Wl,--start-group $(LAdir)/libmkl_intel_lp64.a $(LAdir)/libmkl_intel_thread.a $(LAdir)/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore  (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string loca-
#                       tion on the stack, and the string length is then
#                       passed as  an  F77_INTEGER  after  all  explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
#                       Fortran 77  string,  and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#                       77 string,  and  the  structure is  of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#                       Cray  fcd  (fortran  character  descriptor)  for
#                       interoperation.
#
F2CDEFS      =
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip  library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#
HPL_OPTS     = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           = /opt/intel/oneapi/mpi/latest/bin/mpicc -lpthread
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
#
# On some platforms,  it is necessary  to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = /opt/intel/oneapi/mpi/latest/bin/mpicc -lpthread
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------

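Build HPL with the Make.test file above. Because ARCH is set to test, the xhpl binary and a sample HPL.dat are placed in bin/test:

cd ~/hpl-2.3
make arch=test

Edit the hpl-2.3/bin/test/HPL.dat file:

nano ~/hpl-2.3/bin/test/HPL.dat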


HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name
6            device out (6=stdout,7=stderr,file)
1            number of problems sizes
50000        Ns (problem size)
1            number of NBs
192          NBs (block size)
1            number of process grids
8            Ps
8            Qs
16.0         threshold
1            number of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            number of recursive stopping criterium
4            NBMINs (>= 1)
1            number of panels in recursion
2            NDIVs
1            number of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            number of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            number of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
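The Ns value determines how much memory the benchmark uses, since HPL factors an N × N matrix of 8-byte doubles. The sketch below estimates the footprint of a given N and, going the other way, the largest N (rounded down to a multiple of NB) that fits in a chosen fraction of memory. The function names, the 256 GiB node, and the 80% fraction are assumptions for illustration only.

```shell
# Largest N (a multiple of NB) whose N x N double-precision matrix fits
# in a fraction of memory.  $1: memory in GiB, $2: fraction (0-1), $3: NB
hpl_problem_size() {
    awk -v gib="$1" -v frac="$2" -v nb="$3" 'BEGIN {
        bytes = gib * 1024 * 1024 * 1024 * frac   # memory budget in bytes
        n = sqrt(bytes / 8)                       # 8 bytes per double
        printf "%d\n", int(n / nb) * nb           # round down to a multiple of NB
    }'
}

# Memory in GiB needed by the N x N matrix.  $1: N
hpl_matrix_gib() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", n * n * 8 / (1024 * 1024 * 1024) }'
}

hpl_matrix_gib 50000            # N used in this document, prints 18.63
hpl_problem_size 256 0.8 192    # hypothetical 256 GiB node, prints 165696
```

With N = 50000 the matrix occupies roughly 18.6 GiB in total across all ranks, so a larger N may be worth trying on machines with much more memory.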

3. Running the Tests

3.1 CPU floating-point performance (using HPL)

cd ~/hpl-2.3/bin/test
mpirun -np 64 ./xhpl
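The number of MPI ranks passed to mpirun has to equal P × Q from HPL.dat (8 × 8 = 64 here). A minimal sketch that derives the rank count from the file is shown below; it assumes a single process grid, and the helper name `hpl_grid_ranks` is hypothetical. Setting OMP_NUM_THREADS=1 in the commented usage is an assumption that suits one rank per core.

```shell
# Prints P*Q for an HPL.dat that defines a single process grid.
# $1: path to HPL.dat
hpl_grid_ranks() {
    awk '$2 == "Ps" { p = $1 } $2 == "Qs" { q = $1 } END { print p * q }' "$1"
}

# Demo on a two-line excerpt; on a real system:
#   np=$(hpl_grid_ranks HPL.dat)
#   OMP_NUM_THREADS=1 mpirun -np "$np" ./xhpl
tmp=$(mktemp)
printf '%s\n' '8            Ps' '8            Qs' > "$tmp"
hpl_grid_ranks "$tmp"    # prints 64
rm -f "$tmp"
```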

3.1.1 Processing the results

If the environment is configured correctly, the run prints a result of the following form:

T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00L2L4       50000   192     8     8             123.45              123.456e+00

Convert the Gflops value to FLOPS by multiplying by 10⁹; for example, 123.456 GFLOPS = 123.456 × 10⁹ FLOPS.

Compare the result with the theoretical value (2.15 TFLOPS = 2150 GFLOPS).
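The conversion and comparison can be scripted. The sketch below pulls the Gflops column out of an HPL result line and expresses it as a percentage of the 2150 GFLOPS theoretical value; the function names are hypothetical, and the demo reuses the illustrative result line from this section rather than a real measurement.

```shell
# Prints the Gflops column of each HPL result line read from stdin.
# Result lines start with an encoded variant name such as WR00L2L4.
hpl_gflops() {
    awk '$1 ~ /^W[RC][0-9]/ && NF == 7 { printf "%.3f\n", $NF }'
}

# Measured / peak as a percentage.  $1: measured GFLOPS, $2: peak GFLOPS
efficiency_pct() {
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.2f\n", m / p * 100 }'
}

line='WR00L2L4       50000   192     8     8             123.45              123.456e+00'
g=$(printf '%s\n' "$line" | hpl_gflops)
echo "$g GFLOPS = $(awk -v g="$g" 'BEGIN { printf "%.0f", g * 1e9 }') FLOPS"
echo "efficiency: $(efficiency_pct "$g" 2150)%"
```

On a real run, `hpl_gflops < HPL.out` or piping the mpirun output into the function would replace the sample line.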

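3.2 GPU floating-point performance (using the CUDA Toolkit)

CUDA 11.6 and later no longer install the samples together with the toolkit, so the commands below assume that the samples from NVIDIA's cuda-samples repository are available under /usr/local/cuda/samples.

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
mkdir build
cd build
cmake ..
make
./deviceQuery
cd /usr/local/cuda/samples/0_Introduction/matrixMul
mkdir build
cd build
cmake ..
make
./matrixMul -wA=4096 -hA=4096 -wB=4096 -hB=4096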

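3.2.1 Processing the results

Record the reported performance (in GFLOPS) and convert it to FLOPS.

Theoretical floating-point performance of a single GPU = 6912 × 1.41 × 10⁹ × 2 ≈ 1.95 × 10¹³ FLOPS ≈ 19.5 TFLOPS, where 6912 is the number of cores, 1.41 × 10⁹ Hz is the clock frequency, and the factor 2 counts one fused multiply-add as two floating-point operations.

Theoretical total floating-point performance of 8 GPUs = 8 × 19.5 TFLOPS = 156 TFLOPS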
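Recording the figure can be automated by filtering the matrixMul output. The sketch below assumes matrixMul reports its result on a line beginning with `Performance=` followed by a value in GFlop/s; that format, the helper name `matrixmul_gflops`, and the numbers in the sample line are assumptions, not measured results.

```shell
# Extracts the GFlop/s figure from matrixMul output on stdin.
matrixmul_gflops() {
    sed -n 's/.*Performance= *\([0-9.][0-9.]*\) GFlop\/s.*/\1/p'
}

# Hypothetical sample line; on a real system:
#   ./matrixMul -wA=4096 -hA=4096 -wB=4096 -hB=4096 | matrixmul_gflops
sample='Performance= 2500.00 GFlop/s, Time= 54.976 msec'
g=$(printf '%s\n' "$sample" | matrixmul_gflops)
# Convert GFLOPS to FLOPS by multiplying by 1e9.
awk -v g="$g" 'BEGIN { printf "%s GFLOPS = %.3e FLOPS\n", g, g * 1e9 }'
```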
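The arithmetic above can be reproduced with a small helper, which also makes it easy to substitute the figures of a different GPU. The function name `gpu_peak_tflops` is hypothetical; the inputs are the ones used in this section.

```shell
# Theoretical peak in TFLOPS: cores x clock (GHz) x 2 operations per cycle x GPUs.
# $1: cores per GPU, $2: clock in GHz, $3: number of GPUs
gpu_peak_tflops() {
    awk -v c="$1" -v ghz="$2" -v n="$3" 'BEGIN {
        # cores * GHz * 2 gives GFLOPS; divide by 1000 for TFLOPS
        printf "%.2f\n", c * ghz * 2 * n / 1000
    }'
}

gpu_peak_tflops 6912 1.41 1   # single GPU, prints 19.49
gpu_peak_tflops 6912 1.41 8   # eight GPUs, prints 155.93
```

The unrounded results, 19.49 and 155.93 TFLOPS, agree with the 19.5 and 156 TFLOPS figures once rounded.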

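4. Conclusion

Following the procedure above, the floating-point performance of a machine running CentOS 7.9 can be measured and checked against its theoretical value.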

 
