This repository was archived by the owner on Apr 2, 2025. It is now read-only.

Commit d37524d

final updates to hpctoolkit manual for the 2018-09 release.
1 parent e33feb4 commit d37524d

4 files changed

+84
-53
lines changed

-111 KB
Binary file not shown.

doc/manual/HPCToolkit-users-manual.tex

Lines changed: 18 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -116,7 +116,7 @@
116116
% ***************************************************************************
117117
% ***************************************************************************
118118

119-
\title{\HPCToolkit{} User's Manual}
119+
\title{\HPCToolkit{} User's Manual\\[.5in]Version 2018.09}
120120
%\subtitle{}
121121

122122
\author{
@@ -449,7 +449,7 @@ \subsection{Measuring Application Performance}
449449
For instance:
450450
\begin{quote}
451451
\begin{verbatim}
452-
export HPCRUN_EVENT_LIST="PAPI_TOT_CYC@4000001"
452+
export HPCRUN_EVENT_LIST="CYCLES@f200"
453453
[<mpi-launcher>] app [app-arguments]
454454
\end{verbatim}
455455
\end{quote}
@@ -473,16 +473,17 @@ \subsubsection{Specifying Sample Sources}
473473

474474
\HPCToolkit{} primarily monitors an application using asynchronous sampling.
475475
Consequently, the most common option to \hpcrun{} is a list of sample sources that define how samples are generated.
476-
A sample source takes the form of an event name $e$ and period $p$ and is specified as \texttt{$e$@$p$}, \eg{}, \mytt{PAPI_TOT_CYC@4000001}.
476+
A sample source takes the form of an event name $e$ and \texttt{howoften}, specified as \texttt{$e$@howoften}. The specifier \texttt{howoften} may
477+
be a number, indicating a period, \eg{}, \mytt{CYCLES@4000001}, or it may be \texttt{f} followed by a number, \eg{}, \mytt{CYCLES@f200}, indicating a frequency in samples/second.
477478
For a sample source with event $e$ and period $p$, after every \emph{p} instances of \emph{e}, a sample is generated that causes \hpcrun{} to inspect and record information about the monitored application.
478479

479-
To configure \hpcrun{} with two samples sources, \texttt{$e_1$@$p_1$} and \texttt{$e_2$@$p_2$}, use the following options:
480+
To configure \hpcrun{} with two sample sources, \texttt{$e_1$@howoften$_1$} and \texttt{$e_2$@howoften$_2$}, use the following options:
480481
\begin{quote}
481-
\texttt{--event $e_1$@$p_1$ --event $e_2$@$p_2$}
482+
\texttt{--event $e_1$@howoften$_1$ --event $e_2$@howoften$_2$}
482483
\end{quote}
483484
To use the same sample sources with an \hpclink{}-ed application, use a command similar to:
484485
\begin{quote}
485-
\texttt{export HPCRUN\_EVENT\_LIST="$e_1$@$p_1$;$e_2$@$p_2$"}
486+
\texttt{export HPCRUN\_EVENT\_LIST="$e_1$@howoften$_1$;$e_2$@howoften$_2$"}
486487
\end{quote}
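
The two forms above differ only in how the sample sources are packaged: one \texttt{--event} option per source for \hpcrun{}, and one semicolon-separated list for an \hpclink{}-ed application. The following is a minimal shell sketch, not part of \HPCToolkit{}, that builds both forms from the same list. The two sample sources are illustrative values borrowed from examples elsewhere in this manual, and \texttt{app} is a placeholder.

```shell
# Build both forms from one list of sample sources.
# The sources below are illustrative; substitute the events you want to measure.
hpcrun_opts=""   # hpcrun form: one --event option per sample source
event_list=""    # hpclink form: one semicolon-separated list
for s in "CYCLES@f200" "INSTRUCTIONS@4000000"; do
  hpcrun_opts="${hpcrun_opts:+$hpcrun_opts }--event $s"
  event_list="${event_list:+$event_list;}$s"
done

# Print the resulting commands rather than running them ("app" is a placeholder).
echo "hpcrun $hpcrun_opts app"
echo "export HPCRUN_EVENT_LIST=\"$event_list\""
```

The sketch only prints the commands, so it can be tried on a system where \HPCToolkit{} is not installed.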
487488

488489

@@ -995,7 +996,7 @@ \section{Running and Analyzing MPI Programs}
995996
%
996997
\begin{quote}
997998
\begin{verbatim}
998-
export HPCRUN_EVENT_LIST="PAPI_TOT_CYC@4000001"
999+
export HPCRUN_EVENT_LIST="CYCLES@f200"
9991000
<mpi-launcher> app [app-arguments]
10001001
\end{verbatim}
10011002
%
@@ -1162,7 +1163,7 @@ \section{Running a Statically Linked Binary}
11621163
#PBS -l size=64
11631164
#PBS -l walltime=01:00:00
11641165
cd $PBS_O_WORKDIR
1165-
export HPCRUN_EVENT_LIST="PAPI_TOT_CYC@4000000 PAPI_L2_TCM@400000"
1166+
export HPCRUN_EVENT_LIST="CYCLES@f200 PERF_COUNT_HW_CACHE_MISSES@f200"
11661167
aprun -n 64 ./app arg ...
11671168
\end{verbatim}
11681169
\end{quote}
@@ -1206,6 +1207,9 @@ \chapter{FAQ and Troubleshooting}
12061207
\section{How do I choose \hpcrun{} sampling periods?}
12071208
\label{sec:troubleshooting:hpcrun-sample-periods}
12081209

1210+
When using sample sources for hardware counter and software counter events provided by Linux \verb|perf_events|,
1211+
we recommend that you use frequency-based sampling. The default frequency is 300 samples/second.
1212+
12091213
Statisticians use sample sizes of approximately 3500 to make accurate projections about the voting preferences of millions of people.
12101214
In an analogous way, rather than collect unnecessarily large amounts of performance information, sampling-based performance measurement collects ``just enough'' representative performance data.
12111215
You can control \hpcrun{}'s sampling periods to collect ``just enough'' representative data even for very long executions and, to a lesser degree, for very short executions.
@@ -1214,8 +1218,8 @@ \section{How do I choose \hpcrun{} sampling periods?}
12141218
Since unimportant contexts are irrelevant to performance, as long as this condition is met (and as long as samples are not correlated, etc.), \HPCToolkit{}'s performance data should be accurate.
12151219

12161220
We typically recommend targeting a frequency of hundreds of samples per second.
1217-
For very short runs, you may need to try thousands of samples per second.
1218-
For very long runs, tens of samples per second can be quite reasonable.
1221+
For very short runs, you may need to collect thousands of samples per second to record an adequate number of samples.
1222+
For long runs, tens of samples per second may suffice for performance diagnosis.
12191223

12201224
Choosing sampling periods for some events, such as Linux timers, cycles and instructions, is easy given a target sampling frequency.
12211225
Choosing sampling periods for other events such as cache misses is harder.
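
For the easy cases, the conversion is a single division: the period is the number of events per second divided by the target number of samples per second. The shell sketch below works this out for a timer and for cycles. The 2~GHz clock rate and the 200 samples/second target are assumptions chosen for illustration; replace them with values for your system.

```shell
target_hz=200   # desired samples/second (illustrative)

# Timer-based sources (CPUTIME, REALTIME, WALLCLOCK) take a period in microseconds.
timer_period_us=$(( 1000000 / target_hz ))

# For CYCLES, the event rate is the core clock rate.
clock_hz=2000000000   # hypothetical 2 GHz clock; replace with your processor's rate
cycles_period=$(( clock_hz / target_hz ))

echo "CPUTIME@$timer_period_us"
echo "CYCLES@$cycles_period"
```

With these inputs the sketch prints \texttt{CPUTIME@5000} and \texttt{CYCLES@10000000}. For sample sources managed by Linux \verb|perf_events|, the frequency notation (\eg{}, \texttt{CYCLES@f200}) requests the rate directly and avoids this calculation.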
@@ -1250,7 +1254,9 @@ \section{\hpcrun{} incurs high overhead! Why?}
12501254
\begin{quote}
12511255
\verb|hpcsummary --all <hpctoolkit-measurements>|
12521256
\end{quote}
1253-
Please let us know if there are problems.
1257+
Note: The \verb|hpcsummary| script is no longer included in the \verb|bin| directory of an \HPCToolkit{} installation;
1258+
it is a developer script that can be found in the \verb|libexec/hpctoolkit| directory.
1259+
Let us know if you encounter significant problems with bad unwinds.
12541260

12551261
\item You have very long call paths, where ``long'' means in the hundreds or thousands.
12561262
On x86-based architectures, try additionally using \hpcrun{}'s \texttt{RETCNT} event.
@@ -1323,7 +1329,7 @@ \section{\hpcviewer{} writes a long list of Java error messages to the terminal!
13231329
\texttt{\$HOME/.hpctoolkit/hpcviewer} \\
13241330
and run \hpcviewer{} again.
13251331

1326-
On MacOS, persistent state is currently stored within Mac app. If the Eclipse persistent state gets corrupted, one cant simply clear the workspace because some initial persistent state is needed for Eclipse to function properly. For MacOS, the thing to try is downloading a fresh copy of hpcviewer and running the freshly downloaded copy.
1332+
On MacOS, persistent state is currently stored within the Mac app. If the Eclipse persistent state gets corrupted, one can't simply clear the workspace because some initial persistent state is needed for Eclipse to function properly. On MacOS, try downloading a fresh copy of hpcviewer and running the freshly downloaded copy.
13271333

13281334
If one of the aforementioned suggestions doesn’t fix the problem, report a bug.
13291335

doc/manual/environ.tex

Lines changed: 17 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,23 @@ \chapter{Environment Variables}
2828
\section{Environment Variables for Users}
2929
\label{user-env}
3030

31+
\paragraph{HPCTOOLKIT.}
32+
Under normal circumstances, there is no need to use this environment variable.
33+
There are two situations, however, in which \hpcrun{}
34+
\emph{must} consult the \verb+HPCTOOLKIT+ environment variable to determine the location
35+
of \HPCToolkit{}'s top-level installation directory:
36+
37+
\begin{itemize}
38+
\item On some systems, parallel job launchers (e.g., Cray's aprun) \emph{copy} the
39+
\hpcrun{} script to a different location. In this case, for \hpcrun{} to find libraries
40+
and utilities it needs at runtime, you must set the \verb+HPCTOOLKIT+ environment variable to
41+
\HPCToolkit{}'s top-level installation directory.
42+
\item
43+
If you launch the \hpcrun{} script via a file system link,
44+
you must set \verb+HPCTOOLKIT+ for the same reason.
45+
\end{itemize}
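
When \hpcrun{} is reached through a file system link, one way to obtain a value for \verb+HPCTOOLKIT+ is to resolve the link and step up from the script's directory. The shell sketch below illustrates this. It is not part of \HPCToolkit{}: the helper name \texttt{hpctoolkit\_root} is hypothetical, the sketch assumes that the real \hpcrun{} script lives in the \texttt{bin} directory directly under the top-level installation directory, and it relies on the GNU \texttt{readlink -f} option.

```shell
# Derive the installation root from a path to hpcrun that may be a symbolic link.
# Assumption of this sketch: the real script lives in <install-root>/bin.
hpctoolkit_root() {
  real=$(readlink -f "$1") || return 1   # follow links to the real script
  dirname "$(dirname "$real")"           # strip /hpcrun, then /bin
}

# Example use, if hpcrun is on PATH:
#   export HPCTOOLKIT=$(hpctoolkit_root "$(command -v hpcrun)")
```

This does not help when a job launcher copies the script, because the copy carries no trace of its origin; in that case set \verb+HPCTOOLKIT+ explicitly.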
46+
47+
3148
\paragraph{HPCRUN\_EVENT\_LIST.}
3249

3350
This environment variable is used to provide a set of (event, period)
@@ -219,21 +236,6 @@ \section{Environment Variables for Developers}
219236
core dump for each process, depending upon the settings for your
220237
system. Be careful!
221238

222-
\paragraph{HPCRUN\_QUIET}
223-
224-
If this unfortunately-named environment variable is set, HPCToolkit's
225-
measurement subsystem will turn on a default set of dynamic debugging
226-
variables to log information about HPCToolkit's stack unwinding
227-
based on on-the-fly binary analysis. If set, HPCToolkit's measurement
228-
subsystem log information associated with the following debug flags:
229-
TROLL (when a return address was not found algorithmically
230-
and \HPCToolkit{} begins looking for possible return address values
231-
on the call stack), SUSPICIOUS\_INTERVAL (when an x86 unwind recipe
232-
is suspicious because it indicates that a base pointer is saved on
233-
the stack when a return instruction is encountered) and DROP (when
234-
samples are dropped because the measurement infrastructure was
235-
unable to record a sample in a timely fashion).
236-
237239
\paragraph{HPCRUN\_FNBOUNDS\_CMD}
238240

239241
For dynamically-linked executables, this environment variable must

doc/manual/hpcrun.tex

Lines changed: 49 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -19,35 +19,42 @@ \section{Using \hpcrun{}}
1919

2020
The basic options for \hpcrun{} are \verb|-e| (or \verb|--event|) to
2121
specify a sampling source and rate and \verb|-t| (or \verb|--trace|) to
22-
turn on tracing. Sample sources are specified as `\verb|event@period|'
23-
where \verb|event| is the name of the source and \verb|period| is the
24-
period (threshold) for that event, and this option may be used
25-
multiple times. Note that a higher period implies a lower rate of
26-
sampling. The basic syntax for profiling an application with
22+
turn on tracing. Sample sources are specified as `\verb|event@howoften|'
23+
where \verb|event| is the name of the source and \verb|howoften| is either
24+
a number specifying the period (threshold) for that event, or \verb|f| followed by a number, \eg{}, \verb|@f100|
25+
specifying a target sampling frequency for the event in samples/second.\footnote{Frequency-based sampling and
26+
the frequency-based notation for {\tt howoften} are only
27+
available for sample sources managed by Linux {\tt perf\_events}. For Linux {\tt perf\_events}, \HPCToolkit{} uses
28+
a default sampling frequency of 300 samples/second.}
29+
Note that a higher period implies a lower rate of sampling.
30+
The \verb|-e| option may be used multiple times to specify that multiple
31+
sample sources be used for measuring an execution.
32+
The basic syntax for profiling an application with
2733
\hpcrun{} is:
2834

2935
\begin{quote}
3036
\begin{verbatim}
31-
hpcrun -t -e event@period ... app arg ...
37+
hpcrun -t -e event@howoften ... app arg ...
3238
\end{verbatim}
3339
\end{quote}
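
To make the \verb|event@howoften| notation concrete, the following shell sketch classifies a sample source specifier as a period, a frequency, or a request with no rate. It illustrates the notation only; the function name \texttt{describe\_howoften} is hypothetical, and \hpcrun{}'s own parsing may differ.

```shell
# Classify the part of a sample source after '@'.
# A sketch of the notation only; hpcrun's own parsing may differ.
describe_howoften() {
  spec=$1
  case "$spec" in
    *@f[0-9]*) echo "frequency: ${spec##*@f} samples/second" ;;
    *@[0-9]*)  echo "period: ${spec##*@} events between samples" ;;
    *)         echo "no rate given" ;;
  esac
}

describe_howoften "CYCLES@f200"
describe_howoften "INSTRUCTIONS@4000000"
describe_howoften "CYCLES"
```

When no rate is given for a sample source managed by Linux \verb|perf_events|, \HPCToolkit{} uses its default sampling frequency of 300 samples/second.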
3440

35-
For example, to profile an application and sample every 15,000,000
36-
total cycles and every 400,000 L2 cache misses you would use:
41+
For example, to profile an application using hardware counter sample sources
42+
provided by Linux \verb|perf_events|, sampling cycles 300 times/second (the default sampling frequency) and sampling every 4,000,000 instructions,
43+
you would use:
3744

3845
\begin{quote}
3946
\begin{verbatim}
40-
hpcrun -e PAPI_TOT_CYC@15000000 -e PAPI_L2_TCM@400000 app arg ...
47+
hpcrun -e CYCLES -e INSTRUCTIONS@4000000 app arg ...
4148
\end{verbatim}
4249
\end{quote}
4350

44-
The units for the \verb|WALLCLOCK| sample source are in microseconds,
51+
The units for timer-based sample sources (\verb|CPUTIME|, \verb|REALTIME|, and \verb|WALLCLOCK|) are microseconds,
4552
so to sample an application with tracing every 5,000 microseconds
4653
(200~times/second), you would use:
4754

4855
\begin{quote}
4956
\begin{verbatim}
50-
hpcrun -t -e WALLCLOCK@5000 app arg ...
57+
hpcrun -t -e CPUTIME@5000 app arg ...
5158
\end{verbatim}
5259
\end{quote}
5360

@@ -74,7 +81,7 @@ \section{Using \hpcrun{}}
7481

7582
\begin{quote}
7683
\begin{verbatim}
77-
mpirun -n 4 hpcrun -e PAPI_TOT_CYC@15000000 mpiapp arg ...
84+
mpirun -n 4 hpcrun -e CYCLES mpiapp arg ...
7885
\end{verbatim}
7986
\end{quote}
8087

@@ -103,7 +110,7 @@ \section{Using \hpclink{}}
103110

104111
% ===========================================================================
105112

106-
\section{Harware counter event names}
113+
\section{Hardware Counter Event Names}
107114

108115
HPCToolkit uses libpfm4\cite{libpfm-www} to translate from an event name string to an event code recognized by the kernel.
109116
An event name is case insensitive and is defined as follows:
@@ -456,11 +463,11 @@ \subsection{PAPI}
456463
enough, the count for the loop as a whole (and up the tree) should be
457464
accurate.
458465

459-
\subsection{Wallclock, Realtime and Cputime}
466+
\subsection{WALLCLOCK, REALTIME and CPUTIME}
460467

461468
\HPCToolkit{} supports three timer-based sample sources: \verb|CPUTIME|,
462469
\verb|REALTIME| and \verb|WALLCLOCK|.
463-
The units for periods of these timers are all in microseconds.
470+
The unit for periods of these timers is microseconds.
464471

465472
Before describing this capability further, it is worth noting
466473
that the CYCLES event supported by Linux \perfevents{} or PAPI's \verb|PAPI_TOT_CYC|
@@ -550,7 +557,7 @@ \subsection{IO}
550557
thus is able to more accurately count the time spent in these
551558
functions.
552559

553-
\subsection{Memleak}
560+
\subsection{MEMLEAK}
554561

555562
The \verb|MEMLEAK| sample source counts the number of bytes allocated
556563
and freed. Like \verb|IO|, \verb|MEMLEAK| is a synchronous sample
@@ -651,9 +658,9 @@ \section{Process Fraction}
651658

652659
\begin{quote}
653660
\begin{tabular}{@{}cl}
654-
(dynamic) & \verb|hpcrun -f 0.10 -e event@period app arg ...| \\
655-
(dynamic) & \verb|hpcrun -f 1/10 -e event@period app arg ...| \\
656-
(static) & \verb|export HPCRUN_EVENT_LIST='event@period'| \\
661+
(dynamic) & \verb|hpcrun -f 0.10 -e event@howoften app arg ...| \\
662+
(dynamic) & \verb|hpcrun -f 1/10 -e event@howoften app arg ...| \\
663+
(static) & \verb|export HPCRUN_EVENT_LIST='event@howoften'| \\
657664
& \verb|export HPCRUN_PROCESS_FRACTION=0.10| \\
658665
& \verb|app arg ...|
659666
\end{tabular}
@@ -757,8 +764,8 @@ \section{Starting and Stopping Sampling}
757764

758765
\begin{quote}
759766
\begin{tabular}{@{}cl}
760-
(dynamic) & \verb|hpcrun -ds -e event@period app arg ...| \\
761-
(static) & \verb|export HPCRUN_EVENT_LIST='event@period'| \\
767+
(dynamic) & \verb|hpcrun -ds -e event@howoften app arg ...| \\
768+
(static) & \verb|export HPCRUN_EVENT_LIST='event@howoften'| \\
762769
& \verb|export HPCRUN_DELAY_SAMPLING=1| \\
763770
& \verb|app arg ...|
764771
\end{tabular}
@@ -791,17 +798,17 @@ \section{Environment Variables for \hpcrun{}}
791798
would be convenient for users.
792799

793800
\section{Platform-Specific Notes}
801+
\label{sec:platform-specific}
794802

795803
%
796804
% system specific notes for titan, keenland?
797805
%
798-
\subsection{Cray XE6 and XK6}
799-
\label{sec:platform-specific}
806+
\subsection{Cray Systems}
800807

801-
The ALPS job launcher used on Cray XE6 and XK6 systems copies
808+
The ALPS job launcher used on Cray systems copies
802809
programs to a special staging area before launching them,
803810
as described in Section~\ref{sec:env-vars}.
804-
Consequently, when using \hpcrun{} to monitor dynamically linked binaries on Cray XE6 and XK6 systems, you
811+
Consequently, when using \hpcrun{} to monitor dynamically-linked binaries on Cray systems, you
805812
should add the \verb|HPCTOOLKIT| environment variable to your launch
806813
script.
807814
Set \verb|HPCTOOLKIT| to the top-level \HPCToolkit{} installation directory
@@ -822,7 +829,7 @@ \subsection{Cray XE6 and XK6}
822829
export CRAY_ROOTFS=DSL
823830
824831
cd $PBS_O_WORKDIR
825-
aprun -n #nodes hpcrun -e event@period dynamic-app arg ...
832+
aprun -n #nodes hpcrun -e event@howoften dynamic-app arg ...
826833
\end{verbatim}
827834
\end{quote}
828835
% $
@@ -849,3 +856,19 @@ \subsection{Cray XE6 and XK6}
849856
correct settings for \verb|PATH|, \verb|HPCTOOLKIT|, etc. In that case,
850857
the easiest solution is to load the \verb|hpctoolkit| module. Try
851858
``\verb|module show hpctoolkit|'' to see if it sets \verb|HPCTOOLKIT|.
859+
860+
\subsection{Blue Gene/Q Systems}
861+
Blue Gene/Q systems provide the \verb|WALLCLOCK| interval timer, but not the
862+
POSIX \verb|CPUTIME| and \verb|REALTIME| timers.
863+
864+
The Linux \verb|perf_events| subsystem is unavailable on Blue Gene/Q systems.
865+
One should use the PAPI interface to monitor executions using hardware performance counters.
866+
867+
\subsection{ARM Systems}
868+
\HPCToolkit{}'s measurement infrastructure depends upon \verb|libunwind| for call stack unwinding on ARM.
869+
On some ARM systems, compilers put DWARF Frame Description Entries (FDEs) in the ELF \verb|.debug_frame| section rather
870+
than the \verb|.eh_frame| section. In such cases, \HPCToolkit{} requires a bleeding-edge version of \verb|libunwind| that is not included in
871+
\HPCToolkit{}'s \verb|hpctoolkit-externals| package.
872+
\footnote{We are in the midst of deprecating {\tt hpctoolkit-externals} as we move to a spack-based distribution system. While it is used for the
873+
current release, we are no longer maintaining it.}
874+
Contact the \HPCToolkit{} forum if you need a copy of a newer \verb|libunwind|.
