Collecting metrics
This page describes the methods of collecting Procfs [1] process metrics implemented in Procpath. Analysing an issue with Procpath is usually a two-step process:
collect data on relevant processes
analyse the collected data visually and/or with SQL
Procpath can collect process metrics from any Linux system that can run Python, which includes Android (e.g. via Termux [2]), arm64 NAS devices, GitLab pipeline jobs, containers, and usual server and desktop machines.
Snapshot
procpath query provides JSON point-in-time slices of the process tree running on the target Linux system. It’s useful for answering:
specific questions about the process/fields in its JSON document (how many open file descriptors does this process have?),
$ procpath query -f stat,fd --indent 2 \
  '$..children[?(@.stat.pid == 42 and @.pop\("children", 1\))]..fd'
[
  {
    "anon": 12,
    "blk": 0,
    "chr": 7,
    "dir": 0,
    "fifo": 4,
    "lnk": 0,
    "reg": 118,
    "sock": 36
  }
]
Note
@.pop("children", 1) can be used to get rid of descendants of the matched process unless they match themselves.
process hierarchy questions (what are the PIDs of all descendants of this process?),
$ procpath query -d, "..children[?(@.stat.pid == 42)]..pid"
7342,7733,7931,78880,78884
counting processes (how many Celery workers are running on the server?),
$ procpath query -d $'\n' \
  '$..children[?("celery worker" in @.cmdline)].stat.comm' | wc -l
97
calculating aggregates (how much main memory does this docker-compose stack consume?),
$ L=$(docker ps -f status=running -f name='^project_name' -q | xargs -I{} -- \
  docker inspect -f '{{.State.Pid}}' {} | tr '\n' ,)
$ procpath query "$..children[?(@.stat.pid in [$L])]" \
  'SELECT SUM(stat_rss) / 1024.0 * 4 "RSS MiB" FROM record'
[{"RSS MiB": 390.515625}]
It also comes in handy for crafting JSONPath queries for procpath record (see below).
As demonstrated by the examples above, procpath query accepts two positional arguments, for the JSONPath and the SQL query (see Design for details on the dialects). Both are optional.
To use only SQL, pass an empty string for the JSONPath (what is the sum of proportional set sizes of all processes on the system?).
$ sudo procpath query -f stat,smaps_rollup \
'' 'SELECT SUM(smaps_rollup_pss) / 1024.0 "PSS MiB" FROM record'
[{"PSS MiB": 4007.9482421875}]
Note
To read smaps_rollup and some other procfiles you may need to be the owner of the process (or root):
$ ls -l /proc/1/smaps_rollup
-r--r--r-- 1 root root 0 Sep 3 19:54 /proc/1/smaps_rollup
When a SQL query is specified the tree is flattened to a table (see Data model for details).
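The flattening step can be pictured with a minimal Python sketch. It is not Procpath's implementation: the tree, the `flatten` helper, and the two-column schema are hypothetical, and only the `record` table name and the `stat_rss` style of column naming are taken from the examples above.

```python
import sqlite3

# Hypothetical nested tree, shaped like the JSON documents shown above.
tree = {
    "stat": {"pid": 1, "rss": 100},
    "children": [
        {"stat": {"pid": 2, "rss": 200}},
        {
            "stat": {"pid": 3, "rss": 300},
            "children": [{"stat": {"pid": 4, "rss": 424}}],
        },
    ],
}


def flatten(node):
    """Yield one flat dict per process, depth-first, dropping "children"."""
    row = {}
    for procfile, fields in node.items():
        if procfile == "children":
            continue
        for name, value in fields.items():
            # Assumed naming: "<procfile>_<field>", e.g. stat.rss -> stat_rss.
            row[f"{procfile}_{name}"] = value
    yield row
    for child in node.get("children", []):
        yield from flatten(child)


rows = list(flatten(tree))

conn = sqlite3.connect(":memory:")  # ephemeral database for a one-off query
conn.execute("CREATE TABLE record (stat_pid INTEGER, stat_rss INTEGER)")
conn.executemany("INSERT INTO record VALUES (:stat_pid, :stat_rss)", rows)

# Same aggregate as in the examples above: 4 KiB pages converted to MiB.
(rss_mib,) = conn.execute(
    "SELECT SUM(stat_rss) / 1024.0 * 4 FROM record"
).fetchone()
print(rss_mib)
```

Once every process is a row, hierarchy no longer matters to the query, which is why aggregates such as `SUM` are natural to express in SQL rather than in JSONPath.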
Timeline
procpath record essentially does the same as procpath query "..." "SELECT * FROM record", but instead of an ephemeral SQLite database it creates a persistent one and saves snapshots there at specified intervals. A JSONPath can be specified too, to narrow down the process tree, and SQL queries can be run on the resulting database (also while it’s being recorded).
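As a rough illustration of the idea (not Procpath's actual code), the following Python sketch appends flattened snapshots to a SQLite table at a fixed interval. The `record` function, the `fake_snapshot` stand-in, and the `ts` timestamp column are assumptions made for the example; only the `record` table name and the `stat_*` column naming come from the commands above.

```python
import sqlite3
import time


def record(conn, take_snapshot, interval, count):
    """Store `count` snapshots in `conn`, `interval` seconds apart."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record "
        "(ts REAL, stat_pid INTEGER, stat_rss INTEGER)"
    )
    for i in range(count):
        ts = time.time()  # one timestamp shared by all rows of a snapshot
        rows = [(ts, r["stat_pid"], r["stat_rss"]) for r in take_snapshot()]
        conn.executemany("INSERT INTO record VALUES (?, ?, ?)", rows)
        # Commit per snapshot so that other connections to a file-backed
        # database can query what has been recorded so far.
        conn.commit()
        if i < count - 1:
            time.sleep(interval)


def fake_snapshot():
    # Hypothetical stand-in for reading procfs: already flattened rows.
    return [
        {"stat_pid": 1, "stat_rss": 100},
        {"stat_pid": 2, "stat_rss": 200},
    ]


conn = sqlite3.connect(":memory:")  # a file path here would make it persistent
record(conn, fake_snapshot, interval=0.01, count=3)

(total_rows,) = conn.execute("SELECT COUNT(*) FROM record").fetchone()
peak_rss = conn.execute(
    "SELECT stat_pid, MAX(stat_rss) FROM record "
    "GROUP BY stat_pid ORDER BY stat_pid"
).fetchall()
```

With three snapshots of two processes the table holds six rows, and the same SQL used for one-off queries works across the whole timeline.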
The most basic form of JSONPath for procpath record is selecting a subtree by a PID, i.e. the process with that PID and all its descendants (record snapshots of the process subtree of PID 2610 every second until it exits).
procpath record -i 1 --stop-without-result -d subtree.sqlite \
'$..children[?(@.stat.pid == 2610)]'
Note
JSONPath query used for procpath record must yield full process documents. I.e. $..children[?(@.stat.pid == 2610)], not $..children[?(@.stat.pid == 2610)]..pid.
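The difference between a full process document and a list of values can be sketched in plain Python. The tree and both helpers below are hypothetical and only mimic what the two JSONPath forms return. As a simplification, `find_subtree` may also match the root node, whereas the JSONPath filter looks only inside `children` arrays.

```python
def find_subtree(node, pid):
    """Return the process document whose stat.pid equals `pid`, or None."""
    if node["stat"]["pid"] == pid:
        return node
    for child in node.get("children", []):
        found = find_subtree(child, pid)
        if found is not None:
            return found
    return None


def collect_pids(node):
    """Return the PIDs of `node` and all its descendants, depth-first."""
    pids = [node["stat"]["pid"]]
    for child in node.get("children", []):
        pids.extend(collect_pids(child))
    return pids


# Hypothetical process tree for illustration.
tree = {
    "stat": {"pid": 1},
    "children": [
        {"stat": {"pid": 2}},
        {"stat": {"pid": 3}, "children": [{"stat": {"pid": 4}}]},
    ],
}

subtree = find_subtree(tree, 3)  # a full document: suitable for recording
pids = collect_pids(subtree)     # bare values: fine for a one-off query only
```

Here `subtree` still carries every field of the matched process and its children, while `pids` is just `[3, 4]` and has nothing left to record.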
Additionally, procpath record supports the --pid-list argument, a pre-filter that specifies PIDs of branches to keep in the tree before reading procfiles other than stat and before running a JSONPath against it. It minimises the resources Procpath needs, which is relevant when it records multiple procfiles at sub-second intervals. For instance, given this tree on a system:
PID 1
├─ PID 2
├─ PID 3
│  └─ PID 4
└─ PID 5
   └─ PID 6
      ├─ PID 7
      ├─ PID 8
      └─ PID 9
procpath record -f stat,io,status,fd,smaps_rollup --pid-list 3 ... will read the easy-to-parse stat procfiles for all processes, and the remaining procfiles only for the processes below (a JSONPath query, if specified, also runs against this smaller tree):
PID 1
├─ PID 2
└─ PID 3
   └─ PID 4
Besides PID hierarchy JSONPath queries, other types of filters can be formulated (record all processes that have a resident set size bigger than 512 MiB, once a second for a minute).
procpath record -i 1 -r 60 -d hog.sqlite \
'$..children[?(@.stat.rss > 512 * 1024 / 4 and @.pop\("children", 1\))]'
Note
stat.rss is usually measured in 4 KiB memory pages, see meta.page_size in Data model for more details.
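The page arithmetic behind the `512 * 1024 / 4` threshold and the `/ 1024.0 * 4` factor in the SQL examples can be checked with a short Python sketch. The helper names are hypothetical, and the 4096-byte default is the common case, not a guarantee.

```python
import os


def mib_to_pages(mib, page_size=4096):
    """Convert MiB to a number of memory pages of `page_size` bytes."""
    return mib * 1024 ** 2 // page_size


def pages_to_mib(pages, page_size=4096):
    """Convert a number of memory pages to MiB."""
    return pages * page_size / 1024 ** 2


# Threshold used in the JSONPath filter above: 512 MiB expressed in pages.
threshold_pages = mib_to_pages(512)

# Page count corresponding to the "RSS MiB" figure in the aggregate example.
example_mib = pages_to_mib(99972)

# Actual page size of the machine running this code (Unix-only call).
system_page_size = os.sysconf("SC_PAGE_SIZE")
```

On a system where `system_page_size` is not 4096, the constants in the queries would need to be adjusted accordingly.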