Collecting metrics

This page describes methods of Procfs [1] process metrics collection implemented in Procpath. Usually an analysis of an issue with Procpath is a 2-step process:

  1. collect data on relevant processes

  2. analyse the collected data visually and/or with SQL

Procpath can collect process metrics from any Linux system that can run Python, which includes Android (e.g. via Termux [2]), arm64 NAS devices, GitLab pipeline jobs, containers, and usual server and desktop machines.

Snapshot

procpath query provides JSON point-in-time slices of the process tree running on the target Linux system. It’s useful for answering:

  • specific questions about a process and the fields of its JSON document (how many open file descriptors does this process have?)

    $ procpath query -f stat,fd --indent 2 \
      '$..children[?(@.stat.pid == 42 and @.pop("children", 1))]..fd'
    [
      {
        "anon": 12,
        "blk": 0,
        "chr": 7,
        "dir": 0,
        "fifo": 4,
        "lnk": 0,
        "reg": 118,
        "sock": 36
      }
    ]
    

    Note

    @.pop("children", 1) can be used to drop the descendants of a matched process from the result, unless they match the filter themselves.

  • process hierarchy questions (what are the PIDs of all descendants of this process?),

    $ procpath query -d, "..children[?(@.stat.pid == 42)]..pid"
    7342,7733,7931,78880,78884
    
  • counting processes (how many Celery workers are running on the server?),

    $ procpath query -d $'\n' \
        '$..children[?("celery worker" in @.cmdline)].stat.comm' | wc -l
    97
    
  • calculating aggregates (how much main memory does this docker-compose stack consume?),

    $ L=$(docker ps -f status=running -f name='^project_name' -q | xargs -I{} -- \
        docker inspect -f '{{.State.Pid}}' {} | tr '\n' ,)
    $ procpath query "$..children[?(@.stat.pid in [$L])]" \
        'SELECT SUM(stat_rss) / 1024.0 * 4 "RSS MiB" FROM record'
    [{"RSS MiB": 390.515625}]
    

It also comes in handy for crafting JSONPath queries for procpath record (see below).
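
To make the hierarchy question from the list above concrete, here is a minimal Python sketch of what "all descendants of this process" means on a nested process document. It is not how Procpath evaluates JSONPath; the tree and the helper names find_node and descendant_pids are hypothetical, and only the children and stat.pid keys are taken from the queries above.

```python
def find_node(node, pid):
    """Return the process document with the given PID, or None."""
    if node["stat"]["pid"] == pid:
        return node
    for child in node.get("children", []):
        found = find_node(child, pid)
        if found is not None:
            return found
    return None


def descendant_pids(node):
    """Collect the PIDs of all descendants of a node, depth-first."""
    result = []
    for child in node.get("children", []):
        result.append(child["stat"]["pid"])
        result.extend(descendant_pids(child))
    return result


# Hypothetical tree, shaped like the nested documents queried above
tree = {
    "stat": {"pid": 1},
    "children": [
        {"stat": {"pid": 42}, "children": [
            {"stat": {"pid": 7342}, "children": [{"stat": {"pid": 7733}}]},
            {"stat": {"pid": 7931}},
        ]},
        {"stat": {"pid": 50}},
    ],
}

pids = descendant_pids(find_node(tree, 42))
print(",".join(map(str, pids)))
```

Leaf processes in this sketch simply omit the children key, which is why the helpers use node.get("children", []).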

As demonstrated by the examples above, procpath query accepts two positional arguments: a JSONPath query and a SQL query (see Design for details on the dialects). Both are optional.

To use only SQL, pass an empty string for the JSONPath (what is the sum of proportional set sizes of all processes on the system?).

$ sudo procpath query -f stat,smaps_rollup \
  '' 'SELECT SUM(smaps_rollup_pss) / 1024.0 "PSS MiB" FROM record'
[{"PSS MiB": 4007.9482421875}]

Note

To read smaps_rollup and some other procfiles you may need to be the owner of the process (or root):

$ ls -l /proc/1/smaps_rollup
-r--r--r-- 1 root root 0 Sep  3 19:54 /proc/1/smaps_rollup

When a SQL query is specified, the tree is flattened into a table (see Data model for details).
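
The flattening step can be pictured with a short Python sketch. It is an illustration under assumptions, not Procpath's implementation: the tree is made up, the helpers flatten_node and flatten_tree are hypothetical, and every key other than children is assumed to map to a dictionary of fields. The procfile_field column naming follows the stat_rss and smaps_rollup_pss columns used in the queries above.

```python
import sqlite3


def flatten_node(node):
    """Turn one process document into a flat row: stat.rss -> stat_rss."""
    row = {}
    for procfile, fields in node.items():
        if procfile == "children":
            continue  # hierarchy is dropped, each process becomes one row
        for name, value in fields.items():
            row[f"{procfile}_{name}"] = value
    return row


def flatten_tree(node):
    """Yield one flat row per process in the tree, parents first."""
    yield flatten_node(node)
    for child in node.get("children", []):
        yield from flatten_tree(child)


# Hypothetical tree; rss values are in 4 KiB pages
tree = {
    "stat": {"pid": 1, "rss": 256},
    "children": [
        {"stat": {"pid": 2, "rss": 512}},
        {"stat": {"pid": 3, "rss": 256}, "children": [
            {"stat": {"pid": 4, "rss": 1024}},
        ]},
    ],
}

rows = list(flatten_tree(tree))
conn = sqlite3.connect(":memory:")  # ephemeral database
conn.execute("CREATE TABLE record (stat_pid INTEGER, stat_rss INTEGER)")
conn.executemany("INSERT INTO record VALUES (:stat_pid, :stat_rss)", rows)
(rss_mib,) = conn.execute(
    "SELECT SUM(stat_rss) / 1024.0 * 4 FROM record"
).fetchone()
print(rss_mib)
```

The four hypothetical processes hold 2048 pages in total, so the query reports 8.0 MiB.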

Timeline

procpath record essentially does the same as procpath query "..." "SELECT * FROM record", but instead of an ephemeral SQLite database it creates a persistent one and saves snapshots to it at the specified interval. A JSONPath query can also be given to narrow down the process tree, and SQL queries can be run on the resulting database (even while it is being recorded).
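
As a rough sketch of querying such a database from Python, the snippet below computes the peak RSS per process across snapshots. The in-memory database stands in for the recorded file, the rows are made up, and the ts timestamp column is an assumption (check Data model for the actual schema); stat_pid and stat_rss follow the column naming used in the SQL examples above.

```python
import sqlite3

# Stand-in for the recorded file; with real data, pass its path instead
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (ts REAL, stat_pid INTEGER, stat_rss INTEGER)"
)
# Hypothetical snapshots: (timestamp, PID, RSS in 4 KiB pages)
conn.executemany("INSERT INTO record VALUES (?, ?, ?)", [
    (1.0, 2610, 1000), (1.0, 2611, 300),
    (2.0, 2610, 1500), (2.0, 2611, 200),
    (3.0, 2610, 1200),
])

# Peak resident set size per process over the whole recording
peaks = conn.execute("""
    SELECT stat_pid, MAX(stat_rss) / 1024.0 * 4 AS peak_rss_mib
    FROM record
    GROUP BY stat_pid
    ORDER BY stat_pid
""").fetchall()

for pid, peak in peaks:
    print(pid, peak)
```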

The most basic form of JSONPath for procpath record selects a subtree by PID, i.e. the process with that PID and all of its descendants (record snapshots of the process subtree of PID 2610 every second until it exits).

procpath record -i 1 --stop-without-result -d subtree.sqlite \
  '$..children[?(@.stat.pid == 2610)]'

Note

The JSONPath query used for procpath record must yield full process documents, i.e. $..children[?(@.stat.pid == 2610)], not $..children[?(@.stat.pid == 2610)]..pid.

Additionally, procpath record supports the --pid-list argument, a pre-filter that specifies the PIDs of the branches to keep in the tree. It is applied before procfiles other than stat are read and before the JSONPath query is run against the tree. This reduces the resources Procpath needs, which matters when it records multiple procfiles at sub-second intervals. For instance, given this tree on a system:

PID 1
├─ PID 2
├─ PID 3
│  └─ PID 4
└─ PID 5
   └─ PID 6
      ├─ PID 7
      ├─ PID 8
      └─ PID 9

procpath record -f stat,io,status,fd,smaps_rollup --pid-list 3 ... will read the easy-to-parse stat procfiles for all processes, but the remaining procfiles only for the processes below (the JSONPath query, if specified, also runs against this smaller tree):

PID 1
├─ PID 2
└─ PID 3
   └─ PID 4

Besides PID hierarchy JSONPath queries, other types of filters can be formulated (record once a second for a minute all processes that have a resident set size bigger than 512 MiB).

procpath record -i 1 -r 60 -d hog.sqlite \
  '$..children[?(@.stat.rss > 512 * 1024 / 4 and @.pop("children", 1))]'

Note

stat.rss is usually measured in 4 KiB memory pages, see meta.page_size in Data model for more details.
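
The page arithmetic behind the 512 * 1024 / 4 threshold and the earlier / 1024.0 * 4 conversion can be checked with a small Python sketch. The 4096-byte page size is an assumption matching the "usually 4 KiB" case, and the helper names mib_to_pages and pages_to_mib are hypothetical.

```python
PAGE_SIZE = 4096  # assumed bytes per memory page (4 KiB)


def mib_to_pages(mib, page_size=PAGE_SIZE):
    """Convert a size in MiB to a number of memory pages."""
    return mib * 1024 * 1024 // page_size


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a number of memory pages to a size in MiB."""
    return pages * page_size / (1024 * 1024)


threshold = mib_to_pages(512)
print(threshold)            # pages equal to 512 MiB: 131072
print(pages_to_mib(99972))  # 99972 pages in MiB: 390.515625
```

On a system with a different page size, both the filter threshold and the SQL conversion factor would change accordingly; os.sysconf("SC_PAGE_SIZE") reports the value on the machine where Python runs.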