pahole & BTF augmentation update




Arnaldo Carvalho de Melo
acme@kernel.org
Red Hat
Montreal, 2025
http://vger.kernel.org/~acme/prez/lsfmm-bpf-2025/

What is this about?



  • pahole stewardship
  • New non-BTF features
  • BTF improvements
  • Ongoing work
  • You can help!
  • pahole as a BTF Swiss Army knife

pahole TODO: 2024 prez

  • Support kfunc decls in pfunct
  • parallel reproducible encoding of BTF
  • Further testing and merge of resilient split BTF

pahole news



  • Co-maintainer
  • Close to kernel release cadence
  • CI for pahole
  • Improved parallel mode

Thanks to



  • Work since last LSFMM/BPF:
  • ~140 patches
  • Alan Maguire
  • Eduard Zingerman
  • Ihor Solodrai
  • Stephen Brennan
  • And to reviewers!

Co-maintainer

  • Alan Maguire
  • Implemented lots of features
  • Reviewing patches for a long time
  • Mostly on the BTF conversion part
  • Already merging patches
  • And pushing to kernel.org

CI for pahole

  • GitHub Actions
  • Tests the 'next' branch
  • Changes then move to 'master'

tests

  • tests/ directory
  • Growing set
  • Lots more to do:
  • Compare BTF from different compilers, etc
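
One way such a comparison could start is by diffing the kind and name of every named entry in two saved "bpftool btf dump file" outputs, ignoring type IDs since those differ between encoders. This is only a sketch under that assumption, not an existing pahole test; the function names btf_names and btf_name_diff are hypothetical.

```shell
# Hypothetical helpers, not part of pahole's tests/ directory.
# Input files are text saved from "bpftool btf dump file <btf>".

# Print "KIND name" for every top-level entry, dropping the [id] prefix.
btf_names() {
    sed -n "s/^\[[0-9]*] \([A-Z_]*\) '\([^']*\)'.*/\1 \2/p" "$1" | sort -u
}

# Show entries present in only one of the two dumps.
# Column 1: only in the first dump; column 2 (tab-indented): only in the second.
btf_name_diff() {
    a=$(mktemp)
    b=$(mktemp)
    btf_names "$1" > "$a"
    btf_names "$2" > "$b"
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}
```

Usage would look like "btf_name_diff gcc.txt clang.txt", where both files are dumps of BTF produced from the same sources by different compilers. Comparing names only misses differences in layout, so it is a first pass at best.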

tests/tests


root@number:/home/acme/git/pahole# tests/tests
  1: Validation of BTF encoding of functions; this may take some time: Ok
  2: Default BTF on a system without BTF: Ok
  3: Flexible arrays accounting: Ok
  4: Check that pfunct can print btf_decl_tags read from BTF: Ok
  5: Pretty printing of files using DWARF type information: Ok
  6: Parallel reproducible DWARF Loading/Serial BTF encoding: Ok
/home/acme/git/pahole
root@number:/home/acme/git/pahole#
					

Goodies for testing


$ pahole --running_kernel_vmlinux
/usr/lib/debug/lib/modules/6.13.6-100.fc40.x86_64/vmlinux
					
$ perf buildid-list -k
db7ce41bc9c6bdd99d5ff9508e161e3e1e078b42
					
$ file /usr/lib/debug/lib/modules/6.13.6-100.fc40.x86_64/vmlinux
/usr/lib/debug/lib/modules/6.13.6-100.fc40.x86_64/vmlinux:
  ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,
  BuildID[sha1]=db7ce41bc9c6bdd99d5ff9508e161e3e1e078b42, with debug_info,
  not stripped
$
					

Flexible Arrays



Example


$ pahole --sizes --with_embedded_flexible_array | wc -l
100
					
$ pahole tty_bufhead
struct tty_bufhead {
	struct tty_buffer *        head;                 /*     0     8 */
	struct work_struct         work;                 /*     8    32 */
	struct mutex               lock;                 /*    40    32 */
	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
	atomic_t                   priority;             /*    72     4 */

	/* XXX 4 bytes hole, try to pack */

	struct tty_buffer          sentinel;             /*    80    32 */

	/* XXX last struct has a flexible array, 1 hole */

	struct llist_head          free;                 /*   112     8 */
	atomic_t                   mem_used;             /*   120     4 */
	int                        mem_limit;            /*   124     4 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	struct tty_buffer *        tail;                 /*   128     8 */

	/* size: 136, cachelines: 3, members: 9 */
	/* sum members: 132, holes: 1, sum holes: 4 */
	/* member types with holes: 1, total: 1 */
	/* flexible array members: end: 1 */
	/* last cacheline: 8 bytes */
};
					

flexible array


$ pahole tty_buffer
struct tty_buffer {
	union {
		struct tty_buffer * next;                /*     0     8 */
		struct llist_node  free;                 /*     0     8 */
	};                                               /*     0     8 */
	unsigned int               used;                 /*     8     4 */
	unsigned int               size;                 /*    12     4 */
	unsigned int               commit;               /*    16     4 */
	unsigned int               lookahead;            /*    20     4 */
	unsigned int               read;                 /*    24     4 */
	bool                       flags;                /*    28     1 */

	/* XXX 3 bytes hole, try to pack */

	u8                         data[];               /*    32     0 */

	/* size: 32, cachelines: 1, members: 8 */
	/* sum members: 29, holes: 1, sum holes: 3 */
	/* last cacheline: 32 bytes */
};
					

Rust



  • Minimal effort here
  • In practice, just --lang_exclude=rust on LTO builds
  • But kernel Rust objects shouldn't crash pahole when not excluded

What is being filtered


[acme@toolbox clang+thinlto+rust]$ pahole --verbose \
					  --btf_encode_detached=tmp_vmlinux1.btf \
					  --lang_exclude=rust tmp_vmlinux1
<SNIP>
Filtering CU /usr/lib/rustlib/src/rust/library/core/src/lib.rs/@/core.25793e9715209a7b-cgu.0 written in rust.
Filtering CU rust/compiler_builtins.rs/@/compiler_builtins.6f57a5fc13101438-cgu.0 written in rust.
Filtering CU /usr/lib/rustlib/src/rust/library/alloc/src/lib.rs/@/alloc.e373cb07bd07083a-cgu.0 written in rust.
Filtering CU rust/bindings/lib.rs/@/bindings.e4a37ad9e428c652-cgu.0 written in rust.
Filtering CU rust/kernel/lib.rs/@/kernel.f5c1c46b14036fc8-cgu.0 written in rust.
Filtering CU rust/uapi/lib.rs/@/uapi.52e8b2f8205e89f-cgu.0 written in rust.
<SNIP>
					

An example


$ pahole --btf_encode_detached=/tmp/ax88796b_rust.ko.btf \
	../build/v6.14.0-rc7+/drivers/net/phy/ax88796b_rust.ko
die__process_class: tag not supported 0x33 (variant_part) at <11e3e>!
					
$ bpftool btf dump file /tmp/ax88796b_rust.ko.btf
[16] STRUCT '{closure_env#0}<ax88796b_rust::Module>' size=8 vlen=1
        'module' type_id=3081 bits_offset=0
[17] STRUCT 'InitClosure<kernel::{impl#0}::init::{closure_env#0}<ax88796b_rust::Module>,
					ax88796b_rust::Module, kernel::error::Error>' size=8 vlen=2
        '__0' type_id=16 bits_offset=0
        '__1' type_id=1682 bits_offset=64
					
$ pahole -F btf /tmp/ax88796b_rust.ko.btf
struct InitClosure<kernel::{impl#0}::init::{closure_env#0}<ax88796b_rust::Module>,
                                ax88796b_rust::Module, kernel::error::Error> {
        struct {closure_env#0}<ax88796b_rust::Module> __0; /*     0     8 */
        struct PhantomData<fn(*mut (kernel::error::Error, ax88796b_rust::Module)) ->
			*mut (kernel::error::Error, ax88796b_rust::Module)> __1; /*     8     0 */

        /* size: 8, cachelines: 1, members: 2 */
        /* last cacheline: 8 bytes */
};
					

New Kind Metadata


bpf_fastcall decl tag



  • kfuncs marked KF_FASTCALL
  • Chain of function attributes
  • To regenerate source code

pfunct bpf_fastcall


$ pfunct --prototype -F btf ../build/v6.13-rc2/vmlinux | grep bpf_kfunc | grep bpf_fastcall
bpf_kfunc bpf_fastcall void * bpf_rdonly_cast(const void  * obj__ign, u32 btf_id__k);
bpf_kfunc bpf_fastcall void * bpf_cast_to_kern_ctx(void * obj);
$
					

bpftool bpf_fastcall


$ bpftool btf dump file /sys/kernel/btf/vmlinux | grep bpf_fastcall
[148506] DECL_TAG 'bpf_fastcall' type_id=72111 component_idx=-1
[148508] DECL_TAG 'bpf_fastcall' type_id=73863 component_idx=-1
					
$ bpftool btf dump file /sys/kernel/btf/vmlinux | grep -w 72111
[72111] FUNC 'bpf_cast_to_kern_ctx' type_id=72110 linkage=static
[148505] DECL_TAG 'bpf_kfunc' type_id=72111 component_idx=-1
[148506] DECL_TAG 'bpf_fastcall' type_id=72111 component_idx=-1
					

global vars



  • Stephen Brennan
  • drgn, bpftrace, etc. would love this
  • perf data-type profiling wants this as well
  • per_cpu vars encoded
  • --btf_features=+global_var
  • To encode all global vars
  • Modular?

Size diff


$ readelf -wi vmlinux | head -20 | grep DW_AT_comp_dir
    <17>   DW_AT_comp_dir    : (indirect line string, offset: 0x21): /home/acme/git/build/v6.14-rc7+
					
$ pahole -j --btf_encode_detached=no_global_vars vmlinux
$ stat -c "%s" no_global_vars
6115636
					
$ pahole -j --btf_encode_detached=global_vars --btf_features=+global_var vmlinux
$ stat -c "%s" global_vars 
8249077
$ echo $(((8249077 - 6115636) / 1024)) Kb
2083 Kb
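
The same arithmetic can be wrapped in a small helper that also reports the relative growth. This is a sketch using the two sizes shown above; the function name btf_growth is hypothetical, and the integer division truncates both figures.

```shell
# Hypothetical helper: report how much larger one BTF blob is than another.
# Arguments: size with the feature enabled, size without it (both in bytes).
btf_growth() {
    with=$1
    without=$2
    delta=$((with - without))
    # Shell arithmetic is integer only, so KiB and percent are truncated.
    echo "$((delta / 1024)) KiB (+$((delta * 100 / without))%)"
}

# Sizes of global_vars and no_global_vars from the transcript above.
btf_growth 8249077 6115636    # prints: 2083 KiB (+34%)
```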
					

how many


$ bpftool btf dump file no_global_vars | grep -w VAR | wc -l
822
					
$ bpftool btf dump file global_vars | grep -w VAR | wc -l
76472
					

The variables


$ bpftool btf dump file no_global_vars | grep -w VAR | head -5
[3038] VAR 'kstack_offset' type_id=35, linkage=global
[4135] VAR 'cpu_loops_per_jiffy' type_id=1, linkage=static
[4436] VAR 'runtime_data' type_id=4391, linkage=static
[4437] VAR 'sev_vmsa' type_id=4392, linkage=static
[4438] VAR 'svsm_caa' type_id=4304, linkage=static
					
$ bpftool btf dump file global_vars | grep -w VAR | head -5
[3038] VAR '__tracepoint_initcall_level' type_id=658, linkage=global
[3039] VAR '__SCK__tp_func_initcall_level' type_id=331, linkage=global
[3040] VAR '__tracepoint_initcall_start' type_id=658, linkage=global
[3041] VAR '__SCK__tp_func_initcall_start' type_id=331, linkage=global
[3042] VAR '__tracepoint_initcall_finish' type_id=658, linkage=global
					

Tracepoints


$ bpftool btf dump file global_vars | grep -w VAR | grep trace_event | wc -l
3016
					
$ bpftool btf dump file global_vars | grep -w VAR | grep tracepoint | wc -l
2470
					

static call keys


$ pglobal -v -F btf global_vars | grep static_call_key | head -5
struct static_call_key     __SCK____perf_guest_get_ip;; /* 0 */
struct static_call_key     __SCK____perf_guest_handle_intel_pt_intr;; /* 0 */
struct static_call_key     __SCK____perf_guest_state;; /* 0 */
struct static_call_key     __SCK__aesni_ctr_enc_tfm;; /* 0 */
struct static_call_key     __SCK__amd_pmu_branch_add;; /* 0 */
					
$ pglobal -v -F btf global_vars | grep static_call_key | wc -l
1143
$
					

Uninteresting variables



Filtering



  • Hard coded in btf_encoder.c
  • __UNIQUE_ID
  • __tpstrtab_
  • __exitcall_
  • __func_stack_frame_non_standard_
  • Should move to a separate file
  • Maintained in the kernel sources?
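
To make the prefix filtering concrete, here is a minimal shell sketch of the idea. It is not the code in btf_encoder.c; it only reuses the four prefixes listed above, and the function names are hypothetical.

```shell
# Hypothetical illustration of prefix-based filtering, not pahole's code.
# Returns success (0) when the variable name has an "uninteresting" prefix.
is_uninteresting_var() {
    case "$1" in
        __UNIQUE_ID*|__tpstrtab_*|__exitcall_*|__func_stack_frame_non_standard_*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Read variable names on stdin, one per line, and print the ones to keep.
filter_vars() {
    while IFS= read -r name; do
        is_uninteresting_var "$name" || printf '%s\n' "$name"
    done
}
```

For example, "printf '%s\n' __UNIQUE_ID_x kstack_offset __exitcall_y | filter_vars" keeps only kstack_offset. Moving the pattern list into a separate file, as suggested above, would amount to reading the prefixes at run time instead of hard coding them.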

Types of variables



  • Some in ELF sections (e.g. per-cpu)
  • Some with a prefix (tracepoints)
  • Others just part of steps of building the kernel
  • Stephen will try to do a breakdown
  • To see how much each type adds
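
A rough first cut at such a breakdown can be done by bucketing VAR names by prefix. The sketch below reads "bpftool btf dump file" output on stdin; the two prefixes come from names seen in the earlier slides, the three categories are an assumption, and var_breakdown is a hypothetical name. It counts variables only and does not measure how many bytes each category adds.

```shell
# Hypothetical helper: count BTF VAR entries per name-prefix category.
# Reads "bpftool btf dump file <btf>" output on stdin.
var_breakdown() {
    # Split on single quotes: field 1 is "[id] VAR ", field 2 is the name.
    awk -F"'" '
        $1 ~ / VAR $/ {
            name = $2
            if (name ~ /^__tracepoint_/) kind = "tracepoint"
            else if (name ~ /^__SCK__/)  kind = "static_call_key"
            else                         kind = "other"
            count[kind]++
        }
        END {
            for (kind in count) print count[kind], kind
        }
    ' | sort -k2
}
```

It could be run as "bpftool btf dump file global_vars | var_breakdown". Section-based categories such as per-cpu variables cannot be recognized from the name alone, so they land in "other" here.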

BTF extras



pahole as a BTF tweaker



  • DWARF loader
  • BTF loader and encoder
  • BTF loader for pretty printing
  • Should generate BTF from BTF
  • Merging BTF from input for dedup
  • With libbpf's and libctf's deduper
  • Comparing them
  • Adding/Removing/Correcting

CTFv4



New workflow



  • kernel compiled with "gcc -gbtf"
  • binutils dedups gcc's BTF
  • link together pieces from vmlinux and modules
  • pahole does final touches if needed
  • DWARF to BTF conversion is no more.

Misc non-BTF TODO



  • splitlock annotation
  • Atomic types crossing cachelines
  • Documentation/arch/x86/buslock.rst
  • cacheline prefetch detection/annotation

THE END