GEMC Analyzer
analyzer is a small Python package for reading GEMC output files and plotting
variables by name. It currently focuses on CSV and ROOT output from gstreamer,
with a reader structure that can be extended to other formats later.
Dependencies
The main analyzer API uses the standard scientific Python stack:
python3 -m pip install pandas numpy matplotlib
ROOT output support also requires uproot:
python3 -m pip install uproot
ROOT prerequisites:
- GEMC must be built with ROOT support and the ROOT streamer plugin available.
- The run must use gstreamer format root.
- Reading ROOT files from Python does not require importing C++ ROOT; the analyzer uses uproot.
The dependency-free SVG helper in analyzer/svg_plot.py only uses the Python standard library. It is useful on minimal systems where pandas, numpy, and matplotlib are not installed.
GEMC Output Model
The CSV streamer writes two flattened files per worker thread:
<rootname>_t<thread>_true_info.csv
<rootname>_t<thread>_digitized.csv
For one thread and filename: b2, the files are typically:
b2_t0_true_info.csv
b2_t0_digitized.csv
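The naming pattern above can be sketched as a small helper. The function csv_output_paths is hypothetical and not part of the analyzer package; it only assembles file names following the pattern shown.

```python
from pathlib import Path


def csv_output_paths(rootname: str, thread: int, directory: str = ".") -> dict:
    """Build the two expected CSV file paths for one worker thread.

    Hypothetical helper: follows the <rootname>_t<thread>_<kind>.csv pattern.
    """
    base = Path(directory)
    return {
        kind: base / f"{rootname}_t{thread}_{kind}.csv"
        for kind in ("true_info", "digitized")
    }


paths = csv_output_paths("b2", 0)
print(paths["true_info"].name)   # b2_t0_true_info.csv
print(paths["digitized"].name)   # b2_t0_digitized.csv
```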
CSV files are read with:
pd.read_csv(path, sep=",", skipinitialspace=True)
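A minimal sketch of what skipinitialspace=True changes, assuming a file whose commas are followed by a space. The sample row is made up for illustration and is not real GEMC output.

```python
import io

import pandas as pd

# Made-up sample in the digitized layout; the values are illustrative only.
sample = io.StringIO(
    "evn, timestamp, thread_id, detector, hitn, pid, tid, E, time, totEdep\n"
    "1, 0, 0, flux, 1, 11, 1, 2.5, 0.3, 0.01\n"
)

df = pd.read_csv(sample, sep=",", skipinitialspace=True)

# Without skipinitialspace=True the second column would be named " timestamp"
# (with a leading space), so df["timestamp"] would raise a KeyError.
print(list(df.columns))
```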
The CSV rows include event context columns such as:
evn, timestamp, thread_id, detector
The digitized B2 output includes columns like:
hitn, pid, tid, E, time, totEdep
The true-info output includes tracking columns like:
processName, avgTime, avgx, avgy, avgz, hitn, pid, tid, mtid, vx, vy, vz, mvx, mvy, mvz, totalEDeposited
When the matching digitized CSV is available, the analyzer also adds E to true-info tables by matching rows on event, detector, hit, PID, and track ID. In that case E is the track total energy, while totalEDeposited remains the deposited energy.
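The matching step can be pictured as a key-based join. The sketch below is not the analyzer's own code: it assumes the key columns are evn, detector, hitn, pid, and tid, and it uses made-up rows.

```python
import pandas as pd

# Assumed key columns for "event, detector, hit, PID, and track ID".
keys = ["evn", "detector", "hitn", "pid", "tid"]

# Made-up rows; column names follow the listings above.
true_info = pd.DataFrame({
    "evn": [1, 1], "detector": ["flux", "flux"], "hitn": [1, 2],
    "pid": [11, 22], "tid": [1, 2], "totalEDeposited": [0.01, 0.02],
})
digitized = pd.DataFrame({
    "evn": [1, 1], "detector": ["flux", "flux"], "hitn": [1, 2],
    "pid": [11, 22], "tid": [1, 2], "E": [2.5, 1.2],
})

# A left join keeps every true-info row; unmatched rows would get E = NaN.
merged = true_info.merge(digitized[keys + ["E"]], on=keys, how="left")
print(merged[["hitn", "E", "totalEDeposited"]])
```

After the join, E and totalEDeposited sit side by side in the same row, which makes the distinction between track energy and deposited energy easy to check.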
The vx, vy, and vz columns are the current track vertex coordinates. The mvx, mvy, and mvz columns are the mother-track vertex coordinates when the mother track was available to GEMC hit processing; otherwise they hold the GEMC uninitialized numeric sentinel. The mtid column stores the mother track ID.
The ROOT streamer writes one ROOT file per worker thread. For one thread and filename: b2, the file is typically:
b2_t0.root
The file contains trees named:
event_header
run_header
true_info_<detector>
digitized_<detector>
ROOT detector trees store vector branches. The analyzer flattens each vector element into one DataFrame row.
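The flattening idea can be sketched in plain Python. This is an illustration under stated assumptions, not the analyzer's implementation: branches are modeled as lists of per-entry vectors, all branches of one entry are assumed to have equal length, and the values are made up.

```python
# Each branch holds one list per tree entry; branch names follow the
# digitized CSV columns and the values are made up.
branches = {
    "pid": [[11, 22], [], [13]],
    "totEdep": [[0.01, 0.02], [], [0.05]],
}


def flatten(branches: dict) -> list:
    """Turn per-entry vectors into one row (dict) per vector element."""
    names = list(branches)
    rows = []
    n_entries = len(branches[names[0]])
    for entry in range(n_entries):
        # All branches of one entry are assumed to have the same length.
        for values in zip(*(branches[name][entry] for name in names)):
            row = {"entry": entry}
            row.update(zip(names, values))
            rows.append(row)
    return rows


rows = flatten(branches)
print(len(rows))  # 3
print(rows[0])    # {'entry': 0, 'pid': 11, 'totEdep': 0.01}
```

An entry with empty vectors contributes no rows, so the row count equals the total number of vector elements.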
Python API
Read one digitized CSV file:
from analyzer import read_output
output = read_output("b2_t0_digitized.csv")
df = output.get_frame("digitized")
print(df.columns)
Read a CSV root name when both files exist:
from analyzer import read_output
output = read_output("b2_t0", kind="csv")
print(output.summary())
Plot totEdep grouped by pid:
from analyzer import plot_variable, read_output
output = read_output("b2_t0_digitized.csv")
plot_variable(
output,
"totEdep",
data="digitized",
bins=30,
xlim=(0.0, 0.1),
show=True,
)
Read ROOT output:
from analyzer import read_output
output = read_output("b2_t0.root", kind="root")
df = output.get_frame("digitized", detector="flux")
Command-Line Usage
Run python3 -m analyzer from the GEMC source directory, where the analyzer package directory is visible to Python.
The -m flag takes a module name, not a filesystem path. Do not run python3 -m ../analyzer. If your shell is in another directory, move back to the source directory or set PYTHONPATH.
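To check whether Python can find the package from the current directory, a small lookup like the following may help. The helper module_visible is hypothetical; it relies only on the standard importlib machinery, and the extra_path argument has the same effect as adding that directory to PYTHONPATH for the current process.

```python
import importlib.util
import sys


def module_visible(name: str, extra_path: str = "") -> bool:
    """Return True if Python can locate a top-level module called `name`."""
    if extra_path and extra_path not in sys.path:
        # Same effect as adding the directory to PYTHONPATH for this process.
        sys.path.insert(0, extra_path)
    return importlib.util.find_spec(name) is not None


# Prints True only when the analyzer package directory is on the search path.
print(module_visible("analyzer"))
```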
Print a summary:
python3 -m analyzer digitized.csv
Plot a digitized variable with matplotlib:
python3 -m analyzer digitized.csv totEdep --kind csv --xlim 0.0 0.1
Save a plot instead of showing it:
python3 -m analyzer digitized.csv totEdep --kind csv --save b2_totEdep.png
Plot ROOT output with matplotlib:
python3 -m analyzer b2_t0.root totEdep --kind root --detector flux --save b2_totEdep.png
Plot a true-info track vertex coordinate:
python3 -m analyzer true_info.csv vx --kind csv --data true_info --save b2_vertex_x.png
Analyzer Plot Examples
The examples below were produced from 1000-event GEMC CSV runs. The -n=1000
command-line option overrides the event count for the run without modifying the
YAML file.
Plot the total energy deposited in the B2 digitized output:
python3 -m analyzer digitized.csv totEdep --kind csv --bins 50

Plot the true-info track energy in the B2 output:
python3 -m analyzer true_info.csv E --kind csv --data true_info --bins 50

Plot the total energy deposited in the simple_flux digitized output:
python3 -m analyzer digitized.csv totEdep --kind csv --bins 50

Plot the hit time in the simple_flux digitized output:
python3 -m analyzer digitized.csv time --kind csv --bins 50

Plot the particle energy in the cherenkov digitized output:
python3 -m analyzer digitized.csv E --kind csv --bins 50

Plot the hit time in the cherenkov digitized output:
python3 -m analyzer digitized.csv time --kind csv --bins 50

Jupyter Usage
The analyzer can also be used directly in notebook-style Python cells:
#%%
from analyzer import read_output, plot_variable
plot_variable(
read_output("cherenkov_t0_digitized.csv"),
"totEdep",
data="digitized",
bins=30,
show=True,
)
Dependency-Free SVG Plot
If pandas, numpy, or matplotlib is unavailable, create an SVG histogram directly from the CSV file:
python3 -B analyzer/svg_plot.py b2_t0_digitized.csv totEdep --out b2_totEdep.svg --bins 30
Add an x-axis range with:
python3 -B analyzer/svg_plot.py b2_t0_digitized.csv totEdep --out b2_totEdep.svg --bins 30 --xlim 0.0 0.1
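For intuition, the binning behind a histogram with a bin count and an x-axis range can be done with the standard library alone. The sketch below is not the code in analyzer/svg_plot.py; the histogram function and the sample rows are hypothetical.

```python
import csv
import io


def histogram(values, bins, xlim=None):
    """Count values into equal-width bins; values outside xlim are dropped."""
    lo, hi = xlim if xlim else (min(values), max(values))
    if hi <= lo:
        raise ValueError("x range must have positive width")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue
        # The upper edge belongs to the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up rows standing in for a digitized CSV file.
text = "evn, totEdep\n1, 0.01\n2, 0.03\n3, 0.09\n4, 0.25\n"
reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
values = [float(row["totEdep"]) for row in reader]

print(histogram(values, bins=2, xlim=(0.0, 0.1)))  # [2, 1]
```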
Run the B2 Example
Run these commands from the GEMC source directory.
Build the B2 geometry into a local SQLite database:
PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=/opt/projects/gemc/src/api \
python3 examples/basic/b2/b2.py -f sqlite -sql gemc.db
Run GEMC with CSV output rooted at b2:
build/gemc examples/basic/b2/b2.yaml \
'-gsystem=[{name: b2, factory: sqlite}]' \
'-gstreamer=[{format: csv, filename: b2}]' \
-sql=gemc.db \
-n=1000
With one worker thread, this produces:
b2_t0_digitized.csv
b2_t0_true_info.csv
Inspect the digitized CSV header:
head -1 b2_t0_digitized.csv
Expected columns include:
evn, timestamp, thread_id, detector, hitn, pid, tid, E, time, totEdep
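The same header check can be scripted. The helper missing_columns below is a hypothetical sketch using only the standard library; the in-memory sample stands in for the real file and deliberately omits totEdep.

```python
import csv
import io

EXPECTED = ["evn", "timestamp", "thread_id", "detector",
            "hitn", "pid", "tid", "E", "time", "totEdep"]


def missing_columns(stream, expected=EXPECTED):
    """Return the expected column names absent from the CSV header line."""
    header = next(csv.reader(stream, skipinitialspace=True))
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


# Stand-in for open("b2_t0_digitized.csv"); this header drops totEdep on purpose.
sample = io.StringIO("evn, timestamp, thread_id, detector, hitn, pid, tid, E, time\n")
print(missing_columns(sample))  # ['totEdep']
```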
Create the totEdep plot with the main analyzer API:
python3 -m analyzer b2_t0_digitized.csv totEdep --kind csv --save b2_totEdep.png
Or create the same style of histogram without third-party Python packages:
python3 -B analyzer/svg_plot.py b2_t0_digitized.csv totEdep --out b2_totEdep.svg --bins 30
Run B2 With ROOT Output
To produce ROOT output instead of CSV, keep the same gemc.db and run:
build/gemc examples/basic/b2/b2.yaml \
'-gsystem=[{name: b2, factory: sqlite}]' \
'-gstreamer=[{format: root, filename: b2}]' \
-sql=gemc.db \
-n=1000
With one worker thread, this produces:
b2_t0.root
Read the ROOT file from Python if you want to inspect or manipulate the data before plotting:
from analyzer import plot_variable, read_output
output = read_output("b2_t0.root", kind="root")
print(output.summary())
df = output.get_frame("digitized", detector="flux")
print(df[["pid", "totEdep"]].head())
plot_variable(
output,
"totEdep",
data="digitized",
detector="flux",
bins=30,
show=True,
)
The Python inspection step is not required for plotting. To plot directly from the command line, use:
python3 -m analyzer b2_t0.root totEdep --kind root --detector flux --save b2_root_totEdep.png
If matplotlib reports that its default cache directory is not writable, set a writable MPLCONFIGDIR:
MPLCONFIGDIR=. python3 -m analyzer b2_t0.root totEdep --kind root --detector flux --save b2_root_totEdep.png
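When using the Python API or a notebook instead of the command line, the same variable can be set from Python. This is a sketch that assumes a temporary directory is an acceptable cache location; it must run before matplotlib is first imported.

```python
import os
import tempfile

# Must run before the first `import matplotlib`, which looks up MPLCONFIGDIR
# when it resolves its configuration directory.
if "MPLCONFIGDIR" not in os.environ:
    os.environ["MPLCONFIGDIR"] = tempfile.mkdtemp(prefix="mplconfig-")

print(os.environ["MPLCONFIGDIR"])
```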
Extending Readers
New formats should return a GemcOutput object from analyzer.dataset.
Populate one or more of these maps:
GemcOutput(
true_info={"name": true_info_dataframe},
digitized={"name": digitized_dataframe},
headers={"event_header": event_header_dataframe},
)
Then add the format selection to read_output() in analyzer/readers.py.
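As a rough illustration of the steps above, the sketch below reads a made-up JSON-lines format into the digitized map. Everything here is an assumption: GemcOutputSketch is a stand-in carrying only the three maps shown above (the real class lives in analyzer.dataset and may differ), read_jsonl_digitized is a hypothetical reader, and keying the map by detector name is a guess based on the detector argument of get_frame().

```python
import io
import json
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class GemcOutputSketch:
    """Stand-in with the three maps shown above; not the real GemcOutput."""
    true_info: dict = field(default_factory=dict)
    digitized: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)


def read_jsonl_digitized(stream) -> GemcOutputSketch:
    """Hypothetical reader: one JSON object per line, grouped by detector."""
    frame = pd.DataFrame([json.loads(line) for line in stream if line.strip()])
    by_detector = {name: group.reset_index(drop=True)
                   for name, group in frame.groupby("detector")}
    return GemcOutputSketch(digitized=by_detector)


# Made-up input standing in for a file in the new format.
sample = io.StringIO(
    '{"detector": "flux", "pid": 11, "totEdep": 0.01}\n'
    '{"detector": "flux", "pid": 22, "totEdep": 0.02}\n'
)
output = read_jsonl_digitized(sample)
print(sorted(output.digitized))
```

A real reader would return the actual GemcOutput and be selected from read_output() by file extension or by the kind argument.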