Histograms - advanced 2

Input/Output

Sometimes it is necessary to save the analysis results in the form of histograms. This page describes how to do it using 3 file formats:

  • text files
  • numpy files
  • ROOT files

Basically a histogram can be represented by 2 arrays: the histogram content and the binning. More complicate considerations can be done on the errors but here this will be negleted and a simple sqrt(N) has to be considered as implicit. No errors-dedicated array will be treated.

Text files

This is not really useful and not recommended. It is not efficient and things can get very complicated if several histograms has to be saved. Just use it in case you want to inspect the histo content in a text file for some reason or you have to avoid any Python and/or ROOT related object.

Writing

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import mplhep as hep # <-- nice histogram plotting
In [2]:
# histo creation

# data & plot
data = np.random.normal(10,2,1000)
h, bins = np.histogram(data, bins=20)
hep.histplot(h,bins)
plt.show()

# saving txt files. The array size must be the same

np.savetxt('histos.txt', (h, bins[:-1]) )

The resulting file contains the 2 arrays in separate rows, which is the natural format for later import in loadtxt.

In [3]:
%cat histos.txt
2.000000000000000000e+00 3.000000000000000000e+00 8.000000000000000000e+00 5.000000000000000000e+00 1.600000000000000000e+01 2.300000000000000000e+01 4.500000000000000000e+01 6.100000000000000000e+01 7.500000000000000000e+01 1.010000000000000000e+02 1.370000000000000000e+02 1.090000000000000000e+02 1.010000000000000000e+02 9.100000000000000000e+01 8.100000000000000000e+01 5.700000000000000000e+01 4.400000000000000000e+01 1.900000000000000000e+01 1.700000000000000000e+01 5.000000000000000000e+00
3.258149725303574940e+00 3.860510940209892716e+00 4.462872155116210493e+00 5.065233370022528270e+00 5.667594584928846047e+00 6.269955799835163823e+00 6.872317014741481600e+00 7.474678229647799377e+00 8.077039444554117154e+00 8.679400659460434042e+00 9.281761874366752707e+00 9.884123089273071372e+00 1.048648430417938826e+01 1.108884551908570515e+01 1.169120673399202381e+01 1.229356794889834248e+01 1.289592916380465937e+01 1.349829037871097626e+01 1.410065159361729492e+01 1.470301280852361359e+01

If you want an array per column, just pack the 2 arrays and transpose the resulting matrix

In [4]:
stack = np.stack((h, bins[:-1]), axis=1) # use axis=0 to stack along rows and axis=1 for columns
np.savetxt('histos_stacked.txt', stack)
In [5]:
%cat histos_stacked.txt
2.000000000000000000e+00 3.258149725303574940e+00
3.000000000000000000e+00 3.860510940209892716e+00
8.000000000000000000e+00 4.462872155116210493e+00
5.000000000000000000e+00 5.065233370022528270e+00
1.600000000000000000e+01 5.667594584928846047e+00
2.300000000000000000e+01 6.269955799835163823e+00
4.500000000000000000e+01 6.872317014741481600e+00
6.100000000000000000e+01 7.474678229647799377e+00
7.500000000000000000e+01 8.077039444554117154e+00
1.010000000000000000e+02 8.679400659460434042e+00
1.370000000000000000e+02 9.281761874366752707e+00
1.090000000000000000e+02 9.884123089273071372e+00
1.010000000000000000e+02 1.048648430417938826e+01
9.100000000000000000e+01 1.108884551908570515e+01
8.100000000000000000e+01 1.169120673399202381e+01
5.700000000000000000e+01 1.229356794889834248e+01
4.400000000000000000e+01 1.289592916380465937e+01
1.900000000000000000e+01 1.349829037871097626e+01
1.700000000000000000e+01 1.410065159361729492e+01
5.000000000000000000e+00 1.470301280852361359e+01

Reading

In [6]:
H = np.loadtxt('histos.txt')
plt.plot(H[1], H[0], drawstyle='steps-pre')
plt.show()

# note that the hep.histplot cannot be used because it requires bins array to be N+1 wrt to content arrays.

In the case the histo arrays are saved in columns ('histo_stacked.txt) the unpack=True is necessary

In [7]:
H = np.loadtxt('histos_stacked.txt', unpack=True)
plt.plot(H[1], H[0], drawstyle='steps-pre')
plt.show()

Numpy files

Numpy files (usually npy extention, or npz when the compression is used) are more fast to read/write.

In [8]:
data = np.random.normal(10,2,1000)
h, bins = np.histogram(data, bins=20)
hep.histplot(h,bins)

plt.show()

Use savez to exploit compression. Moreover:

  • the arrays can ben "named"
  • the arrays can have different size
In [9]:
np.savez('histos',histo=h, binning=bins)
In [10]:
%ls -hrlt histos.*
-rw-r--r-- 1 valerio valerio  770 set  8 15:31 histos.npy
-rw-r--r-- 1 valerio valerio  11K set  8 16:04 histos.root
-rw-r--r-- 1 valerio valerio 1000 set  8 16:07 histos.txt
-rw-r--r-- 1 valerio valerio  838 set  8 16:07 histos.npz

Reading

The arrays are accessed in a dictionary way (key-value)

In [11]:
data = np.load('histos.npz')

# printout the arrays contained in the compressed archive
print(data.files)
['histo', 'binning']
In [12]:
hep.histplot(data['histo'], data['binning'])
Out[12]:
<AxesSubplot:>

Several histos can be packed as needed. For instance, these are 10 histos with the same binning (total = 11 arrays)

In [13]:
histolist=[]
for i in range(10):
    data = np.random.normal(10,2,1000)
    h, bins = np.histogram(data, bins=20)
    histolist.append(h)
    
np.savez('manyhistos.npz', binning=bins, *histolist) # a total or 11 arrays is saved. For the first one the name is specified, the following will take the default name

Reopen and plot data

In [14]:
data = np.load('manyhistos.npz')
print(data.files)
['binning', 'arr_0', 'arr_1', 'arr_2', 'arr_3', 'arr_4', 'arr_5', 'arr_6', 'arr_7', 'arr_8', 'arr_9']
In [15]:
fig, ax = plt.subplots()
for i in range(10):
    hep.histplot(data[f'arr_{i}'], data['binning'], label=f'arr_{i}')
    
plt.legend()
plt.show()

ROOT files

The uproot module can be used for ROOT file reading/writing provided the histogram is packed as a tuple of 2 arrays (histo content, bins) or 3 arrays for 2D histos (content, bin x, bin y)

Writing ROOT files

In [16]:
data = np.random.normal(10,2,1000)
h, bins = np.histogram(data, bins=20)


import uproot

# opening a file for writing (open is readonly)
outfile = uproot.recreate('histos.root')
outfile['myhisto'] = (h, bins)

Opening with ROOT:

root -l histos.root
root [0] myhisto->Draw()

For several histos:

In [17]:
outfile = uproot.recreate('manyhistos.root')

for i in range(10):
    data = np.random.normal(10,2,100000)
    h, bins = np.histogram(data, bins=20)
    # names will be myhisto0, myhisto1, ...
    outfile[f'myhisto{i}'] = (h, bins)
    

Opening and plotting with ROOT

root -l -b -q open_multi_histo.C

Macro content:

void open_multi_histo(){
    TFile * f = new TFile("manyhistos.root");

    TCanvas * c = new TCanvas();
    c->Divide(2,5);
    TH1F * h[10] ;
    for (int i=0;i<10;i++) {
        h[i] = (TH1F *) f->Get(Form("myhisto%d", i));
        c->cd(i+1);
        h[i]->Draw();
    }
    c->SaveAs("many.png");
    return;
}

Reading from ROOT files

In [18]:
inputfile = uproot.open('manyhistos.root')
print(inputfile.keys())
[b'myhisto0;1', b'myhisto1;1', b'myhisto2;1', b'myhisto3;1', b'myhisto4;1', b'myhisto5;1', b'myhisto6;1', b'myhisto7;1', b'myhisto8;1', b'myhisto9;1']
In [19]:
for i in range(10):
    hep.histplot(inputfile[f'myhisto{i}'], label=f'myhisto{i}')
    
plt.legend()
plt.show()
In [ ]: