Commit a7fd93db authored by Arturo Sanchez

Upload changes and presentation

parent d913a2fb
File moved
%% Cell type:markdown id: tags:
# CSV to ROOT conversion
%% Cell type:markdown id: tags:
In this notebook we convert the census CSV file into a lighter ROOT file that is ready to be read with PyROOT.
%% Cell type:code id: tags:
``` c++
#include "Riostream.h"
#include "TString.h"
#include "TFile.h"
#include "TTree.h"
#include "TSystem.h"
#include <stdio.h>
#include <stdlib.h>
```
%% Cell type:markdown id: tags:
Setting the path where our dataset is stored
%% Cell type:code id: tags:
``` c++
// Full path to the CSV file
TString dir = gSystem->UnixPathName("/home/student/ejercicios-clase-08-datos/data-used/census.csv");
// Drop the file name so that dir holds only the data directory
dir.ReplaceAll("census.csv","");
// Collapse any "/./" segments in the path
dir.ReplaceAll("/./","/");
```
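%% Cell type:markdown id: tags:
The cell above gets the data directory by string replacement, which only works when the replaced text matches the file name exactly. As a hedged, ROOT-free alternative, here is a minimal sketch in standard C++17 using `std::filesystem`; the helper name `data_directory` is hypothetical and not part of ROOT or of this notebook.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper (not part of ROOT or the notebook): return the directory
// that contains csv_path, with a trailing '/'.
std::string data_directory(const std::string& csv_path) {
    // lexically_normal() collapses "/./" segments, like ReplaceAll("/./", "/")
    std::filesystem::path p = std::filesystem::path(csv_path).lexically_normal();
    // parent_path() drops the last component, whatever the file is called
    return p.parent_path().string() + "/";
}
```

For example, `data_directory("/home/student/ejercicios-clase-08-datos/data-used/census.csv")` gives `"/home/student/ejercicios-clase-08-datos/data-used/"`, and renaming the CSV does not require touching the code.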
%% Cell type:markdown id: tags:
Creating the ROOT file where the data will be stored
%% Cell type:code id: tags:
``` c++
// "RECREATE" creates the file, overwriting any existing census.root
TFile *f = new TFile("/home/student/ejercicios-clase-08-datos/data-used/census.root","RECREATE");
```
%% Cell type:markdown id: tags:
Reading the data and writing it to the ROOT file
%% Cell type:code id: tags:
``` c++
// Create the tree; it is attached to the ROOT file opened above
TTree *tree = new TTree("data","data from csv file");
// Branch descriptor for the 100 comma-separated columns:
//   STNAME and CTYNAME are strings (/C),
//   the rate columns RBIRTH*, RDEATH*, RNATURALINC*, RINTERNATIONALMIG*, RDOMESTICMIG*, RNETMIG* are floats (/F),
//   all other columns are integers (/I).
// Line 1 of the CSV is the header; it cannot be parsed as data, so ReadFile skips it with a warning.
tree->ReadFile("/home/student/ejercicios-clase-08-datos/data-used/census.csv","SUMLEV/I:REGION/I:DIVISION/I:STATE/I:COUNTY/I:STNAME/C:CTYNAME/C:CENSUS2010POP/I:ESTIMATESBASE2010/I:POPESTIMATE2010/I:POPESTIMATE2011/I:POPESTIMATE2012/I:POPESTIMATE2013/I:POPESTIMATE2014/I:POPESTIMATE2015/I:NPOPCHG_2010/I:NPOPCHG_2011/I:NPOPCHG_2012/I:NPOPCHG_2013/I:NPOPCHG_2014/I:NPOPCHG_2015/I:BIRTHS2010/I:BIRTHS2011/I:BIRTHS2012/I:BIRTHS2013/I:BIRTHS2014/I:BIRTHS2015/I:DEATHS2010/I:DEATHS2011/I:DEATHS2012/I:DEATHS2013/I:DEATHS2014/I:DEATHS2015/I:NATURALINC2010/I:NATURALINC2011/I:NATURALINC2012/I:NATURALINC2013/I:NATURALINC2014/I:NATURALINC2015/I:INTERNATIONALMIG2010/I:INTERNATIONALMIG2011/I:INTERNATIONALMIG2012/I:INTERNATIONALMIG2013/I:INTERNATIONALMIG2014/I:INTERNATIONALMIG2015/I:DOMESTICMIG2010/I:DOMESTICMIG2011/I:DOMESTICMIG2012/I:DOMESTICMIG2013/I:DOMESTICMIG2014/I:DOMESTICMIG2015/I:NETMIG2010/I:NETMIG2011/I:NETMIG2012/I:NETMIG2013/I:NETMIG2014/I:NETMIG2015/I:RESIDUAL2010/I:RESIDUAL2011/I:RESIDUAL2012/I:RESIDUAL2013/I:RESIDUAL2014/I:RESIDUAL2015/I:GQESTIMATESBASE2010/I:GQESTIMATES2010/I:GQESTIMATES2011/I:GQESTIMATES2012/I:GQESTIMATES2013/I:GQESTIMATES2014/I:GQESTIMATES2015/I:RBIRTH2011/F:RBIRTH2012/F:RBIRTH2013/F:RBIRTH2014/F:RBIRTH2015/F:RDEATH2011/F:RDEATH2012/F:RDEATH2013/F:RDEATH2014/F:RDEATH2015/F:RNATURALINC2011/F:RNATURALINC2012/F:RNATURALINC2013/F:RNATURALINC2014/F:RNATURALINC2015/F:RINTERNATIONALMIG2011/F:RINTERNATIONALMIG2012/F:RINTERNATIONALMIG2013/F:RINTERNATIONALMIG2014/F:RINTERNATIONALMIG2015/F:RDOMESTICMIG2011/F:RDOMESTICMIG2012/F:RDOMESTICMIG2013/F:RDOMESTICMIG2014/F:RDOMESTICMIG2015/F:RNETMIG2011/F:RNETMIG2012/F:RNETMIG2013/F:RNETMIG2014/F:RNETMIG2015/F",',');
// Write the tree to census.root and close the file so everything is flushed to disk
f->Write();
f->Close();
```
%% Output
Warning in <TTree::ReadStream>: Couldn't read formatted data in "SUMLEV" for branch SUMLEV on line 1; ignoring line
Warning in <TTree::ReadStream>: Read too few columns (1 < 100) in line 1; ignoring line
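%% Cell type:markdown id: tags:
The second warning compares the number of fields read on line 1 with the 100 branches declared in the descriptor. A useful sanity check before calling `ReadFile` is that the descriptor declares as many branches as the CSV header has columns. Below is a minimal sketch in plain standard C++ (no ROOT); the helpers `parse_descriptor` and `count_fields` are hypothetical, and treating a branch with no `/T` suffix as `F` is an assumption about ROOT's default that the sketch simply mirrors.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split a ReadFile-style descriptor such as
// "SUMLEV/I:STNAME/C:RBIRTH2011/F" into (name, type) pairs.
// Assumption: a branch without a "/T" suffix is reported as 'F'.
std::vector<std::pair<std::string, char>> parse_descriptor(const std::string& desc) {
    std::vector<std::pair<std::string, char>> branches;
    std::istringstream in(desc);
    std::string item;
    while (std::getline(in, item, ':')) {          // branches are separated by ':'
        std::size_t slash = item.find('/');
        if (slash == std::string::npos || slash + 1 >= item.size()) {
            branches.emplace_back(item.substr(0, slash), 'F');
        } else {
            branches.emplace_back(item.substr(0, slash), item[slash + 1]);
        }
    }
    return branches;
}

// Count the delimiter-separated fields in one CSV line.
// Quoted fields containing the delimiter are NOT handled.
std::size_t count_fields(const std::string& line, char delim = ',') {
    std::size_t n = 1;
    for (char c : line) {
        if (c == delim) ++n;
    }
    return n;
}
```

Used on this notebook's data, `parse_descriptor(descriptor).size()` should equal `count_fields(header_line)`, where `header_line` is the first line of `census.csv`; the warning above reports 100 branches for the descriptor.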
%% Cell type:markdown id: tags:
Checking the difference in file size and the number of lines in the CSV file
%% Cell type:code id: tags:
``` c++
// List the CSV and ROOT files with human-readable sizes
system("ls -lhrt /home/student/ejercicios-clase-08-datos/data-used/census*");
system("echo");
system("echo 'This dataset contains the below number of data points'");
// Count the lines of the CSV (the header line is included)
system("wc -l /home/student/ejercicios-clase-08-datos/data-used/census.csv");
```
%% Output
-rw-r--r-- 1 student student 2.1M Feb 25 00:41 /home/student/ejercicios-clase-08-datos/data-used/census.csv
-rw-r--r-- 1 student student 795K Feb 27 17:24 /home/student/ejercicios-clase-08-datos/data-used/census.root
This dataset contains the below number of data points
3194 /home/student/ejercicios-clase-08-datos/data-used/census.csv
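%% Cell type:markdown id: tags:
`wc -l` counts lines, so the 3194 above includes the header line that `ReadFile` skipped; if all remaining lines were read, the tree should hold 3193 entries (`tree->GetEntries()` before closing the file would confirm it). The ROOT file is 795K against 2.1M for the CSV, roughly 37% of the original size. A minimal sketch of both computations in standard C++ follows; `count_data_rows` and `size_ratio` are hypothetical helpers, and skipping empty lines is an assumption about how blank lines should be treated.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: count data rows in a CSV stream, skipping the
// first (header) line and ignoring empty lines.
std::size_t count_data_rows(std::istream& in) {
    std::string line;
    std::size_t rows = 0;
    bool header_seen = false;
    while (std::getline(in, line)) {
        if (!header_seen) {            // first line is the header
            header_seen = true;
            continue;
        }
        if (!line.empty()) ++rows;
    }
    return rows;
}

// Fraction of the original size occupied by the converted file.
double size_ratio(double converted_bytes, double original_bytes) {
    return converted_bytes / original_bytes;
}
```

With the sizes reported by `ls -h` (powers of 1024), `size_ratio(795.0 * 1024, 2.1 * 1024 * 1024)` is about 0.37; since `ls -h` rounds, treat this as an approximation.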