Overview

Dataset statistics

Number of variables: 15
Number of observations: 546212
Missing cells: 48324
Missing cells (%): 0.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 62.5 MiB
Average record size in memory: 120.0 B
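Two of the figures above are derived from the others. Missing cells (%) is the number of missing cells divided by the total number of cells (observations × variables), and the average record size is the total memory divided by the number of observations. A quick check, assuming 1 MiB = 1024 × 1024 bytes and treating the rounded 62.5 MiB as exact:

```python
# Re-derive the two computed overview figures from the raw counts in the report.
n_vars = 15
n_obs = 546_212
missing_cells = 48_324
total_bytes = 62.5 * 1024 * 1024  # 62.5 MiB as printed (already rounded)

missing_pct = 100 * missing_cells / (n_obs * n_vars)
avg_record_bytes = total_bytes / n_obs

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the 0.6% and 120.0 B shown in the report.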

Variable types

Categorical: 9
Numeric: 6

Alerts

PUNT_LECTURA_CRITICA is highly correlated with PUNT_MATEMATICAS and 4 other fields (High correlation)
PUNT_MATEMATICAS is highly correlated with PUNT_LECTURA_CRITICA and 4 other fields (High correlation)
PUNT_C_NATURALES is highly correlated with PUNT_LECTURA_CRITICA and 4 other fields (High correlation)
PUNT_SOCIALES_CIUDADANAS is highly correlated with PUNT_LECTURA_CRITICA and 4 other fields (High correlation)
PUNT_INGLES is highly correlated with PUNT_LECTURA_CRITICA and 4 other fields (High correlation)
PUNT_GLOBAL is highly correlated with PUNT_LECTURA_CRITICA and 4 other fields (High correlation)
ESTU_DEPTO_RESIDE is highly correlated with COLE_DEPTO_UBICACION (High correlation)
FAMI_TIENEINTERNET is highly correlated with ESTU_DEDICACIONLECTURADIARIA and 1 other field (High correlation)
ESTU_DEDICACIONLECTURADIARIA is highly correlated with FAMI_TIENEINTERNET (High correlation)
COLE_DEPTO_UBICACION is highly correlated with ESTU_DEPTO_RESIDE (High correlation)
FAMI_ESTRATOVIVIENDA is highly correlated with FAMI_TIENEINTERNET (High correlation)
FAMI_ESTRATOVIVIENDA is highly correlated with FAMI_TIENEINTERNET and 1 other field (High correlation)
FAMI_TIENEINTERNET is highly correlated with FAMI_ESTRATOVIVIENDA and 2 other fields (High correlation)
FAMI_SITUACIONECONOMICA is highly correlated with ESTU_HORASSEMANATRABAJA (High correlation)
ESTU_DEDICACIONLECTURADIARIA is highly correlated with FAMI_ESTRATOVIVIENDA and 1 other field (High correlation)
ESTU_DEDICACIONINTERNET is highly correlated with FAMI_TIENEINTERNET (High correlation)
ESTU_HORASSEMANATRABAJA is highly correlated with FAMI_SITUACIONECONOMICA (High correlation)
FAMI_TIENEINTERNET has 8337 (1.5%) missing values (Missing)
FAMI_SITUACIONECONOMICA has 8259 (1.5%) missing values (Missing)
ESTU_DEDICACIONINTERNET has 30298 (5.5%) missing values (Missing)
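The high-correlation alerts are raised per association measure, which is likely why the same pair can be flagged more than once. For two categorical columns such as ESTU_DEPTO_RESIDE and COLE_DEPTO_UBICACION, one standard measure, and one of the correlation methods pandas-profiling can compute, is Cramér's V: V = sqrt(χ² / (n · (min(r, c) - 1))). The sketch below is a plain, uncorrected version on small hypothetical contingency tables; pandas-profiling's own implementation may apply a bias correction and differ in detail.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for an r x c contingency table (list of rows of counts).

    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows = department of residence, columns = school department.
same_dept = [[90, 10], [5, 95]]     # most students study where they live
independent = [[50, 50], [50, 50]]  # no association at all

print(round(cramers_v(same_dept), 3))
print(cramers_v(independent))
```

A value near 1 means the school department almost determines the residence department, which is the kind of pattern that would trigger the alert for those two columns.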

Reproduction

Analysis started: 2022-05-24 18:26:41.511446
Analysis finished: 2022-05-24 18:27:42.292309
Duration: 1 minute and 0.78 seconds
Software version: pandas-profiling v3.2.0

Variables

ESTU_GENERO
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values: F (295994), M (250097), - (121)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 546212
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
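The "Distinct categories" counts refer to Unicode general categories, which Python's standard `unicodedata` module exposes directly. The sketch below approximates the category breakdown for a column; it covers categories only (scripts and blocks are not available from `unicodedata`), and the mapping from two-letter codes to the report's labels is written out by hand for the codes that occur in this report.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across a list of strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# The ESTU_GENERO values F, M and "-" span two categories, as reported.
print(category_counts(["F", "M", "-"]))
print(category_counts(["BOGOT\u00c1", "Estrato 2"]))
```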

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: M
3rd row: M
4th row: M
5th row: M

Common Values

Value  Count   Frequency (%)
F      295994  54.2%
M      250097  45.8%
-      121     < 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category frequency plot]

Word frequencies (lowercased)
Value    Count   Frequency (%)
f        295994  54.2%
m        250097  45.8%
(empty)  121     < 0.1%

Most occurring characters

ValueCountFrequency (%)
F295994
54.2%
M250097
45.8%
-121
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter546091
> 99.9%
Dash Punctuation121
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F295994
54.2%
M250097
45.8%
Dash Punctuation
ValueCountFrequency (%)
-121
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin546091
> 99.9%
Common121
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
F295994
54.2%
M250097
45.8%
Common
ValueCountFrequency (%)
-121
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII546212
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F295994
54.2%
M250097
45.8%
-121
 
< 0.1%

ESTU_DEPTO_RESIDE
Categorical

HIGH CORRELATION

Distinct: 34
Distinct (%): < 0.1%
Missing: 377
Missing (%): 0.1%
Memory size: 4.2 MiB
Top values: BOGOTÁ (83600), ANTIOQUIA (74228), VALLE (38640), CUNDINAMARCA (36196), ATLANTICO (32179), 29 other values (280992)

Length

Max length: 15
Median length: 12
Mean length: 7.530119908
Min length: 4

Characters and Unicode

Total characters: 4110203
Distinct characters: 26
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MAGDALENA
2nd row: BOGOTÁ
3rd row: BOLIVAR
4th row: BOGOTÁ
5th row: BOGOTÁ

Common Values

Value              Count   Frequency (%)
BOGOTÁ             83600   15.3%
ANTIOQUIA          74228   13.6%
VALLE              38640   7.1%
CUNDINAMARCA       36196   6.6%
ATLANTICO          32179   5.9%
SANTANDER          25473   4.7%
BOLIVAR            25232   4.6%
CORDOBA            20037   3.7%
NARIÑO             16903   3.1%
BOYACA             16763   3.1%
Other values (24)  176584  32.3%

Length

[Figure: Histogram of lengths of the category]

Word frequencies (lowercased)
Value              Count   Frequency (%)
bogotá             83600   14.6%
antioquia          74228   13.0%
santander          41241   7.2%
valle              38640   6.8%
cundinamarca       36196   6.3%
atlantico          32179   5.6%
bolivar            25232   4.4%
cordoba            20037   3.5%
nariño             16903   3.0%
boyaca             16763   2.9%
Other values (26)  186051  32.6%
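The lowercase word table counts whitespace-separated words rather than whole category values, so a word can outnumber the category of the same name: "santander" (41241) exceeds the SANTANDER category (25473), presumably because another department label also contains that word, for example a two-word label such as NORTE SANTANDER (an assumption, since the full category list is not shown). A minimal sketch of that kind of word counting, with hypothetical counts:

```python
from collections import Counter

def word_frequencies(value_counts):
    """Lowercase each category, split it on whitespace, and weight each word by the category count."""
    words = Counter()
    for value, count in value_counts.items():
        for word in value.lower().split():
            words[word] += count
    return words

# Hypothetical category counts, not taken from the report.
counts = {"SANTANDER": 100, "NORTE SANTANDER": 40, "BOGOT\u00c1": 70}
print(word_frequencies(counts).most_common())
```

Here "santander" collects 140 occurrences although the SANTANDER category alone has 100.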

Most occurring characters

ValueCountFrequency (%)
A800950
19.5%
O425965
10.4%
N325463
 
7.9%
I323904
 
7.9%
T316893
 
7.7%
C228237
 
5.6%
R221090
 
5.4%
L211817
 
5.2%
U182367
 
4.4%
E161838
 
3.9%
Other values (16)911679
22.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter4084968
99.4%
Space Separator25235
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A800950
19.6%
O425965
10.4%
N325463
 
8.0%
I323904
 
7.9%
T316893
 
7.8%
C228237
 
5.6%
R221090
 
5.4%
L211817
 
5.2%
U182367
 
4.5%
E161838
 
4.0%
Other values (15)886444
21.7%
Space Separator
ValueCountFrequency (%)
25235
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4084968
99.4%
Common25235
 
0.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
A800950
19.6%
O425965
10.4%
N325463
 
8.0%
I323904
 
7.9%
T316893
 
7.8%
C228237
 
5.6%
R221090
 
5.4%
L211817
 
5.2%
U182367
 
4.5%
E161838
 
4.0%
Other values (15)886444
21.7%
Common
ValueCountFrequency (%)
25235
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4009700
97.6%
None100503
 
2.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A800950
20.0%
O425965
10.6%
N325463
8.1%
I323904
 
8.1%
T316893
 
7.9%
C228237
 
5.7%
R221090
 
5.5%
L211817
 
5.3%
U182367
 
4.5%
E161838
 
4.0%
Other values (14)811176
20.2%
None
ValueCountFrequency (%)
Á83600
83.2%
Ñ16903
 
16.8%

FAMI_ESTRATOVIVIENDA
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 26
Missing (%): < 0.1%
Memory size: 4.2 MiB
Top values: Estrato 2 (188314), Estrato 1 (159977), Estrato 3 (108692), - (34481), Estrato 4 (25810), 3 other values (28912)

Length

Max length: 11
Median length: 9
Mean length: 8.557853918
Min length: 1

Characters and Unicode

Total characters: 4674180
Distinct characters: 17
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Estrato 3
2nd row: Estrato 3
3rd row: Estrato 1
4th row: Estrato 3
5th row: Estrato 3

Common Values

Value        Count   Frequency (%)
Estrato 2    188314  34.5%
Estrato 1    159977  29.3%
Estrato 3    108692  19.9%
-            34481   6.3%
Estrato 4    25810   4.7%
Sin Estrato  17177   3.1%
Estrato 5    8024    1.5%
Estrato 6    3711    0.7%
(Missing)    26      < 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category frequency plot]

Word frequencies (lowercased)
Value    Count   Frequency (%)
estrato  511705  48.4%
2        188314  17.8%
1        159977  15.1%
3        108692  10.3%
(empty)  34481   3.3%
4        25810   2.4%
sin      17177   1.6%
5        8024    0.8%
6        3711    0.4%

Most occurring characters

ValueCountFrequency (%)
t1023410
21.9%
E511705
10.9%
s511705
10.9%
r511705
10.9%
a511705
10.9%
o511705
10.9%
511705
10.9%
2188314
 
4.0%
1159977
 
3.4%
3108692
 
2.3%
Other values (7)123557
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3104584
66.4%
Uppercase Letter528882
 
11.3%
Space Separator511705
 
10.9%
Decimal Number494528
 
10.6%
Dash Punctuation34481
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1023410
33.0%
s511705
16.5%
r511705
16.5%
a511705
16.5%
o511705
16.5%
i17177
 
0.6%
n17177
 
0.6%
Decimal Number
ValueCountFrequency (%)
2188314
38.1%
1159977
32.3%
3108692
22.0%
425810
 
5.2%
58024
 
1.6%
63711
 
0.8%
Uppercase Letter
ValueCountFrequency (%)
E511705
96.8%
S17177
 
3.2%
Space Separator
ValueCountFrequency (%)
511705
100.0%
Dash Punctuation
ValueCountFrequency (%)
-34481
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3633466
77.7%
Common1040714
 
22.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1023410
28.2%
E511705
14.1%
s511705
14.1%
r511705
14.1%
a511705
14.1%
o511705
14.1%
S17177
 
0.5%
i17177
 
0.5%
n17177
 
0.5%
Common
ValueCountFrequency (%)
511705
49.2%
2188314
 
18.1%
1159977
 
15.4%
3108692
 
10.4%
-34481
 
3.3%
425810
 
2.5%
58024
 
0.8%
63711
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII4674180
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1023410
21.9%
E511705
10.9%
s511705
10.9%
r511705
10.9%
a511705
10.9%
o511705
10.9%
511705
10.9%
2188314
 
4.0%
1159977
 
3.4%
3108692
 
2.3%
Other values (7)123557
 
2.6%

FAMI_TIENEINTERNET
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 8337
Missing (%): 1.5%
Memory size: 4.2 MiB
Top values: Si (314042), No (201199), - (22634)

Length

Max length: 2
Median length: 2
Mean length: 1.957919591
Min length: 1
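Mean length is the total number of characters divided by the number of non-missing values. For FAMI_TIENEINTERNET, the reported value counts (Si 314042, No 201199, "-" 22634, with 8337 missing) reproduce both the total character count of 1053116 and the mean length of about 1.9579:

```python
# Value counts for FAMI_TIENEINTERNET as reported (missing values excluded).
value_counts = {"Si": 314_042, "No": 201_199, "-": 22_634}

# Each value contributes len(value) characters per occurrence.
total_chars = sum(len(value) * count for value, count in value_counts.items())
non_missing = sum(value_counts.values())  # 546212 observations minus 8337 missing
mean_length = total_chars / non_missing

print(total_chars)
print(non_missing)
print(mean_length)
```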

Characters and Unicode

Total characters: 1053116
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Si
2nd row: Si
3rd row: No
4th row: No
5th row: Si

Common Values

Value      Count   Frequency (%)
Si         314042  57.5%
No         201199  36.8%
-          22634   4.1%
(Missing)  8337    1.5%
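The report counts the literal "-" as its own category, separate from truly missing cells. If "-" marks an unanswered question (an assumption; the coding of the source data is not documented here), the effective non-response for FAMI_TIENEINTERNET is the sum of both rows. A sketch of folding the placeholder into the missing count, using the counts above:

```python
N_OBS = 546_212    # number of observations in the dataset
PLACEHOLDER = "-"  # assumption: "-" encodes a non-response, not a real category

def effective_missing(value_counts, missing, placeholder=PLACEHOLDER):
    """Return (count, percent of all rows) for rows that are missing or hold the placeholder."""
    count = missing + value_counts.get(placeholder, 0)
    return count, 100 * count / N_OBS

# Counts for FAMI_TIENEINTERNET as reported.
internet = {"Si": 314_042, "No": 201_199, "-": 22_634}
count, pct = effective_missing(internet, missing=8_337)
print(count, f"{pct:.1f}%")  # 30971 5.7%
```

Under that assumption, roughly 5.7% of rows carry no usable answer rather than the 1.5% flagged as missing. With pandas, recoding before profiling (for example `df.replace("-", pd.NA)`) should make the report count these cells as missing.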

Length

[Figure: Histogram of lengths of the category]

[Figure: Category frequency plot]

Word frequencies (lowercased)
Value    Count   Frequency (%)
si       314042  58.4%
no       201199  37.4%
(empty)  22634   4.2%

Most occurring characters

ValueCountFrequency (%)
S314042
29.8%
i314042
29.8%
N201199
19.1%
o201199
19.1%
-22634
 
2.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter515241
48.9%
Lowercase Letter515241
48.9%
Dash Punctuation22634
 
2.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S314042
61.0%
N201199
39.0%
Lowercase Letter
ValueCountFrequency (%)
i314042
61.0%
o201199
39.0%
Dash Punctuation
ValueCountFrequency (%)
-22634
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1030482
97.9%
Common22634
 
2.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
S314042
30.5%
i314042
30.5%
N201199
19.5%
o201199
19.5%
Common
ValueCountFrequency (%)
-22634
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1053116
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S314042
29.8%
i314042
29.8%
N201199
19.1%
o201199
19.1%
-22634
 
2.1%

FAMI_SITUACIONECONOMICA
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 8259
Missing (%): 1.5%
Memory size: 4.2 MiB
Top values: Igual (322524), Mejor (133690), Peor (72123), - (9616)

Length

Max length: 5
Median length: 5
Mean length: 4.794429997
Min length: 1

Characters and Unicode

Total characters: 2579178
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Peor
2nd row: Mejor
3rd row: Igual
4th row: Igual
5th row: Mejor

Common Values

Value      Count   Frequency (%)
Igual      322524  59.0%
Mejor      133690  24.5%
Peor       72123   13.2%
-          9616    1.8%
(Missing)  8259    1.5%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category frequency plot]

Word frequencies (lowercased)
Value    Count   Frequency (%)
igual    322524  60.0%
mejor    133690  24.9%
peor     72123   13.4%
(empty)  9616    1.8%

Most occurring characters

ValueCountFrequency (%)
I322524
12.5%
g322524
12.5%
u322524
12.5%
a322524
12.5%
l322524
12.5%
e205813
8.0%
o205813
8.0%
r205813
8.0%
M133690
5.2%
j133690
5.2%
Other values (2)81739
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2041225
79.1%
Uppercase Letter528337
 
20.5%
Dash Punctuation9616
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
g322524
15.8%
u322524
15.8%
a322524
15.8%
l322524
15.8%
e205813
10.1%
o205813
10.1%
r205813
10.1%
j133690
6.5%
Uppercase Letter
ValueCountFrequency (%)
I322524
61.0%
M133690
25.3%
P72123
 
13.7%
Dash Punctuation
ValueCountFrequency (%)
-9616
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2569562
99.6%
Common9616
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
I322524
12.6%
g322524
12.6%
u322524
12.6%
a322524
12.6%
l322524
12.6%
e205813
8.0%
o205813
8.0%
r205813
8.0%
M133690
5.2%
j133690
5.2%
Common
ValueCountFrequency (%)
-9616
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2579178
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I322524
12.5%
g322524
12.5%
u322524
12.5%
a322524
12.5%
l322524
12.5%
e205813
8.0%
o205813
8.0%
r205813
8.0%
M133690
5.2%
j133690
5.2%
Other values (2)81739
 
3.2%

ESTU_DEDICACIONLECTURADIARIA
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 627
Missing (%): 0.1%
Memory size: 4.2 MiB
Top values: 30 minutos o menos (199094), Entre 30 y 60 minutos (144272), No leo por entretenimiento (95621), Entre 1 y 2 horas (55480), - (31108)

Length

Max length: 26
Median length: 21
Mean length: 18.9777175
Min length: 1

Characters and Unicode

Total characters: 10353958
Distinct characters: 26
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Entre 30 y 60 minutos
2nd row: Entre 30 y 60 minutos
3rd row: Entre 30 y 60 minutos
4th row: 30 minutos o menos
5th row: No leo por entretenimiento

Common Values

Value                       Count   Frequency (%)
30 minutos o menos          199094  36.4%
Entre 30 y 60 minutos       144272  26.4%
No leo por entretenimiento  95621   17.5%
Entre 1 y 2 horas           55480   10.2%
-                           31108   5.7%
Más de 2 horas              20010   3.7%
(Missing)                   627     0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category frequency plot]

Word frequencies (lowercased)
Value             Count   Frequency (%)
30                343366  15.0%
minutos           343366  15.0%
entre             199752  8.7%
y                 199752  8.7%
o                 199094  8.7%
menos             199094  8.7%
60                144272  6.3%
por               95621   4.2%
entretenimiento   95621   4.2%
leo               95621   4.2%
Other values (7)  373209  16.3%

Most occurring characters

ValueCountFrequency (%)
1743183
16.8%
o1199528
11.6%
n1029075
9.9%
e896961
8.7%
t829981
8.0%
m638081
 
6.2%
s637960
 
6.2%
i534608
 
5.2%
0487638
 
4.7%
r466484
 
4.5%
Other values (16)1890459
18.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7158038
69.1%
Space Separator1743183
 
16.8%
Decimal Number1106246
 
10.7%
Uppercase Letter315383
 
3.0%
Dash Punctuation31108
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o1199528
16.8%
n1029075
14.4%
e896961
12.5%
t829981
11.6%
m638081
8.9%
s637960
8.9%
i534608
7.5%
r466484
 
6.5%
u343366
 
4.8%
y199752
 
2.8%
Other values (6)382242
 
5.3%
Decimal Number
ValueCountFrequency (%)
0487638
44.1%
3343366
31.0%
6144272
 
13.0%
275490
 
6.8%
155480
 
5.0%
Uppercase Letter
ValueCountFrequency (%)
E199752
63.3%
N95621
30.3%
M20010
 
6.3%
Space Separator
ValueCountFrequency (%)
1743183
100.0%
Dash Punctuation
ValueCountFrequency (%)
-31108
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7473421
72.2%
Common2880537
 
27.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
o1199528
16.1%
n1029075
13.8%
e896961
12.0%
t829981
11.1%
m638081
8.5%
s637960
8.5%
i534608
7.2%
r466484
 
6.2%
u343366
 
4.6%
y199752
 
2.7%
Other values (9)697625
9.3%
Common
ValueCountFrequency (%)
1743183
60.5%
0487638
 
16.9%
3343366
 
11.9%
6144272
 
5.0%
275490
 
2.6%
155480
 
1.9%
-31108
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII10333948
99.8%
None20010
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1743183
16.9%
o1199528
11.6%
n1029075
10.0%
e896961
8.7%
t829981
8.0%
m638081
 
6.2%
s637960
 
6.2%
i534608
 
5.2%
0487638
 
4.7%
r466484
 
4.5%
Other values (15)1870449
18.1%
None
ValueCountFrequency (%)
á20010
100.0%

ESTU_DEDICACIONINTERNET
Categorical

HIGH CORRELATION
MISSING

Distinct: 6
Distinct (%): < 0.1%
Missing: 30298
Missing (%): 5.5%
Memory size: 4.2 MiB
Top values: Entre 1 y 3 horas (157557), Entre 30 y 60 minutos (134383), Más de 3 horas (100134), 30 minutos o menos (90517), No Navega Internet (30697)

Length

Max length: 21
Median length: 18
Mean length: 17.61314095
Min length: 1

Characters and Unicode

Total characters: 9086866
Distinct characters: 26
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Entre 30 y 60 minutos
2nd row: Entre 30 y 60 minutos
3rd row: Más de 3 horas
4th row: Entre 30 y 60 minutos
5th row: Más de 3 horas

Common Values

Value                  Count   Frequency (%)
Entre 1 y 3 horas      157557  28.8%
Entre 30 y 60 minutos  134383  24.6%
Más de 3 horas         100134  18.3%
30 minutos o menos     90517   16.6%
No Navega Internet     30697   5.6%
-                      2626    0.5%
(Missing)              30298   5.5%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category frequency plot]

Word frequencies (lowercased)
Value             Count   Frequency (%)
entre             291940  12.6%
y                 291940  12.6%
3                 257691  11.1%
horas             257691  11.1%
30                224900  9.7%
minutos           224900  9.7%
1                 157557  6.8%
60                134383  5.8%
más               100134  4.3%
de                100134  4.3%
Other values (6)  275751  11.9%

Most occurring characters

ValueCountFrequency (%)
1801107
19.8%
o694322
 
7.6%
s673242
 
7.4%
n668751
 
7.4%
r580328
 
6.4%
t578234
 
6.4%
e574682
 
6.3%
3482591
 
5.3%
0359283
 
4.0%
a319085
 
3.5%
Other values (16)2355241
25.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5665154
62.3%
Space Separator1801107
 
19.8%
Decimal Number1133814
 
12.5%
Uppercase Letter484165
 
5.3%
Dash Punctuation2626
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o694322
12.3%
s673242
11.9%
n668751
11.8%
r580328
10.2%
t578234
10.2%
e574682
10.1%
a319085
5.6%
m315417
5.6%
y291940
5.2%
h257691
 
4.5%
Other values (6)711462
12.6%
Decimal Number
ValueCountFrequency (%)
3482591
42.6%
0359283
31.7%
1157557
 
13.9%
6134383
 
11.9%
Uppercase Letter
ValueCountFrequency (%)
E291940
60.3%
M100134
 
20.7%
N61394
 
12.7%
I30697
 
6.3%
Space Separator
ValueCountFrequency (%)
1801107
100.0%
Dash Punctuation
ValueCountFrequency (%)
-2626
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6149319
67.7%
Common2937547
32.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o694322
11.3%
s673242
10.9%
n668751
10.9%
r580328
9.4%
t578234
9.4%
e574682
9.3%
a319085
 
5.2%
m315417
 
5.1%
E291940
 
4.7%
y291940
 
4.7%
Other values (10)1161378
18.9%
Common
ValueCountFrequency (%)
1801107
61.3%
3482591
 
16.4%
0359283
 
12.2%
1157557
 
5.4%
6134383
 
4.6%
-2626
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII8986732
98.9%
None100134
 
1.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1801107
20.0%
o694322
 
7.7%
s673242
 
7.5%
n668751
 
7.4%
r580328
 
6.5%
t578234
 
6.4%
e574682
 
6.4%
3482591
 
5.4%
0359283
 
4.0%
a319085
 
3.6%
Other values (15)2255107
25.1%
None
ValueCountFrequency (%)
á100134
100.0%

ESTU_HORASSEMANATRABAJA
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 381
Missing (%): 0.1%
Memory size: 4.2 MiB
Top values: 0 (354503), Menos de 10 horas (97913), Entre 11 y 20 horas (41729), Más de 30 horas (19907), - (16509)

Length

Max length: 19
Median length: 1
Mean length: 6.260397449
Min length: 1

Characters and Unicode

Total characters: 3417119
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Menos de 10 horas
2nd row: Menos de 10 horas
3rd row: 0
4th row: Más de 30 horas
5th row: Más de 30 horas

Common Values

Value                Count   Frequency (%)
0                    354503  64.9%
Menos de 10 horas    97913   17.9%
Entre 11 y 20 horas  41729   7.6%
Más de 30 horas      19907   3.6%
-                    16509   3.0%
Entre 21 y 30 horas  15270   2.8%
(Missing)            381     0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category frequency plot]

Word frequencies (lowercased)
Value             Count   Frequency (%)
0                 354503  31.4%
horas             174819  15.5%
de                117820  10.5%
menos             97913   8.7%
10                97913   8.7%
entre             56999   5.1%
y                 56999   5.1%
11                41729   3.7%
20                41729   3.7%
30                35177   3.1%
Other values (3)  51686   4.6%

Most occurring characters

ValueCountFrequency (%)
581456
17.0%
0529322
15.5%
s292639
8.6%
e272732
8.0%
o272732
8.0%
r231818
 
6.8%
1196641
 
5.8%
a174819
 
5.1%
h174819
 
5.1%
n154912
 
4.5%
Other values (9)535229
15.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1826196
53.4%
Decimal Number818139
23.9%
Space Separator581456
 
17.0%
Uppercase Letter174819
 
5.1%
Dash Punctuation16509
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s292639
16.0%
e272732
14.9%
o272732
14.9%
r231818
12.7%
a174819
9.6%
h174819
9.6%
n154912
8.5%
d117820
6.5%
t56999
 
3.1%
y56999
 
3.1%
Decimal Number
ValueCountFrequency (%)
0529322
64.7%
1196641
 
24.0%
256999
 
7.0%
335177
 
4.3%
Uppercase Letter
ValueCountFrequency (%)
M117820
67.4%
E56999
32.6%
Space Separator
ValueCountFrequency (%)
581456
100.0%
Dash Punctuation
ValueCountFrequency (%)
-16509
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2001015
58.6%
Common1416104
41.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
s292639
14.6%
e272732
13.6%
o272732
13.6%
r231818
11.6%
a174819
8.7%
h174819
8.7%
n154912
7.7%
M117820
5.9%
d117820
5.9%
E56999
 
2.8%
Other values (3)133905
6.7%
Common
ValueCountFrequency (%)
581456
41.1%
0529322
37.4%
1196641
 
13.9%
256999
 
4.0%
335177
 
2.5%
-16509
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII3397212
99.4%
None19907
 
0.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
581456
17.1%
0529322
15.6%
s292639
8.6%
e272732
8.0%
o272732
8.0%
r231818
 
6.8%
1196641
 
5.8%
a174819
 
5.1%
h174819
 
5.1%
n154912
 
4.6%
Other values (8)515322
15.2%
None
ValueCountFrequency (%)
á19907
100.0%

COLE_DEPTO_UBICACION
Categorical

HIGH CORRELATION

Distinct: 33
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values: BOGOTÁ (82832), ANTIOQUIA (74182), VALLE (38664), CUNDINAMARCA (37049), ATLANTICO (32235), 28 other values (281250)

Length

Max length: 15
Median length: 12
Mean length: 7.541300814
Min length: 4

Characters and Unicode

Total characters: 4119149
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MAGDALENA
2nd row: BOGOTÁ
3rd row: BOLIVAR
4th row: BOGOTÁ
5th row: BOGOTÁ

Common Values

Value              Count   Frequency (%)
BOGOTÁ             82832   15.2%
ANTIOQUIA          74182   13.6%
VALLE              38664   7.1%
CUNDINAMARCA       37049   6.8%
ATLANTICO          32235   5.9%
SANTANDER          25751   4.7%
BOLIVAR            25418   4.7%
CORDOBA            19984   3.7%
NARIÑO             16933   3.1%
BOYACA             16737   3.1%
Other values (23)  176427  32.3%

Length

[Figure: Histogram of lengths of the category]

Word frequencies (lowercased)
Value              Count   Frequency (%)
bogotá             82832   14.5%
antioquia          74182   13.0%
santander          41671   7.3%
valle              38664   6.8%
cundinamarca       37049   6.5%
atlantico          32235   5.6%
bolivar            25418   4.4%
cordoba            19984   3.5%
nariño             16933   3.0%
boyaca             16737   2.9%
Other values (25)  185882  32.5%

Most occurring characters

ValueCountFrequency (%)
A803921
19.5%
O424463
10.3%
N328006
 
8.0%
I324942
 
7.9%
T316651
 
7.7%
C229766
 
5.6%
R222404
 
5.4%
L211884
 
5.1%
U183212
 
4.4%
E162148
 
3.9%
Other values (15)911752
22.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter4093774
99.4%
Space Separator25375
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A803921
19.6%
O424463
10.4%
N328006
 
8.0%
I324942
 
7.9%
T316651
 
7.7%
C229766
 
5.6%
R222404
 
5.4%
L211884
 
5.2%
U183212
 
4.5%
E162148
 
4.0%
Other values (14)886377
21.7%
Space Separator
ValueCountFrequency (%)
25375
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4093774
99.4%
Common25375
 
0.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
A803921
19.6%
O424463
10.4%
N328006
 
8.0%
I324942
 
7.9%
T316651
 
7.7%
C229766
 
5.6%
R222404
 
5.4%
L211884
 
5.2%
U183212
 
4.5%
E162148
 
4.0%
Other values (14)886377
21.7%
Common
ValueCountFrequency (%)
25375
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4019384
97.6%
None99765
 
2.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A803921
20.0%
O424463
10.6%
N328006
8.2%
I324942
8.1%
T316651
 
7.9%
C229766
 
5.7%
R222404
 
5.5%
L211884
 
5.3%
U183212
 
4.6%
E162148
 
4.0%
Other values (13)811987
20.2%
None
ValueCountFrequency (%)
Á82832
83.0%
Ñ16933
 
17.0%

PUNT_LECTURA_CRITICA
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 65
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.15730522
Minimum: 0
Maximum: 100
Zeros: 127
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 35
Q1: 45
Median: 52
Q3: 60
95th percentile: 69
Maximum: 100
Range: 100
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 10.53796336
Coefficient of variation (CV): 0.2020419443
Kurtosis: -0.2639727926
Mean: 52.15730522
Median Absolute Deviation (MAD): 8
Skewness: -0.02693031487
Sum: 28488946
Variance: 111.0486717
Monotonicity: Not monotonic
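Several of these statistics are derived from one another: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the IQR is Q3 - Q1. A quick consistency check for PUNT_LECTURA_CRITICA using the reported values:

```python
import math

# Reported values for PUNT_LECTURA_CRITICA; n_obs is the number of observations.
n_obs = 546_212
reported = {
    "sum": 28_488_946,
    "mean": 52.15730522,
    "std": 10.53796336,
    "variance": 111.0486717,
    "cv": 0.2020419443,
    "q1": 45,
    "q3": 60,
    "iqr": 15,
}

# Each check recomputes one statistic from the others; rel_tol absorbs the
# rounding of the printed values to about ten significant digits.
checks = {
    "mean == sum / n": math.isclose(reported["sum"] / n_obs, reported["mean"], rel_tol=1e-7),
    "variance == std ** 2": math.isclose(reported["std"] ** 2, reported["variance"], rel_tol=1e-7),
    "cv == std / mean": math.isclose(reported["std"] / reported["mean"], reported["cv"], rel_tol=1e-7),
    "iqr == q3 - q1": reported["q3"] - reported["q1"] == reported["iqr"],
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

The same checks can be repeated for the other score columns by swapping in their reported values.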
[Figure: Histogram with fixed size bins (bins=50)]
Common Values
Value              Count   Frequency (%)
53                 19780   3.6%
54                 19719   3.6%
55                 19616   3.6%
52                 19561   3.6%
51                 19036   3.5%
56                 18879   3.5%
50                 18765   3.4%
57                 18623   3.4%
49                 18052   3.3%
58                 17915   3.3%
Other values (55)  356266  65.2%
Extreme values (10 smallest)
Value  Count  Frequency (%)
0      127    < 0.1%
20     7      < 0.1%
21     21     < 0.1%
22     42     < 0.1%
23     116    < 0.1%
24     201    < 0.1%
25     360    0.1%
26     577    0.1%
27     942    0.2%
28     1292   0.2%
Extreme values (10 largest)
Value  Count  Frequency (%)
100    221    < 0.1%
82     5      < 0.1%
81     44     < 0.1%
80     261    < 0.1%
79     526    0.1%
78     724    0.1%
77     1034   0.2%
76     1404   0.3%
75     1938   0.4%
74     2484   0.5%

PUNT_MATEMATICAS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 73
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.60634882
Minimum: 0
Maximum: 100
Zeros: 10
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 31
Q1: 42
Median: 51
Q3: 59
95th percentile: 70
Maximum: 100
Range: 100
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.99764898
Coefficient of variation (CV): 0.237077941
Kurtosis: -0.2457941652
Mean: 50.60634882
Median Absolute Deviation (MAD): 8
Skewness: 0.05566056913
Sum: 27641795
Variance: 143.943581
Monotonicity: Not monotonic
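The MAD values in this report are whole numbers, which is consistent with the median of the absolute deviations from the median computed on integer scores (the mean absolute deviation would generally not be an integer). That statistic ignores how far the extreme scores lie from the centre, unlike the standard deviation. A stdlib sketch on hypothetical scores:

```python
import statistics

def median_absolute_deviation(values):
    """Median of the absolute deviations from the sample median."""
    values = list(values)
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])

# Hypothetical scores on the 0-100 scale; the last one is an extreme score.
scores = [42, 45, 48, 51, 51, 53, 59, 62, 100]
scores_low = scores[:-1] + [0]  # same data with the extreme moved to the other end

print(median_absolute_deviation(scores))      # 6
print(median_absolute_deviation(scores_low))  # 6
print(round(statistics.stdev(scores), 1), round(statistics.stdev(scores_low), 1))
```

Moving the extreme score from 100 to 0 leaves the MAD unchanged while the standard deviation shifts.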
[Figure: Histogram with fixed size bins (bins=50)]
Common Values
Value              Count   Frequency (%)
52                 17011   3.1%
51                 17010   3.1%
53                 16962   3.1%
50                 16952   3.1%
49                 16930   3.1%
54                 16724   3.1%
48                 16548   3.0%
55                 16469   3.0%
56                 16215   3.0%
47                 16198   3.0%
Other values (63)  379193  69.4%
Extreme values (10 smallest)
Value  Count  Frequency (%)
0      10     < 0.1%
15     11     < 0.1%
16     17     < 0.1%
17     69     < 0.1%
18     144    < 0.1%
19     256    < 0.1%
20     406    0.1%
21     618    0.1%
22     881    0.2%
23     1215   0.2%
Extreme values (10 largest)
Value  Count  Frequency (%)
100    526    0.1%
85     9      < 0.1%
84     89     < 0.1%
83     385    0.1%
82     390    0.1%
81     636    0.1%
80     816    0.1%
79     1005   0.2%
78     1276   0.2%
77     1597   0.3%

PUNT_C_NATURALES
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 66
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.2347788
Minimum: 0
Maximum: 100
Zeros: 18
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 31
Q1: 40
Median: 48
Q3: 56
95th percentile: 67
Maximum: 100
Range: 100
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 10.7640352
Coefficient of variation (CV): 0.2231592115
Kurtosis: -0.3215417997
Mean: 48.2347788
Median Absolute Deviation (MAD): 8
Skewness: 0.2308582355
Sum: 26346415
Variance: 115.8644538
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
45 | 19175 | 3.5%
46 | 19056 | 3.5%
47 | 18927 | 3.5%
48 | 18744 | 3.4%
49 | 18420 | 3.4%
43 | 18323 | 3.4%
44 | 18309 | 3.4%
50 | 17871 | 3.3%
42 | 17768 | 3.3%
51 | 17579 | 3.2%
Other values (56) | 362040 | 66.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 18 | < 0.1%
19 | 6 | < 0.1%
20 | 49 | < 0.1%
21 | 136 | < 0.1%
22 | 311 | 0.1%
23 | 489 | 0.1%
24 | 839 | 0.2%
25 | 1185 | 0.2%
26 | 1801 | 0.3%
27 | 2471 | 0.5%

Maximum 10 values
Value | Count | Frequency (%)
100 | 138 | < 0.1%
82 | 34 | < 0.1%
81 | 104 | < 0.1%
80 | 237 | < 0.1%
79 | 339 | 0.1%
78 | 453 | 0.1%
77 | 641 | 0.1%
76 | 881 | 0.2%
75 | 1129 | 0.2%
74 | 1397 | 0.3%

PUNT_SOCIALES_CIUDADANAS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 70
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.22458862
Minimum: 0
Maximum: 100
Zeros: 15
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 28
Q1: 37
Median: 45
Q3: 55
95th percentile: 67
Maximum: 100
Range: 100
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 12.14058808
Coefficient of variation (CV): 0.2626435074
Kurtosis: -0.5042016304
Mean: 46.22458862
Median Absolute Deviation (MAD): 9
Skewness: 0.3156098131
Sum: 25248425
Variance: 147.393879
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
39 | 16509 | 3.0%
38 | 16405 | 3.0%
37 | 16399 | 3.0%
40 | 16204 | 3.0%
41 | 16059 | 2.9%
36 | 16000 | 2.9%
42 | 15920 | 2.9%
43 | 15737 | 2.9%
44 | 15617 | 2.9%
45 | 15282 | 2.8%
Other values (60) | 386080 | 70.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 15 | < 0.1%
16 | 2 | < 0.1%
17 | 21 | < 0.1%
18 | 71 | < 0.1%
19 | 180 | < 0.1%
20 | 333 | 0.1%
21 | 677 | 0.1%
22 | 1143 | 0.2%
23 | 1815 | 0.3%
24 | 2658 | 0.5%

Maximum 10 values
Value | Count | Frequency (%)
100 | 227 | < 0.1%
83 | 7 | < 0.1%
82 | 41 | < 0.1%
81 | 214 | < 0.1%
80 | 290 | 0.1%
79 | 498 | 0.1%
78 | 626 | 0.1%
77 | 818 | 0.1%
76 | 986 | 0.2%
75 | 1328 | 0.2%

PUNT_INGLES
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 70
Distinct (%): < 0.1%
Missing: 19
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.4168911
Minimum: 0
Maximum: 100
Zeros: 142
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 30
Q1: 39
Median: 48
Q3: 56
95th percentile: 71
Maximum: 100
Range: 100
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 12.55843838
Coefficient of variation (CV): 0.2593813459
Kurtosis: 0.1813852678
Mean: 48.4168911
Median Absolute Deviation (MAD): 9
Skewness: 0.4451975185
Sum: 26444967
Variance: 157.7143745
Monotonicity: Not monotonic
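PUNT_INGLES is the only score column in this section with missing values (19), and its reported Mean is consistent with the Sum divided by the 546,193 non-missing values rather than by all 546,212 observations listed in the dataset overview. The arithmetic below checks this, together with the identities CV = standard deviation / mean and variance = (standard deviation)², using only figures reported in this section.

```python
# Figures copied from the PUNT_INGLES section of the report.
n_rows = 546_212     # number of observations in the dataset
n_missing = 19       # missing PUNT_INGLES values
total = 26_444_967   # reported Sum
std = 12.55843838    # reported standard deviation

# The mean is taken over the non-missing values only.
mean = total / (n_rows - n_missing)
mean_all_rows = total / n_rows  # what dividing by every row would give
cv = std / mean                 # coefficient of variation
variance = std ** 2

print(f"{mean:.7f} {mean_all_rows:.7f} {cv:.10f} {variance:.7f}")
```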
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
45 | 16195 | 3.0%
54 | 16182 | 3.0%
40 | 16138 | 3.0%
46 | 16127 | 3.0%
53 | 16083 | 2.9%
41 | 15976 | 2.9%
39 | 15967 | 2.9%
44 | 15774 | 2.9%
47 | 15762 | 2.9%
42 | 15716 | 2.9%
Other values (60) | 386273 | 70.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 142 | < 0.1%
20 | 32 | < 0.1%
21 | 165 | < 0.1%
22 | 590 | 0.1%
23 | 1625 | 0.3%
24 | 2136 | 0.4%
25 | 2541 | 0.5%
26 | 3314 | 0.6%
27 | 4766 | 0.9%
28 | 5492 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
100 | 1247 | 0.2%
87 | 21 | < 0.1%
86 | 253 | < 0.1%
85 | 250 | < 0.1%
84 | 313 | 0.1%
83 | 404 | 0.1%
82 | 690 | 0.1%
81 | 1367 | 0.3%
80 | 1225 | 0.2%
79 | 1551 | 0.3%

PUNT_GLOBAL
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 389
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 246.1864642
Minimum: 0
Maximum: 477
Zeros: 4
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 167
Q1: 207
Median: 243
Q3: 282
95th percentile: 335
Maximum: 477
Range: 477
Interquartile range (IQR): 75

Descriptive statistics

Standard deviation: 51.38685767
Coefficient of variation (CV): 0.2087314501
Kurtosis: -0.4595364194
Mean: 246.1864642
Median Absolute Deviation (MAD): 37
Skewness: 0.2625281501
Sum: 134470001
Variance: 2640.609142
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
230 | 4596 | 0.8%
235 | 4593 | 0.8%
238 | 4578 | 0.8%
237 | 4574 | 0.8%
245 | 4552 | 0.8%
233 | 4527 | 0.8%
240 | 4500 | 0.8%
243 | 4474 | 0.8%
242 | 4456 | 0.8%
227 | 4441 | 0.8%
Other values (379) | 500921 | 91.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 4 | < 0.1%
11 | 1 | < 0.1%
14 | 1 | < 0.1%
21 | 1 | < 0.1%
28 | 1 | < 0.1%
44 | 1 | < 0.1%
46 | 1 | < 0.1%
54 | 1 | < 0.1%
55 | 1 | < 0.1%
57 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
477 | 1 | < 0.1%
475 | 1 | < 0.1%
473 | 1 | < 0.1%
467 | 1 | < 0.1%
460 | 1 | < 0.1%
457 | 1 | < 0.1%
452 | 1 | < 0.1%
450 | 1 | < 0.1%
449 | 2 | < 0.1%
448 | 4 | < 0.1%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
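The definition above can be sketched directly: convert each variable to ranks, then take the covariance of the ranks over the product of their standard deviations. This is a minimal pure-Python illustration, not the report's implementation; it assumes the usual convention that tied values receive the average of the ranks they span, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their SDs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    vx = sum((a - mx) ** 2 for a in rx) / n
    vy = sum((b - my) ** 2 for b in ry) / n
    return cov / math.sqrt(vx * vy)


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # 1.0: nonlinear but monotonic
print(spearman_rho(x, [5, 4, 3, 2, 1]))      # -1.0
```

Tie handling matters for this dataset: the subject scores take only about 70 distinct integer values across more than half a million rows, so almost every rank is shared.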

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exact linear relationship with nonzero slope the steepness of the line, i.e. its angle to the x-axis, does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
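A minimal pure-Python sketch of this formula, which also illustrates the invariance described above: shifting and positively rescaling either variable leaves r unchanged, a negative scale factor flips its sign, and an exact linear relationship gives r = 1 whatever its slope. The function name and the sample data are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 8.0]
r = pearson_r(x, y)
# Shifting and positively rescaling each variable leaves r unchanged...
r_affine = pearson_r([3 * a + 10 for a in x], [0.5 * b - 2 for b in y])
# ...while a negative scale factor flips the sign.
r_flipped = pearson_r([-a for a in x], y)
# For an exact linear relationship, the magnitude of the slope does not matter.
steep = pearson_r(x, [5 * a + 1 for a in x])
shallow = pearson_r(x, [0.1 * a + 1 for a in x])
print(round(r, 4), round(r_affine, 4), round(r_flipped, 4), round(steep, 4), round(shallow, 4))
# 0.9562 0.9562 -0.9562 1.0 1.0
```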

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations. This basic form (known as tau-a) does not adjust for ties.
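The pair-counting definition can be sketched in a few lines of Python. This is the basic tau-a form only: a pair tied in either variable counts as neither concordant nor discordant and stays in the denominator. Library routines such as SciPy's `kendalltau` default to the tie-adjusted tau-b variant, so results on heavily tied data like these integer scores can differ; which variant the report uses is not stated here. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    A pair tied in x or in y is neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Only the second and third observations are ordered differently:
# 5 concordant pairs, 1 discordant pair, 6 pairs in total.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6 = 0.666...
```

The double loop is O(n²), which is fine for illustration but far too slow for 546,212 rows; production implementations use sorting-based O(n log n) algorithms.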

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
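The bias correction can be sketched as follows, using the formulas commonly quoted for Bergsma's estimator: φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1) and clipped at 0, and the table dimensions k and r are shrunk before taking the minimum. This stdlib version assumes an r × k table of counts with at least two rows and columns and no empty rows or columns; it may differ from the report's own implementation (for example in whether a continuity correction is applied to χ²), and the function name is hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma, 2013) for an r x k contingency table.

    `table` is a list of rows of observed counts; no row or column may sum to 0."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    # Pearson's chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bergsma's corrections to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[50, 0], [0, 50]]))    # ~1.0: perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # 0.0: independence
print(cramers_v_corrected([[30, 10], [10, 30]]))  # a little below the uncorrected 0.5
```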

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
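The nullity correlation described above can be sketched as the Pearson correlation between per-column missingness indicators (1 where a cell is missing, 0 otherwise). This assumes the heatmap follows the approach of the missingno library, which this style of chart appears to come from: columns that are never or always missing have a constant indicator and are dropped, since their correlation is undefined. The function name and sample data are hypothetical.

```python
import math


def nullity_correlation(columns):
    """Pearson correlation between the missingness indicators of column pairs.

    `columns` maps a column name to a list of values, with None marking a
    missing cell. Columns that are never or always missing have a constant
    indicator (zero variance), so they are left out."""
    indicators = {
        name: [1.0 if v is None else 0.0 for v in values]
        for name, values in columns.items()
    }
    varying = {k: v for k, v in indicators.items() if 0 < sum(v) < len(v)}
    names = sorted(varying)
    result = {}
    for pos, a in enumerate(names):
        for b in names[pos + 1:]:
            x, y = varying[a], varying[b]
            n = len(x)
            mx, my = sum(x) / n, sum(y) / n
            cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
            vx = sum((p - mx) ** 2 for p in x)
            vy = sum((q - my) ** 2 for q in y)
            result[(a, b)] = cov / math.sqrt(vx * vy)
    return result


data = {
    "A": [1, None, 3, None, 5],
    "B": [2, None, 6, None, 10],    # missing exactly when A is missing
    "C": [None, 7, None, 8, None],  # missing exactly when A is present
    "D": [1, 2, 3, 4, 5],           # never missing, so skipped
}
print({pair: round(r, 6) for pair, r in nullity_correlation(data).items()})
# {('A', 'B'): 1.0, ('A', 'C'): -1.0, ('B', 'C'): -1.0}
```

A value near +1 means two columns tend to be missing together, near -1 that one is missing exactly when the other is present, and near 0 that their missingness is unrelated.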

Sample

First rows

Index | ESTU_GENERO | ESTU_DEPTO_RESIDE | FAMI_ESTRATOVIVIENDA | FAMI_TIENEINTERNET | FAMI_SITUACIONECONOMICA | ESTU_DEDICACIONLECTURADIARIA | ESTU_DEDICACIONINTERNET | ESTU_HORASSEMANATRABAJA | COLE_DEPTO_UBICACION | PUNT_LECTURA_CRITICA | PUNT_MATEMATICAS | PUNT_C_NATURALES | PUNT_SOCIALES_CIUDADANAS | PUNT_INGLES | PUNT_GLOBAL
0 | M | MAGDALENA | Estrato 3 | Si | Peor | Entre 30 y 60 minutos | Entre 30 y 60 minutos | Menos de 10 horas | MAGDALENA | 47 | 48 | 37 | 30 | 54.0 | 208
1 | M | BOGOTÁ | Estrato 3 | Si | Mejor | Entre 30 y 60 minutos | Entre 30 y 60 minutos | Menos de 10 horas | BOGOTÁ | 60 | 65 | 54 | 59 | 63.0 | 299
2 | M | BOLIVAR | Estrato 1 | No | Igual | Entre 30 y 60 minutos | Más de 3 horas | 0 | BOLIVAR | 66 | 57 | 41 | 74 | 64.0 | 299
3 | M | BOGOTÁ | Estrato 3 | No | Igual | 30 minutos o menos | Entre 30 y 60 minutos | Más de 30 horas | BOGOTÁ | 62 | 54 | 61 | 73 | 53.0 | 309
4 | M | BOGOTÁ | Estrato 3 | Si | Mejor | No leo por entretenimiento | Más de 3 horas | Más de 30 horas | BOGOTÁ | 63 | 57 | 55 | 57 | 52.0 | 288
5 | M | ATLANTICO | - | - | Mejor | - | NaN | Menos de 10 horas | ATLANTICO | 49 | 29 | 41 | 41 | 35.0 | 198
6 | M | VALLE | Estrato 4 | Si | Mejor | 30 minutos o menos | Entre 30 y 60 minutos | 0 | VALLE | 76 | 70 | 70 | 68 | 72.0 | 355
7 | M | SANTANDER | Estrato 3 | Si | Igual | Entre 1 y 2 horas | Más de 3 horas | Menos de 10 horas | SANTANDER | 57 | 65 | 63 | 66 | 60.0 | 313
8 | M | CUNDINAMARCA | Estrato 3 | Si | Igual | No leo por entretenimiento | Entre 1 y 3 horas | 0 | CUNDINAMARCA | 62 | 62 | 66 | 39 | 63.0 | 288
9 | M | SUCRE | Estrato 3 | Si | Igual | Entre 30 y 60 minutos | Entre 30 y 60 minutos | 0 | SUCRE | 68 | 66 | 63 | 77 | 51.0 | 336

Last rows

Index | ESTU_GENERO | ESTU_DEPTO_RESIDE | FAMI_ESTRATOVIVIENDA | FAMI_TIENEINTERNET | FAMI_SITUACIONECONOMICA | ESTU_DEDICACIONLECTURADIARIA | ESTU_DEDICACIONINTERNET | ESTU_HORASSEMANATRABAJA | COLE_DEPTO_UBICACION | PUNT_LECTURA_CRITICA | PUNT_MATEMATICAS | PUNT_C_NATURALES | PUNT_SOCIALES_CIUDADANAS | PUNT_INGLES | PUNT_GLOBAL
546202 | F | HUILA | Estrato 3 | Si | Igual | Más de 2 horas | Entre 30 y 60 minutos | 0 | HUILA | 100 | 73 | 70 | 72 | 84.0 | 396
546203 | F | GUAVIARE | Estrato 1 | No | Mejor | Entre 30 y 60 minutos | 30 minutos o menos | Menos de 10 horas | GUAVIARE | 71 | 66 | 50 | 56 | 56.0 | 302
546204 | F | VICHADA | Estrato 2 | Si | Igual | Entre 1 y 2 horas | Entre 1 y 3 horas | 0 | VICHADA | 71 | 71 | 58 | 75 | 68.0 | 343
546205 | M | NORTE SANTANDER | Estrato 2 | No | Mejor | 30 minutos o menos | Entre 30 y 60 minutos | Menos de 10 horas | NORTE SANTANDER | 54 | 58 | 60 | 54 | 49.0 | 280
546206 | M | SUCRE | Estrato 2 | Si | Igual | Entre 30 y 60 minutos | Entre 1 y 3 horas | 0 | SUCRE | 100 | 100 | 82 | 75 | 100.0 | 450
546207 | M | ANTIOQUIA | Estrato 2 | Si | Igual | No leo por entretenimiento | Más de 3 horas | Menos de 10 horas | ANTIOQUIA | 76 | 78 | 65 | 74 | 58.0 | 360
546208 | M | BOGOTÁ | Estrato 3 | Si | Mejor | No leo por entretenimiento | Más de 3 horas | 0 | BOGOTÁ | 75 | 73 | 72 | 67 | 74.0 | 360
546209 | M | ARAUCA | Estrato 2 | Si | Igual | 30 minutos o menos | Entre 30 y 60 minutos | 0 | ARAUCA | 72 | 83 | 71 | 77 | 72.0 | 377
546210 | M | SANTANDER | Estrato 1 | No | Igual | 30 minutos o menos | Entre 30 y 60 minutos | Más de 30 horas | SANTANDER | 59 | 61 | 54 | 52 | 46.0 | 278
546211 | M | BOGOTÁ | Estrato 3 | Si | Igual | Entre 30 y 60 minutos | Entre 1 y 3 horas | 0 | BOGOTÁ | 76 | 73 | 72 | 71 | 74.0 | 365