Overview

Dataset statistics

Number of variables: 14
Number of observations: 260756
Missing cells: 47800
Missing cells (%): 1.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.9 MiB
Average record size in memory: 112.0 B
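The percentage and size figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 2**20 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
n_variables = 14
n_observations = 260756
missing_cells = 47800
avg_record_bytes = 112.0

# Share of missing cells over all cells (assumed definition).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total memory implied by the average record size, in MiB.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both results round to the values shown in the report (1.3% and 27.9 MiB).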

Variable types

Categorical: 8
Numeric: 6

Alerts

ESTU_PRGM_ACADEMICO has a high cardinality: 810 distinct values [High cardinality]
MOD_RAZONA_CUANTITAT_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 1 other field [High correlation]
MOD_LECTURA_CRITICA_PUNT is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 3 other fields [High correlation]
MOD_COMPETEN_CIUDADA_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 2 other fields [High correlation]
MOD_INGLES_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 2 other fields [High correlation]
PUNT_GLOBAL is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 3 other fields [High correlation]
MOD_RAZONA_CUANTITAT_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 1 other field [High correlation]
MOD_LECTURA_CRITICA_PUNT is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 3 other fields [High correlation]
MOD_COMPETEN_CIUDADA_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 2 other fields [High correlation]
MOD_INGLES_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 2 other fields [High correlation]
MOD_COMUNI_ESCRITA_PUNT is highly correlated with PUNT_GLOBAL [High correlation]
PUNT_GLOBAL is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 4 other fields [High correlation]
MOD_RAZONA_CUANTITAT_PUNT is highly correlated with PUNT_GLOBAL [High correlation]
MOD_LECTURA_CRITICA_PUNT is highly correlated with PUNT_GLOBAL [High correlation]
MOD_COMPETEN_CIUDADA_PUNT is highly correlated with PUNT_GLOBAL [High correlation]
MOD_INGLES_PUNT is highly correlated with PUNT_GLOBAL [High correlation]
PUNT_GLOBAL is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 3 other fields [High correlation]
ESTU_DEPTO_RESIDE is highly correlated with ESTU_PRGM_DEPARTAMENTO [High correlation]
ESTU_PRGM_DEPARTAMENTO is highly correlated with ESTU_DEPTO_RESIDE [High correlation]
ESTU_DEPTO_RESIDE is highly correlated with ESTU_PRGM_DEPARTAMENTO [High correlation]
ESTU_PRGM_DEPARTAMENTO is highly correlated with ESTU_DEPTO_RESIDE [High correlation]
MOD_RAZONA_CUANTITAT_PUNT is highly correlated with MOD_COMPETEN_CIUDADA_PUNT and 2 other fields [High correlation]
MOD_LECTURA_CRITICA_PUNT is highly correlated with MOD_COMPETEN_CIUDADA_PUNT and 1 other field [High correlation]
MOD_COMPETEN_CIUDADA_PUNT is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 3 other fields [High correlation]
MOD_INGLES_PUNT is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 2 other fields [High correlation]
MOD_COMUNI_ESCRITA_PUNT is highly correlated with PUNT_GLOBAL [High correlation]
PUNT_GLOBAL is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 4 other fields [High correlation]
ESTU_SEMESTRECURSA has 4263 (1.6%) missing values [Missing]
FAMI_ESTRATOVIVIENDA has 16367 (6.3%) missing values [Missing]
FAMI_TIENEINTERNET has 11822 (4.5%) missing values [Missing]
ESTU_HORASSEMANATRABAJA has 12916 (5.0%) missing values [Missing]
MOD_COMUNI_ESCRITA_PUNT has 8364 (3.2%) zeros [Zeros]

Reproduction

Analysis started: 2022-05-24 16:09:47.281007
Analysis finished: 2022-05-24 16:10:20.840584
Duration: 33.56 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
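The reported duration can be cross-checked from the two timestamps. A small sketch using only the standard library; the format string is an assumption based on how the timestamps are printed above:

```python
from datetime import datetime

# Timestamp layout as printed in the report (assumed format string).
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-05-24 16:09:47.281007", fmt)
finished = datetime.strptime("2022-05-24 16:10:20.840584", fmt)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```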

Variables

ESTU_GENERO
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 26
Missing (%): < 0.1%
Memory size: 2.0 MiB
F: 153820
M: 106910

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 260730
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: F
4th row: F
5th row: F

Common Values

Value      Count   Frequency (%)
F          153820  59.0%
M          106910  41.0%
(Missing)  26      < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
f      153820  59.0%
m      106910  41.0%

Most occurring characters

Value  Count   Frequency (%)
F      153820  59.0%
M      106910  41.0%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  260730  100.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
F      153820  59.0%
M      106910  41.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  260730  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
F      153820  59.0%
M      106910  41.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  260730  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
F      153820  59.0%
M      106910  41.0%

ESTU_DEPTO_RESIDE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 34
Distinct (%): < 0.1%
Missing: 2025
Missing (%): 0.8%
Memory size: 2.0 MiB
BOGOTÁ: 72185
ANTIOQUIA: 31153
VALLE: 20633
ATLANTICO: 15145
CUNDINAMARCA: 14930
Other values (29): 104685

Length

Max length: 15
Median length: 12
Mean length: 7.366249116
Min length: 4

Characters and Unicode

Total characters: 1905877
Distinct characters: 26
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: VALLE
2nd row: VALLE
3rd row: VALLE
4th row: VALLE
5th row: VALLE

Common Values

Value              Count  Frequency (%)
BOGOTÁ             72185  27.7%
ANTIOQUIA          31153  11.9%
VALLE              20633  7.9%
ATLANTICO          15145  5.8%
CUNDINAMARCA       14930  5.7%
SANTANDER          12604  4.8%
NORTE SANTANDER    8374   3.2%
BOLIVAR            7980   3.1%
BOYACA             6870   2.6%
NARIÑO             6496   2.5%
Other values (24)  62361  23.9%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
bogotá             72185  26.8%
antioquia          31153  11.5%
santander          20978  7.8%
valle              20633  7.6%
atlantico          15145  5.6%
cundinamarca       14930  5.5%
norte              8374   3.1%
bolivar            7980   3.0%
boyaca             6870   2.5%
nariño             6496   2.4%
Other values (26)  65003  24.1%

Most occurring characters

Value              Count   Frequency (%)
A                  323499  17.0%
O                  244967  12.9%
T                  176925  9.3%
N                  141889  7.4%
I                  133559  7.0%
L                  93856   4.9%
B                  92811   4.9%
R                  87505   4.6%
C                  87236   4.6%
G                  78640   4.1%
Other values (16)  444990  23.3%

Most occurring categories

Value             Count    Frequency (%)
Uppercase Letter  1894861  99.4%
Space Separator   11016    0.6%

Most frequent character per category

Uppercase Letter
Value              Count   Frequency (%)
A                  323499  17.1%
O                  244967  12.9%
T                  176925  9.3%
N                  141889  7.5%
I                  133559  7.0%
L                  93856   5.0%
B                  92811   4.9%
R                  87505   4.6%
C                  87236   4.6%
G                  78640   4.2%
Other values (15)  433974  22.9%
Space Separator
Value    Count  Frequency (%)
(space)  11016  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1894861  99.4%
Common  11016    0.6%

Most frequent character per script

Latin
Value              Count   Frequency (%)
A                  323499  17.1%
O                  244967  12.9%
T                  176925  9.3%
N                  141889  7.5%
I                  133559  7.0%
L                  93856   5.0%
B                  92811   4.9%
R                  87505   4.6%
C                  87236   4.6%
G                  78640   4.2%
Other values (15)  433974  22.9%
Common
Value    Count  Frequency (%)
(space)  11016  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1827196  95.9%
None   78681    4.1%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
A                  323499  17.7%
O                  244967  13.4%
T                  176925  9.7%
N                  141889  7.8%
I                  133559  7.3%
L                  93856   5.1%
B                  92811   5.1%
R                  87505   4.8%
C                  87236   4.8%
G                  78640   4.3%
Other values (14)  366309  20.0%
None
Value  Count  Frequency (%)
Á      72185  91.7%
Ñ      6496   8.3%

ESTU_SEMESTRECURSA
Categorical

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 4263
Missing (%): 1.6%
Memory size: 2.0 MiB
09: 86643
10: 73503
08: 58838
07: 16668
11: 9028
Other values (7): 11813

Length

Max length: 8
Median length: 2
Mean length: 2.186367659
Min length: 2

Characters and Unicode

Total characters: 560788
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 08
2nd row: 09
3rd row: 08
4th row: 08
5th row: 09

Common Values

Value             Count  Frequency (%)
09                86643  33.2%
10                73503  28.2%
08                58838  22.6%
07                16668  6.4%
11                9028   3.5%
12 o más          7967   3.1%
06                2377   0.9%
05                542    0.2%
04                453    0.2%
03                275    0.1%
Other values (2)  199    0.1%
(Missing)         4263   1.6%
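ESTU_SEMESTRECURSA is stored as text: zero-padded two-digit labels plus the open-ended label `12 o más`. If an ordinal numeric column is needed downstream, one option is sketched below. The function name `semester_to_int` is hypothetical, and mapping `12 o más` to 12 is an assumption, not something the report prescribes.

```python
from typing import Any, Optional

def semester_to_int(value: Any) -> Optional[int]:
    """Map a semester label such as '09' or '12 o más' to an integer."""
    # Missing entries (None, NaN, other non-strings) stay missing.
    if not isinstance(value, str):
        return None
    tokens = value.split()
    if not tokens:
        return None
    # The leading token carries the number: '12 o más' -> '12'.
    head = tokens[0]
    return int(head) if head.isdigit() else None

labels = ["08", "09", "12 o más", None]
converted = [semester_to_int(v) for v in labels]
print(converted)
```

Because the open-ended label is capped at 12, differences above that value are not recoverable from this column.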

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
09                86643  31.8%
10                73503  27.0%
08                58838  21.6%
07                16668  6.1%
11                9028   3.3%
12                7967   2.9%
o                 7967   2.9%
más               7967   2.9%
06                2377   0.9%
05                542    0.2%
Other values (4)  927    0.3%

Most occurring characters

Value             Count   Frequency (%)
0                 239498  42.7%
1                 99579   17.8%
9                 86643   15.5%
8                 58838   10.5%
7                 16668   3.0%
(space)           15934   2.8%
2                 8113    1.4%
o                 7967    1.4%
m                 7967    1.4%
á                 7967    1.4%
Other values (5)  11614   2.1%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    512986  91.5%
Lowercase Letter  31868   5.7%
Space Separator   15934   2.8%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      239498  46.7%
1      99579   19.4%
9      86643   16.9%
8      58838   11.5%
7      16668   3.2%
2      8113    1.6%
6      2377    0.5%
5      542     0.1%
4      453     0.1%
3      275     0.1%
Lowercase Letter
Value  Count  Frequency (%)
o      7967   25.0%
m      7967   25.0%
á      7967   25.0%
s      7967   25.0%
Space Separator
Value    Count  Frequency (%)
(space)  15934  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  528920  94.3%
Latin   31868   5.7%

Most frequent character per script

Common
Value    Count   Frequency (%)
0        239498  45.3%
1        99579   18.8%
9        86643   16.4%
8        58838   11.1%
7        16668   3.2%
(space)  15934   3.0%
2        8113    1.5%
6        2377    0.4%
5        542     0.1%
4        453     0.1%
Latin
Value  Count  Frequency (%)
o      7967   25.0%
m      7967   25.0%
á      7967   25.0%
s      7967   25.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  552821  98.6%
None   7967    1.4%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
0                 239498  43.3%
1                 99579   18.0%
9                 86643   15.7%
8                 58838   10.6%
7                 16668   3.0%
(space)           15934   2.9%
2                 8113    1.5%
o                 7967    1.4%
m                 7967    1.4%
s                 7967    1.4%
Other values (4)  3647    0.7%
None
Value  Count  Frequency (%)
á      7967   100.0%

FAMI_ESTRATOVIVIENDA
Categorical

MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 16367
Missing (%): 6.3%
Memory size: 2.0 MiB
Estrato 2: 84244
Estrato 3: 80130
Estrato 1: 38549
Estrato 4: 25379
Estrato 5: 9400
Other values (2): 6687

Length

Max length: 11
Median length: 9
Mean length: 9.013552165
Min length: 9

Characters and Unicode

Total characters: 2202813
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Estrato 4
2nd row: Estrato 1
3rd row: Estrato 3
4th row: Estrato 3
5th row: Estrato 2

Common Values

Value        Count  Frequency (%)
Estrato 2    84244  32.3%
Estrato 3    80130  30.7%
Estrato 1    38549  14.8%
Estrato 4    25379  9.7%
Estrato 5    9400   3.6%
Estrato 6    5031   1.9%
Sin Estrato  1656   0.6%
(Missing)    16367  6.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count   Frequency (%)
estrato  244389  50.0%
2        84244   17.2%
3        80130   16.4%
1        38549   7.9%
4        25379   5.2%
5        9400    1.9%
6        5031    1.0%
sin      1656    0.3%

Most occurring characters

Value             Count   Frequency (%)
t                 488778  22.2%
E                 244389  11.1%
s                 244389  11.1%
r                 244389  11.1%
a                 244389  11.1%
o                 244389  11.1%
(space)           244389  11.1%
2                 84244   3.8%
3                 80130   3.6%
1                 38549   1.7%
Other values (6)  44778   2.0%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  1469646  66.7%
Uppercase Letter  246045   11.2%
Space Separator   244389   11.1%
Decimal Number    242733   11.0%

Most frequent character per category

Lowercase Letter
Value  Count   Frequency (%)
t      488778  33.3%
s      244389  16.6%
r      244389  16.6%
a      244389  16.6%
o      244389  16.6%
i      1656    0.1%
n      1656    0.1%
Decimal Number
Value  Count  Frequency (%)
2      84244  34.7%
3      80130  33.0%
1      38549  15.9%
4      25379  10.5%
5      9400   3.9%
6      5031   2.1%
Uppercase Letter
Value  Count   Frequency (%)
E      244389  99.3%
S      1656    0.7%
Space Separator
Value    Count   Frequency (%)
(space)  244389  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1715691  77.9%
Common  487122   22.1%

Most frequent character per script

Latin
Value  Count   Frequency (%)
t      488778  28.5%
E      244389  14.2%
s      244389  14.2%
r      244389  14.2%
a      244389  14.2%
o      244389  14.2%
S      1656    0.1%
i      1656    0.1%
n      1656    0.1%
Common
Value    Count   Frequency (%)
(space)  244389  50.2%
2        84244   17.3%
3        80130   16.4%
1        38549   7.9%
4        25379   5.2%
5        9400    1.9%
6        5031    1.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2202813  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
t                 488778  22.2%
E                 244389  11.1%
s                 244389  11.1%
r                 244389  11.1%
a                 244389  11.1%
o                 244389  11.1%
(space)           244389  11.1%
2                 84244   3.8%
3                 80130   3.6%
1                 38549   1.7%
Other values (6)  44778   2.0%

FAMI_TIENEINTERNET
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 11822
Missing (%): 4.5%
Memory size: 2.0 MiB
Si: 215344
No: 33590

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 497868
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Si
2nd row: No
3rd row: Si
4th row: Si
5th row: Si

Common Values

Value      Count   Frequency (%)
Si         215344  82.6%
No         33590   12.9%
(Missing)  11822   4.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
si     215344  86.5%
no     33590   13.5%

Most occurring characters

Value  Count   Frequency (%)
S      215344  43.3%
i      215344  43.3%
N      33590   6.7%
o      33590   6.7%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  248934  50.0%
Lowercase Letter  248934  50.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
S      215344  86.5%
N      33590   13.5%
Lowercase Letter
Value  Count   Frequency (%)
i      215344  86.5%
o      33590   13.5%

Most occurring scripts

Value  Count   Frequency (%)
Latin  497868  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
S      215344  43.3%
i      215344  43.3%
N      33590   6.7%
o      33590   6.7%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  497868  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
S      215344  43.3%
i      215344  43.3%
N      33590   6.7%
o      33590   6.7%

ESTU_HORASSEMANATRABAJA
Categorical

MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 12916
Missing (%): 5.0%
Memory size: 2.0 MiB
Más de 30 horas: 94222
0: 46185
Entre 11 y 20 horas: 42324
Entre 21 y 30 horas: 36666
Menos de 10 horas: 28443

Length

Max length: 19
Median length: 17
Mean length: 13.89548096
Min length: 1

Characters and Unicode

Total characters: 3443856
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Menos de 10 horas
2nd row: Menos de 10 horas
3rd row: Entre 21 y 30 horas
4th row: Más de 30 horas
5th row: Más de 30 horas

Common Values

Value                Count  Frequency (%)
Más de 30 horas      94222  36.1%
0                    46185  17.7%
Entre 11 y 20 horas  42324  16.2%
Entre 21 y 30 horas  36666  14.1%
Menos de 10 horas    28443  10.9%
(Missing)            12916  5.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value             Count   Frequency (%)
horas             201655  21.6%
30                130888  14.0%
de                122665  13.2%
más               94222   10.1%
entre             78990   8.5%
y                 78990   8.5%
0                 46185   5.0%
11                42324   4.5%
20                42324   4.5%
21                36666   3.9%
Other values (2)  56886   6.1%

Most occurring characters

Value             Count   Frequency (%)
(space)           683955  19.9%
s                 324320  9.4%
r                 280645  8.1%
0                 247840  7.2%
o                 230098  6.7%
e                 230098  6.7%
h                 201655  5.9%
a                 201655  5.9%
1                 149757  4.3%
3                 130888  3.8%
Other values (8)  762945  22.2%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  1950771  56.6%
Space Separator   683955   19.9%
Decimal Number    607475   17.6%
Uppercase Letter  201655   5.9%

Most frequent character per category

Lowercase Letter
Value  Count   Frequency (%)
s      324320  16.6%
r      280645  14.4%
o      230098  11.8%
e      230098  11.8%
h      201655  10.3%
a      201655  10.3%
d      122665  6.3%
n      107433  5.5%
á      94222   4.8%
t      78990   4.0%
Decimal Number
Value  Count   Frequency (%)
0      247840  40.8%
1      149757  24.7%
3      130888  21.5%
2      78990   13.0%
Uppercase Letter
Value  Count   Frequency (%)
M      122665  60.8%
E      78990   39.2%
Space Separator
Value    Count   Frequency (%)
(space)  683955  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   2152426  62.5%
Common  1291430  37.5%

Most frequent character per script

Latin
Value             Count   Frequency (%)
s                 324320  15.1%
r                 280645  13.0%
o                 230098  10.7%
e                 230098  10.7%
h                 201655  9.4%
a                 201655  9.4%
M                 122665  5.7%
d                 122665  5.7%
n                 107433  5.0%
á                 94222   4.4%
Other values (3)  236970  11.0%
Common
Value    Count   Frequency (%)
(space)  683955  53.0%
0        247840  19.2%
1        149757  11.6%
3        130888  10.1%
2        78990   6.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3349634  97.3%
None   94222    2.7%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
(space)           683955  20.4%
s                 324320  9.7%
r                 280645  8.4%
0                 247840  7.4%
o                 230098  6.9%
e                 230098  6.9%
h                 201655  6.0%
a                 201655  6.0%
1                 149757  4.5%
3                 130888  3.9%
Other values (7)  668723  20.0%
None
Value  Count  Frequency (%)
á      94222  100.0%

ESTU_PRGM_ACADEMICO
Categorical

HIGH CARDINALITY

Distinct: 810
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
ADMINISTRACION DE EMPRESAS: 20413
DERECHO: 19422
CONTADURIA PUBLICA: 15770
PSICOLOGIA: 12497
INGENIERIA INDUSTRIAL: 10763
Other values (805): 181891

Length

Max length: 109
Median length: 86
Mean length: 22.00076317
Min length: 4

Characters and Unicode

Total characters: 5736831
Distinct characters: 44
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 28
Unique (%): < 0.1%

Sample

1st row: ADMINISTRACION DE EMPRESAS
2nd row: ADMINISTRACION DE EMPRESAS
3rd row: ADMINISTRACION DE EMPRESAS
4th row: ADMINISTRACION DE EMPRESAS
5th row: ADMINISTRACION DE EMPRESAS

Common Values

Value                       Count   Frequency (%)
ADMINISTRACION DE EMPRESAS  20413   7.8%
DERECHO                     19422   7.4%
CONTADURIA PUBLICA          15770   6.0%
PSICOLOGIA                  12497   4.8%
INGENIERIA INDUSTRIAL       10763   4.1%
ADMINISTRACIÓN DE EMPRESAS  9042    3.5%
INGENIERIA CIVIL            7634    2.9%
MEDICINA                    6573    2.5%
INGENIERIA DE SISTEMAS      6440    2.5%
PSICOLOGÍA                  6244    2.4%
Other values (800)          145958  56.0%
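The table lists ADMINISTRACION DE EMPRESAS and ADMINISTRACIÓN DE EMPRESAS (and likewise PSICOLOGIA and PSICOLOGÍA) as separate values, so part of the 810 distinct values may be spelling variants rather than different programs. A minimal sketch of one way to fold such variants together, assuming accent-insensitive matching is acceptable for this column; `normalize_program` is a hypothetical helper, not something the report provides:

```python
import unicodedata
from collections import Counter

def normalize_program(name: str) -> str:
    """Strip diacritics, upper-case, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks (Unicode category Mn) left by decomposition.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return " ".join(stripped.upper().split())

# Counts copied from the Common Values table above.
counts = {
    "ADMINISTRACION DE EMPRESAS": 20413,
    "ADMINISTRACIÓN DE EMPRESAS": 9042,
    "PSICOLOGIA": 12497,
    "PSICOLOGÍA": 6244,
}

merged = Counter()
for name, count in counts.items():
    merged[normalize_program(name)] += count

print(dict(merged))
```

Note that this approach also maps Ñ to N, which may or may not be desirable for program names.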

Length

Histogram of lengths of the category

Value               Count   Frequency (%)
de                  50848   7.6%
ingenieria          50214   7.5%
en                  42026   6.3%
administracion      31121   4.7%
empresas            31032   4.7%
licenciatura        25912   3.9%
y                   23660   3.6%
derecho             19556   2.9%
publica             18543   2.8%
administración      16712   2.5%
Other values (502)  355642  53.5%

Most occurring characters

Value              Count    Frequency (%)
I                  787362   13.7%
A                  662715   11.6%
E                  558044   9.7%
N                  509430   8.9%
(space)            406426   7.1%
C                  405942   7.1%
R                  340779   5.9%
O                  322656   5.6%
S                  297683   5.2%
D                  231577   4.0%
Other values (34)  1214217  21.2%

Most occurring categories

Value              Count    Frequency (%)
Uppercase Letter   5322896  92.8%
Space Separator    406426   7.1%
Other Punctuation  3966     0.1%
Dash Punctuation   3445     0.1%
Decimal Number     49       < 0.1%
Other Symbol       49       < 0.1%

Most frequent character per category

Uppercase Letter
Value              Count   Frequency (%)
I                  787362  14.8%
A                  662715  12.5%
E                  558044  10.5%
N                  509430  9.6%
C                  405942  7.6%
R                  340779  6.4%
O                  322656  6.1%
S                  297683  5.6%
D                  231577  4.4%
T                  222519  4.2%
Other values (26)  984189  18.5%
Other Punctuation
Value  Count  Frequency (%)
,      3246   81.8%
¿      377    9.5%
:      288    7.3%
.      55     1.4%
Space Separator
Value    Count   Frequency (%)
(space)  406426  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      3445   100.0%
Decimal Number
Value  Count  Frequency (%)
3      49     100.0%
Other Symbol
Value  Count  Frequency (%)
°      49     100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   5322896  92.8%
Common  413935   7.2%

Most frequent character per script

Latin
Value              Count   Frequency (%)
I                  787362  14.8%
A                  662715  12.5%
E                  558044  10.5%
N                  509430  9.6%
C                  405942  7.6%
R                  340779  6.4%
O                  322656  6.1%
S                  297683  5.6%
D                  231577  4.4%
T                  222519  4.2%
Other values (26)  984189  18.5%
Common
Value    Count   Frequency (%)
(space)  406426  98.2%
-        3445    0.8%
,        3246    0.8%
¿        377     0.1%
:        288     0.1%
.        55      < 0.1%
3        49      < 0.1%
°        49      < 0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  5680333  99.0%
None   56498    1.0%

Most frequent character per block

ASCII
Value              Count    Frequency (%)
I                  787362   13.9%
A                  662715   11.7%
E                  558044   9.8%
N                  509430   9.0%
(space)            406426   7.2%
C                  405942   7.1%
R                  340779   6.0%
O                  322656   5.7%
S                  297683   5.2%
D                  231577   4.1%
Other values (22)  1157719  20.4%
None
Value             Count  Frequency (%)
Ó                 21766  38.5%
Í                 21306  37.7%
Ñ                 4935   8.7%
Ú                 4484   7.9%
Á                 1851   3.3%
É                 1582   2.8%
¿                 377    0.7%
Ü                 94     0.2%
°                 49     0.1%
À                 30     0.1%
Other values (2)  24     < 0.1%

ESTU_PRGM_DEPARTAMENTO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 28
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
BOGOTÁ: 105234
ANTIOQUIA: 32073
VALLE: 17040
ATLANTICO: 15315
SANTANDER: 10968
Other values (23): 80126

Length

Max length: 15
Median length: 12
Mean length: 7.128219485
Min length: 4

Characters and Unicode

Total characters: 1858726
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: VALLE
2nd row: VALLE
3rd row: VALLE
4th row: VALLE
5th row: VALLE

Common Values

Value              Count   Frequency (%)
BOGOTÁ             105234  40.4%
ANTIOQUIA          32073   12.3%
VALLE              17040   6.5%
ATLANTICO          15315   5.9%
SANTANDER          10968   4.2%
NORTE SANTANDER    8667    3.3%
BOLIVAR            7675    2.9%
BOYACA             5594    2.1%
NARIÑO             5127    2.0%
CUNDINAMARCA       5047    1.9%
Other values (18)  48016   18.4%

Length

Histogram of lengths of the category

Value              Count   Frequency (%)
bogotá             105234  38.8%
antioquia          32073   11.8%
santander          19635   7.2%
valle              17040   6.3%
atlantico          15315   5.6%
norte              8667    3.2%
bolivar            7675    2.8%
boyaca             5594    2.1%
nariño             5127    1.9%
cundinamarca       5047    1.9%
Other values (19)  49968   18.4%

Most occurring characters

Value              Count   Frequency (%)
O                  303953  16.4%
A                  262620  14.1%
T                  205126  11.0%
B                  122915  6.6%
I                  119280  6.4%
N                  117750  6.3%
G                  110111  5.9%
Á                  105234  5.7%
L                  79815   4.3%
R                  68998   3.7%
Other values (15)  362924  19.5%

Most occurring categories

Value             Count    Frequency (%)
Uppercase Letter  1848107  99.4%
Space Separator   10619    0.6%

Most frequent character per category

Uppercase Letter
Value              Count   Frequency (%)
O                  303953  16.4%
A                  262620  14.2%
T                  205126  11.1%
B                  122915  6.7%
I                  119280  6.5%
N                  117750  6.4%
G                  110111  6.0%
Á                  105234  5.7%
L                  79815   4.3%
R                  68998   3.7%
Other values (14)  352305  19.1%
Space Separator
Value    Count  Frequency (%)
(space)  10619  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1848107  99.4%
Common  10619    0.6%

Most frequent character per script

Latin
Value              Count   Frequency (%)
O                  303953  16.4%
A                  262620  14.2%
T                  205126  11.1%
B                  122915  6.7%
I                  119280  6.5%
N                  117750  6.4%
G                  110111  6.0%
Á                  105234  5.7%
L                  79815   4.3%
R                  68998   3.7%
Other values (14)  352305  19.1%
Common
Value    Count  Frequency (%)
(space)  10619  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1748365  94.1%
None   110361   5.9%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
O                  303953  17.4%
A                  262620  15.0%
T                  205126  11.7%
B                  122915  7.0%
I                  119280  6.8%
N                  117750  6.7%
G                  110111  6.3%
L                  79815   4.6%
R                  68998   3.9%
C                  59791   3.4%
Other values (13)  298006  17.0%
None
Value  Count   Frequency (%)
Á      105234  95.4%
Ñ      5127    4.6%

MOD_RAZONA_CUANTITAT_PUNT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 172
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 146.5563822
Minimum: 0
Maximum: 300
Zeros: 55
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 96
Q1: 123
median: 146
Q3: 168
95-th percentile: 200
Maximum: 300
Range: 300
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 31.71409836
Coefficient of variation (CV): 0.2163952049
Kurtosis: 0.03595965347
Mean: 146.5563822
Median Absolute Deviation (MAD): 23
Skewness: 0.2108137414
Sum: 38215456
Variance: 1005.784035
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
145                 3164    1.2%
147                 3118    1.2%
152                 3117    1.2%
144                 3100    1.2%
143                 3094    1.2%
146                 3092    1.2%
135                 3086    1.2%
154                 3086    1.2%
155                 3086    1.2%
142                 3082    1.2%
Other values (162)  229731  88.1%

Value  Count  Frequency (%)
0      55     < 0.1%
66     5      < 0.1%
67     30     < 0.1%
68     26     < 0.1%
69     35     < 0.1%
70     54     < 0.1%
71     65     < 0.1%
72     74     < 0.1%
73     103    < 0.1%
74     118    < 0.1%

Value  Count  Frequency (%)
300    264    0.1%
235    27     < 0.1%
234    76     < 0.1%
233    100    < 0.1%
232    72     < 0.1%
231    73     < 0.1%
230    79     < 0.1%
229    64     < 0.1%
228    186    0.1%
227    233    0.1%

MOD_LECTURA_CRITICA_PUNT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 182
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 149.0340817
Minimum: 0
Maximum: 300
Zeros: 52
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 98
Q1: 126
median: 149
Q3: 172
95-th percentile: 200
Maximum: 300
Range: 300
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 31.37696168
Coefficient of variation (CV): 0.2105354784
Kurtosis: -0.170868429
Mean: 149.0340817
Median Absolute Deviation (MAD): 23
Skewness: 0.0608466345
Sum: 38861531
Variance: 984.5137241
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
153                 3089    1.2%
156                 3080    1.2%
147                 3078    1.2%
155                 3074    1.2%
146                 3067    1.2%
145                 3050    1.2%
158                 3048    1.2%
150                 3042    1.2%
144                 3036    1.2%
149                 3022    1.2%
Other values (172)  230170  88.3%

Value  Count  Frequency (%)
0      52     < 0.1%
57     1      < 0.1%
58     1      < 0.1%
59     11     < 0.1%
60     10     < 0.1%
61     9      < 0.1%
62     21     < 0.1%
63     32     < 0.1%
64     24     < 0.1%
65     46     < 0.1%

Value  Count  Frequency (%)
300    163    0.1%
237    12     < 0.1%
235    17     < 0.1%
234    6      < 0.1%
233    3      < 0.1%
232    90     < 0.1%
231    41     < 0.1%
230    30     < 0.1%
229    107    < 0.1%
228    67     < 0.1%

MOD_COMPETEN_CIUDADA_PUNT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 180
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.4883684
Minimum: 0
Maximum: 300
Zeros: 210
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 89
Q1: 118
median: 143
Q3: 167
95-th percentile: 197
Maximum: 300
Range: 300
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 33.34157143
Coefficient of variation (CV): 0.2339950397
Kurtosis: -0.2921520717
Mean: 142.4883684
Median Absolute Deviation (MAD): 25
Skewness: 0.009068225782
Sum: 37154697
Variance: 1111.660385
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
145                 2936    1.1%
150                 2935    1.1%
146                 2884    1.1%
144                 2882    1.1%
152                 2834    1.1%
158                 2826    1.1%
138                 2825    1.1%
154                 2813    1.1%
155                 2811    1.1%
143                 2805    1.1%
Other values (170)  232205  89.1%

Value  Count  Frequency (%)
0      210    0.1%
60     11     < 0.1%
61     57     < 0.1%
62     72     < 0.1%
63     75     < 0.1%
64     61     < 0.1%
65     92     < 0.1%
66     107    < 0.1%
67     163    0.1%
68     174    0.1%

Value  Count  Frequency (%)
300    72     < 0.1%
237    18     < 0.1%
236    30     < 0.1%
235    23     < 0.1%
234    41     < 0.1%
233    39     < 0.1%
232    67     < 0.1%
231    62     < 0.1%
230    59     < 0.1%
229    59     < 0.1%

MOD_INGLES_PUNT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 163
Distinct (%): 0.1%
Missing: 71
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 151.990805
Minimum: 0
Maximum: 300
Zeros: 688
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 104
Q1: 131
median: 149
Q3: 173
95-th percentile: 207
Maximum: 300
Range: 300
Interquartile range (IQR): 42

Descriptive statistics

Standard deviation: 32.30237748
Coefficient of variation (CV): 0.2125284979
Kurtosis: 1.633051373
Mean: 151.990805
Median Absolute Deviation (MAD): 20
Skewness: 0.1921270345
Sum: 39621723
Variance: 1043.443591
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
145                 3927    1.5%
146                 3883    1.5%
143                 3854    1.5%
147                 3853    1.5%
142                 3817    1.5%
150                 3773    1.4%
141                 3771    1.4%
144                 3741    1.4%
140                 3733    1.4%
149                 3704    1.4%
Other values (153)  222629  85.4%

Value  Count  Frequency (%)
0      688    0.3%
72     6      < 0.1%
73     32     < 0.1%
74     103    < 0.1%
75     99     < 0.1%
76     134    0.1%
77     171    0.1%
78     220    0.1%
79     227    0.1%
80     207    0.1%

Value  Count  Frequency (%)
300    793    0.3%
233    18     < 0.1%
231    156    0.1%
230    36     < 0.1%
229    152    0.1%
228    224    0.1%
227    115    < 0.1%
226    154    0.1%
225    226    0.1%
224    174    0.1%

MOD_COMUNI_ESCRITA_PUNT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 180
Distinct (%): 0.1%
Missing: 310
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 143.7554963
Minimum: 0
Maximum: 300
Zeros: 8364
Zeros (%): 3.2%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 98
Q1: 132
median: 143
Q3: 167
95-th percentile: 194
Maximum: 300
Range: 300
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 37.54039465
Coefficient of variation (CV): 0.2611405866
Kurtosis: 5.526207455
Mean: 143.7554963
Median Absolute Deviation (MAD): 17
Skewness: -1.342473619
Sum: 37440544
Variance: 1409.28123
Monotonicity: Not monotonic
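This variable carries the Zeros alert: 8364 scores (3.2%) are exactly 0, which likely contributes to the negative skewness reported above. The report does not say what a zero means here. As a sketch only, and under the assumption that zeros might be treated separately from regular scores, the mean with and without them can be derived from the reported sum and counts:

```python
# Figures copied from the report; variable names are illustrative.
n_observations = 260756
n_missing = 310
n_zeros = 8364
total = 37440544  # "Sum" from the descriptive statistics

# The reported mean is taken over non-missing values.
n_valid = n_observations - n_missing
mean_all = total / n_valid

# Zeros add nothing to the sum, so dropping them only shrinks the divisor.
mean_nonzero = total / (n_valid - n_zeros)

print(f"{mean_all:.2f} {mean_nonzero:.2f}")
```

Whether excluding zeros is appropriate depends on how the score is defined, which is outside what this report shows.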
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
0                   8364    3.2%
140                 7451    2.9%
139                 7292    2.8%
138                 7037    2.7%
141                 6947    2.7%
137                 6719    2.6%
142                 6564    2.5%
136                 6225    2.4%
143                 6072    2.3%
135                 5978    2.3%
Other values (170)  191797  73.6%

Value  Count  Frequency (%)
0      8364   3.2%
61     1      < 0.1%
63     3      < 0.1%
64     7      < 0.1%
71     4      < 0.1%
72     1      < 0.1%
73     7      < 0.1%
74     14     < 0.1%
75     9      < 0.1%
76     26     < 0.1%

Value  Count  Frequency (%)
300    808    0.3%
245    2      < 0.1%
244    1      < 0.1%
243    1      < 0.1%
242    2      < 0.1%
241    1      < 0.1%
240    3      < 0.1%
239    2      < 0.1%
238    1      < 0.1%
237    6      < 0.1%

PUNT_GLOBAL
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 216
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 146.7303648
Minimum: 0
Maximum: 256
Zeros: 3
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 110
Q1: 130
median: 145
Q3: 162
95-th percentile: 187
Maximum: 256
Range: 256
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 23.63107117
Coefficient of variation (CV): 0.1610509945
Kurtosis: 0.06667746627
Mean: 146.7303648
Median Absolute Deviation (MAD): 16
Skewness: 0.1693183991
Sum: 38260823
Variance: 558.4275248
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
139                 4493    1.7%
141                 4470    1.7%
144                 4441    1.7%
143                 4424    1.7%
140                 4417    1.7%
145                 4406    1.7%
137                 4396    1.7%
142                 4364    1.7%
147                 4360    1.7%
146                 4357    1.7%
Other values (206)  216628  83.1%

Value  Count  Frequency (%)
0      3      < 0.1%
18     1      < 0.1%
19     1      < 0.1%
20     1      < 0.1%
23     2      < 0.1%
24     2      < 0.1%
26     1      < 0.1%
28     2      < 0.1%
31     1      < 0.1%
32     1      < 0.1%

Value  Count  Frequency (%)
256    1      < 0.1%
254    1      < 0.1%
253    1      < 0.1%
252    1      < 0.1%
250    2      < 0.1%
249    2      < 0.1%
248    2      < 0.1%
247    2      < 0.1%
246    3      < 0.1%
245    1      < 0.1%

Interactions

[36 pairwise interaction plots of the numeric variables; images not included.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
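
As a minimal sketch of that recipe, the code below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption of this sketch (a common convention); the function names are illustrative, not the report's own implementation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums of deviations are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("spearman:", spearman_rho(x, y))
print("pearson: ", pearson_r(x, y))
```

Because y increases whenever x does, the two rank vectors are identical and ρ comes out at 1 up to floating-point error, while Pearson's r on the raw values stays below 1.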

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
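
The formula and the invariance property can be illustrated with a short sketch. The data below are made up for the example, and the scale factors are kept positive, since a negative factor would flip the sign of r.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are sufficient.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data, chosen only for illustration.
x = [2.0, 4.0, 5.0, 7.0, 11.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [3.0 * a + 7.0 for a in x]
y_shifted = [0.5 * b - 2.0 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)  # the two values agree up to floating-point error
```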

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
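
A direct, quadratic-time sketch of that definition follows. It implements the plain form described in the text (often called tau-a), in which tied pairs count as neither concordant nor discordant; libraries commonly apply a tie correction (tau-b), so results may differ on data with many ties. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither total
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)  # 5 concordant, 1 discordant, 6 pairs -> 4/6
```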

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
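
For orientation, here is a minimal sketch of a bias-corrected Cramér's V along the lines of Bergsma's correction: the chi-squared statistic is divided by n, the expected bias is subtracted, and the table dimensions are shrunk accordingly. The exact code used to generate this report is not shown here, so the function below is an assumption about the general approach, and the toy category lists are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_n in row_totals.items():
        for col_label, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Subtract the expected bias and clamp at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Shrink the table dimensions by the matching correction.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical toy data: identical labels give perfect association.
reside = ["VALLE"] * 4 + ["CESAR"] * 4 + ["CALDAS"] * 4
program = list(reside)
v_perfect = cramers_v_corrected(reside, program)

# Labels that are independent by construction give zero.
left = ["X", "X", "Y", "Y"] * 3
right = ["P", "Q", "P", "Q"] * 3
v_independent = cramers_v_corrected(left, right)

print(v_perfect, v_independent)
```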

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
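
To make the idea of nullity correlation concrete, the sketch below treats it as the Pearson correlation between per-column presence indicators (1 if a value is present, 0 if it is missing). That reading is an assumption of this example rather than something stated in the report, and the records and column names are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; returns None when either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return None  # a fully present (or fully missing) column has no variation
    return cov / (sx * sy)


def nullity_correlation(rows, col_a, col_b):
    """Correlate the presence (1) / absence (0) patterns of two columns."""
    present_a = [0 if row[col_a] is None else 1 for row in rows]
    present_b = [0 if row[col_b] is None else 1 for row in rows]
    return pearson_r(present_a, present_b)


# Hypothetical records: "a" and "b" are missing together,
# while "c" is filled in exactly when "a" is missing.
rows = [
    {"a": 147.0, "b": 126.0, "c": None},
    {"a": None, "b": None, "c": "x"},
    {"a": 140.0, "b": 143.0, "c": None},
    {"a": None, "b": None, "c": "y"},
    {"a": 122.0, "b": 145.0, "c": None},
]

together = nullity_correlation(rows, "a", "b")
opposite = nullity_correlation(rows, "a", "c")
print(together, opposite)
```

A value near +1 means two columns tend to be missing in the same rows, and a value near -1 means one tends to be present exactly when the other is absent.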

Sample

First rows

Index | ESTU_GENERO | ESTU_DEPTO_RESIDE | ESTU_SEMESTRECURSA | FAMI_ESTRATOVIVIENDA | FAMI_TIENEINTERNET | ESTU_HORASSEMANATRABAJA | ESTU_PRGM_ACADEMICO | ESTU_PRGM_DEPARTAMENTO | MOD_RAZONA_CUANTITAT_PUNT | MOD_LECTURA_CRITICA_PUNT | MOD_COMPETEN_CIUDADA_PUNT | MOD_INGLES_PUNT | MOD_COMUNI_ESCRITA_PUNT | PUNT_GLOBAL
0 | M | VALLE | 08 | Estrato 4 | Si | Menos de 10 horas | ADMINISTRACION DE EMPRESAS | VALLE | 110 | 81 | 100 | 147.0 | 126.0 | 113
1 | F | VALLE | 09 | Estrato 1 | No | Menos de 10 horas | ADMINISTRACION DE EMPRESAS | VALLE | 105 | 111 | 100 | 111.0 | 106.0 | 107
2 | F | VALLE | 08 | Estrato 3 | Si | Entre 21 y 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 130 | 137 | 171 | 140.0 | 143.0 | 144
3 | F | VALLE | 08 | Estrato 3 | Si | Más de 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 135 | 141 | 135 | 143.0 | 175.0 | 146
4 | F | VALLE | 09 | Estrato 2 | Si | Más de 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 120 | 110 | 154 | 122.0 | 145.0 | 130
5 | F | VALLE | 09 | Estrato 3 | Si | 0 | ADMINISTRACION DE EMPRESAS | VALLE | 142 | 153 | 147 | 162.0 | 0.0 | 121
6 | F | VALLE | 08 | Estrato 2 | Si | Menos de 10 horas | ADMINISTRACION DE EMPRESAS | VALLE | 147 | 129 | 139 | 134.0 | 143.0 | 138
7 | F | VALLE | 08 | Estrato 1 | Si | Entre 11 y 20 horas | ADMINISTRACION DE EMPRESAS | VALLE | 128 | 139 | 131 | 97.0 | 137.0 | 126
8 | F | VALLE | 09 | Estrato 3 | Si | Más de 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 198 | 180 | 119 | 141.0 | 129.0 | 153
9 | M | VALLE | 09 | Estrato 4 | Si | Más de 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 157 | 151 | 131 | 124.0 | 0.0 | 113

Last rows

Index | ESTU_GENERO | ESTU_DEPTO_RESIDE | ESTU_SEMESTRECURSA | FAMI_ESTRATOVIVIENDA | FAMI_TIENEINTERNET | ESTU_HORASSEMANATRABAJA | ESTU_PRGM_ACADEMICO | ESTU_PRGM_DEPARTAMENTO | MOD_RAZONA_CUANTITAT_PUNT | MOD_LECTURA_CRITICA_PUNT | MOD_COMPETEN_CIUDADA_PUNT | MOD_INGLES_PUNT | MOD_COMUNI_ESCRITA_PUNT | PUNT_GLOBAL
260746 | F | CESAR | 10 | Estrato 4 | Si | Menos de 10 horas | DERECHO | CESAR | 100 | 113 | 134 | 94.0 | 116.0 | 111
260747 | M | BOGOTÁ | 07 | Estrato 4 | Si | Entre 11 y 20 horas | INGENIERIA DE SISTEMAS Y COMPUTACION | BOGOTÁ | 226 | 220 | 177 | 300.0 | 195.0 | 224
260748 | F | ANTIOQUIA | 07 | Estrato 5 | No | Más de 30 horas | BIOLOGIA | ANTIOQUIA | 137 | 155 | 144 | 106.0 | 102.0 | 129
260749 | F | CESAR | 12 o más | Estrato 3 | Si | Entre 11 y 20 horas | DERECHO | CESAR | 89 | 135 | 147 | 138.0 | 144.0 | 131
260750 | F | BOGOTÁ | 10 | Estrato 4 | Si | Menos de 10 horas | GEOCIENCIAS | BOGOTÁ | 196 | 193 | 159 | 212.0 | 138.0 | 180
260751 | M | CALDAS | 11 | Estrato 3 | Si | Entre 11 y 20 horas | ADMINISTRACION DE SISTEMAS INFORMATICOS | CALDAS | 116 | 132 | 124 | 144.0 | 140.0 | 131
260752 | M | CALDAS | 12 o más | Estrato 2 | Si | Más de 30 horas | ADMINISTRACION DE SISTEMAS INFORMATICOS | CALDAS | 159 | 192 | 169 | 196.0 | 143.0 | 172
260753 | F | ATLANTICO | 08 | Estrato 2 | Si | Más de 30 horas | CONTADURIA PUBLICA | ATLANTICO | 184 | 142 | 83 | 120.0 | 140.0 | 134
260754 | F | BOGOTÁ | 08 | Estrato 1 | Si | 0 | GEOCIENCIAS | BOGOTÁ | 210 | 167 | 160 | 198.0 | 138.0 | 175
260755 | F | BOGOTÁ | 08 | Estrato 2 | Si | Entre 21 y 30 horas | LICENCIATURA EN PEDAGOGIA INFANTIL | BOGOTÁ | 158 | 166 | 148 | 140.0 | 194.0 | 161