Dataset statistics
Number of variables | 14 |
---|---|
Number of observations | 260756 |
Missing cells | 47800 |
Missing cells (%) | 1.3% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 27.9 MiB |
Average record size in memory | 112.0 B |
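The missing-cell percentage and the average record size can be recomputed from the other figures in this table. The following is a minimal sketch of that arithmetic in plain Python; the variable names are illustrative, and because the report rounds the total memory size to 27.9 MiB, the record size it yields is only approximate.

```python
# Figures copied from the dataset statistics table.
n_variables = 14
n_observations = 260_756
missing_cells = 47_800
total_size_mib = 27.9  # rounded in the report, so results are approximate

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```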
Variable types
Categorical | 8 |
---|---|
Numeric | 6 |
ESTU_PRGM_ACADEMICO has a high cardinality: 810 distinct values | High cardinality |
MOD_RAZONA_CUANTITAT_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 1 other field | High correlation |
MOD_LECTURA_CRITICA_PUNT is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 3 other fields | High correlation |
MOD_COMPETEN_CIUDADA_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 2 other fields | High correlation |
MOD_INGLES_PUNT is highly correlated with MOD_LECTURA_CRITICA_PUNT and 2 other fields | High correlation |
PUNT_GLOBAL is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 3 other fields | High correlation |
MOD_COMUNI_ESCRITA_PUNT is highly correlated with PUNT_GLOBAL | High correlation |
PUNT_GLOBAL is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 4 other fields | High correlation |
MOD_RAZONA_CUANTITAT_PUNT is highly correlated with PUNT_GLOBAL | High correlation |
MOD_LECTURA_CRITICA_PUNT is highly correlated with PUNT_GLOBAL | High correlation |
MOD_COMPETEN_CIUDADA_PUNT is highly correlated with PUNT_GLOBAL | High correlation |
MOD_INGLES_PUNT is highly correlated with PUNT_GLOBAL | High correlation |
ESTU_DEPTO_RESIDE is highly correlated with ESTU_PRGM_DEPARTAMENTO | High correlation |
ESTU_PRGM_DEPARTAMENTO is highly correlated with ESTU_DEPTO_RESIDE | High correlation |
MOD_RAZONA_CUANTITAT_PUNT is highly correlated with MOD_COMPETEN_CIUDADA_PUNT and 2 other fields | High correlation |
MOD_LECTURA_CRITICA_PUNT is highly correlated with MOD_COMPETEN_CIUDADA_PUNT and 1 other field | High correlation |
MOD_COMPETEN_CIUDADA_PUNT is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 3 other fields | High correlation |
MOD_INGLES_PUNT is highly correlated with MOD_RAZONA_CUANTITAT_PUNT and 2 other fields | High correlation |
ESTU_SEMESTRECURSA has 4263 (1.6%) missing values | Missing |
FAMI_ESTRATOVIVIENDA has 16367 (6.3%) missing values | Missing |
FAMI_TIENEINTERNET has 11822 (4.5%) missing values | Missing |
ESTU_HORASSEMANATRABAJA has 12916 (5.0%) missing values | Missing |
MOD_COMUNI_ESCRITA_PUNT has 8364 (3.2%) zeros | Zeros |
Reproduction
Analysis started | 2022-05-24 16:09:47.281007 |
---|---|
Analysis finished | 2022-05-24 16:10:20.840584 |
Duration | 33.56 seconds |
Software version | pandas-profiling v3.2.0 |
ESTU_GENERO
Categorical
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 26 |
Missing (%) | < 0.1% |
Memory size | 2.0 MiB |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Characters and Unicode
Total characters | 260730 |
---|---|
Distinct characters | 2 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | M |
---|---|
2nd row | F |
3rd row | F |
4th row | F |
5th row | F |
Common Values
Value | Count | Frequency (%) |
F | 153820 | 59.0% |
M | 106910 | 41.0% |
(Missing) | 26 | < 0.1% |
Words
Value | Count | Frequency (%) |
f | 153820 | |
m | 106910 |
Most occurring characters
Value | Count | Frequency (%) |
F | 153820 | |
M | 106910 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 260730 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
F | 153820 | |
M | 106910 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 260730 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
F | 153820 | |
M | 106910 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 260730 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
F | 153820 | |
M | 106910 |
ESTU_DEPTO_RESIDE
Categorical
Distinct | 34 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2025 |
Missing (%) | 0.8% |
Memory size | 2.0 MiB |
Length
Max length | 15 |
---|---|
Median length | 12 |
Mean length | 7.366249116 |
Min length | 4 |
Characters and Unicode
Total characters | 1905877 |
---|---|
Distinct characters | 26 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | VALLE |
---|---|
2nd row | VALLE |
3rd row | VALLE |
4th row | VALLE |
5th row | VALLE |
Common Values
Value | Count | Frequency (%) |
BOGOTÁ | 72185 | 27.7% |
ANTIOQUIA | 31153 | 11.9% |
VALLE | 20633 | 7.9% |
ATLANTICO | 15145 | 5.8% |
CUNDINAMARCA | 14930 | 5.7% |
SANTANDER | 12604 | 4.8% |
NORTE SANTANDER | 8374 | 3.2% |
BOLIVAR | 7980 | 3.1% |
BOYACA | 6870 | 2.6% |
NARIÑO | 6496 | 2.5% |
Other values (24) | 62361 | 23.9% |
Words
Value | Count | Frequency (%) |
bogotá | 72185 | |
antioquia | 31153 | |
santander | 20978 | 7.8% |
valle | 20633 | 7.6% |
atlantico | 15145 | 5.6% |
cundinamarca | 14930 | 5.5% |
norte | 8374 | 3.1% |
bolivar | 7980 | 3.0% |
boyaca | 6870 | 2.5% |
nariño | 6496 | 2.4% |
Other values (26) | 65003 |
Most occurring characters
Value | Count | Frequency (%) |
A | 323499 | |
O | 244967 | |
T | 176925 | 9.3% |
N | 141889 | 7.4% |
I | 133559 | 7.0% |
L | 93856 | 4.9% |
B | 92811 | 4.9% |
R | 87505 | 4.6% |
C | 87236 | 4.6% |
G | 78640 | 4.1% |
Other values (16) | 444990 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 1894861 | |
Space Separator | 11016 | 0.6% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
A | 323499 | |
O | 244967 | |
T | 176925 | |
N | 141889 | 7.5% |
I | 133559 | 7.0% |
L | 93856 | 5.0% |
B | 92811 | 4.9% |
R | 87505 | 4.6% |
C | 87236 | 4.6% |
G | 78640 | 4.2% |
Other values (15) | 433974 |
Space Separator
Value | Count | Frequency (%) |
11016 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 1894861 | |
Common | 11016 | 0.6% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
A | 323499 | |
O | 244967 | |
T | 176925 | |
N | 141889 | 7.5% |
I | 133559 | 7.0% |
L | 93856 | 5.0% |
B | 92811 | 4.9% |
R | 87505 | 4.6% |
C | 87236 | 4.6% |
G | 78640 | 4.2% |
Other values (15) | 433974 |
Common
Value | Count | Frequency (%) |
11016 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1827196 | |
None | 78681 | 4.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
A | 323499 | |
O | 244967 | |
T | 176925 | |
N | 141889 | 7.8% |
I | 133559 | 7.3% |
L | 93856 | 5.1% |
B | 92811 | 5.1% |
R | 87505 | 4.8% |
C | 87236 | 4.8% |
G | 78640 | 4.3% |
Other values (14) | 366309 |
None
Value | Count | Frequency (%) |
Á | 72185 | |
Ñ | 6496 | 8.3% |
ESTU_SEMESTRECURSA
Categorical
Distinct | 12 |
---|---|
Distinct (%) | < 0.1% |
Missing | 4263 |
Missing (%) | 1.6% |
Memory size | 2.0 MiB |
Length
Max length | 8 |
---|---|
Median length | 2 |
Mean length | 2.186367659 |
Min length | 2 |
Characters and Unicode
Total characters | 560788 |
---|---|
Distinct characters | 15 |
Distinct categories | 3 |
Distinct scripts | 2 |
Distinct blocks | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 08 |
---|---|
2nd row | 09 |
3rd row | 08 |
4th row | 08 |
5th row | 09 |
Common Values
Value | Count | Frequency (%) |
09 | 86643 | 33.2% |
10 | 73503 | 28.2% |
08 | 58838 | 22.6% |
07 | 16668 | 6.4% |
11 | 9028 | 3.5% |
12 o más | 7967 | 3.1% |
06 | 2377 | 0.9% |
05 | 542 | 0.2% |
04 | 453 | 0.2% |
03 | 275 | 0.1% |
Other values (2) | 199 | 0.1% |
(Missing) | 4263 | 1.6% |
Words
Value | Count | Frequency (%) |
09 | 86643 | |
10 | 73503 | |
08 | 58838 | |
07 | 16668 | 6.1% |
11 | 9028 | 3.3% |
12 | 7967 | 2.9% |
o | 7967 | 2.9% |
más | 7967 | 2.9% |
06 | 2377 | 0.9% |
05 | 542 | 0.2% |
Other values (4) | 927 | 0.3% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 239498 | |
1 | 99579 | |
9 | 86643 | 15.5% |
8 | 58838 | 10.5% |
7 | 16668 | 3.0% |
(space) | 15934 | 2.8% |
2 | 8113 | 1.4% |
o | 7967 | 1.4% |
m | 7967 | 1.4% |
á | 7967 | 1.4% |
Other values (5) | 11614 | 2.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 512986 | |
Lowercase Letter | 31868 | 5.7% |
Space Separator | 15934 | 2.8% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 239498 | |
1 | 99579 | |
9 | 86643 | 16.9% |
8 | 58838 | 11.5% |
7 | 16668 | 3.2% |
2 | 8113 | 1.6% |
6 | 2377 | 0.5% |
5 | 542 | 0.1% |
4 | 453 | 0.1% |
3 | 275 | 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
o | 7967 | |
m | 7967 | |
á | 7967 | |
s | 7967 |
Space Separator
Value | Count | Frequency (%) |
15934 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 528920 | |
Latin | 31868 | 5.7% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 239498 | |
1 | 99579 | |
9 | 86643 | 16.4% |
8 | 58838 | 11.1% |
7 | 16668 | 3.2% |
(space) | 15934 | 3.0% |
2 | 8113 | 1.5% |
6 | 2377 | 0.4% |
5 | 542 | 0.1% |
4 | 453 | 0.1% |
Latin
Value | Count | Frequency (%) |
o | 7967 | |
m | 7967 | |
á | 7967 | |
s | 7967 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 552821 | |
None | 7967 | 1.4% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 239498 | |
1 | 99579 | |
9 | 86643 | 15.7% |
8 | 58838 | 10.6% |
7 | 16668 | 3.0% |
(space) | 15934 | 2.9% |
2 | 8113 | 1.5% |
o | 7967 | 1.4% |
m | 7967 | 1.4% |
s | 7967 | 1.4% |
Other values (4) | 3647 | 0.7% |
None
Value | Count | Frequency (%) |
á | 7967 |
FAMI_ESTRATOVIVIENDA
Categorical
Distinct | 7 |
---|---|
Distinct (%) | < 0.1% |
Missing | 16367 |
Missing (%) | 6.3% |
Memory size | 2.0 MiB |
Length
Max length | 11 |
---|---|
Median length | 9 |
Mean length | 9.013552165 |
Min length | 9 |
Characters and Unicode
Total characters | 2202813 |
---|---|
Distinct characters | 16 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Estrato 4 |
---|---|
2nd row | Estrato 1 |
3rd row | Estrato 3 |
4th row | Estrato 3 |
5th row | Estrato 2 |
Common Values
Value | Count | Frequency (%) |
Estrato 2 | 84244 | 32.3% |
Estrato 3 | 80130 | 30.7% |
Estrato 1 | 38549 | 14.8% |
Estrato 4 | 25379 | 9.7% |
Estrato 5 | 9400 | 3.6% |
Estrato 6 | 5031 | 1.9% |
Sin Estrato | 1656 | 0.6% |
(Missing) | 16367 | 6.3% |
Words
Value | Count | Frequency (%) |
estrato | 244389 | |
2 | 84244 | 17.2% |
3 | 80130 | 16.4% |
1 | 38549 | 7.9% |
4 | 25379 | 5.2% |
5 | 9400 | 1.9% |
6 | 5031 | 1.0% |
sin | 1656 | 0.3% |
Most occurring characters
Value | Count | Frequency (%) |
t | 488778 | |
E | 244389 | |
s | 244389 | |
r | 244389 | |
a | 244389 | |
o | 244389 | |
(space) | 244389 | 11.1% |
2 | 84244 | 3.8% |
3 | 80130 | 3.6% |
1 | 38549 | 1.7% |
Other values (6) | 44778 | 2.0% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 1469646 | |
Uppercase Letter | 246045 | 11.2% |
Space Separator | 244389 | 11.1% |
Decimal Number | 242733 | 11.0% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
t | 488778 | |
s | 244389 | |
r | 244389 | |
a | 244389 | |
o | 244389 | |
i | 1656 | 0.1% |
n | 1656 | 0.1% |
Decimal Number
Value | Count | Frequency (%) |
2 | 84244 | |
3 | 80130 | |
1 | 38549 | |
4 | 25379 | 10.5% |
5 | 9400 | 3.9% |
6 | 5031 | 2.1% |
Uppercase Letter
Value | Count | Frequency (%) |
E | 244389 | |
S | 1656 | 0.7% |
Space Separator
Value | Count | Frequency (%) |
244389 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 1715691 | |
Common | 487122 | 22.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
t | 488778 | |
E | 244389 | |
s | 244389 | |
r | 244389 | |
a | 244389 | |
o | 244389 | |
S | 1656 | 0.1% |
i | 1656 | 0.1% |
n | 1656 | 0.1% |
Common
Value | Count | Frequency (%) |
(space) | 244389 | 50.2% |
2 | 84244 | 17.3% |
3 | 80130 | 16.4% |
1 | 38549 | 7.9% |
4 | 25379 | 5.2% |
5 | 9400 | 1.9% |
6 | 5031 | 1.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 2202813 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
t | 488778 | |
E | 244389 | |
s | 244389 | |
r | 244389 | |
a | 244389 | |
o | 244389 | |
(space) | 244389 | 11.1% |
2 | 84244 | 3.8% |
3 | 80130 | 3.6% |
1 | 38549 | 1.7% |
Other values (6) | 44778 | 2.0% |
FAMI_TIENEINTERNET
Categorical
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 11822 |
Missing (%) | 4.5% |
Memory size | 2.0 MiB |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Characters and Unicode
Total characters | 497868 |
---|---|
Distinct characters | 4 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Si |
---|---|
2nd row | No |
3rd row | Si |
4th row | Si |
5th row | Si |
Common Values
Value | Count | Frequency (%) |
Si | 215344 | 82.6% |
No | 33590 | 12.9% |
(Missing) | 11822 | 4.5% |
Words
Value | Count | Frequency (%) |
si | 215344 | |
no | 33590 | 13.5% |
Most occurring characters
Value | Count | Frequency (%) |
S | 215344 | |
i | 215344 | |
N | 33590 | 6.7% |
o | 33590 | 6.7% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 248934 | |
Lowercase Letter | 248934 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
S | 215344 | |
N | 33590 | 13.5% |
Lowercase Letter
Value | Count | Frequency (%) |
i | 215344 | |
o | 33590 | 13.5% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 497868 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
S | 215344 | |
i | 215344 | |
N | 33590 | 6.7% |
o | 33590 | 6.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 497868 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
S | 215344 | |
i | 215344 | |
N | 33590 | 6.7% |
o | 33590 | 6.7% |
ESTU_HORASSEMANATRABAJA
Categorical
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 12916 |
Missing (%) | 5.0% |
Memory size | 2.0 MiB |
Length
Max length | 19 |
---|---|
Median length | 17 |
Mean length | 13.89548096 |
Min length | 1 |
Characters and Unicode
Total characters | 3443856 |
---|---|
Distinct characters | 18 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Menos de 10 horas |
---|---|
2nd row | Menos de 10 horas |
3rd row | Entre 21 y 30 horas |
4th row | Más de 30 horas |
5th row | Más de 30 horas |
Common Values
Value | Count | Frequency (%) |
Más de 30 horas | 94222 | 36.1% |
0 | 46185 | 17.7% |
Entre 11 y 20 horas | 42324 | 16.2% |
Entre 21 y 30 horas | 36666 | 14.1% |
Menos de 10 horas | 28443 | 10.9% |
(Missing) | 12916 | 5.0% |
Words
Value | Count | Frequency (%) |
horas | 201655 | |
30 | 130888 | |
de | 122665 | |
más | 94222 | |
entre | 78990 | 8.5% |
y | 78990 | 8.5% |
0 | 46185 | 5.0% |
11 | 42324 | 4.5% |
20 | 42324 | 4.5% |
21 | 36666 | 3.9% |
Other values (2) | 56886 | 6.1% |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 683955 | 19.9% |
s | 324320 | |
r | 280645 | 8.1% |
0 | 247840 | 7.2% |
o | 230098 | 6.7% |
e | 230098 | 6.7% |
h | 201655 | 5.9% |
a | 201655 | 5.9% |
1 | 149757 | 4.3% |
3 | 130888 | 3.8% |
Other values (8) | 762945 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 1950771 | |
Space Separator | 683955 | 19.9% |
Decimal Number | 607475 | 17.6% |
Uppercase Letter | 201655 | 5.9% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
s | 324320 | |
r | 280645 | |
o | 230098 | |
e | 230098 | |
h | 201655 | |
a | 201655 | |
d | 122665 | 6.3% |
n | 107433 | 5.5% |
á | 94222 | 4.8% |
t | 78990 | 4.0% |
Decimal Number
Value | Count | Frequency (%) |
0 | 247840 | |
1 | 149757 | |
3 | 130888 | |
2 | 78990 | 13.0% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 122665 | |
E | 78990 |
Space Separator
Value | Count | Frequency (%) |
683955 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 2152426 | |
Common | 1291430 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
s | 324320 | |
r | 280645 | |
o | 230098 | |
e | 230098 | |
h | 201655 | |
a | 201655 | |
M | 122665 | 5.7% |
d | 122665 | 5.7% |
n | 107433 | 5.0% |
á | 94222 | 4.4% |
Other values (3) | 236970 |
Common
Value | Count | Frequency (%) |
(space) | 683955 | 53.0% |
0 | 247840 | 19.2% |
1 | 149757 | 11.6% |
3 | 130888 | 10.1% |
2 | 78990 | 6.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 3349634 | |
None | 94222 | 2.7% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 683955 | 20.4% |
s | 324320 | |
r | 280645 | |
0 | 247840 | 7.4% |
o | 230098 | 6.9% |
e | 230098 | 6.9% |
h | 201655 | 6.0% |
a | 201655 | 6.0% |
1 | 149757 | 4.5% |
3 | 130888 | 3.9% |
Other values (7) | 668723 |
None
Value | Count | Frequency (%) |
á | 94222 |
ESTU_PRGM_ACADEMICO
Categorical
Distinct | 810 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.0 MiB |
Length
Max length | 109 |
---|---|
Median length | 86 |
Mean length | 22.00076317 |
Min length | 4 |
Characters and Unicode
Total characters | 5736831 |
---|---|
Distinct characters | 44 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 2 |
Unique
Unique | 28 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | ADMINISTRACION DE EMPRESAS |
---|---|
2nd row | ADMINISTRACION DE EMPRESAS |
3rd row | ADMINISTRACION DE EMPRESAS |
4th row | ADMINISTRACION DE EMPRESAS |
5th row | ADMINISTRACION DE EMPRESAS |
Common Values
Value | Count | Frequency (%) |
ADMINISTRACION DE EMPRESAS | 20413 | 7.8% |
DERECHO | 19422 | 7.4% |
CONTADURIA PUBLICA | 15770 | 6.0% |
PSICOLOGIA | 12497 | 4.8% |
INGENIERIA INDUSTRIAL | 10763 | 4.1% |
ADMINISTRACIÓN DE EMPRESAS | 9042 | 3.5% |
INGENIERIA CIVIL | 7634 | 2.9% |
MEDICINA | 6573 | 2.5% |
INGENIERIA DE SISTEMAS | 6440 | 2.5% |
PSICOLOGÍA | 6244 | 2.4% |
Other values (800) | 145958 | 56.0% |
Words
Value | Count | Frequency (%) |
de | 50848 | 7.6% |
ingenieria | 50214 | 7.5% |
en | 42026 | 6.3% |
administracion | 31121 | 4.7% |
empresas | 31032 | 4.7% |
licenciatura | 25912 | 3.9% |
y | 23660 | 3.6% |
derecho | 19556 | 2.9% |
publica | 18543 | 2.8% |
administración | 16712 | 2.5% |
Other values (502) | 355642 |
Most occurring characters
Value | Count | Frequency (%) |
I | 787362 | |
A | 662715 | |
E | 558044 | |
N | 509430 | |
(space) | 406426 | 7.1% |
C | 405942 | 7.1% |
R | 340779 | 5.9% |
O | 322656 | 5.6% |
S | 297683 | 5.2% |
D | 231577 | 4.0% |
Other values (34) | 1214217 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 5322896 | |
Space Separator | 406426 | 7.1% |
Other Punctuation | 3966 | 0.1% |
Dash Punctuation | 3445 | 0.1% |
Decimal Number | 49 | < 0.1% |
Other Symbol | 49 | < 0.1% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
I | 787362 | |
A | 662715 | |
E | 558044 | |
N | 509430 | |
C | 405942 | |
R | 340779 | 6.4% |
O | 322656 | 6.1% |
S | 297683 | 5.6% |
D | 231577 | 4.4% |
T | 222519 | 4.2% |
Other values (26) | 984189 |
Other Punctuation
Value | Count | Frequency (%) |
, | 3246 | |
¿ | 377 | 9.5% |
: | 288 | 7.3% |
. | 55 | 1.4% |
Space Separator
Value | Count | Frequency (%) |
406426 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 3445 |
Decimal Number
Value | Count | Frequency (%) |
3 | 49 |
Other Symbol
Value | Count | Frequency (%) |
° | 49 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 5322896 | |
Common | 413935 | 7.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
I | 787362 | |
A | 662715 | |
E | 558044 | |
N | 509430 | |
C | 405942 | |
R | 340779 | 6.4% |
O | 322656 | 6.1% |
S | 297683 | 5.6% |
D | 231577 | 4.4% |
T | 222519 | 4.2% |
Other values (26) | 984189 |
Common
Value | Count | Frequency (%) |
(space) | 406426 | 98.2% |
- | 3445 | 0.8% |
, | 3246 | 0.8% |
¿ | 377 | 0.1% |
: | 288 | 0.1% |
. | 55 | < 0.1% |
3 | 49 | < 0.1% |
° | 49 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5680333 | |
None | 56498 | 1.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
I | 787362 | |
A | 662715 | |
E | 558044 | |
N | 509430 | |
(space) | 406426 | 7.2% |
C | 405942 | 7.1% |
R | 340779 | 6.0% |
O | 322656 | 5.7% |
S | 297683 | 5.2% |
D | 231577 | 4.1% |
Other values (22) | 1157719 |
None
Value | Count | Frequency (%) |
Ó | 21766 | |
Í | 21306 | |
Ñ | 4935 | 8.7% |
Ú | 4484 | 7.9% |
Á | 1851 | 3.3% |
É | 1582 | 2.8% |
¿ | 377 | 0.7% |
Ü | 94 | 0.2% |
° | 49 | 0.1% |
À | 30 | 0.1% |
Other values (2) | 24 | < 0.1% |
ESTU_PRGM_DEPARTAMENTO
Categorical
Distinct | 28 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.0 MiB |
Length
Max length | 15 |
---|---|
Median length | 12 |
Mean length | 7.128219485 |
Min length | 4 |
Characters and Unicode
Total characters | 1858726 |
---|---|
Distinct characters | 25 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | VALLE |
---|---|
2nd row | VALLE |
3rd row | VALLE |
4th row | VALLE |
5th row | VALLE |
Common Values
Value | Count | Frequency (%) |
BOGOTÁ | 105234 | 40.4% |
ANTIOQUIA | 32073 | 12.3% |
VALLE | 17040 | 6.5% |
ATLANTICO | 15315 | 5.9% |
SANTANDER | 10968 | 4.2% |
NORTE SANTANDER | 8667 | 3.3% |
BOLIVAR | 7675 | 2.9% |
BOYACA | 5594 | 2.1% |
NARIÑO | 5127 | 2.0% |
CUNDINAMARCA | 5047 | 1.9% |
Other values (18) | 48016 | 18.4% |
Words
Value | Count | Frequency (%) |
bogotá | 105234 | |
antioquia | 32073 | 11.8% |
santander | 19635 | 7.2% |
valle | 17040 | 6.3% |
atlantico | 15315 | 5.6% |
norte | 8667 | 3.2% |
bolivar | 7675 | 2.8% |
boyaca | 5594 | 2.1% |
nariño | 5127 | 1.9% |
cundinamarca | 5047 | 1.9% |
Other values (19) | 49968 |
Most occurring characters
Value | Count | Frequency (%) |
O | 303953 | |
A | 262620 | |
T | 205126 | |
B | 122915 | 6.6% |
I | 119280 | 6.4% |
N | 117750 | 6.3% |
G | 110111 | 5.9% |
Á | 105234 | 5.7% |
L | 79815 | 4.3% |
R | 68998 | 3.7% |
Other values (15) | 362924 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 1848107 | |
Space Separator | 10619 | 0.6% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
O | 303953 | |
A | 262620 | |
T | 205126 | |
B | 122915 | 6.7% |
I | 119280 | 6.5% |
N | 117750 | 6.4% |
G | 110111 | 6.0% |
Á | 105234 | 5.7% |
L | 79815 | 4.3% |
R | 68998 | 3.7% |
Other values (14) | 352305 |
Space Separator
Value | Count | Frequency (%) |
10619 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 1848107 | |
Common | 10619 | 0.6% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
O | 303953 | |
A | 262620 | |
T | 205126 | |
B | 122915 | 6.7% |
I | 119280 | 6.5% |
N | 117750 | 6.4% |
G | 110111 | 6.0% |
Á | 105234 | 5.7% |
L | 79815 | 4.3% |
R | 68998 | 3.7% |
Other values (14) | 352305 |
Common
Value | Count | Frequency (%) |
10619 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1748365 | |
None | 110361 | 5.9% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
O | 303953 | |
A | 262620 | |
T | 205126 | |
B | 122915 | |
I | 119280 | 6.8% |
N | 117750 | 6.7% |
G | 110111 | 6.3% |
L | 79815 | 4.6% |
R | 68998 | 3.9% |
C | 59791 | 3.4% |
Other values (13) | 298006 |
None
Value | Count | Frequency (%) |
Á | 105234 | |
Ñ | 5127 | 4.6% |
MOD_RAZONA_CUANTITAT_PUNT
Real number (ℝ≥0)
HIGH CORRELATION
Distinct | 172 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 146.5563822 |
Minimum | 0 |
---|---|
Maximum | 300 |
Zeros | 55 |
Zeros (%) | < 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 96 |
Q1 | 123 |
median | 146 |
Q3 | 168 |
95-th percentile | 200 |
Maximum | 300 |
Range | 300 |
Interquartile range (IQR) | 45 |
Descriptive statistics
Standard deviation | 31.71409836 |
---|---|
Coefficient of variation (CV) | 0.2163952049 |
Kurtosis | 0.03595965347 |
Mean | 146.5563822 |
Median Absolute Deviation (MAD) | 23 |
Skewness | 0.2108137414 |
Sum | 38215456 |
Variance | 1005.784035 |
Monotonicity | Not monotonic |
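Several of the descriptive statistics above are derived from others in the same block. As a rough check, the sketch below recomputes the coefficient of variation, variance, interquartile range, and range for MOD_RAZONA_CUANTITAT_PUNT from the reported values. The variable names are illustrative, and the inputs are the rounded figures printed in the report rather than the raw data.

```python
# Reported values for MOD_RAZONA_CUANTITAT_PUNT.
std = 31.71409836
mean = 146.5563822
q1, q3 = 123, 168
minimum, maximum = 0, 300

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

# Interquartile range and full range.
iqr = q3 - q1
value_range = maximum - minimum

print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.6f}")
print(f"IQR: {iqr}, Range: {value_range}")
```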
Value | Count | Frequency (%) |
145 | 3164 | 1.2% |
147 | 3118 | 1.2% |
152 | 3117 | 1.2% |
144 | 3100 | 1.2% |
143 | 3094 | 1.2% |
146 | 3092 | 1.2% |
135 | 3086 | 1.2% |
154 | 3086 | 1.2% |
155 | 3086 | 1.2% |
142 | 3082 | 1.2% |
Other values (162) | 229731 |
Value | Count | Frequency (%) |
0 | 55 | |
66 | 5 | < 0.1% |
67 | 30 | < 0.1% |
68 | 26 | < 0.1% |
69 | 35 | < 0.1% |
70 | 54 | |
71 | 65 | |
72 | 74 | |
73 | 103 | |
74 | 118 |
Value | Count | Frequency (%) |
300 | 264 | |
235 | 27 | < 0.1% |
234 | 76 | < 0.1% |
233 | 100 | < 0.1% |
232 | 72 | < 0.1% |
231 | 73 | < 0.1% |
230 | 79 | < 0.1% |
229 | 64 | < 0.1% |
228 | 186 | |
227 | 233 |
MOD_LECTURA_CRITICA_PUNT
Real number (ℝ≥0)
HIGH CORRELATION
Distinct | 182 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 149.0340817 |
Minimum | 0 |
---|---|
Maximum | 300 |
Zeros | 52 |
Zeros (%) | < 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 98 |
Q1 | 126 |
median | 149 |
Q3 | 172 |
95-th percentile | 200 |
Maximum | 300 |
Range | 300 |
Interquartile range (IQR) | 46 |
Descriptive statistics
Standard deviation | 31.37696168 |
---|---|
Coefficient of variation (CV) | 0.2105354784 |
Kurtosis | -0.170868429 |
Mean | 149.0340817 |
Median Absolute Deviation (MAD) | 23 |
Skewness | 0.0608466345 |
Sum | 38861531 |
Variance | 984.5137241 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
153 | 3089 | 1.2% |
156 | 3080 | 1.2% |
147 | 3078 | 1.2% |
155 | 3074 | 1.2% |
146 | 3067 | 1.2% |
145 | 3050 | 1.2% |
158 | 3048 | 1.2% |
150 | 3042 | 1.2% |
144 | 3036 | 1.2% |
149 | 3022 | 1.2% |
Other values (172) | 230170 |
Value | Count | Frequency (%) |
0 | 52 | |
57 | 1 | < 0.1% |
58 | 1 | < 0.1% |
59 | 11 | < 0.1% |
60 | 10 | < 0.1% |
61 | 9 | < 0.1% |
62 | 21 | |
63 | 32 | |
64 | 24 | |
65 | 46 |
Value | Count | Frequency (%) |
300 | 163 | |
237 | 12 | < 0.1% |
235 | 17 | < 0.1% |
234 | 6 | < 0.1% |
233 | 3 | < 0.1% |
232 | 90 | |
231 | 41 | < 0.1% |
230 | 30 | < 0.1% |
229 | 107 | |
228 | 67 |
MOD_COMPETEN_CIUDADA_PUNT
Real number (ℝ≥0)
HIGH CORRELATION
Distinct | 180 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 142.4883684 |
Minimum | 0 |
---|---|
Maximum | 300 |
Zeros | 210 |
Zeros (%) | 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 89 |
Q1 | 118 |
median | 143 |
Q3 | 167 |
95-th percentile | 197 |
Maximum | 300 |
Range | 300 |
Interquartile range (IQR) | 49 |
Descriptive statistics
Standard deviation | 33.34157143 |
---|---|
Coefficient of variation (CV) | 0.2339950397 |
Kurtosis | -0.2921520717 |
Mean | 142.4883684 |
Median Absolute Deviation (MAD) | 25 |
Skewness | 0.009068225782 |
Sum | 37154697 |
Variance | 1111.660385 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
145 | 2936 | 1.1% |
150 | 2935 | 1.1% |
146 | 2884 | 1.1% |
144 | 2882 | 1.1% |
152 | 2834 | 1.1% |
158 | 2826 | 1.1% |
138 | 2825 | 1.1% |
154 | 2813 | 1.1% |
155 | 2811 | 1.1% |
143 | 2805 | 1.1% |
Other values (170) | 232205 |
Value | Count | Frequency (%) |
0 | 210 | |
60 | 11 | < 0.1% |
61 | 57 | < 0.1% |
62 | 72 | < 0.1% |
63 | 75 | < 0.1% |
64 | 61 | < 0.1% |
65 | 92 | |
66 | 107 | |
67 | 163 | |
68 | 174 |
Value | Count | Frequency (%) |
300 | 72 | |
237 | 18 | < 0.1% |
236 | 30 | |
235 | 23 | < 0.1% |
234 | 41 | |
233 | 39 | |
232 | 67 | |
231 | 62 | |
230 | 59 | |
229 | 59 |
MOD_INGLES_PUNT
Real number (ℝ≥0)
HIGH CORRELATION
Distinct | 163 |
---|---|
Distinct (%) | 0.1% |
Missing | 71 |
Missing (%) | < 0.1% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 151.990805 |
Minimum | 0 |
---|---|
Maximum | 300 |
Zeros | 688 |
Zeros (%) | 0.3% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 104 |
Q1 | 131 |
median | 149 |
Q3 | 173 |
95-th percentile | 207 |
Maximum | 300 |
Range | 300 |
Interquartile range (IQR) | 42 |
Descriptive statistics
Standard deviation | 32.30237748 |
---|---|
Coefficient of variation (CV) | 0.2125284979 |
Kurtosis | 1.633051373 |
Mean | 151.990805 |
Median Absolute Deviation (MAD) | 20 |
Skewness | 0.1921270345 |
Sum | 39621723 |
Variance | 1043.443591 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
145 | 3927 | 1.5% |
146 | 3883 | 1.5% |
143 | 3854 | 1.5% |
147 | 3853 | 1.5% |
142 | 3817 | 1.5% |
150 | 3773 | 1.4% |
141 | 3771 | 1.4% |
144 | 3741 | 1.4% |
140 | 3733 | 1.4% |
149 | 3704 | 1.4% |
Other values (153) | 222629 |
Value | Count | Frequency (%) |
0 | 688 | |
72 | 6 | < 0.1% |
73 | 32 | < 0.1% |
74 | 103 | < 0.1% |
75 | 99 | < 0.1% |
76 | 134 | 0.1% |
77 | 171 | 0.1% |
78 | 220 | 0.1% |
79 | 227 | 0.1% |
80 | 207 | 0.1% |
Value | Count | Frequency (%) |
300 | 793 | |
233 | 18 | < 0.1% |
231 | 156 | 0.1% |
230 | 36 | < 0.1% |
229 | 152 | 0.1% |
228 | 224 | 0.1% |
227 | 115 | < 0.1% |
226 | 154 | 0.1% |
225 | 226 | 0.1% |
224 | 174 | 0.1% |
MOD_COMUNI_ESCRITA_PUNT
Real number (ℝ≥0)
Distinct | 180 |
---|---|
Distinct (%) | 0.1% |
Missing | 310 |
Missing (%) | 0.1% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 143.7554963 |
Minimum | 0 |
---|---|
Maximum | 300 |
Zeros | 8364 |
Zeros (%) | 3.2% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 98 |
Q1 | 132 |
median | 143 |
Q3 | 167 |
95-th percentile | 194 |
Maximum | 300 |
Range | 300 |
Interquartile range (IQR) | 35 |
Descriptive statistics
Standard deviation | 37.54039465 |
---|---|
Coefficient of variation (CV) | 0.2611405866 |
Kurtosis | 5.526207455 |
Mean | 143.7554963 |
Median Absolute Deviation (MAD) | 17 |
Skewness | -1.342473619 |
Sum | 37440544 |
Variance | 1409.28123 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 8364 | 3.2% |
140 | 7451 | 2.9% |
139 | 7292 | 2.8% |
138 | 7037 | 2.7% |
141 | 6947 | 2.7% |
137 | 6719 | 2.6% |
142 | 6564 | 2.5% |
136 | 6225 | 2.4% |
143 | 6072 | 2.3% |
135 | 5978 | 2.3% |
Other values (170) | 191797 |
Value | Count | Frequency (%) |
0 | 8364 | |
61 | 1 | < 0.1% |
63 | 3 | < 0.1% |
64 | 7 | < 0.1% |
71 | 4 | < 0.1% |
72 | 1 | < 0.1% |
73 | 7 | < 0.1% |
74 | 14 | < 0.1% |
75 | 9 | < 0.1% |
76 | 26 | < 0.1% |
Value | Count | Frequency (%) |
300 | 808 | |
245 | 2 | < 0.1% |
244 | 1 | < 0.1% |
243 | 1 | < 0.1% |
242 | 2 | < 0.1% |
241 | 1 | < 0.1% |
240 | 3 | < 0.1% |
239 | 2 | < 0.1% |
238 | 1 | < 0.1% |
237 | 6 | < 0.1% |
Distinct | 216 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 146.7303648 |
Minimum | 0 |
---|---|
Maximum | 256 |
Zeros | 3 |
Zeros (%) | < 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 2.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 110 |
Q1 | 130 |
median | 145 |
Q3 | 162 |
95-th percentile | 187 |
Maximum | 256 |
Range | 256 |
Interquartile range (IQR) | 32 |
Descriptive statistics
Standard deviation | 23.63107117 |
---|---|
Coefficient of variation (CV) | 0.1610509945 |
Kurtosis | 0.06667746627 |
Mean | 146.7303648 |
Median Absolute Deviation (MAD) | 16 |
Skewness | 0.1693183991 |
Sum | 38260823 |
Variance | 558.4275248 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
139 | 4493 | 1.7% |
141 | 4470 | 1.7% |
144 | 4441 | 1.7% |
143 | 4424 | 1.7% |
140 | 4417 | 1.7% |
145 | 4406 | 1.7% |
137 | 4396 | 1.7% |
142 | 4364 | 1.7% |
147 | 4360 | 1.7% |
146 | 4357 | 1.7% |
Other values (206) | 216628 |
Value | Count | Frequency (%) |
0 | 3 | |
18 | 1 | < 0.1% |
19 | 1 | < 0.1% |
20 | 1 | < 0.1% |
23 | 2 | |
24 | 2 | |
26 | 1 | < 0.1% |
28 | 2 | |
31 | 1 | < 0.1% |
32 | 1 | < 0.1% |
Value | Count | Frequency (%) |
256 | 1 | < 0.1% |
254 | 1 | < 0.1% |
253 | 1 | < 0.1% |
252 | 1 | < 0.1% |
250 | 2 | |
249 | 2 | |
248 | 2 | |
247 | 2 | |
246 | 3 | |
245 | 1 | < 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
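As an illustration of the rank-based calculation described above, the following is a minimal sketch using only the Python standard library. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical and are not part of the report's tooling; the toy data (a cubic relationship) is an assumption chosen to show a monotonic but nonlinear case.

```python
from statistics import mean, pstdev

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data (assumed for illustration): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)  # ranks agree exactly, so rho is 1 up to rounding
r = pearson_r(x, y)       # strictly below 1 because the relation is not linear
print(rho, r)
```

Because the ranks of x and y coincide, ρ reaches its maximum while r does not, which is the distinction the paragraph draws.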
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
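The calculation and the invariance property can be checked with a short sketch. The function name `pearson_r` is hypothetical, the shift and scale constants are arbitrary assumptions, and the two lists reuse the MOD_RAZONA_CUANTITAT_PUNT and MOD_LECTURA_CRITICA_PUNT values of rows 0 to 4 shown under "First rows", so the result says nothing about the full dataset.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

# Rows 0-4 of the "First rows" table (five observations only).
razona = [110, 105, 130, 135, 120]   # MOD_RAZONA_CUANTITAT_PUNT
lectura = [81, 111, 137, 141, 110]   # MOD_LECTURA_CRITICA_PUNT

r = pearson_r(razona, lectura)

# Shift and rescale each variable separately (arbitrary positive scale factors).
razona_t = [2.5 * v + 10 for v in razona]
lectura_t = [0.1 * v - 3 for v in lectura]
r_transformed = pearson_r(razona_t, lectura_t)

print(r, r_transformed)  # the two values agree up to floating-point rounding
```

The sketch uses positive scale factors; a negative factor on one variable would flip the sign of r while leaving its magnitude unchanged.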
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
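The pair-counting definition above can be sketched directly. The function name `kendall_tau_a` is hypothetical; it implements the plain formula from the paragraph (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations such as `scipy.stats.kendalltau` default to the tau-b variant, which adjusts the denominator for ties, so results can differ when ties are present. The data again reuses rows 0 to 4 of the "First rows" table.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Rows 0-4 of the "First rows" table (five observations only).
razona = [110, 105, 130, 135, 120]   # MOD_RAZONA_CUANTITAT_PUNT
lectura = [81, 111, 137, 141, 110]   # MOD_LECTURA_CRITICA_PUNT

tau = kendall_tau_a(razona, lectura)
print(tau)  # 8 concordant and 2 discordant pairs out of 10 -> 0.6
```

The quadratic loop over all pairs is fine for a handful of rows but would be slow on the full 260756 observations; it is meant only to make the definition concrete.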
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
First rows
 | ESTU_GENERO | ESTU_DEPTO_RESIDE | ESTU_SEMESTRECURSA | FAMI_ESTRATOVIVIENDA | FAMI_TIENEINTERNET | ESTU_HORASSEMANATRABAJA | ESTU_PRGM_ACADEMICO | ESTU_PRGM_DEPARTAMENTO | MOD_RAZONA_CUANTITAT_PUNT | MOD_LECTURA_CRITICA_PUNT | MOD_COMPETEN_CIUDADA_PUNT | MOD_INGLES_PUNT | MOD_COMUNI_ESCRITA_PUNT | PUNT_GLOBAL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | M | VALLE | 08 | Estrato 4 | Si | Menos de 10 horas | ADMINISTRACION DE EMPRESAS | VALLE | 110 | 81 | 100 | 147.0 | 126.0 | 113 |
1 | F | VALLE | 09 | Estrato 1 | No | Menos de 10 horas | ADMINISTRACION DE EMPRESAS | VALLE | 105 | 111 | 100 | 111.0 | 106.0 | 107 |
2 | F | VALLE | 08 | Estrato 3 | Si | Entre 21 y 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 130 | 137 | 171 | 140.0 | 143.0 | 144 |
3 | F | VALLE | 08 | Estrato 3 | Si | Más de 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 135 | 141 | 135 | 143.0 | 175.0 | 146 |
4 | F | VALLE | 09 | Estrato 2 | Si | Más de 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 120 | 110 | 154 | 122.0 | 145.0 | 130 |
5 | F | VALLE | 09 | Estrato 3 | Si | 0 | ADMINISTRACION DE EMPRESAS | VALLE | 142 | 153 | 147 | 162.0 | 0.0 | 121 |
6 | F | VALLE | 08 | Estrato 2 | Si | Menos de 10 horas | ADMINISTRACION DE EMPRESAS | VALLE | 147 | 129 | 139 | 134.0 | 143.0 | 138 |
7 | F | VALLE | 08 | Estrato 1 | Si | Entre 11 y 20 horas | ADMINISTRACION DE EMPRESAS | VALLE | 128 | 139 | 131 | 97.0 | 137.0 | 126 |
8 | F | VALLE | 09 | Estrato 3 | Si | Más de 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 198 | 180 | 119 | 141.0 | 129.0 | 153 |
9 | M | VALLE | 09 | Estrato 4 | Si | Más de 30 horas | ADMINISTRACION DE EMPRESAS | VALLE | 157 | 151 | 131 | 124.0 | 0.0 | 113 |
Last rows
 | ESTU_GENERO | ESTU_DEPTO_RESIDE | ESTU_SEMESTRECURSA | FAMI_ESTRATOVIVIENDA | FAMI_TIENEINTERNET | ESTU_HORASSEMANATRABAJA | ESTU_PRGM_ACADEMICO | ESTU_PRGM_DEPARTAMENTO | MOD_RAZONA_CUANTITAT_PUNT | MOD_LECTURA_CRITICA_PUNT | MOD_COMPETEN_CIUDADA_PUNT | MOD_INGLES_PUNT | MOD_COMUNI_ESCRITA_PUNT | PUNT_GLOBAL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
260746 | F | CESAR | 10 | Estrato 4 | Si | Menos de 10 horas | DERECHO | CESAR | 100 | 113 | 134 | 94.0 | 116.0 | 111 |
260747 | M | BOGOTÁ | 07 | Estrato 4 | Si | Entre 11 y 20 horas | INGENIERIA DE SISTEMAS Y COMPUTACION | BOGOTÁ | 226 | 220 | 177 | 300.0 | 195.0 | 224 |
260748 | F | ANTIOQUIA | 07 | Estrato 5 | No | Más de 30 horas | BIOLOGIA | ANTIOQUIA | 137 | 155 | 144 | 106.0 | 102.0 | 129 |
260749 | F | CESAR | 12 o más | Estrato 3 | Si | Entre 11 y 20 horas | DERECHO | CESAR | 89 | 135 | 147 | 138.0 | 144.0 | 131 |
260750 | F | BOGOTÁ | 10 | Estrato 4 | Si | Menos de 10 horas | GEOCIENCIAS | BOGOTÁ | 196 | 193 | 159 | 212.0 | 138.0 | 180 |
260751 | M | CALDAS | 11 | Estrato 3 | Si | Entre 11 y 20 horas | ADMINISTRACION DE SISTEMAS INFORMATICOS | CALDAS | 116 | 132 | 124 | 144.0 | 140.0 | 131 |
260752 | M | CALDAS | 12 o más | Estrato 2 | Si | Más de 30 horas | ADMINISTRACION DE SISTEMAS INFORMATICOS | CALDAS | 159 | 192 | 169 | 196.0 | 143.0 | 172 |
260753 | F | ATLANTICO | 08 | Estrato 2 | Si | Más de 30 horas | CONTADURIA PUBLICA | ATLANTICO | 184 | 142 | 83 | 120.0 | 140.0 | 134 |
260754 | F | BOGOTÁ | 08 | Estrato 1 | Si | 0 | GEOCIENCIAS | BOGOTÁ | 210 | 167 | 160 | 198.0 | 138.0 | 175 |
260755 | F | BOGOTÁ | 08 | Estrato 2 | Si | Entre 21 y 30 horas | LICENCIATURA EN PEDAGOGIA INFANTIL | BOGOTÁ | 158 | 166 | 148 | 140.0 | 194.0 | 161 |