Commit 529b8626 authored by Nicolas Mantilla Molina

Add design final project

parent 52e0e9fc
%% Cell type:markdown id: tags:
# Cervical Cancer in Colombia: Statistical Analysis
%% Cell type:code id: tags:
``` python
import PyPDF2
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```
%% Cell type:code id: tags:
``` python
# Open the PDF in a context manager so the file is closed even if extraction fails.
# Page indices are zero-based: pages[5] is the sixth page of the document.
with open('Colombia_profile.pdf', 'rb') as pdf:
    pdfReader = PyPDF2.PdfReader(pdf)
    bucaramanga = pdfReader.pages[5].extract_text()
    cali = pdfReader.pages[13].extract_text()
    manizales = pdfReader.pages[21].extract_text()
    pasto = pdfReader.pages[29].extract_text()
    COL = pdfReader.pages[37].extract_text()
```
%% Cell type:code id: tags:
``` python
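# Pick the row of interest from each extracted page and split it into tokens.
# The row sits one line lower on the Manizales page (index 31 instead of 30).
col_data = COL.split('\n')[30].split(" ")
buc_data = bucaramanga.split('\n')[30].split(" ")
cal_data = cali.split('\n')[30].split(" ")
man_data = manizales.split('\n')[31].split(" ")
pas_data = pasto.split('\n')[30].split(" ")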
```
%% Cell type:code id: tags:
``` python
def parse_row(tokens):
    # Drop the first 5 characters of token 0 (leaving the total), keep token 1
    # (unknown age), then take tokens 7-18: the 12 age groups from 20-24 to >75.
    return [float(t) for t in [tokens[0][5:], tokens[1], *tokens[7:19]]]

col_data = parse_row(col_data)
buc_data = parse_row(buc_data)
cal_data = parse_row(cal_data)
man_data = parse_row(man_data)
pas_data = parse_row(pas_data)
```
%% Cell type:code id: tags:
``` python
df = pd.DataFrame({'Bucaramanga': buc_data, 'Cali': cal_data, 'Manizales': man_data, 'Pasto': pas_data,
                   'Colombia': col_data},
                  index=['Total', 'Desconocido', '20-24', '25-29', '30-34', '35-39', '40-44', '45-49',
                         '50-54', '55-59', '60-64', '65-69', '70-74', '>75'])
```
%% Cell type:markdown id: tags:
**Incidence per 100,000 women**
%% Cell type:code id: tags:
``` python
df
```
%% Output
Bucaramanga Cali Manizales Pasto Colombia
Total 482.0 1181.0 225.0 271.0 2159.0
Desconocido 4.0 30.0 0.0 0.0 34.0
20-24 7.2 8.1 11.0 8.2 8.2
25-29 21.8 20.3 31.2 20.5 21.7
30-34 25.3 29.9 26.4 34.9 28.9
35-39 26.2 33.0 35.4 39.6 32.1
40-44 35.2 49.1 35.1 71.2 45.9
45-49 35.3 49.4 32.7 69.6 45.7
50-54 42.0 48.3 62.3 71.9 50.5
55-59 60.7 50.3 58.9 77.7 56.6
60-64 52.3 75.5 64.3 110.1 71.4
65-69 69.9 61.4 67.0 101.9 68.1
70-74 56.7 66.2 52.9 94.8 64.6
>75 18.1 21.9 22.7 27.2 21.5
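%% Cell type:markdown id: tags:
The first two rows of the table look like case counts rather than rates, which can be cross-checked against the national column. The sketch below is a minimal sanity check, assuming the `Colombia` column is meant to aggregate the four city registries (an assumption; the notebook does not state this). The values are copied from the table above, and `totals`, `city_sum`, and `matches` are hypothetical names.
%% Cell type:code id: tags:
```python
import pandas as pd

# Values copied from the 'Total' and 'Desconocido' rows of the table above.
totals = pd.DataFrame(
    {
        'Bucaramanga': [482.0, 4.0],
        'Cali': [1181.0, 30.0],
        'Manizales': [225.0, 0.0],
        'Pasto': [271.0, 0.0],
        'Colombia': [2159.0, 34.0],
    },
    index=['Total', 'Desconocido'],
)

# Sum the four city columns and compare the result with the national column.
city_sum = totals.drop(columns='Colombia').sum(axis=1)
matches = city_sum.eq(totals['Colombia'])
print(matches)
```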
%% Cell type:code id: tags:
``` python
# Long format: one row per (city, age group). The [2:] slices skip the
# 'Total' and 'Desconocido' entries so only the age-group values remain.
factors = pd.DataFrame({'Ciudad': (['Bucaramanga']*12 + ['Cali']*12 + ['Manizales']*12 + ['Pasto']*12 + ['Colombia']*12),
                        'Edades': ('20-24', '25-29', '30-34', '35-39', '40-44', '45-49', '50-54', '55-59',
                                   '60-64', '65-69', '70-74', '>75')*5,
                        'Incidencia': buc_data[2:] + cal_data[2:] + man_data[2:] + pas_data[2:] + col_data[2:]})
factors
```
%% Output
Ciudad Edades Incidencia
0 Bucaramanga 20-24 7.2
1 Bucaramanga 25-29 21.8
2 Bucaramanga 30-34 25.3
3 Bucaramanga 35-39 26.2
4 Bucaramanga 40-44 35.2
5 Bucaramanga 45-49 35.3
6 Bucaramanga 50-54 42.0
7 Bucaramanga 55-59 60.7
8 Bucaramanga 60-64 52.3
9 Bucaramanga 65-69 69.9
10 Bucaramanga 70-74 56.7
11 Bucaramanga >75 18.1
12 Cali 20-24 8.1
13 Cali 25-29 20.3
14 Cali 30-34 29.9
15 Cali 35-39 33.0
16 Cali 40-44 49.1
17 Cali 45-49 49.4
18 Cali 50-54 48.3
19 Cali 55-59 50.3
20 Cali 60-64 75.5
21 Cali 65-69 61.4
22 Cali 70-74 66.2
23 Cali >75 21.9
24 Manizales 20-24 11.0
25 Manizales 25-29 31.2
26 Manizales 30-34 26.4
27 Manizales 35-39 35.4
28 Manizales 40-44 35.1
29 Manizales 45-49 32.7
30 Manizales 50-54 62.3
31 Manizales 55-59 58.9
32 Manizales 60-64 64.3
33 Manizales 65-69 67.0
34 Manizales 70-74 52.9
35 Manizales >75 22.7
36 Pasto 20-24 8.2
37 Pasto 25-29 20.5
38 Pasto 30-34 34.9
39 Pasto 35-39 39.6
40 Pasto 40-44 71.2
41 Pasto 45-49 69.6
42 Pasto 50-54 71.9
43 Pasto 55-59 77.7
44 Pasto 60-64 110.1
45 Pasto 65-69 101.9
46 Pasto 70-74 94.8
47 Pasto >75 27.2
48 Colombia 20-24 8.2
49 Colombia 25-29 21.7
50 Colombia 30-34 28.9
51 Colombia 35-39 32.1
52 Colombia 40-44 45.9
53 Colombia 45-49 45.7
54 Colombia 50-54 50.5
55 Colombia 55-59 56.6
56 Colombia 60-64 71.4
57 Colombia 65-69 68.1
58 Colombia 70-74 64.6
59 Colombia >75 21.5
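%% Cell type:markdown id: tags:
The long-format table above could also be derived from a wide table instead of concatenating lists by hand. The following is a minimal sketch using `DataFrame.melt`, assuming a wide table shaped like `df` with the two count rows already dropped. Only two age groups are reproduced to keep it short, and `wide` and `long_format` are hypothetical names.
%% Cell type:code id: tags:
```python
import pandas as pd

# Two age-group rows copied from the wide table, one column per region.
wide = pd.DataFrame(
    {
        'Bucaramanga': [7.2, 21.8],
        'Cali': [8.1, 20.3],
        'Manizales': [11.0, 31.2],
        'Pasto': [8.2, 20.5],
        'Colombia': [8.2, 21.7],
    },
    index=['20-24', '25-29'],
)

# Turn the index into an 'Edades' column, then unpivot the region columns
# into (Ciudad, Incidencia) pairs and reorder to match the table above.
long_format = (
    wide.rename_axis('Edades')
    .reset_index()
    .melt(id_vars='Edades', var_name='Ciudad', value_name='Incidencia')
    [['Ciudad', 'Edades', 'Incidencia']]
)
print(long_format)
```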
%% Cell type:code id: tags:
``` python
fig, axs = plt.subplots(2, 1, figsize=(20, 20))
# Top panel: spread of the age-specific incidence values within each region.
axs[0].set_title('Región vs Incidencia', fontsize=15)
sns.boxplot(x="Incidencia", y="Ciudad", data=factors, ax=axs[0])
sns.swarmplot(x="Incidencia", y="Ciudad", data=factors, color='black', alpha=0.5, ax=axs[0])
# Bottom panel: spread across regions within each age group.
axs[1].set_title('Edad vs Incidencia', fontsize=15)
sns.boxplot(x="Edades", y="Incidencia", data=factors, ax=axs[1])
sns.swarmplot(x="Edades", y="Incidencia", data=factors, color='black', alpha=0.5, ax=axs[1])
```
%% Output
<Axes: title={'center': 'Edad vs Incidencia'}, xlabel='Edades', ylabel='Incidencia'>
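%% Cell type:markdown id: tags:
The box plots compare the spread of age-specific rates between regions, and the same comparison can be read numerically. The following is a minimal sketch using `groupby`, with values copied from the `factors` table above for two cities only. `subset` and `summary` are hypothetical names.
%% Cell type:code id: tags:
```python
import pandas as pd

ages = ['20-24', '25-29', '30-34', '35-39', '40-44', '45-49',
        '50-54', '55-59', '60-64', '65-69', '70-74', '>75']
# Age-specific values copied from the factors table above.
buc = [7.2, 21.8, 25.3, 26.2, 35.2, 35.3, 42.0, 60.7, 52.3, 69.9, 56.7, 18.1]
pas = [8.2, 20.5, 34.9, 39.6, 71.2, 69.6, 71.9, 77.7, 110.1, 101.9, 94.8, 27.2]

subset = pd.DataFrame({
    'Ciudad': ['Bucaramanga'] * 12 + ['Pasto'] * 12,
    'Edades': ages * 2,
    'Incidencia': buc + pas,
})

# Median and maximum of the age-specific values, per city.
summary = subset.groupby('Ciudad')['Incidencia'].agg(['median', 'max'])
print(summary)
```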
%% Cell type:code id: tags:
``` python
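# displot is a figure-level function and creates its own figure, so a preceding
# plt.figure() call only leaves an empty figure behind; size the facets instead.
sns.displot(data=factors, bins=10, x="Incidencia", col="Ciudad", col_wrap=2, kde=True, hue="Ciudad",
            palette="dark", legend=False, height=4, aspect=1.25)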
```
%% Output
<seaborn.axisgrid.FacetGrid at 0x1aab9853640>