Solutions for exercise set 1¶
In [2]:
import pandas as pd
titanic = pd.read_csv("titanic.csv")
titanic.head()
Out[2]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Number of survivors and deaths¶
In [3]:
# Make the column easier to read than raw 0/1 values (in this dataset, 1 = survived and 0 = died)
titanic["Did survive"] = ["yes" if x == 1 else "no" for x in titanic["Survived"]]
titanic["Did survive"].value_counts()
Out[3]:
Did survive
no     549
yes    342
Name: count, dtype: int64
In [4]:
survived = titanic.groupby("Did survive")["PassengerId"].count()
survived
Out[4]:
Did survive
no     549
yes    342
Name: PassengerId, dtype: int64
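As a possible alternative to the list comprehension, the 0/1 codes can be relabelled with `Series.map` and a dictionary. The sketch below uses a small made-up DataFrame named `toy` (an assumption for illustration, not the Titanic file), so it runs on its own.

```python
import pandas as pd

# Hypothetical toy data standing in for the "Survived" column (1 = survived, 0 = died)
toy = pd.DataFrame({"Survived": [0, 1, 1, 0, 0]})

# Series.map with a dict replaces each code with its label
toy["Did survive"] = toy["Survived"].map({0: "no", 1: "yes"})

# Count how many rows carry each label
counts = toy["Did survive"].value_counts()
print(counts)
```

Writing the mapping as a dictionary keeps the meaning of each code visible in one place, which makes an inverted label easier to spot.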
- Compute the percentage of survivors
Since the values are 0 (died) or 1 (survived), the mean of the column is the proportion of survivors. Multiplying the mean by 100 therefore gives the percentage of survivors.
In [ ]:
avg = 100 * titanic["Survived"].mean()
avg
Out[ ]:
np.float64(38.38383838383838)
In [6]:
print(f"{avg:.2f}%")
38.38%
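To see why the mean of a 0/1 column is a proportion, here is a minimal sketch on a made-up Series named `flags` (hypothetical data, not the Titanic column): the mean equals the number of ones divided by the number of rows.

```python
import pandas as pd

# Hypothetical toy column: 1 = survived, 0 = died
flags = pd.Series([1, 0, 0, 1, 0, 0, 0, 1])

# The mean of a 0/1 column is (number of ones) / (number of rows)
share_via_mean = flags.mean()
share_via_count = (flags == 1).sum() / len(flags)

print(f"{100 * share_via_mean:.2f}%")
```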
Compute the percentage of survivors per class.¶
In [7]:
# The mean of "Survived" (1 = survived) within each class is the survival rate
df = 100 * titanic[["Pclass", "Survived"]].groupby("Pclass").mean()
df
Out[7]:
Survived | |
---|---|
Pclass | |
1 | 62.962963 |
2 | 47.282609 |
3 | 24.236253 |
In [8]:
df.style.format({"Survived": "{:.2f}%"})
Out[8]:
Survived | |
---|---|
Pclass | |
1 | 62.96% |
2 | 47.28% |
3 | 24.24% |
In [9]:
def get_percentage_format(val):
    # Red below 50 %, green otherwise
    if val < 50:
        return "color: red"
    else:
        return "color: green"

# Styler.map replaces the deprecated Styler.applymap
df.style.map(get_percentage_format)
Out[9]:
Survived | |
---|---|
Pclass | |
1 | 62.962963 |
2 | 47.282609 |
3 | 24.236253 |
In [10]:
import seaborn as sns
cm = sns.light_palette("green", as_cmap=True)
df.style.format({"Survived": "{:.2f} %"}).background_gradient(cmap=cm)
Out[10]:
Survived | |
---|---|
Pclass | |
1 | 62.96 % |
2 | 47.28 % |
3 | 24.24 % |
Compute the percentage of survivors per sex.¶
In [11]:
100 * titanic[["Sex", "Survived"]].groupby("Sex").mean()
Out[11]:
Survived | |
---|---|
Sex | |
female | 74.203822 |
male | 18.890815 |
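The same per-group rate can also be obtained as a Series by selecting the column after `groupby`. This is only a sketch on a made-up DataFrame named `toy` (hypothetical values), assuming the usual coding 1 = survived.

```python
import pandas as pd

# Hypothetical toy data: two women (one survived), three men (one survived)
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 0, 0, 0, 1],
})

# Selecting the column after groupby returns a Series indexed by group
rates = 100 * toy.groupby("Sex")["Survived"].mean()
print(rates.round(2))
```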
Count the passengers whose age is above the mean.¶
In [12]:
avg = titanic["Age"].mean()
(titanic["Age"] > avg).values.sum()
Out[12]:
np.int64(330)
In [13]:
# En une ligne
(titanic["Age"] > titanic["Age"].mean()).values.sum()
Out[13]:
np.int64(330)
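One point worth keeping in mind with this count: the `Age` column contains missing values, `mean()` skips them by default, and a comparison with `NaN` evaluates to `False`, so passengers of unknown age are never counted. A minimal sketch on a made-up Series named `ages` (hypothetical values):

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages with two missing values
ages = pd.Series([20.0, 30.0, np.nan, 40.0, np.nan])

# mean() ignores NaN by default: (20 + 30 + 40) / 3
mean_age = ages.mean()

# NaN > mean_age is False, so missing ages do not contribute to the count
above = (ages > mean_age).sum()

# Number of passengers whose age is actually known
known = ages.notna().sum()
print(mean_age, above, known)
```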
Names of the passengers who paid the highest fare¶
In [14]:
max_fare = titanic["Fare"].max()  # named max_fare to avoid shadowing the built-in max()
titanic[titanic["Fare"] == max_fare][["Name", "Fare"]]
Out[14]:
Name | Fare | |
---|---|---|
258 | Ward, Miss Anna | 512.3292 |
679 | Cardeza, Mr. Thomas Drake Martinez | 512.3292 |
737 | Lesurer, Mr. Gustave J | 512.3292 |
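Filtering with a boolean mask returns every row tied for the maximum, which is why three names appear here. By contrast, `idxmax` returns only the first matching index. The sketch below illustrates the difference on a made-up DataFrame named `fares` (hypothetical names and values).

```python
import pandas as pd

# Hypothetical toy data with a tie for the highest fare
fares = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Fare": [10.0, 50.0, 50.0, 7.5],
})

# Boolean mask: keeps every row equal to the maximum (all ties)
top_all = fares[fares["Fare"] == fares["Fare"].max()]

# idxmax: index label of the first occurrence of the maximum only
top_first = fares.loc[fares["Fare"].idxmax()]

print(top_all["Name"].tolist(), top_first["Name"])
```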
Names of the oldest passengers¶
In [15]:
max_age = titanic["Age"].max()
print("max age", max_age)
titanic[titanic["Age"] == max_age]["Name"]
max age 80.0
Out[15]:
630    Barkworth, Mr. Algernon Henry Wilson
Name: Name, dtype: object
Names of the youngest passengers.
In [ ]:
min_age = titanic["Age"].min()
print("min age", min_age)
titanic[titanic["Age"] == min_age]["Name"]
min age 0.42
Out[ ]:
803    Thomas, Master Assad Alexander
Name: Name, dtype: object
In [19]:
titanic["Name length"] = [len(x) for x in titanic["Name"]]
titanic.head()
Out[19]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Did survive | name_length | Name length | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | no | 23 | 23 |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | yes | 51 | 51 |
2 | 3 | 1 | 3 | Heikkinen, Miss Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | yes | 21 | 21 |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | yes | 44 | 44 |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | no | 24 | 24 |
In [21]:
max_name_length = titanic["Name length"].max()
titanic[titanic["Name length"] == max_name_length]
Out[21]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Did survive | name_length | Name length | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
307 | 308 | 1 | 1 | Penasco y Castellana, Mrs. Victor de Satode (M... | female | 17.0 | 1 | 0 | PC 17758 | 108.9 | C65 | C | yes | 82 | 82 |
Names of the oldest survivors¶
In [23]:
survivors = titanic[titanic["Survived"] == 1]  # 1 = survived
max_age = survivors["Age"].max()
print("max survivor age", max_age)
survivors[survivors["Age"] == max_age]["Name"]
max survivor age 80.0
Out[23]:
630    Barkworth, Mr. Algernon Henry Wilson
Name: Name, dtype: object