Solutions to exercise set 1

In [2]:

import pandas as pd

titanic = pd.read_csv("titanic.csv")
titanic.head()

Out[2]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S

Number of survivors and deaths

In [3]:

# Makes the column easier to read (less confusing than 1s and 0s); Survived == 1 means the passenger survived
titanic["Did survive"] = ["yes" if x == 1 else "no" for x in titanic["Survived"]]
titanic["Did survive"].value_counts()

Out[3]:

Did survive
no     549
yes    342
Name: count, dtype: int64

In [4]:

survived = titanic.groupby("Did survive")["PassengerId"].count()
survived

Out[4]:

Did survive
no     549
yes    342
Name: PassengerId, dtype: int64
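
As an alternative to the list comprehension, the relabeling can be sketched with `Series.map`. The `toy` frame below is made up for illustration (it is not the Titanic file) and assumes the usual encoding where 1 means the passenger survived.

```python
import pandas as pd

# Hypothetical toy data (not the real Titanic file): 1 = survived, 0 = died
toy = pd.DataFrame({"PassengerId": [1, 2, 3, 4, 5], "Survived": [0, 1, 1, 0, 0]})

# Map each numeric code to a readable label
toy["Did survive"] = toy["Survived"].map({0: "no", 1: "yes"})

# Count passengers per label; value_counts sorts by frequency
counts = toy["Did survive"].value_counts()
print(counts)
```

Writing the mapping as a dictionary keeps the meaning of each code visible in one place, which makes an inverted label easier to spot.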

Compute the percentage of survivors

Since the values are 0 (died) or 1 (survived), the mean of the column is the fraction of survivors. Multiplying it by 100 gives the percentage of survivors, and 100 * (1 - mean) gives the percentage of deaths.

In [ ]:

avg = 100 * titanic["Survived"].mean()
avg

Out[ ]:

np.float64(38.38383838383838)

In [6]:

print(f"{avg:.2f}%")

38.38%
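
To check which quantity the mean of a 0/1 column measures, here is a minimal sketch on a made-up series. The values are hypothetical and assume that 1 encodes "survived".

```python
import pandas as pd

# Hypothetical toy column: 1 = survived, 0 = died
survived = pd.Series([1, 0, 0, 1, 0], name="Survived")

# The mean of a 0/1 column is the fraction of ones
pct_survived = 100 * survived.mean()
pct_died = 100 * (1 - survived.mean())

# value_counts(normalize=True) gives the same shares, one per value
shares = 100 * survived.value_counts(normalize=True)

print(f"{pct_survived:.2f}% survived, {pct_died:.2f}% died")
```

Comparing the mean against `value_counts(normalize=True)` is a quick way to confirm that the percentage is attached to the intended label.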

Compute the percentage of survivors per class

In [7]:

# Survived is 1 for survivors, so the group mean is the survival rate
df = 100 * titanic[["Pclass", "Survived"]].groupby("Pclass").mean()
df

Out[7]:

	Survived
Pclass
1	62.962963
2	47.282609
3	24.236253

In [8]:

df.style.format({"Survived": "{:.2f}%"})

Out[8]:

	Survived
Pclass
1	62.96%
2	47.28%
3	24.24%

In [9]:

def get_percentage_format(val):
    """Return a CSS color: red below 50 %, green otherwise."""
    if val < 50:
        return "color: red"
    return "color: green"

# Styler.map replaces the deprecated Styler.applymap
df.style.map(get_percentage_format)

Out[9]:

	Survived
Pclass
1	62.962963
2	47.282609
3	24.236253

In [10]:

import seaborn as sns

cm = sns.light_palette("green", as_cmap=True)
df.style.format({"Survived": "{:.2f} %"}).background_gradient(cmap=cm)

Out[10]:

	Survived
Pclass
1	62.96 %
2	47.28 %
3	24.24 %

Compute the percentage of survivors per sex

In [11]:

100 * titanic[["Sex", "Survived"]].groupby("Sex").mean()

Out[11]:

	Survived
Sex
female	74.203822
male	18.890815
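
The same group-then-mean pattern extends to several grouping columns at once. The sketch below uses a small made-up frame (not the Titanic data), with 1 assumed to mean "survived".

```python
import pandas as pd

# Hypothetical toy data: 1 = survived, 0 = died
toy = pd.DataFrame({
    "Pclass":   [1, 1, 3, 3, 3, 3],
    "Sex":      ["female", "male", "female", "female", "male", "male"],
    "Survived": [1, 0, 1, 0, 0, 0],
})

# Survival rate in percent for each (class, sex) pair
rates = 100 * toy.groupby(["Pclass", "Sex"])["Survived"].mean()
print(rates)
```

Selecting the `Survived` column after `groupby` returns a Series indexed by the grouping keys, so a single rate can be read with `rates.loc[(3, "female")]`.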

Count the passengers whose age is above the mean

In [12]:

# Missing ages (NaN) are skipped by mean() and compare as False, so they are not counted
avg_age = titanic["Age"].mean()
(titanic["Age"] > avg_age).sum()

Out[12]:

np.int64(330)

In [13]:

# In one line
(titanic["Age"] > titanic["Age"].mean()).sum()

Out[13]:

np.int64(330)
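
The Age column contains missing values, so it helps to see how they behave in this count. A minimal sketch on a made-up series (the ages are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value
ages = pd.Series([20.0, 30.0, np.nan, 40.0], name="Age")

# mean() skips NaN by default: (20 + 30 + 40) / 3
avg_age = ages.mean()

# A comparison with NaN is False, so the missing age is never counted
above = (ages > avg_age).sum()

# Number of passengers whose age is actually known
known = ages.notna().sum()

print(avg_age, above, known)
```

The count therefore covers only passengers with a known age, which matters when it is turned into a share of all passengers.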

Names of the passengers who paid the highest fare

In [14]:

# Named max_fare to avoid shadowing the built-in max()
max_fare = titanic["Fare"].max()
titanic[titanic["Fare"] == max_fare][["Name", "Fare"]]

Out[14]:

	Name	Fare
258	Ward, Miss Anna	512.3292
679	Cardeza, Mr. Thomas Drake Martinez	512.3292
737	Lesurer, Mr. Gustave J	512.3292
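
Several passengers share the highest fare, which is why a boolean mask is used here. The sketch below contrasts it with `idxmax` on made-up data (names and fares are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with a tie on the highest fare
toy = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Fare": [10.0, 50.0, 50.0, 7.5]})

max_fare = toy["Fare"].max()

# The boolean mask keeps every row that reaches the maximum
all_top = toy.loc[toy["Fare"] == max_fare, "Name"].tolist()

# idxmax returns only the label of the first maximum
first_top = toy.loc[toy["Fare"].idxmax(), "Name"]

print(all_top, first_top)
```

`idxmax` is shorter when a single row is enough, but it silently drops ties.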

Names of the oldest passengers

In [15]:

max_age = titanic["Age"].max()
print("max age", max_age)
titanic[titanic["Age"] == max_age]["Name"]

max age 80.0

Out[15]:

630    Barkworth, Mr. Algernon Henry Wilson
Name: Name, dtype: object

Names of the youngest passengers

In [ ]:

min_age = titanic["Age"].min()
print("min age", min_age)
titanic[titanic["Age"] == min_age]["Name"]

min age 0.42

Out[ ]:

803    Thomas, Master Assad Alexander
Name: Name, dtype: object

In [19]:

titanic["Name length"] = [len(x) for x in titanic["Name"]]
titanic.head()

Out[19]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked	Did survive	name_length	Name length
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S	no	23	23
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C	yes	51	51
2	3	1	3	Heikkinen, Miss Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S	yes	21	21
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S	yes	44	44
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S	no	24	24

In [21]:

max_name_length = titanic["Name length"].max()
titanic[titanic["Name length"] == max_name_length]

Out[21]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	Did survive	name_length	Name length
307	308	1	1	Penasco y Castellana, Mrs. Victor de Satode (M...	female	17.0	1	0	PC 17758	108.9	C65	C	yes	82	82
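
The name lengths can also be computed with the vectorized string accessor instead of a list comprehension. A minimal sketch on made-up names (not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy names
toy = pd.DataFrame({"Name": ["Doe, Mr. John", "Roe, Miss Jane Ann", "Poe, Mr. Ed"]})

# .str.len() computes the length of each string in the column
toy["Name length"] = toy["Name"].str.len()

# Keep the row(s) with the longest name, ties included
longest = toy[toy["Name length"] == toy["Name length"].max()]
print(longest)
```

Unlike `len(x)`, which raises a `TypeError` on a missing value, `.str.len()` returns NaN for it, so the approach still works if a name is absent.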

Names of the oldest passengers among the survivors

In [23]:

# Survived == 1 selects the survivors
df = titanic[titanic["Survived"] == 1]
max_age = df["Age"].max()
print("max survivor age", max_age)
df[df["Age"] == max_age]["Name"]

max survivor age 80.0

Out[23]:

630    Barkworth, Mr. Algernon Henry Wilson
Name: Name, dtype: object