Team13: Capstone project of Python Bootcamp

This is the capstone project of Team 13 of the Python Data Analysis Bootcamp. The notebook loosely follows the structure of jupytemplate.

Purpose

State the purpose of the notebook.

Methodology

Quickly describe assumptions and processing steps.

TODO / Improvements

  • [x] Find a dataset that has at least 2 CSV files
  • [ ] Come up with 5 questions that you want to answer while exploring the dataset
  • [ ] Perform EDA (Exploratory Data Analysis) on your dataset with basic visualisations

Results

Data processing

1. What are the five cities with the highest/lowest cost of living (incl. rent)?

In [5]:
caption_column = 'City'
index_column = 'Cost of Living Plus Rent Index'

def display_cost_of_living(costs, title):
    # sort ascending so that the largest value ends up at the top of the horizontal bar chart
    filtered_costs = costs[[caption_column, index_column]].sort_values(index_column, ascending = True)
    filtered_costs.plot.barh(title = title, x = caption_column, y = index_column)
    plt.show()
    # Styler.hide(axis = 'index') replaces hide_index(), which was removed in pandas 2.0
    display(filtered_costs.sort_values(index_column, ascending = False).style.hide(axis = 'index'))

# show the five most and the five least expensive cities in the database in 2018
display_cost_of_living(cost_of_living.nlargest(5, index_column), 'Highest Cost of Living Plus Rent Index')
display_cost_of_living(cost_of_living.nsmallest(5, index_column), 'Lowest Cost of Living Plus Rent Index')
City                                Cost of Living Plus Rent Index
Hamilton, Bermuda                   128.760000
San Francisco, CA, United States    106.290000
Zurich, Switzerland                 105.030000
Geneva, Switzerland                 104.380000
New York, NY, United States         100.000000

City                                Cost of Living Plus Rent Index
Bhubaneswar, India                  15.140000
Visakhapatnam, India                15.110000
Mysore, India                       14.980000
Alexandria, Egypt                   14.400000
Thiruvananthapuram, India           13.260000

2. What are the five happiest countries in Europe?

In [6]:
index_column = "People with highest life satisfaction [%]"
caption_column = 'Country'

top_countries_life_satisfaction = life_satisfaction[[caption_column, index_column]]
top_countries_life_satisfaction = top_countries_life_satisfaction.nlargest(5, index_column)
top_countries_life_satisfaction = top_countries_life_satisfaction.sort_values(index_column, ascending = True)
top_countries_life_satisfaction.plot.barh(title = 'Percentage of satisfied people', x = caption_column, y = index_column)
plt.show()
# Styler.hide(axis = 'index') replaces hide_index(), which was removed in pandas 2.0
display(top_countries_life_satisfaction.sort_values(index_column, ascending = False).style.hide(axis = 'index'))
Country        People with highest life satisfaction [%]
Denmark        42.700000
Finland        38.600000
Switzerland    38.500000
Iceland        38.100000
Austria        37.900000

3. Which five European countries have the most coastline in relation to their area?

In [7]:
index_column = "Coastline (coast/area ratio)"
caption_column = 'Country'

coastline_data = generic_european_country_data[[caption_column, index_column]]

coastline_data = coastline_data.nlargest(5, index_column)
coastline_data = coastline_data.sort_values(index_column, ascending = True)
coastline_data.plot.barh(title = 'Countries with the most coastline in relation to their area', x = caption_column, y = index_column)
plt.show()
# Styler.hide(axis = 'index') replaces hide_index(), which was removed in pandas 2.0
display(coastline_data.sort_values(index_column, ascending = False).style.hide(axis = 'index'))
Country          Coastline (coast/area ratio)
Monaco           205.000000
Gibraltar        171.430000
Faroe Islands    79.840000
Guernsey         64.100000
Malta            62.280000
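
Questions 1 to 3 repeat the same steps: select two columns, pick the five largest (or smallest) values and sort them for display. A minimal sketch of a helper that takes the column names as arguments, rather than reassigning the globals caption_column and index_column in every cell, could look like this. The function name top_n and its parameters are hypothetical and not part of the notebook; the sample frame only reuses the five rows from the table above.

```python
import pandas as pd

def top_n(data, caption_column, index_column, n=5, largest=True):
    """Return the n rows with the largest (or smallest) values, sorted in descending order."""
    subset = data[[caption_column, index_column]]
    if largest:
        picked = subset.nlargest(n, index_column)
    else:
        picked = subset.nsmallest(n, index_column)
    return picked.sort_values(index_column, ascending=False, ignore_index=True)

# sample frame for illustration: the five rows shown in the coastline table
sample = pd.DataFrame({
    'Country': ['Malta', 'Monaco', 'Guernsey', 'Faroe Islands', 'Gibraltar'],
    'Coastline (coast/area ratio)': [62.28, 205.00, 64.10, 79.84, 171.43],
})

top_three = top_n(sample, 'Country', 'Coastline (coast/area ratio)', n=3)
bottom_two = top_n(sample, 'Country', 'Coastline (coast/area ratio)', n=2, largest=False)
print(top_three)
print(bottom_two)
```

The returned frame is sorted for the table view; for a horizontal bar chart it would be sorted ascending again before calling plot.barh, as the cells above do.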

4. Is there a correlation between happiness and access to a coastline?

In [9]:
# keep only the countries that appear in both datasets
merged = pd.merge(generic_european_country_data, life_satisfaction, on = COUNTRY_COLUMN, how = 'inner')

# sort by coast/area ratio
merged = merged.sort_values(COAST_COLUMN, ascending = True, ignore_index = True)

# draw both columns on the same axes
ax = plt.gca()
merged.plot(kind = 'line', y = COAST_COLUMN, x = COUNTRY_COLUMN, ax = ax)
merged.plot(kind = 'line', y = SATISFACTION_COLUMN, x = COUNTRY_COLUMN, ax = ax)

# pass the on/off flag positionally: the keyword was renamed from b to visible in matplotlib 3.5
plt.grid(True, color = 'aqua', alpha = 0.1, linestyle = 'dashdot')
plt.show()
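
Two line plots over a list of countries make a relationship hard to judge, because the two columns use different scales and the x-axis is categorical. A correlation coefficient summarises the relationship in a single number. The sketch below computes the Pearson coefficient with pandas Series.corr after the same inner merge. The values of the three column constants are assumed (the cell that defines them is not shown), and the two small frames contain synthetic placeholder numbers, not values from the project's datasets.

```python
import pandas as pd

# assumed values: the notebook cell that defines these constants is not shown
COUNTRY_COLUMN = 'Country'
COAST_COLUMN = 'Coastline (coast/area ratio)'
SATISFACTION_COLUMN = 'People with highest life satisfaction [%]'

# synthetic placeholder data, NOT taken from the project's datasets
coast = pd.DataFrame({
    COUNTRY_COLUMN: ['A', 'B', 'C', 'D', 'E'],
    COAST_COLUMN: [0.0, 1.0, 2.0, 3.0, 4.0],
})
satisfaction = pd.DataFrame({
    COUNTRY_COLUMN: ['A', 'B', 'C', 'D', 'F'],
    SATISFACTION_COLUMN: [20.0, 30.0, 25.0, 35.0, 40.0],
})

# the inner join keeps only the countries present in both frames (A to D here)
merged = pd.merge(coast, satisfaction, on=COUNTRY_COLUMN, how='inner')

# Pearson correlation: +1 is a perfect positive linear relationship, 0 is none, -1 is perfect negative
pearson = merged[COAST_COLUMN].corr(merged[SATISFACTION_COLUMN], method='pearson')
print(f'Countries compared: {len(merged)}, Pearson r = {pearson:.2f}')
```

With only a few dozen countries, and coast/area ratios as extreme as those listed for question 3, a single outlier can dominate the coefficient, so it is worth reading the number alongside a scatter plot of the two columns.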