PERFORMING ANALYSIS OF METEOROLOGICAL DATA

 


Goal: To transform the raw data into information and then convert that information into knowledge.

H0: Do the apparent temperature and humidity, compared month by month across 10 years of data, indicate an increase due to global warming?

To find: Whether the average apparent temperature for a given month, say April, from 2006 to 2016, and the average humidity for the same period, have increased or not.

Analysis:

1. We start by importing the required libraries:

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

2. Reading and viewing the file:

metadata=pd.read_csv(r"E:\Lectures\Docs\Docs\weatherHistory.csv")

metadata.head()

   Formatted Date                 Summary        Precip Type  Temperature (C)  Apparent Temperature (C)  Humidity
0  2006-04-01 00:00:00.000 +0200  Partly Cloudy  rain         9.472222         7.388889                  0.89
1  2006-04-01 01:00:00.000 +0200  Partly Cloudy  rain         9.355556         7.227778                  0.86
2  2006-04-01 02:00:00.000 +0200  Mostly Cloudy  rain         9.377778         9.377778                  0.89
3  2006-04-01 03:00:00.000 +0200  Partly Cloudy  rain         8.288889         5.944444                  0.83
4  2006-04-01 04:00:00.000 +0200  Mostly Cloudy  rain         8.755556         6.977778                  0.83

   Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)  Daily Summary
0  14.1197            251                     15.8263          1015.13               Partly cloudy throughout the day.
1  14.2646            259                     15.8263          1015.63               Partly cloudy throughout the day.
2  3.9284             204                     14.9569          1015.94               Partly cloudy throughout the day.
3  14.1036            269                     15.8263          1016.41               Partly cloudy throughout the day.
4  11.0446            259

3. Next we prepare the raw data: we select the columns we need, convert the dates into a readable format, and then resample the data.
columns=['Formatted Date','Apparent Temperature (C)','Humidity']

Here we consider only the columns "Formatted Date", "Apparent Temperature (C)" and "Humidity", as domain knowledge tells us these three are the only columns required for our analysis.

4. Creating a new data frame with only these columns:

df=metadata[columns].copy()  # .copy() avoids a SettingWithCopyWarning when a column is reassigned below
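As an alternative sketch, the same column selection could be done while reading the file, through the usecols parameter of pd.read_csv. The CSV text below is a hypothetical two-row excerpt standing in for weatherHistory.csv, and the names sample_csv and subset are introduced only for illustration:

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for weatherHistory.csv.
sample_csv = io.StringIO(
    "Formatted Date,Summary,Apparent Temperature (C),Humidity\n"
    "2006-04-01 00:00:00.000 +0200,Partly Cloudy,7.388889,0.89\n"
    "2006-04-01 01:00:00.000 +0200,Partly Cloudy,7.227778,0.86\n"
)

columns = ["Formatted Date", "Apparent Temperature (C)", "Humidity"]

# usecols tells read_csv to load only the listed columns.
subset = pd.read_csv(sample_csv, usecols=columns)
print(subset)
```

Loading only the needed columns keeps memory use down on large files, at the cost of having to re-read the file if another column is needed later.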

5. The Formatted Date column is read in as plain text rather than as a datetime, so we first convert it to a proper datetime (in UTC) and then use it as the index:

df['Formatted Date']=pd.to_datetime(df['Formatted Date'], utc=True)

df1=df.set_index('Formatted Date')
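To illustrate what utc=True does, here is a minimal sketch on a few made-up strings in the same shape as the Formatted Date column. The sample values and the names raw and parsed are assumptions for illustration only. Because the strings carry different offsets (+0200 and +0100), utc=True converts them all to one common time zone:

```python
import pandas as pd

# Hypothetical strings in the same shape as the 'Formatted Date' column.
raw = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2006-04-01 01:00:00.000 +0200",
    "2006-12-01 00:00:00.000 +0100",
])

# utc=True parses each offset and converts every timestamp to UTC,
# so the result has a single timezone-aware datetime dtype.
parsed = pd.to_datetime(raw, utc=True)
print(parsed)
```

Note that the conversion shifts the clock time: local midnight at +0200 becomes 22:00 UTC on the previous day.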

6. Now we resample it to monthly frequency ('MS' labels each month by its first day), taking the mean of each month:

df1=df1.resample('MS').mean()

df1.head()

                           Apparent Temperature (C)  Humidity
Formatted Date
2005-12-01 00:00:00+00:00                 -4.050000  0.890000
2006-01-01 00:00:00+00:00                 -4.173708  0.834610
2006-02-01 00:00:00+00:00                 -2.990716  0.843467
2006-03-01 00:00:00+00:00                  1.969780  0.778737
2006-04-01 00:00:00+00:00                 12.098827  0.728625

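Here we can see that the dates are now in a readable format and that the dataframe has been resampled to one row per month. The first row is labelled December 2005, most likely because the conversion to UTC moves the earliest local timestamps back across the year boundary.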
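To see what resample('MS').mean() does in isolation, here is a minimal sketch on made-up daily values. The toy frame and its numbers are assumptions for illustration, not the weather data. Each calendar month collapses into a single row labelled with the first day of that month:

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering April and May 2006.
idx = pd.date_range("2006-04-01", "2006-05-31", freq="D", tz="UTC")
toy = pd.DataFrame(
    {"Humidity": np.where(idx.month == 4, 0.70, 0.60)},
    index=idx,
)

# 'MS' groups rows by calendar month and labels each group by month start;
# .mean() then averages every column within each group.
monthly = toy.resample("MS").mean()
print(monthly)
```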

7. Now we plot the changes in apparent temperature and humidity with respect to time:

plt.figure(figsize=(14,6))

plt.title('Changes in Temperature and Humidity with respect to time')

plt.plot(df1)

[Figure: Changes in Temperature and Humidity with respect to time]

Here we can see that both the peaks and the troughs stay at almost the same level throughout the 10-year period.
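Apparent temperature is measured in degrees Celsius while humidity lies between 0 and 1, so on a shared axis the humidity line looks nearly flat. One possible way to make both readable is a second y-axis. The sketch below uses made-up monthly values standing in for df1; the toy frame, the axis names, and the colours are all assumptions for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly values standing in for df1 (not the real weather data).
idx = pd.date_range("2006-01-01", periods=24, freq="MS", tz="UTC")
month = idx.month.to_numpy()
toy = pd.DataFrame(
    {
        "Apparent Temperature (C)": 10 + 12 * np.sin(2 * np.pi * (month - 4) / 12),
        "Humidity": 0.75 - 0.10 * np.sin(2 * np.pi * (month - 4) / 12),
    },
    index=idx,
)

fig, ax_temp = plt.subplots(figsize=(14, 6))
ax_hum = ax_temp.twinx()  # second y-axis sharing the same x-axis

ax_temp.plot(toy.index, toy["Apparent Temperature (C)"], color="tab:red")
ax_hum.plot(toy.index, toy["Humidity"], color="tab:blue")

ax_temp.set_ylabel("Apparent Temperature (C)")
ax_hum.set_ylabel("Humidity")
ax_temp.set_title("Changes in Temperature and Humidity with respect to time")
```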

8. Now we consider only the month of April across the 10 years and plot it:

dfapril=df1[df1.index.month==4]

plt.figure(figsize=(16,8))

plt.plot(dfapril)

[Figure: monthly mean apparent temperature and humidity for April of each year]

Here there is a sharp rise in apparent temperature in 2009, whereas there is a fall in 2015.
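Reading a trend off a plot is subjective, so one possible complement is to fit a straight line through the April values and look at its slope. This is a sketch of that idea, not part of the original analysis: the helper yearly_slope is hypothetical, and the toy data is constructed to rise by exactly 0.5 per year so the result is easy to check.

```python
import numpy as np
import pandas as pd

def yearly_slope(frame, column):
    """Least-squares slope of `column` against calendar year (units per year)."""
    years = frame.index.year.to_numpy(dtype=float)
    values = frame[column].to_numpy(dtype=float)
    # Centre the years so the fit is numerically well conditioned.
    slope, _intercept = np.polyfit(years - years.mean(), values, deg=1)
    return slope

# Hypothetical April means for 2006..2016, rising by 0.5 per year by construction.
idx = pd.to_datetime([f"{year}-04-01" for year in range(2006, 2017)], utc=True)
toy_april = pd.DataFrame(
    {"Apparent Temperature (C)": 10.0 + 0.5 * np.arange(11)},
    index=idx,
)

slope = yearly_slope(toy_april, "Apparent Temperature (C)")
print(f"slope: {slope:.3f} per year")
```

On the real data this would be called as yearly_slope(dfapril, 'Apparent Temperature (C)') and yearly_slope(dfapril, 'Humidity'). A slope close to zero would suggest no clear increase over the period, although 11 points are too few for a firm conclusion.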

Conclusion:

Hence, global warming has caused uncertainty in temperature over the past 10 years, while humidity has remained nearly constant throughout the same period.
