PERFORMING ANALYSIS OF METEOROLOGICAL DATA
Goal: To transform the raw data into information and then convert it into knowledge
H0: The apparent temperature and humidity, compared month by month across 10 years of data, indicate an increase due to global warming.
To find: Whether the average apparent temperature for a given month, say April, from 2006 to 2016, and the average humidity for the same period, have increased.
Analysis:
1. Here we start by importing the required libraries,
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
2. Reading and viewing the file,
metadata=pd.read_csv(r"E:\Lectures\Docs\Docs\weatherHistory.csv")
metadata.head()
| | Formatted Date | Summary | Precip Type | Temperature (C) | Apparent Temperature (C) | Humidity | Wind Speed (km/h) | Wind Bearing (degrees) | Visibility (km) | Pressure (millibars) | Daily Summary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2006-04-01 00:00:00.000 +0200 | Partly Cloudy | rain | 9.472222 | 7.388889 | 0.89 | 14.1197 | 251 | 15.8263 | 1015.13 | Partly cloudy throughout the day. |
| 1 | 2006-04-01 01:00:00.000 +0200 | Partly Cloudy | rain | 9.355556 | 7.227778 | 0.86 | 14.2646 | 259 | 15.8263 | 1015.63 | Partly cloudy throughout the day. |
| 2 | 2006-04-01 02:00:00.000 +0200 | Mostly Cloudy | rain | 9.377778 | 9.377778 | 0.89 | 3.9284 | 204 | 14.9569 | 1015.94 | Partly cloudy throughout the day. |
| 3 | 2006-04-01 03:00:00.000 +0200 | Partly Cloudy | rain | 8.288889 | 5.944444 | 0.83 | 14.1036 | 269 | 15.8263 | 1016.41 | Partly cloudy throughout the day. |
| 4 | 2006-04-01 04:00:00.000 +0200 | Mostly Cloudy | rain | 8.755556 | 6.977778 | 0.83 | 11.0446 | 259 | … | … | … |
3. Here we prepare the raw data: select the required columns, convert the dates into a usable format, and then resample the data,
columns=['Formatted Date','Apparent Temperature (C)','Humidity']
Here we keep only the columns "Formatted Date", "Apparent Temperature (C)" and "Humidity", since domain knowledge tells us these three are the only ones the analysis requires.
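4. Creating a new data frame
# .copy() avoids a SettingWithCopyWarning when the date column is reassigned in the next step
df=metadata[columns].copy()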
5. Here the Formatted Date column is stored as text with a UTC offset (e.g. +0200), so first we convert it to timezone-aware datetimes in UTC and use it as the index,
df['Formatted Date']=pd.to_datetime(df['Formatted Date'], utc=True)
df1=df.set_index('Formatted Date')
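To illustrate what `utc=True` does, here is a minimal sketch using two made-up timestamps in the same format as the file (the values and the second offset, +0100, are assumptions for illustration, not rows from the dataset). Each value is shifted by its own offset so that the whole column ends up in a single timezone.

```python
import pandas as pd

# Illustrative timestamps in the 'Formatted Date' format.
# Made-up values; the two offsets mimic summer and winter time.
raw = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2006-12-01 00:00:00.000 +0100",
])

# utc=True converts every value to UTC, giving one timezone-aware column.
parsed = pd.to_datetime(raw, utc=True)

print(parsed.dt.tz)    # UTC
print(parsed.iloc[0])  # 2006-03-31 22:00:00+00:00
```

Note that midnight local time on 1 April becomes 22:00 on 31 March in UTC, so observations near a month boundary can land in the neighbouring month after conversion.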
6. Now we resample it to monthly frequency using the mean,
df1=df1.resample('MS').mean()
df1.head()
| Formatted Date | Apparent Temperature (C) | Humidity |
|---|---|---|
| 2005-12-01 00:00:00+00:00 | -4.050000 | 0.890000 |
| 2006-01-01 00:00:00+00:00 | -4.173708 | 0.834610 |
| 2006-02-01 00:00:00+00:00 | -2.990716 | 0.843467 |
| 2006-03-01 00:00:00+00:00 | 1.969780 | 0.778737 |
| 2006-04-01 00:00:00+00:00 | 12.098827 | 0.728625 |
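Here we can see that the dates are now in a readable format and the dataframe has been resampled to one row per month. The first row falls in December 2005, most likely because converting to UTC shifts the earliest local timestamps back across the year boundary.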
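The resampling step can be checked on a small example. The sketch below builds a synthetic hourly series (made-up values, not the dataset) where January is always 0.0 and February is always 10.0, so the monthly means are known in advance. `'MS'` labels each group with the first day of its month.

```python
import numpy as np
import pandas as pd

# Synthetic hourly index covering January and February 2006 (UTC).
idx = pd.date_range("2006-01-01", "2006-02-28 23:00", freq="h", tz="UTC")

# Made-up values: 0.0 for every hour in January, 10.0 for every hour in February.
demo = pd.DataFrame(
    {"Apparent Temperature (C)": np.where(idx.month == 1, 0.0, 10.0)},
    index=idx,
)

# 'MS' = month start: one row per calendar month, labelled with its first day.
monthly = demo.resample("MS").mean()
print(monthly)
```

The result has two rows, one per month, each holding the mean of all the hourly values in that month.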
7. Now plotting the graph of Changes in Temperature and Humidity with respect to time
plt.figure(figsize=(14,6))
plt.title('Changes in Temperature and Humidity with respect to time')
plt.plot(df1)

Here we can see that both the peaks and the troughs stay almost the same throughout the 10-year period.
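One caveat when reading this plot: humidity lies between 0 and 1 while temperature spans tens of degrees, so on a shared y-axis the humidity line can look flat even if it varies. A possible alternative, sketched here on synthetic data, is to give humidity its own axis with `twinx()`. The helper name `plot_twin` and the demo values are assumptions for illustration, not part of the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_twin(frame, temp_col="Apparent Temperature (C)", hum_col="Humidity"):
    """Plot temperature on the left y-axis and humidity on the right y-axis."""
    fig, ax_temp = plt.subplots(figsize=(14, 6))
    ax_temp.plot(frame.index, frame[temp_col], color="tab:red")
    ax_temp.set_ylabel(temp_col)
    ax_hum = ax_temp.twinx()  # second y-axis sharing the same x-axis
    ax_hum.plot(frame.index, frame[hum_col], color="tab:blue")
    ax_hum.set_ylabel(hum_col)
    ax_temp.set_title("Changes in Temperature and Humidity with respect to time")
    return fig, ax_temp, ax_hum

# Synthetic monthly data (made-up seasonal curves, not the dataset).
idx = pd.date_range("2006-01-01", periods=24, freq="MS", tz="UTC")
phase = 2 * np.pi * (idx.month.to_numpy() - 4) / 12
demo = pd.DataFrame(
    {
        "Apparent Temperature (C)": 10 + 12 * np.sin(phase),
        "Humidity": 0.75 - 0.10 * np.sin(phase),
    },
    index=idx,
)

fig, ax_temp, ax_hum = plot_twin(demo)
```

With the real data, the call would be `plot_twin(df1)`.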
8. Now considering the month of April over the 10 years and plotting it,
dfapril=df1[df1.index.month==4]
plt.figure(figsize=(16,8))
plt.plot(dfapril)

Here there is a sharp rise in the average April temperature in 2009, whereas there is a fall in 2015.
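Reading a trend off the plot by eye is difficult, so one way to put a number on "increased or not" is to fit a straight line to the April means and look at its slope. This is a sketch of that idea, not the method used above. The helper `yearly_slope` is a hypothetical name, and the demo frame is synthetic, built with a known rise of 0.5 per year so the result can be checked.

```python
import numpy as np
import pandas as pd

def yearly_slope(monthly, column, month=4):
    """Least-squares slope (units per year) of one calendar month's means."""
    subset = monthly.loc[monthly.index.month == month, column].dropna()
    years = subset.index.year.to_numpy(dtype=float)
    slope, _intercept = np.polyfit(years, subset.to_numpy(dtype=float), deg=1)
    return slope

# Synthetic monthly frame for 2006-2016 (made-up values, not the dataset):
# the value rises by exactly 0.5 per year.
idx = pd.date_range("2006-01-01", "2016-12-01", freq="MS", tz="UTC")
demo = pd.DataFrame(
    {"Apparent Temperature (C)": 10.0 + 0.5 * (idx.year.to_numpy() - 2006)},
    index=idx,
)

slope = yearly_slope(demo, "Apparent Temperature (C)")
print(round(slope, 3))  # 0.5
```

On the real data this would be `yearly_slope(df1, 'Apparent Temperature (C)')` and `yearly_slope(df1, 'Humidity')`. With only about eleven April values, the slope is an indication rather than proof of a trend.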
Conclusion:
Hence global warming has caused uncertainty in temperature over the past 10 years, while humidity has remained nearly constant throughout.