BIKE SHARING DATA ANALYSIS
• After analyzing the data, the main challenge appears to be the uncertainty in demand (for each hour and each day).
• Two discrete factor variables and four continuous variables can have an impact on demand; their relationships with demand are depicted in the following graphs and table (blue boxes):
Total users in a day
Bike demand per hour can be segregated as:
High : 7-9 and 17-19 hours
Average : 10-16 hours
Low : 0-6 and 20-23 hours
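The hourly segmentation above can be written as a small lookup. This is only an illustrative sketch: the function name `demand_band` is hypothetical, and it assumes `hr` runs from 0 to 23, as in the hourly data file.

```python
def demand_band(hr: int) -> str:
    """Map an hour of the day (0-23) to the demand band listed above."""
    if not 0 <= hr <= 23:
        raise ValueError(f"hr must be in 0..23, got {hr}")
    if 7 <= hr <= 9 or 17 <= hr <= 19:
        return "High"       # morning and evening peaks
    if 10 <= hr <= 16:
        return "Average"    # daytime hours between the peaks
    return "Low"            # remaining hours: 0-6 and 20 onward


# Band for every hour of the day, e.g. to label rows of the hourly file.
bands = {hr: demand_band(hr) for hr in range(24)}
```

Such a label could be added as an extra discrete variable when comparing demand across the three bands.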
Correlation coefficients of the continuous variables with all three
dependent variables
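A table of this kind could be produced along the following lines. This is a minimal sketch that assumes the Pearson correlation coefficient was used (the slides do not say which measure was computed); the rows are made-up values for illustration only and are not taken from the data set.

```python
from math import sqrt

CONTINUOUS = ["temp", "atemp", "hum", "windspeed"]
DEPENDENT = ["casual", "registered", "cnt"]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up rows for illustration only (NOT real records from the file).
rows = [
    {"temp": 0.24, "atemp": 0.29, "hum": 0.81, "windspeed": 0.00, "casual": 3, "registered": 13},
    {"temp": 0.32, "atemp": 0.33, "hum": 0.76, "windspeed": 0.10, "casual": 8, "registered": 32},
    {"temp": 0.46, "atemp": 0.45, "hum": 0.62, "windspeed": 0.25, "casual": 26, "registered": 80},
    {"temp": 0.58, "atemp": 0.55, "hum": 0.50, "windspeed": 0.19, "casual": 47, "registered": 110},
    {"temp": 0.40, "atemp": 0.41, "hum": 0.88, "windspeed": 0.30, "casual": 5, "registered": 40},
    {"temp": 0.66, "atemp": 0.62, "hum": 0.44, "windspeed": 0.13, "casual": 60, "registered": 150},
]
for r in rows:
    r["cnt"] = r["casual"] + r["registered"]  # total = casual + registered


def correlation_table(rows):
    """Correlation of each continuous variable with each dependent variable."""
    return {
        (c, d): pearson([r[c] for r in rows], [r[d] for r in rows])
        for c in CONTINUOUS
        for d in DEPENDENT
    }


table = correlation_table(rows)
for (c, d), value in table.items():
    print(f"{c:>9} vs {d:<10} {value:+.3f}")
```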
Two types of data were available; the analysis of the file with hourly
data is presented here.
The independent variables identified are:
- instant: record index
- dteday: date
- season: season (1: spring, 2: summer, 3: fall, 4: winter)
- yr: year (0: 2011, 1: 2012)
- mnth: month (1 to 12)
- hr: hour (0 to 23)
- holiday: whether the day is a holiday or not
- weekday: day of the week
- workingday: 1 if the day is neither a weekend nor a holiday, else 0
- weathersit: weather (1: Clear, 2: Mist, 3: Light Snow, Light Rain, 4: Heavy Rain + Ice Pellets)
- temp: normalized temperature in Celsius
- atemp: normalized feeling temperature in Celsius
- hum: normalized humidity
- windspeed: normalized wind speed
Dependent variables are:
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes including both casual and registered
• The plot shows the impact of rain on demand, which fell drastically.
• Demand is highest during clear weather, and the numbers are also steady.
• Variables and their individual effects have been presented as examples.
• However, it will be interesting to measure their combined impact
and prepare a model to predict the hourly demand.
• A sample model was prepared with numeric-type variables after factorizing the discrete variables.
• The correlation coefficient between the fitted and original values
of “cnt” is 0.7921, and the plot is shown on the
right.
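The workflow described above (encode the discrete variables, fit a model, correlate fitted with original values) might look like the sketch below. The slides do not state the model type, so ordinary least squares with indicator (one-hot) columns is an assumption here, as are all helper names. The rows are invented toy values, so the resulting correlation is unrelated to the 0.7921 reported.

```python
from math import sqrt


def one_hot(values, levels):
    """Indicator columns for every level except the first (the baseline)."""
    return [[1.0 if v == lvl else 0.0 for lvl in levels[1:]] for v in values]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Toy rows (weathersit, temp, cnt): invented for illustration, NOT real data.
rows = [(1, 0.2, 40), (1, 0.6, 120), (1, 0.8, 150), (2, 0.3, 50),
        (2, 0.5, 80), (2, 0.7, 110), (3, 0.4, 20), (3, 0.8, 60)]

dummies = one_hot([w for w, _, _ in rows], levels=[1, 2, 3])
# Design matrix columns: intercept, weathersit == 2, weathersit == 3, temp.
X = [[1.0] + d + [temp] for d, (_, temp, _) in zip(dummies, rows)]
y = [float(cnt) for _, _, cnt in rows]

beta = fit_ols(X, y)
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
r = pearson(fitted, y)
print(f"correlation between fitted and original cnt: {r:.4f}")
```

Dropping the first level of each discrete variable keeps the design matrix full rank when an intercept is included; with the real hourly file, the same encoding would be applied to every discrete variable before fitting.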
• Thus, it needs to be determined if this model can be