
Leveraging SQL for Deeper Insights:
While Excel provided a solid foundation for initial data exploration, its limitations became apparent as we delved deeper into the research questions. Excel worksheets top out at 1,048,576 rows, which leaves almost no headroom for a dataset of more than 1 million rows. To work with the full dataset, we moved to a SQL environment, Google BigQuery.
SQL's Advantages:
Scalability: Handles millions of rows comfortably, so the full dataset can be queried without spreadsheet performance bottlenecks.
Advanced Analytics: Aggregations, joins, and grouping make it possible to summarize and compare the data in depth and surface trends that are hard to see row by row.
Targeted Exploration: Filters and grouping let us isolate the data subsets relevant to our research questions, such as cost savings, convenience, and customer segmentation.
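As a sketch of what "targeted exploration" looks like in practice, the query below isolates one subset (casual riders on weekends) with a single WHERE clause. It uses Python's built-in SQLite as a local stand-in for BigQuery, and the table name `trips`, its columns, and the four rows are hypothetical toy values, not the project's data.

```python
import sqlite3

# In-memory SQLite stand-in for the BigQuery table.
# Table and column names (trips, member_casual, started_at) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (ride_id TEXT, member_casual TEXT, started_at TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("r1", "casual", "2023-06-03 10:15:00"),  # Saturday
        ("r2", "member", "2023-06-05 08:05:00"),  # Monday
        ("r3", "casual", "2023-06-04 14:40:00"),  # Sunday
        ("r4", "casual", "2023-06-06 17:20:00"),  # Tuesday
    ],
)

# Target one subset: casual riders on weekends.
# In SQLite, strftime('%w') returns '0' for Sunday and '6' for Saturday.
weekend_casual = conn.execute(
    """
    SELECT COUNT(*) FROM trips
    WHERE member_casual = 'casual'
      AND strftime('%w', started_at) IN ('0', '6')
    """
).fetchone()[0]
print(weekend_casual)
```

In BigQuery itself, `EXTRACT(DAYOFWEEK FROM started_at)` returns 1 for Sunday through 7 for Saturday, so the equivalent weekend filter would use `IN (1, 7)`.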
Unlocking Hidden Value:
By leveraging SQL's capabilities, we were able to:
Refine Data Cleaning: Perform more granular cleaning operations to ensure data integrity and accuracy for our analysis.
Extract Actionable Insights: Use advanced SQL queries to uncover patterns and trends in customer behavior and profitability.
Targeted Customer Segmentation: Segment customers by key characteristics identified through SQL analysis, informing targeted marketing strategies.
This transition from Excel to SQL demonstrates a strategic approach to data analysis, leveraging the right tools at each stage to maximize the value extracted from the data.
The images in this section show further data cleaning steps that protect the accuracy and integrity of our analysis. The SQL queries show the table being updated with a new column that contains no null values.
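The cleaning pattern described above (add a column, populate it, keep only complete rows) can be sketched as follows. This is a minimal illustration, assuming a hypothetical `trips` table with `started_at`, `ended_at`, and `start_station_name` columns and a hypothetical derived column `ride_length_min`; it runs on SQLite rather than BigQuery, where the duration would more typically come from `TIMESTAMP_DIFF(ended_at, started_at, MINUTE)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (ride_id TEXT, started_at TEXT, ended_at TEXT, "
    "start_station_name TEXT)"
)
# Hypothetical toy rows, two of which are incomplete.
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        ("r1", "2023-06-03 10:00:00", "2023-06-03 10:25:00", "Station A"),
        ("r2", "2023-06-03 11:00:00", None, "Station B"),            # missing end time
        ("r3", "2023-06-03 12:00:00", "2023-06-03 12:10:00", None),  # missing station
    ],
)

# Step 1: add the new column.
conn.execute("ALTER TABLE trips ADD COLUMN ride_length_min REAL")

# Step 2: populate it. julianday() differences are in days, so * 24 * 60 gives minutes.
conn.execute(
    "UPDATE trips SET ride_length_min = "
    "(julianday(ended_at) - julianday(started_at)) * 24 * 60 "
    "WHERE started_at IS NOT NULL AND ended_at IS NOT NULL"
)

# Step 3: keep only complete rows with a positive duration in a cleaned table.
conn.execute(
    "CREATE TABLE trips_clean AS SELECT * FROM trips "
    "WHERE ride_length_min IS NOT NULL AND ride_length_min > 0 "
    "AND start_station_name IS NOT NULL"
)
rows = conn.execute(
    "SELECT ride_id, ROUND(ride_length_min, 1) FROM trips_clean"
).fetchall()
print(rows)
```

Writing the result to a separate cleaned table, rather than deleting rows in place, keeps the raw data available if a cleaning rule later needs to change.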
The query shown here explains why casual riders might choose a membership. We examined two benefits: convenience, using a duplicate query that revealed wider station access for annual members, and cost savings, using ride frequency to show that members save more. Together these results show that annual members enjoy broader station access and greater savings than casual riders.
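The text does not spell out the exact queries, so the following is only one plausible reading, sketched on SQLite: station access measured as the number of distinct start stations each rider type uses, and ride frequency as the number of rides per rider type. The table, column names, and rows are hypothetical toy values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (ride_id TEXT, member_casual TEXT, start_station_name TEXT)"
)
# Hypothetical toy rows, not the project's data.
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("r1", "member", "Station A"),
        ("r2", "member", "Station B"),
        ("r3", "member", "Station C"),
        ("r4", "member", "Station A"),
        ("r5", "casual", "Station A"),
        ("r6", "casual", "Station A"),
    ],
)

# Convenience proxy: distinct stations used by each rider type.
# Cost-savings proxy: number of rides taken by each rider type.
summary = conn.execute(
    """
    SELECT member_casual,
           COUNT(DISTINCT start_station_name) AS stations_used,
           COUNT(*) AS rides
    FROM trips
    GROUP BY member_casual
    ORDER BY member_casual
    """
).fetchall()
for row in summary:
    print(row)
```

Ride counts alone do not measure savings; turning frequency into money would also require pricing, which this sketch deliberately leaves out.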
Direct uploads of local files to Google BigQuery are capped at 100 MB. For larger files, the quickest route is the Google Cloud Console, as shown in the accompanying image in this section; from the console, a large file can be staged in a Cloud Storage bucket and loaded into BigQuery from there.
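A small pre-upload check can decide which route a file needs. This is a minimal sketch: `choose_upload_route` and its return labels are hypothetical, and treating the 100 MB cap as 100 × 1024 × 1024 bytes is an assumption (the cap may be counted in decimal megabytes).

```python
import os
import tempfile

# Assumed byte value of the 100 MB cap on direct local uploads.
LOCAL_UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def choose_upload_route(size_bytes: int) -> str:
    """Return a hypothetical label for how a file of this size should be loaded."""
    if size_bytes <= LOCAL_UPLOAD_LIMIT_BYTES:
        return "local upload"
    return "stage in Cloud Storage, then load"


# Usage: check a small temporary CSV before uploading it.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as f:
    f.write(b"ride_id,member_casual\nr1,casual\n")
    path = f.name
print(choose_upload_route(os.path.getsize(path)))
os.remove(path)
```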
Limitations of SQL Compared to R for In-Depth Analysis:
While SQL remains a cornerstone of data querying and manipulation, R outpaces it in several respects for in-depth customer analysis:
Limited Data Manipulation: SQL excels at structured data retrieval but can be cumbersome for granular data transformations. R makes it easy to drop irrelevant columns, filter data by complex criteria, and create new data frames with precise control over their structure and content.
Focus on Relational Queries: SQL's core strength is relational database querying. It offers some analytical capabilities, but not the same depth of exploration as R. R offers a rich ecosystem of functions for tasks such as identifying key differences between customer segments, creating calculated columns for in-depth analysis (for example, time-of-day and day-of-week effects), and exploring complex relationships within the data.
Data Visualization: Some SQL environments offer basic visualization tools, but R excels here. Its visualization libraries support polished and, in some cases, interactive tables, charts, and graphs that communicate insights from the data effectively.
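The calculated columns mentioned above (day of week, time of day), together with dropping unneeded columns and filtering, are shown here as an analogous Python sketch rather than R code. The field names, rows, and time-of-day boundaries are hypothetical choices for illustration.

```python
from datetime import datetime

# Fixed English names keyed by datetime.weekday() (Monday == 0), avoiding locale effects.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical toy rows; rideable_type stands in for a column we want to drop.
rides = [
    {"ride_id": "r1", "member_casual": "casual", "started_at": "2023-06-03 10:15:00", "rideable_type": "classic_bike"},
    {"ride_id": "r2", "member_casual": "member", "started_at": "2023-06-05 08:05:00", "rideable_type": "electric_bike"},
    {"ride_id": "r3", "member_casual": "casual", "started_at": "2023-06-04 14:40:00", "rideable_type": "classic_bike"},
]


def time_of_day(hour: int) -> str:
    """Bucket an hour into a hypothetical time-of-day label."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


def derive(ride: dict) -> dict:
    """Keep only the needed fields and add calculated columns."""
    ts = datetime.strptime(ride["started_at"], "%Y-%m-%d %H:%M:%S")
    return {
        "ride_id": ride["ride_id"],
        "member_casual": ride["member_casual"],
        "day_of_week": DAY_NAMES[ts.weekday()],
        "time_of_day": time_of_day(ts.hour),
    }


# Filter to one segment and build a new table of derived rows.
casual_rides = [derive(r) for r in rides if r["member_casual"] == "casual"]
for row in casual_rides:
    print(row)
```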
By acknowledging these limitations, we can highlight the strategic use of both SQL and R in the data analysis pipeline. SQL provides the foundation for data retrieval and cleaning, while R takes the reins for in-depth exploration, visualization, and advanced statistical analysis.