data-analyst-lucia

Lucia Ceron - AWS Portfolio

Welcome to my AWS Portfolio! This collection showcases cloud-based data analytics projects and case studies that I designed and implemented using real-world datasets and business questions. Each project demonstrates my ability to leverage AWS services to solve practical business challenges, from building data lakes and ETL pipelines to performing cost analysis and implementing secure, scalable cloud solutions. The portfolio also includes case studies that highlight my understanding of AWS deployment models, cost optimization, and best practices.

HR Data Lake Solution on AWS

Project Description

Implementation of a cloud-based data lake using AWS services to analyze factors influencing employee satisfaction and engagement within an HR department.

Project Title

Design and Deployment of HR Data Lake Architecture using AWS

Objective

The primary goal of this project was to analyze, design, and implement a data lake solution in AWS to address the business question provided by the HR operations team:
“What are the main factors impacting employee satisfaction and engagement?”
This project represents a scaled-down, functional version of the data lake solution that enables correlating employee sentiment with performance data, allowing better data-driven decisions.

Business Question

What are the main factors impacting employee satisfaction and engagement?

Dataset

The project utilized two main datasets provided by the HR operations team: Employee Surveys and Performance Reviews.

The datasets were provided in CSV and JSON formats with quarterly read/write frequency requirements.
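A raw zone with quarterly CSV/JSON drops usually lands files under date-partitioned S3 prefixes. The sketch below shows one way the quarterly ingestion could be keyed; the bucket name, prefix layout, and dataset folder names are illustrative assumptions, not the project's documented conventions.

```python
from datetime import date

def quarterly_key(dataset: str, drop_date: date, filename: str) -> str:
    """Build a date-partitioned S3 key for a quarterly raw-zone drop.
    The raw/<dataset>/year=YYYY/quarter=QN/ layout is an illustrative
    convention, not the project's documented one."""
    quarter = (drop_date.month - 1) // 3 + 1
    return f"raw/{dataset}/year={drop_date.year}/quarter=Q{quarter}/{filename}"

key = quarterly_key("employee-surveys", date(2024, 5, 10), "surveys.csv")
print(key)  # raw/employee-surveys/year=2024/quarter=Q2/surveys.csv

# Uploading would then be a single boto3 call (bucket name is hypothetical):
# import boto3
# boto3.client("s3").upload_file("surveys.csv", "hr-data-lake-bucket", key)
```

Partitioning keys this way keeps quarterly drops separated, which later simplifies cataloging and querying by period.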

Methodology

1. Analysis and Data Understanding

project-1-fishbone

2. Data Lake Design

project-1-datalake-design

3. Implementation

4. Cost Optimization

5. Deployment Testing

Tools and Technologies

Deliverables

Summary

This project demonstrates the implementation of a scalable AWS data lake to support HR data analytics, enabling data-driven decisions to improve employee engagement and satisfaction.


Cost Evaluation and Dataset Cleaning for HR Data Lake on AWS

Project Description

This project focuses on the cost evaluation of dataset ingestion and the analysis, design, and implementation of the dataset cleaning process for an HR Data Lake on AWS. As a member of the HR data team, the objective was to optimize the ingestion pipeline and ensure data quality for further analytical tasks.

Objective

The primary objective of this project was to evaluate the ingestion cost of HR datasets and to perform a comprehensive cleaning process to prepare the data for further analysis. The datasets involved were Employee Surveys and Performance Reviews. The project aimed to identify data quality issues, design appropriate cleaning strategies using AWS Glue DataBrew, and execute data cleansing while leveraging cloud services for scalability and automation.
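A Glue DataBrew recipe cannot be reproduced here, but the kinds of transformations it applies (deduplication, null handling, type casting) can be sketched locally with pandas. The column names and sample rows below are hypothetical, not the actual HR schema.

```python
import pandas as pd

# Hypothetical survey extract; real column names come from the HR datasets.
raw = pd.DataFrame({
    "employee_id": [101, 101, 102, 103],
    "satisfaction": ["4", "4", None, "5"],
    "survey_date": ["2024-01-15", "2024-01-15", "2024-01-16", None],
})

clean = (
    raw.drop_duplicates()                  # remove exact duplicate rows
       .dropna(subset=["survey_date"])     # drop rows missing the survey date
       .assign(
           satisfaction=lambda d: pd.to_numeric(d["satisfaction"]),  # cast rating to numeric
           survey_date=lambda d: pd.to_datetime(d["survey_date"]),   # standardize dates
       )
)
print(len(clean))  # 2 rows survive the cleaning steps
```

In DataBrew these same steps become recipe actions that run as managed jobs, which is what makes the cloud version scalable and repeatable.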

Dataset

The HR department provided two datasets critical for employee satisfaction and engagement analysis: Employee Surveys and Performance Reviews.

Methodology

1. Cost Evaluation of Dataset Ingestion

2. Dataset Cleaning Analysis & Design

project-2-cost-eval-design

3. Dataset Cleaning Implementation

Cloud Features Utilized

Tools and Technologies

Deliverables

Conclusion

This project demonstrates the advantages of cloud computing for data management. By leveraging AWS services, complex tasks such as data ingestion, data cleaning, and storage organization were automated efficiently. The seamless integration between S3 and Glue DataBrew allowed for simplified data preparation and scalable storage management.


Employee Satisfaction and Engagement Analysis using AWS Cloud Services

Project Description

The project aimed to analyze employee satisfaction and performance metrics by designing and implementing a data analytics pipeline within the AWS ecosystem. This involved transforming raw HR datasets into a reliable Single Source of Truth (SST) to support descriptive, diagnostic, and prescriptive analysis aligned with business objectives.

Objective

The primary objective was to analyze, design, implement, and evaluate a data pipeline solution to address the business question:

“What are the main factors impacting employee satisfaction and engagement?”

Through descriptive, diagnostic, and prescriptive analysis, the project aimed to provide reliable metrics to support HR decision-making.

Dataset

The project utilized two key datasets provided by the HR Operations team: Employee Surveys and Performance Reviews.

Both datasets were delivered in CSV, with a quarterly update frequency.
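Conceptually, building the Single Source of Truth joins the two quarterly CSVs on an employee identifier so that sentiment and performance can be correlated. A minimal pandas sketch of that ETL step, with column names as assumptions:

```python
import pandas as pd

# Hypothetical extracts of the two quarterly CSV datasets.
surveys = pd.DataFrame({"employee_id": [1, 2, 3], "engagement": [3.5, 4.2, 2.8]})
reviews = pd.DataFrame({"employee_id": [1, 2, 3], "performance": [3.9, 4.5, 3.0]})

# Transform: join into one table, the single source of truth for analysis.
sst = surveys.merge(reviews, on="employee_id", how="inner")

# A first diagnostic metric: correlation between engagement and performance.
corr = sst["engagement"].corr(sst["performance"])
print(round(corr, 2))
```

In AWS the same join would typically run as a Glue job writing to a curated S3 zone, with the output table then feeding descriptive and diagnostic reports.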

Methodology

1. Data Collection and Preparation

2. Cost Evaluation and Monitoring

3. ETL Design

4. ETL Implementation

project-3-engagement-analysis

Tools and Technologies

Deliverables


Cloud Data Lake Ingestion for HR Business Questions: A DAP Approach

Project Description

This project focuses on analyzing, designing, and implementing a data lake solution to support the HR department in addressing the business question: “What are the factors influencing employee satisfaction and engagement?” The solution involved understanding the operational environment, analyzing data lineage, designing data processing pipelines, and implementing the initial ingestion phase in the cloud.

Objective

The primary goal of this project was to create a foundational cloud-based data architecture that allows HR analysts to ingest, store, and prepare datasets for future analysis of employee satisfaction drivers. The work was focused on analyzing the existing environment, identifying data sources, and building a data ingestion solution using AWS S3.
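The initial ingestion phase can be sketched as a small upload script. File names and the "raw/" prefix convention are illustrative assumptions, and the actual boto3 call is shown but commented out so the sketch stays runnable offline.

```python
# Map local HR extracts to raw-zone keys in the data lake bucket.
# File names and the "raw/" prefix convention are illustrative assumptions.
DATASETS = {
    "employee-surveys.csv": "raw/employee-surveys/employee-surveys.csv",
    "performance-reviews.csv": "raw/performance-reviews/performance-reviews.csv",
}

def ingestion_plan(bucket: str) -> list[tuple[str, str, str]]:
    """Return (local_file, bucket, key) triples for one ingestion run."""
    return [(local, bucket, key) for local, key in DATASETS.items()]

plan = ingestion_plan("hr-data-lake")  # hypothetical bucket name
for local, bucket, key in plan:
    print(f"{local} -> s3://{bucket}/{key}")
    # boto3.client("s3").upload_file(local, bucket, key)  # actual upload call
```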

Dataset

The datasets provided by the HR operations team included the Employee Surveys and Performance Reviews datasets.

These datasets contain information relevant to employee sentiment, engagement levels, and performance outcomes.

Methodology

1. Analysis of Operational Environment

project-4-dap-design

2. Root Cause Analysis

project-4-fishbone

3. Solution Architecture Design

project-4-solution

4. Implementation

Tools and Technologies

Deliverables

This project demonstrates the initial steps in building a scalable cloud-based data lake architecture to support advanced HR analytics and employee engagement insights.


AWS Data Analytic Platform for The City of Vancouver

Project Title: Migration and Descriptive Analysis of Cultural Spaces Data for The City of Vancouver

Objective:

The objective of this project is to design, implement, and deploy a cloud-based data analytic platform (DAP) for the City of Vancouver using AWS services. The platform enables descriptive analysis on the ‘Cultural Spaces’ dataset, focusing on analyzing trends in average square footage by ownership type and the distribution of cultural space types over time. The final goal is to assist city officials in decision-making and resource allocation.
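Both descriptive questions (average square footage by ownership type, and the distribution of cultural space types) reduce to groupby aggregations. A sketch on toy rows follows; the column names approximate the open-data schema and may differ from the real dataset's field names.

```python
import pandas as pd

# Toy rows; column names approximate the open-data schema and may differ.
spaces = pd.DataFrame({
    "ownership": ["City", "Private", "City", "Non-profit"],
    "square_feet": [12000, 4500, 8000, 3000],
    "type": ["Theatre", "Studio", "Museum", "Studio"],
})

# Average square footage by ownership type (the first descriptive question).
avg_sqft = spaces.groupby("ownership")["square_feet"].mean()
print(avg_sqft["City"])  # 10000.0

# Distribution of cultural space types (the second descriptive question).
type_counts = spaces["type"].value_counts()
print(type_counts["Studio"])  # 2
```

On AWS, the equivalent aggregation would run over the cataloged dataset (e.g. via Athena or Glue) rather than in-memory.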

Dataset:

The dataset was sourced from the City of Vancouver’s open data portal, specifically selecting the “Cultural Spaces” dataset, which includes:

Methodology:

1. Data Collection and Preparation:

project-5-design

2. Data Profiling:

3. Data Cleaning:

4. Data Cataloging:

5. Data Summarization:

Insights and Findings:

Recommendations:

Tools and Technologies:

Deliverables:

This project demonstrates end-to-end implementation of a cloud-based data analytics pipeline using AWS services, enabling public sector data-driven decision making.


Secure Cloud Data Platform with Governance and Monitoring for City of Vancouver

Cloud Migration and Data Governance for Cultural Spaces Dataset - City of Vancouver

Objective

The objective of this project was to design and implement a Data Analytics Platform (DAP) on AWS to support the City of Vancouver’s migration requirements. The project specifically focused on securely migrating the “Cultural Spaces” dataset, ensuring data governance, quality, monitoring, and resilience in alignment with municipal compliance and operational standards.
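Governance-style quality rules (completeness, uniqueness) of the kind enforced during a migration can be sketched as simple checks. The 95% completeness threshold and the field names below are illustrative assumptions, not the project's actual rules.

```python
import pandas as pd

def quality_report(df: pd.DataFrame, key: str, required: list[str]) -> dict:
    """Evaluate simple completeness and uniqueness rules on a dataset.
    Mirrors the kind of rules a governance layer enforces; the 95%
    completeness threshold is an illustrative assumption."""
    completeness = {c: float(df[c].notna().mean()) for c in required}
    return {
        "unique_key": bool(df[key].is_unique),
        "completeness": completeness,
        "passed": df[key].is_unique and all(v >= 0.95 for v in completeness.values()),
    }

# Toy slice of a cultural-spaces table (field names are assumptions).
df = pd.DataFrame({
    "space_id": [1, 2, 3, 4],
    "name": ["A", "B", None, "D"],
    "ownership": ["City", "Private", "City", "Non-profit"],
})
report = quality_report(df, key="space_id", required=["name", "ownership"])
print(report["passed"])  # False: "name" is only 75% complete
```

Failing records would typically be quarantined to a separate S3 prefix rather than loaded into the trusted zone.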

Dataset

The dataset used for this project was the City of Vancouver’s “Cultural Spaces” dataset, containing information about cultural venues within the city. The dataset included the following key features:

Methodology

1. Data Collection and Preparation
2. Data Governance
3. Data Monitoring

Tools and Technologies

Deliverables

This project demonstrates secure cloud data migration, effective data governance practices, and proactive monitoring solutions, providing the City of Vancouver with a resilient and compliant data analytics infrastructure.


AWS Deployment and Service Models

Case Study 1: Traditional Computing Model vs Cloud Computing Model

Explanation of Results:
In this case study, I compared traditional on-premises IT infrastructure with cloud computing. Traditional models require significant capital investment, manual hardware management, and ongoing maintenance. In contrast, AWS cloud computing offers on-demand scalability, consumption-based pricing, and fully managed services. This analysis demonstrated how cloud computing allows organizations to optimize costs, increase agility, and focus on business innovation rather than infrastructure management.

case-study-1-comp-model
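The cost contrast can be made concrete with back-of-the-envelope arithmetic. Every figure below is illustrative, not from the case study.

```python
# Illustrative figures only, not from the case study.
# Traditional model: buy hardware up front and amortize it.
server_capex = 12_000          # purchase cost, USD
lifetime_years = 4
annual_maintenance = 1_500     # power, cooling, admin time

traditional_annual = server_capex / lifetime_years + annual_maintenance

# Cloud model: pay only for hours actually used.
hourly_rate = 0.20             # hypothetical on-demand instance rate
hours_per_year = 2_000         # workload runs business hours, not 24/7

cloud_annual = hourly_rate * hours_per_year

print(traditional_annual)  # 4500.0
print(cloud_annual)        # 400.0
```

The gap widens further for bursty workloads, since the on-premises server must be sized for peak load while cloud capacity scales down when idle.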

Case Study 2: Cloud Deployment Models

Explanation of Results:
I analyzed the different AWS cloud deployment models:

For my projects, I implemented a Public Cloud deployment where all services were provisioned directly within AWS. This provided simplicity, scalability, and cost efficiency without needing any on-premises infrastructure.

case-study-2-comp-deploy

Case Study 3: Cloud Service Models

Explanation of Results:
This case study focused on understanding the different AWS service models:

My projects used a combination of these models to build scalable data pipelines and analytics platforms fully managed in AWS, showcasing how service model selection supports flexible architecture design.

case-study-3-cloud-svc-model


AWS Cost Analysis

Case Study 4: Total Cost of Ownership — Delaware North

Explanation of Results:
I analyzed the Delaware North case, where migrating to AWS significantly reduced total infrastructure costs. The shift from capital-intensive on-premises hardware to flexible cloud services resulted in operational cost savings, resource optimization, and scalability. This case highlighted the financial advantages of cloud migration based on AWS’s Total Cost of Ownership (TCO) framework.

case-study-4-total-cost

Case Study 5: AWS Pricing Calculator

Explanation of Results:
Using AWS Pricing Calculator, I estimated projected costs for my data lake projects. For example, HR dataset ingestion was calculated at approximately $6.01 USD annually based on S3 storage with quarterly uploads. This tool enabled me to forecast costs accurately, assess financial feasibility, and plan resource usage effectively before deployment.

case-study-5-aws-est
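The kind of estimate the Pricing Calculator produces follows directly from S3's pricing formula. The rates and data sizes below are illustrative assumptions, so this sketch does not reproduce the $6.01 figure exactly.

```python
# Illustrative S3 Standard cost model; rates and sizes are assumptions,
# not the actual inputs behind the $6.01 estimate.
storage_gb = 20                 # average data held in the bucket
price_per_gb_month = 0.023      # approximate S3 Standard rate, us-east-1
put_requests = 4 * 50           # quarterly uploads, ~50 objects each
price_per_1000_puts = 0.005

annual_storage = storage_gb * price_per_gb_month * 12
annual_requests = put_requests / 1000 * price_per_1000_puts
annual_total = annual_storage + annual_requests

print(round(annual_total, 2))  # storage dominates; request costs are negligible
```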

Case Study 6: AWS Support Plans

Explanation of Results:
I reviewed AWS Support Plan options: Basic, Developer, Business, and Enterprise. For production workloads, the Business Support Plan would provide 24/7 technical support, faster response times, infrastructure event management, and architectural guidance. This evaluation showed how selecting appropriate support levels can ensure operational continuity and effective issue resolution.

case-study-6-support-plan


AWS Global Infrastructure

Explanation of Results:

The project was deployed in the us-east-1 (N. Virginia) AWS region, leveraging AWS’s global infrastructure for high availability and resilience. This region provided multiple Availability Zones for fault tolerance, low-latency resource access, and geographic redundancy. This case study demonstrated how AWS’s global infrastructure supports business continuity, disaster recovery, and scalability.

case-study-7-globl-inf


AWS IAM

Case Study 8: Who Is Responsible

Explanation of Results:

I analyzed AWS’s Shared Responsibility Model, which divides security responsibilities: AWS is responsible for security of the cloud (physical infrastructure, hardware, and managed service software), while the customer is responsible for security in the cloud (data, identity and access management, and resource configuration).

case-study-8-responsible

IAM Practice Lab 1

Explanation of Results:

Through hands-on lab exercises, I configured IAM users, groups, and roles with custom policies. I applied the principle of least privilege, granting users only necessary access permissions. These practical tasks demonstrated how IAM controls secure resource access, prevent unauthorized actions, and support compliance within cloud environments.

case-study-9-iam-lab
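A least-privilege policy of the kind attached in the lab can be sketched as a policy document. The bucket name is hypothetical, and the create_policy call is commented out so the sketch runs offline.

```python
import json

# Least-privilege policy: read-only access to a single hypothetical bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::hr-data-lake",    # bucket itself (ListBucket)
                "arn:aws:s3:::hr-data-lake/*",  # objects inside (GetObject)
            ],
        }
    ],
}

print(json.dumps(policy, indent=2))
# Attaching it would be one IAM call:
# import boto3
# boto3.client("iam").create_policy(
#     PolicyName="HrDataLakeReadOnly", PolicyDocument=json.dumps(policy))
```

Scoping `Resource` to one bucket and allowing only read actions is the essence of least privilege: anything not explicitly allowed is denied.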


AWS VPC

Explanation of Results:

In this lab activity, I designed a custom Virtual Private Cloud (VPC) with public and private subnets, route tables, internet gateways, and security groups. EC2 instances were deployed inside the VPC to simulate secure cloud networking. This exercise provided valuable experience in network isolation, controlled traffic flows, and the foundational principles of secure VPC design for cloud environments.

case-study-10-build-vpc
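The subnet layout in the lab follows standard CIDR math, which Python's ipaddress module can illustrate: a VPC block splits into smaller public and private subnets. The CIDR ranges here are illustrative, not necessarily the lab's exact values.

```python
import ipaddress

# Illustrative VPC CIDR block; the lab's actual ranges may differ.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the /16 into /24 subnets: first one public, second one private.
subnets = list(vpc.subnets(new_prefix=24))
public_subnet, private_subnet = subnets[0], subnets[1]

print(public_subnet)                # 10.0.0.0/24
print(private_subnet)               # 10.0.1.0/24
print(public_subnet.num_addresses)  # 256 addresses per /24
# AWS reserves 5 addresses per subnet, leaving 251 usable.
```

In the VPC itself, the public subnet's route table points 0.0.0.0/0 at the internet gateway, while the private subnet has no such route.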


AWS Lambda

Explanation of Results:

AWS Lambda was explored through lab activities where I created serverless functions triggered by S3 events. These functions automated simple tasks such as generating notifications and processing file uploads. This exercise introduced me to event-driven computing, serverless architecture benefits, and integration with CloudWatch for monitoring Lambda executions.

case-study-11-lambda
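A minimal handler of the kind used in the lab reads the bucket and key from the S3 event payload and returns a notification message. This is a sketch, not the lab's exact code, and it can be invoked locally with a trimmed-down sample event.

```python
def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event; reports what was uploaded."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    message = f"New upload: s3://{bucket}/{key}"
    print(message)  # visible in CloudWatch Logs when run in Lambda
    return {"statusCode": 200, "body": message}

# Local invocation with a trimmed-down sample S3 event:
sample_event = {
    "Records": [{"s3": {"bucket": {"name": "hr-data-lake"},
                        "object": {"key": "raw/surveys.csv"}}}]
}
print(lambda_handler(sample_event, None)["body"])
```

Because the handler only depends on the event dict, it can be unit-tested without deploying, which is one of the practical benefits of the event-driven model.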


AWS EBS

Explanation of Results:

Amazon Elastic Block Store (EBS) volumes were attached to EC2 instances to provide persistent storage during data processing tasks. EBS offered scalable storage performance with snapshot capabilities for backup and recovery. These activities demonstrated how EBS supports compute workloads requiring durable, high-performance storage in cloud environments.

case-study-12-ebs-lab
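Snapshot automation for EBS typically selects volumes by tag. The sketch below operates on simulated describe_volumes-style output (volume IDs and tag names are hypothetical), with the actual create_snapshot call commented out so it runs offline.

```python
def volumes_to_snapshot(volumes: list[dict], tag_key: str = "Backup") -> list[str]:
    """Pick volume IDs whose tags mark them for backup.
    Input mimics the shape of EC2 describe_volumes output."""
    selected = []
    for vol in volumes:
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
        if tags.get(tag_key) == "true":
            selected.append(vol["VolumeId"])
    return selected

# Simulated API response; IDs and tags are hypothetical.
volumes = [
    {"VolumeId": "vol-111", "Tags": [{"Key": "Backup", "Value": "true"}]},
    {"VolumeId": "vol-222", "Tags": [{"Key": "Backup", "Value": "false"}]},
    {"VolumeId": "vol-333"},
]
print(volumes_to_snapshot(volumes))  # ['vol-111']
# for vol_id in volumes_to_snapshot(volumes):
#     boto3.client("ec2").create_snapshot(VolumeId=vol_id)
```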