SOPHIA UMUKORO'S PORTFOLIO
Let's Connect

As a DATA SCIENTIST and ANALYST, I am passionate about transforming raw data into actionable insights that drive informed decision-making and propel business and technology forward. Building on a solid background in Data Science and Analytics, I apply sophisticated methods in Machine Learning, NLP, and Clustering, alongside skills in Python programming, SQL database management, and data visualization with tools like Power BI, to display patterns and predictions that matter. I am continuously learning and evolving, currently deepening my expertise through a Master's degree in Computer Science (Machine Learning). My work reflects a commitment to excellence, supported by a track record of successful projects that showcase my ability to solve problems and communicate insights effectively. Explore my portfolio to see how I apply data science to real-world challenges, and feel free to connect with me to discuss the future of technology using data.

View Resume

You can also view or download my resume by clicking the button above.

Projects

Project Image

AI-Powered Job-Match and Skill Gap Analysis Web Platform

Visit GitHub Repo

This platform is designed to enhance job matching and identify skill gaps using AI and machine learning. It integrates tools to analyze résumés and job descriptions, predicting job categories and matching candidates to suitable roles.

- Languages & Tools: Python, MySQL, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: The platform employs AI and NLP techniques, such as text classification with LinearSVC, job category prediction model, document embedding using Doc2Vec, data preprocessing, tokenization, feature extraction, data cleaning, and visualization.
- Evaluation Metrics: Precision, recall, and F1-score to assess model performance.
- User Interaction: Candidates can upload résumés, search for jobs, and receive skill enhancement recommendations. Employers can post jobs and view candidate matches.

Results: The platform effectively matches candidates with relevant job opportunities, helping both job seekers and employers streamline their recruitment processes.

BCG GenAI Job Simulation Image

BCG GenAI Job Simulation

Completed a job simulation involving AI-powered financial chatbot development for BCG's GenAI Consulting team.

- Languages & Tools: Jupyter Notebook, Python, pandas, Excel, Word.
- Core Functionality: Processed and analyzed financial data from 10-K and 10-Q reports using rule-based logic to create a user-friendly financial insights chatbot. Utilized pandas for data manipulation and Excel for data entry and organization.
- Communication: Documented and presented findings in a structured Word report, detailing the chatbot's functionality and its application for financial data interpretation.

Results: Developed a chatbot capable of delivering user-friendly financial insights, demonstrating expertise in data analysis, AI integration, and professional reporting.

Accenture Project Image

Accenture North America Data Analytics and Visualization Job Simulation

Completed a job simulation focused on analyzing datasets and providing actionable insights as a Data Analyst for a hypothetical social media client.

- Languages & Tools: Excel, PowerPoint.
- Core Functionality: Processed, structured, and analyzed seven datasets to uncover content trends and support strategic decisions. Created PowerPoint decks and recorded video presentations to communicate findings effectively.
- Communication: Delivered actionable recommendations and presented insights to internal stakeholders, showcasing data-driven decision-making skills.

Results: Gained hands-on experience in data cleaning, trend analysis, and professional client communication, demonstrating strong analytical and presentation skills.

PwC Project Image

PwC Switzerland Power BI Job Simulation

Completed a job simulation where I enhanced my Power BI skills to better understand client needs in data visualization. Developed interactive dashboards that communicated KPIs effectively and provided actionable insights for decision-making.

- Languages & Tools: Power BI, Excel, Word.
- Core Functionality: Designed Power BI dashboards to display HR data, focusing on gender-related KPIs. Created reports that identified the root causes of gender balance issues at the executive level, driving data-driven decisions.
- Communication: Delivered valuable insights through concise, informative email communication with engagement partners, showcasing strong analytical problem-solving and reporting skills.

Results: The simulation demonstrated expertise in responding to client requests, delivering well-designed solutions, and improving data visualization outcomes.

Kaggle Project Image

FIFA23 Official Dataset of Players Info (Kaggle)

Processed and refined the FIFA23 Official Dataset containing player data from 2017 to 2023. The cleaned dataset provides enhanced usability for programmers, analysts, and enthusiasts, ensuring better data integrity for various programs and research applications.

- Languages & Tools: Python, Pandas, Matplotlib, Seaborn, Kaggle, Excel.
- Core Functionality: Cleaned and preprocessed raw player data, handling missing values, inconsistent formats, and irrelevant records. Designed visualizations to showcase key insights, such as player demographics, performance metrics, and market trends. Published the cleaned dataset back on Kaggle for global access.
- Communication: Provided a detailed dataset description and context on Kaggle to help users understand its structure and potential applications, fostering collaboration within the data community.

Results: Delivered a comprehensive, ready-to-use dataset that enhanced data accessibility and usability, empowering the programming and analytics community to drive meaningful insights and innovative solutions.

Project Image

Insurance Claim Prediction Model

Visit GitHub Repo

This project aims to predict insurance claims based on various factors related to residential buildings. The dataset includes features such as the year of observation, period of insurance, building type, and geographical location.

- Languages & Tools: Python, XGBoost, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: Data analysis, cleaning, preprocessing, visualization (using Seaborn and Matplotlib), feature engineering, model development with XGBoost, cross-validation, and hyperparameter tuning.
- Evaluation Metric: ROC-AUC to assess model performance.
- User Interaction: Users can input data through the Streamlit interface to predict potential insurance claims.

Results: The model accurately predicts insurance claims, providing valuable insights for insurance companies and policyholders.

Project Image

Bank Account Ownership Prediction Model

Visit GitHub Repo

This project focuses on predicting bank account ownership based on demographic and socioeconomic factors. The dataset includes features like age, gender, education level, marital status, and cellphone access.

- Languages & Tools: Python, XGBoost, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: Data analysis, cleaning, preprocessing, visualization (using Seaborn and Matplotlib), feature engineering, model development with XGBoost, cross-validation, and hyperparameter tuning.
- Evaluation Metrics: Accuracy and F1-score to assess model performance.
- User Interaction: Users can input data through the Streamlit interface to predict bank account ownership.

Results: The model effectively predicts bank account ownership, helping financial institutions better understand and serve their target populations.

Work Experience

NCAIR Logo

National Centre of Artificial Intelligence and Robotics (NCAIR), NITDA

Data Science Intern, March – August 2023

SCIDaR Logo

Solina Centre for International Development and Research (SCIDaR)

Data Analyst Intern, February 2025 – December 2025

SCIDaR Logo

Solina Centre for International Development and Research (SCIDaR)

Program Data Analyst, January 2026 – Present

Info

Feel free to contact and connect with me on my social media platforms!

Phone: +234-905-910-1267

Email: oghenekes73@gmail.com

Twitter Instagram LinkedIn GitHub