SOPHIA UMUKORO

AI-Powered Job-Match and Skill Gap Analysis Web Platform

This platform is designed to enhance job matching and identify skill gaps using AI and machine learning. It integrates tools to analyze résumés and job descriptions, predicting job categories and matching candidates to suitable roles.

- Languages & Tools: Python, MySQL, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: The platform employs AI and NLP techniques, such as text classification with LinearSVC, job category prediction model, document embedding using Doc2Vec, data preprocessing, tokenization, feature extraction, data cleaning, and visualization.
- Evaluation Metrics: Precision, recall, and F1-score to assess model performance.
- User Interaction: Candidates can upload résumés, search for jobs, and receive skill enhancement recommendations. Employers can post jobs and view candidate matches.

Results: The platform effectively matches candidates with relevant job opportunities, helping both job seekers and employers streamline their recruitment processes.

BCG GenAI Job Simulation

Completed a job simulation involving AI-powered financial chatbot development for BCG's GenAI Consulting team.

- Languages & Tools: Jupyter Notebook, Python, pandas, Excel, Word.
- Core Functionality: Processed and analyzed financial data from 10-K and 10-Q reports using rule-based logic to create a user-friendly financial insights chatbot. Utilized pandas for data manipulation and Excel for data entry and organization.
- Communication: Documented and presented findings in a structured Word report, detailing the chatbot's functionality and its application for financial data interpretation.

Results: Developed a chatbot capable of delivering user-friendly financial insights, demonstrating expertise in data analysis, AI integration, and professional reporting.

Accenture North America Data Analytics and Visualization Job Simulation

Completed a job simulation focused on analyzing datasets and providing actionable insights as a Data Analyst for a hypothetical social media client.

- Languages & Tools: Excel, PowerPoint.
- Core Functionality: Processed, structured, and analyzed seven datasets to uncover content trends and support strategic decisions. Created PowerPoint decks and recorded video presentations to communicate findings effectively.
- Communication: Delivered actionable recommendations and presented insights to internal stakeholders, showcasing data-driven decision-making skills.

Results: Gained hands-on experience in data cleaning, trend analysis, and professional client communication, demonstrating strong analytical and presentation skills.

PwC Switzerland Power BI Job Simulation

Completed a job simulation where I enhanced my Power BI skills to better understand client needs in data visualization. Developed interactive dashboards that communicated KPIs effectively and provided actionable insights for decision-making.

- Languages & Tools: Power BI, Excel, Word.
- Core Functionality: Designed Power BI dashboards to display HR data, focusing on gender-related KPIs. Created reports that identified the root causes of gender balance issues at the executive level, driving data-driven decisions.
- Communication: Delivered valuable insights through concise, informative email communication with engagement partners, showcasing strong analytical problem-solving and reporting skills.

Results: The simulation demonstrated expertise in responding to client requests, delivering well-designed solutions, and improving data visualization outcomes.

FIFA23 Official Dataset of Players Info (Kaggle)

Processed and refined the FIFA23 Official Dataset containing player data from 2017 to 2023. The cleaned dataset provides enhanced usability for programmers, analysts, and enthusiasts, ensuring better data integrity for various programs and research applications.

- Languages & Tools: Python, Pandas, Matplotlib, Seaborn, Kaggle, Excel.
- Core Functionality: Cleaned and preprocessed raw player data, handling missing values, inconsistent formats, and irrelevant records. Designed visualizations to showcase key insights, such as player demographics, performance metrics, and market trends. Published the cleaned dataset back on Kaggle for global access.
- Communication: Provided a detailed dataset description and context on Kaggle to help users understand its structure and potential applications, fostering collaboration within the data community.

Results: Delivered a comprehensive, ready-to-use dataset that enhanced data accessibility and usability, empowering the programming and analytics community to drive meaningful insights and innovative solutions.

Insurance Claim Prediction Model

Visit GitHub Repo

This project aims to predict insurance claims based on various factors related to residential buildings. The dataset includes features such as the year of observation, period of insurance, building type, and geographical location.

- Languages & Tools: Python, XGBoost, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: Data analysis, cleaning, preprocessing, visualization (using Seaborn and Matplotlib), feature engineering, model development with XGBoost, cross-validation, and hyperparameter tuning.
- Evaluation Metric: ROC-AUC to assess model performance.
- User Interaction: Users can input data through the Streamlit interface to predict potential insurance claims.

Results: The model accurately predicts insurance claims, providing valuable insights for insurance companies and policyholders.

Bank Account Ownership Prediction Model

Visit GitHub Repo

This project focuses on predicting bank account ownership based on demographic and socioeconomic factors. The dataset includes features like age, gender, education level, marital status, and cellphone access.

- Languages & Tools: Python, XGBoost, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: Data analysis, cleaning, preprocessing, visualization (using Seaborn and Matplotlib), feature engineering, model development with XGBoost, cross-validation, and hyperparameter tuning.
- Evaluation Metrics: Accuracy and F1-score to assess model performance.
- User Interaction: Users can input data through the Streamlit interface to predict bank account ownership.

Results: The model effectively predicts bank account ownership, helping financial institutions better understand and serve their target populations.

Projects

AI-Powered Job-Match and Skill Gap Analysis Web Platform

BCG GenAI Job Simulation

Accenture North America Data Analytics and Visualization Job Simulation

PwC Switzerland Power BI Job Simulation

FIFA23 Official Dataset of Players Info (Kaggle)

Insurance Claim Prediction Model

Bank Account Ownership Prediction Model

Work Experience

National Centre of Artificial Intelligence and Robotics (NCAIR), NITDA

Solina Centre for International Development and Research (SCIDaR)

Solina Centre for International Development and Research (SCIDaR)

Info