As a DATA SCIENTIST and ANALYST, I am passionate about transforming raw data into actionable insights that drive informed decision-making and propel business and technology forward. Building on a solid background in Data Science and Analytics, I apply sophisticated methods in Machine Learning, NLP, and Clustering, alongside skills in Python programming, SQL database management, and data visualization with tools like Power BI, to display patterns and predictions that matter. I am continuously learning and evolving, currently deepening my expertise through a Master's degree in Computer Science (Machine Learning). My work reflects a commitment to excellence, supported by a track record of successful projects that showcase my ability to solve problems and communicate insights effectively. Explore my portfolio to see how I apply data science to real-world challenges, and feel free to connect with me to discuss the future of technology using data.
You can also view or download my resume by clicking the button above.
This platform is designed to enhance job matching and identify skill gaps using AI and machine learning. It integrates tools to analyze résumés and job descriptions, predicting job categories and matching candidates to suitable roles.
- Languages & Tools: Python, MySQL, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: The platform employs AI and NLP techniques, such as text classification with LinearSVC, job category prediction model, document embedding using Doc2Vec, data preprocessing, tokenization, feature extraction, data cleaning, and visualization.
- Evaluation Metrics: Precision, recall, and F1-score to assess model performance.
- User Interaction: Candidates can upload résumés, search for jobs, and receive skill enhancement recommendations. Employers can post jobs and view candidate matches.
Results: The platform effectively matches candidates with relevant job opportunities, helping both job seekers and employers streamline their recruitment processes.
Completed a job simulation involving AI-powered financial chatbot development for BCG's GenAI Consulting team.
- Languages & Tools: Jupyter Notebook, Python, pandas, Excel, Word.
- Core Functionality: Processed and analyzed financial data from 10-K and 10-Q reports using rule-based logic to create a user-friendly financial insights chatbot. Utilized pandas for data manipulation and Excel for data entry and organization.
- Communication: Documented and presented findings in a structured Word report, detailing the chatbot's functionality and its application for financial data interpretation.
Results: Developed a chatbot capable of delivering user-friendly financial insights, demonstrating expertise in data analysis, AI integration, and professional reporting.
Completed a job simulation focused on analyzing datasets and providing actionable insights as a Data Analyst for a hypothetical social media client.
- Languages & Tools: Excel, PowerPoint.
- Core Functionality: Processed, structured, and analyzed seven datasets to uncover content trends and support strategic decisions. Created PowerPoint decks and recorded video presentations to communicate findings effectively.
- Communication: Delivered actionable recommendations and presented insights to internal stakeholders, showcasing data-driven decision-making skills.
Results: Gained hands-on experience in data cleaning, trend analysis, and professional client communication, demonstrating strong analytical and presentation skills.
Completed a job simulation where I enhanced my Power BI skills to better understand client needs in data visualization. Developed interactive dashboards that communicated KPIs effectively and provided actionable insights for decision-making.
- Languages & Tools: Power BI, Excel, Word.
- Core Functionality: Designed Power BI dashboards to display HR data, focusing on gender-related KPIs. Created reports that identified the root causes of gender balance issues at the executive level, driving data-driven decisions.
- Communication: Delivered valuable insights through concise, informative email communication with engagement partners, showcasing strong analytical problem-solving and reporting skills.
Results: The simulation demonstrated expertise in responding to client requests, delivering well-designed solutions, and improving data visualization outcomes.
Processed and refined the FIFA23 Official Dataset containing player data from 2017 to 2023. The cleaned dataset provides enhanced usability for programmers, analysts, and enthusiasts, ensuring better data integrity for various programs and research applications.
- Languages & Tools: Python, Pandas, Matplotlib, Seaborn, Kaggle, Excel.
- Core Functionality: Cleaned and preprocessed raw player data, handling missing values, inconsistent formats, and irrelevant records. Designed visualizations to showcase key insights, such as player demographics, performance metrics, and market trends. Published the cleaned dataset back on Kaggle for global access.
- Communication: Provided a detailed dataset description and context on Kaggle to help users understand its structure and potential applications, fostering collaboration within the data community.
Results: Delivered a comprehensive, ready-to-use dataset that enhanced data accessibility and usability, empowering the programming and analytics community to drive meaningful insights and innovative solutions.
This project aims to predict insurance claims based on various factors related to residential buildings. The dataset includes features such as the year of observation, period of insurance, building type, and geographical location.
- Languages & Tools: Python, XGBoost, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: Data analysis, cleaning, preprocessing, visualization (using Seaborn and Matplotlib), feature engineering, model development with XGBoost, cross-validation, and hyperparameter tuning.
- Evaluation Metric: ROC-AUC to assess model performance.
- User Interaction: Users can input data through the Streamlit interface to predict potential insurance claims.
Results: The model accurately predicts insurance claims, providing valuable insights for insurance companies and policyholders.
This project focuses on predicting bank account ownership based on demographic and socioeconomic factors. The dataset includes features like age, gender, education level, marital status, and cellphone access.
- Languages & Tools: Python, XGBoost, Streamlit, Jupyter Notebook, GitHub.
- Core Functionality: Data analysis, cleaning, preprocessing, visualization (using Seaborn and Matplotlib), feature engineering, model development with XGBoost, cross-validation, and hyperparameter tuning.
- Evaluation Metrics: Accuracy and F1-score to assess model performance.
- User Interaction: Users can input data through the Streamlit interface to predict bank account ownership.
Results: The model effectively predicts bank account ownership, helping financial institutions better understand and serve their target populations.
Data Science Intern, March – August 2023
Data Analyst Intern, February 2025 – December 2025
Program Data Analyst, January 2026 – Present
Feel free to contact and connect with me on my social media platforms!
Phone: +234-905-910-1267
Email: oghenekes73@gmail.com