Data Science Portfolio of Chris Westendorf
Repository containing a portfolio of data science projects I completed for academic, self-learning, and hobby purposes.
Contents
-
Projects
- Benchmarking Forecasting Models for Financial Markets: Developed a plug-and-play, end-to-end machine learning pipeline for forecasting daily stock prices, focused primarily on S&P 500 tickers. Empirically surveyed linear time-series models (AR, MA, ARMA, ARIMA), machine learning regressors (GBRT, PM, SVM) cast into a window-based framework, and non-linear models (ARCH, GARCH, and deep neural networks such as LSTMs). The models and ensembles were fed a variety of fundamental, technical, and decomposed features, with the aim of reducing computational load and model complexity while affording interpretability.
- Benchmarking Facial Beauty Predictors: Explored and benchmarked approaches to Facial Beauty Prediction (FBP), the visual recognition task of assessing facial attractiveness in a way that is consistent with human perception: feature engineering, convolution, scale-invariant feature transform (SIFT), bag-of-visual-features term indexing and clustering, convolutional neural networks, transfer learning, support vector regression, random forest regression, passive-aggressive classification, partial-fit (incremental) models, and hyperparameter tuning.
- Covid-19’s Impact on the Value of Homes in LA: Extracted millions of data points from real estate APIs to conduct quantitative and statistical analysis of the housing market as a function of Covid-19 cases: regression, autocorrelation, frequency decomposition, rolling averages, difference-in-differences, and causal inference.
Tools: Dask, Requests, JSON, Pandas, Pickle, Time, Statistics, Aiohttp, NumPy, Altair, Plotly, Datetime, Statsmodels, CV2, TensorFlow, Sklearn, SciPy, PyEMD, XGBoost
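The "window-based framework" idea from the forecasting project can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: it slices a price series into lagged windows and fits a plain linear autoregression by least squares with NumPy. The function names `make_windows`, `fit_linear_ar`, and `predict_next` are hypothetical.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D series into (X, y) pairs: each row of X holds `window`
    consecutive past values and the matching y is the value that follows."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:]
    return X, y


def fit_linear_ar(series, window):
    """Fit an AR(window)-style linear model (intercept + lag weights)
    by ordinary least squares on the windowed data."""
    X, y = make_windows(series, window)
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_next(series, coef):
    """One-step-ahead forecast from the most recent `window` values."""
    window = len(coef) - 1
    last = np.asarray(series[-window:], dtype=float)
    return coef[0] + last @ coef[1:]


if __name__ == "__main__":
    toy = [1, 2, 3, 4, 5, 6, 7, 8]  # synthetic toy series, not market data
    coef = fit_linear_ar(toy, window=2)
    print(round(float(predict_next(toy, coef)), 6))  # ≈ 9.0 for this linear toy series
```

The same `(X, y)` windows can be handed to any scikit-learn regressor (for example a gradient boosted or support vector model) in place of the least-squares step, which is what makes the framing convenient for comparing model families side by side.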
-
Data Manipulation & Mining
- Wrangling Metro Sports: Exposing hidden Unicode characters during data import, writing nested functions for modular and efficient code, and using correlation and t-tests as a convenient excuse for the data task.
- Wrangling API and Location Data: Import a dataset, map neighborhoods, call the Foursquare API for nearby venues, extrapolate and resample data, and create and export a dataframe for downstream use.
- NLP: N-Grams & Markov: Train a model to generate Shakespearean sonnets; train a Hidden Markov Model (HMM) that tags words with their part of speech (POS).
- Time Series Analysis: Using COVID-19 data to explore Seasonal Decomposition, Trend Lines, Weighted Moving Averages (WMA), "Time" Exponential Moving Averages (EMA), Similarity using Euclidean Distance, Dynamic Time Warping (DTW) Cost, Stationarity Tests, Autocorrelation, Partial Autocorrelation, ARMA Forecasting, Vector Autoregression (VAR) Forecasting, and Granger Causality.
- Streaming Data: Using a file of tweets to explore streaming data techniques: Reservoir Sampling, Counting, Bloom Filter, and a Lossy Counter.
Tools: Regex, NumPy, SciPy, Pandas, Requests, DateTime, Collections, GeoPy, Folium, NLTK, Re, Statsmodels, Path, Sklearn, Math
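Reservoir sampling, one of the streaming techniques listed above, can be illustrated with a short sketch of the classic Algorithm R. This is a generic, assumed version rather than the notebook's own code; `reservoir_sample` is a hypothetical name, and any iterable (for example, lines of a tweet file read lazily) can act as the stream.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Algorithm R: keep a uniform random sample of k items from a stream of
    unknown length in a single pass, using O(k) memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1):
            # randint is inclusive, so j is uniform over 0..i.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # A generator stands in for a stream too large to hold in memory.
    print(reservoir_sample(range(1_000_000), 5, seed=42))
```

If the stream is shorter than `k`, the function simply returns every item it saw, which keeps the single-pass guarantee without special-casing the length up front.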
-
Charting & Visualization: Exploratory Data Analysis
- Altair Plotting for Procedural Programming Access: Altair lets you specify exactly how the data is encoded and layered to get the plot you want. This gives a lot of control and, for detailed charts, requires a fair amount of code. Mostly FiveThirtyEight replications: heatmaps, bar charts, line plots, layering & joining.
- Seaborn & Matplotlib with US Visitation Data, Generated Samples, and Student Metrics: Data visualized as Q-Q plots, distribution plots, facet grids, box plots, violin plots, probability densities, bar charts, and line plots.
- Plotly Walkthrough: Line Chart & Scatter Mapbox Tutorial
- Exploratory Data Analysis - Plotly: A quick exploratory data analysis of Strava (fitness app) data: histograms, scatter plots, trendlines, moving regression, violin plots, box plots, and scatter mapbox.
Tools: Pandas, Altair, Seaborn, Matplotlib, Plotly Express, Datetime, NumPy, urllib, SciPy, Math, Watermark
-
Big Data & Code Efficiency
Tools: MapReduce, PySpark, Regex, NLTK, Pandas
Getting In Touch
If you liked what you saw or want to chat about the portfolio, work opportunities, or collaboration, shoot me an email at westendorf {/dot/} chris {/@/} gmail.com