labs & assignments
EMSE 6992 Assignments
Assignment0
Opens N/A and Due N/A
introduction to Python, Jupyter Notebooks, GitHub, and Portfolios
-
Lab: Installing and Editing Portfolios
In this class, we will be using a variety of tools that require some initial configuration, including Python, Jupyter Notebooks, GitHub, and your portfolio. To ensure everything goes smoothly moving forward, we will set up the majority of those tools during this in-class activity.
Assignment | Repository |
---|---|
assignment0 | assignment0 repository |
Assignment1
Opens 9/10 and Due 10/1
data manipulation and aggregation
-
Lab: Exploratory Data Analysis for Classification using Pandas and Matplotlib
The goals of Part 1 of this assignment are: to practice data manipulation with Pandas; to develop intuition about the interplay of precision, accuracy, and bias when making predictions; and to better understand how election forecasts are constructed.
visualization
Applying different visualization techniques to Part 1
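A minimal sketch of the Pandas-plus-Matplotlib workflow this part exercises; the polling data below is invented purely for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical polling data, invented purely for illustration
polls = pd.DataFrame({
    "state": ["OH", "OH", "FL", "FL", "PA", "PA"],
    "margin": [1.2, -0.5, 0.8, 2.1, 3.0, 2.4],
})

# Pandas aggregation: mean predicted margin per state
by_state = polls.groupby("state")["margin"].mean()

# Matplotlib visualization of the aggregate
by_state.plot(kind="bar")
plt.ylabel("mean margin (points)")
plt.title("Average poll margin by state")
plt.show()
```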
Assignment | Repository |
---|---|
assignment1 | assignment1 repository |
Assignment2
Opens 10/1 and Due 10/22
scientific computing
-
Lab: Scikit-Learn, Regression, PCA
The goal of this assignment is to introduce Scikit-Learn and its functions, with a focus on regression and PCA. All objects within scikit-learn share a uniform basic API consisting of three complementary interfaces: an estimator interface for building and fitting models, a predictor interface for making predictions, and a transformer interface for converting data. The estimator interface is at the core of the library. It defines the instantiation mechanisms of objects and exposes a fit method for learning a model from training data. All supervised and unsupervised learning algorithms (e.g., for classification, regression, or clustering) are offered as objects implementing this interface. Machine learning tasks like feature extraction, feature selection, and dimensionality reduction are also provided as estimators.
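A minimal sketch of the three interfaces, using a toy linear dataset invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Toy data: y = 3x + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X.ravel() + rng.normal(scale=1.0, size=50)

# Estimator interface: fit() learns a model from training data
model = LinearRegression().fit(X, y)

# Predictor interface: predict() maps new inputs to outputs
print(model.predict([[5.0]]))

# Transformer interface: fit() learns parameters, transform() converts data
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
```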
statistical analysis
-
Lab: Bias, Variance, Cross-Validation
-
Lab: Bayes, Linear Regression, and Metropolis Sampling
In this lab, and in homework 2, we alluded to cross-validation with a weak explanation about finding the right hyperparameters, some of which were regularization parameters. We will have more to say about regularization soon, but let's tackle the reasons we do cross-validation. The bottom line is finding the model with an appropriate mix of bias and variance. We usually want to sit at the sweet spot of the tradeoff between the two: be simple, but no simpler than necessary. We do not want a model with too much variance, as it would not generalize well; this phenomenon is called overfitting, and there is no point doing prediction if we can't generalize well. At the same time, if we have too much bias in our model, we will systematically under- or over-predict values and miss most predictions; this is known as underfitting. Cross-validation provides us a way to find the hyperparameters of our model such that we achieve the balance point.
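A minimal sketch of choosing a regularization hyperparameter by cross-validation; the data and the alpha grid are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Toy data, invented for illustration: a noisy linear relationship
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = 2 * X.ravel() + rng.normal(scale=2.0, size=100)

# Score each regularization strength by 5-fold cross-validation;
# the alpha with the best held-out score balances bias and variance.
for alpha in [0.01, 0.1, 1.0, 10.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(f"alpha={alpha}: mean CV score = {scores.mean():.3f}")
```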
Finally, this lab will address the Bayesian formulation of regression, the posterior predictive distribution, and Markov chain Monte Carlo (Metropolis-Hastings) sampling.
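A minimal sketch of the Metropolis algorithm, written here as a random-walk sampler targeting a standard normal; the target and step size are chosen purely for illustration:

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        proposal = x + rng.normal(scale=step)  # symmetric proposal
        # Accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x
    return samples

# Target: standard normal, via its log density up to a constant
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_samples=5000)
print(draws.mean(), draws.std())  # should land near 0 and 1
```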
Assignment | Repository |
---|---|
assignment2 | assignment2 repository |
Assignment3
Opens 10/22 and Due 11/12
machine learning part 1
Classification
Identifying the category to which an object belongs.
Applications: Spam detection, Image recognition.
Algorithms: SVM, nearest neighbors, random forest, …
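A minimal sketch using one of the listed algorithms, an SVM on scikit-learn's built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Classify iris flowers into species with a support vector machine
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # held-out accuracy
```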
Regression
Predicting a continuous-valued attribute associated with an object.
Applications: Drug response, Stock prices.
Algorithms: SVR, ridge regression, Lasso, …
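A minimal sketch using ridge regression on scikit-learn's built-in diabetes dataset:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Predict a continuous disease-progression score with ridge regression
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = Ridge(alpha=1.0).fit(X_train, y_train)
print(reg.score(X_test, y_test))  # held-out R^2
```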
Clustering
Automatic grouping of similar objects into sets.
Applications: Customer segmentation, Grouping experiment outcomes.
Algorithms: k-Means, spectral clustering, mean-shift, …
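A minimal sketch using k-means on synthetic data invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points around three centers, invented for illustration
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0, 5, 10)])

# k-means discovers the three groups without any labels
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
```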
machine learning part 2
Dimensionality reduction
Reducing the number of random variables to consider.
Applications: Visualization, Increased efficiency.
Algorithms: PCA, feature selection, non-negative matrix factorization, …
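A minimal sketch using PCA to project the iris features onto two dimensions:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Project the four iris features onto two principal components
X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape, pca.explained_variance_ratio_)
```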
Model selection
Comparing, validating and choosing parameters and models.
Goal: Improved accuracy via parameter tuning
Modules: grid search, cross validation, metrics.
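A minimal sketch of tuning an SVM with grid search and cross-validation; the parameter grid is chosen purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Exhaustively compare SVM hyperparameters with cross-validation
X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}
grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(grid.best_params_, grid.best_score_)
```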
Preprocessing
Feature extraction and normalization.
Application: Transforming input data such as text for use with machine learning algorithms.
Modules: preprocessing, feature extraction.
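A minimal sketch of both pieces, using a toy corpus and toy numeric features invented for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Feature extraction: turn a toy text corpus into a bag-of-words matrix
docs = ["the quick brown fox", "the lazy dog"]
print(CountVectorizer().fit_transform(docs).toarray())

# Normalization: scale toy numeric features to zero mean, unit variance
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
print(StandardScaler().fit_transform(X))
```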
Assignment | Repository |
---|---|
assignment3 | assignment3 repository |
Assignment4
Opens 11/12 and Due 12/3
portfolio implementation
-
Tutorial: Portfolio Template Modifications
The goal of this assignment is for each student to have all tabs/links within the portfolio properly updated with their pertinent information (Resume, CV, bio, contact info, etc.). The student must also incorporate links to all assignments in their portfolio and update the Resources and Toolkit sections with examples using the data and Python libraries.
network analysis
-
{Add description of an example assignment here}
big data analytics
-
{Add description of an example assignment here}
Assignment | Repository |
---|---|
assignment4 | assignment4 repositiory |
Extra Credit Assignment
web scraping
-
Lab: Web Scraping - Part 1
-
{Add description of an example assignment here}
sampling and text processing
-
Lab: Sampling and Text Processing
{Add description of an example assignment here}
Assignment | Repository |
---|---|
Extra Credit | Extra Credit repository |
EMSE 6992 Labs
lab assignments
- Week 1: Installing and Editing Portfolios - Part 1
- Week 2: Installing and Editing Portfolios - Part 2
- Week 3: Exploratory Data Analysis for Classification using Pandas and Matplotlib - Part 1
- Week 4: Web Scraping
- Week 5: Exploratory Data Analysis for Classification using Pandas and Matplotlib - Part 2
- Week 6: MapReduce
- Week 7: Scikit-Learn, Regression, PCA
- Week 8: Bayes, Linear Regression, and Metropolis Sampling
- Week 9: Sampling and Text Processing
- Week 10: Support Vector Machines and Neural Networks
- Week 11: Bias, Variance, Cross-Validation - Part 1
- Week 12: Bias, Variance, Cross-Validation - Part 2
- Week 13: Holiday
- Week 14: Networks
- Week 15: Presentations