EMSE 6992 Assignments

Assignment0

Opens N/A and Due N/A

introduction to Python, Jupyter Notebooks, GitHub, and Portfolios
  • Lab: Installing and Editing Portfolios

      In this class, we will be using a variety of tools that require some initial configuration, including Python, Jupyter Notebooks, GitHub, and your portfolio. To ensure everything goes smoothly moving forward, we will set up most of those tools in this in-class activity.
    
Assignment Repository
assignment0 repository

Assignment1

Opens 9/10 and Due 10/1

data manipulation and aggregation
visualization

Applying different visualization techniques to Part 1
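The data manipulation and aggregation topic can be illustrated with a minimal pandas sketch. The table is hypothetical (the `region` and `sales` columns are invented for illustration and are not the Part 1 data). It derives a new column and then uses `groupby` to compute a summary for each group:

```python
import pandas as pd

# Hypothetical example data; substitute the actual Part 1 dataset.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west", "north"],
    "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Manipulation: derive a new column from an existing one.
df["sales_k"] = df["sales"] / 1000

# Aggregation: split rows by region, then count, sum and average each group.
summary = df.groupby("region")["sales"].agg(["count", "sum", "mean"])
print(summary)
```

The resulting `summary` table (one row per region) is also a convenient starting point for visualization, for example with `summary["sum"].plot.bar()` when matplotlib is installed.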

Assignment Repository
assignment1 repository

Assignment2

Opens 10/1 and Due 10/22

scientific computing
  • Lab: Scikit-Learn, Regression, PCA

      The goal of this assignment is to introduce scikit-learn and its functions, regression, PCA, and still more regression. All objects within scikit-learn share a uniform basic API consisting of three complementary interfaces: an estimator interface for building and fitting models, a predictor interface for making predictions, and a transformer interface for converting data. The estimator interface is at the core of the library. It defines instantiation mechanisms of objects and exposes a fit method for learning a model from training data. All supervised and unsupervised learning algorithms (e.g., for classification, regression, or clustering) are offered as objects implementing this interface. Machine learning tasks like feature extraction, feature selection, or dimensionality reduction are also provided as estimators.
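A minimal sketch of the three interfaces, assuming scikit-learn and NumPy are installed and using made-up, noise-free toy data: `StandardScaler` plays the transformer role and `LinearRegression` plays the estimator and predictor roles. In scikit-learn a transformer is itself an estimator, so it is fitted with `fit` before `transform` is called.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Toy training data (assumed for illustration): y = 3x + 2 with no noise.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 2

# Transformer interface: fit() learns the column mean and std, transform() applies them.
scaler = StandardScaler()
X_scaled = scaler.fit(X).transform(X)

# Estimator interface: fit() learns model parameters from training data.
model = LinearRegression()
model.fit(X_scaled, y)

# Predictor interface: predict() applies the fitted model to new, identically transformed inputs.
X_new = scaler.transform(np.array([[10.0]]))
pred = model.predict(X_new)
print(pred)  # close to 32.0, since 3 * 10 + 2 = 32
```

In practice the scaler and the model are often chained with `sklearn.pipeline.make_pipeline`, so the same transformation is applied automatically at both fit and predict time.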
    
statistical analysis
  • Lab: Bias, Variance, Cross-Validation
  • Lab: Bayes, Linear Regression, and Metropolis Sampling

      In this lab, and in homework 2, we alluded to cross-validation with a weak explanation about finding the right hyperparameters, some of which were regularization parameters. We will have more to say about regularization soon, but let's tackle the reasons we do cross-validation. The bottom line is finding the model that has an appropriate mix of bias and variance. We usually want to sit at the balance point of the tradeoff between the two: be simple, but no simpler than necessary. We do not want a model with too much variance: it would not generalize well. This phenomenon is also called overfitting, and there is no point doing prediction if we can't generalize well. At the same time, if we have too much bias in our model, we will systematically underpredict or overpredict values and miss most predictions. This is also known as underfitting. Cross-validation provides a way to find the "hyperparameters" of our model that achieve this balance point.
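To make the bias-variance tradeoff concrete, here is a small sketch that uses 5-fold cross-validation to compare polynomial models of increasing degree on synthetic noisy sine data. The data, the degrees 1, 3 and 15, and the use of polynomial degree as the "hyperparameter" are all assumptions chosen for illustration. A straight line should underfit (high bias), a very flexible degree-15 model is prone to overfitting (high variance), and the degree with the lowest cross-validated error is the one we would pick.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed): a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
X = x.reshape(-1, 1)

# Shuffled 5-fold split so every fold sees a spread of x values.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for degree in [1, 3, 15]:
    # Model complexity is controlled by the polynomial degree.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()  # scikit-learn maximizes scores, so MSE is negated

best_degree = min(cv_mse, key=cv_mse.get)
print(cv_mse, "best degree:", best_degree)
```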
    

Finally, this lab will address the Bayesian formulation of regression, the posterior predictive distribution, and Markov chain Monte Carlo sampling with the Metropolis-Hastings algorithm.
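As a hedged sketch of Metropolis sampling for Bayesian regression, the code below samples the posterior of a single slope `w` in the model y = w*x + noise, where the noise is N(0, sigma^2) with known sigma and the prior on `w` is N(0, tau^2). The data, `sigma`, `tau`, the step size and the burn-in length are assumptions for illustration. Because the random-walk proposal is symmetric, this is the plain Metropolis special case of Metropolis-Hastings (the proposal-ratio correction cancels). This conjugate model also has a closed-form Gaussian posterior, which lets us check the sampler, and the last step draws from the posterior predictive distribution at one new input.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (assumed): y = 2x + Gaussian noise with known standard deviation.
sigma, tau = 0.5, 10.0  # noise std (assumed known) and prior std for the slope w
x = rng.uniform(-1, 1, 50)
y = 2.0 * x + rng.normal(scale=sigma, size=x.size)


def log_posterior(w):
    """Unnormalized log posterior: Gaussian log-likelihood plus N(0, tau^2) log-prior."""
    log_lik = -0.5 * np.sum((y - w * x) ** 2) / sigma**2
    log_prior = -0.5 * w**2 / tau**2
    return log_lik + log_prior


def metropolis(n_samples, step=0.2, w0=0.0):
    """Random-walk Metropolis: propose w + step * N(0, 1), accept with prob min(1, p(new)/p(old))."""
    samples = np.empty(n_samples)
    w, lp = w0, log_posterior(w0)
    for i in range(n_samples):
        proposal = w + step * rng.normal()
        lp_new = log_posterior(proposal)
        if np.log(rng.uniform()) < lp_new - lp:  # compare in log space for numerical stability
            w, lp = proposal, lp_new
        samples[i] = w  # a rejected proposal repeats the current state
    return samples


samples = metropolis(20_000)[2_000:]  # drop burn-in draws

# Closed-form conjugate posterior, used only to check the sampler.
post_prec = 1 / tau**2 + np.sum(x**2) / sigma**2
post_mean = (np.sum(x * y) / sigma**2) / post_prec

# Posterior predictive draws at a new input x* = 0.5: take each sampled w, then add noise.
y_pred = 0.5 * samples + rng.normal(scale=sigma, size=samples.size)
print(samples.mean(), post_mean, y_pred.mean())
```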

Assignment Repository
assignment2 repository

Assignment3

Opens 10/22 and Due 11/12

machine learning part 1

Classification

Identifying which category an object belongs to.
Applications: Spam detection, Image recognition.
Algorithms: SVM, nearest neighbors, random forest, …
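A minimal classification sketch, assuming scikit-learn's bundled iris dataset and a k-nearest-neighbors classifier with k = 5 (both arbitrary choices for illustration). The model is fitted on a training split and evaluated on held-out data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled dataset: 150 flowers, 4 numeric features, 3 species.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data, keeping class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Each test point gets the majority label among its 5 nearest training points.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of correctly classified test points
print(f"test accuracy: {accuracy:.2f}")
```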

Regression

Predicting a continuous-valued attribute associated with an object.
Applications: Drug response, Stock prices.
Algorithms: SVR, ridge regression, Lasso, …
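For regression, a sketch comparing ridge regression and the Lasso on synthetic data where only two of five features matter. The true coefficients and the `alpha` values are assumptions. Ridge's L2 penalty shrinks all coefficients toward zero, whereas the Lasso's L1 penalty tends to set some of them exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumed): only features 0 and 2 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_coef = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: small but nonzero weights on noise features
lasso = Lasso(alpha=0.2).fit(X, y)  # L1 penalty: can zero out irrelevant features

print("ridge:", np.round(ridge.coef_, 2))
print("lasso:", np.round(lasso.coef_, 2))
```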

Clustering

Automatic grouping of similar objects into sets.
Applications: Customer segmentation, Grouping experiment outcomes.
Algorithms: k-Means, spectral clustering, mean-shift, …
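A clustering sketch with k-means, assuming synthetic blobs from `make_blobs` and that the number of clusters (3) is known in advance, which real data rarely guarantees:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data (assumed): 300 points drawn around 3 centers.
X, true_labels = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# n_init=10 reruns k-means from 10 initializations and keeps the best (lowest inertia) result.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)  # cluster index for each point

print(km.cluster_centers_)  # learned centroid coordinates, one row per cluster
```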

machine learning part 2

Dimensionality reduction

Reducing the number of random variables to consider.
Applications: Visualization, Increased efficiency.
Algorithms: PCA, feature selection, non-negative matrix factorizations.
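A dimensionality-reduction sketch with PCA, using assumed synthetic data that lies close to a 2-D plane inside a 5-D space, so two principal components should retain almost all of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data (assumed): 2 latent variables mapped into 5-D, plus a little noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # project onto the top 2 principal components

print(X_reduced.shape)  # (200, 2)
print(pca.explained_variance_ratio_.sum())  # share of total variance kept by 2 components
```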

Model selection

Comparing, validating and choosing parameters and models.
Goal: Improved accuracy via parameter tuning
Modules: grid search, cross-validation, metrics.
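Model selection can be sketched with `GridSearchCV`, which combines the three modules listed above: it tries every combination in a parameter grid, scores each with cross-validation using a chosen metric, and refits the best combination on all the data. The SVM parameter grid and the iris data are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameters (assumed): 3 x 3 = 9 combinations, each scored with 5-fold CV.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the refitted model and can be used directly for predictions.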

Preprocessing

Feature extraction and normalization.
Application: Transforming input data such as text for use with machine learning algorithms.
Modules: preprocessing, feature extraction.
Assignment Repository
assignment3 repository

Assignment4

Opens 11/12 and Due 12/3

portfolio implementation
  • Tutorial: Portfolio Template Modifications

      The goal of this assignment is for each student to have all tabs and links within the portfolio properly updated with their pertinent information (resume, CV, bio, contact info, etc.). Students must also incorporate links to all assignments in their portfolio and update the Resources and Toolkit sections with examples using the data and Python libraries.
    
network analysis
big data analytics
Assignment Repository
assignment4 repository

Extra Credit Assignment

web scraping
sampling and text processing
Assignment Repository
Extra Credit repository

EMSE 6992 Labs

lab assignments