
Commit 0e70736

Initial HMM commit
1 parent af60d35 commit 0e70736

File tree

2 files changed: +159 −1 lines


FSA.ipynb

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
    }
   },
   "source": [
-   "# FSA Assignment\n",
+   "https://www.linkedin.com/in/oscar-kosar-kosarewicz/# FSA Assignment\n",
    " Implement the FSA variable selection method for linear models and binary classification with the logistic loss, as\n",
    " described in the slides. Use the parameters s = 0.0001, μ = 30, N iter = 500. Take special care to normalize each column of the X matrix to have zero mean and variance 1, and to use the same mean and standard deviation that you used for normalizing the train set also for normalizing the test set.\n"
   ]

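The FSA cell above stresses normalizing with train-set statistics only. A minimal sketch of just that step — the `standardize` helper name is mine, not part of the assignment code:

```python
import numpy as np

def standardize(X_train, X_test):
    """Column-wise zero mean / unit variance, using train statistics only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant columns
    # The SAME mu/sigma are applied to the test split, as the cell requires.
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```

Fitting the scaler on the full data (train + test) would leak test information into training, which is exactly what the instructions warn against.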
HMM.ipynb

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# HMM Assignment\n",
+    "1. Download the dataset hmm_pb1.csv from Canvas. It represents a sequence of\n",
+    "dice rolls $x$ from the Dishonest casino model discussed in class. The model parameters\n",
+    "are exactly those presented in class. The states of $Y$ are 1=’Fair’ and 2=’Loaded’.\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "#### Import dependencies"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 407,
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from matplotlib import pyplot as plt\n",
+    "from sklearn.cluster import KMeans\n",
+    "from os.path import join\n",
+    "from scipy.stats import multivariate_normal\n",
+    "from itertools import repeat\n",
+    "from random import randint"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "#### Data loading functions"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 408,
+   "outputs": [],
+   "source": [
+    "def get_pb1():\n",
+    "    return load_data(\"hmm_pb1.csv\")\n",
+    "\n",
+    "def get_pb2():\n",
+    "    return load_data(\"hmm_pb2.csv\")\n",
+    "\n",
+    "def load_data(filename):\n",
+    "    path = \"data/HMM/\"\n",
+    "    data = np.loadtxt(join(path, filename), delimiter=',')\n",
+    "    return data\n"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "a) Implement the Viterbi algorithm and find the most likely sequence $y$ that generated the observed $x$.\n",
+    "Use the log probabilities, as shown in the HMM slides from\n",
+    "Canvas. Report the obtained sequence $y$ of 1’s and 2’s for verification. (2 points)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "b) Implement the forward and backward algorithms and run them on the observed\n",
+    "$x$. You should memorize a common factor $u_t$ for the $\\alpha_t^k$\n",
+    "to avoid floating point underflow, since the $\\alpha_t^k$ quickly become very small. The same holds for\n",
+    "$\\beta_t^k$. Report $\\alpha_{125}^1 / \\alpha_{125}^2$ and $\\beta_{125}^1 / \\beta_{125}^2$,\n",
+    "where the counting starts from $t = 1$. (3 points)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "2. Download the dataset hmm_pb2.csv from Canvas. It represents a sequence of\n",
+    "10000 dice rolls $x$ from the Dishonest casino model but with other values for the $a$ and\n",
+    "$b$ parameters than those from class. Having so many observations, you are going to\n",
+    "learn the model parameters.\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Implement and run the Baum-Welch algorithm using the forward and backward\n",
+    "algorithms that you already implemented for Pb 1. You can initialize the $\\pi, a, b$ with\n",
+    "your guess, or with some random probabilities (make sure that $\\pi$ sums to 1 and that\n",
+    "$a_{ij}, b^i_k$\n",
+    "sum to 1 for each $i$). The algorithm converges quite slowly, so you might need\n",
+    "to run it for up to 1000 iterations or more for the parameters to converge.\n",
+    "Report the values of $\\pi, a, b$ that you have obtained. (4 points)\n",
+    "\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  }
+ ],
+ "metadata": {
+  "authors": [
+   {
+    "name": "Oscar Kosar-Kosarewicz"
+   },
+   {
+    "name": "Nicholas Phillips"
+   }
+  ],
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
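Part 1(a) of the notebook asks for Viterbi in log space. A sketch of that recursion under generic parameters — states are indexed 0=Fair, 1=Loaded here, and the class's actual transition/emission values are not reproduced:

```python
import numpy as np

def viterbi(x, pi, A, B):
    """Most likely state path, computed in log space.
    x: 0-based observation indices; pi: initial state probabilities;
    A[i, j]: transition i -> j; B[i, k]: emission of symbol k in state i."""
    T, K = len(x), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.zeros((T, K))              # best log-prob ending in each state
    psi = np.zeros((T, K), dtype=int)     # backpointers
    delta[0] = np.log(pi) + logB[:, x[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[i, j]: come from i, land in j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, x[t]]
    # Backtrack from the best final state
    y = np.zeros(T, dtype=int)
    y[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        y[t] = psi[t + 1, y[t + 1]]
    return y
```

Adding 1 to the returned indices gives the 1='Fair', 2='Loaded' labeling the assignment asks to report.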

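For part 1(b), one standard way to carry the common factor $u_t$ the cell mentions is to normalize each forward vector to sum to 1 and reuse the same factors in the backward pass. A sketch (not the authors' submission); note the ratios $\alpha_t^1/\alpha_t^2$ asked for in the report are unaffected by this scaling:

```python
import numpy as np

def forward_backward(x, pi, A, B):
    """Scaled forward/backward passes.
    Returns scaled alpha, scaled beta, and per-step factors u, where
    alpha[t] sums to 1 and (alpha[t] * beta[t]) are the state posteriors."""
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    u = np.zeros(T)
    alpha[0] = pi * B[:, x[0]]
    u[0] = alpha[0].sum()
    alpha[0] /= u[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
        u[t] = alpha[t].sum()                 # common factor, prevents underflow
        alpha[t] /= u[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
        beta[t] /= u[t + 1]                   # reuse the forward factors
    return alpha, beta, u
```

With this convention the total log-likelihood is `np.log(u).sum()`, and `alpha[124, 0] / alpha[124, 1]` gives the $\alpha_{125}^1/\alpha_{125}^2$ ratio (counting from $t=1$).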
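For problem 2, a compact Baum-Welch sketch built on the same scaled passes. This is an illustration of the EM updates, not the authors' code; initial guesses and iteration counts follow the cell's advice:

```python
import numpy as np

def baum_welch(x, pi, A, B, n_iter=1000):
    """EM re-estimation of (pi, A, B) from one observation sequence x."""
    x = np.asarray(x)
    T, K = len(x), len(pi)
    pi, A, B = pi.copy(), A.copy(), B.copy()
    for _ in range(n_iter):
        # E-step: scaled forward pass
        alpha = np.zeros((T, K)); u = np.zeros(T)
        alpha[0] = pi * B[:, x[0]]
        u[0] = alpha[0].sum(); alpha[0] /= u[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
            u[t] = alpha[t].sum(); alpha[t] /= u[t]
        # E-step: scaled backward pass (reusing the forward factors u)
        beta = np.zeros((T, K)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, x[t + 1]] * beta[t + 1])) / u[t + 1]
        gamma = alpha * beta                       # P(Y_t = i | x), rows sum to 1
        # Expected transition counts xi[t, i, j]
        xi = (alpha[:-1, :, None] * A[None] *
              (B[:, x[1:]].T * beta[1:])[:, None, :] / u[1:, None, None])
        # M-step: renormalize expected counts
        pi = gamma[0] / gamma[0].sum()
        A = xi.sum(axis=0)
        A /= A.sum(axis=1, keepdims=True)
        for k in range(B.shape[1]):
            B[:, k] = gamma[x == k].sum(axis=0)
        B /= B.sum(axis=1, keepdims=True)
    return pi, A, B
```

Each update keeps $\pi$, the rows of $a$, and the rows of $b$ on the probability simplex, which is the constraint the cell asks you to maintain at initialization as well.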