Digit Classification
This tutorial uses machine learning to classify handwritten digits from the Optical Recognition of Handwritten Digits dataset. It showcases Kosh loaders, which can read any type of data.
Overview
Kosh is a powerful database tool that can be used to store, load, and modify different types of data such as scalar/string data, timeseries data, metadata, and different filetypes. We will use Kosh to store the Optical Recognition of Handwritten Digits dataset and add metadata to it. We will then load the images through Kosh and prepare them for a classification machine learning model. This tutorial uses scikit-learn's machine learning algorithms.
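As a rough illustration of the classification step only, here is a minimal sketch using scikit-learn. It assumes the copy of the dataset bundled with scikit-learn (`load_digits`) rather than images loaded through Kosh, and the `SVC` classifier, its `gamma` value, and the single train/test split are illustrative choices that may differ from what the notebook does.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# load_digits returns 1797 images of 8x8 pixels, flattened to 64 features each.
digits = load_digits()
X, y = digits.data, digits.target

# Hold out 20% of the samples for testing.
# random_state is fixed here only so this sketch is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Assumed model: a support vector classifier with an illustrative gamma.
model = SVC(gamma=0.001)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Removing `random_state` makes the split differ on every run, which is one reason results vary between runs.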
Visualization Digit Classification Kosh Notebook
This notebook allows the user to train a machine learning model and visualize its predictions. The notebook can be updated as needed to modify the machine learning model and post-process the prediction results. The notebook has more details on what training a machine learning model entails.
Below is the confusion matrix comparing the machine learning model's predictions with the true values. This is where fine-tuning the model becomes an "art", since adjusting the model's parameters could give completely different predictions. What works for this set of data might not work for another. Note that because the data split is random, the predictions below will not be the same on each run.
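For readers unfamiliar with how a confusion matrix is tallied, the following is a minimal pure-Python sketch. The labels are hypothetical toy values for a three-class problem, not results from the digits model, and the helper `confusion_matrix` is written here for illustration.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions: rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Hypothetical labels for a three-class toy problem.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

matrix = confusion_matrix(y_true, y_pred, n_classes=3)
for row in matrix:
    print(row)

# Correct predictions sit on the diagonal, so accuracy is diagonal sum / total.
correct = sum(matrix[i][i] for i in range(3))
accuracy = correct / len(y_true)
```

scikit-learn's `sklearn.metrics.confusion_matrix` follows the same convention of true labels in rows and predicted labels in columns.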

How to run
- Run setup.sh in the top directory to create a virtual environment with all necessary dependencies and install the jupyter kernel.
- Run source weave_demos_venv/bin/activate to enter the virtual environment (you can deactivate when you've finished the demo to exit it) and cd back into this directory.
- Run visualization_metadata_machine_learning_kosh.ipynb as it contains all the necessary information.
Content overview
Starting files:
visualization_metadata_machine_learning_kosh.ipynb: A Jupyter notebook to train the machine learning model.
Files created by the demo:
*confusion_matrix.png: Confusion matrices of the Train, Validation, and Test datasets for the machine learning model.
images/: The directory where the image of each data point is saved.