I ported a Python MNIST neural-network tutorial to J.

J has a library for reading NumPy arrays from disk, so we first export the relevant data from Python:

```python
import numpy as np
import tensorflow_datasets as tfds

# as_supervised and batch_size are cargo-culted to make it export to numpy
train = tfds.load('mnist', split='train', as_supervised=True, batch_size=-1)
train_image_data, train_labels = tfds.as_numpy(train)

# normalize + convert to float64 for J
train_images = train_image_data.astype('float64') / 255

np.save("data/train-images.npy", train_images)
np.save("data/train-labels.npy", train_labels)

test = tfds.load('mnist', split='test', as_supervised=True, batch_size=-1)
test_image_data, test_labels = tfds.as_numpy(test)
test_images = test_image_data.astype('float64') / 255

np.save("data/test-images.npy", test_images)
np.save("data/test-labels.npy", test_labels)
```

Then, in J:

```j
load 'convert/numpy'

NB. raze each 28x28x1 image into a vector of 784 pixels,
NB. giving a 60000x784 array
train_images =: ,"3 readnpy_pnumpy_ 'data/train-images.npy'

NB. convert labels (1, 2, etc.) to one-hot vectors (0 1 0 0 ...)
vectorize =: 3 : '=&y (i.10)'
train_labels =: vectorize"0 readnpy_pnumpy_ 'data/train-labels.npy'

test_images =: ,"3 readnpy_pnumpy_ 'data/test-images.npy'
test_labels =: vectorize"0 readnpy_pnumpy_ 'data/test-labels.npy'
```
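For readers more comfortable with NumPy, `vectorize` is just one-hot encoding: it compares the label against `i.10` (the integers 0 through 9). A sketch of the same idea in Python (the function name mirrors the J verb; it is not part of the original export script):

```python
import numpy as np

def vectorize(label, n_classes=10):
    # Compare the label against 0..9, like  =&y (i.10)  in J,
    # yielding a one-hot float vector.
    return (np.arange(n_classes) == label).astype(np.float64)
```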

```j
X =: train_images
Y =: train_labels

NB. map (0,1) -> (-1,1)
scale =: (-&1)@:(*&2)

NB. initialize weights uniformly at random in (-1,1)
init_weights =: 3 : 'scale"0 y ?@$ 0'
w_hidden =: init_weights 784 128
w_output =: init_weights 128 10
weights_init =: w_hidden;w_output
```
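In NumPy terms the initialization is uniform noise rescaled from [0,1) into (-1,1), which is what `scale"0 y ?@$ 0` does. A sketch (the `seed` parameter is mine, added so the sketch is reproducible):

```python
import numpy as np

def init_weights(rows, cols, seed=0):
    # ?@$ 0 draws uniforms in [0,1); scale doubles and shifts to (-1,1).
    rng = np.random.default_rng(seed)
    return rng.random((rows, cols)) * 2 - 1

w_hidden = init_weights(784, 128)
w_output = init_weights(128, 10, seed=1)
```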

```j
dot =: +/ . *

NB. subtract the row max so exp can't overflow
mmax =: (]->./)

NB. numerically stable softmax https://cs231n.github.io/linear-classify/#softmax
softmax =: ((^ % (+/@:^)) @: mmax)
d_softmax =: (([*(1&-)) @: softmax @: mmax)
```
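The `mmax` shift is the standard stability trick from the linked cs231n notes: subtracting the max before exponentiating keeps `exp` from overflowing, and the result is unchanged because the common factor cancels in the ratio. The same thing in NumPy:

```python
import numpy as np

def softmax(x):
    # Shift by the max first; exp(x - m) stays bounded by 1, and
    # exp(x - m) / sum(exp(x - m)) == exp(x) / sum(exp(x)).
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)
```

Without the shift, `np.exp(1000.0)` overflows to infinity and the ratio becomes NaN.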

```j
sigmoid =: monad define
    % 1 + ^ - y
)
sigmoid_ddx =: 3 : '(^-y) % ((1+^-y)^2)'
```
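`sigmoid_ddx` writes the derivative directly as e^-y / (1 + e^-y)^2, which is algebraically the familiar identity σ'(y) = σ(y)(1 - σ(y)). A quick NumPy check of that equivalence:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_ddx(x):
    # e^-x / (1 + e^-x)^2, same as sigmoid(x) * (1 - sigmoid(x))
    return np.exp(-x) / (1.0 + np.exp(-x)) ** 2
```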

```j
NB. forward propagation
forward =: dyad define
    'l1 l2' =. x
    X =. y
    x_l1 =. X dot l1
    x_sigmoid =. sigmoid x_l1
    x_l2 =. x_sigmoid dot l2
    prediction =. softmax"1 x_l2
    (x_l1;x_l2;x_sigmoid;prediction)
)
```
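The forward pass is input → hidden matrix product → sigmoid → output matrix product → row-wise softmax, returning all intermediates because the training step needs them. A NumPy sketch of the same computation (inlined activations, names matching the J locals):

```python
import numpy as np

def forward(weights, X):
    # Hidden pre-activation, sigmoid activation, output pre-activation,
    # and a row-wise (numerically stable) softmax prediction.
    l1, l2 = weights
    x_l1 = X @ l1
    x_sigmoid = 1.0 / (1.0 + np.exp(-x_l1))
    x_l2 = x_sigmoid @ l2
    z = np.exp(x_l2 - x_l2.max(axis=1, keepdims=True))
    prediction = z / z.sum(axis=1, keepdims=True)
    return x_l1, x_l2, x_sigmoid, prediction
```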

```j
train =: dyad define
    'X Y' =. x
    'l1 l2' =. y
    'x_l1 x_l2 x_sigmoid prediction' =. y forward X
    l2_err =. (2 * (Y - prediction) % {.$prediction) * (d_softmax"1 x_l2)
    l1_err =. (|: l2 dot (|: l2_err)) * (sigmoid_ddx"1 x_l1)
    l2_adj =. l2 + (|: x_sigmoid) dot l2_err
    l1_adj =. l1 + (|: X) dot l1_err
    (l1_adj;l2_adj)
)
```
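To make the backward pass easier to follow, here is my NumPy reading of the same update (a sketch, not the original tutorial's code): the output error is the mean-squared-error gradient times the tutorial's simplified elementwise softmax "derivative" s(1-s), that error is pushed back through `l2` and the sigmoid derivative, and because the error points toward `Y` the update is an addition rather than a subtraction.

```python
import numpy as np

def train_step(X, Y, l1, l2):
    # Forward pass (same shapes as the J verb: X is Nx784, Y is Nx10).
    x_l1 = X @ l1
    x_sigmoid = 1.0 / (1.0 + np.exp(-x_l1))
    x_l2 = x_sigmoid @ l2
    z = np.exp(x_l2 - x_l2.max(axis=1, keepdims=True))
    prediction = z / z.sum(axis=1, keepdims=True)
    # Output error: MSE gradient times the elementwise s * (1 - s).
    l2_err = (2 * (Y - prediction) / X.shape[0]) * (prediction * (1 - prediction))
    # Backpropagate through l2, then through the sigmoid.
    l1_err = (l2_err @ l2.T) * (x_sigmoid * (1 - x_sigmoid))
    # Error points toward Y, so the weights move by addition.
    return l1 + X.T @ l1_err, l2 + x_sigmoid.T @ l2_err
```

Note that `(|: l2 dot (|: l2_err))` in the J is the same matrix as `l2_err @ l2.T` here, just built transposed.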

```j
train_mnist =: (X;Y) & train

NB. smooth out a guess into a canonical (one-hot) estimate
pickmax =: monad define
    max =. >./ y
    =&max y
)

NB. 1 if two vectors are identical, else 0
eq_arr1 =: */ @: =

point_accuracy =: monad define
    (+/ (pickmax"1 y) eq_arr1"1 test_labels) % {.$ test_labels
)
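`point_accuracy` one-hots each prediction row and counts exact matches against the one-hot test labels. Up to tie-breaking (the J version can mark several maxima, argmax picks one), this is the usual argmax comparison in NumPy:

```python
import numpy as np

def point_accuracy(predictions, labels_onehot):
    # Fraction of rows whose predicted argmax matches the label's argmax.
    return float(np.mean(predictions.argmax(axis=1) == labels_onehot.argmax(axis=1)))
```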

```j
NB. to store weights
encode =: 3!:1

w_train =: (train_mnist ^: 10000) weights_init

NB. guess =: >3 { w_train forward test_images
NB. point_accuracy guess

(encode w_train) fwrite 'weights.jdat'
```

This comes out at about 95% accuracy on the test set, which is a good result, even if training a neural net on the CPU is not especially satisfying.

It is also substantially slower than plain NumPy, presumably because J only uses one core here.