I adapted the xor example here from Python; the original used NumPy.
NB. input data
X =: 4 2 $ 0 0 0 1 1 0 1 1
NB. target data; ~: is 'not-equal', which on booleans is xor
Y =: , (i.2) ~:/ (i.2)
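NB. for reference, the inputs stitched to the targets:
NB.    X ,. Y
NB. 0 0 0
NB. 0 1 1
NB. 1 0 1
NB. 1 1 0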
scale =: (-&1)@:(*&2)
NB. initialize weights between _1 and 1
NB. see https://code.jsoftware.com/wiki/Vocabulary/dollar#dyadic
init_weights =: 3 : 'scale"0 y ?@$ 0'
w_hidden =: init_weights 2 2
w_output =: init_weights 2
b_hidden =: init_weights 2
b_output =: scale ? 0
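NB. shapes: w_hidden is 2 2, w_output and b_hidden are 2-vectors,
NB. b_output is a scalar; scale maps ?'s [0,1) draws onto [_1,1)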
dot =: +/ . *
sigmoid =: monad define
% 1 + ^ - y
)
sigmoid_ddx =: 3 : 'y * (1-y)'
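NB. sigmoid_ddx expects the sigmoid *output* s, since the derivative is s*(1-s);
NB. that is why the code below never recomputes sigmoid on the raw sums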
NB. forward prop
forward =: dyad define
'WH WO BH BO' =. x
hidden_layer_output =. sigmoid (BH +"1 y (dot "1 2) WH)  NB. y is the 4 2 input matrix
prediction =. sigmoid (BO + WO dot"1 hidden_layer_output)
(hidden_layer_output;prediction)
)
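NB. forward returns a boxed pair: the 4 2 table of hidden activations and the 4 predictions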
train =: dyad define
'X Y' =. x
'WH WO BH BO' =. y
'hidden_layer_output prediction' =. y forward X
l1_err =. Y - prediction
l1_delta =. l1_err * sigmoid_ddx prediction
hidden_err =. l1_delta */ WO
hidden_delta =. hidden_err * sigmoid_ddx hidden_layer_output
WH_adj =. WH + (|: X) dot hidden_delta
WO_adj =. WO + (|: hidden_layer_output) dot l1_delta
BH_adj =. +/ BH,hidden_delta
BO_adj =. +/ BO,l1_delta
(WH_adj;WO_adj;BH_adj;BO_adj)
)
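NB. one training pass adds the full backprop gradient to the weights (an implicit
NB. learning rate of 1); the biases get the deltas summed over the four samples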
w_trained =: (((X;Y) & train) ^: 10000) (w_hidden;w_output;b_hidden;b_output)
guess =: >1 { w_trained forward X
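NB. quick illustrative check (xor_check is not in the original): thresholding the
NB. trained predictions at 0.5 should recover Y, i.e. 0 1 1 0; exact values vary
NB. with the random initialization
xor_check =: Y -: 0.5 < guess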
Compare to this K implementation for style.
As it happens, this J code is substantially faster than the equivalent using NumPy (0.13s vs. 0.59s).
I'm quite curious as to why the J version is so much more performant. I recently read APL Since 1978, and as an array environment APL differs in quite a few ways from conventional programming languages.