I adapted the XOR example here from a Python version that used NumPy.
```j
NB. input data: the four XOR input pairs, one per row
X =: 4 2 $ 0 0 0 1 1 0 1 1
NB. target data: ~: is 'not-equal', i.e. xor
Y =: , (i.2) ~:/ (i.2)
NB. map a value in [0,1) to [_1,1)
scale =: (-&1)@:(*&2)
NB. initialize weights between _1 and 1
NB. see https://code.jsoftware.com/wiki/Vocabulary/dollar#dyadic
init_weights =: 3 : 'scale"0 y ?@$ 0'
w_hidden =: init_weights 2 2
w_output =: init_weights 2
b_hidden =: init_weights 2
b_output =: scale ? 0
dot =: +/ . *
sigmoid =: monad define
% 1 + ^ - y
)
NB. derivative of sigmoid, expressed in terms of its output
sigmoid_ddx =: 3 : 'y * (1-y)'
NB. forward prop: x is the boxed weights, y is the input matrix
forward =: dyad define
'WH WO BH BO' =. x
hidden_layer_output =. sigmoid (BH +"1 y (dot"1 2) WH)
prediction =. sigmoid (BO + WO dot"1 hidden_layer_output)
(hidden_layer_output;prediction)
)
NB. one step of backprop: x is the boxed data, y the boxed weights
train =: dyad define
'X Y' =. x
'WH WO BH BO' =. y
'hidden_layer_output prediction' =. y forward X
l1_err =. Y - prediction
l1_delta =. l1_err * sigmoid_ddx prediction
hidden_err =. l1_delta */ WO
hidden_delta =. hidden_err * sigmoid_ddx hidden_layer_output
WH_adj =. WH + (|: X) dot hidden_delta
WO_adj =. WO + (|: hidden_layer_output) dot l1_delta
BH_adj =. +/ BH,hidden_delta
BO_adj =. +/ BO,l1_delta
(WH_adj;WO_adj;BH_adj;BO_adj)
)
NB. iterate the training step 10000 times
w_trained =: (((X;Y) & train) ^: 10000) (w_hidden;w_output;b_hidden;b_output)
guess =: > 1 { w_trained forward X
```
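For comparison, here is a rough NumPy sketch of the same network, reconstructed from the J above rather than copied from the original Python (the names, the RNG seed, and the loss bookkeeping are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# same XOR data as the J version
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([0., 1., 1., 0.])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_ddx(s):
    # derivative of sigmoid expressed in terms of its output s
    return s * (1.0 - s)

# weights and biases uniform in [-1, 1), mirroring scale"0 ... ? 0
w_hidden = rng.random((2, 2)) * 2 - 1
w_output = rng.random(2) * 2 - 1
b_hidden = rng.random(2) * 2 - 1
b_output = rng.random() * 2 - 1

def forward(inp):
    hidden = sigmoid(inp @ w_hidden + b_hidden)     # shape (4, 2)
    pred = sigmoid(hidden @ w_output + b_output)    # shape (4,)
    return hidden, pred

loss_before = np.sum((Y - forward(X)[1]) ** 2)

for _ in range(10000):
    hidden, pred = forward(X)
    l1_delta = (Y - pred) * sigmoid_ddx(pred)
    # l1_delta */ WO in the J is an outer product
    hidden_delta = np.outer(l1_delta, w_output) * sigmoid_ddx(hidden)
    w_hidden += X.T @ hidden_delta          # WH_adj
    w_output += hidden.T @ l1_delta         # WO_adj
    b_hidden += hidden_delta.sum(axis=0)    # BH_adj
    b_output += l1_delta.sum()              # BO_adj

hidden, pred = forward(X)
loss_after = np.sum((Y - pred) ** 2)
# with luck, pred is close to 0 1 1 0; plain full-batch gradient
# steps on XOR can occasionally stall in a local minimum
print(pred.round(2))
```

The correspondence is close to line-for-line: `*/` becomes `np.outer`, `(|: X) dot` becomes `X.T @`, and `+/ BH,hidden_delta` is just the bias plus the column sums of the deltas.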
Compare to this K implementation for style.
As it happens, this J code is substantially faster than the equivalent using NumPy (0.13s vs. 0.59s).
I'm quite curious as to why the J is so much more performant. I recently read *APL since 1978*, and APL has quite a few differences as an array environment compared to conventional programming languages.
