Five simple examples
Here are five simple hands-on steps to get started with Torch!
This tutorial assumes that the torch package has already been loaded with require 'torch', or that you are using the th REPL (which loads it automatically).
1. Define a positive definite quadratic form
We rely on a few torch functions here:
rand(), which creates a tensor drawn from a uniform distribution
t(), which transposes a tensor (note that it returns a new view)
dot(), which performs a dot product between two tensors
eye(), which returns an identity matrix
the * operator over matrices (which performs a matrix-vector or matrix-matrix multiplication)
We first make sure the random seed is the same for everyone:
torch.manualSeed(1234)
-- choose a dimension
N = 5
-- create a random NxN matrix
A = torch.rand(N, N)
-- make it symmetric positive semi-definite
A = A*A:t()
-- add a small multiple of the identity to make it positive definite
A:add(0.001, torch.eye(N))
-- add a linear term
b = torch.rand(N)
-- create the quadratic form
function J(x)
   return 0.5*x:dot(A*x)-b:dot(x)
end
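For reference, the code above corresponds to the following quadratic form. Writing B for the initial random matrix (a name introduced here only for illustration), a short argument shows why A is positive definite:

```latex
\[
A = B B^{\top} + 0.001\, I, \qquad
J(x) = \tfrac{1}{2}\, x^{\top} A x - b^{\top} x
\]
\[
x^{\top} A x = \lVert B^{\top} x \rVert^{2} + 0.001\, \lVert x \rVert^{2} > 0
\quad \text{for all } x \neq 0
\]
```

Since A is positive definite, J is strictly convex and has a unique minimum.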
The function value (here at a random point) can be printed with:
print(J(torch.rand(N)))
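To convince ourselves that A really is symmetric positive definite, we can check it numerically. The following is a minimal sketch, assuming a Torch build with LAPACK support (needed by torch.symeig); the names asymmetry and eigenvalues are introduced here for illustration only.

```lua
require 'torch'

torch.manualSeed(1234)
N = 5
A = torch.rand(N, N)
A = A * A:t()
A:add(0.001, torch.eye(N))

-- largest absolute difference between A and its transpose (0 if symmetric)
asymmetry = (A - A:t()):abs():max()

-- eigenvalues of the symmetric matrix A ('N' asks for eigenvalues only)
eigenvalues = torch.symeig(A, 'N')

print(string.format('max |A - A^T| = %g', asymmetry))
print(string.format('smallest eigenvalue = %g', eigenvalues:min()))
```

A symmetric matrix is positive definite exactly when all its eigenvalues are strictly positive, so the smallest eigenvalue printed here should be greater than zero.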
2. Find the exact minimum
We can invert the matrix (which might not be numerically optimal):
xs = torch.inverse(A)*b
print(string.format('J(x^*) = %g', J(xs)))
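The minimum is the point where A*x = b, so instead of forming the inverse we can also solve this linear system directly. The sketch below uses torch.gesv for that, again assuming a LAPACK-enabled Torch build; the names X, xsolve and xinv are ours.

```lua
require 'torch'

torch.manualSeed(1234)
N = 5
A = torch.rand(N, N)
A = A * A:t()
A:add(0.001, torch.eye(N))
b = torch.rand(N)

-- torch.gesv expects a 2D right-hand side, so view b as an N x 1 matrix
X = torch.gesv(b:view(N, 1), A)
-- take the single solution column back as a 1D tensor
xsolve = X:select(2, 1)

-- compare with the solution obtained from the explicit inverse
xinv = torch.inverse(A) * b
print(string.format('residual |A*x - b| = %g', (A * xsolve - b):norm()))
print(string.format('difference to inverse-based solution = %g',
                    (xsolve - xinv):norm()))
```

Both approaches should agree up to rounding errors on a problem this small; solving the system is generally the safer habit for larger or badly conditioned matrices.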
3. Search the minimum by gradient descent
We first define the gradient of J(x) w.r.t. x:
function dJ(x)
   return A*x-b
end
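A quick way to validate a hand-written gradient is to compare it against central finite differences. This is a minimal sketch; numericalGradient and the step size h are hypothetical choices made for this example, not part of torch.

```lua
require 'torch'

torch.manualSeed(1234)
N = 5
A = torch.rand(N, N)
A = A * A:t()
A:add(0.001, torch.eye(N))
b = torch.rand(N)

function J(x) return 0.5 * x:dot(A * x) - b:dot(x) end
function dJ(x) return A * x - b end

-- hypothetical helper: central finite-difference approximation of the
-- gradient of f at x, perturbing one coordinate at a time
function numericalGradient(f, x, h)
   h = h or 1e-6
   local g = torch.Tensor(x:size(1))
   for i = 1, x:size(1) do
      local xp, xm = x:clone(), x:clone()
      xp[i] = xp[i] + h
      xm[i] = xm[i] - h
      g[i] = (f(xp) - f(xm)) / (2 * h)
   end
   return g
end

x = torch.rand(N)
gradError = (numericalGradient(J, x) - dJ(x)):abs():max()
print(string.format('max gradient error = %g', gradError))
```

The printed error should be tiny compared to the gradient entries; a large value would point to a mistake in dJ(x).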
We then define some current solution:
x = torch.rand(N)
And then apply gradient descent (with a given learning rate lr) for a while:
lr = 0.01
for i=1,20000 do
   x = x - dJ(x)*lr
   -- we print the value of the objective function at each iteration
   print(string.format('at iter %d J(x) = %f', i, J(x)))
end
You should see:
...
at iter 19995 J(x) = -3.135664
at iter 19996 J(x) = -3.135664
at iter 19997 J(x) = -3.135665
at iter 19998 J(x) = -3.135665
at iter 19999 J(x) = -3.135665
at iter 20000 J(x) = -3.135666
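Rather than running a fixed number of iterations, one can stop as soon as the gradient is small enough. Here is a minimal sketch of that idea; gradientDescent, tol and maxIter are names and values we chose for illustration.

```lua
require 'torch'

torch.manualSeed(1234)
N = 5
A = torch.rand(N, N)
A = A * A:t()
A:add(0.001, torch.eye(N))
b = torch.rand(N)

function J(x) return 0.5 * x:dot(A * x) - b:dot(x) end
function dJ(x) return A * x - b end

-- hypothetical helper: gradient descent that stops once the gradient norm
-- falls below tol, or after maxIter iterations, whichever comes first
function gradientDescent(x0, lr, tol, maxIter)
   local x = x0:clone()
   for i = 1, maxIter do
      local g = dJ(x)
      if g:norm() < tol then
         return x, i - 1
      end
      x:add(-lr, g)  -- x = x - lr * g, in place
   end
   return x, maxIter
end

x0 = torch.rand(N)
xgd, iters = gradientDescent(x0, 0.01, 1e-6, 20000)
print(string.format('stopped after %d iterations, J(x) = %f, |dJ(x)| = %g',
                    iters, J(xgd), dJ(xgd):norm()))
```

If the tolerance is not reached, the function simply returns after maxIter iterations, which mirrors the fixed-length loop above.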
4. Using the optim package
Want to use more advanced optimization techniques, like conjugate gradient or LBFGS? The optim package is there for that purpose! First, we need to install it:
luarocks install optim
A word on local variables
In practice, it is never a good idea to use global variables. Use local everywhere. In our examples, we have defined everything as globals, so that they can be cut-and-pasted into the interpreter command line.
Indeed, a local defined like:
local A = torch.rand(N, N)
is only available in the current scope, which, when running the interpreter, is limited to the current input line. Subsequent lines do not have access to this local.
In Lua, one can define a scope with the do...end directives:
do
   local A = torch.rand(N, N)
   print(A)
end
print(A)
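If you cut-and-paste this into the command line, the first print will show a 5x5 matrix (because the local A is defined for the duration of the do...end scope), but the second will print nil, provided no global A exists. If you ran step 1 in the same session, the second print shows that global A instead.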
Defining a closure with an upvalue
We need to define a closure which returns both J(x) and dJ(x). Here we define a scope with do...end, such that the local variable neval is an upvalue to JdJ(x): only JdJ(x) will be aware of it. Note that in a script, one would not need the do...end scope, as the scope of neval would extend to the end of the script file (and not to the end of the line, as on the command line).
do
   local neval = 0
   function JdJ(x)
      local Jx = J(x)
      neval = neval + 1
      print(string.format('after %d evaluations J(x) = %f', neval, Jx))
      return Jx, dJ(x)
   end
end
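The same upvalue mechanism can be packaged into a reusable wrapper. The following pure-Lua sketch counts how often any function is called; withCounter is a hypothetical helper written for this example.

```lua
-- hypothetical helper: wraps f so that every call is counted.
-- neval is an upvalue shared only by the two returned closures.
function withCounter(f)
   local neval = 0
   local function wrapped(...)
      neval = neval + 1
      return f(...)
   end
   local function count()
      return neval
   end
   return wrapped, count
end

square, squareCalls = withCounter(function(v) return v * v end)
print(square(3))      -- the wrapped function behaves like the original
print(square(4))
print(squareCalls())  -- number of calls made so far
```

Each call to withCounter creates a fresh neval, so two wrapped functions never share a counter, and nothing outside the closures can modify it.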
Training with optim
The package is not loaded by default, so let’s require it:
require 'optim'
We first define a state for conjugate gradient:
state = {
   verbose = true,
   maxIter = 100
}
and now we train:
x = torch.rand(N)
optim.cg(JdJ, x, state)
You should see something like:
after 120 evaluations J(x) = -3.136835
after 121 evaluations J(x) = -3.136836
after 122 evaluations J(x) = -3.136837
after 123 evaluations J(x) = -3.136838
after 124 evaluations J(x) = -3.136840
after 125 evaluations J(x) = -3.136838
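To judge how close conjugate gradient gets, we can compare its result with the exact minimum from step 2. This is a minimal sketch under the assumption that optim.cg updates the tensor passed to it in place, which is how it is used above; the names x0, xcg and the quiet JdJ without printing are ours.

```lua
require 'torch'
require 'optim'

torch.manualSeed(1234)
N = 5
A = torch.rand(N, N)
A = A * A:t()
A:add(0.001, torch.eye(N))
b = torch.rand(N)

function J(x) return 0.5 * x:dot(A * x) - b:dot(x) end
function dJ(x) return A * x - b end
-- quiet version of the closure: value and gradient, no printing
function JdJ(x) return J(x), dJ(x) end

-- exact minimum, as in step 2
xs = torch.inverse(A) * b

x0 = torch.rand(N)
xcg = x0:clone()   -- keep x0 untouched for comparison
optim.cg(JdJ, xcg, {maxIter = 100})

print(string.format('J(x0)  = %f', J(x0)))
print(string.format('J(xcg) = %f', J(xcg)))
print(string.format('J(x^*) = %f', J(xs)))
print(string.format('|xcg - x^*| = %g', (xcg - xs):norm()))
```

J(xcg) should lie between J(x^*) and J(x0); how close it gets to J(x^*) depends on maxIter and on the conditioning of A.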
5. Plot
Plotting can be achieved in various ways. For example, one could use the recent iTorch package. Here, we are going to use gnuplot.
luarocks install gnuplot
Store intermediate function evaluations
We slightly modify the closure we had previously, so that it stores intermediate function evaluations (as well as the wall-clock time elapsed since training started):
evaluations = {}
time = {}
timer = torch.Timer()
neval = 0
function JdJ(x)
   local Jx = J(x)
   neval = neval + 1
   print(string.format('after %d evaluations, J(x) = %f', neval, Jx))
   table.insert(evaluations, Jx)
   table.insert(time, timer:time().real)
   return Jx, dJ(x)
end
Now we can train it:
state = {
   verbose = true,
   maxIter = 100
}
x0 = torch.rand(N)
cgx = x0:clone() -- make a copy of x0
timer:reset()
optim.cg(JdJ, cgx, state)
-- we convert the evaluations and time tables to tensors for plotting:
cgtime = torch.Tensor(time)
cgevaluations = torch.Tensor(evaluations)
Add support for stochastic gradient descent
Let’s add training with stochastic gradient descent, using optim:
evaluations = {}
time = {}
neval = 0
state = {
   -- optim.sgd reads its step size from the learningRate field
   learningRate = 0.1
}
-- we start from the same starting point as for CG
x = x0:clone()
-- reset the timer!
timer:reset()
-- note that the SGD optimizer requires us to write the loop ourselves
for i=1,1000 do
   -- JdJ already records each evaluation and its time stamp
   optim.sgd(JdJ, x, state)
end
sgdtime = torch.Tensor(time)
sgdevaluations = torch.Tensor(evaluations)
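Since our gradient dJ(x) is exact rather than estimated from random samples, each optim.sgd call here amounts to one plain gradient descent step. The sketch below checks this, assuming only learningRate is set so that momentum, weight decay and learning rate decay keep their default value of zero; the names xsgd, xgd and config are ours.

```lua
require 'torch'
require 'optim'

torch.manualSeed(1234)
N = 5
A = torch.rand(N, N)
A = A * A:t()
A:add(0.001, torch.eye(N))
b = torch.rand(N)

function J(x) return 0.5 * x:dot(A * x) - b:dot(x) end
function dJ(x) return A * x - b end
function JdJ(x) return J(x), dJ(x) end

lr = 0.01
x0 = torch.rand(N)

-- 100 steps through optim.sgd
xsgd = x0:clone()
config = {learningRate = lr}
for i = 1, 100 do
   optim.sgd(JdJ, xsgd, config)
end

-- 100 hand-written gradient descent steps from the same starting point
xgd = x0:clone()
for i = 1, 100 do
   xgd:add(-lr, dJ(xgd))
end

stepDifference = (xsgd - xgd):norm()
print(string.format('|xsgd - xgd| = %g', stepDifference))
```

The two iterates should coincide up to rounding, which is why the SGD curve in the plots behaves like the gradient descent of step 3.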
Final plot
We can now plot our graphs. A first simple approach is to use gnuplot.plot(x, y). Here we precede it with gnuplot.figure() to make sure the plots are on different figures.
require 'gnuplot'
gnuplot.figure(1)
gnuplot.title('CG loss minimisation over time')
gnuplot.plot(cgtime, cgevaluations)
gnuplot.figure(2)
gnuplot.title('SGD loss minimisation over time')
gnuplot.plot(sgdtime, sgdevaluations)
A more advanced way, which plots everything on the same graph, is the following. Here we save everything in a PNG file.
gnuplot.pngfigure('plot.png')
gnuplot.plot(
{'CG', cgtime, cgevaluations, '-'},
{'SGD', sgdtime, sgdevaluations, '-'})
gnuplot.xlabel('time (s)')
gnuplot.ylabel('J(x)')
gnuplot.plotflush()
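Because both curves flatten out quickly, it can be more informative to plot the distance to the exact minimum instead of J(x) itself. The following sketch only prepares the data; suboptimality is a hypothetical helper, and the values tensor is stand-in data used so that the block runs on its own.

```lua
require 'torch'

-- hypothetical helper: distance of each recorded value to the exact minimum
-- Jstar, clamped from below so that a logarithmic axis never sees zero or
-- negative values
function suboptimality(evaluations, Jstar, floor)
   floor = floor or 1e-16
   return (evaluations - Jstar):clamp(floor, math.huge)
end

-- stand-in data; in the tutorial these would be cgevaluations and J(xs)
values = torch.Tensor({-1.0, -2.5, -3.0, -3.1})
gap = suboptimality(values, -3.1)
print(gap)
```

The resulting tensor can be passed to gnuplot.plot in place of cgevaluations or sgdevaluations. Switching the y axis to a logarithmic scale (for instance with gnuplot.raw('set logscale y'), if your version of the gnuplot package provides it) then makes the convergence rate easier to read.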