Conjugate Gradient Using Java

The conjugate gradient method is a mathematical technique that can be useful for optimizing both linear and non-linear systems. It can be applied either as a direct method or as an iterative algorithm, and it produces a numerical solution. It is generally used for very large systems where solving with a direct method is impractical.

We start with an initial guess of the solution and calculate the gradient there to obtain the first direction to move in. In the following iterations, each new direction must be conjugate to the previous ones, thereby ensuring the conjugacy condition defined next.

Let u, v be vectors in R^n and let A be a positive definite n×n matrix. Then u and v are said to be mutually A-conjugate if and only if u'Av = 0, where u' denotes the transpose of u. For example, with A = [[2, 0], [0, 1]], the vectors u = (1, 0)' and v = (0, 1)' are A-conjugate, since u'Av = 0.

As in gradient descent, we start with a guess x0 for x and calculate the gradient at that point, which is Ax0 − b. We take the first conjugate vector p1 in the opposite direction (p1 = b − Ax0, the initial residual r0) and make the step x1 = x0 + α1p1. From then on, we must guarantee that each new step is in a direction conjugate to the previous ones. Hence, for every residual rk = b − Axk, we build the next conjugate vector by making it A-orthogonal to the previous ones; this is done by removing from rk its projection onto each of the previous conjugate vectors.

In formulas, p(k+1) = rk − Σi βik pi, where βik = (pi'Ark)/(pi'Api) is the coefficient of the projection of rk onto pi. After that, we make the step x(k+1) = xk + α(k+1)p(k+1) in the direction of the new conjugate vector.

The step size is defined as αk = (pk'b) / (pk'Apk).

Pseudo Code:
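A sketch of the standard CG iteration (written here with the residual-based forms of αk and βk, which are equivalent to the projection formulas above for a quadratic):

```
r0 := b - A*x0                              // initial residual
p1 := r0                                    // first direction: steepest descent
for k = 1, 2, 3, ... until ||rk|| is small enough:
    alpha_k := (r(k-1)' r(k-1)) / (pk' A pk)    // step size (exact line search)
    xk      := x(k-1) + alpha_k * pk            // move along the direction
    rk      := r(k-1) - alpha_k * A * pk        // update the residual
    beta_k  := (rk' rk) / (r(k-1)' r(k-1))      // conjugation coefficient
    p(k+1)  := rk + beta_k * pk                 // next A-conjugate direction
return xk
```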

Java Code:

Importing libraries, and a function to create a range of values with a defined interval.
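A minimal sketch of what this piece might look like, assuming Jblas (the linear algebra library used elsewhere in this series) and assuming all of the snippets below sit inside one class; the helper names (range, printMatrix, randomSpd, and so on) are illustrative, not necessarily the ones in the repository:

```java
import org.jblas.DoubleMatrix;

// Returns values from start (inclusive) to stop (exclusive), spaced by step,
// e.g. range(0, 1, 0.25) -> [0.0, 0.25, 0.5, 0.75].
static double[] range(double start, double stop, double step) {
    int n = (int) Math.ceil((stop - start) / step);
    double[] values = new double[n];
    for (int i = 0; i < n; i++) values[i] = start + i * step;
    return values;
}
```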

A function to print a matrix, and a function to multiply a matrix by a scalar value.
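Sketches of the two helpers (Jblas already offers DoubleMatrix.print() and mul(double); these wrappers just make the main code read closer to the math):

```java
// Prints a matrix row by row.
static void printMatrix(DoubleMatrix m) {
    for (int i = 0; i < m.rows; i++) {
        for (int j = 0; j < m.columns; j++)
            System.out.printf("%10.4f ", m.get(i, j));
        System.out.println();
    }
}

// Multiplies every entry of a matrix by a scalar.
static DoubleMatrix scale(DoubleMatrix m, double c) {
    return m.mul(c);
}
```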

A function to create a random matrix.
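CG needs A to be symmetric positive definite, so a plausible version builds R'R plus a multiple of the identity (R'R is symmetric positive semi-definite; adding n·I makes it safely positive definite):

```java
// Random symmetric positive definite n x n matrix.
static DoubleMatrix randomSpd(int n) {
    DoubleMatrix r = DoubleMatrix.rand(n, n);   // uniform entries in [0, 1)
    return r.transpose().mmul(r)                // R'R: symmetric, PSD
            .add(DoubleMatrix.eye(n).mul(n));   // + n*I: strictly PD
}
```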

Function for Conjugate Gradient.
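A sketch of the solver itself, returning every iterate, direction, and residual via the three_mat class described next. It uses the standard residual-based αk and βk, which for a quadratic are equivalent to the projection formulas above:

```java
static three_mat conjugateGradient(DoubleMatrix A, DoubleMatrix b,
                                   DoubleMatrix x, int iters) {
    int n = b.rows;
    DoubleMatrix xMat = DoubleMatrix.zeros(n, iters + 1);  // iterates x0..x_iters
    DoubleMatrix pMat = DoubleMatrix.zeros(n, iters);      // search directions
    DoubleMatrix rMat = DoubleMatrix.zeros(n, iters + 1);  // residuals r0..r_iters

    DoubleMatrix r = b.sub(A.mmul(x));   // r0 = b - A x0
    DoubleMatrix p = r.dup();            // first direction: steepest descent
    xMat.putColumn(0, x);
    rMat.putColumn(0, r);

    for (int k = 0; k < iters; k++) {
        if (r.norm2() < 1e-12) break;               // already converged
        DoubleMatrix Ap = A.mmul(p);
        double alpha = r.dot(r) / p.dot(Ap);        // exact line search
        x = x.add(p.mul(alpha));
        DoubleMatrix rNew = r.sub(Ap.mul(alpha));   // r(k+1) = rk - alpha*A*pk
        double beta = rNew.dot(rNew) / r.dot(r);    // conjugation coefficient
        pMat.putColumn(k, p);
        p = rNew.add(p.mul(beta));                  // next A-conjugate direction
        r = rNew;
        xMat.putColumn(k + 1, x);
        rMat.putColumn(k + 1, r);
    }
    return new three_mat(xMat, pMat, rMat);
}
```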

The class three_mat, used to return three matrices at once: x_mat (solution iterates), p_mat (directions), and r_mat (residuals).
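A minimal version of the container (field names taken from the post):

```java
// Simple container so the solver can return all three matrices at once.
static class three_mat {
    final DoubleMatrix x_mat;  // solution iterates, one column per step
    final DoubleMatrix p_mat;  // search directions
    final DoubleMatrix r_mat;  // residuals

    three_mat(DoubleMatrix x, DoubleMatrix p, DoubleMatrix r) {
        x_mat = x; p_mat = p; r_mat = r;
    }
}
```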

A function to check whether the residuals are perpendicular to the directions. It can also be used to check residuals against each other, and directions against each other.
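One way to sketch the check: print the matrix of pairwise inner products, whose entries should be numerically zero where orthogonality holds. Note that the directions are mutually A-conjugate rather than orthogonal, so for them the A-weighted product is the more telling check:

```java
// Entry (i, j) of M'N is the inner product of column i of M with column j of N;
// orthogonal pairs show up as (numerically) zero entries.
static void checkOrthogonality(DoubleMatrix M, DoubleMatrix N) {
    printMatrix(M.transpose().mmul(N));
}

// For directions, conjugacy means pi' A pj = 0 for i != j.
static void checkConjugacy(DoubleMatrix P, DoubleMatrix A) {
    printMatrix(P.transpose().mmul(A).mmul(P));
}
```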

Main Function:
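A plausible main method tying the pieces together (the problem size is illustrative):

```java
public static void main(String[] args) {
    int n = 4;
    DoubleMatrix A = randomSpd(n);
    DoubleMatrix b = DoubleMatrix.rand(n, 1);
    DoubleMatrix x0 = DoubleMatrix.zeros(n, 1);

    // In exact arithmetic CG converges in at most n steps.
    three_mat result = conjugateGradient(A, b, x0, n);

    System.out.println("Solution x:");
    printMatrix(result.x_mat.getColumn(n));

    // Entries are near zero wherever rk is orthogonal to an earlier pi.
    System.out.println("r'p inner products:");
    checkOrthogonality(result.r_mat, result.p_mat);
}
```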

Output:

Code and Libraries at https://github.com/Prasanth-s-n/Java_Code

This is the second blog in the Gradient-Based Optimization Algorithm series. Check out the previous blog in the series and our other posts.

Hope this is helpful.

Thank you, and have a great day.

PLEASE FEEL FREE TO SEND FEEDBACK OR COMMENT TO IMPROVE US

Gradient Descent Algorithm Using Java

Gradient Descent:

Gradient descent is an iterative optimization algorithm used to find a local minimum of a given function. In each step it moves down the slope, getting closer to the minimum point. It is widely used within high-level machine learning algorithms to minimize loss functions (also known as cost functions).

Let us consider the function f(x) = (1/2)*(x'Ax) − b'x (where ' denotes transpose, so b' is the transpose of b). Its gradient (the generalization of slope) can be obtained by differentiating it w.r.t. x. So, for symmetric A,

gradient = df(x)/dx = Ax-b

To start gradient descent, we choose an arbitrary value of x (say x0). The gradient at this point is given by

g(x0) = Ax0 – b

The next point is found by updating the previous point (i.e., x0):

xnew = xold + Update

Update = alpha*(-g(x0))

r0 = -g(x0), where r0 is the residual at the point x0. We use the negative of the gradient because the gradient points in the direction of steepest increase; following it would move us uphill, whereas we need to move downhill toward the minimum.

alpha = step size (the larger the step size, the faster we move toward the solution, but only up to a point, as explained next).

We need a good alpha, because if we set it too high we run into the problem of overshooting the minimum, which means we can never actually reach it. To understand this better, suppose you are 10 meters away from home (though you may not know the distance). To get home faster, you take 3-meter strides. After three strides you are at 9 meters, and the next stride carries you to 12 meters, past your front door: with this stride length you can never land exactly at home. (Jokes apart :))

Similarly, steps that are too small waste time. So how do we find a good alpha?

We know x1 = x0 + (alpha*(-g(x0))) = x0 - alpha*g(x0) = x0 + alpha*r0. Substituting into f, we can treat f(x1) as a function of alpha alone:

f(x1) = (1/2)*(x0 + alpha*r0)'A(x0 + alpha*r0) - b'(x0 + alpha*r0)

We set df/dalpha = 0 because we want the alpha that minimizes f along this search direction (an exact line search). Differentiating gives df/dalpha = (Ax1 - b)'r0 = -r0'r0 + alpha*(r0'Ar0) = 0, so

alpha = (r0'r0) / (r0'Ar0)

A worked example for the function above is given below.
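An illustrative run of one step (these numbers are my own, not from the original post): take A = [[4, 1], [1, 3]], b = (1, 2)', x0 = (2, 1)'. Then r0 = b − Ax0 = (−8, −3)', so r0'r0 = 73, Ar0 = (−35, −17)', and r0'Ar0 = 331. This gives alpha = 73/331 ≈ 0.22 and x1 = x0 + alpha*r0 ≈ (0.236, 0.338)'. Repeating the update drives x toward the exact solution A⁻¹b = (1/11, 7/11)' ≈ (0.091, 0.636)'.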

Advantages and Disadvantages:

Advantages:

  • Memory Efficient
  • Efficient For Large Dataset
  • Computationally Fast

Disadvantages:

  • Frequent Update is Computationally Expensive
  • Difficult to perform on Noisy Function
  • Noisy due to Frequent Updates

Java Code:

Gradient Function:

Inputs: matrix A, vector b, an arbitrary starting vector x, and the number of iterations.
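A minimal sketch using Jblas (the class and method names here are illustrative); it uses the exact line-search alpha derived above:

```java
import org.jblas.DoubleMatrix;

public class GradientDescentDemo {

    // Gradient descent for f(x) = (1/2) x'Ax - b'x with symmetric A,
    // whose gradient is Ax - b. Uses the exact line search
    // alpha = (r'r)/(r'Ar) derived above.
    static DoubleMatrix gradientDescent(DoubleMatrix A, DoubleMatrix b,
                                        DoubleMatrix x, int iterations) {
        for (int k = 0; k < iterations; k++) {
            DoubleMatrix r = b.sub(A.mmul(x));     // residual = -gradient
            double rr = r.dot(r);
            if (rr < 1e-12) break;                 // close enough to the minimum
            double alpha = rr / r.dot(A.mmul(r));  // optimal step size
            x = x.add(r.mul(alpha));               // step downhill
        }
        return x;
    }
}
```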

Main Function:
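A plausible main method for the same class, reusing the matrix values from the worked example above:

```java
public static void main(String[] args) {
    // The small symmetric positive definite system from the worked example.
    DoubleMatrix A = new DoubleMatrix(new double[][] {{4, 1}, {1, 3}});
    DoubleMatrix b = new DoubleMatrix(new double[] {1, 2});   // column vector
    DoubleMatrix x0 = new DoubleMatrix(new double[] {2, 1});  // arbitrary start

    DoubleMatrix x = gradientDescent(A, b, x0, 100);
    System.out.println("Approximate solution: " + x);  // ~ [0.091; 0.636]
    System.out.println("Residual norm: " + b.sub(A.mmul(x)).norm2());
}
```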

Output:

Library Used:

  • Matplotlib4j (com.github.sh0nk.matplotlib4j) – a Java plotting library that delegates to Python's Matplotlib
  • Jblas – a fast linear algebra library, used here for the matrix operations

Link to download libraries (Jar files) and java code: https://drive.google.com/file/d/1t_hD4kdQbg3r_4PQtXyam6hMwPvxRuhs/view?usp=sharing

NOTE: This is the first blog in the Gradient-Based Optimization Algorithm series. Subscribe to be notified when we upload the next blog in the series, along with other blogs.

Thank you, and have a great day.

PLEASE FEEL FREE TO SEND FEEDBACK OR COMMENT TO IMPROVE US
