Structural Optimization
Univ.Prof. Dr. Christian Bucher
Vienna University of Technology, Austria
WS 2009/10
Last update: October 22, 2009
Contents

1 Introduction  2
  1.1 Mathematical Optimization  2
  1.2 Nonlinear Optimization  3
2 Unconstrained Optimization  6
  2.1 Basic Concepts  6
  2.2 Search methods  7
    2.2.1 Newton-Raphson method  7
    2.2.2 Steepest descent method  9
    2.2.3 Quasi-Newton methods  10
  2.3 Applications to shape optimization  12
    2.3.1 Minimal Surfaces  12
    2.3.2 Shape optimization by energy minimization  13
3 Constrained Optimization  18
  3.1 Optimality Conditions  18
  3.2 Quadratic problem with linear constraints  19
  3.3 Sequential quadratic programming (SQP)  21
  3.4 Penalty methods  22
4 Genetic algorithms  26
  4.1 Basic Principles  26
  4.2 Choice of encoding  27
  4.3 Selection Process  28
  4.4 Recombination and mutation  29
  4.5 Elitism  29
5 Robustness in optimization  29
  5.1 Stochastic modelling  29
  5.2 Application example  31
© 2007-2009 Christian Bucher, October 22, 2009
1 Introduction

1.1 Mathematical Optimization
A mathematical optimization problem has the form

Minimize $f_0(\mathbf{x})$ subject to $f_i(\mathbf{x}) \le 0; \quad i = 1 \dots m$  (1.1)

The vector $\mathbf{x} = [x_1, x_2, \dots, x_n]^T \in \mathbb{R}^n$ is the optimization (design) variable, the function $f_0 : \mathbb{R}^n \to \mathbb{R}$ is the objective function, and the functions $f_i : \mathbb{R}^n \to \mathbb{R}; \; i = 1 \dots m$ are the constraint functions. A vector $\mathbf{x}^*$ is called optimal (i.e. a solution of problem (1.1)) if it has the smallest objective among all vectors that satisfy the constraints. So for any $\mathbf{z}$ with $f_1(\mathbf{z}) \le 0, \dots, f_m(\mathbf{z}) \le 0$ we have $f_0(\mathbf{z}) \ge f_0(\mathbf{x}^*)$. An equality constraint can be realized by a pair of inequality constraints, e.g.

$f_i(\mathbf{x}) \le 0$ and $-f_i(\mathbf{x}) \le 0$  (1.2)

The set $\mathcal{F}$ containing all vectors $\mathbf{z}$ satisfying the constraints is called the feasible domain:

$\mathcal{F} = \{\mathbf{z} \mid f_1(\mathbf{z}) \le 0, \dots, f_m(\mathbf{z}) \le 0\}$  (1.3)

Example:
Maximize the area of a rectangle for given circumference $4L$.

We want to maximize $A = x_1 x_2$ subject to $2x_1 + 2x_2 = 4L$. In terms of Eq. (1.1) we write

$f_0 = -x_1 x_2$
$f_1 = 2x_1 + 2x_2 - 4L$
$f_2 = -2x_1 - 2x_2 + 4L$

We can easily find a direct solution by eliminating one variable using the constraint, so that $x_2 = 2L - x_1$, which gives

$f_0 = -A = -x_1 x_2 = -x_1(2L - x_1) = x_1^2 - 2Lx_1$

Elementary calculus gives

$\frac{df_0}{dx_1} = 2x_1 - 2L = 0$

from which we find that $x_1 = L$ and $x_2 = L$.
Exercise: Minimize the circumference $2x_1 + 2x_2$ of a rectangle subject to a given area $A = x_1 x_2$.
Convex optimization problems: A special class of optimization problems is called convex. In these problems, both objective and constraint functions are convex functions. This means that for all $\alpha \in [0, 1]$

$f_i[\alpha\mathbf{x} + (1 - \alpha)\mathbf{y}] \le \alpha f_i(\mathbf{x}) + (1 - \alpha)f_i(\mathbf{y}); \quad i = 0, \dots, m$  (1.4)

Geometrically, this means that between two points the function lies "below" a straight line.

Example: Given the optimization problem

$f_0(\mathbf{x}) = x_1 + x_2$
$f_1(\mathbf{x}) = x_1^2 + x_2^2 - R^2$

show that this is a convex optimization problem and determine the feasible domain $\mathcal{F}$. For the solution we need to discuss the properties of $f_0$ and $f_1$. $f_0$ is a linear function, and we can easily see that

$f_0[\alpha\mathbf{x} + (1 - \alpha)\mathbf{y}] = \alpha x_1 + (1 - \alpha)y_1 + \alpha x_2 + (1 - \alpha)y_2 = \alpha(x_1 + x_2) + (1 - \alpha)(y_1 + y_2) = \alpha f_0(\mathbf{x}) + (1 - \alpha)f_0(\mathbf{y})$

which satisfies the requirement. For $f_1$ the process is a bit more lengthy.

Note: A twice differentiable function $g : \mathbb{R}^n \to \mathbb{R}$ is convex if its Hessian matrix $\mathbf{H}_g$ is positive semi-definite for all $\mathbf{x}$. The Hessian is defined by

$\mathbf{H}_g = \begin{bmatrix} \frac{\partial^2 g}{\partial x_1^2} & \dots & \frac{\partial^2 g}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 g}{\partial x_n \partial x_1} & \dots & \frac{\partial^2 g}{\partial x_n^2} \end{bmatrix}$  (1.5)
Exercise: Given the optimization problem

$f_0(\mathbf{x}) = e^{x_1} e^{x_2}$
$f_1(\mathbf{x}) = -x_1 - x_2 - R$

show that this is a convex optimization problem and determine the feasible domain $\mathcal{F}$.

1.2 Nonlinear Optimization
In typical practical problems, both the objective function $f_0$ and the constraint functions $f_1, \dots, f_m$ depend nonlinearly on the design variables $x_k, \; k = 1 \dots n$. In such a case, there may be several local minima, so that in a subset $\mathcal{F}_\ell \subset \mathcal{F}$ of the feasible domain we have a local minimum point $\mathbf{x}_\ell^*$. This means that for all $\mathbf{z} \in \mathcal{F}_\ell$ we have $f(\mathbf{z}) \ge f(\mathbf{x}_\ell^*)$. In general it is very difficult to decide whether a local minimum point $\mathbf{x}_\ell^*$ is actually the global minimum point $\mathbf{x}^*$.

Kuhn-Tucker condition: A necessary condition for the existence of a local minimum point $\mathbf{x}_\ell^*$ in the interior of $\mathcal{F}$ is

$\frac{\partial f_0}{\partial x_k} = 0; \quad k = 1 \dots n$  (1.6)

Note: This condition need not be fulfilled for a local minimum point on the boundary $\partial\mathcal{F}$ of the feasible domain.
Figure 1.1: Local minima in a nonlinear optimization problem
Example: Consider the optimization problem

$f_0(x) = x^2$
$f_1(x) = R - x$

for different values of $R$. The feasible domain is the interval from $R$ to $\infty$, $\mathcal{F} = [R, \infty)$. The KT condition for $f_0$ states that a local minimum point should satisfy $\frac{df_0}{dx} = 2x = 0$. We can immediately see that for any $R \le 0$ the point $x = 0$ belongs to the feasible domain, and hence $x^* = 0$. For $R > 0$, however, the point $x = 0$ does not belong to the feasible set, and we have $x^* = R$.

Exercise: For the optimization problem

$f_0(x) = 1 - x^2 + x^4$
$f_1(x) = R - x$

determine the number and location of local minimum points depending on the value of $R$.

Note: Convex optimization problems have only one local minimum point, which is the global minimum point $\mathbf{x}^*$.
Example: Consider a simple two-bar truss system.

We want to choose the height $x$ of the truss system such that the deformation energy under a static load $F$ becomes minimal. The deformation energy is equal to the work done by the applied load, which is (assuming linear elasticity) $W = \frac{1}{2}Fu$. Since we keep the load $F$ constant during the optimization, this is equivalent to minimizing the deflection $u$. For convenience, we introduce a new variable $\alpha$ defined by $\tan\alpha = \frac{x}{L}$. From equilibrium conditions, we find that the force $F_s$ in one bar is given by

$F_s = \frac{F}{2\sin\alpha}$

From that we compute the compression $u_s$ of one bar as

$u_s = \frac{F_s L_s}{EA} = \frac{FL}{2EA\sin\alpha\cos\alpha}$

and finally the vertical displacement $u$ becomes

$u = \frac{u_s}{\sin\alpha} = \frac{FL}{2EA\sin^2\alpha\cos\alpha}$

Minimizing $u$ is equivalent to maximizing the function $f = \sin^2\alpha\cos\alpha$. The KT condition for this function is

$\frac{df}{d\alpha} = 2\sin\alpha\cos^2\alpha - \sin^3\alpha = 0$

One solution is $\sin\alpha = 0$, which does not give a useful result. The other solution is given by

$2\cos^2\alpha - \sin^2\alpha = 2 - 2\sin^2\alpha - \sin^2\alpha = 2 - 3\sin^2\alpha = 0$

The first value of $\alpha$ satisfying this relation is $\alpha = \arcsin\sqrt{\frac{2}{3}} = 54.7°$. For this value, we have $x = L\tan 54.7° = \sqrt{2}\,L$.

Exercise: Solve the same problem under a horizontal load $F$.
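The stationary point derived above can be cross-checked numerically. The following sketch (an addition for illustration, not part of the original notes) maximizes $f(\alpha) = \sin^2\alpha\cos\alpha$ by a simple scan of the interval $(0, \pi/2)$:

```python
import math

def f(alpha):
    # u is proportional to 1 / (sin^2(a) cos(a)), so minimizing the
    # deflection u means maximizing f(a) = sin^2(a) * cos(a)
    return math.sin(alpha) ** 2 * math.cos(alpha)

# fine scan of (0, pi/2); resolution is pi/400000 radians
best_i = max(range(1, 200000), key=lambda i: f(i * math.pi / 400000))
alpha_opt = best_i * math.pi / 400000

print(math.degrees(alpha_opt))   # about 54.7 degrees
print(math.tan(alpha_opt))       # about sqrt(2), i.e. x = sqrt(2) * L
```

The scan reproduces $\alpha \approx 54.7°$ and $\tan\alpha \approx \sqrt{2}$, in agreement with the closed-form result.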
2 Unconstrained Optimization

2.1 Basic Concepts

Nonlinear optimization problems are frequently solved by search techniques. Such methods generate a sequence of points $\mathbf{x}_k, \; k = 1 \dots \infty$ whose limit is a local minimum point $\mathbf{x}_\ell^*$. In this context, several properties of the objective function $f(\mathbf{x})$ are of interest.
• Local properties

Here the most important quantity is the gradient of the objective function

$\nabla f = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right]^T = \mathbf{g}(\mathbf{x})$  (2.1)

The gradient can be used to expand $f(\mathbf{x})$ into a first-order Taylor series about an arbitrary point $\mathbf{x}_a$:

$f(\mathbf{x}) = f(\mathbf{x}_a) + (\mathbf{x} - \mathbf{x}_a)^T \mathbf{g}(\mathbf{x}_a)$  (2.2)

Note that a tangent plane on $f(\mathbf{x})$ in $\mathbf{x} = \mathbf{x}_a$ is given by

$(\mathbf{x} - \mathbf{x}_a)^T \mathbf{g}(\mathbf{x}_a) = 0$  (2.3)

Obviously, this requires local differentiability of the objective function.
• Regional properties

This means looking at the "topography" of the objective function (cf. Fig. 2.1). A ridge is loosely defined as a region with a pronounced change of the objective function in one specific direction, including at least one local optimum. A saddle is a region in which the objective appears to have a minimum along certain directions while it appears to possess a maximum in other specific directions.

Figure 2.1: Regional properties of objective function

• Global properties

This deals with properties affecting the convergence of search methods to the global minimum. The properties of interest are

• continuity and differentiability
• convexity
• separability

Remark: Small errors in the objective function (numerical "noise") may actually lead to large errors in the gradients, which may effectively destroy differentiability. As an example, consider the function

$f(x) = 1 + \varepsilon \sin\frac{x}{\varepsilon}$  (2.4)

which is almost constant for small values of $\varepsilon$. However, its derivative

$f'(x) = \cos\frac{x}{\varepsilon}$  (2.5)

is not small and oscillates very rapidly.

Definition: A function $f(\mathbf{x})$ is called separable (non-interacting) if it can be expressed as

$f(\mathbf{x}) = \sum_{k=1}^{n} q_k(x_k)$  (2.6)

Such an objective function can be minimized by minimizing the partial functions $q_k$ separately. Sometimes a function can be made separable by an appropriate change of variables.

Example:
Consider the function

$f(x_1, x_2) = x_1^2 + 10 x_1 x_2 + 100 x_2^2$

If we introduce a new set of variables $z_1, z_2$ by

$x_1 = z_1 - \frac{5}{\sqrt{75}} z_2; \quad x_2 = \frac{1}{\sqrt{75}} z_2$

we obtain

$f = z_1^2 - 2\frac{5}{\sqrt{75}} z_1 z_2 + \frac{25}{75} z_2^2 + 10\frac{1}{\sqrt{75}} z_1 z_2 - 10\frac{5}{75} z_2^2 + \frac{100}{75} z_2^2 = z_1^2 + z_2^2$

which is separable in the new variables (cf. Fig. 2.2).
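The change of variables can be verified numerically. The short sketch below (added for illustration) evaluates both forms at random points and checks that they agree:

```python
import math, random

def f_x(x1, x2):
    # original, non-separable form
    return x1**2 + 10 * x1 * x2 + 100 * x2**2

def f_z(z1, z2):
    # substitute the change of variables from the text
    x1 = z1 - 5 / math.sqrt(75) * z2
    x2 = 1 / math.sqrt(75) * z2
    return f_x(x1, x2)

random.seed(0)
for _ in range(5):
    z1, z2 = random.uniform(-3, 3), random.uniform(-3, 3)
    # in the new variables the function is separable: f = z1^2 + z2^2
    assert abs(f_z(z1, z2) - (z1**2 + z2**2)) < 1e-9
print("separable form verified")
```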
2.2 Search methods

2.2.1 Newton-Raphson method

Within this method, the sequence $\mathbf{x}_k$ is constructed using the Hessian matrix $\mathbf{H}$ at each iteration. Given a start vector $\mathbf{x}_0$, the iteration proceeds as

$\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta\mathbf{x}_k = \mathbf{x}_k - \mathbf{H}^{-1}(\mathbf{x}_k)\mathbf{g}(\mathbf{x}_k)$  (2.7)

Figure 2.2: Objective function in non-separable and separable form

This, of course, requires that $f$ is twice differentiable. Since we assumed convexity, the Hessian matrix is positive definite and hence

$\mathbf{g}^T(\mathbf{x}_k)\Delta\mathbf{x}_k = -\mathbf{g}^T(\mathbf{x}_k)\mathbf{H}^{-1}(\mathbf{x}_k)\mathbf{g}(\mathbf{x}_k) \le 0$  (2.8)

The Newton step is a descent step (but not the steepest). The choice of the Newton method can be motivated by studying a second-order Taylor expansion $\hat{f}(\mathbf{x})$ of the objective function $f(\mathbf{x})$:

$\hat{f}(\mathbf{x} + \mathbf{v}) = f(\mathbf{x}) + \mathbf{g}^T(\mathbf{x})\mathbf{v} + \frac{1}{2}\mathbf{v}^T\mathbf{H}(\mathbf{x})\mathbf{v}$  (2.9)

This is a convex function of $\mathbf{v}$ which is minimized by $\mathbf{v} = -\mathbf{H}^{-1}(\mathbf{x})\mathbf{g}(\mathbf{x})$.

Figure 2.3: Quadratic approximation of objective function
Example: Consider the objective function

$f(x_1, x_2) = (x_1 + 1)^2 + x_1^2 x_2^2 + \exp(x_1 - x_2)$  (2.10)

A plot of this function is shown in Fig. 2.4.

Figure 2.4: Plot of objective function and Newton iteration sequence
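A minimal Newton iteration for the objective (2.10) can be written out by hand, since the gradient and Hessian are available in closed form. This sketch is an added illustration; the starting point $\mathbf{x}_0 = [0, 0]^T$ is an assumption chosen to match the other examples in this chapter:

```python
import math

def grad_hess(x1, x2):
    # gradient and Hessian of f = (x1+1)^2 + x1^2 x2^2 + exp(x1 - x2)
    e = math.exp(x1 - x2)
    g = [2 * (x1 + 1) + 2 * x1 * x2**2 + e,
         2 * x1**2 * x2 - e]
    H = [[2 + 2 * x2**2 + e, 4 * x1 * x2 - e],
         [4 * x1 * x2 - e, 2 * x1**2 + e]]
    return g, H

x1, x2 = 0.0, 0.0
for _ in range(20):
    g, H = grad_hess(x1, x2)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # Newton step x <- x - H^{-1} g, 2x2 inverse written explicitly
    dx1 = (H[1][1] * g[0] - H[0][1] * g[1]) / det
    dx2 = (H[0][0] * g[1] - H[1][0] * g[0]) / det
    x1, x2 = x1 - dx1, x2 - dx2

f = (x1 + 1)**2 + x1**2 * x2**2 + math.exp(x1 - x2)
print(round(x1, 3), round(x2, 3), round(f, 3))
```

From the origin the iteration converges in a few steps to the local minimum near $(-1.130, 0.113)$ with $f \approx 0.322$, the same point the search-method examples below arrive at.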
2.2.2 Steepest descent method

The first-order Taylor approximation $\hat{f}$ of $f$ around a point $\mathbf{x}$ is

$\hat{f}(\mathbf{x} + \mathbf{v}) = f(\mathbf{x}) + \mathbf{g}^T(\mathbf{x})\mathbf{v}$  (2.11)

We choose the direction of $\mathbf{v}$ such that the decrease in $\hat{f}$ becomes as large as possible. Let $\|\cdot\|$ be any norm on $\mathbb{R}^n$. We define a normalized steepest descent direction as

$\mathbf{p} = \underset{\|\mathbf{v}\|=1}{\operatorname{argmin}} \left[\mathbf{g}^T(\mathbf{x})\mathbf{v}\right]$  (2.12)

If we choose $\|\cdot\|$ to be the Euclidean norm, i.e. $\|\mathbf{x}\| = \sqrt{\mathbf{x}^T\mathbf{x}}$, then we obtain the direction

$\mathbf{p} = -\frac{\mathbf{g}(\mathbf{x})}{\|\mathbf{g}(\mathbf{x})\|}$  (2.13)

The steepest descent method then performs a line search along the direction defined by $\mathbf{p}$, so that $\mathbf{x}_{k+1} = \mathbf{x}_k + t\mathbf{p}$. There are several possibilities for the search.
• Exact line search. Determine

$t = \underset{s \ge 0}{\operatorname{argmin}} f(\mathbf{x} + s\mathbf{p})$  (2.14)

This may be very expensive.

• Backtracking line search. Given a descent direction $\mathbf{p}$, we choose $\alpha \in (0, 0.5)$ and $\beta \in (0, 1)$. Then we apply this algorithm:

– $t := t_0$
– while $\epsilon = f(\mathbf{x} + t\mathbf{p}) - \left[f(\mathbf{x}) + \alpha t \mathbf{g}^T(\mathbf{x})\mathbf{p}\right] > 0$ do $t := \beta t$

Figure 2.5: Sufficient descent condition

This algorithm ensures that we obtain a descent which is at least as large as the $\alpha$-fold of the gradient prediction. Typical values for applications are $0.01 \le \alpha \le 0.30$ and $0.1 \le \beta \le 0.8$.

Example: Minimize the objective function $f(x_1, x_2) = (x_1 + 1)^2 + x_1^2 x_2^2 + \exp(x_1 - x_2)$ using the steepest descent method with backtracking line search. Start at $\mathbf{x}_0 = [0, 0]^T$ and use $t_0 = \|\mathbf{g}\|$. We get $\mathbf{g}_0 = [3, -1]^T$ and $\mathbf{p}_0 = [-0.949, 0.316]^T$. In the line search we start with $t = 3.162$, giving $\epsilon = 13.018$. Then we get $t = 1.581$ with $\epsilon = -0.052$. This is acceptable. Hence we get $\mathbf{x}_1 = [-1.5, 0.5]^T$. The further steps are shown in the table below.

i     1       2       3       4       10
t     1.581   0.665   0.219   0.069   0.0005
x1    -1.500  -1.096  -1.171  -1.132  -1.130
x2    0.500   -0.029  0.178   0.121   0.113
f     0.948   0.354   0.322   0.322   0.322
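The steepest descent run above can be sketched as follows. This is an added illustration; the values $\alpha = 0.2$ and $\beta = 0.5$ are assumptions (the notes only state admissible ranges), but they reproduce the reported line-search numbers for the first step:

```python
import math

def f(x):
    return (x[0] + 1)**2 + x[0]**2 * x[1]**2 + math.exp(x[0] - x[1])

def grad(x):
    e = math.exp(x[0] - x[1])
    return [2 * (x[0] + 1) + 2 * x[0] * x[1]**2 + e,
            2 * x[0]**2 * x[1] - e]

alpha, beta = 0.2, 0.5      # assumed backtracking parameters
x = [0.0, 0.0]
for _ in range(100):
    g = grad(x)
    ng = math.hypot(g[0], g[1])
    p = [-g[0] / ng, -g[1] / ng]    # normalized steepest descent direction
    t = ng                          # start value t0 = ||g|| as in the example
    # backtracking: shrink t until the sufficient-descent condition holds
    while f([x[0] + t * p[0], x[1] + t * p[1]]) \
            - (f(x) + alpha * t * (g[0] * p[0] + g[1] * p[1])) > 0:
        t *= beta
    x = [x[0] + t * p[0], x[1] + t * p[1]]

print([round(v, 3) for v in x], round(f(x), 3))
```

The first iteration gives $t = 1.581$ and $\mathbf{x}_1 = [-1.5, 0.5]^T$ as in the table; after enough iterations the run settles near $f \approx 0.322$.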
2.2.3 Quasi-Newton methods

The basic idea of quasi-Newton methods is to utilize successive approximations of the Hessian matrix $\mathbf{H}(\mathbf{x})$ or its inverse $\mathbf{B}(\mathbf{x}) = \mathbf{H}^{-1}(\mathbf{x})$. One specific popular method is the BFGS approach (named after Broyden, Fletcher, Goldfarb and Shanno). The procedure uses a quadratic approximation of the objective function $f$ in terms of

$\hat{f}_k(\mathbf{v}) = \mathbf{g}_k^T\mathbf{v} + \frac{1}{2}\mathbf{v}^T\mathbf{H}_k\mathbf{v}$  (2.15)

Here $\mathbf{H}_k$ is a symmetric, positive definite matrix which is updated during the iteration process. The minimizer $\mathbf{p}_k$ of $\hat{f}_k(\mathbf{v})$ is

$\mathbf{p}_k = -\mathbf{H}_k^{-1}\mathbf{g}_k = -\mathbf{B}_k\mathbf{g}_k$  (2.16)

In most implementations, this vector is used as a search direction, and the new iterate for the design vector is formed from

$\mathbf{x}_{k+1} = \mathbf{x}_k + t\mathbf{p}_k$  (2.17)

Here the value of $t$ is computed from a line search (typically backtracking, starting from $t = 1$). Then a new approximation $\hat{f}_{k+1}(\mathbf{v})$ is constructed from

$\hat{f}_{k+1}(\mathbf{v}) = \mathbf{g}_{k+1}^T\mathbf{v} + \frac{1}{2}\mathbf{v}^T\mathbf{H}_{k+1}\mathbf{v}$  (2.18)

For this purpose we compute

$\mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k; \quad \mathbf{y}_k = \mathbf{g}_{k+1} - \mathbf{g}_k$  (2.19)

We then check the so-called curvature condition

$\gamma_k = \mathbf{s}_k^T\mathbf{y}_k > 0$  (2.20)

If $\gamma_k \le 0$ we set $\mathbf{H}_{k+1} = \mathbf{H}_k$. Otherwise, we compute the next approximation to the inverse Hessian $\mathbf{B}_{k+1}$ from

$\mathbf{B}_{k+1} = \left(\mathbf{I} - \frac{\mathbf{s}_k\mathbf{y}_k^T}{\gamma_k}\right)\mathbf{B}_k\left(\mathbf{I} - \frac{\mathbf{y}_k\mathbf{s}_k^T}{\gamma_k}\right) + \frac{\mathbf{s}_k\mathbf{s}_k^T}{\gamma_k}$  (2.21)
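The BFGS loop with the inverse-Hessian update can be sketched compactly. This block is an added illustration; the backtracking parameters $\alpha = 0.2$, $\beta = 0.5$ and the objective function are taken over from the earlier examples:

```python
import numpy as np

def f(x):
    return (x[0] + 1)**2 + x[0]**2 * x[1]**2 + np.exp(x[0] - x[1])

def grad(x):
    e = np.exp(x[0] - x[1])
    return np.array([2 * (x[0] + 1) + 2 * x[0] * x[1]**2 + e,
                     2 * x[0]**2 * x[1] - e])

alpha, beta = 0.2, 0.5
x = np.zeros(2)
B = np.eye(2)                        # B0 = I
g = grad(x)
for _ in range(30):
    p = -B @ g                       # search direction, Eq. (2.16)
    t = 1.0                          # backtracking from t = 1
    while f(x + t * p) - (f(x) + alpha * t * (g @ p)) > 0:
        t *= beta
    x_new = x + t * p
    g_new = grad(x_new)
    s, y = x_new - x, g_new - g
    gamma = s @ y
    if gamma > 0:                    # curvature condition, Eq. (2.20)
        I = np.eye(2)
        B = (I - np.outer(s, y) / gamma) @ B @ (I - np.outer(y, s) / gamma) \
            + np.outer(s, s) / gamma
    x, g = x_new, g_new

print(np.round(x, 3), round(float(f(x)), 3))
```

The first update reproduces the matrix $\mathbf{B}_1$ quoted in the example below, and the iteration settles at the same minimum ($f \approx 0.322$) found by the other methods.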
Usually the procedure is started with $\mathbf{B}_0 = \mathbf{I}$. In large problems it is not helpful to keep all update vectors in the analysis. Therefore, limited-memory BFGS (L-BFGS) has been developed. In this approach, only a small number $m$ of the most recent vectors $\mathbf{s}_k$ and $\mathbf{y}_k$ is stored, and $\mathbf{B}_k$ is re-computed from these vectors in each step. Due to round-off it may happen that the updated matrix becomes very ill-conditioned. In this case, the update process is completely restarted from $\mathbf{B} = \mathbf{I}$.

Example: Minimize the objective function $f(x_1, x_2) = (x_1 + 1)^2 + x_1^2 x_2^2 + \exp(x_1 - x_2)$ using the BFGS method with backtracking line search. Start at $\mathbf{x}_0 = [0, 0]^T$ and use $t_0 = 1$. We get $\mathbf{g}_0 = [3, -1]^T$ and $\mathbf{p}_0 = [-3, 1]^T$. In the line search we start with $t = 1$, giving $\epsilon = 13.018$. Then we get $t = 0.5$ with $\epsilon = -0.052$. This is acceptable. Hence we get $\mathbf{x}_1 = [-1.5, 0.5]^T$ and $\mathbf{g}_1 = [-1.615, 2.115]^T$. From that, we have $\mathbf{s}_1 = [-1.5, 0.5]^T$, $\mathbf{y}_1 = [-4.615, 3.115]^T$ and $\gamma_1 = 8.479$. This leads to an updated inverse Hessian

$\mathbf{B}_1 = \begin{bmatrix} 0.603 & 0.411 \\ 0.411 & 0.770 \end{bmatrix}$

and a new search direction $\mathbf{p}_1 = [0.103, -0.964]^T$. The further steps are shown in the table below as well as in Fig. 2.7.

i     1       2       3       4       10
t     0.5     0.5     1       1       1
x1    -1.500  -1.448  -0.981  -1.111  -1.130
x2    0.500   0.018   0.231   0.158   0.113
f     0.948   0.432   0.322   0.322   0.322

2.3 Applications to shape optimization

2.3.1 Minimal Surfaces
Two circles with radius $R$ at a distance $H$ should be connected by a membrane with minimal surface area $A$. We discretize the problem by replacing the meridian curve by a polygon as sketched (Fig. 2.8). Then the membrane surface area is given by

$A = 2\pi (R + r)\sqrt{(R - r)^2 + \frac{H^2}{9}} + 2\pi r \frac{H}{3}$  (2.22)

Here $r$ is to be determined by minimizing $A$. Taking the derivative w.r.t. $r$ we have

$\frac{dA}{dr} = 2\pi\sqrt{(R - r)^2 + \frac{H^2}{9}} + 2\pi(R + r)\frac{r - R}{\sqrt{(R - r)^2 + \frac{H^2}{9}}} + 2\pi\frac{H}{3} = 0$  (2.23)

For a ratio of $\frac{H}{R} = 1$ the solution is $r = 0.867R$; for $\frac{H}{R} = 1.3$ it is $r = 0.707R$. The analytical solution for the meridian curve of this problem can be obtained as $r(z) = a\cosh\frac{z}{a}$, in which $a$ has to be chosen such that $r\left(\frac{H}{2}\right) = R$. For $\frac{H}{R} = 1$ this leads to $a = r(0) = 0.843$; for $\frac{H}{R} = 1.3$ we obtain $a = r(0) = 0.642$. So there is some level of agreement even with this very simple discretization.
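Equation (2.22) can also be minimized numerically. The sketch below (an added cross-check, not part of the original notes) uses a plain golden-section search and reproduces the quoted ratios:

```python
import math

def area(r, R, H):
    # two conical frustums plus a central cylinder, Eq. (2.22)
    return (2 * math.pi * (R + r) * math.sqrt((R - r)**2 + H**2 / 9)
            + 2 * math.pi * r * H / 3)

def golden_min(fun, a, b, tol=1e-8):
    # golden-section search for a unimodal function on [a, b]
    phi = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if fun(c) < fun(d):
            b = d
        else:
            a = c
    return (a + b) / 2

r1 = golden_min(lambda r: area(r, 1.0, 1.0), 0.0, 1.0)
r2 = golden_min(lambda r: area(r, 1.0, 1.3), 0.0, 1.0)
print(round(r1, 3), round(r2, 3))   # about 0.867 and 0.707
```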
Figure 2.7: Plot of objective function and BFGS iteration sequence

Figure 2.8: Connecting two circles with a membrane

Exercise: Connect two squares ($L \times L$) by trapezoidal elements, minimizing the total surface area (cf. Fig. 2.9). Consider the cases

a) $\frac{H}{L} = \frac{1}{2}$

b) $\frac{H}{L} = 1$

2.3.2 Shape optimization by energy minimization
Here we try to find a structural geometry in such a way that the work done by the applied loads becomes minimal. For a structure with concentrated applied loads $\mathbf{F}$ and corresponding displacements $\mathbf{u}$ this means

$\mathbf{F}^T\mathbf{u} \to \text{Min.!}$

Figure 2.9: Connecting two squares with trapezoidal elements

Example: The geometry of a statically determinate system as shown in Fig. 2.10 is to be configured such that the total external work $W = F_1 u_1 + F_2 u_2$ becomes minimal. The design variables are the vertical locations of the load application points, i.e. $x_1$ and $x_2$. We assume identical rectangular cross sections $d \times d$ throughout, with the following geometrical relations: $d = 0.05L$, $A = d^2$, $I = \frac{d^4}{12}$. Furthermore, we solve the problem for the fixed load relation $F_1 = F$, $F_2 = 2F$. Computing $W$ for the range $0 \le x_1, x_2 \le 5$ results in the values shown in logarithmic scale in Fig. 2.11. Even in log-scale it can be seen that there is a deep and narrow ravine along the line $x_2 = \frac{5}{4} x_1$. This line defines a moment-free geometric configuration of the system. Tracing this line in $(x_1, x_2)$-space easily allows the location of a global minimum at $x_1 = 1.375L$, as shown in Fig. 2.12.

Figure 2.10: Minimize external work
Example: The geometry of a statically determinate system as shown in Fig. 2.13 is to be configured such that the total external work $W$ becomes minimal. Assuming symmetry, the design variables are the vertical locations of the load application points, i.e. $z_1$, $z_2$, $z_3$. Again we assume identical rectangular cross sections $d \times d$ throughout. We now start by solving for possible moment-free configurations. The moments in the points e and d are easily found. From the condition $M_e = 0$ we get $z_1 = \frac{5}{9} z_3$. From the condition $M_d = 0$ we get $z_2 = \frac{8}{9} z_3$, so that the energy can be minimized using $z_3$ only. Using these relations, we locate a global minimum at $z_3 = 2.375L$, as shown in Fig. 2.14.
Figure 2.11: External work as a function of $x_1$ and $x_2$

Figure 2.12: External work as a function of $x_1$ along the line $x_2 = \frac{5}{4} x_1$

Figure 2.13: Minimize external work

Figure 2.14: External work as a function of $z_3$ in the plane $z_1 = \frac{5}{9} z_3$, $z_2 = \frac{8}{9} z_3$

Figure 2.15: Initial configuration and load distribution

Figure 2.16: Final configuration
Application to finite element models: This shape optimization by energy minimization can also be used in the context of the finite element method. Here the nodal coordinates of the mesh are the optimization variables. Of course, this implies that the element matrices and the global matrices have to be re-assembled in each step of the optimization. The shape of the structure with the loads as indicated in Fig. 2.15 should be optimized with respect to minimal external work. The optimized shape is shown in Fig. 2.16.

3 Constrained Optimization

3.1 Optimality Conditions

We now return to the problem of optimization with inequality constraints. Without loss of generality, this can be written in the form

Minimize $f_0(\mathbf{x})$ subject to $f_1(\mathbf{x}) \le 0$  (3.1)

The function $f_1 : \mathbb{R}^n \to \mathbb{R}$ may actually involve several constraint conditions put together, e.g. in terms of a max-operator. The standard approach to the solution of this problem involves the construction of a Lagrange function $L$ combining the objective and the constraint:

$L(\mathbf{x}, \lambda) = f_0(\mathbf{x}) + \lambda f_1(\mathbf{x}); \quad \lambda \ge 0$  (3.2)

The parameter $\lambda \in \mathbb{R}$ is called the Lagrange multiplier. It is an additional optimization variable. The so-called Karush-Kuhn-Tucker (KKT) conditions for this optimization problem are the usual necessary conditions for the existence of a local minimum:

$\nabla f_0(\mathbf{x}) + \lambda\nabla f_1(\mathbf{x}) = \mathbf{0}$
$\lambda f_1(\mathbf{x}) = 0$  (3.3)
$\lambda \ge 0; \quad f_1(\mathbf{x}) \le 0$
Example: Consider a one-dimensional problem as previously discussed:

$f_0(x) = x^2 \to \text{Min.!}$
$f_1(x) = x + 1 \le 0$  (3.4)

The Lagrange function for this problem is

$L(x, \lambda) = x^2 + \lambda(x + 1)$  (3.5)

This function is shown in Fig. 3.1 for the range $-3 \le x, \lambda \le 3$. The Lagrange function has a stationary point defined by $2x + \lambda = 0$ and $x + 1 = 0$, so that $x^* = -1$ and $\lambda^* = 2$. Again, this is shown in Fig. 3.1. It is easily seen that this point is a saddle point in $(x, \lambda)$-space.
Figure 3.1: Lagrange function

Example: Consider the optimization problem (as previously discussed in similar form): we want to maximize $A = x_1 x_2$ subject to $2x_1 + 2x_2 \le 4L$. In terms of Eq. (3.1) we write

$f_0 = -x_1 x_2$
$f_1 = 2x_1 + 2x_2 - 4L$

and the KKT conditions become

$-x_2 + 2\lambda = 0$
$-x_1 + 2\lambda = 0$
$\lambda(2x_1 + 2x_2 - 4L) = 0$  (3.6)
$\lambda \ge 0; \quad 2x_1 + 2x_2 - 4L \le 0$

The first three equations have the solutions $x_1 = 0$, $x_2 = 0$, $\lambda = 0$. This solution obviously defines a maximum of $f_0$. The second solution is $x_1 = L$, $x_2 = L$, $\lambda = \frac{L}{2}$. This satisfies all conditions and therefore describes a local minimum of $f_0$ (and therefore a maximum of $A$). If the functions $f_0$ and $f_1$ are both convex and differentiable, then the KKT conditions for $(\mathbf{x}, \lambda)$ are necessary and sufficient for a local optimum. If, moreover, $f_0$ is strictly convex, then the solution $\mathbf{x}$ is unique (i.e. a global minimum). Note that in the previous example $f_0$ is not convex!
3.2 Quadratic problem with linear constraints

Consider the optimization problem

$f_0(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{H}\mathbf{x} + \mathbf{g}^T\mathbf{x} \to \text{Min.!}$
$f_1(\mathbf{x}) = \mathbf{a}^T\mathbf{x} + b \le 0$  (3.7)

with a positive definite matrix $\mathbf{H}$. Since the objective function is strictly convex and the constraint is convex, the solution of the KKT conditions (if it exists) defines the unique minimum. The KKT conditions are

$\mathbf{H}\mathbf{x} + \mathbf{g} + \lambda\mathbf{a} = \mathbf{0}$
$\lambda(\mathbf{a}^T\mathbf{x} + b) = 0$  (3.8)
together with $\lambda \ge 0$ and $\mathbf{a}^T\mathbf{x} + b \le 0$. One possibility is $\lambda = 0$, and from that $\mathbf{x} = -\mathbf{H}^{-1}\mathbf{g}$. If this point is feasible, then it is the solution. The alternative with $\lambda \ne 0$ requires that

$\mathbf{a}^T\mathbf{x} = -b; \quad \mathbf{a}^T\mathbf{x} = -\lambda\mathbf{a}^T\mathbf{H}^{-1}\mathbf{a} - \mathbf{a}^T\mathbf{H}^{-1}\mathbf{g}$  (3.9)

from which we immediately get

$\lambda = \frac{b - \mathbf{a}^T\mathbf{H}^{-1}\mathbf{g}}{\mathbf{a}^T\mathbf{H}^{-1}\mathbf{a}}$  (3.10)

and furthermore

$\mathbf{x}^* = -\lambda\mathbf{H}^{-1}\mathbf{a} - \mathbf{H}^{-1}\mathbf{g}$  (3.11)
This can be used as a starting point for sequential numerical procedures (SQP methods such as NLPQL) utilizing a second-order approximation for the objective function and a first-order approximation for the constraints.

Example: Find the minimum of $f_0 = x_1^2 + x_1 x_2 + x_2^2$ subject to the constraint $x_1 - x_2 \le R$, i.e. $f_1 = x_1 - x_2 - R$. In order to rewrite this in the previous notation, we introduce the matrix $\mathbf{H}$ and the vector $\mathbf{a}$ as well as the scalar $b$:

$\mathbf{H} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}; \quad \mathbf{a} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}; \quad b = -R$  (3.12)

The solution $\lambda = 0$ and, correspondingly, $\mathbf{x} = \mathbf{0}$ exists only for $R \ge 0$ (see Fig. 3.2). The second possible solution is obtained from

$\mathbf{H}^{-1} = \frac{1}{3}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}; \quad \mathbf{a}^T\mathbf{H}^{-1}\mathbf{a} = 2; \quad \lambda = -\frac{R}{2}; \quad \mathbf{x}^* = \frac{1}{2}\begin{bmatrix} R \\ -R \end{bmatrix}$  (3.13)

Figure 3.2: Objective function and feasible domain for $R = 2$ (lower right) and $R = -2$ (upper left)
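The closed-form solution (3.10)-(3.11) is easy to verify numerically. The sketch below (added for illustration) uses $R = -2$, so that the constraint is active:

```python
import numpy as np

H = np.array([[2.0, 1.0], [1.0, 2.0]])
g = np.zeros(2)
a = np.array([1.0, -1.0])
R = -2.0
b = -R

Hinv = np.linalg.inv(H)
x_unc = -Hinv @ g                   # unconstrained minimizer
if a @ x_unc + b <= 0:
    lam, x = 0.0, x_unc             # constraint inactive
else:
    lam = (b - a @ Hinv @ g) / (a @ Hinv @ a)   # Eq. (3.10)
    x = -lam * (Hinv @ a) - Hinv @ g            # Eq. (3.11)

print(lam, x)   # lam = -R/2 = 1, x* = [R/2, -R/2] = [-1, 1]
```

For $R = -2$ the multiplier is $\lambda = 1 \ge 0$ and the constraint is exactly active, $\mathbf{a}^T\mathbf{x}^* + b = 0$, consistent with Eq. (3.13).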
3.3 Sequential quadratic programming (SQP)

Essentially, this is a repeated application of the minimization of a quadratic function with linear constraints. In the process, most implementations do not use the exact Hessian matrix of the objective function; rather, an approximation based on gradient information gathered during the iteration (such as the BFGS update) is used. In this case, it may be helpful to include a line search procedure using the solution of Eq. (3.11) as search direction. Also, scaling of the constraints can significantly influence the convergence!

Example: Minimize the objective function

$f(x_1, x_2) = (x_1 + 1)^2 + x_1^2 x_2^2 + \exp(x_1 - x_2)$  (3.14)

subject to the constraint condition

$-\frac{x_1^2}{2} - x_2 + 1.5 \le 0$  (3.15)

A plot of this function and the feasible domain is shown in Fig. 3.3.

Figure 3.3: Plot of objective function and iteration sequence

Repeated application of the quadratic constrained minimization leads to a local minimum. Starting the procedure at $\mathbf{x}_0 = [-1, 0]^T$ we get fast convergence and end up with the global minimum $\mathbf{x}^* = [-1.632, 0.168]^T$. With a slightly modified starting vector of $\mathbf{x}_0 = [-0.5, 0]^T$ we converge (slowly) to the local minimum $\mathbf{x}^* = [-1.000, 1.000]^T$ (see Fig. 3.3). Interestingly, when starting at the origin, we converge (very slowly) to the global minimum $\mathbf{x}^* = [-1.632, 0.168]^T$.
3.4 Penalty methods

An alternative approach to the explicit handling of constraints is the application of modifications to the objective function in such a way as to prevent the optimization algorithm from reaching the infeasible domain. A simple way to achieve this is adding a penalty term $p(\mathbf{x})$ to the objective function $f_0(\mathbf{x})$ which is large enough to shift the minimum of the augmented objective function $f_p(\mathbf{x}) = f_0(\mathbf{x}) + p(\mathbf{x})$ to the feasible domain. For computational purposes it is useful to construct $p(\mathbf{x})$ in such a way that the objective function remains differentiable (or at least continuous). Usually it will not be possible to adjust $p(\mathbf{x})$ in such a way that the minimum of the augmented objective will be located exactly at the boundary of the feasible domain (cf. Fig. 3.4).

Figure 3.4: Interior and exterior penalty functions

Interior penalty functions attempt to keep the optimization process away from the boundary of the feasible domain by adding a term which increases sharply when approaching the boundary from the interior. So the solution will be feasible. Exterior penalties lead to an optimum which is not in the feasible domain. However, it is usually easier to construct suitable exterior penalty functions, e.g.

$p(\mathbf{x}) = \sum_{i=1}^{N} a_i H[f_i(\mathbf{x})] f_i(\mathbf{x})^{\ell_i}$  (3.16)

Here $H(\cdot)$ denotes the Heaviside (unit step) function, and the coefficients $a_i > 0$ and $\ell_i \ge 0$ are chosen according to the specific problem. The choice $\ell_i = 2$ leads to a differentiable augmented objective function and is usually quite acceptable. By increasing the values of $a_i$ the solution approaches the boundary of the feasible domain.

Example: Minimize the objective function

$f_0(x_1, x_2) = (x_1 + 1)^2 + x_1^2 x_2^2 + \exp(x_1 - x_2)$  (3.17)

subject to the constraint condition

$f_1(x_1, x_2) = -\frac{x_1^2}{2} - x_2 + 1.5 \le 0$  (3.18)

We choose the exterior penalty function

$p(x_1, x_2) = a H[f_1] f_1^2$  (3.19)

Figure 3.5: Plot of augmented objective function and iteration sequence, $a = 1$

A plot of this function for $a = 1$ is shown in Fig. 3.5. Application of the BFGS method to the augmented objective function leads to the iteration sequence as shown in Fig. 3.5. The convergence to the point $\mathbf{x}^* = [-1.449, 0.180]^T$ is quite fast, but the final result is clearly infeasible. Changing the value to $a = 10$ leads to an augmented objective as shown in Fig. 3.6. Here a second minimum becomes visible, which is actually found when starting from the origin. Starting at the point $(-1, 0)$ we converge to the point $\mathbf{x}^* = [-1.609, 0.170]^T$, which is reasonably close to the solution of the constrained optimization problem.
f 0 = 5(A1 + A3 + A5 ) + 6(A2 + A4 )
(3.20)
Since this is a statically determinate system, the member forces F k and stresses σk are easily computed as
5 F, F 2 = 8 5F σ1 = , σ2 = 8A1 F 1 =
− 38 F, F = − 38 F, F = 34 F, F = − 54 F 3
4
5
3F 3F 3F 5F , σ3 = , σ4 = , σ5 = 8A2 8A3 4A4 4A5
(3.21)
In this case, the objective function is linear, but the constraints are not.1 1
By introducing the inverses of the cross sectional areas as new design variables, the problem could be changed to nonlinear objective with linear constraints. c 2007-2009 Christian Bucher ⃝
October 22, 2009
24
WS 09/10
Structural Optimization
Figure 3.6: Plot of augmented objective function and iteration sequence, a = 10
Figure 3.7: Simple truss structure
We solve the problem by introducing an exterior penalty function in the form of p = aH (s)s4 ;
s = max σk k =1...5
− β
(3.22)
In the following numerical evaluation we fix the values F =1, β =1 and vary a. Using a BFGS iteration with numerical gradient evaluation (central differences with ∆Ak = 10−6 starting at Ak =1 we get the results as shown in Table 1. It can be seen that as a increases, the results approach the fully stressed design in which each truss member reaches the stress limit. The convergence of the c 2007-2009 Christian Bucher ⃝
October 22, 2009
25
WS 09/10
Structural Optimization
Table 1: a 105 107 109
Truss example: A1 A2 0.603 0.362 0.620 0.372 0.624 0.374
convergence of A3 A4 0.362 0.724 0.372 0.744 0.374 0.749
penalty method A5 N 1.207 540 1.240 706 1.248 736
objective and the constraint is shown in Fig. 3.8 for the case of a = 109 . The number of iterations N required for convergence is given in Table 1.
Figure 3.8: Convergence for simple truss structure
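Because the system is statically determinate, the fully stressed design can be written down directly from Eq. (3.21): setting each $|\sigma_k| = \beta$ gives the areas that the penalty iteration approaches for large $a$. The following added cross-check computes them:

```python
# member force magnitudes |F_k| in units of F, from Eq. (3.21)
force = [5/8, 3/8, 3/8, 3/4, 5/4]
F, beta = 1.0, 1.0

# fully stressed design: |sigma_k| = |F_k| / A_k = beta  =>  A_k = |F_k| / beta
A = [f * F / beta for f in force]

# objective (3.20)
mass = 5 * (A[0] + A[2] + A[4]) + 6 * (A[1] + A[3])

print(A)      # [0.625, 0.375, 0.375, 0.75, 1.25]
print(mass)   # 18.0
```

These limits agree with the Table 1 results for $a = 10^9$ (0.624, 0.374, 0.374, 0.749, 1.248) to within the convergence tolerance of the penalty iteration.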
Example  The cross-sectional areas of the frame structure shown in Fig. 3.9 should be chosen such that the total mass becomes minimal. The structure is modeled by 4 beam elements with square cross sections of widths b1, b2, b3 and b4, respectively. As constraints we impose displacement conditions, i.e. |u| < ζ and |w| < ζ (cf. Fig. 3.9).

Figure 3.9: Simple frame structure

For steel as material with E = 210 GPa and ρ = 7.85 t/m³, a deformation limit ζ = 50 mm, a penalty parameter a = 10⁵ and starting values bi = B, we obtain the optimal section widths shown in Table 2. For B = 0.5 m the procedure converges in 171 iterations. This solution has a total mass of 2173 kg, a horizontal displacement of 50.02 mm and a vertical displacement of 49.96 mm. Convergence is shown in Fig. 3.10. It should be noted that for different starting values B the procedure converges to a different solution.
Table 2: Frame example: optimal section widths

  B [mm]   b1 [mm]   b2 [mm]   b3 [mm]   b4 [mm]   f0 [kg]   u [mm]   w [mm]
  500      201       105       109       109       2173      50.02    49.96
  100      178       136       149       141       2549      49.75    19.47
Figure 3.10: Convergence for simple frame structure
4 Genetic algorithms

4.1 Basic Principles
The general idea of genetic algorithms for optimization utilizes a string representation of the design variables (chromosome). With a set of different designs (population) we try to find better designs (individuals) through the process of reproduction, which involves recombination and mutation. The recombination process is usually carried out by cross-over, in which parts of the strings are swapped between individuals. The simplest string representation is a bit string representing states or discrete numerical values. As a matter of fact, any digital representation of real numbers is such a bit string. As an example, consider maximizing the function f(x) = x² for integer x in the interval [0, 31]. Within that range, any integer can be represented by 5 bits. Let us assume that an initial population with four strings has been generated:
01101 11000 01000 10011
(4.1)
For the reproduction process it is a good idea to consider primarily those individuals which have a high value of the objective function (the fitness). According to this concept, individuals with a higher fitness have a larger probability of being selected for reproduction. In Table 3 the selection probability P_S is shown proportional to a fitness value which is equal to the objective function.

Table 3: Sample strings and fitness values

  No.   String   Fitness   P_S
  1     01101    169       0.144
  2     11000    576       0.492
  3     01000    64        0.055
  4     10011    361       0.309

From this table it becomes obvious that it is beneficial for the fitness to have the high-order bits set in the strings. This means that in the reproduction process individuals with chromosomes containing sub-strings having the high-order bits set should be preferred, as they are more likely to achieve better fitness values.
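The fitness values and selection probabilities of Table 3 can be reproduced directly; this minimal sketch decodes the 5-bit strings and normalizes the fitness values:

```python
# Decode the 5-bit strings of the initial population and reproduce Table 3.
population = ['01101', '11000', '01000', '10011']

def fitness(chromosome):
    x = int(chromosome, 2)       # decode bit string to integer
    return x * x                 # objective f(x) = x^2

f = [fitness(c) for c in population]
total = sum(f)
p_select = [fi / total for fi in f]   # selection probability proportional to fitness

for c, fi, pi in zip(population, f, p_select):
    print(f'{c}  {fi:4d}  {pi:.3f}')
```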
October 22, 2009
27
WS 09/10
Structural Optimization
The cross-over process, when applied to two individuals, cuts the strings of each individual at the same location chosen at random and swaps the pieces. As an example, consider the first two strings in Table 3. We choose to cut the chromosomes after the fourth bit:

A1 = 0110|1
A2 = 1100|0    (4.2)

Swapping the pieces results in

A'1 = 01100
A'2 = 11001    (4.3)
It is easy to see that we now have a new individual (A'2) with a better fitness than any other before in the population. This individual decodes to the numerical value x = 25 with an objective function value of f(x) = 625. Mutation can be introduced by randomly flipping the state of one single bit. Usually, the probability of occurrence is kept rather small in order not to destroy the selection process. However, mutation can help the optimization process to escape the trap of a local extremum. A very interesting property of genetic algorithms is that they are essentially "blind" to the mathematical characteristics of the objective function. In particular, there are no requirements of differentiability or even continuity.
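The cut-and-swap operation of eqs. (4.2)-(4.3) is a one-liner on bit strings; this sketch reproduces the example above:

```python
# Single-point cross-over reproducing eqs. (4.2)-(4.3).
def crossover(a, b, cut):
    """Swap the tails of two bit strings after position `cut`."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

a1, a2 = '01101', '11000'
c1, c2 = crossover(a1, a2, 4)
print(c1, c2)                     # '01100' '11001'
print(int(c2, 2), int(c2, 2)**2)  # decodes to x = 25, f(x) = 625
```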
4.2 Choice of encoding
In the introductory section we discussed an example of an integer-valued design variable x in the range [0, 2^ℓ − 1], with ℓ being the number of bits (5 in this case). This is certainly not a typical situation. We may have continuous variables y varying within an interval [ymin, ymax]. A straightforward coding would be a linear mapping from the interval [0, 2^ℓ − 1] to the interval [ymin, ymax]:

y = ymin + (ymax − ymin)/(2^ℓ − 1) · x    (4.4)

and x is represented by ℓ bits. Here the choice of ℓ affects the resolution of the variable y but not its range. Multi-parameter encodings can be achieved by concatenating single-parameter encodings.

A problem can arise from the fact that adjacent values of x can differ in a large number of bits. As an example, consider the 4-bit representations of the numbers 7 (0111) and 8 (1000): all bits are different. Therefore sometimes the so-called Gray code (reflected binary code) is used. Gray coding reduces the bitwise difference between actually neighboring numbers to one single bit. Gray codes are constructed by arranging the binary strings into sequences in which the neighbors differ only by one bit. For ℓ = 2 one possible sequence is easily found as
00, 01, 11, 10    (4.5)

and for ℓ = 3 we have for instance

000, 001, 011, 010, 110, 111, 101, 100    (4.6)

For a 5-bit encoding, the natural and Gray codes are shown in Table 4.
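Both the linear mapping (4.4) and the reflected binary code can be sketched compactly. The conversion g = x XOR (x >> 1) is the standard construction of the binary-reflected Gray code; the function names here are illustrative:

```python
# Binary-reflected Gray code and the linear decoding of eq. (4.4).
def to_gray(x):
    """Standard construction of the binary-reflected Gray code."""
    return x ^ (x >> 1)

def decode_linear(x, y_min, y_max, n_bits):
    """Linear mapping of an n-bit integer onto [y_min, y_max], eq. (4.4)."""
    return y_min + (y_max - y_min) / (2 ** n_bits - 1) * x

# Neighbors 7 and 8 differ in one bit in Gray code, all four bits in natural code:
print(format(to_gray(7), '04b'), format(to_gray(8), '04b'))
# Reproduce the 3-bit sequence of eq. (4.6):
print([format(to_gray(x), '03b') for x in range(8)])
```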
Table 4: Natural and Gray codes for 5-bit encodings

  x    Natural   Gray     |   x    Natural   Gray
  0    00000     00000    |   16   10000     11000
  1    00001     00001    |   17   10001     11001
  2    00010     00011    |   18   10010     11011
  3    00011     00010    |   19   10011     11010
  4    00100     00110    |   20   10100     11110
  5    00101     00111    |   21   10101     11111
  6    00110     00101    |   22   10110     11101
  7    00111     00100    |   23   10111     11100
  8    01000     01100    |   24   11000     10100
  9    01001     01101    |   25   11001     10101
  10   01010     01111    |   26   11010     10111
  11   01011     01110    |   27   11011     10110
  12   01100     01010    |   28   11100     10010
  13   01101     01011    |   29   11101     10011
  14   01110     01001    |   30   11110     10001
  15   01111     01000    |   31   11111     10000

4.3 Selection Process
During the course of a genetic optimization we want to keep the population size constant. If we initially have a few individuals with a significantly higher fitness than the others, then it is very likely that the population will be dominated by these individuals and their offspring. This can lead to a trap in a local maximum. One way to avoid this involves a scaling of the fitness function such that the best value is moderately larger than the average. At the same time we want the scaled function to preserve the average fitness, which is needed for average individuals to maintain their chance of survival. Linear scaling introduces a scaled fitness f' in terms of the raw fitness f as (cf. Fig. 4.1):

f' = af + b    (4.7)

Here f'_max is chosen as a multiple of the average fitness, i.e. f'_max = C_mult f_avg. For typical small population sizes of 50 to 100, a choice of C_mult between 1.2 and 2.0 has been used successfully.

Figure 4.1: Linear fitness scaling

As the optimization approaches the end, the fitness values within the population typically show very little variation, with the exception of a few very bad cases. Linear scaling might then assign negative fitness values, which must be suppressed by adjusting the factor C_mult accordingly.

The actual selection process picks individuals at random with a selection probability P_S proportional to the fitness. This can be viewed as a roulette wheel with non-uniform slot sizes. For the sample population given in Table 3, this is shown in Fig. 4.2.

Figure 4.2: Biased roulette wheel for selection
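A minimal sketch of linear fitness scaling per eq. (4.7) and roulette-wheel selection. Here negative scaled values are simply clipped at zero, whereas the text suggests adjusting C_mult instead; the clipping choice and all names are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_scaling(f, c_mult=2.0):
    """Scale raw fitness so that f'_avg = f_avg and f'_max = c_mult * f_avg,
    cf. eq. (4.7). Negative scaled values are clipped at zero (sketch choice)."""
    f = np.asarray(f, dtype=float)
    f_avg, f_max = f.mean(), f.max()
    if f_max == f_avg:                    # degenerate population: nothing to scale
        return f.copy()
    a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    b = (1.0 - a) * f_avg
    return np.maximum(a * f + b, 0.0)

def roulette_select(population, f, n, rng):
    """Pick n individuals with probability proportional to fitness."""
    p = np.asarray(f, dtype=float)
    idx = rng.choice(len(population), size=n, p=p / p.sum())
    return [population[i] for i in idx]

f = [169, 576, 64, 361]                   # raw fitness values from Table 3
fs = linear_scaling(f)
parents = roulette_select(['01101', '11000', '01000', '10011'], fs, 4, rng)
```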
4.4 Recombination and mutation
Once the surviving individuals have been selected, they are paired and their chromosomes are cut at random locations. The pieces are then swapped, thus forming two new individuals. In order to create previously unavailable bit patterns, individual bits may be flipped at random, simulating spontaneous mutations.
4.5 Elitism
Due to the random reproduction process it may happen that genetic material related to the best individuals gets lost. This can be avoided by granting survival to a subset of the population with the highest fitness (the elite), usually one individual.
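The pieces of this chapter can be combined into a complete, if toy-sized, genetic algorithm with one-individual elitism for the f(x) = x² example. The operator rates and the population handling are illustrative choices, not values from the text:

```python
import random

random.seed(42)

# Toy GA maximizing f(x) = x^2 for x in [0, 31], with one-elite survival.
L_BITS, POP, P_MUT, GENS = 5, 4, 0.02, 30

def fitness(c):
    return int(c, 2) ** 2

def select(pop):
    # Roulette-wheel selection of two parents, proportional to fitness.
    return random.choices(pop, weights=[fitness(c) for c in pop], k=2)

def crossover(a, b):
    cut = random.randint(1, L_BITS - 1)   # random single cut point
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(c):
    # Flip each bit with small probability P_MUT.
    return ''.join(bit if random.random() > P_MUT else '10'[int(bit)] for bit in c)

pop = ['01101', '11000', '01000', '10011']
best_trace = []
for _ in range(GENS):
    elite = max(pop, key=fitness)         # elitism: keep the best unchanged
    children = [elite]
    while len(children) < POP:
        c1, c2 = crossover(*select(pop))
        children += [mutate(c1), mutate(c2)]
    pop = children[:POP]
    best_trace.append(fitness(max(pop, key=fitness)))
```

Because the elite individual survives unchanged, the best fitness in the population can never decrease from one generation to the next.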
5 Robustness in optimization

5.1 Stochastic modelling
In many engineering applications of optimization there is some uncertainty about the exact values of design variables and/or other parameters affecting the objective function and constraints. This uncertainty is due to e.g. manufacturing tolerances or environmental conditions, and can frequently be described in terms of probabilities. As a consequence, the objective function and the constraints become random, i.e. they have a probability distribution. This implies that the objective function may on average be significantly larger than in the deterministic situation and that constraints may be violated with large probability. Such a case would be called non-robust. Robust optimization aims at mitigating the effect of random uncertainties by taking them into account during the optimization process.
Uncertainties in the optimization process can be attributed to three major sources, as shown in Fig. 5.1. These sources of uncertainty or stochastic scatter are
Figure 5.1: Sources of uncertainty in optimization
• Uncertainty of design variables. This means that the manufacturing process is unable to achieve the design precisely. The magnitude of such uncertainty depends to a large extent on the quality control of the manufacturing process.
• Uncertainty in the objective function. This means that some parameters affecting the structural performance are beyond the control of the designer. These uncertainties may be reduced by a stringent specification of operating conditions. This may be possible for mechanical structures, but is typically not feasible for civil structures subjected to environmental loading such as earthquakes or severe storms, which cannot be controlled.
• Uncertainty of the feasible domain. This means that the admissibility of a particular design (such as its safety or serviceability) cannot be determined deterministically. Such problems are at the core of probability-based design of structures.
Monte Carlo Simulation  This is a frequently used method to deal with the effect of random uncertainties. Typically its application aims at integrations such as the computation of expected values (e.g. mean or standard deviation). As an example, consider the determination of the area of a quarter circle of unit radius. As we know, the area is π/4. Using the so-called Monte Carlo method we can obtain approximations to this result based on elementary function evaluations. Using N = 1000 uniformly distributed random number pairs (x, y) (cf. Fig. 5.2), and counting the number Nc of pairs for which x² + y² < 1, we get an estimate π/4 ≈ Nc/N = 776/1000 = 0.776.
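The hit-or-miss estimate of π/4 described above can be sketched as follows (the seed is arbitrary):

```python
import random

random.seed(1)

# Hit-or-miss Monte Carlo estimate of the quarter-circle area pi/4.
N = 1000
hits = sum(1 for _ in range(N)
           if random.random() ** 2 + random.random() ** 2 < 1.0)
estimate = hits / N
print(estimate)   # close to pi/4 = 0.7853...
```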
Simple example  Consider the objective function f₀(x) = a(x − 1)² − 0.5 with different numerical values for a. This function has its minimum at x* = 1 with an objective function value f₀(x*) = −0.5 for all values of a > 0. We assume that the actual design value of x is not located exactly at the deterministic minimum but has a random offset y (representing e.g. manufacturing tolerances), so that the objective is evaluated at x_D = 1 + y. This introduces randomness into the objective function as evaluated at the actual design x_D. y is assumed to be a zero-mean Gaussian variable with a coefficient of variation of 0.1. Carrying out a Monte Carlo simulation with 1000 random samples drawn for x with a fixed value of a = 0.5, we obtain random samples for the objective function and the location of the minimum.
Figure 5.2: Estimating π/4 by Monte Carlo Simulation
Figure 5.3: Simple objective function and location of the minimum

The samples are shown in Fig. 5.4. It can easily be seen that the objective function value is always larger than the deterministic value. The mean value of the objective function is −0.495 with a COV of 0.014. For a numerical value of a = 10 the same analysis yields a mean objective of −0.40 with a COV of 0.35. This demonstrates the effect of the local curvature of the objective function at the deterministic optimum. Assume now that a is also a Gaussian random variable with a mean value of 0.5 and a coefficient of variation of 0.1. This introduces additional randomness into the objective function. Carrying out a Monte Carlo simulation with 1000 random samples drawn for a and x, we obtain samples as shown in Fig. 5.5.
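A sketch of the Monte Carlo study for a = 0.5, assuming the stated coefficient of variation of 0.1 means a standard deviation of 0.1 for the offset y relative to the nominal design x* = 1 (an interpretation, since y has zero mean):

```python
import numpy as np

rng = np.random.default_rng(0)

# f0(x) = a*(x - 1)^2 - 0.5 evaluated at the perturbed design x_D = 1 + y,
# with y ~ N(0, 0.1^2); the standard deviation 0.1 is a sketch assumption.
a = 0.5
y = rng.normal(0.0, 0.1, size=1000)
samples = a * y ** 2 - 0.5          # f0 evaluated at x_D = 1 + y

print(samples.mean())               # close to -0.495, as reported above
```

Since a y² ≥ 0, every sample lies at or above the deterministic minimum of −0.5, which is the non-robustness effect discussed in the text.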
5.2 Application example
As an example, consider a simple beam under dynamic loading (cf. Fig. 5.6). For this beam with length L = 1 m and a rectangular steel cross section (w, h), subjected to vertical harmonic loading F_V(t) = A_V sin(ω_V t) and horizontal harmonic loading F_H(t) = A_H sin(ω_H t), the mass should be minimized under the constraint that the center deflection due to the loading remains smaller than 10 mm. Larger deflections are considered to be serviceability failures. The design variables are
Figure 5.4: Random samples of design values and objective function, deterministic objective
Figure 5.5: Random samples of design values and objective function, random objective

bounded in the range 0 < w, h < 0.1 m. Force values are A_V = A_H = 300 N, ω_V = 1000 rad/s, ω_H = 700 rad/s. Material data are E = 210 GPa and ρ = 7850 kg/m³. Using a modal representation of the beam response and taking into account the fundamental vertical and horizontal modes only, the stationary response amplitudes u_V and u_H are readily computed. Fig. 5.7 shows the dynamic response u = √(u_V² + u_H²) as a function of the beam geometry. The contour line shown indicates a response value of 0.01 m. Defining this value as the acceptable limit of deformation, it is seen that the feasible domain is not simply connected. There is an island of feasibility around w = 0.03 m and h = 0.05 m. The deterministic optimum is located on the
Figure 5.6: Beam with rectangular cross section
Figure 5.7: Dynamic response of beam and feasible domain

boundary of this island, i.e. at the values w* = 0.022 m and h* = 0.045 m. In the next step, the loading amplitudes are assumed to be log-normally distributed and the excitation frequencies are assumed to be Gaussian random variables. The mean values are assumed to be the nominal values given above, and the coefficients of variation are assumed to be 5% for the load amplitudes and for the excitation frequencies (Case 1). This implies that the constraints can be satisfied only with a certain probability < 1. Fig. 5.8 shows the probability P(F | w, h) of violating the constraint as a function of the design variables w and h. Accepting a possible violation of the constraint condition with a probability of 20%, it is seen that the location of the deterministic optimum still lies within a probabilistically feasible region. In that sense, the deterministic optimum may be considered robust. In a comparative analysis, the coefficients of variation are assumed to be 10% for the load amplitudes and for the frequencies (Case 2). The resulting conditional failure probabilities are shown
Figure 5.8: Conditional failure probability P(F | w, h) depending on w and h, Case 1
in Fig. 5.9. Due to the increased random variability, the feasible region around the deterministic
Figure 5.9: Conditional failure probability P(F | w, h) depending on w and h, Case 2
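The dynamic response u(w, h) underlying Figs. 5.7-5.9 can be sketched under modeling assumptions the text does not spell out: a simply supported beam, a midspan point load, undamped stationary amplitudes, and the fundamental mode in each bending direction only:

```python
import math

# Undamped stationary midspan amplitude of a simply supported beam's
# fundamental mode under a harmonic midspan point load. All modeling
# details (support, load position, no damping) are sketch assumptions.
E, RHO, L = 210e9, 7850.0, 1.0
A_V = A_H = 300.0
OMEGA_V, OMEGA_H = 1000.0, 700.0

def amplitude(width, height, force, omega):
    I = width * height ** 3 / 12.0               # second moment of area
    m_modal = RHO * width * height * L / 2.0     # modal mass of mode sin(pi x / L)
    omega1_sq = math.pi ** 4 * E * I / (RHO * width * height * L ** 4)
    return abs(force / (m_modal * (omega1_sq - omega ** 2)))

def response(w, h):
    u_v = amplitude(w, h, A_V, OMEGA_V)          # bending with depth h (vertical)
    u_h = amplitude(h, w, A_H, OMEGA_H)          # bending with depth w (horizontal)
    return math.hypot(u_v, u_h)                  # u = sqrt(u_V^2 + u_H^2)

print(response(0.05, 0.05))
```

The resonance denominators ω₁² − ω² are what produce the disconnected feasible regions discussed above; the exact boundary values depend on the modeling details.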