Dynamic programming
Sat 03 December 2016Remember your Past.
If the given problem can be broken up in to smaller sub-problems and these smaller subproblems are in turn divided in to still-smaller ones, and in this process, if you observe some over-lappping subproblems, then its a big hint for DP the optimal solutions to the subproblems contribute to the optimal solution of the given problem
- (Top-down) Memoization Start solving the given problem by breaking it down. If you see that the problem has been solved already, then just return the saved answer.
- (Bottom-up) Dynamic Programming Analyze the problem and see the order in which the sub-problems are solved and start solving from the trivial subproblem, up towards the given problem.
Difference with respect to divide-and-conquer: in d-and-c the problem is divided by non-overlapping subproblems
Useful: $C(n,m) = C(n-1,m) + C(n-1,m-1)$
Problem : Minimum Steps to One
On a positive integer, you can perform any one of the following 3 steps.
- Subtract 1 from it. ( n = n - 1 ) ,
- If its divisible by 2, divide by 2. ( if n % 2 == 0 , then n = n / 2 ) ,
- If its divisible by 3, divide by 3. ( if n % 3 == 0 , then n = n / 3 ).
Given a positive integer n, find the minimum number of steps that takes n to 1 eg:
- For n = 1 , output: 0
- For n = 4 , output: 2 ( 4 /2 = 2 /2 = 1 )
- For n = 7 , output: 3 ( 7 -1 = 6 /3 = 2 /2 = 1 )
Recursive solution
def f(N):
if N==1: return 0
r = 1+f(N-1)
if N%2==0: r = min(r,1+f(N/2))
if N%3==0: r = min(r,1+f(N/3))
return r
>> f(10)
>> 3
>> %timeit f(100)
>> 100 loops, best of 3: 19.2 ms per loop
But many f(N) are recalculated multiple times. Solution - remember them.
DP solution
Main idea: remember any intermediate solution in the temporary array M
, and don't repeat calculations if there is a solution already.
Here in order to use timing command, I wrapped DP solution into separate function which initializes array first.
def fDPrun(N):
M = [-1 for i in range(1000)]
M[1] = 0
def fDP(n):
if M[n]!=-1: return M[n]
r=1+fDP(n-1)
if n%2==0: r = min(r,1+fDP(n/2))
if n%3==0: r = min(r,1+fDP(n/3))
M[n] = r
return r
return fDP(N)
>> fDPrun(10)
>> 3
>> %timeit fDPrun(100)
>> 10000 loops, best of 3: 110 µs per loop
Problem: Longest common subsequence (LCS)
Recursive solution:
A=[1,4,6,8,5,3,6,8]
B=[4,6,6,2,1,4,5,6,8]
def LCSrec(N,M):
if N==0 or M==0: return 0
if A[N-1]==B[M-1]: return 1+LCSrec(N-1,M-1)
else: return max(LCSrec(N-1,M),LCSrec(N,M-1))
>> %timeit LCSrec(len(A),len(B))
>> 1000 loops, best of 3: 277 µs per loop
Dynamic programming
N=len(A)
M=len(B)
def LCSdp():
Mlcs=[[-1 for i in range(M+1)] for j in range(N+1)]
for i in range(N+1):
for j in range(M+1):
if i==0 or j==0:
Mlcs[i][j] = 0
else: ##important, you can easily go negative in index...
if A[i-1]==B[j-1]:
Mlcs[i][j] = 1+Mlcs[i-1][j-1]
else: Mlcs[i][j] = max(Mlcs[i-1][j],Mlcs[i][j-1])
return Mlcs[N][M]
>> %timeit LCSdp()
>> 10000 loops, best of 3: 47.8 µs per loop
but for larger values A,B
A=list("AAACCGTGAGTTATTCGTTCTAGAA")
B=list("CACCCCTAAGGTACCTTTGGTT")
>> %timeit LCSdp()
>> 1000 loops, best of 3: 311 µs per loop
After the computation the resulting Mlcs
array for sequencies A=[1,4,6,8,5,3,6,8]
and B=[4,6,6,2,1,4,5,6,8]
looks like this:
[0, 4, 6, 6, 2, 1, 4, 5, 6, 8]
[1, 0, 0, 0, 0, 1, 1, 1, 1, 1]
[4, 1, 1, 1, 1, 1, 2, 2, 2, 2]
[6, 1, 2, 2, 2, 2, 2, 2, 3, 3]
[8, 1, 2, 2, 2, 2, 2, 2, 3, 4]
[5, 1, 2, 2, 2, 2, 2, 3, 3, 4]
[3, 1, 2, 2, 2, 2, 2, 3, 3, 4]
[6, 1, 2, 3, 3, 3, 3, 3, 4, 4]
[8, 1, 2, 3, 3, 3, 3, 3, 4, 5]
where first column represents vector A
and first row represents B
.
0 4 6 6 2 1 4 5 6 8
1 0 0 0 0 1* 1 1 1 1
4 1 1 1 1 1 2* 2 2 2
6 1 2 2 2 2 2* 2 3 3
8 1 2 2 2 2 2* 2 3 4
5 1 2 2 2 2 2 3* 3 4
3 1 2 2 2 2 2 3* 3 4
6 1 2 3 3 3 3 3 4* 4
8 1 2 3 3 3 3 3 4 5*
For bigger sequences
A=list("AAACCGTGAGTTATTCGTTCTAGAA")
B=list("CACCCCTAAGGTACCTTTGGTTC")
>> %timeit LCSdp()
>> 1000 loops, best of 3: 304 µs per loop
and LCS is equal to 14. The recusive solution hasn't finished in the reasonable time...
Backtrack is pretty simple: start with longest and go diagonal if A[i]=B[j]
, stay on the longer path otherwise.
0 C* A C C C C T A A G G T A C C T T T G G T T C
A 0 1* 1* 1* 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
A 0 1 1 1* 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
A 0 1 1 1* 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
C 1 1 2 2 2* 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4
C 1 1 2 3 3 3* 3 3 3 3 3 3 3 4 5 5 5 5 5 5 5 5 5
G 1 1 2 3 3 3* 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 6 6
T 1 1 2 3 3 3 4* 4* 4* 4 4 5 5 5 5 6 6 6 6 6 7 7 7
G 1 1 2 3 3 3 4 4 4 5* 5 5 5 5 5 6 6 6 7 7 7 7 7
A 1 2 2 3 3 3 4 5 5 5* 5 5 6 6 6 6 6 6 7 7 7 7 7
G 1 2 2 3 3 3 4 5 5 6 6* 6 6 6 6 6 6 6 7 8 8 8 8
T 1 2 2 3 3 3 4 5 5 6 6 7* 7* 7* 7* 7 7 7 7 8 9 9 9
T 1 2 2 3 3 3 4 5 5 6 6 7 7 7 7 8* 8 8 8 8 9 10 10
A 1 2 2 3 3 3 4 5 6 6 6 7 8 8 8 8* 8 8 8 8 9 10 10
T 1 2 2 3 3 3 4 5 6 6 6 7 8 8 8 9 9* 9 9 9 9 10 10
T 1 2 2 3 3 3 4 5 6 6 6 7 8 8 8 9 10 10* 10* 10 10 10 10
C 1 2 3 3 4 4 4 5 6 6 6 7 8 9 9 9 10 10 10* 10 10 10 11
G 1 2 3 3 4 4 4 5 6 7 7 7 8 9 9 9 10 10 11 11* 11 11 11
T 1 2 3 3 4 4 5 5 6 7 7 8 8 9 9 10 10 11 11 11 12* 12 12
T 1 2 3 3 4 4 5 5 6 7 7 8 8 9 9 10 11 11 11 11 12 13* 13
C 1 2 3 4 4 5 5 5 6 7 7 8 8 9 10 10 11 11 11 11 12 13 14*
T 1 2 3 4 4 5 6 6 6 7 7 8 8 9 10 11 11 12 12 12 12 13 14*
A 1 2 3 4 4 5 6 7 7 7 7 8 9 9 10 11 11 12 12 12 12 13 14*
G 1 2 3 4 4 5 6 7 7 8 8 8 9 9 10 11 11 12 13 13 13 13 14*
A 1 2 3 4 4 5 6 7 8 8 8 8 9 9 10 11 11 12 13 13 13 13 14*
A 1 2 3 4 4 5 6 7 8 8 8 8 9 9 10 11 11 12 13 13 13 13 14
i=N
j=M
path = []
lcs = []
while i>0 and j>0:
if A[i-1]==B[j-1]:
path.append((i,j))
lcs.insert(0,A[i-1])
i-=1
j-=1
else:
path.append((i,j))
if Mlcs[i-1][j]<Mlcs[i][j-1]:
j-=1
else:
i-=1
lcs
>>['A', 'C', 'C', 'T', 'G', 'G', 'T', 'T', 'T', 'T', 'G', 'T', 'T', 'C']
Knapsack problem
Given weights w[i]
and values v[i]
of items, put maximum total value into the sack of the total weight less than or equal to W
.
Recursive solution
Idea: if we know the solution for the weight W, taking one item v[i]
from the sack gives solution for W-w[i]
.
N=3
vi=[1,4,6]
wi=[1,2,5]
def KnapsackRec(W):
if W==0: return 0
r=-1
for i in range(N):
if W-wi[i]>=0:
r=max(r,vi[i]+KnapsackRec(W-wi[i]))
return r
>> KnapsackRec(10)
>> 20
DP solution, bottom-up
The same, but solution is built starting from trivial case W=0
, and increasing.
N=3
vi=[1,4,6]
wi=[1,2,5]
def Knapsack(W):
mi=[-1 for x in range(W)]
mi[0]=0
for w in range(1,W):
for i in range(N):
if w-wi[i]>=0:
mi[w] = max(mi[w],vi[i]+mi[w-wi[i]])
print mi
>> Knapsack(11)
>> [0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 20]