Acknowledgements:
Ian Davies, Mike Barrell, John Bent, and Iman Anvari
The durability of data in storage systems is determined by physical device failure rates. A typical storage system has multiple storage devices, which significantly increases the likelihood of failures. In order to improve the reliability, data redundancy is added to the system so that it becomes possible to recover data on a failed storage device from the parities stored on the remaining healthy devices. For example, in the case of Redundant Array of Inexpensive Disks (RAID), adding one redundancy ensures data robustness to one failure, whereas adding two redundancies will provide robustness against two simultaneous failures[3]. The data durability can be further increased by implementing sophisticated erasure coding schemes[4]–[8] and also by prioritizing the recovery of critically damaged data stripes[9]. In all of these schemes, the durability critically depends on how fast the data on a failed device can be reconstructed to restore the redundancy.
\[\begin{equation} f_{\alpha,\beta}(t)=\frac{\beta }{\alpha} \left( \frac{t}{\alpha}\right)^{\beta -1} e^{-\left( \frac{t}{\alpha}\right)^{\beta}},(\#eq:weipdf) \end{equation}\] \[\begin{equation} F_{\alpha,\beta}(t)= \int_{0}^{t} d\tau f_{\alpha,\beta}(\tau)=1-e^{-\left(\frac{t}{\alpha}\right)^{\beta}},(\#eq:weicdf) \end{equation}\] \[\begin{equation} h_{\alpha,\beta}(t)= \frac{f_{\alpha,\beta}(t)}{1-F_{\alpha,\beta}(t)}=\frac{\beta }{\alpha} \left( \frac{t}{\alpha}\right)^{\beta -1}.(\#eq:weihazard) \end{equation}\]
The hazard rate function, \(h_{\alpha,\beta}(t)\), is the event rate normalized with respect the to the population that survived until \(t\).
\(\beta=1\) gives the exponential distribution, which has completely random head failure times with a fixed failure(hazard) rate: \(h_{\alpha,\beta}(t)=\frac{1}{\alpha}\equiv\lambda\). This simplifies Eqs. @ref(eq:weipdf) , @ref(eq:weicdf) and @ref(eq:weihazard) to: \[\begin{equation} f_{\lambda}(t)= \lambda e^{-\lambda t},\quad F_{\lambda}(t)=1-e^{-\lambda t},\, \rm{and}\quad h_{\lambda}(t)=\lambda. (\#eq:expfuncs) \end{equation}\]
An interactive plot of Weibull distributions defined in Eqs. @ref(eq:weipdf) , @ref(eq:weicdf) and @ref(eq:weihazard). Use the toggles for linear/log scales. … [+] The vertical dashed lines mark the value \(t=\alpha\), and the horizontal one on \(F\) marks the value \(0.63\). The \(y\) axis can be transformed to \(ln\left[-ln(1-F_{\alpha,\beta})\right]\), and the time axis can be transformed to \(ln(t)\). The effect of \(\beta\) on \(F\) is best observed with the log time scale, and \(y\) in the probability scale: notice how \(F_{\alpha,\beta}\) becomes a line with the slope \(\beta\). Selecting the log scale for time and linear scale for the \(y\) axis, shows that the changing scale parameter \(\alpha\) moves the functions horizontally while preserving their shapes.
A drive will contain \(N\) heads and other components. We will model the drive as a device with two competing failure modes, one of which is ReMan’able while the other is not.
A brief timeline of the erasure coding. … [+] Replication has very poor capacity efficieny. RAID5(6) provides protection against one(two) failure(s). Seagate enclosures can be used to declustered parity as well as erasure coding across network. Durability and availablity can be further improved by implementing Multi-layer erasure coding. Most of the method shown in this figure will be discussed in detail in this presentation. Credit:John Bent
A Symbolic representation of the RAID including URE failures. Use the input sliders to change the number of drives or the redundancy. … [+] We will later derive the formulas for data durability for RAID5 in Eq. @ref(eq:m1f2) , and RAID6 in Eq. @ref(eq:mtwof) .
Press down arrow to navigate this sections.
\[\begin{equation} \text{number of nines}=\text{Floor}\left(-log10(1-\text{Durability})\right) (\#eq:nondefinition) \end{equation}\]
An illustration of the time frames.
An illustration of the time scales.
Markov Chain with one redundancy.
The transition equations: \[\begin{eqnarray} \dot{p}_{n}+n \lambda p_{n} -\mu p_{n-1}&=&0 \nonumber\\ \dot{p}_{n-1}+(n-1) \lambda p_{n-1} +\mu p_{n-1}-n \lambda p_{n} &=&0 \nonumber\\ \dot{p}_{n-2}-(n-1) \lambda p_{n-1} &=&0 . (\#eq:m1) \end{eqnarray}\] After Laplace transforming and converting to a matrix equation, we get: \[\begin{eqnarray} \hspace{-0.5em} \begin{bmatrix} P_{n}(s) \\ P_{n-1}(s)\\ P_{n-2}(s) \end{bmatrix}\hspace{-0.5em}=\hspace{-0.5em} \begin{bmatrix} s +n \lambda & -\mu &0 \\ -n\lambda & s +(n-1)\lambda +\mu &0\\ 0 &-(n-1)\lambda & s \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 0\\ 0 \end{bmatrix}\hspace{-0.4em} . (\#eq:m1lap) \end{eqnarray}\] We are interested in \(p_{n-2}\), which is the probability that system will have \(2\) simultaneous failures, hence data will be lost. Inverse Laplace transforming, we can calculate the reliability as \[\begin{eqnarray} {\rm R}(t)&=&1-p_{n-2}(t)\nonumber\\ &=&\frac{\mu + (2 n-1) \lambda + \sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n}}{2 \sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n})}\nonumber\\ &&\times e^{-\frac{1}{2}\left(\mu + \lambda (2n-1) -\sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n}\right) t}\nonumber\\ &-&\frac{\mu + (2 n-1) \lambda - \sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n}}{2 \sqrt{(\lambda - \mu)^2 +4\lambda \mu n})}\nonumber\\ &&\times e^{-\frac{1}{2}\left(\mu + \lambda (2n-1)+\sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n}\right) t}\nonumber\\ &\sim& \left[1+n(n-1)\frac{\lambda^2}{\mu^2}\right]e^{-\frac{t}{\mu/[\lambda^2n(n-1)]}}\nonumber\\ &&-n(n-1)\frac{\lambda^2}{\mu^2}e^{-t[\mu+\lambda(2n-1]}, (\#eq:m1f) \end{eqnarray}\] where we expanded in powers of \(\lambda/\mu\) and kept the leading terms in the coefficients and exponents. We can actually further simplify the equation since \(\frac{\lambda^2}{\mu^2}\ll 1\), for instance for recent products we can use \({\rm AFR}<0.5\%\) and the restore time is at the order of days, which gives \(\frac{\lambda}{\mu}\sim 10^{-4}\). Therefore, we can safely drop \(\frac{\lambda^2}{\mu^2}\) terms to get \[\begin{equation} {\rm R}(t)\sim e^{-t/{\rm MTTDL}_1},\nonumber\\ (\#eq:m1f2R) \end{equation}\] with \[\begin{equation} {\rm MTTDL}_1=\frac{1}{n(n-1) \lambda}\frac{\mu}{\lambda}(\#eq:m1f2). \end{equation}\]
Reliability with two redundancies:
It will get more complicated but, we will be getting better at extracting the relevant term. Let’s apply this to an \(n-\)drive system with two redundancies, which will have the Markov chain in Fig.
Markov Chain for a system with two redundancies
The state equation in the \(s\)-domain is \[\begin{eqnarray} \hspace{-1em} \begin{bmatrix} P_{n}(s) \\ P_{n-1}(s)\\ P_{n-2}(s)\\ P_{n-3}(s) \end{bmatrix}&=& \mathcal{S} \begin{bmatrix} 1 \\ 0\\ 0\\ 0 \end{bmatrix},(\#eq:statematrix) \end{eqnarray}\] where we defined \(\mathcal{S}\) as: \[\begin{eqnarray} \hspace{-1em} \hspace{-0.5em}\begin{bmatrix} s +n \lambda & -\mu &0 &0 \\ -n\lambda & s +(n-1)\lambda +\mu &-\mu&0\\ 0 &(n-1)\lambda & s +(n-2)\lambda +\mu &0\\ 0&0&(n-2)\lambda&s \end{bmatrix}^{-1}(\#eq:m2s) \end{eqnarray}\]
We are interested in \(P_{n-3}\). One can study the roots of the inverse \(s\) matrix to find that \(P_{n-3}\) has a pole at \(s=0\) with coefficient \(1\) and another one with coefficient \(1\) at \(s=S_1\) where \[\begin{equation} S_1\simeq \lambda n (n-1)(n-2)\left(\frac{\lambda}{\mu}\right)^2 (\#eq:spoles) \end{equation}\] Inverse Laplace transforming, we can calculate the reliability as
\[\begin{equation} {\rm R}(t)=1-p_{n-1}(t)\simeq1-(1-e^{-t/{\rm MTTDL}_2})=e^{-t/{\rm MTTDL}_2}(\#eq:m2fR) \end{equation}\] with \[\begin{equation} {\rm MTTDL}_2=\frac{1}{n(n-1)(n-2) \lambda}\left(\frac{\mu}{\lambda}\right)^2 (\#eq:mtwof). \end{equation}\]
\[\begin{eqnarray} \dot{p}_{n}+n \lambda p_{n} -\mu p_{n-1}&=&0\nonumber \\ \dot{p}_{n-1}+(n-1) \lambda p_{n-1} +\mu p_{n-1}-n \lambda p_{n} &=&0\nonumber\\ \dot{p}_{\rm F}-(n-1) \lambda p_{n-1} &=&0(\#eq:oneredmarkov) \end{eqnarray}\]
//Nov 2020 contact Serkay Olmez for questions
scr_path=Get Default Directory();scr_path=substr(scr_path,2,length(scr_path));scr_name = Current Window() << get window title;
calc_reli=function({},
newname="Reli simulation "||substitute(substitute(MDYHMS(today()),"/","_"),":","_");
dtt=new table(newname,invisible(1));
dtt<< add rows(num_simulations*num_drives_per_system);
eval(substitute(expr(dtt<<new column("sys_id",formula(1+Floor((Row()-1)/num_drps_)))),expr(num_drps_),name expr(num_drives_per_system)));
eval(substitute(expr(dtt<<new column("drv_id",formula(Row()-Floor((Row()-1)/num_drps_)*num_drps_))),expr(num_drps_),name expr(num_drives_per_system)));
dtt<<delete column(1); wait(1+log10(num_simulations)/2);:drv_id<<delete formula();:sys_id<<delete formula();
eval(substitute(expr(dtt<<new column("drv_failure_time",formula(Random Weibull(beta_,alpha_)))),expr(alpha_),name expr(alpha),expr(beta_),name expr(betav)));
column("drv_failure_time")<<delete formula();
dtt << Sort( By(:sys_id,column("drv_failure_time")), Order(Ascending,Ascending), Replace Table);
dtt<<new column("num_failures",numeric); dtt<<new column("num_max_failures",numeric);
:num_failures[1]=1;:num_max_failures[1]=1;maxseen=1;toberecovered={};
for(rr=2,rr<=nrows(dtt),rr++,
rr_time=:drv_failure_time[rr];
rr_m1_time=:drv_failure_time[rr-1];
if(:sys_id[rr-1]!=:sys_id[rr],:num_failures[rr]=1; toberecovered={};maxseen=1;:num_max_failures[rr]=1);
if(:sys_id[rr-1]==:sys_id[rr],
Insert Into( toberecovered,:drv_failure_time[rr-1]);
tosubtracted=0;
for(rc=nitems(toberecovered),rc>=1,rc--,
if(:drv_failure_time[rr]-toberecovered[rc]>repair_time, tosubtracted++;remove from(toberecovered,rc))
);
:num_failures[rr]=:num_failures[rr-1]-tosubtracted+1;
if(:num_failures[rr]>maxseen,maxseen=:num_failures[rr]);
:num_max_failures[rr] =maxseen
);
);
rc=1;
//throw();
mm=1;rr=1;mms=1;
big_failure_perc_table={};
for(mms=1,mms<=12/stepsize*max_year,mms++,
mm=stepsize*mms; //do this quarterly to speed up
month_tresh=365/12*mm;
if (mod(mm, 5) == 0,caption("Status:"|| char(round(100*mm/(12*max_year)))|| "% "); wait(0); ) ;
try(dtt<<delete column("filtered_num_max_failures"));
Eval(Substitute(Expr(dtt<<new column("filtered_num_max_failures",formula(if(:drv_failure_time>_thres_,.,:num_max_failures)))), Expr(_thres_), month_tresh));
dts=dtt <<Summary(Group( :sys_id ), Max( :filtered_num_max_failures ),Link to original data table( 0 ),invisible(1));
column("Max(filtered_num_max_failures)")<<set name("max_in_frame");
:max_in_frame[dts<<get rows where(is missing(:max_in_frame))]=0; //filling in the missing rows- they had no failures
failure_perc_table={};dts<<new column("system_failure",numeric);
for(rr=1,rr<=nitems(redundancy),rr++,
Eval(Substitute(Expr(:system_failure<<set formula(if(max_in_frame>_red,100,0))), Expr(_red), redundancy[rr]));
:system_failure<<eval formula();
failure_perc_table[rr]=col mean(:system_failure)
);
dts<<close window();big_failure_perc_table[mms]=failure_perc_table;
);
caption(remove); dtt<<close window();
);
////////////////////////////////////////
clear log();
try(dtsum<<close window());try(dtt<<close window());
start_time=today();
max_year=5;
AFRlist={25,31,38};repair_timelist={6};redundancy={1};num_drives_per_system_list={9};
AFRlist={0.5,1,2,5};repair_timelist={12,24,48,120,240};redundancy={0,1,2,3};num_drives_per_system_list={4,8,16,24};
AFRlist={2};repair_timelist={12,24,48,120,240};redundancy={0,1,2,3};num_drives_per_system_list={4,8,16,24};
AFRlist={2.5};repair_timelist={4};//days
redundancy={0,1};num_drives_per_system_list={20};
betalist={1,1.25,1.5};
//AFRlist={0.2};repair_timelist={0.33}; num_drives_per_system_list={8};
stepsize=3; //in months
dtsum=new table("Reli Summary", invisible(1));
dtsum<<new column("simulation_size");dtsum<<new column("total_num_of_drives");dtsum<<new column("redundancy");
eval(parse("dtsum<<new column(\!"AFR(%)\!")")); eval(parse("dtsum<<new column(\!"repair_time(days)\!")"));
dtsum<<new column("time(months)");dtsum<<new column("system failure(%)");
dtsum<<new column("system_reli",formula(1- :Name( "system failure(%)" ) / 100 ));
dtsum<<new column("MC_num_of_nines",formula(floor(-Log10( :Name( "system failure(%)" ) / 100 ))));
dtsum<<new column("MC_num_of_nines_",formula(-Log10( :Name( "system failure(%)" ) / 100 )));
dtsum<<new column("simulation_duration");
dtsum<<new column("beta");
index=0;
indexmax= nitems(AFRlist)*nitems(repair_timelist)*nitems(num_drives_per_system_list)*nitems(betalist);
gridindex=0;bi=1;afrindex=1;repindex=1;numodindex=1;
for(bi=1,bi<=nitems(betalist),bi++,
betav=betalist[bi];
for(numodindex=1,numodindex<=nitems(num_drives_per_system_list),numodindex++,
num_drives_per_system=num_drives_per_system_list[numodindex];
for(afrindex=1,afrindex<=nitems(AFRlist),afrindex++,
for(repindex=1,repindex<=nitems(repair_timelist),repindex++,
gridindex++;
remaining_time=round((indexmax-gridindex)*(today()-start_time)/gridindex/60);
caption(char(bi)||"_"||char(numodindex)||"_"||char(afrindex)||"_"||char(repindex)||" out of "
||char(nitems(betalist))||"_"||char(nitems(num_drives_per_system_list))||"_"||char(nitems(AFRlist))||"_"||char(nitems(repair_timelist))||" "||char(remaining_time)||" mins to go");
AFRmx=AFRlist[afrindex];
AFR=100*(1-(1-AFRmx/100)^(1/max_year));
repair_time=repair_timelist[repindex];
alpha=round(-365/log(1-AFR/100)); // this is in days
num_simulations=4*10^Floor(7-LOG10(num_drives_per_system));
print("number of drives:"||char(num_simulations*num_drives_per_system));
//throw();
simstart=today();
calc_reli();
simend=today();
print(char(round(simend-simstart))||" seconds");
for(mms=1,mms<=12/stepsize*max_year,mms++,
mm=stepsize*mms; //do this quarterly to speed up
for(rr=1,rr<=nitems(redundancy),rr++,
failure_perc_table=big_failure_perc_table[mms];
dtsum<<add rows(1);index++;
column("redundancy")[index]=redundancy[rr];
column("time(months)")[index]=mm;
column("system failure(%)")[index]=failure_perc_table[rr];
column("AFR(%)")[index]=AFR;
column("repair_time(days)")[index]=repair_time;
:total_num_of_drives[index]=num_drives_per_system;
:simulation_size[index]=num_simulations;
:simulation_duration[index]=round(simend-simstart);
:beta[index]=betav;
);
);
//throw();
newname=substitute(substitute(MDYHMS(today()),"/","_"),":","_");
dtsum<<save("C:\Users\451516\my_codes\erasure_coded_storage_system_analysis\EC reli MC simulation results_"||newname||".jmp");
);
);
);
);
dtsum<<new column("alpha",formula(round(-365.25/log(1-Name("AFR(%)")/100))));
dtsum<<new column("beta_sys",formula((:redundancy+1)*(:beta-1)+1));
dtsum<<new column("alpha_sys",formula( ((Name("repair_time(days)")^:redundancy)*(factorial(:total_num_of_drives)/(factorial(:total_num_of_drives-:redundancy-1)
*factorial(:redundancy)))*(:beta^(:redundancy+1)/:alpha^(:beta*(:redundancy+1)))*1/(:beta_sys))^(-1/:beta_sys)));
dtsum<<new column("theory_system failure(%)",formula(100*(1-Exp( -(365.25/:alpha_sys*:Name( "time(months)" ) / 12)^:beta_sys ))));
dtsum<<new column("theory_system_reli",formula(1- :Name( "theory_system failure(%)" ) / 100 ));
dtsum<<new column("theory_num_of_nines",formula(floor(-Log10( :Name( "theory_system failure(%)" ) / 100 ))));
dtsum<<new column("theory_num_of_nines_",formula(-Log10( :Name( "theory_system failure(%)" ) / 100 )));
:redundancy<<modeling type(nominal);
newname=substitute(substitute(MDYHMS(today()),"/","_"),":","_");
notein="This data table is created by "||scr_path||scr_name||".jsl on"||char( As Date(Today()););
dtsum << New Table Variable( "Notes",notein );
dtsum<< new column("Notes");
:Notes[1]=notein; //this table will go into CSV. Just add a column to keep the info
dtsum<< new column("Date");
thisdate=substitute(substitute(MDYHMS(today()),"/","_"),":","_");
:Date[1]==thisdate;
dtsum<<save("C:\Users\451516\my_codes\erasure_coded_storage_system_analysis\EC reli MC simulation results generic beta "||newname||".jmp");
duration=char(floor((today()-start_time)/60))||" minutes";
Bivariate(
Y( :MC_num_of_nines_ ),
X( :theory_num_of_nines_ ),
Fit Special( Intercept( 0 ), Slope( 1 ), {Line Color( {212, 73, 88} )} ),
SendToReport(
Dispatch(
{},
"1",
ScaleBox,
{Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
),
Dispatch(
{},
"2",
ScaleBox,
{Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
),
Dispatch(
{},
"Bivar Plot",
FrameBox,
{Row Legend(
redundancy,
Color( 1 ),
Color Theme( "JMP Default" ),
Marker( 0 ),
Marker Theme( "" ),
Continuous Scale( 0 ),
Reverse Scale( 0 ),
Excluded Rows( 0 )
)}
)
)
)
The comparison of theory vs Monte Carlo results with various input parameters. Number of nines is defined in Eq. @ref(eq:nondefinition) where flooring is omitted. … [+] The parameters swept are: redundancy(\(c\)) \(\in\) {0, 1}, AFR(%) \(\in\) {1}, \(\beta \in\) {1, 1.2},total number of drives \(\in\) {8} and repair time(days) \(\in\) {1.4}.
Markov chain with one redundancy with UREs.
\[\begin{equation} h_{n-1}\equiv1-(1-\rm{UER})^{N_\text{bits}}\simeq 1-e^{-UER\times N_\text{bits}}=1-e^{-UER\times (n-1)\times C_\text{bits}}(\#eq:hnmone) \end{equation}\]
Venn Diagram of overlaps of 3 failures with 53-drive pool and EC size of 10.
An approximate Markov chain for distributed parity of pool size \(D\) and redundancy \(2\). … [+] The recovery rates \(\mu_1\) and \(\mu_2\) are determined by the geometrical overlaps. ADAPT prioritizes recovering the critically damaged stripes, hence \(\mu_2\) is very quick.
A Sankey Diagram of stripe degradation percentages with +2 Erasure coding. Hover on the image to see the numbers. … [+] As the pool size increases, any given set of failed drives will have less and less overlap. This reduces the relative size of the critically damaged stripes enabling a quick recovery from this most vulnerable state.
The overall data durability can be improved by implementing another layer of erasure coding. Top layer is composed of already erasure coded sub-elements.
Left: the first layer of the overall erasure coding of size \(n\) and \(c\) reduncies.
Right: A block diagram representation of the erasure coded system as a single element. … [+] The system will lose data when \(c+1\) drive failures or \(c\) drive failures and URE(s) are encountered. These two cases are characterized by the rates \(\Lambda^{\rm DG} =\Lambda^{\rm DG}(c)=\left(\frac{\lambda}{\mu}\right)^c \frac{\lambda n! }{(n-c-1)!}\) and \(\Lambda^{\rm U} =\Lambda^{\rm U}(c)=h_{n-c}\Lambda^{\rm DG}(c-1)\)
Calm on the surface, but always paddling like hell underneath. A lot of data paddling inside the enclosure, but from outside, it is just a very reliable petabyte drive.
Top layer EC of size \(m\) and \(d\) reduncies built with self-erasure coded elements.
\[\begin{eqnarray} \frac{1}{{\rm MTTDL}^{\rm t}_d}&=&\frac{1}{{\rm MTTDL}^{\rm t}_{d,{\rm DG}}}+\frac{h^{\rm t}_{m-d}}{{\rm MTTDL}^{\rm t}_{d-1,{\rm DG}}}\nonumber \end{eqnarray}\]
\[\begin{eqnarray} {\rm MTTDL}^{\rm t}_{d,{\rm DG}} &=&\left(\frac{\mu_{\rm t}}{ \Lambda^{\rm DG}}\right)^d \frac{(m-d-1)!}{ \Lambda^{\rm DG} m!},\nonumber \end{eqnarray}\] \(h^{\rm t}_{m-d}=1-(1-{\rm UER})^{(n-c)(m-d)C}\) probability of URE(s).
\[\begin{eqnarray} \frac{1}{{\rm MTTDL}^{\rm t}_d}&=&\frac{1}{{\rm MTTDL}^{\rm t}_{d,{\rm DG}}}+\frac{h^{\rm t}_{m-d}}{{\rm MTTDL}^{\rm t}_{d-1,{\rm DG}}} \end{eqnarray}\]
\[\begin{eqnarray}
{\rm MTTDL}^{\rm t}_{d,{\rm DG}} &=&\left(\frac{\mu_{\rm t}}{ \Lambda^{\rm DG}}\right)^d \frac{(m-d-1)!}{ \Lambda^{\rm DG} m!},
\end{eqnarray}\] \(h^{\rm t}_{m-d}=1-(1-{\rm UER})^{(n-c)(m-d)C}\) probability of URE(s)
during the critical top layer repairs.
An illustration of the time scales.
Availability=1-Fraction of time spent for repair:
\[\begin{equation} A_0 \simeq1-\frac{n \lambda_s}{\mu_s}.(\#eq:a0pre) \end{equation}\]
Consider the Markov chain representation below:
Markov Chain with no redundancy.
The green arrow shows that even your system is down, you did not lose any data, you just need to repair it. This will make a lot of difference in the math. We will skip a couple of steps and start with the \(s\)-matrix: \[\begin{equation} \begin{bmatrix} s +n \lambda_s & -\mu_s \\ -n \lambda_s & s +\mu_s \end{bmatrix} \begin{bmatrix} P_{n}(s) \\ P_{n-1}(s) \end{bmatrix}= \begin{bmatrix} 1 \\ 0 \end{bmatrix},(\#eq:m0lap) \end{equation}\] which is solved by inverting the matrix \[\begin{eqnarray} \begin{bmatrix} P_{n}(s) \\ P_{n-1}(s) \end{bmatrix}&=& \begin{bmatrix} s +n \lambda_s & -\mu_s \\ -n \lambda_s & s +\mu_s \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 0 \end{bmatrix}\nonumber\\ &=& \frac{1}{1+\frac{n \lambda_s}{\mu_s}}\begin{bmatrix} \frac{1}{s}+\frac{n \lambda_s}{\mu_s}\frac{1}{s+(n \lambda_s+\mu_s)} \\ \frac{n \lambda_s/\mu_s}{s}-\frac{n \lambda_s}{\mu_s}\frac{1}{s+(n \lambda_s+\mu_s)} \end{bmatrix}. (\#eq:m0lap2) \end{eqnarray}\] Finally we inverse Laplace transform to get \[\begin{equation} \begin{bmatrix} p_{n}(t) \\ p_{n-1}(t) \end{bmatrix}= \frac{1}{1+\frac{n \lambda_s}{\mu_s}}\begin{bmatrix} 1+\frac{n \lambda_s}{\mu_s} e^{-(n \lambda_s+\mu_s)t} \\ \frac{n \lambda_s}{\mu_s}-\frac{n \lambda_s}{\mu_s} e^{-(n \lambda_s+\mu_s)t} \end{bmatrix}.(\#eq:m0a) \end{equation}\] Note that \(p_{n}(t) +p_{n-1}(t)=1\), as expected. The system will be available if it is in the first box, and the probability of being in that state is \(p_{n}\). Given a time interval \(T_0\), we can calculate the fraction of the time the system will be available as follows: \[\begin{eqnarray} A(T_0)&=&\frac{1}{T_0}\int_0^{T_0} d\tau p_n(\tau) \nonumber\\ &=&\frac{1}{T_0(1+\frac{n \lambda_s}{\mu_s})}\int_0^{T_0} d\tau \left(1+\frac{n \lambda_s}{\mu_s} e^{-(n \lambda_s+\mu_s)\tau}\right)\nonumber\\ &\simeq&\frac{1}{1+\frac{n \lambda_s}{\mu_s}}\simeq1-\frac{n \lambda_s}{\mu_s}(\#eq:a0) \end{eqnarray}\] which can also be transformed to \(\frac{{\rm MTTF}/n}{{\rm MTTR}+{\rm MTTF}/n }\), if we convert the rates \(\lambda_s\) and \(\mu_s\) to time frames \(1/{\rm MTTF}\) and \(1/{\rm MTTR}\), respectively. This reproduces the standard availability equation.
Markov Chain with one redundancy.
The availability can be computed as
\[\begin{equation} A_1=1-\left[n\frac{\lambda_s}{\mu_s}\right]\left[(n-1)\frac{\lambda_s}{\mu_s}\right](\#eq:a1pre) \end{equation}\]
Markov Chain with \(c\) redundancies.
Recognizing the patterns in Eqs. @ref(eq:a0pre) and @ref(eq:a1pre) we can generalize the formula to a system with \(c\) redundancies: \[\begin{equation} A_c\simeq 1-\frac{n !}{(n-c-1)!}\left[\frac{\lambda_s}{\mu_s}\right]^{c+1} (\#eq:avgc) \end{equation}\]
Press down arrow to navigate this sections.
A mock up MACH2 drive with two actuators.
An interactive plot showing the gains with MACH2. … [+] Dual actuator drives deliver 2X gain in data rate, which reduces the recovery time by \(50\%\). This results in 2x/4x gain in dirability for RAID5/6.
Markov chain split into two parallel paths: recovery from subcomponent failures and recovery from device failures.
[1] Y. Xie, Dynamic documents with R and knitr, 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC, 2015 [Online]. Available: http://yihui.name/knitr/
[2] Hakim El Hattab, Revealjs. 2020 [Online]. Available: https://revealjs.com/
[3] D. A. Patterson, G. Gibson, and R. H. Katz, “A case for redundant arrays of inexpensive disks (raid),” SIGMOD Rec., vol. 17, no. 3, pp. 109–116, Jun. 1988, doi: 10.1145/971701.50214. [Online]. Available: https://doi.org/10.1145/971701.50214
[4] L. Hellerstein, G. A. Gibson, R. M. Karp, R. H. Katz, and D. A. Patterson, “Coding techniques for handling failures in large disk arrays,” Algorithmica, vol. 12, no. 2, pp. 182–208, Sep. 1994, doi: 10.1007/BF01185210. [Online]. Available: https://doi.org/10.1007/BF01185210
[5] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theor., vol. 56, no. 9, pp. 4539–4551, Sep. 2010, doi: 10.1109/TIT.2010.2054295. [Online]. Available: https://doi.org/10.1109/TIT.2010.2054295
[6] G. Wang, L. Xiao-Guang, and L. Jing, “Parity declustering data layout for tolerating dependent disk failures in network raid systems,” 2002, pp. 22–25, doi: 10.1109/ICAPP.2002.1173547.
[7] V. Venkatesan and I. Iliadis, “Effect of codeword placement on the reliability of erasure coded data storage systems,” in Quantitative evaluation of systems, 2013, pp. 241–257.
[8] A. Thomasian and M. Blaum, “Higher reliability redundant disk arrays: Organization, operation, and coding,” ACM Trans. Storage, vol. 5, no. 3, Nov. 2009, doi: 10.1145/1629075.1629076. [Online]. Available: https://doi.org/10.1145/1629075.1629076
[9] T. Kawaguchi, “Reliability analysis of distributed raid with priority rebuilding,” 2013.
[10] W. Padgett, “Weibull distribution,” 2011, pp. 1651–1653.
[11] B. Wilson, “Cloud storage durability,” 2018 [Online]. Available: https://www.backblaze.com/blog/cloud-storage-durability/
[12] B. Beach, “Python code to calculate the durability of data stored with erasure coding,” 2018 [Online]. Available: https://github.com/Backblaze/erasure-coding-durability/
[13] I. Iliadis, R. Haas, X.-Y. Hu, and E. Eleftheriou, “Disk scrubbing versus intra-disk redundancy for high-reliability raid storage systems,” vol. 36, no. 1, pp. 241–252, Jun. 2008, doi: 10.1145/1384529.1375485. [Online]. Available: https://doi.org/10.1145/1384529.1375485
[14] S. Olmez, “Reliability analysis of storage systems with partially repairable devices,” 2021 [Online]. Available: Under peer review, available up on request