Acknowledgements:
Ian Davies, Mike Barrell, John Bent, and Iman Anvari
The durability of data in storage systems is determined by physical device failure rates. A typical storage system has multiple storage devices, which significantly increases the likelihood of failures. In order to improve the reliability, data redundancy is added to the system so that it becomes possible to recover data on a failed storage device from the parities stored on the remaining healthy devices. For example, in the case of Redundant Array of Inexpensive Disks (RAID), adding one redundancy ensures data robustness to one failure, whereas adding two redundancies will provide robustness against two simultaneous failures[3]. The data durability can be further increased by implementing sophisticated erasure coding schemes[4]–[8] and also by prioritizing the recovery of critically damaged data stripes[9]. In all of these schemes, the durability critically depends on how fast the data on a failed device can be reconstructed to restore the redundancy.
\[\begin{equation} f_{\alpha,\beta}(t)=\frac{\beta }{\alpha} \left( \frac{t}{\alpha}\right)^{\beta -1} e^{-\left( \frac{t}{\alpha}\right)^{\beta}},(\#eq:weipdf) \end{equation}\] \[\begin{equation} F_{\alpha,\beta}(t)= \int_{0}^{t} d\tau f_{\alpha,\beta}(\tau)=1-e^{-\left(\frac{t}{\alpha}\right)^{\beta}},(\#eq:weicdf) \end{equation}\] \[\begin{equation} h_{\alpha,\beta}(t)= \frac{f_{\alpha,\beta}(t)}{1-F_{\alpha,\beta}(t)}=\frac{\beta }{\alpha} \left( \frac{t}{\alpha}\right)^{\beta -1}.(\#eq:weihazard) \end{equation}\]
The hazard rate function, \(h_{\alpha,\beta}(t)\), is the event rate normalized with respect the to the population that survived until \(t\).
\(\beta=1\) gives the exponential distribution, which has completely random head failure times with a fixed failure(hazard) rate: \(h_{\alpha,\beta}(t)=\frac{1}{\alpha}\equiv\lambda\). This simplifies Eqs. @ref(eq:weipdf) , @ref(eq:weicdf) and @ref(eq:weihazard) to: \[\begin{equation} f_{\lambda}(t)= \lambda e^{-\lambda t},\quad F_{\lambda}(t)=1-e^{-\lambda t},\, \rm{and}\quad h_{\lambda}(t)=\lambda. (\#eq:expfuncs) \end{equation}\]
A drive will contain \(N\) heads and other components. We will model the drive as a device with two competing failure modes, one of which is ReMan’able while the other is not.
Press down arrow to navigate this sections.
\[\begin{equation} \text{number of nines}=\text{Floor}\left(-log10(1-\text{Durability})\right) (\#eq:nondefinition) \end{equation}\]
The transition equations: \[\begin{eqnarray} \dot{p}_{n}+n \lambda p_{n} -\mu p_{n-1}&=&0 \nonumber\\ \dot{p}_{n-1}+(n-1) \lambda p_{n-1} +\mu p_{n-1}-n \lambda p_{n} &=&0 \nonumber\\ \dot{p}_{n-2}-(n-1) \lambda p_{n-1} &=&0 . (\#eq:m1) \end{eqnarray}\] After Laplace transforming and converting to a matrix equation, we get: \[\begin{eqnarray} \hspace{-0.5em} \begin{bmatrix} P_{n}(s) \\ P_{n-1}(s)\\ P_{n-2}(s) \end{bmatrix}\hspace{-0.5em}=\hspace{-0.5em} \begin{bmatrix} s +n \lambda & -\mu &0 \\ -n\lambda & s +(n-1)\lambda +\mu &0\\ 0 &-(n-1)\lambda & s \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 0\\ 0 \end{bmatrix}\hspace{-0.4em} . (\#eq:m1lap) \end{eqnarray}\] We are interested in \(p_{n-2}\), which is the probability that system will have \(2\) simultaneous failures, hence data will be lost. Inverse Laplace transforming, we can calculate the reliability as \[\begin{eqnarray} {\rm R}(t)&=&1-p_{n-2}(t)\nonumber\\ &=&\frac{\mu + (2 n-1) \lambda + \sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n}}{2 \sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n})}\nonumber\\ &&\times e^{-\frac{1}{2}\left(\mu + \lambda (2n-1) -\sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n}\right) t}\nonumber\\ &-&\frac{\mu + (2 n-1) \lambda - \sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n}}{2 \sqrt{(\lambda - \mu)^2 +4\lambda \mu n})}\nonumber\\ &&\times e^{-\frac{1}{2}\left(\mu + \lambda (2n-1)+\sqrt{(\lambda - \mu)^2 + 4 \lambda \mu n}\right) t}\nonumber\\ &\sim& \left[1+n(n-1)\frac{\lambda^2}{\mu^2}\right]e^{-\frac{t}{\mu/[\lambda^2n(n-1)]}}\nonumber\\ &&-n(n-1)\frac{\lambda^2}{\mu^2}e^{-t[\mu+\lambda(2n-1]}, (\#eq:m1f) \end{eqnarray}\] where we expanded in powers of \(\lambda/\mu\) and kept the leading terms in the coefficients and exponents. We can actually further simplify the equation since \(\frac{\lambda^2}{\mu^2}\ll 1\), for instance for recent products we can use \({\rm AFR}<0.5\%\) and the restore time is at the order of days, which gives \(\frac{\lambda}{\mu}\sim 10^{-4}\). Therefore, we can safely drop \(\frac{\lambda^2}{\mu^2}\) terms to get \[\begin{equation} {\rm R}(t)\sim e^{-t/{\rm MTTDL}_1},\nonumber\\ (\#eq:m1f2R) \end{equation}\] with \[\begin{equation} {\rm MTTDL}_1=\frac{1}{n(n-1) \lambda}\frac{\mu}{\lambda}(\#eq:m1f2). \end{equation}\]
Reliability with two redundancies:
It will get more complicated but, we will be getting better at extracting the relevant term. Let’s apply this to an \(n-\)drive system with two redundancies, which will have the Markov chain in Fig.
The state equation in the \(s\)-domain is \[\begin{eqnarray} \hspace{-1em} \begin{bmatrix} P_{n}(s) \\ P_{n-1}(s)\\ P_{n-2}(s)\\ P_{n-3}(s) \end{bmatrix}&=& \mathcal{S} \begin{bmatrix} 1 \\ 0\\ 0\\ 0 \end{bmatrix},(\#eq:statematrix) \end{eqnarray}\] where we defined \(\mathcal{S}\) as: \[\begin{eqnarray} \hspace{-1em} \hspace{-0.5em}\begin{bmatrix} s +n \lambda & -\mu &0 &0 \\ -n\lambda & s +(n-1)\lambda +\mu &-\mu&0\\ 0 &(n-1)\lambda & s +(n-2)\lambda +\mu &0\\ 0&0&(n-2)\lambda&s \end{bmatrix}^{-1}(\#eq:m2s) \end{eqnarray}\]
We are interested in \(P_{n-3}\). One can study the roots of the inverse \(s\) matrix to find that \(P_{n-3}\) has a pole at \(s=0\) with coefficient \(1\) and another one with coefficient \(1\) at \(s=S_1\) where \[\begin{equation} S_1\simeq \lambda n (n-1)(n-2)\left(\frac{\lambda}{\mu}\right)^2 (\#eq:spoles) \end{equation}\] Inverse Laplace transforming, we can calculate the reliability as
\[\begin{equation} {\rm R}(t)=1-p_{n-1}(t)\simeq1-(1-e^{-t/{\rm MTTDL}_2})=e^{-t/{\rm MTTDL}_2}(\#eq:m2fR) \end{equation}\] with \[\begin{equation} {\rm MTTDL}_2=\frac{1}{n(n-1)(n-2) \lambda}\left(\frac{\mu}{\lambda}\right)^2 (\#eq:mtwof). \end{equation}\]
\[\begin{eqnarray} \dot{p}_{n}+n \lambda p_{n} -\mu p_{n-1}&=&0\nonumber \\ \dot{p}_{n-1}+(n-1) \lambda p_{n-1} +\mu p_{n-1}-n \lambda p_{n} &=&0\nonumber\\ \dot{p}_{\rm F}-(n-1) \lambda p_{n-1} &=&0(\#eq:oneredmarkov) \end{eqnarray}\]
//Nov 2020 contact Serkay Olmez for questions
scr_path=Get Default Directory();scr_path=substr(scr_path,2,length(scr_path));scr_name = Current Window() << get window title;
calc_reli=function({},
newname="Reli simulation "||substitute(substitute(MDYHMS(today()),"/","_"),":","_");
dtt=new table(newname,invisible(1));
dtt<< add rows(num_simulations*num_drives_per_system);
eval(substitute(expr(dtt<<new column("sys_id",formula(1+Floor((Row()-1)/num_drps_)))),expr(num_drps_),name expr(num_drives_per_system)));
eval(substitute(expr(dtt<<new column("drv_id",formula(Row()-Floor((Row()-1)/num_drps_)*num_drps_))),expr(num_drps_),name expr(num_drives_per_system)));
dtt<<delete column(1); wait(1+log10(num_simulations)/2);:drv_id<<delete formula();:sys_id<<delete formula();
eval(substitute(expr(dtt<<new column("drv_failure_time",formula(Random Weibull(beta_,alpha_)))),expr(alpha_),name expr(alpha),expr(beta_),name expr(betav)));
column("drv_failure_time")<<delete formula();
dtt << Sort( By(:sys_id,column("drv_failure_time")), Order(Ascending,Ascending), Replace Table);
dtt<<new column("num_failures",numeric); dtt<<new column("num_max_failures",numeric);
:num_failures[1]=1;:num_max_failures[1]=1;maxseen=1;toberecovered={};
for(rr=2,rr<=nrows(dtt),rr++,
rr_time=:drv_failure_time[rr];
rr_m1_time=:drv_failure_time[rr-1];
if(:sys_id[rr-1]!=:sys_id[rr],:num_failures[rr]=1; toberecovered={};maxseen=1;:num_max_failures[rr]=1);
if(:sys_id[rr-1]==:sys_id[rr],
Insert Into( toberecovered,:drv_failure_time[rr-1]);
tosubtracted=0;
for(rc=nitems(toberecovered),rc>=1,rc--,
if(:drv_failure_time[rr]-toberecovered[rc]>repair_time, tosubtracted++;remove from(toberecovered,rc))
);
:num_failures[rr]=:num_failures[rr-1]-tosubtracted+1;
if(:num_failures[rr]>maxseen,maxseen=:num_failures[rr]);
:num_max_failures[rr] =maxseen
);
);
rc=1;
//throw();
mm=1;rr=1;mms=1;
big_failure_perc_table={};
for(mms=1,mms<=12/stepsize*max_year,mms++,
mm=stepsize*mms; //do this quarterly to speed up
month_tresh=365/12*mm;
if (mod(mm, 5) == 0,caption("Status:"|| char(round(100*mm/(12*max_year)))|| "% "); wait(0); ) ;
try(dtt<<delete column("filtered_num_max_failures"));
Eval(Substitute(Expr(dtt<<new column("filtered_num_max_failures",formula(if(:drv_failure_time>_thres_,.,:num_max_failures)))), Expr(_thres_), month_tresh));
dts=dtt <<Summary(Group( :sys_id ), Max( :filtered_num_max_failures ),Link to original data table( 0 ),invisible(1));
column("Max(filtered_num_max_failures)")<<set name("max_in_frame");
:max_in_frame[dts<<get rows where(is missing(:max_in_frame))]=0; //filling in the missing rows- they had no failures
failure_perc_table={};dts<<new column("system_failure",numeric);
for(rr=1,rr<=nitems(redundancy),rr++,
Eval(Substitute(Expr(:system_failure<<set formula(if(max_in_frame>_red,100,0))), Expr(_red), redundancy[rr]));
:system_failure<<eval formula();
failure_perc_table[rr]=col mean(:system_failure)
);
dts<<close window();big_failure_perc_table[mms]=failure_perc_table;
);
caption(remove); dtt<<close window();
);
////////////////////////////////////////
clear log();
try(dtsum<<close window());try(dtt<<close window());
start_time=today();
max_year=5;
AFRlist={25,31,38};repair_timelist={6};redundancy={1};num_drives_per_system_list={9};
AFRlist={0.5,1,2,5};repair_timelist={12,24,48,120,240};redundancy={0,1,2,3};num_drives_per_system_list={4,8,16,24};
AFRlist={2};repair_timelist={12,24,48,120,240};redundancy={0,1,2,3};num_drives_per_system_list={4,8,16,24};
AFRlist={2.5};repair_timelist={4};//days
redundancy={0,1};num_drives_per_system_list={20};
betalist={1,1.25,1.5};
//AFRlist={0.2};repair_timelist={0.33}; num_drives_per_system_list={8};
stepsize=3; //in months
dtsum=new table("Reli Summary", invisible(1));
dtsum<<new column("simulation_size");dtsum<<new column("total_num_of_drives");dtsum<<new column("redundancy");
eval(parse("dtsum<<new column(\!"AFR(%)\!")")); eval(parse("dtsum<<new column(\!"repair_time(days)\!")"));
dtsum<<new column("time(months)");dtsum<<new column("system failure(%)");
dtsum<<new column("system_reli",formula(1- :Name( "system failure(%)" ) / 100 ));
dtsum<<new column("MC_num_of_nines",formula(floor(-Log10( :Name( "system failure(%)" ) / 100 ))));
dtsum<<new column("MC_num_of_nines_",formula(-Log10( :Name( "system failure(%)" ) / 100 )));
dtsum<<new column("simulation_duration");
dtsum<<new column("beta");
index=0;
indexmax= nitems(AFRlist)*nitems(repair_timelist)*nitems(num_drives_per_system_list)*nitems(betalist);
gridindex=0;bi=1;afrindex=1;repindex=1;numodindex=1;
for(bi=1,bi<=nitems(betalist),bi++,
betav=betalist[bi];
for(numodindex=1,numodindex<=nitems(num_drives_per_system_list),numodindex++,
num_drives_per_system=num_drives_per_system_list[numodindex];
for(afrindex=1,afrindex<=nitems(AFRlist),afrindex++,
for(repindex=1,repindex<=nitems(repair_timelist),repindex++,
gridindex++;
remaining_time=round((indexmax-gridindex)*(today()-start_time)/gridindex/60);
caption(char(bi)||"_"||char(numodindex)||"_"||char(afrindex)||"_"||char(repindex)||" out of "
||char(nitems(betalist))||"_"||char(nitems(num_drives_per_system_list))||"_"||char(nitems(AFRlist))||"_"||char(nitems(repair_timelist))||" "||char(remaining_time)||" mins to go");
AFRmx=AFRlist[afrindex];
AFR=100*(1-(1-AFRmx/100)^(1/max_year));
repair_time=repair_timelist[repindex];
alpha=round(-365/log(1-AFR/100)); // this is in days
num_simulations=4*10^Floor(7-LOG10(num_drives_per_system));
print("number of drives:"||char(num_simulations*num_drives_per_system));
//throw();
simstart=today();
calc_reli();
simend=today();
print(char(round(simend-simstart))||" seconds");
for(mms=1,mms<=12/stepsize*max_year,mms++,
mm=stepsize*mms; //do this quarterly to speed up
for(rr=1,rr<=nitems(redundancy),rr++,
failure_perc_table=big_failure_perc_table[mms];
dtsum<<add rows(1);index++;
column("redundancy")[index]=redundancy[rr];
column("time(months)")[index]=mm;
column("system failure(%)")[index]=failure_perc_table[rr];
column("AFR(%)")[index]=AFR;
column("repair_time(days)")[index]=repair_time;
:total_num_of_drives[index]=num_drives_per_system;
:simulation_size[index]=num_simulations;
:simulation_duration[index]=round(simend-simstart);
:beta[index]=betav;
);
);
//throw();
newname=substitute(substitute(MDYHMS(today()),"/","_"),":","_");
dtsum<<save("C:\Users\451516\my_codes\erasure_coded_storage_system_analysis\EC reli MC simulation results_"||newname||".jmp");
);
);
);
);
dtsum<<new column("alpha",formula(round(-365.25/log(1-Name("AFR(%)")/100))));
dtsum<<new column("beta_sys",formula((:redundancy+1)*(:beta-1)+1));
dtsum<<new column("alpha_sys",formula( ((Name("repair_time(days)")^:redundancy)*(factorial(:total_num_of_drives)/(factorial(:total_num_of_drives-:redundancy-1)
*factorial(:redundancy)))*(:beta^(:redundancy+1)/:alpha^(:beta*(:redundancy+1)))*1/(:beta_sys))^(-1/:beta_sys)));
dtsum<<new column("theory_system failure(%)",formula(100*(1-Exp( -(365.25/:alpha_sys*:Name( "time(months)" ) / 12)^:beta_sys ))));
dtsum<<new column("theory_system_reli",formula(1- :Name( "theory_system failure(%)" ) / 100 ));
dtsum<<new column("theory_num_of_nines",formula(floor(-Log10( :Name( "theory_system failure(%)" ) / 100 ))));
dtsum<<new column("theory_num_of_nines_",formula(-Log10( :Name( "theory_system failure(%)" ) / 100 )));
:redundancy<<modeling type(nominal);
newname=substitute(substitute(MDYHMS(today()),"/","_"),":","_");
notein="This data table is created by "||scr_path||scr_name||".jsl on"||char( As Date(Today()););
dtsum << New Table Variable( "Notes",notein );
dtsum<< new column("Notes");
:Notes[1]=notein; //this table will go into CSV. Just add a column to keep the info
dtsum<< new column("Date");
thisdate=substitute(substitute(MDYHMS(today()),"/","_"),":","_");
:Date[1]==thisdate;
dtsum<<save("C:\Users\451516\my_codes\erasure_coded_storage_system_analysis\EC reli MC simulation results generic beta "||newname||".jmp");
duration=char(floor((today()-start_time)/60))||" minutes";
Bivariate(
Y( :MC_num_of_nines_ ),
X( :theory_num_of_nines_ ),
Fit Special( Intercept( 0 ), Slope( 1 ), {Line Color( {212, 73, 88} )} ),
SendToReport(
Dispatch(
{},
"1",
ScaleBox,
{Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
),
Dispatch(
{},
"2",
ScaleBox,
{Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
),
Dispatch(
{},
"Bivar Plot",
FrameBox,
{Row Legend(
redundancy,
Color( 1 ),
Color Theme( "JMP Default" ),
Marker( 0 ),
Marker Theme( "" ),
Continuous Scale( 0 ),
Reverse Scale( 0 ),
Excluded Rows( 0 )
)}
)
)
)
\[\begin{equation} h_{n-1}\equiv1-(1-\rm{UER})^{N_\text{bits}}\simeq 1-e^{-UER\times N_\text{bits}}=1-e^{-UER\times (n-1)\times C_\text{bits}}(\#eq:hnmone) \end{equation}\]
The overall data durability can be improved by implementing another layer of erasure coding. Top layer is composed of already erasure coded sub-elements.
\[\begin{eqnarray} \frac{1}{{\rm MTTDL}^{\rm t}_d}&=&\frac{1}{{\rm MTTDL}^{\rm t}_{d,{\rm DG}}}+\frac{h^{\rm t}_{m-d}}{{\rm MTTDL}^{\rm t}_{d-1,{\rm DG}}}\nonumber \end{eqnarray}\]
\[\begin{eqnarray} {\rm MTTDL}^{\rm t}_{d,{\rm DG}} &=&\left(\frac{\mu_{\rm t}}{ \Lambda^{\rm DG}}\right)^d \frac{(m-d-1)!}{ \Lambda^{\rm DG} m!},\nonumber \end{eqnarray}\] \(h^{\rm t}_{m-d}=1-(1-{\rm UER})^{(n-c)(m-d)C}\) probability of URE(s).
\[\begin{eqnarray} \frac{1}{{\rm MTTDL}^{\rm t}_d}&=&\frac{1}{{\rm MTTDL}^{\rm t}_{d,{\rm DG}}}+\frac{h^{\rm t}_{m-d}}{{\rm MTTDL}^{\rm t}_{d-1,{\rm DG}}} \end{eqnarray}\]
\[\begin{eqnarray}
{\rm MTTDL}^{\rm t}_{d,{\rm DG}} &=&\left(\frac{\mu_{\rm t}}{ \Lambda^{\rm DG}}\right)^d \frac{(m-d-1)!}{ \Lambda^{\rm DG} m!},
\end{eqnarray}\] \(h^{\rm t}_{m-d}=1-(1-{\rm UER})^{(n-c)(m-d)C}\) probability of URE(s)
during the critical top layer repairs.
Availability=1-Fraction of time spent for repair:
\[\begin{equation} A_0 \simeq1-\frac{n \lambda_s}{\mu_s}.(\#eq:a0pre) \end{equation}\]
Consider the Markov chain representation below:
The green arrow shows that even your system is down, you did not lose any data, you just need to repair it. This will make a lot of difference in the math. We will skip a couple of steps and start with the \(s\)-matrix: \[\begin{equation} \begin{bmatrix} s +n \lambda_s & -\mu_s \\ -n \lambda_s & s +\mu_s \end{bmatrix} \begin{bmatrix} P_{n}(s) \\ P_{n-1}(s) \end{bmatrix}= \begin{bmatrix} 1 \\ 0 \end{bmatrix},(\#eq:m0lap) \end{equation}\] which is solved by inverting the matrix \[\begin{eqnarray} \begin{bmatrix} P_{n}(s) \\ P_{n-1}(s) \end{bmatrix}&=& \begin{bmatrix} s +n \lambda_s & -\mu_s \\ -n \lambda_s & s +\mu_s \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 0 \end{bmatrix}\nonumber\\ &=& \frac{1}{1+\frac{n \lambda_s}{\mu_s}}\begin{bmatrix} \frac{1}{s}+\frac{n \lambda_s}{\mu_s}\frac{1}{s+(n \lambda_s+\mu_s)} \\ \frac{n \lambda_s/\mu_s}{s}-\frac{n \lambda_s}{\mu_s}\frac{1}{s+(n \lambda_s+\mu_s)} \end{bmatrix}. (\#eq:m0lap2) \end{eqnarray}\] Finally we inverse Laplace transform to get \[\begin{equation} \begin{bmatrix} p_{n}(t) \\ p_{n-1}(t) \end{bmatrix}= \frac{1}{1+\frac{n \lambda_s}{\mu_s}}\begin{bmatrix} 1+\frac{n \lambda_s}{\mu_s} e^{-(n \lambda_s+\mu_s)t} \\ \frac{n \lambda_s}{\mu_s}-\frac{n \lambda_s}{\mu_s} e^{-(n \lambda_s+\mu_s)t} \end{bmatrix}.(\#eq:m0a) \end{equation}\] Note that \(p_{n}(t) +p_{n-1}(t)=1\), as expected. The system will be available if it is in the first box, and the probability of being in that state is \(p_{n}\). Given a time interval \(T_0\), we can calculate the fraction of the time the system will be available as follows: \[\begin{eqnarray} A(T_0)&=&\frac{1}{T_0}\int_0^{T_0} d\tau p_n(\tau) \nonumber\\ &=&\frac{1}{T_0(1+\frac{n \lambda_s}{\mu_s})}\int_0^{T_0} d\tau \left(1+\frac{n \lambda_s}{\mu_s} e^{-(n \lambda_s+\mu_s)\tau}\right)\nonumber\\ &\simeq&\frac{1}{1+\frac{n \lambda_s}{\mu_s}}\simeq1-\frac{n \lambda_s}{\mu_s}(\#eq:a0) \end{eqnarray}\] which can also be transformed to \(\frac{{\rm MTTF}/n}{{\rm MTTR}+{\rm MTTF}/n }\), if we convert the rates \(\lambda_s\) and \(\mu_s\) to time frames \(1/{\rm MTTF}\) and \(1/{\rm MTTR}\), respectively. This reproduces the standard availability equation.
The availability can be computed as
\[\begin{equation} A_1=1-\left[n\frac{\lambda_s}{\mu_s}\right]\left[(n-1)\frac{\lambda_s}{\mu_s}\right](\#eq:a1pre) \end{equation}\]
Recognizing the patterns in Eqs. @ref(eq:a0pre) and @ref(eq:a1pre) we can generalize the formula to a system with \(c\) redundancies: \[\begin{equation} A_c\simeq 1-\frac{n !}{(n-c-1)!}\left[\frac{\lambda_s}{\mu_s}\right]^{c+1} (\#eq:avgc) \end{equation}\]
Press down arrow to navigate this sections.
[1] Y. Xie, Dynamic documents with R and knitr, 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC, 2015 [Online]. Available: http://yihui.name/knitr/
[2] Hakim El Hattab, Revealjs. 2020 [Online]. Available: https://revealjs.com/
[3] D. A. Patterson, G. Gibson, and R. H. Katz, “A case for redundant arrays of inexpensive disks (raid),” SIGMOD Rec., vol. 17, no. 3, pp. 109–116, Jun. 1988, doi: 10.1145/971701.50214. [Online]. Available: https://doi.org/10.1145/971701.50214
[4] L. Hellerstein, G. A. Gibson, R. M. Karp, R. H. Katz, and D. A. Patterson, “Coding techniques for handling failures in large disk arrays,” Algorithmica, vol. 12, no. 2, pp. 182–208, Sep. 1994, doi: 10.1007/BF01185210. [Online]. Available: https://doi.org/10.1007/BF01185210
[5] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theor., vol. 56, no. 9, pp. 4539–4551, Sep. 2010, doi: 10.1109/TIT.2010.2054295. [Online]. Available: https://doi.org/10.1109/TIT.2010.2054295
[6] G. Wang, L. Xiao-Guang, and L. Jing, “Parity declustering data layout for tolerating dependent disk failures in network raid systems,” 2002, pp. 22–25, doi: 10.1109/ICAPP.2002.1173547.
[7] V. Venkatesan and I. Iliadis, “Effect of codeword placement on the reliability of erasure coded data storage systems,” in Quantitative evaluation of systems, 2013, pp. 241–257.
[8] A. Thomasian and M. Blaum, “Higher reliability redundant disk arrays: Organization, operation, and coding,” ACM Trans. Storage, vol. 5, no. 3, Nov. 2009, doi: 10.1145/1629075.1629076. [Online]. Available: https://doi.org/10.1145/1629075.1629076
[9] T. Kawaguchi, “Reliability analysis of distributed raid with priority rebuilding,” 2013.
[10] W. Padgett, “Weibull distribution,” 2011, pp. 1651–1653.
[11] B. Wilson, “Cloud storage durability,” 2018 [Online]. Available: https://www.backblaze.com/blog/cloud-storage-durability/
[12] B. Beach, “Python code to calculate the durability of data stored with erasure coding,” 2018 [Online]. Available: https://github.com/Backblaze/erasure-coding-durability/
[13] I. Iliadis, R. Haas, X.-Y. Hu, and E. Eleftheriou, “Disk scrubbing versus intra-disk redundancy for high-reliability raid storage systems,” vol. 36, no. 1, pp. 241–252, Jun. 2008, doi: 10.1145/1384529.1375485. [Online]. Available: https://doi.org/10.1145/1384529.1375485
[14] S. Olmez, “Reliability analysis of storage systems with partially repairable devices,” 2021 [Online]. Available: Under peer review, available up on request