ние программы позволит существенно увеличить скорость и эффективность обработки заказов. В результате сокращение запасов составит 173,0 тыс. руб., сокращение издержек по заказу - 5,6 тыс. руб., рост доходов благодаря сокращению дефицитов - 113,2 тыс. руб., прирост оборачиваемости - 8,4 дней, а рентабельности продаж - 3,6 п.п. Общий экономический эффект составит 264,8 тыс. руб.

## Список использованной литературы

1.Обоснование направлений повышения эффективности функционирования предприятия на основе инновационной деятельности / Д. А. Глушанина [и др.]; науч. рук. О. Г. Довыдова // НИРС БГЭУ: сборник научных статей. Вып. 7 / [редкол: А.А. Быков (пред. и др.]; М-во образования Респ. Беларусь, Белорус. гос. экон. ун-т. - Минск: БГЭУ, 2018. - С. 212-216.

УДК 004.056.53

## DESIGN OF A SPECIAL-PURPOSE MULTIPROCESSOR ARRAY FOR DATA PROTECTION AND DIGITAL SIGNATURE

Tiunchick A.A., PhD, assistant professor<br>Belarusian State Agrarian Technical University, Minsk

Ключевые слова: защита данных, ЭЦП, многопроцессорные устройства, модульное умножение Монтгомери.

Key words: data protection, digital signature, multiprocessor array, Montgomery modular multiplication.

Аннотация: Представлено специализированное многопроцессорное устройство для модульного умножения и возведения в степень по алгоритму Монтгомери. Устройство может быть реализовано на основе технологии СБИС или ПЛМ.

Summary: A special-purpose multiprocessor array for Montgomery modular multiplication and exponentiation is presented/ The array can be implemented via VLSI or FPGA technology/

Public-key cryptography plays an important role in digitalization of the economy. Modular multiplication is the basis of many encryption and digital signature algorithms. Efficient implementation of the operation is important to process larger numbers and increase cryptographic strength. Besides that, ehere exists a great demand for developing special-purpose hardware to speed up the computations.

A modular exponentiation operation $M^{e} \bmod n$ cannot be implemented in a naive fashion by first exponentiating $M^{e}$ and then performing reduction modulo $n$, since the intermediate result $M^{e}$ contains too many digits. Hence, the intermediate results of the exponentiation are to be reduced modulo $n$ at each step. The straightforward reduction modulo $n$ involves a number of arithmetic operations (division, subtraction, etc.), and is very time consuming. Therefore, special algorithms for modular operations are to be used.
P. L. Montgomery [1] proposed an algorithm for modular multiplication $A B \bmod m$ without trial division. Let $A, B$ be elements of $Z_{m}$, where $Z_{m}$ is the set of integers between 0 and $m-1$. Let $h$ be an integer coprime to $m$, and $h>m$. Montgomery modular multiplication (MM) is an operation

$$
\begin{equation*}
A^{h, m} \otimes B=A \cdot B \cdot h^{-1} \bmod m . \tag{1}
\end{equation*}
$$

Implementation of this operation is much easier than a normal reduction modulo $m$; and is based on some facts from number theory. The use of MM does not result in the desirable speed-up immediately. To compute $A B \bmod m$, a computation of MM is to be performed twice:

$$
\begin{aligned}
& C=A h, m B=A \cdot B \cdot h^{-1} \bmod m, \text { and } \\
& C h, m\left(h^{2} \bmod m\right)=A B h^{-1} \cdot h^{2} \cdot h^{-1} \bmod m=A B \bmod m,
\end{aligned}
$$

where $h^{2} \bmod m$ is computed in advance. The advantage of using two Montgomery multiplications instead of one operation of plain modular multiplication is uncertain.

An efficient way to compute $A B \bmod m$ using MM is by exploiting special representations of $A$ and $B$. Let $\overline{\bar{X}}$ be called an image of $X$ if $\overline{\bar{X}}=X \cdot h \bmod m, h>m$. If $h$ and $m$ are relatively prime, then there exists a one-to-one correspondence between $X$ and $\overline{\bar{X}}$. MM of $\overline{\bar{A}}$ and $\overline{\bar{B}}$ is isomorphic to the modular multiplication of $A$ and $B$. Indeed,

$$
\stackrel{=}{A, m} \stackrel{\bar{\otimes}}{\otimes}=(A h \bmod m) \stackrel{h, m}{\otimes}(B h \bmod m)=(A B) h \bmod m=\overline{\overline{A \cdot B}} .
$$

The reduction of $X$ to $\bar{X}$ and vice versa can be carried out on the basis of MM:

$$
\begin{align*}
& X \stackrel{h, m}{\otimes}\left(h^{2} \bmod m\right)=X \cdot h^{2} h^{-1} \bmod m=\overline{\bar{X}}  \tag{2}\\
& \bar{X}^{h, m} \otimes \otimes^{\otimes} 1=X \cdot h \bmod m \cdot 1 \cdot h^{-1} \bmod m=X \tag{3}
\end{align*}
$$

By virtue of the isomorphism of modular multiplication and MM, the use of the images is very convenient for exponentiation. Let $\left(\otimes^{h, m} X\right)^{n}$ denote $(n-1) \mathrm{MMs}$ of $X$ by itself. To compute $Y=X^{n} \bmod m$, we should perform three steps: first, convert $X$ to $\overline{\bar{X}}$ by (2); next, realize $\overline{\bar{Y}}=\left({ }^{h, m} \overline{\bar{X}}\right)^{n}=\overline{\overline{X^{n}}}$; and finally, convert $\overline{\bar{Y}}$ to $Y$ by (3).

Several algorithms suitable for hardware implementation of MM are known. In this paper, the design of a multiprocessor array is based on the algorithm described and analysed in [2]. Let numbers $A, B$ and $m$ be written with radix 2:

$$
A=\sum_{i=0}^{N-1} a_{i} \cdot 2^{i}, \quad B=\sum_{i=0}^{M} b_{i} \cdot 2^{i}, \quad m=\sum_{i=0}^{M-1} m_{i} \cdot 2^{i}
$$

where $a_{i}, b_{i}, m_{i} \in\{0 ; 1\}, N$ and $M$ are the numbers of digits in $A$ and $m$, respectively. $B$ satisfies condition $B<2 m$, and has at most $M+1$ digits. $m$ is odd (to be coprime to the radix 2). Extend a definition of $A$ with an extra zero digit $a_{N}=0$. The algorithm for MM is given below.

$$
s:-0
$$

For $i:=0$ to $N$ do

## Begin

$$
\begin{align*}
& u_{i}:=\left(\left(s_{0}+a_{i} * b_{0}\right) * w\right) \bmod r  \tag{4}\\
& s:=\left(s+a_{i} * B+u_{i} * m\right) \operatorname{div} r
\end{align*}
$$

## End

Initial condition $B<2 m$ ensures that intermediate and final values of $s$ are bounded by $3 m$. The use of an iteration with $a_{N}=0$ ensures that the final value $s<2 m$. Hence, this value can be used for $B$ input in a subsequent multiplication. Since 2 and $m$ are relatively prime, we can precompute value $w=\left(2-m_{0}\right)^{-1} \bmod 2$. An implementation of the operations div 2 and $\bmod 2$ is trivial (shifting and inspecting the lowest digit, respectively). Algorithm (4) returns either $s=A \cdot B \cdot 2^{-n-1} \bmod m$ or $s+m$ (because $s<2 m$ ). In any case, this extra $m$ has no effect on subsequent arithmetics modulo $m$. It should be noted, that the number of iterations in (4) affects $h$. In our case, (4) presents the implementation of (1) with $h=2^{N+1}$.

To design multiprocessor array, first we construct a data dependency graph (also referred as DG) for Algorithm (4). For $N$ - and $M$-digit integers $A$ and $B$, a graph consists of $N+2$ rows and $M+1$ columns. The $i$-th row represents the $i$-th iteration of (4). Arrows are associated with digits transferred along
indicated directions. Each vertex $v(j, i), \quad i \in\{0, \ldots, N\}, j \in\{0, \ldots, M\}$ is associated with the operation

$$
s_{j-1}^{(i+1)}+2 \cdot c_{\text {out }}:=s_{j}^{(i)}+a_{i} \cdot b_{j}+u_{i} \cdot m_{j}+c_{i n}
$$

where $S_{j}^{(i)}$ denotes the $j$-th digit of the $i$-th partial product of $s, c_{o u t}$ and $c_{i n}$ are the output and input carries. Rightmost starred vertices, i.e., vertices marked with «...», perform calculations of $u_{i}:=\left(\left(s_{0}+a_{i} * b_{0}\right) * w\right) \bmod 2$ besides an ordinary operation. Using standard notation, the vertex operations can be specified in terms of inputs/outputs as follows:

$$
\begin{align*}
s_{\text {out }}+2 \cdot c_{\text {out }} & :=s_{\text {in }}+a_{\text {in }} \cdot b_{\text {in }}+u_{\text {in }} \cdot m_{\text {in }}+c_{\text {in }}, \\
a_{\text {out }} & :=a_{\text {in }}, \quad b_{\text {out }}:=b_{\text {in }},  \tag{5}\\
u_{\text {out }} & :=u_{\text {in }}, \quad m_{\text {out }}:=m_{\text {in }}
\end{align*}
$$

for plain vertices, and

$$
\begin{gather*}
u_{\text {out }}:=\left(s_{\text {in }}+a_{\text {in }} \cdot b_{\text {in }}\right) \cdot w_{\text {in }} \\
c_{\text {out }}:=m \operatorname{maj}_{2}\left(s_{\text {in }}, a_{\text {in }} \cdot b_{\text {in }}, u_{\text {in }} \cdot m_{\text {in }}\right)  \tag{6}\\
a_{\text {out }}:=a_{\text {in }}, \quad b_{\text {out }}:=b_{\text {in }}, \quad m_{\text {out }}:=m_{\text {in }},
\end{gather*}
$$

for starred vertices, where $\operatorname{maj}_{2}\left(s_{i n}, a_{i n} \cdot b_{i n}, u_{i n} \cdot m_{i n}\right)$ is 1 if at least two out of three entries are 1 s ; otherwise it is 0 .

If instead of digits $A$ we input digits $B$ both, at the topmost and rightmost vertices of DG , then the graph model represents a calculation of

$$
B \stackrel{h, m}{\otimes} B=(\stackrel{h, m}{\otimes} B)^{2},
$$

called M- squaring. To represent the computation of $\left({ }_{(\otimes, m}^{h, m} B\right)^{3}$, two graphs can be joined in a single graph by connecting $s_{j}$-outputs of the first graph with $b_{j}$-inputs of the next (identical) graph, in which rightmost inputs $a_{i}$ get digits of $B$ as before. To compute $(h, m B)^{n}$, we will need $n-1$ joined graphs. The resulting graph model consists of vertices located in a rectangular domain $V_{1}=\{v(i, j) \mid 0 \leq i \leq n \times(M+2)-1,0 \leq j \leq M+1\}$. The graph is almost homogeneous, with exceptional starred vertices in the rightmost column.

However, a faster way to compute $B^{n} \bmod m$ is by reducing the computation to a sequence of modular squares and multiplications. Let $\left[n_{0} \ldots n_{k}\right.$ ] be a binary representation of $n$, i.e., $n=n_{0}+2 n_{1}+\cdots+2^{k} n_{k}, \quad n_{j} \in\{0 ; 1\}, \quad k=\left\lfloor\log _{2} n\right\rfloor$, $n_{k}=1$. Let $\beta$ denote a partial product. We start out with $\beta=B$ and run from $n_{k-1}$
to $n_{0}$ as follows: if $n_{j}=0$, then $\beta:=\beta^{2}$; if $n_{j}=1$, then $\beta:=\beta^{2} * B$. Thus, we need at most $2 k$ operations to compute $B^{n}$. This algorithm has an advantage over a low-to-high binary method of exponentiation since, when implemented in hardware, it requires only one set of storage registers for intermediate results as opposed to two for a low-to-high method.

To perform M-squaring the dependency graph for M-multiplication can be modified in such a way that all the $b_{j}$ 's inputs enter the graph only via the toprow vertices. This eliminates rightmost $a_{i}$-inputs entirely. To deliver all $b_{j} \mathrm{~s}$ to the rightmost vertices, we have to pump them through the graph in a direction determined by vector $(1,-1)$. To do it, additional arcs $x_{j}$ 's for propagation of $b_{j}$ 's digits have to be added to DG. Vertex operations are to be slightly modified to provide propagation of these digits: each non-starred vertex just transmits its $x$ input data to an $x$-output, while when arriving at the rightmost vertices, these data are "reflected" and propagated to the left as if they were ordinary $a_{i}$ 's input data. It is known that the output value $s<2 m$. Hence, we need at most $M+1$ rows for the "reflected" factor and an additional row for the extension with an extra zero digit. Using standard notation, the vertex operations for Msquaring can be specified in terms of inputs/outputs as follows:

$$
\begin{gather*}
s_{\text {out }}+2 \cdot c_{\text {out }}:=s_{\text {in }}+a_{\text {in }} \cdot b_{\text {in }}+u_{\text {in }} \cdot m_{\text {in }}+c_{\text {in }}, \\
a_{\text {out }}:=a_{\text {in }}, \quad x_{\text {out }}:=x_{\text {in }},  \tag{7}\\
b_{\text {out }}:=b_{\text {in }}, \quad u_{\text {out }}:=u_{\text {in }}, \quad m_{\text {out }}:=m_{\text {in }},
\end{gather*}
$$

for plain vertices, and for starred vertices the operation is:

$$
\begin{gather*}
u:=\left(s_{\text {in }}+x_{\text {in }} \cdot b_{\text {in }}\right) \cdot w_{\text {in }}, \\
s_{\text {out }}+2 \cdot c_{\text {out }}:=s_{\text {in }}+x_{\text {in }} \cdot b_{\text {in }}+u \cdot m_{\text {in }},  \tag{8}\\
c_{\text {out }}:=\operatorname{maj}_{2}\left(s_{\text {in }}, x_{\text {in }} \cdot b_{\text {in }}, u_{\text {in }} \cdot m_{\text {in }}\right), \\
u_{\text {out }}:=u, \quad a_{\text {out }}:=x_{\text {in }}, \quad b_{\text {out }}:=b_{\text {in }}, \quad m_{\text {out }}:=m_{\text {in }} .
\end{gather*}
$$

DG for an exponentiation as a whole is constructed as a composition of graphs for M-multiplication and M-squaring by joining outputs of one graph with corresponding inputs of the consecutive graph. There are at most $2 k$ graphs altogether, and the precise number of required graphs for M multiplication and M -squaring and the order in which they occur in the composition is fully determined only by the binary representation of $n$. The vertices of the resulting graph constitute a rectangular domain $V_{2}=\{v(i, j) \mid 0 \leq i \leq 2 k \times(M+2), 0 \leq j \leq M+1\}$, where $k=\left\lfloor\log _{2} n\right\rfloor$.

The next stage of the design is a space-time mapping of domain $V_{2}$ onto a one-dimensional domain of processing elements (PE). Spatial mapping is determined by a linear operator with matrix $P=(10)$, which maps an indefinitely long composition of the cohered DGs onto a linear processor array with $M+1$ processing elements: each column of vertices is mapped onto one PE. Hence, each PE has to be able to operate in two modes. To control the operation modes, a sequence of one-bit control signals $\tau$ is fed into the rightmost PE and propagated through the array. If $\tau=0$ the PE implements an operation for M-multiplication, if $\tau=1$, for M-squaring. The order in which control signals are input is determined by the binary representation of $n$. A timing function that provides a correct order of operations is $t(v)=2 i+j$. The total running time is thus at most $\left(4\left\lfloor\log _{2} n\right\rfloor+1\right) M+8\left\lfloor\log _{2} n\right\rfloor$ time units.

## References

1. Montgomery, P.L. Modular multiplication without trial division // P.L. Montgomery Mathematics of Computations, 1985 (44) 519-521.
2. Walter, C.D. Systolic Modular Multiplication // IEEE Trans. on Comput., vol. 42 (1993), No. 3, pp. 376-378.
3. Tiountchik, A.A. Systolic modular exponentiation via Montgomery algorithm. // J. Electronics Letters. 1998. Vol. 34, No 9. pp. 874-875.

## УДК $\mathbf{6 3 1 . 1 4 5}$

## ПОКАЗАТЕЛИ УРОВНЯ МЕХАНИЗАЦИИ ПРОИЗВОДСТВА В ОТРАСЛЯХ АГРОПРОМЫШЛЕННОГО КОМПЛЕКСА

Цыганов В.А., к.ф.-м.н., доцент<br>УО «Белорусский государственный аграрный технический университет», г. Минск

Ключевые слова: агропромышленный комплекс, механизация, коэффициент механизации работ, коэффициент механизации труда
Key words: agro-industrial complex, mechanization, work mechanization coefficient, labor mechanization coefficient

Аннотация: в статье дана краткая характеристика этапов замены ручного труда машинным в отраслях агропромышленного комплекса, приведены соотношения для расчета коэффициентов механизации работ и труда.

