DOI: 10.14489/vkit.2018.05.pp.042-050
Исупов К. С., Куваев А. С. ВЫСОКОТОЧНОЕ ПАРАЛЛЕЛЬНОЕ УМНОЖЕНИЕ ЧИСЕЛ С ПЛАВАЮЩЕЙ ТОЧКОЙ НА ЦЕНТРАЛЬНЫХ ПРОЦЕССОРАХ И ГРАФИЧЕСКИХ УСКОРИТЕЛЯХ (pp. 42-50)
Аннотация. Рассматривается алгоритм умножения чисел с плавающей точкой многократной точности, в котором используется система остаточных классов (СОК) для представления многоразрядных мантисс. Цифры в СОК, называемые остатками, не имеют весов и являются взаимно независимыми, что позволяет обрабатывать их параллельно и без распространения переноса. Новый алгоритм умножения реализован на языке C для центральных процессоров (CPU) и графических ускорителей (GPU) с использованием программнно-аппаратной архитектуры параллельных вычислений (NVIDIA CUDA). Параллельные GPU-подпрограммы спроектированы в виде device-функций и запускаются из основного CUDA-ядра. Эксперименты показывают, что новый алгоритм эффективно распараллеливается, а его производительность значительно выше, чем у алгоритмов на основе двоичной системы счисления, реализованных в известных высокоточных библиотеках.
Ключевые слова: высокоточное умножение; система остаточных классов; параллельные алгоритмы; модулярно-позиционный формат; вычисления общего назначения на графических процессорах; программнно-аппаратная архитектура параллельных вычислений.
Isupov K. S., Kuvaev A. S. HIGH-PRECISION PARALLEL FLOATING-РOINT MULTIPLICATION ON CPU AND GPU (pp. 42-50)
Abstract. Multiple-precision arithmetic is in demand in many modern computer applications such as satellite collision simulation, the high accuracy calculation of elliptic integrals, and applying the Inverse Laplace Transform for solving differential equations of fractional order. Since multiple-precision data types are not natively supported by general-purpose hardware, emulation of multiple-precision arithmetic implemented in software libraries is used. Due to the large amount of computations, performance is considered to be one of the most important requirements in multiple-precision applications. This leads to the necessity of using modern parallel computing systems, where multicore processors (CPUs) are often used alongside with graphics processing units (GPUs). However, existing multiple-precision libraries are mostly targeted at CPUs, and do not allow using GPUs for high-precision calculations. Therefore, the development of effective algorithms for high-precision arithmetic and the implementation of these algorithms on GPU are actual problems of modern high-performance computing. In this paper, a multiple-precision floating-point multiplication algorithm is presented in which the residue number system (RNS) is used to represent multiple-precision significands. In RNS the digits of number, called residues, have no weights and are mutually independent, which allows them to be processed in parallel and without carry propagation. We implemented the new multiple-precision multiplication algorithm in C language for CPU and GPU. For CPU, only a serial version of the algorithm is implemented. For GPU, the both serial and parallel versions of the algorithm are implemented using the NVIDIA CUDA architecture. For GPU-routines, two parallelization schemes are considered. The first scheme is based on the use of CUDA dynamic parallelism. In accordance with the second scheme, parallel GPU-routines are designed as device-functions, which are called from the main CUDA kernel. Experiments show that the new multiple-precision multiplication algorithm is well parallelized, and its performance is much higher than that of algorithms based on the binary system, which are implemented in known multiple-precision libraries.
Keywords: Multiple-precision multiplication; Residue number system; Parallel algorithms; Modular-positional floating-point representation; General-purpose compyting for graphcs processing units; Compute Unifid Device Architecture.
К. С. Исупов, А. С. Куваев (Вятский государственный университет, г. Киров, Россия) E-mail:
Этот e-mail адрес защищен от спам-ботов, для его просмотра у Вас должен быть включен Javascript
K. S. Isupov, A. S. Kuvaev (Vyatka State University, Kirov, Russia) E-mail:
Этот e-mail адрес защищен от спам-ботов, для его просмотра у Вас должен быть включен Javascript
1. High-precision Secure Computation of Satellite Collision Probabilities / B. Hemenway et al. // Lecture Notes in Computer Science. 2016. V. 9841. P. 169 – 187. 2. Leweke S., von Lieres E. Fast Arbitrary Order Moments and Arbitrary Precision Solution of the General Rate Model of Column Liquid Chromatography with Linear Isotherm // Computers & Chemical Engineering. 2016. V. 84. P. 350 – 362. 3. He K., Zhou X., Lin Q. High accuracy complete elliptic integrals for solving the Hertzian elliptical contact problems // Computers & Mathematics with Applications. 2017. V. 73, No. 1. P. 122 – 128. 4. Brzeziński D. W., Ostalczyk P. Numerical calculations accuracy comparison of the inverse Laplace transform algorithms for solutions of fractional order differential equations // Nonlinear Dynamics. 2016. V. 84, Is. 1. P 65 – 77. 5. Bailey D. H., Barrio R., Borwein J. M. High-Precision Computation: Mathematical Physics and Dynamics // Applied Mathematics and Computation. 2012. V. 218, No. 20. P. 10106 – 10121. 6. Bailey D. H., Borwein J. M. High-Precision Arithmetic: Progress and Challenges [Электронный ресурс]. URL: http://www.davidhbailey.com/dhbpapers/hp-arith.pdf (дата обращения 01.12.2017). 7. Brent R., Zimmermann P. Modern Computer Arithmetic. Cambridge: Cambridge University Press. 2010. 236 p. 8. Исупов К. С., Князьков В. С. Арифметика многократной точности на основе систем остаточных классов // Программные системы: теория и приложения. 2016. Т. 7, № 1(28). С. 61 – 97. 9. Parhami B. Computer Arithmetic: Algorithms and Hardware Designs. NY, Oxford: Oxford University Press. 2000. 510 p. 10. Mohan P.V.A. Residue Number Systems. Theory and Applications. Springer, Basel: Birkhäuser. 2016. 332 p. 11. Bajard J.-C., Eynard J., Merkiche N. Montgomery Reduction within the Context of Residue Number System Arithmetic // Journal of Cryptographic Engineering. 07 March 2017. P. 1–12. 12. Cardarilli G. C., Nannarelli A., Re M. RNS Applications in Digital Signal Processing // Molahosseini A., de Sousa L., Chang C. H. (Eds). Embedded Systems Design with Special Arithmetic and Number Systems. Cham: Springer, 2017. P. 181 – 215. 13. Younes D., Steffan P. Efficient Image Processing Application Using Residue Number System // Proc. of the 20th Intern. Conf. on Mixed Design of Integrated Circuits and Systems (MIXDES). Gdynia, 20 – 22 June 2013. Р. 468 – 472. 14. Multiple-Precision Residue-Based Arithmetic Library for Parallel CPU-GPU Architectures: Data Types and Features / K. Isupov et al. // Malyshkin V. (Eds) Parallel Computing Technologies. PaCT’2017. Lecture Notes in Computer Science. 2017. V. 10421. P. 196 – 204. 15. Isupov K., Knyazkov V. Interval Estimation of Relative Values in Residue Number System // Journal of Circuits, Systems and Computers. 2018. V. 27, No. 1. P. 1850004-1 – 1850004-36. doi: 10.1142/ S0218126618500044 16. Hauser J. R. Handling Floating-Point Exceptions in Numeric Programs // ACM Transactions on Programming Languages and Systems. 1996. V. 18, No 2. P. 139 – 174.. 17. Isupov K., Knyazkov V., Kuvaev A. Fast Power-of-Two RNS Scaling Algorithm for Large Dynamic Ranges // Proc. of the 2017 Intern. Conf. on Engineering and Telecommunication. Moscow, USA: Los Alamitos, IEEE Computer Society. 2017. P. 135 – 139. doi: 10.1109/ICEnT.2017.36 18. ARPREC: An Arbitrary Precision Computation Package / D. H. Bailey et al. 2002. 8 p. [Электронный ресурс]. URL: http://www. davidhbailey.com/dhbpapers/arprec.pdf (дата обращения: 29.11.2017) 19. Lu M., He B., Luo Q. Supporting High Precision on Graphics Processors [Электронный ресурс] // Proc. of the Sixth International Workshop on Data Management on New Hardware (DaMoN’2010). June 7, 2010, Indianapolis, Indiana. 8 p. URL: http://caxapa.ru/thumbs/355681/gpuprecision.pdf (дата обращения: 29.11.2017)
1. Hemenway B. et al. (2016). High-precision secure computation of satellite collision probabilities. Lecture Notes in Computer Science, 9841, pp. 169-187. 2. Leweke S., von Lieres E. (2016). Fast arbitrary order moments and arbitrary precision solution of the general rate model of column liquid chromatography with linear isotherm. Computers & Chemical Engineering, 84, pp. 350362. 3. He K., Zhou X., Lin Q. (2017). High accuracy complete elliptic integrals for solving the hertzian elliptical contact problems. Computers & Mathematics with Applications, 73(1), pp. 122128. 4. Brzeziński D. W., Ostalczyk P. (2016). Numerical calculations accuracy comparison of the inverse laplace transform algorithms for solutions of fractional order differential equations. Nonlinear Dynamics, 84(1), pp. 65-77. 5. Bailey D. H. (2012). High-precision computation: mathematical physics and dynamics. Applied Mathematics and Computation, 218(20), pp. 10106-10121. 6. Bailey D. H. (2013). High-precision arithmetic: progress and challenges. Available at: http://www. davidhbailey.com/dhbpapers/hp-arith.pdf (Accessed: 01.12.2017) 7. Brent R., Zimmermann P. (2010). Modern computer arithmetic. Cambridge, NY: Cambridge University Press. 8. Isupov K. S. (2016). Multiple precision arithmetic based on residual class systems. Programmnye sistemy: teoriya i prilozheniya, Vol. 7, 28(1), pp. 61-97. [in Russian language] 9. Parhami B. (2000). Computer arithmetic: algorithms and hardware designs. NY, USA: Oxford University Press. 10. Mohan P. A. (2016). Residue number systems. Theory and applications. Cham: Birkhäuser. 11. Bajard J.-C., Eynard J., Merkiche N. (2017). Montgomery reduction within the context of residue number system arithmetic. Journal of Cryptographic Engineering, pp. 1-12. 12. Cardarilli G. C., Nannarelli A., Re M. (2017). RNS applications in digital signal processing. Embedded Systems Design with Special Arithmetic and Number Systems. (pp. 181-215). Cham: Springer. 13. Younes D., Steffan P. (2013). Efficient image processing application using residue number system. Proceedings of the 20th Int. Conf. Mixed Design of Integrated Circuits and Systems (MIXDES). (pp. 468-472). Gdynia: IEEE. 14. Isupov K. et al. (2017). Multiple-precision residue-based arithmetic library for parallel CPU-GPU Architectures: data types and features. Lecture Notes in Computer Science, 10421, pp. 196-204. 15. Isupov K., Knyazkov V. (2018). Interval estimation of relative values in residue number system. Journal of Circuits, Systems and Computers, 27(1), pp. 1850004-1 – 1850004-36. 16. Hauser J. R. (1996). Handling floating-point exceptions in numeric programs. ACM Transactions on Programming Languages and Systems, 18(2), pp. 139-174. 17. Isupov K., Knyazkov V., Kuvaev A. (2017). Fast Power-of-Two RNS scaling algorithm for large dynamic ranges. Proceedings of the 2017 Int. Conf. Engineering and Telecommunication. (pp. 135-139). Moscow, USA: Los Alamitos, IEEE Computer Society. 18. Bailey D. H. et al. (2002). ARPREC: an arbitrary precision computation package. Available at: http://www.davidhbailey.com/dhbpapers/arprec.pdf (Accessed: 29.11.2017) 19. Supporting high precision on graphics processors. Available at: https://code.google.com/archive/ p/gpuprec/ (Accessed: 29.11.2017)
Статью можно приобрести в электронном виде (PDF формат).
Стоимость статьи 350 руб. (в том числе НДС 18%). После оформления заказа, в течение нескольких дней, на указанный вами e-mail придут счет и квитанция для оплаты в банке.
После поступления денег на счет издательства, вам будет выслан электронный вариант статьи.
Для заказа скопируйте doi статьи:
10.14489/vkit.2018.05.pp.042-050
и заполните форму
Отправляя форму вы даете согласие на обработку персональных данных.
.
This article is available in electronic format (PDF).
The cost of a single article is 350 rubles. (including VAT 18%). After you place an order within a few days, you will receive following documents to your specified e-mail: account on payment and receipt to pay in the bank.
After depositing your payment on our bank account we send you file of the article by e-mail.
To order articles please copy the article doi:
10.14489/vkit.2018.05.pp.042-050
and fill out the form
.
|