libgpucrypto

Introduction

libgpucrypto is a subset of the SSLShader software that implements a few cryptographic algorithms (AES, SHA-1, and RSA) using CUDA. It also includes several data structures that help utilize CUDA streams for better performance. See here for more details.

Installation

(1) Install the required CUDA libraries. libgpucrypto requires the CUDA developer driver, the CUDA toolkit, and the CUDA SDK, which you can download at http://developer.nvidia.com/cuda-toolkit-40
We have tested with the following software configurations:
    CUDA 4.0: CUDA driver 270.41.19, CUDA toolkit 4.0.17, CUDA SDK 4.0.17
    CUDA 3.2: CUDA driver 260.19.26, CUDA toolkit 3.2.16, CUDA SDK 3.2.16
    O/S: Ubuntu 10.04 LTS 64-bit
(2) Install the OpenSSL libraries and headers. You can download OpenSSL at http://openssl.org/source/
(3) Configure the following variables in Makefile.in: OPENSSL_DIR, CUDA_TOOLKIT_DIR, and CUDA_SDK_DIR.
If you are using the system's default OpenSSL development library, you may leave OPENSSL_DIR blank.
(4) Build libgpucrypto:

#make

(5) Try running the test programs:

#./bin/aes_test -m ENC
------------------------------------------
AES-128-CBC ENC, Size: 16KB
------------------------------------------
#msg latency(usec) thruput(Mbps)
   1          6012            21
   2          6305            41
   4          7020            74
   8          8737           120
  16         11834           177
  32         16168           259
  64         17244           486
 128         19256           871
 256         24579          1365
 512         27067          2479
1024         31605          4246
2048         40924          6559
4096         61402          8743
Correctness check (batch, random): .............OK
#./bin/rsa_test -m MP
-snip-
#./bin/sha_test
-snip-

You can see more detailed usage by running a program without arguments or with an incorrect one :)

How to use?

Here, I'll explain how to use libgpucrypto with an AES example. Below is an excerpt from aes_test.cc.

  
          // num_flows, flow_len and key_bits are defined elsewhere in aes_test.cc
          device_context dev_ctx;
          pinned_mem_pool *pool;
          aes_enc_param_t param;
          operation_batch_t ops;
   
          //1. initialize device context
          dev_ctx.init(num_flows * flow_len * 3, 0);

          //2. create aes_context.
          aes_context aes_ctx(&dev_ctx);

          // generate random test data
          gen_aes_cbc_data(&ops,
                           key_bits,
                           num_flows,
                           flow_len,
                           true);

          //3. prepare data to be encrypted
          pool = new pinned_mem_pool();
          pool->init(num_flows * flow_len * 3);

          aes_cbc_encrypt_prepare(&ops, &param, pool);

          //4. Launch GPU code
          aes_ctx.cbc_encrypt(param.memory_start,
                              param.in_pos,
                              param.key_pos,
                              param.ivs_pos,
                              param.pkt_offset_pos,
                              param.tot_in_len,
                              param.out,
                              param.num_flows,
                              param.tot_out_len,
                              0);

          //5. Wait for completion
          aes_ctx.sync(0);

(1) Initialize device_context: libgpucrypto has several wrappers for CUDA initialization and stream manipulation. To use libgpucrypto, you first need to create and initialize a device_context.
(2) Create aes_context: the aes_context class provides APIs that launch GPU code through the CUDA library. Its constructor takes a pointer to an initialized device_context.
(3) Prepare data to be encrypted: to use aes_context, you need to organize the data and prepare some metadata. The GPU needs a large batch to reach maximum throughput, and the data must be copied into GPU memory before it can be processed. Copying between host memory and GPU memory is relatively expensive when each copy is small, because every transfer carries a fixed overhead. For this reason, we gather all data into one big buffer before passing it to aes_context. Please read the sample code aes_test.cc in the test directory for details.
In the example above, we use pinned (page-locked) memory from pinned_mem_pool to avoid an extra copy in host memory. Before CUDA 4.0, unless you allocated pinned pages through CUDA, the driver internally copied the data into a pinned buffer before transferring it to the GPU. To avoid this extra copy, we use pinned pages explicitly.
We know this interface is not very friendly, and we are working on improving it.
(4) Launch GPU code: aes_context copies the data into GPU memory and launches the GPU kernel.
(5) Wait for completion: the sync function polls to check whether GPU execution has finished, and copies the data back to host memory once the kernel has completed. You can also call this function asynchronously, just to check the status. See here for more details.
Please see files in test directory for more examples.

Documentation

Doxygen API documentation

Source Code

SSLShader is in the process of being tech-transferred, and we no longer release the source code. Sorry for the inconvenience.

Last modified: Tue, Oct 18, 2011 / Advanced Networking Lab and Networked & Distributed Computing Systems Lab