This paper presents the middleware needed to deploy jobs to non-geographically colocated clusters with decentralized look-up severs. We have named our framework the Initium Remote Job Submission (RJS) system. Initium generates a jar file that is signed by a trusted certificate authority CA [Lyon]. The jar is run by a Computation Server (CS), (a remote computer running the Initium Computer Server Software). A Web Server (WS) has a Java Network Launch Protocol (JNLP) file that makes reference to a signed jar file on a web server. The signed jar file contains a job for the computation server. This jar file is called the computation jar. The computation jar is executed on the CS and the answer is sent back to the server using RMI over SSL (RMI/SSL). A look-up server (LUS) is used to register computation servers as they come on line. When a computation server is started, it registers with a LUS. The LUS then updates its list of computation servers, in the cluster. Look-up servers can be started using Java Web Start. They can be contacted with multicast protocols. 1 THE DISTRIBUTED COMPUTING SECURITY PROBLEM We seek to create a framework that exploits idle CPU’s on the network. We are subject to the constraint that our code must be downloaded over the Internet (that is, a normally insecure channel). Further that the configuration should be automatic. In addition, the code should be verified as originating from a trusted source. Finally, computed answers should be returned over a secure link to a Web Server (WS). The web server is able to hold “jobs” for deployment, and hold computed answers. 1.1 Approach Our approach for authentication and encryption is to use Java Secure Socket Extension (JSSE) to implement a Java technology version of Secure Socked Layer (SSL) protocols. This is integrated into the Java 2 SDK, Standard Edition v 1.4 [Sun 2004]. Our goal is to securely download a list of jobs to the LUS, and securely upload the answer jar to the WS. All LUS – CS commutation is on the LAN behind the firewall. Most grid systems use SSL for authentication, but by default do not establish encrypted communication in a secure manner [Globus 2]. Our approach requires that we encrypt all WS-LUS communication via a session key. In order to establish the session key, WS and LUS must agree on shared key, without sending any secret data in the clear. To do this, REMOTE JOB SUBMISSION SECURITY 14 JOURNAL OF OBJECT TECHNOLOGY VOL. 5 NO. 1 we use the RSA key exchange algorithm as a standard method for SSL key exchange. In RSA key exchange, the WS encrypts a number of random bytes with the LUS's public RSA key and they both use this shared secret to create the session keys. In order to guarantee the integrity of the messages, we use the Message-Digest 5 (MD5) algorithm. The MD5 algorithm is intended for digital signature applications, where a large file must be compressed in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA [Rivest]. The CS receives a URL to a jnlp file and downloads the Java Web Start “job” in order to compute it. The owner of the CS must accept the job owner certificate. Once the certificate is trusted, all jobs signed by it will be executed automatically. RSA algorithms ensure secure communication and MD5 message digest ensures that no one has altered the data in transit. An alternative approach to security might use the Kerberos network authentication protocol [Kerberos] or SSH protocol [OpenSSH]. However, such tools are vulnerable, as the attackers can gain access to user's password as it is being typed [Basney]. 1.2 Motivation Industrial computers are typically idle 14 or more hours per workday. If we assume that they are also idle on the weekend then they are idle 118 hours out of every 168 hours during the week (i.e., 70% of the time). We are motivated to address security issues in grid computing because an insecure network could be exploited by an advisory in order to gain access to data, destroy data, and steal logins or services (email, CPU cycles). Users will not agree to donate their CPU time without an assurance of security. A grid-based virus can spread faster than a normal virus, since it uses push technology to deploy itself and runs on-demand. Grids need to be protected not only because they are high-value assets representing lots of hardware and software, but also because they often serve a strategic function that's central to success [Myer].
Read full abstract