application repository
- name: Early and Late Stage Protein Folding
- domain: Biology/genomics
- country: Poland
- author: I. Roterman et al.
- institute:
Jagiellonian University Medical College (JU-CM)
- contacts:
myroterm@cyf-kr.edu.pl
- description: The application compares the nucleotide sequences of 10^7 randomly created polypeptide chains with the human genome sequence, and predicts the 3-D structures of these 10^7 randomly created sequences.
- functionalities:
The application includes two modules:
- Genomics analysis: compares the nucleotide sequences of 10^7 randomly created polypeptide chains with the human genome sequence.
Similar sequences are identified and classified as gene regions (exons or introns) or non-coding fragments.
The results (the location and characteristics of each locus) are interpreted from an evolutionary perspective, in the context of the sequences analyzed in the project.
- Ab initio Protein folding CMUJ: predicts the three-dimensional structure of proteins whose polypeptide chains contain 70 amino acids, using a model that assumes the folding process consists of an early-stage (ES) step followed by a late-stage (LS) step.
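Both modules start from randomly created 70-residue chains. As a minimal sketch of how such input could be produced, the Python below draws residues uniformly and independently from the 20 standard amino acids. The uniform distribution, the seeding, and the helper name `random_chains` are assumptions for illustration; the page does not describe the project's actual generator.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHAIN_LENGTH = 70  # chain length stated for the folding module


def random_chains(count, length=CHAIN_LENGTH, seed=None):
    """Yield `count` random polypeptide sequences (hypothetical helper).

    Each residue is drawn uniformly and independently; this sampling scheme
    is an assumption, not the project's documented method.
    """
    rng = random.Random(seed)
    for _ in range(count):
        yield "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


if __name__ == "__main__":
    for seq in random_chains(3, seed=42):
        print(seq)
```

Because `random_chains` is a generator, the full set of 10^7 chains can be streamed into job input files in batches instead of being held in memory at once.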
- middleware requirements:
- shared jobs control - When a computation covers a large number of jobs, control of the computation process must be distributable among several people. The application therefore requires per-job access rights (list, cancel and get-output) that can be assigned to different users of the same VO. [See also EGEE PTF #100809. (Status: none)]
- DAG execution over single CE - To enhance performance and limit large data transfers, dependent jobs spawned as sub-jobs of a DAG job should, where needed, run close to the jobs they depend on. The system should accept "co-location hints" guaranteeing that given parts of a single DAG are dispatched to a single CE. [See also EGEE PTF #100792. (Status: none)]
- Job listing per user - To make process monitoring more effective, each user should be able to list the jobs he or she submitted, even without knowing the job IDs. The function should return a list of uncompleted jobs (running, or finished with output not yet retrieved). [See also EGEE PTF #100535. (Status: satisfied)]
- Data file listing per user - The same as the previous requirement, but for files registered in grid storage: the function should return a list of all files assigned to each user.
- Scalable relational database accessible from grid nodes - The application's input and output data are collected in a relational database. Secure, GSS-based access to this database from worker nodes should be provided.
- Large number of files collected in hierarchical structure - Data from a large number (10^7) of elementary experiments must be collected in a hierarchical structure. This structure should be reflected in grid storage in a distributed way, but independently of the physical location of the files. Files should be browsable efficiently.
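The shared jobs control and per-user job listing requirements can be illustrated with a small in-memory model. This is a hypothetical sketch, not the gLite/EGEE middleware API: the `Right` flags, the `JobRegistry` class, its methods and the job states are all invented here. It grants list, cancel and get-output rights per job to users of the same VO, and lists a user's uncompleted jobs (running, or finished with output not yet retrieved) without the user having to know any job IDs.

```python
from dataclasses import dataclass, field
from enum import Enum, Flag, auto


class Right(Flag):
    """Per-job access rights named in the shared-jobs-control requirement."""
    LIST = auto()
    CANCEL = auto()
    GET_OUTPUT = auto()


ALL_RIGHTS = Right.LIST | Right.CANCEL | Right.GET_OUTPUT
NO_RIGHTS = Right(0)


class State(Enum):
    RUNNING = auto()
    DONE = auto()       # finished, output not yet retrieved
    CLEARED = auto()    # output retrieved
    CANCELLED = auto()


@dataclass
class Job:
    job_id: str
    owner: str
    state: State = State.RUNNING
    acl: dict = field(default_factory=dict)  # user name -> Right flags


class JobRegistry:
    """Hypothetical in-memory job registry shared by users of one VO."""

    def __init__(self):
        self._jobs = {}

    def submit(self, job_id, owner):
        # The submitter implicitly holds every right on their own job.
        self._jobs[job_id] = Job(job_id, owner, acl={owner: ALL_RIGHTS})

    def grant(self, job_id, granter, user, rights):
        job = self._jobs[job_id]
        if granter != job.owner:
            raise PermissionError("only the job owner may delegate rights")
        job.acl[user] = job.acl.get(user, NO_RIGHTS) | rights

    def mark_done(self, job_id):
        # Stands in for the middleware reporting that the job has finished.
        self._jobs[job_id].state = State.DONE

    def _require(self, job_id, user, right):
        job = self._jobs[job_id]
        if right not in job.acl.get(user, NO_RIGHTS):
            raise PermissionError(f"{user} lacks {right.name} on {job_id}")
        return job

    def cancel(self, job_id, user):
        self._require(job_id, user, Right.CANCEL).state = State.CANCELLED

    def get_output(self, job_id, user):
        job = self._require(job_id, user, Right.GET_OUTPUT)
        if job.state is not State.DONE:
            raise ValueError(f"{job_id} has no output to retrieve")
        job.state = State.CLEARED

    def uncompleted_jobs(self, user):
        """IDs of jobs the user may list that are running or have unretrieved output."""
        return sorted(
            j.job_id
            for j in self._jobs.values()
            if Right.LIST in j.acl.get(user, NO_RIGHTS)
            and j.state in (State.RUNNING, State.DONE)
        )
```

Keeping rights per job rather than per user lets one operator delegate, for example, only cancellation of a batch to a colleague while retaining exclusive access to its output.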
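One way to meet the hierarchical-structure requirement is to derive each file's logical name deterministically from its experiment number, so that the directory tree is independent of where replicas physically live. The sketch below assumes a hypothetical logical root `/folding` and a 2/2/3-digit split of the zero-padded experiment ID; a file catalogue (for example the LCG File Catalog used with gLite) would then map each logical name to its physical replicas. None of these names come from the project itself.

```python
# Hypothetical logical layout for the 10^7 experiment result files.
ROOT = "/folding"                 # assumed logical root, not a real VO path
TOTAL = 10**7                     # number of elementary experiments
WIDTH = len(str(TOTAL - 1))       # 7 digits cover ids 0 .. 9_999_999


def logical_path(exp_id, suffix="out"):
    """Map an experiment id to a three-level logical path.

    The first two digit pairs select directories (100 x 100 of them), so each
    leaf directory holds at most 1000 ids and stays cheap to list.
    """
    if not 0 <= exp_id < TOTAL:
        raise ValueError(f"experiment id out of range: {exp_id}")
    digits = f"{exp_id:0{WIDTH}d}"
    return f"{ROOT}/{digits[:2]}/{digits[2:4]}/{digits}.{suffix}"


def experiment_id(path):
    """Invert logical_path by parsing the zero-padded id from the file name."""
    name = path.rsplit("/", 1)[-1]
    return int(name.split(".", 1)[0])


if __name__ == "__main__":
    print(logical_path(1234567))
```

Bounding the fan-out at every level keeps directory listings small, which is what makes browsing 10^7 files practical regardless of how the catalogue stores them.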
- resources requirements:
- computation stage 1: genomic analysis
- computation stage 2: Early stage protein folding
- computation stage 3: Late stage protein folding
- computation stage 4: Measurements of irregularities
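The "DAG execution over single CE" requirement can be pictured on these four stages. The sketch below assumes that stage 3 consumes the output of stage 2 and stage 4 the output of stage 3, and treats genomic analysis as independent; the page does not state these dependencies, and the `place` function, the hint names and the CE host names are hypothetical. Nodes sharing a co-location hint are pinned to one CE; unconstrained nodes are spread round-robin. It uses `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from itertools import cycle


def place(dag, hints, ces):
    """Assign each DAG node to a CE, keeping nodes that share a hint together.

    dag:   node -> set of prerequisite nodes
    hints: node -> co-location group name (absent nodes are unconstrained)
    ces:   list of CE names (hypothetical)
    Returns (execution order, node -> CE).
    """
    order = list(TopologicalSorter(dag).static_order())
    next_ce = cycle(ces)
    group_ce = {}
    placement = {}
    for node in order:
        group = hints.get(node)
        if group is None:
            placement[node] = next(next_ce)
        else:
            # The first node of a group picks the CE; later members reuse it.
            if group not in group_ce:
                group_ce[group] = next(next_ce)
            placement[node] = group_ce[group]
    return order, placement


if __name__ == "__main__":
    dag = {
        "genomics": set(),
        "early_stage": set(),
        "late_stage": {"early_stage"},
        "irregularities": {"late_stage"},
    }
    hints = {"early_stage": "fold", "late_stage": "fold", "irregularities": "fold"}
    order, placement = place(dag, hints, ["ce-a.example", "ce-b.example"])
    print(order)
    print(placement)
```

Pinning a whole hint group to the CE chosen by its first node means intermediate ES results never have to leave that site before the LS and measurement stages read them.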
- notes: