Input Amino Acid Sequences in FASTA format
There are two ways to submit sequence(s) for AXP prediction (Figure 1). AXP covers Anti-Microbial Peptides (AMP), Anti-Viral Peptides (AVP), Anti-Fungal Peptides (AFP), Anti-Cancer Peptides (ACP), and Anti-COVID Peptides (ACVP), together with hemolysis prediction:
1. Paste FASTA-format text into the area circled with the red line.
2. Upload a FASTA file from a local folder via the button marked with the blue line.
• We ONLY accept FASTA-format text.
Figure 1. GUI of AI4AXP.
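Because only FASTA-format text is accepted, it can help to check a file locally before submitting it. The following is a minimal sketch, not the server's actual validation: the function names are hypothetical, and it assumes that only the 20 standard one-letter amino acid codes count as valid residues.

```python
from typing import Dict

# Assumption: only the 20 standard one-letter amino acid codes are valid.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            if not current:
                raise ValueError("empty FASTA header")
            if current in records:
                raise ValueError(f"duplicate header: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may span several lines; join them and normalize case.
            records[current] += line.upper()
    return records


def is_standard_peptide(seq: str) -> bool:
    """Return True if seq is non-empty and uses only standard residues."""
    return bool(seq) and set(seq) <= STANDARD_AA
```

Calling `parse_fasta` on the text to be pasted and then `is_standard_peptide` on each sequence gives a quick indication of which entries might end up among the unrecognized sequences.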
Leave an Email address for system notification (Optional)
Usually, the computation does not take long. The user may wait until the result is available or leave a valid email address, in which case the system will email the submitter when the result is ready.
1. Provide an email address and tick the checkbox. The 'Terms of Use' explain that the provided email address is used only for notification and will not be stored in our backend for any other use.
2. Press the Submit button and wait for the prediction results.
Figure 2. Enter a valid email address (optional) if needed, then read the "Terms of Use" and tick the checkbox.
How to Read the results
Figure 3. Prediction results for the demo FASTA file, indicating the potential functions of each submitted sequence.
The figure above shows the prediction results of the demo FASTA file. The column "Peptide" lists the names of the input peptides. The column "Score" shows the prediction score, which indicates how probable it is that a peptide has AXP activity. The column "Prediction results" shows whether the peptide is predicted to be an AXP. The decision threshold is 0.5: "YES" means the prediction score of the peptide is greater than 0.5, and "No" means the prediction score is less than 0.5.
Meanwhile, for submitted sequences longer than 50 aa, AI4AXP cuts each sequence into several fragments using a sliding window (window size = 49 aa, step = 25 aa).
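The sliding-window split described above can be sketched as follows. This is an illustrative sketch rather than the AI4AXP implementation: the function name is hypothetical, and the handling of the final, possibly shorter fragment is an assumption, since the text specifies only the window size (49 aa) and the step (25 aa).

```python
from typing import List


def split_long_peptide(seq: str, max_len: int = 50,
                       window: int = 49, step: int = 25) -> List[str]:
    """Split a sequence longer than max_len into overlapping fragments."""
    if len(seq) <= max_len:
        return [seq]  # short sequences are kept as they are
    fragments: List[str] = []
    start = 0
    while True:
        fragments.append(seq[start:start + window])
        # Assumption: stop once a window reaches the end of the sequence,
        # so the last fragment may be shorter than the window size.
        if start + window >= len(seq):
            break
        start += step
    return fragments
```

Under these assumptions, a 100 aa sequence yields fragments starting at positions 0, 25, 50, and 75, so consecutive fragments overlap by 24 residues.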
Figure 4. Overall results of the demo submission shown as pie charts for a global view.
The pie charts above give a global view of the results (Figure 4). The demo FASTA file contains 20 peptides in total, and the prediction results show how many of the submitted peptides may function as AMPs, AVPs, ACPs, AFPs, or ACVPs while being non-hemolytic. Submitted long peptides (>50 aa) and unrecognized sequences are visualized and counted in separate pie charts. However, it must be emphasized that most AVPs are composed of essential amino acids and are shorter than 50 amino acids.
Figure 5. The download area for each submission.
Users can download their prediction results in the "Download area." Click "Result" to download the prediction results as a CSV file. "Submission in fasta file," "Unrecognized fasta file," and "Logfile" are also provided for reference.
| Prediction | Encoding methods | Model | GitHub | Reference |
| | | CNN+LSTM | | |
| | | CNN | | |
| | | CNN | | |
| | | Ensemble (RF+SVM+CNN) | | Submitted |
| ACVP | | CNN | - | This study |
| Hemolysis | | CNN | - | This study |
PC6 protein-encoding method
Figure A1. Encoding the amino acid sequence according to six physicochemical properties (PC6).
The idea of the PC6 encoding method is to use physicochemical properties as word embeddings. Each amino acid character in a sequence is replaced with a vector of six physicochemical property values (PC6). The six features of PC6 are hydrophobicity (H1), the volume of side chains (V), polarity (Pl), pH at the isoelectric point (pI), the dissociation constant for the -COOH group (pKa), and the net charge index of the side chain (NCI) (details in mSystems, 2021).
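The replacement of each residue by a six-value vector can be sketched as below. This is a hedged illustration only: the function name is hypothetical, the property table must be supplied by the caller (no real PC6 values are reproduced here), and padding with zeros at the end up to a fixed length of 200 is an assumption based on the 200 × 6 matrix mentioned for AI4AMP.

```python
from typing import Dict, List, Sequence

# Feature order as listed in the text: H1, V, Pl, pI, pKa, NCI.
PC6_FEATURES = ("H1", "V", "Pl", "pI", "pKa", "NCI")


def encode_pc6(seq: str, table: Dict[str, Sequence[float]],
               max_len: int = 200) -> List[List[float]]:
    """Encode a peptide as a max_len x 6 matrix of property values.

    `table` maps each one-letter residue to its six property values and
    is provided by the caller; this sketch does not define real values.
    """
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than {max_len} residues")
    matrix: List[List[float]] = []
    for aa in seq:
        if aa not in table:
            raise ValueError(f"no property vector for residue {aa!r}")
        vector = [float(v) for v in table[aa]]
        if len(vector) != len(PC6_FEATURES):
            raise ValueError(f"residue {aa!r} needs exactly six values")
        matrix.append(vector)
    # Assumption: pad with all-zero rows at the end of the sequence.
    matrix.extend([0.0] * len(PC6_FEATURES)
                  for _ in range(max_len - len(seq)))
    return matrix
```

With a complete property table, a peptide of any length up to 200 aa is mapped to a matrix with 200 rows and 6 columns, which is the shape a fixed-input network would expect.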
Deep Learning Models used in AI4AMP
Figure A2. (A) PC6 protein-encoding method. Each input sequence is transformed into a 200 × 6 matrix. (B) Deep neural network model. The PC6-encoded data matrix passes through one convolution layer, one LSTM layer, and one dense layer (mSystems, 2021).
Deep Learning Models used in AI4ACP
Figure A3. Model architecture in this study. After PC6 encoding, protein sequences will go through every layer in this model (Pharmaceuticals, 2022).
Deep Learning Models used in AI4AVP
Figure A4. The structure of models for GAN and Deep CNN used for AVP prediction (Bioinformatics Advances, 2022).
The model used for AI4AVP was proposed in our study, 'Developing an Antiviral Peptides Predictor with Generative Adversarial Network Data Augmentation' (BioRxiv, 2021). We used a generative adversarial network (GAN) to generate positive training data so that the deep CNN classifier could be trained on balanced datasets. Then, we encoded peptides using the PC6 protein-encoding method and trained the AI4AVP model based on three CNN blocks.
Deep Learning Models used in AI4AFP
Figure A5. The ensemble model, comprising two encoding methods and three ML/AI models, for Anti-Fungal Peptide prediction.
Copyright © 2024 Institute of Information Science, Academia Sinica, TAIWAN. All Rights reserved.