Home | Papers | Reports | Projects | Code Fragments | Dissertations | Presentations | Posters | Proposals | Lectures given | Course notes |
(Ortholog) Mapping the Applied Biosystems Human/Mouse Survey v2.0 Micro-arrays to Ensembl Gene IdentifiersWerner Van Belle1* - werner@yellowcouch.org, werner.van.belle@gmail.com Abstract : This document describes cross joining of gene tables between Celera's mouse genome identifiers, Celera human genome identifiers and the more useful Ensembl identifiers. The context in which this research is set are the genes FKRP and TAF4. By using siRNA's we interfered with the transcription and measured their effect upon the transcriptome. The Applied Biosystems 1700 micro-array scanner measured and reported the transcription quantities. Two micro-array types were used: the human genome survey v2.0 micro-arrays and the mouse genome survey v2.0 micro-arrays. Based on the different micro-array measurements we wanted to predict which proteins would be influenced in a cell system if we know the up/down regulation of the measured probes. To this end we wanted to use the human protein interaction map (as defines by Rual'06), which uses Ensembl annotated genes. This of course formed a major problem. First, the Applied Biosystems scanner does not export Ensembl gene annotations. Secondly, the human protein interaction map might not be a good model for a mouse micro-array, so we needed to go through various orthologs. This document tells two stories: first, and most annoyingly: how to get Ensembl identifiers into an Applied Biosystems micro-array. Secondly, and slightly more interesting, how to retrieve a mouse to human ortholog mapping from Ensembl.
Keywords:
ortholog gene mapping celere gene identifiers applied biosystems gene identifiers probe sequences |
In this document we work with three forms of information. First there is a local database, which we call FkrpTaf4. It will contain all the information we need to perform further analysis. Secondly there is the public accessible ensembldb. Most people know this through the web-interface. What is probably lesser known is that this database is also immediately accessible through SQL, which makes it optimal for our purposes (http://ensembldb.ensembl.org). The third set of information are local comma separated file (CSV), which are used a) to transfer information from the Ensembl database into our own FkrpTaf4 database and b) to import data from the Applied Biosystems into our FkrpTaf4 database).
For each query we do not mention whether something is a temporary table neither do we drop tables if they already exist. This is properly done in the actual SQL files, but since it is of little relevance to understanding what is happening we omit this information here. There is however one catch: if a table is a temporary table it cannot be reused within the same query. In that case, one might need to make a copy into a second temporary table. For each created table we also list a small example output that illustrates the contents of each table.
To execute a query on a database one can do the following mysql -user=werner -D FkrpTaf4 -A -batch <import-ab.sql
For the Ensembl database one can use mysql -user=anonymous -h ensembldb.ensembl.org -D homo_sapiens_core_44_36f -A -batch <swissprot2ensembl.sql >imports/swissprot2ensg.csv or mysql -user=anonymous -h ensembldb.ensembl.org -D mus_musculus_core_44_36e -A -batch <swissprot2ensembl.sql >imports/swissprot2ensmusg.csv
We measured the influence of FKRP and TAF4 on various cell systems through measuring the transcriptional changes with human and mouse genome survey arrays. Once this was done, we aimed to integrate this data into a protein interaction map as to find the proteins that are likely influenced mostly by the proteins of interest. We encountered some major obstacles to this approach. First, the Applied Biosystems 1700 scanner does not provide Ensembl annotated genes. Secondly, the Applied Biosystems 1700 scanner does not provide the probe sequences, making an automatic mapping to either the mouse or human genome more complicated than it should be and thirdly: exporting the Unigene/Swissprot annotated genes from the Applied Biosystems 1700 machine was prohibitively slow. In the end we exported all tables using a tedious 10 columns at a time approach. This lead to 6 tables that we could join afterward. One conducting high throughput proteomics will find this 'small bug' a major issue, since the export times become prohibitively long. In other words: aside from the fact that Applied Biosystems provides only 56% useful measurements, the Applied Biosystems 1700 scanner also seems rather unusable in high throughput settings.
To import the Applied Biosystems tables, we exported three different experiments in 6 separate files. We also had to clean out some spaces that were added in the allset1 data. When this operation is performed, we have three tables: FkrpSiRna1, FkrpSiRna2 and FkrpScrambled. The file import-ab contains all the details on the import process.
To illustrate the mechanism, we elaborate somewhat on allset1. The
target table must first be created, thereby reflecting the columns
as they occur in the original Applied Biosystems CSV tables. We also
introduce assay_name, probe_id, gene_name and sample_name as indices
since we later need to join on these columns. If we don't do this,
most operations will be extremely slow. The query below is ran on
the FkrpTaf4 schema.
CREATE TABLE allset1 (Assay_Name VARCHAR(128), INDEX (Assay_Name, Probe_ID, Gene_ID, Sample_Name), Row FLOAT, Col FLOAT, Probe_ID VARCHAR(128), Probe_Type TEXT, Gene_ID VARCHAR(128), X FLOAT, Y FLOAT, Assay_Normalized_Signal FLOAT, Signal FLOAT, CL_Sig FLOAT, CL_Raw FLOAT, SDEV FLOAT, CV FLOAT, S_N FLOAT, CL_Sig_Error FLOAT, CL_Raw_SDEV FLOAT, Flags INT, Sample_Name VARCHAR(128), id INT AUTO_INCREMENT PRIMARY KEY);Once the table is created we can import the data with:
LOAD DATA LOCAL INFILE 'imports/all-set1.ab.csv' INTO TABLE allset1;The query above ran on the FkrpTaf4 schema.
Of course, it would be nice if the sample_name could be compared
between table1 and table6. In practice, this could not be done since
alslet1 included a 10 character at the end of each sample name. To
get rid of those we needed the following update. The query below is
ran on the FkrpTaf4 schema.
# Some tables had still newline characters embedded in there sample_name # The allset1 table had sample names that ended with something unwanted # we resolved this by trimming them REPLACE allset1 SELECT Assay_name, row, col, probe_id, probe_type, gene_id, x, y, assay_normalized_signal, signal, cl_sig, cl_raw, sdev, cv, s_n, cl_sig_error, cl_raw_sdev, flags, Trim('\r' FROM sample_name), id FROM allset1;This gives a table consisting of
+-------------------------+------+------+----------+------------+--------------+ | Assay_name | Row | Col | Probe_id | Probe_type | gene_id | +-------------------------+------+------+----------+------------+--------------+ | HB00588 3/1/07 12:12 PM | 189 | 77 | 100002 | probe |hCG1643199.4 | | HB00588 3/1/07 12:12 PM | 45 | 39 | 100003 | probe |hCG2041918 | | HB00588 3/1/07 12:12 PM | 109 | 152 | 100027 | probe |hCG31426.2 | | HB00588 3/1/07 12:12 PM | 70 | 7 | 100036 | probe |hCG1979099.1 | | HB00588 3/1/07 12:12 PM | 173 | 129 | 100037 | probe |hCG42687.4 | | HB00588 3/1/07 12:12 PM | 114 | 54 | 100039 | probe |hCG2015782 | | HB00588 3/1/07 12:12 PM | 68 | 51 | 100044 | probe |hCG36953.3 | | HB00588 3/1/07 12:12 PM | 153 | 123 | 100045 | probe |hCG1776836.3 | | HB00588 3/1/07 12:12 PM | 75 | 157 | 100051 | probe |hCG1642464.3 | | HB00588 3/1/07 12:12 PM | 146 | 139 | 100052 | probe |hCG22993.3 | ... +---------+---------+-------------------------+---------+---------+--------+---------+ | x | y | assay_normalized_signal | signal | cl_sig |cl_raw | sdev | +---------+---------+-------------------------+---------+---------+--------+---------+ | 991.21 | 1135.54 | 161.22 | 171638 | 170314 |177617 | 1233.32 | | 575.93 | 875.26 | 0.15 | 194.56 | 140.64 |3674.4 | 194.56 | | 1792.77 | 1560.84 | 0.2 | 251.37 | -38.76 |4600.85 | 251.37 | | 232.78 | 1145.18 | 8.32 | 10512.5 | 11065.4 |16749.1 | 691.04 | | 1549.67 | 960.67 | 44.04 | 46886.9 | 45509.2 |51538.9 | 590.25 | | 740.1 | 1617.52 | 11.22 | 14174.1 | 13962.3 |20199.8 | 579.15 | | 706.61 | 1122.22 | 0.17 | 212.63 | -75.6 |4337.4 | 212.63 | | 1484.97 | 746.02 | 0.26 | 277.94 | -626.52 |4945.81 | 277.94 | | 1846.74 | 1194.41 | 0.23 | 291.73 | -496.82 |4405.79 | 291.73 | | 1655.39 | 670.39 | 0.58 | 621.79 | -64.09 |6495.53 | 621.79 | ... +-------+--------+--------------+-------------+-------+-------------+---+ | cv | s_n | CL_sig_error | CL_Raw_sdev | Flags | sample_name |id | +-------+--------+--------------+-------------+-------+-------------+---+ | 0.05 | 139.17 | 296.57 | 0 | 0 | II-1 |2 | | 1.14 | 0.88 | 103.51 | 0 | 1 | II-1 |3 | | 31.09 | -0.03 | 131.81 | 0 | 1 | II-1 |4 | | 0.08 | 15.21 | 173.15 | 0 | 0 | II-1 |5 | | 0.05 | 79.44 | 174.7 | 0 | 0 | II-1 |6 | | 0.06 | 24.47 | 189.76 | 0 | 0 | II-1 |7 | | 4.14 | -0.24 | 125.93 | 0 | 1 | II-1 |8 | | 0.51 | -1.96 | 98.86 | 0 | 1 | II-1 |9 | | 0.61 | -1.65 | 138.64 | 0 | 1 | II-1 |10 | | 5.63 | -0.18 | 113.36 | 0 | 1 | II-1 |11 | ...
The following query imports, among other things, the important mCG, Swissprot and Unigene identifiers into the FkrpTaf4 database. The query below is ran on the FkrpTaf4 schema.
CREATE TABLE allset6 (Assay_Name VARCHAR(128), Probe_ID VARCHAR(128), Gene_ID VARCHAR(128), Sample_Name VARCHAR(128), INDEX (Assay_Name, Probe_ID, Gene_ID, Sample_Name), Status TEXT, SwissProt TEXT, UniGene TEXT, dbEST TEXT, hCG TEXT, hCP TEXT, hCT TEXT, mCG TEXT, mCP TEXT, mCT TEXT, rCG TEXT, rCP TEXT, rCT TEXT); LOAD DATA LOCAL INFILE 'imports/all-set6.ab.csv' INTO TABLE allset6;This gives a table similar to:
+-------------------------+----------+--------------+-------------+-----------+ | assay_name | probe_id | gene_id | sample_name |status | +-------------------------+----------+--------------+-------------+-----------+ | HB00588 3/1/07 12:12 PM | 100002 | hCG1643199.4 | II-1 |pseudogene | | HB00588 3/1/07 12:12 PM | 100003 | hCG2041918 | II-1 |current | | HB00588 3/1/07 12:12 PM | 100027 | hCG31426.2 | II-1 |current | | HB00588 3/1/07 12:12 PM | 100036 | hCG1979099.1 | II-1 |current | | HB00588 3/1/07 12:12 PM | 100037 | hCG42687.4 | II-1 |current | | HB00588 3/1/07 12:12 PM | 100039 | hCG2015782 | II-1 |current | | HB00588 3/1/07 12:12 PM | 100044 | hCG36953.3 | II-1 |current | | HB00588 3/1/07 12:12 PM | 100045 | hCG1776836.3 | II-1 |current | | HB00588 3/1/07 12:12 PM | 100051 | hCG1642464.3 | II-1 |current | | HB00588 3/1/07 12:12 PM | 100052 | hCG22993.3 | II-1 |current | ... +------------------------------------+-----------+--------------------------------------------------- | swissprot | unigene | hcg +------------------------------------+-----------+--------------------------------------------------- | | | hCG1643199.4 | | | hCG2041918 | O95201;P13682;P17027;P51523;Q15776 | Hs.57679 | hCG31426.2 | Q92610 | Hs.368756 | hCG1820838.1;hCG1979099.1;hCG1989348;hCG1994281.1 | | Hs.302903 | hCG42687.4 | Q9UJX3 | Hs.530379 | hCG2015782 | | Hs.278954 | hCG36953.3 | | | hCG1776836.3 | | | hCG1642464.3 | | Hs.199068 | hCG22993.3 ...
We extract 3 different tables: FkrpSiRna1, FkrpSiRna2 and FkrpScrambled using the following SQL statements. Each table will have duplicate rows for specific genes. This is because they have also been measured multiple times. FkrpSiRna2 is smaller than the two others since we only had two slides and not three.
Allset1 and allset6 are imported in 2.1.1 and 2.1.2. The query below is ran on the FkrpTaf4 schema.
# The creation of distinct tables for the various FkRp experiments CREATE TABLE FkrpSiRna1 SELECT allset1.Gene_ID, Assay_Normalized_Signal FROM allset6, allset1 WHERE (allset6.Sample_Name="1-1" or allset6.Sample_Name="2-1" or allset6.Sample_Name="3-1") and allset6.sample_name=allset1.sample_name and allset1.Assay_Name=allset6.Assay_Name and allset1.gene_id=allset6.gene_id and allset1.Probe_ID=allset6.Probe_ID;
| Gene_ID | Assay_Normalized_Signal |
+-------+-------------+
| AK079773.1 | 1.58 |
| mCG9222.3 | 19.46 |
| mCG5316.2 | 0.31 |
| mCG1036527.1 | 6.28 |
| mCG1050139 | 0.2 |
| mCG121612 | 36.77 |
| mCG142727 | 0.14 |
| mCG130331.1 | 0.35 |
| mCG1045481.1 | 1.99 |
| mCG141353 | 0.34 |
Allset1 and allset6 are imported in 2.1.1
and 2.1.2.
The query below is ran on the FkrpTaf4 schema.
CREATE TABLE FkrpSiRna2 SELECT allset1.Gene_ID, Assay_Normalized_Signal FROM allset6, allset1 WHERE (allset6.Sample_Name="1-2" or allset6.Sample_Name="2-2") and allset6.sample_name=allset1.sample_name and allset1.Assay_Name=allset6.Assay_Name and allset1.gene_id=allset6.gene_id and allset1.Probe_ID=allset6.Probe_ID;
+-------+-------------+
| Gene_ID | Assay_Normalized_Signal |
+-------+-------------+
| AK079773.1 | 1.33 |
| mCG9222.3 | 15.48 |
| mCG5316.2 | 0.19 |
| mCG1036527.1 | 4 |
| mCG1050139 | 0.2 |
| mCG121612 | 40.8 |
| mCG142727 | 0.21 |
| mCG130331.1 | 0.45 |
| mCG1045481.1 | 6.6 |
| mCG141353 | 0.33 |
Allset1 and allset6 are imported in 2.1.1 and 2.1.2. The query below is ran on the FkrpTaf4 schema.
CREATE TABLE FkrpScrambled SELECT allset1.Gene_ID, Assay_Normalized_Signal FROM allset6, allset1 WHERE (allset6.Sample_Name="1-3" or allset6.Sample_Name="2-3" or allset6.Sample_Name="3-3") and allset6.sample_name=allset1.sample_name and allset1.Assay_Name=allset6.Assay_Name and allset1.gene_id=allset6.gene_id and allset1.Probe_ID=allset6.Probe_ID;
| Gene_ID | Assay_Normalized_Signal |
+-------+-------------+
| AK079773.1 | 1.67 |
| mCG9222.3 | 14.78 |
| mCG5316.2 | 0.96 |
| mCG1036527.1 | 2.03 |
| mCG1050139 | 0.12 |
| mCG121612 | 40.13 |
| mCG142727 | 0.13 |
| mCG130331.1 | 0.26 |
| mCG1045481.1 | 1.93 |
| mCG141353 | 0.23 |
The query below is ran on the FkrpTaf4 schema.
CREATE TABLE IF NOT EXISTS Taf4SiRnaHela SELECT allset1.Gene_ID, Assay_Normalized_Signal FROM allset6, allset1 WHERE (allset6.Sample_Name="I-1" or allset6.Sample_Name="I-2" or allset6.Sample_Name="I-3" or allset6.Sample_Name="I-4") and allset6.sample_name=allset1.sample_name and allset1.Assay_Name=allset6.Assay_Name and allset1.gene_id=allset6.gene_id and allset1.Probe_ID=allset6.Probe_ID;
| Gene_ID | Assay_Normalized_Signal |
+-------+-------------+
| hCG2041918 | 0.08 |
| hCG31426.2 | 0.39 |
| hCG1979099.1 | 7.13 |
| hCG42687.4 | 40.41 |
| hCG2015782 | 9.79 |
| hCG36953.3 | 0.08 |
| hCG1776836.3 | 0.37 |
| hCG1642464.3 | 0.21 |
| hCG22993.3 | 0.17 |
| hCG1793655.1 | 3.59 |
The query below is ran on the FkrpTaf4 schema.
CREATE TABLE IF NOT EXISTS Taf4ScrambledHela SELECT allset1.Gene_ID, Assay_Normalized_Signal FROM allset6, allset1 WHERE (allset6.Sample_Name="II-1" or allset6.Sample_Name="II-2" or allset6.Sample_Name="II-3") and allset6.sample_name=allset1.sample_name and allset1.Assay_Name=allset6.Assay_Name and allset1.gene_id=allset6.gene_id and allset1.Probe_ID=allset6.Probe_ID;
| Gene_ID | Assay_Normalized_Signal |
+-------+-------------+
| hCG2041918 | 0.15 |
| hCG31426.2 | 0.2 |
| hCG1979099.1 | 8.32 |
| hCG42687.4 | 44.04 |
| hCG2015782 | 11.22 |
| hCG36953.3 | 0.17 |
| hCG1776836.3 | 0.26 |
| hCG1642464.3 | 0.23 |
| hCG22993.3 | 0.58 |
| hCG1793655.1 | 4.2 |
The query below is ran on the FkrpTaf4 schema.
CREATE TABLE IF NOT EXISTS Taf4SiRnaSkndz SELECT allset1.Gene_ID, Assay_Normalized_Signal FROM allset6, allset1 WHERE (allset6.Sample_Name="Si I" or allset6.Sample_Name="Si II" or allset6.Sample_Name="Si III" or allset6.Sample_Name="Si IV") and allset6.sample_name=allset1.sample_name and allset1.Assay_Name=allset6.Assay_Name and allset1.gene_id=allset6.gene_id and allset1.Probe_ID=allset6.Probe_ID;
| Gene_ID | Assay_Normalized_Signal |
+-------+-------------+
| hCG2041918 | 0.16 |
| hCG31426.2 | 0.12 |
| hCG1979099.1 | 8.83 |
| hCG42687.4 | 53.12 |
| hCG2015782 | 14.13 |
| hCG36953.3 | 0.11 |
| hCG1776836.3 | 0.22 |
| hCG1642464.3 | 0.14 |
| hCG22993.3 | 0.54 |
| hCG1793655.1 | 1.02 |
The query below is ran on the FkrpTaf4 schema.
CREATE TABLE IF NOT EXISTS Taf4ScrambledSkndz SELECT allset1.Gene_ID, Assay_Normalized_Signal FROM allset6, allset1 WHERE (allset6.Sample_Name="Scr I" or allset6.Sample_Name="Scr II" or allset6.Sample_Name="Scr III" or allset6.Sample_Name="Scr IV") and allset6.sample_name=allset1.sample_name and allset1.Assay_Name=allset6.Assay_Name and allset1.gene_id=allset6.gene_id and allset1.Probe_ID=allset6.Probe_ID;
| Gene_ID | Assay_Normalized_Signal |
+-------+-------------+
| hCG2041918 | 0.55 |
| hCG31426.2 | 0.16 |
| hCG1979099.1 | 10.38 |
| hCG42687.4 | 56.67 |
| hCG2015782 | 11.33 |
| hCG36953.3 | 0.3 |
| hCG1776836.3 | 0.31 |
| hCG1642464.3 | 1.23 |
| hCG22993.3 | 0.32 |
| hCG1793655.1 | 1.61 |
To integrate the human protein interaction map into the previously mentioned network, we need to go through the external annotated identifiers. The Applied Biosystems 1700 scanner can produce tables that include the Unigene and Swissprot identifiers. Based on these we can find the Ensembl genes and thus link them back to our hCG identifiers.
To make this work we first need to find various sources of data in the Ensembl database.
The query below can be ran on the homo_sapiens_core_44_36f ensembldb schema or on the mus_musculus_core_44_36e schema. Depending on the choice of database we map ENSG... or ENSMUSG... identifiers to their description.
| stable_id | description |
+---------+----------------------------------------------------------------------------------------------------------+
| ENSG00000129824 | 40S ribosomal protein S4, Y isoform 1. [Source:Uniprot/SWISSPROT;Acc:P22090] |
| ENSG00000067646 | Zinc finger Y-chromosomal protein. [Source:Uniprot/SWISSPROT;Acc:P08048] |
| ENSG00000176679 | Homeobox protein TGIF2LY (TGFB-induced factor 2-like protein, Y- linked) (TGF(beta)induced transcription factor 2-like protein) (TGIF- like on the Y). [Source:Uniprot/SWISSPROT;Acc:Q8IUE0] | | ENSG00000099715 | Protocadherin-11 Y-linked precursor (Protocadherin-11) (Protocadherin- 22) (Protocadherin on the Y chromosome) (PCDH-Y) (Protocadherin prostate cancer) (Protocadherin-PC). [Source:Uniprot/SWISSPROT;Acc:Q9BZA8] | | ENSG00000173394 | NULL |
| ENSG00000168757 | testis specific protein, Y-linked 2 [Source:RefSeq_peptide;Acc:NP_072095] |
| ENSG00000186406 | RNA binding motif (Fragment). [Source:Uniprot/SPTREMBL;Acc:Q13381] |
| ENSG00000129816 | testis-specific transcript, Y-linked 1 (TTTY1) on chromosome Y [Source:RefSeq_dna;Acc:NR_001538] |
| ENSG00000197285 | testis-specific transcript, Y-linked 2 (TTTY2) on chromosome Y [Source:RefSeq_dna;Acc:NR_001536] |
| ENSG00000206198 | testis-specific transcript, Y-linked 21 (TTTY21) on chromosome Y [Source:RefSeq_dna;Acc:NR_001535] |
| stable_id | description |
+----------+------------------------------------------------------------+
| ENSMUSG00000053211 | zinc finger protein 2, Y linked [Source:MarkerSymbol;Acc:MGI:99213] |
| ENSMUSG00000068457 | ubiquitously transcribed tetratricopeptide repeat gene, Y chromosome [Source:MarkerSymbol;Acc:MGI:894810] |
| ENSMUSG00000069053 | Ubiquitin-activating enzyme E1 Y (Ubiquitin-activating enzyme E1). [Source:Uniprot/SWISSPROT;Acc:P31254] |
| ENSMUSG00000056673 | jumonji, AT rich interactive domain 1D (Rbp2 like) [Source:MarkerSymbol;Acc:MGI:99780] |
| ENSMUSG00000069049 | eukaryotic translation initiation factor 2, subunit 3, structural gene Y-linked [Source:MarkerSymbol;Acc:MGI:1349430] |
| ENSMUSG00000069045 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, Y-linked [Source:MarkerSymbol;Acc:MGI:1349406] |
| ENSMUSG00000069044 | ubiquitin specific peptidase 9, Y chromosome [Source:MarkerSymbol;Acc:MGI:1313274] |
| ENSMUSG00000069618 | RIKEN cDNA 1700012B15 gene [Source:MarkerSymbol;Acc:MGI:1921423] |
| ENSMUSG00000020671 | RAB10, member RAS oncogene family [Source:MarkerSymbol;Acc:MGI:105066] |
| ENSMUSG00000075505 | NULL |
To create a mapping from a Swissprot identifier to an Ensembl identifier requires us to join the stable_gene_id, gene and xref tables. In addition, it seems that sometimes multiple mappings are necessary. Namely, an external identifier can refer to a transcript or to the gene immediately. To resolve this we need the union of 2 queries.
This is a query to map the external id immediately onto the gene. The query below can be ran on the homo_sapiens_core_44_36f ensembldb schema or on the mus_musculus_core_44_36e schema.
| dbprimary_acc | stable_id | description |
+--------+----------+--------------------------------------------------------------------------------------------------------------------------------+
| P46425 | ENSMUSG00000038155 | Glutathione S-transferase P 2 (EC 2.5.1.18) (GST YF-YF) (GST-piA) (GST class-pi) (Gst P2). [Source:Uniprot/SWISSPROT;Acc:P46425] |
| Q8K2L9 | ENSMUSG00000033450 | T-cell activation Rho GTPase-activating protein (T-cell activation GTPase-activating protein). [Source:Uniprot/SWISSPROT;Acc:Q8K2L9] |
| P83882 | ENSMUSG00000049751 | 60S ribosomal protein L36a (60S ribosomal protein L44). [Source:Uniprot/SWISSPROT;Acc:P83882] |
| P05531 | ENSMUSG00000054626 | X-linked lymphocyte-regulated protein PM1. [Source:Uniprot/SWISSPROT;Acc:P05531] |
| Q9WV98 | ENSMUSG00000021079 | Mitochondrial import inner membrane translocase subunit Tim9. [Source:Uniprot/SWISSPROT;Acc:Q9WV98] |
| Q6IE32 | ENSMUSG00000060201 | Serine protease inhibitor Kazal-type 7 precursor (Esophagus cancer- related gene 2 protein) (ECRG-2). [Source:Uniprot/SWISSPROT;Acc:Q6IE32] |
| O88574 | ENSMUSG00000031609 | Histone deacetylase complex subunit SAP30 (Sin3-associated polypeptide, 30 kDa) (Sin3 corepressor complex subunit SAP30). [Source:Uniprot/SWISSPROT;Acc:O88574] |
| Q9JI46 | ENSMUSG00000024213 | Diphosphoinositol polyphosphate phosphohydrolase 1 (EC 3.6.1.52) (DIPP-1) (muDIPP1) (Diadenosine 5',5'''-P1,P6-hexaphosphate hydrolase 1) (EC 3.6.1.-) (Nucleoside diphosphate-linked moiety X motif 3) (Nudix motif 3). [Source:Uniprot/SWISSPROT;Acc:Q9JI46] |
| Q03740 | ENSMUSG00000070870 | Gamma crystallin E. [Source:Uniprot/SWISSPROT;Acc:Q03740] |
| P27545 | ENSMUSG00000055694 | LAG1 longevity assurance homolog 1 (UOG-1 protein). [Source:Uniprot/SWISSPROT;Acc:P27545] |
This query maps the external id to its transcript, which is then mapped onto its producing gene. The query below can be ran on the homo_sapiens_core_44_36f ensembldb schema or on the mus_musculus_core_44_36e schema.
| dbprimary_acc | stable_id | description |
+--------+----------+-----------------------------------------------------------------------+
| P46425 | ENSMUSG00000038155 | Glutathione S-transferase P 2 (EC 2.5.1.18) (GST YF-YF) (GST-piA) (GST class-pi) (Gst P2). [Source:Uniprot/SWISSPROT;Acc:P46425] |
| Q8BKE5 | ENSMUSG00000066307 | RIKEN cDNA E130016E03 gene [Source:MarkerSymbol;Acc:MGI:2444973] |
| Q8K2L9 | ENSMUSG00000033450 | T-cell activation Rho GTPase-activating protein (T-cell activation GTPase-activating protein). [Source:Uniprot/SWISSPROT;Acc:Q8K2L9] |
| P83882 | ENSMUSG00000049751 | 60S ribosomal protein L36a (60S ribosomal protein L44). [Source:Uniprot/SWISSPROT;Acc:P83882] |
| P05531 | ENSMUSG00000054626 | X-linked lymphocyte-regulated protein PM1. [Source:Uniprot/SWISSPROT;Acc:P05531] |
| Q9WV98 | ENSMUSG00000021079 | Mitochondrial import inner membrane translocase subunit Tim9. [Source:Uniprot/SWISSPROT;Acc:Q9WV98] |
| Q9WV98 | ENSMUSG00000021079 | Mitochondrial import inner membrane translocase subunit Tim9. [Source:Uniprot/SWISSPROT;Acc:Q9WV98] |
| Q9WV98 | ENSMUSG00000021079 | Mitochondrial import inner membrane translocase subunit Tim9. [Source:Uniprot/SWISSPROT;Acc:Q9WV98] |
| Q6IE32 | ENSMUSG00000060201 | Serine protease inhibitor Kazal-type 7 precursor (Esophagus cancer- related gene 2 protein) (ECRG-2). [Source:Uniprot/SWISSPROT;Acc:Q6IE32] |
| Q80WG5 | ENSMUSG00000007476 | phytanoyl-CoA dioxygenase domain containing 1 [Source:MarkerSymbol;Acc:MGI:3612860] |
Unigene identifiers are mapped as well through their transcripts but in a slightly different manner. By executing the following statement on the Ensembl database homo_sapiens_core_44_36f we obtain the unigene2ensg mapping (All stable_id's will be of the form ENSG000...). If we execute the query on the mus_musculus_core_44_36e database we obtain the unigene2ensmusg mapping (all id's will be of the form ENSMUSG000...).
| unigene | ensembl |
+------+----------+
| Mm.27038 | ENSMUSG00000036083 |
| Mm.39752 | ENSMUSG00000021573 |
| Mm.41636 | ENSMUSG00000002733 |
| Mm.76494 | ENSMUSG00000028528 |
| Mm.253378 | ENSMUSG00000043556 |
| Mm.260194 | ENSMUSG00000020474 |
| Mm.268582 | ENSMUSG00000071256 |
| Mm.317248 | ENSMUSG00000066443 |
| Mm.317248 | ENSMUSG00000041453 |
| Mm.390885 | ENSMUSG00000014077 |
One might assume now that we can simply link the mCG identifiers through their Swissprot or Unigene link to the Ensembl human gene using one of the above tables. That is however incorrect. Such a table will be empty since no Swissprot identifier nor Unigene identifier from the human genome is reused in the mouse genome. Instead we need to go through an ortholog mapping. We start out with creating a mCG2ensmusg table first. This is of course again more tricky than it looks at first sight. The problem that we have now is that the list of Swissprot identifiers linked to each mCG gene is separated with semicolons (;). To find them we must thus work the other way around: create a set of all the Swissprot fuckers we have and find them back inside the mCG tables. To reduce calculation time we first create a small table containing all the mCG|Swissprot links
By appending a ';' to the end of each Swissprot identifier list, we are sure that every one of our Swissprot identifiers will be found when we attach a ; to it as well. If we don't do this we might find MM123 back in the list 'MM1234; MM44'. All that is left now is to make the joins of the swissport2ensmusg vs mcg2ensmusg and unigen2ensmusg vs mcg2ensmusg. The query below is ran in the FkrpTaf4 database.
| Gene_ID | swissprot | unigene |
+-------+-------------------+------+
| hCG31426.2 | O95201;P13682;P17027;P51523;Q15776; | Hs.57679; |
| hCG1979099.1 | Q92610; | Hs.368756; |
| hCG42687.4 | ; | Hs.302903; |
| hCG2015782 | Q9UJX3; | Hs.530379; |
| hCG36953.3 | ; | Hs.278954; |
| hCG22993.3 | ; | Hs.199068; |
| hCG33215.3 | ; | Hs.30011; |
| hCG22998.3 | ; | Hs.294009; |
| hCG2039675 | Q8IUX4;Q9UH17; | Hs.337667; |
| hCG14966.2 | O14944; | Hs.115263; |
This table creates a mapping from the various mCG identifiers we found to their associated ensmusg identifier. It relies on the mCGBlurb table (3.2.1) created before (that table contains multiple Swissprot/Unigene identifiers in a semicolon separated/terminated list. To find back which Swissprot/Unigene identifiers occur in each mCGBlurb field we search for each of them in turn. This is less than optimal and could be optimized. However, the query only takes 7 minutes, so I don't care that much (at the moment). The Swissprot2Ensmusg table was imported in section 5.3.1. The Unigene2Ensmusg table was imported in section 5.3.2. The query below is ran in the FkrpTaf4 database.
| Gene_ID | Ensembl |
+-------+----------+
| mCG126572.1 | ENSMUSG00000062203 |
| mCG141162 | ENSMUSG00000049152 |
| mCG113184.1 | ENSMUSG00000050876 |
| mCG132220.1 | ENSMUSG00000030137 |
| mCG19273.2 | ENSMUSG00000028465 |
| mCG5925.1 | ENSMUSG00000053560 |
| mCG141342 | ENSMUSG00000051048 |
| mCG1031868.1 | ENSMUSG00000048355 |
| mCG1036470.1 | ENSMUSG00000038077 |
| mCG49016.1 | ENSMUSG00000025795 |
In order to integrate the FKRP results we needed to map the mouse genes to the human genome, thereby respecting the function the different genes preform. inter-species genes with the same function are called orthologs. Ensembl provides a comparative database of genes between different species. However automatically mapping one onto the other was not as straightforward as one would expect. Theoretically one could write a query that would take all the stably annotated mouse genes, find them back in the homology table, determine the homology family and then find the human gene within that same family. The major problem that we encountered was that the Ensembl database has over 31'000'000 homology members, making straightforward joins of various tables a less than optimal solution. We optimized the querying using the following tricks.
Below we assume that we have a list of gene identifiers (form ENSMUSG...123) in the tomap.mouse_id column. The goal now is to create a new mapping from all these mouse_id's to human gene_ids. The tomap table is in our case defined as all potential mouse Ensembl id. Joining the stable_id's from Swissprot2mid (3.1.2) and Unigene2mid (3.1.3) provides us with the necessary things.
| mouse_id |
+----------+
| ENSMUSG00000000028 |
| ENSMUSG00000000031 |
| ENSMUSG00000000037 |
| ENSMUSG00000000056 |
| ENSMUSG00000000058 |
| ENSMUSG00000000078 |
| ENSMUSG00000000088 |
| ENSMUSG00000000093 |
| ENSMUSG00000000103 |
This table will list all the homolog members that belong to the mouse family for which we are interested in the mapping. The tomap table is created in 4.1.
| member_id | stable_id |
+------+----------+
| 828231 | ENSMUSG00000000028 |
| 338749 | ENSMUSG00000000031 |
| 682679 | ENSMUSG00000000037 |
| 1073599 | ENSMUSG00000000056 |
| 157512 | ENSMUSG00000000058 |
| 780907 | ENSMUSG00000000078 |
| 877703 | ENSMUSG00000000088 |
| 1052453 | ENSMUSG00000000093 |
| 667029 | ENSMUSG00000000103 |
| 1056719 | ENSMUSG00000000120 |
This table will be used as a starting point to find the homologies we are interested in. Afterward we will compare all the members of the homologies we like to a second table of human_homologs.
This table lists all possible target homologs (in our case, all stable gene members of the human genome)
| member_id | stable_id |
+------+---------+
| 3 | ENSG00000198763 |
| 5 | ENSG00000198804 |
| 7 | ENSG00000198712 |
| 9 | ENSG00000198744 |
| 11 | ENSG00000198899 |
| 13 | ENSG00000198938 |
| 15 | ENSG00000198840 |
| 17 | ENSG00000198868 |
| 19 | ENSG00000198886 |
| 21 | ENSG00000198786 |
| member_id | stable_id |
+------+----------+
| 338098 | ENSG00000168394 |
| 338108 | ENSMUSG00000025147 |
| 338125 | ENSG00000204261 |
| 338142 | ENSG00000204259 |
| 338145 | ENSMUSG00000043186 |
| 338185 | ENSMUSG00000037887 |
| 338194 | ENSG00000204258 |
| 338215 | ENSG00000204257 |
| 338272 | ENSG00000204256 |
| 338283 | ENSMUSG00000073786 |
The homologs that could be interesting are those that are no paralogs and those that have members in either the human gene or in the mouse gene. The first constraint is implemented by going through the homology compara table. The second constraint is implemented using the homolog_members table we made before.
| homology_id | member_id |
+-------+------+
| 3097693 | 1 |
| 3097793 | 1 |
| 3097840 | 1 |
| 3097935 | 1 |
| 3097966 | 1 |
| 3097987 | 1 |
| 3098140 | 1 |
| 3098283 | 1 |
| 3098381 | 1 |
| 3098403 | 1 |
| 3098478 | 1 |
| 3098500 | 1 |
| 5290964 | 3 |
| 5291000 | 3 |
| 5291061 | 3 |
| 5291114 | 3 |
| 5291145 | 3 |
| 5291260 | 3 |
| 5291274 | 3 |
| 5291346 | 3 |
| 5291503 | 3 |
| 5291563 | 3 |
| 5291571 | 3 |
| 712679 | 5 |
| 712702 | 5 |
| 712865 | 5 |
| 712903 | 5 |
| 712931 | 5 |
| 712987 | 5 |
| 713089 | 5 |
| 713115 | 5 |
We are only interested in orthologs (4.5) that include a mouse homology (4.2).
| stable_id | homology_id |
+----------+-------+
| ENSMUSG00000000001 | 20625752 |
| ENSMUSG00000000001 | 20625974 |
| ENSMUSG00000000001 | 20636886 |
| ENSMUSG00000000001 | 20639601 |
| ENSMUSG00000000001 | 20640068 |
| ENSMUSG00000000001 | 20640705 |
| ENSMUSG00000000001 | 20643921 |
| ENSMUSG00000000001 | 20646072 |
| ENSMUSG00000000001 | 20646940 |
| ENSMUSG00000000001 | 20649601 |
and check whether they occur in the human_homolog members. This is done by joining the previous calculated 'really interesting homologs' (4.6) with the orthologs (4.5) and the human_homologs (4.3).
| mouse | human |
+----------+---------+
| ENSMUSG00000000028 | ENSG00000093009 |
| ENSMUSG00000000056 | ENSG00000141562 |
| ENSMUSG00000000058 | ENSG00000105971 |
| ENSMUSG00000000078 | ENSG00000067082 |
| ENSMUSG00000000088 | ENSG00000178741 |
| ENSMUSG00000000093 | ENSG00000121068 |
| ENSMUSG00000000103 | ENSG00000005889 |
| ENSMUSG00000000103 | ENSG00000067646 |
| ENSMUSG00000000120 | ENSG00000064300 |
| ENSMUSG00000000125 | ENSG00000108379 |
| ENSMUSG00000000126 | ENSG00000143816 |
| ENSMUSG00000000127 | ENSG00000151422 |
| ENSMUSG00000000131 | ENSG00000169180 |
| ENSMUSG00000000142 | ENSG00000168646 |
| ENSMUSG00000000148 | ENSG00000106009 |
| ENSMUSG00000000149 | ENSG00000146535 |
| ENSMUSG00000000154 | ENSG00000110628 |
| ENSMUSG00000000157 | ENSG00000160255 |
| ENSMUSG00000000159 | ENSG00000183067 |
| ENSMUSG00000000168 | ENSG00000150768 |
| ENSMUSG00000000171 | ENSG00000204370 |
| ENSMUSG00000000182 | ENSG00000118972 |
| ENSMUSG00000000183 | ENSG00000111241 |
| ENSMUSG00000000184 | ENSG00000118971 |
| ENSMUSG00000000194 | ENSG00000148358 |
| ENSMUSG00000000204 | ENSG00000172123 |
| ENSMUSG00000000204 | ENSG00000205045 |
| ENSMUSG00000000216 | ENSG00000166828 |
| ENSMUSG00000000223 | ENSG00000102385 |
| ENSMUSG00000000244 | ENSG00000064201 |
Our aim is to create a table that describes for each Ensembl gene the measured up/down regulations.
| gene_id | signal |
+---------+-----+
| 4930503K07 | 1.78667 |
| AB041802.1 | 1.55333 |
| AB045716.1 | 0.183333 |
| AB076245.1 | 0.283333 |
| AB080658.1 | 0.183333 |
| AB091827.1 | 8.20667 |
| AB099818.1_CDS_3 | 13.0667 |
| AF014450.1 | 1.05667 |
| AF045504.1 | 0.433333 |
| AF059259.1 | 0.23 |
The above query make the scrambled average table, which contains for each gene (annotated as mCG... the average signal)
| gene_id | signal |
+---------+----+
| 4930503K07 | 1.746 |
| AB041802.1 | 1.844 |
| AB045716.1 | 0.202 |
| AB076245.1 | 0.376 |
| AB080658.1 | 0.27 |
| AB091827.1 | 6.316 |
| AB099818.1_CDS_3 | 18.152 |
| AF014450.1 | 0.93 |
| AF045504.1 | 0.44 |
| AF059259.1 | 0.442 |
This table calculates the average FkRp SiRNA signal intensity, based on the measurements of the scrambled SiRna (5.1) and non-scrambled SiRNA (5.2).
| gene_id | ratio |
+---------+-----+
| 4930503K07 | 0.977239 |
| AB041802.1 | 1.18712 |
| AB045716.1 | 1.10182 |
| AB076245.1 | 1.32706 |
| AB080658.1 | 1.47273 |
| AB091827.1 | 0.769618 |
| AB099818.1_CDS_3 | 1.38918 |
| AF014450.1 | 0.880126 |
| AF045504.1 | 1.01538 |
| AF059259.1 | 1.92174 |
Before we can replace mCG identifiers with Ensembl identifiers we need to import the mappings we created early on. We first need the mouse to ensmusg table and then the ensmsug2enseg table. Both can be imported fairly straightforward from the imports as before:
This query imports the data from the csv file we obtained when querying the Ensembl database for the Swissprot2Ensmusg mapping (5.3.1).
LOAD DATA LOCAL INFILE 'imports/swissprot2ensmusg.csv'
INTO TABLE Swissprot2Ensmusg;
| swissprot | ensembl |
+------+----------+
| P31254 | ENSMUSG00000069053 |
| P46425 | ENSMUSG00000038155 |
| Q8K2L9 | ENSMUSG00000033450 |
| P83882 | ENSMUSG00000049751 |
| P05531 | ENSMUSG00000054626 |
| Q9WV98 | ENSMUSG00000021079 |
| Q6IE32 | ENSMUSG00000060201 |
| O88574 | ENSMUSG00000031609 |
| Q9JI46 | ENSMUSG00000024213 |
| Q03740 | ENSMUSG00000070870 |
This query imports the data from the csv file we obtained when querying the Ensembl database for the Unigen2Ensmusg mapping (3.1.3). The query below is ran in the FkrpTaf4 database.
LOAD DATA LOCAL INFILE 'imports/unigene2ensmusg.csv'
INTO TABLE Unigene2Ensmusg;
| unigene | ensembl |
+------+----------+
| Mm.10721 | ENSMUSG00000024963 |
| Mm.27038 | ENSMUSG00000036083 |
| Mm.39752 | ENSMUSG00000021573 |
| Mm.41636 | ENSMUSG00000002733 |
| Mm.76494 | ENSMUSG00000028528 |
| Mm.253378 | ENSMUSG00000043556 |
| Mm.260194 | ENSMUSG00000020474 |
| Mm.268582 | ENSMUSG00000071256 |
| Mm.317248 | ENSMUSG00000066443 |
| Mm.390885 | ENSMUSG00000014077 |
The final join links the mcg2ensmusg table over the mouse to human orthologs. First, importing of the ensmusg2ensg table (downloaded from Ensembl using query 4.7).
LOAD DATA LOCAL INFILE 'imports/ensmus2ensg.csv'
INTO TABLE Ensmusg2Ensg;
| mouse | human |
+----------+---------+
| ENSMUSG00000000001 | ENSG00000065135 |
| ENSMUSG00000000028 | ENSG00000093009 |
| ENSMUSG00000000056 | ENSG00000141562 |
| ENSMUSG00000000058 | ENSG00000105971 |
| ENSMUSG00000000078 | ENSG00000067082 |
| ENSMUSG00000000088 | ENSG00000178741 |
| ENSMUSG00000000093 | ENSG00000121068 |
| ENSMUSG00000000103 | ENSG00000005889 |
| ENSMUSG00000000103 | ENSG00000067646 |
| ENSMUSG00000000120 | ENSG00000064300 |
This statement imports the Ensembl genes descriptions (3.1.1) we obtained from the Ensembl database into the FkrpTaf4 database.
LOAD DATA LOCAL INFILE 'imports/ensgdescriptions.csv'
INTO TABLE EnsgDescriptions;
The final join links the mCG identifiers in mcg2Ratio (5.2.1) to their Ensembl mouse identifier using the mcg2Ensmusg table (3.2.2). This joined table is then further linked to the Ensembl human genome identifiers through the ortholog mapping developed earlier (5.4) and then annotated with appropriate descriptions according to (5.5).
| human | abslogratio | description |
+---------+----------+-----------------------------------------------------------------+
| ENSG00000168671 | 0.158794414944941 | UDP glycosyltransferase 3 family, polypeptide A2 [Source:RefSeq_peptide;Acc:NP_777574] |
| ENSG00000145626 | 0.158794414944941 | UDP glycosyltransferase 3 family, polypeptide A1 [Source:RefSeq_peptide;Acc:NP_689617] |
| ENSG00000183785 | 0.0131185176546133 | Peroxisome assembly protein 26 (Peroxin-26). [Source:Uniprot/SWISSPROT;Acc:Q7Z412] |
| ENSG00000137076 | 0.0896014658925754 | Talin-1. [Source:Uniprot/SWISSPROT;Acc:Q9Y490] |
| ENSG00000160888 | 0.125966605983794 | Immediate early response gene 2 protein (Protein ETR101). [Source:Uniprot/SWISSPROT;Acc:Q9BTL4] |
| ENSG00000149380 | 0.378918118515268 | prolyl 4-hydroxylase, alpha III subunit precursor [Source:RefSeq_peptide;Acc:NP_878907] |
| ENSG00000151079 | 0.825970582585705 | Potassium voltage-gated channel subfamily A member 6 (Voltage-gated potassium channel subunit Kv1.6) (HBK2). [Source:Uniprot/SWI |
| ENSG00000153179 | 0.211211462669622 | Ras association domain-containing protein 3. [Source:Uniprot/SWISSPROT;Acc:Q86WH2] |
| ENSG00000176040 | 0.050123519815702 | Transmembrane protease, serine 7 precursor (EC 3.4.21.-). [Source:Uniprot/SWISSPROT;Acc:Q7RTY8] |
| ENSG00000086102 | 0.183398227233556 | Transcriptional repressor NF-X1 (EC 6.3.2.-) (Nuclear transcription factor, X box-binding, 1). [Source:Uniprot/SWISSPROT;Acc:Q12 |
The average tables are based on the SiRNA and Scrambled measurements in the Hela (section 2.3.1 and 2.3.2) and Skndz cells (section 2.3.3 and 2.3.4).
This ratio table is based on 6.1.1 and 6.1.2. The query is executed in the FkrpTaf4 schema
This ratio table is based on 6.1.3 and 6.1.4. This query is executed in the FkrpTaf4 schema.
| ensembl | hela | skndz | description |
+---------+-----------+-----------+-------------------------------- ----------------------------------+
| ENSG00000115380 | -0.079138526676969 | -0.248584127949312 | EGF-containing fibulin-like extracellular matrix protein 1 pre cursor (Fibulin-3) (FIBL-3) (Fibrillin-like protein) (Extracellula |
| ENSG00000177453 | 0.111473798056098 | -0.307484718821447 | Serine/threonine-protein kinase NIM1 (EC 2.7.11.1). [Source:Un iprot/SWISSPROT;Acc:Q8IY84] |
| ENSG00000163702 | -0.0600845596048174 | -0.00482188024916856 | Interleukin-17 receptor C precursor (IL-17 receptor C) (IL-17R C) (Interleukin-17 receptor-like protein) (IL-17RL) (Interleukin-1 |
| ENSG00000164070 | -0.189481181943389 | -0.0204088316899511 | Heat shock 70 kDa protein 4L (Osmotic stress protein 94) (Heat shock 70-related protein APG-1). [Source:Uniprot/SWISSPROT;Acc:O9 |
| ENSG00000164761 | -0.122394168057243 | -0.345501629149025 | Tumor necrosis factor receptor superfamily member 11B precurso r (Osteoprotegerin) (Osteoclastogenesis inhibitory factor). [Sourc |
| ENSG00000173918 | -0.490711308082634 | -0.171471503892196 | Complement C1q tumor necrosis factor-related protein 1 precurs or (G protein-coupled receptor-interacting protein) (GIP). [Source |
| ENSG00000178562 | 0.186102216655366 | -1.04400820819106 | T-cell-specific surface glycoprotein CD28 precursor (TP44). [S ource:Uniprot/SWISSPROT;Acc:P10747] |
| ENSG00000009709 | -0.0957910545498839 | 0.0281143734785264 | Paired box protein Pax-7 (HUP1). [Source:Uniprot/SWISSPROT;Acc :P23759] |
| ENSG00000132356 | 0.303738474652669 | 0.0885700945845682 | 5'-AMP-activated protein kinase catalytic subunit alpha-1 (EC 2.7.11.1) (AMPK alpha-1 chain). [Source:Uniprot/SWISSPROT;Acc:Q131 |
| ENSG00000125652 | 0.133938469086117 | -0.0268717677139614 | Alkylated repair protein alkB homolog 7 precursor (Spermatogen esis- associated protein 11) (Spermatogenesis cell proliferation-r |
We presented the steps necessary to
Since the work we performed took a couple of weeks and we imagine
that many people are interested in the actual mapping from mCG/hCG
identifiers to their respective gene annotated identifiers we brought
the mCG2ensmusg, hCG2Ensg and human2mouse mappings online at the top of
this page. Please spend some time adding a reference to our work. Thank
you very much.