comparison tool-data/blastdb_d.loc.sample @ 1:5e9d5e536b79 draft
Uploaded v0.1.02 preview 2, clarify sample blastdb loc files, etc
| author | peterjc |
|---|---|
| date | Tue, 03 Mar 2015 05:32:18 -0500 |
| parents | 432ea9614cc9 |
| children | |
| 0:432ea9614cc9 | 1:5e9d5e536b79 |
|---|---|
| 1 #This is a sample file distributed with Galaxy that is used to define a | 1 # This is a sample file distributed with Galaxy that is used to define a |
| 2 #list of protein domain databases, using three columns tab separated | 2 # list of protein domain databases, using three columns tab separated |
| 3 #(longer whitespace are TAB characters): | 3 # (longer whitespace are TAB characters): |
| 4 # | 4 # |
| 5 #<unique_id> <database_caption> <base_name_path> | 5 # <unique_id>{tab}<database_caption>{tab}<base_name_path> |
| 6 # | 6 # |
| 7 #The captions typically contain spaces and might end with the build date. | 7 # The captions typically contain spaces and might end with the build date. |
| 8 #It is important that the actual database name does not have a space in it, | 8 # It is important that the actual database name does not have a space in |
| 9 #and that there are only two tabs on each line. | 9 # it, and that there are only two tabs on each line. |
| 10 # | 10 # |
| 11 #You can download the NCBI provided databases as tar-balls from here: | 11 # You can download the NCBI provided databases as tar-balls from here: |
| 12 #ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ | 12 # ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ |
| 13 # | 13 # |
| 14 #So, for example, if your database is CDD and the path to your base name | 14 # For simplicity, many Galaxy servers are configured to offer just a live |
| 15 #is /data/blastdb/Cdd, then the blastdb_d.loc entry would look like this: | 15 # version of each NCBI BLAST database (updated with the NCBI provided |
| | 16 # Perl scripts or similar). In this case, we recommend using the case |
| | 17 # sensitive base-name of the NCBI BLAST databases as the unique id. |
| | 18 # Consistent naming is important for sharing workflows between Galaxy |
| | 19 # servers. |
| 16 # | 20 # |
| 17 #Cdd{tab}NCBI Conserved Domains Database (CDD){tab}/data/blastdb/Cdd | 21 # For example, consider the NCBI Conserved Domains Database (CDD), where |
| | 22 # you have downloaded and decompressed the files under the directory |
| | 23 # /data/blastdb/domains/ meaning at the command line BLAST+ would be |
| | 24 # run as follows and would look at the files /data/blastdb/domains/Cdd.*: |
| 18 # | 25 # |
| 19 #and your /data/blastdb directory would contain all of the files associated | 26 # $ rpsblast -db /data/blastdb/domains/Cdd -query ... |
| 20 #with the database, /data/blastdb/Cdd.*. | |
| 21 # | 27 # |
| 22 #Your blastdb_d.loc file should include an entry per line for each "base name" | 28 # In this case use Cdd (title case to match the NCBI file naming) as the |
| 23 #you have stored. For example: | 29 # unique id in the first column of blastdb_d.loc, giving an entry like |
| | 30 # this: |
| 24 # | 31 # |
| 25 #Cdd NCBI CDD /data/blastdb/domains/Cdd | 32 # Cdd{tab}NCBI Conserved Domains Database (CDD){tab}/data/blastdb/domains/Cdd |
| 26 #Kog KOG (eukaryotes) /data/blastdb/domains/Kog | |
| 27 #Cog COG (prokaryotes) /data/blastdb/domains/Cog | |
| 28 #Pfam Pfam-A /data/blastdb/domains/Pfam | |
| 29 #Smart SMART /data/blastdb/domains/Smart | |
| 30 #Tigr TIGR /data/blastdb/domains/Tigr | |
| 31 #Prk Protein Clusters database /data/blastdb/domains/Prk | |
| 32 #...etc... | |
| 33 # | 33 # |
| 34 #See also blastdb.loc which is for any nucleotide BLAST database, and | 34 # Your blastdb_d.loc file should include an entry per line for each "base name" |
| 35 #blastdb_p.loc which is for any protein BLAST databases. | 35 # you have stored. For example: |
| | 36 # |
| | 37 # Cdd{tab}NCBI CDD{tab}/data/blastdb/domains/Cdd |
| | 38 # Kog{tab}KOG (eukaryotes){tab}/data/blastdb/domains/Kog |
| | 39 # Cog{tab}COG (prokaryotes){tab}/data/blastdb/domains/Cog |
| | 40 # Pfam{tab}Pfam-A{tab}/data/blastdb/domains/Pfam |
| | 41 # Smart{tab}SMART{tab}/data/blastdb/domains/Smart |
| | 42 # Tigr{tab}TIGR{tab}/data/blastdb/domains/Tigr |
| | 43 # Prk{tab}Protein Clusters database{tab}/data/blastdb/domains/Prk |
| | 44 # ...etc... |
| | 45 # |
| | 46 # Alternatively, rather than a "live" mirror of the NCBI databases which |
| | 47 # are updated automatically, for full reproducibility the Galaxy Team |
| | 48 # recommend saving date-stamped copies of the databases. In this case |
| | 49 # your blastdb_d.loc file should include an entry per line for each |
| | 50 # version you have stored. For example: |
| | 51 # |
| | 52 # Cdd_05Jun2010{tab}NCBI CDD 05 Jun 2010{tab}/data/blastdb/domains/05Jun2010/Cdd |
| | 53 # Cdd_15Aug2010{tab}NCBI CDD 15 Aug 2010{tab}/data/blastdb/domains/15Aug2010/Cdd |
| | 54 # ...etc... |
| | 55 # |
| | 56 # See also blastdb.loc which is for any nucleotide BLAST database, and |
| | 57 # blastdb_p.loc which is for any protein BLAST databases. |
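
The format rules stated in the sample file (three tab-separated columns, exactly two tabs per line, no space in the database path) can be checked mechanically. The following is a minimal Python sketch, not part of the Galaxy or ncbi_blast_plus code: the function names `parse_loc_line` and `parse_loc` and the `SAMPLE` text are hypothetical, and treating "the actual database name" as the third column (the base name path) is an interpretation of the comment. The duplicate-id check is likewise an assumption based on the column being called a unique id.

```python
def parse_loc_line(line):
    """Parse one blastdb_d.loc line.

    Returns (unique_id, caption, base_path), or None for blank and
    comment lines. Raises ValueError when the line breaks the rules
    described in the sample file.
    """
    stripped = line.rstrip("\r\n")
    if not stripped.strip() or stripped.lstrip().startswith("#"):
        return None
    fields = stripped.split("\t")
    # Exactly two tabs means exactly three columns.
    if len(fields) != 3:
        raise ValueError(
            f"expected three tab-separated columns, got {len(fields)}: {stripped!r}"
        )
    unique_id, caption, base_path = fields
    if not unique_id or not base_path:
        raise ValueError(f"empty unique id or base name path: {stripped!r}")
    # Assumption: the 'no space' rule applies to the base name path column.
    if " " in base_path:
        raise ValueError(f"base name path contains a space: {base_path!r}")
    return unique_id, caption, base_path


def parse_loc(text):
    """Parse a whole .loc file into {unique_id: (caption, base_path)}."""
    entries = {}
    for number, line in enumerate(text.splitlines(), start=1):
        try:
            entry = parse_loc_line(line)
        except ValueError as err:
            raise ValueError(f"line {number}: {err}") from None
        if entry is None:
            continue
        unique_id, caption, base_path = entry
        # Assumption: ids must not repeat, since the column is a unique id.
        if unique_id in entries:
            raise ValueError(f"line {number}: duplicate unique id {unique_id!r}")
        entries[unique_id] = (caption, base_path)
    return entries


# Hypothetical file content with real tab characters in place of {tab}.
SAMPLE = (
    "# comment lines are skipped\n"
    "Cdd\tNCBI Conserved Domains Database (CDD)\t/data/blastdb/domains/Cdd\n"
    "Cdd_05Jun2010\tNCBI CDD 05 Jun 2010\t/data/blastdb/domains/05Jun2010/Cdd\n"
)

if __name__ == "__main__":
    for uid, (caption, path) in parse_loc(SAMPLE).items():
        print(uid, caption, path, sep=" | ")
```

A line where a tab was typed as a space, such as `Tigr{tab}TIGR /data/blastdb/domains/Tigr`, splits into two columns instead of three, so `parse_loc_line` rejects it with a `ValueError`.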
