comparison README.rst @ 0:627dc826a68f draft default tip
planemo upload for repository https://github.com/mvdbeek/dedup_hash commit 367da560c5924d56c39f91ef9c731e523825424b-dirty
| author | mvdbeek |
|---|---|
| date | Wed, 23 Nov 2016 07:46:20 -0500 |
.. image:: https://travis-ci.org/mvdbeek/dedup_hash.svg?branch=master
    :target: https://travis-ci.org/mvdbeek/dedup_hash

dedup_hash
----------------------------

This is a command-line utility to remove exact duplicate reads
from paired-end fastq files. Reads are assumed to be in two separate
files. The two read sequences of a pair are concatenated and a short hash
is calculated on the concatenated sequence. If the hash has been previously
seen, the pair is dropped from the output files. This means that reads
that share start and end coordinates but differ in length will not be
removed (although exact duplicates among them are still "flattened" to
at most one occurrence).
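
The approach above can be sketched in a few lines of Python. This is a
minimal illustration, not the tool's actual implementation: it uses the
standard library's ``hashlib`` in place of cityhash, and the
``dedup_pairs`` helper and its ``(header, sequence, quality)`` record
layout are assumptions for the example::

    import hashlib

    def dedup_pairs(pairs):
        """Yield only the first occurrence of each read pair.

        `pairs` is an iterable of (record1, record2) tuples, where each
        record is a (header, sequence, quality) triple.
        """
        seen = set()
        for r1, r2 in pairs:
            # Concatenate both mates' sequences and hash the result;
            # only the fixed-size digest is kept in memory, not the reads.
            digest = hashlib.md5((r1[1] + r2[1]).encode()).digest()
            if digest not in seen:
                seen.add(digest)
                yield r1, r2

    pairs = [
        (("@a/1", "ACGT", "IIII"), ("@a/2", "TTGG", "IIII")),
        (("@b/1", "ACGT", "IIII"), ("@b/2", "TTGG", "IIII")),  # exact duplicate of @a
        (("@c/1", "ACGA", "IIII"), ("@c/2", "TTGG", "IIII")),
    ]
    unique = list(dedup_pairs(pairs))
    # The duplicate pair @b is dropped; @a and @c remain.

Because only one digest per pair is stored, memory use grows with the
number of *distinct* pairs rather than with total file size.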

This algorithm is simple and fast, and saves memory compared to tools
such as fastuniq that read the whole fastq file into memory.

Installation
------------

dedup_hash relies on the cityhash python package,
which supports python-2.7 exclusively.

``pip install dedup_hash``
