annotate damid_to_bedgraph.py @ 1:35011939bc8b draft default tip

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/damid_deseq2_to_bedgraph commit 98722d2ca8205595f032361072aaab450e5f4f83
author mvdbeek
date Thu, 03 Jan 2019 09:33:02 -0500
parents 8e4ebcd58df3
children
from collections import OrderedDict

import click
import numpy as np
import pandas as pd
import traces


def order_index(df):
    """
    Split chr_start_stop in df index and order by chrom and start.
    """
    idx = df.index.str.split('_')
    idx = pd.DataFrame.from_records(list(idx))

    idx.columns = ['chr', 'start', 'stop']
    idx = idx.astype(dtype={"chr": "object",
                            "start": "int32",
                            "stop": "int32"})
    coordinates = idx.sort_values(['chr', 'start'])
    df.index = np.arange(len(df.index))
    df = df.loc[coordinates.index]
    df = coordinates.join(df)
    # index is center of GATC site
    df.index = df['start'] + 2
    return df


def interpolate_values(df, sampling_width=100):
    """Linearly interpolate log2FC per chromosome, sampled every sampling_width bp."""
    result = []
    for chrom in df['chr'].unique():
        chrom_df = df[df['chr'] == chrom]
        time_series = traces.TimeSeries(chrom_df['log2FC'])
        s = pd.DataFrame.from_records(time_series.sample(sampling_width, interpolate='linear'))
        # Calculate new start and end of interpolated region
        start = s[0] - int(sampling_width / 2)
        # Negative starts are set to 1
        start.loc[start < 0] = 1
        end = s[0] + int(sampling_width / 2)
        result.append(pd.DataFrame(OrderedDict([('chr', chrom), ('start', start), ('end', end), ('score', s[1])])))
    return pd.concat(result)


@click.command()
@click.argument('input_path', type=click.Path(exists=True), required=True)
@click.argument('output_path', type=click.Path(exists=False), required=True)
@click.option('--resolution', help="Interpolate log2 fold change at this resolution (in basepairs)", default=50)
def deseq2_to_bedgraph(input_path, output_path, resolution=50):
    """Convert deseq2 output on GATC fragments to bedgraph file with interpolated values."""
    df = pd.read_csv(input_path, sep='\t', header=None, index_col=0, usecols=[0, 2], names=['GATC', 'log2FC'])
    # Drop rows whose ID contains a dot; the raw string avoids an invalid escape sequence
    df = df[~df.index.str.contains(r'\.')]
    df = order_index(df)
    r = interpolate_values(df, sampling_width=resolution)
    r.to_csv(output_path, sep='\t', header=False, index=False)


if __name__ == '__main__':
    deseq2_to_bedgraph()
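
The script above relies on pandas and traces. As a rough illustration of the same idea (parse `chr_start_stop` IDs, take `start + 2` as the GATC center, interpolate linearly at a fixed resolution), here is a minimal standard-library sketch. It is not the tool's implementation: the helper names `parse_fragment_id`, `interpolate_linear` and `to_bedgraph_rows` are hypothetical, the sampling grid is assumed to start at the first GATC center (the grid traces uses may differ), negative starts are clamped to 0 rather than 1, and `rsplit` is used so that chromosome names containing underscores would still parse.

```python
from bisect import bisect_right


def parse_fragment_id(fragment_id):
    """Split a 'chr_start_stop' ID into (chrom, start, stop)."""
    # rsplit from the right keeps underscores inside the chromosome name intact
    # (an assumption; the script above splits on every underscore).
    chrom, start, stop = fragment_id.rsplit('_', 2)
    return chrom, int(start), int(stop)


def interpolate_linear(points, step):
    """Sample sorted (position, value) points every `step` bp, interpolating linearly."""
    positions = [p for p, _ in points]
    samples = []
    pos = positions[0]  # assumed grid origin: the first measured position
    while pos <= positions[-1]:
        i = bisect_right(positions, pos) - 1  # last measured point at or before pos
        if i == len(points) - 1:
            value = points[i][1]
        else:
            (x0, y0), (x1, y1) = points[i], points[i + 1]
            value = y0 + (y1 - y0) * (pos - x0) / (x1 - x0)
        samples.append((pos, value))
        pos += step
    return samples


def to_bedgraph_rows(records, step=50):
    """Turn (fragment_id, log2fc) pairs into (chrom, start, end, score) rows."""
    by_chrom = {}
    for fragment_id, log2fc in records:
        chrom, start, _ = parse_fragment_id(fragment_id)
        # start + 2 is the center of the 4 bp GATC site
        by_chrom.setdefault(chrom, []).append((start + 2, log2fc))
    rows = []
    for chrom in sorted(by_chrom):
        points = sorted(by_chrom[chrom])
        for pos, value in interpolate_linear(points, step):
            # Assumption: clamp to 0 because bedGraph coordinates are zero-based
            rows.append((chrom, max(pos - step // 2, 0), pos + step // 2, value))
    return rows


rows = to_bedgraph_rows([("chr2L_98_102", 1.0), ("chr2L_198_202", 3.0)], step=50)
for row in rows:
    print("\t".join(str(field) for field in row))
```

With the two example fragments, the centers fall at 100 and 200, so the sketch emits three 50 bp windows centered on 100, 150 and 200, with the middle score interpolated halfway between the two measured values.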