Commit 7666a08

Author: Daniel Mapleson
Finish Release-2.3.2
KAT now allows users to pipe in gzipped files via process substitution. Updated the distribution analysis script to make it more robust and added calculations for estimated genome size and level of heterozygous content. Fixed a bug in the spectra hist plot title. Fixed a bug in the help message for the '-n' option in kat comp.
Parents: acbaa32 + 4a2eb07

23 files changed: 807 additions and 437 deletions.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -40,3 +40,4 @@ Makefile.in
 *.pc
 *.la
 *.out
+/tags

configure.ac

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,7 @@
 
 # Autoconf initialistion. Sets package name version and contact details
 AC_PREREQ([2.68])
-AC_INIT([kat],[2.3.1],[https://github.com/TGAC/KAT/issues],[kat],[https://github.com/TGAC/KAT])
+AC_INIT([kat],[2.3.2],[https://github.com/TGAC/KAT/issues],[kat],[https://github.com/TGAC/KAT])
 AC_CONFIG_SRCDIR([src/kat.cc])
 AC_CONFIG_AUX_DIR([build-aux])
 AC_CONFIG_MACRO_DIR([m4])
@@ -176,6 +176,7 @@ else
 if [[ -z "${BOOST_TIMER_STATIC_LIB}" ]] || [[ -z "${BOOST_CHRONO_STATIC_LIB}" ]] || [[ -z "${BOOST_FILESYSTEM_STATIC_LIB}" ]] || [[ -z "${BOOST_PROGRAM_OPTIONS_STATIC_LIB}" ]] || [[ -z "${BOOST_SYSTEM_STATIC_LIB}" ]]; then
 AC_MSG_WARN([Not all static boost libraries could be found. Will use dynamic libraries instead.])
 BOOST_LIBS="${BOOST_DYN_LIBS}"
+dynboost="yes"
 else
 BOOST_LIBS="${BOOST_STATIC_LIBS}"
 fi

doc/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -52,9 +52,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = '2.3.1'
+version = '2.3.2'
 # The full version, including alpha/beta/rc tags.
-release = '2.3.1'
+release = '2.3.2'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

doc/source/faq.rst

Lines changed: 15 additions & 4 deletions
@@ -4,10 +4,11 @@
 Frequently Asked Questions
 ==========================
 
-Can KAT handle gzipped sequence files?
---------------------------------------
+Can KAT handle compressed sequence files?
+-----------------------------------------
 
-Yes, but only via named pipes. For example, say we wanted to run ``kat hist`` using
+Yes, via named pipes. Anonymous named pipes (process substitution) is also supported.
+For example, say we wanted to run ``kat hist`` using
 gzipped paired end dataset, we can use a named pipe to do this as follows::
 
 mkfifo pe_dataset.fq && gunzip -c pe_dataset_?.fq.gz > pe_dataset.fq &
@@ -21,7 +22,17 @@ consuming from the named pipe will take data that has been gunzipped first. To
 clear this means you do not have to decompress the gzipped files to disk, this happens
 on the fly as consumed by KAT.
 
-Thanks to John Davey for suggesting this.
+Alternatively, using process substitution we could write the previous example more
+concisely in a single line like this::
+
+kat hist -o oe_dataset.hist <(gunzip -c pe_dataset_?.fq.gz)
+
+As a more complex example, the KAT comp tool can be driven in spectra-cn mode using
+both compressed paired end reads and a compressed assembly as follows::
+
+kat comp -o oe_spectra_cn <(gunzip -c pe_dataset_?.fq.gz) <(gunzip -c asm.fa.gz)
+
+Thanks to John Davey and Torsten Seeman for suggesting this.
 
 
 Why is jellyfish bundled with KAT?
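
Note on the doc/source/faq.rst change: the named-pipe approach described in the FAQ works because the consumer opens what looks like an ordinary file while a separate producer decompresses into it on the fly. The following is a minimal Python sketch of that mechanism only, not KAT code. The function name `read_via_fifo` and the file name `reads.fq` are hypothetical, and the sketch assumes a Unix-like system where `os.mkfifo` is available.

```python
import gzip
import os
import shutil
import tempfile
import threading


def read_via_fifo(gz_path):
    """Decompress gz_path through a named pipe and return what a consumer reads.

    Hypothetical helper for illustration; nothing is decompressed to disk.
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "reads.fq")
    os.mkfifo(fifo_path)  # Unix only

    def producer():
        # Plays the role of `gunzip -c file.gz > fifo` in the FAQ example.
        with gzip.open(gz_path, "rb") as src, open(fifo_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

    writer = threading.Thread(target=producer)
    writer.start()
    try:
        # Plays the role of the tool that reads an ordinary-looking file.
        with open(fifo_path, "rb") as consumer:
            data = consumer.read()
    finally:
        writer.join()
        shutil.rmtree(workdir)
    return data
```

Process substitution in the shell follows the same idea, except that the shell creates and cleans up the pipe itself and passes its path to the command.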
Two binary files (46.4 KB and 78.8 KB); contents not shown.

doc/source/walkthrough.rst

Lines changed: 33 additions & 2 deletions
@@ -308,8 +308,9 @@ Genome assembly analysis using k-mer spectra
 --------------------------------------------
 
 One of the most frequently used tools in KAT are the so called "assembly spectra
-copy number plots" or spectra-cn. We use these as a fast first analysis for assembly coherence
-to the data in the reads they are representing. Basically we represent how many elements
+copy number plots" or spectra-cn. We use these as a fast first analysis to check
+assembly coherence against
+the content within reads that were used to produce the assembly. Basically we represent how many elements
 of each frequency on the read’s spectrum ended up not included in the assembly, included
 once, included twice etc.
 
@@ -374,6 +375,36 @@ duplications, inclusion of extra variation, etc:
 :scale: 33%
 
 
+Distribution decomposition analysis
+-----------------------------------
+
+It's useful to be able to fit distributions to each peak in a k-mer histogram or spectra-cn matrix
+in order to work out how many distinct k-mers can be associated with those distributions. By counting
+k-mers in this way we can make predictions around genome size, heterozygous rates (if diploid) and
+assembly completeness. To this end we bundle a script with kat called kat_distanalysis.py. It takes
+in either a spectra-cn matrix file, or kat histogram file as input, then proceeds to identify peaks
+and fit distributions to each one. In the case of spectra-cn matrix files it also identifies peaks
+for each copy number for an assembly.
+
+The user can help to get correct predictions out of the tool by providing an approximate frequency for
+the homozygous part of the distribution. By default, this is assumed to be the last peak. For example,
+this command::
+
+kat_distanalysis.py --plot spectra-cn.mx
+
+might produce the following output for this tetraploid genome:
+
+.. image:: images/distanalysis_console.png
+:scale: 100%
+
+.. image:: images/distanalysis_plot.png
+:scale: 100%
+
+
+
+
+
+
 Finding repetitive regions in assemblies
 ----------------------------------------
 
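
Note on the doc/source/walkthrough.rst change: the new section says k-mer counts around a peak can be used to predict genome size. The sketch below shows the simplest version of that arithmetic and is not the method used by kat_distanalysis.py, which fits distributions to each peak. It assumes the histogram is already loaded as a dict mapping k-mer frequency to the number of distinct k-mers, that the tallest peak above a cutoff is the homozygous peak (reasonable only for haploid or mostly homozygous data), and that frequencies below `min_freq` are sequencing errors. The function name `estimate_genome_size` and the `min_freq` parameter are hypothetical.

```python
def estimate_genome_size(hist, min_freq=5):
    """Return (peak_frequency, estimated_genome_size) from a k-mer histogram.

    hist maps k-mer frequency -> number of distinct k-mers at that frequency.
    min_freq is an assumed cutoff that discards likely error k-mers.
    """
    usable = {freq: count for freq, count in hist.items() if freq >= min_freq}
    if not usable:
        raise ValueError("no histogram entries at or above min_freq")

    # Assumption: the tallest remaining bin marks the homozygous peak.
    peak = max(usable, key=usable.get)

    # Total k-mer occurrences divided by per-copy coverage gives genome size.
    total_kmers = sum(freq * count for freq, count in usable.items())
    return peak, total_kmers // peak


# Toy histogram: an error spike at low frequency and one peak at 10x.
toy = {1: 1000, 2: 200, 8: 10, 9: 40, 10: 100, 11: 40, 12: 10}
print(estimate_genome_size(toy))  # (10, 200)
```

For heterozygous or polyploid genomes the tallest bin need not be the homozygous peak, which is why the walkthrough lets the user supply an approximate homozygous frequency.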
lib/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,3 +4,4 @@
 *.lo
 *.o
 *.kat-2.1.pc.swp
+/tags

lib/include/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+/tags

lib/include/kat/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+/tags
