how to handle levels within categorical variables #1013

@Overcraft90

I have a dataset with two categorical variables, chromosomes and generations, for 23 individuals, each of which has two replicates, hap1 and hap2.
In this data frame I count the values of an M-pattern for each of the 23 individuals, and I would like to show them as boxplots with the associated statistics.

In particular, I need the y-axis to show each chromosome split into generations G1 to G4, while indicating whether the comparisons between generations are significant.

This is the ggstatsplot code I'm using:

library(tibble)
library(ggplot2)
library(ggstatsplot)

grouped_ggbetweenstats(
    data = tmp,
    x = chr,
    y = values,
    grouping.var = hap,
    type = "nonparametric",
    ylab = expression(log[10] * "(values)"),
    violin.args = list(width = 0, linewidth = 0),
    ggplot.component = list(
        ggplot2::theme(axis.text.y = ggplot2::element_text(angle = -90, hjust = 0.5)),
        ggplot2::coord_flip()
    )
)

And this is a subset of five of the 23 individuals, over two chromosomes only:

tmp <- tibble::tribble(~`M-pattern`, ~id, ~hap, ~values, ~gen, 
    ~chr, "M0", 200080, "hap1", 2, "G3", "chr1", "M1", 200080, 
    "hap1", 4.30102999566398, "G3", "chr1", "M10", 200080, "hap1", 
    0.301029995663981, "G3", "chr1", "M0", 200081, "hap1", 2.30102999566398, 
    "G4", "chr1", "M1", 200081, "hap1", 4.60205999132796, "G4", 
    "chr1", "M10", 200081, "hap1", 0.602059991327962, "G4", "chr1", 
    "M0", 200084, "hap1", 1.69897000433602, "G4", "chr1", "M1", 
    200084, "hap1", 4, "G4", "chr1", "M10", 200084, "hap1", 0, 
    "G4", "chr1", "M0", 200085, "hap1", 2.69897000433602, "G2", 
    "chr1", "M1", 200085, "hap1", 4.17609125905568, "G2", "chr1", 
    "M10", 200085, "hap1", 1.8750612633917, "G2", "chr1", "M0", 
    200086, "hap1", 2.39794000867204, "G1", "chr1", "M1", 200086, 
    "hap1", 3.8750612633917, "G1", "chr1", "M10", 200086, "hap1", 
    2.09691001300806, "G1", "chr1", "M0", 200080, "hap2", 2.60205999132796, 
    "G3", "chr1", "M1", 200080, "hap2", 4.90308998699194, "G3", 
    "chr1", "M10", 200080, "hap2", 0.903089986991944, "G3", "chr1", 
    "M0", 200081, "hap2", 2.90308998699194, "G4", "chr1", "M1", 
    200081, "hap2", 5.20411998265593, "G4", "chr1", "M10", 200081, 
    "hap2", 1.20411998265592, "G4", "chr1", "M0", 200084, "hap2", 
    2.30102999566398, "G4", "chr1", "M1", 200084, "hap2", 4.60205999132796, 
    "G4", "chr1", "M10", 200084, "hap2", 0.602059991327962, "G4", 
    "chr1", "M0", 200085, "hap2", 3.30102999566398, "G2", "chr1", 
    "M1", 200085, "hap2", 4.77815125038364, "G2", "chr1", "M10", 
    200085, "hap2", 2.47712125471966, "G2", "chr1", "M0", 200086, 
    "hap2", 3, "G1", "chr1", "M1", 200086, "hap2", 4.47712125471966, 
    "G1", "chr1", "M10", 200086, "hap2", 2.69897000433602, "G1", 
    "chr1", "M0", 200080, "hap1", 1.14612803567824, "G3", "chr2", 
    "M1", 200080, "hap1", 2.30102999566398, "G3", "chr2", "M10", 
    200080, "hap1", 0.301029995663981, "G3", "chr2", "M0", 200081, 
    "hap1", 1.30102999566398, "G4", "chr2", "M1", 200081, "hap1", 
    2.45178643552429, "G4", "chr2", "M10", 200081, "hap1", 0.477121254719662, 
    "G4", "chr2", "M0", 200084, "hap1", 1, "G4", "chr2", "M1", 
    200084, "hap1", 2.14921911265538, "G4", "chr2", "M10", 200084, 
    "hap1", 0, "G4", "chr2", "M0", 200085, "hap1", 1.50514997831991, 
    "G2", "chr2", "M1", 200085, "hap1", 2.2380461031288, "G2", 
    "chr2", "M10", 200085, "hap1", 1.07918124604762, "G2", "chr2", 
    "M0", 200086, "hap1", 1.34242268082221, "G1", "chr2", "M1", 
    200086, "hap1", 2.08635983067475, "G1", "chr2", "M10", 200086, 
    "hap1", 1.20411998265592, "G1", "chr2", "M0", 200080, "hap2", 
    1.44715803134222, "G3", "chr2", "M1", 200080, "hap2", 2.60205999132796, 
    "G3", "chr2", "M10", 200080, "hap2", 0.602059991327962, "G3", 
    "chr2", "M0", 200081, "hap2", 1.60205999132796, "G4", "chr2", 
    "M1", 200081, "hap2", 2.75281643118827, "G4", "chr2", "M10", 
    200081, "hap2", 0.778151250383644, "G4", "chr2", "M0", 200084, 
    "hap2", 1.30102999566398, "G4", "chr2", "M1", 200084, "hap2", 
    2.45178643552429, "G4", "chr2", "M10", 200084, "hap2", 0.477121254719662, 
    "G4", "chr2", "M0", 200085, "hap2", 1.79934054945358, "G2", 
    "chr2", "M1", 200085, "hap2", 2.53907609879278, "G2", "chr2", 
    "M10", 200085, "hap2", 1.38021124171161, "G2", "chr2", "M0", 
    200086, "hap2", 1.65321251377534, "G1", "chr2", "M1", 200086, 
    "hap2", 2.38916608436453, "G1", "chr2", "M10", 200086, "hap2", 
    1.50514997831991, "G1", "chr2")

The problem, as the image below shows, is that each chromosome is drawn as a single clump of all generations G1 to G4, whereas I need each generation shown as its own dodged boxplot (ideally colored by generation of origin), with comparisons drawn between them.

Image

The following plot, made with ggplot2, is very close to the expected result. However, to compare generations I had to build an interaction(gen, chr) term, which resulted in unwanted white space between the boxplots.

Image
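
For reference, the layout I am after can be sketched in plain ggplot2 as below. This is only a minimal sketch under my own assumptions, not a ggstatsplot feature: the helper names plot_gen_within_chr and pairwise_gen_tests are hypothetical, the data frame is assumed to have the chr, gen, hap and values columns of tmp, and the pairwise Wilcoxon tests with Holm correction are a stand-in that is not necessarily the test ggstatsplot would report.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2 or ggstatsplot): one dodged box
# per generation within each chromosome, one panel per haplotype.
plot_gen_within_chr <- function(df) {
  ggplot(df, aes(x = chr, y = values, fill = gen)) +
    # Mapping fill = gen splits each chromosome into one box per generation.
    # preserve = "single" keeps all boxes the same width even when a
    # generation is missing for a chromosome.
    geom_boxplot(position = position_dodge2(preserve = "single")) +
    facet_wrap(vars(hap)) +
    coord_flip() +
    labs(x = NULL, y = expression(log[10] * "(values)"), fill = "Generation")
}

# Hypothetical helper: pairwise comparisons between generations, computed
# separately for each chromosome (assumption: Wilcoxon rank-sum tests with
# Holm adjustment; exact = FALSE because the values contain ties).
pairwise_gen_tests <- function(df) {
  lapply(split(df, df$chr), function(d) {
    stats::pairwise.wilcox.test(d$values, d$gen,
                                p.adjust.method = "holm", exact = FALSE)
  })
}
```

Calling plot_gen_within_chr(tmp) gives the dodged boxes without an interaction() term, and pairwise_gen_tests(tmp) returns one table of adjusted p-values per chromosome, but neither draws the significance brackets on the plot, which is the part I am missing.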
