homework6

I chose to simulate data based on my ongoing research project.
If the biological hypothesis were found to be true. I am investigating whether a handful of insulin binding proteins have a role in C. elegans associative learning ability. The data would take the form of four groups (two strains x two treatments), both following a normal distribution of indices betweein -1 and 1. The strains are n2 (wild type) and zig-4. The conditions are 25mM NaCl (low salt) and 100mM NaCl (high salt).
I used a for loop to create my data set that way the four sets of numbers would be in order for displaying data. I combined it into a data frame outside the loop because I needed some variables to remain in their current forms for example “exp” which I needed to scale according to sample size.
I did not do a posthoc test yet only when playing with parameters of the data. Simple anova to make sure the data datas!

exp <- c("n2high","zig4high","n2low","zig4low") #experimental labels, just as a group of four to be used later 
for (j in 1:5) {
  for (i in 1:4){#four total sets of data, each unique
    if(i %% 2==1){#two must be closer to zero and two closer to one to simulate significant results
      varname <- paste0("x_",i) #creating a unique name for each vector
      assign(varname,rnorm(50,mean=runif(1,min=0.4,max=1),sd=runif(1,min=0.05,max=0.5)))#the data has a normal distribution and each new vector has a random mean in a desired range and a random standard deviation within a reasonable range
      } else {
        varname <- paste0("x_",i)
        assign(varname,rnorm(50,mean=runif(1,min=0,max=0.6),sd=runif(1,min=0.05,max=0.5)))
      }
  if (i==4){ #two of the vectors (low salt treatment) have to be negative so I waited till the last loop and multiplied the vectors by -1
    x_3 <- x_3*-1
    x_4 <- x_4*-1
    experiment <- rep(exp, each = 50) #scaling experimental labels to fit the amount of data
worm_index <- c(x_1,x_2,x_3,x_4) # combining data into a single vector
fakewormdata <- data.frame(worm_index,experiment) #data frame of whole dataset (just preference index and experimental labels)
anova1 <- aov(fakewormdata$worm_index~ fakewormdata$experiment)

sum <- summary(anova1) #quick initial anova before adjusting data parameters
cat("anova number",j,"\n")
print(sum) 
cat("\n\n\n")
  }
  }
}

## anova number 1 
##                          Df Sum Sq Mean Sq F value              Pr(>F)    
## fakewormdata$experiment   3  46.58  15.528     285 <0.0000000000000002 ***
## Residuals               196  10.68   0.054                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## 
## anova number 2 
##                          Df Sum Sq Mean Sq F value              Pr(>F)    
## fakewormdata$experiment   3  72.89  24.296   212.1 <0.0000000000000002 ***
## Residuals               196  22.46   0.115                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## 
## anova number 3 
##                          Df Sum Sq Mean Sq F value              Pr(>F)    
## fakewormdata$experiment   3  57.65  19.215   260.1 <0.0000000000000002 ***
## Residuals               196  14.48   0.074                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## 
## anova number 4 
##                          Df Sum Sq Mean Sq F value              Pr(>F)    
## fakewormdata$experiment   3  38.24  12.746   208.8 <0.0000000000000002 ***
## Residuals               196  11.97   0.061                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## 
## anova number 5 
##                          Df Sum Sq Mean Sq F value              Pr(>F)    
## fakewormdata$experiment   3  71.01  23.669   161.8 <0.0000000000000002 ***
## Residuals               196  28.67   0.146                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

*note: without the post hoc test all come back significant because half the data is below 0 and half is above. In the post hoc I will compare the groups that are on the same side of 0 with each other

Below I just assembled the data created above into a data frame

exp <- c("n2high","zig4high","n2low","zig4low")
experiment <- rep(exp, each = 50)
worm_index <- c(x_1,x_2,x_3,x_4)
fakewormdata <- data.frame(worm_index,experiment)

5/6. I used a for loop to adjust the sample size to explore how significance would change with sample size. I did this by subsetting the data based on the counter value and reassembling the data frame with the subsetted data. Then I ran an anova and a Tukey post-hoc test to extract the important comparisons from the statistical analysis.

for (j in 2:5) {
  y_1 <- x_1[1:(j*10)] #resampling the data
  y_2 <- x_2[1:(j*10)]
  y_3 <- x_3[1:(j*10)]
  y_4 <- x_4[1:(j*10)]
  index <- c(y_1,y_2,y_3,y_4) #recombining the data
  exp <- rep(exp,each=(j*10))
  data <- data.frame(index,exp) #turning it into a data frame
  anova <- aov(data$index ~ data$exp) 
  turkey <- TukeyHSD(anova) #post hoc test 
  high <- turkey$data[2,4] #these two lines isolate the desired parts of the post hoc test, one comparing the high salt conditioned groups to each other and one comparing the low salt conditioned ones. 
  low <- turkey$data[5,4]
  cat("When n =",j*40,"and each strain x treatment has a random standard deviation between 0.05 and 0.5: \n Analysis: \n zig-4 (mean =",mean(y_2),") x wild type (mean =",mean(y_1),"), with high salt conditioning: \n p =",high,"\n Analysis: \n zig-4 (mean =",mean(y_4),") x wild type (mean =",mean(y_3),"), with low salt conditioning: \n p =",low,"\n\n\n")
}

## When n = 80 and each strain x treatment has a random standard deviation between 0.05 and 0.5: 
##  Analysis: 
##  zig-4 (mean = 0.2079407 ) x wild type (mean = 0.8984498 ), with high salt conditioning: 
##  p = 0.000009808103 
##  Analysis: 
##  zig-4 (mean = -0.1756984 ) x wild type (mean = -0.7844878 ), with low salt conditioning: 
##  p = 0.0001036103 
## 
## 
## When n = 120 and each strain x treatment has a random standard deviation between 0.05 and 0.5: 
##  Analysis: 
##  zig-4 (mean = 0.1608291 ) x wild type (mean = 0.9156905 ), with high salt conditioning: 
##  p = 1 
##  Analysis: 
##  zig-4 (mean = -0.1877919 ) x wild type (mean = -0.8486365 ), with low salt conditioning: 
##  p = 1 
## 
## 
## When n = 160 and each strain x treatment has a random standard deviation between 0.05 and 0.5: 
##  Analysis: 
##  zig-4 (mean = 0.1751816 ) x wild type (mean = 0.957495 ), with high salt conditioning: 
##  p = 1 
##  Analysis: 
##  zig-4 (mean = -0.1735121 ) x wild type (mean = -0.7591877 ), with low salt conditioning: 
##  p = 1 
## 
## 
## When n = 200 and each strain x treatment has a random standard deviation between 0.05 and 0.5: 
##  Analysis: 
##  zig-4 (mean = 0.1911334 ) x wild type (mean = 0.9550838 ), with high salt conditioning: 
##  p = 1 
##  Analysis: 
##  zig-4 (mean = -0.1723003 ) x wild type (mean = -0.6806365 ), with low salt conditioning: 
##  p = 1

homework6

Nick Maurizi

2025-03-06