###Quality trim all fastq.gz files using [Trimmomatic (v0.30)](http://www.usadellab.org/cms/?page=trimmomatic)

####Code explanation of for loop below:
1. ```%%bash``` tells Jupyter to run this cell in bash.
2. ```for file in /Volumes/nightingales/C_virginica/2112_lane1_[^N]*``` starts a for loop over all files whose names begin with ```2112_lane1_``` and whose next character is not the letter "N".
3. ```do``` tells the for loop what to do with each of the files.
4. ```newname=${file##*/}``` takes the value of the ```$file``` variable (the full path of the current file, e.g. ```/Volumes/nightingales/C_virginica/2112_lane1_ACAGTG_L001_R1_001.fastq.gz```) and removes the longest match of the pattern ```*/``` from the beginning of that value (```##``` is the bash parameter expansion operator for longest-prefix removal). The result, which is just the file name without its directory path, is stored in the ```newname``` variable.
5. This line runs Trimmomatic with the following arguments, in order:
 1. single end reads (```SE```),
 2. number of threads (```-threads 16```),
 3. type of quality score (```-phred33```),
 4. input file location (```"$file"```),
 5. output file name/location (```/Volumes/Data/Sam/scratch/20150414_trimmed_$newname```),
 6. single end Illumina TruSeq adaptor trimming (```ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10```),
 7. cut bases from the beginning of the read while their quality is below 3 (```LEADING:3```),
 8. cut bases from the end of the read while their quality is below 3 (```TRAILING:3```),
 9. cut the read once the average quality within a 4 base window falls below 15 (```SLIDINGWINDOW:4:15```).
6. ```done``` closes for loop.
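
The file matching and renaming steps above can be tried without Trimmomatic. The following is a minimal dry-run sketch: it works in a temporary directory, and the two file names are stand-ins (the ```NoIndex``` name in particular is a hypothetical example of a file the pattern should skip). It uses ```[!N]```, the POSIX spelling of the negated bracket expression, which bash treats the same as ```[^N]```.

```bash
# Scratch directory with stand-in file names, so no real data is touched.
tmpdir=$(mktemp -d)
touch "$tmpdir/2112_lane1_ACAGTG_L001_R1_001.fastq.gz" \
      "$tmpdir/2112_lane1_NoIndex_L001_R1_001.fastq.gz"   # hypothetical name

outputs=""
for file in "$tmpdir"/2112_lane1_[!N]*
do
    # Strip everything up to and including the last "/" (longest prefix match).
    newname=${file##*/}
    outputs="$outputs 20150414_trimmed_$newname"
    # Print what would be trimmed instead of calling Trimmomatic.
    echo "$file -> 20150414_trimmed_$newname"
done

rm -r "$tmpdir"
```

Only the ```ACAGTG``` file is listed, because the character after ```2112_lane1_``` in the other name is "N".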

In [2]:
%%bash
for file in /Volumes/nightingales/C_virginica/2112_lane1_[^N]*
do
newname=${file##*/}
java -jar /usr/local/bioinformatics/Trimmomatic-0.30/trimmomatic-0.30.jar SE -threads 16 -phred33 "$file" /Volumes/Data/Sam/scratch/20150414_trimmed_$newname ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15;
done

TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_virginica/2112_lane1_ACAGTG_L001_R1_001.fastq.gz /Volumes/Data/Sam/scratch/20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 11370712 (71.07%) Dropped: 4629288 (28.93%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_virginica/2112_lane1_ACAGTG_L001_R1_002.fastq.gz /Volumes/Data/Sam/scratch/20150414_trimmed_2112_lane1_ACAGTG_L001_R1_002.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa

###FastQC on all trimmed files using [FastQC (v0.11.2)](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

In [23]:
%%bash
for file in /Volumes/Data/Sam/scratch/20150414_trimmed_2112*; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done

Analysis complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_002.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_ATCACG_L001_R1_001.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_ATCACG_L001_R1_002.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_ATCACG_L001_R1_003.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_CAGATC_L001_R1_001.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_CAGATC_L001_R1_002.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_CAGATC_L001_R1_003.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_GCCAAT_L001_R1_001.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_GCCAAT_L001_R1_002.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_TGACCA_L001_R1_001.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_TTAGGC_L001_R1_001.fastq.gz
Analysis complete for 20150414_trimmed_2112_lane1_TTAGGC_L001_R1

Started analysis of 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 5% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 10% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 15% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 20% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 25% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 30% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 35% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 40% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 45% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 50% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 55% complete for 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz
Approx 60% complete for 20150414_trimmed_2112

###Unzip all FASTQC .zip files

In [24]:
cd /Volumes/Eagle/Arabidopsis/

/Volumes/Eagle/Arabidopsis


In [25]:
%%bash
for file in 20150414_trimmed_2112_lane1_*.zip; do unzip "$file"; done

Archive: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc.zip
 creating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/
 creating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Icons/
 creating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Images/
 inflating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Icons/fastqc_icon.png 
 inflating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Icons/error.png 
 inflating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Icons/tick.png 
 inflating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/summary.txt 
 inflating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Images/per_base_quality.png 
 inflating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Images/per_tile_quality.png 
 inflating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Images/per_sequence_quality.png 
 inflating: 20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc/Images/per_base_sequence_content.p
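
Each unzipped folder contains a ```summary.txt``` with one tab-separated line per FastQC module (status, module name, file name). As a rough sketch of how those files could be screened for problems, the example below builds a mock ```summary.txt``` in a temporary directory (the folder name, file name, and statuses are made up for illustration, not results from this run) and lists every module that is not ```PASS```.

```bash
# Mock FastQC output folder; the statuses below are illustrative only.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sample_fastqc"
printf 'PASS\tBasic Statistics\tsample.fastq.gz\n'          >  "$tmpdir/sample_fastqc/summary.txt"
printf 'WARN\tPer base sequence content\tsample.fastq.gz\n' >> "$tmpdir/sample_fastqc/summary.txt"
printf 'FAIL\tPer tile sequence quality\tsample.fastq.gz\n' >> "$tmpdir/sample_fastqc/summary.txt"

# Keep only the lines whose first tab-separated field is not PASS.
flagged=$(awk -F'\t' '$1 != "PASS"' "$tmpdir"/*_fastqc/summary.txt)
echo "$flagged"

rm -r "$tmpdir"
```

Pointing the ```awk``` command at ```/Volumes/Eagle/Arabidopsis/*_fastqc/summary.txt``` would apply the same filter to the real reports.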

###Concatenate groups of sequences into single files

In [3]:
cd /Volumes/Data/Sam/scratch/

/Volumes/Data/Sam/scratch


####HB2 	25,000ppm oil 	Index - ATCACG

In [4]:
%%bash
#gunzips all matching files in folder and appends the data to a single file:
#20150414_trimmed_2112_lane1_HB2_Oil_25000ppm_ATCACG.fastq
for file in 20150414_trimmed_2112_lane1_ATCACG*
do
gunzip -c "$file" >> 20150414_trimmed_2112_lane1_HB2_Oil_25000ppm_ATCACG.fastq
done

In [5]:
%%bash
#Gzip file
gzip 20150414_trimmed_2112_lane1_HB2_Oil_25000ppm_ATCACG.fastq
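
Two details of this approach are worth illustrating. Because ```>>``` appends, running the cell a second time would duplicate every read in the combined file. Also, gzip files can be joined directly with ```cat```, which gives the same decompressed content without the separate gzip step. The sketch below shows both on two tiny made-up FASTQ files in a temporary directory (the ```part_``` and ```merged``` names are hypothetical).

```bash
tmpdir=$(mktemp -d)
# Two one-read FASTQ files, gzipped (made-up reads for illustration).
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$tmpdir/part_001.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip > "$tmpdir/part_002.fastq.gz"

# Approach used in this notebook, with the target emptied first so that
# rerunning the loop cannot append the same reads twice.
merged="$tmpdir/merged.fastq"
: > "$merged"
for file in "$tmpdir"/part_*.fastq.gz
do
    gunzip -c "$file" >> "$merged"
done

# Alternative: concatenate the compressed files directly.
cat "$tmpdir"/part_*.fastq.gz > "$tmpdir/merged_cat.fastq.gz"

# A FASTQ record is 4 lines, so reads = lines / 4.
reads=$(( $(wc -l < "$merged") / 4 ))

# Check that both approaches decompress to identical content.
if gunzip -c "$tmpdir/merged_cat.fastq.gz" | cmp -s - "$merged"
then same=yes
else same=no
fi
echo "reads: $reads, identical: $same"

rm -r "$tmpdir"
```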

####HB16 	25,000ppm oil 	Index - TTAGGC

In [20]:
%%bash
#gunzips all matching files in folder and appends the data to a single file:
#20150414_trimmed_2112_lane1_HB16_Oil_25000ppm_TTAGGC.fastq
for file in 20150414_trimmed_2112_lane1_TTAGGC*
do
gunzip -c "$file" >> 20150414_trimmed_2112_lane1_HB16_Oil_25000ppm_TTAGGC.fastq
done

In [21]:
%%bash
#Gzip file
gzip 20150414_trimmed_2112_lane1_HB16_Oil_25000ppm_TTAGGC.fastq

####HB30 	25,000ppm oil 	Index - TGACCA

In [8]:
%%bash
#gunzips all matching files in folder and appends the data to a single file:
#20150414_trimmed_2112_lane1_HB30_Oil_25000ppm_TGACCA.fastq
for file in 20150414_trimmed_2112_lane1_TGACCA*
do
gunzip -c "$file" >> 20150414_trimmed_2112_lane1_HB30_Oil_25000ppm_TGACCA.fastq
done

In [9]:
%%bash
#Gzip file
gzip 20150414_trimmed_2112_lane1_HB30_Oil_25000ppm_TGACCA.fastq

####NB3 	No oil 	Index - ACAGTG

In [10]:
%%bash
#gunzips all matching files in folder and appends the data to a single file:
#20150414_trimmed_2112_lane1_NB3_NoOil_ACAGTG.fastq
for file in 20150414_trimmed_2112_lane1_ACAGTG*
do
gunzip -c "$file" >> 20150414_trimmed_2112_lane1_NB3_NoOil_ACAGTG.fastq
done

In [11]:
%%bash
#Gzip file
gzip 20150414_trimmed_2112_lane1_NB3_NoOil_ACAGTG.fastq

####NB6 	No oil 	Index - GCCAAT

In [12]:
%%bash
#gunzips all matching files in folder and appends the data to a single file:
#20150414_trimmed_2112_lane1_NB6_NoOil_GCCAAT.fastq
for file in 20150414_trimmed_2112_lane1_GCCAAT*
do
gunzip -c "$file" >> 20150414_trimmed_2112_lane1_NB6_NoOil_GCCAAT.fastq
done

In [13]:
%%bash
#Gzip file
gzip 20150414_trimmed_2112_lane1_NB6_NoOil_GCCAAT.fastq

####NB11 	No oil 	Index - CAGATC

In [14]:
%%bash
#gunzips all matching files in folder and appends the data to a single file:
#20150414_trimmed_2112_lane1_NB11_NoOil_CAGATC.fastq
for file in 20150414_trimmed_2112_lane1_CAGATC*
do
gunzip -c "$file" >> 20150414_trimmed_2112_lane1_NB11_NoOil_CAGATC.fastq
done

In [15]:
%%bash
#Gzip file
gzip 20150414_trimmed_2112_lane1_NB11_NoOil_CAGATC.fastq
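
The six concatenation cells above differ only in the index sequence and sample label, so they could be driven from one table. The following is a dry-run sketch under that assumption: it only prints the commands it would run, and the ```prefix```, ```planned```, and ```out``` variable names are introduced here for illustration.

```bash
prefix="20150414_trimmed_2112_lane1"
planned=""

# Each line of the table below is: index sequence, sample label.
while read -r index sample
do
    out="${prefix}_${sample}_${index}.fastq"
    planned="$planned$out
"
    # Dry run: print the commands rather than executing them.
    echo "gunzip -c ${prefix}_${index}* >> $out && gzip $out"
done <<'EOF'
ATCACG HB2_Oil_25000ppm
TTAGGC HB16_Oil_25000ppm
TGACCA HB30_Oil_25000ppm
ACAGTG NB3_NoOil
GCCAAT NB6_NoOil
CAGATC NB11_NoOil
EOF
```

Replacing the ```echo``` with the real loop body would reproduce the cells above in a single pass.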

###Copy files to Eagle for web-based access

In [19]:
%%bash
for file in 2015*e1_[NH]B*; do cp "$file" /Volumes/Eagle/Arabidopsis/; done
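
The glob ```2015*e1_[NH]B*``` matches only the combined files, since their names continue with ```NB``` or ```HB``` right after ```lane1_```. As a hedged sketch of how the copy could also be verified, the example below repeats the loop on placeholder files in temporary directories (the ```src``` and ```dest``` directories and the file contents are stand-ins) and uses ```cmp``` to confirm each copy is byte-identical.

```bash
src=$(mktemp -d)
dest=$(mktemp -d)
# Placeholder files: one combined sample file and one per-chunk file.
printf 'demo\n' > "$src/20150414_trimmed_2112_lane1_NB3_NoOil_ACAGTG.fastq.gz"
printf 'demo\n' > "$src/20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001.fastq.gz"

copied=0
for file in "$src"/2015*e1_[NH]B*
do
    # Count the file only if the copy succeeds and matches the original.
    cp "$file" "$dest/" && cmp -s "$file" "$dest/${file##*/}" && copied=$((copied + 1))
done
echo "verified copies: $copied"

rm -r "$src" "$dest"
```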