9 fosmid assemblies (independent), consensus sequences (573331 total), run through cd-hit-est. No initial limit on coverage or size.


robertsmac:cd-hit-v4.5.4-2011-03-07 sr320$ ./cd-hit-est -i /Volumes/Bay4\ scratch/BIGFOSMID.fa -o /Volumes/Bay4\ scratch/BIGFOSMID_cdhit -M 2500
================================================================
Program: CD-HIT, V4.5.4, Feb 23 2012, 11:03:06
Command: ./cd-hit-est -i /Volumes/Bay4 scratch/BIGFOSMID.fa -o
         /Volumes/Bay4 scratch/BIGFOSMID_cdhit -M 2500

Started: Sat Feb 25 10:04:34 2012
================================================================
                            Output                             
----------------------------------------------------------------
total seq: 573331
longest and shortest : 45747 and 876
Total letters: 1520457838
Sequences have been sorted

Approximated minimal memory consumption:
Sequence        : 1592M
Buffer          : 1 X 28M = 28M
Table           : 1 X 25M = 25M
Miscellaneous   : 11M
Total           : 1659M

Table limit with the given memory limit:
Max number of representatives: 4194304
Max number of word counting entries: 105119873

comparing sequences from          0  to       8048
comparing sequences from       8048  to      33677
..........    10000  finished       6565  clusters
..........    30000  finished      17621  clusters
comparing sequences from      30470  to      78377
..........    50000  finished      27961  clusters
..........    60000  finished      32752  clusters
comparing sequences from      66272  to     147166
..........    80000  finished      42042  clusters
..........   100000  finished      50760  clusters
..........   120000  finished      59254  clusters
comparing sequences from     121616  to     260978
..........   130000  finished      63410  clusters
..........   140000  finished      67390  clusters
..........   170000  finished      79059  clusters
..........   180000  finished      82911  clusters
..........   200000  finished      90239  clusters
comparing sequences from     208854  to     475563
..........   240000  finished     104516  clusters
..........   250000  finished     107875  clusters
..........   270000  finished     114614  clusters
..........   290000  finished     121203  clusters
..........   320000  finished     130854  clusters
..........   350000  finished     140267  clusters
comparing sequences from     357117  to     573331
..........   360000  finished     143323  clusters
..........   370000  finished     146454  clusters
..........   380000  finished     149489  clusters
..........   390000  finished     152532  clusters
..........   450000  finished     170317  clusters
..........   460000  finished     173215  clusters
..........   500000  finished     184701  clusters
..........   540000  finished     196292  clusters
..........   560000  finished     201868  clusters
......
   573331  finished     205903  clusters

Apprixmated maximum memory consumption: 2500M
writing new database
writing clustering information
program completed !

Total CPU time 27885
robertsmac:cd-hit-v4.5.4-2011-03-07 sr320$ 
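The "writing clustering information" step leaves a `.clstr` file next to the output (here presumably `BIGFOSMID_cdhit.clstr`). Below is a minimal sketch for summarizing it, assuming the usual CD-HIT layout where each cluster starts with a `>Cluster N` line followed by one line per member sequence. The function names are hypothetical. If the file is intact, the totals should match the 573331 sequences and 205903 clusters in the log above.

```python
"""Summarize a CD-HIT .clstr file: clusters, singletons, largest cluster.

Assumes the standard CD-HIT layout: a '>Cluster N' line starts each
cluster and every following non-empty line is one member sequence.
"""
import sys
from collections import Counter


def cluster_sizes(lines):
    """Return the number of member sequences in each cluster, in file order."""
    sizes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            sizes.append(0)  # start counting a new cluster
        elif sizes:
            sizes[-1] += 1  # member line belongs to the current cluster
    return sizes


def summarize(sizes):
    """Compute simple summary statistics from a list of cluster sizes."""
    return {
        "clusters": len(sizes),
        "sequences": sum(sizes),
        "singletons": sum(1 for s in sizes if s == 1),
        "largest": max(sizes) if sizes else 0,
        "size_histogram": Counter(sizes),  # cluster size -> number of clusters
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python clstr_summary.py BIGFOSMID_cdhit.clstr
    with open(sys.argv[1]) as handle:
        stats = summarize(cluster_sizes(handle))
    print(f"{stats['sequences']} sequences in {stats['clusters']} clusters")
    print(f"singletons: {stats['singletons']}, largest cluster: {stats['largest']}")
```

The singleton count is a quick way to see how much of the consensus set did not collapse with anything else at the default identity threshold.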


--
Need to give the sequences unique numeric prefixes.
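The prefixing itself was done in Galaxy. For reference, here is a minimal local sketch, assuming the goal is to make every FASTA header unique by prepending a running number. The `N_` prefix format and the helper names are hypothetical, not what Galaxy produced. Numbering continues across all input files, so the 9 assemblies can be passed together. Putting the number first keeps names distinct even if a downstream tool shortens long headers.

```python
"""Prepend a unique running number to every FASTA header.

Hypothetical helper: the '>N_name' format is an assumption, not the
prefix scheme used in Galaxy.
"""
import sys


def add_numeric_prefixes(lines, start=1):
    """Yield FASTA lines with '>name' rewritten as '>N_name', N counting up from start."""
    n = start
    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # avoid gluing the next file's first header onto this line
        if line.startswith(">"):
            yield f">{n}_{line[1:]}"
            n += 1
        else:
            yield line  # sequence lines pass through unchanged


def read_all(paths):
    """Yield lines from several files in order, so numbering spans all of them."""
    for path in paths:
        with open(path) as handle:
            yield from handle


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python add_prefixes.py asm1.fa asm2.fa ... > combined.fa
    sys.stdout.writelines(add_numeric_prefixes(read_all(sys.argv[1:])))
```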

Done in Galaxy.


FASTA FILE: http://main.g2.bx.psu.edu/datasets/83ae21f9d999536d/display?username=sr320&to_ext=fasta&slug=repository   (618MB)
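A quick sanity check on a downloaded FASTA is to recompute the same summary lines CD-HIT prints at the top of its log (sequence count, longest and shortest, total letters). A minimal sketch follows; `fasta_stats` is a hypothetical name, and record length is taken as the sum of sequence-line lengths after stripping whitespace.

```python
"""Report sequence count, longest/shortest length and total letters for a FASTA
file, mirroring the summary lines at the top of the CD-HIT log."""
import sys


def fasta_stats(lines):
    """Return (count, longest, shortest, total_letters) for FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)  # blank lines add 0
    if current is not None:
        lengths.append(current)  # close the last record
    if not lengths:
        return 0, 0, 0, 0
    return len(lengths), max(lengths), min(lengths), sum(lengths)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fasta_stats.py downloaded.fasta
    with open(sys.argv[1]) as handle:
        count, longest, shortest, total = fasta_stats(handle)
    print(f"total seq: {count}")
    print(f"longest and shortest : {longest} and {shortest}")
    print(f"Total letters: {total}")
```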



-- Should move to CLC for BLAST searches.