distributed_computation [2015/04/09 17:52]
mganzeboom created
distributed_computation [2015/04/30 07:55] (current)
mganzeboom
====== SPRAAK Distributed computing ======
SPRAAK uses different approaches to distribute the computation for training and for evaluating. See the corresponding sections below.
  
===== Evaluating (i.e. recognition) experiments =====
To distribute the computation of, for example, the recognition of a large corpus of speech recordings, you can use the [[http://www.spraak.org/documentation/doxygen/doc/html/spr__scoreres_8c.html#programs__spr_scoreres|spr_scoreres]] program.\\
This program recalculates the totals of multiple .RES result files that SPRAAK produces in a recognition experiment. This lets you divide the corpus manually into multiple parts (for example, as many parts as you have cores in your machine, or as many machines as you have available), run a recognition experiment for each part on its own core or machine, and merge the resulting .RES files into one result file with ''spr_scoreres''. To merge multiple .RES files:
  - Concatenate the contents of all .RES files into a single text file (e.g. with the ''cat'' command on Linux).
  - Use ''spr_scoreres'' to recalculate the scores in the merged file: ''spr_scoreres -PAR -c <path to .cor file containing the full corpus> -r <path to the merged .RES file> -nr <path to output file> -omit "<tags/words to ignore or omit in the output file (e.g. <s>)>"''
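The split-and-merge workflow can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not part of SPRAAK: it assumes the corpus can be treated as a plain list of utterance entries (check the actual layout of your .cor file, including any header, before splitting it), and the helper names ''split_into_parts'', ''concat_res_files'' and ''scoreres_command'', as well as all file names, are hypothetical. Only the ''spr_scoreres'' flags are taken from the command in the steps above; the sketch builds that command line but does not run it.

```python
import shlex
import tempfile
from pathlib import Path
from typing import List, Sequence


def split_into_parts(items: Sequence[str], n_parts: int) -> List[List[str]]:
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(items), n_parts)
    parts: List[List[str]] = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts take one additional item each.
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


def concat_res_files(res_files: Sequence[Path], merged: Path) -> None:
    """Write the bytes of every .RES file, in order, to one file (like `cat`)."""
    with merged.open("wb") as out:
        for res in res_files:
            data = res.read_bytes()
            out.write(data)
            # Unlike plain `cat`, add a newline if a file lacks a trailing one,
            # so the next file's first line is not glued onto it.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")


def scoreres_command(cor_file: str, merged_res: str, out_file: str, omit: str) -> List[str]:
    """Argument list for spr_scoreres, using only the flags shown in the text."""
    return ["spr_scoreres", "-PAR", "-c", cor_file, "-r", merged_res,
            "-nr", out_file, "-omit", omit]


if __name__ == "__main__":
    # Hypothetical utterance IDs standing in for the entries of a corpus file.
    utterances = [f"utt{i:02d}" for i in range(10)]
    for i, part in enumerate(split_into_parts(utterances, 4)):
        print(f"part {i}: {part}")

    # Merge two small hypothetical result files in a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        res_files = [Path(tmp) / f"part_{i}.RES" for i in range(2)]
        res_files[0].write_bytes(b"result lines of part 0\n")
        res_files[1].write_bytes(b"result lines of part 1\n")
        merged = Path(tmp) / "merged.RES"
        concat_res_files(res_files, merged)
        print(merged.read_text())

    # Hypothetical paths; pass the list to subprocess.run() to execute it.
    print(shlex.join(scoreres_command("full.cor", "merged.RES", "total.RES", "<s>")))
```

The even, contiguous split is just one simple choice. The requirement implied by the workflow is that every utterance ends up in exactly one part, so that the merged .RES file covers the full corpus exactly once.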
  
===== Optimize SPRAAK for N cores =====
distributed_computation.1428594777.txt.gz · Last modified: 2015/04/09 17:52 by mganzeboom