paper/poster/eamt2011_poster.tex - operations/debs/contenttranslation/apertium-af-nl - Gitiles

 \documentclass[a0,landscape]{a0poster}
 %\documentclass[a0,draft]{a0poster}
 % With the word preview in the square bracket, your poster will be a4
 % size, handy for previewing or submit to x4u queue for A0.
 % with preview removed, you get a file the right size for the HP
 % large-format printer. This is a bit bigger than A0, the width is
 % exactly one imperial yard.

 % This is so we can have multiple columns of text side-by-side
 \usepackage[utf8]{inputenc}
 \usepackage{graphicx}
 \usepackage{multicol}
 \usepackage{helvet}
 \usepackage{sectsty}
 %\allsectionsfont{\usefont{OT1}{phv}{bc}{n}\selectfont}
 \allsectionsfont{\sffamily} \subsubsectionfont{\sffamily\large}
 \columnsep=100pt

 % This is the thickness of the black line between the columns of text
 \columnseprule=3pt

 % this package gives you coloured text and various other simple
 % graphics hacks.
 % Details in /usr/local/teTeX/texmf/doc/generic/pstricks/*
 \usepackage{pstricks}

 %\psset{unit=1cm}

 \usepackage{times}
 \usepackage{url}

 % Define names for some colours
 \newcmykcolor{Inblue}{1.00 0.37 0.00 0.00}
 \newcmykcolor{Inred}{0.00 1.00 0.63 0.00}
 \newrgbcolor{Inmaroon}{0.4 0.0 0.4}
 \newrgbcolor{darkblue}{0.0 0.0 0.5}

 % Colour used for figure captions. Change to suit your own preference
 \newrgbcolor{captcolor}{0.0 .5 0.0}

 \begin{document}

 % This is black magic to put a title at the top of the page.
 % Thanks to Mark ``the magician'' Filipiak, although I have re-done a
 % lot of the code here -- hopefully it is more robust.

 % Make a box 0.55* width of poster for title and names
 % If your title is long, replace 0.55 with something bigger and
 % make the box for the address smaller by the same amount.
 \begin{minipage}[b]{0.75\linewidth}
 \veryHuge \bf
 \textsf{Rapid rule-based machine translation between Dutch and Afrikaans}
 \\[1cm]
 \huge \bf Pim Otte, Francis M. Tyers,\\
 \huge \rm Mendelcollege, Universitat d'Alacant
 \end{minipage}
 % Make box 0.35 X width of poster for address
 \begin{minipage}[b]{0.25\linewidth}
 \Large
 Contact: [email protected], [email protected] \\

 \end{minipage}
 %% Make box 0.05*width of poster for EU Shield graphic
 %\begin{minipage}[b]{0.05\linewidth}
 %\includegraphics[width=10cm]{logo_ofis_liv.pdf}
 %%\includegraphics{uitlogo}
 %\end{minipage}
 %\vspace{0.3cm}
 %\hrule
 %vspace{0.3cm}
 %\begin{minipage}[b]{\linewidth}
 %\hrule
 %{\small \url{http://www.ofis-bzh.org/bzh/ressources_linguistiques/index-troerofis.php}}
 %\end{minipage}
 %

 % This is how many columns your poster will be broken into.
 \begin{multicols}{4}

 % This is the width your figures will be scaled to. Change this if you
 % change the number of columns.
 \newlength{\figwidth}
 \setlength{\figwidth}{20cm}

 % Set to half of figwidth. Used for putting two figs side by side
 \newlength{\fighalfwidth}
 \setlength{\fighalfwidth}{10cm}

 % Can't use the figure environment within multicolumns. Set up our own
 % counter for figures.
 \newcounter{figscount}

 \section{Introduction}

 \noindent
 {\bf Dutch} is a West-Germanic language spoken by nearly 23 million people, mostly from the
 Netherlands and Flanders. {\bf Afrikaans} is spoken by at least 5 million people, mainly in
 South-Africa, but also in Namibia. Afrikaans also belongs to the West-Germanic language family and originates from the language
 spoken by the the Dutch colonists of the Cape Colony. In 1925 Afrikaans replaced Dutch as
 an official language in South-Africa, to be the joint official language together with English.
 On this poster we will describe the development of {\small {\tt apertium-af-nl}}, a bi-directional Afrikaans
 and Dutch machine-translation system

 \begin{center}
 \begin{minipage}[b]{26cm}
 \includegraphics[width=260mm]{mapdutchworld.png}
 \end{minipage}\\
 \textbf{Figure 1:} Where Afrikaans or Dutch is spoken as official language.
 \vspace{0.3cm}
 \end{center}

 %\vspace{0.5cm}

 \section{Method}

 \noindent
 The system is based on {\bf Apertium} (\url{http://www.apertium.org/}), a free/open-source rule-based
 machine translation platform. To create the language pair we used an existing resource and created
 several new ones. \\

 \begin{center}
 \begin{minipage}[b]{26cm}
 \includegraphics[width=260mm]{apertium2.pdf}
 \end{minipage}\\
 \textbf{Figure 2:} Modules of the Apertium translation system
 \vspace{0.3cm}
 \end{center}


 \vspace{0.5cm}

 \subsection{Existing resources}

 \noindent
 We reused the morphological transducer for Afrikaans, created during a currently dormant English-Afrikaans
 Machine Translation project.\\

 \subsection{Resources created}

 \subsubsection{Dutch morphological transducer}
 A new Dutch morphological transducer was created, because existing ones were unsuitable for several reasons:

 \begin{itemize}
  \item Non-free licence
  \item Not bidirectional (only analysis, not generation)
  \item Tagset different from Afrikaans transducer
 \end{itemize}

 %\vspace{0.5cm}

 \noindent
 The open categories (nouns, verbs, adjectives, adverbs) for the Dutch
 morphological analyser were extracted semi-automatically from
 Wiktionary, (\url{http://www.wiktionary.org}), which has entries like in {\bf Figure 3}. The resulting analyser entries were all hand checked.
 Closed categories were added by hand based on the grammar of Dutch of Shetter and Ham.
 \vspace{0.5cm}

 \begin{center}
 \begin{minipage}[b]{26cm}
 \includegraphics[width=260mm]{hoofdstad.png}
 \end{minipage}\\
 \textbf{Figure 3:} Example of Wiktionary entry: hoofdstad
 \label{wikt1}
 \vspace{0.3cm}
 \end{center}


 \vspace{0.5cm}

 \subsubsection{Bilingual dictionary}

 \noindent
 The bilingual dictionary was developed by adding exact matches, extracting proper names
 from Wikipedia, adding cognates, words which often have a small spelling difference and adding
 some entries by hand, such as closed categories and frequently missing words.

 \subsection{Transfer rules}

 \subsubsection{Afrikaans to Dutch}
 An example of an Afrikaans to Dutch transfer rule, which handles the negation scope marker, can be seen in
 {\bf Figure 4}

 \begin{center}
 \begin{minipage}[b]{25cm}
 \begin{scriptsize}
 \begin{verbatim}
     <rule comment="REGLA: nie">
       <pattern>
         <pattern-item n="nie"/>
       </pattern>
       <action>
         <choose>
           <when>
             <test>
               <equal>
                 <var n="seen_neg"/>
                 <lit v="true"/>
               </equal>
             </test>
           </when>
           <otherwise>
             <out>
               <chunk name="nie">
                 <tags>
                   <tag><lit-tag v="ADV"/></tag>
                 </tags>
                 <lu>
                   <clip pos="1" side="tl" part="lemh"/>
                   <clip pos="1" side="tl" part="a_adv"/>
                   <clip pos="1" side="tl" part="lemq"/>
                 </lu>
               </chunk>
             </out>
           </otherwise>
         </choose>
         <let>
          <var n="seen_neg"/>
          <lit v="true"/>
         </let>
       </action>
     </rule>
 \end{verbatim}
 \end{scriptsize}
 \end{minipage}\\
 ~\\
 \textbf{Figure 4:} Transfer rule for handling negation.
 \vspace{0.5cm}
 \end{center}

 \subsubsection{Compound words}
 In both Afrikaans and Dutch words can combine very productively into compounds.
 {\small {\tt apertium-af-nl}} can handle compounds that consist of one or more nouns,
 such as the one in {\bf Figure 5}. \\

 \begin{center}
 \begin{minipage}[b]{26cm}
 \begin{small}
 \begin{verbatim}
 lugmag

 ^lug<n><sg><cmp>+mag<n><sg>$

 ^lucht<n><mf><sg><cmp>$^macht<n><mf><sg>$

 luchtmacht
 \end{verbatim}
 \end{small}
 \end{minipage}\\
 \end{center}
 ~\\
 \textbf{Figure 5:} Example of compound analysis and translation: lugmag (air force)


 \subsubsection{Separable verbs}

 \noindent
 The system can handle separable verbs, which exist both in Afrikaans and Dutch, as
 long as there are no other phrases between the separated parts.\\

 \section{Evaluation}
 \noindent
 We evaluated the system in several ways: Naïve coverage, compound analysis, qualitative, quantitative and comparative

 \subsection{Coverage}

 \noindent
 Naïve coverage was calculated over Wikipedia corpora.\\


 \begin{minipage}[b]{25cm}
 \begin{center}
   \begin{tabular}{|l|r|r|}
    \hline
    {\bf Corpus}           & {\bf Tokens}    & {\bf Coverage}\\
    \hline
    {\tt af} Wikipedia     & 2,926,943       & 82.1\% $\pm$ 0.8 \\
    \hline
    {\tt nl} Wikipedia     & 18,569,183      & 80.5\% $\pm$ 0.7 \\
    \hline
   \end{tabular}

  \end{center}
  \textbf{Table 1:} Na\"ive vocabulary coverage for the two morphological analysers.
 \end{minipage}\\

 \subsection{Compound words}
 \noindent
 To test the accuracy of the compound word analysis, we marked words as correctly segmented and/or
 correctly translated, this test was performed in the Afrikaans$\rightarrow$Dutch direction. \\

 \begin{minipage}[b]{25cm}
   \begin{center}
   \begin{tabular}{|l|r|r|}
    \hline
    {\bf Corpus}    & {\bf Corr. Seg.}    & {\bf Corr. Trans.}\\
    \hline
    top-1,000       & 914                 &  776 \\
    \hline
    random-1,000    & 957                 &  801 \\
    \hline
   \end{tabular}
   \end{center}
   \textbf{Table 2:} Compound word accuracy in analysis and translation.
 \end{minipage}\\

 \subsection{Quantitative}
 \noindent
 The translation quality was measured using word error rate (WER). For this we used sets
 of 100 sentences from Wikipedia. The set C1 could not contain any unknown words, set C2 could. \\
 \begin{minipage}[b]{25cm}
 \begin{center}
   \begin{tabular}{|l|l|r|r|}
    \hline
    {\bf Dir.} & {\bf System}             & {\bf C1}          & {\bf C2} \\
    \hline
    {af-nl}  & {\small Apertium}  & 16.625 $\pm$ 1.465 & 23.405 $\pm$ 1.235 \\
             & {\small Google}  & {\bf 9.485 $\pm$ 1.115} & {\bf 10.575 $\pm$ 1.795} \\
    \hline
    {nl-af} & {\small Apertium }  & {\bf 15.435 $\pm$ 1.885}  & {\bf 21.72 $\pm$ 1.06} \\
            & {\small Google }  & 21.81 $\pm$ 1.72& 	25.71 $\pm$ 1.22	\\

    \hline
   \end{tabular}

     \label{table:quan}
   \end{center}
 	\textbf{Table 3:} Accuracy for the test corpora for the two systems as measured by Word Error Rate
         with 95\% confidence interval.
 \end{minipage}\\

 \subsection{Qualitative}
 To discover which areas of the system could use the most improvement we reviewed the translation errors
 in the Afrikaans$\rightarrow$Dutch direction and categorised them as in {\bf Table 4}. \\
 \begin{minipage}[b]{25cm}
   \begin{center}
   \begin{tabular}{|l|c|r|r|}
      \hline
      {\bf Error type}    & {\bf Count} & {\bf \% of total} \\
      \hline
      Syntactic transfer   & 235         & 42.4 \\
      \hline
      ~~~- Verb concordance & 99          & 17.9 \\
      ~~~- Auxiliary verbs & 13          & 2.3 \\
      ~~~- Relative pronoun& 11          & 2.0 \\
      ~~~- Capitalisation  & 10          & 1.8 \\
      ~~~- Chunking error  & 9           & 1.6 \\
      ~~~- Other           & 93          & 16.8 \\
      \hline
      Unknown word         & 147         & 26.5 \\
      Disambiguation       & 106         & 19.1 \\
      Morphology           & 28          & 5.1 \\
      Polysemy             & 23          & 4.2 \\
      Multiword            & 6           & 1.1 \\
      Compounding          & 6           & 1.1 \\
      Separable verb       & 3           & 0.5 \\
      \hline
      Total                & 554         & 100 \\
      \hline
   \end{tabular}

   \end{center}
      \textbf{Table 4:} Contribution to total error by type. Syntactic transfer errors are split into
       further categories.

 \end{minipage}\\

 \subsection{Comparative}
 Finally, we compared {\small {\tt apertium-af-nl}} with Google translate, using WER. The results
 are in {\bf Table 3}. {\bf Figure 6} is an example of an error, caused by wrong disambiguation. \\

 \begin{itemize}
     \item[] Hier volg 'n lys van hoofstede.
     \item[] Hier {\em volgen} een lijst van hoofdsteden.
     \item[] Hier {\em volgt} een lijst van hoofdsteden.
     \item[] `Here follows a list of capital cities.'
 \end{itemize}
 \textbf{Figure 6:} Example of disambiguation error.

 \section{Future work}
 We have presented a bi-directional rule-based machine translation system between Dutch and Afrikaans.
 The system gives promising results and offers improvement in the Dutch$\rightarrow$Afrikaans direction
 over another public system, but does not offer improvement in quality in the other direction.

 The three biggest issues in the system are:
 \begin{itemize}
  \item Lack of dictionary coverage -- some common words are missing, such as kalender, rewolosie, binneland, silwer
  \item Poor morphological disambiguation -- work is needed to be able to, e.g. distinguish better between present and infinitive in Afrikaans
  \item Insufficient syntactic transfer -- separable constructions need more support, amongst which separable verbs and auxiliary verbs with participles.
 \end{itemize}

 \section*{Acknowledgements}
 Development of this system was partially supported by the Google Code-in,
 a contest to introduce pre-university students to contributing to open-source
 software. This work has also received the support of the Spanish Ministry of Science and Innovation through project TIN2009-14009-C02-01.


 \includegraphics{apertium.png}


 \end{multicols}
 \end{document}
	\documentclass[a0,landscape]{a0poster}
	%\documentclass[a0,draft]{a0poster}
	% With the word preview in the square bracket, your poster will be a4
	% size, handy for previewing or submit to x4u queue for A0.
	% with preview removed, you get a file the right size for the HP
	% large-format printer. This is a bit bigger than A0, the width is
	% exactly one imperial yard.

	% This is so we can have multiple columns of text side-by-side
	\usepackage[utf8]{inputenc}
	\usepackage{graphicx}
	\usepackage{multicol}
	\usepackage{helvet}
	\usepackage{sectsty}
	%\allsectionsfont{\usefont{OT1}{phv}{bc}{n}\selectfont}
	\allsectionsfont{\sffamily} \subsubsectionfont{\sffamily\large}
	\columnsep=100pt

	% This is the thickness of the black line between the columns of text
	\columnseprule=3pt

	% this package gives you coloured text and various other simple
	% graphics hacks.
	% Details in /usr/local/teTeX/texmf/doc/generic/pstricks/*
	\usepackage{pstricks}

	%\psset{unit=1cm}

	\usepackage{times}
	\usepackage{url}

	% Define names for some colours
	\newcmykcolor{Inblue}{1.00 0.37 0.00 0.00}
	\newcmykcolor{Inred}{0.00 1.00 0.63 0.00}
	\newrgbcolor{Inmaroon}{0.4 0.0 0.4}
	\newrgbcolor{darkblue}{0.0 0.0 0.5}

	% Colour used for figure captions. Change to suit your own preference
	\newrgbcolor{captcolor}{0.0 .5 0.0}

	\begin{document}

	% This is black magic to put a title at the top of the page.
	% Thanks to Mark ``the magician'' Filipiak, although I have re-done a
	% lot of the code here -- hopefully it is more robust.

	% Make a box 0.55* width of poster for title and names
	% If your title is long, replace 0.55 with something bigger and
	% make the box for the address smaller by the same amount.
	\begin{minipage}[b]{0.75\linewidth}
	\veryHuge \bf
	\textsf{Rapid rule-based machine translation between Dutch and Afrikaans}
	\\[1cm]
	\huge \bf Pim Otte, Francis M. Tyers,\\
	\huge \rm Mendelcollege, Universitat d'Alacant
	\end{minipage}
	% Make box 0.35 X width of poster for address
	\begin{minipage}[b]{0.25\linewidth}
	\Large
	Contact: [email protected], [email protected] \\

	\end{minipage}
	%% Make box 0.05*width of poster for EU Shield graphic
	%\begin{minipage}[b]{0.05\linewidth}
	%\includegraphics[width=10cm]{logo_ofis_liv.pdf}
	%%\includegraphics{uitlogo}
	%\end{minipage}
	%\vspace{0.3cm}
	%\hrule
	%vspace{0.3cm}
	%\begin{minipage}[b]{\linewidth}
	%\hrule
	%{\small \url{http://www.ofis-bzh.org/bzh/ressources_linguistiques/index-troerofis.php}}
	%\end{minipage}
	%

	% This is how many columns your poster will be broken into.
	\begin{multicols}{4}

	% This is the width your figures will be scaled to. Change this if you
	% change the number of columns.
	\newlength{\figwidth}
	\setlength{\figwidth}{20cm}

	% Set to half of figwidth. Used for putting two figs side by side
	\newlength{\fighalfwidth}
	\setlength{\fighalfwidth}{10cm}

	% Can't use the figure environment within multicolumns. Set up our own
	% counter for figures.
	\newcounter{figscount}

	\section{Introduction}

	\noindent
	{\bf Dutch} is a West-Germanic language spoken by nearly 23 million people, mostly from the
	Netherlands and Flanders. {\bf Afrikaans} is spoken by at least 5 million people, mainly in
	South-Africa, but also in Namibia. Afrikaans also belongs to the West-Germanic language family and originates from the language
	spoken by the the Dutch colonists of the Cape Colony. In 1925 Afrikaans replaced Dutch as
	an official language in South-Africa, to be the joint official language together with English.
	On this poster we will describe the development of {\small {\tt apertium-af-nl}}, a bi-directional Afrikaans
	and Dutch machine-translation system

	\begin{center}
	\begin{minipage}[b]{26cm}
	\includegraphics[width=260mm]{mapdutchworld.png}
	\end{minipage}\\
	\textbf{Figure 1:} Where Afrikaans or Dutch is spoken as official language.
	\vspace{0.3cm}
	\end{center}

	%\vspace{0.5cm}

	\section{Method}

	\noindent
	The system is based on {\bf Apertium} (\url{http://www.apertium.org/}), a free/open-source rule-based
	machine translation platform. To create the language pair we used an existing resource and created
	several new ones. \\

	\begin{center}
	\begin{minipage}[b]{26cm}
	\includegraphics[width=260mm]{apertium2.pdf}
	\end{minipage}\\
	\textbf{Figure 2:} Modules of the Apertium translation system
	\vspace{0.3cm}
	\end{center}


	\vspace{0.5cm}

	\subsection{Existing resources}

	\noindent
	We reused the morphological transducer for Afrikaans, created during a currently dormant English-Afrikaans
	Machine Translation project.\\

	\subsection{Resources created}

	\subsubsection{Dutch morphological transducer}
	A new Dutch morphological transducer was created, because existing ones were unsuitable for several reasons:

	\begin{itemize}
	\item Non-free licence
	\item Not bidirectional (only analysis, not generation)
	\item Tagset different from Afrikaans transducer
	\end{itemize}

	%\vspace{0.5cm}

	\noindent
	The open categories (nouns, verbs, adjectives, adverbs) for the Dutch
	morphological analyser were extracted semi-automatically from
	Wiktionary, (\url{http://www.wiktionary.org}), which has entries like in {\bf Figure 3}. The resulting analyser entries were all hand checked.
	Closed categories were added by hand based on the grammar of Dutch of Shetter and Ham.
	\vspace{0.5cm}

	\begin{center}
	\begin{minipage}[b]{26cm}
	\includegraphics[width=260mm]{hoofdstad.png}
	\end{minipage}\\
	\textbf{Figure 3:} Example of Wiktionary entry: hoofdstad
	\label{wikt1}
	\vspace{0.3cm}
	\end{center}


	\vspace{0.5cm}

	\subsubsection{Bilingual dictionary}

	\noindent
	The bilingual dictionary was developed by adding exact matches, extracting proper names
	from Wikipedia, adding cognates, words which often have a small spelling difference and adding
	some entries by hand, such as closed categories and frequently missing words.

	\subsection{Transfer rules}

	\subsubsection{Afrikaans to Dutch}
	An example of an Afrikaans to Dutch transfer rule, which handles the negation scope marker, can be seen in
	{\bf Figure 4}

	\begin{center}
	\begin{minipage}[b]{25cm}
	\begin{scriptsize}
	\begin{verbatim}
	<rule comment="REGLA: nie">
	<pattern>
	<pattern-item n="nie"/>
	</pattern>
	<action>
	<choose>
	<when>
	<test>
	<equal>
	<var n="seen_neg"/>
	<lit v="true"/>
	</equal>
	</test>
	</when>
	<otherwise>
	<out>
	<chunk name="nie">
	<tags>
	<tag><lit-tag v="ADV"/></tag>
	</tags>
	<lu>
	<clip pos="1" side="tl" part="lemh"/>
	<clip pos="1" side="tl" part="a_adv"/>
	<clip pos="1" side="tl" part="lemq"/>
	</lu>
	</chunk>
	</out>
	</otherwise>
	</choose>
	<let>
	<var n="seen_neg"/>
	<lit v="true"/>
	</let>
	</action>
	</rule>
	\end{verbatim}
	\end{scriptsize}
	\end{minipage}\\
	~\\
	\textbf{Figure 4:} Transfer rule for handling negation.
	\vspace{0.5cm}
	\end{center}

	\subsubsection{Compound words}
	In both Afrikaans and Dutch words can combine very productively into compounds.
	{\small {\tt apertium-af-nl}} can handle compounds that consist of one or more nouns,
	such as the one in {\bf Figure 5}. \\

	\begin{center}
	\begin{minipage}[b]{26cm}
	\begin{small}
	\begin{verbatim}
	lugmag

	^lug<n><sg><cmp>+mag<n><sg>$

	^lucht<n><mf><sg><cmp>$^macht<n><mf><sg>$

	luchtmacht
	\end{verbatim}
	\end{small}
	\end{minipage}\\
	\end{center}
	~\\
	\textbf{Figure 5:} Example of compound analysis and translation: lugmag (air force)



	\subsubsection{Separable verbs}

	\noindent
	The system can handle separable verbs, which exist both in Afrikaans and Dutch, as
	long as there are no other phrases between the separated parts.\\

	\section{Evaluation}
	\noindent
	We evaluated the system in several ways: Naïve coverage, compound analysis, qualitative, quantitative and comparative

	\subsection{Coverage}

	\noindent
	Naïve coverage was calculated over Wikipedia corpora.\\


	\begin{minipage}[b]{25cm}
	\begin{center}
	\begin{tabular}{\|l\|r\|r\|}
	\hline
	{\bf Corpus} & {\bf Tokens} & {\bf Coverage}\\
	\hline
	{\tt af} Wikipedia & 2,926,943 & 82.1\% $\pm$ 0.8 \\
	\hline
	{\tt nl} Wikipedia & 18,569,183 & 80.5\% $\pm$ 0.7 \\
	\hline
	\end{tabular}

	\end{center}
	\textbf{Table 1:} Na\"ive vocabulary coverage for the two morphological analysers.
	\end{minipage}\\

	\subsection{Compound words}
	\noindent
	To test the accuracy of the compound word analysis, we marked words as correctly segmented and/or
	correctly translated, this test was performed in the Afrikaans$\rightarrow$Dutch direction. \\

	\begin{minipage}[b]{25cm}
	\begin{center}
	\begin{tabular}{\|l\|r\|r\|}
	\hline
	{\bf Corpus} & {\bf Corr. Seg.} & {\bf Corr. Trans.}\\
	\hline
	top-1,000 & 914 & 776 \\
	\hline
	random-1,000 & 957 & 801 \\
	\hline
	\end{tabular}
	\end{center}
	\textbf{Table 2:} Compound word accuracy in analysis and translation.
	\end{minipage}\\

	\subsection{Quantitative}
	\noindent
	The translation quality was measured using word error rate (WER). For this we used sets
	of 100 sentences from Wikipedia. The set C1 could not contain any unknown words, set C2 could. \\
	\begin{minipage}[b]{25cm}
	\begin{center}
	\begin{tabular}{\|l\|l\|r\|r\|}
	\hline
	{\bf Dir.} & {\bf System} & {\bf C1} & {\bf C2} \\
	\hline
	{af-nl} & {\small Apertium} & 16.625 $\pm$ 1.465 & 23.405 $\pm$ 1.235 \\
	& {\small Google} & {\bf 9.485 $\pm$ 1.115} & {\bf 10.575 $\pm$ 1.795} \\
	\hline
	{nl-af} & {\small Apertium } & {\bf 15.435 $\pm$ 1.885} & {\bf 21.72 $\pm$ 1.06} \\
	& {\small Google } & 21.81 $\pm$ 1.72& 25.71 $\pm$ 1.22 \\

	\hline
	\end{tabular}

	\label{table:quan}
	\end{center}
	\textbf{Table 3:} Accuracy for the test corpora for the two systems as measured by Word Error Rate
	with 95\% confidence interval.
	\end{minipage}\\

	\subsection{Qualitative}
	To discover which areas of the system could use the most improvement we reviewed the translation errors
	in the Afrikaans$\rightarrow$Dutch direction and categorised them as in {\bf Table 4}. \\
	\begin{minipage}[b]{25cm}
	\begin{center}
	\begin{tabular}{\|l\|c\|r\|r\|}
	\hline
	{\bf Error type} & {\bf Count} & {\bf \% of total} \\
	\hline
	Syntactic transfer & 235 & 42.4 \\
	\hline
	~~~- Verb concordance & 99 & 17.9 \\
	~~~- Auxiliary verbs & 13 & 2.3 \\
	~~~- Relative pronoun& 11 & 2.0 \\
	~~~- Capitalisation & 10 & 1.8 \\
	~~~- Chunking error & 9 & 1.6 \\
	~~~- Other & 93 & 16.8 \\
	\hline
	Unknown word & 147 & 26.5 \\
	Disambiguation & 106 & 19.1 \\
	Morphology & 28 & 5.1 \\
	Polysemy & 23 & 4.2 \\
	Multiword & 6 & 1.1 \\
	Compounding & 6 & 1.1 \\
	Separable verb & 3 & 0.5 \\
	\hline
	Total & 554 & 100 \\
	\hline
	\end{tabular}

	\end{center}
	\textbf{Table 4:} Contribution to total error by type. Syntactic transfer errors are split into
	further categories.

	\end{minipage}\\

	\subsection{Comparative}
	Finally, we compared {\small {\tt apertium-af-nl}} with Google translate, using WER. The results
	are in {\bf Table 3}. {\bf Figure 6} is an example of an error, caused by wrong disambiguation. \\

	\begin{itemize}
	\item[] Hier volg 'n lys van hoofstede.
	\item[] Hier {\em volgen} een lijst van hoofdsteden.
	\item[] Hier {\em volgt} een lijst van hoofdsteden.
	\item[] `Here follows a list of capital cities.'
	\end{itemize}
	\textbf{Figure 6:} Example of disambiguation error.

	\section{Future work}
	We have presented a bi-directional rule-based machine translation system between Dutch and Afrikaans.
	The system gives promising results and offers improvement in the Dutch$\rightarrow$Afrikaans direction
	over another public system, but does not offer improvement in quality in the other direction.

	The three biggest issues in the system are:
	\begin{itemize}
	\item Lack of dictionary coverage -- some common words are missing, such as kalender, rewolosie, binneland, silwer
	\item Poor morphological disambiguation -- work is needed to be able to, e.g. distinguish better between present and infinitive in Afrikaans
	\item Insufficient syntactic transfer -- separable constructions need more support, amongst which separable verbs and auxiliary verbs with participles.
	\end{itemize}

	\section*{Acknowledgements}
	Development of this system was partially supported by the Google Code-in,
	a contest to introduce pre-university students to contributing to open-source
	software. This work has also received the support of the Spanish Ministry of Science and Innovation through project TIN2009-14009-C02-01.


	\includegraphics{apertium.png}





	\end{multicols}
	\end{document}