Commit 09360e40 by Hannes Kinks

Initial commit

@misc{urlSource,
title = {Tallinn University of Technology},
howpublished = {\url{http://www.ttu.ee}},
}
@ARTICLE{articleSource,
author={},
journal={},
title={},
year={},
volume={},
number={},
pages={},
keywords={},
doi={},
ISSN={},
month={},}
\documentclass[a4paper, 12pt]{article}
\usepackage[top=2.5cm, bottom=2.5cm, left=3cm, right=3cm, includefoot]{geometry} % Geometry of the page
\usepackage{graphicx} % Figures inside text
\usepackage{titlesec} % For editing titles
\usepackage{longtable} % For creating page wide tables
\usepackage{multirow} % Needed for merging multiple rows in a table
\usepackage{todonotes} % For adding todo notes in the work
\usepackage{url} % For using URLs
\usepackage{float} % For formatting tables and figures
\usepackage{blindtext} % Stubs
\usepackage{pgfplots} % For plotting
\usepackage[T2A,T1]{fontenc} % For using Estonian and Russian letters
\usepackage[utf8]{inputenc} % UTF-8 input encoding
\usepackage{tocloft} % For editing contents
\usepackage{amssymb} % For square itemized lists
\renewcommand{\labelitemi}{\tiny$\blacksquare$} %For square itemized lists
\usepackage{caption} % Used when captioning tables and figures
\captionsetup{labelsep=period} % Adds period to the end of table or figure
\usepackage{verbatimbox} %To put program code in the center using Verbatim
\titlelabel{\thetitle.\quad} % Adds period at the end of titles
\usepackage{times} % Times type text
\usepackage{fancyhdr} % Usage of headers and footers
\setlength{\parindent}{0cm} % Paragraph intent is set to 0
\usepackage{setspace} % Used for spacing of text
\onehalfspacing % 1.5 spacing between lines of text
\setlength{\parskip}{\baselineskip}
\setcounter{secnumdepth}{4} % Levels
\usepackage{hyperref} % clickable references
\usepackage[]{algorithm2e} % pseudocode
\usepackage{tikz} % for drawing graphs
\usetikzlibrary{matrix,chains,positioning,decorations.pathreplacing,arrows}
\usepackage{amsmath} % Math symbols
\usepackage{lastpage} % last page
\usepackage{listings} % syntax highlight
\usepackage{enumitem}
\usepackage{subfig}
% redefine section so that it would start every time on a new page
\let\stdsection\section
\renewcommand\section{\newpage\stdsection}
% syntax highlight for vhdl
\definecolor{black}{rgb}{0,0,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{dkgreen}{rgb}{0,0.6,0} % needed by the commentstyle below
\lstset{frame=tb,
language=vhdl,
aboveskip=3mm,
belowskip=3mm,
showstringspaces=false,
columns=flexible,
basicstyle={\small\ttfamily},
numbers=none,
numberstyle=\tiny\color{gray},
keywordstyle=\color{gray},
commentstyle=\color{dkgreen},
stringstyle=\color{gray},
breaklines=true,
breakatwhitespace=true,
tabsize=3
}
\begin{document}
%------------------------------TITLE PAGE---------------------------------
\thispagestyle{fancy}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}
\headheight = 57pt
\headsep = 0pt
\chead{
\textsc{\begin{Large}
Tallinn University of Technology\\
\end{Large} }
Faculty of Information Technology\\
Department of Computer Engineering
}
\vspace*{7 cm}
\begin{center}
IAY70LT\\[0cm]
Firstname Lastname 123456 ABCD\\
\begin{LARGE}
\textsc{Title of the Thesis\\}
%\textsc{Implementing neuroevolution on reprogrammable hardware\\}
\end{LARGE}
Master Thesis\\[2cm]
\end{center}
\begin{flushright} % Align text to the right
Supervisor: Firstname Lastname PhD\\
Co-Supervisor: Firstname Lastname MSc\\[0cm]
\end{flushright}
\cfoot{Tallinn <year>}
%\renewcommand{\headrulewidth}{0pt} % Removes the horizontal line from the header
\pagebreak % End of page
%------------------------------TITLE PAGE (EST)---------------------------------
\thispagestyle{fancy} % Page has a header and footer
\renewcommand{\headrulewidth}{0pt} % Removes the horizontal line from the header
\renewcommand{\footrulewidth}{0pt} % Removes the horizontal line from the footer
\headheight = 57pt % Sets the header height (as suggested by the compiler)
\headsep = 0pt % Reduces the gap between the header and the text to zero
%\footskip = 10pt % Footer space
\chead{ % Centres the following text in the header
\textsc{\begin{Large} % Text in small caps and larger size
Tallinna tehnikaülikool\\
\end{Large} }
Infotehnoloogia teaduskond\\
Arvutitehnika instituut
}
\vspace*{7 cm} % Creates empty space of the given height at the top of the page
\begin{center} % Centre the text
IAY70LT\\[0cm]
Firstname Lastname 123456 ABCD\\
\begin{LARGE}
\textsc{Lõputöö pealkiri\\}
%\textsc{Implementing neuroevolution on reprogrammable hardware\\}
\end{LARGE}
Magistritöö\\[2cm]
\end{center}
\begin{flushright} % Align text to the right
Juhendaja: Firstname Lastname PhD\\
Kaasjuhendaja: Firstname Lastname MSc\\[0cm]
\end{flushright}
\cfoot{Tallinn <year>} % Adds the location and year to the footer
%\renewcommand{\headrulewidth}{0pt} % Removes the horizontal line from the header
\pagebreak % End of page
%---------------------------Author's declaration of originality-------------------------
\section*{\begin{center} Author's declaration of originality \end{center}}
I hereby certify that I am the sole author of this thesis and that no part of this thesis has been published or submitted for publication.
All works and major viewpoints of the other authors, data from other sources of literature and elsewhere used for writing this paper have been referenced.
%Autorideklaratsioon on iga lõputöö kohustuslik osa, mis järgneb tiitellehele.
%Autorideklaratsioon esitatakse järgmise tekstina:
%
%Olen koostanud antud töö iseseisvalt. Kõik töö koostamisel kasutatud teiste autorite tööd, olulised seisukohad, kirjandusallikatest ja mujalt pärinevad andmed on viidatud. Käsolevat tööd ei ole varem esitatud kaitsmisele kusagil mujal.
Author: Hannes Kinks
\today
\pagebreak
%-----------------------------ABSTRACT-----------------------------------
\section*{\begin{center}
Abstract
\end{center}}
Here goes your abstract...
The thesis is in English and contains \pageref{LastPage} pages of text, 5 chapters, 23 figures, 8 tables.
\pagebreak
%---------------------------ANNOTATSIOON---------------------------------
\section*{\begin{center}
Annotatsioon
\end{center}}
Annotatsioon on lõputöö kohustuslik osa, mis annab lugejale ülevaate töö eesmärkidest, olulisematest käsitletud probleemidest ning tähtsamatest tulemustest ja järeldustest. Annotatsioon on töö lühitutvustus, mis ei selgita ega põhjenda midagi, küll aga kajastab piisavalt töö sisu. Inglisekeelset annotatsiooni nimetatakse Abstract, venekeelset aga
Sõltuvalt töö põhikeelest, esitatakse töös järgmised annotatsioonid:
\begin{itemize}
\item kui töö põhikeel on eesti keel, siis esitatakse annotatsioon eesti keeles mahuga $\frac{1}{2 }$ A4 lehekülge ja annotatsioon \textit{Abstract} inglise keeles mahuga vähemalt 1 A4 lehekülg;
\item kui töö põhikeel on inglise keel, siis esitatakse annotatsioon (Abstract) inglise keeles mahuga $\frac{1}{2}$ A4 lehekülge ja annotatsioon eesti keeles mahuga vähemalt 1 A4 lehekülg;
\end{itemize}
Annotatsiooni viimane lõik on kohustuslik ja omab järgmist sõnastust:
Lõputöö on kirjutatud inglise keeles ning sisaldab teksti \pageref{LastPage} leheküljel, 5 peatükki, 23 joonist, 8 tabelit.
\pagebreak
%---------------------GLOSSARY OF TERMS AND ABBREVIATIONS---------------------
\section*{Glossary of Terms and Abbreviations}
\begin{tabular}{p{3cm}p{11cm}}
ATI&TTÜ Arvutitehnika instituut\\
DPI&\textit{Dots per inch}, punkti tolli kohta
\end{tabular}
\pagebreak
%----------------------------Contents----------------------------------
\tableofcontents
\newpage
%----------------------List of figures-------------------------------
\listoffigures
\pagebreak
%----------------------List of tables---------------------------------
\listoftables
\pagebreak
%-----------------------------INTRODUCTION-------------------------------
\section{Introduction}
\label{Introduction} %Allows referencing titles with \ref
\begin{figure}[h]
\centering
\includegraphics[width=3cm]{img/example.png} % edit the width according to need
\caption{Tallinn University of technology \cite{urlSource}}
\label{fig:ttuExample}
\end{figure}
Example of referencing to figure \ref{fig:ttuExample}. Example of citing something \cite{urlSource}.
\subsection{Subsection}
Example of subsection
%-------------------------------TOPIC START---------------------------
\section{First section}
\section{Second section}
%-------------------------------CONCLUSION---------------------------
\section{Conclusion}
\label{Conclusion}
\pagebreak
%-------------------------------Bibliography-------------------------
\bibliographystyle{ieeetr}
\footnotesize
\setstretch{0}
\bibliography{literature} % reference to the BibTeX file (literature.bib)
\end{document}
\documentclass[a4paper, 12pt]{article} % Document class and output text size
\usepackage{graphicx} % Figures inside text
\usepackage[top=2.5cm, bottom=2.5cm, left=3cm, right=3cm, includefoot]{geometry} % Geometry of the page
\usepackage{titlesec} % For editing titles
\usepackage{longtable} % For creating tables that span multiple pages
\usepackage{multirow} % Needed for merging multiple rows in a table
\usepackage{todonotes} % For adding todo notes in the work
\usepackage{url} % For URLs; wrap the address in \url so LaTeX does not interpret it as commands
\usepackage{float} % For formatting tables and figures
\usepackage{blindtext} % Stubs
\usepackage[T2A,T1]{fontenc} % For using Estonian and Russian letters
\usepackage[utf8]{inputenc} % UTF-8 input encoding
\usepackage{tocloft} % For editing the table of contents
%\setlength\cftparskip{-2pt}
%\setlength\cftbeforechapskip{0pt}
\usepackage{amssymb} % For square itemized lists
\renewcommand{\labelitemi}{\tiny$\blacksquare$} %For square itemized lists
\usepackage{caption} % Used when captioning tables and figures
\captionsetup{labelsep=period} % Adds a period after the table or figure label
\usepackage{verbatimbox} %To put program code in the center using Verbatim
\titlelabel{\thetitle.\quad} % Adds a period after section numbers
\usepackage{times} % Times type text
\usepackage{fancyhdr} % Usage of headers and footers
\setlength{\parindent}{0cm} % Paragraph indent is set to 0
\usepackage{setspace} % Used for spacing of text
\onehalfspacing % 1.5 spacing between lines of text
%\usepackage{parskip}
\setlength{\parskip}{\baselineskip}
%\hangindent=0.7cm
\setcounter{secnumdepth}{4} % Levels
\hyphenation{põhi-tekstis üliõpilas-kood lehe-küljed joonda-takse} % Fixes incorrect hyphenation of these Estonian words
% redefine section so that it would start every time on a new page
\let\stdsection\section
\renewcommand\section{\newpage\stdsection}
\usepackage{hyperref} % clickable references
\usepackage[]{algorithm2e} % pseudocode
% tikz - drawing graphs etc
\usepackage{tikz}
\usetikzlibrary{matrix,chains,positioning,decorations.pathreplacing,arrows}
% for the implication symbol => \implies
\usepackage{amsmath}
\usepackage{pgfplots} % for plotting
% todonotes
\usepackage{todonotes}
% last page
\usepackage{lastpage}
% syntax highlight
\usepackage{listings}
\definecolor{black}{rgb}{0,0,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{dkgreen}{rgb}{0,0.6,0} % needed by the commentstyle below
\lstset{frame=tb,
language=vhdl,
aboveskip=3mm,
belowskip=3mm,
showstringspaces=false,
columns=flexible,
basicstyle={\small\ttfamily},
numbers=none,
numberstyle=\tiny\color{gray},
keywordstyle=\color{gray},
commentstyle=\color{dkgreen},
stringstyle=\color{gray},
breaklines=true,
breakatwhitespace=true,
tabsize=3
}
\usepackage{enumitem}
\usepackage{subfig}
\begin{document}
%------------------------------TITLE PAGE---------------------------------
\thispagestyle{fancy} % Page has a header and footer
\renewcommand{\headrulewidth}{0pt} % Removes the horizontal line from the header
\renewcommand{\footrulewidth}{0pt} % Removes the horizontal line from the footer
\headheight = 57pt % Sets the header height (as suggested by the compiler)
\headsep = 0pt % Reduces the gap between the header and the text to zero
%\footskip = 10pt % Footer space
\chead{ % Centres the following text in the header
\textsc{\begin{Large} % Text in small caps and larger size
Tallinn University of Technology\\
\end{Large} }
Faculty of Information Technology\\
Department of Computer Engineering
}
\vspace*{7 cm} % Creates empty space of the given height at the top of the page
\begin{center} % Centre the text
IAY70LT\\[0cm]
Hannes Kinks 132465 IASM\\
\begin{LARGE}
\textsc{Implementing Neural Networks on Field Programmable Gate Array\\}
%\textsc{Implementing neuroevolution on reprogrammable hardware\\}
\end{LARGE}
Master Thesis\\[2cm]
\end{center}
\begin{flushright} % Align text to the right
Supervisor: Peeter Ellervee PhD\\
Co-Supervisor: Siavoosh Payandeh Azad MSc\\[0cm]
\end{flushright}
\cfoot{Tallinn 2015} % Adds the location and year to the footer
%\renewcommand{\headrulewidth}{0pt} % Removes the horizontal line from the header
\pagebreak % End of page
%------------------------------TITLE PAGE (EST)---------------------------------
\thispagestyle{fancy} % Page has a header and footer
\renewcommand{\headrulewidth}{0pt} % Removes the horizontal line from the header
\renewcommand{\footrulewidth}{0pt} % Removes the horizontal line from the footer
\headheight = 57pt % Sets the header height (as suggested by the compiler)
\headsep = 0pt % Reduces the gap between the header and the text to zero
%\footskip = 10pt % Footer space
\chead{ % Centres the following text in the header
\textsc{\begin{Large} % Text in small caps and larger size
Tallinna tehnikaülikool\\
\end{Large} }
Infotehnoloogia teaduskond\\
Arvutitehnika instituut
}
\vspace*{7 cm} % Creates empty space of the given height at the top of the page
\begin{center} % Centre the text
IAY70LT\\[0cm]
Hannes Kinks 132465 IASM\\
\begin{LARGE}
\textsc{Närvivõrgu realiseerimine programmeeritaval ventiilmaatriksil\\}
%\textsc{Implementing neuroevolution on reprogrammable hardware\\}
\end{LARGE}
Magistritöö\\[2cm]
\end{center}
\begin{flushright} % Align text to the right
Juhendaja: Peeter Ellervee PhD\\
Kaasjuhendaja: Siavoosh Payandeh Azad MSc\\[0cm]
\end{flushright}
\cfoot{Tallinn 2015} % Adds the location and year to the footer
%\renewcommand{\headrulewidth}{0pt} % Removes the horizontal line from the header
\pagebreak % End of page
%---------------------------AUTHOR'S DECLARATION OF ORIGINALITY-------------------------
\section*{\begin{center}
Author's declaration of originality
\end{center}}
I hereby certify that I am the sole author of this thesis and that no part of this thesis has been published or submitted for publication.
All works and major viewpoints of the other authors, data from other sources of literature and elsewhere used for writing this paper have been referenced.
%Autorideklaratsioon on iga lõputöö kohustuslik osa, mis järgneb tiitellehele.
%Autorideklaratsioon esitatakse järgmise tekstina:
%
%Olen koostanud antud töö iseseisvalt. Kõik töö koostamisel kasutatud teiste autorite tööd, olulised seisukohad, kirjandusallikatest ja mujalt pärinevad andmed on viidatud. Käsolevat tööd ei ole varem esitatud kaitsmisele kusagil mujal.
Author: Hannes Kinks
\today
\pagebreak
%---------------------------ANNOTATSIOON---------------------------------
%\section*{\begin{center}
%Annotatsioon
%\end{center}}
%
%Annotatsioon on lõputöö kohustuslik osa, mis annab lugejale ülevaate töö eesmärkidest, olulisematest käsitletud probleemidest ning tähtsamatest tulemustest ja järeldustest. Annotatsioon on töö lühitutvustus, mis ei selgita ega põhjenda midagi, küll aga kajastab piisavalt töö sisu. Inglisekeelset annotatsiooni nimetatakse Abstract, venekeelset aga
%%\foreignlanguage{russian}{Aннотация}.
%
%Sõltuvalt töö põhikeelest, esitatakse töös järgmised annotatsioonid:
%\begin{itemize}
%\item kui töö põhikeel on eesti keel, siis esitatakse annotatsioon eesti keeles mahuga $\frac{1}{2 }$ A4 lehekülge ja annotatsioon \textit{Abstract} inglise keeles mahuga vähemalt 1 A4 lehekülg;
%\item kui töö põhikeel on inglise keel, siis esitatakse annotatsioon (Abstract) inglise keeles mahuga $\frac{1}{2}$ A4 lehekülge ja annotatsioon eesti keeles mahuga vähemalt 1 A4 lehekülg;
%\end{itemize}
%
%Annotatsiooni viimane lõik on kohustuslik ja omab järgmist sõnastust:
%
%Lõputöö on kirjutatud [mis keeles] keeles ning sisaldab teksti [lehekülgede arv] leheküljel, [peatükkide arv] peatükki, [jooniste arv] joonist, [tabelite arv] tabelit.
%\pagebreak
%-----------------------------ABSTRACT-----------------------------------
\section*{\begin{center}
Abstract
\end{center}}
%Võõrkeelse annotatsiooni koostamise ja vormistamise tingimused on esitatud eestikeelse annotatsiooni juures.
The thesis is in English and contains \pageref{LastPage} pages of text, [chapters] chapters, [figures] figures, [tables] tables.
\pagebreak
%---------------------LÜHENDITE JA MÕISTETE SÕNASTIK---------------------
\section*{Glossary of Terms and Abbreviations}
%\section*{\begin{center}
%%Lühendite ja mõistete sõnastik
%\end{center}}
%Lühendite ning mõistete sõnastikku lisatakse kõik töö põhitekstis kasutatud uued ning ka mitmetähenduslikud üldtuntud terminid. Näiteks inglisekeelne lühend PC võib tähendada nii Personal Computer kui ka Program Counter, sõltuvalt kontekstist. Lühendid ja mõisted esitatakse tabuleeritult kahte tulpa selliselt, et vasakul on esitatud lühend või mõiste ja paremal tulbas seletus. Inglisekeelsed sõnad seletustes esitatakse kaldkirjas. Alltoodud näited esitavad lühendite ja mõistete sõnastiku korrektset vormistamist.
\begin{tabular}{p{3cm}p{11cm}} % Table whose first column is 3 cm wide
%ATI&TTÜ Arvutitehnika instituut\\
%DPI&\textit{Dots per inch}, punkti tolli kohta
FPGA&Field Programmable Gate Array\\
ASIC&Application Specific Integrated Circuit\\
HDL&Hardware Description Language\\
GPU&Graphics Processing Unit\\
CPU&Central Processing Unit\\
CLB&Configurable Logic Block\\
LUT&Lookup Table\\
LFSR&Linear feedback shift register\\
RAM&Random Access Memory\\
ANN&Artificial Neural Network\\
FANN&Feed-forward Artificial Neural Network\\
Genetic operator&Functions used in genetic computing to diversify the solutions (mutation) and to combine them (crossover). Mutation is a unary and crossover a binary operation.\\
Genotype&An individual's full genetic information; the set of rules that can be used to generate the phenotype.\\
Phenotype&The individual itself, resulting from a specific genotype.\\
PCA&Principal Component Analysis\\
2DPCA&Two-dimensional Principal Component Analysis\\
VHDL&VHSIC Hardware Description Language\\
\end{tabular}
\pagebreak
%----------------------------CONTENTS----------------------------------
\tableofcontents
\newpage
%----------------------LIST OF FIGURES-------------------------------
\listoffigures
\pagebreak
%----------------------LIST OF TABLES---------------------------------
\listoftables
\pagebreak
%-----------------------------INTRODUCTION-------------------------------
\section{Introduction}
\label{Introduction} % Allows referencing the title with the \ref command
% Human brain
% 85 billion brain cells
% 20 W Energy consumption
% 8.75 Mbit/s
% 10Hz-200Hz
When engineering computers to solve real-life problems, which are by nature fuzzy, changing and without a strict rule set, one approach is to take inspiration from what already works best in nature: the human brain.
Even though the computing power of processors has doubled roughly every two years, they still do not seem to be a match for the brain in many areas. Of course, the brain and the computer intrinsically work on different principles and are not directly comparable. Computer architecture has been motivated by performing mathematical calculations, while the brain evolved to survive in its natural environment. Therefore, more complex classification and estimation tasks, like speech and image recognition, planning and decision making, which the brain solves naturally, have proven to be a challenge for the traditional mathematical approach. At best, a computer programmed to simulate the brain's information processing can reach the level of an insect brain, according to the approximation in Figure \ref{fig:computingGrowth}.
\begin{figure}[h]
\centering
\includegraphics[width=10cm]{img/future.jpg}
\caption{Exponential growth of computing \cite{singularity}}
\label{fig:computingGrowth}
\end{figure}
%Although, not directly comparable, there are a few figures to give a brief idea how efficient and powerful the brain is. Human brain has approximately 85 billion brain cells firing on frequencies around 10Hz to 200Hz, while consuming around 20W of energy. Intel i7 Ivy Bridge has 1.4 billion transistors, clock speed of ~3.5 GHz and consuming at least 50W when on full load. The key differences are that the brain does not have the Von Neumann's bottleneck as memory is combined with processing and it has massive parallelism through approximately 1,000 trillion connections between neurons.
\subsection{Background and motivation}
The topic of the thesis stemmed from an interest in the functioning of the brain and in exploring the field of artificial intelligence. In addition, the supervisor had a general idea of experimenting with genetic algorithms to develop intelligent systems.
In parallel, an example project implementing face recognition on an FPGA was carried out \cite{saiSiavoosh}.
The challenges that arose from that project were the basis of the thesis at hand. As the number of configurable logic blocks and the amount of memory on an FPGA are limited, it is essential to choose an efficiently tuned network to make the implementation feasible.
Even though state-of-the-art machine learning techniques often involve highly interconnected deep neural networks with a large number of layers, the need for sparse and simpler neural networks has not dissipated. Simple feedforward neural networks can be of great use in embedded systems when classifying or predicting data of smaller dimensionality, e.g. in monitoring and security systems or smart home applications \cite{smarthome}. Furthermore, there are application areas, like embedded systems for space, which require minimizing the overhead of every subsystem as much as possible. Therefore, refining FANNs for FPGA usage has clear potential, and there is currently no formalized set of methods for doing so.
\subsection{Problem and goal definition}
Selecting and tuning neural network hyperparameters, and finding an architecture and data representation that work best for a given problem, can be a very time-consuming task that requires a lot of manual trial and error. Furthermore, when designing neural networks for hardware, the limitations and challenges grow further.
The objective of the thesis at hand is to provide guidance for designing simple Feedforward Artificial Neural Networks (FANNs) on lower-end FPGAs. The final outcome is a neural network hardware realization that can classify images of faces, while maximizing the network accuracy and minimizing the area requirements. Common existing methodologies and the author's own approaches are described, experimented with and compared, giving an overview of how neural networks can be optimized and configured to make FPGA mapping feasible.
The practical task is broadly divided into two parts:
The first part explores and experiments with different network topology optimization approaches, using pruning and evolutionary computing. A minimal neural network is a prerequisite for an efficient FPGA implementation.
The second part tackles the more hardware-specific challenges of mapping a feedforward neural network onto an FPGA.
%Additionally, some of the questions risen during the research were investigated:
%\todo[inline]{List of some of the subgoals and questions risen during the research that would be needed to be rephrased and clarified later on}
%\begin{enumerate}\itemsep0pt \parskip0pt \parsep0pt
% \item Will evolving the ANN topologies with GA yield any more effective solutions than what currently is being used?
% \item Does the use of GA give any advantage in addition to the learning phase of ANN?
% \item Does it make any difference of removing the connections completely or training takes care of it? And does it make sense when using parallel computing?
% \item Does the pruning of connections have a benefit in the context of memory usage on FPGA?
% \item Is it possible to replace the 2DPCA feature extraction with ANN implementation and still be feasible or even have a benefit?
%\end{enumerate}
\subsection{Methodology}
Tools, frameworks used. Datasets. Hardware?
\begin{figure}[h]
\centering
\includegraphics[width=10cm]{img/ORL.png}
\caption{The ORL Database of Faces \cite{orl}}
\label{fig:orlDataset}
\end{figure}
%-------------------------------TOPIC START---------------------------
\section{Theory}
The following chapters give an overview of the basics, a brief background and the state of the art of artificial neural networks, evolutionary programming and their implementation on FPGAs.
\subsection{Artificial neural networks}
% about neurons
% https://en.wikipedia.org/wiki/Sparse_distributed_memory#Neuron_model
Artificial Neural Networks (ANNs) are a class of statistical models inspired by brain research and biological neural networks. The central idea of an ANN is not to use feature engineering, where the rules and semantics of the input data are specified in advance by a human. Instead, it can adapt and train itself based on given examples. It can be used to classify data, recognize patterns and make predictions \cite{tyai}.
Artificial neural networks became a new paradigm in the 1980s and have since proven useful in numerous real-life applications such as data mining, search engines, weather prediction, forecasting financial markets, monitoring systems, medical diagnosis, and voice and image recognition. The latter also includes face recognition, which is the goal of the thesis at hand.
\subsubsection{Artificial neuron}
The basic processing unit of a biological neural network is believed to be the neuron. Already at the beginning of the 20th century, anatomists had observed that the cortex of the brain has a similar cellular structure throughout. It was known that each biological neuron consists of a cell body, dendrites and an axon (Figure \ref{fig:biological_neuron}). Dendrites carry the input signal to the cell body and, after a certain threshold is reached, the neuron fires an output through the axon. Neurons are connected to each other through synapses between axons and the dendrites of other neurons.
\begin{figure}[h]
\centering
\includegraphics[width=10.5cm]{img/neuron.png}
\caption{Structure of biological neuron cell}
\label{fig:biological_neuron}
\end{figure}
In 1978 Vernon Mountcastle proposed, based on his observations, that all brain regions actually perform the same operation, and that the function a specific region performs is determined by its connections between neurons \cite{mountcastle}. Therefore, in principle any brain region can be trained to classify any type of information; for example, visual recognition is no different from hearing in terms of the underlying mechanism. To illustrate this, experiments have been done where the brains of newborn ferrets were rewired so that the eyes send their signals to the areas where hearing normally occurs. As a result, these auditory areas develop functioning visual pathways instead \cite{ferrets}. Applying this knowledge to artificial intelligence, we can presume that pattern recognition can be performed effectively on any kind of input information, given a large enough number of interconnected artificial neurons.
Such observations were used by Frank Rosenblatt when he created the perceptron in 1958 \cite{rosenblatt}. The perceptron is a simplified, artificial neuron that takes a vector of $n$ inputs, multiplies them by their associated weights, $\sum_{i=0}^{n} x_iw_i$, and obtains the output $y$ by feeding the sum to the activation function $\phi$. This can also be represented as a dot product of two vectors.
$$y=\phi\left(\sum_{i=0}^{n} x_iw_i\right) = \phi(\mathbf{w}^T\mathbf x)$$
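As a minimal illustrative sketch, the forward computation of such a neuron can be expressed in a few lines of NumPy (the input and weight values below are arbitrary examples):
\begin{lstlisting}[language=Python]
import numpy as np

def neuron(x, w, phi):
    """Weighted sum of the inputs followed by the activation function."""
    return phi(np.dot(w, x))

# Arbitrary example with three inputs; x[0] = 1 acts as the bias input.
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.1, 0.8, 0.4])
step = lambda z: 1.0 if z >= 0 else 0.0   # binary step activation
print(neuron(x, w, step))                 # -> 1.0
\end{lstlisting}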
\tikzset{%
every neuron/.style={
circle,
draw,
minimum size=1cm
},
neuron missing/.style={
draw=none,
scale=4,
text height=0.333cm,
execute at begin node=\color{black}$\vdots$
},
}
\begin{figure}[h]
\centering
\begin{tikzpicture}[
init/.style={
draw,
circle,
inner sep=2pt,
font=\Huge,
join = by -latex
},
squa/.style={
draw,
inner sep=2pt,
font=\Large,
join = by -latex
},
start chain=2,node distance=13mm
]
\node[on chain=2]
(x2) {$...$};
\node[on chain=2,join=by o-latex]
{$...$};
\node[on chain=2,init] (sigma)
{$\displaystyle\Sigma$};
\node[on chain=2,squa,label=above:{\parbox{2cm}{\centering Activation \\ function}}]
{$\phi$};
\node[on chain=2,label=above:Output,join=by -latex]
{$y$};
\begin{scope}[start chain=1]
\node[on chain=1,label=above:{\parbox{2cm}{\centering Input}}] at (0,1.5cm)
(x1) {$x_1$};
\node[on chain=1,label=above:Weights,join=by o-latex]
(w1) {$w_1$};
\end{scope}
\begin{scope}[start chain=3]
\node[on chain=3] at (0,-1.5cm)
(x3) {$x_n$};
\node[on chain=3,join=by o-latex]
(w3) {$w_n$};
\end{scope}
\node[label=above:\parbox{2cm}{\centering Bias \\ $b\cdot{w_0}$}] at (sigma|-w1) (b) {};
\draw[-latex] (w1) -- (sigma);
\draw[-latex] (w3) -- (sigma);
\draw[o-latex] (b) -- (sigma);
%\draw[decorate,decoration={brace,mirror}] (x1.north west) -- node[left=10pt] {Inputs} (x3.south west);
\end{tikzpicture}
\caption{Artificial neuron with bias}
\label{fig:artificial_neuron}
\end{figure}
Graphically it can be represented as shown in Figure \ref{fig:artificial_neuron}.
Artificial neurons often also have an additional bias input $b$, whose value is always 1. The role of the bias is to provide a constant value that shifts the activation function, consequently allowing all linear functions to be represented. If we take a single artificial neuron with a linear activation function, one input $x$ and a bias, we end up with the classical linear equation in two variables:
$$ \text{Let}\ n = 1 \implies \phi\left(\sum_{i=0}^{n} x_iw_i\right) = \phi(x_0w_0+x_1w_1) $$
$$ \phi(x) = x \implies y=\phi(x_0w_0+x_1w_1) = x_0w_0+x_1w_1 $$
$$ x_0 = b = 1.0 \implies y = x_0w_0+x_1w_1 = x_1w_1 + w_0 = ax + b$$
Whether the weighted sum of a neuron's inputs triggers the output is decided by the activation function. Classically there are three types of activation functions: linear, threshold (step) and sigmoid (soft step). The sigmoid is a special case of the logistic function, characterized by its S-shaped curve. It is often used because it introduces non-linearity to the network and is easily differentiable for weight learning. Based on the output range, sigmoid functions divide into the logarithmic sigmoid, with a range of $[0,1]$, and its scaled version, the hyperbolic tangent sigmoid, with a range of $[-1,1]$. Common activation functions are listed below; a small code sketch follows the list.
\begin{itemize}
\item Identity (Linear)
$$\phi(x)=x$$
\item Binary step
$$\phi(x)=\begin{cases}
0\ \text{for}\ x<0\\
1\ \text{for}\ x\geq0
\end{cases}$$
\item Logarithmic sigmoid (Soft step)
$$\phi(x)=\frac{1}{1+e^{-x}}$$
\item Hyperbolic tangent sigmoid (TanH, Tansig)
$$\phi(x)= \frac{2}{1+e^{-2x}}-1$$
\item Softmax
$$\phi(x_i)=\frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
\end{itemize}
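A minimal sketch of these activation functions (written here in NumPy purely for illustration) could look as follows:
\begin{lstlisting}[language=Python]
import numpy as np

def identity(x):             # linear
    return x

def binary_step(x):
    return np.where(x < 0, 0.0, 1.0)

def logsig(x):               # logarithmic sigmoid, range [0, 1]
    return 1.0 / (1.0 + np.exp(-x))

def tansig(x):               # hyperbolic tangent sigmoid, range [-1, 1]
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def softmax(x):
    e = np.exp(x - np.max(x))    # shift for numerical stability
    return e / e.sum()
\end{lstlisting}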
The perceptron itself can only perform linear classification, which at the time of its invention was the main criticism against it. For example, it can successfully learn logical 'AND' and 'OR', yet classifying 'XOR' is impossible, as its classes are not linearly separable. However, if perceptrons are connected together into multiple layers, they become far more powerful.
\begin{figure}[h]
\centering
\begin{tikzpicture}[x=1.5cm, y=1.5cm, >=stealth]
\foreach \m/\l [count=\y] in {1,2,3,missing,4}
\node [every neuron/.try, neuron \m/.try] (input-\m) at (0,2.5-\y) {};
\foreach \m [count=\y] in {1,missing,2}
\node [every neuron/.try, neuron \m/.try ] (hidden-\m) at (2,2-\y*1.25) {};
\foreach \m [count=\y] in {1,missing,2}
\node [every neuron/.try, neuron \m/.try ] (output-\m) at (4,1.5-\y) {};
\foreach \l [count=\i] in {1,2,3,n}
\draw [<-] (input-\i) -- ++(-1,0)
node [above, midway] {$x_\l$};
\foreach \l [count=\i] in {1,n}
\node [above] at (hidden-\i.north) {$h_\l$};
\foreach \l [count=\i] in {1,n}
\draw [->] (output-\i) -- ++(1,0)
node [above, midway] {$y_\l$};
\foreach \i in {1,...,4}
\foreach \j in {1,...,2}
\draw [->] (input-\i) -- (hidden-\j);
\foreach \i in {1,...,2}
\foreach \j in {1,...,2}
\draw [->] (hidden-\i) -- (output-\j);
\foreach \l [count=\x from 0] in {Input, Hidden, Output}
\node [align=center, above] at (\x*2,2) {\l \\ layer};
\end{tikzpicture}
\caption{Artificial neural network}
\label{fig:aan}
\todo[inline]{Go through the mathematics of back propagation and using matrix forms}
\end{figure}
\subsubsection{Feedforward neural network}
Artificial neural networks are usually organized into layers, where the outputs of one layer's nodes are connected to the inputs of the next layer (Figure \ref{fig:aan}). The first layer of artificial neuron nodes is called the input layer and the last one the output layer. The intermediate layers are the hidden layers.
\subsubsection{Preprocessing}
Feature extraction, selection. PCA.
\subsubsection{Constructive Learning}
Classically, neural networks are used by deciding on the architecture and setting up the connections between neurons and layers statically before the training phase starts. This, however, does not guarantee a minimal network size. Constructive (or generative) learning algorithms take another approach, where the network starts off very small (usually with a single neuron) and is grown by adding neurons until a satisfactory solution is found \cite{book:nnLearningAndExpertSys}. The key benefits of this approach are the following \cite{constructiveLearning}:
\begin{itemize}
\item{Exploring the space of network topologies in addition to the weight space, thus overcoming the limitation of a fixed topology.}
\item{Offering the potential to construct a minimal solution that matches the intrinsic complexity of the underlying learning task.}
\item{Offering an approximation of the expected-case complexity of the learning task.}
\item{Allowing trade-offs, for example between network size and accuracy.}
\item{Incorporating previous domain knowledge by learning the construction on a simpler task and applying the topology to a new, related task. \cite{lifelongLearning}}
\end{itemize}
%\subsubsection{Classification with neural networks}
%\subsubsection{Scaled Conjugate Gradient}
%\subsubsection{Hierarchical Temporal Memory}
%\subsubsection{Long short term memory}
%\subsubsection{Deep learning}
%Caffe
%\section{Hardware}
%\label{hardware}
%A survey of Dynamically Reconfigurable FPGA Devices \cite{dynamicallyReconfigurableFPGA}. Coarse grained vs fine grained architecture. Configurability: partially vs fully. On-chip vs off-chip configuration.
\subsection{Evolutionary computation}
Evolutionary computation is a subfield of artificial intelligence characterized by techniques inspired by Darwinian principles and natural evolutionary processes. Techniques in this field include evolutionary algorithms, such as genetic algorithms, differential evolution, evolutionary programming and neuroevolution, as well as algorithms based on behaviours that have emerged from biological evolution, like swarm intelligence, ant colony optimization, artificial life and the bees algorithm.
\subsubsection{Genetic algorithms}
The origins of Genetic Algorithms (GAs) can be attributed to John Holland, who invented them in the 1970s. Evolutionary algorithms are heavily inspired by the evolution of lifeforms in nature, using the idea of survival of the fittest and knowledge obtained from genetics. The algorithms are used as a search heuristic, mostly in optimization problems and machine learning. They are more robust than deterministic search algorithms, as they are able to filter out some level of noise in the data and adapt to changes in the input. One of their problems can be getting stuck in local minima. Presently, EAs are successfully used in areas such as computer-automated design \cite{exampleCAD}, fault diagnosis in hardware \cite{exampleFaultDiag}, software engineering, etc. % linguistic analysis, scheduling applications
John Holland's genetic algorithm is known in literature as Simple Genetic Algorithm (SGA) and it has the following components \cite{GAsurvey}:
%\vspace{-10mm}
\begin{itemize}\itemsep0pt \parskip0pt \parsep0pt
\item population of individuals
\item individuals encoded as binary strings
\item fitness function
\item genetic operators: crossover and mutation
\item selection mechanism
\end{itemize}
\begin{figure}[h]
\centering
\includegraphics[width=8cm]{img/SGA_graph.png}
\caption{Simple Genetic Algorithm \cite{travelingSalesman}}
\label{fig:SGA}
\end{figure}
The SGA algorithm can be seen in Figure \ref{fig:SGA}. It begins by generating an initial population, which consists of a set of individuals, each representing a possible solution to the problem. Individuals are encoded as finite-length bit vectors, which in biological terms correspond to chromosomes consisting of genes (Figure \ref{fig:SGA_population}).
\begin{figure}[h]
\centering
\includegraphics[width=8cm]{img/SGA_population.png}
\caption{Population of Simple Genetic Algorithm}
\label{fig:SGA_population}
\end{figure}
After generating the initial population, the best solutions are selected using a fitness function, which determines the individuals that performed better than the others. The selected individuals are then allowed to pass on their genes to the next generation by organizing them into pairs and combining their chromosomes with the crossover operation (Figure \ref{fig:SGA_crossover}). With this operation there is a good chance that the attributes which made the parents the best individuals are carried over to the offspring, and combining genes from both parents can produce even fitter individuals.
\begin{figure}[h]
\centering
\includegraphics[width=8cm]{img/SGA_crossoverBW.png}
\caption{Crossover operator}
\label{fig:SGA_crossover}
\end{figure}
In addition to crossover, there is a low probability of random changes happening in the genes, called mutation. Mutation inhibits premature convergence and helps maintain diversity in the population \cite{GAOverview}.
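A minimal, illustrative sketch of such an SGA over bit-string individuals is given below; the fitness function (counting ones) and all parameter values are arbitrary placeholders, not the settings used later in this work.
\begin{lstlisting}[language=Python]
import random

GENE_LEN, POP_SIZE, GENERATIONS = 16, 20, 50
MUTATION_RATE = 0.01

def fitness(ind):                    # placeholder fitness: number of ones
    return sum(ind)

def select(pop):                     # tournament selection of a parent
    return max(random.sample(pop, 3), key=fitness)

def crossover(a, b):                 # single-point crossover
    cut = random.randint(1, GENE_LEN - 1)
    return a[:cut] + b[cut:]

def mutate(ind):                     # rare random bit flips
    return [b ^ 1 if random.random() < MUTATION_RATE else b for b in ind]

pop = [[random.randint(0, 1) for _ in range(GENE_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP_SIZE)]
print(max(pop, key=fitness))
\end{lstlisting}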
\subsection{Field Programmable Gate Array basics}
A Field Programmable Gate Array (FPGA) is an integrated circuit designed so that it can be reconfigured after manufacturing.
Because hardware can be described directly with HDLs, specifically for the task at hand and without much overhead, the performance is often much higher than when a CPU or GPU is used for the same task. Even though Application Specific Integrated Circuits (ASICs) offer still higher computational capability and efficiency, they are expensive to manufacture. The FPGA's flexibility, coming from its ability to be reprogrammed endlessly, therefore often makes it the platform of choice for prototyping hardware or accelerating demanding computing tasks.
An FPGA's architecture commonly consists of an array of configurable logic blocks (CLBs) (Figure \ref{fig:clb}), surrounded by a configurable interconnection structure. A CLB typically contains a 4-input Lookup Table (LUT), which is essentially a truth table that can be defined to behave as any 4-input combinational function. LUTs themselves are typically built out of SRAM bits that hold the configuration memory (the LUT mask) and multiplexers that select the corresponding bit to be driven to the output.
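Functionally, a 4-input LUT is just a 16-entry truth table indexed by its inputs. A small, vendor-independent sketch of this behaviour:
\begin{lstlisting}[language=Python]
def lut4(mask, a, b, c, d):
    """Evaluate a 4-input LUT: mask is the 16-bit LUT mask (truth table);
    the four input bits select which mask bit drives the output."""
    index = (d << 3) | (c << 2) | (b << 1) | a
    return (mask >> index) & 1

# Example mask implementing a AND b AND c AND d: only entry 15 is 1.
AND4_MASK = 1 << 15
print(lut4(AND4_MASK, 1, 1, 1, 1))  # -> 1
print(lut4(AND4_MASK, 1, 0, 1, 1))  # -> 0
\end{lstlisting}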
\begin{figure}
\centering
\includegraphics[width=8cm]{img/clb.png}
\caption{Typical logic block \cite{clbimage}}
\label{fig:clb}
\end{figure}
For specific FPGAs the CLB architecture and even the terminology vary; e.g. Xilinx divides the CLB further into slices and logic cells.
The CLBs of most Xilinx FPGAs can also be configured to behave as so-called \textit{distributed RAM}, which spreads out over a number of LUTs rather than being located in a single dedicated block. This makes it flexible; however, it is not area-efficient and is rather small.
In addition to the logic blocks there are also a number of dedicated \textit{block RAM} areas, which cannot be configured for other functionality but are larger in size.
Which RAM to use depends on the memory requirements: for small memories, distributed RAM is better, as using block RAM would waste space. On the other hand, using distributed RAM for larger memories would cause extra wiring delays, and the available amount might not be sufficient. Also, reads from block RAM are synchronous, while reads from distributed RAM are asynchronous.
%When using Xilinx Synthesizer Tool (XST) and specifying a range of memory in HDL, it uses block RAM as default. To specify distributed RAM, \textit{ram\_style} constraint should be used:
%
%\begin{lstlisting}
% attribute ram_style: string
% attribute ram_style of ram : signal is "distributed";
%\end{lstlisting}
\subsection{FPGA implementation of neural networks}
\label{sec:FPGAImplementationIssues}
\paragraph*{Parallelism in neural networks}
\label{sec:NNparallelism}
To fully exploit the power of neural networks, they should be parallelized akin to their biological counterpart, but they can be made parallel in hardware in different ways. In general, the only categorical statement that can be made is that, except for networks of a trivial size, a fully parallel implementation in hardware is not feasible: virtual parallelism is necessary, and this, in turn, implies some sequential processing \cite{fpgaNNsurvey}. Therefore, some analysis is needed of the stage at which the computation is made parallel. According to \cite{fpgaNNsurvey}, the types of parallelism are the following, from higher to lower level:
\begin{itemize}
\item Training parallelism - Parallel training sessions running simultaneously.
\item Layer parallelism - The layers of a multilayer network are processed in parallel.
\item Node parallelism - Each individual node is processed in parallel.
\item Weight parallelism - During the computation of weights, multiplications can be done in parallel.
\item Bit-level parallelism - Increasing word size of individual processors or making communication between different functional units bit-parallel.
\end{itemize}
%\subsection{FPGA implementation of neural networks}
\paragraph*{Weight multiplication in hardware}
One of the fundamental problems with implementing parallel-processing ANNs on FPGAs is the large number of connections between neurons. Calculating a neuron's input values requires multiplication, and making this fully parallel at the weight level would mean one binary multiplier for each input. A fully connected layer with $n$ inputs and $l$ neurons requires $n \times l$ multipliers. As the input dimensionality is increased for more complex problems and the number of neurons is kept in proportion to the inputs, the number of multipliers needed grows quadratically.
There are broadly two ways to make ANN implementations feasible: either decrease the number of multipliers and lose parallelism and performance, or decrease the complexity of the multipliers and lose accuracy \cite{stochasticNNonFPGA}. A typical solution, for example, is to use time-division multiplexing (TDM) to share one multiplier per neuron across its inputs \cite{TDM}.
Another approach, proposed in \cite{stochasticNNonFPGA}, is stochastic computing, which uses the probabilistic properties of bit streams to compute approximations. Multiplying, for example, the operands 0.12 and 0.3 can be done as follows:
Let there be two streams of random bits, $A$ and $B$
\begin{eqnarray*}
A = \{a_0, \dots, a_n\}; a_i \in \{0,1\};\ n,i \in \mathbb{N} \\
B = \{b_0, \dots, b_n\}; b_i \in \{0,1\};\ n,i \in \mathbb{N}
\end{eqnarray*}
The random bit sequences are generated with probabilities given by the operands, so the probability of a $1$ in the first stream is 0.12 and in the second stream 0.3:
$$p=P_{a_i}(1) = 0.12$$
$$q=P_{b_i}(1) = 0.3 $$
The product of these two probabilities is equal to the probability of $1$ in an output stream of logical AND taken from $A$ and $B$.
$$p \times q = P_{a_i \wedge b_i}(1) = 0.036 $$
In order to get a good estimate of the frequency of ones in the output stream, and therefore an accurate enough multiplication result, $n$ has to be sufficiently large.
Traditional binary multiplication needs a state machine and an adder, and the complexity of the whole operation is $O(n^2)$. With the stochastic method, only a random bit stream generator, for which an LFSR can be used, and a single AND gate are needed to find the product.
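The idea can be checked with a short software sketch (illustrative only; a hardware implementation would use LFSRs instead of a software random number generator):
\begin{lstlisting}[language=Python]
import random

def stochastic_multiply(p, q, n=100000):
    """Approximate p*q by AND-ing two random bit streams whose
    probabilities of a 1 are p and q, then counting the ones."""
    ones = 0
    for _ in range(n):
        a = random.random() < p     # bit from stream A
        b = random.random() < q     # bit from stream B
        ones += a and b             # logical AND of the two bits
    return ones / n

print(stochastic_multiply(0.12, 0.3))   # close to 0.036
\end{lstlisting}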
\paragraph*{Routing}
\label{sec:routing}
Arguably the main bottleneck when building a fully parallel ANN in hardware is the routing limit of the FPGA. The problem stems from the FPGA architecture itself and from the high interconnectivity of ANNs: as their size increases, the routing limits are hit quickly.
As process technology improves, FPGA vendors are able to build larger arrays of these identical tiles. As they do, routability degrades, because proportionally more interconnect is required on a large device \cite{gamal}.
An additional problem is that routing utilization is difficult to measure, and it is hard to estimate when the FPGA capacity limit is reached due to interconnect usage \cite{FPGArouting}. Practically, it means that at some point the speed advantage that the parallelism should provide will be lost to routing delays.
\subsection{Data representation and precision}
An important aspect to consider in an NN hardware implementation is the data representation: the number format and bit length used for the inputs, weights and activation function. The format of the weights in particular is the trade-off point between the accuracy of the network and the hardware implementation costs, because it in turn determines the complexity and area requirements of the multipliers and the activation function. In general-purpose computers, floating-point arithmetic is typically used, as it provides much higher accuracy over dynamic ranges: with the use of exponents, very large or very small numbers can be represented. However, as the magnitude grows, precision is lost. Fixed-point representation, on the other hand, has a more limited precision and data range. Its advantages are that the precision and absolute error always stay the same, which is needed in some applications, e.g. finance. Secondly, fixed-point arithmetic in hardware is much closer to integer operations: simpler, more area-efficient and faster. When the application at hand requires arithmetic only in a small fixed range, a fixed-point implementation on an FPGA can be more feasible.
For neural network FPGA implementations, fixed-point representation is often chosen because of the lower implementation costs. Although the use of floating point has been researched \cite{floatingPointFeasibility}, fixed-point numbers with a precision of 16 bits for the weights and 8 bits for the activation function have been found sufficient \cite{precision_analysis}, \cite{backPropagationPrecision}. It should also be noted that learning requires more precision, as back-propagation errors can accumulate more rapidly.
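As a simple illustration of such quantization (the word length and the number of fractional bits below are arbitrary choices, not the values from the cited works):
\begin{lstlisting}[language=Python]
def to_fixed(x, frac_bits=12, word_bits=16):
    """Quantize x to a signed fixed-point word with frac_bits fractional bits."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))   # saturate to the word range

def to_float(q, frac_bits=12):
    return q / (1 << frac_bits)

w = 0.73519
q = to_fixed(w)
print(q, to_float(q), abs(w - to_float(q)))     # quantization error < 2**-13
\end{lstlisting}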
To hold the data in memory, block RAM should be used, as distributed RAM would be too small for networks of non-trivial size, including the approximate network needed for image recognition.
\subsection{Activation function implementation}
\label{sec:Activation function}
Computing activation functions directly from their formulas is often not feasible in digital systems due to the high area requirements and delay. Instead, approximation methods can be used to trade off precision. Piecewise Linear (PWL) Approximation and Lookup Table (LUT) Approximation are discussed in more detail below; other methods, such as truncated series expansion, also exist.
\vspace{-\topsep}
\begin{enumerate}[topsep=0pt, partopsep=0pt]
\item
\textit{Piecewise Linear Approximation} uses a series of linear segments to approximate the activation function \cite{linearApproxSigmoid}. Different PWL schemes exist: A-law based approximation \cite{alaw}, the approximation of Alippi and Storti–Gajani \cite{alippi}, the second-order approximation of Zhang, Vassiliadis and Delgado–Frias \cite{zhang} and Centred Recursive Interpolation (CRI) \cite{linearApproxCRI}.
As an example, CRI is a recursive computational scheme for generating the PWL approximation. The simplest initial logarithmic sigmoid approximation with CRI is defined by three line segments (Equation \ref{eq:CRI}), and as the algorithm iterates, the segmentation increases, as seen in Figure \ref{fig:sigmoidCRI}. Depending on how much precision is needed, the sigmoid approximation can be divided into more segments, decreasing the error.
\begin{align}
\label{eq:CRI}
H(z)&=
\begin{cases}
1\ & \text{for} \ z\geq L\\
\frac{1}{2}\cdot(1+\frac{z}{2}) & \text{for} \ -L < z < L\\
0 & \text{for} \ z\leq -L
\end{cases}
\end{align}
where $L$ defines the range in which the segmentation is carried out.
\begin{figure}[h]
\centering
\includegraphics[width=12cm]{img/sigmoidCRI.png}
\caption{Sigmoid function Piecewise Linear Approximation with CRI (L=2). On the left, the initial approximation with three line segments; on the right, the approximation after one iteration, resulting in 5 line segments}
\label{fig:sigmoidCRI}
\end{figure}
\item{\textit{Lookup Table Approximation}} maps input values to a uniformly distributed set of output values. This method results in a design with better performance in terms of speed, as no arithmetic operations are needed. However, the area or memory requirements are higher due to the need to store the mapped values. The relation between area requirements and precision is exponential \cite{activationFunctionHDLCoder}, which makes this method impractical if high precision is needed. The approximation for a LUT with a width of 4 bits can be seen in Figure \ref{fig:sigmoidLUT}. In terms of average error, this method also outperforms PWL schemes according to \cite{sigmoidCRIvsLUT}. A small sketch of building such a table is given after this list.
\begin{figure}[h]
\centering
\includegraphics[width=8cm]{img/sigmoidLUT.png}
\caption{Sigmoid function Lookup Table Approximation}
\label{fig:sigmoidLUT}
\end{figure}
\end{enumerate}
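As an illustrative sketch of building such a table (assuming a 4-bit input address uniformly covering the range $[-L, L]$ with $L=8$ and 8-bit unsigned outputs; these choices are arbitrary, not those of the cited works):
\begin{lstlisting}[language=Python]
import math

ADDR_BITS, OUT_BITS, L = 4, 8, 8.0
ENTRIES = 1 << ADDR_BITS

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each entry holds the sigmoid value at the centre of its input interval,
# quantized to OUT_BITS unsigned fractional bits.
lut = [round(sigmoid(-L + (i + 0.5) * (2 * L / ENTRIES)) * ((1 << OUT_BITS) - 1))
       for i in range(ENTRIES)]

def sigmoid_lut(x):
    i = int((x + L) / (2 * L) * ENTRIES)
    i = max(0, min(ENTRIES - 1, i))             # saturate outside [-L, L]
    return lut[i] / ((1 << OUT_BITS) - 1)

print(sigmoid(1.0), sigmoid_lut(1.0))           # coarse 4-bit approximation
\end{lstlisting}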
\section{Face recognition task}
\subsection{Neural network models}
\label{sec:nnmodels}
In the various stages of modelling neural network accuracy in software, two different models were used. One mainly served as a comparison reference and the other was used for more precise simulation.
For evaluating the results of the experiments, a reference neural network that already worked fairly well was taken. It was generated with Matlab's built-in function 'patternnet' and is very close to the neural network used in \cite{saiSiavoosh}. The basic structure of the network can be seen in the first example of Figure \ref{table:matlab_results}. The inputs were preprocessed with 2DPCA to decrease the input dimensionality. There are a fully connected hidden layer and output layer, both with 40 neurons, using tansig activation functions and a bias. In addition, normalization methods were applied to the input data beforehand. The average test accuracy over 10 experiments on the ORL dataset was 83.5\%; in the best cases, up to 98\% accuracy can be achieved. The dataset was divided into training and test sets with an 8:2 ratio. From here on, this neural network model is referred to as the Matlab NN.
The Matlab Neural Network Toolbox allows creating custom networks by specifying the relations between layers, the input sources and the activation functions, and allows modifying the weights; however, specifying individual connections is nontrivial and would still require going into the source code itself. For example, when weights are set to zero and thus essentially cut off, the default training methods will still retrain them. Due to these limitations, a custom neural network implementation was written, as this was deemed easier than rewriting the source code of an existing tool.
The ANN implementation is based on the lessons of Coursera's Machine Learning course \cite{coursera}. It is a simple two-layer feedforward network with gradient-descent back-propagation learning and regularization. The average test accuracy over 10 experiments was 85\%; in the best cases the accuracy was up to 92.5\%. The dataset was divided into training and test sets with an 8:2 ratio, as before. For the purpose of testing optimization and pruning methods, the lower best-case accuracy was considered less relevant at this point. This neural network model is referred to as the Coursera NN.
The main differences (and similarities) between the two models can be seen in Table \ref{tab:matlabvscoursera}. The input for both networks is first normalized to the range $[0, 1]$ and preprocessed with 2DPCA. After that the values are normalized once again, this time with the \textit{mapminmax} function, which maps them into the range $[-1, 1]$. There is no strong argument for this range over, for example, $[0, 1]$ other than it simply performing slightly better, possibly because the activation function's point of symmetry is at $x=0$ and the inputs are then more evenly distributed in the range where the function's derivative is highest.
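For reference, this min-max mapping into $[-1, 1]$ corresponds to the linear rescaling
$$x' = 2\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1,$$
which is essentially what \textit{mapminmax} applies with its default settings.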
\begin{table}[]
\centering
\begin{tabular}{lll}
& \textbf{Matlab NN} & \textbf{Coursera NN} \\
Activation fn & Tansig & Logsig \\
Training & Scaled conjugate gradient & Gradient descent \\
Preprocessing & 2DPCA & 2DPCA \\
Layers & 2 & 2 \\
Hidden neurons & 40 & 40 \\
Avg. accuracy & 83.5\% & 85\% \\
Best accuracy & 98\% & 92.5\%
\end{tabular}
\caption{Differences between Matlab and Coursera NN implementation}
\label{tab:matlabvscoursera}
\end{table}
%\begin{figure}[h]
% \centering
% \frame{\includegraphics[width=8cm]{img/matlab_nn.png}}
% \caption{Base example}
% \label{fig:base_matlab_nn}
%\end{figure}
\subsection{Experiments with optimizing the network}
%\paragraph*{\color{red}{Rant about the justifications and indications of the experiments}}
A common approach with neural networks is to make the layers fully connected and let the training phase utilize the connections as needed. This means that a number of connections can be redundant. The question that arose was whether to include the optimization of individual connections and weights in the GA. When an ANN and a GA are combined, there are essentially two different optimization methods working in parallel, and if the training of the ANN already modifies the weights, does adding them to the GA have any extra value? When looking at the weights of a trained neural network, it is very unlikely that any weight becomes exactly zero, due to the way back propagation works. However, weights can get very close to zero and thereby become insignificant, at which point they could be removed if necessary. The goal of the first experiment was to see how pruning affects the performance of an ANN in general and whether there is a reason to proceed with more complex, evolutionary optimization methods. %However, there are existing techniques that concentrate on evolving individual connections, like NEAT \cite{neat}.
% Probably should be moved to a separate state of the art/ theory section
Another thing to consider is how the calculation of the ANN weights will be implemented. The state-of-the-art approach for running ANNs on general-purpose computers is to use matrix operations to speed up the computation of outputs in fully connected layers. For further acceleration, these operations can be computed in parallel, for example on a GPU \cite{gpumatrixmulti}.
Engineering individual connections between layers is therefore unnecessary in this setting: the matrices represent fully connected layers, and disconnecting a synapse simply means a zero-valued weight in the corresponding location. This, however, does not bring any computational gain.
Based on that reasoning, it is rational to optimize the individual connections only if their number has an effect on the performance. That is the case when the weight calculations are carried out separately: if they are computed one by one serially, the number of connections affects the time, and if they are computed fully in parallel, it affects the area.
Initially the focus was on the vectorized approach, where matrix operations are used; therefore, in this experiment the individual connections were left out and the topology was explored only on the layer level.
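As a small illustration of the reasoning above, the NumPy sketch below (with illustrative sizes) shows that pruning realized as a binary mask over a dense weight matrix still performs the full matrix multiplication; a gain can only appear when the zeros are exploited, for example by a sparse representation or by dedicated hardware.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1, 224))           # one input sample (illustrative size)
W = rng.random((224, 40))          # fully connected input-to-hidden weights
mask = rng.random(W.shape) > 0.5   # keep roughly half of the connections

W_pruned = W * mask                # "disconnected" synapses become zero weights
H = X @ W_pruned                   # still a full 224 x 40 dense multiplication,
                                   # so the zeros alone bring no computational gain
\end{verbatim}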
\subsubsection{Rounding the weights}
The experiment was done by taking the mean of 10 training results and rounding the weights down to the nearest ten thousandth, thousandth, hundredth and so on, as described in Figure \ref{fig:rounding_algo}.
This essentially rounds the insignificant weights to zero, and by reducing the precision of the weights, the amount of memory necessary to hold their values is reduced as well.
\begin{figure}[h]
\begin{algorithm}[H]
\For{i=1:n of experiments}{
net = train(net)\;
tempnet = net\;
\For{j=0:4}{
tempnet.weights = floor(net.weights * pow(10,j))/pow(10,j)\;
acc = validate(tempnet)\;
}
}
\end{algorithm}
\caption{Rounding down the weights}
\label{fig:rounding_algo}
\end{figure}
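For reference, the rounding step of Figure \ref{fig:rounding_algo} can be written in NumPy as follows; the surrounding training and validation calls are placeholders for the corresponding Coursera NN routines.
\begin{verbatim}
import numpy as np

def round_down(weights, decimals):
    # Round the weights down (floor) to the given number of decimal places
    f = 10.0 ** decimals
    return np.floor(weights * f) / f

# for j in range(4, -1, -1):                     # 4, 3, 2, 1, 0 decimal places
#     acc = validate(round_down(net.weights, j)) # validate() as in the figure above
\end{verbatim}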
It can be seen from Figure \ref{fig:rounding_connections} that rounding the weights down to hundredths has a very marginal effect on the accuracy of the network (increasing the error by 0.25\%). As weights with zero values can be considered unconnected, this means the removal of around 2\% of the connections. The gain from the pruned connections themselves is therefore low with this approach; however, it shows that there can be room for optimization when pruning further with better methods.
\begin{figure}[h]
\centering
\begin{tikzpicture}
\begin{axis}[tickpos=left,xlabel=Decimal places,ylabel=Accuracy, ymax=100, ymin=0]
\addplot[color=black,mark=*] coordinates{(4,92.125)(3,92.125)(2,91.875)(1,58.75)(0,9.125)};
\end{axis}
\end{tikzpicture}
\caption{The effect of rounding the weights down (floor) on the accuracy of the network}
\label{fig:rounding_connections}
\end{figure}
\subsubsection{Pruning insignificant connections}
This experiment followed the methodology proposed in \cite{pruning}. The Coursera NN was first trained, then pruned so that the more important connections remain, and finally trained once again to compensate for the lost connections. The pruning was done by setting weights below a certain threshold to zero. The accuracy of the pruned network compared to the original is plotted in Figure \ref{fig:pruning_plot}.
\begin{figure}
\centering
\includegraphics[width=8cm]{img/pruningPlot.png}
\caption{An example result of pruning connections}
\label{fig:pruning_plot}
\end{figure}
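A minimal sketch of this train-prune-retrain cycle is given below; thresholding on the weight magnitude is assumed here, and \texttt{train} and \texttt{validate} stand for the existing Coursera NN routines.
\begin{verbatim}
import numpy as np

def prune(weights, threshold):
    # Set weights whose magnitude is below the threshold to zero
    return np.where(np.abs(weights) < threshold, 0.0, weights)

# net = train(net)                                      # initial training
# net.W1, net.W2 = prune(net.W1, t), prune(net.W2, t)   # prune with threshold t
# net = train(net)                                      # retrain the survivors
# acc = validate(net)
\end{verbatim}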
\subsubsection{Exploring layer connections with a genetic algorithm}
One of the main approaches of interest was using a genetic algorithm to optimize the neural network for the task at hand. The intention was to find a better-performing and more optimal network for a hardware implementation. The focus here was mainly on the general topology (connections between layers, activation functions and the number of neurons), while individual connections and more specific hyperparameters were left out. Table \ref{table:gene_table} describes the variables being optimized, i.e. the 'gene' in the context of genetics. To explore topologies with a different number of hidden layers, the GA was given an argument $m$ for the maximum number of layers to test. This also determines the length of the gene, as, for example, the layer connection matrix grows quadratically when the number of layers is increased. The gene's first field then designates how many layers are actually used. This, however, means that a great number of fields in the gene are left unused if the number of layers is less than the maximum.
\begin{table}
\scriptsize
\begin{tabular}{| p{0.09\linewidth} | p{0.11\linewidth} | p{0.11\linewidth} | p{0.11\linewidth} | p{0.11\linewidth} | p{0.11\linewidth} | p{0.11\linewidth} |}
\hline
Field & Number of layers & Number of neurons & Bias connections & Transfer functions & Layer connections & Input connections \\ \hline
Field size & 1 & $m$ & $m$ & $m$ & $m \times m$ & $m$ \\ \hline
Value type & Integer & Integer & Binary & Enumerated & Binary & Binary \\ \hline
Description & Number of hidden layers used (out of the possible maximum $m$) in the current individual. & Number of neurons in each hidden layer & Specifies in which layers the bias input is used. & \vtop{\hbox{\strut 1 - tansig} \hbox{\strut 2 - logsig} \hbox{\strut 3 - purelin} \hbox{\strut 4 - softmax}} & Specifies the connections between hidden layers & Specifies to which hidden layers the inputs are connected \\ \hline
\end{tabular}
\caption{Description of the variables optimized by the GA (the gene)}
\label{table:gene_table}
\end{table}
The cost function reads the values from the generated gene and builds a neural network based on them. After that it trains the network, measures the training time and evaluates the results. It can happen that no valid network can be constructed from the generated gene; in this case a high cost value is assigned to eliminate such individuals from the gene pool. The final cost is calculated as follows:
$$C=\begin{cases}
1000 & \text{for an invalid network}\\
t + \left(1-\frac{e}{n}\right)\cdot 100 & \text{for a valid network}
\end{cases}$$
where $t$ is the training time in seconds, $e$ is the number of errors and $n$ is the number of classes.
First of all, to make the generated networks comparable with the reference network, it was tested whether the same example with similar results could be reproduced by giving the corresponding gene as an input. The representation of the reference network as a gene is the following:
$$\text{gene} = \{2, 40, 40, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0\}$$
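To make the encoding concrete, the sketch below decodes such a gene using the field order of Table \ref{table:gene_table} with $m = 2$; the helper and the row-major layout of the connection matrix are illustrative assumptions, not the actual implementation.
\begin{verbatim}
def decode_gene(gene, m=2):
    i = 0
    layers     = gene[i];         i += 1      # number of hidden layers used
    neurons    = gene[i:i + m];   i += m      # neurons per hidden layer
    bias       = gene[i:i + m];   i += m      # bias connections per layer
    transfer   = gene[i:i + m];   i += m      # 1=tansig 2=logsig 3=purelin 4=softmax
    layer_conn = gene[i:i + m*m]; i += m * m  # m x m layer connections (row-major)
    input_conn = gene[i:i + m]                # which layers receive the input
    return layers, neurons, bias, transfer, layer_conn, input_conn

# Reference network gene from above:
print(decode_gene([2, 40, 40, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0]))
\end{verbatim}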
%\paragraph*{Results}
The GA optimization process generated different architectures with good performance, though compared with the hand-engineered reference there was no significant gain. Table \ref{table:matlab_results} shows the classification accuracies of the following networks, both with raw data and with 2DPCA preprocessing:
\begin{enumerate}
\item Reference network
\item GA-generated network
\item The second individual with the feedback loops removed
\item GA-generated network
\end{enumerate}
In general the results still tended to converge to very simplistic structures, not too different from what a human would hand engineer. Part of the reason may also be the choice of the data set; it is possible that with a more diverse data set with higher variance, the optimal structure would be more complex.
In addition, the best performing individuals often still had redundant or questionable properties present, e.g. feedback loops or layers without outputs. Because the cost function did not penalize such phenomena, once they appeared they persisted into later generations.
For the purpose of implementing a simple and minimal neural network on an FPGA, the last GA-generated individual (Figure \ref{fig:ga_individuals}) could be of use when coupled with feature extraction. It has only an output layer with a softmax activation function and no hidden layers. Compared to the reference network, it needs 40 fewer neurons and the weights associated with them, while showing similar performance.
\begin{figure}
\centering
\includegraphics[width=8cm]{img/matlab_results.png} \\
\caption{Chosen individuals for comparison}
\label{fig:ga_individuals}
\end{figure}
\begin{table}
\centering
\begin{tabular}{|l|l|l|l|l|}
\hline
Preprocessing & 1 & 2 & 3 & 4 \\ \hline
2DPCA & 93.38 \% & 94.58 \% & 94.40 \% & 94.10 \% \\ \hline
Raw & 88.48 \% & 93.40 \% & 93.70 \% & 44.70 \% \\ \hline
\end{tabular}
\caption{Results of optimizing the structure of the ANN with GA}
\label{table:matlab_results}
\end{table}
\subsubsection{Optimization of individual connections with GA}
The genetic algorithm was also applied on the individual connection level to find a good trade-off between network size and prediction accuracy. Due to the need to have full control over the neural network's connections, the Coursera NN (Section \ref{sec:nnmodels}) was used.
% For generating connections with differing probability
\begin{figure}
\centering
\includegraphics[width=8cm]{img/conn_gene.png}
\caption{Gene layout used for finding optimal balance between performance and number of connections}
\label{fig:conn_gene}
\end{figure}
The gene layout used for this experiment is shown in Figure \ref{fig:conn_gene}. The cost of an individual is calculated as:
$$C = \text{connectionsPercentage} + \text{errorPercentage} + \text{trainingTime}$$
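A sketch of this cost for one individual is given below; the connection mask, the percentage units and the use of seconds for the training time are assumptions made for illustration. Figure \ref{fig:connection_ga_perf} shows the performance of a partially connected network found with this cost.
\begin{verbatim}
import numpy as np

def cost(mask, y_true, y_pred, train_time):
    # mask: binary connection matrix, train_time assumed to be in seconds
    connections_pct = 100.0 * mask.sum() / mask.size
    error_pct = 100.0 * np.mean(y_pred != y_true)
    return connections_pct + error_pct + train_time
\end{verbatim}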
\begin{figure}
\centering
\includegraphics[width=8cm]{img/connections_ga_result.png}
\caption{Performance of a partially connected network found with GA}
\label{fig:connection_ga_perf}
\end{figure}
% filler.. doesn't show much, but looks kind of cool
Figure \ref{fig:individual_connections} shows a graph of the individual connections between the input layer (224 nodes) and the hidden layer (40 nodes), with a total of 1502 synapses.
\begin{figure}
\centering
\includegraphics[width=8cm]{img/individual_connections.jpg}
\caption{Example of optimized connections between the input and hidden layer, visualized as a graph}
\label{fig:individual_connections}
\end{figure}
\subsection{Approximation of activation function}
To find a good approximation of the activation function, the LUT and CRI methods (explained in more detail in Section \ref{sec:Activation function}) were experimented with. For the LUT approximation it was necessary to find out the table size needed to hold the values without losing too much of the network's accuracy. In addition, a decision needs to be made about the range in which the LUT stores the values. For example, when choosing the range $[-5,5]$ and a LUT size of 4 bits, the function $H(z)$ is split into $2^4$ segments within the given range, while $H(z)=0$ when $z<-5$ and $H(z)=1$ when $z>5$. Figure \ref{fig:lutapprox} shows the approximations with different ranges and bit widths, with the highlighted example giving 95.25\% accuracy over the whole dataset, while the regular sigmoid function gave 97.75\%.
Secondly, the CRI approximation produced accurate results in the neural network already with the initial 3-segment function described in Equation \ref{eq:CRI}, without the need for any further iterations. The two methods are compared with the regular sigmoid function on a test dataset in Figure \ref{fig:lutvscri}. In this sample run an accuracy of 92.5\% was reached in all three cases, with the CRI method even having a slightly better result than the original. From the training process it can be seen that the LUT method lags behind the CRI, but for the trained network both have relatively similar accuracy.
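As an illustration of the LUT approach, the sketch below builds a $2^4$-entry table over the range $[-5, 5]$ and saturates outside it, as described above; the use of segment centres for the stored values is an illustrative choice, not necessarily the one used in the experiments.
\begin{verbatim}
import numpy as np

BITS, LO, HI = 4, -5.0, 5.0
N = 2 ** BITS                                   # 16 segments in [-5, 5]
centers = LO + (np.arange(N) + 0.5) * (HI - LO) / N
LUT = 1.0 / (1.0 + np.exp(-centers))            # precomputed sigmoid values

def sigmoid_lut(z):
    z = np.atleast_1d(np.asarray(z, dtype=float))
    idx = np.clip(((z - LO) / (HI - LO) * N).astype(int), 0, N - 1)
    return np.where(z < LO, 0.0, np.where(z > HI, 1.0, LUT[idx]))
\end{verbatim}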
%\begin{figure}
% \centering
% \subfloat[Accuracy of neural network using LUT sigmoid approximations with different LUT sizes and ranges]{\includegraphics[width=8cm]{img/3dbarsSigmoidApproxHighlight.png}}
% \subfloat[Comparing neural network test accuracy with LUT and CRI approximation.]{\includegraphics[width=7cm]{img/CRIvsLUT2.png}}
% \caption{Finding sigmoid activation function approximation}
% \label{fig:sigmoidLUTcomparison}
%\end{figure}
\begin{figure}
\centering
\includegraphics[width=12cm]{img/3dbarsSigmoidApproxHighlight.png}
\caption{Accuracy of neural network using LUT sigmoid approximations with different LUT sizes and ranges}
\label{fig:lutapprox}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=9cm]{img/CRIvsLUT3.png}
\caption{Comparing neural network test accuracy with LUT and CRI approximation.}
\label{fig:lutvscri}
\end{figure}
\section{Hardware realization}
The final part of the work at hand was to implement a neural network on an FPGA. The purpose of this was to take advantage of the speed increase that an FPGA can provide. Secondly, one of the main goals was to optimize the network for lower power consumption and possibly for speed; the hardware realization was therefore needed for observing the results of the optimization.
The hardware was written in VHDL, which is a language for describing digital electronic systems. The choice of VHDL over, for example, Verilog or SystemC came from previous experience with it. VHDL is designed for writing down the structure of a system and its functionality using typical programming language constructs. What makes VHDL (and other hardware description languages) different from ordinary programming languages is the notion of time and concurrent statements. This also makes it possible to simulate the design in software to verify its functionality. Finally, the description can be used to synthesize hardware or to configure an FPGA.
% VHDL is designed to fill a number of needs in the design process. First, it allows description
% of the structure of a system, that is, how it is decomposed into subsystems and
% how those subsystems are interconnected. Second, it allows the specification of the function
% of a system using familiar programming language forms. Third, as a result, it allows
% the design of a system to be simulated before being manufactured, so that designers can
% quickly compare alternatives and test for correctness without the delay and expense of
% hardware prototyping. Fourth, it allows the detailed structure of a design to be synthesized
% from a more abstract specification, allowing designers to concentrate on more strategic
% design decisions and reducing time to market. \cite{designersGuideVHDL}
\subsection{Design process}
In terms of the development model used in this project, the design process is best described by the Waterfall or V-model. Even though it was not chosen consciously in the beginning, it was a somewhat natural choice, as the development was carried out by an individual and it was a small-sized project with fixed requirements. Waterfall and the V-model are also still the development models used most often in hardware design \cite{devModelBlog}, as opposed to the agile methodologies common in software development. The V-model puts an emphasis on testing and verification, which was also a substantial part of the development in this case.
The V-model usually starts from a concept of operations and someone's practical need for the system, and whether this need is met is checked through client or user acceptance. In this case the goal was to answer a research question instead, which means the development model is somewhat mixed with the scientific method. As a concept, research questions were posed in the beginning, and by the end they should be either confirmed or dismissed by analysing the results of the experiments.
As a result, the design process with these mixed aspects can be visualized as shown in Figure \ref{fig:vmodel}.
\begin{figure}[h]
\centering
\includegraphics[width=10cm]{img/vmodel.jpg}
\caption{The development process of the hardware realization}
\label{fig:vmodel}
\end{figure}
\subsection{Design choices}
\paragraph*{Offline versus online learning}
At this point the hardware design is limited to the neural network structure itself, without learning capabilities.
The training process has to be carried out in software, and the weights of a trained network must be transferred to the BRAM of the FPGA. This is therefore called offline training, as opposed to online training (for example \cite{gadaeaFPGA}), where the learning happens on the FPGA itself.
The choice was made with the intention of starting with the crucial components of the system and expanding in the future as needed. For running the experiments with modified weights, offline learning is sufficient and, due to its simplicity, better suited.
\paragraph*{Parallelism}
When designing hardware for an ANN, there are many ways to approach parallelism, and there are also many limitations that come with it, as explained in Section \ref{FPGAImplementationIssues}. The main parallelization approaches applicable to the design at hand were node, weight and bit-level parallelism.
As explained in Sections \ref{FPGAImplementationIssues} and \ref{routing}, the fully weight-parallel design can reach
On the other hand, the advantages of pruning weights should become apparent mostly with either a fully weight-parallel design or when the weight calculation is fully serial. In the first case the trade-off is mostly in area, and in the second case it is time that is traded off. In addition, pruning can affect the power consumption.
\subsection{Architecture}
\subsection{Verification}
\subsubsection{Xilinx Artix-7 FPGA overview}
%-------------------------------KOKKUVÕTE/CONCLUSION---------------------------
\section{Conclusion}
\label{Conclusion} % Allows referring to the heading with the \ref command
\pagebreak
\bibliographystyle{ieeetr}
\footnotesize
\setstretch{0}
\bibliography{literature} % reference to the BibTeX file (literature.bib)
\end{document}
\contentsline {section}{\numberline {1}Introduction}{10}{section.1}
\contentsline {subsection}{\numberline {1.1}Subsection}{10}{subsection.1.1}
\contentsline {section}{\numberline {2}First section}{11}{section.2}
\contentsline {section}{\numberline {3}Second section}{12}{section.3}
\contentsline {section}{\numberline {4}Conclusion}{13}{section.4}