Klausur KogSys-ML-M

This commit is contained in:
johannes 2017-06-27 21:20:15 +02:00
parent 71dfa69c6a
commit 0a4616ac74
6 changed files with 180 additions and 0 deletions

@ -0,0 +1,180 @@
\input{../settings/settings}
\usepackage{amssymb}
\begin{document}
\klausur{KogSys-ML-M Lernende Systeme}
{Prof. Dr. Ute Schmid}
{Wintersemester 16/17}
{90}
{All printed and hand-written material and a non-programmable calculator.}
This exam consists of seven questions. You have to answer a total of six questions. A total of 90 points can be awarded for your answers. The first question is compulsory. The remaining six questions are selective. You have to answer five of the selective questions. If you answer all six selective questions, only the five with the most points are considered.\\
You may answer in German, in English, or in a mixture of both.
\begin{enumerate}
\item Basic Concepts (compulsory)
\begin{enumerate}
\item (3 Points) You are developing a classification system that predicts the target class \textit{age} of a fish with the values \textit{young}, \textit{middle}, and \textit{old}. The training data consists only of the two attributes \textit{size in centimeter} and \textit{weight in gram}. Name a machine learning algorithm (discussed in lecture or tutorial) which is appropriate for learning this task. Additionally, name one advantage and one disadvantage of your chosen machine learning algorithm.
\item (1 Point) Given noise-free data and at least one positive example, why is it not possible to have the $\varnothing$-sign in the final hypothesis learned by Find-S?
\item Consider the following training examples with the four attributes \textit{size}, \textit{color}, \textit{care} and \textit{smell}. No attribute may take other values than given. The target concept is \textit{good\_gift}.
\begin{tabular}{c|c|c|c|c|c}
\hline
example & size & color & care & smell & good\_gift \\
\hline
1 & medium & violet & difficult & strong & yes \\
2 & medium & white & difficult & weak & no \\
3 & small & violet & easy & strong & yes \\
4 & medium & orange & easy & weak & no \\
5 & small & white & difficult & strong & yes \\
\hline
\end{tabular}
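As a reminder (standard textbook formulation, cf. Mitchell; the notation in the lecture may differ slightly): for four attributes, the Candidate-Elimination algorithm starts from the boundary sets
\[ S_0 = \{\langle \varnothing, \varnothing, \varnothing, \varnothing \rangle\}, \qquad G_0 = \{\langle ?, ?, ?, ? \rangle\}, \]
generalizing $S$ on positive examples and specializing $G$ on negative examples.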
\begin{enumerate}
\item (6 Points) Apply the Candidate-Elimination algorithm on the training data.
\item (4 Points) Apply the decision tree learning algorithm ID3 to the training examples. Do not calculate the Information Gain for the attributes. Add the attributes to the tree in the sequence: \textit{size}, \textit{color}, \textit{care}, \textit{smell}. Note for each node the numbers of the examples it was created from and the decision attribute or the assigned class (for leaves).
\item (1 Point) Name (without calculating) the attribute with the highest Information Gain.
\end{enumerate}
\item (2 Points) State two reasons which lead to an empty version space when using the Candidate-Elimination Algorithm.
\item (2 Points) Does the hypothesis language of perceptrons allow a general-to-specific ordering of hypotheses? If yes, give an example for such an order or justify your answer. If not, explain why ordering is not possible.
\item (2 Points) Consider the hypothesis $h_1 = \langle ?, elephant, ?, blue \rangle$ and the hypothesis $h_2 = \langle ?, ?, cartoon, blue \rangle$ of the hypothesis language of Find-S. Does the relation $h_1 \geq_g h_2$ ($h_1$ is more\_general\_than\_or\_equal\_to $h_2$) hold if all attributes are independent? Briefly explain your answer.
\item (2 Points) State two reasons for overfitting.
\item (1 Point) What is the difference between \textit{lazy} learning algorithms and \textit{eager} learning algorithms?
\item (1 Point) State the main difference between k-Nearest-Neighbors and k-means Clustering.
\end{enumerate}
\newpage
\item Neural Networks and Support Vector Machines (selective)
\begin{enumerate}
\item (8 Points) Use the perceptron learning rule to learn a thresholded perceptron from the following training examples. Choose the learning rate $\eta = 0.5$ and the initial weights $w_0 = 1$, $w_1 = -1$, and $w_2 = -1$. Note at least the weights ($w_0$, $w_1$, and $w_2$) and the output ($o$) after every learning step. Consider each training example only once, that is, do not train the perceptron until you find a solution.
\begin{tabular}{c|c|c|c}
\hline
example & x1 & x2 & t \\
\hline
1 & 1 & 1.5 & t \\
2 & 3 & 3 & -1 \\
3 & 2 & 3 & 1 \\
4 & 2.5 & 2 & -1 \\
\hline
\end{tabular}
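For reference, a standard formulation of the thresholded perceptron and of the perceptron training rule (cf. Mitchell; the exact notation in the lecture may differ), with $x_0 = 1$:
\[ o = \mathrm{sgn}(w_0 x_0 + w_1 x_1 + w_2 x_2), \qquad w_i \leftarrow w_i + \eta\,(t - o)\,x_i. \]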
\item (2 Points) Give an example that cannot be learned with the perceptron training rule and name the technique that Support Vector Machines use to successfully handle this example.
\item (3 Points) Often Artificial Neural Networks (ANN) and decision trees (DT) produce results of comparable accuracy. Your boss gives you the task of deciding whether to train an ANN or to use ID3 to generate a decision tree. She gives you one additional piece of information: while you are already applying the chosen approach, you will be provided with new training data every week. Which approach would you choose? Briefly explain your decision.
\end{enumerate}
\newpage
\item Hidden Markov Models (selective)
\begin{enumerate}
\item Consider the following HMM ($\lambda = (A,B)$) with three states and the possible observations $\oplus$ and $\ominus$:
\image{0.6}{WS1617/hmm.png}{Hidden Markov Model}{Hidden Markov Model}
\begin{enumerate}
\item (2 Points) Draw the state diagram for the matrix $A$ and relate the matrix $B$ to this state diagram.
\item (2 Points) Consider the observation sequence $\mathcal{O}_{noway} = <\oplus,\oplus,\oplus,\oplus>$. Explain (without calculating) why the probability of seeing this sequence given the HMM is zero (i.e. $P(\mathcal{O}_{noway}|\lambda) = 0$).
Hint: Take a close look at your state diagram.
\item (8 Points) Compute the probability $P(\mathcal{O}|\lambda)$ for the observation sequence $\mathcal{O} = <\oplus, \ominus>$ using the \textbf{backward procedure}. State all $\beta_t$ in each step of the algorithm.
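For reference, the backward procedure in its standard form (cf. Rabiner; assuming an initial state distribution $\pi$ or a designated start state, and writing $\mathcal{O}_t$ for the $t$-th observation) computes
\[ \beta_T(i) = 1, \qquad \beta_t(i) = \sum_j a_{ij}\, b_j(\mathcal{O}_{t+1})\, \beta_{t+1}(j), \qquad P(\mathcal{O} \mid \lambda) = \sum_i \pi_i\, b_i(\mathcal{O}_1)\, \beta_1(i). \]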
\end{enumerate}
\item (1 Point) Explain briefly how HMMs can be used for classification.
\end{enumerate}
\newpage
\item Bayesian Classification and Evolutionary Computation (selective)
\begin{enumerate}
\item (5 Points) Consider the following six training examples for the class \textit{t}.\\
\begin{tabular}{c|c|c|c}
\hline
example & att1 & att2 & t \\
\hline
1 & C & F & + \\
2 & C & G & + \\
3 & D & F & + \\
4 & D & G & - \\
5 & E & F & + \\
6 & D & G & - \\
\hline
\end{tabular}
\begin{enumerate}
\item How is the instance $x_q = \langle att1 = D, att2 = G \rangle$ classified by the Naive Bayes classifier? State all probabilities that are needed for the decision.
\item Additionally, calculate the probability $P(x | x_q)$.
\end{enumerate}
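For reference, the Naive Bayes classifier in its standard form (cf. Mitchell) chooses
\[ v_{NB} = \arg\max_{v \in \{+,-\}} P(v) \prod_i P(att_i = a_i \mid v), \]
where the $a_i$ are the attribute values of the instance to be classified.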
\item (2 Points) Sometimes in Naive Bayes classification m-estimates are used instead of relative frequencies. Briefly explain the idea of m-estimates.
\item (2 Points) Briefly explain what a preference bias is and why the Minimum Description Length principle is a preference bias.
\item Briefly describe the ideas of \textit{genetic algorithms} and \textit{genetic programming}.
\item (2 Points) Give the offspring of the single-point crossover with the initial strings $s_1 = 1101100$ and $s_2 = 0011011$ and the crossover mask $m = 1111000$.
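Hint (standard convention for crossover masks, cf. Mitchell): for generic parent strings $a = a_1 \ldots a_7$ and $b = b_1 \ldots b_7$, the mask $1111000$ yields the offspring
\[ o_1 = a_1 a_2 a_3 a_4\, b_5 b_6 b_7, \qquad o_2 = b_1 b_2 b_3 b_4\, a_5 a_6 a_7. \]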
\end{enumerate}
\newpage
\item Reinforcement Learning (selective)
Consider the following grid world with the states a through f, where c is the absorbing state. The transitions (state-action pairs) are marked with their direct reward (5, 10, or 20, respectively). For unmarked transitions the reward is 0. Let $\lambda$ be $0.8$.
\image{0.6}{WS1617/reinforcement.png}{Reinforcement}{Reinforcement}
\begin{enumerate}
\item (4 Points) State the values of $V^*(c)$, $V^*(e)$, $V^*(f)$, and $V^*(d)$.
\item (2 Points) Give the $Q(b, \rightarrow)$-value and the $Q(b, \downarrow)$-value.
\item (1 Point) State the optimal strategy $\pi^*$. If the optimal strategy is obvious, you do not need to calculate the criterion.
\item (4 Points) Apply the Q-learning algorithm to the given world. Initialize all $\hat{Q}(s,a)$-values with 0. Simulate two training episodes and state the updated $\hat{Q}(s,a)$-values after each move. The agent takes the following paths for the training episodes:\\
Episode 1: a - b - d - e - c\\
Episode 2: a - b - d - e - c
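For reference, the Q-learning update for a deterministic world in the notation used here (with $\lambda$ as the discount factor) is
\[ \hat{Q}(s,a) \leftarrow r(s,a) + \lambda \max_{a'} \hat{Q}(s',a'), \]
where $s'$ is the state reached by executing action $a$ in state $s$.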
\item (2 Points) Explain briefly how $\lambda = 0$ and a $\lambda$ close to 1 influence the Q-values.
\end{enumerate}
\newpage
\item Inductive Programming (selective)
\begin{enumerate}
\item (3 Points) Consider the target concept \textit{grandmother(x,y)} (denoting that x is the grandmother of y). Additionally, the following positive examples and background knowledge are given:\\
\textit{grandmother(Dortel,Tick), grandmother(Dortel,Track), mother(Dortel,Della), mother(Dortel,Donald), mother(Della,Tick)}\\
\textbf{Note}: The \textit{closed world assumption} holds.\\
State all candidate literals for extending $R = grandmother(x,y) \leftarrow {mother(x,z)}$.
\item (1 Point) What measure is used to choose the \textit{Best\_literal} in the FOIL algorithm?
\item (3 Points) Apply the inverse resolution algorithm on the training example \textit{uncle(Tick,Donald)} and the following background knowledge to get the hypothesis for \textit{uncle(x,y)}. Give substitutions in the refutation tree.\\
\textit{mother(Della,Tick), brother(Donald,Della)}\\
\item (4 Points) Consider the following I/O-examples:\\
\image{0.6}{WS1617/inductive.png}{IO-Examples}{IO-Examples}
Give one possible program fragment for $f_1(x)$, $f_2(x)$, $f_3(x)$, and $f_4(x)$, respectively.
\item (2 Points) Why is it not possible to learn \textit{member} or \textit{sort} with Summers' approach?
\end{enumerate}
\newpage
\item k-Nearest Neighbors and k-means Clustering (selective)
\image{0.6}{WS1617/knearest.png}{k-Nearest-Neighbors}{k-Nearest-Neighbors}
\begin{enumerate}
\item (1 Point) State the 5 closest neighbors to the query point $m_1 = \langle att1 = 2, att2 = 2 \rangle$. Use the distances given in the table.
\item (1 Point) Suppose your target attribute has two values (+ and -). Why should your k be an odd number when using the k-Nearest-Neighbor algorithm with k = 5 and distance weighting = true? Use the distances given in the table. State all values that are needed for deciding this question.
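For reference, with distance weighting each of the $k$ nearest neighbors $x_i$ contributes to the vote for its class with the weight (a standard choice, cf. Mitchell; the weighting used in the lecture may differ)
\[ w_i = \frac{1}{d(x_q, x_i)^2}. \]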
\item Consider the following graphical representation of the data.
\image{0.6}{WS1617/knnrep.png}{Representation}{Representation}
\begin{enumerate}
\item (1 Point) Explain briefly why attribute \textit{att2} is irrelevant.
\item (1 Point) Explain the impact of an irrelevant attribute for the classification with k-Nearest-Neighbor.
\end{enumerate}
\item Use the k-means clustering algorithm with k = 2 for the following steps of clustering:
\begin{enumerate}
\item (2 Points) Initialize the clusters $S_1^0$ and $S_2^0$ with the arbitrarily chosen cluster centers $m_1 = \langle att1 = 2, att2 = 2 \rangle$ and $m_2 = \langle att1 = 2, att2 = 3 \rangle$. Use the distances given in the table. If the typical selection criterion is not sufficient, choose the smaller cluster.
\item (4 Points) Calculate the central means $m_1^1$ and $m_2^1$ for $S_1^0$ and $S_2^0$.
\end{enumerate}
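For reference, one iteration of k-means in its standard form assigns every example to the cluster with the nearest center and then recomputes each center as the mean of its assigned examples:
\[ S_j^{t} = \{\, x : \|x - m_j^{t}\| \leq \|x - m_l^{t}\| \ \forall\, l \,\}, \qquad m_j^{t+1} = \frac{1}{|S_j^{t}|} \sum_{x \in S_j^{t}} x. \]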
\end{enumerate}
\end{enumerate}
\end{document}
