Multi-layer perceptrons: Selection of a multi-layer perceptron for a specific data classification task
Warren Gauci
Abstract – This paper considers the different parameters of multi-layer perceptron architectures and suggests a suitable architecture to complete a specific data set classification task. Results obtained in this paper are based on the MathWorks Neural Network Toolbox software. Rigorous testing with variable hidden layer size, learning rate and test sets led to a neural architecture with the best performance measures. All testing was performed on an Iris data set. Performance measures are evaluated by the use of the confusion matrix, the mean square error plot and the receiver operating characteristic chart. This paper will contribute to further advancements in the field of neuron training and in the field of distinguishing and classifying linearly and non-linearly separable data.
All weighted inputs are then added together and, if they exceed a pre-set threshold value, the neuron fires. An adjustment to the MCP model led to the formulation of the perceptron, a term coined by Frank Rosenblatt. A perceptron is an MCP model with additional, fixed, pre-processing. This paper deals with perceptron architecture structures, as this kind of neuron is best for pattern recognition (see [1]). Perceptrons may be grouped in single-layer or multi-layer architectures. Single-layer architectures are restricted to classifying only linearly separable data, thus in this paper only multi-layer perceptron (MLP) networks are used, as the connection from one layer to the next allows for non-linearly separable data recognition and classification. For a comprehensive overview of other kinds of networks refer to [2].
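As a minimal illustration of the summed-weighted-inputs-and-threshold behaviour just described (a Python sketch for this text, not code from the paper, which uses the MathWorks toolbox), a single threshold unit and its restriction to linearly separable functions could look like this:

```python
import numpy as np

def fires(inputs, weights, threshold):
    """A single perceptron-style unit: all weighted inputs are added
    together and the neuron fires (returns 1) only if the sum exceeds
    the pre-set threshold value."""
    return int(np.dot(inputs, weights) > threshold)

# One such unit can realise a linearly separable function such as AND ...
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, fires(np.array(x), weights=np.array([1.0, 1.0]), threshold=1.5))
# ... but no single layer of such units can realise XOR, which is why
# multi-layer perceptrons are needed for non-linearly separable data.
```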
The ANN must be trained using a learning process. This process involves the memorisation of patterns and the subsequent response of the neural network. It can be categorised into two paradigms: associative mapping and regularity detection. Learning is performed by updating the value of the weights associated with each input. The methodology used in this paper makes use of an adaptive network, in which neurons found in the input layer are capable of changing their weights. The adaptive network is introduced to a supervised learning procedure, where each neuron actually knows the target output and adjusts the weights on its input signals to minimise the error. The error is stipulated using a least mean square convergence technique. The behaviour of an ANN also depends on the input-output transfer function, which is specified for the units. This paper makes use of sigmoid units, where the output varies continuously but not linearly as the input changes. Sigmoid units bear a greater resemblance to real neurons than linear units do. In order to train the ANN to perform a classification task, …
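A minimal sketch of the supervised, least-mean-square style update for a single sigmoid unit follows; the learning rate, input pattern and target value are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def sigmoid(z):
    # Output varies continuously but not linearly with the input.
    return 1.0 / (1.0 + np.exp(-z))

def supervised_update(weights, x, target, learning_rate=0.5):
    """One least-mean-square style update: compute the unit's output,
    compare it with the known target, and move the weights so that the
    squared error decreases (gradient descent on 0.5 * error**2)."""
    y = sigmoid(np.dot(weights, x))
    error = target - y
    weights = weights + learning_rate * error * y * (1.0 - y) * x
    return weights, 0.5 * error ** 2

# Toy usage: drive the unit's output towards 1 for a fixed input pattern.
w = np.zeros(3)
x = np.array([1.0, 0.5, -0.2])
for _ in range(200):
    w, mse = supervised_update(w, x, target=1.0)
print(w, mse)
```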
The toolbox allowed for the division of the data set into training, validation and test sets. The function that changes the number of neurons in the hidden layer was used to change the MLP architecture. The performance of the different ANNs was assessed using the performance plots provided by this software.
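In the paper both steps are carried out inside the MathWorks toolbox; the following scikit-learn sketch mirrors them in Python, with assumed split ratios and hidden layer sizes rather than the ones used by the author:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Divide the data set into training, validation and test sets
# (70/15/15 here; these ratios are an assumption, not the paper's).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)

# Changing the number of neurons in the hidden layer changes the MLP architecture.
for hidden in (2, 5, 10):
    net = MLPClassifier(hidden_layer_sizes=(hidden,), activation="logistic",
                        max_iter=2000, random_state=0).fit(X_train, y_train)
    print(hidden, "validation accuracy:", net.score(X_val, y_val))
```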
3. DATASET
The dataset used for classification is an Iris data set. Created by R. A. Fisher, this data set is a classic in the field of pattern recognition. It contains 3 classes of 50 instances each. Each class refers to a type of Iris plant: Setosa, Versicolour, Virginica. One class is linearly separable from the other two, while the latter are not linearly separable from each other. Each instance is described by four attributes: sepal length, sepal width, petal length and petal width.
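The same data set is bundled with scikit-learn, so its structure (3 classes of 50 instances, four attributes each) can be inspected directly; a short sketch:

```python
from collections import Counter
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']
print(iris.feature_names)    # sepal length/width and petal length/width (cm)
print(Counter(iris.target))  # 50 instances of each of the 3 classes
print(iris.data.shape)       # (150, 4): four attributes per instance
```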
4. METHOD
The best structure of MLP to perform the given classification task was determined using the following procedure:
• Upload the best sample for each hidden layer size from the data collection method;
• Determine the best overall sample and the corresponding hidden layer size (using average and standard deviation functions);
• Work out another 10 samples using the best determined hidden layer size;
• Select the best sample overall using the validation and test performance plots;
• Save the parameters of the best sample and try this ANN architecture on a new set of data.
This section of the method allowed for the determination of the overall best ANN architecture using another set of samples (a sketch of this selection logic is given below).
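A hedged Python sketch of the selection logic above; the candidate hidden layer sizes, sample counts and split ratios are assumptions, and validation accuracy stands in for the toolbox performance plots used in the paper:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=1)

def train_samples(hidden_size, n_samples):
    """Train several randomly initialised 'samples' of one architecture and
    return the fitted networks together with their validation accuracies."""
    nets = [MLPClassifier(hidden_layer_sizes=(hidden_size,), activation="logistic",
                          max_iter=2000, random_state=seed).fit(X_train, y_train)
            for seed in range(n_samples)]
    return nets, np.array([net.score(X_val, y_val) for net in nets])

# Best sample per hidden layer size, then the best overall size.
candidate_sizes = (2, 4, 8, 16)          # assumed sizes, not the paper's
best_size = max(candidate_sizes, key=lambda h: train_samples(h, 5)[1].max())

# Work out another batch of samples with the best size and pick the best one
# according to validation performance.
nets, scores = train_samples(best_size, 10)
best_net = nets[int(scores.argmax())]

# Finally, try the chosen architecture on held-out data.
print("hidden layer size:", best_size, "test accuracy:", best_net.score(X_test, y_test))
```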
5. RESULTS
The most relevant results are tabulated in Table 1. All results were obtained using the following percentage ratios for training, validation and test sets: …
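The performance measures named in the abstract (confusion matrix, mean square error plot and receiver operating characteristic chart) are read from the toolbox plots in the paper; a rough scikit-learn equivalent, with an assumed network size and split, would compute the same quantities as follows:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(8,), activation="logistic",
                    max_iter=2000, random_state=0).fit(X_train, y_train)
probs = net.predict_proba(X_test)

# Confusion matrix on the test set (rows: true class, columns: predicted class).
print(confusion_matrix(y_test, net.predict(X_test)))

# Mean square error between one-hot targets and the predicted class probabilities.
print(mean_squared_error(label_binarize(y_test, classes=[0, 1, 2]), probs))

# Area under the one-vs-rest ROC curves, summarising the ROC chart.
print(roc_auc_score(y_test, probs, multi_class="ovr"))
```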
7. DISCUSSION
It may be concluded that although results are not always satisfactory, consistency is present only in considerably small-sized hidden layer networks. Furthermore, the results show that classes 2 and 3 are the classes containing non-linearly separable data. It may also be concluded that a specific MLP architecture for a particular classification task can be chosen, but classification is somewhat random and not always consistent.
8. REFERENCES
[1] M. Nørgaard, O. Ravn, N. Poulsen, and L. Hansen, Neural Networks for Modelling and Control of Dynamic Systems, M. Grimble and M. Johnson, Eds. London: Springer-Verlag, 2000.
9. Appendix A