ARTIFICIAL NEURAL NETWORKS Seminar Report
Submitted in partial fulfillment of degree of
Bachelor of Technology in Electronics & Communication
Krishna Institute of Engg. & Technology Ghaziabad (Affiliated to Uttar Pradesh Technical University Lucknow) 2010-2011
Submitted to: Sumita Rai Chaudhary
Certificate
This is to certify that the seminar entitled “” presented by Ankit Goel, IVth year student of the B.Tech (IT) degree course at R.D. Engineering College, affiliated to Uttar Pradesh Technical University, was done under the direct supervision and guidance of Mr. Sudhanshu Sourabh. A report similar to this has not been submitted for any examination and does not form part of any other course undergone by the candidate. His discipline and overall conduct are found to be good.
(Mr. Sudhanshu Sourabh)
(Mohd.Vakil)
Guide
Head of Deptt.
Acknowledgement
I take this opportunity to express my sincere thanks and deep gratitude to all those people who extended their wholehearted co-operation and helped me in completing this seminar report successfully.
I express my deep and sincere gratitude to Mr. Sudhanshu Sourabh (guide) and Mohd. Vakil (Head of Department), who provided me the opportunity, inspiration and requisite facilities to complete this report in spite of their busy schedules, patiently solving my rather amateurish queries.
I am also earnestly thankful to all the concerned faculty members who supported and furnished information to make this endeavor a success.
Ankit Goel B.Tech I.T. IVth yr
TABLE OF CONTENTS

Cover Page & Title Page
Certificate
Acknowledgement
Abstract

1. Introduction to Artificial Neural Networks
   1.1 Introduction
   1.2 Historical Background
   1.3 Biological Inspiration
   1.4 Neural Networks v/s Conventional Computers
   1.5 What are Computer Systems good at... not so good at?
2. Neural Architecture
   2.1 Analogy to the Brain
   2.2 Artificial Neurons & How They Work
   2.3 Electronic Implementation of Artificial Neurons
   2.4 Artificial Network Operation
3. Neural Models & Classification
   3.1 Neural Models
   3.2 Biological Model
   3.3 Mathematical Model
   3.4 Classification of Neural Networks
   3.5 Probabilistic Neural Networks
   3.6 Generalised Regression Neural Networks
   3.7 Linear Networks
   3.8 SOFM Networks
4. ANN Processing
   4.1 Gathering Data for Neural Networks
   4.2 Learning
   4.3 Learning Process
   4.4 Transfer Function
   4.5 Pre- and Post-Processing
5. Applications of Neural Networks
   5.1 Areas where we can use neural networks
   5.2 What can we do with NN and what not?
   5.3 Who is concerned with NN?
   5.4 Software available
1 Introduction to the Artificial Neural Network

1.1 Introduction

Artificial Neural Networks are relatively crude electronic models based on the neural structure of the brain. The brain basically learns from experience. It is natural proof that some problems that are beyond the scope of current computers are indeed solvable by small, energy-efficient packages. This brain modeling also promises a less technical way to develop machine solutions. This new approach to computing also provides a more graceful degradation during system overload than its more traditional counterparts.
These biologically inspired methods of computing are thought to be the next major advancement in the computing industry. Even simple animal brains are capable of functions that are currently impossible for computers. Computers do rote things well, like keeping ledgers or performing complex math. But computers have trouble recognizing even simple patterns, much less generalizing those patterns of the past into actions of the future.

Now, advances in biological research promise an initial understanding of the natural thinking mechanism. This research shows that brains store information as patterns. Some of these patterns are very complicated and allow us the ability to recognize individual faces from many different angles. This process of storing information as patterns, utilizing those patterns, and then solving problems encompasses a new field in computing. This field, as mentioned before, does not utilize traditional programming but involves the creation of massively parallel networks and the training of those networks to solve specific problems. This field also utilizes words very different from traditional computing, words like behave, react, self-organize, learn, generalize, and forget.

Artificial neural networks (ANN) are among the newest signal-processing technologies in the engineer's toolbox. The field is highly interdisciplinary, but our approach will restrict the view to the engineering perspective. In engineering, neural networks serve two important functions: as pattern classifiers and as nonlinear adaptive filters. We will provide a brief overview of the theory, learning rules, and applications of the most important neural network models.

Definitions and Style of Computation

An Artificial Neural Network is an adaptive, most often nonlinear system that learns to perform a function (an input/output map) from data. Adaptive means that the system parameters are changed during operation, normally called the training phase. After the training phase the Artificial Neural Network parameters are fixed and the system is deployed to solve the problem at hand (the testing phase). The Artificial Neural Network is built with a systematic step-by-step procedure to optimize a performance criterion or to follow some implicit internal constraint, which is commonly referred to as the learning rule. The input/output training data are fundamental in neural network technology, because they convey the necessary information to "discover" the optimal operating point.
An input is presented to the neural network and a corresponding desired or target response is set at the output (when this is the case the training is called supervised). The error information is fed back to the system and adjusts the system parameters in a systematic fashion (the learning rule). The process is repeated until the performance is acceptable. It is clear from this description that the performance hinges heavily on the data. If one does not have data that cover a significant portion of the operating conditions, or if they are noisy, then neural network technology is probably not the right solution. On the other hand, if there is plenty of data and the problem is too poorly understood to derive an approximate model, then neural network technology is a good choice.

This operating procedure should be contrasted with the traditional engineering design, made of exhaustive subsystem specifications and intercommunication protocols. In artificial neural networks, the designer chooses the network topology, the performance function, the learning rule, and the criterion to stop the training phase, but the system automatically adjusts the parameters. So, it is difficult to bring a priori information into the design, and when the system does not work properly it is also hard to incrementally refine the solution. But ANN-based solutions are extremely efficient in terms of development time and resources, and in many difficult problems artificial neural networks provide performance that is difficult to match with other technologies. Denker said 10 years ago that "artificial neural networks are the second best way to implement a solution", motivated by the simplicity of their design and because of their universality, only shadowed by the traditional design obtained by studying the physics of the problem. At present, artificial neural networks are emerging as the technology of choice for many applications, such as pattern recognition, prediction, system identification, and control.
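As an illustration of the supervised training cycle described above, the following minimal sketch (plain Python with invented variable names, not taken from the report) adjusts the weights of a single linear unit by feeding the output error back into the parameters:

# A rough sketch of supervised training: present an input, compare the output
# with the desired response, feed the error back, adjust parameters, repeat.
def train(samples, lr=0.1, epochs=100):
    n = len(samples[0][0])          # number of inputs
    w = [0.0] * n                   # system parameters (weights)
    b = 0.0                         # bias
    for _ in range(epochs):         # repeat until performance is acceptable
        for x, target in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # network output
            error = target - y                             # error information
            # learning rule: adjust parameters in proportion to the error
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# usage: learn a simple input/output map from (hypothetical) data
weights, bias = train([([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 2.0)])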
1.2 Historical Background Neural network simulations appear to be a recent development. However, this field was established before the advent of computers, and has survived at least one major setback and several eras. Many important advances have been boosted by the use of inexpensive computer emulations. Following an initial period of enthusiasm, the field survived a period of frustration and disrepute.
During this period when funding and professional support was minimal, relatively few researchers made important advances. The first artificial neuron was produced in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts. But the technology available at that time did not allow them to do too much.

1. First Attempts: There were some initial simulations using formal logic. McCulloch and Pitts (1943) developed models of neural networks based on their understanding of neurology. These models made several assumptions about how neurons worked. Their networks were based on simple neurons, which were considered to be binary devices with fixed thresholds. The results of their model were simple logic functions such as "a or b" and "a and b". Another attempt was by using computer simulations, by two groups (Farley and Clark, 1954; Rochester, Holland, Haibit and Duda, 1956). The first group (IBM researchers) maintained close contact with neuroscientists at McGill University, so whenever their models did not work, they consulted the neuroscientists. This interaction established a multidisciplinary trend which continues to the present day.

2. Promising & Emerging Technology: Not only was neuroscience influential in the development of neural networks, but psychologists and engineers also contributed to the progress of neural network simulations. Rosenblatt (1958) stirred considerable interest and activity in the field when he designed and developed the Perceptron. The Perceptron had three layers, with the middle layer known as the association layer. This system could learn to connect or associate a given input to a random output unit. Another system was the ADALINE (ADAptive LInear Element), which was developed in 1960 by Widrow and Hoff (of Stanford University). The ADALINE was an analogue electronic device made from simple components. The method used for learning was different from that of the Perceptron; it employed the Least-Mean-Squares (LMS) learning rule.

3. Period of Frustration & Disrepute: In 1969 Minsky and Papert wrote a book in which they generalized the limitations of single-layer Perceptrons to multilayered systems. In the book they said: "...our intuitive judgment that the extension (to multilayer systems) is sterile". The significant result of their book was to eliminate funding for research with neural network simulations. The conclusions supported the disenchantment of researchers in the field. As a result, considerable prejudice against this field was activated.

4. Innovation: Although public interest and available funding were minimal, several researchers continued working to develop neuromorphically based computational methods for problems such as pattern recognition. During this period several paradigms were generated which modern work continues to enhance. Grossberg's (Steve Grossberg and Gail Carpenter in 1988) influence founded a school of thought which explores resonating algorithms; they developed the ART (Adaptive Resonance Theory) networks based on biologically plausible models. Anderson and Kohonen developed associative techniques independently of each other. Klopf (A. Henry Klopf) in 1972 developed a basis for learning in artificial neurons based on a biological principle for neuronal learning called homeostasis. The original network was published in 1975 and was called the Cognitron.

5. Re-Emergence: Progress during the late 1970s and early 1980s was important to the re-emergence of interest in the neural network field. Several factors influenced this movement. For example, comprehensive books and conferences provided a forum for
people in diverse fields with specialized technical languages, and the response to conferences and publications was quite positive. The news media picked up on the increased activity and tutorials helped disseminate the technology. Academic programs appeared and courses were introduced at most major universities (in the US and Europe). Attention is now focused on funding levels throughout Europe, Japan and the US, and as this funding becomes available, several new commercial applications in industry and financial institutions are emerging.

6. Today: Significant progress has been made in the field of neural networks - enough to attract a great deal of attention and fund further research. Advancement beyond current commercial applications appears to be possible, and research is advancing the field on many fronts. Neurally based chips are emerging and applications to complex problems are developing. Clearly, today is a period of transition for neural network technology.

1.3 Biological Inspiration
Neural networks grew out of research in Artificial Intelligence; specifically, attempts to mimic the fault-tolerance and capacity to learn of biological neural systems by modeling the low-level structure of the brain (see Patterson, 1996). The main branch of Artificial Intelligence research in the 1960s-1980s produced Expert Systems. These are based upon a high-level model of reasoning processes (specifically, the concept that our reasoning processes are built upon manipulation of symbols). It became rapidly apparent that these systems, although very useful in some domains, failed to capture certain key aspects of human intelligence. According to one line of speculation, this was due to their failure to mimic the underlying structure of the brain.

The axons of one cell connect to the dendrites of another via a synapse. When a neuron is activated, it fires an electrochemical signal along the axon. This signal crosses the synapses to other neurons, which may in turn fire. A neuron fires only if the total signal received at the cell body from the dendrites exceeds a certain level (the firing threshold).

• The strength of the signal received by a neuron (and therefore its chances of firing) critically depends on the efficacy of the synapses. Each synapse actually contains a gap, with neurotransmitter chemicals poised to transmit a signal across the gap.
• One of the most influential researchers into neurological systems (Donald Hebb) postulated that learning consisted principally in altering the "strength" of synaptic connections. For example, in the classic Pavlovian conditioning experiment, where a bell is rung just before dinner is delivered to a dog, the dog rapidly learns to associate the ringing of a bell with the eating of food. The synaptic connections between the appropriate part of the auditory cortex and the salivation glands are strengthened, so that when the auditory cortex is stimulated by the sound of the bell the dog starts to salivate.
• Recent research in cognitive science, in particular in the area of non-conscious information processing, has further demonstrated the enormous capacity of the human mind to infer ("learn") simple input-output covariations from extremely complex stimuli (e.g., see Lewicki, Hill, and Czyzewska, 1992).

Thus, from a very large number of extremely simple processing units (each performing a weighted sum of its inputs, and then firing a binary signal if the total
input exceeds a certain level) the brain manages to perform extremely complex tasks. Of course, there is a great deal of complexity in the brain which has not been discussed here, but it is interesting that artificial neural networks can achieve some remarkable results using a model not much more complex than this.

1.4 Neural Networks versus Conventional Computers
Neural networks take a different approach to problem solving than that of conventional computers. Conventional computers use an algorithmic approach, i.e. the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known, the computer cannot solve the problem. That restricts the problem-solving capability of conventional computers to problems that we already understand and know how to solve. But computers would be so much more useful if they could do things that we don't exactly know how to do.

Neural networks process information in a similar way to the human brain. The network is composed of a large number of highly interconnected processing elements (neurones) working in parallel to solve a specific problem. Neural networks learn by example. They cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise useful time is wasted, or even worse the network might function incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable.

On the other hand, conventional computers use a cognitive approach to problem solving; the way the problem is to be solved must be known and stated in small unambiguous instructions. These instructions are then converted to a high-level language program and then into machine code that the computer can understand. These machines are totally predictable; if anything goes wrong, it is due to a software or hardware fault.

Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, like arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency.

1.5 What are (everyday) computer systems good at... and not so good at?
Good at:
• Fast arithmetic
• Doing precisely what the programmer programs them to do

Not so good at:
• Interacting with noisy data or data from the environment
• Massive parallelism
• Fault tolerance
• Adapting to circumstances
Advantages:
• A neural network can perform tasks that a linear program cannot.
• When an element of the neural network fails, it can continue without any problem because of its parallel nature.
• A neural network learns and does not need to be reprogrammed.
• It can be implemented in any application.
• It can be implemented without any problem.

Disadvantages:
• The neural network needs training to operate.
• The architecture of a neural network is different from the architecture of microprocessors, and therefore needs to be emulated.
• Large neural networks require high processing time.
2 Neural Architecture

2.1 Analogy to the Brain
The exact workings of the human brain are still a mystery. Yet, some aspects of this amazing processor are known. In particular, the most basic element of the human brain is a specific type of cell which, unlike the rest of the body, doesn't appear to regenerate. Because this type of cell is the only part of the body that isn't slowly replaced, it is assumed that these cells are what provide us with our abilities to remember, think, and apply previous experiences to our every action. These cells, all 100 billion of them, are known as neurons. Each of these neurons can connect with up to 200,000 other neurons, although 1,000 to 10,000 are typical.

The power of the human mind comes from the sheer numbers of these basic components and the multiple connections between them. It also comes from genetic programming and learning. The individual neurons are complicated. They have a myriad of parts, sub-systems, and control mechanisms. They convey information via a host of electrochemical pathways. There are over one hundred different classes of neurons, depending on the classification method used. Together these neurons and their connections form a process which is not binary, not stable, and not synchronous. In short, it is nothing like the currently available electronic computers, or even artificial neural networks.

These artificial neural networks try to replicate only the most basic elements of this complicated, versatile, and powerful organism. They do it in a primitive way. But for the software engineer who is trying to solve problems, neural computing was never about replicating human brains. It is about machines and a new way to solve problems.

2.2 Artificial Neurons and How They Work
The fundamental processing element of a neural network is a neuron. This building block of human awareness encompasses a few general capabilities. Basically, a biological neuron receives inputs from other sources, combines them in some way, performs a generally nonlinear operation on the result, and then outputs the final result. Figure 2.2.1 shows the relationship of these four parts.
Figure 2.2.1 A Simple Neuron.

Within humans there are many variations on this basic type of neuron, further complicating man's attempts at electrically replicating the process of thinking. Yet, all natural neurons have the same four basic components. These components are known by their biological names - dendrites,
soma, axon, and synapses. Dendrites are hair-like extensions of the soma which act like input channels. These input channels receive their input through the synapses of other neurons. The soma then processes these incoming signals over time and turns that processed value into an output, which is sent out to other neurons through the axon and the synapses.

Recent experimental data has provided further evidence that biological neurons are structurally more complex than the simplistic explanation above. They are significantly more complex than the existing artificial neurons that are built into today's artificial neural networks. As biology provides a better understanding of neurons, and as technology advances, network designers can continue to improve their systems by building upon man's understanding of the biological brain. But currently, the goal of artificial neural networks is not the grandiose recreation of the brain. On the contrary, neural network researchers are seeking an understanding of nature's capabilities for which people can engineer solutions to problems that have not been solved by traditional computing. To do this, the basic unit of neural networks, the artificial neuron, simulates the four basic functions of natural neurons. Figure 2.2.2 shows a fundamental representation of an artificial neuron.
Figure 2.2.2 A Basic Artificial Neuron.

In Figure 2.2.2, various inputs to the network are represented by the mathematical symbol x(n). Each of these inputs is multiplied by a connection weight. These weights are represented by w(n). In the simplest case, these products are simply summed, fed through a transfer function to generate a result, and then output. This process lends itself to physical implementation on a large scale in a small package. This electronic implementation is still possible with other network structures which utilize different summing functions as well as different transfer functions.
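The following minimal sketch (plain Python, illustrative only, not from the original report) implements the basic artificial neuron of Figure 2.2.2: each input x(n) is multiplied by its connection weight w(n), the products are summed, and the sum is passed through a sigmoid transfer function:

import math

def neuron_output(inputs, weights):
    # multiply each input by its connection weight and sum the products
    total = sum(x * w for x, w in zip(inputs, weights))
    # transfer function: squash the sum into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-total))

# usage: three inputs and their (hypothetical) connection weights
print(neuron_output([0.5, 0.9, -0.3], [0.8, 0.2, 0.4]))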
Some applications require "black and white," or binary, answers. These applications include the recognition of text, the identification of speech, and the image deciphering of scenes. These applications are required to turn real-world inputs into discrete values. These potential values are limited to some known set, like the ASCII characters or the most common 50,000 English words. Because of this limitation of output options, these applications don't always utilize networks composed of neurons that simply sum up, and thereby smooth, inputs. These networks may utilize the binary properties of ORing and ANDing of inputs. These functions, and many others, can be built into the summation and transfer functions of a network.

Other networks work on problems where the resolutions are not just one of several known values. These networks need to be capable of an infinite number of responses. Applications of this type include the "intelligence" behind robotic movements. This "intelligence" processes inputs and then creates outputs, which actually cause some device to move. That movement can span an infinite number of very precise motions. These networks do indeed want to smooth their inputs which, due to limitations of sensors, come in non-continuous bursts, say thirty times a second. To do that, they might accept these inputs, sum that data, and then produce an output by, for example, applying a hyperbolic tangent as a transfer function. In this manner, output values from the network are continuous and satisfy more real-world interfaces. Other applications might simply sum and compare to a threshold, thereby producing one of two possible outputs, a zero or a one. Other functions scale the outputs to match the application, such as the values minus one and one. Some functions even integrate the input data over time, creating time-dependent networks.

2.3 Electronic Implementation of Artificial Neurons
In currently available software packages these artificial neurons are called "processing elements" and have many more capabilities than the simple artificial neuron described above. Those capabilities will be discussed later in this report. Figure 2.2.3 is a more detailed schematic of this still simplistic artificial neuron.
Figure 2.2.3 A Model of a "Processing Element".

In Figure 2.2.3, inputs enter into the processing element from the upper left. The first step is for each of these inputs to be multiplied by their respective weighting factor (w(n)). Then these modified inputs are fed into the summing function, which usually just sums these products. Yet, many different types of operations can be selected. These operations could produce a number of different values which are then propagated forward; values such as the average, the largest, the smallest, the ORed values, the ANDed values, etc. Furthermore, most commercial development products allow software engineers to create their own summing functions via routines coded in a higher-level language (C is commonly supported). Sometimes the summing function is further complicated by the addition of an activation function which enables the summing function to operate in a time-sensitive way.

Either way, the output of the summing function is then sent into a transfer function. This function then turns this number into a real output via some algorithm. It is this algorithm that takes the input and turns it into a zero or a one, a minus one or a one, or some other number. The transfer functions that are commonly supported are sigmoid, sine, hyperbolic tangent, etc. This transfer function also can scale the output or control its value via thresholds. The result of the transfer function is usually the direct output of the processing element. An example of how a transfer function works is shown in Figure 2.2.4.
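As a rough illustration of such a processing element, the sketch below (plain Python; the function and option names are hypothetical, not from any particular package) lets the summing operation and the transfer function be chosen independently, in the spirit of the commercial packages described above:

import math

def processing_element(inputs, weights, summing="sum", transfer="sigmoid"):
    weighted = [x * w for x, w in zip(inputs, weights)]
    # selectable summing function
    if summing == "sum":
        s = sum(weighted)
    elif summing == "max":
        s = max(weighted)
    elif summing == "average":
        s = sum(weighted) / len(weighted)
    else:
        raise ValueError("unknown summing function")
    # selectable transfer function
    if transfer == "sigmoid":
        return 1.0 / (1.0 + math.exp(-s))
    elif transfer == "tanh":
        return math.tanh(s)
    elif transfer == "threshold":
        return 1.0 if s >= 0.0 else 0.0
    raise ValueError("unknown transfer function")

# usage with made-up inputs and weights
print(processing_element([0.2, 0.7], [0.5, -0.3], summing="sum", transfer="tanh"))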
This sigmoid transfer function takes the value from the summation function, called sum in Figure 2.2.4, and turns it into a value between zero and one.
Figure 2.2.4 Sigmoid Transfer Function.

Finally, the processing element is ready to output the result of its transfer function. This output is then input into other processing elements, or to an outside connection, as dictated by the structure of the network. All artificial neural networks are constructed from this basic building block - the processing element or the artificial neuron. It is the variety and the fundamental differences in these building blocks which partially cause the implementing of neural networks to be an "art."

2.4 Artificial Network Operations
The other part of the "art" of using neural networks revolves around the myriad of ways these individual neurons can be clustered together. This clustering occurs in the human mind in such a way that information can be processed in a dynamic, interactive, and self-organizing way. Biologically, neural networks are constructed in a three-dimensional world from microscopic components. These neurons seem capable of nearly unrestricted interconnections. That is not true of any proposed, or existing, man-made network. Integrated circuits, using current technology, are two-dimensional devices with a limited number of layers for interconnection. This physical reality restrains the types, and scope, of artificial neural networks that can be implemented in silicon.

Currently, neural networks are the simple clustering of the primitive artificial neurons. This clustering occurs by creating layers, which are then connected to one another. How these layers connect is the other part of the "art" of engineering networks to resolve real-world problems.
Figure 2.4.1 A Simple Neural Network Diagram.

Basically, all artificial neural networks have a similar structure or topology, as shown in Figure 2.4.1. In that structure some of the neurons interface to the real world to receive its inputs. Other neurons provide the real world with the network's outputs. This output might be the particular character that the network thinks that it has scanned or the particular image it thinks is being viewed. All the rest of the neurons are hidden from view.

But a neural network is more than a bunch of neurons. Some early researchers tried to simply connect neurons in a random manner, without much success. Now, it is known that even the brains of snails are structured devices. One of the easiest ways to design a structure is to create layers of elements. It is the grouping of these neurons into layers, the connections between these layers, and the summation and transfer functions that comprises a functioning neural network. The general terms used to describe these characteristics are common to all networks.

The way that the neurons are connected to each other has a significant impact on the operation of the network. In the larger, more professional software development packages the user is allowed to add, delete, and control these connections at will. By "tweaking" parameters these connections can be made to either excite or inhibit.
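To make the idea of layered clustering concrete, here is a small illustrative sketch (plain Python, with hypothetical weight values) that propagates an input vector through an input layer, one hidden layer, and an output layer, each neuron computing a weighted sum followed by a sigmoid:

import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(inputs, weight_rows):
    # each row of weights defines one neuron in the layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weight_rows]

# a tiny 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron
hidden_weights = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.9]]   # hypothetical values
output_weights = [[0.3, -0.8, 0.6]]

x = [1.0, 0.5]                                   # input layer: interfaces to the real world
hidden = layer_forward(x, hidden_weights)        # hidden neurons
output = layer_forward(hidden, output_weights)   # output layer
print(output)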
2.5 Perceptrons

The most influential work on neural nets in the 60's went under the heading of 'perceptrons', a term coined by Frank Rosenblatt. The perceptron (Figure 4.4) turns out to be an MCP model (neuron with weighted inputs) with some additional, fixed, pre-processing. Units labelled A1, A2, Aj, Ap are called association units and their task is to extract specific, localized features from the input images. Perceptrons mimic the basic idea behind the mammalian visual system. They were mainly used in pattern recognition even though their capabilities extended a lot more.
Figure 4.4

In 1969 Minsky and Papert wrote a book in which they described the limitations of single-layer Perceptrons. The impact that the book had was tremendous and caused a lot of neural network researchers to lose their interest. The book was very well written and showed mathematically that single-layer perceptrons could not do some basic pattern recognition operations like determining the parity of a shape or determining whether a shape is connected or not. What they did not realise, until the 80's, is that given the appropriate training, multilevel perceptrons can do these operations.
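As an illustrative sketch of the perceptron idea (plain Python, not Rosenblatt's original formulation), the code below trains a single threshold unit with the classic perceptron learning rule on the AND function; the same single-layer unit would fail on a parity problem such as XOR, which is exactly the limitation Minsky and Papert pointed out:

def step(v):
    return 1 if v >= 0 else 0

def train_perceptron(samples, lr=0.1, epochs=20):
    w = [0.0, 0.0]   # weights for two inputs
    b = 0.0          # bias (threshold)
    for _ in range(epochs):
        for x, target in samples:
            y = step(w[0] * x[0] + w[1] * x[1] + b)
            error = target - y
            # perceptron learning rule: nudge weights toward the correct answer
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
            b += lr * error
    return w, b

# AND is linearly separable, so a single-layer perceptron can learn it
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data)
print([step(w[0] * x[0] + w[1] * x[1] + b) for x, _ in and_data])   # [0, 0, 0, 1]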
3 Neural Models & Classification

3.1 Neural Models

3.1.1 The Biological Model

Artificial neural networks emerged after the introduction of simplified neurons by McCulloch and Pitts in 1943 (McCulloch & Pitts, 1943). These neurons were presented as models of biological neurons and as conceptual components for circuits that could perform computational tasks. The basic model of the neuron is founded upon the functionality of a biological neuron. "Neurons are the basic signaling units of the nervous system" and "each neuron is a discrete cell whose several processes arise from its cell body".
The neuron has four main regions to its structure: the cell body, or soma, has two offshoots from it, the dendrites, and the axon, which ends in presynaptic terminals. The cell body is the heart of the cell, containing the nucleus and maintaining protein synthesis. A neuron may have many dendrites, which branch out in a treelike structure and receive signals from other neurons. A neuron usually has only one axon, which grows out from a part of the cell body called the axon hillock. The axon conducts electric signals generated at the axon hillock down its length. These electric signals are called action potentials. The other end of the axon may split into several branches, which end in presynaptic terminals. Action potentials are the electric signals that neurons use to convey information to the brain. All these signals are identical. Therefore, the brain determines what type of information is being received based on the path that the signal took. The brain analyzes the patterns of signals being sent and from that information it can interpret the type of information being received.

Myelin is the fatty tissue that surrounds and insulates the axon. Often short axons do not need this insulation. There are uninsulated parts of the axon, called the Nodes of Ranvier. At these nodes, the signal traveling down the axon is regenerated. This ensures that the signal traveling down the axon travels fast and remains constant (i.e. very short propagation delay and no weakening of the signal).

The synapse is the area of contact between two neurons. The neurons do not actually physically touch; they are separated by the synaptic cleft, and electric signals are sent through chemical interaction. The neuron sending the signal is called the presynaptic cell and the neuron receiving the signal is called the postsynaptic cell. The signals are generated by the membrane potential, which is based on the differences in concentration of sodium and potassium ions inside and outside the cell membrane. Neurons can be classified by their number of processes (or appendages), or by their function.

3.1.2 The Mathematical Model
When creating a functional model of the biological neuron, there are three basic components of importance. First, the synapses of the neuron are modeled as weights. The strength of the connection between an input and a neuron is noted by the value of the weight. Negative weight values reflect inhibitory connections, while positive values designate excitatory connections [Haykin]. The next two components model the actual activity within the neuron cell. An adder sums up all the inputs modified by their respective weights. This activity is referred to as linear combination. Finally, an activation function controls the amplitude of the output of the neuron. An acceptable range of output is usually between 0 and 1, or -1 and 1.
Mathematically, this process is described as a weighted sum of the inputs. From this model the internal activity of the neuron can be shown to be:

v_k = Σ_j w_kj x_j

The output of the neuron, y_k, would therefore be the outcome of some activation function applied to the value of v_k:

y_k = φ(v_k)

Activation functions

As mentioned previously, the activation function acts as a squashing function, such that the output of a neuron in a neural network is between certain values (usually 0 and 1, or -1 and 1). In general, there are three types of activation functions, denoted by φ(.). First, there is the Threshold Function, which takes on a value of 0 if the summed input is less than a certain threshold value, and the value 1 if the summed input is greater than or equal to the threshold value.
Secondly, there is the Piecewise-Linear Function. This function can again take on the values of 0 or 1, but it can also take on values in between, depending on the amplification factor in a certain region of linear operation.
Thirdly, there is the sigmoid function. This function can range between 0 and 1, but it is also sometimes useful to use the -1 to 1 range. An example of the sigmoid function is the hyperbolic tangent function.
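The three activation functions just described can be written compactly. The sketch below (plain Python; the parameter names theta and a are chosen for illustration, not taken from the report) shows a threshold function, a piecewise-linear function with amplification factor a, and a logistic sigmoid:

import math

def threshold(v, theta=0.0):
    # 1 if the summed input reaches the threshold, otherwise 0
    return 1.0 if v >= theta else 0.0

def piecewise_linear(v, a=1.0):
    # linear with slope a in the middle region, clipped to the range [0, 1]
    return min(1.0, max(0.0, a * v + 0.5))

def sigmoid(v, a=1.0):
    # smooth squashing into (0, 1); tanh(v) is a variant with range (-1, 1)
    return 1.0 / (1.0 + math.exp(-a * v))

for v in (-2.0, 0.0, 2.0):
    print(threshold(v), piecewise_linear(v), sigmoid(v))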
The artificial neural networks which we describe are all variations on the parallel distributed processing (PDP) idea. The architecture of each neural network is based on very similar building blocks which perform the processing. In this chapter we first discuss these processing units and different neural network topologies, and then learning strategies as a basis for an adaptive system.

3.2 Classification of Neural Networks

3.2.1 Probabilistic Neural Networks
Elsewhere, we briefly mentioned that, in the context of classification problems, a useful interpretation of network outputs was as estimates of probability of class membership, in which case the network was actually learning to estimate a probability density function (pdf). A similar useful interpretation can be made in regression problems if the output of the network is regarded as the expected value of the model at a given point in input-space. This expected value is related to the joint probability density function of the output and inputs.

Estimating probability density functions from data has a long statistical history (Parzen, 1962), and in this context fits into the area of Bayesian statistics. Conventional statistics can, given a known model, tell us what the chances of certain outcomes are (e.g., we know that an unbiased die has a 1/6 chance of coming up with a six). Bayesian statistics turns this situation on its head, by estimating the validity of a model given certain data. More generally, Bayesian statistics can estimate the probability density of model parameters given the available data. To minimize error, the model is then selected whose parameters maximize this pdf.

In the context of a classification problem, if we can construct estimates of the pdf of the possible classes, we can compare the probabilities of the various classes, and select the most probable. This is effectively what we ask a neural network to do when it learns a classification problem - the network attempts to learn (an approximation to) the pdf. A more traditional approach is to construct an estimate of the pdf from the data. The most traditional technique is to assume a certain form for the pdf (typically, that it is a normal distribution), and then to estimate the model parameters. The normal distribution is commonly used because the model parameters (mean and standard deviation) can be estimated using analytical techniques. The problem is that the assumption of normality is often not justified.

In the PNN, there are at least three layers: input, radial, and output layers. The radial units are copied directly from the training data, one per case. Each models a Gaussian function centered at the training case. There is one output unit per class. Each is connected to all the radial units belonging to its class, with zero connections from all other radial units. Hence, the output units simply add up the responses of the units belonging to their own class. The outputs are each proportional to the kernel-based estimates of the pdf of the various classes, and by normalizing these to sum to 1.0, estimates of class probability are produced.
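A rough sketch of the PNN idea described above (plain Python; the Gaussian width sigma and the toy data are made up for illustration): one Gaussian kernel is centred on each training case, the kernels of each class are summed, and the class with the largest normalised response is selected:

import math

def pnn_classify(x, training, sigma=0.5):
    # training: list of (input_vector, class_label) pairs
    sums = {}
    for case, label in training:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, case))
        kernel = math.exp(-dist2 / (2.0 * sigma ** 2))   # Gaussian centred on the case
        sums[label] = sums.get(label, 0.0) + kernel      # output unit adds up its class
    total = sum(sums.values())
    probs = {label: s / total for label, s in sums.items()}   # normalise to sum to 1.0
    return max(probs, key=probs.get), probs

data = [([0.1, 0.2], "A"), ([0.2, 0.1], "A"), ([0.9, 0.8], "B"), ([0.8, 0.9], "B")]
print(pnn_classify([0.85, 0.85], data))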
3.2.2 Generalized Regression Neural Networks

Generalized regression neural networks (GRNNs) work in a similar fashion to PNNs, but perform regression rather than classification tasks (see Speckt, 1991; Patterson, 1996; Bishop, 1995). As with the PNN, Gaussian kernel functions are located at each training case. Each case can be regarded as evidence that the response surface is a given height at that point in input space, with progressively decaying evidence in the immediate vicinity. The GRNN copies the training cases into the network to be used to estimate the response on new points. The output is estimated using a weighted average of the outputs of the training cases, where the weighting is related to the distance of the point from the point being estimated (so that points nearby contribute most heavily to the estimate).
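A corresponding sketch of the GRNN estimate (same hypothetical kernel width as above): each training case votes for its own output value, weighted by a Gaussian function of its distance from the query point:

import math

def grnn_predict(x, training, sigma=0.5):
    # training: list of (input_vector, output_value) pairs
    num, den = 0.0, 0.0
    for case, y in training:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, case))
        w = math.exp(-dist2 / (2.0 * sigma ** 2))   # closer cases contribute more
        num += w * y
        den += w
    return num / den   # weighted average of the training outputs

samples = [([0.0], 0.0), ([1.0], 1.0), ([2.0], 4.0)]
print(grnn_predict([1.5], samples))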
3.2.3 Linear Networks

A general scientific principle is that a simple model should always be chosen in preference to a complex model if the latter does not fit the data better. In terms of function approximation, the simplest model is the linear model, where the fitted function is a hyperplane. In classification, the hyperplane is positioned to divide the two classes (a linear discriminant function); in regression, it is positioned to pass through the data. A linear model is typically represented using an NxN matrix and an Nx1 bias vector.

A neural network with no hidden layers, and an output with a dot-product synaptic function and identity activation function, actually implements a linear model. The weights correspond to the matrix, and the thresholds to the bias vector. When the network is executed, it effectively multiplies the input by the weights matrix and then adds the bias vector.

The linear network provides a good benchmark against which to compare the performance of your neural networks. It is quite possible that a problem that is thought to be highly complex can actually be solved as well by linear techniques as by neural networks. If you have only a small number of training cases, you are probably not justified in using a more complex model anyway.
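The linear network described above amounts to a single matrix multiplication plus a bias vector; a minimal sketch (plain Python, with made-up numbers) of y = Wx + b:

def linear_network(x, weights, bias):
    # one output per row of the weight matrix: y = Wx + b
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

W = [[0.5, -0.2], [0.1, 0.4]]    # weight matrix (hypothetical values)
b = [0.1, -0.3]                  # bias vector
print(linear_network([1.0, 2.0], W, b))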
3.2.4 SOFM Networks
Self-Organizing Feature Map (SOFM, or Kohonen) networks are used quite differently from the other networks. Whereas all the other networks are designed for supervised tasks, SOFM networks are designed primarily for unsupervised learning (see Kohonen, 1982; Haykin, 1994; Patterson, 1996; Fausett, 1994). Whereas in supervised learning the training data set contains cases featuring input variables together with the associated outputs (and the network must infer a mapping from the inputs to the outputs), in unsupervised learning the training data set contains only input variables.

At first glance this may seem strange. Without outputs, what can the network learn? The answer is that the SOFM network attempts to learn the structure of the data. One possible use is therefore in exploratory data analysis. The SOFM network can learn to recognize clusters of data, and can also relate similar classes to each other. The user can build up an understanding of the data, which is used to refine the network. As classes of data are recognized, they can be labeled, so that the network becomes capable of classification tasks. SOFM networks can also be used for classification when output classes are immediately available - the advantage in this case is their ability to highlight similarities between classes.

A second possible use is in novelty detection. SOFM networks can learn to recognize clusters in the training data, and respond to them. If new data, unlike previous cases, is encountered, the network fails to recognize it and this indicates novelty.
A SOFM network has only two layers: the input layer, and an output layer of radial units (also known as the topological map layer). The units in the topological map layer are laid out in space - typically in two dimensions (although ST Neural Networks also supports one-dimensional Kohonen networks).

3.2.5 Radial basis function (RBF) network
Radial basis functions are powerful techniques for interpolation in multidimensional space. An RBF is a function which has a built-in distance criterion with respect to a center. Radial basis functions have been applied in the area of neural networks, where they may be used as a replacement for the sigmoidal hidden-layer transfer characteristic in Multi-Layer Perceptrons. RBF networks have two layers of processing: in the first, the input is mapped onto each RBF in the 'hidden' layer. The RBF chosen is usually a Gaussian. In regression problems the output layer is then a linear combination of hidden-layer values representing the mean predicted output. The interpretation of this output-layer value is the same as a regression model in statistics. In classification problems the output layer is typically a sigmoid function of a linear combination of hidden-layer values, representing a posterior probability. Performance in both cases is often improved by shrinkage techniques, known as ridge regression in classical statistics and known to correspond to a prior belief in small parameter values (and therefore smooth output functions) in a Bayesian framework.
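A minimal sketch of the two processing layers of an RBF network (plain Python; the centres, widths and output weights are hypothetical and would normally be learned from data): the hidden layer computes Gaussian functions of the distance to each centre, and the output layer forms a linear combination of those values, as in the regression case above:

import math

def rbf_forward(x, centres, widths, out_weights, out_bias=0.0):
    # hidden layer: one Gaussian radial unit per centre
    hidden = [math.exp(-sum((a - c) ** 2 for a, c in zip(x, centre)) / (2.0 * s ** 2))
              for centre, s in zip(centres, widths)]
    # output layer: linear combination of hidden-layer values (regression case)
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias

centres = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
out_weights = [0.7, -0.2]
print(rbf_forward([0.2, 0.1], centres, widths, out_weights))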
3.2.6 Recurrent network

Contrary to feed forward networks, recurrent neural networks (RNs) are models with bidirectional data flow. While a feed forward network propagates data linearly from input to output, RNs also propagate data from later processing stages to earlier stages.

3.2.7 Simple recurrent network
A simple recurrent network (SRN) is a variation on the Multi-Layer Perceptron, sometimes called an "Elman network" due to its invention by Jeff Elman. A three-layer network is used, with the addition of a set of "context units" in the input layer. There are connections from the middle (hidden) layer to these context units, fixed with a weight of one. At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule (usually back propagation) is applied. The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence prediction that are beyond the power of a standard Multi-Layer Perceptron.

In a fully recurrent network, every neuron receives inputs from every other neuron in the network. These networks are not arranged in layers. Usually only a subset of the neurons receive external inputs in addition to the inputs from all the other neurons, and another disjunct subset of neurons report their output externally as well as sending it to all the neurons. These distinctive inputs and outputs perform the function of the input and output layers of a feed-forward or simple recurrent network, and also join all the other neurons in the recurrent processing.
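A rough sketch of the Elman-style forward step (plain Python, hypothetical weights, learning omitted): the context units hold a copy of the previous hidden activations and are fed back into the hidden layer together with the current input:

import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def srn_step(x, context, w_in, w_context, w_out):
    # hidden units see the current input plus the copied previous hidden state
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_in[j], x)) +
                      sum(w * ci for w, ci in zip(w_context[j], context)))
              for j in range(len(w_in))]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return output, hidden   # the new hidden state becomes the next context

# a 1-input, 2-hidden, 1-output network processing a short sequence
w_in = [[0.5], [-0.4]]
w_context = [[0.1, 0.2], [0.3, -0.1]]
w_out = [[0.6, 0.7]]
context = [0.0, 0.0]
for value in [0.0, 1.0, 1.0, 0.0]:
    out, context = srn_step([value], context, w_in, w_context, w_out)
    print(out)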
4 Artificial Neural Network Processing

4.1 Gathering Data for Neural Networks
Once you have decided on a problem to solve using a neural network, you will need to gather data for training purposes. The training data set includes a number of cases, each containing values for a range of input and output variables. The first decisions you will need to make are which variables to use, and how many (and which) cases to gather.

The choice of variables (at least initially) is guided by intuition. Your own expertise in the problem domain will give you some idea of which input variables are likely to be influential. As a first pass, you should include any variables that you think could have an influence - part of the design process will be to whittle this set down.

Handling non-numeric data is more difficult. The most common form of non-numeric data consists of nominal-value variables such as Gender = {Male, Female}. Nominal-valued variables can be represented numerically. However, neural networks do not tend to perform well with nominal variables that have a large number of possible values. For example, consider a neural network being trained to estimate the value of houses. The price of houses depends critically on the area of a city in which they are located. A particular city might be subdivided into dozens of named locations, and so it might seem natural to use a nominal-valued variable representing these locations. Unfortunately, it would be very difficult to train a neural network under these circumstances, and a more credible approach would be to assign ratings (based on expert knowledge) to each area; for example, you might assign ratings for the quality of local schools, convenient access to leisure facilities, etc. (a sketch of this encoding follows below).

Other kinds of non-numeric data must either be converted to numeric form, or discarded. Dates and times, if important, can be converted to an offset value from a starting date/time. Currency values can easily be converted. Unconstrained text fields (such as names) cannot be handled and should be discarded.

Many practical problems suffer from data that is unreliable: some variables may be corrupted by noise, or values may be missing altogether. Neural networks are also noise tolerant. However, there is a limit to this tolerance; if there are occasional outliers far outside the range of normal values for a variable, they may bias the training. The best approach to such outliers is to identify and remove them (either discarding the case, or converting the outlier into a missing value). If outliers are difficult to detect, a city block error function (see Bishop, 1995) may be used, but this outlier-tolerant training is generally less effective than the standard approach.
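As an illustration of this data-preparation step (plain Python; the city areas, ratings and dates are invented for the example), a nominal variable with many values can be replaced by numeric expert ratings, and a date can be converted to an offset from a starting date:

from datetime import date

# hypothetical expert ratings replacing a nominal "area" variable
area_ratings = {"Riverside": {"schools": 8, "leisure": 6},
                "Old Town": {"schools": 5, "leisure": 9}}

def encode_case(area, sale_date, start=date(2010, 1, 1)):
    r = area_ratings[area]
    days = (sale_date - start).days        # date converted to a numeric offset
    return [r["schools"], r["leisure"], days]

print(encode_case("Riverside", date(2010, 6, 15)))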
4.2 Learning

We can categorize the learning situations into two distinct sorts. These are:

• Supervised learning or associative learning, in which the network is trained by providing it with input and matching output patterns. These input-output pairs can be
provided by an external teacher, or by the system, which contains the neural network (self-supervised).
• Unsupervised learning or self-organization, in which an (output) unit is trained to respond to clusters of patterns within the input. In this paradigm the system is supposed to discover statistically salient features of the input population. Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather, the system must develop its own representation of the input stimuli.

• Reinforcement learning. This type of learning may be considered an intermediate form of the above two types of learning. Here the learning machine does some action on the environment and gets a feedback response from the environment. The learning system grades its action as good (rewarding) or bad (punishable) based on the environmental response and accordingly adjusts its parameters. Generally, parameter adjustment is continued until an equilibrium state occurs, following which there will be no more changes in its parameters. Self-organizing neural learning may be categorized under this type of learning.

4.3 Process of Learning
The memorization of patterns and the subsequent response of the network can be categorized into two general paradigms:

I. Associative mapping, in which the network learns to produce a particular pattern on the set of output units whenever another particular pattern is applied on the set of input units. Associative mapping can generally be broken down into two mechanisms:

(i) auto-association: an input pattern is associated with itself and the states of input and output units coincide. This is used to provide pattern completion, i.e. to produce a pattern whenever a portion of it or a distorted pattern is presented. In the second case,
the network actually stores pairs of patterns building an association between two sets of patterns.
(ii) hetero-association: related to two recall mechanisms:

1. nearest-neighbour recall, where the output pattern produced corresponds to the stored input pattern closest to the pattern presented, and
2. interpolative recall, where the output pattern is a similarity-dependent interpolation of the patterns stored corresponding to the pattern presented. Yet another paradigm, which is a variant of associative mapping, is classification, i.e. when there is a fixed set of categories into which the input patterns are to be classified.
II. Regularity detection, in which units learn to respond to particular properties of the input patterns. Whereas in associative mapping the network stores the relationships among patterns, in regularity detection the response of each unit has a particular 'meaning'. This type of learning mechanism is essential for feature discovery and knowledge representation.

Every neural network possesses knowledge which is contained in the values of the connection weights. Modifying the knowledge stored in the network as a function of experience implies a learning rule for changing the values of the weights.
Information is stored in the weight matrix W of a neural network. Learning is the determination of the weights. Following the way learning is performed, we can distinguish two major categories of neural networks:

• Fixed networks, in which the weights cannot be changed, i.e. dW/dt = 0. In such networks, the weights are fixed a priori according to the problem to solve.
• Adaptive networks, which are able to change their weights, i.e. dW/dt ≠ 0.
All learning methods used for adaptive neural networks can be classified into two major categories:

• Supervised learning incorporates an external teacher, so that each output unit is told what its desired response to input signals ought to be. During the learning process global information may be required. Paradigms of supervised learning include error-correction learning, reinforcement learning and stochastic learning. An important issue concerning supervised learning is the problem of error convergence, i.e. the minimization of error between the desired and computed unit values. The aim is to determine a set of weights which minimizes the error. One well-known method, which is common to many learning paradigms, is least mean square (LMS) convergence.

• Unsupervised learning uses no external teacher and is based upon only local information. It is also referred to as self-organization, in the sense that it self-organizes data presented to the network and detects their emergent collective properties. Paradigms of unsupervised learning are Hebbian learning and competitive learning.

Another aspect of learning concerns the distinction, or not, of a separate phase during which the network is trained, and a subsequent operation phase. We say that a neural network learns off-line if the learning phase and the operation phase are distinct. A neural network learns on-line if it learns and operates at the same time. Usually, supervised learning is performed off-line, whereas unsupervised learning is performed on-line.
4.4 Transfer Function
The behavior of an ANN (Artificial Neural Network) depends on both the weights and the input-output function (transfer function) that is specified for the units. This function typically falls into one of three categories:

• linear (or ramp)
• threshold
• sigmoid
For linear units, the output activity is proportional to the total weighted input.
For threshold units, the output is set at one of two levels, depending on whether the total input is greater than or less than some threshold value.

For sigmoid units, the output varies continuously but not linearly as the input changes. Sigmoid units bear a greater resemblance to real neurons than do linear or threshold units, but all three must be considered rough approximations.

4.5 Pre- and Post-processing
All neural networks take numeric input and produce numeric output. The transfer function of a unit is typically chosen so that it can accept input in any range, and produces output in a strictly limited range (it has a squashing effect). Although the input can be in any range, there is a saturation effect so that the unit is only sensitive to inputs within a fairly limited range. The illustration below shows one of the most common transfer functions, the logistic function (also sometimes referred to as the sigmoid function, although strictly speaking it is only one example of a sigmoid - S-shaped - function). In this case, the output is in the range (0,1), and the input is sensitive in a range not much larger than (-1,+1). The function is also smooth and easily differentiable, facts that are critical in allowing the network training algorithms to operate (this is the reason why the step function is not used in practice).
The limited numeric response range, together with the fact that information has to be in numeric form, implies that neural solutions require pre-processing and post-processing stages to be used in real applications (see Bishop, 1995). Two issues need to be addressed:

Scaling. Numeric values have to be scaled into a range that is appropriate for the network. Typically, raw variable values are scaled linearly. In some circumstances, non-linear scaling may be appropriate (for example, if you know that a variable is exponentially distributed, you might take the logarithm). Non-linear scaling is not supported in ST Neural Networks. Instead, you should scale the variable using STATISTICA's data transformation facilities before transferring the data to ST Neural Networks.

Prediction problems may be divided into two main categories:

Classification. In classification, the objective is to determine to which of a number of discrete classes a given input case belongs. Examples include credit assignment (is this person a good or
bad credit risk), cancer detection (tumor, clear), and signature recognition (forgery, true). In all these cases, the output required is clearly a single nominal variable. The most common classification tasks are (as above) two-state, although many-state tasks are also not unknown.

Regression. In regression, the objective is to predict the value of a (usually) continuous variable: tomorrow's stock price, the fuel consumption of a car, next year's profits. In this case, the output required is a single numeric variable.
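A simple linear (min-max) scaling of raw variable values, of the kind described above, might look like the following sketch. The function name and the target range are assumptions for illustration, and do not refer to any particular software package.

# Illustrative pre-processing: linear (min-max) scaling of raw values.
import numpy as np

def minmax_scale(values, lo=0.0, hi=1.0):
    # Linearly scale raw variable values into a range suitable for the network;
    # the target range (lo, hi) is an assumption for illustration.
    v = np.asarray(values, dtype=float)
    return lo + (hi - lo) * (v - v.min()) / (v.max() - v.min())

raw = np.array([12.0, 45.0, 7.0, 88.0, 23.0])
print(minmax_scale(raw))                  # values mapped into [0, 1]

# Non-linear scaling: for an (assumed) exponentially distributed variable,
# taking the logarithm before linear scaling is one common choice.
skewed = np.array([1.0, 3.0, 10.0, 100.0, 1000.0])
print(minmax_scale(np.log(skewed)))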
5 Applications of Neural Networks

5.1 Areas where we can use neural networks
Neural networks are applicable in virtually every situation in which a relationship between the predictor variables (independents, inputs) and the predicted variables (dependents, outputs) exists, even when that relationship is very complex and not easy to articulate in the usual terms of "correlations" or "differences between groups." A few representative examples of problems to which neural network analysis has been applied successfully are:

• Detection of medical phenomena. A variety of health-related indices (e.g., a combination of heart rate, levels of various substances in the blood, respiration rate) can be monitored. The onset of a particular medical condition could be associated with a very complex (e.g., nonlinear and interactive) combination of changes on a subset of the variables being monitored. Neural networks have been used to recognize this predictive pattern so that the appropriate treatment can be prescribed.

• Stock market prediction. Fluctuations of stock prices and stock indices are another example of a complex, multidimensional, but in some circumstances at least partially deterministic phenomenon. Neural networks are being used by many technical analysts to make predictions about stock prices based upon a large number of factors such as the past performance of other stocks and various economic indicators.

• Credit assignment. A variety of pieces of information are usually known about an applicant for a loan. For instance, the applicant's age, education, occupation, and many other facts may be available. After training a neural network on historical data, neural network analysis can identify the most relevant characteristics and use those to classify applicants as good or bad credit risks (a small illustrative sketch follows this list).

• Monitoring the condition of machinery. Neural networks can be instrumental in cutting costs by bringing additional expertise to scheduling the preventive maintenance of machines. A neural network can be trained to distinguish between the sounds a machine makes when it is running normally ("false alarms") versus when it is on the verge of a problem. After this training period, the expertise of the network can be used to warn a technician of an upcoming breakdown, before it occurs and causes costly unforeseen "downtime."

• Engine management. Neural networks have been used to analyze the input of sensors from an engine. The neural network controls the various parameters within which the engine functions, in order to achieve a particular goal, such as minimizing fuel consumption.
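As an illustration of the credit-assignment example above, the following sketch trains a small feed-forward classifier on made-up applicant data. The data, feature choices, and the use of scikit-learn's MLPClassifier are assumptions for illustration only, not part of the report.

# Illustrative sketch only: classifying loan applicants on made-up data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Made-up "historical" applicant data: [age, years of education, income (k)].
X = rng.uniform([18, 8, 10], [70, 20, 120], size=(200, 3))
# Made-up rule standing in for the historical outcomes (good risk = 1).
y = ((X[:, 2] > 40) & (X[:, 0] > 25)).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)                      # training on the historical data

new_applicant = np.array([[35, 16, 55]])
print(clf.predict(new_applicant))  # 1 = classified as a good credit risk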
5.2 What can you do with an NN and what not?
1. In principle, NNs can compute any computable function, i.e. they can do everything a normal digital computer can do.
2. In practice, NNs are especially useful for classification and function approximation/mapping problems which are tolerant of some imprecision, which have lots of training data available, but to which hard and fast rules (such as those that might be used in an expert system) cannot easily be applied.
3. Almost any mapping between vector spaces can be approximated to arbitrary precision by feed-forward NNs (which are the type most often used in practical applications) if you have enough data and enough computing resources (a small sketch follows this list).
4. NNs are, at least today, difficult to apply successfully to problems that concern manipulation of symbols and memory.
5. And there are no methods for training NNs that can magically create information that is not contained in the training data.
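To make point 3 concrete, here is a minimal sketch of a feed-forward network with one hidden layer fitted to a simple one-dimensional mapping. The architecture, learning rate, and target function are assumptions chosen purely for illustration.

# Minimal sketch: a one-hidden-layer feed-forward network approximating
# y = sin(x) on a small interval. All settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
Y = np.sin(X)

H = 20                                   # hidden units
W1 = rng.normal(scale=0.5, size=(1, H))  # input-to-hidden weights
b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, 1))  # hidden-to-output weights
b2 = np.zeros(1)
eta = 0.2                                # learning rate (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    hidden = sigmoid(X @ W1 + b1)        # forward pass
    out = hidden @ W2 + b2
    err = out - Y
    grad_out = 2 * err / len(X)          # gradient of mean squared error
    grad_hidden = (grad_out @ W2.T) * hidden * (1 - hidden)
    W2 -= eta * hidden.T @ grad_out      # backward pass: update both layers
    b2 -= eta * grad_out.sum(axis=0)
    W1 -= eta * X.T @ grad_hidden
    b1 -= eta * grad_hidden.sum(axis=0)

print("final mean squared error:", float(np.mean(err ** 2)))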
5.3 Who is concerned with NNs?

Neural networks are interesting for quite a lot of very different people:

• Computer scientists want to find out about the properties of non-symbolic information processing with neural nets and about learning systems in general.
• Statisticians use neural nets as flexible, nonlinear regression and classification models.
• Engineers of many kinds exploit the capabilities of neural networks in many areas, such as signal processing and automatic control.
• Cognitive scientists view neural networks as a possible apparatus to describe models of thinking and consciousness (high-level brain function).
• Neuro-physiologists use neural networks to describe and explore medium-level brain function (e.g. memory, sensory system, motorics).
• Physicists use neural networks to model phenomena in statistical mechanics and for a lot of other tasks.
• Biologists use neural networks to interpret nucleotide sequences.
• Philosophers and some other people may also be interested in neural networks for various reasons.

5.4 Available Software
GENESIS
GENESIS 2.0 (GEneral NEural SImulation System) is a general-purpose simulation platform which was developed to support the simulation of neural systems ranging from complex models of single neurons to simulations of large networks made up of more abstract neuronal components. Most current GENESIS applications involve realistic simulations of biological neural systems. Although the software can also model more abstract networks, other simulators
are more suitable for back-propagation and similar connectionist modeling. GENESIS runs on most Unix platforms and has a graphical front end, XODUS. A parallel version for networks of workstations, symmetric multiprocessors, and MPPs is also available.

DartNet
DartNet is a Macintosh-based back-propagation simulator, developed at Dartmouth by Jamshed Bharucha and Sean Nolan as a pedagogical tool. It makes use of the Mac's graphical interface, and provides a number of tools for building, editing, training, testing and examining networks.

PDP++
The PDP++ software is a neural-network simulation system written in C++. It represents the next generation of the PDP software released with the McClelland and Rumelhart "Explorations in Parallel Distributed Processing Handbook", MIT Press, 1987. It is easy enough for novice users, but very powerful and flexible for research use. The current version is 1.0 and works on Unix with X-Windows. Features: full GUI (InterViews), real-time network viewer, data viewer, extendable object-oriented design, CSS scripting language with source-level debugger, and GUI macro recording.

WinNN
WinNN is a shareware neural network package for Windows 3.1. WinNN combines a very user-friendly interface with a powerful computational engine. It is intended as a tool for beginners as well as more advanced neural network users, and provides an alternative to more expensive and harder-to-use packages. WinNN can implement feed-forward multi-layered NNs and uses a modified fast back-propagation for training. It offers extensive online help, various neuron functions, and on-the-fly testing of network performance and generalization; all training parameters can be easily modified while WinNN is training. Results can be saved to disk or copied to the clipboard, and plotting of the outputs and weight distribution is supported.

5.4.2 Commercial software packages for NN simulation

SAS Neural Network Application
Operating systems: Windows 3.1, OS/2, HP/UX, Solaris, AIX. The SAS Neural Network Application trains a variety of neural nets and includes a graphical user interface, on-site training and customization. Features include multilayer perceptrons, radial basis functions, statistical versions of counter-propagation and learning vector quantization, a variety of built-in activation and error functions, multiple hidden layers, direct input-output connections, missing value handling, categorical variables, standardization of inputs and targets, and multiple preliminary optimizations from random initial values to avoid local minima. Training is done by state-of-the-art numerical optimization.

NeuroShell2/NeuroWindows
NeuroShell 2 combines powerful neural network architectures, a Windows icon-driven user interface, and sophisticated utilities for MS-Windows machines. Its internal format is spreadsheet, and users can specify that NeuroShell 2 use their own spreadsheet when editing. It includes both Beginner's and Advanced systems, a Runtime capability, and a choice of 15 back-propagation, Kohonen, PNN and GRNN architectures. It also includes Rules, Symbol Translate, Graphics, and File Import/Export modules (including MetaStock from Equis International) and NET-PERFECT to
prevent overtraining. Options available: Market Technical Indicator Option ($295), Market Technical Indicator Option with Optimizer ($590), and Race Handicapping Option ($149). NeuroShell 2 price: $495.

NeuroWindows
NeuroWindows is a programmer's tool in a Dynamic Link Library (DLL) that can create as many as 128 interactive nets in an application, each with 32 slabs in a single network, and 32K neurons in a slab. It includes back-propagation, Kohonen, PNN, and GRNN paradigms, and can mix supervised and unsupervised nets. The DLL may be called from Visual Basic, Visual C, Access Basic, C, Pascal, and VBA/Excel 5. NeuroWindows price: $369.

GeneHunter
GeneHunter is a genetic algorithm product with a Dynamic Link Library of genetic algorithm functions that may be called from programming languages such as Visual Basic or C. For non-programmers, GeneHunter also includes an Excel add-in program, which allows the us