Comparison of user features of operating systems refers to a comparison of the general user features of major operating systems in a narrative format. It does not encompass a full exhaustive comparison or description of all technical details of all operating systems. It is a comparison of basic roles and the most prominent features. It also includes the most important features of the operating system's origins, historical development, and role. == Overview == An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also include accounting software for cost allocation of processor time, mass storage, printing, and other resources. For hardware functions such as input and output and memory allocation, the operating system acts as an intermediary between programs and the computer hardware, although the application code is usually executed directly by the hardware and frequently makes system calls to an OS function or is interrupted by it. Operating systems are found on many devices that contain a computer – from cellular phones and video game consoles to web servers and supercomputers. As of June 2024, the dominant general-purpose desktop operating system is Microsoft Windows with a market share of around 72.91%. macOS by Apple Inc. is in second place (14.93%), and the varieties of Linux are collectively in third place (4.04%). In the mobile sector, including both smartphones and tablets, Android is dominant with a market share of 71%, followed by Apple's iOS with 28%; for smartphones alone, Android has 72% and iOS has 28%. Linux distributions are dominant in the server and supercomputing sectors. Other specialized classes of operating systems (special-purpose operating systems)), such as embedded and real-time systems, exist for many applications. Security-focused operating systems also exist. Some operating systems have low system requirements (i.e. light-weight Linux distribution). Others may have higher system requirements. Some operating systems require installation or may come pre-installed with purchased computers (OEM-installation), whereas others may run directly from media (i.e. live cd) or flash memory (i.e. USB stick). == MS-DOS == === Overview === MS-DOS (acronym for Microsoft Disk Operating System) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and some operating systems attempting to be compatible with MS-DOS, are sometimes referred to as "DOS" (which is also the generic acronym for disk operating system). MS-DOS was the main operating system for IBM PC compatible personal computers during the 1980s, from which point it was gradually superseded by operating systems offering a graphical user interface (GUI), in various generations of the graphical Microsoft Windows operating system. IBM licensed and re-released it in 1981 as PC DOS 1.0 for use in its PCs. Although MS-DOS and PC DOS were initially developed in parallel by Microsoft and IBM, the two products diverged after twelve years, in 1993, with recognizable differences in compatibility, syntax, and capabilities. During its lifetime, several competing products were released for the x86 platform, and MS-DOS went through eight versions, until development ceased in 2000. Initially, MS-DOS was targeted at Intel 8086 processors running on computer hardware using floppy disks to store and access not only the operating system, but application software and user data as well. Progressive version releases delivered support for other mass storage media in ever greater sizes and formats, along with added feature support for newer processors and rapidly evolving computer architectures. Ultimately, it was the key product in Microsoft's development from a programming language company to a diverse software development firm, providing the company with essential revenue and marketing resources. It was also the underlying basic operating system on which early versions of Windows ran as a GUI. == Microsoft Windows == === Overview === Microsoft Windows, commonly referred to as Windows, is a group of several proprietary graphical operating system families, all of which are developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. Active Microsoft Windows families include Windows NT and Windows IoT; these may encompass subfamilies, (e.g. Windows Server or Windows Embedded Compact) (Windows CE). Defunct Microsoft Windows families include Windows 9x, Windows Mobile, and Windows Phone. Microsoft announced an operating environment named Windows on 10 November 1983, as a graphical operating system shell for MS-DOS in response to the growing interest in graphical user interfaces (GUIs); Windows 1.0 first shipped on 20 November 1985. Microsoft Windows came to dominate the world's personal computer (PC) market with over 90% market share, overtaking Mac OS, which had been introduced in 1984, while Microsoft has in 2020 lost its dominance of the consumer operating system market, with Windows down to 30%, lower than Apple's 31% mobile-only share (65% for desktop operating systems only, i.e. "PCs" vs. Apple's 28% desktop share) in its home market, the US, and 32% globally (77% for desktops), where Google's Android leads. Apple came to see Windows as an unfair encroachment on their innovation in GUI development as implemented on products such as the Lisa and Macintosh (eventually settled in court in Microsoft's favor in 1993). As of January 2023, on PCs, Windows is still the most popular operating system in all countries. However, in 2014, Microsoft admitted losing the majority of the overall operating system market to Android, because of the massive growth in sales of Android smartphones. In 2014, the number of Windows devices sold was less than 25% that of Android devices sold. This comparison, however, may not be fully relevant, as the two operating systems traditionally target different platforms. Still, numbers for server use of Windows (that are comparable to competitors) show one third market share, similar to that for end user use. As of October 2020, the most recent version of Windows for PCs, tablets and embedded devices is Windows 10, version 20H2. The most recent version for server computers is Windows Server, version 20H2. A specialized version of Windows also runs on the Xbox One video game console. === Windows 95 === Windows 95 introduced a redesigned shell based around a desktop metaphor; File shortcuts (also known as shell links) were introduced and the desktop was re-purposed to hold shortcuts to applications, files and folders, reminiscent of Mac OS. In Windows 3.1 the desktop was used to display icons of running applications. In Windows 95, the currently running applications were displayed as buttons on a taskbar across the bottom of the screen. The taskbar also contained a notification area used to display icons for background applications, a volume control and the current time. The Start menu, invoked by clicking the "Start" button on the taskbar or by pressing the Windows key, was introduced as an additional means of launching applications or opening documents. While maintaining the program groups used by its predecessor Program Manager, it also displayed applications within cascading sub-menus. The previous File Manager program was replaced by Windows Explorer and the Explorer-based Control Panel and several other special folders were added such as My Computer, Dial Up Networking, Recycle Bin, Network Neighborhood, My Documents, Recent documents, Fonts, Printers, and My Briefcase among others. AutoRun was introduced for CD drives. The user interface looked dramatically different from prior versions of Windows, but its design language did not have a special name like Metro, Aqua or Material Design. Internally it was called "the new shell" and later simply "the shell". The subproject within Microsoft to develop the new shell was internally known as "Stimpy". In 1994, Microsoft designers Mark Malamud and Erik Gavriluk approached Brian Eno to compose music for the Windows 95 project. The result was the six-second start-up music-sound of the Windows 95 operating system, The Microsoft Sound and it was first released as a startup sound in May 1995 on Windows 95 May Test Release build 468. When released for Windows 95 and Windows NT 4.0, Internet Explorer 4 came with an optional Windows Desktop Update, which modified the shell to provide several additional updates to Windows Explorer, including a Quick Launch toolbar, and new features integrated with Internet Explorer, such as Active Desktop (which allowed Internet content to be displayed directly on the desktop). Some of the user interface elements introduced in Windows 95, such as the desktop, taskbar, Start menu and Windows
Kdan Mobile
Kdan Mobile Software Limited is a software application development company based in Tainan City, Taiwan. Kdan also has branches in Taipei, Changsha, Irvine, California, Japan, and South Korea. The company was founded in 2009 by Kenny Su, the company's CEO. == History == Kdan Mobile was founded in 2009 by Kenny Su (蘇柏州) and develops an application for PDF documents. Su previously worked at the Industrial Technology Research Institute (ITRI) . In 2018, the company completed its Series B round of fundraising, in which it raised 16 million USD in total. Four global firms, Dattoz Partners (South Korea), WI Harper Group (U.S.), Taiwania Capital (Taiwan), and Golden Asia Fund Mitsubishi UFJ Capital (Japan), made up the Series B investment. Kdan previously raised 5 million USD in its Series A round in 2018.
Variational message passing
Variational message passing (VMP) is an approximate inference technique for continuous- or discrete-valued Bayesian networks, with conjugate-exponential parents, developed by John Winn. VMP was developed as a means of generalizing the approximate variational methods used by such techniques as latent Dirichlet allocation, and works by updating an approximate distribution at each node through messages in the node's Markov blanket. == Likelihood lower bound == Given some set of hidden variables H {\displaystyle H} and observed variables V {\displaystyle V} , the goal of approximate inference is to maximize a lower-bound on the probability that a graphical model is in the configuration V {\displaystyle V} . Over some probability distribution Q {\displaystyle Q} (to be defined later), ln P ( V ) = ∑ H Q ( H ) ln P ( H , V ) P ( H | V ) = ∑ H Q ( H ) [ ln P ( H , V ) Q ( H ) − ln P ( H | V ) Q ( H ) ] {\displaystyle \ln P(V)=\sum _{H}Q(H)\ln {\frac {P(H,V)}{P(H|V)}}=\sum _{H}Q(H){\Bigg [}\ln {\frac {P(H,V)}{Q(H)}}-\ln {\frac {P(H|V)}{Q(H)}}{\Bigg ]}} . So, if we define our lower bound to be L ( Q ) = ∑ H Q ( H ) ln P ( H , V ) Q ( H ) {\displaystyle L(Q)=\sum _{H}Q(H)\ln {\frac {P(H,V)}{Q(H)}}} , then the likelihood is simply this bound plus the relative entropy between P {\displaystyle P} and Q {\displaystyle Q} . Because the relative entropy is non-negative, the function L {\displaystyle L} defined above is indeed a lower bound of the log likelihood of our observation V {\displaystyle V} . The distribution Q {\displaystyle Q} will have a simpler character than that of P {\displaystyle P} because marginalizing over P {\displaystyle P} is intractable for all but the simplest of graphical models. In particular, VMP uses a factorized distribution Q ( H ) = ∏ i Q i ( H i ) , {\displaystyle Q(H)=\prod _{i}Q_{i}(H_{i}),} where H i {\displaystyle H_{i}} is a disjoint part of the graphical model. == Determining the update rule == The likelihood estimate needs to be as large as possible; because it's a lower bound, getting closer log P {\displaystyle \log P} improves the approximation of the log likelihood. By substituting in the factorized version of Q {\displaystyle Q} , L ( Q ) {\displaystyle L(Q)} , parameterized over the hidden nodes H i {\displaystyle H_{i}} as above, is simply the negative relative entropy between Q j {\displaystyle Q_{j}} and Q j ∗ {\displaystyle Q_{j}^{}} plus other terms independent of Q j {\displaystyle Q_{j}} if Q j ∗ {\displaystyle Q_{j}^{}} is defined as Q j ∗ ( H j ) = 1 Z e E − j { ln P ( H , V ) } {\displaystyle Q_{j}^{}(H_{j})={\frac {1}{Z}}e^{\mathbb {E} _{-j}\{\ln P(H,V)\}}} , where E − j { ln P ( H , V ) } {\displaystyle \mathbb {E} _{-j}\{\ln P(H,V)\}} is the expectation over all distributions Q i {\displaystyle Q_{i}} except Q j {\displaystyle Q_{j}} . Thus, if we set Q j {\displaystyle Q_{j}} to be Q j ∗ {\displaystyle Q_{j}^{}} , the bound L {\displaystyle L} is maximized. == Messages in variational message passing == Parents send their children the expectation of their sufficient statistic while children send their parents their natural parameter, which also requires messages to be sent from the co-parents of the node. == Relationship to exponential families == Because all nodes in VMP come from exponential families and all parents of nodes are conjugate to their children nodes, the expectation of the sufficient statistic can be computed from the normalization factor. == VMP algorithm == The algorithm begins by computing the expected value of the sufficient statistics for that vector. Then, until the likelihood converges to a stable value (this is usually accomplished by setting a small threshold value and running the algorithm until it increases by less than that threshold value), do the following at each node: Get all messages from parents. Get all messages from children (this might require the children to get messages from the co-parents). Compute the expected value of the nodes sufficient statistics. == Constraints == Because every child must be conjugate to its parent, this has limited the types of distributions that can be used in the model. For example, the parents of a Gaussian distribution must be a Gaussian distribution (corresponding to the Mean) and a gamma distribution (corresponding to the precision, or one over σ {\displaystyle \sigma } in more common parameterizations). Discrete variables can have Dirichlet parents, and Poisson and exponential nodes must have gamma parents. More recently, VMP has been extended to handle models that violate this conditional conjugacy constraint. == Literature == John Winn; Christopher M. Bishop (2005). "Variational Message Passing" (PDF). Journal of Machine Learning Research. 6: 661–694. ISSN 1533-7928. Wikidata Q139488859. Beal, M.J. (2003). Variational Algorithms for Approximate Bayesian Inference (PDF) (PhD). Gatsby Computational Neuroscience Unit, University College London. Archived from the original (PDF) on 2005-04-28. Retrieved 2007-02-15.
Implicit blockmodeling
Implicit blockmodeling is an approach in blockmodeling, similar to a valued and homogeneity blockmodeling, where initially an additional normalization is used and then while specifying the parameter of the relevant link is replaced by the block maximum. This approach was first proposed by Batagelj and Ferligoj in 2000, and developed by Aleš Žiberna in 2007/08. Comparing with homogeneity, the implicit blockmodeling will perform similarly with max-regular equivalence, but slightly worse in other settings. It will perform worse than valued and homogeneity blockmodeling with a pre-specified blockmodel.
Evolutionary multimodal optimization
In applied mathematics, multimodal optimization deals with optimization tasks that involve finding all or most of the multiple (at least locally optimal) solutions of a problem, as opposed to a single best solution. Evolutionary multimodal optimization is a branch of evolutionary computation, which is closely related to machine learning. Wong provides a short survey, wherein the chapter of Shir and the book of Preuss cover the topic in more detail. == Motivation == Knowledge of multiple solutions to an optimization task is especially helpful in engineering, when due to physical (and/or cost) constraints, the best results may not always be realizable. In such a scenario, if multiple solutions (locally and/or globally optimal) are known, the implementation can be quickly switched to another solution and still obtain the best possible system performance. Multiple solutions could also be analyzed to discover hidden properties (or relationships) of the underlying optimization problem, which makes them important for obtaining domain knowledge. In addition, the algorithms for multimodal optimization usually not only locate multiple optima in a single run, but also preserve their population diversity, resulting in their global optimization ability on multimodal functions. Moreover, the techniques for multimodal optimization are usually borrowed as diversity maintenance techniques to other problems. == Background == Classical techniques of optimization would need multiple restart points and multiple runs in the hope that a different solution may be discovered every run, with no guarantee however. Evolutionary algorithms (EAs) due to their population based approach, provide a natural advantage over classical optimization techniques. They maintain a population of possible solutions, which are processed every generation, and if the multiple solutions can be preserved over all these generations, then at termination of the algorithm we will have multiple good solutions, rather than only the best solution. Note that this is against the natural tendency of classical optimization techniques, which will always converge to the best solution, or a sub-optimal solution (in a rugged, “badly behaving” function). Finding and maintenance of multiple solutions is wherein lies the challenge of using EAs for multi-modal optimization. Niching is a generic term referred to as the technique of finding and preserving multiple stable niches, or favorable parts of the solution space possibly around multiple solutions, so as to prevent convergence to a single solution. The field of Evolutionary algorithms encompasses genetic algorithms (GAs), evolution strategy (ES), differential evolution (DE), particle swarm optimization (PSO), and other methods. Attempts have been made to solve multi-modal optimization in all these realms and most, if not all the various methods implement niching in some form or the other. == Multimodal optimization using genetic algorithms/evolution strategies == De Jong's crowding method, Goldberg's sharing function approach, Petrowski's clearing method, restricted mating, maintaining multiple subpopulations are some of the popular approaches that have been proposed by the community. The first two methods are especially well studied, however, they do not perform explicit separation into solutions belonging to different basins of attraction. The application of multimodal optimization within ES was not explicit for many years, and has been explored only recently. A niching framework utilizing derandomized ES was introduced by Shir, proposing the CMA-ES as a niching optimizer for the first time. The underpinning of that framework was the selection of a peak individual per subpopulation in each generation, followed by its sampling to produce the consecutive dispersion of search-points. The biological analogy of this machinery is an alpha-male winning all the imposed competitions and dominating thereafter its ecological niche, which then obtains all the sexual resources therein to generate its offspring. Recently, an evolutionary multiobjective optimization (EMO) approach was proposed, in which a suitable second objective is added to the originally single objective multimodal optimization problem, so that the multiple solutions form a weak pareto-optimal front. Hence, the multimodal optimization problem can be solved for its multiple solutions using an EMO algorithm. Improving upon their work, the same authors have made their algorithm self-adaptive, thus eliminating the need for pre-specifying the parameters. An approach that does not use any radius for separating the population into subpopulations (or species) but employs the space topology instead is proposed in.
Artificial consciousness
Artificial consciousness, also known as machine consciousness, synthetic consciousness, or digital consciousness, is consciousness hypothesized to be possible for artificial intelligence. It is also the corresponding field of study, which draws insights from philosophy of mind, philosophy of artificial intelligence, cognitive science and neuroscience. The term "sentience" can be used when specifically designating ethical considerations stemming from a form of phenomenal consciousness (P-consciousness, or the ability to feel qualia). Since sentience involves the ability to experience ethically positive or negative (i.e., valenced) mental states, it may justify welfare concerns and legal protection, as with non-human animals. Some scholars believe that consciousness is generated by the interoperation of various parts of the brain; these mechanisms are labeled the neural correlates of consciousness (NCC). Some further believe that constructing a system (e.g., a computer system) that can emulate this NCC interoperation would result in a system that is conscious. Some scholars reject the possibility of non-biological conscious beings. == Philosophical views == As there are many hypothesized types of consciousness, there are many potential implementations of artificial consciousness. In the philosophical literature, perhaps the most common taxonomy of consciousness is into "access" and "phenomenal" variants. Access consciousness concerns those aspects of experience that can be apprehended, while phenomenal consciousness concerns those aspects of experience that seemingly cannot be apprehended, instead being characterized qualitatively in terms of "raw feels", "what it is like" or qualia. === Plausibility debate === Type-identity theorists and other skeptics hold the view that consciousness can be realized only in particular physical systems because consciousness has properties that necessarily depend on physical constitution. In his 2001 article "Artificial Consciousness: Utopia or Real Possibility," Giorgio Buttazzo says that a common objection to artificial consciousness is that, "Working in a fully automated mode, they [the computers] cannot exhibit creativity, unreprogrammation (which means can 'no longer be reprogrammed', from rethinking), emotions, or free will. A computer, like a washing machine, is a slave operated by its components." For other theorists (e.g., functionalists), who define mental states in terms of causal roles, any system that can instantiate the same pattern of causal roles, regardless of physical constitution, will instantiate the same mental states, including consciousness. ==== Thought experiments ==== David Chalmers proposed two thought experiments intending to demonstrate that "functionally isomorphic" systems (those with the same "fine-grained functional organization", i.e., the same information processing) will have qualitatively identical conscious experiences, regardless of whether they are based on biological neurons or digital hardware. The "fading qualia" is a reductio ad absurdum thought experiment. It involves replacing, one by one, the neurons of a brain with a functionally identical component, for example based on a silicon chip. Chalmers makes the hypothesis, knowing it in advance to be absurd, that "the qualia fade or disappear" when neurons are replaced one-by-one with identical silicon equivalents. Since the original neurons and their silicon counterparts are functionally identical, the brain's information processing should remain unchanged, and the subject's behaviour and introspective reports would stay exactly the same. Chalmers argues that this leads to an absurd conclusion: the subject would continue to report normal conscious experiences even as their actual qualia fade away. He concludes that the subject's qualia actually don't fade, and that the resulting robotic brain, once every neuron is replaced, would remain just as sentient as the original biological brain. Similarly, the "dancing qualia" thought experiment is another reductio ad absurdum argument. It supposes that two functionally isomorphic systems could have different perceptions (for instance, seeing the same object in different colors, like red and blue). It involves a switch that alternates between a chunk of brain that causes the perception of red, and a functionally isomorphic silicon chip, that causes the perception of blue. Since both perform the same function within the brain, the subject would not notice any change during the switch. Chalmers argues that this would be highly implausible if the qualia were truly switching between red and blue, hence the contradiction. Therefore, he concludes that the equivalent digital system would not only experience qualia, but it would perceive the same qualia as the biological system (e.g., seeing the same color). Greg Egan's short story Learning To Be Me (mentioned in §In fiction), illustrates how undetectable duplication of the brain and its functionality could be from a first-person perspective. Critics object that Chalmers' proposal begs the question in assuming that all mental properties and external connections are already sufficiently captured by abstract causal organization. Van Heuveln et al. argue that the dancing qualia argument contains an equivocation fallacy, conflating a "change in experience" between two systems with an "experience of change" within a single system. Mogensen argues that the fading qualia argument can be resisted by appealing to vagueness at the boundaries of consciousness and the holistic structure of conscious neural activity, which suggests consciousness may require specific biological substrates rather than being substrate-independent. Anil Seth argues that the complexity of brain neurons intrinsically matters in addition to their function and that it is not possible to replace any part of the brain with a perfect silicon equivalent. He points out that some of biological neurons exhibit activity aimed at cleaning up metabolic waste products, and writes that a perfect silicon replacement would require a silicon-based metabolism, but silicon is not suitable for creating such artificial metabolism. ==== In large language models ==== In 2022, Google engineer Blake Lemoine made a viral claim that Google's LaMDA chatbot was sentient. Lemoine supplied as evidence the chatbot's humanlike answers to many of his questions; however, the chatbot's behavior was judged by the scientific community as likely a consequence of mimicry, rather than machine sentience. Lemoine's claim was widely derided for being ridiculous. Moreover, attributing consciousness based solely on the basis of LLM outputs or the immersive experience created by an algorithm is considered a fallacy. However, while philosopher Nick Bostrom states that LaMDA is unlikely to be conscious, he additionally poses the question of "what grounds would a person have for being sure about it?" One would have to have access to unpublished information about LaMDA's architecture, and also would have to understand how consciousness works, and then figure out how to map the philosophy onto the machine: "(In the absence of these steps), it seems like one should be maybe a little bit uncertain. [...] there could well be other systems now, or in the relatively near future, that would start to satisfy the criteria." David Chalmers argued in 2023 that LLMs today display impressive conversational and general intelligence abilities, but are likely not conscious yet, as they lack some features that may be necessary, such as recurrent processing, a global workspace, and unified agency. Nonetheless, he considers that non-biological systems can be conscious, and suggested that future, extended models (LLM+s) incorporating these elements might eventually meet the criteria for consciousness, raising both profound scientific questions and significant ethical challenges. However, the view that consciousness can exist without biological phenomena is controversial and some reject it. Kristina Šekrst cautions that anthropomorphic terms such as "hallucination" can obscure important ontological differences between artificial and human cognition. While LLMs may produce human-like outputs, she argues that it does not justify ascribing mental states or consciousness to them. Instead, she advocates for an epistemological framework (such as reliabilism) that recognizes the distinct nature of AI knowledge production. She suggests that apparent understanding in LLMs may be a sophisticated form of AI hallucination. She also questions what would happen if an LLM were trained without any mention of consciousness. === Testing === Sentience is an inherently first-person phenomenon. Because of that, and due to the lack of an empirical definition of sentience, directly measuring it may be impossible. Although systems may display numerous behaviors correlated with sentience, determining whether a system is sentient is known as the hard pr
Bayesian network
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms can perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (e.g. speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. == Graphical model == Formally, Bayesian networks are directed acyclic graphs (DAGs) whose nodes represent variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters or hypotheses. Each edge represents a direct conditional dependency. Any pair of nodes that are not connected (i.e. no path connects one node to the other) represent variables that are conditionally independent of each other. Each node is associated with a probability function that takes, as input, a particular set of values for the node's parent variables, and gives (as output) the probability (or probability distribution, if applicable) of the variable represented by the node. For example, if m {\displaystyle m} parent nodes represent m {\displaystyle m} Boolean variables, then the probability function could be represented by a table of 2 m {\displaystyle 2^{m}} entries, one entry for each of the 2 m {\displaystyle 2^{m}} possible parent combinations. Similar ideas may be applied to undirected, and possibly cyclic, graphs such as Markov networks. == Example == Suppose we want to model the dependencies between three variables: the sprinkler (or more appropriately, its state - whether it is on or not), the presence or absence of rain and whether the grass is wet or not. Observe that two events can cause the grass to become wet: an active sprinkler or rain. Rain has a direct effect on the use of the sprinkler (namely that when it rains, the sprinkler usually is not active). This situation can be modeled with a Bayesian network (shown to the right). Each variable has two possible values, T (for true) and F (for false). The joint probability function is, by the chain rule of probability, Pr ( G , S , R ) = Pr ( G ∣ S , R ) Pr ( S ∣ R ) Pr ( R ) {\displaystyle \Pr(G,S,R)=\Pr(G\mid S,R)\Pr(S\mid R)\Pr(R)} where G = "Grass wet (true/false)", S = "Sprinkler turned on (true/false)", and R = "Raining (true/false)". The model can answer questions about the presence of a cause given the presence of an effect (so-called inverse probability) like "What is the probability that it is raining, given the grass is wet?" by using the conditional probability formula and summing over all nuisance variables: Pr ( R = T ∣ G = T ) = Pr ( G = T , R = T ) Pr ( G = T ) = ∑ x ∈ { T , F } Pr ( G = T , S = x , R = T ) ∑ x , y ∈ { T , F } Pr ( G = T , S = x , R = y ) {\displaystyle \Pr(R=T\mid G=T)={\frac {\Pr(G=T,R=T)}{\Pr(G=T)}}={\frac {\sum _{x\in \{T,F\}}\Pr(G=T,S=x,R=T)}{\sum _{x,y\in \{T,F\}}\Pr(G=T,S=x,R=y)}}} Using the expansion for the joint probability function Pr ( G , S , R ) {\displaystyle \Pr(G,S,R)} and the conditional probabilities from the conditional probability tables (CPTs) stated in the diagram, one can evaluate each term in the sums in the numerator and denominator. For example, Pr ( G = T , S = T , R = T ) = Pr ( G = T ∣ S = T , R = T ) Pr ( S = T ∣ R = T ) Pr ( R = T ) = 0.99 × 0.01 × 0.2 = 0.00198. {\displaystyle {\begin{aligned}\Pr(G=T,S=T,R=T)&=\Pr(G=T\mid S=T,R=T)\Pr(S=T\mid R=T)\Pr(R=T)\\&=0.99\times 0.01\times 0.2\\&=0.00198.\end{aligned}}} Then the numerical results (subscripted by the associated variable values) are Pr ( R = T ∣ G = T ) = 0.00198 T T T + 0.1584 T F T 0.00198 T T T + 0.288 T T F + 0.1584 T F T + 0.0 T F F = 891 2491 ≈ 35.77 % . {\displaystyle \Pr(R=T\mid G=T)={\frac {0.00198_{TTT}+0.1584_{TFT}}{0.00198_{TTT}+0.288_{TTF}+0.1584_{TFT}+0.0_{TFF}}}={\frac {891}{2491}}\approx 35.77\%.} To answer an interventional question, such as "What is the probability that it would rain, given that we wet the grass?" the answer is governed by the post-intervention joint distribution function Pr ( S , R ∣ do ( G = T ) ) = Pr ( S ∣ R ) Pr ( R ) {\displaystyle \Pr(S,R\mid {\text{do}}(G=T))=\Pr(S\mid R)\Pr(R)} obtained by removing the factor Pr ( G ∣ S , R ) {\displaystyle \Pr(G\mid S,R)} from the pre-intervention distribution. The do operator forces the value of G to be true. The probability of rain is unaffected by the action: Pr ( R ∣ do ( G = T ) ) = Pr ( R ) . {\displaystyle \Pr(R\mid {\text{do}}(G=T))=\Pr(R).} To predict the impact of turning the sprinkler on: Pr ( R , G ∣ do ( S = T ) ) = Pr ( R ) Pr ( G ∣ R , S = T ) {\displaystyle \Pr(R,G\mid {\text{do}}(S=T))=\Pr(R)\Pr(G\mid R,S=T)} with the term Pr ( S = T ∣ R ) {\displaystyle \Pr(S=T\mid R)} removed, showing that the action affects the grass but not the rain. These predictions may not be feasible given unobserved variables, as in most policy evaluation problems. The effect of the action do ( x ) {\displaystyle {\text{do}}(x)} can still be predicted, however, whenever the back-door criterion is satisfied. It states that, if a set Z of nodes can be observed that d-separates (or blocks) all back-door paths from X to Y then Pr ( Y , Z ∣ do ( x ) ) = Pr ( Y , Z , X = x ) Pr ( X = x ∣ Z ) . {\displaystyle \Pr(Y,Z\mid {\text{do}}(x))={\frac {\Pr(Y,Z,X=x)}{\Pr(X=x\mid Z)}}.} A back-door path is one that ends with an arrow into X. Sets that satisfy the back-door criterion are called "sufficient" or "admissible." For example, the set Z = R is admissible for predicting the effect of S = T on G, because R d-separates the (only) back-door path S ← R → G. However, if S is not observed, no other set d-separates this path and the effect of turning the sprinkler on (S = T) on the grass (G) cannot be predicted from passive observations. In that case P(G | do(S = T)) is not "identified". This reflects the fact that, lacking interventional data, the observed dependence between S and G is due to a causal connection or is spurious (apparent dependence arising from a common cause, R). (see Simpson's paradox) To determine whether a causal relation is identified from an arbitrary Bayesian network with unobserved variables, one can use the three rules of "do-calculus" and test whether all do terms can be removed from the expression of that relation, thus confirming that the desired quantity is estimable from frequency data. Using a Bayesian network can save considerable amounts of memory over exhaustive probability tables, if the dependencies in the joint distribution are sparse. For example, a naive way of storing the conditional probabilities of 10 two-valued variables as a table requires storage space for 2 10 = 1024 {\displaystyle 2^{10}=1024} values. If no variable's local distribution depends on more than three parent variables, the Bayesian network representation stores at most 10 ⋅ 2 3 = 80 {\displaystyle 10\cdot 2^{3}=80} values. One advantage of Bayesian networks is that it is intuitively easier for a human to understand (a sparse set of) direct dependencies and local distributions than complete joint distributions. == Inference and learning == Bayesian networks perform three main inference tasks: Inferring unobserved variables Parameter learning for the probability distributions of each node in the network Structure learning of the graphical network === Inferring unobserved variables === Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universal sufficient statistic for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems. The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the prod