Statistics / Algorithms
==================
Weibull regression
-----------------------
A Weibull regression model is an example of duration analysis. Duration analysis is used in many fields, under different names. Engineers talk about time-to-failure models; epidemiologists talk about survival models. In all cases, we’re trying to predict the time until a particular event: failure of a key mechanical part, or death due to disease. We do so on the basis of a number of predictors, or covariates. But instead of factoring in the tensile strength of the metal used in our part, or the virulence of a particular disease, we use features of the governing coalition to predict its duration. <br />
<http://blogs.lse.ac.uk/politicsandpolicy/2012/02/23/coalition-termination-hanretty><br />
<http://chrishanretty.co.uk/blog/index.php/2011/06/23/coalition-has-one-in-three-chance-of-going-the-distance><br />
<http://en.wikipedia.org/wiki/Proportional_hazards_models>
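The role of the covariates can be made concrete with a short sketch. The Python below assumes one common proportional-hazards parametrization of the Weibull model, with shape k, scale λ and coefficients β: the hazard is h(t | x) = (k/λ)(t/λ)^(k-1) exp(x·β) and the survival function is S(t | x) = exp(-(t/λ)^k exp(x·β)). Neither the parametrization nor the numbers are taken from the linked coalition analysis; the coefficient values are invented for illustration and the function names are hypothetical. A "chance of going the distance" would correspond to S evaluated at the full term.

```python
import math
from typing import Sequence


def linear_predictor(x: Sequence[float], beta: Sequence[float]) -> float:
    """Covariate effect x . beta (no intercept; the baseline is set by scale)."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def weibull_hazard(t: float, x: Sequence[float], beta: Sequence[float],
                   shape: float, scale: float) -> float:
    """Hazard h(t|x) = (k/lam) * (t/lam)**(k-1) * exp(x . beta), for t > 0."""
    baseline = (shape / scale) * (t / scale) ** (shape - 1)
    return baseline * math.exp(linear_predictor(x, beta))


def weibull_survival(t: float, x: Sequence[float], beta: Sequence[float],
                     shape: float, scale: float) -> float:
    """Survival S(t|x) = exp(-(t/lam)**k * exp(x . beta)): P(event after t)."""
    cumulative_hazard = (t / scale) ** shape * math.exp(linear_predictor(x, beta))
    return math.exp(-cumulative_hazard)


def log_likelihood(times, events, covariates, beta, shape, scale) -> float:
    """Right-censored log-likelihood: sum of d_i * log h(t_i) + log S(t_i).

    events[i] is 1 if the event was observed at times[i], 0 if the unit was
    still "alive" (censored) when observation stopped.
    """
    total = 0.0
    for t, d, x in zip(times, events, covariates):
        if d:
            total += math.log(weibull_hazard(t, x, beta, shape, scale))
        total += math.log(weibull_survival(t, x, beta, shape, scale))
    return total


if __name__ == "__main__":
    beta = [0.7]  # illustrative coefficient, NOT estimated from real data
    for x in ([0.0], [1.0]):
        p = weibull_survival(5.0, x, beta, shape=1.5, scale=4.0)
        print(f"x={x[0]}: P(still going at t=5) = {p:.3f}")
```

A positive coefficient multiplies the hazard by exp(β) at every time t (the proportional-hazards property), so the unit with x = 1 is less likely to last until t = 5. In practice k, λ and β would be estimated by maximizing the censored log-likelihood with a numerical optimizer or a survival-analysis library; the sketch only evaluates it.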
Simple Linear Regression Analysis
--------------------------------------------
Regression analysis is a statistical technique that attempts to explore and model the relationship between two or more variables. For example, an analyst may want to know whether there is a relationship between road accidents and the age of the driver. Regression analysis forms an important part of the statistical analysis of data obtained from designed experiments and is discussed briefly in this chapter. Every experiment analyzed in DOE++ includes regression results for each of the responses. These results, along with the results from the analysis of variance (explained in our "Analysis of Experiments" discussion), provide information that is useful for identifying significant factors in an experiment and for exploring the nature of the relationship between these factors and the response.<br />
<http://www.weibull.com/DOEWeb/simple_linear_regression_analysis.htm>
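For a single predictor, the least-squares estimates have a closed form: the slope is b1 = Sxy / Sxx and the intercept is b0 = ȳ - b1·x̄, where Sxy = Σ(xi - x̄)(yi - ȳ) and Sxx = Σ(xi - x̄)². The following is a minimal Python sketch of that calculation plus the coefficient of determination R²; it is independent of DOE++, and the function name and sample data are illustrative.

```python
from typing import Sequence, Tuple


def fit_simple_linear_regression(x: Sequence[float],
                                 y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx             # slope
    b0 = y_bar - b1 * x_bar      # intercept: the fitted line passes through the means
    # R^2 = 1 - SS_residual / SS_total: fraction of the variance in y explained
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return b0, b1, r_squared


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [3.1, 4.9, 7.2, 8.8, 11.1]  # made-up data, roughly y = 1 + 2x
    b0, b1, r2 = fit_simple_linear_regression(x, y)
    print(f"y = {b0:.3f} + {b1:.3f} x  (R^2 = {r2:.4f})")
```

No iterative fitting is needed here because the estimates come straight from means and sums of squares; with several predictors the same least-squares criterion is usually solved in matrix form.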
Statistical Science and Philosophy of Science
---------------------------------------------------------
The "meeting grounds" of statistical science and philosophy of science are or should be connected by a two-way street: while general philosophical questions about evidence and inference bear on statistical questions (about methods to use, and how to interpret them), statistical methods bear on philosophical problems about inference and knowledge.<br />
<http://www.rmm-journal.de/htdocs/st01.html>
k-nearest neighbor algorithm
-------------------------------------
In pattern recognition, the k-nearest neighbor algorithm (k-NN) is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-nearest neighbor algorithm is among the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors and assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its nearest neighbor.<br />
<http://www.fastcompany.com/1814225/law-enforcements-secret-weapon-google-maps><br />
<http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>
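The majority-vote rule can be sketched in a few lines of Python. This minimal version assumes numeric feature vectors, Euclidean distance, and a tie rule that favours the class whose nearest member is closest to the query; the distance metric and tie rule are choices made here for illustration, since the algorithm itself leaves both open. `knn_classify` is a hypothetical name, not a library API.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def knn_classify(training: List[Tuple[Sequence[float], Hashable]],
                 query: Sequence[float], k: int = 3) -> Hashable:
    """Classify `query` by majority vote among its k nearest training points.

    Distance is Euclidean. Ties in the vote go to the class whose nearest
    member is closest to the query.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training points")
    # Lazy learning: no model is built in advance; all work happens per query.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    neighbours = by_distance[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    # neighbours are sorted nearest-first, so the first tied label is the closest.
    for _, label in neighbours:
        if votes[label] == top:
            return label


if __name__ == "__main__":
    training = [
        ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
        ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 5.0), "B"),
    ]
    print(knn_classify(training, (2.0, 2.0), k=3))  # -> A
    print(knn_classify(training, (6.0, 6.5), k=1))  # -> B (nearest-neighbour case)
```

Sorting the whole training set costs O(n log n) per query, which is acceptable for a sketch; many libraries use spatial indexes such as k-d trees to avoid scanning every point. For two-class problems, an odd k avoids voting ties.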
Building Decision Trees in Python
------------------------------------------
Decision trees fall under the subfield of machine learning within the larger field of artificial intelligence. They are mainly used for classification, but they can also help uncover features of the data that were not apparent from inspecting it directly.
Physics uses the term entropy to describe the amount of disorder inherent within a system. In information theory, the term has a similar meaning: it measures the disorder in a set of data. The ID3 heuristic uses this concept to choose the "next best" attribute in the data set to use as a node, or decision criterion, in the decision tree. The idea behind ID3 is to find the attribute whose split most lowers the entropy of the data set (that is, the attribute with the highest information gain), thereby reducing the amount of information needed to completely describe each piece of data.<br />
<https://www.readability.com/articles/wbf9ofbz>
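The "next best attribute" step can be sketched in Python. This minimal sketch covers only ID3's attribute-selection step, assuming categorical attributes stored in dictionaries; it does not build the full recursive tree, and the function names and toy data are illustrative, not taken from the linked article. Entropy is H(S) = -Σ p·log2 p over the class proportions p, and the information gain of attribute A is H(S) minus the size-weighted entropy of the subsets produced by splitting on A.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: 0 for a pure set, 1 for a 50/50 binary split."""
    n = len(labels)
    # p * log2(1/p) is the same as -p * log2(p), written to avoid a -0.0 result
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(rows: List[Row], attribute: str, target: str) -> float:
    """Target entropy minus the weighted entropy after splitting on `attribute`."""
    before = entropy([r[target] for r in rows])
    groups: Dict[Hashable, List[Hashable]] = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


def best_attribute(rows: List[Row], attributes: Sequence[str], target: str) -> str:
    """ID3's choice for the next node: the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


if __name__ == "__main__":
    # Tiny made-up data set: does someone play tennis?
    rows = [
        {"outlook": "sunny", "windy": "no", "play": "no"},
        {"outlook": "sunny", "windy": "yes", "play": "no"},
        {"outlook": "overcast", "windy": "no", "play": "yes"},
        {"outlook": "rain", "windy": "no", "play": "yes"},
        {"outlook": "rain", "windy": "yes", "play": "no"},
        {"outlook": "overcast", "windy": "yes", "play": "yes"},
    ]
    for a in ("outlook", "windy"):
        print(a, round(information_gain(rows, a, "play"), 3))  # outlook 0.667, windy 0.082
    print("split on:", best_attribute(rows, ["outlook", "windy"], "play"))
```

Splitting on `outlook` leaves two of its three branches pure, so it removes more entropy than `windy`. A full ID3 implementation would recurse on each branch with the remaining attributes until the subsets are pure or no attributes are left.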
Further reading
------------------
Stan: A (Bayesian) Directed Graphical Model Compiler<br />
<http://andrewgelman.com/2012/01/stan-a-bayesian-directed-graphical-model-compiler><br />
The holes in my philosophy of Bayesian data analysis<br />
<http://andrewgelman.com/2011/06/the_holes_in_my/><br />
Stanford Artificial Intelligence | Machine Learning lectures<br />
<http://cs229.stanford.edu/><br />
<http://see.stanford.edu/see/lecturelist.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1>