The post LibreOffice Math and LaTeX first appeared on JustToThePoint.

To insert a formula into a LibreOffice document, open your document in Writer, Calc, Draw, or Impress. If you want to insert a formula in LibreOffice Writer, go to **Insert, Object,** and **Formula**. Let's insert the quadratic formula for the roots of the general quadratic equation.

In the **Elements Dock** (it is on the left of the Formula Editor), select the **category** (e.g. Unary/Binary Operators) you want to use in your formula from the drop-down list at the top of the Elements Dock, then the **symbol** (e.g. Division (Fraction)).

The Formula Editor in LibreOffice Math uses a markup language to represent formulas. For example, a over b produces the fraction a/b, while a_n, 3 times 5, and sqrt {3} produce a_{n}, 3×5, and √3 respectively. As you enter your formula using the markup language in the Formula Editor, it appears in the Preview Window and updates automatically (View, AutoUpdate display).

Our Formula Editor contains: {<?>} over {<?>}. Select the first placeholder <?>, type -b, and right-click in the Formula Editor to open the context menu. Select a category (**Unary/Binary Operators**) and then select the markup example (**+-a**) that you want to use from the sub-context menu. Your Formula Editor should contain this: {-b +- <?>} over {<?>}

Next, select the **sqrt x** entry from the category **Functions** in the sub-context menu: {-b +- sqrt {<?>}} over {<?>}. Right-click again in the Formula Editor, select **Functions**, **x^y**, and replace {<?>}^{<?>} with b^2, and so on.

Obviously, **you can also enter expressions written in markup language directly** in the Formula Editor: {-b +- sqrt{ b^2 -4ac}} over {2a}.

Another example would be: A=%pi r^2. It produces πr^{2}. Observe that Greek characters (%alpha, %beta, %gamma... α, β, γ) can also be entered into a formula using the Symbols dialog. You just need to select **Tools, Symbols** on the main menu.

**LaTeX is a high-quality typesetting system**; it includes features designed for the production of technical and scientific documentation.

LaTeX source files are plain text. The default extension of a LaTeX source file is .tex.

\documentclass{article}

% **It declares the type of document**, in this case an article. The three most commonly used standard document classes in LaTeX are: article, report, and book. Then, you write the text of your document between the \begin{document} and \end{document} tags.

\title{Learning LaTeX} % Its title is Learning LaTeX
\author{Máximo Núñez Alarcón, @Anawim, #justtothepoint} % Its author is Máximo Núñez Alarcón.
\date{August, 2021} % It was written in August 2021
\usepackage{amsmath}

% The text of your document should be written after the \begin{document} command. The part of your .tex file before this point is called the **preamble**. The point of the preamble is to *define the type of document you are writing and the language, load extra packages you may need and set some parameters*.

amsmath is an extension package for LaTeX that provides various features to facilitate writing math formulas and to improve the typographical quality of their output.

If you are a non-English speaker, let's say you want to write a LaTeX document in Spanish, you may use: \usepackage[utf8]{inputenc} and \usepackage[spanish]{babel}. The babel package makes it possible to display special characters, e.g., "moño" and "canción". The recommended input encoding is utf8.

\begin{document} Pythagorean theorem: \(x^2 + y^2 = z^2\)

% LaTeX allows two writing modes for mathematical expressions. The **inline math mode** is used to *write formulas that are part of a paragraph and we use \(...\) as delimiters*. x^2 is the LaTeX code that produces x^{2}.

In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. \[

% The **display math mode** is used to *write expressions that are not part of a paragraph (they will be placed on separate lines)*. You can display math expressions using the following methods: \[...\], \begin{displaymath} ... \end{displaymath}, and \begin{equation} ... \end{equation}

\binom{n}{k} = \frac{n!}{k!(n-k)!} \]

When fractions have the same denominators, we simply add or subtract the numerators as indicated and place the result over the common denominator, e.g., \(\frac{3}{2}+\frac{5}{2}=\frac{8}{2}=4\). The quadratic formula is: \(x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\)

\end{document}

The *\sqrt{arg}* command produces *the square root symbol with the argument as radicand*, e.g., \sqrt{b^2 - 4ac}. Besides, fractions are written using the code \frac{numerator}{denominator}, e.g., \frac{3}{2}, \frac{5}{2}, and \frac{8}{2}.

Overleaf is a collaborative cloud-based LaTeX editor. It is a user-friendly LaTeX tool which makes online document editing and collaboration seamless and hassle-free. Papeeria is another online LaTeX editor.

TeXworks is a **free**, **multi-platform** (it is available for Windows, Linux, and macOS), and **open-source LaTeX editor**.

We are adding three mathematical expressions to show you how to write integrals, limits, and series in LaTeX.

\[ \int_0^1 x^3dx = \frac{1}{4} = 0.25 \] \[ \lim_{n\to \infty}\frac{1}{n} = 0 \] \[ \sum_{i=1}^{n}i=\frac{n(n+1)}{2} \]
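These three results can be double-checked with SymPy, "a quick sketch, assuming the sympy package is installed":

```python
from sympy import symbols, integrate, limit, summation, oo, Rational, simplify

x, n, i = symbols('x n i')

# The definite integral of x^3 from 0 to 1 equals 1/4.
int_val = integrate(x**3, (x, 0, 1))

# The limit of 1/n as n goes to infinity equals 0.
lim_val = limit(1/n, n, oo)

# The sum of i from 1 to n equals n(n+1)/2.
sum_val = summation(i, (i, 1, n))

print(int_val, lim_val, simplify(sum_val - n*(n + 1)/2))
```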

Other LaTeX tools are:


The post Vectors, matrices, and system of linear equations first appeared on JustToThePoint.

**A vector is an object that has both a magnitude or size and a direction**. "Geometrically, we can picture a vector as a directed line segment, whose length is the magnitude of the vector and with an arrow indicating the direction," An introduction to vectors, Math Insight.

We are going to use several tools (Yacas, Python, WolframAlpha, Octave, and Maxima) to work with vectors and matrices.

Let's use Yacas, an easy-to-use, general-purpose Computer Algebra System, a program for symbolic manipulation of mathematical expressions. It is available online, too.

The Ubuntu installation instructions are quite simple: *sudo add-apt-repository ppa:teoretyk/yacas*, *sudo apt-get update*, and then *sudo apt-get install yacas*.

user@pc:~$ yacas
This is Yacas version '1.3.6'.
To exit Yacas, enter Exit(); or quit or Ctrl-c.
Type 'restart' to restart Yacas.
To see example commands, keep typing Example();
In> v:={3, 4, 5}; # We define two vectors, v and w.
Out> {3,4,5}
In> w:={2, 7, 1};
Out> {2,7,1}
In> v+w; # We perform some operations on them.
Out> {5,11,6}
In> v-w;
Out> {1,-3,4}
In> 3*v;
Out> {9,12,15}
In> v.w; # We ask Yacas for the scalar product of both vectors.
Out> 39
In> sqrt(v.v); # The magnitude of v.
Out> sqrt(50)
In> CrossProduct(v, w); # It returns the cross product of the vectors v and w.
Out> {-31,7,13}
In> x:={4, 2, 4};
Out> {4,2,4}
In> Normalize(x); # It normalizes a vector. In other words, it returns the normalized (unit) vector parallel to x, a vector having the same direction as x but with length 1.
Out> {2/3,1/3,2/3}

user@pc:~$ python # Let's work with vectors in Python
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
>>> import numpy as np # NumPy is a library that supports vectors and matrices in an optimized way.
>>> v = np.array([3, 4, 5]) # Data manipulation in Python is almost synonymous with NumPy array manipulation.
>>> w = np.array([2, 7, 1])
>>> np.add(v, w) # numpy.add adds arguments element-wise: v+w.
array([ 5, 11,  6])
>>> np.subtract(v, w) # numpy.subtract subtracts arguments element-wise: v-w.
array([ 1, -3,  4])
>>> v*3
array([ 9, 12, 15])
>>> np.dot(v, w) # It calculates the dot product or scalar product of v and w.
39
>>> np.linalg.norm(v) # It computes the magnitude or Euclidean norm of v.
7.0710678118654755 # = sqrt(50)
>>> np.cross(v, w) # numpy.cross returns the cross product of two (arrays of) vectors.
array([-31,   7,  13])
>>> x = np.array([4, 2, 4])
>>> x/np.linalg.norm(x) # It normalizes the x vector.
array([0.66666667, 0.33333333, 0.66666667]) # {2/3,1/3,2/3}

We can use Wolfram Alpha, too. For instance, **(3, 4, 5)*(2, 7, 1) or {3, 4, 5}*{2, 7, 1} computes a cross product** (it also returns a Vector plot, Vector length, and the Normalized vector); and *{3, 4, 5}.{2, 7, 1}* **computes a dot or scalar product**. Besides, we can perform some vector computations: {3, 4, 5} + {2, 7, 1}, {3, 4, 5}-3*{2, 7, 1}; {3, 4, 5}^3;

*norm{3, 4, 5}* **computes the magnitude or norm** of a vector; normalize {4, 2, 4} normalizes the vector. Finally, MatrixRank[{{3, 4, 5}, {2, 7, 1}}] = 2. MatrixRank[m] gives the rank of the matrix m, i.e., the number of linearly independent rows, so {3, 4, 5} and {2, 7, 1} are independent. MatrixRank[{{1, 4, 2}, {5, 20, 10}}] returns 1 because {1, 4, 2} and {5, 20, 10} are linearly dependent.
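The same rank computations can be reproduced with NumPy, a quick sketch:

```python
import numpy as np

# Stack the vectors as rows and compute the matrix rank:
# rank 2 means the two rows are linearly independent.
independent = np.array([[3, 4, 5], [2, 7, 1]])
print(np.linalg.matrix_rank(independent))   # 2

# {5, 20, 10} = 5 * {1, 4, 2}, so the rank drops to 1.
dependent = np.array([[1, 4, 2], [5, 20, 10]])
print(np.linalg.matrix_rank(dependent))     # 1
```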

*Octave is a program specially designed for manipulating matrices*. The easiest way to define matrices in Octave is to type the elements inside square brackets "[]": entries within a row are separated by commas, and semicolons mark the end of a row, e.g., a = [ 2, 1, 1; 3, 2, 1; 2, 1, 2], b = [ 4, 5; 3, 6; 2, 9]. We can create matrices with random elements, too: b = rand(3, 2).

You can use the standard operators to add (a+b), subtract (a-b), multiply (a*b), power (a^2), and transpose (a') matrices, provided their dimensions agree.
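In Python, NumPy mirrors these Octave operations. A quick sketch using the same matrices a and b defined above (note that a is 3×3 and b is 3×2, so a + b is undefined, while the product a*b is):

```python
import numpy as np

a = np.array([[2, 1, 1], [3, 2, 1], [2, 1, 2]])
b = np.array([[4, 5], [3, 6], [2, 9]])

print(a + a)                          # addition needs matching shapes (a is 3x3, b is 3x2)
print(a @ b)                          # product of a (3x3) and b (3x2) -> a 3x2 matrix
print(a.T)                            # transpose, the NumPy counterpart of Octave's a'
print(np.linalg.matrix_power(a, 2))   # a squared, like Octave's a^2
```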

Let's consider the linear transformation Av = w where A is an "n" by "n" matrix. If v and w are scalar multiples (Av = w = λv), then **v is an eigenvector of the linear transformation A and the scale factor λ is the eigenvalue** corresponding to that eigenvector.

Av = λv can be stated equivalently as (A -λI)v = 0 where I is the identity matrix, and 0 is the zero vector. This equation has a nonzero solution v if and only if the determinant of the matrix (A -λI) is zero, *so the eigenvalues are values λ that satisfy the equation |A - λI| = 0*. The left-hand side of this equation is called the **characteristic polynomial** of A.

Thus, |A - λI| = (λ_{1} -λ) (λ_{2} -λ)...(λ_{n} -λ) where λ_{1}, λ_{2}, ....λ_{n} are the eigenvalues of A. In our example,** [v, h]=eig(a)** returns the eigenvalues λ_{1} = 2 and λ_{2}=-1, and |A - λI| = (λ_{1} -λ) (λ_{2} -λ) = (2 -λ)(-1 -λ) = λ^{2} -λ -2 (**poly(a)** returns the characteristic polynomial of a). You can double check: det(a-2*eye(2)) (**eye(2)** returns a square 2*2 identity matrix) = 0, det(a+eye(2)) = 0, and det(a) = -2 = λ_{1}*λ_{2}.
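The 2×2 matrix a used in this example is not displayed here; assuming it is [[4, -5], [2, -3]] (the same matrix used in the Python session below, which has eigenvalues 2 and -1), SymPy reproduces the characteristic polynomial λ² - λ - 2:

```python
from sympy import Matrix, symbols, factor

lam = symbols('lambda')
a = Matrix([[4, -5], [2, -3]])

p = a.charpoly(lam).as_expr()   # characteristic polynomial det(lambda*I - a)
print(p)                        # lambda**2 - lambda - 2
print(factor(p))                # (lambda - 2)*(lambda + 1)
print(a.eigenvals())            # {2: 1, -1: 1}
```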

user@pc:~$ yacas # We can obtain the same results with Yacas
This is Yacas version '1.3.6'.
To exit Yacas, enter Exit(); or quit or Ctrl-c.
Type 'restart' to restart Yacas.
To see example commands, keep typing Example();
In> matrixA := {{1, 2, 5}, {3, 8, 6}, {2, 5, 7}}; # Yacas can manipulate vectors, represented as lists, and matrices, represented as lists of lists.
Out> {{1,2,5},{3,8,6},{2,5,7}}
In> matrixB := {{3, 4, 5}, {2, 4, 7}, {3, 1, 9}};
Out> {{3,4,5},{2,4,7},{3,1,9}}
In> matrixA + matrixB; # Basic arithmetic operators work on integers, vectors, and matrices.
Out> {{4,6,10},{5,12,13},{5,6,16}}
In> 3*matrixA;
Out> {{3,6,15},{9,24,18},{6,15,21}}
In> Transpose(matrixA); # It gets the transpose of a matrix.
Out> {{1,3,2},{2,8,5},{5,6,7}}
In> Determinant(matrixA); # It returns the determinant of matrixA.
Out> 3
In> Inverse(matrixA); # It gets the inverse of matrixA.
Out> {{26/3,11/3,(-28)/3},{-3,-1,3},{(-1)/3,(-1)/3,2/3}}
In> Trace(matrixA); # It returns the trace of matrixA, the sum of the elements on its diagonal: 1 + 8 + 7.
Out> 16

Yacas is available online, too. **EigenValues(matrix)** gets *the eigenvalues of a matrix*, and **CharacteristicEquation(matrix, var)** gets *the characteristic polynomial of a "matrix"* using "var".

user@pc:~$ python # Let's work with matrices in Python
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
>>> import numpy as np # NumPy is a library that supports vectors and matrices in an optimized way.
>>> matrixA = np.array([[1, 2, 5], [3, 8, 6], [2, 5, 7]])
>>> matrixB = np.array([[3, 4, 5], [2, 4, 7], [3, 1, 9]])
>>> np.add(matrixA, matrixB) # numpy.add adds arguments element-wise: matrixA + matrixB.
array([[ 4,  6, 10],
       [ 5, 12, 13],
       [ 5,  6, 16]])
>>> np.subtract(matrixA, matrixB) # numpy.subtract subtracts arguments element-wise: matrixA - matrixB.
array([[-2, -2,  0],
       [ 1,  4, -1],
       [-1,  4, -2]])
>>> np.dot(matrixA, matrixB) # If both inputs are 2D arrays, the numpy.dot function performs matrix multiplication.
array([[ 22,  17,  64],
       [ 43,  50, 125],
       [ 37,  35, 108]])
>>> 3*matrixA
array([[ 3,  6, 15],
       [ 9, 24, 18],
       [ 6, 15, 21]])
>>> matrixA.T # We use the T attribute to transpose a matrix.
array([[1, 3, 2],
       [2, 8, 5],
       [5, 6, 7]])
>>> matrixA.trace() # We use the trace method to calculate the trace of a matrix.
16
>>> np.linalg.matrix_rank(matrixA) # NumPy's linalg.matrix_rank method returns the rank of matrixA.
3
>>> np.linalg.det(matrixA) # NumPy's linalg.det method computes the determinant of matrixA.
3.0000000000000018
>>> np.linalg.inv(matrixA) # NumPy's linalg.inv method calculates the inverse of matrixA.
array([[ 8.66666667,  3.66666667, -9.33333333],
       [-3.        , -1.        ,  3.        ],
       [-0.33333333, -0.33333333,  0.66666667]])
>>> matrix = np.array([[4, -5], [2, -3]])
>>> eigenvalues, eigenvectors = np.linalg.eig(matrix) # Let's calculate the eigenvalues and eigenvectors of a square matrix.
>>> eigenvalues
array([ 2., -1.])
>>> eigenvectors
array([[0.92847669, 0.70710678],
       [0.37139068, 0.70710678]])
>>> np.poly(matrix) # It returns the coefficients of the characteristic polynomial of a matrix.
array([ 1., -1., -2.])

Alternatively, we can work with WolframAlpha. Transpose({{1, 2, 5}, {3, 8, 6}, {2, 5, 7}}) returns the transpose of the matrix {{1, 2, 5}, {3, 8, 6}, {2, 5, 7}}. Det({{1, 2, 5}, {3, 8, 6}, {2, 5, 7}}) calculates the determinant of a matrix (3). Inverse({{1, 2, 5}, {3, 8, 6}, {2, 5, 7}}) gives the inverse of a square matrix.

Observe that the syntax is almost the same as Python, Yacas or Octave, e.g., {{1, 2, 5}, {3, 8, 6}, {2, 5, 7}} * {{1,2,3},{3,2,1},{1,2,3}}, {{1, 2, 5}, {3, 8, 6}, {2, 5, 7}} + {{1,2,3},{3,2,1},{1,2,3}}, Trace{{1, 2, 5}, {3, 8, 6}, {2, 5, 7}} (= 16), Rank{{1, 2, 5}, {3, 8, 6}, {2, 5, 7}} (=3), etc.

However,** you can use plain English to enter your queries** in WolframAlpha: find the eigenvalues of the matrix {{4, -5}, {2, -3}} or calculate eigenvalues of the matrix {{4, -5}, {2, -3}}. CharacteristicPolynomial{{4, -5}, {2, -3}} gives the characteristic polynomial for the matrix {{4, -5}, {2, -3}}.

**A system of linear equations** (or linear system) **is a set or collection of one or more linear equations**. Observe that only simple variables are allowed in linear equations, e.g., x^{2}, z^{3}, and √x are not allowed. For example,

A **solution** of a system of linear equations is *a vector or an assignment of values to the variables that is simultaneously a solution of each equation in the system*.

A system of linear equations is **consistent** *if it has at least one solution*. Alternatively, *a system with no solutions* is called **inconsistent**. A system of linear equations has either a unique solution (consistent, e.g., y+6x = 8, y+3x = -4; solution: x = 4, y = -16), infinitely many solutions (consistent, e.g., 2x+y = 4, 6x+3y = 12; solutions: y = 4-2x), or no solutions (inconsistent, e.g., 2x+y = 4, 6x+3y = 10).

For any system of linear equations, three matrices are very important: the **coefficient matrix** (A), containing *the coefficients of the variables*; the **augmented matrix** (A|b), which is the coefficient matrix with an extra column containing the constant terms of the linear system; and the **constant matrix**, the column vector (b) of constant terms. The system is equivalent to a matrix equation of the form **Ax = b**.

user@pc:~$ yacas
This is Yacas version '1.3.6'.
To exit Yacas, enter Exit(); or quit or Ctrl-c.
Type 'restart' to restart Yacas.
To see example commands, keep typing Example();
In> A:={{6, 1}, {3, 1}} # Define the coefficient matrix of y+6x = 8, y+3x = -4.
Out> {{6,1},{3,1}}
In> B:={8, -4} # Define the constant matrix.
Out> {8,-4}
In> MatrixSolve(A, B) # It solves the matrix equation Ax = b.
Out> {4,-16} # It has a unique solution: x = 4, y = -16.
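The same system can be solved with NumPy, a quick sketch:

```python
import numpy as np

A = np.array([[6, 1], [3, 1]])   # coefficient matrix of y+6x = 8, y+3x = -4
b = np.array([8, -4])            # constant vector
sol = np.linalg.solve(A, b)      # solves the matrix equation Ax = b
print(sol)                       # [  4. -16.]  i.e. x = 4, y = -16
```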

Microsoft Math Solver provides free step by step solutions to a variety of Math problems.

user@pc:~$ yacas
In> A:={{5, 2, 0}, {-4, 0, -1}, {1, 1, 1}} # Define the coefficient matrix.
Out> {{5,2,0},{-4,0,-1},{1,1,1}}
In> B:={11, -9, 9} # Define the constant matrix.
Out> {11,-9,9}
In> MatrixSolve(A, B) # It solves the matrix equation Ax = b.
Out> {1,3,5}

Wolfram Alpha is more than capable of solving systems of linear equations: *5*x+2*y=11, -4x-z=-9, x+y+z=9*

Maxima's command *algsys ([expr_1, …, expr_m], [x_1, …, x_n])* solves the simultaneous polynomial equations expr_1, …, expr_m for the variables x_1, …, x_n, e.g., *algsys([5*x+2*y=11,-4*x-z=-9, x+y+z=9], [x, y, z]);*

The Rouché–Capelli theorem determines the number of solutions of a system of linear equations. **A system of m equations and n unknowns has a solution if and only if the rank of its coefficient matrix (r) equals the rank of its augmented matrix (r')**.

If r = rank(A) ≠ r' = rank(A|b), the system is incompatible: it has no solutions. If r = r' = n, the system is a **determinate compatible system**; in other words, *it has only one solution*. If r = r' < n, the system is an **indeterminate** compatible system: it has *infinitely many solutions*.
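We can turn the theorem into a small NumPy helper (the function name classify is our own illustration, not a library API):

```python
import numpy as np

def classify(A, b):
    """Classify the system Ax = b using the Rouche-Capelli theorem."""
    A = np.asarray(A, dtype=float)
    r = np.linalg.matrix_rank(A)                            # rank of A
    r_aug = np.linalg.matrix_rank(np.column_stack([A, b]))  # rank of A|b
    n = A.shape[1]                                          # number of unknowns
    if r != r_aug:
        return 'inconsistent'                 # no solutions
    return 'unique' if r == n else 'infinite' # one vs. infinitely many

print(classify([[6, 1], [3, 1]], [8, -4]))    # unique
print(classify([[2, 1], [6, 3]], [4, 12]))    # infinite
print(classify([[2, 1], [6, 3]], [4, 10]))    # inconsistent
```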

user@pc:~$ python # Let's solve a system of linear equations in Python
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
>>> import numpy as np # NumPy is a library that supports vectors and matrices in an optimized way.
>>> matrixA = np.array([[5, 2, 0], [-4, 0, -1], [1, 1, 1]]) # We define the coefficient matrix ...
>>> vectorB = np.array([11, -9, 9]) # ... and the constant matrix.
>>> np.linalg.solve(matrixA, vectorB) # It solves a linear matrix equation or a system of linear equations.
array([1., 3., 5.])
>>> np.linalg.matrix_rank(matrixA) # NumPy's linalg.matrix_rank method returns the rank of matrixA.
3
>>> matrixExtended = np.column_stack([matrixA, vectorB]) # numpy.column_stack stacks 1-D arrays (our constant matrix) as columns into a 2-D array (our coefficient matrix) so we get the augmented matrix A|b.
>>> matrixExtended
array([[ 5,  2,  0, 11],
       [-4,  0, -1, -9],
       [ 1,  1,  1,  9]])
>>> np.linalg.matrix_rank(matrixExtended) # Observe that rank(A) = rank(A|b) = 3 (r = r' = n), so our system is a determinate compatible system: it has a unique solution, [1, 3, 5].
3

Let's solve a second system of linear equations.

Easy peasy lemon squeezy with Wolfram Alpha:

Oops, it does not find any solutions. Let's see with Maxima what's the problem.

Observe that *algsys([6*x-y+3*z=6,-6*x+8*y=-10, 2*x-5*y-z=4], [x, y, z]);* returns [] because rank(Mext) = rank ( matrix ([6, -1, 3, 6], [-6, 8, 0, -10], [2, -5, -1, 4]) ) = 3 ≠ 2 = rank(M) = rank ( matrix ([6, -1, 3], [-6, 8, 0], [2, -5, -1]); ). In conclusion, rang(A) ≠ rang(A|b) so **the system is incompatible**.

Let's see a third system of linear equations:

user@pc:~$ yacas
This is Yacas version '1.3.6'.
To exit Yacas, enter Exit(); or quit or Ctrl-c.
Type 'restart' to restart Yacas.
To see example commands, keep typing Example();
In> A:={{8, 1, 4}, {5, -2, 4}, {1, 1, 0}} # Define the coefficient matrix.
Out> {{8,1,4},{5,-2,4},{1,1,0}}
In> B:={9, 6, 1} # Define the constant matrix.
Out> {9,6,1}
In> MatrixSolve(A, B) # It tries to solve the matrix equation Ax = b, but fails.
Out> {0,Undefined,Undefined}

Maxima shows that r = r' = 2 ≠ n = 3, so the system is an **indeterminate** compatible system: it has *infinitely many solutions*. Solutions: x = -(4t-8)/7, y = (4t-1)/7, z = t.

user@pc:~$ python # Let's solve a system of linear equations in Python
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
>>> import numpy as np # NumPy is a library that supports vectors and matrices in an optimized way.
>>> matrixA = np.array([[8, 1, 4], [5, -2, 4], [1, 1, 0]]) # We define the coefficient matrix ...
>>> vectorB = np.array([9, 6, 1]) # ... and the constant matrix.
>>> np.linalg.matrix_rank(matrixA) # NumPy's linalg.matrix_rank method returns the rank of matrixA.
2
>>> matrixExtended = np.column_stack([matrixA, vectorB]) # column_stack stacks 1-D arrays (our constant matrix) as columns into a 2-D array (our coefficient matrix) so we get the augmented matrix A|b.
>>> matrixExtended
array([[ 8,  1,  4,  9],
       [ 5, -2,  4,  6],
       [ 1,  1,  0,  1]])
>>> np.linalg.matrix_rank(matrixExtended) # It shows that r = r' = 2 ≠ n = 3: the system is an indeterminate compatible system, it has infinitely many solutions.
2
>>> np.linalg.lstsq(matrixA, vectorB, rcond=None) # Calling linalg.solve(matrixA, vectorB) requires that matrixA is a square, full-rank matrix. In other words, all its rows must be linearly independent and its determinant must be nonzero. Otherwise, it raises the exception LinAlgError: Singular matrix.
(array([0.88888889, 0.11111111, 0.44444444]), array([], dtype=float64), 2, array([1.09823822e+01, 2.71795522e+00, 2.54972035e-16]))

np.linalg.lstsq gives us one of the solutions: 0.88888889, 0.11111111, 0.44444444. This result is consistent with what we obtained in Maxima: z = t = 0.44444444, x = -(4*0.44444444-8)/7 = 0.88888889, y = (4*0.44444444-1)/7 = 0.11111111.
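We can double-check this: the least-squares answer solves the (consistent) system exactly, and it equals the minimum-norm solution (8/9, 1/9, 4/9), i.e., the solution with t = z = 4/9:

```python
import numpy as np

matrixA = np.array([[8, 1, 4], [5, -2, 4], [1, 1, 0]])
vectorB = np.array([9, 6, 1])

# lstsq returns (solution, residuals, rank, singular values); keep the solution.
sol, *_ = np.linalg.lstsq(matrixA, vectorB, rcond=None)
print(sol)                                   # [0.88888889 0.11111111 0.44444444]
print(np.allclose(matrixA @ sol, vectorB))   # True: sol solves Ax = b exactly
```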


The post Complex Numbers first appeared on JustToThePoint.

We are going to use Octave. Octave is one of the most popular high-level programming languages, primarily intended and widely used for numerical computations. The Octave syntax is largely compatible with Matlab. It is the best free alternative to Matlab.

user@pc:~$ python
Python 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 8+5j # The quickest way to define a complex number in Python is to simply type x = a + bj in the source code.
>>> y = 7-2j
>>> x+y # Since complex is a native data type in Python, you can do basic arithmetic calculations with complex numbers.
(15+3j)
>>> x-y
(1+7j)
>>> x*y
(66+19j)
>>> x/y
(0.8679245283018868+0.9622641509433962j)
>>> 3*x
(24+15j)
>>> x.real # To get the real and imaginary parts of a complex number in Python, we only need to reach for the real and imag attributes.
8.0
>>> x.imag
5.0
>>> x.conjugate() # The conjugate method returns the complex conjugate, obtained by changing the sign of the imaginary part.
(8-5j)
>>> abs(x) # To get the magnitude, radial coordinate, or radius of a complex number, we call abs().
9.433981132056603
>>> import cmath, numpy
>>> cmath.phase(x) # To get the phase, angular coordinate, or angle of a complex number, we call cmath.phase().
0.5585993153435624
>>> numpy.degrees(cmath.phase(x)) # The phase returned by cmath.phase(x) is in radians; we may use numpy.degrees() to convert it to degrees.
32.005383208083494
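The cmath module can also convert between rectangular and polar form in one call each, a small sketch:

```python
import cmath

x = 8 + 5j
r, theta = cmath.polar(x)     # (magnitude, phase in radians), same as (abs(x), cmath.phase(x))
print(r, theta)
print(cmath.rect(r, theta))   # back to rectangular form, approximately (8+5j)
```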

Let's use Geogebra to visualize complex numbers. More specifically, let's visualize 8 + 5i. First, we will draw a triangle. Select **Polygon**, and click on the points or vertices (0, 0), (8, 0), (8, 5), and then the first point again (0, 0) to complete the triangle. Alternatively, you can type in the input field: *A = (0, 0), B = (8, 0), C = (8, 5), Polygon[A, B, C, A]*.

To get the phase or angle, select **Angle**, and click on B, A, and C or just type *Angle[B, A, C]* in the input field. The phase returned by Geogebra is in degrees.

Select C, then right-click **Object Properties**, and select in the drop menu **Show Label**, *Name & Value*.

We can enter the complex number w = 8 + 5i into the Input Bar. Next, click on the point, then right-click, **Object Properties**, and select **Show Label**, *Name & Value* in the drop-down menu. Then, select **Complex Number** (*w = 8 + 5i*), **Cartesian Coordinates** (*w = (8, 5)*), or **Polar Coordinates** (*w = (9.43; 32.01°)*) from the list of coordinate formats on the Algebra tab.

We can use the following commands: x(w), y(w), abs(w), and arg(w). They return the real and imaginary part, the magnitude or radius, and the phase or angle of the given complex number respectively. conjugate(w) returns the conjugate of the complex number w: 8 - 5i.

Let's go back to Octave.

As you can see in the screenshot below, you can do *basic arithmetic on complex numbers with WolframAlpha*, too, e.g., (2+3i)*(4-2i), 1/(1+3i), (2+3i)^3...

We can use the Cartesian coordinate system and describe the location of a point p by two coordinates (x, y). We can also assign the point p a pair of polar coordinates (r, θ).

The first coordinate r "**is the distance from the pole (0, 0) to p** along a straight radial line, and θ is the **angle formed by that radial line and the positive x-axis**, measured anti-clockwise from the x-axis to the line," Maths in a minute: Polar coordinates, +plus magazine. We can think of it as a clock with one hand. We "move out a distance r along the hand from the origin, then rotate the hand upwards (counter-clockwise) by an angle θ to reach the point," xaktly.com, Polar coordinates.
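The conversion behind polar coordinates is simply x = r·cos(θ), y = r·sin(θ). A minimal Python sketch (the helper name polar_to_cartesian is ours):

```python
import math

def polar_to_cartesian(r, theta):
    """Convert polar coordinates (r, theta) to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

# The point at distance 2 and angle pi/2 sits on the positive y-axis.
x, y = polar_to_cartesian(2, math.pi / 2)
print(round(x, 10), round(y, 10))   # 0.0 2.0
```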

Let's use fooPlot, a simple online plotting tool. You can see a plot with a default function (x^2), but on the right, you can change the function and select **Polar** so it can be used in plotting polar graphs. You can set the background color, show/hide the grid lines, export the graph as svg, eps, pdf, and png, and even share it on your favorite social network.

r(theta) = 2 is the equation of a circle of radius R = 2 centered at the origin in polar coordinates. The graphs of the sine (sin(theta), theta in [0, pi]) and cosine (cos(theta), theta in [0, pi]) functions in polar coordinates are circles, too. sin(2*theta), theta in [0, 2*pi] is *a rose with four petals*: [0, pi/2] (remember that sin(theta), theta in [0, pi], was a circle, so sin(2*theta), theta in [0, pi/2] is *a petal in the first quadrant*), [pi/2, pi], [pi, 3pi/2], [3pi/2, 2pi]. 3*sin(2*theta) is the same rose with four petals, but three times the radius.

r(theta) = 2*sin(4*theta), theta in [0, 2pi] draws a pretty flower or rose with eight petals; 2 is the radius of the rose. Increasing the frequency factor to 4 increases the number of loops in the polar graph (4 = 2*2, so it doubles the petals of the previous example).

The Archimedean spiral is described by the equation r = a + b*theta. Here a = 0, so the center point of our spiral is the origin, and b = 0.3; b controls the distance between loops.

import numpy as np # Let's use Python to plot functions in polar coordinates
import matplotlib.pyplot as plt # matplotlib.pyplot provides a MATLAB-like way of plotting.
theta = np.arange(0, 2*np.pi, 0.01) # np.arange() returns an ndarray of evenly spaced values within [0, 2*np.pi]; in other words, we plot the function for all theta in [0, 2π].
r = 2*theta # Let's plot the Archimedean spiral r = 2*theta.
plt.polar(theta, r, 'b.') # matplotlib.pyplot.polar makes a polar plot.
plt.title("r(theta)=2*theta", va='bottom') # A title for our plot.
plt.show()

import numpy as np # Let's use Python to plot functions in polar coordinates
import matplotlib.pyplot as plt # matplotlib.pyplot provides a MATLAB-like way of plotting.
theta = np.arange(0, 2*np.pi, 0.01) # We plot the function for all theta in [0, 2π].
r = 2 + 3*np.cos(theta) # Let's plot the limaçon r = 2 + 3*cos(theta).
plt.polar(theta, r, 'b.') # matplotlib.pyplot.polar makes a polar plot.
plt.title("r(theta)=2+3*cos(theta)", va='bottom') # A title for our plot.
plt.show()

Of course, we can *use WolframAlpha to plot functions in polar coordinates*, too, e.g., polar plot r=0.3*theta, theta=0 to 8*Pi, polar plot r=2 +3*sin(theta), theta=0 to 2*Pi, etc.

GeoGebra supports complex numbers, just enter a = 1 + 2i and b = 3 +i. Alternatively, you can select **New Point**, and plot the points A (1, 2) and B(3, 1).

Next, type aux1=Vector[(0, 0), a] and aux2=Vector[(0, 0), b] in the input field. Graphically, select the option **Vector between Two Points** from the menu **Line through Two Points**, and click on the starting point (0, 0) and A (1, 2) (a) and then again click on (0, 0) and B(3, 1) (b).

Let's draw two parallel lines by selecting the option **Parallel Line** from the menu **Perpendicular Line**. Click on the point B(3, 1) (b) and the vector "aux1" or just type: aux3=Line(b, aux1). Click on the point A(1, 2) (a) and the vector "aux2" or type in the Input field: aux4=Line(a, aux2).

Now, we ask Geogebra to show us the intersection of both parallel lines. Navigate to the menu **New Point, Intersect Two Objects** and click on the lines drawn in the previous step or write *c=Intersect[aux3, aux4]*.

Finally, select the option **Vector between Two Points** from the menu **Line through Two Points**, and click on the starting point (0, 0) and c or type Vector[(0, 0), c]. Observe that A (1, 2) (a, 1 +2i) + B (3, 1) (b, 3 +i) = (4, 3) (c, 4 +3i).

A **parametric equation** defines a group of quantities as functions of one or more independent variables called parameters. It is **a set of equations with more than one dependent variable**, e.g., x = x(t), y = y(t), that defines or specifies the (dependent) variables x and y as functions of a parameter or independent variable t.
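A quick numeric sketch of the idea: the parametric equations x = cos(t), y = sin(t) trace the unit circle, so every sampled point satisfies x² + y² = 1:

```python
import math

# Sample the parametric curve x(t) = cos(t), y(t) = sin(t) at 8 values of t in [0, 2*pi).
points = [(math.cos(t), math.sin(t))
          for t in [2 * math.pi * k / 8 for k in range(8)]]

# Every point of the parametric curve lies on the unit circle x^2 + y^2 = 1.
on_circle = all(abs(x**2 + y**2 - 1) < 1e-12 for x, y in points)
print(on_circle)   # True
```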

from sympy import * # Let's plot a parametric equation with Python
import sympy.plotting as plt
t = symbols('t')
x = 0.3*t*cos(t) # cos(t), sin(t), t∈[0, 2*pi] is a circle; 2*cos(t), 3*sin(t), t∈[0, 2*pi] defines an ellipse. We draw them with FooPlot.
y = 0.3*t*sin(t) # Spirals have the parametric form x(t) = a*t*cos(t), y(t) = a*t*sin(t), where a is a constant.
plt.plot_parametric(x, y, (t, 0, 10*pi)) # Plots 2D parametric plots.

from sympy import * # Let's plot a 3D parametric equation with Python
import sympy.plotting as plt
import math
t = symbols('t')
x = cos(t) # If you draw a circle with x=cos(t) and y=sin(t) and pull it evenly in the z-direction, you get a cylindrical spiral or helix.
y = sin(t)
z = t
plt.plot3d_parametric_line(x, y, z, (t, 0, 8*math.pi)) # Plots 3D line plots, defined by a parameter.

A torus with major radius R and minor radius r may be defined as: *x = (r Cos(u) + R) Cos(v); y = (r Cos(u) + R) Sin(v); z = r Sin(u), v, u∈[0, 2*pi]*, R = 3, r = 2. Let's plot it on WolframAlpha:

*ParametricPlot3D[{(2 Cos[u]+3) Cos[v], (2 Cos[u]+3) Sin[v], 2 Sin[u]}, {u, 0, 2 Pi}, {v, 0, 2 Pi}]. We draw a sphere, too, ParametricPlot3D[{(2sin(v)*cos(u)), 2sin(v)*sin(u), 2cos(v) }, {u, 0, 2 Pi}, {v, 0, 2 Pi}].*
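A quick Python check (the torus helper function is ours): every point of this parametrization satisfies the implicit torus equation (√(x² + y²) − R)² + z² = r²:

```python
import math

R, r = 3, 2   # major and minor radius of the torus

def torus(u, v):
    """Point on the torus for parameters u, v in [0, 2*pi]."""
    x = (r * math.cos(u) + R) * math.cos(v)
    y = (r * math.cos(u) + R) * math.sin(v)
    z = r * math.sin(u)
    return x, y, z

# Check the implicit equation at an arbitrary parameter pair (u, v).
x, y, z = torus(1.0, 2.0)
print((math.hypot(x, y) - R) ** 2 + z ** 2)   # ~ 4.0 = r^2
```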


The post Mathematical analysis first appeared on JustToThePoint.

**A limit is the value that a function approaches as the input approaches some value**. One would say that the limit of f, as x approaches a, is L. Formally, for every real ε > 0, there exists a real δ > 0 such that for all real x, 0 < |x − a| < δ implies that |f(x) − L| < ε. In other words, *f(x) gets closer and closer to L as x moves closer and closer to a*.
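A numeric sketch of the definition for f(x) = 3x + 4, a = 2, L = 10: since |f(x) − 10| = 3|x − 2|, the choice δ = ε/3 works for any ε:

```python
def f(x):
    return 3 * x + 4

a, L = 2, 10          # f(x) -> L = 10 as x -> a = 2
eps = 0.01
delta = eps / 3       # since |f(x) - L| = 3|x - a|, delta = eps/3 works

# Points within delta of a (excluding a itself) are mapped within eps of L.
ok = all(abs(f(a + h) - L) < eps
         for h in [-0.9 * delta, -0.5 * delta, 0.5 * delta, 0.9 * delta])
print(ok)   # True
```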

To compute a limit in wxMaxima (a cross-platform graphical front-end for the computer algebra system Maxima), we use the command **limit** with the following syntax: *limit(f(x), x, a)*. Some examples are: limit(3*x+4, x, 2); returns 10; limit(x^2+3, x, 1); returns 4; limit(sin(x)/x, x, 0); returns 1.

sin(1/x) wobbles between -1 and 1 an infinite number of times between zero and any positive x value, no matter how small or close to zero; that's why there is no limit for sin(1/x) as x approaches zero.

WolframAlpha provides functionality to evaluate limits, too. Observe that the function x sin(1/x) is a somewhat different story from sin(1/x). Since -1 ≤ sin(1/x) ≤ 1, we have -|x| ≤ x sin(1/x) ≤ |x|, so as x approaches zero, x sin(1/x) is squeezed to zero as well.
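SymPy confirms the squeeze argument, a quick sketch:

```python
from sympy import symbols, sin, limit

x = symbols('x')

# The factor x squeezes x*sin(1/x) between -|x| and |x|, so the limit is 0,
# even though sin(1/x) itself has no limit at 0.
print(limit(x * sin(1/x), x, 0))   # 0
```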

Write in WolframAlpha: limit(sqrt(x), x, infinity), limit(1/x, x, infinity), and limit(1/x, x, 0). You can also compute one-sided limits from a specified direction: lim 1/x as x->0+.

A one-sided limit is the limit of a function f(x) as x approaches a specified value or point "a" either from the left or from the right.

The minus sign indicates "from the left", and the plus sign indicates "from the right". In other words, *f(x) gets closer and closer to L as x moves closer and closer to "a" from the left or from the right*.

Please consider that we do not take ^{1}⁄_{0} as ∞ or −∞. Division by zero is undefined. Similarly, the limit of ^{1}⁄_{x} as x approaches 0 is also undefined. However, if you take the limit of ^{1}⁄_{x} as x approaches zero from the left or from the right, you get negative and positive infinity respectively. These limits are not equal and that's why the limit of ^{1}⁄_{x} as x approaches 0 is undefined.

x = 0 is a vertical asymptote of ^{1}⁄_{x}. In general, if the limit of the function f(x) is ±∞ as x→a from the left or the right, **x = a is a vertical asymptote of the graph of the function y = f(x)**.

We can use Python to calculate limits, too. We will use SymPy, "a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible."

user@pc:~$ python
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import * # Import the SymPy library
>>> x = symbols('x')
>>> limit(sin(x), x, 0) # SymPy can compute limits with the limit function. Its syntax is limit(f(x), x, a).
0
>>> limit(sqrt(x), x, 0)
0
>>> limit(sqrt(x), x, oo) # Infinity is written as oo, a double lowercase letter 'o'. A function has an infinite limit if it either increases or decreases without bound.
oo
>>> limit(1/x, x, 0, '+') # limit has a fourth argument, dir, which specifies a direction.
oo
>>> limit(1/x, x, 0, '-')
-oo
>>> from sympy.plotting import plot # The plotting module allows you to make 2-dimensional and 3-dimensional plots.
>>> plot(1/x, (x, -1, 1), ylim=(-100, 100), line_color='blue') # It plots 1/x. ylim denotes the y-axis limits and line_color specifies the color for the plot.

Maxima computes the limit of expr (**limit (expr, x, a, dir)**) as x approaches "a" from the direction dir. dir may have the value "plus" for a limit from above or right, "minus" for a limit from below or left, or may be omitted. You may also want to try f(x):=(x-2)/(x+2); *limit(f(x), x, infinity);* It will return 1.

y = 1 is a horizontal asymptote of ^{x-2}⁄_{x+2}. In general, if the limit of the function f(x) is "a" as x→±∞, **y = a is a horizontal asymptote of the graph of the function y = f(x)**.

Let's see one last example, f(x) = (x^{2}+2x-2)⁄(x^{2}+3x+7). y = 1 is a horizontal asymptote of (x^{2}+2x-2)⁄(x^{2}+3x+7).
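Both horizontal asymptotes can be checked with SymPy:

```python
from sympy import symbols, limit, oo

x = symbols('x')
# Both rational functions approach 1 as x grows without bound,
# confirming the horizontal asymptote y = 1 in each case
print(limit((x - 2) / (x + 2), x, oo))                    # 1
print(limit((x**2 + 2*x - 2) / (x**2 + 3*x + 7), x, oo))  # 1
```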

A **continuous function** is a function that does not have any abrupt changes in value or discontinuities as the value of x changes. In other words, it **is continuous at every point in its domain**.

More intuitively: if we want all the f(x) values to stay in some small neighborhood around f(a), we simply need to choose a small enough neighborhood for the x values around "a". In other words, if we pick some x close enough to a, f(x) will be close enough to f(a).

Examples of continuous functions are polynomials (x, 2x^{2} − 7x + 4), exponential function (e^{x}), sine function, cosine function, etc. ^{1}⁄_{x} is continuous whenever x is nonzero. However, ^{1}⁄_{x} is not defined at x=0, so it is discontinuous at x = 0.

A piecewise function (a function defined by multiple sub-functions, where each sub-function applies to a different interval in the domain) is continuous if its constituent functions are continuous (x^{2}, x) on the corresponding intervals or subdomains, and there is no discontinuity at each endpoint of the subdomains (x = 0).

We are going to use WolframAlpha to demonstrate that the piecewise function f(x) is continuous:

Let's check in wxMaxima that the following piecewise function f(x) is continuous in all points except 0:

Finally, we will determine the continuity of a function in Python.

user@pc:~$ python
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import * # Import the SymPy library
>>> x = symbols('x')
>>> f = Piecewise((x**3 - 2*x + 1, x <= 2), (2*x - 3, x > 2)) # Represents a piecewise function. Each argument is a 2-tuple defining an expression and a condition.
>>> limit(f, x, 2, '+') # SymPy can compute limits with the limit function. The syntax is limit(f(x), x, a, dir). The fourth argument, dir, specifies a direction.
1
>>> limit(f, x, 2, '-')
5
>>> from sympy.plotting import plot # The plotting module allows you to make 2-dimensional and 3-dimensional plots.
>>> plot(f, (x, -5, 5), ylim=(-10, 10), line_color='blue') # It plots the piecewise function. ylim denotes the y-axis limits and line_color specifies the color for the plot.

The derivative of a function measures the sensitivity to change of the function value (output value) with respect to a change in its argument. **The derivative of a function at a given point is the rate of change or slope of the tangent line to the function at that point**.

Formally, y = f(x) is differentiable at a point "a" of its domain if its domain contains an interval I containing a, and the limit f'(a) = lim_{h→0} ^{f(a+h) - f(a)}⁄_{h} exists.

The command *diff(expr, x)* in Maxima returns the derivative of expr with respect to x.

Remember some basic derivatives: (x^{n})' = n*x^{n-1}, (f(x)±g(x))' = f'(x) ± g'(x) (they are needed for *diff(x^4 +3*x^2- 7*x +15, x);*).

cos'(x) = -sin(x), (f(g(x)))' = f'(g(x)) * g'(x) (*diff(cos(4*x^3 +12*x^2 +5), x);*)

Let's demonstrate that f(x)=|x| is continuous at x = 0, but is not differentiable at 0.
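A quick SymPy sketch of both claims, using one-sided limits of the difference quotient:

```python
from sympy import symbols, limit, Abs

x = symbols('x')
# Continuity at 0: the limit of |x| equals the value |0| = 0
print(limit(Abs(x), x, 0))           # 0
# Differentiability at 0: the difference quotient is (|x| - |0|)/(x - 0) = |x|/x,
# and its one-sided limits disagree, so f'(0) does not exist
print(limit(Abs(x) / x, x, 0, '+'))  # 1
print(limit(Abs(x) / x, x, 0, '-'))  # -1
```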

user@pc:~$ python # Let's do it with Python
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import * # Import the SymPy library
>>> x = symbols('x')
>>> init_printing(use_unicode=True) # To make it easier to read, we are going to enable pretty printing.
>>> diff(x**4 + 3*x**2 - 7*x + 15, x) # To take derivatives, use the diff function.
4⋅x³ + 6⋅x - 7
>>> diff(cos(4*x**3 + 12*x**2 + 5), x)
-(12⋅x² + 24⋅x)⋅sin(4⋅x³ + 12⋅x² + 5)
>>> diff((2*x + 1)/(x**2 + 1), x)
-2⋅x⋅(2⋅x + 1)/(x² + 1)² + 2/(x² + 1)
>>> simplify(diff((2*x + 1)/(x**2 + 1), x)) # "One of the most useful features of SymPy is the ability to simplify mathematical expressions. SymPy has dozens of functions to perform various kinds of simplification and a general function called simplify() that attempts to apply all of these functions in an intelligent way to arrive at the simplest form of an expression," SymPy's Manual.
2⋅(-x² - x + 1)/(x⁴ + 2⋅x² + 1)
>>>

**The derivative of a function at a point is the slope of the tangent line at that point**. It's all abstract, so let's show it graphically in Python.

user@pc:~$ python # Let's do it with Python
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import symbols, init_printing, diff, simplify # Import from the SymPy library: symbols, init_printing, diff, and simplify.
>>> x = symbols('x')
>>> init_printing(use_unicode=True) # To make it easier to read, we are going to enable pretty printing.
>>> diff(x**2 - 3*x + 4, x) # We use the diff function to differentiate f(x) = x² - 3x + 4.
2⋅x - 3
>>> diff(x**2 - 3*x + 4, x).subs(x, 3) # The subs method replaces or substitutes all instances of a variable or expression with something else. f'(3) = 3.
3
>>> (x**2 - 3*x + 4).subs(x, 3) # f(3) = 4. The tangent line at x = 3 is: y = f'(3)(x - 3) + f(3) = 3(x - 3) + 4 = 3x - 9 + 4 = 3x - 5.
4
>>> from sympy.plotting import plot # The plotting module allows you to make 2-dimensional and 3-dimensional plots.
>>> p1 = plot(x**2 - 3*x + 4, show=False) # It plots f(x). show=False, so it will not display the plot yet.
>>> p2 = plot(3*x - 5, show=False, line_color='red') # It plots the tangent line at x = 3 in red. show=False, so it will not display the plot yet.
>>> p1.append(p2[0]) # To add the second plot's first series object to the first, we use the append method.
>>> p1.show() # It displays the plot.

**L'Hôpital's Rule** states that for two functions f and g which are differentiable on an interval I, except possibly at a point "a" contained in I, if lim_{x→a} f(x)/g(x) yields 0/0 or ∞/∞ and g'(x) ≠ 0 on I for x ≠ a, then lim_{x→a} ^{f(x)}⁄_{g(x)} = lim_{x→a} ^{f'(x)}⁄_{g'(x)}, provided the limit on the right exists.

So, L'Hôpital's Rule is a great shortcut for evaluating some limits. It tells us that if we have a limit that produces an indeterminate form 0/0 (^{sin(x)}⁄_{x}, ^{ln(x)}⁄_{x-1}) or ∞/∞, all we need to do is differentiate the numerator and the denominator separately and then evaluate the limit again. However, sometimes even repeated applications of the rule don't help us find the limit's value.
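The two 0/0 examples just mentioned can be worked both ways in SymPy, directly and via the rule:

```python
from sympy import symbols, sin, ln, limit, diff

x = symbols('x')
# 0/0 form: sin(x)/x at 0. Differentiating top and bottom gives cos(x)/1.
print(limit(sin(x) / x, x, 0))                    # 1
print(limit(diff(sin(x), x) / diff(x, x), x, 0))  # 1 as well: cos(0)/1
# 0/0 form: ln(x)/(x-1) at 1. Differentiating gives (1/x)/1, which tends to 1.
print(limit(ln(x) / (x - 1), x, 1))               # 1
```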

A **series** is a description of the operation of adding infinitely many quantities, one after the other, to a given starting quantity. It can also be defined as **the sum of a sequence**, where a sequence is an ordered list of numbers (e.g., 1, 2, 3, 4,.... or 1, 2, 4, 8,...).

It is represented by an expression like a_{1} + a_{2} + a_{3} + ... or, in sigma notation, ∑_{n=1}^{∞} a_{n}.

**When this limit** (of the partial sums) **exists**, the series a_{1} + a_{2} + a_{3} + ... **is convergent or summable, and the limit is called the sum of the series**. Otherwise, the series is said to be divergent.

*A geometric series is the sum of the terms of a geometric sequence* (a sequence where each term after the first is found by multiplying the previous one by a fixed, non-zero number -it is usually called the common ratio r-: a, ar, ar^{2}, ar^{3}...).

The convergence of the geometric series depends on the value of the common ratio r. If |r| < 1 (in the example above r = ^{1}⁄_{3}), the series converges to the sum ^{a}⁄_{1-r} (^{1}⁄_{1-1/3} = ^{1}⁄_{2/3} = ^{3}⁄_{2} = 1.5). If r = 1, all the terms are identical and the partial sums grow without bound, so the series diverges. When r = -1, the terms take two values alternately (e.g., 3, -3, 3, -3, ...) and the partial sums oscillate between two values (3, 0, 3, 0, 3, ...), so the series diverges, too. Finally, if |r| > 1, the terms get larger and larger and the series is divergent.
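We can check the r = ^{1}⁄_{3} example symbolically with SymPy's Sum:

```python
from sympy import Sum, symbols, Rational, oo

n = symbols('n')
# Geometric series with a = 1, r = 1/3: the sum is a/(1-r) = 3/2
print(Sum(Rational(1, 3)**n, (n, 0, oo)).doit())  # 3/2
```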

Let's work with wxMaxima. First, *a[n]:=1/(3^n);* defines the terms of a series. The basic command syntax is as follows: sum(a[i], i, 0, inf), simpsum; where a[i] is the expression to be summed, i is the index, 0 and inf are the lower and upper limits, and simpsum simplifies the result.

a[n]:=1/(2^n); is a geometric series with r = ^{1}⁄_{2} < 1. It converges to the sum ^{a}⁄_{1-r} = ^{1}⁄_{1-1/2} = ^{1}⁄_{1/2} = 2. wxMaxima can compute other series, too: a[n]:=1/(n^2); converges to ^{π^2}⁄_{6}. We can also find **the sum of the first n terms of a series**: sum(i, i, 0, n), simpsum; = ^{n^2+n}⁄_{2} or sum(2*i-1, i, 1, n), simpsum; = n^{2}.

Let's ask WolframAlpha about other series: *sum(1/(n*(n+1)), n=1..infinity)* (= 1), *sum(1/(n^2), n=1..infinity)* (= ^{π^2}⁄_{6}), and *sum((7*n^2+5*n-3)/(3*n^2-4*n+2), n=1..infinity)*. The last one diverges.

**A simple test for the divergence of an infinite series** is the nth-term test: if the terms a_{n} do not converge to zero, the series ∑a_{n} diverges.
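For instance, SymPy confirms that the terms of the last WolframAlpha series above do not tend to zero, which is why that series diverges:

```python
from sympy import symbols, limit, oo

n = symbols('n')
# The terms tend to 7/3, not 0, so the series cannot converge
print(limit((7*n**2 + 5*n - 3) / (3*n**2 - 4*n + 2), n, oo))  # 7/3
```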

Maxima can compute integrals of many functions. *integrate (expr, x);* attempts to symbolically compute the integral of expr with respect to x.

Wolfram|Alpha can compute indefinite and definite integrals, too, e.g.: *integrate x^2 sin^3 x dx*, *integrate sin(x) dx*, *integrate x^3+3*x^2+5*x-7 dx*, and *integrate -1/(x+2)^3 dx*.

user@pc:~$ python # Let's do it with Python
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import *
>>> x = symbols('x')
>>> init_printing(use_unicode=True) # To make it easier to read, we are going to enable pretty printing.
>>> integrate(sin(x), x) # The integrals module in SymPy implements methods to calculate definite and indefinite integrals.
-cos(x)
>>> integrate(x**3 + 3*x**2 + 5*x - 7, x)
x⁴/4 + x³ + 5⋅x²/2 - 7⋅x
>>> integrate(-1/(x + 2)**3, x)
1/(2⋅x² + 8⋅x + 8)
>>> integrate(2*x*sqrt(x**2 + 1), x)
2⋅x²⋅√(x² + 1)/3 + 2⋅√(x² + 1)/3 # = ⅔(x²+1)(x²+1)^{1/2} = ⅔(x²+1)^{3/2}
>>> integrate(x**3 + 3*x**2 + 5*x - 7, (x, 1, 2)) # SymPy can calculate definite integrals: integrate(f, (x, a, b)).
45/4
>>> integrate(x**3, (x, -2, 2))
0
>>> integrate(1/x, (x, 1, E)) # E is SymPy's constant e, the base of the natural logarithm.
1

Going back to Maxima, **integrate (expr, x)** computes indefinite integrals, while **integrate (expr, x, a, b)** computes definite integrals, with limits of integration a and b.

The **solution to a definite integral** gives you *the signed area of the region that is bounded by the graph* of a given function between two points in the real line. Conventionally, the area of the graph on or above the x-axis is defined as positive. The area of the graph below the x-axis is defined as negative.

Let f be a function defined on [a, b] that admits an antiderivative F on [a, b] (f(x) = F'(x)). If f is integrable on [a, b], then ∫_{a}^{b} f(x)dx = F(b) - F(a).
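This fundamental theorem of calculus is easy to verify in SymPy for one of the integrals above:

```python
from sympy import symbols, integrate, diff, simplify, Rational

x = symbols('x')
f = x**3 + 3*x**2 + 5*x - 7
F = integrate(f, x)                   # an antiderivative (constant of integration omitted)
assert simplify(diff(F, x) - f) == 0  # check that F' = f
# The definite integral equals F(b) - F(a)
print(F.subs(x, 2) - F.subs(x, 1))    # 45/4
print(integrate(f, (x, 1, 2)))        # 45/4, the same value
```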

Let's see a few more examples.

The **Taylor series** of a function f is *a representation of this function as an infinite sum of terms* that are calculated from the values of the function's derivatives at a single point. Suppose f is a function that is infinitely differentiable at a point "a" in its domain. The Taylor series of f about a is the power series:

We start out with the Taylor series of the exponential function with base e at x=0: e^{x} = 1 + x + ^{x^2}⁄_{2!} + ^{x^3}⁄_{3!} + ...

The derivative of the exponential function with base e is the function itself, (e^{x})'=e^{x}.
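This is exactly why the coefficients of the series work out so neatly; a small SymPy check:

```python
from sympy import symbols, exp, diff, factorial, Rational

x = symbols('x')
# The exponential is its own derivative, so every derivative at 0 equals e^0 = 1
assert diff(exp(x), x) == exp(x)
# Hence the nth Taylor coefficient about 0, the nth derivative at 0 divided by n!,
# is simply 1/n!
for k in range(5):
    print(diff(exp(x), x, k).subs(x, 0) / factorial(k))  # 1, 1, 1/2, 1/6, 1/24
```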

We can also specify the center point and the order of the expansion: *series sin x at x=pi/2 to order 8*, *series ln(1-x) at x=0*.

Maxima contains two functions (taylor and powerseries) for finding the series of differentiable functions. *powerseries (expr, x, a)* returns the general form of the power series expansion for expr in the variable x about the point a. *taylor (expr, x, a, n)* expands the expression expr in a truncated Taylor series in the variable x around the point a.

user@pc:~$ python # Let's do it with Python
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import * # Import the SymPy library
>>> x = symbols('x')
>>> init_printing(use_unicode=True) # To make it easier to read, we are going to enable pretty printing.
>>> series(cos(x), x) # series(expr, x, a, n) computes the series expansion of expr around the point x = a. n is the number of terms up to which the series is to be expanded.
1 - x²/2 + x⁴/24 + O(x⁶)
>>> series(tan(x), x, 0, 6)
x + x³/3 + 2⋅x⁵/15 + O(x⁶)
>>> series(ln(1 - x), x)
-x - x²/2 - x³/3 - x⁴/4 - x⁵/5 + O(x⁶)
>>> series(exp(x), x)
1 + x + x²/2 + x³/6 + x⁴/24 + x⁵/120 + O(x⁶)
>>> series(sin(x), x, pi/2, 8)
1 - (x - π/2)²/2 + (x - π/2)⁴/24 - (x - π/2)⁶/720 + O((x - π/2)⁸)


The post Statistics III. T-tests, Correlations and Linear Regression first appeared on JustToThePoint.

Unpaired Two-Samples T-tests are **used to compare the means of two independent (unrelated or unpaired) groups**.

For example, let's use rnorm to generate two vectors of normally distributed random numbers: *x=rnorm(100)*, *y=rnorm(100)* (if not specified in the arguments, mean = 0 and sd = 1). *t.test(x, y)* performs a two-samples t-test comparing the means of the two independent samples x and y (H_{0}: m_{x} = m_{y}, i.e., m_{x} - m_{y} = 0). By default, alternative="two.sided", so H_{a}: m_{x} - m_{y} ≠ 0. The p-value = 0.2162 is not less than or equal to the significance level 0.05, so we cannot reject the null hypothesis.
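For readers who prefer Python, the same experiment can be sketched with numpy and scipy (these libraries, the fixed seed, and the sample sizes are my assumptions; the post itself works in R):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
x = rng.normal(0, 1, 100)       # analogue of rnorm(100)
y = rng.normal(0, 1, 100)       # drawn from the same population as x
x2 = rng.normal(2, 1, 100)      # analogue of rnorm(100, 2)

t1, p1 = stats.ttest_ind(x, y)   # H0: equal means; expect a large p-value
t2, p2 = stats.ttest_ind(x, x2)  # means differ by about 2 sd; expect a tiny p-value
print(p1, p2)
```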

However, x_{2}=rnorm(100, 2) generates a vector of normally distributed random numbers with mean = 2 and sd = 1. *t.test(x, x_{2})* performs a two-samples t-test comparing the means of x and x_{2}; this time the p-value is far below the significance level 0.05, so we reject the null hypothesis and conclude that the two means are significantly different.

Let's be a bit more rigorous.

x = rnorm(100)
y = rnorm(100)

First, we must consider that the unpaired two-samples t-test can only be used **when the two groups of samples are normally distributed**. Here *shapiro_test(x)* returns p.value = 0.8622367 > 0.05 and shapiro_test(y) returns p.value = 0.1753192 > 0.05, so the distributions of the data are not significantly different from the normal distribution and we can assume normality. Besides, **both variances should be equal**.

**var.test(x, y)** performs an F test to compare the variances of two samples. The p-value of the F test = 0.7229 > 0.05, so **there is no significant difference between the variances of both samples, we can assume equality of the two variances**.

**t.test(x, y, var.equal = T)** performs a two sample t-test assuming equal variances, p-value = 0.5017 > 0.05, so we can not reject the null hypothesis, H_{0}: m_{x} = m_{y} or m_{x} - m_{y} = 0.

x = rnorm(100)
y = rnorm(100, mean=2)
var.test(x, y)
t.test(x, y, var.equal = T)

var.test(x, y) performs an F test to compare the variances of two samples. The p-value of the F test = 0.6639 > 0.05, so there is no significant difference between the variances of both samples and **we can assume equality of variances **. However, t.test(x, y, var.equal = T) performs a two sample t-test, p-value < 2.2e-16, so we can reject the null hypothesis and conclude that **the mean values of groups x and y are significantly different**.

Let's take data from the book Probability and Statistics with R: *install.packages("PASWR")* (install.packages installs a given package), library("PASWR") (library loads a given package, i.e. attaches it to the search list on your R workspace). **Fertilize** shows a data frame of plants' heights in inches obtained from two seeds, one obtained by cross-fertilization and the other by auto fertilization.

      cross   self
 1   23.500 17.375
 2   12.000 20.375
 3   21.000 20.000
 4   22.000 20.000
 5   19.125 18.375
 6   21.500 18.625
 7   22.125 18.625
 8   20.375 15.250
 9   18.250 16.500
10   21.625 18.000
11   23.250 16.250
12   21.000 18.000
13   22.125 12.750
14   23.000 15.500
15   12.000 18.000

> names(Fertilize) # names is a function to get the names of the variables of the data frame Fertilize

[1] "cross" "self"

> summary(self) # summary is a function to produce statistical summaries of self: Min (the minimum value), 1st Qu. (the first quartile), Median (the median value), Mean, 3rd Qu. (the third quartile), and Max (the maximum value)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.75   16.38   18.00   17.57   18.62   20.38

library(ggpubr)
ggboxplot(self, xlab = "Self", ylab = "Heights", # Let's plot our data
          ggtheme = theme_light()) # We use it to tweak the display

> shapiro.test(self)
	Shapiro-Wilk normality test
data:  self
W = 0.93958, p-value = 0.3771

We use the Shapiro-Wilk normality test. The null hypothesis is the following, H_{0}: the data are normally distributed. The corresponding alternative hypothesis is Ha: the data are not normally distributed.

From the output shown above, the p-value = 0.3771 > 0.05, implying that we cannot reject the null hypothesis: the distribution of the data is not significantly different from a normal distribution, and therefore we can assume that the data have a normal distribution.

*ggqqplot(self, ylab = "Self",* # We use a Q-Q (quantile-quantile) plot (it draws the correlation between a given sample and the theoretical normal distribution). It allows us to visually evaluate the normality assumption.

*ggtheme = theme_minimal())*

We want to know if the average plants' heights in inches obtained by auto fertilization is 17.

> t.test(self, mu=17, conf=.95)
	One Sample t-test
data:  self
t = 1.0854, df = 14, p-value = 0.2961
alternative hypothesis: true mean is not equal to 17
95 percent confidence interval:
 16.43882 18.71118
sample estimates:
mean of x 
   17.575

The value of the t test statistic is 1.0854 and its p-value is 0.2961, which is greater than the significance level 0.05. We have no reason to reject the null hypothesis, H_{0}: m = 17. The output also gives us a confidence interval for the mean: [16.43882, 18.71118]. Observe that the hypothesized mean µ = 17 falls inside the confidence interval.

Next, we want to know if the mean of the plants heights obtained by cross-fertilization is significantly different from the one obtained by auto fertilization.

> summary(cross) # summary is a function to produce statistical summaries of cross: Min (the minimum value), 1st Qu. (the first quartile), Median (the median value), Mean, 3rd Qu. (the third quartile), and Max (the maximum value)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.00   19.75   21.50   20.19   22.12   23.50
> library(ggpubr)
> boxplot(cross, self, ylab = "Height", xlab = "Method") # Let's plot our data

From the plot shown below, we can see that both samples have very different means. Do the two samples have the same variances?

> var.test(cross, self)
	F test to compare two variances
data:  cross and self
F = 3.1079, num df = 14, denom df = 14, p-value = 0.04208
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 1.043412 9.257135
sample estimates:
ratio of variances 
          3.107894

**var.test(cross, self)** performs an F test to compare the variances of two samples. The p-value of the F test = 0.04208 < 0.05, so **there is a significant difference between the variances of both samples, and we cannot assume equality of the two variances**.

We noted previously that one of the assumptions for the t-test is that the variances of the two samples should be equal. However, a modification of the t-test known as Welch's test is used to solve this problem. It is actually the default in R, but it can also be specified with the argument var.equal=FALSE.

> t.test(cross, self, conf=.95)
	Welch Two Sample t-test
data:  cross and self
t = 2.4371, df = 22.164, p-value = 0.02328
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3909566 4.8423767
sample estimates:
mean of x mean of y 
 20.19167  17.57500

The value of the Welch two sample t-test statistic is 2.4371 and its p-value is 0.02328, which is less than the significance level 0.05, so the null hypothesis is rejected and we can conclude that there is a significant difference between both means.

> shapiro.test(cross) # However, we have not checked that the data from the "cross" sample follow a normal distribution
	Shapiro-Wilk normality test
data:  cross
W = 0.753, p-value = 0.0009744
# p-value = 0.0009744 < 0.05, implying that the distribution of the data is significantly different from the normal distribution. In other words, we cannot assume normality.
> wilcox.test(self, cross, exact=FALSE) # If the data are not normally distributed, it's recommended to use the non-parametric two-samples Wilcoxon test.
	Wilcoxon rank sum test with continuity correction
data:  self and cross
W = 39.5, p-value = 0.002608
alternative hypothesis: true location shift is not equal to 0

The p-value of the test is 0.002608, which is less than the significance level alpha = 0.05. We can conclude that the mean height of the plants obtained by cross-fertilization is significantly different from the mean height of the plants obtained by auto fertilization.

"Virtually since the dawn of television, parents, teachers, legislators, and mental health professionals have wanted to understand the impact of television programs, particularly on children," Violence in the media: Psychologists study potential harmful effects, American Psychological Association. More specifically, we want to assess the impact of watching violent programs on the viewers.

**library("PASWR")** # It loads the PASWR package

**attach(Aggression) ** # A data frame regarding aggressive behavior in relation to exposure to violent television programs. In each of the resulting 16 pairs, one child is randomly selected to view the most violent shows on TV (violence), while the other watches cartoons, situation comedies, and the like (noviolence).

**Aggression** # It shows the data frame in a tabular format

> summary(violence) # summary is a function to produce statistical summaries of violence: Min (the minimum value), 1st Qu. (the first quartile), Median (the median value), Mean, 3rd Qu. (the third quartile), and Max (the maximum value)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  14.00   16.75   23.00   25.12   35.00   40.00
> summary(noviolence)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   7.00   16.00   19.00   20.56   26.50   33.00
> library(ggpubr)
> boxplot(violence, noviolence, ylab="Aggressive behavior", xlab="Exposure to violent television programs") # Let's plot our data

> shapiro.test(violence)
	Shapiro-Wilk normality test
data:  violence
W = 0.89439, p-value = 0.06538
> shapiro.test(noviolence)
	Shapiro-Wilk normality test
data:  noviolence
W = 0.94717, p-value = 0.4463

From the output shown above, the p-values are 0.06538 and 0.4463; both are greater than the significance level of 0.05, implying that we cannot reject the null hypothesis. The distribution of the data is not significantly different from a normal distribution in both conditions (violence and noviolence), and therefore we can assume that **the data have a normal distribution**.

> var.test(violence, noviolence)
	F test to compare two variances
data:  violence and noviolence
F = 1.4376, num df = 15, denom df = 15, p-value = 0.4905
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5022911 4.1145543
sample estimates:
ratio of variances 
          1.437604

**var.test(violence, noviolence)** performs an F test to compare the variances of two samples. The p-value of the F test = 0.4905 > 0.05, so **there is no significant difference between the variances of both samples, we can assume equality of the two variances**.

> t.test(violence, noviolence)
	Welch Two Sample t-test
data:  violence and noviolence
t = 1.5367, df = 29.063, p-value = 0.1352
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.509351 10.634351
sample estimates:
mean of x mean of y 
  25.1250   20.5625
> t.test(violence, noviolence, var.equal = T)
	Two Sample t-test
data:  violence and noviolence
t = 1.5367, df = 30, p-value = 0.1349
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.501146 10.626146
sample estimates:
mean of x mean of y 
  25.1250   20.5625

t.test(violence, noviolence, var.equal = T) performs a two sample t-test assuming equal variances. The p-value = 0.1349 > 0.05, so we cannot reject the null hypothesis: there is no statistically significant difference in aggressive behavior in relation to exposure to violent television programs. Observe that the Welch test produces very similar results.

Variables within a dataset can be related for lots of reasons. One variable could cause or depend on the values of another variable, or they could both be effects of a third variable. **The statistical relationship between two variables is referred to as their correlation and a correlation test is used to evaluate the association between two or more variables. **

A correlation could be positive, meaning both variables change or move either up or down in the same direction together (an example would be height and weight, taller people tend to be heavier), or negative, meaning that when one variable’s values increase, the other variable’s values decrease and vice versa (an example would be exercise and weight, the more you exercise, the less you will weigh). Correlation can also be zero, meaning that the variables are unrelated or independent of each other. For example, there is no relationship between the amount of Coke drunk by the average person and his or her level of income or earnings.

A correlation coefficient measures the strength of the relationship between two variables. There are three types of correlation coefficients: Pearson, Spearman, and Kendall. The most widely used is the Pearson correlation. It measures a linear dependence between two variables.
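As a minimal illustration of what the Pearson coefficient measures, here is a Python sketch with scipy (the library and the made-up numbers are my assumptions; the post itself uses R):

```python
from scipy import stats

heights = [60.0, 62.0, 64.0, 66.0, 68.0]       # hypothetical heights
weights = [110.0, 120.0, 130.0, 140.0, 150.0]  # exactly linear in height
r, p = stats.pearsonr(heights, weights)
print(r)  # a perfect positive linear relationship gives r = 1

r_neg, _ = stats.pearsonr(heights, weights[::-1])  # reverse the trend
print(r_neg)  # a perfect negative linear relationship gives r = -1
```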

Let's use a dataset from the Statistics Online Computational Resource (SOCR). It contains the height (inches) and weights (pounds) of 25,000 different humans of 18 years of age. The file SOCR-HeightWeight.csv reads as follows:

Index,Height,Weight # Initially, the first line was Index,Height(Inches),Weight(Pounds)

1,65.78331,112.9925

2,71.51521,136.4873

3,69.39874,153.0269

4,68.2166,142.3354

5,67.78781,144.2971

6,68.69784,123.3024

7,69.80204,141.4947

8,70.01472,136.4623

9,67.90265,112.3723

[...]

weightHeight <- read.table("/path/SOCR-HeightWeight.csv", header = T, sep = ",") # read.table loads our file into R; we use sep = "," to specify the separator character.
weightHeight

library("ggpubr")
ggscatter(weightHeight, x = "Height", y = "Weight", # Let's draw our data; x and y are the variables for plotting.
          add = "reg.line", conf.int = TRUE, # add = "reg.line" adds the regression line, and conf.int = TRUE adds the confidence interval.
          cor.coef = TRUE, cor.method = "pearson", # cor.coef = TRUE adds the correlation coefficient with the p-value; cor.method = "pearson" is the method for computing the correlation coefficient.
          xlab = "Height (inches)", ylab = "Weight (pounds)")

We can see from the plot above that the relationship is linear. Do our data (Weight, Height) follow a normal distribution? The Shapiro test cannot deal with more than 5,000 data points, so we run it on the first 5,000 samples only.

> shapiro.test(weightHeight$Height[0:5000])
	Shapiro-Wilk normality test
data:  weightHeight$Height[0:5000]
W = 0.99955, p-value = 0.2987
> shapiro.test(weightHeight$Weight[0:5000])
	Shapiro-Wilk normality test
data:  weightHeight$Weight[0:5000]
W = 0.99979, p-value = 0.9355

From the output above, the two p-values (0.2987, 0.9355) are greater than the significance level 0.05. Thus, we cannot reject the null hypothesis, the distributions of the data from each of the two variables are not significantly different from the normal distribution. In other words, we can assume normality.

We can use Q-Q plots to visualize the correlation between each of the two variables and the normal distribution: library("ggpubr"), ggqqplot(weightHeight$Height, ylab = "Height"), ggqqplot(weightHeight$Weight, ylab = "Weight").

> cor.test(weightHeight$Height, weightHeight$Weight)
	Pearson's product-moment correlation
data:  weightHeight$Height and weightHeight$Weight
t = 91.981, df = 24998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4935390 0.5120626
sample estimates:
      cor 
0.5028585

cor.test performs a correlation test between two variables. It returns both the correlation coefficient and the significance level (or p-value) of the correlation. The t-test statistic value is 91.981 and its p-value is < 2.2e-16, so we can conclude that Height and Weight are significantly positively correlated (r = 0.5028585, p-value < 2.2e-16).

Linear regression is a model to estimate or **analyze the relationship between a dependent or response variable** (often called or denoted as y) **and one or more independent or explanatory variables** (often called x). In other words, linear regression predicts a value for Y, 'given' a value of X.

It assumes that *there exists a linear relationship between the response variable and the explanatory variables*. Graphically, a linear relationship is a straight line when plotted as a graph. Its general equation is **y = b_{0} + b_{1}*x + e**, where y is the response variable, x is the predictor, independent, or explanatory variable, b_{0} is the intercept (the value of y when x = 0), b_{1} is the slope or regression coefficient, and e is the error term.
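The same model can be fitted outside R; here is a Python sketch with scipy.stats.linregress on made-up data lying exactly on y = 3 + 2x (the library and the numbers are my assumptions):

```python
from scipy import stats

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly y = 3 + 2*x, so the error term e is 0
result = stats.linregress(x, y)
print(result.slope, result.intercept)  # the fit recovers b1 = 2 and b0 = 3
```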

Let's use our dataset from the Statistics Online Computational Resource (SOCR). It contains the height (inches) and weights (pounds) of 25,000 different humans of 18 years of age.

weightHeight <- read.table("/path/SOCR-HeightWeight.csv", header = T, sep = ",") # read.table loads our file into R; sep = "," specifies the separator character.
weightHeight

library("ggpubr")
ggscatter(weightHeight, x = "Height", y = "Weight", # Let's draw our data, x and y are the variables for plotting
          add = "reg.line", conf.int = TRUE, # add = "reg.line" adds the regression line, and conf.int = TRUE adds the confidence interval.
          cor.coef = TRUE, cor.method = "pearson", # cor.coef = TRUE adds the correlation coefficient with the p-value, cor.method = "pearson" is the method for computing the correlation coefficient.
          xlab = "Height (inches)", ylab = "Weight (pounds)")

The graph above strongly suggests a linear relationship between Height and Weight.

> cor.test(weightHeight$Height, weightHeight$Weight)

	Pearson's product-moment correlation

data:  weightHeight$Height and weightHeight$Weight
t = 91.981, df = 24998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4935390 0.5120626
sample estimates:
      cor
0.5028585

**The correlation coefficient measures the strength of the relationship between two variables x and y**. A correlation coefficient of r=.50 indicates a stronger degree of a linear relationship than one of r=.40. Its value ranges between -1 (perfect negative correlation) and +1 (perfect positive correlation).

If the coefficient is, say, .80 or .95, we know that the corresponding variables closely vary together in the same direction. If it were -.80 or -.95, x and y would vary together in opposite directions. A value closer to 0 suggests a weak relationship between the variables; they vary largely independently. A low correlation (-0.2 < r < 0.2) suggests that much of the variation of the response variable (y) is not explained by the predictor (x), and we should look for better predictor variables.
Observe that *the correlation coefficient (0.5028585) is good enough but far from perfect*, so we can continue with our study.

> model <- lm(Weight ~ Height, data = weightHeight)
> summary(model)

Call:
lm(formula = Weight ~ Height, data = weightHeight)

Residuals:
    Min      1Q  Median      3Q     Max
-40.302  -6.711  -0.052   6.814  39.093

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -82.57574    2.28022  -36.21   <2e-16 ***
Height        3.08348    0.03352   91.98   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.08 on 24998 degrees of freedom
Multiple R-squared:  0.2529,	Adjusted R-squared:  0.2528
F-statistic:  8461 on 1 and 24998 DF,  p-value: < 2.2e-16

A linear regression can be calculated in R with the command lm. The command format is as follows: *lm([target variable] ~ [predictor variables], data = [data source])*.

We can see the values of the intercept and the slope. Therefore, the regression line equation can be written as follows: Weight = -82.57574 + 3.08348*Height.
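With these estimates, a prediction is plain arithmetic. For example, for a 70-inch person (70 is just an illustrative height, not a row of the dataset):

```python
# Fitted coefficients from summary(model) above
b0, b1 = -82.57574, 3.08348

height = 70  # inches; an arbitrary example value
weight = b0 + b1 * height  # predicted weight in pounds
print(round(weight, 2))    # → 133.27
```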

One measure of how well the model fits the data is the coefficient of determination, R². **For models that fit the data very well, R² is close to 1**. In other words, you have a very tight correlation and can predict y quite precisely if you know x. Conversely, models that fit the data poorly have R² near 0. In our example, **the model only explains 25.28% of the data variability**. The F-statistic (8461) *gives the overall significance of the model*; it produces a model p-value < 2.2e-16, which is highly significant.
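In simple linear regression, R² is just the square of Pearson's r, so we can cross-check the two outputs above against each other:

```python
r = 0.5028585       # the cor.test estimate from above
r_squared = r ** 2  # should match Multiple R-squared from summary(model)
print(round(r_squared, 4))  # → 0.2529
```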

The p-values of the individual predictor variables (rightmost column under ‘Coefficients’) are very important. When there is a p-value, there is a null and an alternative hypothesis associated with it. The null hypothesis H_{0} is that the coefficient associated with the variable is equal to zero (i.e., no relationship between x and y). The alternative hypothesis H_{a} is that the coefficient is not equal to zero (i.e., there is some relationship between x and y). **In our example, the p-values for both the Intercept and Height (the predictor variable) are significant, so we can reject the null hypothesis: there is a significant relationship between the predictor (Height) and the outcome variable (Weight).**

The residual data of the linear regression model is the difference between the observed data of the dependent variable y and the fitted values ŷ (the predicted response values given the values for our explanatory variables). We can plot them: *plot(model$residuals)* and **they should look random**.

Observe that the section **residuals** (model <- lm(Weight ~ Height, data = weightHeight), summary(model)) provides a quick view of the distribution of the residuals. The median (-0.052) is close to zero, and the minimum and maximum values are roughly equal in absolute value (-40.302, 39.093).

plot(cooks.distance(model), pch = 10, col = "red") # Cook's distance is used in regression analysis to find influential points and outliers, i.e., points that negatively affect the regression model. We are going to plot them.
cook <- cooks.distance(model)
significantValues <- cook > 1 # As a general rule, cases with a Cook's distance greater than one should be investigated further.
significantValues # We get none of those points.

Multiple linear regression is an extension of simple linear regression that uses several explanatory variables (predictor or independent variables) to predict the outcome of a response variable (outcome or dependent variable).

Its general equation is **y = b_{0} + b_{1}*x_{1} + b_{2}*x_{2} + ... + b_{n}*x_{n} + e** where y is the response or dependent variable, x_{1}, ..., x_{n} are the explanatory variables, b_{0} is the intercept, b_{1}, ..., b_{n} are the regression coefficients, and e is the error term.

Let's use stackloss. It is described as operational data of a plant for the oxidation of ammonia to nitric acid. stack.loss is the dependent variable: the percentage of the ingoing ammonia that escapes from the absorption column unabsorbed, that is, an (inverse) measure of the overall efficiency of the plant. Air.Flow is the flow of cooling air; it represents the rate of operation of the plant. Water.Temp is the temperature of cooling water circulated through coils in the absorption tower.

> stackloss

library("ggpubr")
ggscatter(stackloss, x = "stack.loss", y = "Air.Flow", color = "Water.Temp", # Let's plot our data; we also color the data points by the second factor (color = "Water.Temp")
          add = "reg.line", conf.int = TRUE, # add = "reg.line" adds the regression line, and conf.int = TRUE adds the confidence interval.
          cor.coef = TRUE, cor.method = "pearson", # cor.coef = TRUE adds the correlation coefficient with the p-value, cor.method = "pearson" is the method for computing the correlation coefficient.
          xlab = "stack.loss", ylab = "Air.Flow")


> model <- lm(stack.loss ~ Air.Flow + Water.Temp, data = stackloss)
> summary(model)

Call:
lm(formula = stack.loss ~ Air.Flow + Water.Temp, data = stackloss)

Residuals:
    Min      1Q  Median      3Q     Max
-7.5290 -1.7505  0.1894  2.1156  5.6588

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -50.3588     5.1383  -9.801 1.22e-08 ***
Air.Flow      0.6712     0.1267   5.298 4.90e-05 ***
Water.Temp    1.2954     0.3675   3.525  0.00242 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.239 on 18 degrees of freedom
Multiple R-squared:  0.9088,	Adjusted R-squared:  0.8986
F-statistic: 89.64 on 2 and 18 DF,  p-value: 4.382e-10

The regression line equation can be written as follows: stack.loss = -50.3588 + 0.6712*Air.Flow + 1.2954*Water.Temp.
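As in the simple case, the fitted equation turns into a prediction by plugging in values for the predictors. The operating conditions below are hypothetical example values, not rows of stackloss:

```python
# Coefficients from summary(model) above
b0, b_air, b_water = -50.3588, 0.6712, 1.2954

air_flow, water_temp = 62, 21  # hypothetical operating conditions
stack_loss = b0 + b_air * air_flow + b_water * water_temp
print(round(stack_loss, 2))    # predicted stack.loss for these conditions
```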

One measure of how well the model fits the data is the coefficient of determination, R². For models that fit the data very well, R² is close to 1. Conversely, models that fit the data poorly have R² near 0. In our example, **the model explains 89.86% of the data variability**. In other words, it explains quite a lot of the variance in the dependent variable. The F-statistic (89.64) *gives the overall significance of the model*; it produces a model p-value of 4.382e-10, which is highly significant.

The p-values of the individual predictor variables (rightmost column under ‘Coefficients’) are very important. When there is a p-value, there is a null and an alternative hypothesis associated with it. The null hypothesis H_{0} is that the coefficient associated with the variable is equal to zero (i.e., no relationship between x_{i} and y). The alternative hypothesis H_{a} is that the coefficient is not equal to zero (i.e., there is some relationship between x_{i} and y). **In our example, the p-values for the Intercept, Air.Flow, and Water.Temp (the predictor variables) are all significant, so we can reject the null hypothesis: there is a significant relationship between the predictors (Air.Flow, Water.Temp) and the outcome variable (stack.loss).**

The residual data of the linear regression model is the difference between the observed data of the dependent variable y and the fitted values ŷ. We can plot them: *plot(model$residuals)* and **they should look random**.

Observe that the section **residuals** (model <- lm(stack.loss ~ Air.Flow + Water.Temp, data = stackloss), summary(model)) provides a quick view of the distribution of the residuals. The median (0.1894) is close to zero, and the minimum and maximum values are roughly similar in absolute magnitude (-7.5290, 5.6588).

plot(cooks.distance(model), pch = 10, col = "red") # Cook's distance is used in regression analysis to find influential points and outliers, i.e., points that negatively affect the regression model. We are going to plot them.
cook <- cooks.distance(model)
significantValues <- cook > 1 # As a general rule, cases with a Cook's distance greater than one should be investigated further.
significantValues # We get none of those points.
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   16    17    18    19    20    21
FALSE FALSE FALSE FALSE FALSE FALSE

Let's see another example of multiple linear regression using Edgar Anderson's Iris Data.

> data(iris) # The iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris.
> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
> model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
> summary(model)

Call:
lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

Residuals:
     Min       1Q   Median       3Q      Max
-0.82816 -0.21989  0.01875  0.19709  0.84570

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.85600    0.25078   7.401 9.85e-12 ***
Sepal.Width   0.65084    0.06665   9.765  < 2e-16 ***
Petal.Length  0.70913    0.05672  12.502  < 2e-16 ***
Petal.Width  -0.55648    0.12755  -4.363 2.41e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3145 on 146 degrees of freedom
Multiple R-squared:  0.8586,	Adjusted R-squared:  0.8557
F-statistic: 295.5 on 3 and 146 DF,  p-value: < 2.2e-16

> plot(model$residuals)

The regression line equation can be written as follows: Sepal.Length = 1.85600 + 0.65084*Sepal.Width + 0.70913*Petal.Length - 0.55648*Petal.Width. In this example, **the model explains 85.57% of the data variability** (it explains quite a lot of the variance in the dependent variable, Sepal.Length). The F-statistic (295.5) *gives the overall significance of the model*; it produces a model p-value < 2.2e-16, which is highly significant.

Both the p-values for the Intercept and the individual predictor variables (Sepal.Width, Petal.Length, and Petal.Width) are significant, so we can reject the null hypothesis, there is a significant relationship between the predictors (Sepal.Width, Petal.Length, and Petal.Width) and the outcome variable (Sepal.Length).

The residual data of the linear regression model is the difference between the observed data of the dependent variable y and the fitted values ŷ. We can plot them: *plot(model$residuals)* and **they look random**.

Let's have a third and last example of multiple linear regression using the trees dataset. It provides measurements of the diameter (it is labelled Girth), height and volume of timber in 31 felled black cherry trees.

> summary(trees)
     Girth           Height       Volume
 Min.   : 8.30   Min.   :63   Min.   :10.20
 1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40
 Median :12.90   Median :76   Median :24.20
 Mean   :13.25   Mean   :76   Mean   :30.17
 3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30
 Max.   :20.60   Max.   :87   Max.   :77.00
> model <- lm(Volume ~ Girth + Height, data = trees)
> summary(model)

Call:
lm(formula = Volume ~ Girth + Height, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max
-6.4065 -2.6493 -0.2876  2.2003  8.4847

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
Girth         4.7082     0.2643  17.816  < 2e-16 ***
Height        0.3393     0.1302   2.607   0.0145 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-squared:  0.948,	Adjusted R-squared:  0.9442
F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16

The regression line equation can be written as follows: Volume = -57.9877 + 4.7082*Girth + 0.3393*Height. In this example, **the model explains 94.42% of the data variability**. It explains most of the variance in the dependent variable, Volume. The F-statistic (255) *gives the overall significance of the model*, it produces a model p-value: < 2.2e-16 which is highly significant.

Both the p-values for the Intercept and the individual predictor variables (Girth and Height) are significant, so we can reject the null hypothesis, there is a significant relationship between the predictors (Girth and Height) and the outcome variable (Volume).

The post Statistics III. T-tests, Correlations and Linear Regression first appeared on JustToThePoint.

The post Statistics II. Chi-squared Test, ANOVA first appeared on JustToThePoint.

Everyone knows that smoking is a bad idea. The question that we will try to address in this article is the following: *Do smoking habits differ between women and men?* We are going to use some data from the Centers for Disease Control.

It defines everyday smokers as current smokers who smoke every day; some-day smokers are current smokers who smoke on some days. Former smokers have smoked at least 100 cigarettes in their lifetime but currently do not smoke at all.

*sexSmoke <- rbind(c(13771, 4987, 30803), c(11722, 3674, 24184))*

# The **rbind()** function binds or combines these two vectors by rows.

*dimnames(sexSmoke) <- list(c("Male", "Female"), c("Every-day smoker", "Some-day smoker", "Former smoker"))*

# The **dimnames()** command sets the row and column names of a matrix.

> sexSmoke
       Every-day smoker Some-day smoker Former smoker
Male              13771            4987         30803
Female            11722            3674         24184
> chisq.test(sexSmoke)

	Pearson's Chi-squared test

data:  sexSmoke
X-squared = 43.479, df = 2, p-value = 3.62e-10

The chi-square test of independence is used to determine whether there is a statistically significant difference between the expected and the observed frequencies in one or more categories of a contingency table.

**Null hypothesis (H0): the row and the column variables of the contingency table are independent. **The test statistic computed from the observations follows a χ2 frequency distribution. *The null hypothesis of the independence assumption is to be rejected if the p-value of the Chi-squared test statistic is less than a given significance level α* (=0.05).

Alternative hypothesis (H1): the row and the column variables of the contingency table are dependent.

In our example, the p-value of the Chi-squared test statistic is 3.62e-10, so the row and column variables are statistically significantly associated. In other words, smoking habits differ between women and men.

Most people will stop here—but the more adventurous try to understand better the Chi-squared test.

sexSmoke, with row and column totals added:
       Every-day smoker Some-day smoker Former smoker
Male              13771            4987         30803   49561
Female            11722            3674         24184   39580
                  25493            8661         54987   89141

By the assumption of independence under the null hypothesis, we should "expect" the number of male every-day smokers to be (25493*49561)/89141 = 14173.7087648.
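The full table of expected counts follows the same rule, (row total × column total)/grand total; a numpy sketch of the computation (illustrative, not the post's R code):

```python
import numpy as np

observed = np.array([[13771, 4987, 30803],
                     [11722, 3674, 24184]])

row_totals = observed.sum(axis=1, keepdims=True)  # 49561, 39580
col_totals = observed.sum(axis=0, keepdims=True)  # 25493, 8661, 54987
expected = row_totals * col_totals / observed.sum()
print(expected.round(2))  # first cell ≈ 14173.71, matching chisq$expected below
```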

> chisq <- chisq.test(sexSmoke)
> round(chisq$expected, 2) # We can extract the expected counts from the result of the test
       Every-day smoker Some-day smoker Former smoker
Male           14173.71         4815.38      30571.91
Female         11319.29         3845.62      24415.09

What is the difference between the expected and observed results? The contribution of the "male every-day smoker" cell is (observed - expected)^2/expected = (13771 - 14173.71)^2/14173.71 = 11.4419826637.

> (chisq$observed - chisq$expected)^2/chisq$expected # We can extract the expected and observed counts from the result of the test and calculate each cell's contribution
       Every-day smoker Some-day smoker Former smoker
Male           11.44191        6.116505      1.746773
Female         14.32725        7.658922      2.187262

The sum of these quantities over all cells is the test statistic; in this case, χ^2 = ∑ (observed - expected)^2/expected = 43.47863.

> m <- (chisq$observed - chisq$expected)^2/chisq$expected
> sum(m)
[1] 43.47863

Under the null hypothesis, this sum has approximately a chi-squared distribution with (number of rows - 1)*(number of columns - 1) = (2 - 1)*(3 - 1) = 2 degrees of freedom.

The test statistic is improbably large according to that chi-squared distribution, so we should reject the null hypothesis of independence.
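SciPy's chi2_contingency reproduces this whole computation (statistic, p-value, degrees of freedom, and expected counts) in one call; an illustrative cross-check of the R output:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[13771, 4987, 30803],
                     [11722, 3674, 24184]])

# Returns the chi-squared statistic, p-value, degrees of freedom,
# and the table of expected counts under independence.
chi2, p, df, expected = chi2_contingency(observed)
print(round(chi2, 3), df)  # ≈ 43.479 with 2 degrees of freedom
```

Note that chi2_contingency applies Yates' continuity correction only to 2×2 tables, so here it matches R's uncorrected chisq.test.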

The one-way analysis of variance (ANOVA) is used to **determine whether there are any statistically significant differences between the means of three or more independent groups**. The data is organized into several groups based on one single grouping variable (the factor or independent variable), so there is one categorical independent variable and one quantitative dependent variable.

Imagine that you want to test the effect of three different fertilizer mixtures on crop yield or the effect of three drugs on pain. You can use *a one-way ANOVA to find out if there is a difference in crop yields or a difference in pain between the three groups*. The independent variable is the type of fertilizer or drug. A pharmaceutical company conducts an experiment to test the effect of its new drug. The company selects some subjects and each subject is randomly assigned to one of three treatment groups where different doses are applied. The drug is the independent variable.

ANOVA test hypotheses:

H_{0}, null hypothesis: μ_{1} = μ_{2} = ... = μ_{k}, where μ_{i} = mean of group i and k = number of groups. In other words, the means of the different groups are identical.

H_{A}, alternative hypothesis: At least, the mean of one group is different.

Let's evaluate the relationship between self-reported sleeping hours and Coke consumption (too much, normal, nothing at all). The data is organized into several groups based on one single grouping variable "Coke consumption" and one quantitative dependent variable "sleeping hours."

CokeSleeping.txt

Coke Sleeping
toomuch 4
toomuch 3
toomuch 3
toomuch 5
toomuch 7
toomuch 5
normal 7
normal 8
normal 6
normal 8
normal 7
normal 7
normal 8
normal 8
nothing 7
nothing 8
nothing 7
nothing 8
nothing 8
nothing 9

            Df Sum Sq Mean Sq F value   Pr(>F)
Coke         2  40.34  20.171   18.83 4.88e-05 ***
Residuals   17  18.21   1.071
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The output includes the F-value column. This is the test statistic from the F test. The idea behind an ANOVA test relies on estimating the common population variance in two different ways: 1) through the common variance, the mean of the sample variances, which is called the variance within samples or residual variance and denoted S^{2}_{within}, and 2) through the variance of the sample means, which is called the variance between sample means and denoted S^{2}_{between}.

The **F-statistic is the ratio of the variance between samples means to the variance within sample means**: S^{2}_{between}/S^{2}_{within}. When the means are not significantly different, the variance of the sample means will be small, relative to the mean of the sample variances. When at least one mean is significantly different from the others, the variance of the sample means will be larger, relative to the mean of the sample variances. The larger the F value, the more likely it is that the variation caused by the independent variable is real and not due to chance.
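scipy.stats.f_oneway computes this F ratio directly from the three groups' raw values (transcribed from CokeSleeping.txt above); an illustrative cross-check of the R output:

```python
from scipy.stats import f_oneway

# Sleeping hours by Coke-consumption group, from CokeSleeping.txt
too_much = [4, 3, 3, 5, 7, 5]
normal   = [7, 8, 6, 8, 7, 7, 8, 8]
nothing  = [7, 8, 7, 8, 8, 9]

# One-way ANOVA: F = variance between group means / variance within groups
f_value, p_value = f_oneway(too_much, normal, nothing)
print(round(f_value, 2), p_value)  # F ≈ 18.83, p ≈ 4.9e-05
```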

As the p-value is less than the significance level 0.05, there are significant differences between the groups (highlighted with asterisks "***" in the model summary). In other words, Coke consumption has a real impact on sleeping.

> summary(cocaSleep)
     Coke              Sleeping
 Length:20          Min.   :3.00
 Class :character   1st Qu.:5.75
 Mode  :character   Median :7.00
                    Mean   :6.65
                    3rd Qu.:8.00
                    Max.   :9.00
> tapply(Sleeping, Coke, mean) # computes the mean for each factor variable (Coke) in a vector (Sleeping)
  normal  nothing  toomuch
7.375000 7.833333 4.500000
> tapply(Sleeping, Coke, sd)
   normal   nothing   toomuch
0.7440238 0.7527727 1.5165751

SS_{between} = Sum Sq = "sum of squares between groups" = (7.375 - 6.65)^{2} * 8 + (7.8333 - 6.65)^{2} * 6 + (4.50 - 6.65)^{2} * 6 = 40.34. Observe that (7.375 - 6.65)^{2} * 8 = (Mean_{normal} - Mean)^{2} * Count_{normal}. **If the sample means are close to each other (and therefore to the grand mean), this will be small.**

Notice that MS means Mean Square:

MS_{between} = Mean Sq = SS_{between}/df_{between} = 40.34/2 = 20.17.

Coke    Sleeping Mean_{toomuch}
toomuch 4        4.5
toomuch 3        4.5
toomuch 3        4.5
toomuch 5        4.5
toomuch 7        4.5
toomuch 5        4.5

SS_{within} = "sum of squares within groups" =

(4-4.5)^{2} + (3-4.5)^{2} + (3-4.5)^{2} + (5-4.5)^{2} + (7-4.5)^{2} + (5-4.5)^{2}

+ (7-7.375)^{2} + (8-7.375)^{2} + (6-7.375)^{2} + (8-7.375)^{2} + (7-7.375)^{2} + (7-7.375)^{2} + (8-7.375)^{2} + (8-7.375)^{2}

+ (7-7.83)^{2} + (8-7.83)^{2} + (7-7.83)^{2} + (8-7.83)^{2} + (8-7.83)^{2} + (9-7.83)^{2}

= 11.5 + 3.87 + 2.83 = 18.21

MS_{within} = SS_{within}/df_{within} = 18.21/17 = 1.071.

Finally, F = MS_{between}/MS_{within} = 20.171/1.071 = 18.83.
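The whole decomposition above can be verified numerically; a numpy sketch of the between- and within-group sums of squares (illustrative, not the post's R code):

```python
import numpy as np

# Sleeping hours by Coke-consumption group, from CokeSleeping.txt
groups = {
    "toomuch": np.array([4, 3, 3, 5, 7, 5], dtype=float),
    "normal":  np.array([7, 8, 6, 8, 7, 7, 8, 8], dtype=float),
    "nothing": np.array([7, 8, 7, 8, 8, 9], dtype=float),
}
all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()  # 6.65

# Between-groups SS: squared deviations of group means, weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
# Within-groups SS: squared deviations of each value from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

ms_between = ss_between / 2  # df_between = 3 groups - 1
ms_within = ss_within / 17   # df_within = 20 observations - 3 groups
print(round(ss_between, 2), round(ss_within, 2), round(ms_between / ms_within, 2))
```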

library("ggpubr")
ggboxplot(cocaSleep, x = "Coke", y = "Sleeping", color = "Coke",
          palette = c("blue", "red", "yellow"), ylab = "Sleeping", xlab = "Coke")

TukeyHSD(anova)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Sleeping ~ Coke)

$Coke
                      diff        lwr       upr     p adj
nothing-normal   0.4583333 -0.9755105  1.892177 0.6960429
toomuch-normal  -2.8750000 -4.3088438 -1.441156 0.0002279
toomuch-nothing -3.3333333 -4.8661769 -1.800490 0.0000940

It can be seen from the output (**diff** is the *difference between the means of the two groups*; lwr and upr are the lower and upper end points of the confidence interval; and p adj is the p-value after adjustment for the multiple comparisons) that there are significant differences between drinking too much Coke and a normal amount, and between drinking too much and nothing.
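The diff column is just the difference between group means, which is easy to recompute; the confidence limits and adjusted p-values come from the studentized range distribution, which we do not reproduce here (illustrative numpy check, not the post's R code):

```python
import numpy as np

# Group means of Sleeping by Coke consumption, from CokeSleeping.txt
normal   = np.mean([7, 8, 6, 8, 7, 7, 8, 8])  # 7.375
nothing  = np.mean([7, 8, 7, 8, 8, 9])        # ≈ 7.8333
too_much = np.mean([4, 3, 3, 5, 7, 5])        # 4.5

print(nothing - normal)    # ≈  0.4583, the nothing-normal diff above
print(too_much - normal)   #   -2.875,  the toomuch-normal diff
print(too_much - nothing)  # ≈ -3.3333, the toomuch-nothing diff
```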

As we can see in the plot above, it looks quite random; there is no evident relationship between residuals and fitted values (the mean of each group). Therefore, we can assume the homogeneity of variances in the different groups. Levene's test is also available to check the homogeneity of variances.

library(car)
> leveneTest(Sleeping ~ Coke)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  2.2956 0.1311
      17

"Levene's test is an inferential statistic used to *assess the equality of variances for two or more groups*. If the resulting p-value of Levene's test is less than some significance level (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances," Wiki, Levene's test.

From our result, the p-value 0.1311 > 0.05 so there is no evidence to suggest that the variance across groups is statistically significantly different. In other words, we can assume the homogeneity of variances in the different groups.
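scipy.stats.levene with center="median" (the Brown–Forsythe variant) matches car's leveneTest default; an illustrative cross-check on the same data:

```python
from scipy.stats import levene

# Sleeping hours by Coke-consumption group, from CokeSleeping.txt
too_much = [4, 3, 3, 5, 7, 5]
normal   = [7, 8, 6, 8, 7, 7, 8, 8]
nothing  = [7, 8, 7, 8, 8, 9]

# Median-centred Levene test, the default in both scipy and car::leveneTest
stat, p = levene(too_much, normal, nothing, center="median")
print(round(stat, 4), round(p, 4))  # ≈ 2.2956, p ≈ 0.1311
```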

A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. It not only aims at **assessing the main effect of each independent variable but also if there is any interaction between them**.

Hypotheses:

H_{0}, null hypothesis: The means of the dependent variable for each group in the first independent variable are equal.

H_{A}, alternative hypothesis: The means of the dependent variable for each group in the first independent variable are not all equal.

Similar hypotheses are considered for the other independent variable and the interaction effect.

Let's determine if there is a statistically significant difference in "Sleeping hours" based on Coke consumption (too much, normal, nothing at all) and gender. The data is organized based on two grouping independent variables "Coke consumption" and "Gender", and one quantitative dependent variable "Sleeping hours."

CokeSleeping.txt

Gender Coke Sleeping
f toomuch 5
m toomuch 5
m toomuch 4
f toomuch 5
f toomuch 4
f toomuch 6
m toomuch 4
f toomuch 5
f toomuch 6
f toomuch 6
m toomuch 6
f toomuch 6
f normal 6
f normal 7
m normal 5
f normal 8
m normal 4
m normal 5
m normal 5
m normal 4
f normal 8
f normal 6
f normal 7
f normal 7
m normal 6
m nothing 6
f nothing 8
m nothing 6
f nothing 8
f nothing 7
m nothing 8
f nothing 6
m nothing 8

   Gender              Coke              Sleeping
 Length:33          Length:33          Min.   :4.00
 Class :character   Class :character   1st Qu.:5.00
 Mode  :character   Mode  :character   Median :6.00
                                       Mean   :5.97
                                       3rd Qu.:7.00
                                       Max.   :8.00

            Df Sum Sq Mean Sq F value   Pr(>F)
Gender       1  7.120   7.120   9.513  0.00467 **
Coke         2 21.948  10.974  14.662 4.89e-05 ***
Gender:Coke  2  5.693   2.847   3.803  0.03505 *
Residuals   27 20.208   0.748
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

From the ANOVA table above, we can conclude that both independent variables "Coke consumption" and "Gender" are statistically significant, as well as their interaction. Furthermore, Coke consumption is the most significant factor variable.

library("ggpubr")
ggboxplot(cocaSleep, x = "Coke", y = "Sleeping", color = "Gender", palette = c("red", "blue"))
# Create a boxplot: x = "Coke" is the name of the x variable, y = "Sleeping" is the name of the y variable; we change outline colors by groups (color = "Gender") and use a custom color palette (palette = c("red", "blue")).

> tapply(Sleeping, list(Gender, Coke), mean) # computes the mean for each combination of factor levels (Gender and Coke) in a vector (Sleeping)
    normal nothing toomuch
f 7.000000    7.25   5.375
m 4.833333    7.00   4.750
> tapply(Sleeping, list(Gender, Coke), sd)
     normal   nothing   toomuch
f 0.8164966 0.9574271 0.7440238
m 0.7527727 1.1547005 0.9574271

Caffeine can have a disruptive effect on your sleep. Sodas are "loaded with caffeine and lots of sugar. The caffeine can make it hard to fall asleep, and the sugar may affect your ability to stay asleep", sleepeducation.org. These data confirm our assumption, but they also indicate that gender plays a role: the effect is worse in men.

We will compute Tukey HSD for performing multiple pairwise-comparison between the means of groups. We don’t need to perform the test for the “Gender” variable because it has only two levels, which have been already proven to be significantly different by the two-way ANOVA test. Therefore, the Tukey HSD test will be done only for the factor variable “Coke”.

> TukeyHSD(anova, which = "Coke")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Sleeping ~ Gender * Coke)

$Coke
                      diff        lwr         upr     p adj
nothing-normal   1.1611481  0.1972612  2.12503488 0.0158381
toomuch-normal  -0.9538269 -1.8125255 -0.09512825 0.0272046
toomuch-nothing -2.1149749 -3.0940420 -1.13590787 0.0000340

As we can see from the output above, all pairwise comparisons are significant with an adjusted p-value < 0.05. However, the results also reveal a more significant difference between drinking too much Coke and not drinking it at all.

plot(anova, 1) # A "residuals versus fits plot" is a scatter plot of residuals on the y axis and fitted values (estimated responses) on the x axis. It is used to detect non-linearity, unequal error variances, and outliers.
library(car)
> leveneTest(Sleeping ~ Gender*Coke)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  5  0.7188 0.6149
      27

The residuals bounce randomly around the 0 line. Points 5 (f toomuch 5), 11 (f toomuch 6) and 32 (m nothing 8) are detected as outliers; you may need to consider removing them from your study. Furthermore, we can see that Levene's p-value (0.6149) is greater than the significance level of 0.05. In other words, there is no evidence to suggest that the variance across groups is statistically significantly different, and **we can assume the homogeneity of variances in the different groups**.

> plot(anova, 2) # A "normal probability plot of the residuals" is a way of learning whether it is reasonable to assume that the error terms are normally distributed. If the data follow a normal distribution, then a plot of the theoretical percentiles of the normal distribution versus the observed sample percentiles (the percentiles of the residuals) should be approximately linear.
> # Extract the residuals
> my_residuals <- residuals(object = anova)
> # Run the Shapiro-Wilk test
> shapiro.test(x = my_residuals) # The Shapiro–Wilk test is a test of normality in statistics. W = 0.93722, p-value = 0.05641 indicates that normality is not violated.

	Shapiro-Wilk normality test

data:  my_residuals
W = 0.93722, p-value = 0.05641

A one-way repeated measures ANOVA or a within-subjects ANOVA is the equivalent of the one-way ANOVA, **but for related, not independent groups**. It requires one independent categorical variable and one dependent variable. *It is used to compare three or more group means where the participants are the same in each group*.

For example, we are going to use a repeated measures ANOVA to understand whether there is a difference in memory retention (subjects are given a list of words that they need to memorize for a test) over time (half an hour, an hour, and next day). In this example, "memory" is the dependent variable (the number of words being remembered by our subjects), whilst the independent variable is "time" (i.e., with three related groups, where each of the three time points is considered a "related group").

memory.text

Subject Time Memory
Peter HalfHour 32
Peter Hour 30
Peter NextDay 12
Paul HalfHour 30
Paul Hour 28
Paul NextDay 10
Martha HalfHour 34
Martha Hour 32
Martha NextDay 16
John HalfHour 28
John Hour 24
John NextDay 8
Esther HalfHour 28
Esther Hour 20
Esther NextDay 4
Bew HalfHour 38
Bew Hour 34
Bew NextDay 22
Robert HalfHour 32
Robert Hour 30
Robert NextDay 12

ANOVA Table (type III tests)

$ANOVA
  Effect DFn DFd   F        p p<.05   ges
1   Time   2  12 302 5.47e-11     * 0.789

$`Mauchly's Test for Sphericity`
  Effect     W     p p<.05
1   Time 0.976 0.942

# The assumption of sphericity is automatically checked during the ANOVA test. The null hypothesis is that the variances of the differences between groups are equal. A significant p-value would indicate that the variances of group differences are not equal; our non-significant p-value (p = 0.942 > 0.05) indicates that we can assume sphericity.

The Memory is statistically significantly different at the different time points: F(2, 12) = 302, p = 5.47e-11 < .05.

Considering the Bonferroni adjusted p-value (p.adj), it can be seen that the effect of time on memory is statistically significant at each time point. It is less statistically significant after just half an hour, which makes a lot of sense, as half an hour is not much time.

A **two-way repeated measures ANOVA** is used when there are two factors in the experiment and the same subjects take part in all of the conditions of the experiment. It evaluates simultaneously the effect of the two within-subject factors, and of their interaction, on a dependent variable.

Suppose we have a slightly different experiment in which there are two independent variables: time of day at which subjects are tested (with three levels: half an hour, an hour, and next day) and emotional reaction to words (with two levels: positive and negative). Subjects are given a memory test under all permutations of these two variables. A two-way repeated-measures ANOVA is an appropriate test in these circumstances.

memoryEmotion.text

Subject  Time      Emotion   Memory
Peter    HalfHour  Positive  34 | Peter    HalfHour  Negative  32
Peter    Hour      Positive  30 | Peter    Hour      Negative  27
Peter    NextDay   Positive  14 | Peter    NextDay   Negative  11
Paul     HalfHour  Positive  31 | Paul     HalfHour  Negative  27
Paul     Hour      Positive  28 | Paul     Hour      Negative  25
Paul     NextDay   Positive  14 | Paul     NextDay   Negative   9
Martha   HalfHour  Positive  34 | Martha   HalfHour  Negative  31
Martha   Hour      Positive  32 | Martha   Hour      Negative  31
Martha   NextDay   Positive  16 | Martha   NextDay   Negative   9
John     HalfHour  Positive  29 | John     HalfHour  Negative  27
John     Hour      Positive  24 | John     Hour      Negative  21
John     NextDay   Positive  12 | John     NextDay   Negative   7
Esther   HalfHour  Positive  28 | Esther   HalfHour  Negative  27
Esther   Hour      Positive  20 | Esther   Hour      Negative  19
Esther   NextDay   Positive  10 | Esther   NextDay   Negative   7
Bew      HalfHour  Positive  38 | Bew      HalfHour  Negative  37
Bew      Hour      Positive  34 | Bew      Hour      Negative  32
Bew      NextDay   Positive  22 | Bew      NextDay   Negative  12
Robert   HalfHour  Positive  32 | Robert   HalfHour  Negative  31
Robert   Hour      Positive  30 | Robert   Hour      Negative  28
Robert   NextDay   Positive  15 | Robert   NextDay   Negative   9

ANOVA Table (type III tests)

$ANOVA
        Effect DFn DFd       F        p p<.05   ges
1         Time   2  12 324.037 3.61e-11     * 0.844
2      Emotion   1   6  91.263 7.51e-05     * 0.170
3 Time:Emotion   2  12  10.073 3.00e-03     * 0.051

Memory is statistically significantly different at the different time points (F(2, 12) = 324.037, p < 0.05) and between the two emotional word conditions (F(1, 6) = 91.263, p < 0.05). There is also a statistically significant two-way interaction between Time and Emotion (F(2, 12) = 10.073, p < 0.05). In other words, the impact that one factor (e.g., time) has on the outcome or dependent variable (Memory) depends on the other factor (Emotion), and vice versa.

Considering the Bonferroni adjusted p-value (p.adj), it can be seen that the effect of time on memory is statistically significant at each time point.

"The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes-no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability q = 1 − p)," (Wikipedia, Binomial distribution).

A binomial test compares a sample proportion to a hypothesized proportion. The test has the following null and alternative hypotheses:

H0: π = p, the population proportion π is equal to some value p.

HA: π ≠ p, the population proportion π is not equal to some value p.

**Binomial tests are used in situations where a researcher wants to know whether or not a subject is just guessing** or truly is able to perform some task. For example, a monkey is shown two pictures of dots simultaneously. Then, it is shown two choices: the previous dots added together and a different number. Let's suppose that the experiment is repeated 100 times and the monkey ended up selecting the right answer on 68 of the trials.

**binom.test(68, 100, 0.5, alternative='greater')** performs an exact test of a simple null hypothesis about the probability of success in a Bernoulli experiment where: x = 68 is the number of successes; n = 100 is the number of trials; p = 0.5 is the hypothesized probability of success; and alternative='greater' indicates the alternative hypothesis.

The binomial test tells us something about what flipping a coin could have done (p = 0.5). It states that the binomial process, flipping a coin with p = 1/2, is capable of producing 68/100 correct answers or more with only a very small probability (0.0002044). The monkey may not really be good at Maths, but it is very unlikely that the 68% correct answers could be explained by pure chance, so it was not just guessing. Hire the monkey!

In other words, p-value is less than 0.05. It indicates strong evidence against the null hypothesis. Therefore, we reject the null hypothesis, and accept the alternative hypothesis: the true probability of success is greater than 0.5.
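As a cross-check, the one-sided p-value of the exact binomial test is just the upper tail of the Binomial(100, 0.5) distribution, which we can sum directly in plain Python (a sketch, not the original R session):

```python
from math import comb

# Exact one-sided binomial test: P(X >= 68) for X ~ Binomial(n=100, p=0.5),
# i.e. the p-value that binom.test(68, 100, 0.5, alternative='greater') reports.
n, x, p = 100, 68, 0.5
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))
print(p_value)   # ~0.000204, far below 0.05
```

The sum reproduces the 0.0002044 reported above.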

How can we distinguish the fair from the loaded dice?

**binom.test(14, 60, 1/6)**, so x = 14 is the number of successes of an outcome, e.g., the die landing with a six; n = 60 is the number of trials; and p = 1/6 is the hypothesized probability of success (a fair die).

p-value = 0.1664, which is higher than 0.05 (> 0.05), so the result is not statistically significant. Note that a non-significant p-value is not strong evidence for the null hypothesis; it simply means that we fail to reject it. We retain the null hypothesis, p = 1/6: the data are consistent with a fair die.

**binom.test(30, 60, 1/6)**, so x = 30, the die has landed with a six 30 times; n = 60 is the number of trials; and p = 1/6 is the hypothesized probability of success (a fair die).

p-value = 2.785e-09, which is lower than 0.05 (< 0.05), so the result is statistically significant. It indicates strong evidence against the null hypothesis. This means that the die is loaded. In other words, it is very unlikely that a fair die would land on six 30 times out of 60.
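For the curious, here is a plain-Python sketch of the exact two-sided p-value, using the same definition R's binom.test uses: sum the probabilities of all outcomes no more likely than the observed one (the 1e-7 tolerance mirrors R's implementation detail):

```python
from math import comb

def binom_two_sided(x, n, p):
    """Exact two-sided binomial p-value, R-style: sum the probabilities of
    all outcomes that are no more likely than the observed count x."""
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    cutoff = probs[x] * (1 + 1e-7)          # small relative tolerance, as in R
    return sum(q for q in probs if q <= cutoff)

print(round(binom_two_sided(14, 60, 1/6), 4))   # ~0.1664: consistent with a fair die
print(binom_two_sided(30, 60, 1/6))             # tiny (~2.8e-09): the die is loaded
```

Both values match the binom.test outputs discussed above.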

The post Statistics II. Chi-squared Test, ANOVA first appeared on JustToThePoint.

The post Statistics first appeared on JustToThePoint.

A statistician can have his head in the oven and his feet in the freezer, and he will say that on average he feels fine. Statistics show that those who celebrate more birthdays live longer. Have you heard the latest statistics joke? Probably!

**LibreOffice Calc** is *the free spreadsheet program of the LibreOffice suite*. It comes with many functions, including those for imaginary numbers, as well as financial and statistical functions. It is very intuitive and easy to learn.

Let's work with a spreadsheet that contains data about average life expectancy by country.

Life Expectancy

Country               Code  Life Expectancy
Aruba                 ABW   76.293
Afghanistan           AFG   64.833
Angola                AGO   61.147
Albania               ALB   78.573
United Arab Emirates  ARE   77.972
Argentina             ARG   76.667
Armenia               ARM   75.087
Antigua, Barbuda      ATG   77.016
Australia             AUS   82.9
Austria               AUT   81.79
...

We are going to analyze the data in our spreadsheet using Calc's functions. A **function** is a *predefined calculation or formula entered in a cell to help you analyze or manipulate data in a spreadsheet*.

You can insert functions in Calc either using the Function Wizard or manually:

R is a free programming language and statistical software for analyzing and visualizing data.

You could download it and install it from The Comprehensive R Archive Network. You should install RStudio, too (sudo apt-get install gdebi-core && wget https://download2.rstudio.org/server/bionic/amd64/rstudio-server-1.4.1717-amd64.deb && sudo gdebi rstudio-server-1.4.1717-amd64.deb). It runs on port 8787 and accepts connections from all remote clients. After installation, open a web browser and access the server by typing the following address: http://<server-ip>:8787.

varianza(results) = 1.372449 < 7.142857. It means that these academic results are less dispersed and spread out or, in other words, that their average, 5.357143, represents the data set much better.

I would like to say something about the old joke, "A statistician can have his head in the oven and his feet in the freezer, and he will say that on average he feels fine."

> x <- c(50, 0) # Let's say that the head in the oven is at 50°C and his feet in the freezer are at 0°C.
> mean(x)
[1] 25 # The average temperature may be comfortable...
> varianza(x)
[1] 625 # ... but, there is always a but in this world, the variance is very high. It indicates that the data points are very spread out from the mean, and from one another.

In 1992 Mackowiak, Wasserman, and Levine measured the body temperatures of 65 men and 65 women: oral (33.2 – 38.2°C, avg(33.2, 38.2) = 35.7), rectal (please, *let's not go into details*, 34.4 – 37.8°C, avg(34.4, 37.8) = 36.1), tympanic (ear canal, 35.4 – 37.8°C, avg(35.4, 37.8) = 36.6), axillary (armpit, 35.5 – 37.0°C, avg(35.5, 37.0) = 36.25).

> x <- c(35.7, 36.1, 36.6, 36.25)
> mean(x)
[1] 36.1625
> varianza(x)
[1] 0.1042187 # A small variance indicates that the data points are very close to the mean, and to each other. All of the data values are almost identical.
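As an aside, varianza() here behaves like the population variance (divide by n, not n − 1); assuming that, Python's statistics.pvariance reproduces both results (a sketch mirroring the R sessions above):

```python
import statistics

# varianza() matches the population variance (divide by n, not n - 1).
print(statistics.mean([50, 0]))         # 25, a "comfortable" average...
print(statistics.pvariance([50, 0]))    # 625, ...with a huge spread

temps = [35.7, 36.1, 36.6, 36.25]       # the four average body temperatures
print(statistics.mean(temps))           # 36.1625
print(statistics.pvariance(temps))      # ~0.1042187: values almost identical
```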

Finally, we are going to illustrate a concept with an example:

> results <- c(4, 6, 7, 5, 4, 6, 7, 4, 5, 4, 6, 6, 7, 4)
> mean(results)
[1] 5.357143
> median(results)
[1] 5.5
> results <- c(4, 6, 7, 5, 4, 6, 37, 4, 5, 4, 6, 6, 7, 4) # An outlier (37; an outlier is a data point that differs significantly from other observations) has been introduced into the data set.
> mean(results)
[1] 7.5 # The 37 is an outlier, which distorts the mean; therefore, the median is a better option.
> median(results)
[1] 5.5
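The same experiment can be replayed with Python's statistics module (a sketch mirroring the R session above):

```python
import statistics

results   = [4, 6, 7, 5, 4, 6, 7, 4, 5, 4, 6, 6, 7, 4]
outliered = [4, 6, 7, 5, 4, 6, 37, 4, 5, 4, 6, 6, 7, 4]   # a 7 replaced by 37

print(statistics.mean(results), statistics.median(results))      # ~5.357 and 5.5
print(statistics.mean(outliered), statistics.median(outliered))  # 7.5 and 5.5
# The single outlier drags the mean up by more than 2 points; the median is unchanged.
```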

Let's use the following survey: Public opinion on the most important problems facing the U.S. 2021.

"20 percent of respondents stated that poor leadership and a general dissatisfaction with the government were the most important problems facing the U.S. Furthermore, another 25 percent of respondents said that the most important problem facing the U.S. was the coronavirus pandemic," published by Statista Research Department, Mar 29, 2021. Besides: 8% Race relations/Racism, 8% Economy in general, 8% Immigration, and 31% Others (Unemployment, Healthcare, debt, etc.).

We are going to define two vectors: *problems <- c(25, 20, 8, 8, 8, 31)* (problems facing the US) and *labels <- c("Covid", "Government", "Racism", "Economy", "Immigration", "Others")* (labels or descriptions).

**Next, we will create a barplot**: *barplot(problems, main="Most important problems facing the US", ylab="Percentage of people", names.arg=labels, col=rainbow(6))*, where **problems** determines the heights of the bars in the plot. We include the option *names.arg=labels* to label the bars, add a title (main="Most important problems facing the US") and a label for the y-axis (ylab="Percentage of people"), and use rainbow colors (col=rainbow(6)).

We are going to try to make the labels more descriptive by adding the percentage value: *newlabels <- paste(labels, "", problems, "%")*

Let's ask R to draw a pie: *pie(problems, main="Most important problems facing the US", labels=newlabels, col=rainbow(6))*. Pie charts are created with the function **pie(x, labels=names)** where x is a vector indicating the area of each slice and labels is a character vector of names for the slices.

Surely we can do better than that!

We are going to install two packages:

library(plotrix) # Load the plotrix package
library(viridis) # Load the viridis package
problems <- c(25, 20, 8, 8, 8, 31)
labels <- c("Covid", "Government", "Racism", "Economy", "Immigration", "Others")
newlabels <- paste(labels, "", problems, "%")
pie3D(problems, main="Most important problems facing the US", labels=newlabels, col = viridis(6), explode = 0.3)

**pie3D** is the function in the plotrix package that *provides 3d pie charts*. We use the function viridis() in the viridis package to generate the colors, too.

**Box Plot in R.** *x=rnorm(500, mean=0, sd=1)*. rnorm(n, mean, sd) generates a vector of normally distributed random numbers.

*boxplot(x)* draws a boxplot for x. A boxplot is a standardized way of displaying the distribution of data based on a five-number summary: minimum (Q_{0}, the lowest data point excluding any outliers), first quartile (Q_{1}), median, third quartile (Q_{3}), and maximum (Q_{4}, the largest data point excluding any outliers). A boxplot is constructed of two parts: a box drawn from Q_{1} to Q_{3} with a horizontal line drawn in the middle to denote the median and a set of whiskers. There is a line like a whisker connecting the first quartile to the minimum and a second line from the third quartile to the maximum.

Anything outside the whiskers is considered as an outlier (dots).

Let's create a **stem-and-leaf plot with R**. It is basically a special table where each data value is split into a "leaf" (the last digit) and a "stem" (all the other digits). It is drawn with two columns separated by a vertical line. The stems are listed to the left of the vertical line. The leaves are listed in increasing order in a row to the right of each stem.

edades <- c(35, 36, 36, 35, 57, 69, 24, 25, 21, 39, 37, 30, 29, 28, 34, 40, 20) # edades is a vector representing ages from a given population.
stem(edades)
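A stem-and-leaf plot is easy to build by hand, too; here is a minimal Python sketch of the same idea (stem = tens digit, leaf = units digit) applied to edades:

```python
from collections import defaultdict

edades = [35, 36, 36, 35, 57, 69, 24, 25, 21, 39, 37, 30, 29, 28, 34, 40, 20]
stems = defaultdict(list)
for age in sorted(edades):
    stems[age // 10].append(age % 10)   # stem = tens digit, leaf = units digit
for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
# 2 | 0 1 4 5 8 9
# 3 | 0 4 5 5 6 6 7 9
# 4 | 0
# 5 | 7
# 6 | 9
```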

Finally, we are going to visualize some data about the evolution of unemployment in some countries using time plots. Our input data is in csv format with commas as delimiters:

"Country", "Year", "Unemployment"

"AUS","2005",5.033881

"AUS","2006",4.78524

"AUS","2007",4.379151

"AUS","2008",4.23433

"AUS","2009",5.560385

....

unemployment <- read.table("/path/Unemployment.csv", header = T, sep = ",")

# **Reads a file in table format and creates a data frame from it**; header = T indicates that the file contains the names of the variables as its first line. sep is the field separator character, values on each line of the file are separated by commas.

unemploymentAUS <- subset(unemployment, Country =="AUS", select = c("Year", "Unemployment"))

# The **subset()** function is the easiest way to *select variables and observations*. We select all rows where "Country" is AUS (AUS is the three-letter country abbreviation for Australia) and keep the Year and Unemployment columns.

unemploymentES <- subset(unemployment, Country == "ESP", select = c("Year", "Unemployment"))
unemploymentGBR <- subset(unemployment, Country == "GBR", select = c("Year", "Unemployment"))
unemploymentUSA <- subset(unemployment, Country == "USA", select = c("Year", "Unemployment"))
unemploymentFRA <- subset(unemployment, Country == "FRA", select = c("Year", "Unemployment"))
unemploymentGRC <- subset(unemployment, Country == "GRC", select = c("Year", "Unemployment"))
plot(unemploymentES, col = "violet", xlab = "Time", type = "o", ylim = range(unemployment["Unemployment"])) # We plot the unemployment rate evolution in Spain (x = Year, y = Unemployment), specify the x-axis label (xlab = "Time"), set the limits of the y-axis (ylim = range(unemployment["Unemployment"])) and the type of plot that gets drawn (it will display segments and points, but with the line overplotted).
title("Unemployment evolution") # Define a title for the plot.
lines(unemploymentAUS, col="yellow", type="o") # lines() cannot produce a plot on its own. However, it can be used to add lines to an existing graph. It plots the unemployment rate evolution in Australia.
lines(unemploymentGBR, col="red", type="o")
lines(unemploymentUSA, col="blue", type="o")
lines(unemploymentFRA, col="orange", type="o")
lines(unemploymentGRC, col="green", type="o")
legend("topleft", c("ES","AUS", "GBR", "USA", "FRA", "GRC"), lwd=c(5,2), col=c("violet","yellow", "red", "blue", "orange", "green"), y.intersp = 1) # The legend() function is used to add legends to plots. y.intersp is the character interspacing factor for vertical (y) spacing. lwd=c(5,2) defines the line widths for lines appearing in the legend (other ideas: lwd=c(1), lwd=c(1,2,1,2,1,2), etc.).

We are going to use the prob package. It is a framework for performing elementary probability calculations on finite sample spaces. Let's install it: *install.packages("prob", repos="http://R-Forge.R-project.org")*.

R does not need to be installed. You can also open your favourite browser and open this location: https://rdrr.io/, a comprehensive index of R packages and documentation from CRAN. It lets you run any R code through your browser.

**library(prob)** # Load the library prob.

*tosscoin(2, makespace=TRUE)* # Sets up a sample space for the experiment of tossing a coin two times with the outcomes "H" (heads) or "T" (tails).

**S<-tosscoin(2, makespace=TRUE)
Prob(S, toss1=="H" & toss2=="H")**

If we consider all possible outcomes, there was only one favorable outcome, (H, H), out of four equally likely possibilities (HH, HT, TH, TT), so the probability is ^{1}⁄_{4} = 0.25.

*rolldie(1, makespace=TRUE)* # Sets up a sample space for the experiment of rolling a die once. Since there are six possible outcomes, the probability of obtaining any side of the die is ^{Number of favorable outcomes}⁄_{total number of possible outcomes} = ^{1}⁄_{6} = 0.166...

The die is thrown twice. What is the probability of getting a sum of 7?

library(prob)
S <- rolldie(2, makespace=TRUE)
Prob(S, X1+X2 == 7)

Prob(S, X1+X2 == 7) = ^{Number of favorable outcomes}⁄_{total number of possible outcomes} = ^{6 (1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)}⁄_{36} = ^{6}⁄_{36} = 0.166...

A die is thrown twice, what is the probability of getting a doublet?

Prob(S, X1==X2) = ^{Number of favorable outcomes}⁄_{total number of possible outcomes} = ^{6 (1, 1),(2, 2), (3, 3), (4, 4), (5,5 ), (6, 6)}⁄_{36} = ^{6}⁄_{36} = 0.166...

A die is thrown twice, what is the probability of getting two 6's?

When some events do not affect one another, they are known as independent events. Examples of independent events include rolling a die more than once, flipping a coin multiple times, and spinning a spinner. If A and B are independent events, P(A ∩ B) = P(A) P(B).

Prob(S, X1==6 & X2==6) = ^{Number of favorable outcomes}⁄_{total number of possible outcomes} = ^{1 (6, 6)}⁄_{36} = ^{1}⁄_{36} = (The events of rolling a die multiple times are independent) = P(X1==6) * P(X2==6)= ^{1}⁄_{6} * ^{1}⁄_{6} = ^{1}⁄_{36} = 0.02777777777.
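All of these dice probabilities can be cross-checked by brute-force enumeration of the 36 equally likely outcomes, much like what rolldie(2, makespace=TRUE) sets up; a minimal Python sketch:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of rolling two dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event as favorable outcomes / total outcomes."""
    favorable = [o for o in space if event(o)]
    return Fraction(len(favorable), len(space))

print(prob(lambda o: o[0] + o[1] == 7))   # 1/6, sum of 7
print(prob(lambda o: o[0] == o[1]))       # 1/6, doublet
print(prob(lambda o: o == (6, 6)))        # 1/36 = 1/6 * 1/6 (independent rolls)
```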

What is the probability of getting a doublet given that the sum of two dice is 8?

Prob(S, X1==X2, given = (X1+X2==8))

Conditional probability is the probability of an event occurring given that another event has already occurred. The conditional probability of A given B (it is usually written as p(A/B)) = ^{P(A ∩ B)}⁄_{P(B)}

P(A/B) = ^{P(A ∩ B)}⁄_{P(B)} = ^{1/36 (X1=4, X2=4)}⁄_{5/36 ( (2, 6), (6, 2), (3, 5), (5, 3), (4, 4) )} = ^{1}⁄_{5} = 0.2.

What is the probability that the sum of two dice is 8 given that a doublet has been seen?

Prob(S, X1+X2==8, given = (X1==X2))

P(B/A) = ^{P(A ∩ B)}⁄_{P(A)} = ^{1/36 (X1=4, X2=4)}⁄_{6/36 ( (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6) )} = ^{1}⁄_{6} = 0.166...
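Conditional probabilities fall out of the same kind of enumeration: restrict the sample space to the outcomes where the condition holds, then count. A minimal Python sketch of the two computations above:

```python
from itertools import product
from fractions import Fraction

space = list(product(range(1, 7), repeat=2))   # 36 outcomes of two dice

def prob(event, given=lambda o: True):
    """Conditional probability by enumeration: P(event | given)."""
    cond = [o for o in space if given(o)]      # restrict to the condition
    return Fraction(sum(1 for o in cond if event(o)), len(cond))

doublet = lambda o: o[0] == o[1]
sum8 = lambda o: o[0] + o[1] == 8
print(prob(doublet, given=sum8))   # 1/5: doublet, given that the sum is 8
print(prob(sum8, given=doublet))   # 1/6: sum of 8, given a doublet
```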

What is the probability of getting a six, given that the die shows an even number?

Prob(S, X1==6, given = (X1%%2==0))

P(A/B) = ^{P(A ∩ B)}⁄_{P(B)} = ^{1/6 (X1=6)}⁄_{3/6 ( X1 = 2, X1 = 4, X1 = 6 )} = ^{1}⁄_{3} = 0.333...

What is the probability that the first die shows a six, given that the second die is even and the sum of both dice is greater than 8?

Prob(S, X1==6, given = ( (X2%%2==0) & (X1+X2>8) ) )

P(A/B) = ^{P(A ∩ B)}⁄_{P(B)} = ^{2/36 ((6, 4), (6, 6))}⁄_{6/36 ( (5, 4), (6, 4), (3, 6), (4, 6), (5, 6), (6, 6) )} = ^{2}⁄_{6} = 0.333...

What is the probability that the first die shows a six, given that the second die is odd and the sum of both dice is greater than 8?

Prob(S, X1==6, given = ( (X2%%2!=0) & (X1+X2>8) ) )

P(A/B) = ^{P(A ∩ B)}⁄_{P(B)} = ^{2/36 ( (6, 3), (6, 5) )}⁄_{4/36 ( (6, 3), (4, 5), (5, 5), (6, 5))} = ^{2}⁄_{4} = 0.5.

There are three kinds of students: procrastinators (0.7), academically average students (0.2), and overachievers (0.1): *prior <- c(0.7, 0.2, 0.1)*.

The probability of a student passing an exam varies a lot between groups: 0.2 (procrastinators), 0.8 (average students), or 0.9 (overachievers): *posteriori <- c(0.2, 0.8, 0.9)*.

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. It is defined as P(A/B) = ^{P(B/A) * P(A)}⁄_{P(B)}.

A student has passed his or her last exam but has forgotten to sign on the answer sheet. What is the probability of this student belonging to each of the groups?

p(B) = probability that a student passes an exam; p(B') = probability that a student fails an exam.

P(A_{1}/B) = probability that a student is a procrastinator given that he has passed an exam.

P(A_{1}/B) = ^{P(B/A1) * P(A1)}⁄_{P(B)} = ^{0.2 * 0.7}⁄_{P(B)} = ...

P(B) = { A_{1} ∪ A_{2} ∪ A_{3} = S, the sample space} P(B ∩ A_{1}) + P(B ∩ A_{2}) + P(B ∩ A_{3}) = P(B/A_{1}) * P(A_{1}) + P(B/A_{2}) * P(A_{2}) + P(B/A_{3}) * P(A_{3}) = 0.2 * 0.7 + 0.8 * 0.2 + 0.9 * 0.1 = 0.39

P(A_{1}/B) = ^{0.2 * 0.7}⁄_{0.39} = 0.3589.

P(A_{2}/B) = ^{P(B/A2) * P(A2)}⁄_{P(B)} = ^{0.8 * 0.2}⁄_{ 0.39} = 0.4102.

P(A_{3}/B) = ^{P(B/A3) * P(A3)}⁄_{P(B)} = ^{0.9 * 0.1}⁄_{0.39} = 0.2307.

Let's do these calculations in R:
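The post does these calculations in R; as an equivalent sketch, here is the same arithmetic in Python (the variable names mirror the post's prior and posteriori vectors):

```python
# Bayes' theorem for the three student groups.
prior      = [0.7, 0.2, 0.1]   # procrastinators, average students, overachievers
likelihood = [0.2, 0.8, 0.9]   # P(pass | group), the post's "posteriori" vector

# Law of total probability: P(pass) = sum of P(pass | group) * P(group).
p_pass = sum(p * l for p, l in zip(prior, likelihood))   # 0.39

# Bayes' theorem: P(group | pass) = P(pass | group) * P(group) / P(pass).
posterior = [p * l / p_pass for p, l in zip(prior, likelihood)]
print([round(p, 4) for p in posterior])   # [0.359, 0.4103, 0.2308]
```

The three posteriors match the hand calculations above (0.3589, 0.4102, 0.2307, up to rounding), and they sum to 1, as they must.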

The post Statistics first appeared on JustToThePoint.

The post Plotting functions II first appeared on JustToThePoint.

Informally, a function is **increasing** if, as x gets larger (the x-value increases, looking left to right), f(x) (the y-value) increases or gets larger. Formally, a function is increasing on the interval (a, b) if ∀ x_{1}, x_{2} ∈ (a, b) -for any x_{1} and x_{2} in the interval-: x_{1} < x_{2} ⇒ f(x_{1}) ≤ f(x_{2}). Similarly, a function is **decreasing** if, as x gets larger (the x-value increases), f(x) (the y-value) decreases. Formally, a function is decreasing on the interval (a, b) if ∀ x_{1}, x_{2} ∈ (a, b): x_{1} < x_{2} ⇒ f(x_{1}) ≥ f(x_{2}).

We define a function in Maxima: f(x) := 8*x^3 +2*x +5; and plot it. The function is **monotonically increasing** (∀ x_{1}, x_{2}: x_{1} < x_{2} ⇒ f(x_{1}) ≤ f(x_{2})) because its derivative is always positive: 24x^{2} + 2 (*diff(f(x), x); returns the derivative of f(x) with respect to x*) > 0.

Let's plot a more complicated function with Maxima and WolframAlpha: f(x) := (x^2 - 4*x + 4)/(x - 3);

Its derivative is ^{(x -4)(x -2)}⁄_{(x - 3)2}. Observe that *diff(f(x), x);* returns the derivative of f(x) with respect to x, *ratsimp(%);* simplifies the result, and factor(%); factors it. As (x - 3)^{2} ≥ 0, we only need to worry about the sign of the numerator, (x -4)(x -2). Therefore, the critical points are x = 2, 4 (note that x = 3 is not in the domain of f). We only need to determine the sign of f'(x) over the intervals (-∞, 2), (2, 4), and (4, +∞).

*assume(x > 4);* tells Maxima to assume that x > 4. is(diff(f(x), x) > 0); returns true, so f'(x) > 0 if x > 4, and thus, f is increasing on (4, +∞).

We could have already reasoned that x > 4 ⇒ x > 2 ⇒ (x -4)(x -2) > 0 ⇒ f'(x) > 0. Another idea is diff(f(x), x); ev(%, x=5); returns ^{3}⁄_{4} > 0.

forget(x > 4); assume(x > 2, x < 4); is(diff(f(x), x) > 0); returns false, so f'(x) ≤ 0 (we really know that f'(x) < 0 because f' has only two roots: 2 and 4) if 2 < x < 4, and thus, f is decreasing on (2, 4).

forget(x > 2, x < 4); assume(x < 2); is(diff(f(x), x) > 0); return true, so f'(x) > 0 if x < 2, and thus, f is increasing on (-∞, 2).
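The whole sign analysis can be cross-checked numerically by sampling one point per interval (a plain-Python sketch; remember that x = 3 is excluded from the domain):

```python
# Sign of f'(x) = (x - 4)(x - 2) / (x - 3)^2, one sample point per interval.
def fprime(x):
    return (x - 4) * (x - 2) / (x - 3) ** 2

print(fprime(0) > 0)     # True: f is increasing on (-inf, 2)
print(fprime(2.5) < 0)   # True: f is decreasing on (2, 4), x = 3 excluded
print(fprime(5) > 0)     # True: f is increasing on (4, +inf); f'(5) = 3/4
```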

The** maxima and minima (the respective plurals of maximum and minimum) of a function** are *the largest and smallest value of the function, either within a given range* (**the local or relative maximum and minimum**; you can visualize them as kind of local hills and valleys), *or on the entire domain* (**the global or absolute maximum and minimum**).

Formally, a function f defined on a domain D has a **global or absolute maximum point** at x* if f(x*) ≥ f(x) ∀ x in D. Similarly, the function f has a** global or absolute minimum point** at x* if f(x*) ≤ f(x) ∀ x in D.

f is said to have **a local or relative maximum point** at x* if there exists an interval (a, b) containing x* such that f(x*) is the maximum value of f on (a, b) ∩ D: f(x*) ≥ f(x) ∀ x in (a, b) ∩ D. f is said to have **a local or relative minimum point** at x* if there exists an interval (a, b) containing x* such that f(x*) is the minimum value of f on (a, b) ∩ D: f(x*) ≤ f(x) ∀ x in (a, b) ∩ D.

The **critical points** of a function f are the x-values, within the domain (D) of f for which f'(x) = 0 or where f' is undefined. Notice that the sign of f' must stay constant between two consecutive critical points. **If the derivative of a function changes sign around a critical point, the function is said to have a local or relative extremum (maximum or minimum) at that point.** If f' changes sign from positive (increasing function) to negative (decreasing function), the function has a local or relative maximum at that critical point. Similarly, if f' changes sign from negative to positive, the function has a local or relative minimum. In our example, 2 is a local maximum (f'(x) > 0 in (-∞, 2), f'(x) < 0 in (2, 4)) and 4 is a local minimum (f'(x) < 0 in (2, 4), f'(x) > 0 in (4, +∞))

*If a function has a critical point for which f′(c) = 0 and the second derivative is positive (f''(c) > 0), then c is a local or relative minimum. Analogously, if the function has a critical point for which f′(c) = 0 and the second derivative is negative (f''(c) < 0), then c is a local or relative maximum*. If f'(c)=0 and f''(c)=0 or if f''(c) does not exist, then this second test is inconclusive.

*diff(f(x), x, 2);* yields the second derivative of f(x). *ev(%o71, x = 2);* (where %o71 is the output expression of factor(%);, the factored second derivative) evaluates the second derivative at x = 2 and returns -2 < 0, so 2 is a local maximum. ev(%o71, x = 4); returns 2 > 0, so 4 is a local minimum.
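If you prefer a numeric cross-check to Maxima, a central-difference approximation of the second derivative reproduces both values (a sketch; h is an arbitrary small step):

```python
# Numeric second-derivative test for f(x) = (x^2 - 4x + 4) / (x - 3),
# using the central difference (f(x+h) - 2f(x) + f(x-h)) / h^2.
def f(x):
    return (x**2 - 4*x + 4) / (x - 3)

def second_derivative(x, h=1e-4):
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

print(round(second_derivative(2), 4))   # ~-2.0 < 0, so x = 2 is a local maximum
print(round(second_derivative(4), 4))   # ~2.0 > 0, so x = 4 is a local minimum
```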

An **asymptote** is a *line such that the distance between the graph and the line approaches zero* as one or both of the x or y coordinates tends to infinity. In other words, it is a line that the graph approaches (it gets closer and closer, but never actually reach) as it heads or goes to positive or negative infinity.

y = 2 is a horizontal asymptote of the graph of the function y = ^{(2x2 -5x +3)}⁄_{(x2 +2x - 3)}: the function approaches 2 as x tends to +∞ or -∞.

**Vertical asymptotes** are *vertical lines* (perpendicular to the x-axis) of the form x = a (where a is a constant) *near which the function grows without bound*. The line x = a is a vertical asymptote if the limit of f(x), as x approaches a from the left or from the right, is +∞ or -∞.

We use the following commands: limit(f(x), x, -3, plus); (this is the limit from the right side of -3) returns -∞ and limit(f(x), x, -3, minus); (this is the limit from the left side of -3) returns ∞, so x = -3 is a vertical asymptote of the graph of the function y = ^{(2x2 -5x +3)}⁄_{(x2 +2x - 3)}. Notice that our complicated function ^{(2x2 -5x +3)}⁄_{(x2 +2x - 3)} could be simplified as ^{(2x -3)}⁄_{(x + 3)} (*factor(f(x));*)

When a linear asymptote is not parallel to the x- or y-axis, it is called an **oblique asymptote**. A function f(x) is asymptotic to y = mx + n (m ≠ 0) if the limits m = lim_{x→∞} ^{f(x)}⁄_{x} and n = lim_{x→∞} (f(x) - mx) both exist.

^{(3x2 -4x +9)}⁄_{(x - 7)} is asymptotic to y = 3*x + 17. x = 7 is a vertical asymptote, too.

We use WolframAlpha and Python:

user@pc:~$ python
Python 3.8.6 (default, May 27 2021, 13:28:02) [GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import * # SymPy is a Python library for symbolic mathematics.
>>> from sympy.plotting import plot # This class permits the plotting of sympy expressions using numerous backends (matplotlib, textplot, pyglet, etc.)
>>> x = symbols('x') # SymPy variables are objects of the Symbol class
>>> expr = (3*x**2-4*x+9)/(x-7) # Define our function as expr
>>> limit(expr/x, x, oo) # Limits are easy to use in SymPy. We compute the limit of expr/x as x increases without bound (∞)
3 # m = 3
>>> limit(expr-3*x, x, oo)
17 # n = 17. (3x² - 4x + 9)/(x - 7) is asymptotic to y = 3x + 17.
>>> limit(expr, x, 7, '+') # Calculate the limit from the right side of 7
oo
>>> limit(expr, x, 7, '-') # Calculate the limit from the left side of 7
-oo # x = 7 is a vertical asymptote
>>> p1 = plot(expr, (x, -1, 15), ylim=[-150,150], show=False) # We plot our function ...
>>> p2 = plot(3*x+17, (x, -1, 15), ylim=[-150,150], line_color="#ff0000", show=False) # ... and the oblique asymptote, too
>>> p1.append(p2[0])
>>> p1.show()

A function is called **convex** if *the line segment or chord between any two points A_{1}, A_{2} on the graph of the function lies above the graph between the two points* (the midpoint B lies above the corresponding point A_{0} of the graph of the function). f is concave if -f is convex.

Suppose that f is a twice differentiable function defined on an interval [a, b].

**Inflection points** are *where the function changes concavity (from being concave to convex or vice versa)*. *So the second derivative must be equal to zero to be an inflection point*. However, we have to make sure that the concavity actually changes at that point.

Let's plot x^{4} -4x^{2} with Maxima. First, we define the function: *f(x):= x^4 -4*x^2;* Second, we plot it: *wxplot2d([f(x)], [x,-3,3], [y,-5,5]);* Then, we ask Maxima for the second derivative (*diff(f(x), x, 2);* returns 12x^{2} -8), plot it (wxplot2d([12*x^2-8], [x,-3,3]);), and calculate its roots (solve(diff(f(x), x, 2)); returns ±sqrt(2/3)).

assume(x > sqrt(2/3)); is(diff(f(x), x, 2) > 0); returns true. f''(x) > 0, so f is convex in (√(2/3), ∞).

ev(%o25, x = 0); (%o25 is the output expression of diff(f(x), x, 2);) returns -8 < 0. f''(x) < 0, so f is concave in [-√(2/3), √(2/3)].

ev(%o25, x = -2); returns 40. f''(x) > 0, so f is convex in (-∞, -√(2/3)), and √(2/3) and -√(2/3) are **two inflection points because the concavity actually changes at them.**

Let's plot another function (5*x^{3} +2x^{2} -3x) with Google and Python (SymPy).

user@pc:~$ python
Python 3.8.6 (default, May 27 2021, 13:28:02) [GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import * # SymPy is a Python library for symbolic mathematics.
>>> from sympy.plotting import plot # This class permits the plotting of sympy expressions using numerous backends (matplotlib, textplot, pyglet, etc.)
>>> x = symbols('x') # SymPy variables are objects of the Symbol class
>>> expr = 5*x**3 +2*x**2 -3*x # Define our function as expr = 5x^{3} + 2x^{2} - 3x
>>> diff(expr, x, 2) # Calculate the second derivative with respect to x
2*(15*x + 2) # f'(x) = 15x^{2} + 4x - 3, f''(x) = 30x + 4
>>> deriv = diff(expr, x, 2)
>>> solve(deriv)
[-2/15] # ^{-2}⁄_{15} is a plausible inflection point. We need to study the sign of f'' in (-∞, ^{-2}⁄_{15}) and (^{-2}⁄_{15}, +∞)
>>> deriv.subs(x, 0) # subs evaluates the expression "deriv" (f'') at x = 0.
4 # f''(0) > 0, f is convex in (^{-2}⁄_{15}, +∞)
>>> deriv.subs(x, -1)
-26 # f''(-1) < 0, f is concave in (-∞, ^{-2}⁄_{15}) and ^{-2}⁄_{15} is an inflection point because the concavity actually changes at it.
>>> plot(expr, (x, -2, 2), ylim=[-2, 2]) # We plot our function

Let's try to visualize Covid data with gnuplot. I went to kaggle.com and downloaded a dataset with this format:

Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths/100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region

Afghanistan,36263,1269,25198,9796,106,10,18,3.5,69.49,5.04,35526,737,2.07,Eastern Mediterranean

Albania,4880,144,2745,1991,117,6,63,2.95,56.25,5.25,4171,709,17.0,Europe ...

First, we are going to **clean up our dataset**: *cut -d, -f 1-3 country_wise_latest.csv > myCovid.csv*. The cut command is a utility program that allows you to cut data from a text file. Cutting by column is super easy: you first need to choose a delimiter (the default is tab; we override it with the -d option as -d,), then you specify the column numbers with the -f option (1-3: Country, Confirmed, and Deaths).

Country/Region,Confirmed,Deaths

France,220352,30212

US,4290259,148011

India,1480073,33408

Italy,246286,35112

Russia,816680,13334

Spain,272421,28432

UK,301708,45844

Next, we are going to plot our data with gnuplot:

**set datafile separator ','** # It tells gnuplot that data fields in our input file myCovid.csv are *separated by ',' rather than by whitespace*

**set title "Covid visualization" font "Helvetica,18"** # It produces *a plot title* that is centered at the top of the plot.

**set key autotitle columnheader** # It causes the first entry in each column of input data (Confirmed, Deaths) to be interpreted as a text string and used as a title for the corresponding plot.

**plot for [i=2:3] "myCovid.csv" using i:xticlabels(1) ps 2 pt 7** # Plot using the 2nd and 3rd column; xticlabels(1) means that the 1st column of the input data file is to be used to label x-axis tic marks. Besides, ps 2 will generate points of twice the normal size and pt 7 indicates to use circular points.

We can also join our data points with lines: set style line 1 linecolor rgb '#ff0000' linetype 1 linewidth 2 pointtype 7 pointsize 2; *plot for [i=2:3] "myCovid.csv" using i:xtic(1) with linespoints linestyle 1*

Another common task is to create a histogram with gnuplot:

**set style data histogram** # Set the style to histograms

**set style fill solid border -1** # Draw solid-filled boxes with a black (linetype -1) border.

**plot for [i=2:3] "myCovid.csv" using i:xtic(1)** # Plot using the 2nd and 3rd column; xtic(1) means that the 1st column of the input data file is to be used to label x-axis tic marks.

Let's plot our data with LibreOffice Calc.

**Plotly is an open-source, interactive data visualization library for Python.** Plotly Express is the easy-to-use, high-level interface to Plotly.

user@pc:~$ python
Python 3.8.6 (default, May 27 2021, 13:28:02) [GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import plotly.express as px # Requirements: pip install pandas plotly
>>> df = px.data.gapminder() # We are going to load the built-in gapminder dataset from plotly
>>> df.query("country=='Spain'") # We can filter data by country
     country continent  year  lifeExp       pop     gdpPercap iso_alpha  iso_num
1416   Spain    Europe  1952   64.940  28549870   3834.034742       ESP      724
1417   Spain    Europe  1957   66.660  29841614   4564.802410       ESP      724
1418   Spain    Europe  1962   69.690  31158061   5693.843879       ESP      724
1419   Spain    Europe  1967   71.440  32850275   7993.512294       ESP      724
1420   Spain    Europe  1972   73.060  34513161  10638.751310       ESP      724
1421   Spain    Europe  1977   74.390  36439000  13236.921170       ESP      724
1422   Spain    Europe  1982   76.300  37983310  13926.169970       ESP      724
1423   Spain    Europe  1987   76.900  38880702  15764.983130       ESP      724
1424   Spain    Europe  1992   77.570  39549438  18603.064520       ESP      724
1425   Spain    Europe  1997   78.770  39855442  20445.298960       ESP      724
1426   Spain    Europe  2002   79.780  40152517  24835.471660       ESP      724
1427   Spain    Europe  2007   80.941  40448191  28821.063700       ESP      724
>>> spain_data = df.query("country=='Spain'")
>>> fig = px.bar(spain_data, x='year', y='pop', color='lifeExp', labels={'pop': 'Population of Spain'}, height=400, template='plotly_dark')

**Create a bar chart** showing the population of Spain (spain_data, y='pop', pop is short for population) over the period 1952-2007 (x='year'; the gapminder dataset runs in five-year steps up to 2007) and using a color gradient for lifeExp (color='lifeExp')

>>> fig.show()

As you can see in the graph below, Spain has seen an increase in both population (28.5 to 40.4 million) and life expectancy (64.9 to 80.9 years) over the last half century.

Let's plot a bubble chart (a type of scatter plot in which a third dimension of the data is shown through the size of the bubbles):

1. We filter by the year 2007 (df.query("year==2007")).
2. We plot life expectancy versus GDP per capita (x="gdpPercap", y="lifeExp").
3. We color the bubbles by continent (color="continent").
4. The size of the bubbles is set with the size parameter (size="pop", population).

>>> fig = px.scatter(df.query("year==2007"), x="gdpPercap", y="lifeExp", size="pop", color="continent", hover_name="country", log_x=True, size_max=60)
>>> fig.show()

Obviously, people in rich countries live longer than people in poor countries. There’s a strong relationship between GDP and life expectancy.

Let's create a choropleth map using Plotly. A **choropleth map** *displays divided geographical areas or regions that are coloured or patterned in relation or proportion to a statistical variable*.

>>> fig = px.choropleth(df,
...     locations='iso_alpha', # Countries as defined in the Natural Earth dataset. Natural Earth Vector draws boundaries of countries according to de facto status.
...     color='lifeExp', # lifeExp is short for life expectancy, a column of gapminder.
...     hover_name='country', # When hovering over a country, a tooltip shows the name of the country, the year, life expectancy, and the three-letter ISO country code.
...     animation_frame='year', # It allows us to animate the map and adds the play and stop buttons under it.
...     color_continuous_scale=px.colors.sequential.Plasma, # We can set a color scheme.
...     projection='natural earth')
>>> fig.show()

Let's plot e^(-(x^2+y^2)) in WolframAlpha (*plot e^-(x^2+y^2) x=-3..3, y=-3..3*) and Google. It is as simple as typing the function into your browser!

We are going to ask Maxima for **a 3d-plot and contour lines of the two-variable function sin(x^2) + cos(y^2)**:

f(x,y) := sin(x^2) + cos(y^2); # Define the two-variable function.
wxplot3d(f(x,y), [x,-%e,%e], [y,-%e,%e]); # Plots sin(x^2) + cos(y^2) over [-e, e] × [-e, e].

wxcontour_plot(f(x,y), [x,-%e,%e], [y,-%e,%e]); # Plots contour lines of sin(x^2) + cos(y^2).

Let's ask Maxima to create a surface plot and projected contour plot under it:

**mypreamble : "set contour; set cntrparam levels 20; unset key";** # Define a preamble with some gnuplot commands: set contour (enable contour drawing for surfaces); set cntrparam levels 20 (set the number of contour levels to 20); unset key (draw the plot with no visible key).

**wxplot3d(x*sin(y)+y*sin(x), [x, -6, 6], [y, -6, 6], [gnuplot_preamble, mypreamble]);** # Plots x*sin(y)+y*sin(x) within the ranges [-6, 6] and [-6, 6], using our predefined preamble.

Let's plot e^(-1/(x^2+y^2)) with gnuplot:

user@pc:~$ gnuplot

G N U P L O T
Version 5.2 patchlevel 8 last modified 2019-12-01

gnuplot> set hidden3d # The set hidden3d command enables hidden line removal for surface plotting.
gnuplot> set xlabel 'x' # Set the label for the x-axis.
gnuplot> set ylabel 'y' # Set the label for the y-axis.
gnuplot> set title "exp(-1/(x**2 + y**2))"
gnuplot> set isosamples 80 # Set a higher sampling rate. It will produce more accurate plots, but will take longer.
gnuplot> set contour # It enables contour drawing for surfaces.
gnuplot> splot [-3:3] [-3:3] exp(-1/(x**2 + y**2)) # Plot e^(-1/(x^2+y^2)) within the ranges [-3:3] and [-3:3].

user@pc:~$ gnuplot

G N U P L O T

Version 5.2 patchlevel 8 last modified 2019-12-01

gnuplot> **set hidden3d** # The set hidden3d command enables hidden line removal for surface plotting.

gnuplot> **set parametric **# *It changes the meaning of splot from normal functions to parametric functions*.

gnuplot> set xlabel 'x' # Set the label for the x-axis.

gnuplot> set ylabel 'y' # Set the label for the y-axis.

gnuplot> set title "cos(u)cos(v), sin(u)cos(v), sin(v)"

gnuplot> set isosamples 80 # Set a higher sampling rate. It will produce more accurate plots, but will take longer.

gnuplot> **set contour** # It enables contour drawing for surfaces.

gnuplot> **splot cos(u)*cos(v), sin(u)*cos(v), sin(v)** # Plot the parametric surface (cos(u)cos(v), sin(u)cos(v), sin(v)), a unit sphere.

The post Plotting functions II first appeared on JustToThePoint.

The post Plotting functions first appeared on JustToThePoint.

A **function** is *a relation or rule that defines a relationship between one variable* (the independent variable) *and another variable* (the dependent variable). A function is linear if it can be expressed in the form f(x)=mx + b. Its graph is a straight line, m is the slope, and b is the value of the function where x equals zero (it is also the point where the graph crosses the y-axis).

Gnuplot is a free, command-driven, interactive plotting program. The *set grid* command allows grid lines to be drawn on the plot. The *set xzeroaxis* command draws a line at y = 0. set* xrange [-5:5]* and *set yrange [-60:40]* set the horizontal and vertical ranges that will be displayed. plot and splot are the primary commands in Gnuplot, e.g., plot sin(x)/x, splot sin(x*y/20), or plot [-5:5] x**4-6*x**3+5*x**2+24*x-36, where x**4 is x^{4} and -6*x**3 is -6*x^{3}.

set terminal wxt size 400,250 enhanced font 'Verdana,12' persist # Use set terminal to tell gnuplot what kind of output to generate. The wxt terminal device generates output in a separate window.
set border linewidth 1.5 # set border controls the display of the graph borders for the plot and splot commands.
set style line 1 linecolor rgb '#0060ad' linetype 1 linewidth 2 # set style line defines a set of line types and widths and point types and sizes so that you can refer to them later by an index instead of repeating all the information at each invocation.
set style line 2 linecolor rgb '#dddd1f' linetype 1 linewidth 2
set xzeroaxis linetype 3 linewidth 2.5 # It draws the x-axis.
set xlabel 'x' # Set the label for the x-axis.
set ylabel 'y' # Set the label for the y-axis.
set key top left # The set key command enables a key (or legend) describing plots on a plot.
f(x) = 3*x + 5 # We define two linear functions.
g(x) = 4*x
# We plot f(x) using linestyle 1 and g(x) using linestyle 2.
plot f(x) title '3*x+5' with lines linestyle 1, \
     g(x) title '4*x' with lines linestyle 2

**Gnuplot can be run interactively** (user@pc:~$ **gnuplot**. The environment opens in the console and you can add your commands after gnuplot>. For instance, type *load 'myPlot.gnu'*) **or from script files** (ascii files that have commands written out just as you would enter them interactively): *gnuplot myPlot.gnu*.

**These are concurrent lines** (they intersect each other exactly at one point) and the point of concurrence is (5, 20) (3*x+5 = 4*x; -x=-5; x = 5).
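The intersection claimed above can be checked symbolically with SymPy (the same library used elsewhere in this post):

```python
from sympy import Eq, solve, symbols

x = symbols('x')
# Concurrent lines meet where 3x + 5 = 4x
sol = solve(Eq(3*x + 5, 4*x), x)
point = (sol[0], 4*sol[0])  # plug x back into g(x) = 4x
print(point)  # (5, 20), the point of concurrence
```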

1. We use an x-vector to store the x-values: x=-10:0.1:10; (start -10 : step 0.1 : end 10, it defines a sequence from start to end by adding increments of step).
2. Define our two functions: y = x.*2; z = x.*2 + 10; **We use element-by-element operations** (x.*2, x.*2 + 10) on the x-vector to store the function values in the y-vector and z-vector.
3. Finally, we use the command plot(x, y) to plot 2*x, or plot(x, y, x, z) to plot both functions.

Observe that *both lines are parallel because they have the same slope*. They have different y-intercepts (or y-value, an (x,y) point with x=0) so they are not coincident lines.

A quadratic function is a type of polynomial function. It has the form f(x) = ax^{2} + bx + c where a, b, and c are numbers with "a" not equal to zero. Its graph is a symmetric U-shaped curve called a parabola.
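Before plotting a parabola, you can locate its x-intercepts symbolically; here is a short SymPy check on -x^2 - 5x + 6, one of the quadratics plotted below:

```python
from sympy import discriminant, solve, symbols

x = symbols('x')
f = -x**2 - 5*x + 6          # a = -1, b = -5, c = 6
d = discriminant(f, x)       # b^2 - 4ac = 25 + 24
print(d)                     # 49 > 0, so two distinct real roots
print(sorted(solve(f, x)))   # [-6, 1]
```

A positive discriminant guarantees the parabola crosses the x-axis at two points.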

set terminal wxt size 400,250 enhanced font 'Verdana,11' persist # Use set terminal to tell gnuplot what kind of output to generate. The wxt terminal device generates output in a separate window.
set border linewidth 1.5 # set border controls the display of the graph borders for the plot and splot commands.
set style line 1 linecolor rgb '#0060ad' linetype 1 linewidth 2 # set style line defines a set of line types and widths and point types and sizes so that you can refer to them later by an index instead of repeating all the information at each invocation.
set style line 2 linecolor rgb '#dddd1f' linetype 1 linewidth 2
set xzeroaxis linetype 3 linewidth 2 # It draws the x-axis.
set xlabel 'x' # Set the label for the x-axis.
set ylabel 'y' # Set the label for the y-axis.
set yrange [-40:40] # The set yrange command sets the vertical range that will be displayed on the y-axis.
set xrange [-10:10] # The set xrange command sets the horizontal range that will be displayed on the x-axis.
set key top left # The set key command enables a key (or legend) describing plots on a plot.
f(x) = -x**2 - 5*x + 6 # We define two quadratic functions.
g(x) = x**2 + 2*x + 5
# We plot f(x) using linestyle 1 and g(x) using linestyle 2.
plot f(x) title '-x^2-5*x+6' with lines linestyle 1, \
     g(x) title 'x^2+2*x+5' with lines linestyle 2

Trigonometric or circular functions are functions related to angles. The basic trigonometric functions include the following functions: sine (sin(x)), cosine (cos(x)), tangent (tan(x)), cotangent (cot(x)), secant (sec(x)) and cosecant (csc(x)).

octave:1> x = -10:0.1:10; # We use an x-vector to store the x-values (start -10 : step 0.1 : end 10, it defines a sequence from start to end by adding increments of step).
octave:2> y = sin(x); # Define our sin function. When plotting in Octave the plot points are required to have their x-values stored in one vector and the y-values in another vector.
octave:3> plot(x, y); # We use the command plot to plot functions in Octave.
octave:4> title("Sin(x)") # We can add a title to an existing plot.
octave:5> xlabel("x") # We can add labels to an existing plot, too.
octave:6> ylabel("y")

The tangent function has vertical asymptotes wherever cos(x) = 0 (the function is undefined there): -π/2, π/2, 3π/2, etc.

The **exponential function** is the function *f(x)=e*^{x}, where e = 2.71828... is Euler's constant. More generally, an exponential function is a mathematical function of the following form: **f(x)=ab**^{x} where b is a positive real number (b > 0), b is not equal to 1 (b ≠ 1), and the argument x occurs as an exponent.

octave:1> x = -10:0.1:20; # We use an x-vector to store the x-values (start -10 : step 0.1 : end 20, it defines a sequence from start to end by adding increments of step).
octave:2> clf; # Clear the current figure window.
octave:3> plot(x, exp(x), 'r;Exponential;'); # We use the command plot to plot e^x in Octave, in red with the legend "Exponential".
octave:4> title("The exponential function") # We add a title.

Plotting functions in Google is as easy as it gets. Observe that 2^{x} (b > 1) doubles every time when we add one to its input x (exponential growth). On the contrary, (^{1}⁄_{2})^{x} (0 < b < 1) shrinks in half every time we add one to its input x (exponential decay). An exponential growth curve is rarely seen in nature. *Exponential growth is not sustainable because it depends on infinite amounts of resources which do not exist in the real world*.
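The doubling and halving behaviour is easy to verify numerically; a minimal sketch:

```python
# 2**n doubles with each unit step (exponential growth),
# while (1/2)**n shrinks in half (exponential decay).
growth = [2**n for n in range(6)]
decay = [(1/2)**n for n in range(6)]
print(growth)  # [1, 2, 4, 8, 16, 32]
print(decay)   # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
# Each term is exactly twice (respectively half) the previous one:
assert all(b == 2*a for a, b in zip(growth, growth[1:]))
assert all(b == a/2 for a, b in zip(decay, decay[1:]))
```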

The **natural logarithm** *of x is the power to which e would have to be raised to equal x*, e.g., ln 3.4 = 1.22377..., because e^{1.22377...} = 3.39998... It is the inverse function of the exponential function: e ^{lnx} = x. It slowly grows to +∞ as x increases, and slowly grows to -∞ as x approaches 0, so the y-axis is a vertical asymptote. In other words, the distance between the graph of the natural logarithm and x = 0 approaches zero as x gets closer to zero from the right.
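The example values above can be reproduced with Python's math module:

```python
import math

x = 3.4
lnx = math.log(x)        # natural (base e) logarithm
print(round(lnx, 5))     # ln 3.4 = 1.22377...
print(math.exp(lnx))     # exp undoes ln, recovering ~3.4
```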

log(x) represents the natural (base e) logarithm of x. However, **Maxima does not have a built-in function for the base 10 logarithm or other bases**. Do not worry my dear reader, for every problem there is a solution which is simple, clean and wrong -just joking.

We will use the change-of-base formula: log_a x = log_b x / log_b a, so the base-2 logarithm (also called the binary logarithm) is equal to the natural logarithm divided by log(2). Therefore, we could use *log10(x) := log(x)/log(10);* and *log2(x) := log(x)/log(2);* as useful definitions (radcan(log2(8)); = 3; radcan(expr); simplifies expr).
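The same two definitions translate directly to Python, which is a handy way to sanity-check them:

```python
import math

# Change of base: log_a(x) = log(x) / log(a), matching the Maxima
# definitions log10(x) := log(x)/log(10) and log2(x) := log(x)/log(2)
def log2(x):
    return math.log(x) / math.log(2)

def log10(x):
    return math.log(x) / math.log(10)

print(round(log2(8), 12))      # 3.0, since 2**3 = 8
print(round(log10(1000), 12))  # 3.0, since 10**3 = 1000
```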

We are going to plot |x|, √x, and sgn(x) using ZeGrapher, an open source, free, and easy to use math plotter. Install it, click on **Windows, Functions** or use the shortcut Ctrl + F, and type: sqrt(x) in the input box named "f(x)", abs(x) in the input box named "g(x)", and x/abs(x) in the input box named "h(x)".

A square root of a number x is a number "y" whose square is x: y^{2} = x. The domain of the square root function (y = sqrt(x)= √x) is all values of x such that x ≥ 0. The curve above looks like half of a parabola lying on its side.

The **absolute value or modulus** of x (it is represented by either abs(x) or |x| and read as "the absolute value of x") is *the non-negative value of x without regard to its sign*. It is *its distance from zero* on the number line. Namely, |x| = x if x is positive, |x| = −x if x is negative, and |0| = 0.

The **sign of a real number**, also called sgn or signum, is -1 for negative numbers, 0 for the number zero, or +1 for positive numbers. Any real number can be expressed as the product of its absolute value and its sign function: x=|x| * sgn(x), that's why we use the formula x/abs(x) to plot it. It is an **odd** (the following equation holds for all x: -f(x) = f(-x)) **mathematical function**. Besides, it is **not continuous** at x = 0.
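A quick Python sketch confirms both properties, the identity x = |x|·sgn(x) and the oddness of sgn:

```python
def sgn(x):
    """Signum: -1 for negative numbers, 0 for zero, +1 for positive numbers."""
    return (x > 0) - (x < 0)

for v in (-7.5, -1, 0, 2, 13):
    assert abs(v) * sgn(v) == v    # x = |x| * sgn(x)
    assert sgn(-v) == -sgn(v)      # odd function: -f(x) = f(-x)
print([sgn(v) for v in (-7.5, 0, 13)])  # [-1, 0, 1]
```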

Finally, we are going to plot the cube root using Matplotlib. The **cube root of a number** x is a number "y" that we multiply by itself three times to get x or, more formally, a number y such that y^3 = x. It is an **odd function** (its plot is *symmetric* with respect to the coordinate origin).

import numpy as np # NumPy is a library for array computing with Python
import matplotlib.pyplot as plt # Matplotlib is a plotting library for the Python programming language
X = np.arange(-10, 10, 0.01) # Return evenly spaced numbers within the interval [-10, 10] as the values on the x-axis.
def p(x): # This is the function that we are going to plot.
    return np.cbrt(x) # Return the cube root of an array, element-wise.
F = p(X) # The corresponding values on the y-axis are stored in F.
plt.ylim([-4, 4]) # Set the y-limits on the current axes.
plt.plot(X, F, color='r') # The values x, y are plotted using Matplotlib's plot() function.
plt.xlabel('x') # Set the label for the x-axis.
plt.ylabel('y') # Set the label for the y-axis.
plt.title('Cube root') # Set a title for the plot.
plt.axvline(0) # Add a vertical line across the Axes.
plt.axhline(0) # Add a horizontal line across the Axes.
plt.show() # The graphical representation is displayed by using the show() function.

A **piecewise-defined function** is a function *defined by two or more sub-functions, where each sub-function applies to a different interval in the domain*.

Let's plot a piecewise-defined function in Maxima. We are going to use Maxima's command *charfun(p);* It returns 0 when the predicate p evaluates to false; and 1 when the predicate evaluates to true.

*f(x):= charfun(x<2)*x^2 + charfun(x>=2)*4;* is how we define piecewise-defined functions in Maxima. For all values of x less than two, the first function (x^2) is used. For all values of x greater than or equal to two, the second function (the constant 4) is used.

Our function is *continuous* (it is continuous at every point in its domain; in other words, it does not have any abrupt changes in value -a function without breaks or gaps) because its constituent functions (x^{2}, 4) are continuous on their corresponding intervals or subdomains ( -∞, 2), [2, +∞) and there is no discontinuity at 2.

Now we will plot another piecewise defined function with Octave:

x = -5:0.01:-1; y = -x; plot(x, y, "linewidth", 4), hold on # 1. We use an x-vector to store the x-values: x=-5:0.01:-1. 2. Define our first sub-function: y = -x for x in [-5, -1]. 3. Plot it with line width 4. 4. hold on asks Octave to retain plot data and settings so that subsequent plot commands are displayed on a single graph.
x = -1:0.01:0; y = x.^2; plot(x, y, "linewidth", 4) # We define the sub-function y = x^2 for x in [-1, 0] and plot it.
x = 0:0.01:5; y = sqrt(x); plot(x, y, "linewidth", 4) # We define the sub-function y = √x for x in [0, 5] and plot it.

Finally, we will plot the following piecewise-defined function with Python using the Matplotlib library.

#!/usr/bin/env python
import numpy as np # NumPy is a library for array computing with Python
import matplotlib.pyplot as plt # Matplotlib is a plotting library for the Python programming language
import math
def piecewise(x): # This is our piecewise-defined function
    if x <= -1:
        return -x + 2
    elif -1 <= x <= 0:
        return pow(x, 2)
    else:
        return math.log(x)
vpiecewise = np.vectorize(piecewise) # np.vectorize transforms functions which are not numpy-aware into functions that can operate on (and return) numpy arrays.
x = np.linspace(-5, 5, 10000) # Return evenly spaced numbers over [-5, 5] as the values on the x-axis.
y = vpiecewise(x) # Values on the y-axis.
plt.axvline(0) # Add a vertical line across the Axes.
plt.axhline(0) # Add a horizontal line across the Axes.
plt.plot(x, y, color='r') # The values x, y are plotted using Matplotlib's plot() function.
plt.show() # The graphical representation is displayed by using the show() function.

The function is continuous at all points except x = -1, where the value jumps from 3 to 1, and x = 0, where log(x) → -∞ as x → 0⁺.


The post Polynomials, Equations, and Systems of equations first appeared on JustToThePoint.

A polynomial is an expression consisting of variables or indeterminates and coefficients, that involves only the operations of addition, subtraction, multiplication, and non-negative integer exponentiation of variables. If there's a single indeterminate x, a polynomial can always be rewritten as a_n*x^n + a_(n-1)*x^(n-1) + a_(n-2)*x^(n-2) + ... + a_1*x + a_0.

In general, if you want to factorize a polynomial, you may need to extract a common factor (x^2 + x = x(x + 1), 6x^4 - 12x^3 + 4x^2 = 2x^2(3x^2 - 6x + 2)), use special products (25x^2 - 49 = (5x)^2 - 7^2 = (5x - 7)(5x + 7), since (a + b)(a - b) = a^2 - b^2), or divide the polynomial by (x - a) using Ruffini's rule.
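Each of these factorizations can be checked with SymPy's factor, which picks the right strategy automatically:

```python
from sympy import factor, symbols

x = symbols('x')
print(factor(x**2 + x))                    # common factor: x*(x + 1)
print(factor(6*x**4 - 12*x**3 + 4*x**2))   # common factor: 2*x**2*(3*x**2 - 6*x + 2)
print(factor(25*x**2 - 49))                # difference of squares: (5*x - 7)*(5*x + 7)
```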

The roots or zeroes of a polynomial are those values of the variable that cause the polynomial to evaluate to zero. In other words, a root of a polynomial p(x) is a value "a" such that p(a) = 0.

vim factor.py

import sympy
x = sympy.Symbol('x') # SymPy variables are objects of the Symbol class.
polynomial = x**4 - 6*x**3 + 5*x**2 + 24*x - 36 # polynomial = x^4 - 6x^3 + 5x^2 + 24x - 36
print(sympy.solve(polynomial)) # sympy.solve algebraically solves equations and polynomials. The roots of polynomial are [-2, 2, 3]
print(sympy.factor(polynomial)) # sympy.factor takes a polynomial and factors it into irreducible factors: (x - 2)*(x + 2)*(x - 3)**2
polynomial2 = 2*x**4 + x**3 - 8*x**2 - x + 6
print(sympy.solve(polynomial2))
print(sympy.factor(polynomial2))
polynomial3 = x**4 - 16
print(sympy.solve(polynomial3))
print(sympy.factor(polynomial3))
import numpy as np # NumPy is a library for array computing with Python
import matplotlib.pyplot as plt # Matplotlib is a plotting library for the Python programming language
X = np.linspace(-5, 5, 50, endpoint=True) # Return evenly spaced numbers over the interval [-5, 5] as the values on the x-axis.
def p(x):
    return x**4 - 6*x**3 + 5*x**2 + 24*x - 36 # This is the polynomial that we are going to plot.
F = p(X) # The corresponding values on the y-axis are stored in F.
plt.ylim([-60, 40]) # Set the y-limits on the current axes.
plt.plot(X, F) # These values are plotted using Matplotlib's plot() function.
plt.show() # The graphical representation is displayed by using the show() function.

An **equation is a statement that says that two mathematical expressions are equal**. An algebraic equation will always have an equal "=" sign and at least one unknown number or quantity, called a variable, represented by a letter (x, y, or z). Some examples of equations are 7x = 12, 2x + 6 = 8, x^2 + 4x + 4 = 0, etc. Solving an equation means finding the value of the unknown number, quantity or variable. The degree of an equation is the highest exponent to which the unknown or unknowns are raised.

The solution is: x = (-b ± √(b^2 - 4ac))/(2a) = (0 ± √(0^2 - 4·1·4))/(2·1) = (0 ± √-16)/2 = (0 ± 4i)/2 = ±2i. expand((x-2*%i)*(x+2*%i)); returns x^2 + 4. There are no real roots and the graph does not cross or touch the x-axis. Its graph tells us that the roots of the equation are complex numbers.

import sympy # Let's solve it in Python using SymPy
x = sympy.Symbol('x') # SymPy variables are objects of the Symbol class.
equation = x**2 + 4 # Define the equation x^2 + 4 = 0
print(sympy.solve(equation)) # sympy.solve algebraically solves equations and polynomials. The roots of the equation are [-2*I, 2*I]
print(sympy.factor(equation)) # We ask SymPy to factor our equation as (x - 2i)(x + 2i), but factor works over the rationals by default, so x**2 + 4 is returned unchanged
import numpy as np # NumPy is a library for array computing with Python
import matplotlib.pyplot as plt # Matplotlib is a plotting library for Python
X = np.linspace(-5, 5, 50, endpoint=True) # Return evenly spaced numbers over the interval [-5, 5] as the values on the x-axis.
def p(x):
    return x**2 + 4 # This is the equation that we are going to plot.
F = p(X) # The corresponding values on the y-axis are stored in F.
plt.ylim([0, 30]) # Set the y-limits on the current axes.
plt.plot(X, F) # These values are plotted using Matplotlib's plot() function.
plt.show() # The graphical representation is displayed by using the show() function.

A system of linear equations consists of two or more equations made up of two or more variables such that all equations are considered simultaneously. A **system of equations in two variables** consists or comprises of two linear equations of the form

a*x + b*y = c

d*x + e*y = f

The solution to a system of linear equations in two variables is any ordered pair (x, y) that satisfies each equation independently, the point at which the lines representing the linear equations intersect.


user@pc:~$ python
Python 3.8.6 (default, May 27 2021, 13:28:02) [GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sympy
>>> x = sympy.Symbol('x')
>>> y = sympy.Symbol('y')
>>> sympy.solve([3*x+2*y-7, 8*x-6*y+4]) # A unique solution: the lines intersect at (1, 2)
{x: 1, y: 2}
>>> sympy.solve([3*x+2*y-7, 9*x+6*y-21]) # The second equation is a multiple of the first: infinitely many solutions
{x: 7/3 - 2*y/3}
>>> sympy.solve([3*x+2*y-7, 6*x+4*y-15]) # Parallel lines: no solution
[]

An inequality is a relation that makes a non-equal comparison between two numbers or mathematical expressions, e.g., -3x-7<2x-5, 5y-2>14, etc.

You can always solve the equation x^2 - 7*x + 10 = (x-2)(x-5) = 0 and prove that x^2 - 7*x + 10 <= 0 between its roots: on (-∞, 2), x^2 - 7*x + 10 > 0; on [2, 5], x^2 - 7*x + 10 <= 0; on (5, +∞), x^2 - 7*x + 10 > 0.
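SymPy can carry out this sign study directly; solve_univariate_inequality returns the interval on which the quadratic x^2 - 7x + 10, which factors as (x - 2)(x - 5), is non-positive:

```python
from sympy import Interval, solve_univariate_inequality, symbols

x = symbols('x')
q = x**2 - 7*x + 10              # factors as (x - 2)*(x - 5)
sol = solve_univariate_inequality(q <= 0, x, relational=False)
print(sol)                       # Interval(2, 5): q <= 0 exactly between its roots
```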

