Thursday, 16 April 2015

Python and R: Basic Sampling Problem

In this post, I would like to share a simple problem about sampling analysis. And I will demonstrate how to solve this using Python and R. The first two problems are originally from Sampling: Design and Analysis book by Sharon Lohr.

Problems

  1. Let $N=6$ and $n=3$. For purposes of studying sampling distributions, assume that all population values are known.

    $y_1 = 98$$y_2 = 102$$y_3=154$
    $y_4 = 133$$y_5 = 190$$y_6=175$

    We are interested in $\bar{y}_U$, the population mean. Consider eight possible samples chosen.

    Sample No.Sample, $\mathcal{S}$$P(\mathcal{S})$
    1$\{1,3,5\}$$1/8$
    2$\{1,3,6\}$$1/8$
    3$\{1,4,5\}$$1/8$
    4$\{1,4,6\}$$1/8$
    5$\{2,3,5\}$$1/8$
    6$\{2,3,6\}$$1/8$
    7$\{2,4,5\}$$1/8$
    8$\{2,4,6\}$$1/8$

Friday, 6 March 2015

Probability Theory: Convergence in Distribution Problem

Let's solve some theoretical problem in probability, specifically on convergence. The problem below is originally from Exercise 5.42 of Casella and Berger (2001). And I just want to share my solution on this. If there is an incorrect argument below, I would be happy if you could point that to me.

Problem

Let $X_1, X_2,\cdots$ be iid (independent and identically distributed) and $X_{(n)}=\max_{1\leq i\leq n}x_i$.
  1. If $x_i\sim$ beta(1,$\beta$), find a value of $\nu$ so that $n^{\nu}(1-X_{(n)})$ converges in distribution;
  2. If $x_i\sim$ exponential(1), find a sequence $a_n$ so that $X_{(n)}-a_n$ converges in distribution.

Solution

  1. Let $Y_n=n^{\nu}(1-X_{(n)})$, we say that $Y_n\rightarrow Y$ in distribution. If $$\lim_{n\rightarrow \infty}F_{Y_n}(y)=F_Y(y).$$ Then, $$ \begin{aligned} \lim_{n\rightarrow\infty}F_{Y_n}(y)&=\lim_{n\rightarrow\infty}P(Y_n\leq y)=\lim_{n\rightarrow\infty}P(n^{\nu}(1-X_{(n)})\leq y)\\ &=\lim_{n\rightarrow\infty}P\left(1-X_{(n)}\leq \frac{y}{n^{\nu}}\right)\\ &=\lim_{n\rightarrow\infty}P\left(-X_{(n)}\leq \frac{y}{n^{\nu}}-1\right)=\lim_{n\rightarrow\infty}\left[1-P\left(-X_{(n)}> \frac{y}{n^{\nu}}-1\right)\right]\\ &=\lim_{n\rightarrow\infty}\left[1-P\left(\max\{X_1,X_2,\cdots,X_n\}< 1-\frac{y}{n^{\nu}}\right)\right]\\ &=\lim_{n\rightarrow\infty}\left[1-P\left(X_1< 1-\frac{y}{n^{\nu}},X_2< 1-\frac{y}{n^{\nu}},\cdots,X_n< 1-\frac{y}{n^{\nu}}\right)\right]\\ &=\lim_{n\rightarrow\infty}\left[1-P\left(X_1< 1-\frac{y}{n^{\nu}}\right)^n\right],\;\text{since}\;X_i's\;\text{are iid.} \end{aligned} $$

Thursday, 26 February 2015

R: How to Layout and Design an Infographic

As promised from my recent article, here's my tutorial on how to layout and design an infographic in R. This article will serve as a template for more infographic design that I plan to share on future posts. Hence, we will go through the following sections:
  1. Layout - mainly handles by grid package.
  2. Design - style of the elements in the layout.
    • Texts - use extrafont package for custom fonts;
    • Shapes (lines and point characters) - use grid, although this package has been removed from CRAN (as of February 26, 2015), the compressed file of the source code of the package is still available. But if I am not mistaken, by default this package is included in R. You might check it first before installing.
    • Plots - several choices for plotting data in R: base plot, lattice, or ggplot2 package.

The Infographic

We aim to obtain the following layout and design in the final output of our code: