Milwaukee Data Science: Next Steps

The purpose of this Meetup is to provide a safe, pleasant, and convenient venue (physical and virtual) to facilitate the growth and nurturing of the Data Science community in Southeastern Wisconsin.

Because Data Science touches nearly every field of human endeavor, this group aims to foster dialogue and discussion among all realms of mastery. Strategic advisers, managers, investors, health-care administrators, and quantitative analysts will feel equally at home here, as will mathematicians, statisticians, ecologists, biologists, and social scientists. All realms of mastery are invited and welcome to join.

Given the broad application of technology here, all members of the Information Technology field are welcome to join, in roles ranging from technical (developers, architects, system and DevOps administrators) to project managers and business analysts.

Experts and neophytes with an interest in particular languages, environments, frameworks, and technologies have a home here. Be it R, Python, Java, or Octave, there is a place for everyone to learn and share. Our doors are equally open to practitioners applying specific technological offerings in any industry and on any platform.

With a laid-back approach open to all ideas, this group will reflect the contributions and participation of its members, in whatever form time and resources permit.

How to read a 1 GiB file into memory in R

Here is the start of a short but useful discussion on how to read large data
sets into memory. Such questions appear frequently on the list, but this one
seemed to touch upon many of the recommended approaches:
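
As a rough illustration of one commonly suggested approach, here is a minimal base R sketch. It declares the column types up front with colClasses and passes nrows, so that read.csv does not have to guess types for a large input. The file, its columns, and the row count are made up for the example and are not taken from the discussion itself.

```r
# Hypothetical stand-in for a large file: a small CSV written to a temp path.
path <- tempfile(fileext = ".csv")
n <- 1000L
write.csv(
  data.frame(
    id = seq_len(n),
    value = seq_len(n) / 10,
    label = rep(c("a", "b"), n / 2)
  ),
  path,
  row.names = FALSE
)

# Declaring column types skips type guessing; nrows tells the reader how many
# rows to expect (a mild overestimate is fine for a real file).
big <- read.csv(
  path,
  colClasses = c("integer", "numeric", "character"),
  nrows = n
)

# Report the in-memory size of the result.
print(object.size(big), units = "Kb")
```

For a real 1 GiB file, the column types would have to match the actual data, and checking them first on the first few thousand rows is a cheap way to catch mistakes.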

ProPublica’s Guide to Mechanical Turk

ProPublica’s Guide to Mechanical Turk is well worth a read.

quantmod Quantitative Financial Modelling & Trading Framework for R

The quantmod package for R is designed to assist the quantitative trader in the development, testing, and deployment of statistically based trading models.
What quantmod IS
A rapid prototyping environment, where quant traders can quickly and cleanly explore and build trading models.
What quantmod is NOT
A replacement for anything statistical. It has no ‘new’ modelling routines or analysis tool to speak of. It does now offer charting not currently available elsewhere in R, but most everything else is more of a wrapper to what you already know and love about the language and packages you currently use.
quantmod makes modelling easier by removing the repetitive workflow issues surrounding data management, modelling interfaces, and performance analysis.
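
To give a feel for the "wrapper" style described above, here is a small sketch that assumes the quantmod package is installed. The prices are invented and built by hand. Real use would more typically start from getSymbols(), which needs network access and is not shown here.

```r
library(quantmod)  # also attaches xts and zoo

# Made-up OHLC prices for five days, for illustration only.
dates <- as.Date("2020-01-01") + 0:4
prices <- xts(
  cbind(
    Open  = c(100, 101, 102, 103, 104),
    High  = c(101, 102, 103, 104, 105),
    Low   = c( 99, 100, 101, 102, 103),
    Close = c(100, 102, 101, 104, 104)
  ),
  order.by = dates
)

# Cl() picks out the close column by name; Delt() computes the one-period
# percent change, leaving NA in the first row.
close_px <- Cl(prices)
returns <- Delt(close_px)
print(returns)
```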

Nice example of converting wide to tall data with tidyr

Nice example.
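
The transcript that follows uses gather(), which current tidyr releases mark as superseded in favour of pivot_longer(). As a hedged sketch of the same wide-to-tall reshape, here is the pivot_longer() form on a cut-down copy of the data. The xdf_small data frame is made up for brevity: it holds only the first two rows and stations A and B.

```r
library(tidyr)

# First two rows of the example data, rebuilt by hand with two stations only.
xdf_small <- data.frame(
  year = c(1961, 1961), month = c(1, 1), day = c(1, 2),
  A = c(1, 2), B = c(5, 6)
)

# Counterpart of gather(station, discharge, -year, -month, -day).
tall <- pivot_longer(
  xdf_small,
  cols = -c(year, month, day),
  names_to = "station",
  values_to = "discharge"
)
print(tall)
```

One difference to expect: pivot_longer() emits all stations for the first input row, then all stations for the second, whereas gather() lists every row for station A before moving on to B.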

> x <- structure(c(1961, 1961, 1961, 1961, 1, 1, 1, 1, 1, 2, 3
+         , 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
+         , 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
+         , 28, 29, 30, 31, 32, 33, 34, 35, 36)
+     , .Dim = c(4L, 12L)
+     , .Dimnames = list(NULL, c("year", "month", "day", "A", "B", "C"
+         , "D", "E", "F", "G", "H", "I"))
+     )
> xdf <- as.data.frame(x)
> xdf
  year month day A B  C  D  E  F  G  H  I
1 1961     1   1 1 5  9 13 17 21 25 29 33
2 1961     1   2 2 6 10 14 18 22 26 30 34
3 1961     1   3 3 7 11 15 19 23 27 31 35
4 1961     1   4 4 8 12 16 20 24 28 32 36
> require(tidyr)
> require(dplyr)
> xdf %>% gather(station, discharge, -year, -month, -day)
   year month day station discharge
1  1961     1   1       A         1
2  1961     1   2       A         2
3  1961     1   3       A         3
4  1961     1   4       A         4
5  1961     1   1       B         5
6  1961     1   2       B         6
7  1961     1   3       B         7
8  1961     1   4       B         8
9  1961     1   1       C         9
10 1961     1   2       C        10
11 1961     1   3       C        11
12 1961     1   4       C        12
13 1961     1   1       D        13
14 1961     1   2       D        14
15 1961     1   3       D        15
16 1961     1   4       D        16
17 1961     1   1       E        17