This is a nice overview of the problem and one or two solutions.
Milwaukee Data Science: Next Steps
The purpose of this Meetup is to provide a safe, pleasant, and convenient venue (physical and virtual) to facilitate the growth and nurturing of the Data Science community in Southeastern Wisconsin.
Since Data Science touches nearly every human endeavor, this group serves to facilitate dialogue and discussion between all realms of mastery. Strategic advisers, managers, investors, health-care administrators, and quantitative analysts will feel equally at home here, as will mathematicians, statisticians, ecologists, biologists, and social scientists. All realms of mastery are invited and welcome to join.
Given the limitless application of technology here, all members of the Information Technology field are welcome to join. All roles are welcome, ranging from technical (developers, architects, system and devops administrators) to project managers and business analysts.
Experts and neophytes with interest in particular languages, environments, frameworks, and technologies have a home here. Be it R, Python, Java, or Octave, there is a place for everyone to learn and share. Our doors are equally open to practitioners applying specific technological offerings for every industry and platform.
With a laid-back approach open to all ideas, this group will reflect the contributions and participation of its members, in whatever form time and resources permit.
How to read a 1 GiB file into memory in R
Here is the start of a small but good discussion on how to read large data
sets into memory. Such questions appear frequently on the list, but this one seemed to
touch upon a lot of the recommended approaches:
- ColByCol
  - Mentions true cost of loading a 1 GiB file with read.table, which is much more than 1 GiB
- R.filesets
- Large memory and out-of-memory data in High-Performance and Parallel Computing with R
- Use a real database with SQLite or sqldf
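As a rough illustration of why the read.table cost matters, here is a minimal base R sketch. It is not taken from the linked discussion: the tiny temporary file, the column names, and the chosen types are all made up for the example. It shows two widely recommended habits, declaring column types up front and skipping columns you do not need.

```r
# Hypothetical stand-in data; a real case would point `path` at the large file.
path <- tempfile(fileext = ".csv")
demo <- data.frame(
  id    = 1:1000,
  value = rnorm(1000),
  label = sample(letters, 1000, replace = TRUE)
)
write.csv(demo, path, row.names = FALSE)

# Declaring colClasses spares read.table from guessing types,
# and "NULL" drops a column instead of loading it.
# nrows (when known) and comment.char = "" also help read.table allocate less.
wanted <- read.table(
  path,
  header       = TRUE,
  sep          = ",",
  colClasses   = c("integer", "numeric", "NULL"),
  nrows        = 1000,
  comment.char = ""
)

# Compare the in-memory footprint with the size on disk.
print(object.size(wanted), units = "Kb")
print(file.size(path))
```

On a genuinely large file the same arguments apply unchanged; only `path`, `colClasses`, and `nrows` would need to match the real data.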
R graph gallery
Nirmala – Buddha at the Gas Pump Interview
Be compassionate
Be [compassionate], for every man is fighting a hard battle
— John Watson, in 1903
Some Lessons
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
Via r-help.
ProPublica’s Guide to Mechanical Turk
ProPublica’s Guide to Mechanical Turk seems well worth a read.
Reading Codebook Files in R
Here is a great discussion, via r-bloggers, about how to use codebooks automatically in R.