What’s the Big Deal about Python?

February 8, 2016
In research and data analysis, a lot of work is routine and repetitive. This is the perfect place for automation, where we tell our computers to do the work for us. How? We write instructions that computers understand, which is what “programming” means.

Why Would I Learn Programming at All?

Programming is a skill. With it, you can accomplish a lot more in your research. Many scientific projects that require a lot of human labor can be done quickly with a computer program—as long as the task is routine and repetitive. Web scraping and visualization tools can be used (and re-used) with a few lines written on your computer. If you conduct scientific experiments and collect quantitative or qualitative data, you can use programming to run statistical analyses efficiently, reproducibly, and error free.

Why Python?

Programming languages differ along three big variables:

  • Language purpose
  • Learnability
  • Community

Some languages are domain-specific: built for a specific field of work (like IDL for astronomers). Your discipline may have such a language. General-purpose languages, like C or Java, have the power to complete any task, but you might have to build the tools yourself. Python is the best of both worlds: it has the functionality of a general-purpose language, with different “packages” for different disciplines (scientific computing, geospatial analysis, text processing, etc.). When using Python, you can import the necessary packages and dive headfirst into your research.
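To give a sense of what “importing a package” looks like in practice, here is a minimal sketch. It uses only modules that ship with Python (csv, io, and statistics) so it runs without installing anything; the discipline-specific packages mentioned above are imported the same way but are installed separately. The temperature data and the summarize function are made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical measurements; in real work these would be read from a file.
RAW_DATA = """sample,temperature_c
a,21.5
b,22.0
c,20.5
d,24.0
"""


def summarize(csv_text, column):
    """Return the (mean, median) of one numeric column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return statistics.mean(values), statistics.median(values)


mean_temp, median_temp = summarize(RAW_DATA, "temperature_c")
print(f"mean = {mean_temp:.2f}, median = {median_temp:.2f}")
```

Because the analysis lives in a function, re-running it on next week's data is a one-line change, which is where the time savings on repetitive work come from.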

Not all languages are equally learnable, especially as a first language. For new programmers, notation makes a big difference in learnability: C- and Java-style syntax is not significantly easier to learn than a language with randomly chosen syntax. Python is friendly to novices. Even the undergraduate introductory computer science class is in Python. Most languages can do what you need, but your efficiency depends on how quickly you can learn the language.

The user community of each language makes a big difference when learning to program. Do people in your field use this language? Are there individuals actively improving the language? This is partially a popularity contest: a language stays relevant because of who uses it. Languages that are privately developed (like MATLAB for engineering) struggle to fund development and charge licensing fees to balance budgets. Alternatively, languages that are open source are actively developed by volunteers. Python is the fifth most popular programming language and is open source. It maintains an active online community and sharing culture, especially within its different “packages.” (The Scientific Python community holds a conference every year for academics, industry folks, and governmental organizations to collaborate.)

If you have never programmed and are working on a research problem, Python is almost certainly the best language to try first.

How Do I Get Started?

There are many ways to start learning, including a variety of online resources. Regardless, learning with a support group (or a labmate in your department) is more effective. If you’re near the Berkeley campus, there are trainings at D-Lab and weekly working groups like The Hacker Within. Your department might have a computing center, like the Geospatial Innovation Facility. There’s even a full course about Python and data science.

In an effort to mentor novice programmers from all disciplines, I’m starting a weekly working group devoted to just that: Learning Python. Each Monday from 5:00 p.m. to 6:30 p.m. in 356 Barrows, we’ll dive into a new topic about programming in Python. There will be an emphasis on learning useful tools for science and social science researchers. All learners of all skill levels are welcome—we start Monday, February 8.

Learning how to code will change the way you do research. Python is versatile and beginner-friendly, which makes it a great first programming language. There are many ways to get started, especially if you’re at or near UC Berkeley. And if you’re just getting started, stop by my working group on Monday afternoons.