Today I began learning Python and ended the day with a script that successfully automated a process for me. Nope, I’ve never written Python code before and don’t code for my day job. In fact, it’s been over 20 years since I first took an introductory computer science class. The most significant project I worked to completion was fewer than 500 lines of Perl, an Electronic Data Interchange (EDI) translator for the odd purchase order format that our customer used. But I’ve laid out two goals for myself: work up to light scripting competence in Python and light analysis competence in R. OK, those two goals are over 10 years old, so what got me moving?
A few weeks ago, the online SMB IT community, Spiceworks, downsized quite a few members of its workforce, and a call went out to community members to help out by sharing job listings, mostly in the Austin area. My employer has a building right next door, so I thought I could help out. We have an internal portal with Brassring which can provide customized alerts as jobs are listed, so I signed up for an Austin alert. It started showing up daily, but the web links went to a site which didn’t seem to use our single sign-on solution. Even if it did, I’d rather have provided people with publicly available links.
The emails did provide a job reference code, which I could copy/paste into our public career portal. That got me an intermediate search page with a title/link that I could click through but not copy/paste with the hyperlink intact. It was painful to do manually.
I did it a few times, but very early on I realized it was something I’d have to automate or I wouldn’t do it. Around this time, my wife and I listened to Reshma Saujani’s TED Talk and had an interesting discussion about methods of teaching and learning to code. I dusted off my “Concepts and patterns of thinking are more important than syntax” argument, directly channeling my CS1 professor. Which got me thinking: I’ve already learned lots of the 100-level concepts of programming. This was my chance to put theory into practice.
I wanted my workflow to look more like this:
Extract and manipulate text? I could dust off my Perl skills, but I wondered if I could use the opportunity to motivate myself to learn Python, as I’d promised myself I’d do so long ago. VMware has SDKs for both languages, so that didn’t help me make a decision. I’d vaguely heard that Python’s libraries to scrape web sites were a bit easier to use than Perl’s, and on that tenuous premise, I started learning Python.
OK, I had a big advantage in learning Python. It’s a C-like language, not too different from Perl. I didn’t have to learn the concept of cardinal numbers, arrays/lists, iteration, loops, or logic testing. I’ve gotten my mind around the idea of pointers, references, and objects. Mostly it was a matter of learning Python’s syntax and the use of the specialized libraries. It was simple to learn to load Python libraries, but using the libraries themselves was as challenging as understanding the problem each library was designed to solve.
I’d start out copy/pasting the job list to an input file, read it, extract the job codes, search for them on the public job site, read the search results, extract the title and URL, then output it into a second file. Not too hard.
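As a rough illustration of that plan, here is a minimal sketch of the first and last steps wired together. It is not the original script. The job-code pattern (digits followed by "BR") is an assumption about what the alert emails contain, and `lookup` is a hypothetical stand-in for the search-and-scrape step, passed in as a parameter so the outline can be tried without touching the network.

```python
import re

# Assumption: job codes look like digits followed by "BR" (e.g. "12345BR").
# The real format in the alert emails may differ.
JOB_CODE = re.compile(r"\b\d+BR\b")


def extract_job_codes(lines):
    """Return the first job code found on each line, skipping lines with none."""
    codes = []
    for line in lines:
        match = JOB_CODE.search(line)
        if match:
            codes.append(match.group(0))
    return codes


def build_job_list(lines, lookup):
    """Turn pasted alert lines into a list of (title, url) pairs.

    `lookup` is a hypothetical callable that takes a job code and returns
    (title, url). In a real script it would search the public job site and
    scrape the results page.
    """
    return [lookup(code) for code in extract_job_codes(lines)]
```

Keeping the web lookup behind a plain function argument means the text-handling part can be checked with a fake lookup before any scraping code exists.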
Well, I actually had to install Python first. Then learn how to open and parse a file and extract the job ID from each line. Opening a web page requires a library, so I learned how to include and use that library. Parsing the resulting web page required installing yet another library, BeautifulSoup, then learning how to use it to extract the exact elements in the results page I needed. Then writing my results to a second file.
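The fetch-and-parse steps might look something like the sketch below. It uses `urllib` from the standard library to open the page and BeautifulSoup to pull out the first result. The `job-title` class and the example URLs are made up for illustration, since the markup of the real results page isn't shown here, so the `find` call would need adjusting to match the actual page.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def fetch(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def first_result(html, base_url):
    """Return (title, absolute_url) for the first result link, or None.

    Assumption: each result is an <a class="job-title"> element. The real
    page will use different markup, so this lookup needs adjusting.
    """
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a", class_="job-title")
    if link is None or not link.get("href"):
        return None
    # Resolve relative links against the page they came from.
    return link.get_text(strip=True), urljoin(base_url, link["href"])
```

With a results URL in hand, usage would be along the lines of `first_result(fetch(search_url), search_url)`, where `search_url` is whatever address the public portal uses for a job-code search.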
Over and over again, I found that the process of learning Python matched my previous scripting experience. Having the core logic in pseudocode was key. Everything else was translating intention into valid code. Nothing took long to write. What took time was learning to do the exact thing I wanted in the Python idiom. Or learning the intended use of the BeautifulSoup library and how to navigate the data structure it created. Oh yeah, data structures, another thing I’d already gotten my mind around.
So, it works. I can copy long lists of jobs with generic titles and useless-to-me hyperlinks into a file, run the script, and get an HTML file with a list of specific job titles hyperlinked to the public-facing web site. I open each link in a tab to read the job description to see if it’s appropriate to send to the people who are looking.
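Producing that HTML file takes very little code. The sketch below is one way to do it, assuming the script has already collected a list of `(title, url)` pairs. The function names and page layout are illustrative, not taken from the original script.

```python
from html import escape


def render_job_list(jobs):
    """Build a small HTML page from (title, url) pairs."""
    items = [
        # Escape both pieces so characters like & or " can't break the markup.
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in jobs
    ]
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n<ul>\n"
        + "\n".join(items)
        + "\n</ul>\n</body>\n</html>\n"
    )


def write_job_list(jobs, path):
    """Write the rendered page to `path` as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_job_list(jobs))
```

Escaping the title and URL matters more than it might seem, because job titles such as "R&D Engineer" contain characters that are special in HTML.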
I know there will be a part 2 to this post. Sanity checking my input for blank lines or missing job codes (if I make a mistake in copying). What if the job was pulled or filled since I got the email and the search results page comes back with no job? I’ll also explore what it would take to go from Python script to web app written in Python. I assume that’s what it would take to have expandable in-line job descriptions and the ability to dynamically remove a job entry in the results. Perhaps even email someone the results directly? I’d also like to explore how Python is used to interact with VMware products.
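For the input sanity check, one possible shape is a function that separates usable job codes from problem lines and records why each line was rejected. As before, the digits-plus-"BR" pattern is an assumed code format and would need to match whatever the alert emails actually use.

```python
import re

# Assumption: job codes look like digits followed by "BR" (e.g. "12345BR").
JOB_CODE = re.compile(r"\b\d+BR\b")


def check_input(lines):
    """Split pasted lines into job codes and a list of (line_number, problem)."""
    codes, problems = [], []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            problems.append((number, "blank line"))
            continue
        match = JOB_CODE.search(text)
        if match is None:
            problems.append((number, "no job code found"))
            continue
        codes.append(match.group(0))
    return codes, problems
```

Reporting the line number alongside each problem makes it easy to go back to the email and recopy the bit that was missed. The pulled-or-filled case is a separate check: the scraping step would have to notice that the results page has no job link and skip that code.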
Anyone else using Python for casual automation? Web scraping in other contexts? Python with vSphere or other products in the VMware portfolio?
- Workflow1-Manual: JohnWhite
- Workflow2-Semi-Automated: JohnWhite