Entries in databases (4)

Thursday
Mar062014

An amusing story about databases and pants

I've found that my current career as an IT support guy has been rather rough on my clothes.  Crawling under desks, hauling computers and servers around, taking the old junk to storage, it's a lot for your typical polo and slacks to stand up to.  So a few years ago I switched to worker brands.  Carharts, Dickies, that sort of thing.  These last far longer, and they're passably business appropriate.  Well a pair recently started giving out, so I decided to give Duluth Trading company a try.  I make my order, and start eagerly awaiting delivery.

I tend to check delivery status every couple days for everything.  I don't know why, it's not like I can really do anything if there's a problem, and the crap I order is rarely important.  But I do, and the first few times I checked these everything was fine.  Then I check again and the status shows as "Exception", which I've never seen before.  It says to check the Shipment log for more details, so I scroll down and find this:

TRAIN DERAILMENT?!  That was certainly a new one for me.  Apparently this sort of thing isn't entirely unprecedented for poor Helena.  I checked the status again today, and it's now stuck in Kalispell, MT due to severe weather conditions.  Now there's certain evidence that suggests UPS might be more into logistics than I am, but I feel safe in suggesting that Montana during the winter may not be the best route.  It's anybody's guess when these pants will actually arrive now, but I hear they're pretty tough, so I'm not too worried.

So what does that have to do with databases?  Glad you asked!  As I've started to get more serious about programming, I've had databases on my mind more recently.  One of the first things you're likely to learn about database design is that you want to avoid duplicating data in a table wherever it's feasible to do so.  If you're tracking something like, say, delivery statuses, you don't want to type the same delivery status messages in for the various records.  It's more efficient to make a separate table (sometimes called a lookup table) where you enter the status once, then reference the ID number of that status message in the original table.  This is more efficient in terms of data storage, and it makes it easy to change the message later if that's ever necessary.  Changing the description in the lookup table effectively changes it everywhere because the other places are just saving the reference to that description, not the description itself.

While this makes logical sense, I found it to be extremely counter-intuitive when I was starting out.  The natural approach for me was to try to make the table look like a spreadsheet, where everything I wanted or needed was in a single table.  So there would be a status column with the statuses written in.  Fine for a small table, but what if you need to change the wording of the status?  What if you made a typo in some of them?  It's just too messy and inefficient.  Storing the reference ID makes the table harder to read in the raw form, but how often are you really going to do that?

So to tie this all together, that delivery status field is the perfect candidate for a lookup table, and as I was looking at it that first time, slightly in awe, I wondered if that was a recent addition, or if train derailment came up in a planning session or requirements meeting.  Now I'm not up on train derailment statistics, but it seems like a definite possibility that it hasn't happened since UPS started this online tracking system.  Looking a little closer, I noticed there's a period at the end of derailment, but not at the end of any of the other entries.  That makes me think it was added later.  And I belive my suspicion was confirmed when I checked later and saw a slightly more informative description in its place.  Pretty weird, huh?

Tuesday
Aug132013

Surgery Tracker Update

I've hit a minor snag with my surgery tracking web app.  I think the best way to describe the problem is as a gap between my plans and reality.  What I thought was a very well understood, thought out, simple plan fell apart when I actually tried to implement it.

This being the first even remotely complex thing I'm trying to write, I'm not surprised.  It's this kind of experience that makes one better.  It's just hard to come back from.  My first impulse is to just scrap the whole thing and try something else.  I'm not being paid for this, and it may not even be used once it is done.  However, what would I learn doing that?

As it stands, I am trying to rework thre database to meet the needs that now seem so obvious without destroying too much of the work I've already done.  Fortunately, most of that work was on other tables that won't need to change.

This tracker is basically 3 database tables with a php-driven, form-based front end.  Two of the tables are just lookup tables, one being of the doctors performing the surgeries, the other being the surgery codes and descriptions.  I had already made web forms for adding, updating, and deactivating dotors and surgery codes.  Luckily, these tables aren't changing much (if at all), so the associated PHP and HTML shouldn't change much for those pages either.

The problem child is the third table, which is where the actual surgeries are tracked.  The idea is the user would select a doctor and code from lists, add the date, and save it.  This is currently being done with an excel spreadsheet, which becomes a horrendous mess at the end of the year when it comes time to total everything up.  Doctors will do different surgeries in a given month, which makes it almost impossible for a normal person to maintain continuity between the different months, which are all represented as different worksheets in the spreadsheet.  You either have to go in and recreate every sum function by hand to accommodate for the different locations of a code's cell, or backfill all of the previous months when a new code is added.

My initial thought was to track each surgery separately.  So if a doctor performed 8 of a given surgery in a month, there would be 8 entries in the database.  For some reason that seemed like a good idea at the time.  When it came to implement it, though, I ran into problems.  The first is user interface.  I don't want to make the user enter each surgery one at a time.  These tend to happen in groups, so it seemed reasonable to let the user enter them as groups.

I made a form that required the user to enter how many different surgery codes were used, then a secondary form to select the code from a list, and then enter the number of procedures done that session.  My plan was then to break these apart into their consituent pairs, and loop through them to add each one the appropriate x number of times to the database.

The first problem became glaringly obvious as I started to mess around with the SQL for these commands.  Somehow, between me listing the requirements in my pre-work planning, and actually implementing the database, I forgot to add the date!  There was no date field.  Now that's just embarassing.  However, this being so early in the process still, it was an easy thing to remedy.

I'm fortunate in this project that they really only care the month and year.  Specific days are not important at this time.  So I really just need to manage the year and month.  Maybe it's just that I'm new at all this still, but messing with the timestamps just seems daunting.  It's probably a better way to do this, but I think just using numbers and validating the user input will suffice for now.

The second problem that I started to notice was that I didn't really have a good primary key for this surgery table.  No mix of doctor, code, and time would be gauranteed unique.  In fact, it seemed to be my goal to make it the opposite.  I didn't like the idea of an arbitrary ID field, but I saw now way around it.

The third problem came in the implementation.  It seemed the only way to loop through each group of surgeries was to use nested <code>foreach</code> statements, or nested <code>while</code>s, or both. That thought seemed a bit daunting.  It also started to dawn on me that I would be using a lot of processing power and database writes to accomplish this, and to what end? As I got closer to the actualy implementation, it seemed that I had chosen the most inefficient way to do this possible.

That's where I am now.  I need to rethink the database implementation, and rewrite all of the front end stuff I did before I hit this road block.  I'll write another update when I've made some progress.

Monday
May062013

Surgery Tracking

I've been taking a more active interest in web development lately, and to that end I've started working on a project in my spare time that might eventually be adapted for work.  I run into several situations at work where it strikes me that a database would be an ideal solution, but there's never any time to work on it.  I find I learn best when actually doing something, so I thought I'd try working on this at home, and adapt it to work if it goes anywhere.

Basically, the situation is that there's someone at work who has to keep track of which surgeries a doctor does.  It's an unofficial count, but needs to be accurate.  Currently, the employee tracking it is using a spreadsheet with a few summing functions.  It works fine for month to month, but it's a nightmare when it comes to creating yearly summaries.  Why?  Because the list isn't static.  Between changes to the coding standards, new doctors, and the fact that some things just change over time, there seem to be at least one new surgery code added each month.  The employee tracking it each month is not familiar enough with Excel to confidently change and add functions, so it usually falls to me to go back and check all the functions and see if the data looks right.

I don't mind helping her, but there has to be a better way to do this.  Why not make a database and update it through forms?  This may be a bit more cumbersome in terms of data entry, but it will more than make up for that in accuracy and ease of use when generating reports.

I'm not very far along yet.  I have the database set up with referential integrity for the doctors and surgery codes to mitigate data entry errors, and I have the basics of the surgery entry form.  The tricky part right now is figuring out a good UI for the system.  The easiest way would be to enter one surgery per doctor, but that will make it very cumbersome for the poor tech who has to enter this data.  My next idea was to ask them to select how many doctor/surgery pairs they want to enter, then creat that many input forms.

The other idea I had was to have them select a doctor along with how many surgery types they wanted to enter at once.  This way they could input everything for a given doctor at once.  That might make more logical sense for the tech, and it would eliminate the problem of having to select the doctor each time when making multiple entries.  Ultimately, I will talk to the tech and see which sounds like the better plan.

I should probably do that soon, as I can't really proceed with the form creation or database entry elements until I know those details.  There's still a lot to do, but I like having a project.  Hoepfully this goes somewhere.

 

Saturday
May082010

Databases

I'm currently enrolled in a database class.  It's coming to a close soon, but I found it surprisingly interesting.  For one, it's hard to conceive of how much information is stored in databases just in this country.  I would wager just about every business with more than 1 employee uses a database for something.  Customer information, employee information, payroll.  I heard somewhere that the average American is in over 70,000 databases.  I scoffed when I first heard that, but now I'm surprised it's not more.

Learning about databases has been interesting.  For me, the subject tended to range from fascinating to painfully boring.  It still amazes me how much power there is in SQL statements.  Who would think that you could manipulate all that data with a few well-worded statements?  The boring part has been spending what feels like a lot of time on basic concepts.  That's probably not fair, though.  I tried to teach myself a bit about databases before taking the class, and I think I have the type of weird brain that gets things like this.

One thing I'm finding a little frustrating is that I now know the basics on how to build and use a database, but there's no real instruction on how to utilize it for anything.  I don't know how I would tie it in with an application, or a web page, or even how to set up professional database software.  It's like there's a weird gap of knowledge between using a small, personal-type database system and using a professional one.  The pro tools expect that if you're looking to use them, you already know how to set up and configure them.  Maybe that will come later, but for now I find it frustrating.