Archive for sql

Amazon RDS provides Relational Database Scalability

// November 3rd, 2009 // 1 Comment » // amazon-ec2, cloud computing, mysql, sql

I was very excited to hear that Amazon added a service for Relational Databases for the Cloud. It does make my life easier. One of the most important properties of the Cloud is scalability, and it is one of the most sought properties for companies with huge data flow. Using Amazon EC2 IaaS, makes it easy and flexible to scale your application, only if it is built or architected to be scalable.

There is no free lunch when it comes to scalability in the cloud. In order to take advantage of the flexibility and scalability of the Cloud, you need to architect your application in a way that it can be scaled. What Amazon provides is the infrastructure, but the application must be able to take advantage of that.

Now, this is not simple task, but it is achievable and most EC2 users are doing it now. But scaling the database is not as simple, even if you have the source code, like in the case of MySQL.

For example, let’s think about a SaaS application that is built with fine grained services which can all be scaled using Amazon’s Load Balancer. The application depends on an MySQL Server tuned for high loads sitting on an image in Amazon EC2. The services works fine most of the year, but in December, the load pumps up enough to make MySQL suffer.

Now this app is already in the cloud, so our fine grained services are scaled as needed, but the database, has become our bottleneck. What can we do? Well, there are several common options, which are not specific to the cloud and can be combined:

Now, everybody knows this works, but it comes with a cost in maintenance;  if it works in an on-premise data center it should work in the cloud. The problem with this approach, is that it doesn’t take full advantage of Cloud Scalability in Amazon EC2. The only way to take advantage of EC2’s scalability would be to actually make MySQL natively scalable, like we did with our application. Which is a huge effort, and sometimes not worth it.

Of course, Amazon reps will tell you to use SimpleDB, which I wouldn’t recommend for any mission-critical work due to the consistency issue. SimpleDB relaxes consistency for the sake of  (limited) performance, and also it doesn’t have great performance with huge datasets, which is the norm in enterprise applications. (For more information on SimpleDB’s pros and cons look here.)

Now with the new Amazon RDS, we can use a relational database that is supported by popular languages/frameworks and that can take advantage of the scalability in the cloud (specifically Amazon EC2). We just create our database using the Amazon RDS API, and point our applciation to it. If we need to scale, we can do it using the API, taking full advantage of EC2. When the data flow lowers again, we can easily release the resources using the API.

How did Amazon accomplish this? Well, it is too early to know. There are not a lot of sources of information regarding Amazon’s RDS architecture. But as it looks right now it seems like an API in front of a set of mysql databases sitting in Amazon’s EC2 images, maybe using a combination of the techniques explained before. What Amazon is aiming to provide with this new service is a database server that is easy to use, maintain and scale. Which I think is a pretty cool.

I will post more information regarding RDS after I play around with it more.

Post to Twitter Tweet This Post

SSIS: How to add a timestamp to a Excel File output path.

// September 22nd, 2008 // 6 Comments » // SQL Server, Systems Integration, sql

I have been working with SSIS lately and I have been having some troubles adding a timestamp to the output excel file name. I have found several sources on how to do it, the best one (except for the video’s size which is too small) is: tutorial.

It did help, but it wasn’t what I wanted, because you have to hardcode the file path in the expression, making the developer or deployer hard code the path in two places. And I didn’t want the deployment team to add the path into the actual expression box, any mistakes or typos and that might break the package. So I wanted to extract the actual file path, from the timestamp generation expression.

So this is my solution:

1) Create a Package-wide variable called OutputFilePath

2) Set the Variable’s Value to the expected file path without the timestamp, like a template.

3) Go into the Excel data source and enter the following expression, linking the expression to the ExcelFilePath property:

REPLACE(@OutputFilePath, “.xls”, “_”+ REPLACE((DT_WSTR, 10)(DT_DBDATE)GETDATE(), “-”,”")+REPLACE((DT_WSTR, 10)(DT_DBTIME)GETDATE(), “:”,”")+”.xls”)

4) Run your package, and test.

Voila!

Post to Twitter Tweet This Post

Stored Procedures vs Embedded Queries

// January 18th, 2007 // No Comments » // software development, sql

I have been involved in a lot of discussions lately on whether to use Stored Procs or Embedded Sql Queries in layered applications. I have always supported Stored Procs over Embedded Queries. The main reason is separation because it provides a medium to extract logic from the layers. I know in some cases there is even a performance boost, but not enough to notice. I have always thought of Stored Procedures as a Logic Service, belonging of course to the BLL (Business Logic Layer) not the DAL (Data Access Layer).

People often get frustrated when I say this, but separating logic into web services (or any other kind of service) is the same as separating Logic into stored procedures. The real truth about it, is that it has the same idea. Stored procedures are specialized Services which provide specific logic.

There is an ongoing discussion, which I find very interesting on Microsoft forums.

Post to Twitter Tweet This Post