Along with announcing Redshift, AWS also released two new virtual machine instance types meant to work with it: an XL instance with 2TB of local storage and an 8XL instance with 16TB of storage. AWS partnered with database analysis company ParAccel to architect Redshift after Amazon.com, AWS's parent company, invested in ParAccel last year. Like traditional on-premises data warehouses, Redshift can be architected to integrate data from, for example, Amazon's DynamoDB NoSQL database, Simple Storage Service (S3), or existing applications on customers' own premises. Redshift serves as the repository for that data, exposing it to business analytics tools that run reports on it.
"I think there will definitely be some interest" in Redshift, says Kelly, the Wikibon researcher. "One issue with data warehousing is many times this is highly critical, proprietary information that some may be reluctant to ship off to a cloud provider." Still, it could be an attractive option for organizations whose data is siloed or has variable demands, or for companies that don't have the on-premises infrastructure to manage data warehousing. "If you're already doing data management in the cloud, and particularly Amazon's cloud, this seems like an opportunity to take advantage of a new service," he says.
One of the biggest challenges with data warehousing in the cloud is getting the data into AWS's cloud in the first place. Pumping terabytes, or even petabytes, of data into AWS's cloud over the public Internet not only raises security concerns but also eats up bandwidth. AWS does offer direct connections to its cloud through third-party provider sites, such as Equinix. And AWS officials say sending data on physical disks via a shipping service is a common way for customers to get data into and out of AWS's cloud.
Of course, data migration is less of a problem if the data is already in AWS's cloud, which is the case for many startups that have gone all in on AWS's services thus far. AWS released Data Pipeline on the second day of the conference to help manage the transfer of data around AWS's cloud using 10 gigabit connections. But many businesses with a lot of data already have a data warehouse. An enterprise may test out Redshift for new data warehousing, Kelly suggests, but sensitive information about the company, such as financial reports or customers' personally identifiable information, may not make it up there any time soon.