Netflix best practices for migrating from Oracle Database to Amazon SimpleDB
This is a partial translation of a Netflix article, covering only the problem areas of the migration from Oracle Database to Amazon SimpleDB and the company's solutions to them.
In late 2008, Netflix had only one data center, and that raised some questions for us. As a single point of failure, it could leave our users dissatisfied, for example in the event of a power outage. In addition, with streaming traffic and the subscription service growing at the same time, Netflix would soon outgrow this data center: we saw an inevitable need for more power, better cooling, more space, and more equipment.
One alternative was to build new data centers. However, besides the high cost, this effort would have kept our engineering staff busy expanding data centers instead of working on new products. We also understood that managing multiple data centers is a complex task. Building and maintaining multiple data centers looked like a dangerous distraction from our core business.
Instead of going down that path, we chose a more radical one. We moved to the IaaS (infrastructure as a service) solution offered at the time by Amazon Web Services. With many existing data centers and multiple levels of redundancy across its services (such as S3 and SimpleDB), AWS promised better availability and scalability within a relatively short time.
By outsourcing various network and back-end tasks, Netflix was able to focus on its core business: delivering movies and TV series.
In the process of moving to AWS, we formulated a set of best practices for working with AP (available, partition-tolerant) systems such as SimpleDB.
Article based on information from habrahabr.ru
Leaving the RDBMS behind
Partial or no SQL support. SimpleDB supports only a subset of SQL.
- Perform GROUP BY and JOIN operations at the application level.
- One way to avoid the need for JOINs is to denormalize multiple tables into a single logical SimpleDB domain.
No relationships between domains
- Implement the relationships at the application level.
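As an illustration of JOIN and GROUP BY done in application code, here is a minimal sketch. The domain contents, attribute names (CUSTOMER_ID, TOTAL, NAME), and function names are hypothetical and not taken from the original article; the items are plain dictionaries standing in for the results of two separate SimpleDB selects.

```python
from collections import defaultdict

# Hypothetical items as they might come back from two SimpleDB domains.
# Every value is a string because SimpleDB stores only strings.
customers = [
    {"itemName": "c1", "NAME": "Alice"},
    {"itemName": "c2", "NAME": "Bob"},
]
orders = [
    {"itemName": "o1", "CUSTOMER_ID": "c1", "TOTAL": "0000012"},
    {"itemName": "o2", "CUSTOMER_ID": "c1", "TOTAL": "0000030"},
    {"itemName": "o3", "CUSTOMER_ID": "c2", "TOTAL": "0000005"},
]


def join_orders_to_customers(customers, orders):
    """Hash join: index one side by key, then probe it with the other side."""
    by_id = {c["itemName"]: c for c in customers}
    joined = []
    for order in orders:
        customer = by_id.get(order["CUSTOMER_ID"])
        if customer is not None:  # inner join: skip orders with no customer
            joined.append({**order, "CUSTOMER_NAME": customer["NAME"]})
    return joined


def total_by_customer(joined_rows):
    """GROUP BY CUSTOMER_NAME with SUM(TOTAL), done in application code."""
    totals = defaultdict(int)
    for row in joined_rows:
        totals[row["CUSTOMER_NAME"]] += int(row["TOTAL"])
    return dict(totals)


rows = join_orders_to_customers(customers, orders)
print(total_by_customer(rows))  # {'Alice': 42, 'Bob': 5}
```

A join like this pulls both sides into memory, which is one reason denormalizing into a single domain can be the cheaper option for large data sets.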
No transactions
- Use the SimpleDB conditional operations API: ConditionalPut and ConditionalDelete.
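The following sketch shows how a conditional put can replace a transaction for a single-item read-modify-write. It does not call SimpleDB: FakeDomain is an in-memory stand-in written for this example, and the VERSION and BALANCE attributes and the add_credit helper are assumptions. In the real service, the condition is passed as the expected attribute value of a PutAttributes request.

```python
class ConditionFailed(Exception):
    """Raised when the expected attribute value does not match."""


class FakeDomain:
    """In-memory stand-in for one SimpleDB domain (illustration only)."""

    def __init__(self):
        self.items = {}

    def get(self, item_name):
        return dict(self.items.get(item_name, {}))

    def conditional_put(self, item_name, attributes, expected_name, expected_value):
        """Write only if the stored attribute still equals expected_value.

        expected_value=None means "the attribute must not exist yet".
        """
        current = self.items.get(item_name, {}).get(expected_name)
        if current != expected_value:
            raise ConditionFailed(item_name)
        self.items.setdefault(item_name, {}).update(attributes)


def add_credit(domain, item_name, amount, retries=5):
    """Optimistic read-modify-write guarded by a VERSION attribute."""
    for _ in range(retries):
        item = domain.get(item_name)
        version = item.get("VERSION")  # None on the first write
        balance = int(item.get("BALANCE", "0"))
        new_attrs = {
            "BALANCE": str(balance + amount),
            "VERSION": str(int(version or "0") + 1),
        }
        try:
            domain.conditional_put(item_name, new_attrs, "VERSION", version)
            return new_attrs
        except ConditionFailed:
            continue  # another writer got there first; re-read and retry
    raise RuntimeError("too many concurrent updates")


accounts = FakeDomain()
add_credit(accounts, "user-1", 10)
add_credit(accounts, "user-1", 5)
print(accounts.get("user-1"))  # {'BALANCE': '15', 'VERSION': '2'}
```

This only protects a single item. Changes that span several items still need application-level compensation.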
No triggers
- It is possible to do without them.
No schema support. This is not obvious: a query with a misspelled attribute name does not produce an error.
- Implement schema validation in the application's data access layer.
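A minimal sketch of such a validation step follows. The SCHEMA table, the CUSTOMER domain, and its attribute names are hypothetical. The idea is simply that the data access layer rejects unknown names before any request reaches SimpleDB.

```python
# Hypothetical schema: domain name -> set of allowed attribute names.
SCHEMA = {
    "CUSTOMER": {"NAME", "EMAIL", "PHONE"},
}


class SchemaError(ValueError):
    """Raised when a domain or attribute name is not in the schema."""


def validate(domain, attributes):
    """Reject unknown domains and attribute names before calling SimpleDB."""
    if domain not in SCHEMA:
        raise SchemaError(f"unknown domain: {domain}")
    unknown = set(attributes) - SCHEMA[domain]
    if unknown:
        raise SchemaError(f"unknown attributes for {domain}: {sorted(unknown)}")
    return attributes


validate("CUSTOMER", {"NAME": "Alice"})  # passes
try:
    validate("CUSTOMER", {"NAEM": "Alice"})  # typo in the attribute name
except SchemaError as exc:
    print(exc)
```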
No sequence support
Sequences are often used as primary keys.
- In this case, use a natural unique key. For example, in a customer contacts domain, use the customer's mobile phone number as the key.
- If there is no natural key, use a UUID.
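The key choice described above could look like the sketch below. The MOBILE_PHONE attribute name and the item_name_for helper are assumptions made for the example, as is the normalization of the phone number to digits only.

```python
import uuid


def item_name_for(contact):
    """Pick an item name: a natural key when present, otherwise a UUID."""
    phone = contact.get("MOBILE_PHONE")
    if phone:
        # Normalize so that differently formatted numbers map to one key.
        return "".join(ch for ch in phone if ch.isdigit())
    return str(uuid.uuid4())  # random UUID, no coordination required


print(item_name_for({"MOBILE_PHONE": "+1 (555) 010-0000"}))  # 15550100000
print(item_name_for({"NAME": "no phone on file"}))  # a random UUID
```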
Sequences are also often used for order numbers.
- Use a distributed sequence generator.
No operations for working with time
- Do without them.
No support for constraints: in particular, no uniqueness constraints, no foreign key checks, and no integrity constraints.
- The application can check constraints when reading the data and repair any problems after the fact. This is known as read repair. Read repair should use the ConditionalPut or ConditionalDelete API so that the changes are atomic.
New challenges in SimpleDB
In addition, we ran into issues that are specific to SimpleDB. Here are some of them:
SimpleDB achieves maximum write throughput when the data is split across multiple domains.
- All Netflix data sets with a significant write load were partitioned across multiple domains.
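One common way to spread items across domains is to hash the item name. The sketch below assumes this approach. The shard count, the RATINGS base name, and the naming scheme are hypothetical, and the article does not say how Netflix actually partitioned its data.

```python
import hashlib

NUM_SHARDS = 4  # assumed value; pick it based on the expected write load


def domain_for(base_name, item_name, num_shards=NUM_SHARDS):
    """Map an item to one of num_shards domains, e.g. RATINGS_0..RATINGS_3."""
    # hashlib is used instead of hash() because Python randomizes string
    # hashes per process, and the mapping must be stable across servers.
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{base_name}_{shard}"


print(domain_for("RATINGS", "user-1"))
```

Lookups by item name go straight to one domain, but a query without the item name has to be sent to every shard and merged in the application.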
No support for native data types. All data is stored and processed as strings.
All comparisons (i.e. WHERE clause conditions) and sorts operate only on strings.
- Store all dates in ISO 8601 format.
- Zero-pad numbers that are used in sorts and/or WHERE comparisons.
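The two encodings above can be sketched as follows. The helper names and the padding width of 10 are assumptions. The point is that string comparison then agrees with numeric and chronological order.

```python
from datetime import datetime, timezone


def encode_int(value, width=10):
    """Zero-pad a non-negative integer so string order equals numeric order."""
    if value < 0:
        raise ValueError("negative numbers need an additional offset scheme")
    return str(value).zfill(width)


def encode_datetime(value):
    """Format a timezone-aware datetime as ISO 8601 in UTC."""
    if value.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Plain strings sort lexicographically, which is wrong for numbers:
print("9" < "10")                        # False
print(encode_int(9) < encode_int(10))    # True
print(encode_datetime(datetime(2010, 1, 5, 12, 0, 0, tzinfo=timezone.utc)))
# 2010-01-05T12:00:00Z
```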
DeleteAttributes and PutAttributes are two separate API calls.
How do you atomically delete one attribute and update another attribute of the same item?
- The simplest option is to write a pseudo-null value (for example, the string NULL) instead of calling DeleteAttributes.
- This negates SimpleDB's sparse-table space optimization and leads to data bloat.
Domain and attribute names are case-sensitive
In many DBMSs, table and column names are case-insensitive; in SimpleDB they are case-sensitive. As a result, put, delete, and select operations on a wrongly cased name can fail silently, without returning an error. It is up to the programmer to detect mismatches in the case of names.
- Adopt a convention of using only upper-case domain and attribute names.
- Have the application's data access layer convert names to upper case automatically.
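A data access layer could apply the convention with a small helper like the one below. The function name is hypothetical. Only names are converted, and attribute values are left untouched.

```python
def normalize_names(domain, attributes):
    """Upper-case the domain and attribute names; values stay as they are."""
    return domain.upper(), {name.upper(): value for name, value in attributes.items()}


print(normalize_names("customer", {"Name": "Alice"}))
# ('CUSTOMER', {'NAME': 'Alice'})
```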
Misspelled attribute names in select, put, or delete operations can fail silently, without any error.
Unlike the case-sensitivity issue, this problem arises from the lack of schema validation. SimpleDB is not only a sparse data store but also a schemaless one.
- Route all data access in the application through a single data access layer and implement the validation there.
If you forget to specify a LIMIT in a select, it may take several requests to fetch all the data.
Multiple round trips reduce the likelihood that the site will get all the data within the required time.
- With a single point of data access at the application level, you can set the LIMIT right there.
- Keep the maximum value configurable, so that it can be changed at any time if Amazon raises the limit.
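The sketch below shows one way a data access layer might append the LIMIT itself and still follow continuation tokens when a result does not fit in one response. The run_select callable and fake_backend are stand-ins invented for this example, and MAX_LIMIT = 2500 is an assumption about the service's current maximum that should live in configuration.

```python
MAX_LIMIT = 2500  # assumed current maximum; keep it configurable


def select_all(run_select, expression, limit=MAX_LIMIT):
    """Run a select with an explicit LIMIT and collect every page.

    run_select(query, next_token) must return (items, next_token_or_None).
    """
    query = f"{expression} LIMIT {limit}"
    items, token = [], None
    while True:
        page, token = run_select(query, token)
        items.extend(page)
        if token is None:
            return items


def fake_backend(data, page_size_cap):
    """Test double that pages through a list the way a remote select might."""
    def run_select(query, next_token):
        limit = int(query.rsplit("LIMIT", 1)[1])
        size = min(limit, page_size_cap)
        start = int(next_token or 0)
        end = start + size
        return data[start:end], (str(end) if end < len(data) else None)
    return run_select


data = list(range(10))
print(select_all(fake_backend(data, 3), "SELECT * FROM MY_DOMAIN", limit=4))
```

With a larger LIMIT, the same data arrives in fewer round trips, which is the effect the advice above is after.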
You also need to keep eventual consistency in mind.
Reading immediately after writing is an anti-pattern.
- Avoid reading immediately after writing.
- If that is not possible, use ConsistentRead.
Non-indexed queries can be very expensive
Anti-pattern: SELECT * FROM MY_DOMAIN WHERE MY_ATTR_1 IS NULL
- Use a separate flag attribute with the string value TRUE or FALSE instead of checking for NULL. That way the query uses the index: SELECT * FROM MY_DOMAIN WHERE MY_ATTR_1 = 'FALSE'
Some queries are slow even though they use an index.
Index selectivity affects speed, just as in other SQL engines.
- As in other databases, query performance is determined by the selectivity of the indexes referenced in the WHERE clause. Make sure you understand the selectivity of your indexes and that your WHERE clause uses the most selective ones.
As in any other multi-tenant system, response times can spike sharply.
- Shield the application from latency spikes in SimpleDB or S3 with a front-end cache such as memcached.
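To make the caching idea concrete, here is a minimal read-through cache sketch. It keeps entries in process memory rather than in memcached, and the class name, the TTL, and the loader callable are assumptions for illustration only.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; call the loader only on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # e.g. a function that reads from SimpleDB
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so that tests can control time
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # fresh entry: no backend call
        value = self._loader(key)  # miss or expired: go to the backend
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_lookup(key):
    """Stand-in for a SimpleDB or S3 read."""
    return key.upper()


cache = ReadThroughCache(slow_lookup, ttl_seconds=30)
print(cache.get("title-1"), cache.get("title-1"))  # second call is a cache hit
```

A shared cache such as memcached plays the same role across many application servers. The trade-off is that cached reads can be stale for up to the TTL.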
By deciding to move to SimpleDB and S3, Netflix migrated to a cloud infrastructure very quickly, in time to meet aggressive new plans for product launches and traffic growth. There were problems, but we dealt with almost all of them.