SILK SDS Testing In The ATC
Summary
We worked inside the Advanced Technology Center (or ATC) directly with Silk architects, who had a chance to bless the setup and help with the baseline performance testing. Then, as a trusted advisor to our customer, we (the ATC Lab Services team) put the array through its paces in the Execution Phase of the POC.
ATC Insight
The series of tests that were performed on the Silk storage platform was broken into 3 major sections:
- Baseline Performance Testing
Gathered a baseline of performance data by running five VDBench jobs for a 30-minute iteration and recording the results.
- Functionality Testing
Gathered the required evidence on the management and functionality of the solution.
- Resiliency Testing
Lastly, tested the resiliency of the platform by introducing hardware failures while replication was taking place and the same VDBench job was running in the background.
Last impressions and thoughts are listed in the conclusion at the end of this ATC Insight.
Hardware and software consisted of the following components:
- 4x Dell PowerEdge R640 servers - used as the compute resource
- Cisco 9148 switches - for 16Gb FC storage traffic
- 5x Silk K8000 compute/storage nodes
- 1x Silk management node
Note: Testing on the SDS solution was performed in October and November 2020.
A high-level design of the physical environment is depicted below:
Performance Testing
The Silk solution was filled to 50% capacity at the request of the client. The array had the same number of Volumes/LUNs carved out and presented for performance testing on each VM in each cluster. This is a standard used for all VDBench testing.
The client provided us with the performance testing requirements. The workload profiles and the results we recorded for each are documented below:
| Test (Max IOPS) | Average Latency | Average IOPS | Average Throughput |
| --- | --- | --- | --- |
| 4K block, 70% read, 30% write | 2.36 ms | 950.61 K | 3.89 GB/s |
| 8K block, 30% read, 70% write | 4.04 ms | 509.16 K | 4.17 GB/s |
| 8K block, 90% read, 10% write | 2.74 ms | 774.75 K | 6.35 GB/s |
| 32K block, 70% read, 30% write | 5.04 ms | 406.18 K | 13.31 GB/s |
| 1MB block, 70% read, 30% write | 379.52 ms | 5.44 K | 5.70 GB/s |
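As a sanity check on the figures above, the reported throughput can be reproduced from the reported IOPS and the block size, and Little's Law gives a rough estimate of how many I/Os were in flight. The following is a minimal sketch, not part of the original test harness. It assumes block sizes are binary (4K = 4096 bytes), that GB means 10^9 bytes, and that the latency and IOPS averages cover the same interval. The function names are illustrative.

```python
# Reported averages from the five baseline runs above.
# Assumptions: block sizes are binary (4K = 4096 bytes), GB is decimal (10**9 bytes).
RESULTS = [
    # (label, block_bytes, avg_iops, avg_latency_ms, reported_gb_per_s)
    ("4K 70/30", 4 * 1024, 950_610, 2.36, 3.89),
    ("8K 30/70", 8 * 1024, 509_160, 4.04, 4.17),
    ("8K 90/10", 8 * 1024, 774_750, 2.74, 6.35),
    ("32K 70/30", 32 * 1024, 406_180, 5.04, 13.31),
    ("1MB 70/30", 1024 * 1024, 5_440, 379.52, 5.70),
]


def throughput_gb_per_s(iops: float, block_bytes: int) -> float:
    """Throughput implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes / 1e9


def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average concurrency = completion rate x average latency."""
    return iops * (latency_ms / 1000.0)


for label, block_bytes, iops, latency_ms, reported in RESULTS:
    implied = throughput_gb_per_s(iops, block_bytes)
    in_flight = outstanding_ios(iops, latency_ms)
    print(f"{label:<10} implied {implied:6.2f} GB/s "
          f"(reported {reported:5.2f}), ~{in_flight:,.0f} I/Os in flight")
```

Under these assumptions, each implied throughput agrees with the reported value to within rounding, and the in-flight estimate lands at roughly 2,000 to 2,250 I/Os for every profile. That may reflect a similar thread count across the five jobs, although the job definitions are not shown here.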
Functionality Testing
The client asked us to go through different aspects of functionality for each solution. This was a combination of QoS testing, performance monitoring, efficiency reporting, snapshot capabilities, and overall general management of the solution. For this ATC Insight, we focused on the QoS testing (if capable), performance monitoring, and snapshot capabilities. If there is further interest in the other tests we performed, please feel free to reach out to the author of this ATC Insight.
Live Performance Monitoring
The client required that the solution alert on a health abnormality when one occurs. To test this, we ran a VDBench workload of 25K IOPS, 4K block, 50% read and 50% write. After the workload had run for a period of time, an FC port was taken offline at the switch layer. The resulting impact that we saw in VDBench was also visible in the Silk management interface.
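One way to picture the impact seen on the VDBench side is to flag reporting intervals where the achieved rate falls well below the fixed 25K IOPS target. The sketch below is illustrative only: the function name, the 10% tolerance, and the sample values are assumptions made up for the example, not data captured during this test.

```python
TARGET_IOPS = 25_000  # fixed rate used for the monitoring workload


def find_impacted_intervals(samples, target=TARGET_IOPS, tolerance=0.10):
    """Return the indexes of intervals whose IOPS fall more than
    `tolerance` (as a fraction) below the target rate."""
    floor = target * (1.0 - tolerance)
    return [i for i, iops in enumerate(samples) if iops < floor]


# Made-up per-interval IOPS values for illustration, NOT measured results.
illustrative_samples = [25_010, 24_980, 25_003, 14_200, 21_500, 24_990, 25_005]

impacted = find_impacted_intervals(illustrative_samples)
print("Intervals below threshold:", impacted)
```

The same kind of threshold check could be pointed at any per-interval metric, such as latency rising above a ceiling rather than IOPS dropping below a floor.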
Snapshot Capabilities
This test revolved around a future feature request from the client. Teams were starting to look at the benefits of snapshots in their environment, and the client wanted to see what functionality could be garnered from each solution. The ask from the client was as follows:
- Manually create a snapshot
- Create a snap of the previous snapshot
- Present snapshot to host and actively use
- Create a consistency group snapshot
- Create snapshot schedule with retention rules
Resiliency Testing
The final round of testing was based on hardware resiliency. For each of the tests, a baseline VDBench job ran at 50K IOPS, 4K block, 50% read and 50% write. The tests consisted of the following:
- Power leg pull from one node
- Connectivity leg pull from one node
- Drive removal from one node; if there is no impact, pull a drive from each subsequent node until impact is seen
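The drive removal procedure in the last item is essentially a loop: pull a drive from one node, check the running workload, and move on to the next node only if nothing was affected. A minimal sketch of that logic follows. The node names, the callbacks, and the point at which the stub reports impact are all hypothetical and say nothing about how the Silk platform actually behaved.

```python
from typing import Callable, List, Optional


def run_drive_pull_test(
    nodes: List[str],
    pull_drive: Callable[[str], None],
    impact_observed: Callable[[], bool],
) -> Optional[str]:
    """Pull one drive per node, in order, until the workload shows impact.

    Returns the node whose drive pull coincided with impact, or None if
    every node had a drive pulled without any observed impact.
    """
    for node in nodes:
        pull_drive(node)
        if impact_observed():
            return node
    return None


# Demo with stubs. The "impact after three pulls" rule is arbitrary and
# exists only so the example terminates; it is NOT a test result.
pulled: List[str] = []


def pull_drive_stub(node: str) -> None:
    pulled.append(node)


def impact_stub() -> bool:
    return len(pulled) >= 3


first_impact = run_drive_pull_test(
    [f"node-{n}" for n in range(1, 6)], pull_drive_stub, impact_stub
)
print("Impact first seen after pulling a drive from:", first_impact)
```

In a real run, `impact_observed` would be a person or script watching the background VDBench job and the management interface, and `pull_drive` would be a physical action in the lab.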
Test Tools
VDBench
VDBench is an I/O workload generator for measuring storage performance and verifying the data integrity of direct-attached and network-connected storage. The software is known to run on several operating platforms. It is an open-source tool from Oracle. Visit VDBench Wiki for more information.
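For readers unfamiliar with VDBench, a job is described in a parameter file made up of storage definitions (`sd`), workload definitions (`wd`), and run definitions (`rd`). The sketch below generates a parameter file that approximates the five baseline profiles from this ATC Insight. It is not the parameter file used in the ATC: the device path, `openflags=o_direct`, `seekpct=100` (fully random access), the 5-second reporting interval, and running each profile for 1800 seconds are assumptions, and thread counts are left out because they are not stated here.

```python
# (xfersize, read percentage) for the five baseline profiles.
WORKLOADS = [
    ("4k", 70),
    ("8k", 30),
    ("8k", 90),
    ("32k", 70),
    ("1m", 70),
]


def build_param_file(luns, elapsed_s=1800, interval_s=5):
    """Return VDBench parameter-file text for the baseline profiles.

    `luns` is a list of device paths; the values passed in the example
    call below are placeholders, not the devices used in the lab.
    """
    lines = ["* Illustrative parameter file, generated for this example"]
    # One storage definition per LUN presented to the host.
    for i, lun in enumerate(luns, start=1):
        lines.append(f"sd=sd{i},lun={lun},openflags=o_direct")
    # One workload definition per block size / read mix.
    for i, (xfersize, rdpct) in enumerate(WORKLOADS, start=1):
        lines.append(
            f"wd=wd{i},sd=sd*,xfersize={xfersize},rdpct={rdpct},seekpct=100"
        )
    # One run definition per workload, uncapped rate ("Max IOPS").
    for i in range(1, len(WORKLOADS) + 1):
        lines.append(
            f"rd=rd{i},wd=wd{i},iorate=max,"
            f"elapsed={elapsed_s},interval={interval_s}"
        )
    return "\n".join(lines) + "\n"


param_text = build_param_file(["/dev/mapper/mpatha", "/dev/mapper/mpathb"])
print(param_text)
```

The fixed-rate jobs used in the functionality and resiliency tests would follow the same pattern, with `iorate=max` replaced by a target such as `iorate=25000` or `iorate=50000` and `rdpct=50`.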
Graphite/Grafana
Graphite and Grafana provide the Graphical User Interface (or GUI) that we use in the Advanced Technology Center (or ATC) to visually depict the results data that we derive from our compute and storage lab efforts. Visit Grafana for more information.
Last Impressions and Thoughts
- This was the first time the ATC was able to vet the Silk solution, and I must say I was very impressed.
- The interface was intuitive to the point that I didn't need help from Silk to understand any of the management functionality.
- Support handles all upgrades, which is a nice added benefit for peace of mind.
- There are some feature developments still happening, which will make this a great competitor.
- The addition of Silk Clarity is an added bonus for even more insight into the solution.