cros_flash: Use oflag=direct in dd for better performance

Update the dd and sync commands in CopyImageToDevice, and run
`partx -u` on the device after the copy.

Use `oflag=direct` to bypass the kernel page cache, which saves 5-10
seconds. With `conv=fdatasync`, dd flushes the output data before it exits
to ensure the data is physically written to the device. We don't use
`conv=fsync` because it also flushes file metadata, which we don't need
when writing to a device.

With `sync -d device`, we sync only the data written to that device instead
of syncing the whole system, so the command doesn't stall waiting for
unrelated I/O to finish.
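
For reference, the resulting command sequence can be sketched as below. This
is only an illustration: `build_copy_commands` and the example paths are
hypothetical names, and the real code passes each list to
`cros_build_lib.SudoRunCommand` rather than returning them.

```python
def build_copy_commands(image, device):
    """Return the argv lists used to copy `image` onto `device`.

    Hypothetical helper for illustration; nothing is executed here.
    """
    # O_DIRECT writes skip the page cache; fdatasync flushes data at the end.
    dd_cmd = ['dd', 'if=%s' % image, 'of=%s' % device, 'bs=4M',
              'iflag=fullblock', 'oflag=direct', 'conv=fdatasync']
    # Ask the kernel to update its view of the device's partitions.
    partx_cmd = ['partx', '-u', device]
    # Sync only the data for this device, not every filesystem.
    sync_cmd = ['sync', '-d', device]
    return [dd_cmd, partx_cmd, sync_cmd]


if __name__ == '__main__':
    # Example paths are placeholders, not real devices.
    for cmd in build_copy_commands('/tmp/image.bin', '/dev/sdX'):
        print(' '.join(cmd))
```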

BUG=None
TEST=`cros flash usb:// ...`

Change-Id: Id771a67c7e92e13b930859c81bc3c429e0ccd562
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1673768
Tested-by: Frank Huang <frankbozar@chromium.org>
Commit-Queue: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
diff --git a/cli/flash.py b/cli/flash.py
index 5fd0c87..385b01a 100644
--- a/cli/flash.py
+++ b/cli/flash.py
@@ -226,7 +226,7 @@
       device: Device to copy to.
     """
     cmd = ['dd', 'if=%s' % image, 'of=%s' % device, 'bs=4M', 'iflag=fullblock',
-           'oflag=sync']
+           'oflag=direct', 'conv=fdatasync']
     if logging.getLogger().getEffectiveLevel() <= logging.NOTICE:
       op = UsbImagerOperation(image)
       op.Run(cros_build_lib.SudoRunCommand, cmd, debug_level=logging.NOTICE,
@@ -244,7 +244,10 @@
                                   error_code_ok=True,
                                   debug_level=self.debug_level)
 
-    cros_build_lib.SudoRunCommand(['sync'], debug_level=self.debug_level)
+    cros_build_lib.SudoRunCommand(['partx', '-u', device],
+                                  debug_level=self.debug_level)
+    cros_build_lib.SudoRunCommand(['sync', '-d', device],
+                                  debug_level=self.debug_level)
 
   def _GetImagePath(self):
     """Returns the image path to use."""