Finding elements is only half the job. You also need to interact with them: tap buttons, type in fields, scroll lists, and clear inputs. Mobile interactions have quirks that don't exist on the web: soft keyboards, touch targets that must be scrolled into view, and gesture coordinates that differ between devices. This lesson covers the actions you will use in nearly every test.
Tapping (clicking)
The simplest interaction is a tap. Appium uses the same click() method as Selenium:
driver.findElement(AppiumBy.accessibilityId("login_button")).click();
On mobile, click() sends a tap event to the element's centre coordinates. This works in the large majority of cases. The cases where it fails:
- The element is off-screen (scroll first)
- The element is partially covered by another element (use a different scroll position)
- The element is not yet visible (add an explicit wait)
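The off-screen case can often be detected before tapping. The following is a minimal sketch, assuming the element rectangle and the viewport size are expressed in the same units. TapTarget and its methods are hypothetical helpers, not part of Appium or Selenium; in a real test the inputs would come from element.getRect() and driver.manage().window().getSize().

```java
/** Hypothetical helper for illustration; not an Appium or Selenium class. */
final class TapTarget {
    private TapTarget() {}

    /** Centre of a rectangle given its top-left corner and size, as {x, y}. */
    static int[] centre(int left, int top, int width, int height) {
        return new int[] { left + width / 2, top + height / 2 };
    }

    /** True when the rectangle's centre lies inside a viewport of the given size. */
    static boolean centreOnScreen(int left, int top, int width, int height,
                                  int viewportWidth, int viewportHeight) {
        int[] c = centre(left, top, width, height);
        return c[0] >= 0 && c[0] < viewportWidth
            && c[1] >= 0 && c[1] < viewportHeight;
    }
}
```

If centreOnScreen returns false, scroll before calling click(). The check says nothing about overlapping elements, so the second case in the list still needs a different scroll position.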
For a tap at specific coordinates (useful when no element exists to tap):
PointerInput finger = new PointerInput(PointerInput.Kind.TOUCH, "finger");
Sequence tapSequence = new Sequence(finger, 1)
.addAction(finger.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), 300, 600))
.addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
.addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
driver.perform(List.of(tapSequence));
Typing text
WebElement emailField = driver.findElement(AppiumBy.id("com.example:id/email_input"));
emailField.sendKeys("user@example.com");
sendKeys() types into the focused element. If the element is not focused, Appium taps it first.
Clearing a field before typing:
emailField.clear();
emailField.sendKeys("new.user@example.com");
clear() removes the current text. Always clear before typing if the field might already have content (e.g., in noReset test runs where previous test data persists).
Submitting with the keyboard:
On Android, tap the Enter/Return key using:
driver.pressKey(new KeyEvent(AndroidKey.ENTER));
On iOS:
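driver.findElement(AppiumBy.iOSNsPredicateString("type == 'XCUIElementTypeButton' AND label == 'Return'")).click();
// or send a newline to the field, which acts as the keyboard's Return key
// (mobile: pressButton only handles hardware buttons such as home and the volume keys)
emailField.sendKeys("\n");
Dismissing the keyboard: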
driver.hideKeyboard(); // works on both platforms
On Android you can also press Back:
driver.pressKey(new KeyEvent(AndroidKey.BACK));
Scrolling with W3C Actions
The modern way to scroll in Appium 2.x uses W3C touch actions:
PointerInput finger = new PointerInput(PointerInput.Kind.TOUCH, "finger");
Sequence scroll = new Sequence(finger, 1)
.addAction(finger.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), 540, 1200))
.addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
.addAction(finger.createPointerMove(Duration.ofMillis(600), PointerInput.Origin.viewport(), 540, 400))
.addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
driver.perform(List.of(scroll));
The finger moves from y=1200 up to y=400; swiping upward reveals content further down. Adjust x to the horizontal centre of the scrollable area, and adjust the start and end y values to control the scroll distance.
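Because gesture coordinates differ between devices, it can help to derive them from the screen size instead of hard-coding them. The sketch below is one way to do that; SwipeCoordinates and scrollDown are hypothetical names, and the fractions are assumptions you would tune for your app. In a real test the width and height would come from driver.manage().window().getSize().

```java
/** Hypothetical helper for illustration; not an Appium or Selenium class. */
final class SwipeCoordinates {
    final int x;      // horizontal centre of the screen
    final int startY; // where the finger touches down
    final int endY;   // where the finger lifts

    private SwipeCoordinates(int x, int startY, int endY) {
        this.x = x;
        this.startY = startY;
        this.endY = endY;
    }

    /**
     * Coordinates for an upward swipe that reveals content further down.
     * Fractions are portions of the screen height, with 0.0 at the top.
     */
    static SwipeCoordinates scrollDown(int screenWidth, int screenHeight,
                                       double startFraction, double endFraction) {
        if (startFraction < 0 || startFraction > 1 || endFraction < 0 || endFraction > 1) {
            throw new IllegalArgumentException("fractions must be between 0 and 1");
        }
        if (startFraction <= endFraction) {
            throw new IllegalArgumentException("start must be below end for an upward swipe");
        }
        return new SwipeCoordinates(
            screenWidth / 2,
            (int) Math.round(screenHeight * startFraction),
            (int) Math.round(screenHeight * endFraction));
    }
}
```

For a 1080x1600 screen, scrollDown(1080, 1600, 0.75, 0.25) yields x=540, startY=1200, endY=400, the same values used in the sequence above. Pass the three fields to createPointerMove in place of the literals.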
Scrolling with mobile: scroll (simpler)
On iOS, mobile: scroll is cleaner for common cases:
// Scroll down inside a specific element
driver.executeScript("mobile: scroll", Map.of(
"direction", "down",
// pass the element's internal id, not an attribute value
"elementId", ((RemoteWebElement) driver.findElement(AppiumBy.accessibilityId("product_list"))).getId()
));
On Android, use UIAutomator2's UiScrollable (covered in the locators chapter) or:
driver.executeScript("mobile: scrollGesture", Map.of(
"left", 100, "top", 200, "width", 900, "height", 1400,
"direction", "down",
"percent", 0.75
));
Long press
PointerInput finger = new PointerInput(PointerInput.Kind.TOUCH, "finger");
WebElement element = driver.findElement(AppiumBy.accessibilityId("message_item"));
Point topLeft = element.getRect().getPoint(); // top-left corner of the element, not its centre
Sequence longPress = new Sequence(finger, 1)
.addAction(finger.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(),
topLeft.getX() + element.getSize().getWidth() / 2, // centre x
topLeft.getY() + element.getSize().getHeight() / 2)) // centre y
.addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
.addAction(new Pause(finger, Duration.ofSeconds(2))) // hold for 2 seconds
.addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
driver.perform(List.of(longPress));
Double tap
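// Uses a finger PointerInput created as in the earlier examples; x and y are the target coordinates
Sequence doubleTap = new Sequence(finger, 1)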
.addAction(finger.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), x, y))
.addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
.addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()))
.addAction(new Pause(finger, Duration.ofMillis(100)))
.addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()))
.addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
driver.perform(List.of(doubleTap));
Getting element information
WebElement button = driver.findElement(AppiumBy.accessibilityId("submit"));
// Text content
String text = button.getText();
// Visibility
boolean visible = button.isDisplayed();
// Interactability
boolean enabled = button.isEnabled();
// Dimensions and position
Dimension size = button.getSize(); // width, height
Point location = button.getLocation(); // x, y from top-left of screen
// Individual attributes (names differ by platform)
String contentDesc = button.getAttribute("content-desc"); // Android
String accessId = button.getAttribute("name"); // iOS
Taking a screenshot
File screenshot = driver.getScreenshotAs(OutputType.FILE);
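// REPLACE_EXISTING avoids a FileAlreadyExistsException on repeated runs
Files.copy(screenshot.toPath(), Path.of("screenshot.png"), StandardCopyOption.REPLACE_EXISTING);
Screenshots are most useful in @AfterMethod on test failure (shown in the first Android test lesson).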
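When screenshots are captured on failure, a fixed file name means each one overwrites the last. A minimal sketch of one way to give each screenshot its own file follows; ScreenshotFiles, targetFor, and save are hypothetical names, and the naming scheme is an assumption, not something Appium prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for illustration; not an Appium or Selenium class. */
final class ScreenshotFiles {
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    private ScreenshotFiles() {}

    /** Builds a path such as shots/loginTest-20240102-030405.png. */
    static Path targetFor(Path directory, String testName, LocalDateTime when) {
        // Replace characters that are awkward in file names with underscores
        String safeName = testName.replaceAll("[^A-Za-z0-9._-]", "_");
        return directory.resolve(safeName + "-" + STAMP.format(when) + ".png");
    }

    /** Copies the temporary screenshot file into the directory and returns the new path. */
    static Path save(Path source, Path directory, String testName, LocalDateTime when)
            throws IOException {
        Files.createDirectories(directory);
        Path target = targetFor(directory, testName, when);
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In a failure hook you might call ScreenshotFiles.save(screenshot.toPath(), Path.of("screenshots"), testName, LocalDateTime.now()), where testName comes from your test framework.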